Fix segment fault for custom function #8331
Merged
Description: Fix segmentation fault for custom function.
On recent master, with onnxruntime_ENABLE_TRAINING_TORCH_INTEROP=ON, once the onnxruntime_training package is built and installed, any program containing 'import onnxruntime' prints a segmentation fault error to stdout. This bug seems to have been there for a while; it is not clear why we did not see it earlier.
The call stack looks like this:
The reason is that the static singleton instance of OrtTorchFunctionPool is destroyed after the Python modules/functions have already been released.
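To make the ordering problem concrete, here is a minimal, self-contained C++ sketch. It is not the onnxruntime code: `FakeInterpreter` and `FakePool` are hypothetical stand-ins for the Python runtime and for a pool that holds references to Python callables, and the "crash" is only simulated by counting releases attempted after the interpreter is gone. The explicit `Clear()` call in the second function is one generic way to order teardown; it is not claimed to be either of the approaches this PR refers to.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the Python runtime. It only tracks whether the
// runtime is still alive and how many releases arrived too late.
struct FakeInterpreter {
    bool alive = true;
    int live_objects = 0;
    int invalid_releases = 0;  // releases attempted after finalization
    void Finalize() { alive = false; }
};

// Hypothetical stand-in for a pool that owns references to Python callables.
class FakePool {
public:
    explicit FakePool(FakeInterpreter& interp) : interp_(interp) {}
    ~FakePool() { Clear(); }  // the destructor releases whatever is still held

    void Register(const std::string& name) {
        names_.push_back(name);
        ++interp_.live_objects;
    }

    void Clear() {
        for (std::size_t i = 0; i < names_.size(); ++i) {
            if (interp_.alive) {
                --interp_.live_objects;      // normal release
            } else {
                ++interp_.invalid_releases;  // in real code: undefined behavior or a crash
            }
        }
        names_.clear();
    }

private:
    FakeInterpreter& interp_;
    std::vector<std::string> names_;
};

// The pool outlives the interpreter, so its destructor releases too late.
int RunPoolOutlivesInterpreter() {
    FakeInterpreter interp;
    {
        FakePool pool(interp);
        pool.Register("forward");
        pool.Register("backward");
        interp.Finalize();
    }  // pool is destroyed here, after the interpreter is gone
    return interp.invalid_releases;
}

// The pool is emptied while the interpreter is still alive.
int RunPoolClearedFirst() {
    FakeInterpreter interp;
    {
        FakePool pool(interp);
        pool.Register("forward");
        pool.Register("backward");
        pool.Clear();
        interp.Finalize();
    }  // destructor finds nothing left to release
    return interp.invalid_releases;
}
```

Calling `RunPoolOutlivesInterpreter()` counts two late releases, one per registered function, while `RunPoolClearedFirst()` counts none. With a real static singleton the late release would happen during process shutdown, which matches a crash appearing only at exit.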
There are two ways to fix it.
This PR is a fix following the first approach. Please ignore the first and second commits (approach 2 and the revert of those changes).
Motivation and Context