[dist_optim] serialize compilation when creating dist_optim #45871
Attempt to fix #45845 [ghstack-poisoned]
Codecov Report
@@ Coverage Diff @@
## gh/wanchaol/136/base #45871 +/- ##
========================================================
- Coverage 68.32% 68.32% -0.01%
========================================================
Files 410 410
Lines 52992 52997 +5
========================================================
+ Hits 36208 36210 +2
- Misses 16784 16787 +3
Attempt to fix #45845 Differential Revision: [D24125209](https://our.internmc.facebook.com/intern/diff/D24125209) [ghstack-poisoned]
LGTM. Can you update the PR summary to explain why the test was flaky and how this PR fixes it?
Do we need this to be cherry-picked into v1.7?
Yes, I will submit the cherry-pick PR soon :)
Stack from ghstack:
This PR fixes #45845
The root cause is that request_callback spawns threads within the same process, and those threads try to compile concurrently when instantiating a ScriptModule. TorchScript compilation is not thread-safe, so this results in the following failure:
This PR introduces a lock in ScriptLocalOptimizer to protect and serialize the compilation.
Differential Revision: D24125209