[dynamo] Relax guard introduced when tracing `__call__` on user defined object #152395
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/152395
Note: Links to docs will display an error until the docs builds have been completed. ✅ You can merge normally! (1 unrelated failure.) As of commit 06a2da1 with merge base 6e5e9dc, one job is marked as unstable, possibly due to flakiness on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
```python
if method is torch.nn.Module.__init__:
    method = unpatched_nn_module_init
if source:
    install_guard(source.make_guard(GuardBuilder.FUNCTION_MATCH))
```
I am concerned that this will add a lot of new guards (even though this might be the right thing to do). Maybe skip this one in this PR, and then follow up with another PR with some more benchmarking?
Lemme localize the fix to that one site instead; that should cover us in practice without introducing too many new guards.
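For context, here is a minimal repro sketch of the recompilation pattern this fix targets; the class name and tensor shapes are illustrative, not taken from the PR. Before this change, Dynamo guarded on the identity of the callable object itself, so every fresh instance invalidated the cache; guarding on the class's `__call__` function instead lets instances share a compiled graph.

```python
# Hypothetical repro, assuming PyTorch 2.x; AttnProcessor stands in for the
# per-block attention processor objects mentioned in the PR description.
import torch

torch._logging.set_logs(recompiles=True)  # print guard-failure reasons

class AttnProcessor:
    def __call__(self, x):
        return torch.nn.functional.softmax(x, dim=-1)

@torch.compile
def run(processor, x):
    return processor(x)

x = torch.randn(4, 4)
run(AttnProcessor(), x)  # compiles once
# Before this patch: the id()-based guard on `processor` fails here, so the
# frame recompiles. After it: the guard matches AttnProcessor.__call__, and
# the cached graph is reused for the new instance.
run(AttnProcessor(), x)
```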
@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.

Merge failed. Reason: 1 mandatory check(s) failed. Dig deeper by viewing the failures on hud.

@pytorchbot merge -f "unrelated failure"

Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Stack from ghstack (oldest at bottom):
- -> [dynamo] Relax guard introduced when tracing `__call__` on user defined object #152395
- … `torch.Tensor` and a non-torch op from einops #152369

This relaxes the guard introduced in #100444, which aggressively guards on the object id even though Dynamo is just tracing its `__call__` method.

This allows users to bypass the high compilation time issue in #150706 by compiling transformer blocks only. Without this patch, we'd get lots of unnecessary recompilation, as the blocks have different attention processor instances.

Compiling only the blocks significantly speeds up the compilation process (from ~310s to ~32s), and even improves e2e performance for some reason (7.83s to 7.67s).
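A minimal sketch of the block-level compilation workaround described above; `Block` and `Model` are illustrative stand-ins (real models expose their blocks under model-specific attribute names), and `nn.Module.compile()` assumes PyTorch >= 2.2.

```python
# Sketch: compile repeated submodules in place instead of the whole model.
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        return torch.relu(self.proj(x))

class Model(nn.Module):
    def __init__(self, dim=16, n_blocks=4):
        super().__init__()
        self.blocks = nn.ModuleList(Block(dim) for _ in range(n_blocks))

    def forward(self, x):
        for block in self.blocks:
            x = block(x)
        return x

model = Model()
# Compile each block rather than torch.compile(model): the top-level Python
# stays eager, and with the relaxed guard the identical blocks can share
# compiled code even though each holds its own helper-object instances.
for block in model.blocks:
    block.compile()

out = model(torch.randn(2, 16))
```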
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames