[DONT MERGE YET][dynamo][guards] Do not dict pop in framelocals dict creation #164038
Conversation
[ghstack-poisoned]
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/164038
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 044eaff with merge base 55840fb.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
really cool speedup!
…ion" For this program ``` import torch torch._dynamo.config.enable_cpp_symbolic_shape_guards = False def fn(x): for _ in range(20000): x = x + 1 return x mod = torch.fx.symbolic_trace(fn) print(mod) opt_mod = torch.compile(mod, backend="eager", dynamic=True) x = torch.randn(2, 2) opt_mod(x) ``` It reduces the guard overhead from 1.8 ms to 172 us. The dict pop is not required as explained in the comment. It can reduce to 46 us if we just remove the _dict.attr pop branch. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames Lucaskabela [ghstack-poisoned]
…ion" For this program ``` import torch torch._dynamo.config.enable_cpp_symbolic_shape_guards = False def fn(x): for _ in range(20000): x = x + 1 return x mod = torch.fx.symbolic_trace(fn) print(mod) opt_mod = torch.compile(mod, backend="eager", dynamic=True) x = torch.randn(2, 2) opt_mod(x) ``` It reduces the guard overhead from 1.8 ms to 72 us. The dict pop is not required as explained in the comment. It can reduce to 46 us if we just remove the _dict.attr pop branch. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames Lucaskabela [ghstack-poisoned]
```cpp
  }
} else {
  _dict[framelocals_names[i]] = value;
  seen.insert(name_ptr);
```
Actually, is there a guarantee that entries in framelocals_names with the same string value are the same string object? If not, should the unordered_set use Python string comparison for its hash/equality? (We should also try to craft a test case that fails if this update is done incorrectly - I recall hitting failing tests when I made mistakes in the initial implementation, so I'm confused as to why the tests today don't catch an incorrect update.)
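For reference, a rough sketch of what value-based hashing/equality on the set could look like (the functor names are made up for this sketch, and error handling for `PyObject_Hash`/`PyUnicode_Compare` is omitted):

```cpp
#include <Python.h>
#include <unordered_set>

// Illustrative only: hash and compare the PyObject* names by string value
// instead of by pointer identity.
struct PyStrHash {
  size_t operator()(PyObject* s) const {
    // PyObject_Hash caches the hash on the unicode object after first use.
    return static_cast<size_t>(PyObject_Hash(s));
  }
};

struct PyStrEqual {
  bool operator()(PyObject* a, PyObject* b) const {
    // Fast path for identical (e.g. interned) objects, value compare otherwise.
    return a == b || PyUnicode_Compare(a, b) == 0;
  }
};

using SeenNames = std::unordered_set<PyObject*, PyStrHash, PyStrEqual>;
```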
Yeah, that was my concern as well. I don't know how to make that test case though.
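One way to probe whether distinct-but-equal name objects can even show up is to check whether the compiler interns identifiers; a quick sketch (assuming `co_varnames` reflects what ends up in framelocals_names):

```python
import sys

def f():
    foo = 1
    return foo

name = f.__code__.co_varnames[0]
# If CPython interns identifiers at compile time, the name stored in the
# code object is the exact same object as the interned string, so pointer
# identity in the C++ set would be enough.
print(name is sys.intern("foo"))
```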
Not merging yet. @williamwen42 has concerns and he will take over this PR.
Followup to #164038

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames Lucaskabela

[ghstack-poisoned]
Followup to #164038

Pull Request resolved: #164316
Approved by: https://github.com/anijain2305
Followup to pytorch#164038

Pull Request resolved: pytorch#164316
Approved by: https://github.com/anijain2305
Stack from ghstack (oldest at bottom):
For this program (the repro script in the update comments above), the change reduces the guard overhead from 1.8 ms to 72 us. The dict pop is not required, as explained in the code comment.

The overhead can drop further, to 46 us, if we just remove the `_dict.attr` pop branch.
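For illustration, a minimal sketch of the seen-set idea (the function name, container types, and first-occurrence-wins behavior here are assumptions for the sketch, not taken verbatim from the PR):

```cpp
#include <Python.h>
#include <unordered_set>
#include <vector>

// Hypothetical helper sketching the idea: build the framelocals dict in one
// pass, tracking names already inserted instead of popping stale entries.
PyObject* build_framelocals_dict(
    const std::vector<PyObject*>& framelocals_names,
    const std::vector<PyObject*>& framelocals_values) {
  PyObject* dict = PyDict_New();
  // Pointer identity is enough here if the names are interned by the compiler.
  std::unordered_set<PyObject*> seen;
  for (size_t i = 0; i < framelocals_names.size(); ++i) {
    PyObject* name_ptr = framelocals_names[i];
    PyObject* value = framelocals_values[i];
    if (value == nullptr) {
      continue;  // unbound local / cleared cell slot
    }
    if (seen.count(name_ptr) == 0) {
      // Insert only the first occurrence of a name, so no dict pop is
      // needed afterwards to deal with duplicates.
      PyDict_SetItem(dict, name_ptr, value);
      seen.insert(name_ptr);
    }
  }
  return dict;
}
```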
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames @Lucaskabela