Introduce torch.sym_sum #136429
Conversation
🔗 Helpful Links: 🧪 see artifacts and rendered test results at hud.pytorch.org/pr/136429. Note: links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 unrelated failure.) As of commit 5ce951c with merge base 59cdd8d. FLAKY: the following job failed, but was likely due to flakiness present on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Partially addresses #128150.

When you have big sums of values, we end up computing long chains of binary addition in our FX graph representation. Not only is this ugly, it is also quadratic, as the sympy.Add constructor is O(N) in the number of arguments. Instead, ensure that we maintain the summation as a single FX node so we can do the entire addition all in one go.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
ghstack-source-id: 190dc46
Pull Request resolved: #136429
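A hedged illustration of the quadratic behavior the description mentions: folding a sum as `((a+b)+c)+...` re-runs an O(#args) constructor (like sympy.Add's flattening) at every step, while one n-ary node visits each argument exactly once. The cost functions below are toy models, not PyTorch code.

```python
def nary_cost(n):
    # a single n-ary constructor call touches each of the n arguments once
    return n

def chain_cost(n):
    # n-1 binary folds; the k-th fold rebuilds a node holding k arguments,
    # modeling an O(#args) constructor like sympy.Add
    return sum(range(2, n + 1))

print(nary_cost(1000))   # 1000
print(chain_cost(1000))  # 500499 -- grows roughly n^2/2
```

For N=1000 terms the chained form does about 500x the constructor work, which matches the motivation for keeping the summation as a single FX node.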
torch/utils/_sympy/interp.py (outdated):

```python
        return getattr(analysis, handler_name)(*args, index_dtype)

# Fastpath for n-ary integral addition
if expr.func is sympy.Add and expr.is_integer and hasattr(analysis, "sym_add"):
```
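A sketch of the fastpath idea in the diff above. `Expr` and `Analysis` here are illustrative stubs, not PyTorch's real classes: when an Add node is n-ary and the analysis object exposes a `sym_add` handler, dispatch all operands in one call instead of folding them pairwise.

```python
import operator
from functools import reduce

class Expr:
    """Toy expression node: a function tag plus its argument list."""
    def __init__(self, func, args):
        self.func = func
        self.args = args

class Analysis:
    @staticmethod
    def sym_add(*args):
        # one O(N) pass over all operands at once
        return sum(args)

def interp(expr, analysis):
    # fastpath: hand the whole argument list to the n-ary handler
    if expr.func == "Add" and hasattr(analysis, "sym_add"):
        return analysis.sym_add(*expr.args)
    # fallback: fold with binary addition
    return reduce(operator.add, expr.args)
```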
Do we need to consider `is_integer`? It can be expensive.
There's some other code that is simpler if I assume the n-ary case only happens for ints. I don't have to make this assumption, but then I need to make sure I do float promotion correctly.
This is all kind of annoying because is_integer should be cheap; I'm really using is_integer as a stand-in for "would I infer integer under some simple static typing rules that don't involve complicated questions about assumptions" lol
isuruf@b30aa11 was my attempt at this last week, but it didn't help.
Ah, very similar; I am sorry I did not see it before doing mine lol. I wonder if we need to find more spots where operator.add is being called and short-circuit it faster.
Unfortunately this does not have as much of an improvement end to end as I would have hoped 🤔
OK, it turns out this makes a big difference, but I have to wire up userland sum() to actually use it, something like:
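The snippet referred to above was not captured in this page; a hedged sketch of what "wiring up userland sum()" could look like follows. `SymInt` and `sym_sum` here are stand-in stubs (the real ones would be torch.SymInt and the torch.sym_sum this PR introduces); the point is just that a sum over symbolic ints should emit one n-ary node rather than a chain of binary adds.

```python
class SymInt:
    """Stub symbolic int: holds a flat list of leaf symbols/constants."""
    def __init__(self, terms):
        self.terms = terms

def sym_sum(values):
    # collect every operand into a single n-ary node, flattening nested sums
    terms = []
    for v in values:
        terms.extend(v.terms if isinstance(v, SymInt) else [v])
    return SymInt(terms)

def smart_sum(values):
    # route through the n-ary node only when symbolic ints are present;
    # plain ints keep using the builtin sum
    values = list(values)
    if any(isinstance(v, SymInt) for v in values):
        return sym_sum(values)
    return sum(values)
```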
The cProfile of the benchmark when N=3000 looks great.
sympy/sympy#27136 should help in the benchmark when N is large.
@pytorchbot merge -r
Rebased
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.
The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
Instead, call back to a missing handler when needed. This greatly speeds things up when the value ranges dict is large. The missing handler is needed because nested ints don't have VRs, but symbolic sizes involving them occasionally show up in compute.

```
TORCHDYNAMO_EXTENDED_DEBUG_CREATE_SYMBOL="s11" TORCH_LOGS=dynamic PYTORCH_TEST_WITH_DYNAMO=1 python test/test_nestedtensor.py TestNestedTensorAutogradCPU.test_dropout_backward_jagged_cpu
```

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: #136667
Approved by: https://github.com/isuruf
ghstack dependencies: #136429
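A hedged sketch of the "missing handler" idea from #136667 (the names below are illustrative, not PyTorch's API): look up a symbol's value range lazily, and fall back to a handler for symbols, such as nested ints, that have no range recorded at all.

```python
# placeholder "unknown" range used when a symbol has no recorded VR
UNBOUNDED = (float("-inf"), float("inf"))

def get_range(ranges, sym, missing_handler=None):
    """Fetch sym's value range, deferring to a handler on a miss.

    Avoids pre-populating an entry for every symbol, which is the
    win when the ranges dict is large.
    """
    try:
        return ranges[sym]
    except KeyError:
        if missing_handler is not None:
            return missing_handler(sym)
        return UNBOUNDED
```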
@pytorchbot revert -c nosignal -m "fails internal stuff"
@pytorchbot successfully started a revert job. Check the current status here.
This reverts commit 90bed32. Reverted #136429 on behalf of https://github.com/ezyang due to fails internal stuff ([comment](#136429 (comment)))
@ezyang your PR has been successfully reverted.
@pytorchbot successfully started a revert job. Check the current status here.
Reverting PR 136429 failed. Reason: Comment with id 2403335834 not found. (Details for Dev Infra team: raised by workflow job.)
Is the failing test a false alarm? See https://fb.workplace.com/groups/1293299944866616/permalink/1626810738182200/; it looks like some replacement isn't happening (or is happening too aggressively now).
#138660 just landed, see if it helps.
Stack from ghstack (oldest at bottom):
Partially addresses #128150
When you have big sums of values, we end up computing long chains of
binary addition in our FX graph representation. Not only is this ugly,
it also is quadratic, as the sympy.Add constructor is O(N) in number
of arguments. Instead, ensure that we maintain the summation as a
single FX node so we can do the entire addition all in one go.
update_hint_regression benchmark, before and after:
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @rec