ezyang
Contributor

@ezyang ezyang commented Sep 23, 2024

Stack from ghstack (oldest at bottom):

Partially addresses #128150

When you have big sums of values, we end up computing long chains of
binary additions in our FX graph representation. Not only is this ugly,
it is also quadratic: the sympy.Add constructor is O(N) in its number of
arguments, so building an N-term sum one binary addition at a time costs
O(N^2) overall. Instead, ensure that we maintain the summation as a
single FX node so we can do the entire addition in one go.
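
To make the quadratic cost concrete, here is a minimal sketch in plain Python. It does not use sympy or FX: `FlatSum` is a hypothetical stand-in for a flattening n-ary constructor whose cost is linear in its argument count, and `work` simply counts the arguments each constructor call visits.

```python
from functools import reduce


class FlatSum:
    """Toy n-ary sum whose constructor is O(N) in its argument count.

    Hypothetical stand-in for a flattening constructor such as sympy.Add;
    this is not the real sympy implementation.
    """

    work = 0  # total number of arguments visited by all constructor calls

    def __init__(self, *args):
        terms = []
        for arg in args:
            if isinstance(arg, FlatSum):
                # Flatten nested sums: every existing term is copied again.
                FlatSum.work += len(arg.terms)
                terms.extend(arg.terms)
            else:
                FlatSum.work += 1
                terms.append(arg)
        self.terms = tuple(terms)


def chained(symbols):
    """Build the sum as a chain of binary additions: ((a + b) + c) + ..."""
    return reduce(lambda acc, s: FlatSum(acc, s), symbols)


def one_shot(symbols):
    """Build the sum with a single n-ary constructor call."""
    return FlatSum(*symbols)


def measure(build, n):
    """Return (number of terms, arguments visited) for an n-term sum, n >= 2."""
    FlatSum.work = 0
    result = build([f"s{i}" for i in range(n)])
    return len(result.terms), FlatSum.work
```

In this toy model, `measure(chained, 100)` visits 5049 arguments while `measure(one_shot, 100)` visits 100, which is the gap the single-node representation is meant to close.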

update_hint_regression benchmark, before and after (roughly a 3.2% reduction in instruction count):

update_hint_regression,compile_time_instruction_count,2648328980
update_hint_regression,compile_time_instruction_count,2563748678

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @rec

[ghstack-poisoned]

pytorch-bot bot commented Sep 23, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/136429

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 5ce951c with merge base 59cdd8d:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ezyang added a commit that referenced this pull request Sep 23, 2024
ghstack-source-id: 190dc46
Pull Request resolved: #136429
@pytorch-bot pytorch-bot bot added ciflow/inductor module: cpu CPU specific problem (e.g., perf, algorithm) module: inductor release notes: fx release notes category labels Sep 23, 2024
return getattr(analysis, handler_name)(*args, index_dtype)

# Fastpath for n-ary integral addition
if expr.func is sympy.Add and expr.is_integer and hasattr(analysis, "sym_add"):
Collaborator

Do we need to consider is_integer? It can be expensive.

Contributor Author

There's some other code that is simpler if I assume the n-ary only happens for ints. I don't have to make this assumption, but then I need to make sure I do float promotion correctly.

This is all kind of annoying because is_integer should be cheap; I'm really using is_integer as a stand-in for "would I infer integer under some simple static typing rules that don't involve complicated questions about assumptions" lol
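
One way to read "simple static typing rules" is as a purely structural check that never consults an assumptions system. The sketch below is illustrative only: expressions are hypothetical nested tuples rather than sympy objects, and the rules (int + int is int, int * int is int, true division is float) are assumptions chosen for the example.

```python
def infer_is_int(expr):
    """Return True if expr is an integer under simple structural rules.

    expr is an int, a float, a ("sym", name, "int" | "float") leaf,
    or an (op, *operands) tuple with op in {"add", "mul", "truediv"}.
    """
    if isinstance(expr, bool):
        return False  # keep booleans out of the integer fast path
    if isinstance(expr, int):
        return True
    if isinstance(expr, float):
        return False
    op = expr[0]
    if op == "sym":
        return expr[2] == "int"
    if op in ("add", "mul"):
        # Closed over integers: integer iff every operand is an integer.
        return all(infer_is_int(arg) for arg in expr[1:])
    if op == "truediv":
        return False  # true division always yields a float
    raise ValueError(f"unknown op: {op!r}")
```

The check is a single pass over the expression tree, so its cost is linear in the size of the expression and independent of any global assumptions.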

Collaborator

isuruf@b30aa11 was my attempt at this last week, but it didn't help.

Contributor Author

Ah, very similar, I am sorry I did not see it before doing mine lol. I wonder if I need to find more spots where operator.add is being called and short circuit it faster.

@ezyang
Contributor Author

ezyang commented Sep 23, 2024

Unfortunately this does not have as much improvement on the end to end as I would have hoped 🤔

@ezyang
Contributor Author

ezyang commented Sep 23, 2024

OK it turns out this makes a big difference, but I have to wire up userland sum() to actually use it, something like:

diff --git a/torch/_dynamo/variables/functions.py b/torch/_dynamo/variables/functions.py
index 6c1e4a13c94..8b2ee014f7e 100644
--- a/torch/_dynamo/variables/functions.py
+++ b/torch/_dynamo/variables/functions.py
@@ -1,5 +1,6 @@
 # mypy: ignore-errors
 
+import builtins
 import collections
 import functools
 import inspect
@@ -17,6 +18,7 @@ from ..source import AttrSource, ConstantSource, DefaultsSource, GetItemSource
 from ..utils import (
     check_constant_args,
     check_unspec_or_constant_args,
+    proxy_args_kwargs,
     identity,
     is_function,
     is_wrapper_or_member_descriptor,
@@ -1002,6 +1004,20 @@ class PolyfilledFunctionVariable(VariableTracker):
     ) -> "VariableTracker":
         from torch._dynamo.variables.builder import SourcelessBuilder
 
+        # Special case for sum on tuple of floats and ints
+        if self.fn is builtins.sum and len(args) == 1 and not kwargs and isinstance(args[0], (variables.ListVariable, variables.TupleVariable)) and all(
+            (isinstance(x, variables.ConstantVariable) and isinstance(x.value, int))
+            or (isinstance(x, variables.SymNodeVariable) and x.python_type() is int)
+            for x in args[0].items
+        ):
+            return variables.SymNodeVariable.create(
+                tx,
+                tx.output.create_proxy(
+                    "call_function", torch.sym_add, *proxy_args_kwargs([x for x in args[0].items], {}),
+                ),
+                sym_num=torch.sym_add(*(x.value if isinstance(x, variables.ConstantVariable) else x.sym_num for x in args[0].items)),
+            )
+
         if self.can_constant_fold_through() and check_unspec_or_constant_args(
             args, kwargs
         ):
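
The shape of the guard in this diff can be sketched outside of Dynamo. The version below is a plain-Python approximation under stated assumptions: `nary_add` is a hypothetical placeholder for an n-ary addition primitive, and the real patch inspects `VariableTracker` objects rather than raw values. Unlike the diff, this sketch also excludes `bool` items from the fast path.

```python
def nary_add(*args):
    """Hypothetical n-ary integer addition done in a single call."""
    total = 0
    for arg in args:
        total += arg
    return total


def fast_sum(*args, **kwargs):
    """Route sum() over a list/tuple of ints to nary_add, else fall back."""
    if (
        len(args) == 1
        and not kwargs
        and isinstance(args[0], (list, tuple))
        # bool is a subclass of int, so exclude it explicitly here.
        and all(type(x) is int for x in args[0])
    ):
        return nary_add(*args[0])
    # Anything else (floats, a start value, generators) uses builtin sum.
    return sum(*args, **kwargs)
```

For example, `fast_sum([1, 2, 3])` takes the n-ary path, while `fast_sum((1, 2.5))` and `fast_sum([1, 2], 10)` fall through to the builtin.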

@albanD albanD removed their request for review September 23, 2024 18:12
[ghstack-poisoned]
[ghstack-poisoned]
@ezyang ezyang requested review from bdhirsh and isuruf September 29, 2024 03:19
injiiiiil pushed a commit to injiiiiil/654 that referenced this pull request Oct 1, 2024
ghstack-source-id: f2fb1c5
Pull Request resolved: pytorch/pytorch#136429
Collaborator

@isuruf isuruf left a comment

The cprofile of the benchmark when N=3000 looks great.

@isuruf
Collaborator

isuruf commented Oct 4, 2024

sympy/sympy#27136 should help in the benchmark when N is large.

@isuruf isuruf mentioned this pull request Oct 7, 2024
@ezyang
Contributor Author

ezyang commented Oct 8, 2024

@pytorchbot merge -r

[ghstack-poisoned]
@pytorchmergebot
Collaborator

Rebased gh/ezyang/2941/orig onto refs/remotes/origin/viable/strict because #136667 was rebased, please pull locally before adding more changes (for example, via ghstack checkout https://github.com/pytorch/pytorch/pull/136429)

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Tried to rebase and push PR #136429, but it was already up to date. Try rebasing against main by issuing:
@pytorchbot rebase -b main

@pytorchmergebot
Collaborator

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see pytorch-bot wiki.

pytorchmergebot pushed a commit that referenced this pull request Oct 8, 2024
ghstack-source-id: 48ab30d
Pull Request resolved: #136429
pytorchmergebot pushed a commit that referenced this pull request Oct 8, 2024
Instead, call back to a missing handler when needed. This greatly speeds things up when the value ranges dict is large. The missing handler is needed because nested ints don't have VRs, but symbolic sizes involving them occasionally show up in compute.

```
TORCHDYNAMO_EXTENDED_DEBUG_CREATE_SYMBOL="s11" TORCH_LOGS=dynamic PYTORCH_TEST_WITH_DYNAMO=1 python test/test_nestedtensor.py TestNestedTensorAutogradCPU.test_dropout_backward_jagged_cpu
```
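
The "call back to a missing handler" idea can be illustrated with a small sketch. All names here (`LazyRanges`, `default_range`) are hypothetical and not taken from the PyTorch source; the point is only that entries are resolved when first looked up, instead of copying or scanning a large dict up front.

```python
class LazyRanges(dict):
    """Dict of known ranges that asks a handler for keys it lacks."""

    def __init__(self, known, missing_handler):
        super().__init__()
        self._known = known  # large mapping, never copied
        self._missing_handler = missing_handler

    def __missing__(self, key):
        # Called by dict.__getitem__ only when key is not cached yet.
        if key in self._known:
            value = self._known[key]
        else:
            value = self._missing_handler(key)
        self[key] = value  # cache so later lookups are plain dict hits
        return value


def default_range(symbol):
    """Hypothetical fallback for symbols without a recorded range."""
    return (float("-inf"), float("inf"))
```

A lookup such as `LazyRanges(known, default_range)["j0"]` then only pays for the symbols an expression actually mentions.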

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: #136667
Approved by: https://github.com/isuruf
ghstack dependencies: #136429
@ezyang
Contributor Author

ezyang commented Oct 9, 2024

@pytorchbot revert -c nosignal -m "fails internal stuff"

@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorchmergebot added a commit that referenced this pull request Oct 9, 2024
This reverts commit 90bed32.

Reverted #136429 on behalf of https://github.com/ezyang due to fails internal stuff (comment)
@pytorchmergebot
Collaborator

@ezyang your PR has been successfully reverted.

@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

@pytorchmergebot
Collaborator

Reverting PR 136429 failed

Reason: Comment with id 2403335834 not found

Details for Dev Infra team Raised by workflow job

@ezyang
Contributor Author

ezyang commented Oct 12, 2024

buck2 test '@fbcode//mode/opt' fbcode//deeplearning/aot_inductor/test:test_aoti_forward_compat_with_predictor -- --exact 'deeplearning/aot_inductor/test:test_aoti_forward_compat_with_predictor - test_ifr_cint_ip_predictor_prod (deeplearning.aot_inductor.test.test_forward_compat.TestAOTInductorForwardCompatibility)'

is the failing test

false alarm at https://fb.workplace.com/groups/1293299944866616/permalink/1626810738182200/

looks like some replacement isn't happening (or is happening too aggressively now)

@ezyang
Contributor Author

ezyang commented Oct 24, 2024

#138660 just landed, see if it helps

@ezyang ezyang closed this Oct 24, 2024
@github-actions github-actions bot deleted the gh/ezyang/2934/head branch November 24, 2024 02:13

Labels

ciflow/inductor, ciflow/trunk, Merged, module: cpu, module: dynamo, module: inductor, release notes: fx, Reverted, topic: not user facing


5 participants