[inductor] Use shape env bounds in inductor bounds.py (#106175) #106568
Conversation
Summary: If constrained ranges are available, use them in bounds.py before value range analysis (to enable Int64 -> Int32 optimization). Test Plan: New unit test in test_torchinductor.py to mark a tensor as dynamic, then constrain with constrain_as_size (as outlined in #106175) [ghstack-poisoned]
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/106568
Note: Links to docs will display an error until the docs builds have been completed. ✅ As of commit 19d1452: 3 unrelated failures, UNSTABLE - the following jobs failed but were likely due to flakiness present on trunk and have been marked as unstable.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
awesome 🚀 I have a couple comments, but looks good!
torch/_inductor/bounds.py
Outdated
tracing_context = torch._guards.TracingContext.get()
if tracing_context:
    subs = [
        (k, v.upper)
We should have separate substitutions for the upper and lower bounds.
@eellison - thanks; I meant to ask you about the lower bound handling. I might not understand exactly what we're doing here. If we have a range that's a symbolic expression of 4 * s0, for example, but we also have a constrained var_range available like s0: ValueRanges(lower=2, upper=100) ... I guess I assumed that only the upper bound was relevant for the int64 -> int32 optimization and we could use 4 * 100 = 400 for the upper bound. How is the lower bound relevant exactly? Do you just mean that we can also further constrain the lower bound on this line (and not assume 0)? ValueRanges(0, s - 1)
If x has bounds [a, b], then -x has bounds [-b, -a], so you need to track both.
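The point about tracking both endpoints can be sketched with a toy interval helper (a standalone illustration with made-up names, not the actual ValueRanges class):

```python
def neg_bounds(lo, hi):
    # If x is in [lo, hi], then -x is in [-hi, -lo]: the input's upper
    # bound becomes the output's lower bound, so substituting only the
    # upper bound would lose information.
    return (-hi, -lo)

def scale_bounds(c, lo, hi):
    # Multiplying by a constant flips the endpoints when c < 0, so the
    # result's lower bound depends on the input's *upper* bound.
    return (c * lo, c * hi) if c >= 0 else (c * hi, c * lo)
```

For s0 in [2, 100], -s0 is in [-100, -2] and -4 * s0 is in [-400, -8]; both results need the upper bound of s0 to compute their lower bound, which is why one substitution per endpoint is not enough.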
test/inductor/test_torchinductor.py
Outdated
return a[y.to(torch.int64)]

def fn2(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    from torch._export.constraints import constrain_as_size
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: this can be a top-level import.
The same should be done in bound_sympy, as that will also affect the bounds we use to elide guards. Even more, if var_ranges can be an expression, then we can probably just implement this in bound_sympy and delegate the calculation of the bounds in replacement_vars to bound_sympy.
torch/_inductor/bounds.py
Outdated
]
self.replacement_vals = {}
for k, v in loop_body.var_ranges.items():
    s = v.subs(subs)
Is var_ranges just a symint, or can it be an expression? If it can be an expression, you should use bound_sympy to compute the actual bounds given the subs. If it's not, we can simply look up v in subs and get its range if it has free_symbols.
@lezcano (Pardon my being a noob here.) loop_body.var_ranges can contain expressions, e.g., I see 4 * s0. Given that, are you simply suggesting to use the bound_sympy helper to evaluate the bounds instead of the subs method I've used here? Playing around with that helper function, it seems like it indeed probably does what I want. But it sounds like you anticipate some modifications are needed in that helper first? If so, I don't understand that part.
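As a sanity check of the arithmetic in the 4 * s0 example, here is a minimal standalone sketch of bounding a linear expression from per-symbol ranges (a toy version of the idea only; the real logic lives in bound_sympy, and bound_linear is a made-up name):

```python
def bound_linear(coeffs, const, ranges):
    """Bound sum(c * s for s, c in coeffs.items()) + const, given
    per-symbol ranges {s: (lo, hi)}, via interval arithmetic."""
    lo = hi = const
    for sym, c in coeffs.items():
        s_lo, s_hi = ranges[sym]
        if c >= 0:
            lo += c * s_lo
            hi += c * s_hi
        else:  # a negative coefficient swaps the endpoints
            lo += c * s_hi
            hi += c * s_lo
    return (lo, hi)

# 4 * s0 with s0: ValueRanges(lower=2, upper=100) gives [8, 400].
```

With such a bound in hand, the int64 -> int32 decision reduces to checking the resulting interval against the int32 range.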
Have a look at how bound_sympy is implemented. Before calling into the bulk of the logic, it tries to do something very similar to what you're doing here. Just move the logic you implemented here to bound_sympy and you should be able to avoid repeating it :)
Thanks. I might be naive, but it looks like bound_sympy might "just work". I'll put up a new rev to show you what I mean.
torch/_inductor/bounds.py
Outdated
context = torch._guards.TracingContext.get()
ranges = context.fake_mode.shape_env.var_to_range if context else {}
We want to move this to bound_sympy so that other users of bound_sympy also get these better bounds for free.
@lezcano I don't understand this code base (obviously), so I just want to double-check: are you sure that's always appropriate? Purely from an API perspective, it seems a little strange to silently add new ValueRanges beyond what were explicitly provided by the caller. Do all usages of bound_sympy want these ranges from the tracing context included?
The tracing context should only be set if it's valid to use. If it is set, we can use it.
This has some compilation-time issues, but I think those may already exist, so maybe it's worth addressing them in a follow-up. Will defer to @lezcano.
torch/utils/_sympy/value_ranges.py
Outdated
# If there's a tracing context, augment available constrained ranges.
context = torch._guards.TracingContext.get()
if context:
This will be O(n^2) if we do it for every symbol. Although it looks like bound_sympy already has O(n^2) behavior, e.g. pytorch/torch/utils/_sympy/value_ranges.py, lines 559 to 562 in 3d0c767:

# Give some bounds to the free variables via their SymPy assumptions
# TODO A better way of doing this would be to assign them a range upon creation, as
# size variables can come with a lower bound of 2, as we specialise on 0 and 1
ranges = deepcopy(ranges)

and the calculation of bounds happens recursively without caching the results (as far as I can tell):
https://github.com/pytorch/pytorch/blob/main/torch/utils/_sympy/interp.py#L89-L93
I don't know what you mean by "calculating the bounds happens recursively". As far as I can tell, I don't think it does.
Something that can be done better is avoiding the deepcopy and simply merging the dictionaries as {**x, **y}. This would avoid a few unnecessary copies. Same for the unbounded_vars bit, where we can avoid creating a new dictionary if we created it here.
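The merge suggestion in plain Python (a generic illustration, not the PR's actual code):

```python
from copy import deepcopy

base = {"s0": (2, 100)}
extra = {"s1": (0, 10)}

# deepcopy walks and duplicates every value before the merge:
slow = deepcopy(base)
slow.update(extra)

# {**x, **y} builds a single new dict and shares the existing
# (immutable) range tuples, skipping the per-value copies:
fast = {**base, **extra}

assert slow == fast
```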
Recursive case:

args = [sympy_interp(analysis, env, arg) for arg in expr.args]  # type: ignore[arg-type]
handler_name = handlers()[expr.func]
handler = getattr(analysis, handler_name)

We're recursively computing the bounds of symbols without storing them in the environment.
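The caching concern can be illustrated with a toy recursive evaluator over a nested expression tree (hypothetical code, not the actual sympy_interp): without storing results back into an environment or cache, shared subtrees get re-evaluated on every visit.

```python
def eval_bounds(expr, env, cache=None):
    """expr is either a symbol name (str) or a tuple (op, *args).
    Bounds for each subtree are memoized in `cache`; dropping the
    cache makes shared subtrees cost O(n) each, O(n^2) overall."""
    if cache is None:
        cache = {}
    if isinstance(expr, str):
        return env[expr]        # leaf: look up the symbol's range
    if expr in cache:
        return cache[expr]      # already computed for this subtree
    op, *args = expr
    bounds = [eval_bounds(a, env, cache) for a in args]
    if op == "add":
        result = (sum(b[0] for b in bounds), sum(b[1] for b in bounds))
    elif op == "neg":
        ((lo, hi),) = bounds
        result = (-hi, -lo)
    else:
        raise NotImplementedError(op)
    cache[expr] = result
    return result
```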
But that doesn't call bound_sympy
Just two small comments, otherwise it LGTM
@@ -4520,6 +4520,14 @@ def fn(*args, **kwargs):
register_inplace(aten.__ixor__, aten.__xor__)


@register_lowering(aten.sym_constrain_range)
def sym_constrain_range(a, min, max):
This is not used?
@lezcano, I believe this part is needed. Otherwise we get:
...
torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
MissingOperatorWithoutDecomp: missing lowering
target: aten.sym_constrain_range.default
Nice addition! Thank you!
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Stack from ghstack (oldest at bottom):
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @ngimel @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov