Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[inductor] Use shape env bounds in inductor bounds.py (#106175) #106568

Closed
wants to merge 7 commits into from

Conversation

masnesral
Copy link
Contributor

@masnesral masnesral commented Aug 3, 2023

Stack from ghstack (oldest at bottom):

Summary: If constrained ranges are available, use them in bounds.py before value range analysis (to enable Int64 -> Int32 optimization).

Test Plan: New unit test in test_torchinductor.py to mark a tensor as dynamic, then constrain with constrain_as_size (as outlined in #106175)

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @ngimel @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov

Summary: If constrained ranges are available, use them in bounds.py before value range analysis (to enable Int64 -> Int32 optimization).

Test Plan: New unit test in test_torchinductor.py to mark a tensor as dynamic, then constrain with constrain_as_size (as outlined in #106175)

[ghstack-poisoned]
@pytorch-bot
Copy link

pytorch-bot bot commented Aug 3, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/106568

Note: Links to docs will display an error until the docs builds have been completed.

✅ 3 Unrelated Failures

As of commit 19d1452:

UNSTABLE - The following jobs failed but were likely due to flakiness present on trunk and has been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

masnesral added a commit that referenced this pull request Aug 3, 2023
Summary: If constrained ranges are available, use them in bounds.py before value range analysis (to enable Int64 -> Int32 optimization).

Test Plan: New unit test in test_torchinductor.py to mark a tensor as dynamic, then constrain with constrain_as_size (as outlined in #106175)

ghstack-source-id: c14e787da45d219bc537148a7e60f3d65d4684c1
Pull Request resolved: #106568
@masnesral masnesral requested a review from eellison August 3, 2023 19:19
Copy link
Contributor

@eellison eellison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome 🚀 I have a couple comments, but looks good!

tracing_context = torch._guards.TracingContext.get()
if tracing_context:
subs = [
(k, v.upper)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should have separate substitutions for upper and lower bound

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eellison - thanks; meant to ask you about the lower bound handling. I might not understand exactly what we're doing here. If we have a range that's a symbolic expression of 4 * s0, for example, but we also have a constrained var_range available like s0: ValueRanges(lower=2, upper=100) ... I guess I assumed that only the upper bound was relevant for the int64 -> int32 optimization and we could use 4 * 100 = 400 for the upper bound. How is the lower bound relevant exactly? Do you just mean that we can also further constrain the lower bound on this line (and not assume 0)?

ValueRanges(0, s - 1)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If x has bounds [a, b], then -x has bounds [-b, -a], so you need to track both.

torch/_inductor/lowering.py Show resolved Hide resolved
return a[y.to(torch.int64)]

def fn2(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
from torch._export.constraints import constrain_as_size
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, this can be a top level import

@eellison eellison requested a review from lezcano August 3, 2023 19:50
Copy link
Collaborator

@lezcano lezcano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same should be done in bound_sympy, as that will also affect the bounds we use to elide guards.

Even more, if var_ranges can be an expression, then we can probably just implement this in bound_sympy and delegate the calculation of the bounds in replacement_vars to bound_sympy

]
self.replacement_vals = {}
for k, v in loop_body.var_ranges.items():
s = v.subs(subs)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is var_ranges just a symint, or can it be a expression? If it can be an expression, you should use bound_sympy to compute the actual bounds given the subs. If it's not, we can simply look up v in subs and get its range if it has free_symbols.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lezcano (Pardon my being a noob here). loop_body.var_ranges can contain expressions, e.g., I see 4 * s0. Given that, are you simply suggesting to use the bound_sympy helper to evaluate the bounds instead of the subs method I've used here? Playing around with that helper function, it seems like it indeed probably does what I want. But it sounds like you anticipate some modifications are needed in that helper first? If so, I don't understand that part.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have a look at how bound_sympy is implemented. Before calling into the bulk of the logic, it tries to do something very similar to what you're doing here. Just move the logic you implemented here to bound_sympy and you should be able to avoid repeating it :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I might be naive, but looks like bound_sympy might "just work". I'll put up a new rev to show you what I mean.

@lezcano lezcano requested a review from ysiraichi August 3, 2023 20:43
…ounds.py (#106175)"

Summary: If constrained ranges are available, use them in bounds.py before value range analysis (to enable Int64 -> Int32 optimization).

Test Plan: New unit test in test_torchinductor.py to mark a tensor as dynamic, then constrain with constrain_as_size (as outlined in #106175)

[ghstack-poisoned]
masnesral added a commit that referenced this pull request Aug 4, 2023
Summary: If constrained ranges are available, use them in bounds.py before value range analysis (to enable Int64 -> Int32 optimization).

Test Plan: New unit test in test_torchinductor.py to mark a tensor as dynamic, then constrain with constrain_as_size (as outlined in #106175)

ghstack-source-id: 07391fa49b1a8a2266428854080c4870bef7bc42
Pull Request resolved: #106568
Comment on lines 28 to 29
context = torch._guards.TracingContext.get()
ranges = context.fake_mode.shape_env.var_to_range if context else {}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to move this to bound_sympy so that other users of bound_sympy also get these better bounds for free.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lezcano I don't understand this code base (obviously), so I just want to extra clarify: Are you sure that's always appropriate? Purely from an API perspective, it seems a little strange to silently add new ValueRanges beyond what were explicitly provided by the caller. Do all usages of bound_sympy want these ranges from the tracing context included?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tracing context should only be set if its valid to use. If it is set, we can use it.

…6175)"

Summary: If constrained ranges are available, use them in bounds.py before value range analysis (to enable Int64 -> Int32 optimization).

Test Plan: New unit test in test_torchinductor.py to mark a tensor as dynamic, then constrain with constrain_as_size (as outlined in #106175)

[ghstack-poisoned]
masnesral added a commit that referenced this pull request Aug 7, 2023
Summary: If constrained ranges are available, use them in bounds.py before value range analysis (to enable Int64 -> Int32 optimization).

Test Plan: New unit test in test_torchinductor.py to mark a tensor as dynamic, then constrain with constrain_as_size (as outlined in #106175)

ghstack-source-id: 99dbbba40c768555f3835378bf3f3387a6e293bd
Pull Request resolved: #106568
Copy link
Contributor

@eellison eellison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

has some compilation time issues but I think those may already exist, so maybe it's worth addressing those in a follow up. will defer to @lezcano


# If there's a tracing context, augment available constrained ranges.
context = torch._guards.TracingContext.get()
if context:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be O(n^2) if we do it for every symbol. Although it looks like bound_sympy already has O(n^2) behavior, e.g.

# Give some bounds to the free variables via their SymPy assumptions
# TODO A better way of doing this would be to assign them a range upon creation, as
# size variables can come with a lower bound of 2, as we specialise on 0 and 1
ranges = deepcopy(ranges)

and the calculating of bounds happens recursively without caching the calculation (as far as i can tell):

https://github.com/pytorch/pytorch/blob/main/torch/utils/_sympy/interp.py#L89-L93

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what you mean by "calculating the bounds happens recursively". As far as I can tell, I don't think it does.

Something that can be done better is avoiding to do the deepcopy, and simply merging the dictionaries as {**x, **y}. This would avoid a few unnecessary copies. Same same for the unbounded_vars bit, where we can avoid creating a new dictionary if we created it here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recursive case
args = [sympy_interp(analysis, env, arg) for arg in expr.args] # type: ignore[arg-type]
handler_name = handlers()[expr.func]
handler = getattr(analysis, handler_name)

We're recursively computing the bounds of symbols without storing them in the environment.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that doesn't call bound_sympy

Copy link
Collaborator

@lezcano lezcano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just two small comments, otherwise it LGTM


# If there's a tracing context, augment available constrained ranges.
context = torch._guards.TracingContext.get()
if context:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what you mean by "calculating the bounds happens recursively". As far as I can tell, I don't think it does.

Something that can be done better is avoiding to do the deepcopy, and simply merging the dictionaries as {**x, **y}. This would avoid a few unnecessary copies. Same same for the unbounded_vars bit, where we can avoid creating a new dictionary if we created it here.

@@ -4520,6 +4520,14 @@ def fn(*args, **kwargs):
register_inplace(aten.__ixor__, aten.__xor__)


@register_lowering(aten.sym_constrain_range)
def sym_constrain_range(a, min, max):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lezcano , I believe this part is needed. Otherwise we get:

...
torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
MissingOperatorWithoutDecomp: missing lowering
  target: aten.sym_constrain_range.default

…nds.py (#106175)"

Summary: If constrained ranges are available, use them in bounds.py before value range analysis (to enable Int64 -> Int32 optimization).

Test Plan: New unit test in test_torchinductor.py to mark a tensor as dynamic, then constrain with constrain_as_size (as outlined in #106175)

[ghstack-poisoned]
masnesral added a commit that referenced this pull request Aug 9, 2023
Summary: If constrained ranges are available, use them in bounds.py before value range analysis (to enable Int64 -> Int32 optimization).

Test Plan: New unit test in test_torchinductor.py to mark a tensor as dynamic, then constrain with constrain_as_size (as outlined in #106175)

ghstack-source-id: 2ecd073c871df1aaa1d0837a84a32152d47a13b5
Pull Request resolved: #106568
…6175)"

Summary: If constrained ranges are available, use them in bounds.py before value range analysis (to enable Int64 -> Int32 optimization).

Test Plan: New unit test in test_torchinductor.py to mark a tensor as dynamic, then constrain with constrain_as_size (as outlined in #106175)

[ghstack-poisoned]
masnesral added a commit that referenced this pull request Aug 9, 2023
Summary: If constrained ranges are available, use them in bounds.py before value range analysis (to enable Int64 -> Int32 optimization).

Test Plan: New unit test in test_torchinductor.py to mark a tensor as dynamic, then constrain with constrain_as_size (as outlined in #106175)

ghstack-source-id: bd9fd39e9593651948e44d2d2bfca7094909a1b7
Pull Request resolved: #106568
@masnesral masnesral requested a review from lezcano August 9, 2023 01:48
Copy link
Collaborator

@lezcano lezcano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice addition! Thank you!

… bounds.py (#106175)"

Summary: If constrained ranges are available, use them in bounds.py before value range analysis (to enable Int64 -> Int32 optimization).

Test Plan: New unit test in test_torchinductor.py to mark a tensor as dynamic, then constrain with constrain_as_size (as outlined in #106175)

[ghstack-poisoned]
masnesral added a commit that referenced this pull request Aug 9, 2023
Summary: If constrained ranges are available, use them in bounds.py before value range analysis (to enable Int64 -> Int32 optimization).

Test Plan: New unit test in test_torchinductor.py to mark a tensor as dynamic, then constrain with constrain_as_size (as outlined in #106175)

ghstack-source-id: a9315817e1dd14905d3f728252ba4ca04fb486e9
Pull Request resolved: #106568
…6175)"

Summary: If constrained ranges are available, use them in bounds.py before value range analysis (to enable Int64 -> Int32 optimization).

Test Plan: New unit test in test_torchinductor.py to mark a tensor as dynamic, then constrain with constrain_as_size (as outlined in #106175)

[ghstack-poisoned]
masnesral added a commit that referenced this pull request Aug 10, 2023
Summary: If constrained ranges are available, use them in bounds.py before value range analysis (to enable Int64 -> Int32 optimization).

Test Plan: New unit test in test_torchinductor.py to mark a tensor as dynamic, then constrain with constrain_as_size (as outlined in #106175)

ghstack-source-id: fe546e1b50430f2648ab9724dc0b21d0380bf2f0
Pull Request resolved: #106568
@masnesral masnesral added the topic: not user facing topic category label Aug 10, 2023
@masnesral
Copy link
Contributor Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Aug 10, 2023
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@facebook-github-bot facebook-github-bot deleted the gh/masnesral/1/head branch August 14, 2023 14:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants