[inductor] simplify expr when looking up size hint #123140
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/123140

✅ You can merge normally! (1 unrelated failure.) As of commit 35fcbbb with merge base f15fd65: the following job failed but was likely due to flakiness present on trunk and has been marked as unstable.
This pull request was exported from Phabricator. Differential Revision: D55619331
Force-pushed from 9da2268 to 3f6acf6.
Summary: Pull Request resolved: pytorch#123140

Test Plan: tbd

Differential Revision: D55619331
Force-pushed from 3f6acf6 to 4b0f72f.
Hi @peterbell10, is this change okay, or does this look like a fix that should be happening in Dynamo?
Thanks for the fix!
Seems the `bmm`'s shape is wrong in the comment? Should it be `[s0, 16, 32]`?
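For reference, the shape contract that `bmm` enforces (the one the error message in the summary reports) can be sketched in plain Python. `check_bmm_shapes` below is a hypothetical helper written for illustration, not a torch API:

```python
# Hypothetical helper mirroring torch.bmm's shape rule: for batch1 of
# shape (b, n, m) and batch2 of shape (b, m, p), the batch and inner
# dimensions must agree, and the result has shape (b, n, p).
def check_bmm_shapes(batch1_shape, batch2_shape):
    b1, n, m1 = batch1_shape
    b2, m2, p = batch2_shape
    if (b1, m1) != (b2, m2):
        raise ValueError(
            f"Expected size for first two dimensions of batch2 tensor "
            f"to be: [{b1}, {m1}] but got: [{b2}, {m2}]."
        )
    return (b1, n, p)

# With s0 = 7: batch1 of shape [s0, 16, 32] multiplies batch2 of [s0, 32, 30].
print(check_bmm_shapes((7, 16, 32), (7, 32, 30)))  # (7, 16, 30)
```

Replacing one batch size with a fallback hint (e.g. 8192) trips exactly this check, which is the mismatch the PR fixes.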
woops forgot to update lol
Could you add a comment above this line mentioning that here `s0` and `u0` are unified? I think it's fine to do the replacements inside
Force-pushed from 4b0f72f to 0e3d87e.
Summary:

## Context

Suppose we have two symbols: `u0` and `s0` where we know that `u0 = s0`. Now, let's say we tried to look up the size hint for `u0 + 1`.

* Before this PR, we would use a fallback hint if one was provided.
  https://github.com/pytorch/pytorch/blob/3f6acf65fd9b6094513cf28898a42b90dd1169a0/torch/_inductor/sizevars.py#L406-L407
* With this PR, we try to replace `u0` with `s0` via `simplify()` before using a fallback hint.
  https://github.com/pytorch/pytorch/blob/3f6acf65fd9b6094513cf28898a42b90dd1169a0/torch/_inductor/sizevars.py#L46-L47

## Concrete Example

A scenario where this is useful is when we're running autotuning benchmarking on `bmm` with two input nodes: one that has `s0` as the batch size and one that has `u0` as the batch size. During benchmarking, we create two example input tensors, and the input with `u0` has to use a fallback hint for its batch size. This leads to a mismatch.
https://github.com/pytorch/pytorch/blob/e3d80f2fa98d7ab02f88023d381b2e5981dd99ff/torch/_inductor/select_algorithm.py#L991-L997

Using the fallback hint (i.e. 8192) leads to a batch size mismatch:

```
# Note: s0 = 7 and u0 = 7 and the fallback hint is 8192.
LoweringException: ErrorFromChoice: Expected size for first two dimensions of batch2 tensor to be: [7, 30] but got: [8192, 30].
From choice ExternKernelCaller(extern_kernels.bmm)
```

Test Plan: CI

```
$ CUDA_VISIBLE_DEVICES=0 python test/inductor/test_unbacked_symints.py -k test_equivalent_backed_unbacked_cuda

### Before ###
  File "torch/_inductor/select_algorithm.py", line 964, in __call__
    timings = do_autotuning(precompile_fn)
  File "torch/_inductor/select_algorithm.py", line 911, in do_autotuning
    timings = self.lookup(
  File "torch/_inductor/codecache.py", line 306, in lookup
    raise e
  File "torch/_inductor/codecache.py", line 297, in lookup
    timings = benchmark(choices)
  File "torch/_inductor/select_algorithm.py", line 897, in autotune
    return make_benchmark_fn()(choices)
  File "torch/_inductor/select_algorithm.py", line 1068, in benchmark_in_current_process
    raise ErrorFromChoice(msg, choice, debug_str())  # noqa: TRY200
torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
LoweringException: ErrorFromChoice: Expected size for first two dimensions of batch2 tensor to be: [7, 30] but got: [8192, 30].
From choice ExternKernelCaller(extern_kernels.bmm)
inputs = [
  torch.empty_strided((7, 30, 16), (480, 16, 1), dtype=torch.float32, device='cuda'),
  torch.empty_strided((30, 32), (32, 1), dtype=torch.float32, device='cuda'),
]

### After ###
----------------------------------------------------------------------
Ran 1 test in 4.627s

OK
```

Reviewed By: tissue3, aakhundov

Differential Revision: D55619331
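The simplify-before-fallback behavior described above can be sketched with plain sympy. This is an illustrative toy, not Inductor's implementation: `replacements`, `var_to_val`, and this standalone `size_hint` are assumed stand-ins for the real `SizeVarAllocator` state.

```python
import sympy

s0, u0 = sympy.symbols("s0 u0", integer=True, positive=True)

# Assumed solver state: we learned u0 = s0, and only s0 has a concrete value.
replacements = {u0: s0}  # unbacked symbol -> backed equivalent
var_to_val = {s0: 7}     # concrete size hints

def size_hint(expr, fallback=8192):
    # With the PR: unify symbols first, so u0 + 1 becomes s0 + 1 ...
    expr = expr.xreplace(replacements)
    # ... then substitute known values: s0 + 1 becomes 8.
    expr = expr.xreplace(var_to_val)
    if expr.free_symbols:
        # Still symbolic after substitution: use the fallback, as before.
        return fallback
    return int(expr)

print(size_hint(u0 + 1))  # 8 with the unification; 8192 without it
```

Without the initial `xreplace(replacements)` step, `u0 + 1` never resolves and the benchmarking tensors get the 8192 fallback batch size, producing the `bmm` mismatch above.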
Force-pushed from 0e3d87e to 35fcbbb.
CI passes in OSS and internally. Also, we were worried that
@pytorchbot merge |
Merge failed. Reason: This PR needs a label. To add a label, you can comment to pytorchbot. For more information, see the wiki. Details for Dev Infra team: raised by workflow job.
```python
        return sympy_subs(expr, self.var_to_val)

    def size_hint(self, expr: Expr, *, fallback: Optional[int] = None) -> int:
        expr = self.simplify(expr)
```
This should be in `symbolic_hint`, but otherwise this LGTM.
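The placement the reviewer suggests — calling `simplify()` inside `symbolic_hint` so every hint lookup sees the `u0 = s0` unification, not just `size_hint` — could look roughly like this toy model. The class and attribute names are illustrative assumptions, not the actual `torch/_inductor/sizevars.py` source:

```python
import sympy
from typing import Optional

class ToySizeVars:
    """Toy stand-in for Inductor's SizeVarAllocator (names are assumptions)."""

    def __init__(self, replacements, var_to_val):
        self.replacements = replacements  # e.g. {u0: s0}, unbacked -> backed
        self.var_to_val = var_to_val      # e.g. {s0: 7}, concrete hints

    def simplify(self, expr):
        return sympy.sympify(expr).xreplace(self.replacements)

    def symbolic_hint(self, expr):
        # Reviewer's suggestion: simplify here, so every caller of
        # symbolic_hint benefits from the symbol unification.
        expr = self.simplify(expr)
        return expr.xreplace(self.var_to_val)

    def size_hint(self, expr, *, fallback: Optional[int] = None) -> int:
        out = self.symbolic_hint(expr)
        if out.free_symbols:  # could not resolve to a concrete value
            if fallback is not None:
                return fallback
            raise ValueError(f"no hint for {expr}")
        return int(out)

s0, u0 = sympy.symbols("s0 u0", integer=True, positive=True)
sv = ToySizeVars({u0: s0}, {s0: 7})
print(sv.size_hint(u0 + 1, fallback=8192))  # 8
```

The design difference is small but real: simplifying in `size_hint` fixes only that entry point, while simplifying in `symbolic_hint` also covers any other method built on top of it.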
@ColinPeppler this wasn't fixed
@pytorchbot merge -f "merged internally"
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Approved by: https://github.com/aakhundov