torch._int_mm: fix triton kernel caching #99283
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/99283. Note: links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV: there is 1 currently active SEV. If your PR is affected, please view it below.
❌ 1 Failure: as of commit e11ce6b, the following jobs have new failures:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
I'm not convinced this is needed, since I recently changed the way things are cached in #98351. Tl;dr: the caching is no longer based on the default choice's name but rather is hardcoded into the autotuning call.
Here is the line using choice[0]'s name as the cache key: pytorch/torch/_inductor/select_algorithm.py, line 742 at c2fd198
I guess we can just use that instead of this hack then; I wasn't sure if there was some reason it had to be
Oh, good catch, we never actually pass the hardcoded name to the lookup function. I think if we just pass
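For readers skimming the thread, here is a minimal, generic sketch of the idea being discussed. The names (`pick_best`, `_autotune_cache`) are hypothetical and this is not Inductor's actual `select_algorithm` code; it only illustrates keying the autotune cache on a stable, caller-provided name instead of the first choice's generated kernel name, so that two structurally identical int8 matmuls reuse one benchmarked choice:
```
from typing import Callable, Dict, List, Tuple

# Cache keyed on (stable op name, input shape) rather than on whatever the
# first candidate kernel happens to be named.
_autotune_cache: Dict[Tuple[str, Tuple[int, ...]], str] = {}

def pick_best(name: str, shape: Tuple[int, ...],
              choices: List[Tuple[str, Callable[[], float]]]) -> str:
    """Return the label of the fastest choice, memoized by (name, shape)."""
    key = (name, shape)
    if key not in _autotune_cache:
        # Benchmark each candidate once; later calls with the same stable
        # key skip benchmarking (and, in Inductor, recompilation).
        _autotune_cache[key] = min(choices, key=lambda c: c[1]())[0]
    return _autotune_cache[key]

# Two call sites that would otherwise produce differently named kernels
# share one cache entry because they pass the same stable name.
pick_best("int_mm", (32, 32), [("triton_template", lambda: 1.0),
                               ("extern_aten", lambda: 2.0)])
pick_best("int_mm", (32, 32), [("triton_template", lambda: 1.0),
                               ("extern_aten", lambda: 2.0)])
```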
Force-pushed from 5cfea1d to 49cd98b.
@pytorchbot merge
Merge failed. Reason: Approval needed from one of the following:
@pytorchbot merge -f 'rule requiring approvers from an allowlist seems unrelated'
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: Approval needed from one of the following:
I guess each failed merge unmasks 5 more reviewers, so I can keep failing to merge until I see a familiar name...
@pytorchmergebot merge -f 'give me 5 more names, bot'
I'm not sure of the full list, but Jason can definitely stamp this as he leads the autotuning work.
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: Approval needed from one of the following:
@pytorchbot merge -f 'gimme 5 more names plz'
thx! I'll see who manages to stamp this first. I gave some feedback to the job maintainers so hopefully we can make it less cryptic for next time.
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: Approval needed from one of the following:
@pytorchbot merge -f 'gimme 5 more names plz'
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: Approval needed from one of the following:
a stamp
Summary: A hacky fix to ensure that kernels generated for `torch._int_mm` can be cached. We can remove this hack once eager mode `torch._int_mm` is better supported. Let me know if something more proper is needed instead of the hack.
Test plan:
```
# running the script below led to two compilations of the Triton
# int8,int8->int32 kernel before this PR, and to only one compilation
# (which is reused) after this PR
import torch
import torch.nn as nn

x = torch.randint(-128, 127, (32, 32), device='cuda', dtype=torch.int8)
y = torch.randint(-128, 127, (32, 32), device='cuda', dtype=torch.int8)

class M(nn.Module):
    def forward(self, x):
        x = torch._int_mm(x, y)
        x = x.to(torch.int8)
        x = torch._int_mm(x, y)
        return x

m = M().cuda().half()
m = torch.compile(m, options={"max-autotune": True})
z = m(x)
z = m(x)
```
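For context, `torch._int_mm` is the int8 x int8 -> int32 matrix multiply whose generated kernels are being cached here. A minimal eager-mode sketch (assuming a CUDA device and shapes the eager kernel accepts; the shapes mirror the test plan above):
```
import torch

# int8 x int8 matmul accumulated into int32, on CUDA.
a = torch.randint(-128, 127, (32, 32), device='cuda', dtype=torch.int8)
b = torch.randint(-128, 127, (32, 32), device='cuda', dtype=torch.int8)
c = torch._int_mm(a, b)
print(c.dtype, c.shape)  # torch.int32 torch.Size([32, 32])
```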
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA: 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 job has failed: inductor / cuda11.8-py3.10-gcc7-sm86 / test (inductor, 1, 1, linux.g5.4xlarge.nvidia.gpu). Details for Dev Infra team: raised by workflow job.
@pytorchbot merge -f 'CI failure seems unrelated'
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Summary:
A fix to ensure that kernels generated for `torch._int_mm` can be cached. We can remove this hack once eager mode `torch._int_mm` is better supported. Let me know if something more proper is needed instead of the hack.
Test plan: see the script in the summary above.
Fixes #ISSUE_NUMBER
cc @soumith @voznesenskym @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @desertfire