Revert "mul: remove opmath cast sequence (pytorch#9663)" #9701
Conversation
Is the torchax test failure expected?
ysiraichi left a comment:
- The TorchAX CI could be fixed by this change.
- Instead of reverting the commit, I think it's better to make it backend specific, and land it on master and cherry-pick it to r2.9.

What do you think?
While I understand the suggestion from #8545 to make this backend-specific, I believe a full revert is more appropriate, at least for the release branch r2.9. Let me know your thoughts.
Thank you for your detailed analysis. I'm sorry, but I still think that the best way to go about this is to make it backend specific. Actually, now that I'm thinking about it, it would make more sense to do it in the lowering step. That said, since we don't really want to introduce a lot of changes here, and the infrastructure is already in place, I would say we should make it backend specific there. It's a simple change (something like this condition for TPU) that fixes this problem for the Neuron backend, while leaving the other backends with the "backend-independent" choice of not doing it.

That "seemingly unnecessary upcast/downcast" is exactly what we are talking about here. And, no, PyTorch autocast does not solve this issue (see this comment).
Force-pushed from 9eb03b7 to ddfe243.
ysiraichi left a comment:
Thank you for the PR.
Let me know if you have any questions.
Could you rebase this PR so that there are only your commits?
Force-pushed from 50f33f2 to 214f075.
Updated.
ysiraichi left a comment:
LGTM.
Let's just wait for the CI.
@ysiraichi CI is completed.
Thanks @rajkthakur, @ysiraichi. It is merged now and ready to go. @bhavya01 will make a final 2.9 release candidate.
Commit 2a9138a removed .use_opmathtype_for_compute() from the element-wise 'mul' operation. This breaks the mixed-precision accumulation behavior expected by the Neuron compiler, which traces/compiles on CPU and later executes the binary on Neuron hardware, causing significant accuracy degradation.

Reverts commit 2a9138a; the other changes are the result of a rebase from r2.9.
Fixes #9699: Model accuracy failures with mixed-precision accumulation.
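For context on why the cast sequence matters, the standalone sketch below illustrates the accumulation difference. It is not torch_xla code: bf16() is a simplified truncating round trip (real bf16 conversion rounds to nearest even), used only to show that rounding every intermediate of a product chain to bf16 compounds error, while computing in the fp32 opmath type and downcasting once does not.

```cpp
// Standalone illustration of mixed-precision accumulation. The bf16()
// helper is a simplified truncating fp32 -> bf16 -> fp32 round trip.
#include <cstdint>
#include <cstring>
#include <iostream>

float bf16(float x) {
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  bits &= 0xFFFF0000u;  // bf16 keeps only the high 16 bits of an fp32
  std::memcpy(&x, &bits, sizeof x);
  return x;
}

int main() {
  const int n = 100;
  const float factor = bf16(1.01f);

  // Opmath-style: multiply in fp32, downcast once at the end.
  float acc32 = 1.0f;
  for (int i = 0; i < n; ++i) acc32 *= factor;

  // Without the opmath cast sequence: every intermediate is rounded back
  // to bf16, so rounding error compounds across the chain.
  float acc16 = bf16(1.0f);
  for (int i = 0; i < n; ++i) acc16 = bf16(acc16 * factor);

  std::cout << "fp32 compute, one downcast: " << bf16(acc32) << '\n'
            << "bf16 at every step:         " << acc16 << '\n';
}
```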