Add prelu into Autocast CPU whitelist #95366
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/95366
Note: Links to docs will display an error until the docs builds have been completed. ✅ No Failures as of commit 64b70f9. This comment was automatically generated by Dr. CI and updates every 15 minutes.
@yanbing-j May I kindly know why we have to put `prelu` into the Autocast CPU whitelist?
I have updated the description, which will answer your questions.
Thanks for the comments. It looks like …
This also aligns with the CUDA whitelist.
I'm wondering what the design philosophy is for adding ops to the autocast whitelist. My understanding is that we only explicitly downcast ops that we know are very likely to get a perf benefit from low-precision compute even with the downcast cost, i.e. compute-intensive dot-product ops that have HW acceleration support. From the autocast lists of CPU and CUDA, it seems "prelu" is the only exception. In my opinion, "prelu" is more like batch-norm and better fits the "fall-through" policy instead.
Can we do type conversion on the weights inside prelu, just like batch-norm, without throwing errors on type mismatch? @lezcano
I didn't add support for type promotion in prelu in that PR just because it's a bit annoying to get that right when implementing second-order derivatives by hand.
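For illustration, a minimal sketch of the batch-norm-style alternative discussed above, done at the Python level (the `prelu_promoting` helper is hypothetical and is not what this PR implements): promote the weight to the input's dtype before calling `prelu`.

```python
import torch
import torch.nn.functional as F

# Hypothetical helper (illustration only, not what this PR implements):
# promote the weight to the input's dtype before calling prelu, which is
# roughly what a "fall-through" policy with batch-norm-style type promotion
# would require inside the op itself.
def prelu_promoting(x: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    if weight.dtype != x.dtype:
        weight = weight.to(x.dtype)  # e.g. cast the fp32 weight down to bf16
    return F.prelu(x, weight)

x = torch.randn(2, 3).to(torch.bfloat16)  # bf16 activation from a preceding op
w = torch.randn(3)                        # fp32 prelu weight
print(prelu_promoting(x, w).dtype)        # torch.bfloat16
```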
Let's go with autocast then as per #95366 (comment)
Force-pushed from a2a003c to 502e3c1
@pytorchbot rebase
@pytorchbot successfully started a rebase job. Check the current status here.
Successfully rebased
Force-pushed from 502e3c1 to 64b70f9
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
### Motivation

Add `prelu` to the lower precision cast policy on AutocastCPU to fix pytorch/pytorch#95365.

Before: within the scope of `torch.cpu.amp.autocast(dtype=torch.bfloat16)`, `prelu` cannot handle different data types for `input` and `weight` and raises a RuntimeError. This scenario is common under autocast: with `autocast` to `bf16`, if the op before `prelu` produces a `bf16` output, that output becomes `prelu`'s input while `prelu`'s weight is still `fp32`, and a RuntimeError is raised.

After: within the scope of `torch.cpu.amp.autocast(dtype=torch.bfloat16)`, `prelu` is forced to run in `bf16`. Before pytorch/pytorch#91238, when the input was `bf16`, the weight was forcibly cast to `bf16`; after pytorch/pytorch#91238, this kind of test scenario raises a RuntimeError. There is no precision loss, since the previously working path also cast to `bf16`. This also aligns with the Autocast CUDA whitelist.

Pull Request resolved: pytorch/pytorch#95366
Approved by: https://github.com/ngimel, https://github.com/lezcano, https://github.com/leslie-fang-intel
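A minimal repro sketch of the Before/After scenario described above (the module and its sizes are illustrative, not taken from the PR):

```python
import torch
import torch.nn as nn

# Under CPU autocast(bf16), the Linear runs in bf16, so PReLU receives a bf16
# input while its weight is still fp32. Before this PR that dtype mismatch
# raised a RuntimeError; with prelu on the lower precision cast list, autocast
# casts both input and weight to bf16.
model = nn.Sequential(
    nn.Linear(8, 8),  # autocast lowers linear to bf16, so its output is bf16
    nn.PReLU(8),      # holds an fp32 weight
)
x = torch.randn(2, 8)

with torch.cpu.amp.autocast(dtype=torch.bfloat16):
    y = model(x)

print(y.dtype)  # expected: torch.bfloat16 once prelu is on the cast list
```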
This reverts commit 71ad100.
### Motivation

Add `prelu` to the lower precision cast policy on AutocastCPU to fix #95365.

Before: within the scope of `torch.cpu.amp.autocast(dtype=torch.bfloat16)`, `prelu` cannot handle different data types for `input` and `weight` and raises a RuntimeError. This scenario is common under autocast: with `autocast` to `bf16`, if the op before `prelu` produces a `bf16` output, that output becomes `prelu`'s input while `prelu`'s weight is still `fp32`, and a RuntimeError is raised.

After: within the scope of `torch.cpu.amp.autocast(dtype=torch.bfloat16)`, `prelu` is forced to run in `bf16`. Before #91238, when the input was `bf16`, the weight was forcibly cast to `bf16`; after #91238, this kind of test scenario raises a RuntimeError. There is no precision loss, since the previously working path also cast to `bf16`. This also aligns with the Autocast CUDA whitelist.

cc @mcarilli @ptrblck @leslie-fang-intel @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10
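As a hedged sanity check of the "no precision loss" point, on a build where `prelu` is on the CPU autocast lower-precision list, running `prelu` under autocast should match casting both operands to `bf16` manually:

```python
import torch
import torch.nn.functional as F

# Compare autocast's handling of prelu against an explicit manual cast to bf16.
# Since autocast casts both operands to bf16 before the same kernel runs, the
# two results should match on builds that include this change.
x = torch.randn(4, 3)
w = torch.randn(3)

with torch.cpu.amp.autocast(dtype=torch.bfloat16):
    out_autocast = F.prelu(x.to(torch.bfloat16), w)  # bf16 input, fp32 weight

out_manual = F.prelu(x.to(torch.bfloat16), w.to(torch.bfloat16))
print(torch.equal(out_autocast, out_manual))  # expected: True
```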