
Conversation

yanbing-j
Collaborator

@yanbing-j yanbing-j commented Feb 23, 2023

Motivation

Add prelu to the lower precision cast policy on AutocastCPU to fix #95365:

Before: Within the scope of torch.cpu.amp.autocast(dtype=torch.bfloat16), prelu cannot handle input and weight of different data types and raises a RuntimeError. This scenario is common under autocast: with autocast to bf16, if the op before prelu produces a bf16 output, which becomes the input of prelu while prelu's weight is still fp32, a RuntimeError is raised.

After: Within the scope of torch.cpu.amp.autocast(dtype=torch.bfloat16), prelu is forced to run in the bf16 data type.

Before #91238, when the input was bf16, the weight was forcibly cast to bf16. After #91238, this kind of test scenario raises a RuntimeError. There is no precision loss, since the previously working behavior also cast to bf16.

This also aligns with the Autocast CUDA whitelist.
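For illustration, a minimal sketch of the scenario described above (the shapes and layer choices are illustrative, not taken from the PR's tests):

```python
import torch
import torch.nn.functional as F

x = torch.randn(8, 16)                              # fp32 input
linear = torch.nn.Linear(16, 16)                    # fp32 weights
prelu_weight = torch.nn.Parameter(torch.randn(16))  # fp32 PReLU weight, one per channel

with torch.cpu.amp.autocast(dtype=torch.bfloat16):
    y = linear(x)  # linear is on the lower-precision list, so y comes out bf16
    # Before this PR: F.prelu raised a RuntimeError on CPU because the
    # input (bf16) and weight (fp32) dtypes differ under autocast.
    # After this PR: prelu is also on the lower-precision cast policy,
    # so both operands are cast to bf16 and the call succeeds.
    out = F.prelu(y, prelu_weight)

print(out.dtype)  # torch.bfloat16 with this change
```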

cc @mcarilli @ptrblck @leslie-fang-intel @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10

@pytorch-bot

pytorch-bot bot commented Feb 23, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/95366

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 64b70f9:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@leslie-fang-intel
Collaborator

@yanbing-j May I ask why we have to put prelu into the whitelist? How much will it affect model performance?

@yanbing-j
Collaborator Author

@yanbing-j May I ask why we have to put prelu into the whitelist? How much will it affect model performance?

I have updated the description, which will answer your questions.

@leslie-fang-intel
Collaborator

Thanks for the comments. It looks like PReLU should support mixed-precision input like BN does, instead of being added to the whitelist?

@yanbing-j
Collaborator Author

This also aligns with the CUDA whitelist.
Since #91238 rewrote all of the PReLU code and merged the CPU and CUDA implementations, I don't think we need the extra effort of splitting the CPU and CUDA code again just to add mixed-precision support.

Collaborator

@jgong5 jgong5 left a comment


I'm wondering what the design philosophy is for adding ops to the autocast whitelist. My understanding is that we only explicitly downcast ops that are very likely to get a perf benefit from low-precision compute even with the downcast cost, i.e., compute-intensive dot-product ops with HW acceleration support. From the autocast lists of CPU and CUDA, it seems "prelu" would be the only exception. In my opinion, "prelu" is more like batch norm and better fits the "fall-through" policy instead.

Can we do the type conversion on weights inside prelu, just like batch norm does, instead of throwing errors on type mismatch? @lezcano
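(For context, a minimal sketch of the "fall-through" behavior referred to above, assuming a CPU build where batch_norm accepts a bf16 input with fp32 parameters:)

```python
import torch

bn = torch.nn.BatchNorm1d(8)                       # fp32 affine weights and running stats
x_bf16 = torch.randn(4, 8, dtype=torch.bfloat16)   # low-precision activation

# batch_norm handles the dtype mismatch internally, so autocast can leave it
# alone ("fall-through") rather than putting it on a cast list.
out = bn(x_bf16)
print(out.dtype)  # torch.bfloat16
```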

@lezcano
Collaborator

lezcano commented Feb 27, 2023

I didn't add support for type promotion in prelu in that PR just because it's a bit annoying to get right when implementing second-order derivatives by hand.
Should I implement this within the operation itself, or should it be done at the autocast level? @ngimel

@ngimel
Collaborator

ngimel commented Feb 27, 2023

prelu doesn't follow standard type promotion rules, so it's indeed cumbersome to add the support at the op level. I don't think there's much harm in just adding prelu to the autocast allowlist (at worst an extra conversion of activations to low precision, but likely not even that, because activations typically come out in low precision from the preceding gemm/convolution).
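(As an aside, on builds that predate this change the mismatch can be worked around by casting the weight manually; this hypothetical snippet performs the same single conversion that autocast does once prelu is on the lower-precision list:)

```python
import torch
import torch.nn.functional as F

x = torch.randn(4, 8, dtype=torch.bfloat16)   # bf16 activation from a preceding op
w = torch.nn.Parameter(torch.randn(8))        # fp32 PReLU weight

out = F.prelu(x, w.to(x.dtype))               # cast weight to the activation dtype
print(out.dtype)                              # torch.bfloat16
```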

Collaborator

@lezcano lezcano left a comment


Let's go with autocast then as per #95366 (comment)

@yanbing-j yanbing-j force-pushed the yanbing/add_prelu_whitelist branch from a2a003c to 502e3c1 on February 28, 2023 02:30
@yanbing-j yanbing-j added the topic: not user facing and ciflow/trunk labels on Feb 28, 2023
@yanbing-j yanbing-j added the intel label on Feb 28, 2023
@yanbing-j
Collaborator Author

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot successfully started a rebase job. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased yanbing/add_prelu_whitelist onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout yanbing/add_prelu_whitelist && git pull --rebase)

@pytorchmergebot pytorchmergebot force-pushed the yanbing/add_prelu_whitelist branch from 502e3c1 to 64b70f9 on February 28, 2023 07:05
@yanbing-j
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

cyyever pushed a commit to cyyever/pytorch_private that referenced this pull request Mar 2, 2023
Pull Request resolved: pytorch/pytorch#95366
Approved by: https://github.com/ngimel, https://github.com/lezcano, https://github.com/leslie-fang-intel
cyyever pushed a commit to cyyever/pytorch_private that referenced this pull request Mar 5, 2023
cyyever pushed a commit to cyyever/pytorch_private that referenced this pull request Mar 5, 2023
pruthvistony added a commit to ROCm/pytorch that referenced this pull request May 2, 2023

Labels

ciflow/trunk, intel, Merged, module: amp (automated mixed precision), open source, topic: not user facing

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

prelu: type promoting not supported

7 participants