
[BF16] Visit all the type cast from integer to BF16 type for potential accuracy loss #89212

Closed
jgong5 opened this issue Nov 17, 2022 · 5 comments
Assignees: CaoE
Labels: intel (This tag is for PR from Intel), module: cpu (CPU specific problem (e.g., perf, algorithm)), triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

jgong5 (Collaborator) commented Nov 17, 2022

Issue description

BF16 cannot accurately represent all integer values outside the range [-256, 256]: with only 8 bits of significand precision, 256 is the largest bound below which every integer is exactly representable. When an integer value is cast to the BF16 data type, there is potential accuracy loss. Typically, such a type cast is used in pixel index calculations in ops such as upsample, pooling, etc. FP32 is preferred in these cases. See #88939 for a related functionality issue.
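To see the rounding concretely, a minimal sketch (the cutoffs follow from BF16's 8 significand bits):

```python
import torch

# BF16 keeps 8 bits of significand, so every integer up to 256 is exact;
# past that, values snap to the nearest representable multiple of the ULP.
print(torch.tensor(256, dtype=torch.bfloat16).item())  # 256.0 (exact)
print(torch.tensor(257, dtype=torch.bfloat16).item())  # 256.0 (rounded)
print(torch.tensor(539, dtype=torch.bfloat16).item())  # 540.0 (rounded up)
```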

Code example

`area_pixel_compute_scale` (https://github.com/jgong5/pytorch/blob/fc60a1865eafc985217eccc0251f82014041e6a7/aten/src/ATen/native/UpSample.h#L263) is one example and is used in many places. There are also direct usages of `static_cast<scalar_t>` from integer to BF16, such as https://github.com/jgong5/pytorch/blob/fc60a1865eafc985217eccc0251f82014041e6a7/aten/src/ATen/native/FractionalMaxPool2d.cpp#L139 — a minimal sketch of the failure is shown below.
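For illustration, a minimal Python sketch of the align_corners=True source-index arithmetic going wrong; the variable names are invented here, and the snippet mirrors (rather than calls) the C++ code path around `area_pixel_compute_scale`:

```python
import torch

# src_index = dst_index * (input_size - 1) / (output_size - 1)
input_size, output_size = 540, 2
scale = (input_size - 1) / (output_size - 1)  # 539.0, exact in FP32

src_fp32 = torch.tensor(1.0) * scale  # 539.0: the last valid index
src_bf16 = (torch.tensor(1.0, dtype=torch.bfloat16)
            * torch.tensor(scale, dtype=torch.bfloat16))  # 540.0: one past the end
print(src_fp32.item(), src_bf16.item())  # 539.0 540.0
```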

cc @VitalyFedyunin @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10

@jgong5 jgong5 added module: cpu CPU specific problem (e.g., perf, algorithm) intel This tag is for PR from Intel labels Nov 17, 2022
pytorchmergebot pushed a commit that referenced this issue Nov 17, 2022
…loat16 representation of large integer (#89210)

Fixes #88939

The root cause of the issue is that BF16 cannot accurately represent large integer values. In the test case below, `539`, one of the corner pixel indices, is wrongly represented as `540` (from https://github.com/jgong5/pytorch/blob/fc60a1865eafc985217eccc0251f82014041e6a7/aten/src/ATen/native/UpSample.h#L271), and the tensor is then accessed out of range with that index. Thanks to @malfet for the investigation and the initial fix. I also reported issue #89212 to track the inaccurate integer representation of BF16 that needs to be addressed in other places in PyTorch.
```python
import torch

def test():
    arg_1 = torch.rand([1, 10, 540, 540], dtype=torch.bfloat16).clone()
    res = torch.nn.functional.interpolate(arg_1, 2, mode='bilinear', align_corners=True)

test()
```

Pull Request resolved: #89210
Approved by: https://github.com/malfet
malfet (Contributor) commented Nov 17, 2022

Well, float cannot accurately represent large integers either, so IMO a proper fix would be to add some sort of `index = std::min(index, input_size - 1)` guard.
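A minimal sketch of such a guard (illustrative Python with an invented helper name; the actual fix belongs in the C++ kernels):

```python
def guarded_src_index(idx: float, input_size: int) -> int:
    # Clamp so that upward rounding in low-precision arithmetic can never
    # yield an index past the last valid element.
    return min(int(idx), input_size - 1)

print(guarded_src_index(540.0, 540))  # 539, instead of an out-of-range 540
```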

jgong5 (Collaborator, Author) commented Nov 18, 2022

> Well, float cannot accurately represent large integers either, so IMO a proper fix would be to add some sort of `index = std::min(index, input_size - 1)` guard.

Yes, I had planned to add the check as well. Thanks for the reminder. The reason I didn't add it in the first place is that the input size needed to trigger the problem with FP32 would be too big to allocate. I reported this issue mainly for the accuracy concern, BTW.

@soulitzer soulitzer added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Nov 18, 2022
@jgong5 jgong5 assigned CaoE and unassigned jingxu10 Nov 21, 2022
pytorchmergebot pushed a commit that referenced this issue Nov 29, 2022
kulinseth pushed a commit to kulinseth/pytorch that referenced this issue Dec 10, 2022
kulinseth pushed a commit to kulinseth/pytorch that referenced this issue Dec 10, 2022
CaoE (Collaborator) commented Dec 12, 2022

There are also many accumulations in BF16 in the backward pass, e.g.:

```cpp
grad_input_data[input_offset] += grad_output_data[output_offset];
```

When `scale_factor` is large, the accumulation in BF16 will produce wrong results. We should use float buffers to accumulate.
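A minimal sketch of that failure mode (illustrative values; the real accumulation happens inside the C++ backward kernels):

```python
import torch

# Summing 4096 gradients of 0.5 each: once the BF16 running sum reaches
# 128, its ULP grows to 1, so adding 0.5 ties and rounds back to even --
# the sum stalls at 128 while the true total is 2048.
grads = torch.full((4096,), 0.5)

acc_bf16 = torch.zeros((), dtype=torch.bfloat16)
for g in grads.to(torch.bfloat16):
    acc_bf16 = acc_bf16 + g  # rounding error compounds each step

acc_fp32 = grads.sum()  # accumulate in an FP32 buffer instead
print(acc_bf16.item(), acc_fp32.item())  # 128.0 vs 2048.0
```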

CaoE (Collaborator) commented Dec 23, 2022

BF16 is not supported by FractionalMaxPool yet.

pytorchmergebot pushed a commit that referenced this issue May 29, 2023
### Description
- Fix precision issue for BFloat16 upsampling: #89212
- Improve performance for BFloat16 upsampling.
### Testing
data type: BFloat16

- Single core

contiguous:
mode | scale_factor | shape | backward before / ms | backward after / ms
-- | -- | -- | -- | --
nearest | 2 | [10, 3, 200, 200] | 14.47 | 8.34
linear | 2 | [3, 200, 200] | 3.69 | 2.74
bilinear | 2 | [3, 5, 200, 200] | 87.99 | 49.05
trilinear | 2 | [3, 3, 3, 100, 100] | 171.02 | 72.53
bicubic | 2 | [3, 3, 200, 200] | 176.29 | 78

channels last:
mode | scale_factor | shape | backward before / ms | backward after / ms
-- | -- | -- | -- | --
nearest | 2 | [10, 3, 200, 200] | 17.70 | 10.30
linear | 2 | [3, 200, 200] | N/A | N/A
bilinear | 2 | [3, 5, 200, 200] | 50.90 | 18.83
trilinear | 2 | [3, 3, 3, 100, 100] | 121.56 | 42.60
bicubic | 2 | [3, 3, 200, 200] | 179.40 | 80

- 20 cores

contiguous:
mode | scale_factor | shape | backward before / ms | backward after / ms
-- | -- | -- | -- | --
nearest | 2 | [10, 3, 200, 200] | 1.17 | 1.01
linear | 2 | [3, 200, 200] | 0.41 | 0.26
bilinear | 2 | [3, 5, 200, 200] | 7.19 | 4.07
trilinear | 2 | [3, 3, 3, 100, 100] | 21.32 | 9.33
bicubic | 2 | [3, 3, 200, 200] | 178.67 | 10

channels last:
mode | scale_factor | shape | backward before / ms | backward after / ms
-- | -- | -- | -- | --
nearest | 2 | [10, 3, 200, 200] | 2.25 | 1.55
linear | 2 | [3, 200, 200] | N/A | N/A
bilinear | 2 | [3, 5, 200, 200] | 20.17 | 7.20
trilinear | 2 | [3, 3, 3, 100, 100] | 43.33 | 15.66
bicubic | 2 | [3, 3, 200, 200] | 176.76 | 10

Pull Request resolved: #91169
Approved by: https://github.com/jgong5, https://github.com/mingfeima, https://github.com/Skylion007
jgong5 (Collaborator, Author) commented Dec 1, 2023

It was marked as done.

@jgong5 jgong5 closed this as completed Dec 1, 2023