
[BF16] Visit all the type cast from integer to BF16 type for potential accuracy loss #89212

Closed
jgong5 opened this issue Nov 17, 2022 · 5 comments
Assignees: CaoE
Labels: intel (This tag is for PR from Intel), module: cpu (CPU specific problem (e.g., perf, algorithm)), triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

jgong5 (Collaborator) commented Nov 17, 2022

Issue description

BF16 cannot accurately represent all integer values outside the range [-256, 256]: with only 8 bits of significand precision, 256 is the largest bound below which every integer is exactly representable. When an integer value is cast to the BF16 data type, there is potential accuracy loss. Typically, such a type cast is used in pixel index calculations in ops such as upsample, pooling, etc. FP32 is preferred in these cases. See #88939 for a related functionality issue.
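To see the rounding concretely, a minimal sketch (the cutoffs follow from BF16's 8 significand bits):

```python
import torch

# BF16 keeps 8 bits of significand, so every integer up to 256 is exact;
# past that, values snap to the nearest representable multiple of the ULP.
print(torch.tensor(256, dtype=torch.bfloat16).item())  # 256.0 (exact)
print(torch.tensor(257, dtype=torch.bfloat16).item())  # 256.0 (rounded)
print(torch.tensor(539, dtype=torch.bfloat16).item())  # 540.0 (rounded up)
```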

Code example

`area_pixel_compute_scale` (https://github.com/jgong5/pytorch/blob/fc60a1865eafc985217eccc0251f82014041e6a7/aten/src/ATen/native/UpSample.h#L263) is one example and is used in many places. There are also direct usages of `static_cast<scalar_t>` from integer to BF16, such as https://github.com/jgong5/pytorch/blob/fc60a1865eafc985217eccc0251f82014041e6a7/aten/src/ATen/native/FractionalMaxPool2d.cpp#L139 — a minimal sketch of the failure is shown below.
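For illustration, a minimal Python sketch of the align_corners=True source-index arithmetic going wrong; the variable names are invented here, and the snippet mirrors (rather than calls) the C++ code path around `area_pixel_compute_scale`:

```python
import torch

# src_index = dst_index * (input_size - 1) / (output_size - 1)
input_size, output_size = 540, 2
scale = (input_size - 1) / (output_size - 1)  # 539.0, exact in FP32

src_fp32 = torch.tensor(1.0) * scale  # 539.0: the last valid index
src_bf16 = (torch.tensor(1.0, dtype=torch.bfloat16)
            * torch.tensor(scale, dtype=torch.bfloat16))  # 540.0: one past the end
print(src_fp32.item(), src_bf16.item())  # 539.0 540.0
```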

cc @VitalyFedyunin @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10

@jgong5 jgong5 added module: cpu CPU specific problem (e.g., perf, algorithm) intel This tag is for PR from Intel labels Nov 17, 2022
pytorchmergebot pushed a commit that referenced this issue Nov 17, 2022
…loat16 representation of large integer (#89210)

Fixes #88939

The root cause of the issue is that BF16 cannot accurately represent large integer values. In the test case below, `539`, one of the corner pixel indices, is wrongly represented as `540` (from https://github.com/jgong5/pytorch/blob/fc60a1865eafc985217eccc0251f82014041e6a7/aten/src/ATen/native/UpSample.h#L271), and the tensor is then accessed out of range with that index. Thanks to @malfet for the investigation and the initial fix. I also reported issue #89212 to track the inaccurate integer representation of BF16 that needs to be addressed in other places in PyTorch.
```python
import torch

def test():
    arg_1 = torch.rand([1, 10, 540, 540], dtype=torch.bfloat16).clone()
    res = torch.nn.functional.interpolate(arg_1, 2, mode='bilinear', align_corners=True)

test()
```

Pull Request resolved: #89210
Approved by: https://github.com/malfet
malfet (Contributor) commented Nov 17, 2022

Well, float cannot accurately represent large integers either, so IMO a proper fix would be to add some sort of `index = std::min(index, input_size - 1)` guard.
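A minimal sketch of such a guard (illustrative Python with an invented helper name; the actual fix belongs in the C++ kernels):

```python
def guarded_src_index(idx: float, input_size: int) -> int:
    # Clamp so that upward rounding in low-precision arithmetic can never
    # yield an index past the last valid element.
    return min(int(idx), input_size - 1)

print(guarded_src_index(540.0, 540))  # 539, instead of an out-of-range 540
```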

jgong5 (Collaborator, Author) commented Nov 18, 2022

> Well, float cannot accurately represent large integers either, so IMO a proper fix would be to add some sort of `index = std::min(index, input_size - 1)` guard.

Yes, I had planned to add the check as well. Thanks for the reminder. The reason I didn't add it in the first place is that the input size needed to trigger the problem with FP32 would be too big to allocate. I reported this issue mainly for the accuracy concern, BTW.

@soulitzer soulitzer added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Nov 18, 2022
@jgong5 jgong5 assigned CaoE and unassigned jingxu10 Nov 21, 2022
pytorchmergebot pushed a commit that referenced this issue Nov 29, 2022
kulinseth pushed a commit to kulinseth/pytorch that referenced this issue Dec 10, 2022
kulinseth pushed a commit to kulinseth/pytorch that referenced this issue Dec 10, 2022
CaoE (Collaborator) commented Dec 12, 2022

There are also many accumulations in BF16 in the backward pass, e.g.:

```cpp
grad_input_data[input_offset] += grad_output_data[output_offset];
```

When `scale_factor` is large, the accumulation in BF16 will produce wrong results. We should use float buffers to accumulate.
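A minimal sketch of that failure mode (illustrative values; the real accumulation happens inside the C++ backward kernels):

```python
import torch

# Summing 4096 gradients of 0.5 each: once the BF16 running sum reaches
# 128, its ULP grows to 1, so adding 0.5 ties and rounds back to even --
# the sum stalls at 128 while the true total is 2048.
grads = torch.full((4096,), 0.5)

acc_bf16 = torch.zeros((), dtype=torch.bfloat16)
for g in grads.to(torch.bfloat16):
    acc_bf16 = acc_bf16 + g  # rounding error compounds each step

acc_fp32 = grads.sum()  # accumulate in an FP32 buffer instead
print(acc_bf16.item(), acc_fp32.item())  # 128.0 vs 2048.0
```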

CaoE (Collaborator) commented Dec 23, 2022

BF16 is not supported by FractionalMaxPool yet.

pytorchmergebot pushed a commit that referenced this issue May 29, 2023
### Description
- Fix precision issue for BFloat16 upsampling: #89212
- Improve performance for BFloat16 upsampling.
### Testing
data type: BFloat16

- Single core

contiguous:
mode | scale_factor | shape | backward before / ms | backward after / ms
-- | -- | -- | -- | --
nearest | 2 | [10, 3, 200, 200] | 14.47 | 8.34
linear | 2 | [3, 200, 200] | 3.69 | 2.74
bilinear | 2 | [3, 5, 200, 200] | 87.99 | 49.05
trilinear | 2 | [3, 3, 3, 100, 100] | 171.02 | 72.53
bicubic | 2 | [3, 3, 200, 200] | 176.29 | 78

channels last:
mode | scale_factor | shape | backward before / ms | backward after / ms
-- | -- | -- | -- | --
nearest | 2 | [10, 3, 200, 200] | 17.70 | 10.30
linear | 2 | [3, 200, 200] | N/A | N/A
bilinear | 2 | [3, 5, 200, 200] | 50.90 | 18.83
trilinear | 2 | [3, 3, 3, 100, 100] | 121.56 | 42.60
bicubic | 2 | [3, 3, 200, 200] | 179.40 | 80

- 20 cores

contiguous:
mode | scale_factor | shape | backward before / ms | backward after / ms
-- | -- | -- | -- | --
nearest | 2 | [10, 3, 200, 200] | 1.17 | 1.01
linear | 2 | [3, 200, 200] | 0.41 | 0.26
bilinear | 2 | [3, 5, 200, 200] | 7.19 | 4.07
trilinear | 2 | [3, 3, 3, 100, 100] | 21.32 | 9.33
bicubic | 2 | [3, 3, 200, 200] | 178.67 | 10

channels last:
mode | scale_factor | shape | backward before / ms | backward after / ms
-- | -- | -- | -- | --
nearest | 2 | [10, 3, 200, 200] | 2.25 | 1.55
linear | 2 | [3, 200, 200] | N/A | N/A
bilinear | 2 | [3, 5, 200, 200] | 20.17 | 7.20
trilinear | 2 | [3, 3, 3, 100, 100] | 43.33 | 15.66
bicubic | 2 | [3, 3, 200, 200] | 176.76 | 10

Pull Request resolved: #91169
Approved by: https://github.com/jgong5, https://github.com/mingfeima, https://github.com/Skylion007
jgong5 (Collaborator, Author) commented Dec 1, 2023

It was marked as done.

@jgong5 jgong5 closed this as completed Dec 1, 2023