fix upsample bf16 issue for channels last path by using high precision to compute index #83847
Conversation
Dr. CI: ✅ No failures (0 pending) as of commit f3ad87e.
Force-pushed `… to compute index` from 8fc5a14 to f3ad87e.
Thanks for fixing this issue. Does it also happen with other interpolation modes? Maybe try more config combinations to make sure we didn't miss anything.
@pytorchbot merge

@pytorchbot successfully started a merge job. Check the current status here.
Currently I haven't found it happening with other modes, but I will run more tests to check them as a next step.
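Such a sweep could look like the minimal sketch below. The mode list, output size, and bfloat16 tolerance are illustrative assumptions, not part of this PR:

```python
import itertools
import torch
import torch.nn.functional as F

modes = ['nearest', 'bilinear', 'bicubic']
memory_formats = [torch.contiguous_format, torch.channels_last]

for mode, mf in itertools.product(modes, memory_formats):
    a = torch.ones(1, 3, 320, 480).bfloat16().to(memory_format=mf)
    # align_corners may only be set for the interpolating modes.
    align = False if mode in ('bilinear', 'bicubic') else None
    out_bf16 = F.interpolate(a, size=(640, 960), mode=mode, align_corners=align)
    out_fp32 = F.interpolate(a.float(), size=(640, 960), mode=mode, align_corners=align)
    # bfloat16 has ~3 decimal digits of precision, so compare loosely.
    ok = torch.allclose(out_bf16.float(), out_fp32, atol=1e-2)
    print(mode, mf, 'match:', ok)
```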
fix upsample bf16 issue for channels last path by using high precision to compute index (#83847)

Summary:
Given the following case:

```python
import torch

a = torch.ones(1, 3, 320, 480).bfloat16().to(memory_format=torch.channels_last)
out_bf16 = torch.nn.functional.interpolate(a, size=(640, 960), scale_factor=None, mode='bilinear', align_corners=False, recompute_scale_factor=None, antialias=False)
out_fp32 = torch.nn.functional.interpolate(a.float(), size=(640, 960), scale_factor=None, mode='bilinear', align_corners=False, recompute_scale_factor=None, antialias=False)
print(out_bf16[0, 2, :, :])
print(out_fp32[0, 2, :, :])
```

the boundary of the bfloat16 output gets a wrong value:

```
tensor([[1.0000e+00, 1.0000e+00, 1.0000e+00,  ..., 1.0000e+00, 1.0000e+00, 1.0000e+00],
        [1.0000e+00, 1.0000e+00, 1.0000e+00,  ..., 1.0000e+00, 1.0000e+00, 1.0000e+00],
        [1.0000e+00, 1.0000e+00, 1.0000e+00,  ..., 1.0000e+00, 1.0000e+00, 1.0000e+00],
        ...,
        [1.0000e+00, 1.0000e+00, 1.0000e+00,  ..., 1.0000e+00, 1.0000e+00, 1.0000e+00],
        [1.0000e+00, 1.0000e+00, 1.0000e+00,  ..., 1.0000e+00, 1.0000e+00, 1.0000e+00],
        [0.0000e+00, 0.0000e+00, 1.8367e-40,  ..., 0.0000e+00, 0.0000e+00, 0.0000e+00]],
       dtype=torch.bfloat16)
tensor([[1., 1., 1.,  ..., 1., 1., 1.],
        [1., 1., 1.,  ..., 1., 1., 1.],
        [1., 1., 1.,  ..., 1., 1., 1.],
        ...,
        [1., 1., 1.,  ..., 1., 1., 1.],
        [1., 1., 1.,  ..., 1., 1., 1.],
        [1., 1., 1.,  ..., 1., 1., 1.]])
```

The expected behavior is that the bfloat16 output values should also all be one. The main reason is that the index was computed in low precision, see https://github.com/pytorch/pytorch/blob/fcb124406bdf86bc2d15e999d5a3e09b86238bba/aten/src/ATen/native/UpSample.h#L448; we should use high precision for this computation, as the GPU path does: https://github.com/pytorch/pytorch/blob/fcb124406bdf86bc2d15e999d5a3e09b86238bba/aten/src/ATen/native/cuda/UpSample.cuh#L123

Pull Request resolved: #83847
Approved by: https://github.com/frank-wei
Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/658f958bc4bb314d9c6030eeaf3e1784792b5d15
Reviewed By: weiwangmeta
Differential Revision: D38947080
fbshipit-source-id: eef6bfe50a4becd4550b20a88b119da1e1fc46c0
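For intuition, here is a small Python-only sketch of the failure mode, not the kernel code itself: doing the source-index arithmetic entirely in bfloat16 drifts at the boundary, while the same arithmetic in float32 stays correct. The scale and index are chosen to match the 320 → 640 repro above.

```python
import torch

# 320 -> 640 upsample along one dimension: scale = in / out = 0.5,
# and row 639 is the last output row from the repro above.
scale, dst_index = 0.5, 639

# Low-precision path: keep every intermediate in bfloat16, which is
# roughly what the old channels-last CPU index computation did.
idx_bf16 = (torch.tensor(dst_index, dtype=torch.bfloat16) + 0.5) * scale - 0.5

# High-precision path: plain float arithmetic, as the CUDA kernel does.
idx_fp32 = (dst_index + 0.5) * scale - 0.5

print(idx_bf16.item(), idx_fp32.item())
# bfloat16 yields 320.0, float32 yields 319.25; source row 320 is out of
# range for a 320-row input, so the last output row reads wrong data.
```

With only 8 mantissa bits, bfloat16 cannot represent 639.5 (the spacing between representable values near 640 is 2), so each rounding step pushes the computed source index past the last valid input row, which matches the corrupted final row in the output above.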