🐛 Describe the bug
The backward of torch.asin in bfloat16 on CPU shows up to 2 ulp of error when compared with GPU. The identical computation in float32 shows that the GPU result is the more accurate one.
```python
import torch

print(torch.__version__)

def test_asin_backward_bf16(device):
    print(f"{device}:")
    # 0x3e84 viewed as bfloat16 is 0.2578125
    tensor_input = torch.tensor([0x3e84], dtype=torch.uint16).view(torch.bfloat16).to(device).requires_grad_(True)
    asin_output = torch.asin(tensor_input)
    # 0x3f0e viewed as bfloat16 is 0.5546875
    external_grad = torch.tensor([0x3f0e], dtype=torch.uint16).view(torch.bfloat16).to(device)
    asin_output.backward(external_grad)
    print(tensor_input.view(torch.uint16))
    print(asin_output.view(torch.uint16))
    print(external_grad.view(torch.uint16))
    print(tensor_input.grad.view(torch.uint16))

test_asin_backward_bf16('cpu')
test_asin_backward_bf16('cuda')

def test_asin_backward_fp32(device):
    print(f"{device}:")
    # Same values widened to float32 bit patterns
    tensor_input = torch.tensor([0x3e840000], dtype=torch.uint32).view(torch.float32).to(device).requires_grad_(True)
    asin_output = torch.asin(tensor_input)
    external_grad = torch.tensor([0x3f0e0000], dtype=torch.uint32).view(torch.float32).to(device)
    asin_output.backward(external_grad)
    print(tensor_input.view(torch.uint32))
    print(asin_output.view(torch.uint32))
    print(external_grad.view(torch.uint32))
    print(tensor_input.grad.view(torch.uint32))

test_asin_backward_fp32('cpu')
test_asin_backward_fp32('cuda')
```
Output:

```
2.8.0+cu126
cpu:
tensor([16004], dtype=torch.uint16)
tensor([16006], dtype=torch.uint16)
tensor([16142], dtype=torch.uint16)
tensor([16148], dtype=torch.uint16)  # this is 0x3f14
cuda:
tensor([16004], device='cuda:0', dtype=torch.uint16)
tensor([16006], device='cuda:0', dtype=torch.uint16)
tensor([16142], device='cuda:0', dtype=torch.uint16)
tensor([16146], device='cuda:0', dtype=torch.uint16)  # this is 0x3f12
cpu:
tensor([1048838144], dtype=torch.uint32)
tensor([1048936961], dtype=torch.uint32)
tensor([1057882112], dtype=torch.uint32)
tensor([1058207712], dtype=torch.uint32)
cuda:
tensor([1048838144], device='cuda:0', dtype=torch.uint32)
tensor([1048936961], device='cuda:0', dtype=torch.uint32)
tensor([1057882112], device='cuda:0', dtype=torch.uint32)
tensor([1058207712], device='cuda:0', dtype=torch.uint32)  # this is 0x3f12f7e0
```
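Since d/dx asin(x) = 1/sqrt(1 - x²), the backward value for these bit patterns can be checked against a float64 reference. The sketch below (stdlib only, not part of the original repro) decodes the bfloat16 inputs, evaluates the analytic gradient in float64, and rounds it to bfloat16 with round-to-nearest-even:

```python
import math
import struct

def bf16_to_float(bits):
    # A bfloat16 is the top 16 bits of a float32; widen by appending zero bits.
    return struct.unpack('>f', struct.pack('>I', bits << 16))[0]

x = bf16_to_float(0x3e84)  # 0.2578125, exactly representable
g = bf16_to_float(0x3f0e)  # 0.5546875, exactly representable

# d/dx asin(x) = 1 / sqrt(1 - x^2), evaluated in float64
grad = g / math.sqrt(1.0 - x * x)

# Round the float64 reference down to bfloat16 (via float32, round-to-nearest-even)
bits32 = struct.unpack('>I', struct.pack('>f', grad))[0]
bf16 = (bits32 + 0x7FFF + ((bits32 >> 16) & 1)) >> 16

print(f"float64 reference:   {grad}")
print(f"rounded to bfloat16: {hex(bf16)}")
```

Under this reference the correctly rounded bfloat16 gradient is 0x3f13, so the CPU result (0x3f14) and the CUDA result (0x3f12) each sit 1 ulp away from it, and 2 ulp apart from each other.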
Versions
```
Collecting environment information...
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Mitigation; PTE Inversion
Vulnerability Mds: Vulnerable; SMT Host state unknown
Vulnerability Meltdown: Vulnerable
Vulnerability Mmio stale data: Vulnerable
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Vulnerable
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1: Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers
Vulnerability Spectre v2: Vulnerable; IBPB: disabled; STIBP: disabled; PBRSB-eIBRS: Not affected; BHI: Vulnerable
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Vulnerable

Versions of relevant libraries:
[pip3] intel-cmplr-lib-ur==2025.2.1
[pip3] intel-openmp==2025.2.1
[pip3] mkl==2025.2.0
[pip3] numpy==2.0.2
[pip3] nvidia-cublas-cu12==12.6.4.1
[pip3] nvidia-cuda-cupti-cu12==12.6.80
[pip3] nvidia-
[pip3] nvidia-cusolver-cu12==11.7.1.2
[pip3] nvidia-cusparse-cu12==12.5.4.2
[pip3] nvidia-cusparselt-cu12==0.7.1
[pip3] nvidia-nccl-cu12==2.27.3
[pip3] nvidia-nvjitlink-cu12==12.6.85
[pip3] nvidia-nvtx-cu12==12.6.77
[pip3] nvtx==0.2.13
[pip3] optree==0.17.0
[pip3] pynvjitlink-cu12==0.7.0
[pip3] tbb==2022.2.0
[pip3] tcmlib==1.4.0
[pip3] torch==2.8.0+cu126
[pip3] torchao==0.10.0
[pip3] torchaudio==2.8.0+cu126
[pip3] torchdata==0.11.0
[pip3] torchsummary==1.5.1
[pip3] torchtune==0.6.1
[pip3] torchvision==0.23.0+cu126
[pip3] triton==3.4.0
[pip3] umf==0.11.0
[conda] Could not collect
```
cc @ezyang @albanD @gqchen @nikitaved @soulitzer @Varal7 @xmfan @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168