🐛 Describe the bug
Description
test_iou[xyxyxyxy-dtype0-cpu] fails on arm64 (aarch64) Linux with float32 dtype. The test passes on x86_64 and ppc64el.
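The existing tolerance handling in the test accounts for macOS (atol=0.5) and CUDA (xfail), but the else fallback uses atol=1e-4, which is too tight for arm64 Linux, where the observed absolute difference is 0.5058. That difference also exceeds the macOS tolerance of 0.5, so reusing the macOS value unchanged would not make the test pass.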
log.gz
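To illustrate the kind of platform branching described above, here is a minimal sketch of a tolerance selector. It is not the code in test_ops.py: the helper name `select_atol`, the constant names, and the arm64 Linux value of 0.51 are all assumptions made for illustration (0.51 is picked only because it sits just above the observed 0.5058). The CUDA xfail case is left out because it is not a tolerance.

```python
import platform
import sys

DEFAULT_ATOL = 1e-4      # fallback tolerance quoted in this report
MACOS_ATOL = 0.5         # macOS tolerance quoted in this report
ARM64_LINUX_ATOL = 0.51  # ASSUMED value, chosen to sit just above the observed 0.5058


def select_atol(system=None, machine=None):
    """Hypothetical helper: pick an absolute tolerance for the CPU IoU comparison."""
    # Fall back to the running interpreter's platform when no override is given.
    system = sys.platform if system is None else system
    machine = platform.machine() if machine is None else machine

    if system == "darwin":
        return MACOS_ATOL
    if system.startswith("linux") and machine.lower() in ("aarch64", "arm64"):
        return ARM64_LINUX_ATOL
    return DEFAULT_ATOL


if __name__ == "__main__":
    print(select_atol())
```

Passing `system` and `machine` explicitly makes the branching testable on any host, for example `select_atol("linux", "aarch64")`. Whether a tolerance this loose is acceptable, or whether the two mismatched elements should be investigated instead, is a decision for the maintainers.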
Error
FAILED test_ops.py::TestRotatedBoxIou::test_iou[xyxyxyxy-dtype0-cpu]
Mismatched elements: 2 / 64 (3.1%)
Greatest absolute difference: 0.5057777166366577 at index (3, 0) (up to 0.0001 allowed)
Greatest relative difference: 0.5057777166366577 at index (3, 0) (up to 0.0001 allowed)
The cause
The suspected (unconfirmed) cause is that torch.cos and torch.sin on tensors use architecture-specific SIMD implementations. The Sleef NEON backend (arm64) produces slightly different float32 rounding than the AVX backend (x86_64), similar to the already-handled macOS arm64 case, which uses Apple's vDSP implementation.
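One way to check this guess is to measure how far float32 torch.sin and torch.cos land from a float64 reference on each architecture and compare the numbers. The sketch below is a diagnostic only: the function name `trig_max_abs_error`, the sample size, and the input range are assumptions, and it does not reproduce the IoU test itself.

```python
import math

import torch


def trig_max_abs_error(n=1_000_000, seed=0):
    """Hypothetical diagnostic: max |float32 result - float64 reference| for sin and cos."""
    gen = torch.Generator().manual_seed(seed)
    # Angles drawn uniformly from [-pi, pi), then rounded to float32.
    x64 = (torch.rand(n, generator=gen, dtype=torch.float64) * 2 - 1) * math.pi
    x32 = x64.to(torch.float32)
    # Evaluate the reference on the exact float32 inputs, promoted to float64,
    # so that input rounding does not count as error.
    ref = x32.to(torch.float64)
    sin_err = (torch.sin(x32).to(torch.float64) - torch.sin(ref)).abs().max().item()
    cos_err = (torch.cos(x32).to(torch.float64) - torch.cos(ref)).abs().max().item()
    return sin_err, cos_err


if __name__ == "__main__":
    sin_err, cos_err = trig_max_abs_error()
    print(f"max abs error: sin={sin_err:.3e} cos={cos_err:.3e}")
```

Running this on both x86_64 and aarch64 would show whether the two backends differ at all. Per-element differences at float32 rounding level are far smaller than 0.5058, so matching numbers would point away from sin/cos, while differing numbers would still leave open how the IoU computation amplifies them for the two mismatched elements.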
Versions
PyTorch version: 2.12.0+debian
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: Debian GNU/Linux forky/sid (aarch64)
GCC version: (Debian 15.2.0-17) 15.2.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.42
Python version: 3.13.12 (main, Feb 4 2026, 15:06:39) [GCC 15.2.0] (64-bit runtime)
Python platform: Linux-6.12.88+deb13-arm64-aarch64-with-glibc2.42
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Caching allocator config: N/A
CPU:
Architecture: aarch64
CPU op-mode(s): 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
...
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm ssbs
L1d cache: 4 MiB (64 instances)
L1i cache: 4 MiB (64 instances)
L2 cache: 32 MiB (64 instances)
L3 cache: 64 MiB (2 instances)
Versions of relevant libraries:
[pip3] numpy==2.4.4
[pip3] torch==2.12.0+debian
[pip3] torchvision==0.27.0
[conda] Could not collect