
torch.triu() may return wrong values using MPS #100005

@TheCluster

Description

🐛 Describe the bug

Using MPS, torch.triu() may return a tensor with incorrect values.

It works as expected on CPU:

>>> import torch
>>> 
>>> mask = torch.full((1, 1, 10, 10), float("-inf"), device=torch.device("cpu"))
>>> result = torch.triu(mask, diagonal=1)
>>> print(result)
tensor([[[[0., -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf],
          [0., 0., -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf],
          [0., 0., 0., -inf, -inf, -inf, -inf, -inf, -inf, -inf],
          [0., 0., 0., 0., -inf, -inf, -inf, -inf, -inf, -inf],
          [0., 0., 0., 0., 0., -inf, -inf, -inf, -inf, -inf],
          [0., 0., 0., 0., 0., 0., -inf, -inf, -inf, -inf],
          [0., 0., 0., 0., 0., 0., 0., -inf, -inf, -inf],
          [0., 0., 0., 0., 0., 0., 0., 0., -inf, -inf],
          [0., 0., 0., 0., 0., 0., 0., 0., 0., -inf],
          [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]]]])

But on MPS, it returns a tensor containing NaN values:

>>> import torch
>>> 
>>> mask = torch.full((1, 1, 10, 10), float("-inf"), device=torch.device("mps"))
>>> result = torch.triu(mask, diagonal=1)
>>> print(result)
tensor([[[[nan, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf],
          [nan, nan, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf],
          [nan, nan, nan, -inf, -inf, -inf, -inf, -inf, -inf, -inf],
          [nan, nan, nan, nan, -inf, -inf, -inf, -inf, -inf, -inf],
          [nan, nan, nan, nan, nan, -inf, -inf, -inf, -inf, -inf],
          [nan, nan, nan, nan, nan, nan, -inf, -inf, -inf, -inf],
          [nan, nan, nan, nan, nan, nan, nan, -inf, -inf, -inf],
          [nan, nan, nan, nan, nan, nan, nan, nan, -inf, -inf],
          [nan, nan, nan, nan, nan, nan, nan, nan, nan, -inf],
          [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]]]],
       device='mps:0')
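The NaN pattern above would be consistent with a kernel that zeroes out the lower triangle by multiplying with a 0/1 mask, since under IEEE 754 arithmetic `0 * -inf` is NaN. This is speculation about the MPS implementation, not something confirmed in the issue:

```python
import math

# IEEE 754: multiplying zero by negative infinity yields NaN.
# If the MPS triu kernel masks elements via multiplication rather than
# selection, every masked -inf entry would become NaN exactly as observed.
# (Hypothesis about the kernel, not confirmed from this issue.)
result = 0.0 * float("-inf")
print(result)          # nan
print(math.isnan(result))  # True
```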

It also works correctly on CUDA:

>>> import torch
>>> 
>>> mask = torch.full((1, 1, 10, 10), float("-inf"), device=torch.device("cuda"))
>>> result = torch.triu(mask, diagonal=1)
>>> print(result)
tensor([[[[0., -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf],
          [0., 0., -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf],
          [0., 0., 0., -inf, -inf, -inf, -inf, -inf, -inf, -inf],
          [0., 0., 0., 0., -inf, -inf, -inf, -inf, -inf, -inf],
          [0., 0., 0., 0., 0., -inf, -inf, -inf, -inf, -inf],
          [0., 0., 0., 0., 0., 0., -inf, -inf, -inf, -inf],
          [0., 0., 0., 0., 0., 0., 0., -inf, -inf, -inf],
          [0., 0., 0., 0., 0., 0., 0., 0., -inf, -inf],
          [0., 0., 0., 0., 0., 0., 0., 0., 0., -inf],
          [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]]]], device='cuda:0')

This behavior is reproducible on both PyTorch 2.0 and PyTorch 1.12.x.
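Until the kernel is fixed, one possible workaround (an assumption on my part, not suggested in the issue) is to avoid running triu over a tensor that is already full of -inf: compute a boolean upper-triangular selector first, then write -inf into the selected positions with masked_fill, so -inf never passes through the triu kernel itself:

```python
import torch

def causal_mask(size: int, device: str = "cpu") -> torch.Tensor:
    # Hypothetical workaround: build a boolean upper-triangular selector
    # (strictly above the diagonal), then fill those positions with -inf.
    # -inf never flows through the triu kernel, sidestepping the MPS bug.
    keep = torch.ones(size, size, dtype=torch.bool, device=device).triu(diagonal=1)
    mask = torch.zeros(size, size, device=device)
    return mask.masked_fill(keep, float("-inf"))

print(causal_mask(4))
```

On CPU and CUDA this produces the same result as `torch.triu(torch.full(..., float("-inf")), diagonal=1)`; whether it avoids the NaNs on MPS would need to be verified on an affected machine.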

Versions

PyTorch version: 2.0.0
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 13.3.1 (arm64)
GCC version: Could not collect
Clang version: 14.0.3 (clang-1403.0.22.14.1)
CMake version: version 3.26.3
Libc version: N/A

Python version: 3.10.10 | packaged by conda-forge | (main, Mar 24 2023, 20:12:31) [Clang 14.0.6 ] (64-bit runtime)
Python platform: macOS-13.3.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M1 Ultra

Versions of relevant libraries:
[pip3] numpy==1.24.2
[pip3] torch==2.0.0
[conda] numpy 1.24.2 pypi_0 pypi
[conda] pytorch 2.0.0 py3.10_0 pytorch

cc @ezyang @gchanan @zou3519 @kulinseth @albanD @malfet @DenisVieriu97 @razarmehr @abhudev


Labels

module: NaNs and Infs: Problems related to NaN and Inf handling in floating point
module: correctness (silent): Issue that returns an incorrect result silently
module: mps: Related to Apple Metal Performance Shaders framework
triaged: This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
