
unary ops on strided int tensors fails on MPS #105284

Closed
Mr4k opened this issue Jul 16, 2023 · 4 comments
Assignees
Labels
module: mps (Related to Apple Metal Performance Shaders framework), module: regression (It used to work, and now it doesn't), triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

Mr4k (Contributor) commented Jul 16, 2023

🐛 Describe the bug

I encountered this problem when I pulled main and saw a new failing test around torch.vander in my PR #104688 (that PR technically enables torch.vander as a side effect, which is why I was testing it). However, everything below is against main, to isolate it from my changes.

import torch
x = torch.tensor([1, 2, 3, 5, 6], device="mps", dtype=torch.int16)
x.unsqueeze(-1).expand(x.shape + (3,)).cumsum(-1)

currently results in:

Python 3.11.3 (main, May 15 2023, 18:01:31) [Clang 14.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> x = torch.tensor([1, 2, 3, 5, 6], device="mps", dtype=torch.int16)
>>> x.unsqueeze(-1).expand(x.shape + (3,)).cumsum(-1)
/AppleInternal/Library/BuildRoots/97f6331a-ba75-11ed-a4bc-863efbbaf80d/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphExecutable.mm:1377: failed assertion `Incompatible element type for parameter at index 0, mlir module expected element type si16 but received si64'
zsh: abort      python
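
For what it's worth, the expand here produces a broadcasted view with a zero stride in the new dimension, so the input is not densely mapped in storage (which I believe is the condition the non-dense path in unary_op checks via is_dense_in_storage). A minimal illustration of the layout on CPU, purely for reference:

import torch

x = torch.tensor([1, 2, 3, 5, 6], dtype=torch.int16)  # CPU, layout illustration only
y = x.unsqueeze(-1).expand(x.shape + (3,))
print(y.stride())         # (1, 0) -- the expanded dimension has stride 0
print(y.is_contiguous())  # False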

The same thing happens with other unary operators as well:

import torch
x = torch.tensor([1, 2, 3, 5, 6], device="mps", dtype=torch.int16)
x.unsqueeze(-1).expand(x.shape + (3,)).logit(-1)

results in:

Python 3.11.3 (main, May 15 2023, 18:01:31) [Clang 14.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> x = torch.tensor([1, 2, 3, 5, 6], device="mps", dtype=torch.int16)
>>> x.unsqueeze(-1).expand(x.shape + (3,)).logit(-1)
/AppleInternal/Library/BuildRoots/97f6331a-ba75-11ed-a4bc-863efbbaf80d/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphExecutable.mm:1377: failed assertion `Incompatible element type for parameter at index 0, mlir module expected element type si16 but received f32'
zsh: abort      python

I believe this error is due to this line in this recent PR. By changing the line:

self_ = at::empty_like(output);

to

self_ = at::empty_like(self);

I resolved this issue. The reason this is a problem is that the cumsum logic currently assumes the input scalar type is the scalar type of the input tensor, not of the output tensor. See here. This logic makes me think that the previously mentioned line:

self_ = at::empty_like(output);

is incorrect. However, I don't know the intention behind adding that line, and if it is correct I can just alter the cumsum logic in my PR #104688. So I wanted to file a separate issue to discuss this specifically without tying it to #104688.
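
For context, the si16 vs si64 mismatch in the assertion lines up with cumsum's type promotion: by default, cumsum on an integral tensor produces an int64 result, so a temporary allocated like the output no longer matches the int16 input that gets copied into it. A minimal illustration on CPU (where nothing crashes):

import torch

x = torch.tensor([1, 2, 3, 5, 6], dtype=torch.int16)  # CPU, for illustration only
out = x.unsqueeze(-1).expand(x.shape + (3,)).cumsum(-1)

print(x.dtype)    # torch.int16 -- what the MPS graph expects for its input
print(out.dtype)  # torch.int64 -- the dtype empty_like(output) would allocate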

Versions

Collecting environment information...
PyTorch version: 2.1.0.dev20230526
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 13.3.1 (arm64)
GCC version: Could not collect
Clang version: 14.0.0 (clang-1400.0.29.202)
CMake version: version 3.24.2
Libc version: N/A

Python version: 3.9.12 (main, Apr 5 2022, 01:52:34) [Clang 12.0.0 ] (64-bit runtime)
Python platform: macOS-13.3.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M1 Pro

Versions of relevant libraries:
[pip3] numpy==1.24.3
[pip3] torch==2.1.0.dev20230526
[pip3] torchaudio==2.1.0.dev20230527
[pip3] torchvision==0.16.0.dev20230526
[conda] numpy 1.24.3 pypi_0 pypi
[conda] torch 2.1.0.dev20230526 pypi_0 pypi
[conda] torchaudio 2.1.0.dev20230527 pypi_0 pypi
[conda] torchvision 0.16.0.dev20230526 pypi_0 pypi

cc @kulinseth @albanD @malfet @DenisVieriu97 @razarmehr @abhudev

Mr4k changed the title from "mps device torch.cumsum is broken for non dense tensors because unary op casts to output type instead of input type" to "Mps device operation torch.cumsum is broken for non dense tensors because unary op casts to output type instead of input type" on Jul 16, 2023
Mr4k (Contributor, Author) commented Jul 16, 2023

Okay, it does look like resolving this issue is critical to merging #104688, because the test runner crashes when hitting this error, so I can't just add variants of linalg.vander to the xfail list.

Mr4k changed the title from "Mps device operation torch.cumsum is broken for non dense tensors because unary op casts to output type instead of input type" to "MPS device unary operators are broken for non dense tensors because unary op casts to output type instead of input type" on Jul 16, 2023
mikaylagawarecki added the module: mps (Related to Apple Metal Performance Shaders framework) and triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) labels on Jul 17, 2023
malfet self-assigned this on Jul 17, 2023
malfet changed the title from "MPS device unary operators are broken for non dense tensors because unary op casts to output type instead of input type" to "cumsum of strides int tensor fails on MPS" on Jul 17, 2023
malfet (Contributor) commented Jul 17, 2023

@Mr4k I changed the title a bit, as it works fine for regular unary ops, so cumsum seems to be the only problematic one.

And here is a one-line reproducer:

% python -c "import torch;print(torch.arange(64, device='mps',dtype=torch.int32).reshape(8, 8)[:,::2].cumsum(0))"
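
The [:, ::2] slice in that one-liner yields a strided, non-contiguous view, which is presumably what routes it through the non-dense path; a quick check of the layout on CPU:

import torch

x = torch.arange(64, dtype=torch.int32).reshape(8, 8)[:, ::2]  # CPU, layout illustration only
print(x.is_contiguous())  # False
print(x.stride())         # (8, 2) -- step-2 slice over the last dimension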

Mr4k (Contributor, Author) commented Jul 18, 2023

@malfet thanks for checking this out, and thanks for the elegant repro and the help with the issue naming! Did you get a chance to try my other repros? I pulled the latest main and got the following errors for non-cumsum unary operators using your repro pattern. It's possible that my PyTorch build process is not correct and these other repros are a side effect of that, but it has seemed to work so far:

python -c "import torch;print(torch.arange(64, device='mps',dtype=torch.int32).reshape(8, 8)[:,::2].exp())" 

gives me:

/AppleInternal/Library/BuildRoots/97f6331a-ba75-11ed-a4bc-863efbbaf80d/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphExecutable.mm:1377: failed assertion `Incompatible element type for parameter at index 0, mlir module expected element type si32 but received f32'
python -c "import torch;print(torch.arange(64, device='mps',dtype=torch.int32).reshape(8, 8)[:,::2].logit())" 

gives me:

/AppleInternal/Library/BuildRoots/97f6331a-ba75-11ed-a4bc-863efbbaf80d/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphExecutable.mm:1377: failed assertion `Incompatible element type for parameter at index 0, mlir module expected element type si32 but received f32'
python -c "import torch;print(torch.arange(64, device='mps',dtype=torch.int32).reshape(8, 8)[:,::2].sqrt())" 

gives me:

/AppleInternal/Library/BuildRoots/97f6331a-ba75-11ed-a4bc-863efbbaf80d/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphExecutable.mm:1377: failed assertion `Incompatible element type for parameter at index 0, mlir module expected element type si32 but received f32'

version info:

torch==2.1.0a0+git2ba9b56

If these do repro, I'm curious why the temporary tensor is created with the output scalar type instead of the input's, and I'd be happy to make a PR changing that tomorrow if it's a good idea. However, I may be way off base, as I'm pretty new to this code, and this would be a good learning experience!

Mr4k changed the title from "cumsum of strides int tensor fails on MPS" to "unary ops on strided int tensors fails on MPS" on Jul 19, 2023
malfet added the module: regression (It used to work, and now it doesn't) label on Jul 19, 2023
malfet (Contributor) commented Jul 19, 2023

@Mr4k thank you for the investigation; it looks like I've found the problem: there is a conflict between reshaping the tensor to match the output and casting the input tensor to a different dtype...

diff --git a/aten/src/ATen/native/mps/operations/UnaryOps.mm b/aten/src/ATen/native/mps/operations/UnaryOps.mm
index 2bbd238b1e7..431d97f0836 100644
--- a/aten/src/ATen/native/mps/operations/UnaryOps.mm
+++ b/aten/src/ATen/native/mps/operations/UnaryOps.mm
@@ -84,7 +84,7 @@ void unary_op(const Tensor& self,
     // If self is densely mapped in storage, create a dense output-like representation
     at::Tensor self_;
     if (!is_dense_in_storage(self)) {
-      self_ = at::empty_like(output);
+      self_ = at::empty_like(output, self.scalar_type());
       mps::mps_copy_(self_, self, false);
     } else {
       self_ = self;
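
Assuming a patch along these lines, a quick sanity check would be to compare the strided int repros on MPS against the CPU reference, for example:

import torch

# Sanity-check sketch (assumes an MPS-capable build with the fix applied):
# strided int inputs should now match the CPU reference results.
x_mps = torch.arange(64, device="mps", dtype=torch.int32).reshape(8, 8)[:, ::2]
x_cpu = torch.arange(64, dtype=torch.int32).reshape(8, 8)[:, ::2]

assert (x_mps.cumsum(0).cpu() == x_cpu.cumsum(0)).all()
assert torch.allclose(x_mps.exp().cpu(), x_cpu.exp())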
