
unary ops on strided int tensors fails on MPS #105284

Closed
Mr4k opened this issue Jul 16, 2023 · 4 comments
Assignees
Labels
module: mps (Related to Apple Metal Performance Shaders framework), module: regression (It used to work, and now it doesn't), triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

Mr4k (Contributor) commented Jul 16, 2023

🐛 Describe the bug

I encountered this problem when I pulled main and saw a new failing test around torch.vander in my PR #104688 (that PR technically enables torch.vander as a side effect, which is why I was testing it). However, everything below is against main, to isolate it from my changes.

import torch
x = torch.tensor([1, 2, 3, 5, 6], device="mps", dtype=torch.int16)
x.unsqueeze(-1).expand(x.shape + (3,)).cumsum(-1)

currently results in:

Python 3.11.3 (main, May 15 2023, 18:01:31) [Clang 14.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> x = torch.tensor([1, 2, 3, 5, 6], device="mps", dtype=torch.int16)
>>> x.unsqueeze(-1).expand(x.shape + (3,)).cumsum(-1)
/AppleInternal/Library/BuildRoots/97f6331a-ba75-11ed-a4bc-863efbbaf80d/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphExecutable.mm:1377: failed assertion `Incompatible element type for parameter at index 0, mlir module expected element type si16 but received si64'
zsh: abort      python
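
For what it's worth, the expand here produces a broadcasted view with a zero stride in the new dimension, so the input is not densely mapped in storage (which I believe is the condition the non-dense path in unary_op checks via is_dense_in_storage). A minimal illustration of the layout on CPU, purely for reference:

import torch

x = torch.tensor([1, 2, 3, 5, 6], dtype=torch.int16)  # CPU, layout illustration only
y = x.unsqueeze(-1).expand(x.shape + (3,))
print(y.stride())         # (1, 0) -- the expanded dimension has stride 0
print(y.is_contiguous())  # False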

The same thing happens with other unary operators as well:

import torch
x = torch.tensor([1, 2, 3, 5, 6], device="mps", dtype=torch.int16)
x.unsqueeze(-1).expand(x.shape + (3,)).logit(-1)

results in:

Python 3.11.3 (main, May 15 2023, 18:01:31) [Clang 14.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> x = torch.tensor([1, 2, 3, 5, 6], device="mps", dtype=torch.int16)
>>> x.unsqueeze(-1).expand(x.shape + (3,)).logit(-1)
/AppleInternal/Library/BuildRoots/97f6331a-ba75-11ed-a4bc-863efbbaf80d/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphExecutable.mm:1377: failed assertion `Incompatible element type for parameter at index 0, mlir module expected element type si16 but received f32'
zsh: abort      python

I believe this error is due to this line in this recent PR. By changing the line:

self_ = at::empty_like(output);

to

self_ = at::empty_like(self);

I resolved this issue. The reason this is a problem is that the cumsum logic currently assumes the input scalar type is the scalar type of the input tensor, not of the output tensor. See here. This logic makes me think that the previously mentioned line:

self_ = at::empty_like(output);

is incorrect. However, I don't know the intention behind adding that line, and if it is correct I can just alter the cumsum logic in my PR #104688. So I wanted to file a separate issue to discuss this specifically without tying it to #104688.
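
For context, the si16 vs si64 mismatch in the assertion lines up with cumsum's type promotion: by default, cumsum on an integral tensor produces an int64 result, so a temporary allocated like the output no longer matches the int16 input that gets copied into it. A minimal illustration on CPU (where nothing crashes):

import torch

x = torch.tensor([1, 2, 3, 5, 6], dtype=torch.int16)  # CPU, for illustration only
out = x.unsqueeze(-1).expand(x.shape + (3,)).cumsum(-1)

print(x.dtype)    # torch.int16 -- what the MPS graph expects for its input
print(out.dtype)  # torch.int64 -- the dtype empty_like(output) would allocate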

Versions

Collecting environment information...
PyTorch version: 2.1.0.dev20230526
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 13.3.1 (arm64)
GCC version: Could not collect
Clang version: 14.0.0 (clang-1400.0.29.202)
CMake version: version 3.24.2
Libc version: N/A

Python version: 3.9.12 (main, Apr 5 2022, 01:52:34) [Clang 12.0.0 ] (64-bit runtime)
Python platform: macOS-13.3.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M1 Pro

Versions of relevant libraries:
[pip3] numpy==1.24.3
[pip3] torch==2.1.0.dev20230526
[pip3] torchaudio==2.1.0.dev20230527
[pip3] torchvision==0.16.0.dev20230526
[conda] numpy 1.24.3 pypi_0 pypi
[conda] torch 2.1.0.dev20230526 pypi_0 pypi
[conda] torchaudio 2.1.0.dev20230527 pypi_0 pypi
[conda] torchvision 0.16.0.dev20230526 pypi_0 pypi

cc @kulinseth @albanD @malfet @DenisVieriu97 @razarmehr @abhudev

Mr4k changed the title from "mps device torch.cumsum is broken for non dense tensors because unary op casts to output type instead of input type" to "Mps device operation torch.cumsum is broken for non dense tensors because unary op casts to output type instead of input type" on Jul 16, 2023
Mr4k (Contributor, Author) commented Jul 16, 2023

Okay, it does look like resolving this issue is critical to merging #104688, because the test runner crashes when hitting this error, so I can't just add variants of linalg.vander to the xfail list.

Mr4k changed the title from "Mps device operation torch.cumsum is broken for non dense tensors because unary op casts to output type instead of input type" to "MPS device unary operators are broken for non dense tensors because unary op casts to output type instead of input type" on Jul 16, 2023
mikaylagawarecki added the module: mps (Related to Apple Metal Performance Shaders framework) and triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) labels on Jul 17, 2023
malfet self-assigned this on Jul 17, 2023
malfet changed the title from "MPS device unary operators are broken for non dense tensors because unary op casts to output type instead of input type" to "cumsum of strides int tensor fails on MPS" on Jul 17, 2023
malfet (Contributor) commented Jul 17, 2023

@Mr4k I changed the title a bit, as it works fine for regular unary ops, so cumsum seems to be the only problematic one.

And here is a one-line reproducer:

% python -c "import torch;print(torch.arange(64, device='mps',dtype=torch.int32).reshape(8, 8)[:,::2].cumsum(0))"
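
The [:, ::2] slice in that one-liner yields a strided, non-contiguous view, which is presumably what routes it through the non-dense path; a quick check of the layout on CPU:

import torch

x = torch.arange(64, dtype=torch.int32).reshape(8, 8)[:, ::2]  # CPU, layout illustration only
print(x.is_contiguous())  # False
print(x.stride())         # (8, 2) -- step-2 slice over the last dimension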

Mr4k (Contributor, Author) commented Jul 18, 2023

@malfet thanks for checking this out, and thanks for the elegant repro and the help with the issue naming! Did you get a chance to try my other repros? I pulled the latest main and got the following errors for non-cumsum unary operators using your repro pattern. It's possible that my PyTorch build process is not correct and these other repros are a side effect of that, but it has seemed to work so far:

python -c "import torch;print(torch.arange(64, device='mps',dtype=torch.int32).reshape(8, 8)[:,::2].exp())" 

gives me:

/AppleInternal/Library/BuildRoots/97f6331a-ba75-11ed-a4bc-863efbbaf80d/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphExecutable.mm:1377: failed assertion `Incompatible element type for parameter at index 0, mlir module expected element type si32 but received f32'
python -c "import torch;print(torch.arange(64, device='mps',dtype=torch.int32).reshape(8, 8)[:,::2].logit())" 

gives me:

/AppleInternal/Library/BuildRoots/97f6331a-ba75-11ed-a4bc-863efbbaf80d/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphExecutable.mm:1377: failed assertion `Incompatible element type for parameter at index 0, mlir module expected element type si32 but received f32'
python -c "import torch;print(torch.arange(64, device='mps',dtype=torch.int32).reshape(8, 8)[:,::2].sqrt())" 

gives me:

/AppleInternal/Library/BuildRoots/97f6331a-ba75-11ed-a4bc-863efbbaf80d/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphExecutable.mm:1377: failed assertion `Incompatible element type for parameter at index 0, mlir module expected element type si32 but received f32'

version info:

torch==2.1.0a0+git2ba9b56

If these do repro, I'm curious why the temporary tensor is created with the output scalar type instead of the input's, and I'd be happy to make a PR changing that tomorrow if it's a good idea. However, I may be way off base, as I'm pretty new to this code, and this would be a good learning experience!

Mr4k changed the title from "cumsum of strides int tensor fails on MPS" to "unary ops on strided int tensors fails on MPS" on Jul 19, 2023
malfet added the module: regression (It used to work, and now it doesn't) label on Jul 19, 2023
malfet (Contributor) commented Jul 19, 2023

@Mr4k thank you for the investigation; it looks like I've found the problem: there is a conflict between reshaping the tensor to match the output and casting the input tensor to a different dtype...

diff --git a/aten/src/ATen/native/mps/operations/UnaryOps.mm b/aten/src/ATen/native/mps/operations/UnaryOps.mm
index 2bbd238b1e7..431d97f0836 100644
--- a/aten/src/ATen/native/mps/operations/UnaryOps.mm
+++ b/aten/src/ATen/native/mps/operations/UnaryOps.mm
@@ -84,7 +84,7 @@ void unary_op(const Tensor& self,
     // If self is densely mapped in storage, create a dense output-like representation
     at::Tensor self_;
     if (!is_dense_in_storage(self)) {
-      self_ = at::empty_like(output);
+      self_ = at::empty_like(output, self.scalar_type());
       mps::mps_copy_(self_, self, false);
     } else {
       self_ = self;
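
Assuming a patch along these lines, a quick sanity check would be to compare the strided int repros on MPS against the CPU reference, for example:

import torch

# Sanity-check sketch (assumes an MPS-capable build with the fix applied):
# strided int inputs should now match the CPU reference results.
x_mps = torch.arange(64, device="mps", dtype=torch.int32).reshape(8, 8)[:, ::2]
x_cpu = torch.arange(64, dtype=torch.int32).reshape(8, 8)[:, ::2]

assert (x_mps.cumsum(0).cpu() == x_cpu.cumsum(0)).all()
assert torch.allclose(x_mps.exp().cpu(), x_cpu.exp())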
