
Unexpected Results of In-Place Zeroing on Sliced Tensor with MPS Device #114692

Closed
Tripton opened this issue Nov 28, 2023 · 4 comments
Labels
high priority, module: mps (Related to Apple Metal Performance Shaders framework), triage review

Comments

Tripton commented Nov 28, 2023

🐛 Unexpected Results of In-Place Zeroing on Sliced Tensor with MPS Device

Performing in-place zeroing on a sliced PyTorch tensor located on the MPS device leads to unexpected results that differ from the behavior observed on CPU or CUDA devices. The issue is demonstrated in the following code snippet:

import torch

# For reference, we get the result on CPU
a = torch.ones(3).to('cpu')
a[1].zero_()
print(a) # -> tensor([1., 0., 1.])

# Now we do the same on MPS
b = torch.ones(3).to('mps')
b[1].zero_()
print(b) # -> tensor([1., 0., 0.], device='mps:0')

In this example, tensor a on the CPU is correctly zeroed in place: only index 1 is set to zero, giving the expected output tensor([1., 0., 1.]). Tensor b on the MPS device, however, produces unexpected results: the in-place zeroing of the sliced tensor correctly sets index 1 to 0, but unexpectedly also sets index 2 to 0, leading to the output tensor([1., 0., 0.], device='mps:0').
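
To make the failure mode concrete: the slice b[1] is a view that shares b's storage at a non-zero offset, so the view itself covers far fewer bytes than the underlying storage. A quick sketch of the relevant numbers (a sketch only; values assume float32, i.e. 4-byte elements):

import torch

b = torch.ones(3).to('mps')
view = b[1]                           # one-element view into b's storage
print(view.storage_offset())          # -> 1 (element offset into the shared storage)
print(view.nbytes)                    # -> 4 (bytes covered by the view itself)
print(b.untyped_storage().nbytes())   # -> 12 (bytes of the whole underlying storage)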

In addition to the potential bug described above, we observed that filling a sliced tensor in place with zero using torch.fill_ on the MPS device also produces unexpected results. The issue can be seen in the following code snippet:

import torch

c = torch.ones(3).to('mps')
c[1].fill_(2)
print(c)  # -> tensor([1., 2., 1.])

d = torch.ones(3).to('mps')
d[1].fill_(0)
print(d)  # -> tensor([1., 0., 0.], device='mps:0')

Interestingly, the unexpected behavior with torch.fill_ appears to be specific to the value 0 and does not seem to occur for other values.
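
A minimal consistency check along these lines (a sketch only; the helper zero_middle is just for illustration): on affected builds such as 2.1.1 it prints False, and it should print True once the bug is fixed.

import torch

def zero_middle(device):
    t = torch.ones(3, device=device)
    t[1].fill_(0)
    return t.cpu()

expected = zero_middle('cpu')
if torch.backends.mps.is_available():
    actual = zero_middle('mps')
    print(torch.equal(expected, actual))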

Note: The issue was initially noticed when we observed incorrect results for Swin Transformer V2 on MPS. Further investigation revealed that in the torchvision implementation of the shifted_window_attention function, the qkv_bias is prepared with in-place zeroing of a slice, similar to the example above.
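
For context, the affected pattern in torchvision's shifted_window_attention looks roughly like the following (paraphrased rather than verbatim torchvision code; the dim value and tensor contents are made up for illustration):

import torch

# Stand-in for the qkv bias of an attention layer (3 * dim entries: q, k, v).
dim = 4
qkv_bias = torch.ones(3 * dim, device='mps')

qkv_bias = qkv_bias.clone()
length = qkv_bias.numel() // 3
qkv_bias[length : 2 * length].zero_()  # intended: zero only the middle (k) third
print(qkv_bias)  # on affected builds, everything from index `length` onward is zero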

Versions

PyTorch version: 2.1.1
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 14.1.1 (arm64)
GCC version: Could not collect
Clang version: 15.0.0 (clang-1500.0.40.1)
CMake version: version 3.27.6
Libc version: N/A

Python version: 3.8.18 (default, Sep 11 2023, 08:17:16) [Clang 14.0.6 ] (64-bit runtime)
Python platform: macOS-14.1.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M1 Pro

Versions of relevant libraries:
[pip3] flake8==6.0.0
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.24.3
[pip3] numpydoc==1.5.0
[pip3] onnx==1.13.1
[pip3] torch==2.1.1
[pip3] torchaudio==2.1.1
[pip3] torchmetrics==0.10.2
[pip3] torchvision==0.16.1
[conda] nomkl 3.0 0
[conda] numpy 1.24.3 py38h1398885_0
[conda] numpy-base 1.24.3 py38h90707a3_0
[conda] numpydoc 1.5.0 py38hca03da5_0
[conda] pytorch 2.1.1 py3.8_0 pytorch
[conda] torchaudio 2.1.1 py38_cpu pytorch
[conda] torchmetrics 0.10.2 pypi_0 pypi
[conda] torchvision 0.16.1 py38_cpu pytorch

cc @ezyang @gchanan @zou3519 @kadeng @kulinseth @albanD @malfet @DenisVieriu97 @razarmehr

mikaylagawarecki added the module: mps (Related to Apple Metal Performance Shaders framework) and high priority labels on Nov 28, 2023
mikaylagawarecki added the triaged label and removed the triage review label on Nov 28, 2023
mikaylagawarecki commented Nov 28, 2023

Marking as high priority for silent correctness issues

malfet commented Nov 30, 2023

I think I know what is going on :(

// check if it's possible to use fillBuffer() to fill the Tensor's storage
if (value.toDouble() == 0.0 && fill_mps_tensor_(self, 0) == true)
return self;

malfet commented Nov 30, 2023

And here is the fix; a PR is coming:

diff --git a/aten/src/ATen/native/mps/operations/ConstantOps.mm b/aten/src/ATen/native/mps/operations/ConstantOps.mm
index 0a137aa86d7..52c74c3637e 100644
--- a/aten/src/ATen/native/mps/operations/ConstantOps.mm
+++ b/aten/src/ATen/native/mps/operations/ConstantOps.mm
@@ -79,7 +79,7 @@ static bool fill_mps_tensor_(Tensor& self, uint8_t value) {
   if (self.is_contiguous()) {
     MPSStream* stream = getCurrentMPSStream();
     auto storage_byte_offset = self.storage_offset() * self.itemsize();
-    stream->fill(mps::getMTLBufferStorage(self), 0, self.storage().nbytes(), storage_byte_offset);
+    stream->fill(mps::getMTLBufferStorage(self), value, self.nbytes(), storage_byte_offset);
     return true;
   }
   return false;
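
To spell out the arithmetic for the b[1] example above (a sketch, assuming float32 / 4-byte elements): the old code fills self.storage().nbytes() bytes starting at the slice's byte offset, which runs past both the slice and the end of the storage.

element_size = 4                         # float32
storage_nbytes = 3 * element_size        # 12 bytes: the whole 3-element storage
view_nbytes = 1 * element_size           # 4 bytes: the one-element slice b[1]
storage_byte_offset = 1 * element_size   # the slice starts 4 bytes into the storage

# Buggy call: fills storage_nbytes bytes starting at the offset,
# i.e. bytes [4, 16) -- indices 1 and 2, plus 4 bytes past the end of the storage.
# Fixed call: fills view_nbytes bytes starting at the offset,
# i.e. bytes [4, 8) -- only index 1.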

malfet added a commit that referenced this issue Nov 30, 2023
This fixes a regression introduced by #81951 that caused out-of-bounds access when a sliced tensor is filled with zeros.

Remove the bogus `TORCH_INTERNAL_ASSERT(length >= offset)`, as [NSMakeRange](https://developer.apple.com/documentation/foundation/1417188-nsmakerange?language=objc) arguments are location and length rather than start and end offsets.

In `fill_mps_tensor_`:
- Pass the `value` argument to `MPSStream::fill`
- Pass `self.nbytes()` rather than `self.storage().nbytes()` as the length of the buffer to fill, as the latter always results in an out-of-bounds write when the offset within the storage is non-zero

Add a regression test.

Fixes #114692
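
For reference, a standalone check in the spirit of the regression test mentioned above could look like this (a sketch only; the actual test added in the PR may differ):

import unittest
import torch

class TestMPSFillSlice(unittest.TestCase):
    @unittest.skipUnless(torch.backends.mps.is_available(), "requires MPS")
    def test_zero_fill_on_slice(self):
        cpu = torch.ones(3)
        cpu[1].zero_()
        mps = torch.ones(3, device='mps')
        mps[1].zero_()
        self.assertTrue(torch.equal(cpu, mps.cpu()))

if __name__ == '__main__':
    unittest.main()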
Tripton commented Nov 30, 2023

Thanks for the quick fix! It seems to resolve my problem, and the test case looks good as well. However, I noticed the removal of .storage(), which was previously added in another PR fixing a different issue. I'm not really sure why the .storage() was added in the first place, but maybe it's necessary to fix that other issue?

atalman pushed a commit to atalman/pytorch that referenced this issue Dec 1, 2023
Pull Request resolved: pytorch#114838
Approved by: https://github.com/atalman, https://github.com/kulinseth
malfet added a commit that referenced this issue Dec 1, 2023
Cherry pick of #114838 into release/2.1 branch

Co-authored-by: Nikita Shulga <nshulga@meta.com>
hyperfraise pushed a commit to hyperfraise/pytorch that referenced this issue Dec 21, 2023