Unexpected Results of In-Place Zeroing on Sliced Tensor with MPS Device #114692
Comments
Marking as high priority for silent correctness issues.
I think I know what is going on :(

See pytorch/aten/src/ATen/native/mps/operations/ConstantOps.mm, lines 103 to 105 at 34ea0a2.
And here is the fix, PR is coming:

```diff
diff --git a/aten/src/ATen/native/mps/operations/ConstantOps.mm b/aten/src/ATen/native/mps/operations/ConstantOps.mm
index 0a137aa86d7..52c74c3637e 100644
--- a/aten/src/ATen/native/mps/operations/ConstantOps.mm
+++ b/aten/src/ATen/native/mps/operations/ConstantOps.mm
@@ -79,7 +79,7 @@ static bool fill_mps_tensor_(Tensor& self, uint8_t value) {
   if (self.is_contiguous()) {
     MPSStream* stream = getCurrentMPSStream();
     auto storage_byte_offset = self.storage_offset() * self.itemsize();
-    stream->fill(mps::getMTLBufferStorage(self), 0, self.storage().nbytes(), storage_byte_offset);
+    stream->fill(mps::getMTLBufferStorage(self), value, self.nbytes(), storage_byte_offset);
     return true;
   }
   return false;
```
This fixes a regression introduced by #81951 that caused an out-of-bounds access when a sliced tensor is filled with zeros.

Remove the bogus `TORCH_INTERNAL_ASSERT(length >= offset)`, as the [NSMakeRange](https://developer.apple.com/documentation/foundation/1417188-nsmakerange?language=objc) arguments are a location and a length rather than start and end offsets.

In `fill_mps_tensor_`:
- Pass the `value` argument through to `MPSStream::fill`
- Pass `self.nbytes()` rather than `self.storage().nbytes()` as the length of the buffer to fill, as the latter always results in an out-of-bounds write whenever the offset within the storage is non-zero

Add a regression test.

Fixes #114692
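The regression test itself is not quoted in this thread; a minimal sketch of such a test (the test name and exact assertion are assumptions) might look like:

```python
import torch

def test_fill_sliced_tensor_mps():
    # Filling a slice must not write outside the sliced region of the storage.
    base = torch.ones(3, device="mps")
    base[1:2].fill_(0)
    torch.testing.assert_close(base.cpu(), torch.tensor([1.0, 0.0, 1.0]))
```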
Thanks for the quick fix! It seems to resolve my problem, and the test case also looks good. However, I noticed the removal of the `TORCH_INTERNAL_ASSERT(length >= offset)` check.
Cherry pick of #114838 into the release/2.1 branch. Co-authored-by: Nikita Shulga <nshulga@meta.com>
🐛 Unexpected Results of In-Place Zeroing on Sliced Tensor with MPS Device
Performing in-place zeroing on a sliced PyTorch tensor located on the MPS device leads to unexpected results, which differ from the behavior observed on CPU or CUDA devices. The issue is demonstrated in the following code snippet:
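A minimal reproduction consistent with the description below (reconstructed; the exact values and tensor sizes are assumptions):

```python
import torch

# CPU: zeroing a one-element slice touches only index 1, as expected.
a = torch.ones(3)
a[1:2].zero_()
print(a)  # tensor([1., 0., 1.])

# MPS: the same operation unexpectedly also clobbers index 2.
b = torch.ones(3, device="mps")
b[1:2].zero_()
print(b)  # tensor([1., 0., 0.], device='mps:0')
```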
In this example, tensor `a` on the CPU is correctly zeroed in place: only index 1 is set to zero, resulting in the expected output `tensor([1., 0., 1.])`. However, tensor `b` on the MPS device produces unexpected results. The in-place zeroing operation on the sliced tensor `b` correctly sets index 1 to `0`, but unexpectedly also sets index 2 to `0`, leading to the output `tensor([1., 0., 0.], device='mps:0')`.

In addition to the previously mentioned potential bug, we observed that performing in-place filling with zero values using `torch.fill_` on a PyTorch tensor located on the MPS device also produces unexpected results. The issue can be seen in the following code snippet:
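Again reconstructing from the description (exact values assumed), the `fill_` variant might look like:

```python
import torch

# fill_ with 0 on a slice shows the same out-of-bounds overwrite...
c = torch.ones(3, device="mps")
c[1:2].fill_(0)
print(c)  # tensor([1., 0., 0.], device='mps:0')  -- unexpected

# ...while a non-zero fill value behaves correctly.
d = torch.ones(3, device="mps")
d[1:2].fill_(5)
print(d)  # tensor([1., 5., 1.], device='mps:0')
```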
Interestingly, the unexpected behavior with `torch.fill_` seems to be specific to `0` and does not seem to occur for other values.

Note: The issue was initially noticed when incorrect results for Swin Transformer V2 on MPS were observed. Further investigation revealed that in the torchvision implementation of the `shifted_window_attention` function, the `qkv_bias` is obtained with in-place zeroing similar to the provided example, roughly as sketched below.
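A simplified sketch of that pattern (not the verbatim torchvision source; `dim` is a toy size chosen for illustration):

```python
import torch

# qkv_bias packs the query/key/value biases end to end; zeroing the middle
# third (the key bias) is the sliced in-place zeroing that hits this bug.
dim = 4  # toy embedding size
qkv_bias = torch.ones(3 * dim, device="mps")
length = qkv_bias.numel() // 3
qkv_bias[length : 2 * length].zero_()
```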
Versions
PyTorch version: 2.1.1
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: macOS 14.1.1 (arm64)
GCC version: Could not collect
Clang version: 15.0.0 (clang-1500.0.40.1)
CMake version: version 3.27.6
Libc version: N/A
Python version: 3.8.18 (default, Sep 11 2023, 08:17:16) [Clang 14.0.6 ] (64-bit runtime)
Python platform: macOS-14.1.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Apple M1 Pro
Versions of relevant libraries:
[pip3] flake8==6.0.0
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.24.3
[pip3] numpydoc==1.5.0
[pip3] onnx==1.13.1
[pip3] torch==2.1.1
[pip3] torchaudio==2.1.1
[pip3] torchmetrics==0.10.2
[pip3] torchvision==0.16.1
[conda] nomkl 3.0 0
[conda] numpy 1.24.3 py38h1398885_0
[conda] numpy-base 1.24.3 py38h90707a3_0
[conda] numpydoc 1.5.0 py38hca03da5_0
[conda] pytorch 2.1.1 py3.8_0 pytorch
[conda] torchaudio 2.1.1 py38_cpu pytorch
[conda] torchmetrics 0.10.2 pypi_0 pypi
[conda] torchvision 0.16.1 py38_cpu pytorch
cc @ezyang @gchanan @zou3519 @kadeng @kulinseth @albanD @malfet @DenisVieriu97 @razarmehr