🐛 Describe the bug
On the MPS device, clamp_ and clamp operations on tensors produce inconsistent results, unlike on the CPU device where they behave as expected. Specifically, clamp_ appears to not correctly modify the tensor in-place on MPS, leading to unexpected values in the output tensor. This issue has been observed to affect bounding box transformations in torchvision v2.
Discovery Context and Minimal Reproduction Refinement:
This bug was discovered while investigating unexpected outputs from affine transformations of bounding boxes using torchvision transforms v2. During the investigation, it was found that the clamp_bounding_boxes function in torchvision, which is used during coordinate transformations, relies on clamp_. This led to the suspicion that the discrepancy between clamp_ and clamp on MPS might be the root cause of the issue with bounding box transformations. This issue also echoes a similar problem previously encountered in YOLO, related to coordinate clamping (see ultralytics/ultralytics#5817).
The relevant code in torchvision that uses clamp_ within clamp_bounding_boxes can be found here: torchvision/transforms/v2/functional/_meta.py#L249-L250.
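For completeness, here is a minimal sketch of how the same divergence could be probed through the torchvision path itself. It assumes torchvision 0.21's tv_tensors.BoundingBoxes and transforms.v2.functional.clamp_bounding_boxes API; the box values are made up for illustration, and whether it actually diverges depends on whether the strided clamp_ lines linked above are hit:

import torch
from torchvision import tv_tensors
from torchvision.transforms.v2.functional import clamp_bounding_boxes

# Boxes that extend past a 10x10 canvas, so clamping must change coordinates.
boxes = torch.tensor([[-5.0, -5.0, 15.0, 15.0],
                      [ 2.0,  2.0, 30.0, 30.0]])

for device in ("cpu", "mps"):
    bb = tv_tensors.BoundingBoxes(
        boxes.to(device), format="XYXY", canvas_size=(10, 10)
    )
    # clamp_bounding_boxes clamps coordinates to the canvas; per the linked
    # source it does so via in-place clamp_ on strided views, the suspected
    # trigger on MPS.
    print(device, clamp_bounding_boxes(bb))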
To reproduce the core bug with clamp_ and clamp, run the following code:
import torch
print(torch.__version__)
# --- Reproduction with unsliced arange ---
print("--- Unsliced arange ---")
torch.set_default_device("cpu")
cpu_unsliced_clamp_in_place = torch.arange(10).clamp_(0, 1)
cpu_unsliced_clamp_out_place = torch.arange(10).clamp(0, 1)
print(f"CPU clamp_ result (unsliced): {cpu_unsliced_clamp_in_place}")
print(f"CPU clamp result (unsliced): {cpu_unsliced_clamp_out_place}")
torch.set_default_device("mps")
mps_unsliced_clamp_in_place = torch.arange(10).clamp_(0, 1)
mps_unsliced_clamp_out_place = torch.arange(10).clamp(0, 1)
print(f"MPS clamp_ result (unsliced): {mps_unsliced_clamp_in_place}")
print(f"MPS clamp result (unsliced): {mps_unsliced_clamp_out_place}")
# --- Reproduction with sliced arange ---
print("\n--- Sliced arange ---")
torch.set_default_device("cpu")
cpu_sliced_clamp_in_place, cpu_sliced_clamp_out_place = torch.arange(10)[::2].clamp_(0, 1), torch.arange(10)[::2].clamp(0, 1)
print(f"CPU clamp_ result: {cpu_sliced_clamp_in_place}")
print(f"CPU clamp result: {cpu_sliced_clamp_out_place}")
torch.set_default_device("mps")
mps_sliced_clamp_in_place, mps_sliced_clamp_out_place = torch.arange(10)[::2].clamp_(0, 1), torch.arange(10)[::2].clamp(0, 1)
print(f"MPS clamp_ result: {mps_sliced_clamp_in_place}")
print(f"MPS clamp result: {mps_sliced_clamp_out_place}")
Observed results:
2.6.0
--- Unsliced arange ---
CPU clamp_ result (unsliced): tensor([0, 1, 1, 1, 1, 1, 1, 1, 1, 1])
CPU clamp result (unsliced): tensor([0, 1, 1, 1, 1, 1, 1, 1, 1, 1])
MPS clamp_ result (unsliced): tensor([0, 1, 1, 1, 1, 1, 1, 1, 1, 1], device='mps:0')
MPS clamp result (unsliced): tensor([0, 1, 1, 1, 1, 1, 1, 1, 1, 1], device='mps:0')
--- Sliced arange ---
CPU clamp_ result: tensor([0, 1, 1, 1, 1])
CPU clamp result: tensor([0, 1, 1, 1, 1])
MPS clamp_ result: tensor([0, 1, 1, 6, 8], device='mps:0')
MPS clamp result: tensor([0, 1, 1, 1, 1], device='mps:0')
As you can see from the "Unsliced arange" results, when clamp_ and clamp are applied to an unsliced arange tensor, both operations produce correct and consistent results on both CPU and MPS devices: the values are correctly clamped to the range [0, 1], resulting in tensor([0, 1, 1, 1, 1, 1, 1, 1, 1, 1]).
However, the "Sliced arange" results highlight the bug: when applied to a sliced tensor, clamp_
produces incorrect results specifically on the MPS device: tensor([0, 1, 1, 6, 8], device='mps:0')
. In contrast, clamp
correctly clamps the sliced tensor on MPS, producing the expected tensor([0, 1, 1, 1, 1], device='mps:0')
, and both clamp_
and clamp
behave correctly for sliced tensors on the CPU.
This inconsistency demonstrates that clamp_
has a bug on MPS when operating on sliced tensors, while clamp
and clamp_
on CPU, and clamp
on MPS, all function as expected.
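Until the MPS kernel is fixed, a possible user-side workaround sketch, assuming the failure is tied to in-place clamp_ on non-contiguous (strided) views, is to either clamp a contiguous copy (note this copies, so the original view's storage is not updated in place) or fall back to the out-of-place clamp:

import torch

t = torch.arange(10, device="mps")[::2]  # non-contiguous (strided) view

# Workaround A: clamp a contiguous copy (loses true in-place semantics).
a = t.contiguous()
a.clamp_(0, 1)

# Workaround B: use the out-of-place clamp, which behaves correctly here.
b = t.clamp(0, 1)

print(a)  # expected: tensor([0, 1, 1, 1, 1], device='mps:0')
print(b)  # expected: tensor([0, 1, 1, 1, 1], device='mps:0')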
Expected results:
Both clamp_ and clamp should produce the same output on both CPU and MPS devices, correctly clamping the tensor values to the range [0, 1], regardless of whether the tensor is sliced or not. Specifically, on MPS, clamp_ should modify the sliced tensor in-place to tensor([0, 1, 1, 1, 1]) and correctly clamp the unsliced tensor (or leave it unchanged if already within range), just as it does on CPU and as clamp does on MPS.
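A hedged sketch of the parity check implied by this expectation, using torch.testing.assert_close (the helper name check_clamp_parity is made up for illustration; tensors are rebuilt per device rather than cloned, since cloning a strided view would make it contiguous and mask the bug):

import torch

def check_clamp_parity(make, lo=0, hi=1):
    # Rebuild the tensor for every call so the original strides are preserved.
    cpu_in_place  = make("cpu").clamp_(lo, hi)
    mps_in_place  = make("mps").clamp_(lo, hi)
    cpu_out_place = make("cpu").clamp(lo, hi)
    mps_out_place = make("mps").clamp(lo, hi)
    torch.testing.assert_close(mps_out_place.cpu(), cpu_out_place)  # passes
    torch.testing.assert_close(mps_in_place.cpu(), cpu_in_place)    # should also pass

check_clamp_parity(lambda dev: torch.arange(10, device=dev))        # unsliced: passes on 2.6.0
check_clamp_parity(lambda dev: torch.arange(10, device=dev)[::2])   # sliced: clamp_ check fails on 2.6.0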
Versions
PyTorch version: 2.6.0
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: macOS 14.6.1 (arm64)
GCC version: Could not collect
Clang version: 15.0.0 (clang-1500.3.9.4)
CMake version: Could not collect
Libc version: N/A
Python version: 3.12.7 (main, Oct 16 2024, 07:12:08) [Clang 18.1.8 ] (64-bit runtime)
Python platform: macOS-14.6.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Apple M3 Max
Versions of relevant libraries:
[pip3] numpy==2.1.1
[pip3] onnxruntime==1.20.1
[pip3] torch==2.6.0
[pip3] torchvision==0.21.0
[conda] numpy 1.26.4 py312h7f4fdc5_0
[conda] numpy-base 1.26.4 py312he047099_0
[conda] numpydoc 1.7.0 py312hca03da5_0