clamp_ and clamp behave differently on MPS device. #147510

@ornew

🐛 Describe the bug

On the MPS device, clamp_ and clamp operations on tensors produce inconsistent results, unlike on the CPU device where they behave as expected. Specifically, clamp_ appears to not correctly modify the tensor in-place on MPS, leading to unexpected values in the output tensor. This issue has been observed to affect bounding box transformations in torchvision v2.

Discovery context and minimal reproduction:

This bug was discovered while investigating unexpected outputs from affine transformations of bounding boxes with torchvision transforms v2. The clamp_bounding_boxes function used during coordinate transformations calls clamp_, which led to the suspicion that the discrepancy between clamp_ and clamp on MPS is the root cause. A similar coordinate-clamping problem was previously reported in YOLO (see ultralytics/ultralytics#5817).

The relevant use of clamp_ inside clamp_bounding_boxes is at torchvision/transforms/v2/functional/_meta.py#L249-L250.
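To illustrate why this matters for torchvision, here is a simplified, hypothetical sketch of the slice-then-clamp_ pattern (the actual torchvision code differs; only the shape of the operation is reproduced): box coordinates are selected with a strided slice and clamped in-place.

```python
import torch

# Hypothetical sketch (not the exact torchvision code):
# xyxy boxes clamped to a 10x6 canvas.
boxes = torch.tensor([[-5.0, 2.0, 15.0, 8.0]])
boxes[:, 0::2].clamp_(min=0, max=10)  # x1, x2: strided view, clamped in-place
boxes[:, 1::2].clamp_(min=0, max=6)   # y1, y2: strided view, clamped in-place
print(boxes)  # tensor([[ 0.,  2., 10.,  6.]])
```

On CPU this yields the correctly clamped box; on MPS, the strided in-place clamp is exactly the operation that misbehaves in the reproduction below.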

To reproduce the core bug with clamp_ and clamp, run the following code:

import torch

print(torch.__version__)

# --- Reproduction with unsliced arange ---
print("--- Unsliced arange ---")
torch.set_default_device("cpu")
cpu_unsliced_clamp_in_place = torch.arange(10).clamp_(0, 1)
cpu_unsliced_clamp_out_place = torch.arange(10).clamp(0, 1)
print(f"CPU clamp_ result (unsliced): {cpu_unsliced_clamp_in_place}")
print(f"CPU clamp result (unsliced): {cpu_unsliced_clamp_out_place}")

torch.set_default_device("mps")
mps_unsliced_clamp_in_place = torch.arange(10).clamp_(0, 1)
mps_unsliced_clamp_out_place = torch.arange(10).clamp(0, 1)
print(f"MPS clamp_ result (unsliced): {mps_unsliced_clamp_in_place}")
print(f"MPS clamp result (unsliced): {mps_unsliced_clamp_out_place}")

# --- Reproduction with sliced arange ---
print("\n--- Sliced arange ---")
torch.set_default_device("cpu")
cpu_sliced_clamp_in_place = torch.arange(10)[::2].clamp_(0, 1)
cpu_sliced_clamp_out_place = torch.arange(10)[::2].clamp(0, 1)
print(f"CPU clamp_ result: {cpu_sliced_clamp_in_place}")
print(f"CPU clamp result: {cpu_sliced_clamp_out_place}")

torch.set_default_device("mps")
mps_sliced_clamp_in_place = torch.arange(10)[::2].clamp_(0, 1)
mps_sliced_clamp_out_place = torch.arange(10)[::2].clamp(0, 1)
print(f"MPS clamp_ result: {mps_sliced_clamp_in_place}")
print(f"MPS clamp result: {mps_sliced_clamp_out_place}")

Observed results:

2.6.0
--- Unsliced arange ---
CPU clamp_ result (unsliced): tensor([0, 1, 1, 1, 1, 1, 1, 1, 1, 1])
CPU clamp result (unsliced): tensor([0, 1, 1, 1, 1, 1, 1, 1, 1, 1])
MPS clamp_ result (unsliced): tensor([0, 1, 1, 1, 1, 1, 1, 1, 1, 1], device='mps:0')
MPS clamp result (unsliced): tensor([0, 1, 1, 1, 1, 1, 1, 1, 1, 1], device='mps:0')

--- Sliced arange ---
CPU clamp_ result: tensor([0, 1, 1, 1, 1])
CPU clamp result: tensor([0, 1, 1, 1, 1])
MPS clamp_ result: tensor([0, 1, 1, 6, 8], device='mps:0')
MPS clamp result: tensor([0, 1, 1, 1, 1], device='mps:0')

As you can see from the "Unsliced arange" results, when clamp_ and clamp are applied to an unsliced arange tensor, both operations produce correct and consistent results across both CPU and MPS devices: the values are correctly clamped to the range [0, 1], resulting in tensor([0, 1, 1, 1, 1, 1, 1, 1, 1, 1]).

However, the "Sliced arange" results highlight the bug: when applied to a sliced tensor, clamp_ produces incorrect results specifically on the MPS device: tensor([0, 1, 1, 6, 8], device='mps:0'). In contrast, clamp correctly clamps the sliced tensor on MPS, producing the expected tensor([0, 1, 1, 1, 1], device='mps:0'), and both clamp_ and clamp behave correctly for sliced tensors on the CPU.

This inconsistency demonstrates that clamp_ has a bug on MPS when operating on sliced tensors, while clamp and clamp_ on CPU, and clamp on MPS, all function as expected.
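A plausible common factor (a hypothesis on my part, not confirmed against the MPS kernels) is contiguity: a step slice of arange produces a non-contiguous strided view, while the unsliced tensor is contiguous. This can be checked on any device:

```python
import torch

base = torch.arange(10)
view = base[::2]  # step slice -> strided view over the same storage
print(base.is_contiguous())  # True
print(view.is_contiguous())  # False
print(view.stride())         # (2,)
```

If the hypothesis holds, the in-place MPS clamp kernel mishandles non-contiguous inputs, while the out-of-place path (which materializes a fresh contiguous output) does not.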

Expected results:

Both clamp_ and clamp should produce the same output on CPU and MPS, correctly clamping the tensor values to the range [0, 1] regardless of whether the tensor is sliced. Specifically, on MPS clamp_ should modify the sliced tensor in-place to tensor([0, 1, 1, 1, 1]), matching both the CPU behavior and the out-of-place clamp on MPS.
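Until this is fixed, one possible workaround (an assumption on my part, verified here only on CPU) is to compute the clamp out-of-place and copy the result back into the view. Whether copy_ avoids the same strided MPS code path would need checking on an MPS machine:

```python
import torch

def clamp_view_(t, lo, hi):
    # Hypothetical workaround: out-of-place clamp, then copy the result back
    # into the (possibly non-contiguous) view. Only verified on CPU.
    t.copy_(t.clamp(lo, hi))
    return t

base = torch.arange(10)
view = base[::2]
clamp_view_(view, 0, 1)
print(view)  # tensor([0, 1, 1, 1, 1])
print(base)  # the write is visible through the view:
             # tensor([0, 1, 1, 3, 1, 5, 1, 7, 1, 9])
```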

Versions

PyTorch version: 2.6.0
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 14.6.1 (arm64)
GCC version: Could not collect
Clang version: 15.0.0 (clang-1500.3.9.4)
CMake version: Could not collect
Libc version: N/A

Python version: 3.12.7 (main, Oct 16 2024, 07:12:08) [Clang 18.1.8 ] (64-bit runtime)
Python platform: macOS-14.6.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M3 Max

Versions of relevant libraries:
[pip3] numpy==2.1.1
[pip3] onnxruntime==1.20.1
[pip3] torch==2.6.0
[pip3] torchvision==0.21.0
[conda] numpy 1.26.4 py312h7f4fdc5_0
[conda] numpy-base 1.26.4 py312he047099_0
[conda] numpydoc 1.7.0 py312hca03da5_0

cc @kulinseth @albanD @malfet @DenisVieriu97 @jhavukainen

Labels: module: correctness (silent) (issue that returns an incorrect result silently), module: mps (related to the Apple Metal Performance Shaders framework), triaged (this issue has been looked at by a team member, triaged, and prioritized into an appropriate module)