clamp_ and clamp behave differently on MPS device. #147510

@ornew

🐛 Describe the bug

On the MPS device, clamp_ and clamp operations on tensors produce inconsistent results, unlike on the CPU device where they behave as expected. Specifically, clamp_ appears to not correctly modify the tensor in-place on MPS, leading to unexpected values in the output tensor. This issue has been observed to affect bounding box transformations in torchvision v2.

Discovery context and minimal reproduction:

This bug was discovered while investigating unexpected outputs from affine transformations of bounding boxes with torchvision transforms v2. The clamp_bounding_boxes function used during coordinate transformations calls clamp_, which led to the suspicion that the discrepancy between clamp_ and clamp on MPS is the root cause. A similar coordinate-clamping problem was previously reported in YOLO (see ultralytics/ultralytics#5817).

The relevant use of clamp_ inside clamp_bounding_boxes is at torchvision/transforms/v2/functional/_meta.py#L249-L250.
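To illustrate why this matters for torchvision, here is a simplified, hypothetical sketch of the slice-then-clamp_ pattern (the actual torchvision code differs; only the shape of the operation is reproduced): box coordinates are selected with a strided slice and clamped in-place.

```python
import torch

# Hypothetical sketch (not the exact torchvision code):
# xyxy boxes clamped to a 10x6 canvas.
boxes = torch.tensor([[-5.0, 2.0, 15.0, 8.0]])
boxes[:, 0::2].clamp_(min=0, max=10)  # x1, x2: strided view, clamped in-place
boxes[:, 1::2].clamp_(min=0, max=6)   # y1, y2: strided view, clamped in-place
print(boxes)  # tensor([[ 0.,  2., 10.,  6.]])
```

On CPU this yields the correctly clamped box; on MPS, the strided in-place clamp is exactly the operation that misbehaves in the reproduction below.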

To reproduce the core bug with clamp_ and clamp, run the following code:

import torch

print(torch.__version__)

# --- Reproduction with unsliced arange ---
print("--- Unsliced arange ---")
torch.set_default_device("cpu")
cpu_unsliced_clamp_in_place = torch.arange(10).clamp_(0, 1)
cpu_unsliced_clamp_out_place = torch.arange(10).clamp(0, 1)
print(f"CPU clamp_ result (unsliced): {cpu_unsliced_clamp_in_place}")
print(f"CPU clamp result (unsliced): {cpu_unsliced_clamp_out_place}")

torch.set_default_device("mps")
mps_unsliced_clamp_in_place = torch.arange(10).clamp_(0, 1)
mps_unsliced_clamp_out_place = torch.arange(10).clamp(0, 1)
print(f"MPS clamp_ result (unsliced): {mps_unsliced_clamp_in_place}")
print(f"MPS clamp result (unsliced): {mps_unsliced_clamp_out_place}")

# --- Reproduction with sliced arange ---
print("\n--- Sliced arange ---")
torch.set_default_device("cpu")
cpu_sliced_clamp_in_place = torch.arange(10)[::2].clamp_(0, 1)
cpu_sliced_clamp_out_place = torch.arange(10)[::2].clamp(0, 1)
print(f"CPU clamp_ result: {cpu_sliced_clamp_in_place}")
print(f"CPU clamp result: {cpu_sliced_clamp_out_place}")

torch.set_default_device("mps")
mps_sliced_clamp_in_place = torch.arange(10)[::2].clamp_(0, 1)
mps_sliced_clamp_out_place = torch.arange(10)[::2].clamp(0, 1)
print(f"MPS clamp_ result: {mps_sliced_clamp_in_place}")
print(f"MPS clamp result: {mps_sliced_clamp_out_place}")

Observed results:

2.6.0
--- Unsliced arange ---
CPU clamp_ result (unsliced): tensor([0, 1, 1, 1, 1, 1, 1, 1, 1, 1])
CPU clamp result (unsliced): tensor([0, 1, 1, 1, 1, 1, 1, 1, 1, 1])
MPS clamp_ result (unsliced): tensor([0, 1, 1, 1, 1, 1, 1, 1, 1, 1], device='mps:0')
MPS clamp result (unsliced): tensor([0, 1, 1, 1, 1, 1, 1, 1, 1, 1], device='mps:0')

--- Sliced arange ---
CPU clamp_ result: tensor([0, 1, 1, 1, 1])
CPU clamp result: tensor([0, 1, 1, 1, 1])
MPS clamp_ result: tensor([0, 1, 1, 6, 8], device='mps:0')
MPS clamp result: tensor([0, 1, 1, 1, 1], device='mps:0')

As you can see from the "Unsliced arange" results, when clamp_ and clamp are applied to an unsliced arange tensor, both operations produce correct and consistent results across both CPU and MPS devices: the values are correctly clamped to the range [0, 1], resulting in tensor([0, 1, 1, 1, 1, 1, 1, 1, 1, 1]).

However, the "Sliced arange" results highlight the bug: when applied to a sliced tensor, clamp_ produces incorrect results specifically on the MPS device: tensor([0, 1, 1, 6, 8], device='mps:0'). In contrast, clamp correctly clamps the sliced tensor on MPS, producing the expected tensor([0, 1, 1, 1, 1], device='mps:0'), and both clamp_ and clamp behave correctly for sliced tensors on the CPU.

This inconsistency demonstrates that clamp_ has a bug on MPS when operating on sliced tensors, while clamp and clamp_ on CPU, and clamp on MPS, all function as expected.
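A plausible common factor (a hypothesis on my part, not confirmed against the MPS kernels) is contiguity: a step slice of arange produces a non-contiguous strided view, while the unsliced tensor is contiguous. This can be checked on any device:

```python
import torch

base = torch.arange(10)
view = base[::2]  # step slice -> strided view over the same storage
print(base.is_contiguous())  # True
print(view.is_contiguous())  # False
print(view.stride())         # (2,)
```

If the hypothesis holds, the in-place MPS clamp kernel mishandles non-contiguous inputs, while the out-of-place path (which materializes a fresh contiguous output) does not.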

Expected results:

Both clamp_ and clamp should produce the same output on CPU and MPS, correctly clamping the tensor values to the range [0, 1] regardless of whether the tensor is sliced. Specifically, on MPS clamp_ should modify the sliced tensor in-place to tensor([0, 1, 1, 1, 1]), matching both the CPU behavior and the out-of-place clamp on MPS.
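Until this is fixed, one possible workaround (an assumption on my part, verified here only on CPU) is to compute the clamp out-of-place and copy the result back into the view. Whether copy_ avoids the same strided MPS code path would need checking on an MPS machine:

```python
import torch

def clamp_view_(t, lo, hi):
    # Hypothetical workaround: out-of-place clamp, then copy the result back
    # into the (possibly non-contiguous) view. Only verified on CPU.
    t.copy_(t.clamp(lo, hi))
    return t

base = torch.arange(10)
view = base[::2]
clamp_view_(view, 0, 1)
print(view)  # tensor([0, 1, 1, 1, 1])
print(base)  # the write is visible through the view:
             # tensor([0, 1, 1, 3, 1, 5, 1, 7, 1, 9])
```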

Versions

PyTorch version: 2.6.0
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 14.6.1 (arm64)
GCC version: Could not collect
Clang version: 15.0.0 (clang-1500.3.9.4)
CMake version: Could not collect
Libc version: N/A

Python version: 3.12.7 (main, Oct 16 2024, 07:12:08) [Clang 18.1.8 ] (64-bit runtime)
Python platform: macOS-14.6.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M3 Max

Versions of relevant libraries:
[pip3] numpy==2.1.1
[pip3] onnxruntime==1.20.1
[pip3] torch==2.6.0
[pip3] torchvision==0.21.0
[conda] numpy 1.26.4 py312h7f4fdc5_0
[conda] numpy-base 1.26.4 py312he047099_0
[conda] numpydoc 1.7.0 py312hca03da5_0

cc @kulinseth @albanD @malfet @DenisVieriu97 @jhavukainen

Labels: module: correctness (silent) (issue that returns an incorrect result silently), module: mps (related to the Apple Metal Performance Shaders framework), triaged (this issue has been looked at by a team member, triaged, and prioritized into an appropriate module)