
[MPS][Inductor] AdaptiveMaxPool{1,2}d produces incorrect results (numerical correctness) #169738


🐛 Describe the bug

When compiled with torch.compile(backend="inductor") on an MPS device, F.adaptive_max_pool1d and F.adaptive_max_pool2d produce incorrect results, with a significant numerical divergence (max absolute diff > 1.0) compared to eager mode.

The behavior is observed in both the 1D and 2D cases.

Reproduction script

import torch
import torch.nn.functional as F


def fn(x):
    return F.adaptive_max_pool1d(x, output_size=3)


x = torch.randn(4, 10, 8, device="mps")

# 2D variant (also affected):
# def fn(x):
#     return F.adaptive_max_pool2d(x, output_size=(3, 3))

# x = torch.randn(4, 10, 8, 8, device="mps")

eager_out = fn(x)

opt_fn = torch.compile(fn, backend="inductor")

try:
    compiled_out = opt_fn(x)
    diff = (eager_out - compiled_out).abs().max().item()
    print(f"Max Difference: {diff}")

except Exception as e:
    print(f"Crashed during execution: {e}")

Output

Max Difference: 2.4042608737945557
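The 2D case (commented out in the script above) is reported to diverge in the same way; a minimal standalone sketch of that variant, following the same comparison pattern:

import torch
import torch.nn.functional as F


def fn(x):
    # 2D variant of the same reproduction: adaptive max pooling to a 3x3 output
    return F.adaptive_max_pool2d(x, output_size=(3, 3))


x = torch.randn(4, 10, 8, 8, device="mps")

eager_out = fn(x)
opt_fn = torch.compile(fn, backend="inductor")
compiled_out = opt_fn(x)

# A non-trivial max difference here indicates the same numerical divergence
# reported for the 1D case.
print(f"Max Difference: {(eager_out - compiled_out).abs().max().item()}")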

Additionally

I found that the related adaptive pooling tests in the Inductor test suite are already marked as expected failures on MPS:
@xfail_if_mps  # Non-divisible input sizes are not implemented on MPS device
def test_adaptive_avg_pool2d2(self):
    # Big kernel size, use fallback
    def fn(x):
        return aten._adaptive_avg_pool2d(x, (4, 4))

    torch._inductor.metrics.generated_kernel_count = 0
    self.common(
        fn,
        (torch.randn(2, 4, 21, 21),),
        check_lowp=False,
    )
    assertGeneratedKernelCountEqual(self, 0)

@xfail_if_mps
@skip_if_gpu_halide  # slow
def test_adaptive_max_pool2d1(self):
    def fn(x):
        return aten.adaptive_max_pool2d(x, (6, 6))

    self.common(
        fn,
        (torch.randn(2, 4, 16, 16),),
        check_lowp=False,
    )
    self.common(
        fn,
        (torch.randn(2, 4, 3, 3),),
    )
    # no-op case
    self.common(
        fn,
        (torch.randn(2, 4, 6, 6),),
    )

Versions

PyTorch version: 2.10.0.dev20251202
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 26.1 (arm64)
GCC version: Could not collect
Clang version: 17.0.0 (clang-1700.4.4.1)
CMake version: Could not collect
Libc version: N/A

Python version: 3.12.12 (main, Oct 28 2025, 11:52:25) [Clang 20.1.4 ] (64-bit runtime)
Python platform: macOS-26.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M4

Versions of relevant libraries:
[pip3] Could not collect
[conda] Could not collect

cc @kulinseth @malfet @DenisVieriu97 @jhavukainen @chauhang @penguinwu
