
MPS backend started using neural engine and fails to multiply matrices in fp16 #110975

Closed
okuvshynov opened this issue Oct 10, 2023 · 4 comments
Labels
module: mps Related to Apple Metal Performance Shaders framework triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

okuvshynov commented Oct 10, 2023

🐛 Describe the bug

At some point, most likely after macOS update to Sonoma, torch mps backend started utilizing ANE instead of GPU for matrix multiplication in fp16.

Unfortunately, for large enough matrices it fails:

import torch

dim = 2 ** 14

torch.manual_seed(123)

# always uses GPU
a32 = torch.rand((dim, dim)).to('mps').to(torch.float32)
b32 = torch.rand((dim, dim)).to('mps').to(torch.float32)
c32 = a32 @ b32

# started using ANE
a16 = torch.rand((dim, dim)).to('mps').to(torch.float16)
b16 = torch.rand((dim, dim)).to('mps').to(torch.float16)
c16 = a16 @ b16 # <----- fails here

print(torch.mean(torch.abs(c32 - c16.to(torch.float32))))
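Until a fix is available, one possible user-side workaround (my suggestion, not something proposed in this thread) is to route fp16 matmuls through fp32, since the report shows fp32 matmul staying on the GPU. A minimal sketch, assuming a standard `torch` install; it uses MPS when available and falls back to CPU otherwise:

```python
import torch

def matmul_fp16_via_fp32(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Multiply two fp16 matrices by upcasting to fp32 first.

    Sidesteps the direct fp16 matmul path (which the report above shows
    being dispatched to the ANE on Sonoma) at the cost of extra memory
    and an fp32 GEMM; per the report, fp32 matmul runs on the GPU.
    """
    return (a.to(torch.float32) @ b.to(torch.float32)).to(torch.float16)

device = "mps" if torch.backends.mps.is_available() else "cpu"
dim = 256  # small demo size; the failure in the report needs dim = 2**14
a16 = torch.rand((dim, dim), device=device).to(torch.float16)
b16 = torch.rand((dim, dim), device=device).to(torch.float16)
c16 = matmul_fp16_via_fp32(a16, b16)
```

Note this trades memory and bandwidth for the fp32 accumulate, so it is only a stopgap for shapes that trigger the ANE timeout.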

I have two apple machines, one updated to Sonoma and one running Ventura.
Sonoma version fails fp16 multiplication:

2023-10-10 15:21:03.226 Python[19282:1283545] ANE Evaluation Error = Error Domain=com.apple.appleneuralengine Code=3 "processRequest:model:qos:qIndex:modelStringID:options:error:: ANEProgramProcessRequestDirect() Failed with status=0xf : statusType=0x11: Program Inference timeout: timed out" UserInfo={NSLocalizedDescription=processRequest:model:qos:qIndex:modelStringID:options:error:: ANEProgramProcessRequestDirect() Failed with status=0xf : statusType=0x11: Program Inference timeout: timed out}
tensor(nan, device='mps:0')

fp32 succeeds in both versions.

Here are some screenshots illustrating the ANE being utilized on Sonoma (it's also evident from the error above).

Ventura - GPU only: [screenshot]

Sonoma - tries using ANE: [screenshot]

Does pytorch have any control over these details?

Versions

Uses ANE and fails:

PyTorch version: 2.0.1
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 14.0 (arm64)
GCC version: Could not collect
Clang version: 15.0.0 (clang-1500.0.40.1)
CMake version: version 3.20.20210516-g4997535
Libc version: N/A

Python version: 3.9.6 (default, Aug 11 2023, 19:44:49)  [Clang 15.0.0 (clang-1500.0.40.1)] (64-bit runtime)
Python platform: macOS-14.0-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M1

Versions of relevant libraries:
[pip3] numpy==1.25.2
[pip3] torch==2.0.1
[conda] Could not collect

Uses GPU and succeeds

PyTorch version: 2.0.1
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 13.4.1 (arm64)
GCC version: Could not collect
Clang version: 14.0.3 (clang-1403.0.22.14.1)
CMake version: version 3.25.1
Libc version: N/A

Python version: 3.10.8 (main, Nov 24 2022, 08:08:27) [Clang 14.0.6 ] (64-bit runtime)
Python platform: macOS-13.4.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M2

Versions of relevant libraries:
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.23.4
[pip3] torch==2.0.1
[pip3] torchvision==0.15.2
[pip3] torchviz==0.0.2
[conda] numpy                     1.23.4          py310hb93e574_0
[conda] numpy-base                1.23.4          py310haf87e8b_0
[conda] torch                     2.0.1                    pypi_0    pypi
[conda] torchvision               0.15.2                   pypi_0    pypi
[conda] torchviz                  0.0.2                    pypi_0    pypi

cc @seemethere @malfet @osalpekar @atalman @kulinseth @albanD @DenisVieriu97 @razarmehr @abhudev

@malfet malfet added triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module module: mps Related to Apple Metal Performance Shaders framework needs reproduction Someone else needs to try reproducing the issue given the instructions. No action needed from user labels Oct 11, 2023
malfet (Contributor) commented Oct 11, 2023

[Edit] I can reproduce it on M1 with the torch-2.0.1 build, but there is no warning with torch-2.1.0:

% pip install torch==2.1.0;python -c "import torch;dim=2**14;a=torch.rand((dim, dim), device='mps', dtype=torch.float16);print(torch.mm(a, a).max(), torch.__version__)"
...
Successfully installed torch-2.1.0
tensor(4244., device='mps:0', dtype=torch.float16) 2.1.0
% pip install torch==2.0.1;python -c "import torch;dim=2**14;a=torch.rand((dim, dim), device='mps', dtype=torch.float16);print(torch.mm(a, a).max(), torch.__version__)"
Collecting torch==2.0.1
...
Successfully installed torch-2.0.1
2023-10-10 18:25:51.991 python[96197:10948082] ANE Evaluation Error = Error Domain=com.apple.appleneuralengine Code=3 "processRequest:model:qos:qIndex:modelStringID:options:error:: ANEProgramProcessRequestDirect() Failed with status=0xf : statusType=0x11: Program Inference timeout: timed out" UserInfo={NSLocalizedDescription=processRequest:model:qos:qIndex:modelStringID:options:error:: ANEProgramProcessRequestDirect() Failed with status=0xf : statusType=0x11: Program Inference timeout: timed out}
tensor(4240., device='mps:0', dtype=torch.float16) 2.0.1

But also, the error does not seem to be fatal, does it?

@malfet malfet added module: binaries Anything related to official binaries that we release to users and removed needs reproduction Someone else needs to try reproducing the issue given the instructions. No action needed from user labels Oct 11, 2023
kulinseth (Collaborator) commented

Thanks @malfet and @okuvshynov for the issue. This feature is new and not fully tested, so it was disabled in 2.1.0. We are going to do more testing and enable it in inference mode. Also, as you noticed, FP32 runs with full precision on the GPU by default.
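Since the thread establishes that the ANE fp16-matmul path was disabled in torch 2.1.0, code that must run on older installs could gate on the version string before relying on large fp16 matmuls on MPS. A rough sketch (my own helper, not an API from this thread; the parsing deliberately ignores pre-release and local-version suffixes):

```python
def ane_fp16_mm_disabled(torch_version: str) -> bool:
    """Return True if this torch version should no longer dispatch
    fp16 matmul to the ANE, per the thread: the path was disabled
    in 2.1.0, so treat any version >= 2.1 as safe.

    Simplistic parsing: strips a local-version suffix ("+cpu") and
    keeps only the leading digits of each of the first two fields.
    """
    base = torch_version.split("+")[0]
    major, minor = (
        int("".join(c for c in part if c.isdigit()) or 0)
        for part in base.split(".")[:2]
    )
    return (major, minor) >= (2, 1)

# In practice one would pass torch.__version__ here.
print(ane_fp16_mm_disabled("2.0.1"))  # False: 2.0.1 still hits the ANE path
print(ane_fp16_mm_disabled("2.1.0"))  # True
```

On a version where this returns False, the fp32-upcast workaround sketched earlier in the thread (or simply upgrading, as the reporter did) avoids the timeout.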

okuvshynov (Author) commented

awesome, thank you, upgraded to 2.1.0

@malfet malfet removed the module: binaries Anything related to official binaries that we release to users label Oct 11, 2023
okuvshynov (Author) commented

ok to close?
