
MPS backend started using neural engine and fails to multiply matrices in fp16 #110975

Closed
okuvshynov opened this issue Oct 10, 2023 · 4 comments
Labels
module: mps Related to Apple Metal Performance Shaders framework triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

okuvshynov commented Oct 10, 2023

🐛 Describe the bug

At some point, most likely after macOS update to Sonoma, torch mps backend started utilizing ANE instead of GPU for matrix multiplication in fp16.

Unfortunately, for large enough matrices it fails:

import torch

dim = 2 ** 14

torch.manual_seed(123)

# always uses GPU
a32 = torch.rand((dim, dim)).to('mps').to(torch.float32)
b32 = torch.rand((dim, dim)).to('mps').to(torch.float32)
c32 = a32 @ b32

# started using ANE
a16 = torch.rand((dim, dim)).to('mps').to(torch.float16)
b16 = torch.rand((dim, dim)).to('mps').to(torch.float16)
c16 = a16 @ b16 # <----- fails here

print(torch.mean(torch.abs(c32 - c16.to(torch.float32))))
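Until a fix is available, one possible user-side workaround (my suggestion, not something proposed in this thread) is to route fp16 matmuls through fp32, since the report shows fp32 matmul staying on the GPU. A minimal sketch, assuming a standard `torch` install; it uses MPS when available and falls back to CPU otherwise:

```python
import torch

def matmul_fp16_via_fp32(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Multiply two fp16 matrices by upcasting to fp32 first.

    Sidesteps the direct fp16 matmul path (which the report above shows
    being dispatched to the ANE on Sonoma) at the cost of extra memory
    and an fp32 GEMM; per the report, fp32 matmul runs on the GPU.
    """
    return (a.to(torch.float32) @ b.to(torch.float32)).to(torch.float16)

device = "mps" if torch.backends.mps.is_available() else "cpu"
dim = 256  # small demo size; the failure in the report needs dim = 2**14
a16 = torch.rand((dim, dim), device=device).to(torch.float16)
b16 = torch.rand((dim, dim), device=device).to(torch.float16)
c16 = matmul_fp16_via_fp32(a16, b16)
```

Note this trades memory and bandwidth for the fp32 accumulate, so it is only a stopgap for shapes that trigger the ANE timeout.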

I have two apple machines, one updated to Sonoma and one running Ventura.
Sonoma version fails fp16 multiplication:

2023-10-10 15:21:03.226 Python[19282:1283545] ANE Evaluation Error = Error Domain=com.apple.appleneuralengine Code=3 "processRequest:model:qos:qIndex:modelStringID:options:error:: ANEProgramProcessRequestDirect() Failed with status=0xf : statusType=0x11: Program Inference timeout: timed out" UserInfo={NSLocalizedDescription=processRequest:model:qos:qIndex:modelStringID:options:error:: ANEProgramProcessRequestDirect() Failed with status=0xf : statusType=0x11: Program Inference timeout: timed out}
tensor(nan, device='mps:0')

fp32 succeeds in both versions.

Here are some screenshots illustrating the ANE being utilized on Sonoma (it's also evident from the error above).

Ventura - GPU only: [screenshot]

Sonoma - tries using ANE: [screenshot]

Does pytorch have any control over these details?

Versions

Uses ANE and fails:

PyTorch version: 2.0.1
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 14.0 (arm64)
GCC version: Could not collect
Clang version: 15.0.0 (clang-1500.0.40.1)
CMake version: version 3.20.20210516-g4997535
Libc version: N/A

Python version: 3.9.6 (default, Aug 11 2023, 19:44:49)  [Clang 15.0.0 (clang-1500.0.40.1)] (64-bit runtime)
Python platform: macOS-14.0-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M1

Versions of relevant libraries:
[pip3] numpy==1.25.2
[pip3] torch==2.0.1
[conda] Could not collect

Uses GPU and succeeds

PyTorch version: 2.0.1
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 13.4.1 (arm64)
GCC version: Could not collect
Clang version: 14.0.3 (clang-1403.0.22.14.1)
CMake version: version 3.25.1
Libc version: N/A

Python version: 3.10.8 (main, Nov 24 2022, 08:08:27) [Clang 14.0.6 ] (64-bit runtime)
Python platform: macOS-13.4.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M2

Versions of relevant libraries:
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.23.4
[pip3] torch==2.0.1
[pip3] torchvision==0.15.2
[pip3] torchviz==0.0.2
[conda] numpy                     1.23.4          py310hb93e574_0
[conda] numpy-base                1.23.4          py310haf87e8b_0
[conda] torch                     2.0.1                    pypi_0    pypi
[conda] torchvision               0.15.2                   pypi_0    pypi
[conda] torchviz                  0.0.2                    pypi_0    pypi

cc @seemethere @malfet @osalpekar @atalman @kulinseth @albanD @DenisVieriu97 @razarmehr @abhudev

@malfet malfet added triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module module: mps Related to Apple Metal Performance Shaders framework needs reproduction Someone else needs to try reproducing the issue given the instructions. No action needed from user labels Oct 11, 2023
malfet (Contributor) commented Oct 11, 2023

[Edit] I can reproduce it on M1 with the torch-2.0.1 build, but there is no warning with torch-2.1.0:

% pip install torch==2.1.0;python -c "import torch;dim=2**14;a=torch.rand((dim, dim), device='mps', dtype=torch.float16);print(torch.mm(a, a).max(), torch.__version__)"
...
Successfully installed torch-2.1.0
tensor(4244., device='mps:0', dtype=torch.float16) 2.1.0
% pip install torch==2.0.1;python -c "import torch;dim=2**14;a=torch.rand((dim, dim), device='mps', dtype=torch.float16);print(torch.mm(a, a).max(), torch.__version__)"
Collecting torch==2.0.1
...
Successfully installed torch-2.0.1
2023-10-10 18:25:51.991 python[96197:10948082] ANE Evaluation Error = Error Domain=com.apple.appleneuralengine Code=3 "processRequest:model:qos:qIndex:modelStringID:options:error:: ANEProgramProcessRequestDirect() Failed with status=0xf : statusType=0x11: Program Inference timeout: timed out" UserInfo={NSLocalizedDescription=processRequest:model:qos:qIndex:modelStringID:options:error:: ANEProgramProcessRequestDirect() Failed with status=0xf : statusType=0x11: Program Inference timeout: timed out}
tensor(4240., device='mps:0', dtype=torch.float16) 2.0.1

But also, the error does not seem to be fatal, does it?

@malfet malfet added module: binaries Anything related to official binaries that we release to users and removed needs reproduction Someone else needs to try reproducing the issue given the instructions. No action needed from user labels Oct 11, 2023
kulinseth (Collaborator) commented

Thanks @malfet and @okuvshynov for the issue. This feature is new and not fully tested, so it was disabled in 2.1.0. We are going to do more testing and enable it in inference mode. Also, as you noticed, FP32 runs with full precision on the GPU by default.
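Since the thread establishes that the ANE fp16-matmul path was disabled in torch 2.1.0, code that must run on older installs could gate on the version string before relying on large fp16 matmuls on MPS. A rough sketch (my own helper, not an API from this thread; the parsing deliberately ignores pre-release and local-version suffixes):

```python
def ane_fp16_mm_disabled(torch_version: str) -> bool:
    """Return True if this torch version should no longer dispatch
    fp16 matmul to the ANE, per the thread: the path was disabled
    in 2.1.0, so treat any version >= 2.1 as safe.

    Simplistic parsing: strips a local-version suffix ("+cpu") and
    keeps only the leading digits of each of the first two fields.
    """
    base = torch_version.split("+")[0]
    major, minor = (
        int("".join(c for c in part if c.isdigit()) or 0)
        for part in base.split(".")[:2]
    )
    return (major, minor) >= (2, 1)

# In practice one would pass torch.__version__ here.
print(ane_fp16_mm_disabled("2.0.1"))  # False: 2.0.1 still hits the ANE path
print(ane_fp16_mm_disabled("2.1.0"))  # True
```

On a version where this returns False, the fp32-upcast workaround sketched earlier in the thread (or simply upgrading, as the reporter did) avoids the timeout.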

okuvshynov (Author) commented

awesome, thank you, upgraded to 2.1.0

@malfet malfet removed the module: binaries Anything related to official binaries that we release to users label Oct 11, 2023
okuvshynov (Author) commented

ok to close?
