YOLOv5: MPS on Macbook Air M1 NotImplementedError: Could not run 'aten::empty.memory_format' with arguments... #77748

Closed
glenn-jocher opened this issue May 18, 2022 · 13 comments
Labels
module: mps Related to Apple Metal Performance Shaders framework triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Comments

@glenn-jocher

glenn-jocher commented May 18, 2022

🐛 Describe the bug

Following the official example here causes an error:
https://pytorch.org/docs/master/notes/mps.html

import torch
x = torch.ones(5, device="mps")

NotImplementedError: Could not run 'aten::empty.memory_format' with arguments from the 'MPS' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::empty.memory_format' is only available for these backends: [Dense, Conjugate, Negative, VmapMode, FuncTorchGradWrapper, MPS, UNKNOWN_TENSOR_TYPE_ID, QuantizedXPU, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, SparseCPU, SparseCUDA, SparseHIP, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, SparseXPU, UNKNOWN_TENSOR_TYPE_ID, SparseVE, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, NestedTensorCUDA, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID].

Versions

Collecting environment information...
PyTorch version: 1.12.0.dev20220518
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 12.4 (x86_64)
GCC version: Could not collect
Clang version: 13.1.6 (clang-1316.0.21.2.5)
CMake version: Could not collect
Libc version: N/A

Python version: 3.9.0 (v3.9.0:9cf6752276, Oct  5 2020, 11:29:23)  [Clang 6.0 (clang-600.0.57)] (64-bit runtime)
Python platform: macOS-10.16-x86_64-i386-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.22.3
[pip3] torch==1.12.0.dev20220518
[pip3] torchvision==0.13.0.dev20220518
[conda] numpy                     1.22.3           py39h690d673_0    conda-forge
[conda] pytorch                   1.10.2          cpu_py39hbfdb42d_1    conda-forge
[conda] torchvision               0.11.3          cpu_py39h85ba581_1    conda-forge

[Screenshot: Screen Shot 2022-05-18 at 5 47 42 PM]

@albanD albanD added the module: mps Related to Apple Metal Performance Shaders framework label May 18, 2022
@albanD
Collaborator

albanD commented May 18, 2022

Hi,

The version of Python that you're using is x86, not arm (as you can see in the collect env report).
You will need to use a native arm version of Python to be able to use MPS. Can you try that and see if it solves the issue, please?
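
The architecture of the running interpreter can also be checked directly, without reading the full environment report. A minimal sketch using only the standard library; `is_native_arm64` and `describe_interpreter` are hypothetical helper names, not part of PyTorch:

```python
import platform
import sys


def is_native_arm64(machine=None):
    """Return True if the given (or current) machine string names an ARM64 CPU."""
    machine = platform.machine() if machine is None else machine
    return machine.lower() in ("arm64", "aarch64")


def describe_interpreter():
    """Collect the interpreter fields that the environment report also lists."""
    return {
        "python": sys.version.split()[0],
        "machine": platform.machine(),
        "platform": platform.platform(),
    }


if __name__ == "__main__":
    print(describe_interpreter())
    if not is_native_arm64():
        # An x86_64 interpreter on an M1 Mac reports "x86_64" here.
        print("This interpreter is not a native arm64 build.")
```

In the report above, the platform string `macOS-10.16-x86_64-i386-64bit` is the sign that the interpreter was an x86 build.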

@albanD albanD added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label May 18, 2022
@glenn-jocher
Author

@albanD thanks for the pointer! My Python install workflow for macOS is to go here and download the only available option:
https://www.python.org/downloads/release/python-390/
[Screenshot: Screen Shot 2022-05-18 at 6 36 23 PM]

Is there another method I should be using?

@glenn-jocher
Author

@albanD I see here that, as of 3.9.1, the macOS installer should fully support arm64 in the same executable. Maybe I just got unlucky installing 3.9.0; I'll try again with 3.9.1.
https://docs.python.org/3/whatsnew/3.9.html
[Screenshot: Screen Shot 2022-05-18 at 6 41 04 PM]

@albanD
Collaborator

albanD commented May 18, 2022

Yes, you can see the latest Python download page: https://www.python.org/downloads/macos/
There are Intel-only binaries and universal binaries there.
You will need to get a universal binary (which contains both the x86 and arm versions) to get full MPS support.

Note that this also means you won't have to go through the Rosetta layer to run your Python code, which might be faster :)
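
Whether a process is currently going through Rosetta can be queried on macOS via the `sysctl.proc_translated` key. A minimal sketch, assuming the `sysctl` command-line tool is on the path; `running_under_rosetta` is a hypothetical helper, and it returns `None` whenever the answer cannot be determined (non-macOS systems, or Macs where the key does not exist):

```python
import subprocess
import sys


def running_under_rosetta():
    """Return True/False on macOS when the kernel reports it, else None."""
    if sys.platform != "darwin":
        return None
    try:
        result = subprocess.run(
            ["sysctl", "-n", "sysctl.proc_translated"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # The key is missing or sysctl could not be run.
        return None
    # "1" means the calling process is translated by Rosetta, "0" means native.
    return result.stdout.strip() == "1"


if __name__ == "__main__":
    print("Running under Rosetta:", running_under_rosetta())
```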

@glenn-jocher
Author

glenn-jocher commented May 18, 2022

@albanD I installed the Python 3.9.1 universal binary and saw a significant speedup with PyTorch stable (torch==1.11.0, torchvision==0.12.0) for CPU ops (not MPS). However, I am not able to run nightly, because pip seems to pair torch nightly 1.12 with torchvision stable 0.12, which appear to be incompatible. This is the error message (YOLOv5 uses torchvision for NMS):

RuntimeError: Couldn't load custom C++ ops. This can happen if your PyTorch and torchvision versions are incompatible, or if you had errors while compiling torchvision from source. For further information on the compatible versions, check https://github.com/pytorch/vision#installation for the compatibility matrix. Please check your PyTorch version with torch.__version__ and your torchvision version with torchvision.__version__ and verify if they are compatible, and if not please reinstall torchvision so that it matches your PyTorch install.

This is the version info I have now:

Collecting environment information...
PyTorch version: 1.12.0.dev20220518
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 12.4 (arm64)
GCC version: Could not collect
Clang version: 13.1.6 (clang-1316.0.21.2.5)
CMake version: Could not collect
Libc version: N/A

Python version: 3.9.1 (v3.9.1:1e5d33e9b9, Dec  7 2020, 12:44:01)  [Clang 12.0.0 (clang-1200.0.32.27)] (64-bit runtime)
Python platform: macOS-12.4-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.22.3
[pip3] torch==1.12.0.dev20220518
[pip3] torchvision==0.12.0

Code to reproduce (run in new macOS arm64 python>=3.9.1 venv):

git clone https://github.com/ultralytics/yolov5
cd yolov5
pip install --pre -r requirements.txt --extra-index-url https://download.pytorch.org/whl/nightly/cpu
python detect.py
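
The mismatch reported above (torch nightly 1.12 paired with torchvision stable 0.12) can be caught before running anything. A minimal sketch, assuming the pairing seen in this thread's version reports, where each torch 1.x release goes with the torchvision 0.(x+1) release (1.10 with 0.11, 1.11 with 0.12, 1.12 with 0.13). `looks_paired` and `major_minor` are hypothetical helpers, and the torchvision compatibility matrix remains the authoritative source:

```python
def major_minor(version):
    """Parse the leading 'major.minor' of a version string into two ints."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def looks_paired(torch_version, torchvision_version):
    """Heuristic for the torch 1.x era only: torchvision minor == torch minor + 1."""
    t_major, t_minor = major_minor(torch_version)
    v_major, v_minor = major_minor(torchvision_version)
    return (t_major, v_major) == (1, 0) and v_minor == t_minor + 1


if __name__ == "__main__":
    # Version strings copied from the environment reports in this thread.
    print(looks_paired("1.12.0.dev20220518", "0.12.0"))
    print(looks_paired("1.11.0", "0.12.0"))
```

With real installs, the two strings would come from `torch.__version__` and `torchvision.__version__`, as the error message suggests.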

@gchanan
Contributor

gchanan commented May 18, 2022

@albanD perhaps we should change the mps note/example to do the equivalent of a torch.cuda.is_available() check.
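
A sketch of what such a guard might look like in user code, assuming `torch.backends.mps.is_available()` as exposed in PyTorch 1.12 and later. `pick_device` is a hypothetical helper, and falling back to the CPU is an illustrative choice rather than the wording of the docs change:

```python
def pick_device():
    """Return "mps" when the MPS backend is usable, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch installed there is nothing to select; report CPU.
        return "cpu"
    # Older builds may not have the torch.backends.mps module at all.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print("Using device:", pick_device())
```

The original example would then become `x = torch.ones(5, device=pick_device())`, which runs on the CPU instead of raising when MPS is unavailable.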

@albanD
Collaborator

albanD commented May 18, 2022

PR ready: #77767

@albanD
Collaborator

albanD commented May 18, 2022

@glenn-jocher do you still have this issue if you install a nightly version of torchvision alongside the nightly version of torch?

@glenn-jocher
Author

glenn-jocher commented May 18, 2022

@albanD I'm not able to install torchvision nightly with arm64 python for some reason. This is what I see:

[Screenshot: Screen Shot 2022-05-18 at 11 14 46 PM]

EDIT: if I force a search of only pytorch.org/whl/nightly/cpu, no matching distributions are found. My pip is the latest (22.1).
[Screenshot: Screen Shot 2022-05-18 at 11 17 42 PM]

@acostin1

I had to compile torchvision from source to get it working.

@glenn-jocher
Author

glenn-jocher commented May 19, 2022

@acostin1 I did the same and am now able to use torch and torchvision arm64 nightly.

@albanD after installing torch nightly (pip) and torchvision nightly (from source), I'm now seeing a new error message, "buffer is not large enough. Must be 25600 bytes", on YOLOv5-nano inference on a MacBook Air M1 (this model has only 2M parameters, and inference runs on a single 640-pixel image):

/AppleInternal/Library/BuildRoots/b6051351-c030-11ec-96e9-3e7866fcf3a1/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Types/MPSNDArray.mm:782: failed assertion `[MPSNDArray, initWithBuffer:descriptor:] Error: buffer is not large enough. Must be 25600 bytes

Reproduce (after installing torch and torchvision arm64 nightly):

git clone https://github.com/ultralytics/yolov5 -b apple/mps
cd yolov5
pip install -r requirements.txt
python detect.py --weights yolov5n.pt --device cpu  # verify CPU inference (45ms/image on Macbook Air M1)
python detect.py --weights yolov5n.pt --device mps  # error on MPS inference

Versions

PyTorch version: 1.12.0.dev20220519
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 12.4 (arm64)
GCC version: Could not collect
Clang version: 13.1.6 (clang-1316.0.21.2.5)
CMake version: Could not collect
Libc version: N/A

Python version: 3.9.1 (v3.9.1:1e5d33e9b9, Dec  7 2020, 12:44:01)  [Clang 12.0.0 (clang-1200.0.32.27)] (64-bit runtime)
Python platform: macOS-12.4-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.22.3
[pip3] torch==1.12.0.dev20220519
[pip3] torchvision==0.13.0a0+9d9cfab

facebook-github-bot pushed a commit that referenced this issue May 20, 2022
… (#77767)

Summary:
Fixing #77748

Pull Request resolved: #77767
Approved by: https://github.com/soulitzer

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/dcd2ba353844bbc7d480bf3aa15fd0eeb0d55e78

Reviewed By: seemethere

Differential Revision: D36494371

Pulled By: albanD

fbshipit-source-id: 8c3d1b33272f100ee70880390c7e53e5b37b6f40
@ElliotQi

@albanD So how is it going?

@glenn-jocher
Author

glenn-jocher commented May 24, 2022

Closing as original issue is resolved.

The PyTorch team seems to have a few TODOs to complete on their side before YOLOv5 MPS will work correctly, though:

  1. create torchvision nightly builds for arm64,
  2. resolve the MPS "buffer is not large enough. Must be 25600 bytes" error, reported by multiple users in "buffer is not large enough when running pytorch on Mac M1 mps" #77886.

5 participants