M1 GPU mps device integration#596
Conversation
muellerzr
left a comment
Thanks for this! Left a suggestion to make sure that we get GPU tests actually running and passing, as I assume that's the right move here :)
sgugger
left a comment
Nice addition! I left some comments and we should also have some documentation around that integration (flagging that BERT has a loss of performance for instance).
Tests can be added in other PRs once we have better access to a machine with M1.
Co-Authored-By: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
muellerzr
left a comment
A few comments for now until the spacing nits are fixed and I can view it better on the website :)
Co-authored-by: Zachary Mueller <muellerzr@gmail.com>
muellerzr
left a comment
Great work! I left some final doc nits for you 😄
Co-Authored-By: Zachary Mueller <7831895+muellerzr@users.noreply.github.com>
What does this PR do?
This PR adds support for the `mps` device type in PyTorch, for faster training and inference than CPU on Apple Silicon. Users can enable it via the `accelerate config` command. I ran `cv_example.py` with and without MPS to gauge the speedup: ~7.5X over CPU. This experiment was done on a Mac Pro with an M1 chip having 8 CPU performance cores (+2 efficiency cores), 14 GPU cores and 16GB of unified memory.

Note: Pre-requisite: installing torch with `mps` support.

Attaching plots showing GPU usage and CPU usage when they are enabled correspondingly:

With `mps` enabled: [GPU M1 usage plot]

Only CPU training: [CPU usage plot]
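The device selection this integration automates can be sketched as follows. This is a minimal illustration, not code from the PR; it only assumes a torch build where the `torch.backends.mps` module exists:

```python
import torch

# Minimal sketch (not from this PR): pick the `mps` device when the
# torch build supports it and the Apple GPU is available, otherwise
# fall back to CPU.
if getattr(torch.backends, "mps", None) is not None and torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

# Tensors created on the selected device are used like CUDA tensors.
x = torch.ones(2, 2, device=device)
print(x.device.type)
```

On a machine without MPS support this simply prints `cpu`, so the same script runs everywhere.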
Note: For `nlp_example.py` the time saving is 30% over CPU, but the metrics are far worse than with CPU-only training. This means certain operations in the BERT model go wrong on the `mps` device, and this needs to be fixed by PyTorch.
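One way to surface such silently-wrong operations is to run the same module on CPU and on `mps` and compare the outputs. This is a hypothetical debugging sketch, not part of the PR; the helper name, tolerance, and toy layer are all assumptions:

```python
import torch

# Hypothetical debugging helper: run a module on CPU and on `mps`
# and check that the outputs agree within a tolerance. Returns True
# when there is no `mps` device to compare against.
def compare_on_mps(module, x, atol=1e-4):
    cpu_out = module(x)
    mps_backend = getattr(torch.backends, "mps", None)
    if mps_backend is None or not mps_backend.is_available():
        return True
    mps_out = module.to("mps")(x.to("mps")).cpu()
    return torch.allclose(cpu_out, mps_out, atol=atol)

layer = torch.nn.Linear(4, 4)
print(compare_on_mps(layer, torch.randn(2, 4)))
```

Applying a check like this layer by layer to a BERT forward pass could help narrow down which ops diverge on `mps`.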