MPS memory issue, MPS backend out of memory, but works if I empty the MPS cache #105839
Comments
This feels like an edge case; we may not do anything here.
Since using torchaudio 2.1.0, I also frequently get OOM errors:
Something else is using 7.42 GB of memory on your 8 GB machine; MPS itself is only using 1.45 GB. If a watermark ratio of 2.0 isn't enough, setting it to 0.0 will allow torch to use as much memory as it needs, but any total memory usage (including other applications) above 8 GB will come from swap. This will increase wear on your system SSD (I'm assuming you're on an Apple Silicon Mac) and could potentially crash the OS. Having said that, I've used it a fair bit set to 0.0 on my 8 GB M1, and it hasn't caused a system crash since they added the watermark level system to PyTorch.
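A minimal sketch of applying that setting from inside a script, assuming the limit is controlled by the PYTORCH_MPS_HIGH_WATERMARK_RATIO environment variable (the variable suggested in PyTorch's MPS out-of-memory message). It needs to be in place before torch initializes the MPS allocator, so the sketch sets it before importing torch; the torch import is guarded so the snippet also runs on machines without PyTorch.

```python
import os

# Assumption: the MPS allocator reads this variable when it is initialized,
# so set it before `import torch` to be safe.
# "0.0" disables the upper limit; anything beyond physical RAM then comes from swap.
os.environ["PYTORCH_MPS_HIGH_WATERMARK_RATIO"] = "0.0"

try:
    import torch  # imported only after the environment variable is set

    mps_available = torch.backends.mps.is_available()
except ImportError:
    mps_available = False

print(
    f"MPS available: {mps_available}, "
    f"high watermark ratio: {os.environ['PYTORCH_MPS_HIGH_WATERMARK_RATIO']}"
)
```

The same effect can be had from the shell by prefixing the launch command, e.g. `PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 python script.py`, which avoids depending on import order.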
When I check the RAM usage right after I get this error, it tells me only 2GB of my system memory is in use. There should be enough left.
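System-wide RAM readings may not show what the MPS allocator itself is holding. A hedged sketch for checking from inside the process, assuming a PyTorch release that provides torch.mps.current_allocated_memory() (memory backing live tensors) and torch.mps.driver_allocated_memory() (everything the Metal driver has allocated for the process, including cached blocks). The helper names are hypothetical.

```python
def bytes_to_gib(n: int) -> float:
    """Convert a byte count to GiB for readable logging."""
    return n / 1024**3


def mps_memory_report() -> dict:
    """Return MPS memory statistics in GiB, or an empty dict if MPS is unavailable."""
    try:
        import torch
    except ImportError:
        return {}
    if not torch.backends.mps.is_available():
        return {}
    return {
        # memory currently backing live tensors
        "tensors_gib": bytes_to_gib(torch.mps.current_allocated_memory()),
        # total driver allocation for this process, including cached blocks
        "driver_gib": bytes_to_gib(torch.mps.driver_allocated_memory()),
    }


print(mps_memory_report())
```

Calling this right before the failing step shows whether a large share of the driver allocation is cache rather than live tensors.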
I'm also running into this error a lot trying to do SDXL 1.0 generations on a Mac Pro 2019 Intel with an AMD 6900XT 16GB GPU. I can do about 5-10 generations and then get the "MPS backend out of memory" error. See AUTOMATIC1111/stable-diffusion-webui#5461 (reply in thread) for more context. It would be great to get this fixed in the next version of PyTorch so that Mac users can run SD!
Tested a1111 with PyTorch 2.3.0.dev20240103 today on my aforementioned Mac Intel + AMD GPU rig and am no longer getting this MPS Out of Memory error! Yay!
Any chance we will also get this fix in the stable 2.2.1?
I still get this error using
Having this problem too; clearing the cache and trying to be parsimonious with memory allocation changes nothing.
I am having the same issue while running it in my Jupyter Notebook locally on a Mac M2. There is a similar issue at Apple.
@shubham-attri You mean you get a similar error to the one people are complaining about in this thread when you try to follow that Apple tutorial?
To follow
🐛 Describe the bug
There appears to be something wrong with the MPS cache. Either it's not releasing memory when it ideally should, or the freeable memory in the cache is not being taken into account when the check for available space occurs.
The issue occurs on the current nightly (see Versions) and on 2.0.1.
This issue degrades performance at best and terminates the application at worst.
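The second hypothesis above (cached, freeable memory counted against the limit) can be illustrated with a toy model. Everything here is hypothetical: the class, the sizes in arbitrary units, and the single-pool cache are simplifications chosen for illustration, not how PyTorch's MPS allocator is actually implemented.

```python
class ToyCachingAllocator:
    """Hypothetical caching allocator; not PyTorch's MPSAllocator."""

    def __init__(self, limit: int, reclaim_cache_on_oom: bool):
        self.limit = limit  # maximum units the allocator may hold
        self.in_use = 0  # units backing live tensors
        self.cached = 0  # freed units kept for reuse instead of released
        self.reclaim_cache_on_oom = reclaim_cache_on_oom

    def free(self, size: int) -> None:
        # Freed memory goes into the cache, not back to the system.
        self.in_use -= size
        self.cached += size

    def empty_cache(self) -> None:
        # Release every cached unit (the role mps.empty_cache() plays in the report).
        self.cached = 0

    def allocate(self, size: int) -> None:
        if size <= self.cached:  # reuse cached memory if it fits (simplified)
            self.cached -= size
            self.in_use += size
            return
        if self.in_use + self.cached + size > self.limit:
            if not self.reclaim_cache_on_oom:
                raise MemoryError("out of memory")  # cache counted as used
            self.empty_cache()  # drop the cache, then re-check
            if self.in_use + size > self.limit:
                raise MemoryError("out of memory")
        self.in_use += size


def run_two_models(allocator: ToyCachingAllocator, empty_between: bool) -> bool:
    """Model 1 runs and is freed, then model 2 needs a larger block."""
    allocator.allocate(5)
    allocator.free(5)  # memory is cached, not released
    if empty_between:
        allocator.empty_cache()
    try:
        allocator.allocate(6)
    except MemoryError:
        return False
    return True


print(run_two_models(ToyCachingAllocator(8, reclaim_cache_on_oom=False), False))
```

With a limit of 8, the second model fails unless the cache is emptied between runs or the allocator reclaims its cache before declaring out-of-memory, which mirrors the behaviour described in this report.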
Here's an example...
This works on an 8GB M1 Mac Mini without issue; the two models run at
Remove the mps.empty_cache() call and it fails during the second model run. If I reduce the height and width values to 512, it'll run to completion, but the second model runs at 40 seconds per iteration with a lot of swap file access. With the cache emptied manually, it runs at around 2 seconds per iteration.
The fp16fixes file is required to work around some issues with using fp16 on MPS: without it, the run fails with a broadcast error on 2.0.1 and produces a bad image on the nightly I'm currently using. If I remove it, the issue still occurs on the nightly.
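The manual workaround described above (emptying the MPS cache between the two model runs) can be wrapped in a small helper. This is a sketch, not the reporter's script: the helper name is hypothetical, and it assumes a PyTorch release where torch.mps.empty_cache() exists. It runs a garbage collection first so that tensors from the first model are actually freed before the cache is released.

```python
import gc


def release_mps_memory() -> None:
    """Collect unreachable objects, then return cached MPS blocks to the system."""
    gc.collect()  # ensure tensors no longer referenced are actually freed
    try:
        import torch
    except ImportError:
        return  # nothing to release without PyTorch
    if torch.backends.mps.is_available():
        torch.mps.empty_cache()  # the manual step that avoids the OOM here


release_mps_memory()
```

In a two-model pipeline, call it after deleting the first model and any outputs that still reference its tensors, and before loading the second model; cached memory held by objects that are still referenced will not be released.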
Versions
Collecting environment information...
PyTorch version: 2.1.0.dev20230724
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: macOS 13.4.1 (arm64)
GCC version: Could not collect
Clang version: 14.0.3 (clang-1403.0.22.14.1)
CMake version: version 3.24.4
Libc version: N/A
Python version: 3.10.11 (main, Apr 8 2023, 02:11:11) [Clang 14.0.0 (clang-1400.0.29.202)] (64-bit runtime)
Python platform: macOS-13.4.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Apple M1
Versions of relevant libraries:
[pip3] numpy==1.25.1
[pip3] torch==2.1.0.dev20230724
[pip3] torchvision==0.15.2
[conda] Could not collect
cc @ezyang @gchanan @zou3519 @kulinseth @albanD @malfet @DenisVieriu97 @razarmehr @abhudev