malloc error "Unable to allocate" on 8GB RAM Mac #63

Closed
davidjoffe opened this issue Dec 8, 2023 · 9 comments

davidjoffe commented Dec 8, 2023

I kept encountering the error below while trying the stable diffusion sample in mlx-examples on an 8GB M2 Mac mini. After some investigation (detailed here: ml-explore/mlx-examples#21) I found that changing one line of code in MetalAllocator::MetalAllocator() in mlx/backend/metal/allocator.cpp to a much higher limit seems to fix the problem (even this 1.5 factor may be a bit conservative for low-RAM Macs):

block_limit_(1.5 * device_->recommendedMaxWorkingSetSize()) {}
https://github.com/davidjoffe/mlx/blob/main/mlx/backend/metal/allocator.cpp

I made a fork with this change and built from source to test.
I'd like to submit a pull request. This change should help low-RAM Macs such as 8GB machines, though it effectively just allows MLX to use swap instead of failing. That is arguably better than failing outright, but in the long run this behavior may need further refinement, such as giving users more control over whether and how swap is used, or at least emitting a warning.
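
MLX has since grown runtime knobs for exactly this kind of control. A minimal sketch, assuming the `mx.metal.set_memory_limit` and `mx.metal.set_cache_limit` APIs of later releases (the limit values here are arbitrary examples, not recommendations):

import mlx.core as mx

# With relaxed=True, allocations past the limit are allowed to proceed
# (potentially hitting swap) instead of raising; with relaxed=False,
# exceeding the limit raises an error.
mx.metal.set_memory_limit(12 * 1024**3, relaxed=True)

# Cap how much freed memory MLX keeps cached for reuse.
mx.metal.set_cache_limit(4 * 1024**3)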

(foo) david@Davids-Mac-mini stable_diffusion % python txt2image.py "A photo of an astronaut riding a horse on Mars." --n_images 1 --n_rows 1
/Users/david/mlx/foo/lib/python3.9/site-packages/urllib3/__init__.py:34: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020
  warnings.warn(
100%| [00:00<?, ?it/s]libc++abi: terminating due to uncaught exception of type std::runtime_error: [malloc_or_wait] Unable to allocate 134217728 bytes.
zsh: abort      python txt2image.py "A photo of an astronaut riding a horse on Mars."  1  1
(foo) david@Davids-Mac-mini stable_diffusion % /Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

saminatorkash commented Dec 9, 2023

I completely agree that we should be able to use as much swap memory as we please.
Same here, on an 8GB Mac. I haven't tried your solution of raising the 1.5 factor in the allocator yet; I will play with it and report back.

python txt2image.py "A photo of an astronaut riding a horse on Mars." --n_images 1 --n_rows 2
diffusion_pytorch_model.safetensors: 100%|█| 3.46G/3.46G [09:10<00:00, 6.29MB/s]
text_encoder/config.json: 100%|█████████████████| 613/613 [00:00<00:00, 906kB/s]
model.safetensors: 100%|███████████████████| 1.36G/1.36G [04:41<00:00, 4.83MB/s]
vae/config.json: 100%|██████████████████████████| 553/553 [00:00<00:00, 947kB/s]
diffusion_pytorch_model.safetensors: 100%|███| 335M/335M [00:57<00:00, 5.81MB/s]
tokenizer/vocab.json: 100%|████████████████| 1.06M/1.06M [00:00<00:00, 1.18MB/s]
tokenizer/merges.txt: 100%|███████████████████| 525k/525k [00:00<00:00, 882kB/s]
100%|███████████████████████████████████████████| 50/50 [05:19<00:00,  6.39s/it]
  0%|                                                                                                                                 | 0/1 [00:00<?, ?it/s]libc++abi: terminating due to uncaught exception of type std::runtime_error: [malloc_or_wait] Unable to allocate 134217728 bytes.
Abort trap: 6
UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

swamyg commented Dec 12, 2023

libc++abi: terminating due to uncaught exception of type std::runtime_error:
[malloc_or_wait] Unable to allocate 100237312 bytes.

I'm running into this error on my M3 Max with 36GB of RAM when trying to run lora.py from mlx-examples to fine-tune the Mistral-7B model.

100237312 bytes (~96MB) doesn't seem like much; I'm not sure why it's failing.

awni commented Dec 12, 2023

See ml-explore/mlx-examples#70 for some ideas on how to reduce LoRA memory consumption until we have quantization.
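
The usual levers there are a smaller batch size and fewer adapted layers. A hypothetical invocation, assuming the --batch-size and --lora-layers flags of the mlx-examples lora.py of that era (check python lora.py --help for the exact names):

python lora.py --model mistralai/Mistral-7B-v0.1 --train --batch-size 1 --lora-layers 4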

s-smits commented Apr 24, 2024

[Screenshot 2024-04-24 at 09:21:32] I also got a VRAM error when loading Phi-3, even though ~2GB more than the 9.8GB shown in the terminal was available. Is it possible to set the VRAM limit to `max_available_at_initiating` or something like that, so that other applications only take up swap?

awni commented Apr 24, 2024

There is a maximum size you can allocate into a single buffer (which is a machine-specific property). I think it is less than 9.8 GB for you.

Either way, the fact that you are trying to put 9GB into a single buffer is not a good sign. What are you running to get that? Is it training or generation?
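
Newer MLX versions expose these machine-specific ceilings directly. A minimal sketch, assuming the `mx.metal.device_info()` API of later releases (the keys shown are assumptions based on its documented output):

import mlx.core as mx

info = mx.metal.device_info()
# Machine-specific values reported by the Metal backend.
print("max single-buffer size:", info["max_buffer_length"])
print("recommended working set:", info["max_recommended_working_set_size"])
print("total unified memory:", info["memory_size"])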

s-smits commented Apr 24, 2024

It is a 16GB M1 Air; do you happen to know a ballpark for the limit? Or does it depend dynamically on other processes?
I was running mlx_lm.utils load and generate on Phi-3-128k-mlx with ~6k context (when I run it again it says 12.2GB is needed); is it limited to only 8GB of VRAM? With PyTorch I am able to run 14GB models without much of a speed loss (with around 4-5GB of swap, off the top of my head).
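
For context, the load-and-generate path being described looks roughly like this; a minimal sketch, assuming the mlx_lm load/generate API, with a hypothetical model id standing in for Phi-3-128k-mlx:

from mlx_lm import load, generate

# Hypothetical repo id for illustration; substitute the actual Phi-3-128k-mlx model.
model, tokenizer = load("mlx-community/Phi-3-mini-128k-instruct-4bit")
response = generate(model, tokenizer, prompt="Hello", max_tokens=128, verbose=True)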

awni commented Apr 25, 2024

> It is a 16GB M1 Air; do you happen to know a ballpark for the limit?

I don't know, but you could try running this until it breaks:

import mlx.core as mx

# Disable the buffer cache so each allocation hits the allocator directly.
mx.metal.set_cache_limit(0)
for i in range(100):
    print(f"{i} GB")
    # A (2**30, i) boolean array is i GiB in a single buffer.
    a = mx.zeros((2**30, i), mx.bool_)
    mx.eval(a)  # force the allocation to actually happen
    del a

awni commented Apr 25, 2024

I'm going to close this issue as I'm not sure why it's still open. Feel free to open a new issue if you are still having problems with memory allocation.

awni closed this as completed Apr 25, 2024

s-smits commented Apr 25, 2024

air@MacBook-Air-van-Air test-repo % /opt/homebrew/bin/python3.10 /Users/air/Repositories/test-repo/test4.py
0 GB
1 GB
2 GB
3 GB
4 GB
5 GB
6 GB
7 GB
8 GB
9 GB
libc++abi: terminating due to uncaught exception of type std::runtime_error: [malloc_or_wait] Unable to allocate 9663676416 bytes.
zsh: abort      /opt/homebrew/bin/python3.10
Just an FYI: the probe fails at i = 9 (9663676416 bytes = 9 GiB), so the single-buffer limit here falls somewhere between 8 and 9 GiB. No need for me to open a new issue, thank you.
