stable diffusion malloc error #21
Comments
Hm, unfortunately that means it ran out of memory. We haven't tested it on a Mac mini with 8GB of RAM. I am using it on my M2 Air, but it has 24GB. Can you try removing CFG by setting
Hm, thanks. Adding --cfg 0 unfortunately makes no difference.
Even if it's running out of memory, that seems odd: why wouldn't it just use the system pagefile/swap for such a relatively small amount, ~134MB? In 20+ years of dev I've never seen malloc() fail from high memory load; it normally just allocates and uses swap. These 8GB Macs regularly swap like crazy but keep going, and I often use much more memory on these Macs than this appears to be using according to Activity Monitor.
Or does this have something to do with the unified memory architecture? Or memory fragmentation?
It runs smoothly through the default 50 steps and only fails at the end. I'm watching memory load as this runs and it's in the green most of the way, using at most 2GB of swap until it completes the 50 steps. With only 2GB of swap in use, that doesn't seem to me like such an excessive memory load that a malloc of just 134MB should fail; that's a very light memory load:
I did some more tests and ran it again with --steps 2. The maximum memory load the system hits is about 7.5GB, and the entire system is using only ~500MB of swap this time (that's nothing). I don't see why a malloc of just ~134MB should fail under those conditions instead of just using swap.
I upgraded to Sonoma, same thing.
Edit 2: FWIW the system report looks like this:
Edit 3: Looking at the source, it looks like this internally uses metal::allocator?
Edit 4: Is it possible that the line 'block_limit_(1.5 * device_->recommendedMaxWorkingSetSize()) {}' is where the issue stems from? If I have time later I may try rebuilding with that line changed and see if it helps.
New information, and good news: I found the cause. I forked mlx itself, changed one line of code, rebuilt from my fork, and now it works 😊 I changed this line of code in MetalAllocator::MetalAllocator() in mlx/backend/metal/allocator.cpp to a much higher limit; the 1.5 seems maybe a bit conservative for low-RAM Macs:
There is indeed a relatively big spike in memory usage as it finishes the steps, but it's not world-ending. I'd rather have something that works even if it spikes my swap. It's debatable how best to handle that in the long run as a general solution for all users: whether to just warn, or maybe give more options to control how much memory to use or how to handle low memory.
That means the issue is not in mlx-examples but probably really in mlx. Should I try submitting a PR for mlx? I'm not sure whether someone had a very good reason for the particular choice of 1.5 as a hard factor here.
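To illustrate why a hard cap of this kind can reject a small request even when swap is available, here is a toy model in Python. It is not the MLX allocator: the class `CappedAllocator`, the helper `final_allocation_succeeds`, the 4 GiB "recommended working set" and the amount of memory already live are all made-up values for illustration. It assumes only that the allocator refuses a request once live bytes plus the request would exceed `factor * recommended working set`.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


class CappedAllocator:
    """Toy allocator with a hard cap on live bytes (hypothetical, not MLX code)."""

    def __init__(self, recommended_working_set: int, factor: float):
        # Mirrors the shape of block_limit_(factor * recommendedMaxWorkingSetSize()).
        self.block_limit = int(factor * recommended_working_set)
        self.active = 0

    def malloc(self, size: int) -> None:
        # The cap is checked before the OS is ever asked, so swap never gets a chance.
        if self.active + size > self.block_limit:
            raise MemoryError(f"unable to allocate {size} bytes")
        self.active += size

    def free(self, size: int) -> None:
        self.active -= size


def final_allocation_succeeds(factor: float) -> bool:
    """Return True if a last 128 MiB request fits under the cap."""
    # Assumed numbers: 4 GiB recommended working set, almost 6 GiB already live.
    alloc = CappedAllocator(recommended_working_set=4 * GIB, factor=factor)
    alloc.malloc(6 * GIB - 64 * MIB)
    try:
        alloc.malloc(134_217_728)  # the request size reported in this issue
        return True
    except MemoryError:
        return False


print(final_allocation_succeeds(1.5))  # False: the cap is 6 GiB
print(final_allocation_succeeds(3.0))  # True: the cap is 12 GiB
```

In this model the failure has nothing to do with how much physical memory or swap is free, which would match the observation above that the system was under light memory load when the allocation failed.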
May I say, awesome work :-) We need to start adding more 8GB Macs to our testing pool. There really wasn't a particular reason for the 1.5. It just gets significantly slower at that point and we wanted to avoid freezing the system. I would encourage you to at least open an issue or maybe a PR at the main repo. I can't tell you that it will be the first one merged, but we'll certainly test the implications of increasing this limit and, if there are none for most use cases, merge it. Thanks for investigating! Feel free to close the issue and link to it from the PR.
Thank you! I opened an issue for this on mlx: ml-explore/mlx#63 and created this Pull Request: ml-explore/mlx#64. I hope it's accepted, though of course it's up to the mlx maintainers.
mlx seemed to install fine but I tried the stable diffusion example and keep getting this:
134217728 bytes seems small; there should not be a problem allocating only ~134MB of memory, since the machine has 8GB of RAM and ~4GB free. It's an Apple M2 Mac mini (running headless and connected to via VNC; could that cause issues?).
Or am I misunderstanding something basic: does this need an M2 Pro or something? It does go through all of the (default) 50 steps before failing, though.
I ran the MLX unit tests and they seemed to run fine; it said '154 tests ran, 4 skipped'.
Python version: 3.9.6. OS: Ventura 13.6.1. I tried Python 3.11.5 and it does the same.
It cuts off at "There appears to be %d '
I briefly tried installing from source, and also tried conda and Python 3.12; those didn't work either.
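As a quick sanity check on the size quoted above, the arithmetic below converts 134217728 bytes into the two common units. The variable names are just for illustration.

```python
size_bytes = 134_217_728

size_mib = size_bytes / (1024 ** 2)   # binary mebibytes
size_mb = size_bytes / 1_000_000      # decimal megabytes

print(size_mib)                # 128.0
print(round(size_mb, 1))       # 134.2
print(size_bytes == 2 ** 27)   # True
```

So the failing request is exactly 128 MiB (2^27 bytes), which is the "~134MB" figure when expressed in decimal megabytes.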