
Prevent malloc failure on 8GB Macs #64

Closed
davidjoffe wants to merge 5 commits

Conversation

davidjoffe

Hi,

This pull request addresses the issue below (the error "std::runtime_error: [malloc_or_wait] Unable to allocate", which I had been encountering while running the official Stable Diffusion example on an 8GB M2 Mac Mini). Please see the issue for a more detailed explanation:

#63

Effectively, this should help stop the metal::allocator malloc from failing on low-RAM Macs (e.g. 8GB Macs), though this behavior probably needs further testing and refinement.
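For context, a minimal sketch of the kind of change involved (illustrative only, not the actual mlx allocator source; the class and member names here are hypothetical):

```cpp
// Illustrative only: shows the kind of change this PR makes, i.e. raising the
// multiplier applied to the Metal device's recommendedMaxWorkingSetSize before
// malloc_or_wait gives up and throws. Names below are hypothetical.
#include <cstddef>

class MetalAllocatorSketch {
 public:
  explicit MetalAllocatorSketch(size_t recommended_working_set) {
    // Before: fail once outstanding allocations exceed 1.5x the recommended
    // working set.
    //   block_limit_ = static_cast<size_t>(1.5 * recommended_working_set);
    // This PR (experimentally): allow up to 4x, so low-RAM (e.g. 8GB) Macs
    // can swap instead of failing outright.
    block_limit_ = static_cast<size_t>(4.0 * recommended_working_set);
  }

  size_t block_limit() const { return block_limit_; }

 private:
  size_t block_limit_ = 0;
};
```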

Quick experiment to see if this helps fix malloc error
Experimenting with values here to work on 8GB RAM Mac
@awni
Member

awni commented Dec 8, 2023

@jagrit06 what was the rationale for setting this to 1.5x in the first place? I have no idea what the performance implications of changing it to 4.0 are, but we should test it and understand that better. Then we can either merge this or come up with a more sophisticated method for setting it (or maybe make it a configurable environment variable).


@jagrit06
Member

jagrit06 commented Dec 8, 2023

So, the rationale for setting it to 1.5 was that the recommended working set is usually 0.75 * the total RAM available on the system. We wanted that to be the point where we basically block until one operation is done, so its memory can be reused, before scheduling a new one.

We will probably need something a little more sophisticated than just upping the memory pressure limit, though - probably changing how and when we throw that error.
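For a rough sense of the numbers (assuming recommendedMaxWorkingSetSize really is about 0.75 * total RAM, as described above): on an 8GB Mac that is roughly 6GB, so the current 1.5x limit corresponds to roughly 9GB of outstanding allocations before the allocator blocks or errors, while the 4.0x value in this PR would push that to roughly 24GB and lean on swap instead.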

@awni
Member

awni commented Dec 8, 2023

I agree that a hard no-swap policy seems a little draconian, though maybe 4 * recommendedMaxWorkingSetSize is too high.

Part of the problem is that we are using this limit as a way to throttle the GPU from running too fast and gobbling memory that it won't need for a while (hence paying a swap penalty). This is a good thing for models where the working set is not too big at any point, but the overall amount of memory allocated in a single eval is.

But we should also allow allocated memory that isn't being used in the current eval to swap, i.e. if I'm just holding a bunch of arrays that aren't relevant to the computation and won't use them for a while.

So maybe a better design would be something like keeping track of the amount of memory used during the specific eval and limiting that to 1.5 * recommendedMaxWorkingSetSize?

In that case the API also has the flexibility to allow swapping by breaking a graph into multiple evals.
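To make the per-eval idea concrete, here is a minimal sketch (not mlx code; all names are hypothetical) of an allocator-side counter that only charges memory allocated during the current eval against a 1.5 * recommendedMaxWorkingSetSize cap:

```cpp
// Hypothetical sketch of per-eval memory accounting: memory held by arrays
// that are not part of the current eval is not counted, so it remains free to
// be swapped out by the OS.
#include <algorithm>
#include <cstddef>
#include <stdexcept>

class EvalMemoryTracker {
 public:
  explicit EvalMemoryTracker(size_t recommended_working_set)
      : limit_(static_cast<size_t>(1.5 * recommended_working_set)) {}

  void begin_eval() { eval_bytes_ = 0; }

  // Called for buffers allocated while an eval is running.
  void on_alloc(size_t bytes) {
    eval_bytes_ += bytes;
    if (eval_bytes_ > limit_) {
      // Real code would wait for in-flight ops to finish and release memory
      // (or encourage splitting the graph into smaller evals) before throwing.
      throw std::runtime_error("[eval] working set exceeded limit");
    }
  }

  // Called when a buffer allocated during this eval is released.
  void on_free(size_t bytes) { eval_bytes_ -= std::min(eval_bytes_, bytes); }

 private:
  size_t limit_;
  size_t eval_bytes_ = 0;
};
```

Under a scheme like this, splitting a big graph into multiple evals (as mentioned above) naturally keeps each eval's working set under the cap while still letting idle arrays swap.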

@davidjoffe
Author

davidjoffe commented Dec 8, 2023

Hi, yes, to be clear, the idea of this PR was more of a short-term fix to get it to at least work on 8GB Macs (which I think is preferable for users to failing with an error), but I agree a better solution is desirable in the longer term.

Some thoughts: I like the idea of having ways to configure the behavior (e.g. via one or more environment variables, as suggested, or perhaps other ways) and having ways to (ideally fully) control it, so users can decide what suits their situation.

I guess ideally the defaults should be something that at least works on a broad range of platforms and use cases out of the box, though that can be tricky. It could maybe warn on high memory usage (with perhaps a setting to disable warnings), and if it fails it could give a message with some guidance on how to try to solve it (e.g. what settings to try changing, unless of course they're trying to run something really huge).

Unfortunately all my Macs are 8GB and I'd love to be able to run MLX stuff even if it's swappy, but as a programmer I know well enough what I can reasonably get away with.
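As a rough sketch of the kind of actionable failure message suggested above (purely illustrative; the advice text is hypothetical and not part of mlx):

```cpp
// Hypothetical example of a friendlier allocation-failure message that points
// the user at possible workarounds instead of only reporting the failure.
#include <sstream>
#include <stdexcept>

[[noreturn]] inline void throw_allocation_error(size_t requested, size_t limit) {
  std::ostringstream msg;
  msg << "[malloc_or_wait] Unable to allocate " << requested << " bytes "
      << "(current limit: " << limit << " bytes). If this machine has limited "
      << "RAM, consider raising the memory limit (e.g. via a configurable "
      << "setting) or splitting the computation into smaller evals.";
  throw std::runtime_error(msg.str());
}
```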

@awni
Member

awni commented Dec 9, 2023

Hi @davidjoffe, I hear you; I'm just worried this change is going to slow down other use cases, which would be bad. I think if you made it an environment variable that can be set, we could merge that for now with the intention of fixing the overall strategy in the future.
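A minimal sketch of that environment-variable approach (not the actual implementation; the variable name MLX_MEM_LIMIT_FACTOR is made up for illustration):

```cpp
// Hypothetical: read an override factor from the environment, defaulting to
// the current 1.5, so users on low-RAM machines can opt in to a higher value
// (e.g. export MLX_MEM_LIMIT_FACTOR=4.0) without changing the default.
#include <cstdlib>
#include <string>

inline double mem_limit_factor() {
  if (const char* env = std::getenv("MLX_MEM_LIMIT_FACTOR")) {
    try {
      return std::stod(env);
    } catch (...) {
      // Ignore malformed values and keep the default.
    }
  }
  return 1.5;
}
```

The allocator would then compute its limit as mem_limit_factor() * recommendedMaxWorkingSetSize instead of hard-coding the multiplier.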

Merging getting sync'd with latest mlx
@awni
Member

awni commented Dec 27, 2023

I think this is going to be fixed in #292 so I am closing it.

@awni awni closed this Dec 27, 2023
@davidjoffe
Author

Hi, ok, thank you! Unfortunately I haven't had time to work on it so far, but I will definitely give it a try again once PR #292 is in.

@beverm2391

Still getting the exact same error on an M3 Pro with 18GB, except it's being thrown during the quantization in convert.py. I'm using mlx==0.0.7 via PyPI, which I think should include the changes merged in #292.

@awni
Member

awni commented Jan 4, 2024

What are you converting? I'll try to repro.

@beverm2391

beverm2391 commented Jan 4, 2024

Sorry for not including that - I'm converting Mixtral-8x7B-Instruct-v0.1 following this readme. Weirdly, every time I run it I get the same behavior: the first two tensors are converted and quantized as expected, but the third throws the same RuntimeError: [malloc]... as outlined so far.
