Prevent malloc failure on 8GB Macs #64
Conversation
Quick experiment to see if this helps fix malloc error
Experimenting with values here to work on 8GB RAM Mac
@jagrit06 what was the rationale for setting this to 1.5x in the first place? I have no idea what the performance implications of changing it to 4.0 are, but we should test it and understand that better. Then we can either merge this or come up with a more sophisticated method for setting it (or maybe make it a configurable environment variable).
Have a nice day, sir.
So, the rationale for setting it to 1.5 was that the recommended working set is usually 0.75 * the total RAM available on the system. We wanted that to be the point where we basically block until one operation is done, so its memory can be reused before scheduling a new one. We will probably need something a little more sophisticated than just upping the memory pressure limit, though — most likely changing how and when we throw that error.
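The heuristic described above can be sketched as a small helper (a purely illustrative Python sketch; the names and structure are hypothetical, not MLX's actual C++ implementation):

```python
def block_limit(total_ram_bytes, working_set_fraction=0.75, multiplier=1.5):
    """Point at which new allocations wait for in-flight ops to free memory.

    Per the comment above: recommended working set ~= 0.75 * total RAM,
    and the allocator blocks at multiplier * working_set (1.5 in the
    original code; this PR experiments with 4.0).
    """
    working_set = working_set_fraction * total_ram_bytes
    return multiplier * working_set

GiB = 1 << 30
# On an 8 GB Mac: working set ~= 6 GiB, so the 1.5x limit is 9 GiB.
limit = block_limit(8 * GiB)
```

With the PR's 4.0 multiplier the limit on the same machine would be 24 GiB, which is why allocations stop failing but may instead pay a swap penalty.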
I agree, though, that a hard no-swap policy seems a little draconian. Maybe part of the problem is that we are using this limit as a way to throttle the GPU from running too fast and gobbling memory that it won't need for a while (hence paying a swap penalty). This is a good thing for models where the working set is not too big at any point, but the overall amount of memory allocated in a single eval is large. But we also should allow allocated memory (that isn't being used in the current eval) to be swapped. So maybe a better design would be something like keeping track of the amount of memory used during the specific eval. In that case the API also has the flexibility to allow swapping by breaking a graph into multiple evals.
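A rough sketch of that design (again purely illustrative Python, with hypothetical names): track only the bytes allocated within the current eval, block new allocations once that eval's footprint would cross its limit, and reset the counter between evals so memory surviving an eval is free to be swapped:

```python
import threading

class EvalMemoryGate:
    """Throttle allocations per eval instead of via a global no-swap cap."""

    def __init__(self, eval_limit_bytes):
        self.limit = eval_limit_bytes
        self.in_use = 0  # bytes allocated during the current eval
        self.cv = threading.Condition()

    def alloc(self, nbytes):
        with self.cv:
            # Block until scheduling this allocation keeps the current eval
            # under its limit (i.e. wait for earlier ops to free memory).
            # If nothing is outstanding, proceed even for oversized requests.
            while self.in_use + nbytes > self.limit and self.in_use > 0:
                self.cv.wait()
            self.in_use += nbytes

    def free(self, nbytes):
        with self.cv:
            self.in_use -= nbytes
            self.cv.notify_all()

    def end_eval(self):
        # Memory surviving past the eval no longer counts toward throttling;
        # the OS is free to swap it out.
        with self.cv:
            self.in_use = 0
            self.cv.notify_all()
```

Breaking a graph into multiple evals then maps naturally onto `end_eval()` calls, which is the flexibility mentioned above.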
Hi, yes. To be clear, the idea of this PR was more of a short-term fix to get MLX to at least work on 8GB Macs, as that's preferable for users to failing with an error, I think. But I agree a better solution is desirable in the longer term. Some thoughts: I like the idea of having ways to configure the behavior (e.g. via one or more environment variables, as suggested, or perhaps other means), and ways to (ideally fully) control it, so users can decide what suits their situation. Ideally the defaults should be something that at least works on a broad range of platforms and use cases out of the box, though that can be tricky. It could maybe warn on high memory usage (with a setting to disable warnings), and if it fails, it could give a message with some guidance on how to try to solve it (e.g. what settings to change — unless, of course, they're trying to run something really huge). Unfortunately all my Macs are 8GB, and I'd love to be able to run MLX stuff even if it's swappy; as a programmer I know well enough what I can reasonably get away with.
Hi @davidjoffe. I hear you; I'm just worried this change is going to slow down other use cases, which would be bad. I think if you made it an environment variable that can be set, we could merge that for now, with the intention of fixing the overall strategy in the future.
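One possible shape for that environment-variable escape hatch (a sketch only — the variable name `MLX_MEM_LIMIT_MULTIPLIER` is hypothetical, not an actual MLX setting):

```python
import os

DEFAULT_MULTIPLIER = 1.5  # the current hard-coded value

def memory_limit_multiplier():
    """Read the limit multiplier from the environment, falling back to 1.5.

    Invalid or non-positive values fall back to the default rather than
    crashing the allocator at startup.
    """
    raw = os.environ.get("MLX_MEM_LIMIT_MULTIPLIER")
    if raw is None:
        return DEFAULT_MULTIPLIER
    try:
        value = float(raw)
    except ValueError:
        return DEFAULT_MULTIPLIER
    return value if value > 0 else DEFAULT_MULTIPLIER
```

An 8GB-Mac user could then opt in with `MLX_MEM_LIMIT_MULTIPLIER=4.0` while everyone else keeps the existing behavior.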
Merging: getting synced with the latest mlx
I think this is going to be fixed in #292, so I am closing it.
Hi, ok, thank you! I haven't had time to work on it so far, unfortunately, but will definitely give it a try again once PR #292 is in.
Still getting the exact same error on an M3 Pro 18GB, except it's being thrown during the quantization in
What are you converting? I'll try to repro.
Sorry for not including that. I'm converting Mixtral-8x7B-Instruct-v0.1, following this readme. Weirdly, every time I run it I get the same behavior: the first two tensors are converted and quantized as expected, but the third throws the same
Hi,
This pull request addresses the issue below (the error `std::runtime_error: [malloc_or_wait] Unable to allocate`, which I had been encountering while running the official stable diffusion example on an 8GB M2 Mac Mini). Please see the issue for a more detailed explanation:
#63
Effectively this should help stop the metal::allocator malloc from failing on low-RAM Macs (e.g. those with 8GB of RAM), though this behavior likely needs further testing and refinement.