GPU support crashing on Linux in 0.2 releases #39
Comments
There's no |
Sure, here it is attached; however, after a quick diff it seems to be identical to this repo's llama.cpp/ggml-cuda.cu file (and in fact, there's a |
You are correct about line 6006. Apologies for any confusion. So here's what's failing:

CUDA_CHECK(cudaMalloc((void **) &ptr, look_ahead_size));

Looks like you're running out of GPU memory. But you said you're not passing the |
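For anyone hitting this, here is a minimal standalone sketch of that out-of-memory hypothesis (not llamafile's actual code; the CUDA_CHECK macro and the look_ahead_size value below are stand-ins for the ones in ggml-cuda.cu). It can be compiled with nvcc and run while the model is loaded to see how much VRAM is actually free before an allocation of this kind is attempted:

```cuda
// Sketch only: reproduce the error-checking pattern around cudaMalloc and
// compare the requested size against free VRAM. Not llamafile's real code.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",              \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            exit(1);                                                  \
        }                                                             \
    } while (0)

int main() {
    size_t free_bytes = 0, total_bytes = 0;
    CUDA_CHECK(cudaMemGetInfo(&free_bytes, &total_bytes));
    printf("VRAM: %zu MiB free of %zu MiB total\n",
           free_bytes >> 20, total_bytes >> 20);

    // Illustrative size only; in ggml-cuda.cu the failing call allocates
    // look_ahead_size bytes for its scratch buffer.
    size_t look_ahead_size = 512ull << 20;  // 512 MiB
    if (look_ahead_size > free_bytes) {
        fprintf(stderr, "request of %zu MiB exceeds free VRAM: cudaMalloc would fail\n",
                look_ahead_size >> 20);
        return 1;
    }
    void *ptr = nullptr;
    CUDA_CHECK(cudaMalloc(&ptr, look_ahead_size));
    CUDA_CHECK(cudaFree(ptr));
    puts("allocation succeeded");
    return 0;
}
```

On a mobile GPU with limited VRAM, the free figure reported here often drops sharply once the model weights are resident, which is consistent with the allocation above failing only at prompt time.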
Also, do you know if this happens if you use llama.cpp upstream?
I am getting the same error on my system. My system is similar: I also have a mobile Nvidia GPU.
Some sysinfo:
For me, it does not happen if I use Mistral with llama.cpp.
It does not happen with llama.cpp, both |
I just tried the earlier releases and everything works fine in v0.1, but it breaks in v0.2, so some breaking change must have been introduced there.
I'm reasonably certain if you pass the |
Works for me.
Great! I'll update all the llamafiles on HuggingFace so their |
Yup, that works, thank you!
When I run llamafile on my system, the model loads fine into my GPU VRAM; however, whenever I try to send a prompt, llamafile crashes with the following error:
This error happens even when -ngl is not set.
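For reference, a small standalone probe along these lines (a sketch of what such a check could look like, not code from this issue) lists the visible CUDA devices and how much VRAM each reports, which helps judge whether an out-of-memory failure is plausible on this hardware:

```cuda
// Sketch of a standalone GPU probe (illustrative only, not part of llamafile):
// enumerate visible CUDA devices and report name, compute capability, and VRAM.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        // A failure here usually points at the driver/toolkit, not llamafile.
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, i) != cudaSuccess) continue;
        printf("GPU %d: %s (compute %d.%d), %zu MiB VRAM\n",
               i, prop.name, prop.major, prop.minor,
               (size_t) prop.totalGlobalMem >> 20);
    }
    return 0;
}
```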
Here is some info about my system: