Add support for full CUDA GPU offloading #105
Conversation
@deadprogram tested locally here, now seems to work!
Signed-off-by: mudler <mudler@mocaccino.org>
Bumps [llama.cpp](https://github.com/ggerganov/llama.cpp) from `2347e45` to `bed9275`.
- [Release notes](https://github.com/ggerganov/llama.cpp/releases)
- [Commits](ggerganov/llama.cpp@2347e45...bed9275)

updated-dependencies:
- dependency-name: llama.cpp
  dependency-type: direct:production

Signed-off-by: dependabot[bot] <support@github.com>
Looks like f16 is not respected and the model still loads with f32. Needs a closer look.
Something is off.
Interestingly enough, this only happens for me when, let's say, I spin up a clean LocalAI instance and then run AutoGPT first thing. I don't get it when I initialize the model by starting a chat via chatbot-ui first and then use it with AutoGPT.
Here is one of the random crashes right after init:
GOLLAMA_VERSION?=a52ae7a66ae7fa42fd29f0bca9480c5c198feff9
Some other models that I tried behaved similarly weirdly, always around init, but I never hit the FP16 issue there.
I've been debugging this with @lu-zero (thanks!) and it seems to be around the pass-by-value in llama_init_from_file. Somehow, when compiled with the binding, the struct copy gets mangled and the booleans are shuffled. That leads to f16, memlock, or mmap not being respected. Tested with GCC 11 and GCC 12.
Especially with Golang bindings, calling by value has the side effect of struct values not being copied correctly. This has been observed with the bindings in go-skynet/go-llama.cpp#105.
This is needed until ggerganov/llama.cpp#1902 is addressed/merged. Signed-off-by: mudler <mudler@mocaccino.org>
Signed-off-by: mudler <mudler@mocaccino.org>
With the patch and this PR:
The patch has been submitted to llama.cpp. Meanwhile, applying the patch manually here to unblock updates.
Fix upstreamed in: ggerganov/llama.cpp#1936
Great work @mudler, thanks for all the effort!
Special thanks to @chnyda, who gave me access to a CUDA GPU to test this out. And of course, to llama.cpp for providing CUDA support!