Support broken on old Intel/AMD CPUs #25
Your Sandy Bridge CPU is something we intend to support and I'm surprised it's not working. I have a ThinkPad I bought back in 2011 which should help me get to the bottom of this. I'll update this issue as I learn more. If there are any clues you can provide in the meantime about which specific instruction is faulting, and what its address is in memory, then it'd help if you shared them too.
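For anyone hitting this on Linux, one low-effort way to recover that address (a generic sketch, not llamafile-specific tooling) is the kernel log, which records the instruction pointer when a process dies on an illegal opcode:

```sh
# After the crash, look for the kernel's SIGILL report; the ip: field is the
# address of the faulting instruction (exact format varies by kernel version).
sudo dmesg | grep -i 'invalid opcode' | tail -n 1
# typical shape: traps: server[PID] trap invalid opcode ip:ADDR sp:ADDR error:0 in ...
```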
I have a similar result:
$ ./llamafile-server-0.1-llava-v1.5-7b-q4
My CPU is an i3:
Model name: Intel(R) Core(TM) i3-2120T CPU @ 2.60GHz
gdb can't load the cosmo binary. I tried with lldb as suggested in another issue, but it doesn't start:
lldb -o "run" ~/.ape-1.9 ./llamafile-server-0.1-llava-v1.5-7b-q4
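Since the i3-2120T is Sandy Bridge (AVX, but no AVX2, FMA, or F16C), it can also help to attach which vector extensions the CPU actually advertises. A minimal check on Linux, assuming /proc/cpuinfo is available:

```sh
# List the SIMD-related feature flags reported for this CPU; any extension the
# binary emits that isn't in this list is a candidate for the illegal instruction.
grep -o 'avx[^ ]*\|fma\|f16c\|ssse3\|sse4[^ ]*' /proc/cpuinfo | sort -u
```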
Try using llamafile's --ftrace flag.
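A sketch of how that might be invoked, assuming the trace is written to stderr as with other Cosmopolitan binaries:

```sh
# Trace function calls and keep only the tail, so the last functions entered
# before the crash are visible.
./llamafile-server-0.1-llava-v1.5-7b-q4 --ftrace 2>&1 | tail -n 40
```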
Last few lines when running with --ftrace:
FUN 485044 485059 57'262'404'097 592 &ggml_cuda_compute_forward
I just cloned the repository, compiled it, and ran it with the gguf files from the binary, and it behaves the same as the binary release. I ran it with:
llamafile$ o//llama.cpp/server/server -m models/llava-v1.5-7b-Q4_K.gguf --mmproj models/llava-v1.5-7b-mmproj-Q4_0.gguf
So I believe I can try your patch when it's available :)
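As an aside, llamafile binaries are also valid zip archives, so the embedded GGUF weights can be listed and pulled out of the release binary for exactly this kind of source-build test. A sketch (the member names inside the archive are assumptions and may differ):

```sh
# List and extract the embedded GGUF weights from the release binary.
unzip -l llamafile-server-0.1-llava-v1.5-7b-q4 | grep -i gguf
unzip llamafile-server-0.1-llava-v1.5-7b-q4 '*.gguf' -d models/
```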
I just realized I can use
One last thing. If you notice llamafile performing poorly compared to llama.cpp upstream on your CPU model, then I consider that a bug and I'd ask anyone who experiences that to please file an issue, so I can address it. Thanks!
Thanks!
Out of curiosity, (1) what weights are you using, and (2) do you know if it's just as slow as llama.cpp upstream?
I'm running with "llava-v1.5-7b-q4"
Glad to know you're back in business. I love fast the most, but even a slow LLM is useful if you're doing backend work. That's one of the reasons I'm happy to support you.
I have an Intel 2500K (overclocked to 4.2 GHz) and it's much slower than llama.cpp using mistral-7b-instruct-v0.1.Q5_K_M.gguf and this prompt: "Can you explain why the sky is blue?" With llama.cpp (-n 128 -m mistral-7b-instruct-v0.1.Q5_K_M.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -ins):
With llamafile-server-0.2 -m mistral-7b-instruct-v0.1.Q5_K_M.gguf
EDIT: updated llama.cpp to the latest version
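To make the before/after comparison easier to eyeball, one rough approach is to run the same prompt through both builds and keep only the timing summary lines. This is just a sketch: it assumes both binaries print the usual llama_print_timings block, and the o//llama.cpp/main/main path is inferred by analogy with the server path mentioned above.

```sh
# Upstream llama.cpp
./main -m mistral-7b-instruct-v0.1.Q5_K_M.gguf -n 128 \
  -p "Can you explain why the sky is blue?" 2>&1 | grep llama_print_timings

# llamafile built from source (same weights, same prompt)
o//llama.cpp/main/main -m mistral-7b-instruct-v0.1.Q5_K_M.gguf -n 128 \
  -p "Can you explain why the sky is blue?" 2>&1 | grep llama_print_timings
```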
@euh Are you running on Linux? If so, could you follow the source instructions in the readme on how to
That's exactly what I needed. Looks like this quant hasn't been ported yet. Let me try and do that now.
@euh You just got your performance back. Build at HEAD and your Q5 weights will potentially go 2.5x faster.
Yes, better (using server):
Thanks.
Thanks so much for all the great work around this issue.
The Q5 perf improvement hasn't made it into a release yet. You have to build llamafile yourself right now using the Source Instructions in the README.
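For anyone else landing here before the next release, the source build is roughly the following. This is a sketch assuming Linux with git and GNU make; the Source Instructions in the README are the authoritative steps.

```sh
git clone https://github.com/Mozilla-Ocho/llamafile
cd llamafile
make -j8                                     # binaries land under o//
o//llama.cpp/server/server -m mistral-7b-instruct-v0.1.Q5_K_M.gguf
```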
Hi,
lscpu gives.....
after
Output is....
llama.log is