Models not responding (No AVX Support) #88
I'm actually starting to guess it might be because of the lack of AVX support on the CPUs I'm using, which would be causing this problem. |
I am seeing the problem using an M1 Max MacBook Pro / Ventura 13.3.1 / Docker 4.17.0 (99724).
|
Exactly the same issue on an M1 Pro 32 GB, Docker version 20.10.24, build 297e128, Ventura 13.3.1 (22E261). Tried with different models; just nothing happens: no errors, no timeouts. |
@JeshMate, this could very well be the case. I ran the same container on two different hosts, one with AVX and another without it; the one with AVX works, the other one does not.
|
I did see that there are ways to run it without AVX, but it does consume a lot more resources and also requires rebuilding the binaries behind it. I have no idea what settings to use, and hopefully the repo host could guide us in the right direction. |
Hey all 👋 very good detective work here! Docs are lacking here, so my bad. Until we get that fixed, let me give some hints:
|
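For anyone who lands here: rebuilding without AVX generally means disabling the AVX code paths in the bundled llama.cpp backend. A minimal sketch, assuming the Makefile forwards CMAKE_ARGS to the backend build (the LLAMA_* names are llama.cpp CMake options; check the build docs for your version):

```sh
# Rebuild with the AVX/AVX2/FMA instruction paths compiled out of the
# llama.cpp backend, so the binary runs on CPUs without those flags.
# Whether the Makefile forwards CMAKE_ARGS like this is an assumption;
# expect noticeably slower inference without these instructions.
CMAKE_ARGS="-DLLAMA_AVX=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF" make build
```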
Okie dokie! Just wondering, because I didn't really want to use Docker regardless for something like this, I built a local binary. If nothing seems to be working, I'm happy to wait until the docs get updated to support environments like this, so we've got more of an understanding of how stuff like this can be used. Fantastic work on this by the way! P.S. Didn't mean to close the thread lol, still new to GitHub. |
After using make build then make run:

I also tried using go-gpt4all-j, go-gpt2 and go-llama as the model in the above curl. I then tried downloading the https://gpt4all.io/models/ggml-gpt4all-j.bin model, then running make build and make run, but it still shows 'model does not exist'. |
@pobmob to run locally, you have to specify the models path, or change directory to /models before running.
|
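A minimal sketch of both options (the --models-path flag name is an assumption based on the LocalAI docs; check --help on your build):

```sh
# Option 1: point the binary at the directory holding the .bin files
# (flag name assumed; verify against your build's --help output).
./local-ai --models-path ./models

# Option 2: change into the models directory before starting the server.
cd models && ../local-ai
```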
Thanks @MartyLake
Got things working for me, although I still noticed the model had 'unexpectedly reached end of file' during the make build process.
Even with the error above, I am now getting a response using curl.
Anybody know where the task "need to make a list of all the cities in the world" came from? Has it just made that up/hallucinated? |
I am using the ggml-alpaca-7b-q4 model now, and for my usage it works really well on my M1 Max 32GB MacBook Pro. |
I'm confused about whether different installation instructions need to be added to the README for AVX vs. non-AVX, or if we need to wait for a fix. I'm having what I believe to be a similar issue.
Oddly, in my case it worked for a bit until I tried a large payload, and now after restarting and recreating the container it won't work at all. I'm hoping it's related to this issue, but I can start a new issue if it turns out it isn't. |
I'm running this on a K3s v1.25.5+k3s1 cluster, on VMware. I checked whether the CPUs dedicated to the VM support AVX. And the output:
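(The check itself was trimmed above; on Linux it typically looks like this:)

```sh
# List the AVX-related flags the kernel reports for the vCPUs;
# empty output means AVX is not exposed to the guest.
grep -o 'avx[^ ]*' /proc/cpuinfo | sort -u
```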
So the answer is yes. However, executing e.g.:

```sh
curl http://local-ai.ai-llm.svc.cluster.local:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "ggml-gpt4all-j.bin",
  "messages": [{"role": "user", "content": "How are you?"}],
  "temperature": 0.9
}'
```

towards the Local-AI Pod gives me no answer. I've waited for more than 10 minutes over several tries. I also see the: ....

Other notes:
For good measure I tried on a cluster with 8 vCPUs dedicated and doubled up on RAM on the workers. And the result: bad. Couldn't even install. So.... Hmm, I've not always been impressed with the IOPS that the storage I was using delivers. Therefore I created a new volume. Result: I could now install the Local-AI Helm chart. Testing with:

```sh
curl http://local-ai.ai-llm.svc.cluster.local:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "ggml-gpt4all-j.bin",
  "messages": [{"role": "user", "content": "How are you?"}],
  "temperature": 0.9
}'
```

I see the Local-AI process is now maxing out the 8 vCPUs over 15 threads. But NO answer after having waited for more than 10 minutes. I also executed .... I therefore enabled debug mode on the deployment (note the --debug arg below). So:

```yaml
spec:
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app.kubernetes.io/instance: local-ai
app.kubernetes.io/name: local-ai
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
creationTimestamp: null
labels:
app.kubernetes.io/instance: local-ai
app.kubernetes.io/name: local-ai
name: local-ai
spec:
containers:
- args:
- --debug
command:
- /usr/bin/local-ai
....
....
....
....
```

And here's the result >> still hanging and no response. Looked into my options in regards to how the API can be requested, so I tried a simpler query:

```sh
curl http://local-ai.ai-llm.svc.cluster.local:8080/v1/completions -H "Content-Type: application/json" -d '{
"model": "ggml-gpt4all-j.bin",
"prompt": "How are you?",
"temperature": 0.7
}'
```

However, the picture remains: no response and full CPU exhaustion on the worker. Trying with more tokens:

```sh
curl http://local-ai.ai-llm.svc.cluster.local:8080/v1/completions -H "Content-Type: application/json" -d '{
"model": "ggml-gpt4all-j.bin",
"prompt": "How are you?",
"temperature": 0.7,
"max_tokens": 1536
}'
```

Result: waited for more than 10 minutes ... no answer.

Other notes: I've experienced that the .... For good measure, here are the ....
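For anyone reproducing this: with --debug enabled, the pod logs are the first place to check whether the request ever reaches the backend. A sketch (the ai-llm namespace is inferred from the service URL above; the deployment name is an assumption):

```sh
# Follow the Local-AI pod logs while re-sending the curl request.
kubectl -n ai-llm logs -f deployment/local-ai
```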
I really hope this can be fixed / get working ... as this really has promise and is exciting. Thank you very much to you all. 🥇 |
I'm getting the same issue, but the CPU I'm running on does support AVX/AVX2, just not AVX512. Does this require AVX2 or AVX512 explicitly? Intel has removed AVX512 support on consumer CPUs. |
CPU: Ryzen 5 4600G with AVX2, plain build (no Docker). According to CPU usage it does something; waited for 10 min but got no response. |
The issue with @LarsBingBong was resolved over Discord and was thread overbooking. Lowering the number of threads to the physical core count was the fix in his case - can you try specifying a lower number of threads? |
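For example, with the Docker image the thread count can be capped like this (the THREADS variable is taken from the project README of the time; the image path is the project's quay.io registry):

```sh
# Cap LocalAI at 4 threads - match physical cores, not hyper-threads.
# THREADS as an environment variable is assumed from the README.
docker run -p 8080:8080 -e THREADS=4 -v "$PWD/models:/models" \
  quay.io/go-skynet/local-ai:latest
```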
The issue is fixed by the latest 1.5.1 Docker image for me :) |
Does anyone know how to speed things up? Should I bump up the instance type; would it help? |
That is definitely too much, this is my output:
I also get that exact error about the unexpected end of file. |
The error on the llama.cpp backend happens on the first load of the model. If you don't specify a backend in the model config file, the first load is greedy and will try to load the model with each of the backends (see https://github.com/go-skynet/LocalAI#advanced-configuration on how to do that). I think the real problem here is that you have 4 vCPUs, so you should lower the threads to at most 4. But I wouldn't expect fast responses on small droplets; you should probably go for rwkv models instead.
|
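A hedged sketch of such a config file, following the advanced-configuration section linked above (the backend identifier and exact field names are assumptions; check the README for your version):

```sh
# Pin the backend and thread count for a model by dropping a YAML
# config next to it, so the greedy first-load across backends is skipped.
cat > models/gpt4all-j.yaml <<'EOF'
name: gpt4all-j
backend: gptj        # assumed identifier; see the linked README section
parameters:
  model: ggml-gpt4all-j.bin
  temperature: 0.9
threads: 4           # match the droplet's 4 vCPUs
EOF
```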
Closing this as not relevant anymore. Many things have changed in the meantime, and support for old CPUs with missing flags is now documented in the build section of our docs: https://localai.io/basics/build/
Not sure if I'm doing something wrong, but when I send a request through curl to the API, it does this:
![image](https://user-images.githubusercontent.com/33387060/234503150-f63c10aa-aff8-4b57-a393-b2c937f1889e.png)
It doesn't go past this whatsoever. I'm new to this whole thing; so far I built the binary by itself, but the same thing would happen in Docker too.
If there's anything that needs to be supplied, let me know.