Issue regarding falcon-7b quantized #728
Comments
Same here. I don't think the segmentation fault is related to the memory size.
What exact file are you using?
I tried all the files from TheBloke's repository and the gpt4all-falcon file. I think the MPT and Falcon models do not work, while GPT4All does. I think this GitHub repository is not maintained properly. Obviously we can only use MPT or Falcon, but not llama or gpt4all, due to license issues, so talking about llama and gpt4all under K8s is meaningless: since those models are only for personal work or research, there is no use for K8s there. :p
Please file issues for the problems you find - this is how it works. If you keep what works and what doesn't to yourself, things will never get fixed.
You are wrong here; there are OpenLLaMA-based models that can be used freely, and gpt4all models based on GPT-J. MPT with gpt4all should work. I haven't tried Falcon or MPT recently, as I'm busy with #726, but I think the model you are trying is not the one I tried - that one looks somewhat newer.
I had a quick look at the current state, and it seems most of the work to support Falcon went into ggllm.cpp. I quickly gave creating bindings a shot, and it seems to work with wizardlm-uncensored: https://github.com/mudler/go-ggllm.cpp - I will integrate it into LocalAI soon; that should give support for 7b and 40b at least, plus GPU support.
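For context, a binding layer like go-ggllm.cpp typically wraps the C++ inference code behind a small Go API. The thread doesn't show the actual API, so the sketch below is purely illustrative: the `New` and `Predict` names and the option-free signatures are assumptions, not the real go-ggllm.cpp interface.

```go
package main

// Hypothetical usage sketch of Go bindings around a ggllm.cpp-style backend.
// The function names (ggllm.New, model.Predict) are illustrative placeholders,
// not the actual go-ggllm.cpp API.

import (
	"fmt"
	"log"

	ggllm "github.com/mudler/go-ggllm.cpp" // assumed import path
)

func main() {
	// Load a quantized Falcon model file (path is an example).
	model, err := ggllm.New("falcon7b-instruct.ggmlv3.q8_0.bin")
	if err != nil {
		log.Fatalf("failed to load model: %v", err)
	}

	// Run a single completion.
	out, err := model.Predict("What is Kubernetes?")
	if err != nil {
		log.Fatalf("prediction failed: %v", err)
	}
	fmt.Println(out)
}
```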
I'm having a closer look at it this weekend; a quick first attempt seems to work here with falcon-7b. I'm looking into refactoring the backends first to get rid of some hacks, but this shouldn't take long.
Now master should have Falcon support. I've also kept the old ggml implementation as a fallback backend. Note: you need to be extra careful to have a matching prompt. Without it the model hallucinates pretty quickly.
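To illustrate what a "matching prompt" means here: instruct-tuned Falcon variants are trained on a specific prompt layout, and feeding them plain text tends to produce hallucinations. A minimal Go sketch of rendering such a template follows; the exact layout must be taken from the model card, and the "### Instruction / ### Response" format below is only an assumption based on common WizardLM-style models.

```go
package main

import (
	"os"
	"text/template"
)

// promptTmpl mirrors the kind of instruction template applied before handing
// text to the backend. The layout is model-specific and must match the model
// card; this WizardLM-style format is only an example.
const promptTmpl = `### Instruction:
{{.Input}}

### Response:
`

func main() {
	t := template.Must(template.New("prompt").Parse(promptTmpl))
	// Render the user input into the layout the model was trained on.
	if err := t.Execute(os.Stdout, struct{ Input string }{"What is Kubernetes?"}); err != nil {
		panic(err)
	}
}
```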
LocalAI version:
LocalAI v1.20.1-dirty (92614b9)
Environment, CPU architecture, OS, and Version:
Linux t14s 6.4.1-arch1-1 #1 SMP PREEMPT_DYNAMIC Sat, 01 Jul 2023 16:17:21 +0000 x86_64 GNU/Linux
Describe the bug
Running LocalAI with falcon7b-instruct.ggmlv3.fp16.bin from TheBloke runs out of memory with 16GB of RAM (the fp16 weights of a 7B model alone are roughly 14GB, so this is expected to be tight). So I tried falcon7b-instruct.ggmlv3.q8_0.bin, which works with a little less RAM but seg-faults the backend.

To Reproduce
falcon-7b
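The repro step is terse, so here is a minimal sketch of exercising the model through LocalAI's OpenAI-compatible API, assuming LocalAI is listening on its default localhost:8080 and the model is registered as "falcon-7b" (both assumptions based on this thread, not confirmed in it):

```go
package main

// Minimal repro sketch: send one completion request to a local LocalAI
// instance through its OpenAI-compatible API. Assumes LocalAI listens on
// the default localhost:8080 and the model is registered as "falcon-7b".

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	body, _ := json.Marshal(map[string]any{
		"model":  "falcon-7b",
		"prompt": "Hello",
	})

	resp, err := http.Post("http://localhost:8080/v1/completions",
		"application/json", bytes.NewReader(body))
	if err != nil {
		log.Fatalf("request failed: %v", err)
	}
	defer resp.Body.Close()

	// With the q8_0 model, the backend crashes here instead of answering.
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status, string(out))
}
```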
Expected behavior
To not seg fault.
Logs
Additional context