feat: add support for cublas/openblas in the llama.cpp backend #258
Conversation
1997bf6 to 6a185ca
Let's merge this to master as it's add-only and doesn't hurt as a starting point. I successfully built it on Colab, but I have no way to test this locally. I'll update the docs and we'll see what comes out of the bug reports.
Might be worth dropping this command in a readme; it should allow folks to test that they have a valid, detectable GPU:
docker run --gpus all --rm nvidia/cuda:10.2-base nvidia-smi
Example output showing a valid GPU:
PS C:\Users\bubth\Development\LocalAI\nvidia> docker run --gpus all --rm nvidia/cuda:10.2-base nvidia-smi
Unable to find image 'nvidia/cuda:10.2-base' locally
10.2-base: Pulling from nvidia/cuda
25fa05cd42bd: Already exists
24a22c1b7260: Already exists
8dea37be3176: Already exists
b4dc78aeafca: Already exists
a57130ec8de1: Already exists
Digest: sha256:86aba51da8781cc370350a6e30166ab2714229d505fd87f8d28ff6d3677a0ba4
Status: Downloaded newer image for nvidia/cuda:10.2-base
Tue May 16 18:56:46 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.50                 Driver Version: 531.79       CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf           Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3080 Ti     On  | 00000000:01:00.0  On |                  N/A |
| 35%   46C    P8              36W / 350W |   6131MiB / 12288MiB |      6%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=========================================================================================|
|  No running processes found                                                            |
+---------------------------------------------------------------------------------------+
PS C:\Users\bubth\Development\LocalAI\nvidia>
Good stuff! Although it seems that The following solves the issue:
Which is necessary, otherwise llama.cpp compiles without
With
Good catch @Thireus, thanks! Do you also have a GPU at hand so you can test this out? Also, do you feel like taking a stab at fixing it? Otherwise I'll have a look soon.
Hey there!
I've run into a couple of issues:
- name: gpt-3.5-turbo
  parameters:
    model: Manticore-13B.ggmlv3.q4_0.bin
    temperature: 0.3
  context_size: 2048
  threads: 6
  backend: llama
  stopwords:
  - "USER:"
  - "### Instruction:"
  roles:
    user: "USER:"
    system: "ASSISTANT:"
    assistant: "ASSISTANT:"
  gpu_layers: 40
Using the provided YAML, like in the model-gallery, yields the error
Cheers!
Depends on: go-skynet/go-llama.cpp#51
See upstream PR: ggerganov/llama.cpp#1412
Allows building LocalAI with the llama.cpp backend with cublas/openblas:

Cublas
To build, run:
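The original code block was not preserved in this page. As a minimal sketch, assuming the Makefile exposes a BUILD_TYPE variable that selects the cuBLAS-enabled llama.cpp build (an assumption, not taken from this PR):

  # assumption: BUILD_TYPE=cublas toggles the cuBLAS build of the llama.cpp backend
  make BUILD_TYPE=cublas build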
OpenBLAS
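The build step for this variant is likewise not preserved; a sketch under the same assumption, with openblas as the BUILD_TYPE value:

  # assumption: BUILD_TYPE=openblas links the llama.cpp backend against OpenBLAS
  make BUILD_TYPE=openblas build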
To set the number of GPU layers, in the config file:
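The original snippet is not preserved here either; for illustration, this mirrors the gpu_layers field from the model config posted earlier in the thread (model name and layer count are simply taken from that example):

  name: gpt-3.5-turbo
  parameters:
    model: Manticore-13B.ggmlv3.q4_0.bin
  gpu_layers: 40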
This also drops the "generic" build type, as I'm sunsetting it in favor of specific cmake parameters
Related to: #69