feat: add support for cublas/openblas in the llama.cpp backend #258

mudler · 2023-05-14T20:17:42Z

See upstream PR: ggerganov/llama.cpp#1412

Allows to build LocalAI with the llama.cpp backend with cublas/openblas:

Cublas

To build, run:

make BUILD_TYPE=cublas CUDA_LIBPATH=.... build

OpenBLAS

make BUILD_TYPE=openblas build

To set the number of GPU layers, in the config file:

gpu_layers: 4

This also drops the "generic" build type, as I'm sunsetting it in favor of specific cmake parameters

Related to: #69

mudler · 2023-05-16T14:25:56Z

Let's merge this to master as it add-only and doesn't hurt as a starting point. I successfully built it on colab, but no way to test this locally. I'll update the docs and let see out of bug reports.

bubthegreat · 2023-05-16T18:58:27Z

Might be worth dropping this command in a readme that should allow folks to test that they have a valid detectable GPU:

docker run --gpus all --rm nvidia/cuda:10.2-base nvidia-smi

Example output showing a valid GPU:

PS C:\Users\bubth\Development\LocalAI\nvidia> docker run --gpus all --rm nvidia/cuda:10.2-base nvidia-smi
Unable to find image 'nvidia/cuda:10.2-base' locally
10.2-base: Pulling from nvidia/cuda
25fa05cd42bd: Already exists
24a22c1b7260: Already exists
8dea37be3176: Already exists
b4dc78aeafca: Already exists
a57130ec8de1: Already exists
Digest: sha256:86aba51da8781cc370350a6e30166ab2714229d505fd87f8d28ff6d3677a0ba4
Status: Downloaded newer image for nvidia/cuda:10.2-base
Tue May 16 18:56:46 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.50                 Driver Version: 531.79       CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3080 Ti      On | 00000000:01:00.0  On |                  N/A |
| 35%   46C    P8               36W / 350W|   6131MiB / 12288MiB |      6%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
PS C:\Users\bubth\Development\LocalAI\nvidia>

Thireus · 2023-05-19T07:43:20Z

Good stuff!

Although it seems that BUILD_TYPE=cublas doesn't automatically pass LLAMA_CUBLAS=1 to llama.cpp make.

The following solves the issue:

make BUILD_TYPE=cublas LLAMA_CUBLAS=1 build

Which is necessary, otherwise llama.cpp compiles without -DGGML_USE_CUBLAS as seen below.

make -C llama.cpp llama.o
make[2]: Entering directory '/home/thireus/LocalAI/go-llama/llama.cpp'
I llama.cpp build info: 
I UNAME_S:  Linux
I UNAME_P:  x86_64
I UNAME_M:  x86_64
I CFLAGS:   -I.              -O3 -std=c11   -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -march=native -mtune=native
I CXXFLAGS: -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native
I LDFLAGS:  
I CC:       cc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
I CXX:      g++ (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0

g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -c llama.cpp -o llama.o

With LLAMA_CUBLAS=1:

make[1]: Entering directory '/home/thireus/LocalAI/go-llama/llama.cpp'
I llama.cpp build info: 
I UNAME_S:  Linux
I UNAME_P:  x86_64
I UNAME_M:  x86_64
I CFLAGS:   -I.              -O3 -std=c11   -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -march=native -mtune=native -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include
I CXXFLAGS: -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include
I LDFLAGS:  -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64
I CC:       cc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
I CXX:      g++ (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0

g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -c llama.cpp -o llama.o

mudler · 2023-05-19T09:08:59Z

Good stuff!

Although it seems that BUILD_TYPE=cublas doesn't automatically pass LLAMA_CUBLAS=1 to llama.cpp make.

The following solves the issue:

make BUILD_TYPE=cublas LLAMA_CUBLAS=1 build

Which is necessary, otherwise llama.cpp compiles without -DGGML_USE_CUBLAS as seen below.

make -C llama.cpp llama.o
make[2]: Entering directory '/home/thireus/LocalAI/go-llama/llama.cpp'
I llama.cpp build info: 
I UNAME_S:  Linux
I UNAME_P:  x86_64
I UNAME_M:  x86_64
I CFLAGS:   -I.              -O3 -std=c11   -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -march=native -mtune=native
I CXXFLAGS: -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native
I LDFLAGS:  
I CC:       cc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
I CXX:      g++ (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0

g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -c llama.cpp -o llama.o

With LLAMA_CUBLAS=1:

make[1]: Entering directory '/home/thireus/LocalAI/go-llama/llama.cpp'
I llama.cpp build info: 
I UNAME_S:  Linux
I UNAME_P:  x86_64
I UNAME_M:  x86_64
I CFLAGS:   -I.              -O3 -std=c11   -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -march=native -mtune=native -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include
I CXXFLAGS: -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include
I LDFLAGS:  -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64
I CC:       cc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
I CXX:      g++ (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0

g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -c llama.cpp -o llama.o

good catch @Thireus! thanks! - do you also have a GPU at hand so you can test this out? also, do you feel taking a stab at fixing it? otherwise I'll have a look soon

@Thireus

Thanks to @Thireus for noticing it. See: #258

@Thireus

Thanks to @Thireus for noticing it. See: #258

@Thireus

Thanks to @Thireus for noticing it. See: #258

ghost · 2023-05-22T21:37:55Z

Hey there!
I was able to build localai from an nvidia/cuda image, modifying the Dockerfile to install golang like this:

ARG GO_VERSION=1.20.4
ARG BUILD_TYPE=cublas
FROM nvidia/cuda:12.1.1-cudnn8-devel-ubuntu20.04
ENV REBUILD=true
WORKDIR /build
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y checkinstall libgomp1 libopenblas-dev libopenblas-base libopencv-dev libopencv-core-dev git make
RUN apt-get install -y curl unzip
RUN curl -L https://go.dev/dl/go${GO_VERSION}.linux-amd64.tar.gz -o /usr/local/go${GO_VERSION}.linux-amd64.tar.gz
RUN tar -C /usr/local -xzf /usr/local/go${GO_VERSION}.linux-amd64.tar.gz
ENV PATH="$PATH::/usr/local/go/bin"
RUN curl -L https://github.com/Kitware/CMake/releases/download/v3.26.4/cmake-3.26.4-linux-x86_64.tar.gz -o /usr/local/cmake-3.26.4-linux-x86_64.tar.gz
RUN tar -C /usr/local -xzf /usr/local/cmake-3.26.4-linux-x86_64.tar.gz
ENV PATH="$PATH::/usr/local/cmake-3.26.4-linux-x86_64/bin"
RUN apt-get install -y ca-certificates
ENV PATH /usr/lib/go-${GO_VERSION}/bin:$PATH
COPY . .
RUN ln -s /usr/include/opencv4/opencv2/ /usr/include/opencv
RUN make build
EXPOSE 8080
ENTRYPOINT [ "/build/entrypoint.sh" ]

Ive run into a couple of issues:
• gpu_layers is ignored and i can't get it to offload any work to the GPU
• If i set REBUILD=false, then the GPU is not used and it assumes that the container is non-cublas/openblas
This is my config file:

- name: gpt-3.5-turbo
  parameters:
    model: Manticore-13B.ggmlv3.q4_0.bin
    temperature: 0.3 
  context_size: 2048
  threads: 6
  backend: llama
  stopwords:
  - "USER:"
  - "### Instruction:"
  roles:
    user: "USER:"
    system: "ASSISTANT:"
    assistant: "ASSISTANT:"
  gpu_layers: 40

Using the provided yaml like in model-gallery yield the error

ERR error loading config file: cannot load config file: cannot unmarshal config file: yaml: unmarshal errors:
  line 1: cannot unmarshal !!map into []*api.Config

Cheers!

mudler marked this pull request as draft May 14, 2023 20:18

mudler force-pushed the gpu branch from 068bc57 to ceafde2 Compare May 14, 2023 20:26

mudler changed the title ~~feat: add support for cublas/openblas on the llama.cpp backend~~ feat: add support for cublas/openblas in the llama.cpp backend May 14, 2023

mudler force-pushed the gpu branch 3 times, most recently from 1997bf6 to 6a185ca Compare May 14, 2023 21:07

feat: add support for cublas/openblas on the llama.cpp backend

7eb2d48

mudler force-pushed the gpu branch from 6a185ca to 7eb2d48 Compare May 14, 2023 21:10

Merge branch 'master' into gpu

9bd62be

mudler marked this pull request as ready for review May 16, 2023 14:24

mudler merged commit acd03d1 into master May 16, 2023
3 checks passed

mudler deleted the gpu branch May 16, 2023 14:26

mudler added a commit that referenced this pull request May 19, 2023

fix: add LLAMA_CUBLAS on BUILD_TYPE=cublas

60c0fe8

Thanks to @Thireus for noticing it. See: #258

mudler mentioned this pull request May 19, 2023

fix: add LLAMA_CUBLAS on BUILD_TYPE=cublas #310

Merged

mudler added a commit that referenced this pull request May 19, 2023

fix: add LLAMA_CUBLAS on BUILD_TYPE=cublas

cfbf7bd

Thanks to @Thireus for noticing it. See: #258

mudler added a commit that referenced this pull request May 19, 2023

fix: add LLAMA_CUBLAS on BUILD_TYPE=cublas

512e1e2

Thanks to @Thireus for noticing it. See: #258

ghost mentioned this pull request May 29, 2023

feat: add CuBLAS support in Docker images #403

Merged

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add support for cublas/openblas in the llama.cpp backend #258

feat: add support for cublas/openblas in the llama.cpp backend #258

mudler commented May 14, 2023 •

edited

Loading

mudler commented May 16, 2023

bubthegreat commented May 16, 2023

Thireus commented May 19, 2023 •

edited

Loading

mudler commented May 19, 2023

ghost commented May 22, 2023 •

edited by ghost

Loading

feat: add support for cublas/openblas in the llama.cpp backend #258

feat: add support for cublas/openblas in the llama.cpp backend #258

Conversation

mudler commented May 14, 2023 • edited Loading

Cublas

OpenBLAS

mudler commented May 16, 2023

bubthegreat commented May 16, 2023

Thireus commented May 19, 2023 • edited Loading

mudler commented May 19, 2023

ghost commented May 22, 2023 • edited by ghost Loading

mudler commented May 14, 2023 •

edited

Loading

Thireus commented May 19, 2023 •

edited

Loading

ghost commented May 22, 2023 •

edited by ghost

Loading