
feat: add support for cublas/openblas in the llama.cpp backend #258

Merged
merged 2 commits into master from gpu on May 16, 2023

Conversation

mudler
Owner

@mudler mudler commented May 14, 2023

Depends on: go-skynet/go-llama.cpp#51

See upstream PR: ggerganov/llama.cpp#1412

Allows building LocalAI with the llama.cpp backend with cuBLAS/OpenBLAS support:

cuBLAS

To build, run:

make BUILD_TYPE=cublas CUDA_LIBPATH=.... build

OpenBLAS

make BUILD_TYPE=openblas build

To set the number of GPU layers to offload, add this to the model's config file:

gpu_layers: 4
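
A minimal sketch of a full model config entry with gpu_layers in place (the model filename is just a placeholder; the other fields follow the usual LocalAI model config format, as in the example further down this thread):

# Sketch only: placeholder model name, illustrative values.
- name: gpt-3.5-turbo
  backend: llama
  parameters:
    model: your-model.ggmlv3.bin
  context_size: 1024
  gpu_layers: 4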

This also drops the "generic" build type, as I'm sunsetting it in favor of specific CMake parameters.

Related to: #69

@mudler mudler marked this pull request as draft May 14, 2023 20:18
@mudler mudler changed the title from "feat: add support for cublas/openblas on the llama.cpp backend" to "feat: add support for cublas/openblas in the llama.cpp backend" May 14, 2023
@mudler mudler force-pushed the gpu branch 3 times, most recently from 1997bf6 to 6a185ca Compare May 14, 2023 21:07
@mudler mudler marked this pull request as ready for review May 16, 2023 14:24
@mudler
Owner Author

mudler commented May 16, 2023

Let's merge this to master, as it is add-only and doesn't hurt as a starting point. I successfully built it on Colab, but I have no way to test this locally. I'll update the docs and see what comes out of bug reports.

@mudler mudler merged commit acd03d1 into master May 16, 2023
3 checks passed
@mudler mudler deleted the gpu branch May 16, 2023 14:26
@bubthegreat

Might be worth dropping this command in a README so folks can verify that they have a valid, detectable GPU:

docker run --gpus all --rm nvidia/cuda:10.2-base nvidia-smi

Example output showing a valid GPU:

PS C:\Users\bubth\Development\LocalAI\nvidia> docker run --gpus all --rm nvidia/cuda:10.2-base nvidia-smi
Unable to find image 'nvidia/cuda:10.2-base' locally
10.2-base: Pulling from nvidia/cuda
25fa05cd42bd: Already exists
24a22c1b7260: Already exists
8dea37be3176: Already exists
b4dc78aeafca: Already exists
a57130ec8de1: Already exists
Digest: sha256:86aba51da8781cc370350a6e30166ab2714229d505fd87f8d28ff6d3677a0ba4
Status: Downloaded newer image for nvidia/cuda:10.2-base
Tue May 16 18:56:46 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.50                 Driver Version: 531.79       CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3080 Ti      On | 00000000:01:00.0  On |                  N/A |
| 35%   46C    P8               36W / 350W|   6131MiB / 12288MiB |      6%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
PS C:\Users\bubth\Development\LocalAI\nvidia> 

@Thireus

Thireus commented May 19, 2023

Good stuff!

It seems, however, that BUILD_TYPE=cublas doesn't automatically pass LLAMA_CUBLAS=1 to the llama.cpp make invocation.

The following solves the issue:

make BUILD_TYPE=cublas LLAMA_CUBLAS=1 build

This is necessary; otherwise llama.cpp compiles without -DGGML_USE_CUBLAS, as seen below.

make -C llama.cpp llama.o
make[2]: Entering directory '/home/thireus/LocalAI/go-llama/llama.cpp'
I llama.cpp build info: 
I UNAME_S:  Linux
I UNAME_P:  x86_64
I UNAME_M:  x86_64
I CFLAGS:   -I.              -O3 -std=c11   -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -march=native -mtune=native
I CXXFLAGS: -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native
I LDFLAGS:  
I CC:       cc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
I CXX:      g++ (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0

g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -c llama.cpp -o llama.o

With LLAMA_CUBLAS=1:

make[1]: Entering directory '/home/thireus/LocalAI/go-llama/llama.cpp'
I llama.cpp build info: 
I UNAME_S:  Linux
I UNAME_P:  x86_64
I UNAME_M:  x86_64
I CFLAGS:   -I.              -O3 -std=c11   -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -march=native -mtune=native -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include
I CXXFLAGS: -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include
I LDFLAGS:  -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64
I CC:       cc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
I CXX:      g++ (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0

g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -c llama.cpp -o llama.o

@mudler
Owner Author

mudler commented May 19, 2023

Good catch @Thireus, thanks! Do you also have a GPU at hand so you can test this out? Also, do you feel like taking a stab at fixing it? Otherwise I'll have a look soon.
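
One possible direction for the fix (a sketch only, not the actual LocalAI Makefile, and assuming the llama.cpp sub-make honours an exported LLAMA_CUBLAS variable, as the workaround above suggests):

# Hypothetical sketch: when BUILD_TYPE=cublas is requested, export
# LLAMA_CUBLAS=1 so it reaches the llama.cpp sub-make.
ifeq ($(BUILD_TYPE),cublas)
    export LLAMA_CUBLAS=1
endif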

mudler added 3 commits that referenced this pull request May 19, 2023
@ghost

ghost commented May 22, 2023

Hey there!
I was able to build LocalAI from an nvidia/cuda image, modifying the Dockerfile to install Go like this:

ARG GO_VERSION=1.20.4
ARG BUILD_TYPE=cublas
FROM nvidia/cuda:12.1.1-cudnn8-devel-ubuntu20.04
ENV REBUILD=true
WORKDIR /build
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y checkinstall libgomp1 libopenblas-dev libopenblas-base libopencv-dev libopencv-core-dev git make
RUN apt-get install -y curl unzip
RUN curl -L https://go.dev/dl/go${GO_VERSION}.linux-amd64.tar.gz -o /usr/local/go${GO_VERSION}.linux-amd64.tar.gz
RUN tar -C /usr/local -xzf /usr/local/go${GO_VERSION}.linux-amd64.tar.gz
ENV PATH="$PATH::/usr/local/go/bin"
RUN curl -L https://github.com/Kitware/CMake/releases/download/v3.26.4/cmake-3.26.4-linux-x86_64.tar.gz -o /usr/local/cmake-3.26.4-linux-x86_64.tar.gz
RUN tar -C /usr/local -xzf /usr/local/cmake-3.26.4-linux-x86_64.tar.gz
ENV PATH="$PATH::/usr/local/cmake-3.26.4-linux-x86_64/bin"
RUN apt-get install -y ca-certificates
ENV PATH /usr/lib/go-${GO_VERSION}/bin:$PATH
COPY . .
RUN ln -s /usr/include/opencv4/opencv2/ /usr/include/opencv
RUN make build
EXPOSE 8080
ENTRYPOINT [ "/build/entrypoint.sh" ]
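
For reference, a hedged example of building and running an image from a Dockerfile like the one above; the image tag and the models mount path are illustrative, not part of this PR:

# Tag name and host models directory are placeholders.
docker build -t localai-cublas .
docker run --gpus all -p 8080:8080 -v $PWD/models:/build/models localai-cublas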

I've run into a couple of issues:
• gpu_layers is ignored and I can't get it to offload any work to the GPU
• If I set REBUILD=false, then the GPU is not used and it assumes that the container is non-cublas/openblas
This is my config file:

- name: gpt-3.5-turbo
  parameters:
    model: Manticore-13B.ggmlv3.q4_0.bin
    temperature: 0.3 
  context_size: 2048
  threads: 6
  backend: llama
  stopwords:
  - "USER:"
  - "### Instruction:"
  roles:
    user: "USER:"
    system: "ASSISTANT:"
    assistant: "ASSISTANT:"
  gpu_layers: 40

Using the provided YAML, like the one in the model-gallery, yields the error:

ERR error loading config file: cannot load config file: cannot unmarshal config file: yaml: unmarshal errors:
  line 1: cannot unmarshal !!map into []*api.Config
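
Going by the error text alone, the file handed to the config loader has a single mapping at the top level, while this loading path expects a sequence of model configs ([]*api.Config), i.e. entries introduced with "-" as in the config above. A minimal sketch of an entry in the expected shape (values are illustrative):

# Sketch: the loader behind that error expects a YAML list of model configs.
- name: gpt-3.5-turbo
  backend: llama
  parameters:
    model: Manticore-13B.ggmlv3.q4_0.bin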

Cheers!

@ghost ghost mentioned this pull request May 29, 2023