
How to use GPU instead of CPU #7

Open
mnofrizal opened this issue May 26, 2023 · 5 comments

Comments

@mnofrizal

Can we use the GPU to get responses faster than with the CPU?

@Anil-matcha
Contributor

GPT4All doesn't support GPU acceleration. We will add support for models like Llama, which can do this.

@bradsec

bradsec commented May 30, 2023

I was able to get the GPU working with this Llama model, ggml-vic13b-q5_1.bin, using a manual workaround.

# Download the ggml-vic13b-q5_1.bin model and place it in privateGPT/server/models/
# Edit privateGPT.py: comment out the GPT4All model and add the LlamaCpp model
# Set n_gpu_layers according to your NVIDIA GPU (the max for this model is 40). Offloading all 40 layers uses about 9GB of VRAM.

# Imports used below (these may already be present at the top of privateGPT.py)
import os
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.llms import LlamaCpp

def load_model():
    filename = 'ggml-vic13b-q5_1.bin'  # Name of the downloaded model file
    models_folder = 'models'  # Folder inside the Flask app root that holds the models
    file_path = f'{models_folder}/{filename}'
    if os.path.exists(file_path):
        global llm
        callbacks = [StreamingStdOutCallbackHandler()]
        # Original GPT4All line, commented out:
        #llm = GPT4All(model=model_path, n_ctx=model_n_ctx, backend='gptj', callbacks=callbacks, verbose=False)
        # model_path and model_n_ctx are expected to come from the .env values (MODEL_PATH, MODEL_N_CTX)
        llm = LlamaCpp(model_path=model_path, n_ctx=model_n_ctx, n_gpu_layers=40, callbacks=callbacks, verbose=False)

# Edit privateGPT/server/.env and update it as follows
PERSIST_DIRECTORY=db
MODEL_TYPE=LlamaCpp
MODEL_PATH=models/ggml-vic13b-q5_1.bin
EMBEDDINGS_MODEL_NAME=all-MiniLM-L6-v2
MODEL_N_CTX=1000
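
For reference, here is a minimal sketch of how those .env entries would typically be read back into the variables used by load_model() above (an assumption based on the upstream privateGPT, which loads them with python-dotenv; the variable names simply mirror the entries above):

from dotenv import load_dotenv
import os

load_dotenv()  # reads privateGPT/server/.env

persist_directory = os.environ.get('PERSIST_DIRECTORY')            # 'db'
model_type = os.environ.get('MODEL_TYPE')                          # 'LlamaCpp'
model_path = os.environ.get('MODEL_PATH')                          # 'models/ggml-vic13b-q5_1.bin'
model_n_ctx = int(os.environ.get('MODEL_N_CTX'))                   # 1000
embeddings_model_name = os.environ.get('EMBEDDINGS_MODEL_NAME')    # 'all-MiniLM-L6-v2'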

# If using a conda environment
conda install -c "nvidia/label/cuda-12.1.1" cuda-toolkit

# Remove and reinstall llama-cpp-python with the following environment variables set
# Linux uses "export" for setting environment variables; Windows uses "set"

pip uninstall llama-cpp-python
export CMAKE_ARGS="-DLLAMA_CUBLAS=on"
export FORCE_CMAKE=1
pip install llama-cpp-python --no-cache-dir

Run python privateGPT.py from the privateGPT/server/ directory.
You should see the following lines in the output as the model loads:

llama_model_load_internal: [cublas] offloading 40 layers to GPU
llama_model_load_internal: [cublas] total VRAM used: 9076 MB
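
If you want a quick standalone check that the rebuilt llama-cpp-python really offloads layers, here is a minimal sketch outside privateGPT (it assumes the model file is in models/ and that the wheel you just built is the one on your path):

from llama_cpp import Llama

# llama.cpp prints its load log to stderr; with a working cuBLAS build you should
# see the "[cublas] offloading 40 layers to GPU" lines here and BLAS = 1 in the system info.
llm = Llama(model_path='models/ggml-vic13b-q5_1.bin', n_ctx=1000, n_gpu_layers=40)
out = llm('Q: What is the capital of France? A:', max_tokens=32)
print(out['choices'][0]['text'])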

@jackyoung022

Hi, thanks for your info.
But when I followed your steps on Windows, I got this error:
Could not load Llama model from path: D:/code/privateGPT/server/models/ggml-vic13b-q5_1.bin. Received error (type=value_error)
Any idea about this? Thanks.

@MyraBaba

@bradsec

Hi,

I followed the instructions, but it looks like it's still using the CPU:

(venPrivateGPT) (base) alp2080@alp2080:~/data/dProjects/privateGPT/server$ python privateGPT.py
/data/dProjects/privateGPT/server/privateGPT.py:1: DeprecationWarning: 'flask.Markup' is deprecated and will be removed in Flask 2.4. Import 'markupsafe.Markup' instead.
from flask import Flask,jsonify, render_template, flash, redirect, url_for, Markup, request
llama.cpp: loading model from models/ggml-vic13b-q5_1.bin
llama_model_load_internal: format = ggjt v2 (pre #1508)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 1000
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 9 (mostly Q5_1)
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 0.09 MB
llama_model_load_internal: mem required = 11359.05 MB (+ 1608.00 MB per state)
llama_new_context_with_model: kv self size = 781.25 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
LLM0 LlamaCpp
Params: {'model_path': 'models/ggml-vic13b-q5_1.bin', 'suffix': None, 'max_tokens': 256, 'temperature': 0.8, 'top_p': 0.95, 'logprobs': None, 'echo': False, 'stop_sequences': [], 'repeat_penalty': 1.1, 'top_k': 40}

 * Serving Flask app 'privateGPT'
 * Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:5000
 * Running on http://192.168.5.110:5000
Press CTRL+C to quit
Loading documents from source_documents

@Musty1

Musty1 commented Jul 19, 2023

I tried this as well and it looks like it's still using the CPU, interesting. If anyone can suggest why it isn't working with the GPU, please let me know.
