# Pre-built wheels for llama-cpp-python
To install the package, copy the wheel file's URL from the Releases page and run:
```bash
pip install PASTE_THE_COPIED_URL_HERE
```

**Note**
If `import llama_cpp` fails with the error `Failed to load shared library: libgomp.so.1`, install the missing library:
```bash
sudo apt-get update && sudo apt-get install -y libgomp1
```
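After installing a wheel, a quick import check confirms that the shared libraries load (a minimal sketch; `__version__` is exported by the `llama_cpp` package):

```python
# If a shared library is missing, this import raises the error above
import llama_cpp

print(llama_cpp.__version__)  # e.g. 0.3.15
```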
## Linux x86_64 / Python 3.11 / CPU

```bash
pip install https://github.com/sergey21000/llama-cpp-python-wheels/releases/download/v0.3.15-cpu/llama_cpp_python-0.3.15-cp311-cp311-linux_x86_64.whl
```

## Linux x86_64 / Python 3.11 / CUDA 12.4
```bash
pip install https://github.com/sergey21000/llama-cpp-python-wheels/releases/download/v0.3.15-cu124/llama_cpp_python-0.3.15-cp311-cp311-linux_x86_64.whl
```

## Linux x86_64 / Python 3.12 / CPU (Google Colab)
```bash
pip install https://github.com/sergey21000/llama-cpp-python-wheels/releases/download/v0.3.15-cpu/llama_cpp_python-0.3.15-cp312-cp312-linux_x86_64.whl
```

## Linux x86_64 / Python 3.12 / CUDA 12.4 (Google Colab)
```bash
pip install https://github.com/sergey21000/llama-cpp-python-wheels/releases/download/v0.3.15-cu124/llama_cpp_python-0.3.15-cp312-cp312-linux_x86_64.whl
```

## Windows amd64 / Python 3.12 / CPU
```bash
pip install https://github.com/sergey21000/llama-cpp-python-wheels/releases/download/v0.3.15-cpu/llama_cpp_python-0.3.15-cp312-cp312-win_amd64.whl
```

## Windows amd64 / Python 3.12 / CUDA 12.8
```bash
pip install https://github.com/sergey21000/llama-cpp-python-wheels/releases/download/v0.3.15-cu128-win/llama_cpp_python-0.3.15-cp312-cp312-win_amd64.whl
```
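Once any of the wheels above is installed, a minimal generation example (a sketch; the GGUF model path is a placeholder you need to supply yourself):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="path/to/model.gguf",  # placeholder: any local GGUF model file
    n_gpu_layers=-1,  # offload all layers to the GPU; use 0 with CPU-only wheels
    verbose=False,
)

output = llm("Q: What is the capital of France? A:", max_tokens=32)
print(output["choices"][0]["text"])
```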
- Build the latest version of `llama-cpp-python` with CUDA support into the `wheel_dir` directory (Google Colab):
```bash
!CMAKE_ARGS="-DGGML_CUDA=on" pip wheel --no-deps --wheel-dir=wheel_dir llama-cpp-python
```

The build process takes about 30–40 minutes. Make sure a GPU is enabled in your Colab environment.
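Before starting the long build, it is worth checking that a GPU runtime is actually active (a sketch using `torch`, which comes preinstalled in Colab):

```python
import torch

# Fail fast rather than discover a CPU runtime after a 40-minute build
assert torch.cuda.is_available(), "Enable a GPU: Runtime -> Change runtime type"
print(torch.cuda.get_device_name(0))
```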
Once completed, the `.whl` file will be located in the `wheel_dir` directory. The wheel is compiled for the architecture of the current GPU; if you need support for other CUDA architectures, specify them explicitly:
```bash
!CMAKE_ARGS="-DGGML_CUDA=on -DCMAKE_CUDA_ARCHITECTURES=75;80;86;89;90" pip wheel --no-deps --wheel-dir=wheel_dir llama-cpp-python
```
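The numbers in `CMAKE_CUDA_ARCHITECTURES` are CUDA compute capabilities (for example, 86 corresponds to capability 8.6, an RTX 30-series GPU). To look up the capability of the current GPU (a sketch using `torch`):

```python
import torch

# (8, 6) means compute capability 8.6 -> list it as 86 in CMAKE_CUDA_ARCHITECTURES
major, minor = torch.cuda.get_device_capability(0)
print(f"{major}{minor}")
```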
- (Optional) Save the `.whl` file to Google Drive for convenience (after mounting the drive):
```python
import shutil

src_wheel_file = 'wheel_dir/llama_cpp_python-0.3.14-cp311-cp311-linux_x86_64.whl'
trg_wheel_file = '/content/drive/MyDrive/llama_cpp_python-0.3.14-cp311-cp311-linux_x86_64.whl'
shutil.copyfile(src_wheel_file, trg_wheel_file)
```

- Installing from a saved wheel:
```bash
!pip install wheel_dir/llama_cpp_python-0.3.14-cp311-cp311-linux_x86_64.whl
```
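Since the exact wheel filename changes with every version, you can also locate the built wheel programmatically instead of hardcoding it (a minimal sketch using only the standard library):

```python
from pathlib import Path

# Grab whatever wheel the build produced, regardless of version and tags
wheel_file = next(Path("wheel_dir").glob("llama_cpp_python-*.whl"))
print(wheel_file)  # pass this path to pip install / shutil.copyfile
```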
## Build the latest version of llama-cpp-python with CUDA support into the wheel_dir directory (Windows PowerShell)

```powershell
$env:FORCE_CMAKE='1'; $env:CMAKE_ARGS='-DGGML_CUDA=on'
pip wheel --no-deps --no-cache-dir --wheel-dir=wheel_dir llama-cpp-python
```

If AVX or other instruction sets are not supported by your CPU, disable them explicitly:
```powershell
$env:FORCE_CMAKE='1'; $env:CMAKE_ARGS='-DGGML_CUDA=on -DLLAMA_AVX=off -DLLAMA_AVX2=off -DLLAMA_FMA=off'
pip wheel --no-deps --no-cache-dir --wheel-dir=wheel_dir llama-cpp-python
```

To build for other CUDA architectures:
```powershell
$env:FORCE_CMAKE='1'; $env:CMAKE_ARGS='-DGGML_CUDA=on -DCMAKE_CUDA_ARCHITECTURES=75;80;86;89;90'
pip wheel --no-deps --no-cache-dir --wheel-dir=wheel_dir llama-cpp-python
```

Instead of `pip wheel`, you can use `pip install` to install the library right away.
**Note**

To install llama-cpp-python on Windows with CUDA support, you must first install Visual Studio 2022 Community and the CUDA Toolkit, as described in this or this instruction.
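After installing a CUDA build, you can check whether GPU offload was actually compiled in (a sketch; `llama_supports_gpu_offload` is exposed by the low-level `llama_cpp` bindings):

```python
import llama_cpp

# True if the wheel was built with a GPU backend such as GGML_CUDA=on
print(llama_cpp.llama_supports_gpu_offload())
```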
## Build the latest version of llama-cpp-python on Termux (Android, aarch64)

Taken from this comment:
```bash
pkg update && pkg upgrade
pkg install libexpat openssl python-pip python-cryptography cmake ninja autoconf automake libandroid-execinfo patchelf

# command to build the wheel
pip wheel --no-deps --no-cache-dir --wheel-dir=wheel_dir llama-cpp-python

# or command to install directly
pip install llama-cpp-python
```