Add Vulkan support to ollama #5059

Open · wants to merge 32 commits into base: main

Conversation


@whyvl commented Jun 15, 2024

Edit: (2025/01/19)

It's been around 7 months and the ollama devs don't seem interested in merging this PR. I'll maintain this fork as a separate project from now on. If you have any issues, please raise them in the fork's repo so I can keep track of them.

This PR adds Vulkan support to ollama with a proper memory-monitoring implementation. It closes #2033 and replaces #2578, which does not implement proper memory monitoring.

Note that this implementation does not support GPUs without VkPhysicalDeviceMemoryBudgetPropertiesEXT support. This shouldn't be a problem, since on Linux the Mesa driver supports it for all Intel devices as far as I know.
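
For reference, a quick way to check whether a device exposes the corresponding VK_EXT_memory_budget extension is vulkaninfo (shipped with the Vulkan SDK / vulkan-tools); this is only an illustrative check and the exact output varies:

vulkaninfo | grep -i VK_EXT_memory_budget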

The CAP_PERFMON capability is also needed for memory monitoring. When running ollama as a systemd service, this can be granted by adding AmbientCapabilities=CAP_PERFMON to the service unit; alternatively, run ollama as root.
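
For example, assuming the service is named ollama.service (the unit name and paths below are illustrative), a drop-in override could look like this:

# hypothetical drop-in granting CAP_PERFMON to the ollama service
sudo mkdir -p /etc/systemd/system/ollama.service.d
cat <<'EOF' | sudo tee /etc/systemd/system/ollama.service.d/override.conf
[Service]
AmbientCapabilities=CAP_PERFMON
EOF
sudo systemctl daemon-reload
sudo systemctl restart ollama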

Vulkan devices that are CPUs under the hood (e.g. llvmpipe) are also not supported. This is done on purpose, to avoid accidentally using the CPU for "accelerated" inference. Let me know if you think this behavior should be changed.

I've not tested this on Windows, nor have I implemented the logic for building ollama with Vulkan support there, because I don't use Windows. If someone can help me with this, that would be great.

I've tested this on my machine with an Intel Arc A770:

System:
  Host: rofl Kernel: 6.8.11 arch: x86_64 bits: 64 compiler: gcc v: 13.2.0
  Console: pty pts/2 Distro: NixOS 24.05 (Uakari)
CPU:
  Info: 8-core (4-mt/4-st) model: Intel 0000 bits: 64 type: MST AMCP arch: Raptor Lake rev: 2
    cache: L1: 704 KiB L2: 7 MiB L3: 12 MiB
  Speed (MHz): avg: 473 high: 1100 min/max: 400/4500:3400 cores: 1: 400 2: 400 3: 400 4: 576
    5: 400 6: 400 7: 400 8: 400 9: 400 10: 400 11: 1100 12: 400 bogomips: 59904
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3
Graphics:
  Device-1: Intel DG2 [Arc A770] vendor: Acer Incorporated ALI driver: i915 v: kernel
    arch: Gen-12.7 pcie: speed: 2.5 GT/s lanes: 1 ports: active: DP-1 empty: DP-2, DP-3, DP-4,
    HDMI-A-1, HDMI-A-2, HDMI-A-3 bus-ID: 03:00.0 chip-ID: 8086:56a0
  Display: server: No display server data found. Headless machine? tty: 98x63
  Monitor-1: DP-1 model: Daewoo HDMI res: 1024x600 dpi: 55 diag: 537mm (21.1")
  API: Vulkan v: 1.3.283 surfaces: N/A device: 0 type: discrete-gpu driver: N/A
    device-ID: 8086:56a0 device: 1 type: cpu driver: N/A device-ID: 10005:0000

@whyvl mentioned this pull request Jun 15, 2024

rasodu commented Jun 15, 2024

Are there any available instructions or guides that outline the steps to install Ollama from its source code on a Windows operating system? I have a Windows 10 machine equipped with an Arc A770 GPU with 8GB of memory


whyvl commented Jun 15, 2024

Are there any available instructions or guides that outline the steps to install Ollama from its source code on a Windows operating system? I have a Windows 10 machine equipped with an Arc A770 GPU with 8GB of memory

https://github.com/ollama/ollama/blob/main/docs/development.md


ddpasa commented Jun 15, 2024

I compiled and ran this on Linux (Arch, with an Intel iGPU). It seems to work correctly, with performance and output similar to my hacky version in #2578.

I think we can abandon my version in favour of this (it was never meant to be merged anyway).

gpu/gpu.go Outdated
index: i,
}

C.vk_check_vram(*vHandles.vulkan, C.int(i), &memInfo)
It could be nice to have a debug log here printing the amount of memory detected (especially with iGPUs this number can be useful).

Author
Doesn't ollama do it already? When I was debugging I saw something like Jun 15 20:25:32 rofl strace[403896]: time=2024-06-15T20:25:32.702+08:00 level=INFO source=types.go:102 msg="inference compute" id=0 library=vulkan compute=1.3 driver=1.3 name="Intel(R) Arc(tm) A770 Graphics (DG2)" total="15.9 GiB" available="14.3 GiB"


I think you're right, but I don't see that line exactly. Looks like a CAP_PERFMON thing, or I messed up the compilation:

ggml_vulkan: Found 1 Vulkan devices:
Vulkan0: Intel(R) Iris(R) Plus Graphics (ICL GT2) | uma: 1 | fp16: 1 | warp size: 32

llama_new_context_with_model: Vulkan_Host output buffer size = 0.14 MiB
llama_new_context_with_model: Vulkan0 compute buffer size = 234.06 MiB
llama_new_context_with_model: Vulkan_Host compute buffer size = 24.01 MiB

Vulkan.time=2024-06-16T13:20:27.582+02:00 level=DEBUG source=gpu.go:649 msg="Unable to load vulkan" library=/usr/lib64/libvulkan.so.1.3.279 /usr/lib64/libcap.so.2.69=error !BADKEY="performance monitoring is not allowed. Please enable CAP_PERFMON or run as root to use Vulkan."

nvtop reveals iGPU being used as expected.

Author

Maybe run ollama as root? Or do setcap cap_perfmon=+ep /path/to/ollama


thanks

setcap didn't work for some reason; I still get CAP_PERFMON errors. But running with sudo gives:

time=2024-06-16T13:52:35.115+02:00 level=INFO source=gpu.go:355 msg="error looking up vulkan GPU memory" error="device is a CPU"
time=2024-06-16T13:52:35.130+02:00 level=INFO source=types.go:102 msg="inference compute" id=0 library=oneapi compute="" driver=0.0 name="Intel(R) Iris(R) Plus Graphics" total="0 B" available="0 B"
time=2024-06-16T13:52:35.130+02:00 level=INFO source=types.go:102 msg="inference compute" id=0 library=vulkan compute=1.3 driver=1.3 name="Intel(R) Iris(R) Plus Graphics (ICL GT2)" total="11.4 GiB" available="8.4 GiB"

@whyvl (Author) Jun 16, 2024

Vulkan is reporting that the device is a CPU. If it's an iGPU it should've been detected.

You mentioned the performance was similar to when you were testing your branch. Are you sure you are not using CPU inference the entire time? Can you compare the performance against a CPU runner like cpu_avx?

@whyvl (Author) Jun 16, 2024

On a second read, never mind; it seems like everything is working as expected. Ollama detected two Vulkan devices: one is a CPU software implementation, which is skipped according to the error message, and the last line reports a Vulkan device recognized by ollama, which is the actual iGPU.

@ddpasa Jun 16, 2024

Yes, that looks right. There is also a lot of oneAPI junk in the logs that confuses me, but it looks like Vulkan works as intended; I just have a CAP_PERFMON problem.

nvtop screenshot below:

[nvtop screenshot: 2024-06-16_14-37]

I wonder why setcap does not work... Could it be that one of the shared libraries (like libcap or libvulkan) needs the setcap instead of the ollama binary?
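
For what it's worth, file capabilities only take effect when set on the executable itself (not on shared libraries), and they are stored as an extended attribute that is lost whenever the binary is rebuilt or replaced. A quick way to confirm the capability actually stuck (illustrative; the path will differ):

getcap ./ollama
# expected output if it took: something like  ./ollama cap_perfmon=ep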


OK, the CAP_PERFMON issue is likely due to something off in my system. It's trying to load the 32-bit library for some reason:

time=2024-06-16T14:58:06.326+02:00 level=DEBUG source=gpu.go:649 msg="Unable to load vulkan" library=/usr/lib/libvulkan.so.1.3.279 /usr/lib32/libcap.so.2.69=error !BADKEY="Unable to load /usr/lib32/libcap.so.2.69 library to query for Vulkan GPUs: /usr/lib32/libcap.so.2.69: wrong ELF class: ELFCLASS32"

Author

Loading the 32-bit library is expected and not related; it's simply skipped once the loader realizes it can't be loaded.

gpu/gpu_linux.go Outdated
}

var capLinuxGlobs = []string{
"/usr/lib/x86_64-linux-gnu/libcap.so*",

Adding * after /usr/lib also detects 32-bit libraries on the system. Not sure if you want this.


I suppose this depends on the OS? On Fedora I need to specify only lib64 for this to work, since lib is 32-bit.
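
A rough way to see where these libraries actually live on a given system (output differs per distro and architecture):

ldconfig -p | grep -E 'libcap|libvulkan'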


Same comment as above regarding x86_64-specific paths: this doesn't work on aarch64, like my Raspberry Pi :)


whyvl commented Jul 2, 2024

@dhiltgen mind reviewing this? I'd imagine this would be pretty useful for plenty of people. Intel Arc GPUs perform faster on Vulkan than with oneAPI, and oneAPI is still not packaged on NixOS. Someone has also emailed me noting that Vulkan support has let them run ollama much faster on Polaris GPUs.


genehand commented Jul 2, 2024

Working well here on an RX 5700 (the notorious gfx1010). Hoping I can use rocm again with 6.2, but this is a great alternative.


Zxilly commented Jul 2, 2024

I managed to run this on Windows with an AMD GPU; if I'm successful I'll share how I did it.


whyvl commented Jul 2, 2024

I managed to run this on Windows with an AMD GPU; if I'm successful I'll share how I did it.

Interesting. I had expected that, since I haven't implemented Vulkan library loading on Windows, it wouldn't detect any Vulkan devices. Please do share how you did it.


Zxilly commented Jul 2, 2024

I'll add the corresponding code, but I'm not that familiar with Vulkan and this may take time.


ddpasa commented Jul 3, 2024

@dhiltgen mind reviewing this? I'd imagine this would be pretty useful for plenty of people. Intel Arc GPUs perform faster on Vulkan than with oneAPI, and oneAPI is still not packaged on NixOS. Someone has also emailed me noting that Vulkan support has let them run ollama much faster on Polaris GPUs.

Not just Arc: it also gives nice speedups on Intel iGPUs (Iris series).


gioan777 commented Jul 3, 2024

It works perfectly on Arch Linux with my RX 6700 XT as well, which doesn't have official ROCm support. I did encounter a couple of hiccups while setting it up, though they're probably distro-specific issues with my Arch Linux installation. I'll post the changes I made just for the record.

  • I had to do the following change to the code (git diff result):
diff --git a/llm/generate/gen_linux.sh b/llm/generate/gen_linux.sh
index 0e98e163..411e9e65 100755
--- a/llm/generate/gen_linux.sh
+++ b/llm/generate/gen_linux.sh
@@ -216,7 +216,6 @@ if [ -z "${CAP_ROOT}" ]; then
     CAP_ROOT=/usr/lib/
 fi
 
-if [ -z "${OLLAMA_SKIP_VULKAN_GENERATE}" -a -d "${VULKAN_ROOT}" ] && [ -z "${OLLAMA_SKIP_VULKAN_GENERATE}" -a -d "${CAP_ROOT}" ]; then
     echo "Vulkan and capabilities libraries detected - building dynamic Vulkan library"
     init_vars
 
@@ -232,7 +231,6 @@ if [ -z "${OLLAMA_SKIP_VULKAN_GENERATE}" -a -d "${VULKAN_ROOT}" ] && [ -z "${OLL
     cp "${VULKAN_ROOT}/libvulkan.so" "${BUILD_DIR}/bin/"
     cp "${CAP_ROOT}/libcap.so" "${BUILD_DIR}/bin/"
     compress
-fi
 
 if [ -z "${ONEAPI_ROOT}" ]; then
     # Try the default location in case it exists

otherwise Ollama wouldn't compile with Vulkan support.

  • Then I had to run the following command (./ollama is the final executable):
    sudo setcap 'cap_perfmon=ep' ./ollama
    otherwise ollama would complain on launch that it didn't have CAP_PERFMON and couldn't use Vulkan, and then revert to CPU only. Running ./ollama as root also solved the issue, but I'm not comfortable running it as root.


gtors commented Feb 1, 2025

@utherbone, try running sudo setcap cap_perfmon=+ep ./ollama. This may resolve the issue. In my case, when I had OLLAMA_DEBUG=1 enabled, the log contained the following lines:

time=2025-02-02T00:39:02.251+03:00 level=DEBUG source=gpu.go:788 msg="Unable to load vulkan" library=/usr/lib32/libvulkan.so.1.4.303 /usr/lib64/libcap.so.2.71=error !BADKEY="Unable to load /usr/lib32/libvulkan.so.1.4.303 library to query for Vulkan GPUs: /usr/lib32/libvulkan.so.1.4.303: wrong ELF class: ELFCLASS32" vulkan: performance monitoring is not allowed. Please enable CAP_PERFMON or run as root to use Vulkan.
time=2025-02-02T00:39:02.251+03:00 level=DEBUG source=gpu.go:788 msg="Unable to load vulkan" library=/usr/lib64/libvulkan.so.1.4.303 /usr/lib/libcap.so.2.71=error !BADKEY="performance monitoring is not allowed. Please enable CAP_PERFMON or run as root to use Vulkan."
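
For reference, debug output like the above comes from running the server with debug logging enabled, e.g.:

OLLAMA_DEBUG=1 ./ollama serve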


gtors commented Feb 2, 2025

Hmm, for some reason, it doesn’t work for me (Manjaro Linux, Kernel 6.12, 5700 XT, drivers 24.20, qwen2.5-coder:7b). Ollama detects my GPU, but during model execution, it’s not being used at all. The --n-gpu-layers (num_gpu) parameter is passed to the runner, but it seems to be ignored for some reason...

At the same time, LM Studio works perfectly and fully utilizes the GPU, achieving 35 tokens/sec


McBane87 commented Feb 2, 2025

Does anyone have a hint on how to build Ollama (based on v0.5.7) with this PR, or more specifically with pufferffish:vulkan? I don't see any Makefile rules for Vulkan, so the only thing I was able to try so far was building with make dist. Here is my approach, using Docker for building:

FROM --platform=linux/amd64 library/ubuntu:noble as builder

ENV DEBIAN_FRONTEND="noninteractive"

ENV VULKAN_VER_BASE="1.3.296"
ENV VULKAN_VER="${VULKAN_VER_BASE}.0"
ENV UBUNTU_VERSION="noble"

ENV GOLANG_VERSION="1.22.8"
ENV GOARCH="amd64"
ENV CGO_ENABLED=1
ENV LDFLAGS=-s

# Default mirror was very slow
RUN \
    sed -i 's/archive.ubuntu.com/gb.archive.ubuntu.com/g' /etc/apt/sources.list.d/ubuntu.sources

RUN \
    apt-get update && \
    apt-get install -y ca-certificates build-essential ccache cmake wget git curl rsync xz-utils libcap-dev

RUN \
    mkdir -p /usr/local 2>/dev/null || true && \
    curl -s -L https://dl.google.com/go/go${GOLANG_VERSION}.linux-${GOARCH}.tar.gz | tar -xz -C /usr/local && \
    ln -s /usr/local/go/bin/go /usr/local/bin/go && \
    ln -s /usr/local/go/bin/gofmt /usr/local/bin/gofmt


RUN \
    wget -qO - https://packages.lunarg.com/lunarg-signing-key-pub.asc | gpg --dearmor -o /etc/apt/trusted.gpg.d/lunarg-signing-key-pub.gpg && \
    wget -qO /etc/apt/sources.list.d/lunarg-vulkan-${UBUNTU_VERSION}.list https://packages.lunarg.com/vulkan/${VULKAN_VER_BASE}/lunarg-vulkan-${VULKAN_VER_BASE}-${UBUNTU_VERSION}.list && \
    apt update && apt install -y vulkan-sdk

RUN \
    git clone -b vulkan https://github.com/pufferffish/ollama-vulkan.git "/tmp/ollama-vulkan-git" && \
    cd "/tmp/ollama-vulkan-git" && \
    `# COMMENT: Cherry pick open pull request` && \
    for pr in 5; do \
        git ls-remote origin pull/$pr/\* | grep -q '/merge$' && \
        git fetch origin pull/$pr/head:pr-$pr && \
        git merge --no-edit pr-$pr; \
    done

RUN \
    cd "/tmp/ollama-vulkan-git" && \
    export PLATFORM="linux/amd64" && \
    . scripts/env.sh && \
    make -j $(nproc) dist

# Workaround, this is faster.
# Because always re-builds last command, before going to next stage
RUN find /tmp/ollama-vulkan-git/dist/


FROM --platform=linux/amd64 library/ubuntu:noble
RUN \
    apt-get update && \
    apt-get install -y ca-certificates libcap2 libvulkan1 && \
    apt-get clean && rm -rf /var/lib/apt/lists/*
COPY --from=builder /tmp/ollama-vulkan-git/dist/linux-amd64/bin/ /bin/
COPY --from=builder /tmp/ollama-vulkan-git/dist/linux-amd64/lib/ /lib/

EXPOSE 11434
ENV OLLAMA_HOST 0.0.0.0

ENTRYPOINT ["/bin/ollama"]
CMD ["serve"]

And the result: I don't see any Vulkan-specific runners:

#11 [builder 8/8] RUN find /tmp/ollama-vulkan-git/dist/
#11 0.110 /tmp/ollama-vulkan-git/dist/
#11 0.110 /tmp/ollama-vulkan-git/dist/linux-amd64
#11 0.110 /tmp/ollama-vulkan-git/dist/linux-amd64/lib
#11 0.110 /tmp/ollama-vulkan-git/dist/linux-amd64/lib/ollama
#11 0.110 /tmp/ollama-vulkan-git/dist/linux-amd64/lib/ollama/runners
#11 0.110 /tmp/ollama-vulkan-git/dist/linux-amd64/lib/ollama/runners/cpu_avx
#11 0.110 /tmp/ollama-vulkan-git/dist/linux-amd64/lib/ollama/runners/cpu_avx/ollama_llama_server
#11 0.110 /tmp/ollama-vulkan-git/dist/linux-amd64/lib/ollama/runners/cpu_avx2
#11 0.110 /tmp/ollama-vulkan-git/dist/linux-amd64/lib/ollama/runners/cpu_avx2/ollama_llama_server
#11 0.110 /tmp/ollama-vulkan-git/dist/linux-amd64/bin
#11 0.110 /tmp/ollama-vulkan-git/dist/linux-amd64/bin/ollama


rwalle commented Feb 2, 2025

Probably a dumb question, but how do you build this on Windows? I keep getting a vulkan/vulkan.h not found error. I can confirm that the Vulkan SDK is installed and that VULKAN_SDK and VK_SDK_PATH correctly point to the SDK folder. I don't see anything in this PR that uses either of those environment variables. I wonder if you are configuring the build flags yourself to add the include search path.


mathstuf commented Feb 3, 2025

If you have any issues please raise them in the fork's repo so I can keep track of them.

FYI, issues are disabled in the fork, so where is it best to raise an issue?


whyvl commented Feb 3, 2025

If you have any issues please raise them in the fork's repo so I can keep track of them.

FYI, issues are disabled in the fork, so where is it best to raise an issue?

My bad, I've enabled it.

Sync vendored ggml to add Vulkan support

isoos commented Feb 19, 2025

Vulkan support is important! I was able to run ollama + Vulkan locally with a ~21% improvement over CPU inference on an AMD iGPU (8700G CPU vs its integrated 780M).

@hashangit

Thanks so much @pufferffish for the PR. I'm really sorry for the slow reply here.

The core focus currently for Ollama is improving performance on existing supported hardware (e.g. a recent PR added AVX512, AVX/AVX2 + CUDA, and there's an early PR for MLX support). For now the goal is to be the fastest on existing platforms ahead of adding new backends, unless those backends of course help make Ollama faster.

There's also a larger change to Ollama's architecture being worked on to help support new model architectures and modalities. Adding too many backends quickly will slow this down since for every backend integration, Ollama also needs to do GPU discovery and needs to be tested on real GPUs for every release. This is a good chunk of work often not provided by acceleration libraries (although thank you @pufferffish for the work in this PR on the discovery side).

In short, it's a backend we'd like to support once Ollama is the fastest on existing platforms to help add compatibility for more GPUs especially on Linux. Until then, it will be hard to carry it as a backend and/or turn it on by default.

In terms of the recent build changes, the below patch should work to build Vulkan:

From 1e5cf3b5940297b54d654de7ab9981b011777023 Mon Sep 17 00:00:00 2001
From: Ollama <hello@ollama.com>
Date: Tue, 28 Jan 2025 23:05:25 -0800
Subject: [PATCH] vulkan

---
 CMakeLists.txt                     | 13 +++++++++++++
 ml/backend/ggml/ggml/.rsync-filter |  3 +++
 2 files changed, 16 insertions(+)

diff --git a/CMakeLists.txt b/CMakeLists.txt
index 19d9bd8f..05f8e2c4 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -110,3 +110,16 @@ if(CMAKE_HIP_COMPILER)
         endforeach()
     endif()
 endif()
+
+find_package(Vulkan)
+if(Vulkan_FOUND)
+    add_subdirectory(${CMAKE_CURRENT_SOURCE_DIR}/ml/backend/ggml/ggml/src/ggml-vulkan)
+    set(OLLAMA_VULKAN_INSTALL_DIR ${OLLAMA_INSTALL_DIR}/vulkan)
+    install(TARGETS ggml-vulkan
+        RUNTIME_DEPENDENCIES
+            PRE_INCLUDE_REGEXES vulkan
+            PRE_EXCLUDE_REGEXES ".*"
+        RUNTIME DESTINATION ${OLLAMA_VULKAN_INSTALL_DIR} COMPONENT Vulkan
+        LIBRARY DESTINATION ${OLLAMA_VULKAN_INSTALL_DIR} COMPONENT Vulkan
+    )
+endif()
diff --git a/ml/backend/ggml/ggml/.rsync-filter b/ml/backend/ggml/ggml/.rsync-filter
index c5acbe49..09d67f27 100644
--- a/ml/backend/ggml/ggml/.rsync-filter
+++ b/ml/backend/ggml/ggml/.rsync-filter
@@ -12,6 +12,8 @@ include src/ggml-cuda/
 include src/ggml-cuda/template-instances/
 include src/ggml-hip/
 include src/ggml-metal/
+include src/ggml-vulkan/
+include src/ggml-vulkan/vulkan-shaders
 include *.c
 include *.h
 include *.cpp
@@ -19,4 +21,5 @@ include *.cu
 include *.cuh
 include *.m
 include *.metal
+include *.comp
 exclude *
-- 
2.47.1

Then run:

make -f Makefile.sync clean sync
cmake --preset Vulkan; cmake --build --preset Vulkan
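
If CMake doesn't pick up Vulkan at the configure step, a rough sanity check that the headers, loader, and shader compiler are visible can save time (Linux example; package names and paths vary by distro):

pkg-config --modversion vulkan   # are the Vulkan loader/headers visible to the build?
which glslc                      # shader compiler ggml's Vulkan backend needs for its shaders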

@jmorganca Ollama is plenty fast on Nvidia GPUs, and optimising the speed can wait. Adding support for platforms that enable a massive segment of the market (iGPU users) should take priority over gaining higher speeds on platforms that are already working well enough. Regarding testing, if main has support for iGPUs (the 890M, for example), I'm sure there are enough members in the community who are more than willing to help with testing (I know I am, if it helps me get Ollama using my GPU for inference).

@Ahmedsaed

Hello Everyone!

It seems like there isn’t an easily accessible Docker image for Ollama with Vulkan support, or at least, it’s hard to find one. So, I decided to create one using @whyvl's fork along with some patches shared in his fork’s discussions.

If you’re looking for a straightforward way to run Ollama with Vulkan support, you can use the following Docker command:

docker run -v ~/.ollama:/root/.ollama --name ollama --device /dev/dri:/dev/dri --cap-add PERFMON -p 11434:11434 ahmedsaed26/ollama-vulkan

Or, if you prefer Docker Compose, use this configuration:

services:
  ollama:
    image: ahmedsaed26/ollama-vulkan
    container_name: ollama
    ports:
      - "11434:11434"
    volumes:
      - ~/.ollama:/root/.ollama
    devices:
      - /dev/dri:/dev/dri
    cap_add:
      - PERFMON

Then, start the container with:

docker compose up -d

Currently, this image includes Ollama v0.5.11, and I have only tested it on an AMD Radeon RX 470 GPU on Linux.
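
A rough way to confirm the container actually picked up a Vulkan device is to check the startup logs (the exact log line format may differ):

docker logs ollama 2>&1 | grep -i vulkan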

Hope this helps! Let me know if you run into any issues. 🚀

@eliranwong

Hi, I am new to this issue. Is there a quick-start guide so that I can try Ollama with the Vulkan backend?

I am running Ubuntu. I did some speed tests ( https://github.com/eliranwong/AMD_iGPU_AI_Setup/tree/main#speed-tests ) comparing the performance of llama.cpp with the Vulkan and ROCm backends against that of Ollama.

In view of the test results, Ollama is not working very well, and I would like to try Ollama with Vulkan. Seeing the long thread here, I'm wondering if someone could give me some hints on how to compile it with Vulkan support...


rwalle commented Feb 25, 2025

Hi, I am new to this issue. Is there a quick-start guide so that I can try Ollama with the Vulkan backend?

I am running Ubuntu. I did some speed tests ( https://github.com/eliranwong/AMD_iGPU_AI_Setup/tree/main#speed-tests ) comparing the performance of llama.cpp with the Vulkan and ROCm backends against that of Ollama.

In view of the test results, Ollama is not working very well, and I would like to try Ollama with Vulkan. Seeing the long thread here, I'm wondering if someone could give me some hints on how to compile it with Vulkan support...

The author of this PR has been weirdly quiet about providing build instructions: not here, and not in a GH issue in the forked repository (whyvl#7), while actively pushing code updates and merging PRs. I honestly have no idea what's going on, but I would recommend going through the steps other people provided in that issue instead of waiting for an "official" response.

@EdoaLive

For those who want to test Ollama Vulkan without building it, or who have build-related issues, you can find ready-to-use binaries in these comments (@eliranwong @rwalle etc.):
Windows:
whyvl#7 (comment)
Linux:
whyvl#7 (comment)

I personally tested both and they work well for me.

@eliranwong

For those who want to test Ollama Vulkan without building it, or who have build-related issues, you can find ready-to-use binaries in these comments (@eliranwong @rwalle etc.): Windows: whyvl#7 (comment) Linux: whyvl#7 (comment)

I personally tested both and they work well for me.

May I ask for build instructions instead?

I encountered errors when running your binary:

./ollama: /lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.32' not found (required by ./ollama)
./ollama: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.38' not found (required by ./ollama)

I would like to compile on my side, thanks.
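
For context, those errors mean the prebuilt binary was linked against a newer glibc/libstdc++ than the host provides. A rough way to check what the host actually has (the library path is taken from the error above; versions will differ per system):

ldd --version | head -n1
strings /lib/x86_64-linux-gnu/libstdc++.so.6 | grep GLIBCXX | tail -n3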

@EdoaLive

I encountered errors when running your binary:

Sorry to hear that. I have not tried to build it myself; the best I can suggest is to read that issue (whyvl#7) and piece together the correct build instructions from it. I wanted to work on a "clean" fork with clean build instructions, but I haven't had the time.

@McBane87

McBane87 commented Feb 25, 2025

May I ask for build instructions instead?

Please have a look at the Dockerfile contents I posted here; if you follow most of the RUN commands you should be good to go.

Additionally, the patch I'm using there is this one: whyvl#7 (comment)

@JamesClarke7283

For those who want to test Ollama Vulkan without building it, or who have build-related issues, you can find ready-to-use binaries in these comments (@eliranwong @rwalle etc.): Windows: whyvl#7 (comment) Linux: whyvl#7 (comment)

I personally tested both and they work well for me.

Hello, I know I am a bit late to the party, but I would just like to warn that those binaries were made by a third party, not the project maintainer. I did some scans with VirusTotal and locally with ClamAV and maldet, and they seem okay.

But I just wanted to remind people that installing and running random binaries is not ideal. I would advise setting up a CI/CD pipeline with reproducible builds.

Many thanks,
James Clarke

@McBane87

For those who want to test Ollama Vulkan without building it, or who have build-related issues, you can find ready-to-use binaries in these comments (@eliranwong @rwalle etc.): Windows: whyvl#7 (comment) Linux: whyvl#7 (comment)
I personally tested both and they work well for me.

Hello, I know I am a bit late to the party, but I would just like to warn that those binaries were made by a third party, not the project maintainer. I did some scans with VirusTotal and locally with ClamAV and maldet, and they seem okay.

But I just wanted to remind people that installing and running random binaries is not ideal. I would advise setting up a CI/CD pipeline with reproducible builds.

Many thanks, James Clarke

Absolutely agree. Always feel free to build it yourself.
As a starting point, build instructions can be found:

Successfully merging this pull request may close these issues.

Add Vulkan runner