
Nvidia 555 driver does not work with Ollama #4563

Closed
ginestopo opened this issue May 21, 2024 · 23 comments · Fixed by #4652
Labels: bug (Something isn't working), docker (Issues relating to using ollama in containers), nvidia (Issues relating to Nvidia GPUs and CUDA)

Comments

@ginestopo

ginestopo commented May 21, 2024

What is the issue?

I just updated the NVIDIA drivers on my host to the new 555 version. I have an RTX 4070 Ti.

Then, when I run Ollama inside my container (the container runs Ubuntu 20.04), Ollama does not use the GPU (I can tell because GPU utilization sits at 1% while it is responding).

This is my log when running Ollama

2024/05/21 21:31:02 routes.go:1008: INFO server config env="map[OLLAMA_DEBUG:false OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:*] OLLAMA_RUNNERS_DIR: OLLAMA_TMPDIR:]"
time=2024-05-21T21:31:02.076Z level=INFO source=images.go:704 msg="total blobs: 5"
time=2024-05-21T21:31:02.076Z level=INFO source=images.go:711 msg="total unused blobs removed: 0"
time=2024-05-21T21:31:02.077Z level=INFO source=routes.go:1054 msg="Listening on 127.0.0.1:11434 (version 0.1.38)"
time=2024-05-21T21:31:02.077Z level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama53751129/runners
time=2024-05-21T21:31:04.593Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cuda_v11 rocm_v60002 cpu cpu_avx cpu_avx2]"
time=2024-05-21T21:31:04.669Z level=INFO source=types.go:71 msg="inference compute" id=0 library=cpu compute="" driver=0.0 name="" total="15.6 GiB" available="165.9 MiB"
[GIN] 2024/05/21 - 21:31:25 | 200 |      34.776µs |       127.0.0.1 | HEAD     "/"
[GIN] 2024/05/21 - 21:31:25 | 404 |     146.669µs |       127.0.0.1 | POST     "/api/show"
[GIN] 2024/05/21 - 21:31:25 | 200 |  776.506553ms |       127.0.0.1 | POST     "/api/pull"
[GIN] 2024/05/21 - 21:31:35 | 200 |      19.307µs |       127.0.0.1 | HEAD     "/"
[GIN] 2024/05/21 - 21:31:35 | 200 |     482.647µs |       127.0.0.1 | POST     "/api/show"
[GIN] 2024/05/21 - 21:31:35 | 200 |     323.505µs |       127.0.0.1 | POST     "/api/show"
time=2024-05-21T21:31:36.117Z level=INFO source=memory.go:133 msg="offload to gpu" layers.requested=-1 layers.real=0 memory.available="145.4 MiB" memory.required.full="4.6 GiB" memory.required.partial="794.5 MiB" memory.required.kv="256.0 MiB" memory.weights.total="4.1 GiB" memory.weights.repeating="3.7 GiB" memory.weights.nonrepeating="411.0 MiB" memory.graph.full="164.0 MiB" memory.graph.partial="677.5 MiB"
time=2024-05-21T21:31:36.118Z level=INFO source=server.go:320 msg="starting llama server" cmd="/tmp/ollama53751129/runners/cpu_avx2/ollama_llama_server --model /root/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa --ctx-size 2048 --batch-size 512 --embedding --log-disable --parallel 1 --port 34935"
time=2024-05-21T21:31:36.118Z level=INFO source=sched.go:338 msg="loaded runners" count=1
time=2024-05-21T21:31:36.119Z level=INFO source=server.go:504 msg="waiting for llama runner to start responding"
time=2024-05-21T21:31:36.119Z level=INFO source=server.go:540 msg="waiting for server to become available" status="llm server error"
INFO [main] build info | build=1 commit="952d03d" tid="140248322381696" timestamp=1716327096
INFO [main] system info | n_threads=8 n_threads_batch=-1 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="140248322381696" timestamp=1716327096 total_threads=16
INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="15" port="34935" tid="140248322381696" timestamp=1716327096
llama_model_loader: loaded meta data with 22 key-value pairs and 291 tensors from /root/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = Meta-Llama-3-8B-Instruct
llama_model_loader: - kv   2:                          llama.block_count u32              = 32
llama_model_loader: - kv   3:                       llama.context_length u32              = 8192
llama_model_loader: - kv   4:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv   6:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   7:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv   8:                       llama.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  10:                          general.file_type u32              = 2
llama_model_loader: - kv  11:                           llama.vocab_size u32              = 128256
llama_model_loader: - kv  12:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv  13:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  14:                         tokenizer.ggml.pre str              = llama-bpe
llama_model_loader: - kv  15:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  16:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  17:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  18:                tokenizer.ggml.bos_token_id u32              = 128000
llama_model_loader: - kv  19:                tokenizer.ggml.eos_token_id u32              = 128009
llama_model_loader: - kv  20:                    tokenizer.chat_template str              = {% set loop_messages = messages %}{% ...
llama_model_loader: - kv  21:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type q4_0:  225 tensors
llama_model_loader: - type q6_K:    1 tensors
time=2024-05-21T21:31:36.370Z level=INFO source=server.go:540 msg="waiting for server to become available" status="llm server loading model"
llm_load_vocab: special tokens definition check successful ( 256/128256 ).
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 128256
llm_load_print_meta: n_merges         = 280147
llm_load_print_meta: n_ctx_train      = 8192
llm_load_print_meta: n_embd           = 4096
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_layer          = 32
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 4
llm_load_print_meta: n_embd_k_gqa     = 1024
llm_load_print_meta: n_embd_v_gqa     = 1024
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 14336
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 8192
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: model type       = 8B
llm_load_print_meta: model ftype      = Q4_0
llm_load_print_meta: model params     = 8.03 B
llm_load_print_meta: model size       = 4.33 GiB (4.64 BPW) 
llm_load_print_meta: general.name     = Meta-Llama-3-8B-Instruct
llm_load_print_meta: BOS token        = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token        = 128009 '<|eot_id|>'
llm_load_print_meta: LF token         = 128 'Ä'
llm_load_print_meta: EOT token        = 128009 '<|eot_id|>'
llm_load_tensors: ggml ctx size =    0.15 MiB
llm_load_tensors:        CPU buffer size =  4437.80 MiB
.......................................................................................
llama_new_context_with_model: n_ctx      = 2048
llama_new_context_with_model: n_batch    = 512
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: freq_base  = 500000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:        CPU KV buffer size =   256.00 MiB
llama_new_context_with_model: KV self size  =  256.00 MiB, K (f16):  128.00 MiB, V (f16):  128.00 MiB
llama_new_context_with_model:        CPU  output buffer size =     0.50 MiB
llama_new_context_with_model:        CPU compute buffer size =   258.50 MiB
llama_new_context_with_model: graph nodes  = 1030
llama_new_context_with_model: graph splits = 1
INFO [main] model loaded | tid="140248322381696" timestamp=1716327097
time=2024-05-21T21:31:37.123Z level=INFO source=server.go:545 msg="llama runner started in 1.00 seconds"
[GIN] 2024/05/21 - 21:31:37 | 200 |   1.64966928s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2024/05/21 - 21:31:43 | 200 |   4.29836565s |       127.0.0.1 | POST     "/api/chat"

Edit:
In addition, my container successfully detects the GPU passthrough when running nvidia-smi.

OS: Docker
GPU: Nvidia
CPU: AMD
Ollama version: 0.1.38

@ginestopo ginestopo added the bug Something isn't working label May 21, 2024
@Zyfax

Zyfax commented May 21, 2024

Interestingly, I experienced a similar phenomenon when I upgraded from driver version 535 to 550 - my CPU usage remained high until I rebooted the host machine.
As for the nvidia-smi command, if it outputs an error message, it's likely indicative of a problem with the connection between the NVIDIA driver and graphics card.

@pdevine
Contributor

pdevine commented May 21, 2024

What's the output of ollama ps?

@dhiltgen dhiltgen self-assigned this May 22, 2024
@dhiltgen dhiltgen added nvidia Issues relating to Nvidia GPUs and CUDA docker Issues relating to using ollama in containers labels May 22, 2024
@dhiltgen
Collaborator

Please add -e OLLAMA_DEBUG=1 to your container and share the log so we can see a little more detail on why it can't discover the GPU. Also try docker run --gpus all ubuntu nvidia-smi to see if the Docker + Nvidia container runtime has become unhealthy.
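For reference, a rough sketch of both checks, assuming the standard docker run invocation from the Ollama docs (volume name, port, and container name may differ in your setup):

# 1) Smoke test: is the Docker + Nvidia container runtime healthy?
docker run --rm --gpus all ubuntu nvidia-smi

# 2) Re-create the Ollama container with debug logging enabled
docker rm -f ollama
docker run -d --gpus=all -e OLLAMA_DEBUG=1 \
  -v ollama:/root/.ollama -p 11434:11434 \
  --name ollama ollama/ollama

# 3) Grab the startup log to share here
docker logs ollama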

@brodieferguson

brodieferguson commented May 22, 2024

I don't believe it's related to Ollama. I also had this issue; I discovered it when setting up a container unrelated to Ollama. It's the new 555 drivers, and it affects any CUDA/GPU-related container (I tested multiple, including the base PyTorch CUDA Docker image).

For example, trying to list CUDA capability in the PyTorch Docker image gives a "CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error?"

The Nvidia container benchmark reported 1 device requested, 0 available.

The issue was instantly fixed by reverting to the prior drivers.

@ginestopo
Author

Thank you very much @brodieferguson! That seemed to do the trick. Nevertheless, I am not quite happy having to downgrade my GPU drivers just to make Ollama work. For that reason I wouldn't consider this issue resolved, and I'm happy to provide more info to help solve the problem if needed.

@dhiltgen I just downgraded my drivers to the immediately previous version and Ollama instantly started using the GPU (RTX 4070 Ti).

There seems to be an incompatibility with the new drivers, as @brodieferguson had the same problem, and probably many other users do as well.

Thank you both very much. I hope I can enjoy Ollama with the latest drivers soon.

@nerdpudding

Same issue here:
As far as I know, Ollama worked fine on the GPU before I upgraded both Ollama and the NVIDIA drivers. I am on Windows 11 with WSL2 and using Docker Desktop. This morning I did three things:

  1. Noticed new Nvidia drivers were available: 555.85; this time it also included a PhysX update (first time I saw that in years, actually): version 9.23.1019 --> installed both
  2. docker pull ollama/ollama to get the 0.1.38 version (I was on 0.1.37 before)
  3. Deleted the existing Ollama container, then ran: docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

Later I noticed that Ollama no longer uses my GPU: it was much slower, and looking at the resources, the GPU memory was not being used. The newly available ollama ps command confirmed the same thing:
NAME              ID              SIZE      PROCESSOR    UNTIL
mistral:latest    61e88e884507    4.6 GB    100% CPU     4 minutes from now

nvidia-smi clearly showed the GPU is available:
nvidia-smi
Fri May 24 07:34:10 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.03            Driver Version: 555.85          CUDA Version: 12.5      |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        On  |   00000000:0A:00.0  On |                  Off |
| 50%   27C    P8             52W / 530W  |    933MiB /  24564MiB  |      1%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                               |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory  |
|        ID   ID                                                               Usage       |
|=========================================================================================|
|    0   N/A  N/A        23    G   /Xwayland                                   N/A         |
|    0   N/A  N/A        27    G   /Xwayland                                   N/A         |
|    0   N/A  N/A        31    G   /Xwayland                                   N/A         |
+-----------------------------------------------------------------------------------------+

I will downgrade the drivers as well, but there clearly is an issue with Ollama and these drivers.

@nerdpudding

Confirmed: reverted to the 552.42 drivers and the GPU is now used again.
ollama ps
NAME              ID              SIZE      PROCESSOR    UNTIL
mistral:latest    61e88e884507    5.1 GB    100% GPU     4 minutes from now

@ginestopo
Author

@nerdpudding thanks for your contribution. We can confirm this is not an isolated issue: the NVIDIA 555.85 drivers cause Ollama not to use the GPU for some reason.

@TSavo

TSavo commented May 26, 2024

Can confirm: no CUDA Docker image works with 555. Downgrading to 552 fixes the issue. This is unrelated to Ollama and needs to be fixed by Docker/NVIDIA.

@jmorganca jmorganca changed the title After updating nvidia drivers in my host, ollama inside a docker container running ubuntu does not use GPU Nvidia 555 driver does not work with Ollama May 26, 2024
@jmorganca jmorganca pinned this issue May 26, 2024
@jmorganca
Member

Hi folks, it seems the 555 Nvidia driver branch is not working with Ollama (and other projects that integrate llama.cpp). We're working to resolve this together; in the meantime, downgrading to a prior driver version will fix the issue. Sorry about this, and we will post more updates here.

@jmorganca
Member

Hi all, this seems to be from the nvidia_uvm kernel module not being loaded. You can run:

sudo modprobe nvidia
sudo modprobe nvidia_uvm

to load them manually. A fix is coming with the install script.

Also, adding:

nvidia
nvidia-uvm

to /etc/modules-load.d/nvidia.conf will make sure they are loaded on startup
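A minimal sketch of those two steps on the host, assuming a systemd-based distro that reads /etc/modules-load.d/:

# Load the modules now
sudo modprobe nvidia
sudo modprobe nvidia_uvm

# Keep them loaded across reboots
printf "nvidia\nnvidia-uvm\n" | sudo tee /etc/modules-load.d/nvidia.conf

# Verify the module is present
lsmod | grep nvidia_uvm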

@jmorganca
Member

Hi folks, it seems this is from the new driver packages not loading the nvidia_uvm kernel module, from what I can see. It should work to re-load the module:

sudo modprobe nvidia
sudo modprobe nvidia_uvm

and then, to keep it loaded, edit the config for nvidia-persistenced by adding them to /etc/modules-load.d/nvidia.conf

nvidia
nvidia-uvm

@ginestopo
Author

ginestopo commented May 27, 2024

@jmorganca Thanks a lot for the fix! 💙

@falmanna

Hi folks, it seems this is from the new driver packages not loading the nvidia_uvm kernel module, from what I can see. It should work to re-load the module:

sudo modprobe nvidia
sudo modprobe nvidia_uvm

and then, to keep it loaded, edit the config for nvidia-persistenced by adding them to /etc/modules-load.d/nvidia.conf

nvidia
nvidia-uvm

Not sure how to apply these on a WSL Docker installation. I downgraded my driver for now and it is working again.

@nerdpudding

Same issue here. The modprobe nvidia fix doesn't seem to work with WSL 2.

I’m not an expert, but I tried different Docker builds using the install script with pytorch:2.3.0-cuda12.1-cudnn8-runtime as the base image. Everything works fine with older NVIDIA drivers, but not with 555.85.

Even though nvidia-smi shows the GPU is available (both in WSL and the running container), Ollama defaults to CPU. The debug logs look like this:

time=2024-06-03T11:59:56.076Z level=DEBUG source=gpu.go:355 msg="Unable to load nvcuda" library=/usr/lib/x86_64-linux-gnu/libcuda.so.1 error="nvcuda init failure: 500"

Any ideas or suggestions on what might be causing this with the latest drivers? Or should we just wait and see if a newer NVIDIA driver release fixes it?

@ginestopo
Author

@nerdpudding Did you try using Ollama v0.1.41 (the latest release)?

@nerdpudding

nerdpudding commented Jun 3, 2024

@nerdpudding Did you try using Ollama v0.1.41 (the latest release)?

Yes. First I tried pulling the latest official Docker image.
Then I used this custom Docker build (I must admit ChatGPT created the Dockerfile for me, so it is probably not ideal), which uses the install.sh script and installs v0.1.41:

# Use the official PyTorch image with CUDA support
FROM pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime

# Install dependencies
RUN apt-get update && apt-get install -y \
    curl \
    sudo \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

# Create a non-root user
RUN useradd -m -s /bin/bash ollama
RUN echo 'ollama ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers
USER ollama
WORKDIR /home/ollama

# Install Ollama
RUN curl -fsSL https://ollama.com/install.sh | sh

# Ensure the volumes are correctly set up
VOLUME ["/root/.ollama"]

# Set environment variables for CUDA and debugging
ENV OLLAMA_DEBUG=1
ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility
ENV LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/lib/x86_64-linux-gnu:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:$LD_LIBRARY_PATH
ENV PATH=/usr/local/cuda/bin:$PATH

# Ensure correct library links
RUN sudo mkdir -p /usr/local/cuda/lib64/stubs && sudo ln -s /usr/lib/x86_64-linux-gnu/libcuda.so.1 /usr/local/cuda/lib64/stubs/libcuda.so

# Run Ollama
CMD ["ollama", "serve"]

This works fine with the old drivers, not with the new.
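For completeness, a build-and-run sketch for the Dockerfile above; the ollama-custom image tag and container name are just placeholders:

docker build -t ollama-custom .
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama-custom ollama-custom
# OLLAMA_DEBUG=1 is set in the image, so GPU discovery details show up in the logs
docker logs -f ollama-custom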

@brodieferguson

@nerdpudding Are you using Docker Desktop? nvidia-smi worked for me too, but not the rest.

NVIDIA/nvidia-container-toolkit#520

  • If you see this symptom using Docker CE on Linux under WSL2, please update your nvidia-container-toolkit to 1.14.4 or newer.
  • If you see this symptom using Docker Desktop, a fix (to upgrade the bundled nvidia-container-toolkit) is in progress; we will reply back here when it is published. Until that fix is ready, if you are using Docker Desktop, please use NVIDIA Driver 552.xx or earlier.
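For the Docker CE on Linux under WSL2 case above, a quick sketch for checking and upgrading the toolkit version (assuming a Debian/Ubuntu-based install; package manager commands may differ):

# Installed toolkit CLI version
nvidia-ctk --version

# Package version on Debian/Ubuntu
dpkg -l | grep nvidia-container-toolkit

# Upgrade if it is older than 1.14.4
sudo apt-get update && sudo apt-get install --only-upgrade nvidia-container-toolkit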

@nerdpudding

Yes, Docker Desktop with Windows 11 and WSL2 indeed. Older drivers still work fine. I missed that the 'nvidia_uvm' workaround was not possible with Docker Desktop and was (pointlessly) looking for something that would work... Thanks for pointing out that they are still working on a solution for Docker Desktop users; I'll just be more patient :-)

@zimdin12

zimdin12 commented Jun 6, 2024

Yes, Docker Desktop with Windows 11 and WSL2 indeed. Older drivers still work fine. I missed that the 'nvidia_uvm' workaround was not possible with Docker Desktop and was (pointlessly) looking for something that would work... Thanks for pointing out that they are still working on a solution for Docker Desktop users; I'll just be more patient :-)

Is there a separate issue for that (this one got closed)? I would like to keep my eye on it, then I would know when to update :D

@nerdpudding

Yes, Docker Desktop with Windows 11 and WSL2 indeed. Older drivers still work fine. I missed that the 'nvidia_uvm' workaround was not possible with Docker Desktop and was (pointlessly) looking for something that would work... Thanks for pointing out that they are still working on a solution for Docker Desktop users; I'll just be more patient :-)

Is there a separate issue for that (this one got closed)? I would like to keep my eye on it, then I would know when to update :D

Not sure, but to my understanding it is an NVIDIA issue, so...
By the way, I noticed newer drivers were released today (555.99) and, after reading this, hoped they would fix it:

https://www.nvidia.com/en-us/geforce/forums/game-ready-drivers/13/543951/geforce-grd-55599-feedback-thread-released-6424/
Fixed General Bugs:
CUDA 12.5 does not work with CUDA enabled Docker images [4668302]

I just installed, rebooted, and tested with that newer driver using both my own Dockerfile (which uses pytorch:2.3.0-cuda12.1-cudnn8-runtime and then installs Ollama via the install.sh script) and the latest official Ollama Docker image. Both still only use the CPU. I reverted back to 551.44 and the GPU was immediately used again. So apparently that 'general fix' does not yet apply to WSL2 with Docker Desktop, but maybe only to Docker CE.

I'm not sure if there is another open issue on this here, but I guess it is an NVIDIA issue, so we probably just have to watch the forums/threads there and wait until they release a fix in newer drivers.

@falmanna

falmanna commented Jun 6, 2024

Is there a separate issue for that (this one got closed)? I would like to keep my eye on it, then I would know when to update :D

I subscribed to this one

@nerdpudding Are you using Docker Desktop? nvidia-smi worked for me too, but not the rest.

NVIDIA/nvidia-container-toolkit#520

@nerdpudding

Docker has released an update for Docker Desktop.

See https://docs.docker.com/desktop/release-notes/
Upgrades:
--> NVIDIA Container Toolkit v1.15.0

I just tested it, and the GPU is now used with Nvidia drivers 555.99 after upgrading Docker Desktop to 4.31.0.

This fixed it for me!

So if you are using Docker on Windows with WSL 2 (now not only Docker CE, but also Docker Desktop), it will work again after updating.
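A quick verification sketch after updating Docker Desktop, assuming the container is named ollama as in the docker run command earlier in this thread:

# GPU visible to containers again?
docker run --rm --gpus all ubuntu nvidia-smi

# Is Ollama actually offloading to the GPU?
docker exec -it ollama ollama ps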
