
Nvidia 555 driver does not work with Ollama #4563

Closed
ginestopo opened this issue May 21, 2024 · 23 comments · Fixed by #4652
Labels: bug (Something isn't working), docker (Issues relating to using ollama in containers), nvidia (Issues relating to Nvidia GPUs and CUDA)

Comments

@ginestopo

ginestopo commented May 21, 2024

What is the issue?

I just updated the NVIDIA drivers on my host to the new 555 version. I have an RTX 4070 Ti.

Then, when I run Ollama inside my container (the container runs Ubuntu 20.04), Ollama does not use the GPU (I can tell because GPU utilization sits at 1% while it is responding).

This is my log when running Ollama

2024/05/21 21:31:02 routes.go:1008: INFO server config env="map[OLLAMA_DEBUG:false OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:*] OLLAMA_RUNNERS_DIR: OLLAMA_TMPDIR:]"
time=2024-05-21T21:31:02.076Z level=INFO source=images.go:704 msg="total blobs: 5"
time=2024-05-21T21:31:02.076Z level=INFO source=images.go:711 msg="total unused blobs removed: 0"
time=2024-05-21T21:31:02.077Z level=INFO source=routes.go:1054 msg="Listening on 127.0.0.1:11434 (version 0.1.38)"
time=2024-05-21T21:31:02.077Z level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama53751129/runners
time=2024-05-21T21:31:04.593Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cuda_v11 rocm_v60002 cpu cpu_avx cpu_avx2]"
time=2024-05-21T21:31:04.669Z level=INFO source=types.go:71 msg="inference compute" id=0 library=cpu compute="" driver=0.0 name="" total="15.6 GiB" available="165.9 MiB"
[GIN] 2024/05/21 - 21:31:25 | 200 |      34.776µs |       127.0.0.1 | HEAD     "/"
[GIN] 2024/05/21 - 21:31:25 | 404 |     146.669µs |       127.0.0.1 | POST     "/api/show"
[GIN] 2024/05/21 - 21:31:25 | 200 |  776.506553ms |       127.0.0.1 | POST     "/api/pull"
[GIN] 2024/05/21 - 21:31:35 | 200 |      19.307µs |       127.0.0.1 | HEAD     "/"
[GIN] 2024/05/21 - 21:31:35 | 200 |     482.647µs |       127.0.0.1 | POST     "/api/show"
[GIN] 2024/05/21 - 21:31:35 | 200 |     323.505µs |       127.0.0.1 | POST     "/api/show"
time=2024-05-21T21:31:36.117Z level=INFO source=memory.go:133 msg="offload to gpu" layers.requested=-1 layers.real=0 memory.available="145.4 MiB" memory.required.full="4.6 GiB" memory.required.partial="794.5 MiB" memory.required.kv="256.0 MiB" memory.weights.total="4.1 GiB" memory.weights.repeating="3.7 GiB" memory.weights.nonrepeating="411.0 MiB" memory.graph.full="164.0 MiB" memory.graph.partial="677.5 MiB"
time=2024-05-21T21:31:36.118Z level=INFO source=server.go:320 msg="starting llama server" cmd="/tmp/ollama53751129/runners/cpu_avx2/ollama_llama_server --model /root/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa --ctx-size 2048 --batch-size 512 --embedding --log-disable --parallel 1 --port 34935"
time=2024-05-21T21:31:36.118Z level=INFO source=sched.go:338 msg="loaded runners" count=1
time=2024-05-21T21:31:36.119Z level=INFO source=server.go:504 msg="waiting for llama runner to start responding"
time=2024-05-21T21:31:36.119Z level=INFO source=server.go:540 msg="waiting for server to become available" status="llm server error"
INFO [main] build info | build=1 commit="952d03d" tid="140248322381696" timestamp=1716327096
INFO [main] system info | n_threads=8 n_threads_batch=-1 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="140248322381696" timestamp=1716327096 total_threads=16
INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="15" port="34935" tid="140248322381696" timestamp=1716327096
llama_model_loader: loaded meta data with 22 key-value pairs and 291 tensors from /root/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = Meta-Llama-3-8B-Instruct
llama_model_loader: - kv   2:                          llama.block_count u32              = 32
llama_model_loader: - kv   3:                       llama.context_length u32              = 8192
llama_model_loader: - kv   4:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv   6:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   7:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv   8:                       llama.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  10:                          general.file_type u32              = 2
llama_model_loader: - kv  11:                           llama.vocab_size u32              = 128256
llama_model_loader: - kv  12:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv  13:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  14:                         tokenizer.ggml.pre str              = llama-bpe
llama_model_loader: - kv  15:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  16:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  17:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  18:                tokenizer.ggml.bos_token_id u32              = 128000
llama_model_loader: - kv  19:                tokenizer.ggml.eos_token_id u32              = 128009
llama_model_loader: - kv  20:                    tokenizer.chat_template str              = {% set loop_messages = messages %}{% ...
llama_model_loader: - kv  21:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type q4_0:  225 tensors
llama_model_loader: - type q6_K:    1 tensors
time=2024-05-21T21:31:36.370Z level=INFO source=server.go:540 msg="waiting for server to become available" status="llm server loading model"
llm_load_vocab: special tokens definition check successful ( 256/128256 ).
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 128256
llm_load_print_meta: n_merges         = 280147
llm_load_print_meta: n_ctx_train      = 8192
llm_load_print_meta: n_embd           = 4096
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_layer          = 32
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 4
llm_load_print_meta: n_embd_k_gqa     = 1024
llm_load_print_meta: n_embd_v_gqa     = 1024
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 14336
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 8192
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: model type       = 8B
llm_load_print_meta: model ftype      = Q4_0
llm_load_print_meta: model params     = 8.03 B
llm_load_print_meta: model size       = 4.33 GiB (4.64 BPW) 
llm_load_print_meta: general.name     = Meta-Llama-3-8B-Instruct
llm_load_print_meta: BOS token        = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token        = 128009 '<|eot_id|>'
llm_load_print_meta: LF token         = 128 'Ä'
llm_load_print_meta: EOT token        = 128009 '<|eot_id|>'
llm_load_tensors: ggml ctx size =    0.15 MiB
llm_load_tensors:        CPU buffer size =  4437.80 MiB
.......................................................................................
llama_new_context_with_model: n_ctx      = 2048
llama_new_context_with_model: n_batch    = 512
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: freq_base  = 500000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:        CPU KV buffer size =   256.00 MiB
llama_new_context_with_model: KV self size  =  256.00 MiB, K (f16):  128.00 MiB, V (f16):  128.00 MiB
llama_new_context_with_model:        CPU  output buffer size =     0.50 MiB
llama_new_context_with_model:        CPU compute buffer size =   258.50 MiB
llama_new_context_with_model: graph nodes  = 1030
llama_new_context_with_model: graph splits = 1
INFO [main] model loaded | tid="140248322381696" timestamp=1716327097
time=2024-05-21T21:31:37.123Z level=INFO source=server.go:545 msg="llama runner started in 1.00 seconds"
[GIN] 2024/05/21 - 21:31:37 | 200 |   1.64966928s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2024/05/21 - 21:31:43 | 200 |   4.29836565s |       127.0.0.1 | POST     "/api/chat"

Edit:
In addition, my container successfully detects the GPU passthrough when running nvidia-smi.

OS: Docker
GPU: Nvidia
CPU: AMD
Ollama version: 0.1.38

@ginestopo ginestopo added the bug Something isn't working label May 21, 2024
@Zyfax

Zyfax commented May 21, 2024

Interestingly, I experienced a similar phenomenon when I upgraded from driver version 535 to 550 - my CPU usage remained high until I rebooted the host machine.
As for the nvidia-smi command, if it outputs an error message, it's likely indicative of a problem with the connection between the NVIDIA driver and graphics card.

@pdevine
Contributor

pdevine commented May 21, 2024

What's the output of ollama ps?

@dhiltgen dhiltgen self-assigned this May 22, 2024
@dhiltgen dhiltgen added nvidia Issues relating to Nvidia GPUs and CUDA docker Issues relating to using ollama in containers labels May 22, 2024
@dhiltgen
Collaborator

Please add -e OLLAMA_DEBUG=1 to your container and share the log so we can see a little more detail on why it can't discover the GPU. Also try docker run --gpus all ubuntu nvidia-smi to see if the Docker + Nvidia container runtime has become unhealthy.
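For reference, a rough sketch of both checks, assuming the standard docker run invocation from the Ollama docs (volume name, port, and container name may differ in your setup):

# 1) Smoke test: is the Docker + Nvidia container runtime healthy?
docker run --rm --gpus all ubuntu nvidia-smi

# 2) Re-create the Ollama container with debug logging enabled
docker rm -f ollama
docker run -d --gpus=all -e OLLAMA_DEBUG=1 \
  -v ollama:/root/.ollama -p 11434:11434 \
  --name ollama ollama/ollama

# 3) Grab the startup log to share here
docker logs ollama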

@brodieferguson

brodieferguson commented May 22, 2024

I don't believe it's related to Ollama. I also had this issue; I discovered it when setting up a container unrelated to Ollama. It's the new 555 drivers, and it affects any CUDA/GPU-related container (I tested multiple, including the base PyTorch CUDA Docker image).

For example, trying to list CUDA capability in the PyTorch Docker image gives a "CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error?"

The Nvidia container benchmark reported 1 device requested, 0 available.

The issue was instantly fixed by reverting to the prior drivers.

@ginestopo
Author

Thank you very much @brodieferguson! That seemed to do the trick. Nevertheless, I am not quite happy having to downgrade my GPU drivers just to make Ollama work. For that reason I wouldn't consider this issue resolved, and I'm happy to provide more info to help solve the problem if needed.

@dhiltgen I just downgraded my drivers to the immediately previous version and Ollama instantly started using the GPU (RTX 4070 Ti).

There seems to be an incompatibility with the new drivers, as @brodieferguson had the same problem, and probably many other users do as well.

Thank you both very much. I hope I can enjoy Ollama with the latest drivers soon.

@nerdpudding

Same issue here:
As far as I know, Ollama worked fine on the GPU before I upgraded both Ollama and the NVIDIA drivers. I am on Windows 11 with WSL2 and using Docker Desktop. This morning I did three things:

  1. Noticed new Nvidia drivers were available: 555.85; this time it also included a PhysX update (first time I saw that in years, actually): version 9.23.1019 --> installed both
  2. docker pull ollama/ollama to get the 0.1.38 version (I was on 0.1.37 before)
  3. Deleted the existing Ollama container, then ran: docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

Later I noticed that Ollama no longer uses my GPU: it was much slower, and looking at the resources, the GPU memory was not being used. The newly available ollama ps command confirmed the same thing:
NAME              ID              SIZE      PROCESSOR    UNTIL
mistral:latest    61e88e884507    4.6 GB    100% CPU     4 minutes from now

nvidia-smi clearly showed the GPU is available:
nvidia-smi
Fri May 24 07:34:10 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.03            Driver Version: 555.85          CUDA Version: 12.5      |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        On  |   00000000:0A:00.0  On |                  Off |
| 50%   27C    P8             52W / 530W  |    933MiB /  24564MiB  |      1%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                               |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory  |
|        ID   ID                                                               Usage       |
|=========================================================================================|
|    0   N/A  N/A        23    G   /Xwayland                                   N/A         |
|    0   N/A  N/A        27    G   /Xwayland                                   N/A         |
|    0   N/A  N/A        31    G   /Xwayland                                   N/A         |
+-----------------------------------------------------------------------------------------+

I will downgrade the drivers as well, but there clearly is an issue with Ollama and these drivers.

@nerdpudding

Confirmed: reverted to the 552.42 drivers and the GPU is now used again.
ollama ps
NAME              ID              SIZE      PROCESSOR    UNTIL
mistral:latest    61e88e884507    5.1 GB    100% GPU     4 minutes from now

@ginestopo
Author

@nerdpudding thanks for your contribution. We can confirm this is not an isolated issue: the NVIDIA 555.85 drivers cause Ollama not to use the GPU for some reason.

@TSavo

TSavo commented May 26, 2024

Can confirm: no CUDA Docker image works with 555. Downgrading to 552 fixes the issue. This is unrelated to Ollama and needs to be fixed by Docker/NVIDIA.

@jmorganca jmorganca changed the title After updating nvidia drivers in my host, ollama inside a docker container running ubuntu does not use GPU Nvidia 555 driver does not work with Ollama May 26, 2024
@jmorganca jmorganca pinned this issue May 26, 2024
@jmorganca
Member

Hi folks, it seems the 555 Nvidia driver branch is not working with Ollama (and other projects that integrate llama.cpp). We're working to resolve this together; in the meantime, downgrading to a prior driver version will fix the issue. Sorry about this, and we will post more updates here.

@jmorganca
Member

Hi all, this seems to be from the nvidia_uvm kernel module not being loaded. You can run:

sudo modprobe nvidia
sudo modprobe nvidia_uvm

to load them manually. A fix is coming with the install script.

Also, adding:

nvidia
nvidia-uvm

to /etc/modules-load.d/nvidia.conf will make sure they are loaded on startup
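A minimal sketch of those two steps on the host, assuming a systemd-based distro that reads /etc/modules-load.d/:

# Load the modules now
sudo modprobe nvidia
sudo modprobe nvidia_uvm

# Keep them loaded across reboots
printf "nvidia\nnvidia-uvm\n" | sudo tee /etc/modules-load.d/nvidia.conf

# Verify the module is present
lsmod | grep nvidia_uvm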

@jmorganca
Member

Hi folks, it seems this is from the new driver packages not loading the nvidia_uvm kernel module, from what I can see. It should work to re-load the module:

sudo modprobe nvidia
sudo modprobe nvidia_uvm

and then, to keep it loaded, edit the config for nvidia-persistenced by adding them to /etc/modules-load.d/nvidia.conf

nvidia
nvidia-uvm

@ginestopo
Author

ginestopo commented May 27, 2024

@jmorganca Thanks a lot for the fix! 💙

@falmanna

Hi folks, it seems this is from the new driver packages not loading the nvidia_uvm kernel module, from what I can see. It should work to re-load the module:

sudo modprobe nvidia
sudo modprobe nvidia_uvm

and then, to keep it loaded, edit the config for nvidia-persistenced by adding them to /etc/modules-load.d/nvidia.conf

nvidia
nvidia-uvm

Not sure how to apply these on a WSL Docker installation. I downgraded my driver for now and it is working again.

@nerdpudding

Same issue here. The modprobe nvidia fix doesn't seem to work with WSL 2.

I’m not an expert, but I tried different Docker builds using the install script with pytorch:2.3.0-cuda12.1-cudnn8-runtime as the base image. Everything works fine with older NVIDIA drivers, but not with 555.85.

Even though nvidia-smi shows the GPU is available (both in WSL and the running container), Ollama defaults to CPU. The debug logs look like this:

time=2024-06-03T11:59:56.076Z level=DEBUG source=gpu.go:355 msg="Unable to load nvcuda" library=/usr/lib/x86_64-linux-gnu/libcuda.so.1 error="nvcuda init failure: 500"

Any ideas or suggestions on what might be causing this with the latest drivers? Or should we just wait and see if a newer NVIDIA driver release fixes it?

@ginestopo
Author

@nerdpudding Did you try using Ollama v0.1.41 (the latest release)?

@nerdpudding

nerdpudding commented Jun 3, 2024

@nerdpudding Did you try using Ollama v0.1.41 (the latest release)?

Yes. First I tried pulling the latest official Docker image.
Then I used this custom Docker build (I must admit ChatGPT created the Dockerfile for me, so it is probably not ideal), which uses the install.sh script and installs v0.1.41:

# Use the official PyTorch image with CUDA support
FROM pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime

# Install dependencies
RUN apt-get update && apt-get install -y \
    curl \
    sudo \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

# Create a non-root user
RUN useradd -m -s /bin/bash ollama
RUN echo 'ollama ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers
USER ollama
WORKDIR /home/ollama

# Install Ollama
RUN curl -fsSL https://ollama.com/install.sh | sh

# Ensure the volumes are correctly set up
VOLUME ["/root/.ollama"]

# Set environment variables for CUDA and debugging
ENV OLLAMA_DEBUG=1
ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility
ENV LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/lib/x86_64-linux-gnu:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:$LD_LIBRARY_PATH
ENV PATH=/usr/local/cuda/bin:$PATH

# Ensure correct library links
RUN sudo mkdir -p /usr/local/cuda/lib64/stubs && sudo ln -s /usr/lib/x86_64-linux-gnu/libcuda.so.1 /usr/local/cuda/lib64/stubs/libcuda.so

# Run Ollama
CMD ["ollama", "serve"]

This works fine with the old drivers, not with the new.
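For completeness, a build-and-run sketch for the Dockerfile above; the ollama-custom image tag and container name are just placeholders:

docker build -t ollama-custom .
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama-custom ollama-custom
# OLLAMA_DEBUG=1 is set in the image, so GPU discovery details show up in the logs
docker logs -f ollama-custom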

@brodieferguson

@nerdpudding Are you using Docker Desktop? nvidia-smi worked for me too, but not the rest.

NVIDIA/nvidia-container-toolkit#520

  • If you see this symptom using Docker CE on Linux under WSL2, please update your nvidia-container-toolkit to 1.14.4 or newer.
  • If you see this symptom using Docker Desktop, a fix (to upgrade the bundled nvidia-container-toolkit) is in progress; we will reply back here when it is published. Until that fix is ready, if you are using Docker Desktop, please use NVIDIA Driver 552.xx or earlier.
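For the Docker CE on Linux under WSL2 case above, a quick sketch for checking and upgrading the toolkit version (assuming a Debian/Ubuntu-based install; package manager commands may differ):

# Installed toolkit CLI version
nvidia-ctk --version

# Package version on Debian/Ubuntu
dpkg -l | grep nvidia-container-toolkit

# Upgrade if it is older than 1.14.4
sudo apt-get update && sudo apt-get install --only-upgrade nvidia-container-toolkit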

@nerdpudding

Yes, Docker Desktop with Windows 11 and WSL2 indeed. Older drivers still work fine. I missed that the 'nvidia_uvm' workaround was not possible with Docker Desktop and was (pointlessly) looking for something that would work... Thanks for pointing out that they are still working on a solution for Docker Desktop users; I'll just be more patient :-)

@zimdin12

zimdin12 commented Jun 6, 2024

Yes, Docker Desktop with Windows 11 and WSL2 indeed. Older drivers still work fine. I missed that the 'nvidia_uvm' workaround was not possible with Docker Desktop and was (pointlessly) looking for something that would work... Thanks for pointing out that they are still working on a solution for Docker Desktop users; I'll just be more patient :-)

Is there a separate issue for that (this one got closed)? I would like to keep my eye on it, then I would know when to update :D

@nerdpudding

Yes, Docker Desktop with Windows 11 and WSL2 indeed. Older drivers still work fine. I missed that the 'nvidia_uvm' workaround was not possible with Docker Desktop and was (pointlessly) looking for something that would work... Thanks for pointing out that they are still working on a solution for Docker Desktop users; I'll just be more patient :-)

Is there a separate issue for that (this one got closed)? I would like to keep my eye on it, then I would know when to update :D

Not sure, but to my understanding it is an NVIDIA issue, so...
By the way, I noticed newer drivers were released today (555.99) and, after reading this, hoped they would fix it:

https://www.nvidia.com/en-us/geforce/forums/game-ready-drivers/13/543951/geforce-grd-55599-feedback-thread-released-6424/
Fixed General Bugs:
CUDA 12.5 does not work with CUDA enabled Docker images [4668302]

I just installed, rebooted, and tested with that newer driver using both my own Dockerfile (which uses pytorch:2.3.0-cuda12.1-cudnn8-runtime and then installs Ollama via the install.sh script) and the latest official Ollama Docker image. Both still only use the CPU. I reverted back to 551.44 and the GPU was immediately used again. So apparently that 'general fix' does not yet apply to WSL2 with Docker Desktop, but maybe only to Docker CE.

I'm not sure if there is another open issue on this here, but I guess it is an NVIDIA issue, so we probably just have to watch the forums/threads there and wait until they release a fix in newer drivers.

@falmanna

falmanna commented Jun 6, 2024

Is there a separate issue for that (this one got closed)? I would like to keep my eye on it, then I would know when to update :D

I subscribed to this one

@nerdpudding Are you using Docker Desktop? nvidia-smi worked for me too, but not the rest.

NVIDIA/nvidia-container-toolkit#520

@nerdpudding

Docker has released an update for Docker Desktop.

See https://docs.docker.com/desktop/release-notes/
Upgrades:
--> NVIDIA Container Toolkit v1.15.0

I just tested it, and the GPU is now used with Nvidia drivers 555.99 after upgrading Docker Desktop to 4.31.0.

This fixed it for me!

So if you are using Docker on Windows with WSL 2 (now not only Docker CE, but also Docker Desktop), it will work again after updating.
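A quick verification sketch after updating Docker Desktop, assuming the container is named ollama as in the docker run command earlier in this thread:

# GPU visible to containers again?
docker run --rm --gpus all ubuntu nvidia-smi

# Is Ollama actually offloading to the GPU?
docker exec -it ollama ollama ps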
