Compute Capability Misidentification with PhysX cudart library #4008

Closed
aaronjrod opened this issue Apr 28, 2024 · 15 comments · Fixed by #4135 or #4067
Labels: bug (Something isn't working), nvidia (Issues relating to Nvidia GPUs and CUDA)

aaronjrod commented Apr 28, 2024

What is the issue?

The Ollama server incorrectly identifies the Compute Capability of my GPU (it detects 1.0 instead of 5.2). This appears to be due to a recent change in gpu/gpu.go. Thanks!

Previously: CUDART CUDA Compute Capability detected: 5.2
Now: CUDA GPU is too old. Compute Capability detected: 1.0

OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

0.1.33-rc5

Workaround

Remove c:\Program Files (x86)\NVIDIA Corporation\PhysX\Common\ from your PATH environment variable so Ollama does not pick up this CUDA runtime library.
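
After editing PATH, a quick way to confirm which cudart copies are still reachable is to glob each PATH entry. A minimal Go sketch (my own illustration, not part of Ollama; the file name is arbitrary):

// listcudart.go - illustrative sketch: print every cudart64_*.dll reachable
// through the current PATH, so you can confirm the PhysX copy is gone after
// editing the environment variable.
package main

import (
    "fmt"
    "os"
    "path/filepath"
    "strings"
)

func main() {
    for _, dir := range strings.Split(os.Getenv("PATH"), string(os.PathListSeparator)) {
        matches, _ := filepath.Glob(filepath.Join(dir, "cudart64_*.dll"))
        for _, m := range matches {
            fmt.Println(m)
        }
    }
}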

aaronjrod added the bug label on Apr 28, 2024
dhiltgen self-assigned this on Apr 28, 2024
dhiltgen (Collaborator)

I'll try to find it by code inspection, but could you share a server log with OLLAMA_DEBUG=1 set?

dhiltgen added the nvidia label on Apr 28, 2024
aaronjrod (Author)

PS C:\Users\Aaron> $env:OLLAMA_DEBUG="1"
PS C:\Users\Aaron> ollama serve
time=2024-04-28T16:56:46.238-04:00 level=INFO source=images.go:821 msg="total blobs: 5"
time=2024-04-28T16:56:46.240-04:00 level=INFO source=images.go:828 msg="total unused blobs removed: 0"
time=2024-04-28T16:56:46.241-04:00 level=INFO source=routes.go:1074 msg="Listening on 127.0.0.1:11434 (version 0.1.33-rc5)"
time=2024-04-28T16:56:46.241-04:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=C:\Users\Aaron\AppData\Local\Programs\Ollama\ollama_runners\cpu
time=2024-04-28T16:56:46.241-04:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=C:\Users\Aaron\AppData\Local\Programs\Ollama\ollama_runners\cpu_avx
time=2024-04-28T16:56:46.241-04:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=C:\Users\Aaron\AppData\Local\Programs\Ollama\ollama_runners\cpu_avx2
time=2024-04-28T16:56:46.241-04:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=C:\Users\Aaron\AppData\Local\Programs\Ollama\ollama_runners\cuda_v11.3
time=2024-04-28T16:56:46.241-04:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=C:\Users\Aaron\AppData\Local\Programs\Ollama\ollama_runners\rocm_v5.7
time=2024-04-28T16:56:46.241-04:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu_avx cpu_avx2 cuda_v11.3 rocm_v5.7 cpu]"
time=2024-04-28T16:56:46.242-04:00 level=DEBUG source=payload.go:45 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
time=2024-04-28T16:56:46.242-04:00 level=DEBUG source=sched.go:105 msg="starting llm scheduler"
time=2024-04-28T16:56:46.242-04:00 level=INFO source=gpu.go:96 msg="Detecting GPUs"
time=2024-04-28T16:56:46.243-04:00 level=DEBUG source=gpu.go:203 msg="Searching for GPU library" name=cudart64_*.dll
time=2024-04-28T16:56:46.243-04:00 level=DEBUG source=gpu.go:221 msg="gpu library search" globs="[C:\\Users\\Aaron\\AppData\\Local\\Programs\\Ollama\\cudart64_*.dll c:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v*\\bin\\cudart64_*.dll C:\\Python38\\Scripts\\cudart64_*.dll* C:\\Python38\\cudart64_*.dll* C:\\Program Files (x86)\\NVIDIA Corporation\\PhysX\\Common\\cudart64_*.dll* C:\\Program Files (x86)\\Intel\\iCLS Client\\cudart64_*.dll* C:\\Program Files\\Intel\\iCLS Client\\cudart64_*.dll* C:\\WINDOWS\\system32\\cudart64_*.dll* C:\\WINDOWS\\cudart64_*.dll* C:\\WINDOWS\\System32\\Wbem\\cudart64_*.dll* C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\cudart64_*.dll* C:\\Program Files (x86)\\Intel\\Intel(R) Management Engine Components\\DAL\\cudart64_*.dll* C:\\Program Files\\Intel\\Intel(R) Management Engine Components\\DAL\\cudart64_*.dll* C:\\Program Files (x86)\\Intel\\Intel(R) Management Engine Components\\IPT\\cudart64_*.dll* C:\\Program Files\\Intel\\Intel(R) Management Engine Components\\IPT\\cudart64_*.dll* C:\\Program Files\\TortoiseGit\\bin\\cudart64_*.dll* C:\\Program Files\\nodejs\\cudart64_*.dll* C:\\UnxTools\\cudart64_*.dll* C:\\WINDOWS\\System32\\OpenSSH\\cudart64_*.dll* C:\\Program Files\\dotnet\\cudart64_*.dll* C:\\Program Files\\Microsoft SQL Server\\130\\Tools\\Binn\\cudart64_*.dll* C:\\Program Files\\Microsoft SQL Server\\Client SDK\\ODBC\\170\\Tools\\Binn\\cudart64_*.dll* C:\\Program Files (x86)\\Yarn\\bin\\cudart64_*.dll* C:\\ProgramData\\chocolatey\\bin\\cudart64_*.dll* c:\\k\\cudart64_*.dll* C:\\tools\\java\\jdk1.8.0_221\\bin\\cudart64_*.dll* C:\\Program Files\\Git\\cmd\\cudart64_*.dll* C:\\Program Files\\NVIDIA Corporation\\Nsight Compute 2024.1.1\\cudart64_*.dll* C:\\Users\\Aaron\\AppData\\Local\\Microsoft\\WindowsApps\\cudart64_*.dll* C:\\Users\\Aaron\\AppData\\Local\\atom\\bin\\cudart64_*.dll* C:\\Users\\Aaron\\AppData\\Local\\Programs\\Git\\cmd\\cudart64_*.dll* C:\\Users\\Aaron\\AppData\\Roaming\\npm\\cudart64_*.dll* C:\\Users\\Aaron\\AppData\\Local\\Microsoft\\WindowsApps\\cudart64_*.dll* C:\\Users\\Aaron\\cudart64_*.dll* C:\\Users\\Aaron\\AppData\\Local\\Programs\\Microsoft VS Code\\bin\\cudart64_*.dll* C:\\Users\\Aaron\\AppData\\Local\\Programs\\Ollama\\cudart64_*.dll*]"
time=2024-04-28T16:56:46.262-04:00 level=DEBUG source=gpu.go:249 msg="discovered GPU libraries" paths="[C:\\Users\\Aaron\\AppData\\Local\\Programs\\Ollama\\cudart64_110.dll C:\\Program Files (x86)\\NVIDIA Corporation\\PhysX\\Common\\cudart64_60.dll]"
cudaSetDevice err: 3
time=2024-04-28T16:56:46.462-04:00 level=DEBUG source=gpu.go:261 msg="Unable to load cudart" library=C:\Users\Aaron\AppData\Local\Programs\Ollama\cudart64_110.dll error="cudart init failure: 3"
CUDA driver version: 9-1
time=2024-04-28T16:56:46.469-04:00 level=INFO source=gpu.go:101 msg="detected GPUs" library="C:\\Program Files (x86)\\NVIDIA Corporation\\PhysX\\Common\\cudart64_60.dll" count=1
time=2024-04-28T16:56:46.470-04:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
[GPU-00000000-0100-0000-00c0-000000000000] CUDA totalMem 0
[GPU-00000000-0100-0000-00c0-000000000000] CUDA freeMem 3540602060
[GPU-00000000-0100-0000-00c0-000000000000] Compute Capability 1.0
time=2024-04-28T16:56:46.552-04:00 level=INFO source=gpu.go:148 msg="[0] CUDA GPU is too old. Compute Capability detected: 1.0"
time=2024-04-28T16:56:46.554-04:00 level=DEBUG source=amd_windows.go:32 msg="unable to load amdhip64.dll: The specified module could not be found."

dhiltgen (Collaborator) commented Apr 30, 2024

Yikes, yeah, those responses are definitely incorrect. I have a suspicion about what's going wrong and should be able to get a fix in before we finalize 0.1.33.

dhiltgen (Collaborator) commented Apr 30, 2024

One data point that may help: can you search your system for other instances of cudart64_*.dll, try putting those directories early in your PATH, and see if that changes the behavior?

In addition, can you share the output of nvidia-smi so I can see your driver version, and a bit more about your GPU?

dhiltgen (Collaborator) commented Apr 30, 2024

One more data point. On my test system running Win 11 Pro, I have driver 546.12 and CUDA v12.3 installed (as well as v11). Our bundled v11 cudart64_110.dll works with my driver and GPU, but if I force Ollama to use C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common\cudart64_65.dll I get a bogus Compute Capability 1.0 as well. Looking at the other fields in the response from cudaGetDeviceProperties, they seem consistently wrong. Switching to other cudart libraries on my system, I see correct results. I don't yet understand why the PhysX cudart library isn't working, but I think if you find another CUDA runtime library on your host and add its directory to the PATH before the PhysX directory, it should start working.
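
As an illustration of how clearly wrong those values are, a plausibility check over the handful of fields involved could look like the sketch below. This is plain Go and purely illustrative; the struct stands in for the relevant cudaGetDeviceProperties output and is not Ollama's gpu.go code.

// Illustrative only: deviceProps mirrors a few cudaGetDeviceProperties
// fields; the thresholds are assumptions, not Ollama code.
package main

import "fmt"

type deviceProps struct {
    Major    int    // compute capability major
    Minor    int    // compute capability minor
    TotalMem uint64 // total device memory reported, in bytes
}

// looksBogus flags answers like the PhysX library's "CC 1.0, totalMem 0".
func looksBogus(p deviceProps) bool {
    return p.Major < 2 || p.TotalMem == 0 // no currently supported GPU reports CC 1.x
}

func main() {
    physx := deviceProps{Major: 1, Minor: 0, TotalMem: 0}        // values from the log above
    gtx970 := deviceProps{Major: 5, Minor: 2, TotalMem: 4 << 30} // what the GTX 970 should report
    fmt.Println(looksBogus(physx), looksBogus(gtx970))           // true false
}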

aaronjrod (Author) commented Apr 30, 2024

No longer able to replicate the Compute Capability issue; updating CUDA and restarting a couple of times may have done it 😅😅

By bundled, are you referring to the .dll stored in Programs/Ollama?

time=2024-04-30T18:43:51.616-04:00 level=INFO source=images.go:821 msg="total blobs: 5"
time=2024-04-30T18:43:51.622-04:00 level=INFO source=images.go:828 msg="total unused blobs removed: 0"
time=2024-04-30T18:43:51.622-04:00 level=INFO source=routes.go:1074 msg="Listening on 127.0.0.1:11434 (version 0.1.33-rc5)"
time=2024-04-30T18:43:51.622-04:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu_avx cpu_avx2 cuda_v11.3 rocm_v5.7 cpu]"
time=2024-04-30T18:43:51.622-04:00 level=INFO source=gpu.go:96 msg="Detecting GPUs"
time=2024-04-30T18:43:51.665-04:00 level=INFO source=gpu.go:101 msg="detected GPUs" library=C:\Users\Aaron\AppData\Local\Programs\Ollama\cudart64_110.dll count=1
time=2024-04-30T18:43:51.674-04:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
[GIN] 2024/04/30 - 18:44:18 | 200 |            0s |       127.0.0.1 | HEAD     "/"
[GIN] 2024/04/30 - 18:44:18 | 200 |      2.7838ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2024/04/30 - 18:44:18 | 200 |      2.5923ms |       127.0.0.1 | POST     "/api/show"
time=2024-04-30T18:44:18.674-04:00 level=INFO source=gpu.go:96 msg="Detecting GPUs"
time=2024-04-30T18:44:18.690-04:00 level=INFO source=gpu.go:101 msg="detected GPUs" library=C:\Users\Aaron\AppData\Local\Programs\Ollama\cudart64_110.dll count=1
time=2024-04-30T18:44:18.690-04:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-04-30T18:44:22.678-04:00 level=INFO source=memory.go:147 msg="offload to gpu" layers.real=-1 layers.estimate=17 memory.available="3381.9 MiB" memory.required.full="5033.0 MiB" memory.required.partial="3260.0 MiB" memory.required.kv="256.0 MiB" memory.weights.total="4156.0 MiB" memory.weights.repeating="3745.0 MiB" memory.weights.nonrepeating="411.0 MiB" memory.graph.full="164.0 MiB" memory.graph.partial="677.5 MiB"
time=2024-04-30T18:44:22.678-04:00 level=INFO source=memory.go:147 msg="offload to gpu" layers.real=-1 layers.estimate=17 memory.available="3381.9 MiB" memory.required.full="5033.0 MiB" memory.required.partial="3260.0 MiB" memory.required.kv="256.0 MiB" memory.weights.total="4156.0 MiB" memory.weights.repeating="3745.0 MiB" memory.weights.nonrepeating="411.0 MiB" memory.graph.full="164.0 MiB" memory.graph.partial="677.5 MiB"
time=2024-04-30T18:44:22.682-04:00 level=INFO source=memory.go:147 msg="offload to gpu" layers.real=-1 layers.estimate=17 memory.available="3381.9 MiB" memory.required.full="5033.0 MiB" memory.required.partial="3260.0 MiB" memory.required.kv="256.0 MiB" memory.weights.total="4156.0 MiB" memory.weights.repeating="3745.0 MiB" memory.weights.nonrepeating="411.0 MiB" memory.graph.full="164.0 MiB" memory.graph.partial="677.5 MiB"
time=2024-04-30T18:44:22.682-04:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-04-30T18:44:22.704-04:00 level=INFO source=server.go:290 msg="starting llama server" cmd="C:\\Users\\Aaron\\AppData\\Local\\Programs\\Ollama\\ollama_runners\\cuda_v11.3\\ollama_llama_server.exe --model C:\\Users\\Aaron\\.ollama\\models\\blobs\\sha256-00e1317cbf74d901080d7100f57580ba8dd8de57203072dc6f668324ba545f29 --ctx-size 2048 --batch-size 512 --embedding --log-disable --n-gpu-layers 17 --parallel 1 --port 51645"
time=2024-04-30T18:44:22.711-04:00 level=INFO source=sched.go:327 msg="loaded runners" count=1
time=2024-04-30T18:44:22.711-04:00 level=INFO source=server.go:439 msg="waiting for llama runner to start responding"
{"function":"server_params_parse","level":"INFO","line":2603,"msg":"logging to file is disabled.","tid":"15128","timestamp":1714517062}
{"build":2737,"commit":"46e12c4","function":"wmain","level":"INFO","line":2820,"msg":"build info","tid":"15128","timestamp":1714517062}
{"function":"wmain","level":"INFO","line":2827,"msg":"system info","n_threads":4,"n_threads_batch":-1,"system_info":"AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 | LAMMAFILE = 1 | ","tid":"15128","timestamp":1714517062,"total_threads":4}
llama_model_loader: loaded meta data with 21 key-value pairs and 291 tensors from C:\Users\Aaron\.ollama\models\blobs\sha256-00e1317cbf74d901080d7100f57580ba8dd8de57203072dc6f668324ba545f29 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = Meta-Llama-3-8B-Instruct
llama_model_loader: - kv   2:                          llama.block_count u32              = 32
llama_model_loader: - kv   3:                       llama.context_length u32              = 8192
...
llm_load_print_meta: model params     = 8.03 B
llm_load_print_meta: model size       = 4.33 GiB (4.64 BPW)
llm_load_print_meta: general.name     = Meta-Llama-3-8B-Instruct
llm_load_print_meta: BOS token        = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token        = 128001 '<|end_of_text|>'
llm_load_print_meta: LF token         = 128 'Ä'
llm_load_print_meta: EOT token        = 128009 '<|eot_id|>'
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:   no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce GTX 970, compute capability 5.2, VMM: yes
llm_load_tensors: ggml ctx size =    0.30 MiB
llm_load_tensors: offloading 17 repeating layers to GPU
llm_load_tensors: offloaded 17/33 layers to GPU
llm_load_tensors:        CPU buffer size =  4437.80 MiB
llm_load_tensors:      CUDA0 buffer size =  1989.53 MiB
.......................................................................................
llama_new_context_with_model: n_ctx      = 2048
llama_new_context_with_model: n_batch    = 512
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: freq_base  = 500000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:  CUDA_Host KV buffer size =   120.00 MiB
llama_kv_cache_init:      CUDA0 KV buffer size =   136.00 MiB
llama_new_context_with_model: KV self size  =  256.00 MiB, K (f16):  128.00 MiB, V (f16):  128.00 MiB
llama_new_context_with_model:  CUDA_Host  output buffer size =     0.50 MiB
llama_new_context_with_model:      CUDA0 compute buffer size =   677.48 MiB
llama_new_context_with_model:  CUDA_Host compute buffer size =    12.01 MiB
llama_new_context_with_model: graph nodes  = 1030
llama_new_context_with_model: graph splits = 169
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 512.15       Driver Version: 512.15       CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ... WDDM  | 00000000:01:00.0 Off |                  N/A |
| 36%   54C    P8    15W / 151W |   3175MiB /  4096MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     12592      C   ...3\ollama_llama_server.exe    N/A      |
+-----------------------------------------------------------------------------+

aaronjrod (Author) commented Apr 30, 2024

However, it seems that while the model is loaded into VRAM, all the compute is done on the CPU. I'm looking into solutions; any suggestions? I see that not all layers are sent to the GPU. I tried phi3 as a smaller model (in case llama3 was not being fully loaded), but it is definitely being run on the CPU. Is shared memory with the integrated GPU (the other 20 GB) not sufficient as VRAM?

Phi3:

ggml_cuda_init: GGML_CUDA_FORCE_MMQ:   no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce GTX 970, compute capability 5.2, VMM: yes
llm_load_tensors: ggml ctx size =    0.30 MiB
llm_load_tensors: offloading 30 repeating layers to GPU
llm_load_tensors: offloaded 30/33 layers to GPU
llm_load_tensors:        CPU buffer size =  2210.78 MiB
llm_load_tensors:      CUDA0 buffer size =  1942.31 MiB
.................................................................................................
llama_new_context_with_model: n_ctx      = 2048
llama_new_context_with_model: n_batch    = 512
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:  CUDA_Host KV buffer size =    48.00 MiB
llama_kv_cache_init:      CUDA0 KV buffer size =   720.00 MiB
llama_new_context_with_model: KV self size  =  768.00 MiB, K (f16):  384.00 MiB, V (f16):  384.00 MiB
llama_new_context_with_model:  CUDA_Host  output buffer size =     0.13 MiB
llama_new_context_with_model:      CUDA0 compute buffer size =   185.06 MiB
llama_new_context_with_model:  CUDA_Host compute buffer size =    16.01 MiB

CPU being maxed out, GPU doing no compute:

image

aaronjrod (Author) commented May 1, 2024

OK, I assume the issue is the same as #3201. However, I'm not sure why phi3 will not fit all of its layers in VRAM (only 30 of 33 layers), given that it is a 2.3 GB model and I have 4 GB of VRAM. Task Manager also shows dedicated GPU memory at 3.0/4.0 GB. Any suggestions on how to fit the full model, or to get Ollama to use a little more VRAM?

I understand this question is unrelated to the original thread (apologies for that); thanks for the help!

aaronjrod (Author) commented May 1, 2024

-28T16:56:46.469-04:00 level=INFO source=gpu.go:101 msg="detected GPUs" library="C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common\cuda

Ran into the Compute Capability issue again this morning (detected as 1.0) and saw that Ollama was reading the cudart from the PhysX Common directory. Then I checked nvidia-smi and saw that the CUDA version was a major version lower than it was last night (??? thanks, Windows).

Updated the GeForce drivers; the CUDA version is now 12.4. I also moved Ollama further up in PATH. That seems to have done the trick: the GPU is in use and the cudart being used is Ollama's. I get 16 tokens per second on phi3, with 30/33 layers allocated in VRAM (3.0/4.0 GB in use). Still lots of CPU usage, so I guess it is what it is?

dhiltgen (Collaborator) commented May 1, 2024

Happy to hear you got it running on GPU. I'm still trying to get to the bottom of why that PhysX cudart library behaves strangely. I'm sort of wondering if it's exposing some sort of "virtual" GPU.

We include a copy of cudart v11 in the distribution to make it easier for users to get started without having to install the CUDA libraries on their host. Some combination of factors causes that bundled version not to work for some users, which we're still trying to get to the bottom of.

As to the layers question - we're continuing to refine our prediction algorithm to maximize VRAM usage without hitting OOM crashes. Model architecture, context size and other factors can influence the actual VRAM usage at runtime compared to the on-disk size of the model.
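
As a rough illustration of that accounting (not the exact memory.go formula), the numbers from the llama3 log earlier in this thread land on the same 17-layer estimate:

// Rough sketch, not Ollama's actual memory.go logic: reserve the KV cache,
// compute graph, and non-repeating weights, then see how many repeating
// layers fit in what is left. Numbers come from the log above.
package main

import "fmt"

func main() {
    availMiB := 3381.9           // memory.available
    kvMiB := 256.0               // memory.required.kv
    graphMiB := 677.5            // memory.graph.partial
    nonRepeatingMiB := 411.0     // memory.weights.nonrepeating
    perLayerMiB := 3745.0 / 32.0 // memory.weights.repeating spread over 32 layers

    layers := int((availMiB - kvMiB - graphMiB - nonRepeatingMiB) / perLayerMiB)
    fmt.Println("estimated GPU layers:", layers) // 17, matching layers.estimate in the log
}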

I'd like to keep this issue tracking the unexplained PhysX cudart behavior leading to misidentification as CC 1.0.

dhiltgen changed the title from "Compute Capability Misidentification" to "Compute Capability Misidentification with PhysX cudart library" on May 1, 2024
@Eisaichen

I found that 0.1.33 loads significantly more library files from the system, including nvcuda.dll, nvcuda64.dll, and nvapi64.dll, while 0.1.32 did not need to.
Maybe 0.1.32 is using the bundled runtime as you mentioned, but 0.1.33 isn't?

0.1.32 DLLs
0	ollama.exe	0x560000	0x3365000	C:\Users\local\Desktop\Ollama_32\ollama.exe				
1	ntdll.dll	0x7FFB52A30000	0x216000	C:\WINDOWS\SYSTEM32\ntdll.dll	NT Layer DLL	10.0.22621.3374	Microsoft Corporation	
2	KERNEL32.DLL	0x7FFB52200000	0xC4000	C:\WINDOWS\System32\KERNEL32.DLL	Windows NT BASE API Client DLL	10.0.22621.3374	Microsoft Corporation	
3	KERNELBASE.dll	0x7FFB4FEF0000	0x3A7000	C:\WINDOWS\System32\KERNELBASE.dll	Windows NT BASE API Client DLL	10.0.22621.3447	Microsoft Corporation	
4	msvcrt.dll	0x7FFB51CB0000	0xA7000	C:\WINDOWS\System32\msvcrt.dll	Windows NT CRT DLL	7.0.22621.2506	Microsoft Corporation	
5	bcryptprimitives.dll	0x7FFB4FE70000	0x79000	C:\WINDOWS\System32\bcryptprimitives.dll	Windows Cryptographic Primitives Library	10.0.22621.3374	Microsoft Corporation	
6	winmm.dll	0x7FFB42D40000	0x34000	C:\WINDOWS\SYSTEM32\winmm.dll	MCI API DLL	10.0.22621.2506	Microsoft Corporation	
7	ucrtbase.dll	0x7FFB4FD50000	0x111000	C:\WINDOWS\System32\ucrtbase.dll	Microsoft® C Runtime Library	10.0.22621.3374	Microsoft Corporation	
8	ws2_32.dll	0x7FFB527F0000	0x71000	C:\WINDOWS\System32\ws2_32.dll	Windows Socket 2.0 32-Bit DLL	10.0.22621.1	Microsoft Corporation	
9	RPCRT4.dll	0x7FFB528D0000	0x115000	C:\WINDOWS\System32\RPCRT4.dll	Remote Procedure Call Runtime	10.0.22621.3447	Microsoft Corporation	
10	powrprof.dll	0x7FFB4EBB0000	0x4D000	C:\WINDOWS\SYSTEM32\powrprof.dll	Power Profile Helper DLL	10.0.22621.3374	Microsoft Corporation	
11	UMPDC.dll	0x7FFB4EB90000	0x13000	C:\WINDOWS\SYSTEM32\UMPDC.dll	User Mode Power Dependency Coordinator	10.0.22621.1	Microsoft Corporation	
12	mswsock.dll	0x7FFB4F2A0000	0x69000	C:\WINDOWS\system32\mswsock.dll	Microsoft Windows Sockets 2.0 Service Provider	10.0.22621.2506	Microsoft Corporation	
0.1.33 DLLs
0	ollama.exe	0x530000	0x19BE000	C:\Users\local\Desktop\ollama_33\ollama.exe				
1	ntdll.dll	0x7FFB52A30000	0x216000	C:\WINDOWS\SYSTEM32\ntdll.dll	NT Layer DLL	10.0.22621.3374	Microsoft Corporation	
2	KERNEL32.DLL	0x7FFB52200000	0xC4000	C:\WINDOWS\System32\KERNEL32.DLL	Windows NT BASE API Client DLL	10.0.22621.3374	Microsoft Corporation	
3	KERNELBASE.dll	0x7FFB4FEF0000	0x3A7000	C:\WINDOWS\System32\KERNELBASE.dll	Windows NT BASE API Client DLL	10.0.22621.3447	Microsoft Corporation	
4	msvcrt.dll	0x7FFB51CB0000	0xA7000	C:\WINDOWS\System32\msvcrt.dll	Windows NT CRT DLL	7.0.22621.2506	Microsoft Corporation	
5	bcryptprimitives.dll	0x7FFB4FE70000	0x79000	C:\WINDOWS\System32\bcryptprimitives.dll	Windows Cryptographic Primitives Library	10.0.22621.3374	Microsoft Corporation	
6	winmm.dll	0x7FFB42D40000	0x34000	C:\WINDOWS\SYSTEM32\winmm.dll	MCI API DLL	10.0.22621.2506	Microsoft Corporation	
7	ucrtbase.dll	0x7FFB4FD50000	0x111000	C:\WINDOWS\System32\ucrtbase.dll	Microsoft® C Runtime Library	10.0.22621.3374	Microsoft Corporation	
8	ws2_32.dll	0x7FFB527F0000	0x71000	C:\WINDOWS\System32\ws2_32.dll	Windows Socket 2.0 32-Bit DLL	10.0.22621.1	Microsoft Corporation	
9	RPCRT4.dll	0x7FFB528D0000	0x115000	C:\WINDOWS\System32\RPCRT4.dll	Remote Procedure Call Runtime	10.0.22621.3447	Microsoft Corporation	
10	powrprof.dll	0x7FFB4EBB0000	0x4D000	C:\WINDOWS\SYSTEM32\powrprof.dll	Power Profile Helper DLL	10.0.22621.3374	Microsoft Corporation	
11	UMPDC.dll	0x7FFB4EB90000	0x13000	C:\WINDOWS\SYSTEM32\UMPDC.dll	User Mode Power Dependency Coordinator	10.0.22621.1	Microsoft Corporation	
12	mswsock.dll	0x7FFB4F2A0000	0x69000	C:\WINDOWS\system32\mswsock.dll	Microsoft Windows Sockets 2.0 Service Provider	10.0.22621.2506	Microsoft Corporation	
13	nvcuda.dll	0x7FFACC740000	0x38E000	C:\WINDOWS\system32\nvcuda.dll	NVIDIA CUDA Driver, Version 551.86 	31.0.15.5186	NVIDIA Corporation	
14	ADVAPI32.dll	0x7FFB51BE0000	0xB2000	C:\WINDOWS\System32\ADVAPI32.dll	Advanced Windows 32 Base API	10.0.22621.3296	Microsoft Corporation	
15	sechost.dll	0x7FFB52440000	0xA8000	C:\WINDOWS\System32\sechost.dll	Host for SCM/SDDL/LSA Lookup APIs	10.0.22621.3296	Microsoft Corporation	
16	bcrypt.dll	0x7FFB50730000	0x28000	C:\WINDOWS\System32\bcrypt.dll	Windows Cryptographic Primitives Library	10.0.22621.2506	Microsoft Corporation	
17	gdi32.dll	0x7FFB527C0000	0x29000	C:\WINDOWS\System32\gdi32.dll	GDI Client DLL	10.0.22621.3085	Microsoft Corporation	
18	win32u.dll	0x7FFB50410000	0x26000	C:\WINDOWS\System32\win32u.dll	Win32u	10.0.22621.3447	Microsoft Corporation	
19	gdi32full.dll	0x7FFB50610000	0x119000	C:\WINDOWS\System32\gdi32full.dll	GDI Client DLL	10.0.22621.3374	Microsoft Corporation	
20	msvcp_win.dll	0x7FFB50440000	0x9A000	C:\WINDOWS\System32\msvcp_win.dll	Microsoft® C Runtime Library	10.0.22621.3374	Microsoft Corporation	
21	USER32.dll	0x7FFB51E80000	0x1AE000	C:\WINDOWS\System32\USER32.dll	Multi-User Windows USER API Client DLL	10.0.22621.3374	Microsoft Corporation	
22	IMM32.DLL	0x7FFB52770000	0x31000	C:\WINDOWS\System32\IMM32.DLL	Multi-User Windows IMM32 API Client DLL	10.0.22621.3374	Microsoft Corporation	
23	dxcore.dll	0x7FFB4D4A0000	0x39000	C:\WINDOWS\SYSTEM32\dxcore.dll	DXCore	10.0.22621.3374	Microsoft Corporation	
24	nvcuda64.dll	0x7FFA9C570000	0xA1F000	C:\WINDOWS\system32\DriverStore\FileRepository\nv_dispi.inf_amd64_362f239e9bd019fc\nvcuda64.dll	NVIDIA CUDA Driver, Version 551.86 	31.0.15.5186	NVIDIA Corporation	
25	SHLWAPI.dll	0x7FFB510C0000	0x5E000	C:\WINDOWS\System32\SHLWAPI.dll	Shell Light-weight Utility Library	10.0.22621.2506	Microsoft Corporation	
26	VERSION.dll	0x7FFB47740000	0xA000	C:\WINDOWS\SYSTEM32\VERSION.dll	Version Checking and File Installation Libraries	10.0.22621.1	Microsoft Corporation	
27	msasn1.dll	0x7FFB4F640000	0x12000	C:\WINDOWS\SYSTEM32\msasn1.dll	ASN.1 Runtime APIs	10.0.22621.2506	Microsoft Corporation	
28	cryptnet.dll	0x7FFB46D60000	0x32000	C:\WINDOWS\SYSTEM32\cryptnet.dll	Crypto Network Related API	10.0.22621.1	Microsoft Corporation	
29	CRYPT32.dll	0x7FFB502A0000	0x167000	C:\WINDOWS\System32\CRYPT32.dll	Crypto API32	10.0.22621.3447	Microsoft Corporation	
30	drvstore.dll	0x7FFB46B80000	0x158000	C:\WINDOWS\SYSTEM32\drvstore.dll	Driver Store API	10.0.22621.2506	Microsoft Corporation	
31	devobj.dll	0x7FFB4FA10000	0x2C000	C:\WINDOWS\SYSTEM32\devobj.dll	Device Information Set DLL	10.0.22621.2506	Microsoft Corporation	
32	cfgmgr32.dll	0x7FFB4FA40000	0x4E000	C:\WINDOWS\SYSTEM32\cfgmgr32.dll	Configuration Manager DLL	10.0.22621.2506	Microsoft Corporation	
33	wldp.dll	0x7FFB4F530000	0x48000	C:\WINDOWS\SYSTEM32\wldp.dll	Windows Lockdown Policy	10.0.22621.3447	Microsoft Corporation	
34	combase.dll	0x7FFB51320000	0x388000	C:\WINDOWS\System32\combase.dll	Microsoft COM for Windows	10.0.22621.3235	Microsoft Corporation	
35	OLEAUT32.dll	0x7FFB50FE0000	0xD7000	C:\WINDOWS\System32\OLEAUT32.dll	OLEAUT32.DLL	10.0.22621.2506	Microsoft Corporation	
36	cryptbase.dll	0x7FFB4F4A0000	0xC000	C:\WINDOWS\SYSTEM32\cryptbase.dll	Base cryptographic API DLL	10.0.22621.1	Microsoft Corporation	
37	nvapi64.dll	0x7FFB3C910000	0x6C4000	C:\WINDOWS\SYSTEM32\nvapi64.dll	NVIDIA NVAPI Library, Version 551.86 	31.0.15.5186	NVIDIA Corporation	
38	SETUPAPI.dll	0x7FFB516B0000	0x474000	C:\WINDOWS\System32\SETUPAPI.dll	Windows Setup API	10.0.22621.2506	Microsoft Corporation	
39	SHELL32.dll	0x7FFB50780000	0x85C000	C:\WINDOWS\System32\SHELL32.dll	Windows Shell Common Dll	10.0.22621.3374	Microsoft Corporation	
40	ole32.dll	0x7FFB524F0000	0x1A5000	C:\WINDOWS\System32\ole32.dll	Microsoft OLE for Windows	10.0.22621.3374	Microsoft Corporation	
41	kernel.appcore.dll	0x7FFB4EE40000	0x18000	C:\WINDOWS\SYSTEM32\kernel.appcore.dll	AppModel API Host	10.0.22621.2715	Microsoft Corporation	
42	dwmapi.dll	0x7FFB4D430000	0x2B000	C:\WINDOWS\SYSTEM32\dwmapi.dll	Microsoft Desktop Window Manager API	10.0.22621.3085	Microsoft Corporation	
43	WINTRUST.dll	0x7FFB504E0000	0x6B000	C:\WINDOWS\System32\WINTRUST.dll	Microsoft Trust Verification APIs	10.0.22621.3447	Microsoft Corporation	

dhiltgen (Collaborator) commented May 3, 2024

We adjusted the behavior in 0.1.33 to try to use CUDA libraries on the host system if found, in the hope that this would resolve some other issues we've seen where our bundled library doesn't work. We weren't anticipating a cudart library successfully loading and enumerating a GPU while reporting incorrect information about memory and the CC version. I'd definitely like to get this fixed ASAP for the next release; we just need to figure out the best approach.
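
One possible shape for a mitigation, sketched here purely as an assumption (not necessarily how the eventual fix in #4135/#4067 works): validate what each discovered cudart library reports and fall back to the next candidate when the values are clearly bogus.

// Illustrative sketch only; candidate and the hard-coded values below are
// stand-ins for the real discovery and cgo plumbing in gpu/gpu.go.
package main

import (
    "errors"
    "fmt"
)

type candidate struct {
    path     string
    ccMajor  int
    ccMinor  int
    totalMem uint64
}

// firstPlausible walks candidates in discovery order and skips any whose
// reported values are clearly wrong (CC below 2.0 or zero total memory).
func firstPlausible(cands []candidate) (candidate, error) {
    for _, c := range cands {
        if c.ccMajor >= 2 && c.totalMem > 0 {
            return c, nil
        }
    }
    return candidate{}, errors.New("no cudart library returned plausible device info")
}

func main() {
    cands := []candidate{
        {path: `C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common\cudart64_65.dll`, ccMajor: 1, totalMem: 0},
        {path: `C:\Users\Aaron\AppData\Local\Programs\Ollama\cudart64_110.dll`, ccMajor: 5, ccMinor: 2, totalMem: 4 << 30},
    }
    picked, err := firstPlausible(cands)
    fmt.Println(picked.path, err) // the PhysX copy is skipped
}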


Freffles commented May 4, 2024

FWIW, I think I have the same issue.

Edit: Actually, not the same issue but a similar one. The GPU is not being used. I have a graphics card similar to the OP's, a GTX 1050 Ti.

My ollama embeddings take a very long time. The server logs seem to point to Ollama's bundled CUDA DLL (cudart64_110.dll) despite an up-to-date CUDA being installed. I removed Ollama, removed everything NVIDIA, then reinstalled NVIDIA first and Ollama after, but the result is the same.

image

ollama.log

@makeryangcom

I also have the same issue; although I removed the PhysX directory from my PATH, I still can't use the GPU. #3969


cr1cr1 commented May 6, 2024

Managed to make it use the GPU by forcing the Ollama directory (or any other directory that contains cudart64_110.dll) onto the front of PATH.
ollama version: 0.1.33, Windows 11

set PATH=C:\tools\scoop\apps\ollama\current;%PATH%

ollama serve
time=2024-05-06T21:00:09.798+03:00 level=INFO source=images.go:828 msg="total blobs: 7"
time=2024-05-06T21:00:09.799+03:00 level=INFO source=images.go:835 msg="total unused blobs removed: 0"
time=2024-05-06T21:00:09.800+03:00 level=INFO source=routes.go:1071 msg="Listening on 127.0.0.1:11434 (version 0.1.33)"
time=2024-05-06T21:00:09.800+03:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cuda_v11.3 rocm_v5.7 cpu cpu_avx cpu_avx2]"
time=2024-05-06T21:00:09.800+03:00 level=INFO source=gpu.go:96 msg="Detecting GPUs"
time=2024-05-06T21:00:09.833+03:00 level=INFO source=gpu.go:101 msg="detected GPUs" library=C:\tools\scoop\apps\ollama\current\cudart64_110.dll count=1

......

ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3080, compute capability 8.6, VMM: yes
llm_load_tensors: ggml ctx size =    0.30 MiB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors:        CPU buffer size =   281.81 MiB
llm_load_tensors:      CUDA0 buffer size =  4155.99 MiB
nvidia-smi
Mon May  6 20:56:04 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 552.22                 Driver Version: 552.22         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3080      WDDM  |   00000000:01:00.0  On |                  N/A |
| 47%   38C    P8             35W /  350W |    8023MiB /  12288MiB |      7%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
.....
|    0   N/A  N/A     23524      C   ...\cuda_v11.3\ollama_llama_server.exe      N/A      |
+-----------------------------------------------------------------------------------------+
