
Add support for CUDA 5.0 cards #2116
Merged (2 commits) Jan 27, 2024
Conversation

@dhiltgen (Collaborator) commented Jan 20, 2024

Building on #2112, this expands support back to compute capability 5.0 cards, and also adds a few newer targets which should, in theory, improve performance on more modern cards. The resulting binary grows slightly in size, but not significantly.

Fixes #1865

I'll keep this as a draft until we can run more performance testing on modern cards to ensure there's no significant regression.

@@ -39,6 +39,9 @@ init_vars() {
*)
;;
esac
if [ -z "${CMAKE_CUDA_ARCHITECTURES}" ] ; then
CMAKE_CUDA_ARCHITECTURES="50;52;61;70;75;80"
@dhiltgen (Collaborator Author) commented:
Note: we may want to make this list "smart" about the cuda version detected, as there's a sliding window of EOL architectures. (e.g., v12 drops 3.5 and 3.7 support)
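A minimal sketch of what such a "smart" default might look like, assuming `nvcc` is on the PATH and that the version-to-architecture mapping below is accurate (both are assumptions for illustration, not part of this PR):

```shell
#!/bin/sh
# Hypothetical sketch: derive the default CMAKE_CUDA_ARCHITECTURES list
# from the detected CUDA toolkit major version, since newer toolkits
# drop older architectures (e.g. CUDA 12 removes 3.5/3.7 support).
cuda_major=$(nvcc --version 2>/dev/null | sed -n 's/.*release \([0-9][0-9]*\)\..*/\1/p')

if [ -z "${CMAKE_CUDA_ARCHITECTURES}" ]; then
    if [ "${cuda_major:-12}" -ge 12 ]; then
        # CUDA 12+: the 3.x targets are gone, so start at 5.0
        CMAKE_CUDA_ARCHITECTURES="50;52;61;70;75;80"
    else
        # Older toolkits still accept the 3.x targets
        CMAKE_CUDA_ARCHITECTURES="35;37;50;52;61;70;75;80"
    fi
fi
echo "CMAKE_CUDA_ARCHITECTURES=${CMAKE_CUDA_ARCHITECTURES}"
```

When `nvcc` is absent the version check comes up empty and the sketch falls back to the newer (narrower) list, which matches the hardcoded default in the diff above.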

@dhiltgen mentioned this pull request Jan 22, 2024
@dhiltgen (Collaborator Author) commented Jan 23, 2024

Comparing before/after on an NVIDIA GeForce GTX 1650 with Max-Q Design (compute capability 7.5) system, I'm seeing an ~8% performance hit. CC 6.x cards seem to perform roughly the same as before. Of course, 5.x systems are much faster now on GPU vs. CPU.

Comparing on an NVIDIA L4 (compute capability 8.9), I see a ~7% performance hit.

We might want to create a new llm library variant and toggle which one we load based on the CC of the detected card.
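As a rough illustration of that idea (the variant names and the CC 7.0 cutoff here are made up for the sketch, not ollama's actual build targets):

```shell
#!/bin/sh
# Hypothetical sketch: choose which llm library variant to load based on
# the card's detected compute capability. Names are illustrative only.
select_llm_variant() {
    cc_major="${1%%.*}"          # "5.2" -> "5", "8.9" -> "8"
    if [ "${cc_major}" -ge 7 ]; then
        echo "cuda_modern"       # kernels tuned for CC 7.0+
    else
        echo "cuda_legacy"       # kernels built for CC 5.0-6.x
    fi
}

select_llm_variant "5.2"   # prints cuda_legacy
select_llm_variant "8.9"   # prints cuda_modern
```

The trade-off is a larger download (two library builds instead of one) in exchange for letting each card run kernels compiled for its own architecture family.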

@dhiltgen (Collaborator Author) commented:

I think my prior perf tests may have spanned llama.cpp version bumps, or there was some other anomaly. Comparing 0.1.22 against this change rebased on main shows almost no impact, aside from unlocking the older GPUs.

--- 0.1.22 vs 0.1.22-6-gb5d1bdb ---
node1/orca-mini.tps -0.35% == NVIDIA GeForce GTX 1080, compute capability 6.1, VMM: yes
Daniels-Mini/orca-mini.tps -0.06% == CPU has AVX
anton/orca-mini.tps -0.34% == Radeon RX 7900 XTX, compute capability 11.0, VMM: no
burton/orca-mini.tps 245.49% == NVIDIA GeForce GTX 980, compute capability 5.2, VMM: yes
daniel-laptop/orca-mini.tps 1.84% == NVIDIA GeForce GTX 1650 with Max-Q Design, compute capability 7.5, VMM: yes
orac/orca-mini.tps 1.15% == NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
dhiltgen-mbp/orca-mini.tps 0.12% == Apple M3 Max

@dhiltgen marked this pull request as ready for review January 27, 2024 15:11
@dhiltgen merged commit e02ecfb into ollama:main Jan 27, 2024
10 checks passed
@dhiltgen deleted the cc_50_80 branch January 27, 2024 18:28
@CompositeCoding commented:

Thank you for this! I built from main and my GeForce GTX 960 is alive and kicking:

2024/01/27 14:56:57 gpu.go:146: INFO CUDA Compute Capability detected: 5.2

@ansemz commented Jan 30, 2024

I upgraded the Docker image to 0.1.22, but my CC 5.2 GPU is still not working.

[root@localhost ~]# docker exec ollama ollama --version
ollama version is 0.1.22
[root@localhost ~]# docker logs ollama 2>&1 |grep gpu
2024/01/30 06:34:22 gpu.go:94: INFO Detecting GPU type
2024/01/30 06:34:22 gpu.go:236: INFO Searching for GPU management library libnvidia-ml.so
2024/01/30 06:34:22 gpu.go:282: INFO Discovered GPU libraries: [/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.535.54.03]
2024/01/30 06:34:23 gpu.go:99: INFO Nvidia GPU detected
2024/01/30 06:34:23 gpu.go:143: INFO CUDA GPU is too old. Falling back to CPU mode. Compute Capability detected: 5.2
2024/01/30 06:37:11 gpu.go:143: INFO CUDA GPU is too old. Falling back to CPU mode. Compute Capability detected: 5.2
2024/01/30 06:37:11 gpu.go:143: INFO CUDA GPU is too old. Falling back to CPU mode. Compute Capability detected: 5.2
2024/01/30 07:17:14 gpu.go:143: INFO CUDA GPU is too old. Falling back to CPU mode. Compute Capability detected: 5.2
2024/01/30 07:17:14 gpu.go:143: INFO CUDA GPU is too old. Falling back to CPU mode. Compute Capability detected: 5.2
2024/01/30 07:26:48 gpu.go:143: INFO CUDA GPU is too old. Falling back to CPU mode. Compute Capability detected: 5.2
2024/01/30 07:26:48 gpu.go:143: INFO CUDA GPU is too old. Falling back to CPU mode. Compute Capability detected: 5.2

@moppman commented Jan 30, 2024

> I upgraded the Docker image to 0.1.22, but my CC 5.2 GPU is still not working.

This PR is not in 0.1.22. If you can't wait for 0.1.23, you'll need to build from main yourself.

@iplayfast commented:

Many of us are waiting impatiently! :)

Development

Successfully merging this pull request may close these issues.

Add GPU support for CUDA Compute Capability 5.0 and 5.2 cards
6 participants