
Add support for CUDA 5.0 cards #2116
Merged (2 commits) Jan 27, 2024
Conversation

@dhiltgen (Collaborator) commented Jan 20, 2024

Building on #2112, this expands support back to compute capability 5.0 cards, and also adds a few newer targets which should, in theory, improve performance on more modern cards. The resulting binary grows slightly in size, but not significantly.

Fixes #1865

I'll keep this as a draft until we can run more performance testing on modern cards to ensure there's no significant regression.

@@ -39,6 +39,9 @@ init_vars() {
*)
;;
esac
if [ -z "${CMAKE_CUDA_ARCHITECTURES}" ] ; then
CMAKE_CUDA_ARCHITECTURES="50;52;61;70;75;80"
@dhiltgen (Collaborator Author) commented:
Note: we may want to make this list "smart" about the cuda version detected, as there's a sliding window of EOL architectures. (e.g., v12 drops 3.5 and 3.7 support)
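A minimal sketch of what such a "smart" default might look like, assuming `nvcc` is on the PATH and that the version-to-architecture mapping below is accurate (both are assumptions for illustration, not part of this PR):

```shell
#!/bin/sh
# Hypothetical sketch: derive the default CMAKE_CUDA_ARCHITECTURES list
# from the detected CUDA toolkit major version, since newer toolkits
# drop older architectures (e.g. CUDA 12 removes 3.5/3.7 support).
cuda_major=$(nvcc --version 2>/dev/null | sed -n 's/.*release \([0-9][0-9]*\)\..*/\1/p')

if [ -z "${CMAKE_CUDA_ARCHITECTURES}" ]; then
    if [ "${cuda_major:-12}" -ge 12 ]; then
        # CUDA 12+: the 3.x targets are gone, so start at 5.0
        CMAKE_CUDA_ARCHITECTURES="50;52;61;70;75;80"
    else
        # Older toolkits still accept the 3.x targets
        CMAKE_CUDA_ARCHITECTURES="35;37;50;52;61;70;75;80"
    fi
fi
echo "CMAKE_CUDA_ARCHITECTURES=${CMAKE_CUDA_ARCHITECTURES}"
```

When `nvcc` is absent the version check comes up empty and the sketch falls back to the newer (narrower) list, which matches the hardcoded default in the diff above.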

@dhiltgen mentioned this pull request Jan 22, 2024
@dhiltgen (Collaborator Author) commented Jan 23, 2024

Comparing before/after on an NVIDIA GeForce GTX 1650 with Max-Q Design (compute capability 7.5) system, I'm seeing an ~8% performance hit. CC 6.x cards seem to perform roughly the same as before. Of course, 5.x systems are much faster now on GPU vs. CPU.

Comparing on an NVIDIA L4 (compute capability 8.9), I see a ~7% performance hit.

We might want to create a new llm library variant and toggle which one we load based on the CC of the detected card.
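As a rough illustration of that idea (the variant names and the CC 7.0 cutoff here are made up for the sketch, not ollama's actual build targets):

```shell
#!/bin/sh
# Hypothetical sketch: choose which llm library variant to load based on
# the card's detected compute capability. Names are illustrative only.
select_llm_variant() {
    cc_major="${1%%.*}"          # "5.2" -> "5", "8.9" -> "8"
    if [ "${cc_major}" -ge 7 ]; then
        echo "cuda_modern"       # kernels tuned for CC 7.0+
    else
        echo "cuda_legacy"       # kernels built for CC 5.0-6.x
    fi
}

select_llm_variant "5.2"   # prints cuda_legacy
select_llm_variant "8.9"   # prints cuda_modern
```

The trade-off is a larger download (two library builds instead of one) in exchange for letting each card run kernels compiled for its own architecture family.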

@dhiltgen (Collaborator Author) commented:

I think my prior perf tests may have spanned llama.cpp version bumps, or there was some other anomaly. Comparing 0.1.22 against this change rebased on main shows almost no impact, aside from unlocking the older GPUs.

--- 0.1.22 vs 0.1.22-6-gb5d1bdb ---
node1/orca-mini.tps -0.35% == NVIDIA GeForce GTX 1080, compute capability 6.1, VMM: yes
Daniels-Mini/orca-mini.tps -0.06% == CPU has AVX
anton/orca-mini.tps -0.34% == Radeon RX 7900 XTX, compute capability 11.0, VMM: no
burton/orca-mini.tps 245.49% == NVIDIA GeForce GTX 980, compute capability 5.2, VMM: yes
daniel-laptop/orca-mini.tps 1.84% == NVIDIA GeForce GTX 1650 with Max-Q Design, compute capability 7.5, VMM: yes
orac/orca-mini.tps 1.15% == NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
dhiltgen-mbp/orca-mini.tps 0.12% == Apple M3 Max

@dhiltgen marked this pull request as ready for review January 27, 2024 15:11
@dhiltgen merged commit e02ecfb into ollama:main Jan 27, 2024
10 checks passed
@dhiltgen deleted the cc_50_80 branch January 27, 2024 18:28
@CompositeCoding commented:

Thank you for this! I built from main and my GeForce GTX 960 is alive and kicking:

2024/01/27 14:56:57 gpu.go:146: INFO CUDA Compute Capability detected: 5.2

@ansemz commented Jan 30, 2024

I upgraded the Docker image to 0.1.22, but my CC 5.2 GPU is still not working.

[root@localhost ~]# docker exec ollama ollama --version
ollama version is 0.1.22
[root@localhost ~]# docker logs ollama 2>&1 |grep gpu
2024/01/30 06:34:22 gpu.go:94: INFO Detecting GPU type
2024/01/30 06:34:22 gpu.go:236: INFO Searching for GPU management library libnvidia-ml.so
2024/01/30 06:34:22 gpu.go:282: INFO Discovered GPU libraries: [/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.535.54.03]
2024/01/30 06:34:23 gpu.go:99: INFO Nvidia GPU detected
2024/01/30 06:34:23 gpu.go:143: INFO CUDA GPU is too old. Falling back to CPU mode. Compute Capability detected: 5.2
2024/01/30 06:37:11 gpu.go:143: INFO CUDA GPU is too old. Falling back to CPU mode. Compute Capability detected: 5.2
2024/01/30 06:37:11 gpu.go:143: INFO CUDA GPU is too old. Falling back to CPU mode. Compute Capability detected: 5.2
2024/01/30 07:17:14 gpu.go:143: INFO CUDA GPU is too old. Falling back to CPU mode. Compute Capability detected: 5.2
2024/01/30 07:17:14 gpu.go:143: INFO CUDA GPU is too old. Falling back to CPU mode. Compute Capability detected: 5.2
2024/01/30 07:26:48 gpu.go:143: INFO CUDA GPU is too old. Falling back to CPU mode. Compute Capability detected: 5.2
2024/01/30 07:26:48 gpu.go:143: INFO CUDA GPU is too old. Falling back to CPU mode. Compute Capability detected: 5.2

@moppman commented Jan 30, 2024

> I upgraded the Docker image to 0.1.22, but my CC 5.2 GPU is still not working.

This PR is not in 0.1.22. If you can't wait for 0.1.23, you'll need to build from main yourself.

@iplayfast commented:

Many of us are waiting impatiently! :)

Development

Successfully merging this pull request may close these issues.

Add GPU support for CUDA Compute Capability 5.0 and 5.2 cards
6 participants