Add support for CUDA 5.0 cards #2116
Conversation
@@ -39,6 +39,9 @@ init_vars() {
        *)
            ;;
    esac
    if [ -z "${CMAKE_CUDA_ARCHITECTURES}" ] ; then
        CMAKE_CUDA_ARCHITECTURES="50;52;61;70;75;80"
Note: we may want to make this list "smart" about the CUDA version detected, as there's a sliding window of EOL architectures (e.g., CUDA 12 drops compute 3.5 and 3.7 support).
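A version-aware default could be sketched roughly like this (a sketch only: the `nvcc` parsing and the exact cutoff values are assumptions, not part of this PR):

```shell
# Sketch only: derive a default CMAKE_CUDA_ARCHITECTURES from the detected
# CUDA toolkit major version. Cutoff values here are illustrative.
cuda_major=$(nvcc --version 2>/dev/null | sed -n 's/.*release \([0-9]*\)\..*/\1/p')
cuda_major=${cuda_major:-0}
if [ -z "${CMAKE_CUDA_ARCHITECTURES}" ]; then
    if [ "${cuda_major}" -ge 12 ]; then
        # CUDA 12 dropped compute 3.5/3.7, so start the list at 5.0
        CMAKE_CUDA_ARCHITECTURES="50;52;61;70;75;80"
    else
        # older toolkits can still emit code for 3.5/3.7
        CMAKE_CUDA_ARCHITECTURES="35;37;50;52;61;70;75;80"
    fi
fi
echo "selected architectures: ${CMAKE_CUDA_ARCHITECTURES}"
```

Either branch keeps the 5.0+ targets from this PR; only the older architectures slide out of the window as the toolkit advances.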
We might want to create a new llm library variant and toggle which one we load based on the CC of the card we detect.
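The variant idea could look roughly like this at load time (purely illustrative: the variant names `cuda_cc50`/`cuda_cc60` and the 6.0 cutoff are invented for this sketch):

```shell
# Illustration only: pick an llm library build variant from the detected
# compute capability major version. Names and cutoff are hypothetical.
select_llm_variant() {
    cc_major=$1
    if [ "${cc_major}" -ge 6 ]; then
        echo "cuda_cc60"   # variant compiled with the newer arch targets
    else
        echo "cuda_cc50"   # variant covering CC 5.x cards like the GTX 960
    fi
}

select_llm_variant 5   # a CC 5.x card would map to the cuda_cc50 variant
```

This keeps the newer builds free to use modern arch targets while older cards load a variant compiled for their compute capability.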
I think my prior perf tests may have been across llama.cpp version bumps, or there was some other anomaly. Comparing 0.1.22 vs. this change rebased on main shows almost no impact except for unlocking older GPUs.
Thank you for this! I built from main and my GeForce GTX 960 is alive and kicking:
2024/01/27 14:56:57 gpu.go:146: INFO CUDA Compute Capability detected: 5.2
I upgraded the Docker image to 0.1.22, but my CC 5.2 GPU is still not working.
[root@localhost ~]# docker exec ollama ollama --version
ollama version is 0.1.22
[root@localhost ~]# docker logs ollama 2>&1 |grep gpu
2024/01/30 06:34:22 gpu.go:94: INFO Detecting GPU type
2024/01/30 06:34:22 gpu.go:236: INFO Searching for GPU management library libnvidia-ml.so
2024/01/30 06:34:22 gpu.go:282: INFO Discovered GPU libraries: [/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.535.54.03]
2024/01/30 06:34:23 gpu.go:99: INFO Nvidia GPU detected
2024/01/30 06:34:23 gpu.go:143: INFO CUDA GPU is too old. Falling back to CPU mode. Compute Capability detected: 5.2
2024/01/30 06:37:11 gpu.go:143: INFO CUDA GPU is too old. Falling back to CPU mode. Compute Capability detected: 5.2
2024/01/30 06:37:11 gpu.go:143: INFO CUDA GPU is too old. Falling back to CPU mode. Compute Capability detected: 5.2
2024/01/30 07:17:14 gpu.go:143: INFO CUDA GPU is too old. Falling back to CPU mode. Compute Capability detected: 5.2
2024/01/30 07:17:14 gpu.go:143: INFO CUDA GPU is too old. Falling back to CPU mode. Compute Capability detected: 5.2
2024/01/30 07:26:48 gpu.go:143: INFO CUDA GPU is too old. Falling back to CPU mode. Compute Capability detected: 5.2
2024/01/30 07:26:48 gpu.go:143: INFO CUDA GPU is too old. Falling back to CPU mode. Compute Capability detected: 5.2
This PR is not in 0.1.22. If you can't wait for 0.1.23, you need to build from main yourself.
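For anyone building ahead of the release, the general shape of a from-source build is below (commands assumed from the repository's standard Go workflow; check the project's development docs for the current prerequisites and exact steps):

```shell
# Assumed build-from-main recipe; consult the repo's development docs
# for prerequisites (Go toolchain, cmake, CUDA toolkit).
git clone https://github.com/ollama/ollama.git
cd ollama
go generate ./...   # compiles the native llm libraries, including CUDA
go build .
./ollama --version
```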
Many of us are impatiently waiting! :)
Building on #2112, this expands support back to 5.0 cards, and also adds a few newer targets which should theoretically help performance on more modern cards. The resulting binary grows a little in size, but not significantly.
Fixes #1865
I'll keep this as a draft until we can run more performance testing on modern cards to ensure there's no significant regression.