ROCm support #814
Conversation
Example build instructions (for Arch; other distributions may have different paths for the CLBlast CMake includes and the ROCm install directory): environment setup first, then the go generate and go build steps (sketched below).
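As a hedged sketch only, based on the build requirements described later in this thread (rocm build tag for both go generate and go build, ROCM_PATH, CLBlast_DIR); exact paths and flags may differ on your system:

```sh
# Hedged sketch of the ROCm build steps described in this PR; paths are
# Arch-style examples and may differ on other distributions.
export ROCM_PATH=/opt/rocm                 # ROCm SDK install directory
export CLBlast_DIR=/usr/lib/cmake/CLBlast  # CLBlast CMake config dir, used for GGML/OpenCL
go generate -tags rocm ./...               # hipifies and compiles the GPU kernels
go build -tags rocm .                      # builds the ollama binary with ROCm enabled
```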
GGUF (uses ROCm for acceleration, RX6950XT) mistral-7b-q8:
GGML (legacy, uses CLBlast for acceleration, RX6950XT) llama-7b-q2k:
|
Hi @65a, sorry for not jumping in on the other PR sooner. Will take a look at this, and thank you so much for taking all of the time to give adding ROCm a go! |
@jmorganca no worries, I'm using it locally so I have to keep going regardless :) |
just wanted to say, your repo built flawlessly and is working great on my 6700XT, thank you! |
actually, doesn't seem to work with Mistral 7B, guessing it's because it's using a different backend or something in ollama? (as in, slow, no GPU activity, and it's not making any of the usual noises) |
@TheScreechingBagel you can see above I tested with Mistral-7b. Likely you are falling back to the CPU; there will be an error in your logs, but perhaps we can continue the conversation in #738 |
Rebased on HEAD and incorporated changes, testing again. W7900 is still out of commission and going around in RMA world, but I have a 7900XTX to test with now. |
ROCm: 7900XTX GGUF (Mistral 7b q8):
OpenCL: 7900XTX GGML (Llama-7b q2k):
Seems to work, and ready for review @jmorganca |
We should be able to test this now. I ordered a Radeon 7900 XTX and it just came in, but I still have to pull a machine apart and get it working. Thanks for your patience! |
@pdevine sounds good! I can try syncing to head and rebuilding to make sure things are still in a good state. |
Seems like it's working still (7900XTX, Mistral-7b quantized to q8):
|
Tested this on my Vega 56 on Linux, with llama2 (7b, 13b) and mistral, works! Thanks a lot. How do I run the benchmark you did? I'm curious about how my old card stacks up. |
Hi, I'm currently trying to run this with my 6700XT. Is there a way to specify the parallel build level? Setting it from the environment doesn't seem to help: also it looks like it's building a lot of ggml CUDA code. Can we turn that off somehow?
|
@K1ngjulien The "cuda" code is actually hipified for ROCm, and it's compiled for several targets (hence slow). I'll leave more parallelism for the next PR, if it's possible, though if it's bottlenecked on compiling the "cuda" (actually ROCm) kernels, it might help to just override AMDGPU_TARGETS and friends to only your card and trade portability for compile time locally. |
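If compile time is the bottleneck, one hedged option is to build only for your own GPU; the variable names below are the usual CMake/HIP knobs and may not match this branch exactly:

```sh
# Build only for the 6700 XT's architecture (gfx1031) instead of every
# supported target; this trades portability for a much shorter compile.
export AMDGPU_TARGETS=gfx1031
export GPU_TARGETS=gfx1031
go generate -tags rocm ./...
go build -tags rocm .
```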
Rebased on HEAD and made sure nVidia behavior matches it by copying new changes in CheckVRAM and generators |
@lu4p the benchmark is in the logs for wherever you are running |
My card is a lot slower than yours, by a factor of 10 or so.
For reference, my CPU (Ryzen 5 3600) is around 30% slower.
Is this a problem? (output from ollama serve)
|
I'd prefer to provide support in #738, if that's all right. I would need the full log from |
Fixed typo in generate_linux_rocm.go |
Finally managed a successful test on 6700S again:
|
hmm, looks like it's detecting the GPU correctly, but then something goes wrong and it falls back to the CPU:
any ideas? |
GOT IT! From this comment. Before, on a Ryzen 9 5900X CPU:
now with
so we went from 25t/s to 240t/s running codellama. I'd call that a win 🎉 |
Minor code cleanup to improve the way runners are accumulated (though I'm not sure CPU fallback is ever good UX... out of scope for this change). It would be great to have someone test the |
Sync to HEAD, tested 6700S (dGPU) again on a clean checkout with a mistral-7b q5k quant:
|
Tested again with 7900XTX (a mistral-7b, not quantized/f16):
retested OpenCL on the same card (ggml acceleration for older models, llama-7b q2k):
|
Finally managed to compile it. The compiler looked for the OpenCL lib in a fixed path |
Thanks @65a. Tried the commands below:
With Command
|
@65a I have produced a fix to switch the GPU to be used as primary in a multi-GPU environment. PR #1192 |
…Linux. The build tags rocm or cuda must be specified to both go generate and go build. ROCm builds should have ROCM_PATH set (and the ROCm SDK present), as well as CLBlast installed (for GGML) and CLBlast_DIR set in the environment to the CLBlast CMake directory (likely /usr/lib/cmake/CLBlast). Build tags are also used to switch VRAM detection between the cuda and rocm implementations, using added "accelerator_foo.go" files which contain architecture-specific functions and variables. accelerator_none is used when no tags are set, and a helper function addRunner will ignore it if it is the chosen accelerator. Fix go generate commands; thanks @deadmeu for testing.
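As an illustration of the build-tag pattern described above, here is a minimal, hypothetical sketch in Go; the file and identifier names are illustrative and are not the PR's actual code:

```go
// accelerator_rocm.go -- hypothetical sketch, not the PR's actual code.
//go:build rocm

package llm

// With the "rocm" build tag set, this file supplies the ROCm flavour of the
// accelerator hooks; real ROCm-specific VRAM detection would live here.
const acceleratorName = "rocm"

func acceleratorVRAM() (int64, error) {
	// ROCm-specific VRAM detection would go here.
	return 0, nil
}
```

```go
// accelerator_none.go -- hypothetical sketch.
//go:build !rocm && !cuda

package llm

// With no accelerator tag set, a sentinel value lets a helper such as
// addRunner skip GPU runners and fall back to the CPU-only path.
const acceleratorName = "none"

func acceleratorVRAM() (int64, error) { return 0, nil }
```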
Hi there! It's been a couple of weeks since you mentioned ordering your Radeon 7900 XTX. I'm curious, have you had the chance to test it out yet? I'm particularly interested in knowing how it's performing in terms of stability and speed, especially compared to an NVIDIA card. Looking forward to hearing about your experience! |
@tuhochi Still hoping to get this sorted soon, although it's still probably a few weeks out unfortunately. Feel free to ping me on the discord if you want more details. |
Thank you all for the efforts. I seem to have built it successfully and ollama is running with ROCm 5.6. My system is Ubuntu 22.04.3 on a Ryzen 5800H with Vega 8 graphics (16 GB out of 64 GB of RAM assigned) and HSA_OVERRIDE_GFX_VERSION=9.0.0. ROCm 5.6 works fine with PyTorch 2.1, verified by running the Stable Diffusion web UI. Now, the output from the ollama runner:
When I run mistral, the model gets loaded into VRAM and the GPU is working on inference:
However, the output of the model is garbage:
I have tried running on the CPU and everything works fine. Just wondering what is going wrong? Thanks! |
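For reference, a hedged example of how that override is usually applied when launching the server; 9.0.0 matches the Vega/gfx900 family mentioned above, and other cards need different values:

```sh
# Run the server with the ROCm GFX-version override, as described in the
# comment above for a Vega 8 iGPU; adjust the value for your GPU family.
HSA_OVERRIDE_GFX_VERSION=9.0.0 ollama serve
```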
Note, I will close and delete my branch when #1146 merges. |
@ml2s if you haven't sorted it out by now, that's usually a prompting/sampling problem, but there is an issue upstream right now (at llama.cpp) which looks like that. I haven't encountered it on ROCm, but it may be hardware or environment specific. |
It would be awesome to have official support; ROCm is for sure harder to set up than CUDA, but I have the GPU I have. |
Hi! |
Fixed merge conflicts on https://github.com/65a/ollama/pull/1 |
This change is getting carried in #1146 which is just about to go in. |
Nice! But one tiny thing I had to add was an additional Environment entry in the service file:
But other than that it worked out of the box. |
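The exact variable that comment needed isn't quoted above, but as a hypothetical illustration, an extra Environment entry can be added to the packaged service via a systemd drop-in:

```sh
# Hypothetical illustration only; the variable shown is an example, not
# necessarily the one the comment above refers to.
sudo systemctl edit ollama.service
# In the drop-in that opens, add:
#   [Service]
#   Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"
sudo systemctl restart ollama
```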
For the uninitiated like myself, does this mean ollama will support AMD graphics cards like the 7900 XTX going forward? |
Yes, it should be working in |
I tried building from source, but got this error (Arch Linux, RX 6700XT):
|
I do not have that in my output of ollama serve: there is no mention of it. But again, I use this branch here rebased onto main (23dc179). I don't know what happened on main since then, as I haven't checked it out yet.
After some trial and error (and much appreciated help from the Discord), I got it working on my 6700XT! https://discord.com/channels/1128867683291627614/1188401254284669008/1188411154725351485 TL;DR
|
Can someone please tell me how to enable the AMD support? I've installed ollama and it still shows "WARNING: No NVIDIA GPU detected. Ollama will run in CPU-only mode". |
The pre-release for 0.1.21 is up now, and we've made various improvements to support ROCm cards, covering both v5 and v6 of the ROCm libraries. You'll have to install ROCm, but then the Ollama binary should work. Please let us know if you run into any problems by filing new tickets. |
Thanks @dhiltgen. Looks like the problem is that ROCm does not support Debian 12. |
I have switched to Arch hoping that I could use ollama with AMD support. Posted my issue here: Can anyone help me sort this out? |
#667 got closed during a bad rebase attempt. This should be just about the minimum I can come up with to use build tags to switch between ROCm and CUDA, as well as docs for how to build it. The existing dockerfiles are updated so they do not break.
Please let me know @jmorganca @mxyng @BruceMacD if you'd like this done with a different approach or something, or if you don't want to do this. Closes #738. Will post test results for GGML and GGUF files.