Integrated GPU support #2637
ROCm's support for integrated GPUs is not that good. This issue largely depends on AMD's progress in improving ROCm. |
OK, but I would like to have an option to enable it, just to check if it works. |
Their |
I've seen this behavior in #2411, but only with the version from ollama.com. |
Yes, latest release fixed this behavior. |
I had a permission issue with lxc/docker. Now:
So as the topic says, please add integrated GPU support (AMD 5800U here) |
Latest (0.1.27) docker image with ROCm works for me on Ryzen 5600G with 8GB VRAM allocation. Prompt processing is 2x faster than with CPU. Generation runs at max speed even if CPU is busy running other processes. I am on Fedora 39. Container setup:
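A minimal sketch of such a container setup, assuming the standard ROCm device nodes and the ollama/ollama:rocm image; the exact flags on the original system may have differed:

```sh
# Run the ROCm build of Ollama, passing through the AMD GPU device nodes.
docker run -d \
  --device /dev/kfd --device /dev/dri \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama:rocm
```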
However, it's still shaky:
|
See also discussion in the #738 epic. |
Why does it work for you?
Also, the non-docker version doesn't work...
@dhiltgen please have a look |
And by the way there is no /sys/module/amdgpu/version. You have to correct the code. |
Ollama skipped the iGPU, because it has less than 1GB of VRAM. You have to configure VRAM allocation for the iGPU in BIOS to something like 8GB. |
Thanks, I will check if I can do that. |
Why do you think so? Where is it documented? Mine maxes at 512MB unless I explicitly configure it in BIOS. |
Detecting and using this VRAM information without sharing with the user the reason for the iGPU rejection leads to "missing support" issues being opened, rather than "increase my VRAM allocation" steps taken. I think the log output should be improved in this case. This task would probably qualify for a "good first issue" tag, too. |
Totally agree! |
I have 2 systems. export HSA_OVERRIDE_GFX_VERSION=9.0.0
building with:
My 6750 XT system works perfectly. |
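For reference, a sketch of how that override is typically set when starting Ollama from source; the mapping for the 5800U (gfx90c reported as gfx900) is an assumption, not something stated above:

```sh
# Make ROCm treat the unsupported iGPU ISA as a supported one (value is an example).
export HSA_OVERRIDE_GFX_VERSION=9.0.0
./ollama serve
```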
OK, I was wrong. It works now with 8GB VRAM, thank you!
|
Hmm, I see the model loaded into VRAM, but nothing happens...
|
Do I need a different amdgpu module on the host than the one from the kernel (6.7.6)? |
Maybe, ROCm/ROCm#816 seems relevant. I'm just using AMD-provided DKMS modules from https://repo.radeon.com/amdgpu/6.0.2/ubuntu to be sure. |
Hmm, the tinyllama model does work with the 5800U. The bigger ones get stuck as I mentioned before. |
I added "-DLLAMA_HIP_UMA=ON" to "ollama/llm/generate/gen_linux.sh".
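Roughly this kind of edit, sketched here; the CMAKE_DEFS variable name is an assumption about how gen_linux.sh assembled its build flags at the time:

```sh
# In ollama/llm/generate/gen_linux.sh, inside the ROCm build section (hypothetical):
CMAKE_DEFS="${CMAKE_DEFS} -DLLAMA_HIP_UMA=ON"
```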
Now it's stuck here:
|
iGPUs do indeed allocate system RAM on demand; it's called GTT/GART. Here's what I get when I check the GTT size. If I set VRAM to Auto in BIOS:
If I set VRAM to 8GB in BIOS:
If I set VRAM to 16GB in BIOS:
It looks like GTT size is 0.5*(RAM-VRAM). I wonder how far this can go if you have 64GB or 96GB RAM. Can you have an iGPU with 32GB or 48GB of GTT memory? That would make a $200 APU with $200 of DDR5 RAM superior to a $2,000 dGPU for running Mixtral and future sparse models. I also wonder whether any BIOS offers a 32GB VRAM setting if you have 64GB of RAM. Unfortunately, ROCm does not use GTT. That thread mentions several workarounds (torch-apu-helper, force-host-alloction-APU, Rusticl, unlock VRAM allocation), but I am not sure whether Ollama would be able to use any of them. Chances are highest in a docker container, where Ollama has the greatest control over dependencies. |
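If you want to check those numbers yourself, the amdgpu driver exposes them via sysfs; a quick sketch (the card index may differ per system):

```sh
# Report VRAM and GTT sizes in MiB for the first GPU.
for f in mem_info_vram_total mem_info_gtt_total; do
  printf '%s: %s MiB\n' "$f" \
    "$(( $(cat /sys/class/drm/card0/device/$f) / 1024 / 1024 ))"
done
```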
Very cool findings. Interesting that you mention 96GB. I did some research and it seems that's the max we can buy right now for SO-DIMMs. I wasn't aware it's called GTT. Let's hope someday we get support for this. https://github.com/segurac/force-host-alloction-APU looks like the best solution to me if it works. Will try it in my docker containers...
This is how much I would get :-) (64GB system) |
OK, it doesn't work with ollama. I wasn't aware that it doesn't use PyTorch, right? |
llama.cpp supports it. That's what I was trying to do in my previous post: Support AMD Ryzen Unified Memory Architecture (UMA) |
@chiragkrishna Do you mean this? ggerganov/llama.cpp#4449 Since llama.cpp already supports UMA (GTT/GART), Ollama could perhaps include a llama.cpp build with UMA enabled and use it when the conditions are right (AMD iGPU with VRAM smaller than the model). PS: UMA support seems a bit unstable, so perhaps enable it with an environment variable at first. |
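For anyone who wants to test the upstream feature directly, a sketch of a llama.cpp build with UMA enabled; the option names match llama.cpp of that era, while paths and build layout are assumptions:

```sh
# Build llama.cpp with the HIP backend and UMA (GTT) allocations enabled.
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build -DLLAMA_HIPBLAS=ON -DLLAMA_HIP_UMA=ON
cmake --build build --config Release -j
```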
How does the env thing work? Like this? (Doesn't do anything btw) |
@DocMAX I don't think there's UMA support in ollama yet. It's a compile-time option in llama.cpp. The other env variables (HSA_OVERRIDE_GFX_VERSION was sufficient in my experiments) are correctly passed down to ROCm. |
I'm using docker image ollama/ollama:rocm |
What I suggested is a change to ollama, OLLAMA_VRAM_OVERRIDE; this is not part of ollama today... |
OK, then I have to wait for the docker version, because I want to stay on docker. |
Curious question - why do you use libforcegttalloc.so with ollama? Isn't it only intended for use with applications that require PyTorch? Without LD_PRELOAD everything should work exactly the same. |
Well, the reason is what you see when you compile ollama/llama.cpp (even with the UMA flag). With this trick you can now load much bigger models and "steal" less memory from your system for your GPU. For example, I loaded llama3:70b-instruct-q4_K_M, which is about 40GB, and I still get 0.8 tps, which is fairly OK for the power of our iGPU... |
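For context, using that shim with Ollama boils down to a preload; a sketch, assuming the library was built from the force-host-alloction-APU repo (the path is a placeholder):

```sh
# Preload the shim so HIP allocations fall back to host (GTT) memory.
LD_PRELOAD=/path/to/libforcegttalloc.so ./ollama serve
```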
Interesting. I have an AMD 5600G APU with UMA_AUTO set in UEFI/BIOS (which means 512 MB is taken from my RAM for VRAM). On my Ubuntu 22.04 the libforcegttalloc.so is required only for Stable Diffusion apps like Fooocus. Running ollama with or without LD_PRELOAD makes no difference in my case. VRAM is kept at 512 MB, models are loaded to RAM, and the compute is done on GPU. Have you tried running ollama without LD_PRELOAD? |
Interesting. For me it crashes if I try to load a model bigger than the allocated VRAM. I wonder if it's an issue because in Fedora 40 the default ROCm is 6.0. |
Hey, I also have a 5600G and I wanted to make use of it. I read the whole thread, but I'm confused about which steps I should take to change the build in order to make this work with this iGPU. Is there a pre-compiled version already that everyone can use? I tried to follow @qkiel's steps, but this fails miserably when I try to compile and build using go... |
When you download the source code and compile it with the commands below, do you still get an error?
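The commands in question were roughly the standard source build; a sketch, assuming Go, CMake, and ROCm are already installed (not necessarily identical to the tutorial's exact steps):

```sh
git clone https://github.com/ollama/ollama.git
cd ollama
go generate ./...   # builds the llama.cpp runners, including the ROCm one
go build .          # produces the ./ollama binary
```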
|
I updated my instructions a bit, see if it works this time. If not, I can send you my binary. |
I followed everything again and made sure about the versions of the requirements. This time I managed to pass the generate step, but it seems that I have a problem with the GOROOT path when I try to run the build command: Could it be because this is installed in a custom folder? EDIT: Meanwhile I tried to point both variables to different paths, but now I have an error on GOPROXY. @qkiel, did you also experience this? What are your paths for each of the variables GOROOT, GOPATH and GOPROXY? |
This warning doesn't matter, just run ollama:
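Presumably something along these lines, using the binary built above:

```sh
./ollama serve
```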
Then in a second terminal window:
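For example (the model name here is just a placeholder):

```sh
./ollama run tinyllama
```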
|
I did that and I believe I'm still running it using the CPU. Is there a way to confirm that I'm running this using the GPU? |
Look at GPU utilization. I use nvtop for that (also available as a snap):
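For example, via the snap mentioned above:

```sh
sudo snap install nvtop
nvtop
```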
|
Installed it, but I think I messed up the ROCm installation. Do I need to do some extra step besides this?
If it helps, I'm using Windows with WSL. |
Unfortunately, there is no equivalent of that on WSL. Secondly, you install ROCm differently on Windows. I don't think it can be done on Windows the same way as on Linux. Edit:
Besides that, you can try this to install ROCm:
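Probably something like the standard Ubuntu route sketched below; the version and package URL are assumptions, and whether it works under WSL at all is exactly the open question here:

```sh
# Fetch AMD's installer package and install the ROCm userspace without the DKMS module.
wget https://repo.radeon.com/amdgpu-install/6.0.2/ubuntu/jammy/amdgpu-install_6.0.60002-1_all.deb
sudo apt install ./amdgpu-install_6.0.60002-1_all.deb
sudo amdgpu-install --usecase=rocm --no-dkms
```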
But chances of success are very slim. |
Yeah, I tried it and got the same problem mentioned on this thread: ROCm/ROCm#3051 And what about a Virtual Machine running linux, @qkiel? Do you think that could work or am I stretching too much here? |
I have no idea. If you have a regular GPU, then you can pass the iGPU through to the VM and that could work. I don't think the 5600G supports SR-IOV, so you can't partition the iGPU and pass through only part of it. |
You can try radeontop; it works fine on AMD iGPUs, and the -c flag adds colorized output. |
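For example, on a Debian/Ubuntu-based system:

```sh
sudo apt install radeontop
radeontop -c   # -c adds colorized output
```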
@qkiel thx for this tip 🙏
My current setup:
What is crazy now is that if I install docker in this LXC and run |
@smellouk I have tutorials on how I install ROCm and Ollama in Incus containers (a fork of LXD):
Do you do this similarly or somehow differently? |
@qkiel I used that article and just noticed you are its author 😆. That AI tutorial, "ROCm and PyTorch on AMD APU or GPU", led me here. I followed everything and have the same issue 😢 |
When you run this command, what do you see?
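Presumably a permissions check along these lines (the exact command may have differed):

```sh
ls -l /dev/kfd /dev/dri/
id   # show which groups the current user belongs to
```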
Do /dev/kfd and the /dev/dri devices belong to the expected groups?
If they belong to root only, that could be the problem.
Or maybe your user inside the container doesn't belong to those groups.
|
@qkiel permissions are correct as expected |
@dhiltgen perhaps you want to consider adding this patch to ollama? (I don't have an NVIDIA computer to test this with and do the same for CUDA, or whatever Intel has/will have), but I know it works well for AMD. |
Opening a new issue (see #2195) to track support for integrated GPUs. I have an AMD 5800U CPU with integrated graphics. As far as I've researched, ROCR lately supports integrated graphics too.
Currently Ollama seems to ignore iGPUs in general.