Llama.cpp not working with intel ARC 770? #7042
If I understand correctly, then CPU works as expected, but the GPU does not. I think more information needs to be provided on how you built llama.cpp.
Yes, exactly. I use cmake, yes. CPU works. With GPU the program waits for something to happen but never ends... just waits. I see something happening with the GPU for like 30s before usage goes to 0 and then nothing happens.
I tried both the Intel packages from the Intel repo and compiling them myself, and I get the same results: the GPU never finishes.
Sounds like maybe the non-free firmware isn't installed.
will check, thanks
You need the output to show level-zero or you won't be able to use the SYCL backend properly. There was a regression with Linux kernel 6.8.x where the compute runtime doesn't do the right thing, which caused that. The freeze also isn't a llama.cpp issue; it's a kernel issue with Intel's drivers and compute runtime. See the two issues below for more details.
thanks @simonlui, that certainly makes my life a bit harder, having installed Tumbleweed mainly to get driver support for the Arc 770. Would you suggest simply going for Ubuntu? I think support there is a bit better for Intel GPUs. cheers
Ubuntu is the only consumer distro officially supported by Intel for Arc consumer GPUs. They also support Red Hat and SUSE Enterprise, but only if you can pay for Arc Datacenter GPUs. See their dGPU install guide for Linux.
The kernel issue is a good point. I downgraded to 6.8.4 and it runs fine there. That's the most recent kernel that works; every newer revision has the regression.

But the fact that @SergioVargasRamirez doesn't see the SYCL device isn't indicative of being on the wrong kernel. That issue manifests itself by the SYCL device being visible, but attempts to use it just hang. The fact that the Arc A770 is missing as a SYCL device hints at something different.

In terms of OS, I find it easier to run my LLM work within a Docker container. That way I can run the official tooling from Intel on Ubuntu 22 without having to worry about changing my host OS. All my host OS needs to worry about is ensuring that the non-free firmware is loaded and that I'm not running a kernel newer than 6.8.4.
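A containerized setup along these lines needs the host's GPU render node passed through to the container. A minimal sketch, assuming Intel's oneAPI Base Kit image from Docker Hub (the image tag and mount paths are illustrative, not from this thread):

```shell
# Illustrative sketch: run an Ubuntu 22.04-based oneAPI container with
# the Intel GPU exposed via /dev/dri; the host only needs the firmware
# and a working i915/compute-runtime stack.
docker run -it --rm \
  --device /dev/dri \
  -v "$PWD/models:/models" \
  intel/oneapi-basekit:latest \
  bash
```

Inside the container you can then install/build llama.cpp or IPEX-LLM without touching the host distro's packages.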
I had a SYCL device with the packages from the openSUSE Intel repo. The kernel in Tumbleweed is 6.8.7, I think; I will need to check at home, but I am sure it is >6.8.4. I will give it a try using Docker. If I understand correctly, you don't need any of the Intel oneAPI etc. stuff on the host, just the firmware from Intel (i915, I guess) and the right kernel. It is a bit silly of Intel to support only Ubuntu, I think. Why not offer support for .deb- and .rpm-based distros? Thanks for all your help, this has helped me a lot.
Guys, you are mixing issues. Showing up as a SYCL device with Level Zero, as in @sevragorgia's case, is a separate matter. Arc by itself will work in any Linux distro that has the right packages repackaged, which all of them do at this point, but only Ubuntu will ever offer the "full experience" with the custom kernel modules and other optional packages. The oneAPI Base Kit and the compute runtime are required at minimum to compile and run llama.cpp, but that can be done either in Docker or on the host; Intel hosts custom .deb, .rpm, etc. repositories for oneAPI.
@SergioVargasRamirez If you want to run llama.cpp on OpenCL, the CLBlast (OpenCL) backend is an option. I have seen some hang issues recently; most solutions come down to downgrading the kernel or driver. If you still hit the issue, please try the latest llama.cpp code and paste the whole running log here. Hope the above info helps you!
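For reference, building the CLBlast backend mentioned above looked roughly like this with the CMake flag llama.cpp used at the time (CLBlast must already be installed; paths are illustrative):

```shell
# Sketch: configure and build llama.cpp with the CLBlast (OpenCL) backend
mkdir -p build && cd build
cmake .. -DLLAMA_CLBLAST=ON
cmake --build . --config Release
```

Note that this backend has since been removed from llama.cpp in favor of SYCL and Vulkan, so it is only relevant for versions from around the time of this thread.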
thanks. I installed Ubuntu 22 now and IPEX-LLM works fine. I will probably try llama.cpp today, but I don't see why it would not work out of the box, because the GPU already works fine when used from IPEX-LLM. Ollama also works.

If the compiled version of llama.cpp works, I will try to do as suggested above: go back to openSUSE (this time Leap 15.6 instead of Tumbleweed) and install llama.cpp and IPEX-LLM in a Docker container running Ubuntu 22, to see if I can get the GPU to work on openSUSE this way. I am not too attached to any distro, but I use openSUSE at work and would like to have the same system at home. I can report back.

For now, it works fine in Ubuntu. But since I need an older version of the Intel packages to run IPEX-LLM, the system keeps asking me to upgrade, which is annoying. Therefore I tend to favor the containerized solution proposed by @jwhitehorn.
Just wanted to report back. I compiled llama.cpp on Ubuntu 22 with the Intel packages from the Intel repo installed via apt. All working fine now; the Intel Arc 770 GPU is used to generate text. I tried different models and some generate weird output, like nonsense, Cyrillic symbols, or German text mixed with English. No idea why. All with the GPU. Like I wrote before, I will try to containerize this to avoid Ubuntu constantly asking to upgrade the Intel packages and breaking IPEX-LLM or llama.cpp. I will also try to run the containerized solution on openSUSE 15.6 after I have tested the containers on Ubuntu. Thanks for all your help!
Thank you for the feedback!
Gonna leave this here just to document the process in case it helps other users. I installed openSUSE Leap 15.6 on my PC. The kernel version is 6.4.0; I understand the kernel pulls GPU support from the backports. The system is pretty clean because I have not installed much besides Docker, git-lfs, intel-gpu-tools, and htop. I get the Arc 770 fully supported out of the box. So, instead of going through the previous path of installing the Intel packages, I did all the LLM stuff via Docker, as recommended. I can confirm that the IPEX-LLM Docker image sees the GPU via SYCL, which is not installed on my PC. I tested the GPU via the test script provided by the image and can confirm that the GPU is used by it. I then cloned llama.cpp.
I used the example call ("create a website in 10 steps"), asking for 512 tokens, and got 57.93 tokens per second. So, I guess that's it from my side. Again, thanks for all the help. I hope this info helps other users. cheers
@SergioVargasRamirez |
changed that, but the model kept outputting 400-600 words |
Hi,
I am trying to get llama.cpp to work on a workstation with one Intel Arc 770 GPU, but whenever I try to use the GPU, llama.cpp does something (I see the GPU being used for computation via intel_gpu_top) for 30 seconds or so and then just hangs there, using 100% CPU (but only one core), as if it were waiting for something to happen...
I am using the following command:
It doesn't matter if I omit the ZES_ENABLE_SYSMAN part.
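The original command block was lost from this transcript. For context, a typical SYCL invocation of llama.cpp's main binary from that era looked roughly like the following; the model path, prompt, and layer count here are illustrative, not the poster's actual values:

```shell
# Illustrative sketch of a llama.cpp SYCL run: enable Level Zero sysman
# and offload all model layers to the GPU with -ngl.
ZES_ENABLE_SYSMAN=1 ./build/bin/main \
  -m ./models/llama-2-7b.Q4_0.gguf \
  -p "Building a website can be done in 10 simple steps:" \
  -n 400 -ngl 33
```

With `-ngl 0` all layers stay on the CPU, which matches the working case the poster describes below.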
Now, the same build does work with `-ngl 0`. There I see the 32 cores being used and the model produces output.

If I run `clinfo`, I get the expected output. `sycl-ls` shows:

[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2024.17.3.0.08_160000]
[opencl:cpu:1] Intel(R) OpenCL, AMD Ryzen 9 5950X 16-Core Processor OpenCL 3.0 (Build 0) [2024.17.3.0.08_160000]
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A770 Graphics OpenCL 3.0 NEO [24.18.0]

`ls-sycl-device` sees three SYCL devices; one is the GPU. Right now I don't see the level-zero device. I had it at some point, but then had no opencl:gpu in exchange, and with the level-zero device I had the same problem: the GPU will activate for 30s and go back to zero activity while ./main stays on guard for hours if I don't cancel.
I am running openSUSE Tumbleweed and installed Intel oneAPI locally using the online installer. I don't see compilation issues. I also compiled NEO and its requirements. All these packages are in my home directory, but that doesn't seem to be the issue, because previously installed Intel packages (via zypper) were available system-wide, with the same results.
I am really lost here because I don't seem to be getting any error. I am sure the GPU crashes, but I don't know why or where to look for this info. So I would really appreciate your help on this. I can test anything you want (this is not a production system or anything).
thanks in advance and best regards,
Sergio