
Intel Arc thread #3761

Closed
oobabooga opened this issue Aug 30, 2023 · 58 comments

@oobabooga
Owner

oobabooga commented Aug 30, 2023

This thread is dedicated to discussing the setup of the webui on Intel Arc GPUs.

You are welcome to ask questions as well as share your experiences, tips, and insights to make the process easier for all Intel Arc users.

@oobabooga oobabooga pinned this issue Aug 30, 2023
@Jacoby1218

OK, so some notes:

  • In my testing, CLBlast is quite slow compared to CUDA or ROCm when used with llama.cpp (I'm not using llama-cpp-python as it simply refuses to use the GPU no matter what I do, despite being built with OpenCL support; with koboldcpp I get ~1.7 t/s on a 13B model).
  • None of the other backends work right now, but maybe they could if IPEX is used (hopefully it's as simple as Intel says it is; it would still require custom versions of each script, though). Given that IPEX is now supported on Windows (haven't tested that yet, but will with SDUI), it may be worth seeing whether that could work.

@simonlui

simonlui commented Sep 3, 2023

Sorry, this is going to be a somewhat long post. I've been looking into this a bit, and unlike other areas of user-facing ML, the LLM community seems to have much more limited options for getting anything Intel working easily at full speed. For example, in the image generation space it's much easier to slot in Intel's Extension for PyTorch (IPEX), because those projects all use PyTorch directly one way or another, and the extension is designed to be fairly easy to insert into a project that already uses PyTorch. In stark contrast, backends in the LLM space do not use PyTorch directly; there's a lot of lower-level programming in C/C++ and custom libraries for model deployment, driven by performance considerations and by the amount of RAM needed to load these models, which until recently was all but unavailable to the average consumer. So without something like PyTorch in the picture, there is no "easy" option to slot in.

That wouldn't really be a problem if there were a lower-level solution. However, the real issue is that Intel is not taking the same path as AMD when it comes to CUDA compatibility. They have been pursuing a different strategy as a hardware company for the last couple of years: they consolidated and unified all their software under something called oneAPI, with the intention of letting you write something once and deploy it everywhere in their ecosystem. That goes from higher-level pieces like Intel's Extension for PyTorch/TensorFlow, to middleware libraries like oneMKL/oneDNN, all the way down to Intel's compilers and runtime.

As a result, Intel does not provide anything like HIP (there is a community project called chipStar trying to take that approach, but it still seems too early; when I tried it, it wasn't ready to even start tackling complex projects). What Intel intends is for people to port their software directly from CUDA to SYCL, a Khronos standard that is basically like OpenCL but with C++, and they provide an automatic tool to port CUDA code over. The intention is that the output of that conversion can then, with very little effort, be modified to use their SYCL extensions in DPC++ and pull in their SYCL-based libraries, and from there target everything Intel, from CPUs to GPUs to FPGAs to custom AI accelerators and so on. SYCL then either compiles down to Level Zero, the actual API that runs on Intel's devices, or, as Codeplay announced last year, it can also compile for AMD ROCm and Nvidia CUDA. As a fallback, it compiles to OpenCL, which everyone supports.

As a result of the above, I would say it would take some serious effort to get Intel GPUs working at full speed for anything at the moment. That is not to say it is impossible, but it would take either a new software project to build a backend, or some sort of large patch to existing backends. It's not that I don't see where Intel is coming from, and if their vision actually works, things wouldn't be as difficult to deal with given a "write once, run anywhere" approach. But as it stands, it isn't tested enough for people to make that effort, and it is very incompatible with the CUDA and ROCm efforts even if the APIs roughly do the same thing. For Intel GPUs, OpenCL will get users roughly halfway, but it will never be as optimized as CUDA/ROCm, and even if CLBlast could optimize its existing OpenCL code for Intel GPUs tomorrow, the extra effort needed to get that last portion of optimization is a pretty dim prospect in my opinion. I have no idea what can be done about that in a planned fashion, but that seems to be the situation at the moment.

@Jacoby1218

Jacoby1218 commented Sep 7, 2023

It appears that HF Transformers might support XPU now (huggingface/transformers#25714), which would mean that even if nothing else works, this might. (No quants because there's no bitsandbytes support, but that also seems to be in progress: bitsandbytes-foundation/bitsandbytes#747.)

@oobabooga
Owner Author

I have added an "Intel Arc" option to the one-click installer that installs the appropriate Pytorch version: 0306b61

The question is if it works when you try to generate text. A good small model for testing is GALACTICA 125M loaded through the transformers loader.
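For anyone testing this, a minimal standalone check (outside the webui, and assuming the XPU builds of torch and intel_extension_for_pytorch are installed) could look something like this:

import torch
import intel_extension_for_pytorch as ipex  # registers the "xpu" device with torch
from transformers import AutoTokenizer, AutoModelForCausalLM

device = "xpu" if torch.xpu.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-125m")
model = AutoModelForCausalLM.from_pretrained("facebook/galactica-125m").to(device)

inputs = tokenizer("The Transformer architecture", return_tensors="pt").to(device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))

If torch.xpu.is_available() returns False despite an Arc card being present, the PyTorch/IPEX install is the problem rather than the webui.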

@simonlui

I have added an "Intel Arc" option to the one-click installer that installs the appropriate Pytorch version: 0306b61

The question is if it works when you try to generate text. A good small model for testing is GALACTICA 125M loaded through the transformers loader.

Keep in mind that Windows native does not work yet because Intel botched their release process, and I suspect most people wanting to try this would be on Windows. So only Linux and WSL 2 for now. Earlier versions of the Windows PyTorch pip package also don't support Ahead-of-Time compilation, which makes the first pass of anything painful to run. See intel/intel-extension-for-pytorch#398 and intel/intel-extension-for-pytorch#399 for more information.

@Jacoby1218

Intel always manages to botch something when it comes to Arc, so I'm not surprised. I will test this out once I get my WSL2 install working again.

@Jacoby1218

I have added an "Intel Arc" option to the one-click installer that installs the appropriate Pytorch version: 0306b61

The question is if it works when you try to generate text. A good small model for testing is GALACTICA 125M loaded through the transformers loader.

This doesn't work: it checks whether CUDA is available and then falls back to the CPU rather than trying the extension.
Also, it would be a good idea to call "source /opt/intel/oneapi/setvars.sh" from the script to auto-initialize the oneAPI environment. Otherwise, users might not get it working and wouldn't be able to figure out why.
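For illustration only (this is not the actual one_click.py or modules code), a device probe that also tries the extension instead of only CUDA could look like this, assuming the XPU build of intel_extension_for_pytorch is installed:

import torch

def pick_device() -> str:
    if torch.cuda.is_available():
        return "cuda"
    try:
        import intel_extension_for_pytorch  # registers the "xpu" device with torch
        if hasattr(torch, "xpu") and torch.xpu.is_available():
            return "xpu"
    except ImportError:
        pass
    return "cpu"

print(pick_device())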

@simonlui

For now, there are unofficial Windows pip packages available here, from one of the WebUI contributors, that address both of the issues I mentioned above for getting IPEX working optimally on Windows natively. Install at your own risk, knowing they are not from Intel and not official.

@ashunaveed

Intel Extension for PyTorch supports only one specific PyTorch version. If we change to it in the one-click installer file, that version downloads, but the requirements file then downloads and overwrites the existing supporting version, and the system is unable to use the Intel GPU. Can anyone provide a workaround for this problem? We need to check which PyTorch version is compatible with the Intel Extension for PyTorch module and download only that version.

@Daroude

Daroude commented Oct 17, 2023

Changed one_click.py so that it downloads and installs the (hopefully) correct PyTorch packages, created a requirements.txt for Intel Arc (which may or may not be correct) since there was none, and added the calls for them in one_click.py.

It downloads and installs the packages, but I am stuck at "Installing extensions requirements". As soon as this part starts, it seems to switch back to CPU (!?), installs the Nvidia packages, and uninstalls the Intel torch versions.

Update: it looks like the requirements from the various extensions subfolders request the nvidia packages as dependencies for the required packages.

@simonlui

New all-in-one PyTorch for Windows packages are available here, which are preferable to the other packages I linked earlier, as those had dependencies which couldn't easily be satisfied without a requirements.txt detailing them. There does seem to be a bug in the newest Windows drivers, as seen in intel/intel-extension-for-pytorch#442; you have to revert to something older than version 4885. Version 4676 here is recommended, as that was what was used to build the pip packages.

@Daroude

Daroude commented Oct 19, 2023

Wouldn't it be easiest to make an option to compile llama.cpp with CLBlast?

@oobabooga oobabooga unpinned this issue Oct 21, 2023
@fractal-fumbler

fractal-fumbler commented Nov 12, 2023

hello :)
can webui be used with arc a770 to launch gptq models?

The transformers loader gives me the error WARNING: No GPU has been detected by Pytorch. Falling back to CPU mode.
After a clean install it gave me the error

AssertionError: Torch not compiled with CUDA enabled

and after another clean install I now get

raise RuntimeError("GPU is required to quantize or run quantize model.")

@fractal-fumbler

fractal-fumbler commented Nov 12, 2023

https://github.com/intel-analytics/BigDL/tree/main/python/llm

[bigdl-llm](https://bigdl.readthedocs.io/en/latest/doc/LLM/index.html) is a library for running LLMs (large language models) on Intel XPU (from laptop to GPU to cloud) using INT4 with very low latency (for any PyTorch model).

Can this be used with the webui and Intel Arc GPUs?
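For reference, here is a rough standalone sketch of bigdl-llm INT4 inference on an XPU, based on the bigdl-llm documentation linked above (the model path is only an example, this is untested here, and wiring it into the webui would still need a dedicated loader):

import intel_extension_for_pytorch as ipex  # needed so torch exposes the "xpu" device
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_path = "meta-llama/Llama-2-7b-chat-hf"  # illustrative; any local HF causal LM path works
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True).to("xpu")
tokenizer = AutoTokenizer.from_pretrained(model_path)

inputs = tokenizer("What is an Intel Arc A770?", return_tensors="pt").to("xpu")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))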

@djstraylight

Seems that Intel has broken the PyTorch extension for XPU repo and it's pointing to an HTTP site instead of HTTPS.
Here is a workaround for the one_click.py:
"python -m pip install --trusted-host ec2-52-27-27-201.us-west-2.compute.amazonaws.com torch==2.0.1a0 torchvision==0.15.2a0 intel_extension_for_pytorch==2.0.110+xpu -f 'http://ec2-52-27-27-201.us-west-2.compute.amazonaws.com/ipex-release.php?device=xpu&repo=us&release=stable'"

But seeing other errors related to the PyTorch version:
/text-generation-webui/installer_files/env/lib/python3.11/site-packages/accelerate/utils/imports.py:245: UserWarning: Intel Extension for PyTorch 2.0 needs to work with PyTorch 2.0.*, but PyTorch 2.1.0 is found. Please switch to the matching version and run again.
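A quick way to confirm the mismatch from inside the installer's environment (IPEX 2.0.110+xpu expects a matching torch 2.0.1a0 build) is to print both versions:

import torch
import intel_extension_for_pytorch as ipex

print("torch:", torch.__version__)  # should start with 2.0.1a0 for IPEX 2.0.110+xpu
print("ipex: ", ipex.__version__)

If torch reports 2.1.0, something in the requirements has upgraded it over the IPEX-compatible build.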

@HubKing

HubKing commented Nov 19, 2023

Hello, does it currently work with Intel Arc (on Arch Linux) without much of a problem? I can run Vladmir's automatic1111 on this computer, so I think maybe this could also run, but I am not sure.

PS: I ran the installer and it exited with the following error:

Downloading and Extracting Packages

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
Looking in links: https://developer.intel.com/ipex-whl-stable-xpu
ERROR: Could not find a version that satisfies the requirement torch==2.0.1a0 (from versions: 1.13.0, 1.13.1, 2.0.0, 2.0.1, 2.1.0, 2.1.1)
ERROR: No matching distribution found for torch==2.0.1a0
Command '. "/home/username/diffusion/text-generation-webui/installer_files/conda/etc/profile.d/conda.sh" && conda activate "/home/username/diffusion/text-generation-webui/installer_files/env" && conda install -y -k ninja git && python -m pip install torch==2.0.1a0 torchvision==0.15.2a0 intel_extension_for_pytorch==2.0.110+xpu -f https://developer.intel.com/ipex-whl-stable-xpu && python -m pip install py-cpuinfo==9.0.0' failed with exit status code '1'.

Exiting now.
Try running the start/update script again.

@naptastic
Contributor

As of right now (2023-11-27) Intel's instructions to install their pytorch extension do not work. In order to get the three necessary wheel files (torch 2.0.1a0, torchvision 0.15.2a0, intel_extension_for_pytorch 2.0.110+xpu) I had to download them as files from the URL provided, then install them with pip.

This is not enough to get ARC support working. The answer still seems to be "it should work, in theory, but nobody's actually done it yet".

@HubKing

HubKing commented Nov 28, 2023

As of right now (2023-11-27) Intel's instructions to install their pytorch extension do not work.

Isn't that a stupid move from Intel? I mean, Intel should have done their best to make their GPUs work with the latest A.I. stuff and help developers achieve it, instead of focusing on games. These days, people constantly talk about A.I., not about triple-A 3D games. This kind of constant frustration with the A.I. apps makes me think about switching to Nvidia (if they fix the damn Wayland problem).

Anyway, please let us know when it works again.

@simonlui

simonlui commented Nov 28, 2023

The packages are there at https://developer.intel.com/ipex-whl-stable-xpu, which you can browse; pip just isn't picking them up from that URL for whatever reason now. You need to manually install the packages or directly link the packages needed for the install. For my Linux install, I had to do the following:

pip install https://intel-extension-for-pytorch.s3.amazonaws.com/ipex_stable/xpu/torch-2.0.1a0%2Bcxx11.abi-cp310-cp310-linux_x86_64.whl https://intel-extension-for-pytorch.s3.amazonaws.com/ipex_stable/xpu/torchvision-0.15.2a0%2Bcxx11.abi-cp310-cp310-linux_x86_64.whl https://intel-extension-for-pytorch.s3.amazonaws.com/ipex_stable/xpu/intel_extension_for_pytorch-2.0.110%2Bxpu-cp310-cp310-linux_x86_64.whl

The package versions needed for install will vary depending on what OS platform and Python version is being used on your machine.

@HubKing

HubKing commented Nov 28, 2023

It says that the environment is externally managed and to try pacman -S python-xyz, where xyz is the package. In that case, what do I need to do?

@Jacoby1218

As of right now, there are 3 possible ways to get this to work with ARC GPUs:

  1. The Intel Extension for PyTorch, which currently doesn't work on Windows.
  2. OpenVINO with PyTorch dev versions (unsure whether this would actually work; OpenVINO needs to be supported by the frontend to be used, and while OpenVINO supports LLMs, I just haven't seen it used for something like this before).
  3. The new Intel Extension for Transformers: the most promising, supports models converted with llama.cpp (though I don't know if it supports Arc GPUs yet; last I checked, support was forthcoming).

@simonlui

1. The Intel Extension for PyTorch, which currently doesn't work on Windows.

As I posted in #3761 (comment), Windows does work with Intel Extension for PyTorch, but you need to install a third-party package since Intel does not provide one at this time. Using the latest Windows drivers now works too. Intel has stated in the issue tracker on GitHub that they will add Windows packaging soon. IPEX is also due for an update soon.

@Jacoby1218

I was under the impression there were still driver issues, but if it works now that's great.

@ghost

ghost commented Dec 30, 2023

I'm not sure if this is the right place to post this. I receive the below error after installing OobaBooga using the default Arc install option on Windows. The install seemed to go well but running it results in the below DLL load error. Other threads that mentioned this loading error suggested it might be a PATH issue. I tried adding a few paths to the OS environment but couldn't resolve it. Any suggestions?

It's an Arc A770 on Windows 10, with Intel® Graphics Driver 31.0.101.5081/31.0.101.5122 (WHQL Certified). I also tried rolling back to driver 4676 and doing a clean install, with the same results. Some of the paths I added were those listed here. I'm also not seeing any of the DLLs listed at that link in those directories. Instead, I have intel-ext-pt-gpu.dll and intel-ext-pt-python.dll in "%PYTHON_ENV_DIR%\lib\site-packages\intel_extension_for_pytorch\bin" and no DLLs in "%PYTHON_ENV_DIR%\lib\site-packages\torch\lib", although backend_with_compiler.dll itself is there.

Traceback (most recent call last) ─────────────────────────────────────────┐
│ C:\text-generation-webui\server.py:6 in <module>                                                                 │
│                                                                                                                     │
│     5                                                                                                               │
│ >   6 import accelerate  # This early import makes Intel GPUs happy                                                 │
│     7                                                                                                               │
│                                                                                                                     │
│ C:\text-generation-webui\installer_files\env\Lib\site-packages\accelerate\__init__.py:3 in <module>              │
│                                                                                                                     │
│    2                                                                                                                │
│ >  3 from .accelerator import Accelerator                                                                           │
│    4 from .big_modeling import (                                                                                    │
│                                                                                                                     │
│ C:\text-generation-webui\installer_files\env\Lib\site-packages\accelerate\accelerator.py:32 in <module>          │
│                                                                                                                     │
│     31                                                                                                              │
│ >   32 import torch                                                                                                 │
│     33 import torch.utils.hooks as hooks                                                                            │
│                                                                                                                     │
│ C:\text-generation-webui\installer_files\env\Lib\site-packages\torch\__init__.py:139 in <module>                 │
│                                                                                                                     │
│    138                 err.strerror += f' Error loading "{dll}" or one of its dependencies.'                        │
│ >  139                 raise err                                                                                    │
│    140                                                                                                              │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
OSError: [WinError 126] The specified module could not be found. Error loading
"C:\text-generation-webui\installer_files\env\Lib\site-packages\torch\lib\backend_with_compiler.dll" or one of its
dependencies.
Press any key to continue . . .

@HubKing

HubKing commented Dec 30, 2023

I updated the code and ran it again (did not do anything else). This time it got past the previous crash, "No matching distribution found for torch==2.0.1a0", but after downloading a lot of stuff, it crashed with the following. If I run the script again, I get the same output as below.

*******************************************************************
* WARNING: You haven't downloaded any model yet.
* Once the web UI launches, head over to the "Model" tab and download one.
*******************************************************************


╭───────────────────────────────── Traceback (most recent call last) ─────────────────────────────────╮
│ /home/username/diffusion/text-generation-webui/server.py:6 in <module>                                   │
│                                                                                                     │
│     5                                                                                               │
│ ❱   6 import accelerate  # This early import makes Intel GPUs happy                                 │
│     7                                                                                               │
│                                                                                                     │
│ /home/username/diffusion/text-generation-webui/installer_files/env/lib/python3.11/site-packages/accelera │
│ te/__init__.py:3 in <module>                                                                        │
│                                                                                                     │
│    2                                                                                                │
│ ❱  3 from .accelerator import Accelerator                                                           │
│    4 from .big_modeling import (                                                                    │
│                                                                                                     │
│ /home/username/diffusion/text-generation-webui/installer_files/env/lib/python3.11/site-packages/accelera │
│ te/accelerator.py:32 in <module>                                                                    │
│                                                                                                     │
│     31                                                                                              │
│ ❱   32 import torch                                                                                 │
│     33 import torch.utils.hooks as hooks                                                            │
│                                                                                                     │
│ /home/username/diffusion/text-generation-webui/installer_files/env/lib/python3.11/site-packages/torch/__ │
│ init__.py:234 in <module>                                                                           │
│                                                                                                     │
│    233     if USE_GLOBAL_DEPS:                                                                      │
│ ❱  234         _load_global_deps()                                                                  │
│    235     from torch._C import *  # noqa: F403                                                     │
│                                                                                                     │
│ /home/username/diffusion/text-generation-webui/installer_files/env/lib/python3.11/site-packages/torch/__ │
│ init__.py:193 in _load_global_deps                                                                  │
│                                                                                                     │
│    192         if not is_cuda_lib_err:                                                              │
│ ❱  193             raise err                                                                        │
│    194         for lib_folder, lib_name in cuda_libs.items():                                       │
│                                                                                                     │
│ /home/username/diffusion/text-generation-webui/installer_files/env/lib/python3.11/site-packages/torch/__ │
│ init__.py:174 in _load_global_deps                                                                  │
│                                                                                                     │
│    173     try:                                                                                     │
│ ❱  174         ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)                                       │
│    175     except OSError as err:                                                                   │
│                                                                                                     │
│ /home/username/diffusion/text-generation-webui/installer_files/env/lib/python3.11/ctypes/__init__.py:376 │
│ in __init__                                                                                         │
│                                                                                                     │
│   375         if handle is None:                                                                    │
│ ❱ 376             self._handle = _dlopen(self._name, mode)                                          │
│   377         else:                                                                                 │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────╯
OSError: libmkl_intel_lp64.so.2: cannot open shared object file: No such file or directory

@Jacoby1218

@HubKing Run "source /opt/intel/oneapi/setvars.sh" and try again. If you don't have it, make sure to install the oneAPI Basekit.

@Nuullll Nuullll mentioned this issue Jan 7, 2024
@Nuullll
Contributor

Nuullll commented Jan 7, 2024

#5191 would fix most of the env issues for IPEX.

@idelacio

idelacio commented Jan 11, 2024

It builds now but on starting I get the attached errors
SadArcLogs.txt

The Nvidia version runs just fine. (same version, rebuilt, both builds tested from C drive (logs are D drive build but same errors))

Running Windows Server 2019
Dual card setup-
Arc 770 16GB in primary PCIE slot
3060 12GB in secondary

@Sawyer73

It builds now but on starting I get the attached errors SadArcLogs.txt

The Nvidia version runs just fine. (same version, rebuilt, both builds tested from C drive (logs are D drive build but same errors))

Running Windows Server 2019 Dual card setup- Arc 770 16GB in primary PCIE slot 3060 12GB in secondary

Same here. Adding the 'share' flag does remove the localhost error message, but when I try to get in through localhost or even a Gradio link, it loads a blank screen. Basically the links work, but there's nothing on them.

@idelacio

It now builds and the interface loads from the main branch version.

Not sure how to run models on the card though; AWQ and GPTQ don't work at all and error out, and GGUF just runs on the CPU.

@kcyarn

kcyarn commented Jan 18, 2024

I'm running an Intel Arc A770 as a non-display GPU on Ubuntu 23.10. (Intel i7-13700k handles the display.) Selecting the Intel GPU option during oobabooga's first run did not load models to the GPU. In case anyone else experiences this problem, here's what worked for me.

This assumes the following in Ubuntu:

  1. Graphics drivers for Intel Arc installed (see Intel for directions).
  2. Intel oneAPI installed.
  3. Username is in the render group.
  4. Hangcheck timeout disabled.
  5. oneAPI is initialized.

Intel suggests several different ways to initialize oneAPI. Per their directions, I added the following line to .bashrc and rebooted.

source /opt/intel/oneapi/setvars.sh

This eliminates the error OSError: libmkl_intel_lp64.so.2: cannot open shared object file: No such file or directory.
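A quick sanity check that the oneAPI runtime is actually visible to Python (just a sketch; the library name comes from the error above) is to try loading it directly with ctypes:

import ctypes

# If setvars.sh has not been sourced in this shell, this raises the same
# OSError that the torch import hits.
ctypes.CDLL("libmkl_intel_lp64.so.2")
print("oneAPI MKL runtime found")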

The Intel extension for PyTorch was correctly installed along with all of the other dependencies. No issues there, but it still wasn't loading anything onto the GPU. To fix this, I needed to recompile llama-cpp-python.

I'm leaving the below for now because it did eliminate some errors. However, it's a mirage. It's not actually using the GPU.

cd text-generation-webui
./cmd_linux.sh
pip uninstall llama-cpp-python
CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=Intel10_64lp -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DLLAMA_NATIVE=ON" pip install llama-cpp-python

For the cmake arguments, I used llama.cpp's Intel OneMKL arguments.

And now loading llama2-7b (gguf) with n-gpu-layers set to its maximum value results in:

llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU

@djstraylight

And now loading llama2-7b (gguf) with n-gpu-layers set to its maximum value results in:

llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU

Did you use intel-gpu-top to verify that it is actually using the GPU?

@kcyarn

kcyarn commented Jan 19, 2024

And now loading llama2-7b (gguf) with n-gpu-layers set to its maximum value results in:

llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU

Did you use intel-gpu-top to verify that it is actually using the GPU?

I'm getting some really odd intel-gpu-top results. It blips when the model loads and then does nothing, leading me to think this is another mirage.

In comparison, in llama.cpp, Blitter hits 80% with 30 layers on the same model. But that's compiled with clblast and needs platform and device environment variables.

@djstraylight

I found the same thing. Using -DLLAMA_BLAS_VENDOR=Intel10_64lp doesn't actually offload the processing to the Intel GPU.

I compiled with CLBlast and that actually was using my Arc GPU, but the LLM was spitting out gibberish. Still some bug hunting needed.

@kcyarn

kcyarn commented Jan 19, 2024

I found the same thing. Using -DLLAMA_BLAS_VENDOR=Intel10_64lp doesn't actually offload the processing to the Intel GPU.

I compiled with CLBlast and that actually was using my Arc GPU, but the LLM was spitting out gibberish. Still some bug hunting needed.

So after spending a few hours experimenting with llama.cpp and llama-cpp-python, I got them both running on the gpu last night. I got oobabooga running on the Intel arc gpu a few minutes ago.

This is using llama-2-7b-chat.Q5_K_M.gguf with llama.cpp and 30 n-gpu-layers.

intel_gpu_top screenshot while running oobabooga

Oobabooga output using the Intel Arc A770 GPU

No gibberish and it corrected the grammar error in the prompt. :)

I'm not sure how user-friendly we'll be able to make running this nor have I stress tested this beyond a few pithy prompts. For reference, I'm using Ubuntu 23.10 (mantic). To compile with clblast, I needed libclblast-dev >= 1.6.1-1 and the most recent stable Intel drivers. I'm happy to dig into the dependencies more, if needed.

(The below assumes you've run ./start_linux.sh for the first time.)

Step 1

Open 2 terminals.

In the first, run

clinfo -l

In the second, run

cd text-generation-webui
./cmd_linux.sh
clinfo -l

Here's the output from my system. As you can see, conda doesn't know a GPU exists.

Ubuntu output:

Platform #0: Intel(R) FPGA Emulation Platform for OpenCL(TM)
 `-- Device #0: Intel(R) FPGA Emulation Device
Platform #1: Intel(R) OpenCL
 `-- Device #0: 13th Gen Intel(R) Core(TM) i7-13700K
Platform #2: Intel(R) OpenCL Graphics
 `-- Device #0: Intel(R) Arc(TM) A770 Graphics
Platform #3: Intel(R) OpenCL Graphics
 `-- Device #0: Intel(R) UHD Graphics 770

Inside conda:

Platform #0: Intel(R) OpenCL
 `-- Device #0: 13th Gen Intel(R) Core(TM) i7-13700K 

Note, installing ocl-icd-system in conda (the semi-official fix) did not work.

Step 2

Conda needs your system's OpenCL vendor .icd files. On Ubuntu, these are at /etc/OpenCL/vendors/.

In terminal, cd into the text-generation-webui directory. (Just the basic terminal, not cmd_linux.sh)

Run

rm -r ./installer_files/env/etc/OpenCL/vendors/
mkdir ./installer_files/env/etc/OpenCL/vendors/
ln -s /etc/OpenCL/vendors/*.icd ./installer_files/env/etc/OpenCL/vendors/

This deletes conda's OpenCL vendors directory, recreates it, and then creates symlinks to Ubuntu's icd files.

./cmd_linux.sh

Recheck conda's clinfo.

clinfo -l

My output is now:

Platform #0: Intel(R) OpenCL Graphics
 `-- Device #0: Intel(R) Arc(TM) A770 Graphics
Platform #1: Intel(R) OpenCL Graphics
 `-- Device #0: Intel(R) UHD Graphics 770
Platform #2: Intel(R) OpenCL
 `-- Device #0: 13th Gen Intel(R) Core(TM) i7-13700K
Platform #3: Intel(R) FPGA Emulation Platform for OpenCL(TM)
 `-- Device #0: Intel(R) FPGA Emulation Device

The platform numbers are different from what they are in Ubuntu, which changes llama.cpp's GGML_OPENCL_PLATFORM environment variable. (For now, just paste the output somewhere. You'll need it in a minute.)

Step 3

Recompile llama-cpp-python in the ./cmd_linux.sh terminal.

pip uninstall llama-cpp-python
CMAKE_ARGS="-DLLAMA_CLBLAST=ON" FORCE_CMAKE=1 pip install --no-cache-dir llama-cpp-python

Step 4

In terminal (not ./cmd_linux.sh), cd into the text-generation-webui directory if you're not still there.

Go to conda's clinfo -l output and note the platform number for your graphics card and the card name beside its device. You don't need the full name, just the letters and number.

I'm using this bit:

Platform #0: Intel(R) OpenCL Graphics
 `-- Device #0: Intel(R) Arc(TM) A770 Graphics

Edit your platform number and device name. Then run the exports in the terminal.

export GGML_OPENCL_PLATFORM=0
export GGML_OPENCL_DEVICE=A770
./start_linux.sh

It worked.

Admittedly, it's not as snappy as running llama2-7b in BigDL on the same GPU, but it's a massive speed improvement over the cpu.

On my system, this only works if I use the exports to tell it what to use. I don't know if you'll need to do that on a system that only has one display option. (I'm using the cpu for display.)

Oobabooga was a fresh download.

@kcyarn

kcyarn commented Jan 21, 2024

Draft Guide for Running Oobabooga on Intel Arc

More eyes and testers are needed before considering submission to the main repository.

Installation Notes

Although editing conda's OpenCL vendor files is a viable option, swapping to a standard python3 install and using a venv resulted in improved performance in tokens/s by approximately 71% across all tested models. It also eliminates possible issues with older conda libraries and bleeding-edge ones needed for Intel Arc. For now, skipping conda and its CDTs appears to be the most reliable option.

Working Model Loaders

  • llama.cpp
  • transformers

The latest Intel extension for transformers added INT4 inference support for Arc. Hugging Face transformers committed XPU support for the trainer in September '23. If any of the other model loaders use transformers, they may run with little effort. (They may also require a fairly major fork. In which case, adding a BigDL model loader is probably a better use of energy. That's just my opinion. My BigDL experiments are still in Jupyter notebooks, but it's been a good experience on both the Intel GPU and the CPU.)

Note: Loaders are hardcoded in modules/loaders.py. Without refactoring this to be more modular like extensions or [shudder] monkeypatching, we just need to remember which ones work with our individual system. Making it more modular and customizable for different combinations of CPUs and GPUs is a much broader discussion than getting this working on the Intel Arc. It would also need a lot of buy-in and commitment from the community.

Models Tested

  • transformers

    • llama2-7b-chat-hf
    • mistralai_Mistral-7B-Instruct-v0.2
  • llama.cpp

    • llama-2-7b-chat.Q5_K_M.gguf
    • mistral-7b-instruct-v0.2.Q5_K_M.gguf

What Isn't Tested

  • Most models
  • Training
  • Parameters
  • Extensions
  • Regular use beyond "does it load and run a few simple prompts"

Note: coqui_tts, silero_tts, whisper_stt, superbooga, and superboogav2 all break the install. It may be possible to install their requirements without any dependencies and then pick up the additional dependencies during debugging. TTS, in particular, upgrades torch to the wrong version for the Intel extension.

Install Notes

  • Latest Intel Arc drivers installed. See Intel client GPU installation docs.

  • Intel OneAPI basekit installed

  • Install opencl-headers ocl-icd libclblast-dev python3 python3-pip python3-venv libgl1 libglib2.0-0 libgomp1 libjemalloc-dev

    Note: libclblast-dev >= 1.6

  • Your username is part of the render group.

  • You have hangcheck disabled in grub.

The last two items are just standard things I do with a fresh install or new graphics card. They may no longer be necessary. If you've already installed these, check for updates. Intel kicked off 2024 with a lot of updates.

Test Machine Details

  • Ubuntu 23.10
  • 6.5.0.14.16 generic linux
  • i7-13700k CPU (runs the display)
  • Intel Arc A770 (non-display)

Bash Scripts

Below are 2 bash scripts: install_arch.sh and run_arch.sh. They need to be saved or symlinked to the text-generation-webui directory.

Getting Started

  1. Download or clone a fresh copy of Oobabooga.

  2. Save the below scripts into text-generation-webui. These should be in the same folder as one_click.py, cmd_linux.sh, etc.

  3. Make them executable, then run the installer.

    cd text-generation-webui
    chmod +x install_arch.sh run_arch.sh
    ./install_arch.sh
  4. Check clinfo for your hardware information.

    clinfo -l
  5. In run_arch.sh, find GGML_OPENCL_PLATFORM and change it to your platform number. Then change GGML_OPENCL_DEVICE to your device name. Save the file.

  6. Start the server with run_arch.sh. This uses any flags you've saved in CMD_FLAGS.txt. You can also use flags like --listen --api with the script.

    ./run_arch.sh

Both the scripts below were uploaded to github. This is just a starting point. Changes welcome. Once it's right in bash, we can decide whether to integrate it with oobabooga's start_linux.sh, requirements files, and one_click.py.

install_arch.sh

#!/bin/bash

# Check if the virtual environment already exists
if [[ ! -d "venv" ]]; then
    # Create the virtual environment
    python -m venv venv
fi

# Activate the virtual environment
source venv/bin/activate

# Intel extension for transformers recently added Arc support.
# See https://github.com/intel/intel-extension-for-transformers/blob/main/intel_extension_for_transformers/neural_chat/docs/notebooks/build_chatbot_on_xpu.ipynb for additional notes on the dependencies.
# Working model loaders:
#  - llama.cpp
#  - transformers

pip install intel-extension-for-transformers

# Install xpu intel pytorch, not cpu.

pip install torch==2.1.0a0 torchvision==0.16.0a0 torchaudio==2.1.0a0 intel-extension-for-pytorch==2.1.10+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/

# Installing these from requirements_cpu_only.txt causes dependency conflicts with Intel PyTorch.

# Install a few of the dependencies for the below.
pip install coloredlogs datasets sentencepiece

pip install --no-deps peft==0.7.* optimum==1.16.* optimum-intel accelerate==0.25.*

# Skip llama-cpp-python and everything already installed above without its deps.

grep -v -e peft -e optimum -e accelerate -e llama-cpp-python requirements_cpu_only.txt > temp_requirements.txt

pip install -r temp_requirements.txt

# Install the cpuinfo dependency installed by one_click
pip install py-cpuinfo==9.0.0

# Use the correct cmake args for llama-cpp

export CMAKE_ARGS="-DLLAMA_CLBLAST=ON"
export FORCE_CMAKE=1

pip install --no-cache-dir llama-cpp-python

cd extensions

extensions=()  # Create an empty array to store folder names

# List of extensions to exclude.
# Exclude coqui_tts because it causes torch dependency issues with Intel GPUs.
# whisper_stt and silero_tts both force pytorch updates through a dependency-of-dependency situation. It may be possible to use them without installing their dependencies.
exclude_extensions=(coqui_tts silero_tts whisper_stt superbooga superboogav2)

for folder in */; do
    extensions+=("$folder")
done

echo "${extensions[*]}"

install_extensions=()

for ext in "${extensions[@]}"; do
    should_exclude=false

    for exclude_ext in "${exclude_extensions[@]}"; do
        if [[ "$ext" == *"$exclude_ext"* ]]; then
            should_exclude=true
            break
        fi
    done

    if [ "$should_exclude" = false ]; then
        install_extensions+=("$ext")
    fi
done

# Print the install_extensions
# echo "${install_extensions[@]}"

for extension in "${install_extensions[@]}"; do
    cd "$extension"
    echo -e "\n\n$extension\n\n"
    # Install dependencies from requirements.txt
    if [ -e "requirements.txt" ]; then
        echo "Installing requirements in $extension"
        pip install -r requirements.txt
    else
        echo "No requirements.txt found in $extension"
    fi
    cd ..
done
# Leave the extension directory.
cd ..

# Delete the temp_requirements.txt file.

rm temp_requirements.txt

run_arch.sh

#!/bin/bash
# Uncomment if oneapi is not in your .bashrc
# source /opt/intel/oneapi/setvars.sh
# Activate virtual environment built with install_arch.sh. (Not conda!)
source venv/bin/activate

# Change these values to match your card in clinfo -l
# Needed by llama.cpp

export GGML_OPENCL_PLATFORM=2
export GGML_OPENCL_DEVICE=A770

# Use sudo intel_gpu_top to view your card.

# Capture command-line arguments
flags_from_cmdline=$@

# Read flags from CMD_FLAGS.txt
flags_from_file=$(grep -v '^#' CMD_FLAGS.txt | grep -v '^$')
# Combine flags from both sources
all_flags="$flags_from_file $flags_from_cmdline"

# Run the Python script with the combined flags
python server.py $all_flags

@djstraylight

@kcyarn Great work on getting XPU/OpenCL more integrated with text-generation-webui!

@thejacer

thejacer commented Jan 28, 2024

Draft Guide for Running Oobabooga on Intel Arc

More eyes and testers are needed before considering submission to the main repository.

Tried this in WSL running Ubuntu 22.04, here are some notes:

  1. libclblast-dev >= 1.6 - this package is only available via default repos in 1.6+ on Ubuntu 23.10 (might be available on other flavors idk)
  2. I was able to grab the 1.6 .deb from the repos, plus the libclblast1 package listed as a dependency, and install them.
  3. After following your instructions on a new Ubuntu "python -m venv venv" wouldn't work, I had to change it to "python3 -m venv venv"
  4. Despite no errors other than what I've outlined here I still get 0 platforms for clinfo
  5. Again despite no errors other than what's above I get "OSError: libmkl_intel_lp64.so.2: cannot open shared object file: No such file or directory" when I run run_arch.sh

Sorry if this isn't helpful; I've never run WSL before, so I'm not sure what the limitations are.

@kcyarn

kcyarn commented Jan 28, 2024 via email

@thejacer

thejacer commented Feb 1, 2024

It sounds like either the GPU isn't passing through to WSL2 or there's a missing dependency. Which version of Ubuntu are you using on WSL2? I'm using the most recent release, not the LTS, because the newer kernels work better with this card. You may want to try upgrading the release. Have you tried this Intel guide to get the card running in WSL2? https://www.intel.com/content/www/us/en/docs/oneapi/installation-guide-linux/2023-0/configure-wsl-2-for-gpu-workflows.html It'll be a few days before I can run any WSL2 tests.


I'm sorry, work required me to make a short (no) notice trip out of town, and I can't experiment remotely because it might shut down my system. I'll be back in town in a day or two and able to start working on it again. Regarding the WSL version, I was using WSL 1 just because the WSL instructions for oobabooga said to use WSL 1 for Windows 10 and WSL 2 for Windows 11.

@thejacer

thejacer commented Feb 4, 2024

I've ditched my old WSL and restarted with Ubuntu 23.10 using WSL2; however:
clinfo -l

Platform #0: Intel(R) OpenCL Graphics
-- Device #0: Intel(R) Graphics [0x56a0]

clpeak indicates 512 compute units etc.

but Oobabooga fails to find my device.

EDIT: I activated the venv and from within it was able to run clinfo -l with the same results as above; clpeak also sees the GPU with 512 compute units. I honestly don't understand it, because intel_gpu_top also says there's no GPU installed.

@kcyarn

kcyarn commented Feb 4, 2024 via email

@kcyarn

kcyarn commented Feb 4, 2024 via email

@thejacer

thejacer commented Feb 4, 2024

Platform Name Intel(R) OpenCL Graphics
Number of devices 1
Device Name Intel(R) Graphics [0x56a0]
Device Vendor Intel(R) Corporation
Device Vendor ID 0x8086
Device Version OpenCL 3.0 NEO
Device UUID 8680a056-0800-0000-0300-000000000000
Driver UUID 32332e33-352e-3237-3139-312e34320000
Valid Device LUID No
Device LUID 4005-9721ff7f0000
Device Node Mask 0
Device Numeric Version 0xc00000 (3.0.0)
Driver Version 23.35.27191.42
Device OpenCL C Version OpenCL C 1.2
Device OpenCL C all versions OpenCL C 0x400000 (1.0.0)
OpenCL C 0x401000 (1.1.0)
OpenCL C 0x402000 (1.2.0)
OpenCL C 0xc00000 (3.0.0)
Device OpenCL C features __opencl_c_int64 0xc00000 (3.0.0)
__opencl_c_3d_image_writes 0xc00000 (3.0.0)
__opencl_c_images 0xc00000 (3.0.0)
__opencl_c_read_write_images 0xc00000 (3.0.0)
__opencl_c_atomic_order_acq_rel 0xc00000 (3.0.0)
__opencl_c_atomic_order_seq_cst 0xc00000 (3.0.0)
__opencl_c_atomic_scope_all_devices 0xc00000 (3.0.0)
__opencl_c_atomic_scope_device 0xc00000 (3.0.0)
__opencl_c_generic_address_space 0xc00000 (3.0.0)
__opencl_c_program_scope_global_variables 0xc00000 (3.0.0)
__opencl_c_work_group_collective_functions 0xc00000 (3.0.0)
__opencl_c_subgroups 0xc00000 (3.0.0)
__opencl_c_ext_fp32_global_atomic_add 0xc00000 (3.0.0)
__opencl_c_ext_fp32_local_atomic_add 0xc00000 (3.0.0)
__opencl_c_ext_fp32_global_atomic_min_max 0xc00000 (3.0.0)
__opencl_c_ext_fp32_local_atomic_min_max 0xc00000 (3.0.0)
__opencl_c_ext_fp16_global_atomic_load_store 0xc00000 (3.0.0)
__opencl_c_ext_fp16_local_atomic_load_store 0xc00000 (3.0.0)
__opencl_c_ext_fp16_global_atomic_min_max 0xc00000 (3.0.0)
__opencl_c_ext_fp16_local_atomic_min_max 0xc00000 (3.0.0)
__opencl_c_integer_dot_product_input_4x8bit 0xc00000 (3.0.0)
__opencl_c_integer_dot_product_input_4x8bit_packed 0xc00000 (3.0.0)
Latest conformance test passed v2023-05-16-00
Device Type GPU
Device PCI bus info (KHR) PCI-E, 0000:03:00.0
Device Profile FULL_PROFILE
Device Available Yes
Compiler Available Yes
Linker Available Yes
Max compute units 512
Max clock frequency 2400MHz
Device IP (Intel) 0x30dc008 (12.220.8)
Device ID (Intel) 22176
Slices (Intel) 8
Sub-slices per slice (Intel) 8
EUs per sub-slice (Intel) 8
Threads per EU (Intel) 8
Feature capabilities (Intel) DP4A, DPAS
Device Partition (core)
Max number of sub-devices 0
Supported partition types None
Supported affinity domains (n/a)
Max work item dimensions 3
Max work item sizes 1024x1024x1024
Max work group size 1024
Preferred work group size multiple (device) 64
Preferred work group size multiple (kernel) 64
Max sub-groups per work group 128
Sub-group sizes (Intel) 8, 16, 32
Preferred / native vector sizes
char 16 / 16
short 8 / 8
int 4 / 4
long 1 / 1
half 8 / 8 (cl_khr_fp16)
float 1 / 1
double 0 / 0 (n/a)
Half-precision Floating-point support (cl_khr_fp16)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Single-precision Floating-point support (core)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations Yes
Double-precision Floating-point support (n/a)
Address bits 64, Little-Endian
External memory handle types DMA buffer
Global memory size 16704737280 (15.56GiB)

clinfo definitely sees my GPU and has it correctly at 16GB vram.

Change these values to match your card in clinfo -l
Needed by llama.cpp

export GGML_OPENCL_PLATFORM=0
export GGML_OPENCL_DEVICE="Intel(R) Graphics [0x56a0]"

Use sudo intel_gpu_top to view your card.

That is the current setup in run_arch.sh, but intel_gpu_top is still not finding my GPU.

thejacer@DESKTOP-9DLUMOO:~/text-generation-webui$ glxinfo | grep OpenGL
DRI3 not available
failed to load driver: zink
OpenGL vendor string: Mesa
OpenGL renderer string: llvmpipe (LLVM 15.0.7, 256 bits)
OpenGL core profile version string: 4.5 (Core Profile) Mesa 24.0.0-devel (git-3ca1f35cbf)
OpenGL core profile shading language version string: 4.50
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 4.5 (Compatibility Profile) Mesa 24.0.0-devel (git-3ca1f35cbf)
OpenGL shading language version string: 4.50
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.2 Mesa 24.0.0-devel (git-3ca1f35cbf)
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
OpenGL ES profile extensions:

I've seen some comments online that the OpenGL renderer string shouldn't be llvmpipe if the GPU is actually being used, but I haven't figured out how to change that yet.
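
Two things that might help here: intel_gpu_top reportedly cannot see GPUs under WSL2 because the i915 monitoring interfaces aren't passed through, so its output isn't a reliable signal there; and it's worth confirming what OpenCL exposes from inside the venv itself. A minimal sketch, assuming pyopencl is installed purely for this check:

source venv/bin/activate
pip install pyopencl    # only used for this check, not part of the install scripts
python - <<'EOF'
import pyopencl as cl

# Print every platform/device pair the OpenCL ICD loader can see from inside the venv.
# The platform index and device name are what GGML_OPENCL_PLATFORM / GGML_OPENCL_DEVICE expect.
for i, platform in enumerate(cl.get_platforms()):
    for device in platform.get_devices():
        print(f"platform {i}: {platform.name} -> {device.name}")
EOF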

@kcyarn
Copy link

kcyarn commented Feb 4, 2024 via email

@thejacer
Copy link

thejacer commented Feb 4, 2024 via email

@kcyarn
Copy link

kcyarn commented Feb 7, 2024

Edited above, sorry.

I now have oobabooga llama.cpp (gguf only) working in WSL2 Ubuntu 22.04. This uses the older backend clblast. The newer ones are really nice, but I went with what I was familiar with. I've added everything to the wsl_scripts folder at oobabooga_intel_arc.

Given the complexity on the wsl side, Docker might be the best direction for this one.

Here's a screenshot showing it using the gpu with WSL2 on Windows 11. You may need the insiders version on Windows 10.
Screenshot 2024-02-07 013213
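
On the Docker idea, a minimal sketch of what the run side could look like on a Linux host (the image name is hypothetical; any image with the Intel OpenCL/oneAPI runtime plus these scripts baked in would do, and note that WSL2 exposes the GPU via /dev/dxg rather than /dev/dri):

# Hypothetical image; the important part is passing the render node through to the container.
docker run --rm -it \
  --device /dev/dri \
  -v "$PWD":/workspace/text-generation-webui \
  -p 7860:7860 \
  intel-arc-textgen:latest \
  bash -lc "cd /workspace/text-generation-webui && ./run_arch.sh --listen"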

@thejacer
Copy link

thejacer commented Feb 7, 2024 via email

@thejacer
Copy link

thejacer commented Feb 7, 2024

Edited above, sorry.

I now have oobabooga llama.cpp (gguf only) working in WSL2 Ubuntu 22.04. This uses the older backend clblast. The newer ones are really nice, but I went with what I was familiar with. I've added everything to the wsl_scripts folder at oobabooga_intel_arc.

Given the complexity on the wsl side, Docker might be the best direction for this one.

Here's a screenshot showing it using the gpu with WSL2 on Windows 11. You may need the insiders version on Windows 10. Screenshot 2024-02-07 013213

All of this installed new packages:

sudo add-apt-repository ppa:oibaf/graphics-drivers
sudo apt update
sudo apt upgrade -y

sudo apt install -y vainfo

sudo apt install -y mesa-va-drivers

Run vainfo --display drm --device /dev/dri/card0
Output is the UHD Graphics 770

One of the simpler ways to test whether it's using the GPU.
Also grabs a lot of the dependencies needed later.
sudo apt install -y ffmpeg
sudo apt install -y gstreamer1.0-plugins-bad gstreamer1.0-tools gstreamer1.0-vaapi

This was new:

sudo usermod -a -G video ${USER}

And this installed new packages:

sudo apt-get install x11-apps -y

I also added all of those lines, which were all missing, to the bottom of my .bashrc. With those changes I can see new information for my CPU and integrated graphics when running clinfo, I can see my GPU when running vainfo, and my renderer string is now my A770 when I run glxinfo | grep OpenGL. I still can't see my GPU when I run intel_gpu_top, though. All of this resulted in a .gguf loading onto my GPU(!) without rebuilding clblast, using the text-gen UI I tried setting up days ago. It still used about 80% of my CPU and only ~20% of the GPU during inference, though. About to rebuild clblast and see how it goes.

No change after rebuilding clblast and llama.cpp. I might have messed this part up, though; I got lost in the comments on that block. I'll keep trying.
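
For anyone retracing that rebuild step, a minimal sketch of rebuilding llama-cpp-python against CLBlast inside the same venv the scripts above create (the platform index and device name are the example values from this thread, not universal):

source venv/bin/activate

# Remove any cached CPU-only build, then force a fresh CLBlast build of llama-cpp-python.
pip uninstall -y llama-cpp-python
export CMAKE_ARGS="-DLLAMA_CLBLAST=ON"
export FORCE_CMAKE=1
pip install --no-cache-dir --force-reinstall llama-cpp-python

# The OpenCL target is still picked at runtime; values come from `clinfo -l`.
export GGML_OPENCL_PLATFORM=0
export GGML_OPENCL_DEVICE="Intel(R) Graphics [0x56a0]"

Even with a working CLBlast build, layers only land on the card if the llama.cpp loader is given a non-zero n-gpu-layers value (for example via --n-gpu-layers in CMD_FLAGS.txt), which may explain the ~80% CPU / ~20% GPU split described above.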

@DDXDB
Copy link

DDXDB commented Mar 2, 2024

Draft guide for running Oobabooga on Intel Arc

Needs more eyes and testers before it's worth considering a submission to the main repository.

Installation notes

Although editing conda's OpenCL vendor file is a viable option, switching to a standard python3 install with venv improved tokens/second by roughly 71% across all tested models. It also removes potential problems between older conda libraries and the cutting-edge libraries Intel Arc needs. For now, skipping conda and its CDTs seems to be the most reliable option.

Working model loaders

  • llama.cpp
  • transformers

The latest Intel Extension for Transformers adds INT4 inference support for Arc. Hugging Face transformers committed XPU support for the Trainer in September 2023. If any other model loaders use transformers, they may run without much effort. (They might also need a fairly substantial fork. In that case, adding a BigDL model loader is probably a better use of energy. That's just my opinion. My BigDL experiments are still in Jupyter notebooks, but the experience has been good on both Intel GPUs and CPUs.)

Note: Loaders are hard-coded in modules/loaders.py. Without refactoring them into a more modular extension or [shudder] monkey patching, we simply have to remember which ones work on our individual systems. Making this more modular and customizable for different combinations of CPUs and GPUs goes well beyond getting things working on Intel Arc. It would also need significant community support and commitment.

Tested models

  • transformers

    • Llama-2-7b-chat-hf
    • mistralai_Mistral-7B-Instruct-v0.2
  • llama.cpp

    • llama-2-7b-chat.Q5_K_M.gguf
    • mistral-7b-instruct-v0.2.Q5_K_M.gguf

Not tested

  • Most models
  • Training
  • Parameters
  • Extensions
  • Regular use beyond "does it load and run some simple prompts"

Note: coqui_tts, silero_tts, whisper_stt, superbooga, and superboogav2 all break the install. Their requirements can be installed without any dependencies, with the remaining dependencies picked up during debugging. TTS in particular upgrades torch to the wrong version for the Intel extension.

Installation notes

The last two items are just standard things I do with a fresh install or a new graphics card. They may no longer be required. If you've already installed them, check for updates. Intel started 2024 with a lot of updates.

Test machine details

  • Ubuntu 23.10
  • 6.5.0.14.16 generic Linux
  • i7-13700k CPU (runs the display)
  • Intel Arc A770 (not used for display)

Bash scripts

Below are 2 bash scripts: install_arch.sh and run_arch.sh. They need to be saved or symlinked into the text-generation-webui directory.

Getting started

  1. Download or clone a fresh copy of Oobabooga.
  2. Save the scripts below into text-generation-webui. They should be in the same folder as one_click.py, cmd_linux.sh, etc.
  3. Make them executable (a chmod sketch follows after this list), then run the installer.
    cd text-generation-webui
    ./install_arch.sh
  4. Check your hardware info with clinfo.
    clinfo -l
  5. In run_arch.sh, find GGML_OPENCL_PLATFORM and change it to your platform number. Then change GGML_OPENCL_DEVICE to your device name. Save the file.
  6. Start the server with run_arch.sh. This uses whatever flags you have saved in CMD_FLAGS.txt. You can also pass flags such as --listen --api to the script.
    ./run_arch.sh
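
Step 3 presumably amounts to something like this before running the installer:

cd text-generation-webui
chmod +x install_arch.sh run_arch.sh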

Both scripts below are also uploaded to GitHub. This is only a starting point; changes are welcome. Once it's right in bash, we can decide whether to integrate it with oobabooga's start_linux.sh, the requirements files, and one_click.py.

install_arch.sh

#!/bin/bash

# Check if the virtual environment already exists
if [[ ! -d "venv" ]]; then
    # Create the virtual environment
    python -m venv venv
fi

# Activate the virtual environment
source venv/bin/activate

# Intel extension for transformers recently added Arc support.
# See https://github.com/intel/intel-extension-for-transformers/blob/main/intel_extension_for_transformers/neural_chat/docs/notebooks/build_chatbot_on_xpu.ipynb for additional notes on the dependencies.
# Working model loaders:
#  - llama.cpp
#  - transformers

pip install intel-extension-for-transformers

# Install xpu intel pytorch, not cpu.

pip install torch==2.1.0a0 torchvision==0.16.0a0 torchaudio==2.1.0a0 intel-extension-for-pytorch==2.1.10+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/

# Installing these from requirements_cpu_only.txt causes dependency conflicts with the Intel pytorch build.

# Install a few of the dependencies for the below.
pip install coloredlogs datasets sentencepiece

pip install --no-deps peft==0.7.* optimum==1.16.* optimum-intel accelerate==0.25.*

# Skip llama-cpp-python install and all installed above without their deps.

grep -v -e peft -e optimum -e accelerate -e llama-cpp-python requirements_cpu_only.txt > temp_requirements.txt

pip install -r temp_requirements.txt

# Install the cpuinfo dependency installed by one_click
pip install py-cpuinfo==9.0.0

# Use the correct cmake args for llama-cpp

export CMAKE_ARGS="-DLLAMA_CLBLAST=ON"
export FORCE_CMAKE=1

pip install --no-cache-dir llama-cpp-python

# List of extensions to exclude.
# Exclude coqui_tts because it causes torch dependency issues with Intel GPUs.
# whisper_stt and silero_tts both force pytorch updates as a dependency-of-a-dependency situation. May be possible to use without dependency installation.
cd extensions

extensions=()  # Create an empty array to store folder names
exclude_extensions=(coqui_tts silero_tts whisper_stt superbooga superboogav2)

for folder in */; do
    extensions+=($folder)
done

echo "${extensions[*]}"

install_extensions=()

for ext in "${extensions[@]}"; do
    should_exclude=false

    for exclude_ext in "${exclude_extensions[@]}"; do
        if [[ "$ext" == *"$exclude_ext"* ]]; then
            should_exclude=true
            break
        fi
    done

    if [ "$should_exclude" = false ]; then
        install_extensions+=("$ext")
    fi
done

# Print the install_extensions
# echo "${install_extensions[@]}"

for extension in "${install_extensions[@]}"; do
    cd "$extension"
    echo -e "\n\n$extension\n\n"
    # Install dependencies from requirements.txt
    if [ -e "requirements.txt" ]; then
        echo "Installing requirements in $extension"
        pip install -r requirements.txt
    else
        echo "No requirements.txt found in $extension"
    fi
    cd ..
done
# Leave the extension directory.
cd ..

# Delete the temp_requirements.txt file.

rm temp_requirements.txt
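
Not part of the guide, but a quick sanity check after install_arch.sh finishes that the XPU build of pytorch can actually see the Arc card (a sketch; torch.xpu is provided by intel-extension-for-pytorch):

source venv/bin/activate
python - <<'EOF'
import torch
import intel_extension_for_pytorch as ipex  # importing this registers the torch.xpu device
print("torch", torch.__version__, "ipex", ipex.__version__)
print("xpu available:", torch.xpu.is_available())
if torch.xpu.is_available():
    print("device 0:", torch.xpu.get_device_name(0))
EOF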

run_arch.sh

#!/bin/bash
# Uncomment if oneapi is not in your .bashrc
# source /opt/intel/oneapi/setvars.sh
# Activate the virtual environment built with install_arch.sh. (Not conda!)
source venv/bin/activate

# Change these values to match your card in clinfo -l
# Needed by llama.cpp

export GGML_OPENCL_PLATFORM=2
export GGML_OPENCL_DEVICE=A770

# Use sudo intel_gpu_top to view your card.

# Capture command-line arguments
flags_from_cmdline=$@

# Read flags from CMD_FLAGS.txt
flags_from_file=$(grep -v '^#' CMD_FLAGS.txt | grep -v '^$')
# Combine flags from both sources
all_flags="$flags_from_file $flags_from_cmdline"

# Run the Python script with the combined flags
python server.py $all_flags

Is there a native Windows solution?
The existing main start_windows didn't work for me:
llama.cpp runs on the CPU,
and the program always reminds me that I don't have CUDA.

@kcyarn
Copy link

kcyarn commented Apr 8, 2024

Not that I'm aware of. Theoretically, it's possible to install native Windows python and the Intel drivers and then use the Linux install without Anaconda shell scripts as a guide to install and run using pip. It depends on whether the Intel drivers support Windows for the necessary libraries and whether there are wheels. If you want to give it a go, I'd start with llama.cpp. If you can get it running natively on the Windows side, move on to llama-cpp-python. Once you have that running (I used a jupyter notebook when I was troubleshooting this), then you have the foundation for oobabooga.

The WSL2 solutions work, but they're really slow. I suspect WSL needs a major kernel update. It flies in Ubuntu, which is my daily driver.
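
A minimal smoke test for the llama-cpp-python step mentioned above (a sketch: the model path is a placeholder, and it assumes your llama-cpp-python wheel was built with a GPU backend such as CLBlast or SYCL):

# Placeholder model path; any local gguf works. Watch the load log for offload messages.
python - <<'EOF'
from llama_cpp import Llama

# n_gpu_layers=-1 asks llama.cpp to offload as many layers as the backend allows.
llm = Llama(model_path="models/mistral-7b-instruct-v0.2.Q5_K_M.gguf", n_gpu_layers=-1)
out = llm("Q: Name one Intel discrete GPU.\nA:", max_tokens=24)
print(out["choices"][0]["text"])
EOF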

@DDXDB
Copy link

DDXDB commented Apr 15, 2024

Not that I'm aware of. Theoretically, it's possible to install native Windows python and the Intel drivers and then use the Linux install without Anaconda shell scripts as a guide to install and run using pip. It depends on whether the Intel drivers support Windows for the necessary libraries and whether there are wheels. If you want to give it a go, I'd start with llama.cpp. If you can get it running natively on the Windows side, move on to llama-cpp-python. Once you have that running (I used a jupyter notebook when I was troubleshooting this), then you have the foundation for oobabooga.

The WSL2 solutions work, but they're really slow. I suspect WSL needs a major kernel update. It flies in Ubuntu, which is my daily driver.

I tried compiling llama-cpp-python for SYCL and swapping it in for the webui's bundled llama-cpp-python, but it didn't work.
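
For reference, a SYCL build of llama-cpp-python for the webui looks roughly like the sketch below. The flag names follow upstream llama.cpp's SYCL documentation from around that time and have changed in later releases, so treat them as assumptions and check the current docs:

source /opt/intel/oneapi/setvars.sh    # icx/icpx compilers and the SYCL runtime
source venv/bin/activate

pip uninstall -y llama-cpp-python
export CMAKE_ARGS="-DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx"
export FORCE_CMAKE=1
pip install --no-cache-dir --force-reinstall llama-cpp-python

# Pick the Arc card through Level Zero before starting the webui.
export ONEAPI_DEVICE_SELECTOR=level_zero:0
./run_arch.sh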

@github-actions github-actions bot added the stale label Jun 14, 2024
Copy link

This issue has been closed due to inactivity for 2 months. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.
