
AMD thread #3759

Open

oobabooga opened this issue Aug 30, 2023 · 270 comments

@oobabooga
Owner

oobabooga commented Aug 30, 2023

This thread is dedicated to discussing the setup of the webui on AMD GPUs.

You are welcome to ask questions as well as share your experiences, tips, and insights to make the process easier for all AMD users.

@oobabooga oobabooga pinned this issue Aug 30, 2023
@MistakingManx

Why no AMD for Windows?

@BarfingLemurs

@MistakingManx there is, but you have to DIY a llama-cpp-python build. It will be harder to set up than on Linux.

@lufixSch

Does anyone have a working AutoGPTQ setup?

Mine was really slow when I installed the wheel: https://github.com/PanQiWei/AutoGPTQ/releases/download/v0.4.2/auto_gptq-0.4.2+rocm5.4.2-cp310-cp310-linux_x86_64.whl

When building from source, the text generation is much faster but the output is just gibberish.

I am running an RX 6750 XT, if that is important.

@MistakingManx

@MistakingManx there is, but you have to DIY a llama-cpp-python build. It will be harder to set up than on Linux.

Why exactly do models prefer a GPU over a CPU? Mine runs quickly on the CPU, but OBS kills it because OBS uses so much of the CPU.

@BarfingLemurs

Why exactly do models prefer

It's the users who prefer GPUs: an AMD GPU comparable to a 3090 can run a 34B model at around 20 t/s.

@MistakingManx

MistakingManx commented Sep 3, 2023

I have an AMD Radeon RX 5500 XT, is that good?
My CPU spits out fully completed responses within 6 seconds when it isn't stressed by OBS.
Otherwise it takes around 35 seconds. If I could speed that up with my GPU, I'd say it's worth the setup.

@CNR0706

CNR0706 commented Sep 3, 2023

I'm having trouble getting the WebUI to even launch. I'm using ROCm 6.1 on openSUSE Tumbleweed Linux with a 6700XT.

I used the one-click installer to set it up (and I selected ROCm support), but after the installation finished it just threw an error:

cnr07@opensuse-linux-gpc:~/oobabooga_linux> ./start_linux.sh
Traceback (most recent call last):
  File "/home/cnr07/oobabooga_linux/text-generation-webui/server.py", line 28, in <module>
    from modules import (
  File "/home/cnr07/oobabooga_linux/text-generation-webui/modules/training.py", line 21, in <module>
    from peft import (
  File "/home/cnr07/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/peft/__init__.py", line 22, in <module>
    from .auto import (
  File "/home/cnr07/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/peft/auto.py", line 31, in <module>
    from .mapping import MODEL_TYPE_TO_PEFT_MODEL_MAPPING
  File "/home/cnr07/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/peft/mapping.py", line 23, in <module>
    from .peft_model import (
  File "/home/cnr07/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/peft/peft_model.py", line 38, in <module>
    from .tuners import (
  File "/home/cnr07/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/peft/tuners/__init__.py", line 21, in <module>
    from .lora import LoraConfig, LoraModel
  File "/home/cnr07/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/peft/tuners/lora.py", line 45, in <module>
    import bitsandbytes as bnb
  File "/home/cnr07/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/bitsandbytes/__init__.py", line 6, in <module>
    from . import cuda_setup, utils, research
  File "/home/cnr07/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/bitsandbytes/research/__init__.py", line 1, in <module>
    from . import nn
  File "/home/cnr07/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/bitsandbytes/research/nn/__init__.py", line 1, in <module>
    from .modules import LinearFP8Mixed, LinearFP8Global
  File "/home/cnr07/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/bitsandbytes/research/nn/modules.py", line 8, in <module>
    from bitsandbytes.optim import GlobalOptimManager
  File "/home/cnr07/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/bitsandbytes/optim/__init__.py", line 6, in <module>
    from bitsandbytes.cextension import COMPILED_WITH_CUDA
  File "/home/cnr07/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/bitsandbytes/cextension.py", line 13, in <module>
    setup.run_cuda_setup()
  File "/home/cnr07/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py", line 120, in run_cuda_setup
    binary_name, cudart_path, cc, cuda_version_string = evaluate_cuda_setup()
  File "/home/cnr07/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py", line 341, in evaluate_cuda_setup
    cuda_version_string = get_cuda_version()
  File "/home/cnr07/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py", line 311, in get_cuda_version
    major, minor = map(int, torch.version.cuda.split("."))
AttributeError: 'NoneType' object has no attribute 'split'
--- System ---
GPU: RX 6700XT
CPU: R5 3600
RAM: 16 GiB
OS: openSuSE Tumbleweed (up to date)
Kernel: Linux 6.4.11-1-default
GPU Driver: AMDGPU FOSS Kernel driver, full Mesa 23.1.6
ROCm: 6.1, from AMD's SuSE repo

@henrittp

henrittp commented Sep 3, 2023

I'm having trouble getting the WebUI to even launch. I'm using ROCm 6.1 on openSUSE Tumbleweed Linux with a 6700XT. [...] AttributeError: 'NoneType' object has no attribute 'split'

Same issue here, still no solution for me. Can anyone shed some light on this? Thanks in advance.

@CNR0706

CNR0706 commented Sep 3, 2023

Okay, so this is definitely not ideal, but I found that VERY carefully following the manual installation guide and then uninstalling bitsandbytes makes it work. I'm still figuring things out, but at least it works now.

@henrittp

henrittp commented Sep 3, 2023

then uninstalling bitsandbytes makes it work

So you installed that modified version of bitsandbytes for ROCm? Or...? What exactly did you do? Thanks in advance.

@henrittp

henrittp commented Sep 3, 2023

@CNR0706 I managed to install a modified version of bitsandbytes for ROCm. Just follow this tutorial and you should be fine: YT Video. That way you can leverage everything this lib offers (or almost everything, but anyway...).

@lufixSch

lufixSch commented Sep 3, 2023

@CNR0706 I managed to install a modified version of bitsandbytes for ROCm. Just follow this tutorial and you should be fine: YT Video. That way you can leverage everything this lib offers (or almost everything, but anyway...).

I am not sure which version is newer, but I used https://github.com/agrocylo/bitsandbytes-rocm.
You need to build it from source with the following commands:

git clone git@github.com:agrocylo/bitsandbytes-rocm.git
cd bitsandbytes-rocm/
export PATH=/opt/rocm/bin:$PATH #Add ROCm to $PATH
export HSA_OVERRIDE_GFX_VERSION=10.3.0 HCC_AMDGPU_TARGET=gfx1030
make hip
python setup.py install

Make sure the environment variables are also set when you start the webui. Depending on your GPU, you might need to change the GPU target or GFX version.
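If you are not sure which gfx target your card reports, one way to look it up (a rough sketch; exact output varies by card and ROCm version):

rocminfo | grep -o 'gfx[0-9a-f]*' | sort -u   # e.g. gfx1031 for an RX 6700 XT
# RDNA 2 (gfx103x) cards are usually overridden with HSA_OVERRIDE_GFX_VERSION=10.3.0, RDNA 3 (gfx110x) with 11.0.0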

@lufixSch

lufixSch commented Sep 3, 2023

I have an AMD Radeon RX 5500 XT, is that good? My CPU spits out fully completed responses within 6 seconds when it isn't stressed by OBS. Otherwise it takes around 35 seconds. If I could speed that up with my GPU, I'd say it's worth the setup.

Saying it takes 6 seconds isn't that helpful for judging the performance you're getting, because that depends on the length of the output (and on which model you are using). Take a look at the console: after every generation it prints the generation speed in t/s.

With my RX 6750 XT I get about 35 t/s with a 7B GPTQ model.

@lufixSch

lufixSch commented Sep 3, 2023

@henrittp, @CNR0706 Did you try setting up AutoGPTQ? Did it work for you?

@RBNXI

RBNXI commented Sep 7, 2023

I'm getting the AttributeError: 'NoneType' object has no attribute 'split' error too...
Has ANYONE managed to run this with ROCm at all? I'm starting to think that AMD is just useless for this stuff.

@lufixSch

lufixSch commented Sep 7, 2023

I have AttributeError: 'NoneType' object has no attribute 'split' error too...

@RBNXI This is caused by bitsandbytes. You need to install a specific version. Take a look at my comment above.

Has ANYONE managed to run this with ROCM at all?, I'm starting to think that AMD is just useless for this stuff

Yes, it worked really well on my PC until I broke my installation with an update of the repository.
I am also running Stable Diffusion on my PC with AUTOMATIC1111 and it works great. The AUTOMATIC1111 setup is much easier, because the install script takes care of everything.

I plan on improving the one click installer and/or the setup guide of the oobabooga webui for AMD to make the setup easier, if I ever get it running again :)

@RBNXI

RBNXI commented Sep 7, 2023

I plan on improving the one click installer and/or the setup guide of the oobabooga webui for AMD to make the setup easier, if I ever get it running again :)

Cool, I'll be waiting for that then.

@RBNXI This is caused by bitsandbytes. You need to install a specific Version. Take a look at my comment above.

I saw it and tried to build it, but it gave an error and I got tired of trying stuff. I just thought, "well, having to do so many steps and then hitting so many errors must mean it's just not ready yet...". But I could try again another day when I have more time and see if I can fix that error, thanks.

@lufixSch

lufixSch commented Sep 7, 2023

@RBNXI What error did you get?
Make sure the repo is located on a path without spaces; this seems to cause issues sometimes. You also need the rocm-hip-sdk package (at least on Arch Linux that's what it is called).
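For reference, a hedged sketch of the package installs (exact package names may differ between distros and repos):

sudo pacman -S rocm-hip-sdk rocminfo   # Arch / Manjaro
sudo apt install rocm-hip-sdk rocminfo # Ubuntu with AMD's ROCm apt repository enabled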

"well, having to do so many steps and then having so many errors must mean it's just not ready yet..."

Yes I can understand that. The setup with NVIDIA is definitely easier.

@RBNXI

RBNXI commented Sep 7, 2023

@RBNXI What Error did you get? Make sure the repo is located on a path without spaces. This seems to cause issues sometimes. And you need the rocm-hip-sdk package (at least on arch linux it is called that way)

I don't remember the error, I'm sorry. But I have a question for when I try again: the command you used to clone (git clone git@github.com:agrocylo/bitsandbytes-rocm.git) gave me an error. Is it OK to just clone with the default link to the repo? It said the link you used is private or something like that.

@lufixSch

lufixSch commented Sep 7, 2023

Yes, of course you can use the link from the repo directly. You probably mean this one: https://github.com/agrocylo/bitsandbytes-rocm.git (the git@ SSH URL only works if you have an SSH key set up with GitHub).

@RBNXI

RBNXI commented Sep 8, 2023

I tried again, same result. I followed the installation tutorial, everything works fine, then I run it and get the split error. Then I compiled bitsandbytes from that repo (it worked this time), tried to run again, and got the same split error again...
Edit: I managed to fix that error and now everything is apparently working, but when I try to load a model it says: assert self.model is not None
Errors are never ending...

@containerblaq1

Installing bitsandbytes-rocm is the only way I've been able to make this work. The new install doesn't seem to work for the 7900 XTX.

@lufixSch

lufixSch commented Sep 9, 2023

AMD Setup Step-by-Step Guide (WIP)

I finally got my setup working again (by reinstalling everything). Here is a step by step guide on how I got it running:

I tested all steps on Manjaro, but they should work on other Linux distros. I have no idea how these steps transfer to Windows; please leave a comment if you have a solution for Windows.

NOTE: At the start of each step I assume you have the terminal opened at the root of the project and that you have ROCm installed (you need the rocm-hip-sdk package).
Furthermore, consider creating a virtual environment (for example with miniconda or venv) and activating it.

NOTE: If you have a 7xxx-generation AMD GPU, please read the notes at the end of this guide.

Step 1: Install dependencies (should be similar to the one-click installer except for the last step)

  1. pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2
  2. pip install -r requirements_nocuda.txt
  3. export HSA_OVERRIDE_GFX_VERSION=10.3.0, export HCC_AMDGPU_TARGET=gfx1030 and export PATH=/opt/rocm/bin:$PATH (consider adding those lines to your .bash_profile, .zprofile or .profile as you need to run them every time you start the webui) (the gfx version might change depending on your GPU -> https://www.llvm.org/docs/AMDGPUUsage.html#processors)

If you get an error installing torch, try running pip install -r requirements_nocuda.txt first. After this, run the torch install command with the --force-reinstall option.
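A quick sanity check that the ROCm build of torch actually ended up in the environment and can see the GPU (a sketch; the reported device name will differ):

python -c "import torch; print(torch.__version__)"   # should end in +rocm5.4.2, not +cuXXX
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"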

Step 2: Fix bitsandbytes

This step did not work properly for me.
If you only want to get the webui working and don't need bitsandbytes on your GPU, just run pip install bitsandbytes==0.38.1. I mostly run GPTQ models and this was fine for me.
It seems like the official bitsandbytes project is working on supporting ROCm, but it will take a while until there is a working version.

  1. mkdir repositories && cd repositories
  2. git clone https://github.com/broncotc/bitsandbytes-rocm.git (or another fork listed below)
  3. make hip
  4. python setup.py install

I found the following forks which should work for ROCm but got none of them working. If you find a working version please give some feedback.

Step 3: Install AutoGPTQ

This is only necessary if you want to run GPTQ models.

  1. mkdir repositories && cd repositories
  2. git clone https://github.com/PanQiWei/AutoGPTQ.git && cd AutoGPTQ
  3. ROCM_VERSION=5.4.2 pip install -v .

If the installation fails, try applying the patch provided by this article.
Run git apply with the patch provided below as its argument (see the example after the diff).

diff --git a/autogptq_cuda/exllama/hip_compat.cuh b/autogptq_cuda/exllama/hip_compat.cuh
index 5cd2e85..79e0930 100644
--- a/autogptq_cuda/exllama/hip_compat.cuh
+++ b/autogptq_cuda/exllama/hip_compat.cuh
@@ -46,4 +46,6 @@ __host__ __forceinline__ hipblasStatus_t __compat_hipblasHgemm(hipblasHandle_t
 #define rocblas_set_stream hipblasSetStream
 #define rocblas_hgemm __compat_hipblasHgemm
 
+#define hipblasHgemm __compat_hipblasHgemm
+
 #endif
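For example (hip_compat.patch is just a hypothetical filename for the diff above, saved into the AutoGPTQ folder):

cd repositories/AutoGPTQ
git apply hip_compat.patch
ROCM_VERSION=5.4.2 pip install -v .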

Step 4: Exllama

This is only necessary if you want to use this model loader (faster for GPTQ models).

  1. mkdir repositories && cd repositories
  2. git clone https://github.com/turboderp/exllama && cd exllama
  3. pip install -r requirements.txt

Step 4.5: ExllamaV2

ExllamaV2 works out of the box and will be installed automatically when installing requirements_nocuda.txt

If you get an error running ExllamaV2, try installing the nightly version of torch for ROCm 5.6 (it should be released as a stable version soon):

pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm5.6 --force-reinstall

Step 5: llama-cpp-python

This did not work for me today, but it worked before (not sure what I did wrong today).

  1. CMAKE_ARGS="-DLLAMA_HIPBLAS=on -DCMAKE_CXX_FLAGS='-fPIC'" FORCE_CMAKE=1 CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ pip install llama-cpp-python

You might need to add the --no-cache-dir and --force-reinstall options if you installed llama-cpp-python before.
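A minimal way to check that the resulting build actually offloads to the GPU, assuming a GGUF model is already in the models folder (the file name is a placeholder):

python server.py --model your-model.Q4_K_M.gguf --n-gpu-layers 35
# the console should report layers being offloaded to the GPU (e.g. "offloaded 35/41 layers to GPU") instead of a pure CPU load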

I hope you can get it working with this guide :) I would appreciate some feedback on how this guide worked for you, so we can create a complete and robust setup guide for AMD devices (and maybe even update the one-click installer based on it).

Notes on 7xxx AMD GPUs

Remember that you have to change the GFX version for the environment variables: export HSA_OVERRIDE_GFX_VERSION=11.0.0, export HCC_AMDGPU_TARGET=gfx1100

As described in this article, you should make sure to install/set up ROCm without OpenCL, as OpenCL might cause problems with HIP.

You also need to install the nightly version of torch for ROCm 5.6 instead of ROCm 5.4.2 (it should be released as a stable version soon):

pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm5.6

@lufixSch

lufixSch commented Sep 9, 2023

I try to load a model and says: assert self.model is not None Errors are never ending...

@RBNXI What model are you using? Which loader are you using? Usually this error means the loader failed to load the model.

As explained in my guide above, you have to do extra steps for AutoGPTQ and Exllama/Exllama_HF.

Also note that with AutoGPTQ you often have to set wbits and groupsize explicitly, otherwise it will fail.
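For example, something like this should work for a typical 4-bit/128g GPTQ quant (the model folder name is a placeholder):

python server.py --loader autogptq --model TheBloke_Some-Model-GPTQ --wbits 4 --groupsize 128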

@RBNXI

RBNXI commented Sep 9, 2023

Awesome guide, thanks, I'll try it when I can.
You mentioned that llama-cpp-python didn't work for you today and you don't know why. The model I was using was one of those; I think there's currently a known bug that keeps us from loading llama models, could that be the problem?
Also, I think my GPU doesn't appear here: https://www.llvm.org/docs/AMDGPUUsage.html#processors
I have an RX 6600, is that one also gfx1030?
Edit: I was able to load the model with llama.cpp, but it runs on the CPU. Do I have to do anything special for it to run on the GPU? I launch it with this: python server.py --chat --api --auto-devices --n-gpu-layers 1000000000 --n_ctx 4096 --mlock --verbose --model mythomax-l2-13b.Q5_K_M.gguf
Please don't tell me my GPU doesn't support ROCm...

I tried different --n-gpu-layers values, same result.

Also, AutoGPTQ installation failed with

 Total number of replaced kernel launches: 4
  running clean
  removing 'build/temp.linux-x86_64-cpython-310' (and everything under it)
  removing 'build/lib.linux-x86_64-cpython-310' (and everything under it)
  'build/bdist.linux-x86_64' does not exist -- can't clean it
  'build/scripts-3.10' does not exist -- can't clean it
  removing 'build'
Failed to build auto-gptq
ERROR: Could not build wheels for auto-gptq, which is required to install pyproject.toml-based projects

Edit 2: I tried running a GPTQ model anyway, and it starts to load into VRAM, so the GPU is detected, but it fails with:

Traceback (most recent call last):
  File "/run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/text-generation-webui/modules/ui_model_menu.py", line 196, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name, loader)
  File "/run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/text-generation-webui/modules/models.py", line 79, in load_model
    output = load_func_map[loader](model_name)
  File "/run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/text-generation-webui/modules/models.py", line 320, in AutoGPTQ_loader
    return modules.AutoGPTQ_loader.load_quantized(model_name)
  File "/run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/text-generation-webui/modules/AutoGPTQ_loader.py", line 57, in load_quantized
    model = AutoGPTQForCausalLM.from_quantized(path_to_model, **params)
  File "/run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/miniconda/envs/textgen/lib/python3.10/site-packages/auto_gptq/modeling/auto.py", line 108, in from_quantized
    return quant_func(
  File "/run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/miniconda/envs/textgen/lib/python3.10/site-packages/auto_gptq/modeling/_base.py", line 875, in from_quantized
    accelerate.utils.modeling.load_checkpoint_in_model(
  File "/run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/miniconda/envs/textgen/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 1392, in load_checkpoint_in_model
    set_module_tensor_to_device(
  File "/run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/miniconda/envs/textgen/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 281, in set_module_tensor_to_device
    raise ValueError(
ValueError: Trying to set a tensor of shape torch.Size([108, 640]) in "qzeros" (which has shape torch.Size([432, 640])), this look incorrect.

@lufixSch

@RBNXI I found this issue in the ROCm repo discussing the RX 6600. According to it, the RX 6600 should work; usually gfx1030 works for all 6xxx cards. You can check whether your GPU is detected by running rocminfo and clinfo. Both commands should mention your GPU.
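For example (a rough sketch; the exact output wording differs between ROCm versions):

rocminfo | grep -i 'marketing name'   # should list the Radeon card
clinfo | grep -i 'device name'        # should list it as an OpenCL device too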

llama.cpp probably runs on the CPU because the prebuilt Python package is only built with CPU support. This is why you need to install it with the command from my guide.

Regarding AutoGPTQ: I think you only copied the last lines, not the actual error that broke the installation, so I am not sure what the problem is. Maybe check your ROCm version and change the ROCM_VERSION variable accordingly.
Did you install the rocm-hip-sdk package (or whatever it is called on your distro)? What Linux distro are you running, by the way?

I usually run the webui with python server.py and load the models using the GUI. This way the GUI usually chooses the default parameters by itself and it is easier to get things working. I should also note that I run the newest version from the main branch. If you are using the one-click installer v1.5, you're using the old requirements.txt, which might explain why llama.cpp with CPU support is installed and why AutoGPTQ kind of works even though you did not install it.

@RBNXI

RBNXI commented Sep 10, 2023

I don't have rocminfo installed, should I? But clinfo does show my GPU.

I'll try to reinstall again and see if it works now.

I did install rocm-hip-sdk. And I'm using Arch.

Also I'm running it in a miniconda environment, is that a problem?

Also, the ROCm I have installed is from the Arch repository, I think it's 5.6.0, is that a problem? If I change the version in the command (pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2 -> pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.6.0) it says:

ERROR: Could not find a version that satisfies the requirement torchvision (from versions: none)
ERROR: No matching distribution found for torchvision

@RBNXI

RBNXI commented Sep 10, 2023

I'm trying to install, still errors everywhere. First of all, the bitsandbytes installation fails, so I have to use the pip one.
Then I try to install AutoGPTQ and can't; it gives this error (tried with both ROCm versions):


(textgen) [ruben@ruben AutoGPTQ]$ ROCM_VERSION=5.6.0 pip install -v .
Using pip 23.2.1 from /run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/miniconda/envs/textgen/lib/python3.10/site-packages/pip (python 3.10)
Processing /run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/text-generation-webui/repositories/AutoGPTQ
  Running command python setup.py egg_info
  Trying to compile auto-gptq for RoCm, but PyTorch 2.0.1+cu117 is installed without RoCm support.
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 255
  ╰─> See above for output.
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  full command: /run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/miniconda/envs/textgen/bin/python -c '
  exec(compile('"'"''"'"''"'"'
  # This is <pip-setuptools-caller> -- a caller that pip uses to run setup.py
  #
  # - It imports setuptools before invoking setup.py, to enable projects that directly
  #   import from `distutils.core` to work with newer packaging standards.
  # - It provides a clear error message when setuptools is not installed.
  # - It sets `sys.argv[0]` to the underlying `setup.py`, when invoking `setup.py` so
  #   setuptools doesn'"'"'t think the script is `-c`. This avoids the following warning:
  #     manifest_maker: standard file '"'"'-c'"'"' not found".
  # - It generates a shim setup.py, for handling setup.cfg-only projects.
  import os, sys, tokenize
  
  try:
      import setuptools
  except ImportError as error:
      print(
          "ERROR: Can not execute `setup.py` since setuptools is not available in "
          "the build environment.",
          file=sys.stderr,
      )
      sys.exit(1)
  
  __file__ = %r
  sys.argv[0] = __file__
  
  if os.path.exists(__file__):
      filename = __file__
      with tokenize.open(__file__) as f:
          setup_py_code = f.read()
  else:
      filename = "<auto-generated setuptools caller>"
      setup_py_code = "from setuptools import setup; setup()"
  
  exec(compile(setup_py_code, filename, "exec"))
  '"'"''"'"''"'"' % ('"'"'/run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/text-generation-webui/repositories/AutoGPTQ/setup.py'"'"',), "<pip-setuptools-caller>", "exec"))' egg_info --egg-base /tmp/pip-pip-egg-info-spo0oczo
  cwd: /run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/text-generation-webui/repositories/AutoGPTQ/
  Preparing metadata (setup.py) ... error
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
(textgen) [ruben@ruben AutoGPTQ]$ ROCM_VERSION=5.4.2 pip install -v .
Using pip 23.2.1 from /run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/miniconda/envs/textgen/lib/python3.10/site-packages/pip (python 3.10)
Processing /run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/text-generation-webui/repositories/AutoGPTQ
  Running command python setup.py egg_info
  Trying to compile auto-gptq for RoCm, but PyTorch 2.0.1+cu117 is installed without RoCm support.
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 255
  ╰─> See above for output.
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  full command: /run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/miniconda/envs/textgen/bin/python -c '
  exec(compile('"'"''"'"''"'"'
  # This is <pip-setuptools-caller> -- a caller that pip uses to run setup.py
  #
  # - It imports setuptools before invoking setup.py, to enable projects that directly
  #   import from `distutils.core` to work with newer packaging standards.
  # - It provides a clear error message when setuptools is not installed.
  # - It sets `sys.argv[0]` to the underlying `setup.py`, when invoking `setup.py` so
  #   setuptools doesn'"'"'t think the script is `-c`. This avoids the following warning:
  #     manifest_maker: standard file '"'"'-c'"'"' not found".
  # - It generates a shim setup.py, for handling setup.cfg-only projects.
  import os, sys, tokenize
  
  try:
      import setuptools
  except ImportError as error:
      print(
          "ERROR: Can not execute `setup.py` since setuptools is not available in "
          "the build environment.",
          file=sys.stderr,
      )
      sys.exit(1)
  
  __file__ = %r
  sys.argv[0] = __file__
  
  if os.path.exists(__file__):
      filename = __file__
      with tokenize.open(__file__) as f:
          setup_py_code = f.read()
  else:
      filename = "<auto-generated setuptools caller>"
      setup_py_code = "from setuptools import setup; setup()"
  
  exec(compile(setup_py_code, filename, "exec"))
  '"'"''"'"''"'"' % ('"'"'/run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/text-generation-webui/repositories/AutoGPTQ/setup.py'"'"',), "<pip-setuptools-caller>", "exec"))' egg_info --egg-base /tmp/pip-pip-egg-info-3151ooou
  cwd: /run/media/ruben/Prime/CharacterAI/oobabooga_linux/text-generation-webui/text-generation-webui/repositories/AutoGPTQ/
  Preparing metadata (setup.py) ... error
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

What am I doing wrong? I'm following the guide... this is so frustrating... Could it be that I have to install ROCm 5.4.2 from some rare repository, or compile it myself, or something obscure like that? It says PyTorch is installed without ROCm support, even though I installed it with pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2

Edit: be careful with the order of steps 1 and 2 in the install-dependencies section; if you run pip install -r requirements_nocuda.txt first, it will install PyTorch without ROCm support...
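A one-liner to confirm which build ended up installed before going any further (ROCm wheels carry a +rocm suffix instead of +cuXXX):

python -c "import torch; print(torch.__version__)"   # e.g. 2.0.1+rocm5.4.2, not 2.0.1+cu117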

@Taylantz

Taylantz commented Apr 2, 2024

@nktice

(My apologies for not posting this sooner) Here's a guide I wrote... it's for Ubuntu for AMD GPUs step-by-step commands. https://github.com/nktice/AMD-AI [ stable / rocm 5.7.3 ] https://github.com/nktice/AMD-AI/blob/main/ROCm6.0.md [ dev / rocm 6.0.2 ] I note it side-steps Oobabooga's install scripts, which I haven't used... - one place it differs is this newer bits and bytes ( compiled from source ) https://github.com/arlo-phoenix/bitsandbytes-rocm-5.6

Can you confirm that flash_attn is working for you?

I have read and tried to replicate your write-up on ROCm 6.0.2 dev, but I am unable to get flash_attn to load through exllamav2. The updated comments inside your repo seem to indicate that you got flash_attn to work.

Tried the following on an AMD 7900 XTX (tried the most minimal setup possible to isolate components):

export HSA_OVERRIDE_GFX_VERSION=11.0.0
export HCC_AMDGPU_TARGET=gfx1100
export ROCM_PATH=/opt/rocm
  • Started with a fresh Ubuntu 22.04 minimal Desktop install
  • Updated the system and installed dependencies as well as header files for current kernel
  • Installed ROCm 6.0.2 via amdgpu-install with use cases graphics, rocm, hiplibsdk and added user to video and render groups
  • Created Miniconda3 env with python=3.11
  • installed necessary torch, torchtext, triton, pytorch-triton, pytorch-triton-rocm via nightly whls for rocm
  • cloned and compiled rocm/flash_attn repo and turboderp/exllamav2 repo successfully and installed them into the conda env with pip install .
  • downloaded a exl2 model from hf and tried running it with turboderp/exllamav2/examples/inference.py
  • flash_attn is not loaded
  • manually loading flash_attn and exllamav2 inside a python shell works without problem (import flash_attn, import exllamav2)
  • applied the version fix from this AMD thread #3759 (comment)
  • now exllamav2 tries to run flash_attn when loading the model but throws an error. The stack trace shows that the error is being thrown by torch saying that amd flash_attn only works on MI250X

Without flash_attn, the exl2 model from HF (8 GB) uses around 60% VRAM, so about 14 GB, after being loaded.

I presume we are still blocked by rocm/flash_attn not being updated to upstream; see ROCm/flash-attention#35.

@nktice

nktice commented Apr 3, 2024

@Taylantz

Can you confirm that flash_attn is working for you?

As the guide does note - it had been giving me warnings, when it wasn't there...
such as when I was having errors compiling or other issues.
With all that resolved, I do not see it mentioned in the shell console.
I have yet to see signs that it's actually working... so thanks for checking.

I note that I found some versions of Ubuntu didn't like the new card, so I needed to run newer versions that included some support for the 7900s... As such I have found 23.04 and 23.10 to be functional. [ Note there is a line that links the old packages in through the etc sources.
I'm also running ROCm 6.0.3 and it works just fine. ]

While I am at it, I'll maybe save folks some time and frustration: the last time I tried (as of a few days ago) the newest Ubuntu won't work. I wasn't able to get through the install for 24.04; previously, when I managed to get it installed, the drivers didn't work. The errors it gave looked like kernel issues that weren't supported... now that may have changed with the newest ROCm drivers that are on their site...

@Taylantz

* Installed ROCm 6.0.2 via amdgpu-install with use cases graphics, rocm, hiplibsdk and added user to video and render groups

Do you mean you used their automatic install driver system?
[ My instructions show how to add their packages through apt / sources, this is different than using their installer script program system... I initially tried using their installer, but things got complex... ]

@Taylantz

* cloned and compiled rocm/flash_attn repo and turboderp/exllamav2 repo **successfully** and installed them into the conda env with pip install .

Do these appear in pip list in the conda environment that you're running from?

@Taylantz

* downloaded a exl2 model from hf and tried running it with turboderp/exllamav2/examples/inference.py

Can you share which one(s)? Here are some that work for me, that I've tested:
https://huggingface.co/n810x
In particular, this one is 2.8 GB so it may work for you:
https://huggingface.co/n810x/Mistral-7B-Instruct-v0.2-3Bit-exl2
With this model loaded, memory goes up to near 7.5 GB [viewed in nvtop], with the following output in the shell console:

00:07:55-764239 INFO     Loading "Mistral-7B-Instruct-v0.2-3bit-exl2"           
00:08:11-392993 INFO     LOADER: "ExLlamav2"                                    
00:08:11-393687 INFO     TRUNCATION LENGTH: 32768                               
00:08:11-394145 INFO     INSTRUCTION TEMPLATE: "Custom (obtained from model     
                         metadata)"                                             
00:08:11-394617 INFO     Loaded the model in 15.63 seconds.     

I tried the same thing with "no_flash_attn" checked, and got the same thing...
So it does appear that it isn't being used, although it is there.
Thanks for the heads up, I'd love to have it running, and will keep at it.

@Taylantz

* flash_attn is not loaded

Is that what it says in the TGW shell console, as if it's not installed?

Are you sure that you are running inside the conda environment where it is installed?
I, for example, found that my script needed to be called with the source command; the way shells work, they don't play well with conda from inside shell scripts.
So when I run it, I type the following (after the install):

source run.sh

@Taylantz

* applied the version fix from this [AMD thread #3759 (comment)](https://github.com/oobabooga/text-generation-webui/issues/3759#issuecomment-1889069311)

I went and tried this, reinstalled, and alas it still says the same thing.

@Taylantz

* now exllamav2 tries to run flash_attn when loading the model but throws an error. The stack trace shows that the error is being thrown by torch saying that amd flash_attn only works on MI250X

I have made a thread on exllamav2 github about the issue...
turboderp/exllamav2#397

@Taylantz

I presume we are still bound by rocm/flash_attn not being updated to upstream see ROCm/flash-attention#35

It appears it's stuck waiting for AMD folks to do an update...
I didn't see any motivation on either side to integrate it.
Will we see a tool to mimic updates like theirs that we could run on
the main project fork? Seems like that'd have many uses and help lots of folks.

@Taylantz

Taylantz commented Apr 4, 2024

@nktice
Thanks for opening the issue with exllamav2 and getting clarification from turboderp!

Will we see a tool to mimic updates like theirs that we could run on
the main project fork? Seems like that'd have many uses and help lots of folks.

The ROCm Flash Attention repo is waiting for the next AMD Composable Kernel update to update their version of Flash Attention as mentioned here ROCm/flash-attention#35 (comment)

My guess is that without the updated Composable Kernel it would be quite hard to create a fork that bumps up the Flash Attention version?
Maybe in the future repos like https://github.com/tinygrad/tinygrad, which compose their own agnostic kernels, could be useful (because they skip AMD user-space and kernel-space drivers).

I am hoping it will be soon, so I can fully use my current setup of 3x 7900 XTX.

@RSAStudioGames

Could anyone help me out with this?
I'm on an R7525 with 3x Mi100s using Ubuntu Server 22.04

21:15:37-307119 INFO     llama.cpp weights detected: "models/dolphin-2_6-phi-2.Q6_K.gguf"
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 3 ROCm devices:
  Device 0: AMD Instinct MI100, compute capability 9.0, VMM: no
  Device 1: AMD Instinct MI100, compute capability 9.0, VMM: no
  Device 2: AMD Instinct MI100, compute capability 9.0, VMM: no
Segmentation fault (core dumped)

No matter what type of model I load, I get a fault of some kind unless I load to CPU only. I've been trying to get this working for hours and I have no clue what's going wrong.

@RealArtsn

These are my steps for running on Arch (ROCm 6.0)

I'm using a 6700xt.
Be sure to install rocm-core and rocm-hip-sdk.

git clone https://github.com/oobabooga/text-generation-webui.git
cd text-generation-webui

Remove the # from the following lines in one_click.py:

# os.environ["ROCM_PATH"] = '/opt/rocm'
# os.environ["HSA_OVERRIDE_GFX_VERSION"] = '10.3.0'

Also replace the index URL for PyTorch in one_click.py for ROCm 6.0:

sed -i 's/TORCH_VERSION = "2.2.1"/TORCH_VERSION = "2.4.0.dev20240413+rocm6.0"/' one_click.py
sed -i 's/0.17.1/0.19.0.dev20240413+rocm6.0/g' one_click.py
sed -i 's/TORCHAUDIO_VERSION = "2.2.1"/TORCHAUDIO_VERSION = "2.2.0.dev20240413+rocm6.0"/' one_click.py
sed -i 's/whl\/rocm5.6/whl\/nightly\/rocm6.0/g' one_click.py

I'm using exllamav2, and I have had the most success with building from the repository.

Prevent exllamav2 from automatically installing:

sed -i '/exllamav2/d' ./requirements_amd.txt

Run the install script for text-generation-webui and exit once it's done:

./start_linux.sh

Enter the conda environment:

./cmd_linux.sh

Install exllamav2:

git clone https://github.com/turboderp/exllamav2.git
pip install -r exllamav2/requirements.txt
pip install ./exllamav2

From here I can exit the conda environment and use the program normally.

@Beinsezii

As of a few days ago, the ROCm BitsAndBytes passes tests on my machine and no longer spits out inf when I test on Llama 3.

I built a wheel with simply

git clone --recurse https://github.com/ROCm/bitsandbytes --branch rocm_enabled --depth 1
cd bitsandbytes
cmake -DCOMPUTE_BACKEND=hip -S .  # hipcc defaults to march=native
make
pip wheel --no-deps .

Which resulted in this wheel that passes all un-skipped tests on my 7900 XTX
bitsandbytes-0.44.0.dev0-cp311-cp311-linux_x86_64.zip

A lot of tests are still skipped on ROCm, so it may not work with all models. It's also wicked slow, like 1/4 of FP16 speed. But if you're VRAM-starved and can't use a different backend, it works at least.

It'd be nice if someone on RDNA 2 could try it. I don't know if the BLASLt lib will compile on those cards. If it works, maybe oobabooga could set up an action.
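If anyone wants to try it, a hedged sketch of how the wheel would be used with the webui (wheel filename and model name are placeholders):

pip install ./bitsandbytes-0.44.0.dev0-cp311-cp311-linux_x86_64.whl
python server.py --model some-fp16-model --load-in-4bit   # loads through the bitsandbytes 4-bit path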

@dgdguk

dgdguk commented Apr 24, 2024

Given #5921, which effectively breaks llama.cpp support on AMD/Intel cards, I somehow doubt that @oobabooga is particularly interested in setting up more actions.

For those who want working GPU-accelerated llama.cpp, the following commands should sort it out on Linux:

git checkout 26d822f # go to version with working llama.cpp
./cmd_linux.sh
python -c "import one_click; one_click.update_requirements(pull=False)"  # Update dependencies without updating to broken version

(Edited to go to a more stable commit)

@Beinsezii

Why so verbose?
CC='/opt/rocm/llvm/bin/clang' CXX='/opt/rocm/llvm/bin/clang++' CMAKE_ARGS="-DLLAMA_HIPBLAS=ON -DCMAKE_BUILD_TYPE=Release -DAMDGPU_TARGETS=native" pip install -U llama-cpp-python

@dgdguk

dgdguk commented Apr 24, 2024

I'm assuming not everyone has the full ROCm stack set up on their computer; the entire point of the project is that it's supposed to be self-contained.

@Beinsezii

Beinsezii commented Apr 24, 2024

If you don't have the ROCm SDK you'll run into issues with other libs anyway. It's installable through the native package manager on most distros nowadays. Using your distro's ROCm and the appropriate nightly/RC PyTorch to match completely fixes all the random page faults.

@dgdguk

dgdguk commented Apr 24, 2024

I think it really depends on what you're doing. Llama.cpp support was pretty darn solid, as far as I could tell, and I imagine that quite a few people would primarily be using that.

@oobabooga
Owner Author

llama.cpp wheels for AMD used to be provided by jllllll. After he stopped updating his wheels, I started running his workflows myself in a fork. At first this didn't work at all due to rate limit errors, but then I added long timeouts of 30 minutes between the sub-jobs of the workflow and it started working reliably as an overnight run of several hours.

That changed a week ago when GitHub stopped uploading the compiled wheels for my jobs, possibly because I am using too much storage (with all the Python, CUDA, ROCm, AVX, and llama-cpp-python combinations, that must have added up to many GB over time). Debugging and maintaining this takes an inordinate amount of time due to how unreliable GitHub Actions is, so I have upstreamed the responsibility to the main llama-cpp-python repository, simply due to lack of time.

I think that abetlen would probably be open to ROCm workflows if someone wants to be a hero and come up with something reliable to submit in a PR (https://github.com/abetlen/llama-cpp-python).

@dgdguk

dgdguk commented Apr 25, 2024

How likely is it that upstreaming those workflows causes the same issues to befall abetlen, assuming it is some kind of resource limit? They're already pushing 50 binaries for their releases.

@Beinsezii

Worth mentioning that Vulkan is pretty good too. Compiling llama.cpp with -DLLAMA_VULKAN=1, I only lose about 10% speed, but my GPU pulls 150 W instead of 330 W. ROCm seems way more bloated than the Mesa drivers. Older GPUs may work even better, because ROCm only really supports the RDNA 3 cards.
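For llama-cpp-python, a sketch of the equivalent Vulkan build (flag name per the llama.cpp CMake options of that era; assumes the Vulkan SDK/headers are installed):

CMAKE_ARGS="-DLLAMA_VULKAN=on" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir --force-reinstall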

@netrunnereve
Contributor

How likely is it that upstreaming those workflows causes the same issues to befall abetlen, assuming it is some kind of resource limit? They're already pushing 50 binaries for their releases.

Likely that'll happen as well. You could work around the compute limits by having one job build multiple wheels sequentially, but that would be annoying to manage and you might still hit the storage limits. Personally I feel that wheels should only be provided for Windows users, due to the difficulty of compiling on that platform; Mac and Linux users can build it themselves. It's not hard to do if you have all the dependencies installed.

@Beinsezii

Most of the wheels can be built with a single command on Linux, assuming torch and nvcc/hipcc are visible, so maybe it could be delegated to an install script?

@dgdguk

dgdguk commented Apr 29, 2024

@oobabooga Following on from @Beinsezii's comment - Is there a case for modularising this project into two components? Specifically:

a) The Gradio application
b) The Python distribution you've set up to run it

I'm specifically thinking that this may be worth doing because if the Gradio app is packaged separately, it could be just another Python wheel published on PyPI. That would potentially make things quite a lot easier on your end, because then you can just follow the same route as llama-cpp-python: recommend build from source if possible, but support a specific set of hardware / software with prebuilt binaries.

@oobabooga
Owner Author

Personally I feel that wheels should only be provided for Windows users due to the difficulty in compiling for that platform, Mac and Linux users can build it themselves. It's not hard to do if you have all the dependencies installed.

@netrunnereve I agree that compiling wheels is not very difficult on Linux/Mac, but my previous experience with this project has shown that compiling anything at all is a daunting task for users on all platforms.

@dgdguk that's not a bad idea but I think it would limit the project too much for the reason above.

I thought about it and reached the conclusion that not providing custom llama.cpp wheels is too much functionality loss, so I took the third route: simply paying GitHub for the excess storage/compute so that I can continue running the jobs. With that I managed to compile a new version successfully again (#5964).

@dgdguk

dgdguk commented Apr 30, 2024

@oobabooga I think splitting the Gradio app out of the Python distribution stuff might be a good idea regardless - if nothing else, it makes things substantially easier for someone else to step in and provide builds for a different platform. Right now, any such support has to step around your distribution code.

For example, we're currently staring down ARM AI laptops, AMD's XDNA accelerators, and a whole bunch of other consumer facing AI accelerators - is it really reasonable for your support matrix to have everything in it? I don't necessarily think it matters where you draw the box around what you want to support, but I think it is worth acknowledging the box exists and that some hardware is likely to be out of what you want to support. At least until some proper cross-vendor APIs come into existence.

@Beinsezii

I thought about it and reached the conclusion that not providing custom llama.cpp wheels is too much functionality loss, so I took the third route: simply paying GitHub for the excess storage/compute so that I can continue running the jobs. With that I managed to compile a new version successfully again (#5964).

@oobabooga Why not just use the official wheels for everything you can and only compile your own for unsupported platforms? That might avoid the GitHub storage issue.

@netrunnereve
Contributor

netrunnereve commented May 1, 2024

I thought about it and reached the conclusion that not providing custom llama.cpp wheels is too much functionality loss, so I took the third route: simply paying GitHub for the excess storage/compute so that I can continue running the jobs. With that I managed to compile a new version successfully again (#5964).

I took a look at your releases page and you've got something like 4000 😮 wheels on there, with some of the big ones over 50 MB in size. Many of them are outdated, and a script to prune them would save a lot of space.

Also, GitHub Actions per-minute billing gets expensive pretty fast with those 30-minute CUDA Windows builds. It might be worth looking into a local CI runner for the long builds and relying on free minutes only for the stuff you can't build yourself. Since you don't need to spin up a new VM and install CUDA every time on a local machine, it should complete much faster.

@MaelHan

MaelHan commented May 2, 2024

So it's May 2024, and Ollama and even LM Studio (and others) use ROCm on Windows for AMD GPUs, but not Oobabooga, which needs Linux?

@dgdguk

dgdguk commented May 17, 2024

@oobabooga Is there a particular holdup, or a lack of capability to update to ROCm 6.0 (as in, it seems like you might not have access to AMD hardware)? I'm asking because I've been playing around with some ROCm 6.0 stuff and it seems to be a prerequisite for closing a lot of the feature gaps. For example, AMD's bitsandbytes version seems to work, as well as LoRA training.

@userbox020

@dgdguk have you tried using the Vulkan or Kompute drivers? I bet they would work with all the major ooba features on all AMD GPUs, even the old ones. I'm still surprised how they emulate bit-level instructions.

@dgdguk

dgdguk commented May 21, 2024

@userbox020 No, but that's not actually an option for this project. As I mentioned before, this project would likely be more useful if the WebUI were separated from the distribution needed to run it. Right now, I think one of the problems is that the WebUI has a very prescriptive environment, which discourages people from trying things, and I think this'll get worse with NPUs.

@druggedhippo

Vulkan is already supported by llama.cpp on Windows; both Kompute and Vulkan builds are available: https://github.com/ggerganov/llama.cpp/releases/tag/b2979.

You just need to overwrite the DLLs in the llama_cpp site-packages directory with the Vulkan build. Copy the other DLLs into the env dir and it works perfectly (once you reduce the context and maybe use a smaller model, as memory issues are a problem).

There isn't really any reason the Vulkan llama DLLs couldn't be downloaded by the installer automatically and patched into site-packages if the user chooses to do so.

ggml_vulkan: Found 1 Vulkan devices:
Vulkan0: AMD Radeon RX 6700 XT | uma: 0 | fp16: 1 | warp size: 64
llm_load_tensors: ggml ctx size =    0.37 MiB
llm_load_tensors: offloading 40 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 41/41 layers to GPU
llm_load_tensors:        CPU buffer size =    87.89 MiB
llm_load_tensors:    Vulkan0 buffer size =  7412.96 MiB

I wouldn't suggest Kompute; whilst it can load models, it is still extremely buggy and only works on a limited set of models (ggerganov/llama.cpp#5540 (comment)). That could just be my video card, though.

I haven't messed with using it for training, just plain generation, but Vulkan definitely works for generation, and it's definitely quicker than CPU for me.

@dgdguk

dgdguk commented May 23, 2024

@druggedhippo While that may be true, I'd point out that anything that starts with "overwrite the DLLs" is necessarily not supported by this project. Of course, it may be worth a feature request: if the Vulkan code path works sufficiently well, then in principle it provides a common target and saves @oobabooga from having to target 20+ different configs.

Of course, I'll point out that "runs faster than CPU" is an extremely low bar to clear.

@userbox020

@userbox020 No, but that's not actually an option for this project. As I mentioned before, this project would likely be more useful if the WebUI was separated from the distribution needed to run it. Right now, I think one of the problems is that the WebUI has a very proscribed environment, which discourages people from trying things - and I think this'll get worse with NPUs.

I think instead of the word "discourages" it must mean "ignorant and lazy to find a solution by themselves". This is open-source code, bro; you're very welcome to propose any PR for improvements.

@dgdguk

dgdguk commented May 24, 2024

@userbox020 Indeed, but the idea of modularizing web-ui has already been quashed by @oobabooga earlier in this thread, so presumably any PR along those lines will be rejected, and unfortunately I don't have the time to maintain a fork to any level of acceptable quality.
