not working since commit 31f04dc bitsandbytes problem #614

Closed
Marc899 opened this issue Mar 28, 2023 · 16 comments
Labels
bug (Something isn't working), stale

Comments

@Marc899

Marc899 commented Mar 28, 2023

Describe the bug

Starting with commit 31f04dc, I am getting a lot of CUDA errors related to bitsandbytes when running start-webui.bat

RuntimeError:
CUDA Setup failed despite GPU being available. Inspect the CUDA SETUP outputs above to fix your environment!
If you cannot find any issues and suspect a bug, please open an issue with detals about your environment:
https://github.com/TimDettmers/bitsandbytes/issues

Reverting to 966168b makes it run again.

Is there an existing issue for this?

  • I have searched the existing issues

Reproduction

Update to the latest version and run start-webui.bat

Screenshot

No response

Logs

RuntimeError:
        CUDA Setup failed despite GPU being available. Inspect the CUDA SETUP outputs above to fix your environment!
        If you cannot find any issues and suspect a bug, please open an issue with detals about your environment:
        https://github.com/TimDettmers/bitsandbytes/issues

System Info

Windows 11 64bit
RTX 3090
Marc899 added the bug label Mar 28, 2023
@oivio

oivio commented Mar 28, 2023

I can confirm the same issue on my side
Windows 10
RTX4080
CUDA 11.7.0_516.01

@ARandomUserFromGithub

Same. I get:

Starting the web UI...

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

A:\TGWU---\installer_files\env\lib\site-packages\bitsandbytes\cuda_setup\main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {WindowsPath('A')}
warn(msg)
A:\TGWU---\installer_files\env\lib\site-packages\bitsandbytes\cuda_setup\main.py:136: UserWarning: A:\TGWU---\installer_files\env did not contain libcudart.so as expected! Searching further paths...
warn(msg)
A:\TGWU---\installer_files\env\lib\site-packages\bitsandbytes\cuda_setup\main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {WindowsPath('/A'), WindowsPath('file'), WindowsPath('/TGWU---/installer_files/env/etc/xml/catalog')}
warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
A:\TGWU---\installer_files\env\lib\site-packages\bitsandbytes\cuda_setup\main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {WindowsPath('/usr/local/cuda/lib64')}
warn(msg)
CUDA SETUP: WARNING! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!
A:\TGWU---\installer_files\env\lib\site-packages\bitsandbytes\cuda_setup\main.py:136: UserWarning: WARNING: No libcudart.so found! Install CUDA or the cudatoolkit package (anaconda)!
warn(msg)
A:\TGWU---\installer_files\env\lib\site-packages\bitsandbytes\cuda_setup\main.py:136: UserWarning: WARNING: No GPU detected! Check your CUDA paths. Proceeding to load CPU-only library...
warn(msg)
CUDA SETUP: Loading binary A:\TGWU---\installer_files\env\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.so...
argument of type 'WindowsPath' is not iterable
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: WARNING! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!
CUDA SETUP: Loading binary A:\TGWU---\installer_files\env\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.so...
argument of type 'WindowsPath' is not iterable
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: WARNING! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!
CUDA SETUP: Loading binary A:\TGWU---\installer_files\env\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.so...
argument of type 'WindowsPath' is not iterable
CUDA SETUP: Problem: The main issue seems to be that the main CUDA library was not detected.
CUDA SETUP: Solution 1): Your paths are probably not up-to-date. You can update them via: sudo ldconfig.
CUDA SETUP: Solution 2): If you do not have sudo rights, you can do the following:
CUDA SETUP: Solution 2a): Find the cuda library via: find / -name libcuda.so 2>/dev/null
CUDA SETUP: Solution 2b): Once the library is found add it to the LD_LIBRARY_PATH: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:FOUND_PATH_FROM_2a
CUDA SETUP: Solution 2c): For a permanent solution add the export from 2b into your .bashrc file, located at ~/.bashrc
Traceback (most recent call last):
File "A:\TGWU---\text-generation-webui\server.py", line 13, in
from modules import chat, shared, training, ui
File "A:\TGWU---\text-generation-webui\modules\training.py", line 11, in
from peft import (LoraConfig, get_peft_model, get_peft_model_state_dict,
File "A:\TGWU---\installer_files\env\lib\site-packages\peft_init_.py", line 22, in
from .mapping import MODEL_TYPE_TO_PEFT_MODEL_MAPPING, PEFT_TYPE_TO_CONFIG_MAPPING, get_peft_config, get_peft_model
File "A:\TGWU---\installer_files\env\lib\site-packages\peft\mapping.py", line 16, in
from .peft_model import (
File "A:\TGWU---\installer_files\env\lib\site-packages\peft\peft_model.py", line 31, in
from .tuners import LoraModel, PrefixEncoder, PromptEmbedding, PromptEncoder
File "A:\TGWU---\installer_files\env\lib\site-packages\peft\tuners_init_.py", line 20, in
from .lora import LoraConfig, LoraModel
File "A:\TGWU---\installer_files\env\lib\site-packages\peft\tuners\lora.py", line 36, in
import bitsandbytes as bnb
File "A:\TGWU---\installer_files\env\lib\site-packages\bitsandbytes_init_.py", line 7, in
from .autograd.functions import (
File "A:\TGWU---\installer_files\env\lib\site-packages\bitsandbytes\autograd_init
.py", line 1, in
from ._functions import undo_layout, get_inverse_transform_indices
File "A:\TGWU---\installer_files\env\lib\site-packages\bitsandbytes\autograd_functions.py", line 9, in
import bitsandbytes.functional as F
File "A:\TGWU---\installer_files\env\lib\site-packages\bitsandbytes\functional.py", line 17, in
from .cextension import COMPILED_WITH_CUDA, lib
File "A:\TGWU---\installer_files\env\lib\site-packages\bitsandbytes\cextension.py", line 22, in
raise RuntimeError('''
RuntimeError:
CUDA Setup failed despite GPU being available. Inspect the CUDA SETUP outputs above to fix your environment!
If you cannot find any issues and suspect a bug, please open an issue with detals about your environment:
https://github.com/TimDettmers/bitsandbytes/issues
Press any key to continue . . .

@RJSprod

RJSprod commented Mar 28, 2023

I'm facing the exact same issue on Windows 11 with a 3090. The libcuda and libcudart files it's looking for don't seem to exist on my system.

@bmoconno
Contributor

I was having the same issue. Once I found this thread and saw that everyone hitting it seemed to be on Windows, I figured that was probably the culprit. The steps below fixed it for me (a sketch of the resulting edits follows the list):

From how_to_install_llama_8bit_and_4bit:

  1. Download libbitsandbytes_cuda116.dll and put it in C:\Users\xxx\miniconda3\envs\textgen\lib\site-packages\bitsandbytes\
  2. In \bitsandbytes\cuda_setup\main.py search for: if not torch.cuda.is_available(): return 'libsbitsandbytes_cpu.so', None, None, None, None and replace with: if torch.cuda.is_available(): return 'libbitsandbytes_cuda116.dll', None, None, None, None
  3. In \bitsandbytes\cuda_setup\main.py search for this twice: self.lib = ct.cdll.LoadLibrary(binary_path) and replace with: self.lib = ct.cdll.LoadLibrary(str(binary_path))
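
For reference, a rough sketch of what the edited spots in bitsandbytes\cuda_setup\main.py look like after steps 2 and 3. It shows only the changed fragments (surrounding code and exact line numbers vary between bitsandbytes versions), not a drop-in file:

    # Sketch of the patched fragments in bitsandbytes\cuda_setup\main.py

    # Step 2: return the prebuilt Windows CUDA DLL instead of falling back to the CPU library.
    # Original: if not torch.cuda.is_available(): return 'libsbitsandbytes_cpu.so', None, None, None, None
    if torch.cuda.is_available(): return 'libbitsandbytes_cuda116.dll', None, None, None, None

    # Step 3 (the line appears twice in the file; change both occurrences):
    # pass a plain string to ctypes rather than a WindowsPath object.
    # Original: self.lib = ct.cdll.LoadLibrary(binary_path)
    self.lib = ct.cdll.LoadLibrary(str(binary_path))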

After re-doing those steps, I ran into another issue that I believe was caused by #615: it complained that kernel_switch_threshold was not a valid argument while trying to use the llama 30b 128 model. To fix this I modified the modules\GPTQ_loader.py file as follows (a rough sketch of the result is included below):

  • re-add import llama line under sys.path.insert(0, str(Path("repositories/GPTQ-for-LLaMa")))
  • replace load_quant = _load_quant with load_quant = llama.load_quant
  • replace model = load_quant(str(path_to_model), str(pt_path), shared.args.wbits, shared.args.groupsize, kernel_switch_threshold=threshold) with model = load_quant(str(path_to_model), str(pt_path), shared.args.wbits, shared.args.groupsize)

It's possible you won't need to modify the modules\GPTQ_loader.py, so try to load your model before making those changes.
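
Roughly, those three edits to modules\GPTQ_loader.py amount to the following sketch (exact surrounding code and line positions may differ in your checkout, so treat it as a guide rather than a verbatim patch):

    # Sketch of the edited fragments in modules\GPTQ_loader.py

    sys.path.insert(0, str(Path("repositories/GPTQ-for-LLaMa")))
    import llama  # re-added: call into the old GPTQ-for-LLaMa module directly

    # ... later in the file ...
    load_quant = llama.load_quant  # instead of: load_quant = _load_quant

    # ... and in load_quantized(), drop kernel_switch_threshold, which the older code does not accept:
    model = load_quant(str(path_to_model), str(pt_path), shared.args.wbits, shared.args.groupsize)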

oobabooga mentioned this issue Mar 29, 2023
@oobabooga
Owner

8-bit should work more reliably with the new one-click installer

https://github.com/oobabooga/text-generation-webui#one-click-installers

@hdkiller

hdkiller commented Mar 29, 2023

I had a similar issue on Linux, probably caused by #615, since if I revert the changes as @bmoconno mentioned, it loads llama.

/home/hdkiller/miniconda3/envs/textgen/lib/python3.10/site-packages/bitsandbytes-0.37.2-py3.10.egg/bitsandbytes/cuda_setup/main.py:136: UserWarning: /home/hdkiller/miniconda3/envs/textgen did not contain libcudart.so as expected! Searching further paths...
  warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 121
CUDA SETUP: Loading binary /home/hdkiller/miniconda3/envs/textgen/lib/python3.10/site-packages/bitsandbytes-0.37.2-py3.10.egg/bitsandbytes/libbitsandbytes_cuda121.so...
Loading llama-7b-hf...
Found models/llama-7b-4bit.safetensors
Traceback (most recent call last):
  File "/home/hdkiller/text-generation-webui/server.py", line 273, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "/home/hdkiller/text-generation-webui/modules/models.py", line 101, in load_model
    model = load_quantized(model_name)
  File "/home/hdkiller/text-generation-webui/modules/GPTQ_loader.py", line 113, in load_quantized
    model = load_quant(str(path_to_model), str(pt_path), shared.args.wbits, shared.args.groupsize, kernel_switch_threshold=threshold)
  File "/home/hdkiller/text-generation-webui/modules/GPTQ_loader.py", line 36, in _load_quant
    make_quant(model, layers, wbits, groupsize, faster=faster_kernel, kernel_switch_threshold=kernel_switch_threshold)
TypeError: make_quant() got an unexpected keyword argument 'kernel_switch_threshold'

So I had to re-install GPTQ-for-LLaMa in ./repositories and then it works.

@Azeirah

Azeirah commented Mar 29, 2023

I have a similar error @hdkiller,

Loading llama-7b-hf...
CUDA extension not installed.
Found models/llama-7b-4bit.pt
Traceback (most recent call last):
  File "/home/lb/Downloads/LLaMA/text-generation-webui/server.py", line 273, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "/home/lb/Downloads/LLaMA/text-generation-webui/modules/models.py", line 101, in load_model
    model = load_quantized(model_name)
  File "/home/lb/Downloads/LLaMA/text-generation-webui/modules/GPTQ_loader.py", line 113, in load_quantized
    model = load_quant(str(path_to_model), str(pt_path), shared.args.wbits, shared.args.groupsize, kernel_switch_threshold=threshold)
  File "/home/lb/Downloads/LLaMA/text-generation-webui/modules/GPTQ_loader.py", line 36, in _load_quant
    make_quant(model, layers, wbits, groupsize, faster=faster_kernel, kernel_switch_threshold=kernel_switch_threshold)
TypeError: make_quant() got an unexpected keyword argument 'faster'

How did you "reinstall" gptq-for-llama? I did

cd repositories/GPTQ-for-LLaMa
git pull
pip install -r requirements.txt

Still getting the same error with python server.py --model_type llama --wbits 4 --groupsize 128

Edit:

It did work after removing the GPTQ-for-LLama directory and literally performing a new git clone and pip install. No idea why.

@remghoost

remghoost commented Mar 29, 2023

So I had to re-install GPTQ-for-LLaMa in ./repositories and then it works.

It did work after removing the GPTQ-for-LLama directory and literally performing a new git clone and pip install. No idea why.

This worked for me as well.

Seems like a fairly common occurrence. Happens every few commits.
Nice to know this fix works.
I've had this problem before (a few weeks ago) and literally spent days trying to fix it.

Might make myself a quick script to automate this fix in the future, haha.

edit - hmm. I thought it did, but maybe it didn't....?

edit2 - Okay, so it says that it won't use my GPU, yet my GPU clock speed still spikes when I generate text and nvidia-smi shows that my VRAM is populated with the model. So maybe it's just lying....? I'm using the ozcur/alpaca-native-4bit. It's definitely using my GPU though.
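
(If you want to verify what PyTorch itself sees, independent of the webui's startup messages, a minimal check like the following can be run inside the same environment the webui uses, e.g. the installer's conda env; nvidia-smi remains the better tool for watching actual VRAM use.)

    import torch

    # Quick sanity check for whether torch can see the GPU at all.
    print("CUDA available:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("Device:", torch.cuda.get_device_name(0))
        print("Torch compiled against CUDA:", torch.version.cuda)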

@davidliudev

davidliudev commented Mar 30, 2023

Same issue for me too under Windows 11. Tried removing the GPTQ folder, re-pulling, and reinstalling, but it is not working. Had to temporarily revert to 966168b.

@StefanDanielSchwarz
Contributor

StefanDanielSchwarz commented Mar 30, 2023

Same here, fresh WSL install, got the "TypeError: make_quant() got an unexpected keyword argument 'faster'" message when trying to load ozcur's alpaca-native-4bit.

@oobabooga
Owner

It's now necessary to clone the GPTQ-for-LLaMa repository with

git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa.git -b cuda

The default branch in that repository has been changed to one that breaks backward compatibility.

This has been updated in the one-click installer, which must be re-downloaded manually (just the install.bat script) oobabooga/one-click-installers@85e4ec6

@StefanDanielSchwarz
Contributor

Excellent, that fixes it! 👍 Glad to be able to use the latest version of your text-generation-webui again (and special thanks for merging my PR ❤).

@hdkiller

hdkiller commented Apr 1, 2023

Seems like something is going on in that cuda branch of GPTQ-for-LLaMa.

I had to revert to this commit.

git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa.git -b cuda
cd GPTQ-for-LLaMa
git reset --hard b820805
python setup_cuda.py install

The commit which removed a parameter from the function definition of make_quant, causing the error @Azeirah had, is f1af89a (see the reply below).
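
(A simplified, hypothetical illustration of that kind of break; these are stand-in signatures, not the real GPTQ-for-LLaMa code.)

    # Stand-ins for the old and new make_quant signatures (illustration only)
    def make_quant_old(model, layers, wbits, groupsize, faster=False, kernel_switch_threshold=128):
        pass  # the older cuda branch accepted these keywords

    def make_quant_new(model, layers, wbits, groupsize):
        pass  # newer revisions dropped them

    # The webui still passes the removed keywords, so against the new signature the call fails:
    try:
        make_quant_new(None, None, 4, 128, faster=False, kernel_switch_threshold=128)
    except TypeError as e:
        print(e)  # e.g. make_quant_new() got an unexpected keyword argument 'faster'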

This way I am now able to CPU-offload llama:

python3 server.py --listen --wbits 4 --groupsize 128 --pre_layer 30 --model llama-7b-4bit-128g

@StefanDanielSchwarz
Contributor

StefanDanielSchwarz commented Apr 1, 2023

Confirming that - both the problem and the workaround. Thanks @hdkiller for figuring out the commit that broke compatibility (f1af89a).

Here's what my WSL console reported before I reverted:

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA SETUP: CUDA runtime path found: ~/miniconda3/envs/textgen/lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary ~/miniconda3/envs/textgen/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117.so...
Loading ozcur_alpaca-native-4bit...
Traceback (most recent call last):
  File "~/text-generation-webui/server.py", line 275, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "~/text-generation-webui/modules/models.py", line 102, in load_model
    model = load_quantized(model_name)
  File "~/text-generation-webui/modules/GPTQ_loader.py", line 114, in load_quantized
    model = load_quant(str(path_to_model), str(pt_path), shared.args.wbits, shared.args.groupsize, kernel_switch_threshold=threshold)
  File "~/text-generation-webui/modules/GPTQ_loader.py", line 36, in _load_quant
    make_quant(model, layers, wbits, groupsize, faster=faster_kernel, kernel_switch_threshold=kernel_switch_threshold)
TypeError: make_quant() got an unexpected keyword argument 'faster'

The last working commit is 608f3ba. Reverting to that made text-generation-webui work again:

cd repositories/GPTQ-for-LLaMa

git reset --hard 608f3ba71e40596c75f8864d73506eaf57323c6e

pip install -r requirements.txt
python setup_cuda.py install
cd ../..

@oobabooga
Owner

Please use my fork of GPTQ-for-LLaMa. It corresponds to commit a6f363e3f93b9fb5c26064b5ac7ed58d22e3f773 in the cuda branch.

# activate the conda environment
conda activate textgen

# remove the existing GPTQ-for-LLaMa
cd text-generation-webui/repositories
rm -rf GPTQ-for-LLaMa
pip uninstall quant-cuda

# reinstall
git clone https://github.com/oobabooga/GPTQ-for-LLaMa.git -b cuda
cd GPTQ-for-LLaMa
python setup_cuda.py install

I will keep using this until qwopqwop's branch stabilizes. Upstream changes will not be supported. This works with @USBhost's torrents for llama that are linked here.

github-actions bot added the stale label Nov 26, 2023

This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.
