
Llama 4-bit install instructions no longer work (CUDA_HOME environment variable is not set) #416

Closed
plhosk opened this issue Mar 18, 2023 · 33 comments
Labels: bug (Something isn't working), stale


plhosk commented Mar 18, 2023

Describe the bug

Link to issue in GPTQ-for-LLaMa repo: qwopqwop200/GPTQ-for-LLaMa#59 (comment)

When running python setup_cuda.py install in GPTQ-for-LLaMa, I'm now getting this error.

Traceback (most recent call last):
  File "~/text-generation-webui/repositories/GPTQ-for-LLaMa/setup_cuda.py", line 6, in <module>
    ext_modules=[cpp_extension.CUDAExtension(
  File "~/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1048, in CUDAExtension
    library_dirs += library_paths(cuda=True)
  File "~/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1179, in library_paths
    if (not os.path.exists(_join_cuda_home(lib_dir)) and
  File "~/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2223, in _join_cuda_home
    raise EnvironmentError('CUDA_HOME environment variable is not set. '
OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.
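(For reference, this error is raised by PyTorch's cpp_extension when it cannot locate a CUDA toolkit. A minimal way to check, assuming a typical toolkit location; the path below is only an example:)

which nvcc && nvcc --version
# if a toolkit is installed somewhere else, point CUDA_HOME at its root, e.g.:
export CUDA_HOME=/usr/local/cuda
export PATH=$CUDA_HOME/bin:$PATH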

Is there an existing issue for this?

  • I have searched the existing issues

Reproduction

conda create -n textgen python=3.10.9
conda activate textgen
pip3 install torch torchvision torchaudio
pip install -r requirements.txt
cd repositories
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
cd GPTQ-for-LLaMa
python setup_cuda.py install

Screenshot

No response

Logs

n/a

System Info

Linux with nvidia GPU
@BarfingLemurs

I just installed using this method, since setup.py didn't work for me: #177 (comment). It's pre-assembled.


plhosk commented Mar 18, 2023

I just installed using this method, since setup.py didn't work for me: #177 (comment). It's pre-assembled.

That may work for Windows, but my issue is on Linux.


maluhia commented Mar 18, 2023

I'm getting this as well under WSL Ubuntu, after trying to set up 4-bit


oobabooga commented Mar 19, 2023

I can confirm the issue. The problem is that nvcc is not available.


plhosk commented Mar 19, 2023

See comment here for possible workaround: qwopqwop200/GPTQ-for-LLaMa#59 (comment)


oobabooga commented Mar 19, 2023

I have managed to install nvcc with

conda install -c conda-forge cudatoolkit-dev

The command above takes some 10 minutes to run and shows no progress bar or updates along the way.

This allows me to run

python setup_cuda.py install

for GPTQ-for-LLaMa installation, but then python server.py --listen --model llama-7b --gptq-bits 4 fails with

raise RuntimeError('Attempting to deserialize object on a CUDA
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
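A quick way to see whether the environment's PyTorch can reach the GPU at all (a minimal check; the exact versions printed will differ):

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"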


LTSarc commented Mar 19, 2023

I have managed to install nvcc with
conda install -c conda-forge cudatoolkit-dev

So the solution is simple: after running that line, restart WSL. If you have already fixed the CUDA symbolic links, then running that command and restarting is the last step.

oobabooga pinned this issue Mar 19, 2023

oobabooga commented Mar 19, 2023

Thanks @LTSarc, restarting the computer indeed worked. For better reproducibility, here is what I did to get 4-bit working again (a quick sanity check follows the steps):

  1. Set up a clean textgen environment following #400 (comment) ("undefined symbol: cget_col_row_stats / 8-bit not working / libsbitsandbytes_cpu.so not found").
  2. Run this command that takes 10 minutes to finish without any progress bar:
conda activate textgen
conda install -c conda-forge cudatoolkit-dev
  3. Restart the computer/WSL.
  4. Remove the existing GPTQ-for-LLaMa folder:
rm -rf repositories/GPTQ-for-LLaMa/
  5. Clone GPTQ-for-LLaMa again:
cd repositories
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
  6. Install GPTQ-for-LLaMa:
cd GPTQ-for-LLaMa
python setup_cuda.py install
  7. 4-bit now works:
python server.py --listen --model llama-7b --gptq-bits 4
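As a final sanity check, this should import cleanly and report CUDA as available (a minimal check on my part, assuming the steps above succeeded and the quant_cuda extension was installed into the textgen environment):

python -c "import torch, quant_cuda; print(torch.cuda.is_available())"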


LTSarc commented Mar 19, 2023

Last night I did a 7+ hour binge getting both 4-bit Llama and Deepspeed (for Pygmalion) running on WSL2. It was... an experience. WSL has a lot of bugs.

It also didn't help that this was my first time ever using Linux (although not my first time with CLIs; I used to write Win32 CLI programs).

@oobabooga

Hopefully all this will become more streamlined in the future.


xNul commented Mar 19, 2023

I had to fix this as well and did it on Windows (no WSL). Here are my steps. Hopefully this saves someone else hours of work.

Windows (no WSL) LLaMA install/setup (normal/8bit/4bit)

Normal & 8bit LLaMA Setup

  1. Install Anaconda
  2. Install Git for Windows
  3. Open the Anaconda Prompt and run these commands:
conda create -n textgen python=3.10.9
conda activate textgen
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
pip install -r requirements.txt
  4. Follow the instructions here to fix the bitsandbytes library for Windows.

4bit LLaMA Setup

Run these commands:

conda install -c conda-forge cudatoolkit-dev
mkdir repositories
cd repositories
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
cd GPTQ-for-LLaMa
git reset --hard 468c47c01b4fe370616747b6d69a2d3f48bab5e4
python setup_cuda.py install

Note: The last command caused me a lot of problems until I found the first command, which installs the cudatoolkit. If it fails, installing Build Tools for Visual Studio 2019 (it has to be 2019) from here, checking "Desktop development with C++" during installation, and adding the cl compiler to the environment may help. The last command needs a C++ compiler and an NVIDIA CUDA compiler.
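If cl still isn't found after that, one common approach (an assumption on my part; the exact Build Tools path depends on where it was installed) is to initialize the MSVC environment in the same prompt before building:

call "C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Auxiliary\Build\vcvars64.bat"
where cl
python setup_cuda.py install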

Downloading LLaMA Models

  1. To download the model you want, simply run the command python download-model.py decapoda-research/llama-Xb-hf, where X is the size of the model you want to download, such as 7 or 13.
  2. Once downloaded, you have to fix the outdated config of the model. Open models/llama-Xb-hf/tokenizer_config.json and change LLaMATokenizer to LlamaTokenizer.
  3. If you only want to run a normal or 8bit model, you're done. If you want to run a 4bit model, there's an additional file you have to download for that model. There is no central location for all of these files at the moment. 7B can be found here. 13B can be found here. 30B can be found here. This one might work for 65B.
  4. Once downloaded, move the .pt file into models/llama-Xb-hf and you should be done.

Running the LLaMA Models

Normal LLaMA Model

python server.py --model llama-Xb-hf

8bit LLaMA Model

python server.py --model llama-Xb-hf --load-in-8bit

4bit LLaMA Model

python server.py --model llama-Xb-hf --gptq-bits 4


jllllll commented Mar 19, 2023

I would recommend changing the pytorch install instructions to:

conda install pytorch torchvision torchaudio pytorch-cuda=11.7 cuda-toolkit -c 'nvidia/label/cuda-11.7.0' -c pytorch -c nvidia

This will install PyTorch and cuda-toolkit, which comes with nvcc, while overriding all of the CUDA 12.0 packages that PyTorch tries to install.
You could even combine it with the environment creation:

conda create -n textgen pytorch torchvision torchaudio pytorch-cuda=11.7 cuda-toolkit -c 'nvidia/label/cuda-11.7.0' -c pytorch -c nvidia

It's also worth noting that conda-forge is a community-operated organization, and that you can get cuda-toolkit directly from NVIDIA with cuda-toolkit -c 'nvidia/label/cuda-11.7.0' or cuda-toolkit -c 'nvidia/label/cuda-11.7.1'.

I haven't tried it yet, but it is possible to install just nvcc with: cuda-nvcc -c 'nvidia/label/cuda-11.7.0'
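Whichever route you take, it's worth confirming that the nvcc you end up with matches the CUDA version PyTorch was built against (a quick check; 11.7 is just the expected value here):

nvcc --version
python -c "import torch; print(torch.version.cuda)"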

@cyperium

When doing python setup_cuda.py install I get:

(textgen) E:\oobabooga\text-generation-webui\repositories\GPTQ-for-LLaMa>python setup_cuda.py install
running install
C:\Users\cyper\miniconda3\envs\textgen\lib\site-packages\setuptools\command\install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
C:\Users\cyper\miniconda3\envs\textgen\lib\site-packages\setuptools\command\easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
running bdist_egg
running egg_info
writing quant_cuda.egg-info\PKG-INFO
writing dependency_links to quant_cuda.egg-info\dependency_links.txt
writing top-level names to quant_cuda.egg-info\top_level.txt
C:\Users\cyper\miniconda3\envs\textgen\lib\site-packages\torch\utils\cpp_extension.py:476: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
warnings.warn(msg.format('we could not find ninja.'))
reading manifest file 'quant_cuda.egg-info\SOURCES.txt'
writing manifest file 'quant_cuda.egg-info\SOURCES.txt'
installing library code to build\bdist.win-amd64\egg
running install_lib
running build_ext
C:\Users\cyper\miniconda3\envs\textgen\lib\site-packages\torch\utils\cpp_extension.py:358: UserWarning: Error checking compiler version for cl: [WinError 2] Det går inte att hitta filen
warnings.warn(f'Error checking compiler version for {compiler}: {error}')
error: [WinError 2] Det går inte att hitta filen

("Det går inte att hitta filen" is just Swedish for "cannot find the file".) I have set the environment path to the folder where cl.exe is located and have followed all the steps to the letter.

I'm going to try manually installing CUDA instead, using jllllll's advice; if that fails, I'm probably done trying to install the 4-bit functionality until an easier way is available. I've tried for several days now and it's just not worth the frustration.

@cyperium

I just installed using this method, since setup.py didn't work for me: #177 (comment). It's pre-assembled.

Got it to work using this method.

@BarfingLemurs

@oobabooga could you distribute the .whl file so we do not have to follow the whole process? This is for WSL on Windows, which is the method you are officially recommending.


jllllll commented Mar 19, 2023

@oobabooga could you distribute the .whl file so we do not have to follow the whole process? This is for WSL on Windows, which is the method you are officially recommending.

You can build the wheel yourself for future use with: python setup_cuda.py bdist_wheel
This will place the wheel in a dist folder next to setup_cuda.py.
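The wheel can then be installed into any matching environment with pip (the filename below is an example; use whatever appears in dist/):

pip install dist/quant_cuda-0.0.0-cp310-cp310-linux_x86_64.whl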

@BarfingLemurs

Thanks, but I am hoping to use other people's .whls, as it takes me a while to gather everything and follow the build process.


jllllll commented Mar 19, 2023

Also, if anyone using WSL starts having issues with bitsandbytes not finding libcuda.so, this is because of a bug in WSL where the Windows-level GPU drivers are not linked properly within WSL. The workaround is to run this before running server.py:

export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH
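To avoid re-running that in every new shell, it can be appended to the shell profile (a convenience sketch, assuming bash is the login shell):

echo 'export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc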


BarfingLemurs commented Mar 19, 2023

@jllllll do you have a .whl file?

I'm stuck on certain issues which I'm unsure about.

I followed through on a regular installation process on WSL, hoping the GPU could be detected. When I ran the build process, no GPU was detected, so I followed conda install pytorch torchvision torchaudio pytorch-cuda=11.7 cuda-toolkit -c 'nvidia/label/cuda-11.7.0' -c pytorch -c nvidia and restarted WSL.

I did this too. export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH

Sorry in advance about the weird paste; I don't know what it's doing.

(textgen) ubuntu@DESKTOP-LMFT8S4:~/text-generation-webui/repositories/GPTQ-for-LLaMa$ python setup_cuda.py bdist_wheel
No CUDA runtime is found, using CUDA_HOME='/home/ubuntu/miniconda3/envs/textgen'
running bdist_wheel
/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/utils/cpp_extension.py:476: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
  warnings.warn(msg.format('we could not find ninja.'))
running build
running build_ext
Traceback (most recent call last):
  File "/home/ubuntu/text-generation-webui/repositories/GPTQ-for-LLaMa/setup_cuda.py", line 4, in <module>
    setup(
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/__init__.py", line 87, in setup
    return distutils.core.setup(**attrs)
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 185, in setup
    return run_commands(dist)
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
    dist.run_commands()
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
    self.run_command(cmd)
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/dist.py", line 1208, in run_command
    super().run_command(command)
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/wheel/bdist_wheel.py", line 325, in run
    self.run_command("build")
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
    self.distribution.run_command(command)
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/dist.py", line 1208, in run_command
    super().run_command(command)
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/_distutils/command/build.py", line 132, in run
    self.run_command(cmd_name)
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
    self.distribution.run_command(command)
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/dist.py", line 1208, in run_command
    super().run_command(command)
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/command/build_ext.py", line 84, in run
    _build_ext.run(self)
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 346, in run
    self.build_extensions()
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 485, in build_extensions
    compiler_name, compiler_version = self._check_abi()
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 869, in _check_abi
    _, version = get_compiler_abi_compatibility_and_version(compiler)
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 337, in get_compiler_abi_compatibility_and_version
    if not check_compiler_ok_for_platform(compiler):
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 291, in check_compiler_ok_for_platform
    which = subprocess.check_output(['which', compiler], stderr=subprocess.STDOUT)
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/subprocess.py", line 421, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['which', 'g++']' returned non-zero exit status 1.
(textgen) ubuntu@DESKTOP-LMFT8S4:~/text-generation-webui/repositories/GPTQ-for-LLaMa$

Normal inference with just server.py also won't run for me, on commit 4bafe45a517bbe561e4a39a2582fa9af80487194:

(textgen) ubuntu@DESKTOP-LMFT8S4:~/text-generation-webui$ python server.py
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/requests/compat.py", line 11, in <module>
    import chardet
ModuleNotFoundError: No module named 'chardet'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/text-generation-webui/server.py", line 10, in <module>
    import gradio as gr
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/gradio/__init__.py", line 3, in <module>
    import gradio.components as components
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/gradio/components.py", line 34, in <module>
    from gradio import media_data, processing_utils, utils
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/gradio/processing_utils.py", line 19, in <module>
    import requests
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/requests/__init__.py", line 45, in <module>
    from .exceptions import RequestsDependencyWarning
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/requests/exceptions.py", line 9, in <module>
    from .compat import JSONDecodeError as CompatJSONDecodeError
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/requests/compat.py", line 13, in <module>
    import charset_normalizer as chardet
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/charset_normalizer/__init__.py", line 23, in <module>
    from charset_normalizer.api import from_fp, from_path, from_bytes, normalize
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/charset_normalizer/api.py", line 10, in <module>
    from charset_normalizer.md import mess_ratio
  File "charset_normalizer/md.py", line 5, in <module>
ImportError: cannot import name 'COMMON_SAFE_ASCII_CHARACTERS' from 'charset_normalizer.constant' (/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/charset_normalizer/constant.py)


jllllll commented Mar 19, 2023

@jllllll do you have a .whl file?

I'm stuck on certain issues which I'm unsure about.

I followed through on a regular installation process on WSL, hoping the GPU could be detected. When I ran the build process, no GPU was detected, so I followed conda install pytorch torchvision torchaudio pytorch-cuda=11.7 cuda-toolkit -c 'nvidia/label/cuda-11.7.0' -c pytorch -c nvidia and restarted WSL.

I did this too. export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH

Normal inference with just server.py also won't run for me, on commit 4bafe45a517bbe561e4a39a2582fa9af80487194

Here is a freshly compiled wheel:
quant_cuda-0.0.0-cp310-cp310-linux_x86_64.whl.zip
Make sure that you performed both of the pip install -r requirements.txt steps. You may need to install CUDA into WSL using these commands:

wget https://developer.download.nvidia.com/compute/cuda/11.7.1/local_installers/cuda_11.7.1_515.65.01_linux.run
sudo sh cuda_11.7.1_515.65.01_linux.run

Make sure not to use the driver installation option; that isn't for WSL.
It also wouldn't hurt to try restarting WSL manually with wsl --shutdown in PowerShell or cmd.
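After a runfile install like that, the toolkit typically lands in /usr/local/cuda-11.7, so the build shell may also need these (installer-default paths; adjust if you chose a different location):

export CUDA_HOME=/usr/local/cuda-11.7
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH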

@BarfingLemurs

@jllllll I really appreciate that, thanks.

oobabooga unpinned this issue Mar 19, 2023

NenadZG commented Mar 20, 2023

Thanks for the solution, now setup_cuda.py works, but when I try to load the model I get this error:

Loading llama-7b-hf...
Traceback (most recent call last):
  File "C:\Users\X\text-generation-webui\server.py", line 199, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "C:\Users\X\text-generation-webui\modules\models.py", line 94, in load_model
    model = load_quantized(model_name)
  File "C:\Users\X\text-generation-webui\modules\GPTQ_loader.py", line 55, in load_quantized
    model = load_quant(str(path_to_model), str(pt_path), shared.args.gptq_bits)
TypeError: load_quant() missing 1 required positional argument: 'groupsize'


trrahul commented Mar 20, 2023

Thanks for the solution, now setup_cuda.py works, but when I try to load the model I get this error:

Loading llama-7b-hf...
Traceback (most recent call last):
  File "C:\Users\X\text-generation-webui\server.py", line 199, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "C:\Users\X\text-generation-webui\modules\models.py", line 94, in load_model
    model = load_quantized(model_name)
  File "C:\Users\X\text-generation-webui\modules\GPTQ_loader.py", line 55, in load_quantized
    model = load_quant(str(path_to_model), str(pt_path), shared.args.gptq_bits)
TypeError: load_quant() missing 1 required positional argument: 'groupsize'

I am also getting the same error

@MarvinLong

Also, if anyone using wsl starts having issues with bitsandbytes not finding libcuda.so, this is because of a bug in wsl where Windows-level gpu drivers are not linked properly within wsl. The workaround is to run this before running server.py:

export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH

Thanks, this helped me a lot. I had been stuck on this problem for a day now.


ncoder commented Mar 20, 2023

Thanks for the solution, now setup_cuda.py works, but when I try to load the model I get this error:

Loading llama-7b-hf...
Traceback (most recent call last):
  File "C:\Users\X\text-generation-webui\server.py", line 199, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "C:\Users\X\text-generation-webui\modules\models.py", line 94, in load_model
    model = load_quantized(model_name)
  File "C:\Users\X\text-generation-webui\modules\GPTQ_loader.py", line 55, in load_quantized
    model = load_quant(str(path_to_model), str(pt_path), shared.args.gptq_bits)
TypeError: load_quant() missing 1 required positional argument: 'groupsize'

Got the same thing. I added a '-1' argument to the load_quant() call for the group size. I don't know exactly what it does.
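For anyone trying the same workaround, the call in modules/GPTQ_loader.py (line 55 in the traceback above) ends up looking roughly like this (a hypothetical edit; the trailing -1 is the group-size value being passed):

model = load_quant(str(path_to_model), str(pt_path), shared.args.gptq_bits, -1)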

But then you get this error:

Error(s) in loading state_dict for LlamaForCausalLM:
	Missing key(s) in state_dict: "model.layers.0.self_attn.q_proj.qzeros", "model.layers.0.self_attn.k_proj.qzeros", "model.layers.0.self_attn.v_proj.qzeros", "model.layers.0.self_attn.o_proj.qzeros", "model.layers.0.mlp.gate_proj.qzeros", "model.layers.0.mlp.down_proj.qzeros", "model.layers.0.mlp.up_proj.qzeros", "model.layers.1.self_attn.q_proj.qzeros", "model.layers.1.self_attn.k_proj.qzeros", "model.layers.1.self_attn.v_proj.qzeros", "model.layers.1.self_attn.o_proj.qzeros", 
...

Looks like we're running the wrong version of GPTQ for the data we have.

@gianfra-t

To solve the load_quant error, which is indeed a problem with a new version of GPTQ, you need to roll back. See: #445 (comment)

Also, in my case I had to change the name of the tokenizer in tokenizer_config.json to "tokenizer_class": "LlamaTokenizer". That is, I think, due to an update of the class in the transformers repo.
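Concretely, the rollback plus tokenizer fix amounts to something like this (the commit hash is the one from xNul's instructions above; the exact commit to pin may have changed since):

cd repositories/GPTQ-for-LLaMa
git reset --hard 468c47c01b4fe370616747b6d69a2d3f48bab5e4
python setup_cuda.py install
# then, in models/llama-Xb-hf/tokenizer_config.json, set "tokenizer_class": "LlamaTokenizer"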


NenadZG commented Mar 21, 2023

Thank you, the problem was a new version of GPTQ, as you said. I rolled back as in #445 (comment). After that I got this error:
ImportError: cannot import name 'LLaMAConfig' from 'transformers'.
Then I deleted my environment and reinstalled everything, and now it works!

The whole process of installation I did was:

conda create -n textgen
conda activate textgen
conda install torchvision torchaudio pytorch-cuda=11.7 git -c pytorch -c nvidia
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
pip install -r requirements.txt
conda install -c conda-forge cudatoolkit-dev
mkdir repositories
cd repositories
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
cd GPTQ-for-LLaMa
git reset --hard 468c47c01b4fe370616747b6d69a2d3f48bab5e4
python setup_cuda.py install

After that I changed "LLaMATokenizer" to "LlamaTokenizer" in tokenizer_config.json file.


xNul commented Mar 21, 2023

Thanks @NenadZG. I've updated my instructions with your GPTQ rollback fix.


ncoder commented Mar 21, 2023 via email


xNul commented Mar 21, 2023

Good to know that's possible. I'll update my instructions once all versions of the model have been requantized.


jllllll commented Apr 2, 2023

The repo has changed. Which branch should we use now?

The cuda branch. However, I would recommend using oobabooga's fork for the time being: #708 (comment)

The webui is currently not updated to work with the latest version of GPTQ-for-LLaMa.


benkuku commented May 16, 2023

Thanks @LTSarc, restarting the computer indeed worked. For better reproducibility, here is what I did to get 4-bit working again:

  1. Set up a clean textgen environment following #400 (comment) ("undefined symbol: cget_col_row_stats / 8-bit not working / libsbitsandbytes_cpu.so not found").
  2. Run this command that takes 10 minutes to finish without any progress bar:
conda activate textgen
conda install -c conda-forge cudatoolkit-dev
  3. Restart the computer/WSL.
  4. Remove the existing GPTQ-for-LLaMa folder:
rm -rf repositories/GPTQ-for-LLaMa/
  5. Clone GPTQ-for-LLaMa again:
cd repositories
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
  6. Install GPTQ-for-LLaMa:
cd GPTQ-for-LLaMa
python setup_cuda.py install
  7. 4-bit now works:
python server.py --listen --model llama-7b --gptq-bits 4

great!


github-actions bot commented Dec 3, 2023

This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.

github-actions bot closed this as completed Dec 3, 2023