
Llama 4-bit install instructions no longer work (CUDA_HOME environment variable is not set) #416

Closed
plhosk opened this issue Mar 18, 2023 · 33 comments
Labels: bug (Something isn't working), stale


plhosk commented Mar 18, 2023

Describe the bug

Link to issue in GPTQ-for-LLaMa repo: qwopqwop200/GPTQ-for-LLaMa#59 (comment)

When running python setup_cuda.py install in GPTQ-for-LLaMa, I'm now getting this error.

Traceback (most recent call last):
  File "~/text-generation-webui/repositories/GPTQ-for-LLaMa/setup_cuda.py", line 6, in <module>
    ext_modules=[cpp_extension.CUDAExtension(
  File "~/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1048, in CUDAExtension
    library_dirs += library_paths(cuda=True)
  File "~/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1179, in library_paths
    if (not os.path.exists(_join_cuda_home(lib_dir)) and
  File "~/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2223, in _join_cuda_home
    raise EnvironmentError('CUDA_HOME environment variable is not set. '
OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.
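(For reference, this error is raised by PyTorch's cpp_extension when it cannot locate a CUDA toolkit. A minimal way to check, assuming a typical toolkit location; the path below is only an example:)

which nvcc && nvcc --version
# if a toolkit is installed somewhere else, point CUDA_HOME at its root, e.g.:
export CUDA_HOME=/usr/local/cuda
export PATH=$CUDA_HOME/bin:$PATH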

Is there an existing issue for this?

  • I have searched the existing issues

Reproduction

conda create -n textgen python=3.10.9
conda activate textgen
pip3 install torch torchvision torchaudio
pip install -r requirements.txt
cd repositories
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
cd GPTQ-for-LLaMa
python setup_cuda.py install

Screenshot

No response

Logs

n/a

System Info

Linux with nvidia GPU
@BarfingLemurs

I just installed using this method, since setup.py didn't work for me: #177 (comment). It's pre-assembled.


plhosk commented Mar 18, 2023

I just installed using this method, since setup.py didn't work for me: #177 (comment). It's pre-assembled.

That may work for Windows, but my issue is on Linux.


maluhia commented Mar 18, 2023

I'm getting this as well under WSL Ubuntu, after trying to set up 4-bit


oobabooga commented Mar 19, 2023

I can confirm the issue. The problem is that nvcc is not available.


plhosk commented Mar 19, 2023

See comment here for possible workaround: qwopqwop200/GPTQ-for-LLaMa#59 (comment)


oobabooga commented Mar 19, 2023

I have managed to install nvcc with

conda install -c conda-forge cudatoolkit-dev

The command above takes some 10 minutes to run and shows no progress bar or updates along the way.

This allows me to run

python setup_cuda.py install

for GPTQ-for-LLaMa installation, but then python server.py --listen --model llama-7b --gptq-bits 4 fails with

raise RuntimeError('Attempting to deserialize object on a CUDA
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
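A quick way to see whether the environment's PyTorch can reach the GPU at all (a minimal check; the exact versions printed will differ):

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"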


LTSarc commented Mar 19, 2023

I have managed to install nvcc with
conda install -c conda-forge cudatoolkit-dev

So the solution is simple: after running that line, restart WSL. If you have already fixed the CUDA symbolic links, then running that command and restarting is the last step.

oobabooga pinned this issue Mar 19, 2023

oobabooga commented Mar 19, 2023

Thanks @LTSarc, restarting the computer indeed worked. For better reproducibility, here is what I did to get 4-bit working again (a quick sanity check follows the steps):

  1. Set up a clean textgen environment following #400 (comment) ("undefined symbol: cget_col_row_stats / 8-bit not working / libsbitsandbytes_cpu.so not found").
  2. Run this command that takes 10 minutes to finish without any progress bar:
conda activate textgen
conda install -c conda-forge cudatoolkit-dev
  3. Restart the computer/WSL.
  4. Remove the existing GPTQ-for-LLaMa folder:
rm -rf repositories/GPTQ-for-LLaMa/
  5. Clone GPTQ-for-LLaMa again:
cd repositories
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
  6. Install GPTQ-for-LLaMa:
cd GPTQ-for-LLaMa
python setup_cuda.py install
  7. 4-bit now works:
python server.py --listen --model llama-7b --gptq-bits 4
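As a final sanity check, this should import cleanly and report CUDA as available (a minimal check on my part, assuming the steps above succeeded and the quant_cuda extension was installed into the textgen environment):

python -c "import torch, quant_cuda; print(torch.cuda.is_available())"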


LTSarc commented Mar 19, 2023

Last night I did a 7+ hour binge getting both 4-bit Llama and Deepspeed (for Pygmalion) running on WSL2. It was... an experience. WSL has a lot of bugs.

It also didn't help that this was my first time ever using Linux (although not my first time with CLIs; I used to write Win32 CLI programs).

@oobabooga

Hopefully all this will become more streamlined in the future.


xNul commented Mar 19, 2023

I had to fix this as well and did it on Windows (no WSL). Here are my steps. Hopefully this saves someone else hours of work.

Windows (no WSL) LLaMA install/setup (normal/8bit/4bit)

Normal & 8bit LLaMA Setup

  1. Install Anaconda
  2. Install Git for Windows
  3. Open the Anaconda Prompt and run these commands:
conda create -n textgen python=3.10.9
conda activate textgen
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
pip install -r requirements.txt
  4. Follow the instructions here to fix the bitsandbytes library for Windows.

4bit LLaMA Setup

Run these commands:

conda install -c conda-forge cudatoolkit-dev
mkdir repositories
cd repositories
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
cd GPTQ-for-LLaMa
git reset --hard 468c47c01b4fe370616747b6d69a2d3f48bab5e4
python setup_cuda.py install

Note: The last command caused me a lot of problems until I found the first command, which installs the cudatoolkit. If it fails, installing Build Tools for Visual Studio 2019 (it has to be 2019) from here, checking "Desktop development with C++" during installation, and adding the cl compiler to the environment may help. The last command needs a C++ compiler and an NVIDIA CUDA compiler.
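If cl still isn't found after that, one common approach (an assumption on my part; the exact Build Tools path depends on where it was installed) is to initialize the MSVC environment in the same prompt before building:

call "C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Auxiliary\Build\vcvars64.bat"
where cl
python setup_cuda.py install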

Downloading LLaMA Models

  1. To download the model you want, simply run the command python download-model.py decapoda-research/llama-Xb-hf, where X is the size of the model you want to download, such as 7 or 13.
  2. Once downloaded, you have to fix the outdated config of the model. Open models/llama-Xb-hf/tokenizer_config.json and change LLaMATokenizer to LlamaTokenizer.
  3. If you only want to run a normal or 8bit model, you're done. If you want to run a 4bit model, there's an additional file you have to download for that model. There is no central location for all of these files at the moment. 7B can be found here. 13B can be found here. 30B can be found here. This one might work for 65B.
  4. Once downloaded, move the .pt file into models/llama-Xb-hf and you should be done.

Running the LLaMA Models

Normal LLaMA Model

python server.py --model llama-Xb-hf

8bit LLaMA Model

python server.py --model llama-Xb-hf --load-in-8bit

4bit LLaMA Model

python server.py --model llama-Xb-hf --gptq-bits 4


jllllll commented Mar 19, 2023

I would recommend changing the pytorch install instructions to:

conda install pytorch torchvision torchaudio pytorch-cuda=11.7 cuda-toolkit -c 'nvidia/label/cuda-11.7.0' -c pytorch -c nvidia

This will install PyTorch and cuda-toolkit, which comes with nvcc, while overriding all of the CUDA 12.0 packages that PyTorch tries to install.
You could even combine it with the environment creation:

conda create -n textgen pytorch torchvision torchaudio pytorch-cuda=11.7 cuda-toolkit -c 'nvidia/label/cuda-11.7.0' -c pytorch -c nvidia

It's also worth noting that conda-forge is a community-operated organization, and that you can get cuda-toolkit directly from NVIDIA with cuda-toolkit -c 'nvidia/label/cuda-11.7.0' or cuda-toolkit -c 'nvidia/label/cuda-11.7.1'.

I haven't tried it yet, but it is possible to install just nvcc with: cuda-nvcc -c 'nvidia/label/cuda-11.7.0'
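Whichever route you take, it's worth confirming that the nvcc you end up with matches the CUDA version PyTorch was built against (a quick check; 11.7 is just the expected value here):

nvcc --version
python -c "import torch; print(torch.version.cuda)"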

@cyperium

When doing python setup_cuda.py install I get:

(textgen) E:\oobabooga\text-generation-webui\repositories\GPTQ-for-LLaMa>python setup_cuda.py install
running install
C:\Users\cyper\miniconda3\envs\textgen\lib\site-packages\setuptools\command\install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
C:\Users\cyper\miniconda3\envs\textgen\lib\site-packages\setuptools\command\easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
running bdist_egg
running egg_info
writing quant_cuda.egg-info\PKG-INFO
writing dependency_links to quant_cuda.egg-info\dependency_links.txt
writing top-level names to quant_cuda.egg-info\top_level.txt
C:\Users\cyper\miniconda3\envs\textgen\lib\site-packages\torch\utils\cpp_extension.py:476: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
warnings.warn(msg.format('we could not find ninja.'))
reading manifest file 'quant_cuda.egg-info\SOURCES.txt'
writing manifest file 'quant_cuda.egg-info\SOURCES.txt'
installing library code to build\bdist.win-amd64\egg
running install_lib
running build_ext
C:\Users\cyper\miniconda3\envs\textgen\lib\site-packages\torch\utils\cpp_extension.py:358: UserWarning: Error checking compiler version for cl: [WinError 2] Det går inte att hitta filen
warnings.warn(f'Error checking compiler version for {compiler}: {error}')
error: [WinError 2] Det går inte att hitta filen

("Det går inte att hitta filen" is just Swedish for "cannot find the file".) I have set the environment path to the folder where cl.exe is located and have followed all the steps to the letter.

I'm going to try manually installing CUDA instead, using jllllll's advice; if that fails, I'm probably done trying to install the 4-bit functionality until an easier way is available. I've tried for several days now and it's just not worth the frustration.

@cyperium

I just installed using this method, since setup.py didn't work for me: #177 (comment). It's pre-assembled.

Got it to work using this method.

@BarfingLemurs

@oobabooga could you distribute the .whl file so we do not have to follow the whole process? This is for WSL on Windows, which is the method you are officially recommending.


jllllll commented Mar 19, 2023

@oobabooga could you distribute the .whl file so we do not have to follow the whole process? This is for WSL on Windows, which is the method you are officially recommending.

You can build the wheel yourself for future use with: python setup_cuda.py bdist_wheel
This will place the wheel in a dist folder next to setup_cuda.py.
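The wheel can then be installed into any matching environment with pip (the filename below is an example; use whatever appears in dist/):

pip install dist/quant_cuda-0.0.0-cp310-cp310-linux_x86_64.whl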

@BarfingLemurs

Thanks, but I am hoping to use other people's .whls, as it takes me a while to gather everything and follow the build process.


jllllll commented Mar 19, 2023

Also, if anyone using WSL starts having issues with bitsandbytes not finding libcuda.so, this is because of a bug in WSL where the Windows-level GPU drivers are not linked properly within WSL. The workaround is to run this before running server.py:

export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH
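To avoid re-running that in every new shell, it can be appended to the shell profile (a convenience sketch, assuming bash is the login shell):

echo 'export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc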


BarfingLemurs commented Mar 19, 2023

@jllllll do you have a .whl file?

I'm stuck on certain issues which I'm unsure about.

I followed through on a regular installation process on WSL, hoping the GPU could be detected. When I ran the build process, no GPU was detected, so I followed conda install pytorch torchvision torchaudio pytorch-cuda=11.7 cuda-toolkit -c 'nvidia/label/cuda-11.7.0' -c pytorch -c nvidia and restarted WSL.

I did this too. export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH

Sorry in advance about the weird paste; I don't know what it's doing.

(textgen) ubuntu@DESKTOP-LMFT8S4:~/text-generation-webui/repositories/GPTQ-for-LLaMa$ python setup_cuda.py bdist_wheel
No CUDA runtime is found, using CUDA_HOME='/home/ubuntu/miniconda3/envs/textgen'
running bdist_wheel
/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/utils/cpp_extension.py:476: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
  warnings.warn(msg.format('we could not find ninja.'))
running build
running build_ext
Traceback (most recent call last):
  File "/home/ubuntu/text-generation-webui/repositories/GPTQ-for-LLaMa/setup_cuda.py", line 4, in <module>
    setup(
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/__init__.py", line 87, in setup
    return distutils.core.setup(**attrs)
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 185, in setup
    return run_commands(dist)
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
    dist.run_commands()
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
    self.run_command(cmd)
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/dist.py", line 1208, in run_command
    super().run_command(command)
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/wheel/bdist_wheel.py", line 325, in run
    self.run_command("build")
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
    self.distribution.run_command(command)
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/dist.py", line 1208, in run_command
    super().run_command(command)
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/_distutils/command/build.py", line 132, in run
    self.run_command(cmd_name)
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
    self.distribution.run_command(command)
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/dist.py", line 1208, in run_command
    super().run_command(command)
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/command/build_ext.py", line 84, in run
    _build_ext.run(self)
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 346, in run
    self.build_extensions()
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 485, in build_extensions
    compiler_name, compiler_version = self._check_abi()
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 869, in _check_abi
    _, version = get_compiler_abi_compatibility_and_version(compiler)
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 337, in get_compiler_abi_compatibility_and_version
    if not check_compiler_ok_for_platform(compiler):
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 291, in check_compiler_ok_for_platform
    which = subprocess.check_output(['which', compiler], stderr=subprocess.STDOUT)
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/subprocess.py", line 421, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['which', 'g++']' returned non-zero exit status 1.
(textgen) ubuntu@DESKTOP-LMFT8S4:~/text-generation-webui/repositories/GPTQ-for-LLaMa$

Normal inference with just server.py also won't run for me, on commit 4bafe45a517bbe561e4a39a2582fa9af80487194:

(textgen) ubuntu@DESKTOP-LMFT8S4:~/text-generation-webui$ python server.py
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/requests/compat.py", line 11, in <module>
    import chardet
ModuleNotFoundError: No module named 'chardet'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/text-generation-webui/server.py", line 10, in <module>
    import gradio as gr
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/gradio/__init__.py", line 3, in <module>
    import gradio.components as components
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/gradio/components.py", line 34, in <module>
    from gradio import media_data, processing_utils, utils
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/gradio/processing_utils.py", line 19, in <module>
    import requests
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/requests/__init__.py", line 45, in <module>
    from .exceptions import RequestsDependencyWarning
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/requests/exceptions.py", line 9, in <module>
    from .compat import JSONDecodeError as CompatJSONDecodeError
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/requests/compat.py", line 13, in <module>
    import charset_normalizer as chardet
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/charset_normalizer/__init__.py", line 23, in <module>
    from charset_normalizer.api import from_fp, from_path, from_bytes, normalize
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/charset_normalizer/api.py", line 10, in <module>
    from charset_normalizer.md import mess_ratio
  File "charset_normalizer/md.py", line 5, in <module>
ImportError: cannot import name 'COMMON_SAFE_ASCII_CHARACTERS' from 'charset_normalizer.constant' (/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/charset_normalizer/constant.py)


jllllll commented Mar 19, 2023

@jllllll do you have a .whl file?

I'm stuck on certain issues which I'm unsure about.

I followed through on a regular installation process on WSL, hoping the GPU could be detected. When I ran the build process, no GPU was detected, so I followed conda install pytorch torchvision torchaudio pytorch-cuda=11.7 cuda-toolkit -c 'nvidia/label/cuda-11.7.0' -c pytorch -c nvidia and restarted WSL.

I did this too. export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH

Normal inference with just server.py also won't run for me, on commit 4bafe45a517bbe561e4a39a2582fa9af80487194

Here is a freshly compiled wheel:
quant_cuda-0.0.0-cp310-cp310-linux_x86_64.whl.zip
Make sure that you performed both of the pip install -r requirements.txt steps. You may need to install CUDA into WSL using these commands:

wget https://developer.download.nvidia.com/compute/cuda/11.7.1/local_installers/cuda_11.7.1_515.65.01_linux.run
sudo sh cuda_11.7.1_515.65.01_linux.run

Make sure not to use the driver installation option; that isn't for WSL.
It also wouldn't hurt to try restarting WSL manually with wsl --shutdown in PowerShell or cmd.
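After a runfile install like that, the toolkit typically lands in /usr/local/cuda-11.7, so the build shell may also need these (installer-default paths; adjust if you chose a different location):

export CUDA_HOME=/usr/local/cuda-11.7
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH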

@BarfingLemurs

@jllllll I really appreciate that, thanks.

oobabooga unpinned this issue Mar 19, 2023

NenadZG commented Mar 20, 2023

Thanks for the solution, now setup_cuda.py works, but when I try to load the model I get this error:

Loading llama-7b-hf...
Traceback (most recent call last):
  File "C:\Users\X\text-generation-webui\server.py", line 199, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "C:\Users\X\text-generation-webui\modules\models.py", line 94, in load_model
    model = load_quantized(model_name)
  File "C:\Users\X\text-generation-webui\modules\GPTQ_loader.py", line 55, in load_quantized
    model = load_quant(str(path_to_model), str(pt_path), shared.args.gptq_bits)
TypeError: load_quant() missing 1 required positional argument: 'groupsize'


trrahul commented Mar 20, 2023

Thanks for the solution, now setup_cuda.py works, but when I try to load the model I get this error:

Loading llama-7b-hf...
Traceback (most recent call last):
  File "C:\Users\X\text-generation-webui\server.py", line 199, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "C:\Users\X\text-generation-webui\modules\models.py", line 94, in load_model
    model = load_quantized(model_name)
  File "C:\Users\X\text-generation-webui\modules\GPTQ_loader.py", line 55, in load_quantized
    model = load_quant(str(path_to_model), str(pt_path), shared.args.gptq_bits)
TypeError: load_quant() missing 1 required positional argument: 'groupsize'

I am also getting the same error

@MarvinLong

Also, if anyone using wsl starts having issues with bitsandbytes not finding libcuda.so, this is because of a bug in wsl where Windows-level gpu drivers are not linked properly within wsl. The workaround is to run this before running server.py:

export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH

Thanks, this helped me a lot. I had been stuck on this problem for a day now.


ncoder commented Mar 20, 2023

Thanks for the solution, now setup_cuda.py works, but when I try to load the model I get this error:

Loading llama-7b-hf...
Traceback (most recent call last):
  File "C:\Users\X\text-generation-webui\server.py", line 199, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "C:\Users\X\text-generation-webui\modules\models.py", line 94, in load_model
    model = load_quantized(model_name)
  File "C:\Users\X\text-generation-webui\modules\GPTQ_loader.py", line 55, in load_quantized
    model = load_quant(str(path_to_model), str(pt_path), shared.args.gptq_bits)
TypeError: load_quant() missing 1 required positional argument: 'groupsize'

Got the same thing. I added a '-1' argument to the load_quant() call for the group size. I don't know exactly what it does.
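For anyone trying the same workaround, the call in modules/GPTQ_loader.py (line 55 in the traceback above) ends up looking roughly like this (a hypothetical edit; the trailing -1 is the group-size value being passed):

model = load_quant(str(path_to_model), str(pt_path), shared.args.gptq_bits, -1)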

But then you get this error:

Error(s) in loading state_dict for LlamaForCausalLM:
	Missing key(s) in state_dict: "model.layers.0.self_attn.q_proj.qzeros", "model.layers.0.self_attn.k_proj.qzeros", "model.layers.0.self_attn.v_proj.qzeros", "model.layers.0.self_attn.o_proj.qzeros", "model.layers.0.mlp.gate_proj.qzeros", "model.layers.0.mlp.down_proj.qzeros", "model.layers.0.mlp.up_proj.qzeros", "model.layers.1.self_attn.q_proj.qzeros", "model.layers.1.self_attn.k_proj.qzeros", "model.layers.1.self_attn.v_proj.qzeros", "model.layers.1.self_attn.o_proj.qzeros", 
...

Looks like we're running the wrong version of GPTQ for the data we have.

@gianfra-t

To solve the load_quant error, which is indeed a problem with a new version of GPTQ, you need to roll back. See: #445 (comment)

Also, in my case I had to change the name of the tokenizer in tokenizer_config.json to "tokenizer_class": "LlamaTokenizer". That is, I think, due to an update of the class in the transformers repo.
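Concretely, the rollback plus tokenizer fix amounts to something like this (the commit hash is the one from xNul's instructions above; the exact commit to pin may have changed since):

cd repositories/GPTQ-for-LLaMa
git reset --hard 468c47c01b4fe370616747b6d69a2d3f48bab5e4
python setup_cuda.py install
# then, in models/llama-Xb-hf/tokenizer_config.json, set "tokenizer_class": "LlamaTokenizer"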


NenadZG commented Mar 21, 2023

Thank you, the problem was a new version of GPTQ, as you said. I rolled back as in #445 (comment). After that I got this error:
ImportError: cannot import name 'LLaMAConfig' from 'transformers'.
Then I deleted my environment and reinstalled everything, and now it works!

The whole process of installation I did was:

conda create -n textgen
conda activate textgen
conda install torchvision torchaudio pytorch-cuda=11.7 git -c pytorch -c nvidia
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
pip install -r requirements.txt
conda install -c conda-forge cudatoolkit-dev
mkdir repositories
cd repositories
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
cd GPTQ-for-LLaMa
git reset --hard 468c47c01b4fe370616747b6d69a2d3f48bab5e4
python setup_cuda.py install

After that I changed "LLaMATokenizer" to "LlamaTokenizer" in tokenizer_config.json file.


xNul commented Mar 21, 2023

Thanks @NenadZG. I've updated my instructions with your GPTQ rollback fix.


ncoder commented Mar 21, 2023 via email


xNul commented Mar 21, 2023

Good to know that's possible. I'll update my instructions once all versions of the model have been requantized.


jllllll commented Apr 2, 2023

The repo has changed. Which branch should we use now?

The cuda branch. However, I would recommend using oobabooga's fork for the time being: #708 (comment)

The webui is currently not updated to work with the latest version of GPTQ-for-LLaMa.


benkuku commented May 16, 2023

Thanks @LTSarc, restarting the computer indeed worked. For better reproducibility, here is what I did to get 4-bit working again:

  1. Set up a clean textgen environment following #400 (comment) ("undefined symbol: cget_col_row_stats / 8-bit not working / libsbitsandbytes_cpu.so not found").
  2. Run this command that takes 10 minutes to finish without any progress bar:
conda activate textgen
conda install -c conda-forge cudatoolkit-dev
  3. Restart the computer/WSL.
  4. Remove the existing GPTQ-for-LLaMa folder:
rm -rf repositories/GPTQ-for-LLaMa/
  5. Clone GPTQ-for-LLaMa again:
cd repositories
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
  6. Install GPTQ-for-LLaMa:
cd GPTQ-for-LLaMa
python setup_cuda.py install
  7. 4-bit now works:
python server.py --listen --model llama-7b --gptq-bits 4

great!


github-actions bot commented Dec 3, 2023

This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.

github-actions bot closed this as completed Dec 3, 2023