Multiple errors while compiling the kernel #11

Closed
athu16 opened this issue Mar 9, 2023 · 34 comments

@athu16

athu16 commented Mar 9, 2023

Hello, while trying to run `python setup_cuda.py install`, I get this error:

(venv) C:\Users\Username\Documents\GitHub\GPTQ-for-LLaMa>python setup_cuda.py install
running install
C:\Users\Username\Documents\GitHub\GPTQ-for-LLaMa\venv\lib\site-packages\setuptools\command\install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
C:\Users\Username\Documents\GitHub\GPTQ-for-LLaMa\venv\lib\site-packages\setuptools\command\easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
running bdist_egg
running egg_info
writing quant_cuda.egg-info\PKG-INFO
writing dependency_links to quant_cuda.egg-info\dependency_links.txt
writing top-level names to quant_cuda.egg-info\top_level.txt
C:\Users\Username\Documents\GitHub\GPTQ-for-LLaMa\venv\lib\site-packages\torch\utils\cpp_extension.py:476: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
  warnings.warn(msg.format('we could not find ninja.'))
reading manifest file 'quant_cuda.egg-info\SOURCES.txt'
writing manifest file 'quant_cuda.egg-info\SOURCES.txt'
installing library code to build\bdist.win-amd64\egg
running install_lib
running build_ext
C:\Users\Username\Documents\GitHub\GPTQ-for-LLaMa\venv\lib\site-packages\torch\utils\cpp_extension.py:358: UserWarning: Error checking compiler version for cl: [WinError 2] The system cannot find the file specified
  warnings.warn(f'Error checking compiler version for {compiler}: {error}')
C:\Users\Username\Documents\GitHub\GPTQ-for-LLaMa\venv\lib\site-packages\torch\utils\cpp_extension.py:387: UserWarning: The detected CUDA version (11.4) has a minor version mismatch with the version that was used to compile PyTorch (11.7). Most likely this shouldn't be a problem.
  warnings.warn(CUDA_MISMATCH_WARN.format(cuda_str_version, torch.version.cuda))
building 'quant_cuda' extension
"C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.29.30133\bin\HostX86\x64\cl.exe" /c /nologo /O2 /W3 /GL /DNDEBUG /MD -IC:\Users\Username\Documents\GitHub\GPTQ-for-LLaMa\venv\lib\site-packages\torch\include -IC:\Users\Username\Documents\GitHub\GPTQ-for-LLaMa\venv\lib\site-packages\torch\include\torch\csrc\api\include -IC:\Users\Username\Documents\GitHub\GPTQ-for-LLaMa\venv\lib\site-packages\torch\include\TH -IC:\Users\Username\Documents\GitHub\GPTQ-for-LLaMa\venv\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4\include" -IC:\Users\Username\Documents\GitHub\GPTQ-for-LLaMa\venv\include "-IC:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2800.0_x64__qbz5n2kfra8p0\include" "-IC:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2800.0_x64__qbz5n2kfra8p0\Include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.29.30133\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\cppwinrt" /EHsc /Tpquant_cuda.cpp /Fobuild\temp.win-amd64-cpython-310\Release\quant_cuda.obj /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /EHsc -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=quant_cuda -D_GLIBCXX_USE_CXX11_ABI=0
quant_cuda.cpp
C:\Users\Username\Documents\GitHub\GPTQ-for-LLaMa\venv\lib\site-packages\torch\include\c10/macros/Macros.h(138): warning C4067: unexpected tokens following preprocessor directive - expected a newline

Then after a long list of errors, I get this at the end:

"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4\bin\nvcc" -c quant_cuda_kernel.cu -o build\temp.win-amd64-cpython-310\Release\quant_cuda_kernel.obj -IC:\Users\Username\Documents\GitHub\GPTQ-for-LLaMa\venv\lib\site-packages\torch\include -IC:\Users\Username\Documents\GitHub\GPTQ-for-LLaMa\venv\lib\site-packages\torch\include\torch\csrc\api\include -IC:\Users\Username\Documents\GitHub\GPTQ-for-LLaMa\venv\lib\site-packages\torch\include\TH -IC:\Users\Username\Documents\GitHub\GPTQ-for-LLaMa\venv\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4\include" -IC:\Users\Username\Documents\GitHub\GPTQ-for-LLaMa\venv\include "-IC:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2800.0_x64__qbz5n2kfra8p0\include" "-IC:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2800.0_x64__qbz5n2kfra8p0\Include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.29.30133\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\cppwinrt" -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcompiler /EHsc -Xcompiler /wd4190 -Xcompiler /wd4018 -Xcompiler /wd4275 -Xcompiler /wd4267 -Xcompiler /wd4244 -Xcompiler /wd4251 -Xcompiler /wd4819 -Xcompiler /MD -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=quant_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_75,code=compute_75 -gencode=arch=compute_75,code=sm_75 --use-local-env
quant_cuda_kernel.cu
C:/Users/Username/Documents/GitHub/GPTQ-for-LLaMa/venv/lib/site-packages/torch/include\c10/macros/Macros.h(138): warning C4067: unexpected tokens following preprocessor directive - expected a newline
C:\Users\Username\Documents\GitHub\GPTQ-for-LLaMa\venv\lib\site-packages\torch\include\pybind11\cast.h(624): error: too few arguments for template template parameter "Tuple"
          detected during instantiation of class "pybind11::detail::tuple_caster<Tuple, Ts...> [with Tuple=std::pair, Ts=<T1, T2>]"
(721): here

C:\Users\Username\Documents\GitHub\GPTQ-for-LLaMa\venv\lib\site-packages\torch\include\pybind11\cast.h(717): error: too few arguments for template template parameter "Tuple"
          detected during instantiation of class "pybind11::detail::tuple_caster<Tuple, Ts...> [with Tuple=std::pair, Ts=<T1, T2>]"
(721): here

2 errors detected in the compilation of "quant_cuda_kernel.cu".
error: command 'C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v11.4\\bin\\nvcc.exe' failed with exit code 1

Any idea what could be causing this? I've tried installing CUDA Toolkit 11.3 and Torch 1.12.1, but they give the same error.
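
For reference, a minimal way to confirm the version mismatch the build log warns about (a diagnostic sketch, assuming torch imports cleanly):

```python
# Diagnostic sketch: print the CUDA version PyTorch was compiled against,
# to compare with what `nvcc --version` reports (11.4 vs. 11.7 per the
# warning above). A minor mismatch is usually tolerated, but it's worth
# ruling out when the build fails.
import torch

print(torch.__version__)         # installed PyTorch build
print(torch.version.cuda)        # CUDA version PyTorch was compiled with
print(torch.cuda.is_available()) # whether the runtime can see the GPU
```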

@xiscoding

xiscoding commented Mar 9, 2023

Were you able to find the torch/all.h or the torch/python.h files?
And what IDE do you use, if any?

@lxe

lxe commented Mar 9, 2023

Mine fails a lot less verbosely on Windows:

(chatbots) PS C:\Users\lxe\llama-webui-gptq\GPTQ-for-LLaMa> python setup_cuda.py install --verbose
running install
C:\Users\lxe\miniconda3\envs\chatbots\lib\site-packages\setuptools\command\install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
C:\Users\lxe\miniconda3\envs\chatbots\lib\site-packages\setuptools\command\easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
running bdist_egg
running egg_info
writing quant_cuda.egg-info\PKG-INFO
writing dependency_links to quant_cuda.egg-info\dependency_links.txt
writing top-level names to quant_cuda.egg-info\top_level.txt
reading manifest file 'quant_cuda.egg-info\SOURCES.txt'
writing manifest file 'quant_cuda.egg-info\SOURCES.txt'
installing library code to build\bdist.win-amd64\egg
running install_lib
running build_ext
error: [WinError 2] The system cannot find the file specified

All the compilers are there:

(chatbots) PS C:\Users\lxe\llama-webui-gptq\GPTQ-for-LLaMa> cl
Microsoft (R) C/C++ Optimizing Compiler Version 19.29.30147 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

usage: cl [ option... ] filename... [ /link linkoption... ]
(chatbots) PS C:\Users\lxe\llama-webui-gptq\GPTQ-for-LLaMa> nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_19:00:59_Pacific_Daylight_Time_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0
(chatbots) PS C:\Users\lxe\llama-webui-gptq\GPTQ-for-LLaMa> python -V
Python 3.10.9
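
One way to narrow this down (a diagnostic sketch): `build_ext` spawns `cl.exe` and `nvcc.exe` as subprocesses, so `[WinError 2]` usually means they aren't resolvable from the Python process's own PATH, even if they work in the surrounding shell:

```python
# Diagnostic sketch: check whether the compilers are visible to this Python
# process. A result of None for "cl" is consistent with build_ext failing
# with [WinError 2] despite cl working at the prompt.
import shutil

print(shutil.which("cl"))
print(shutil.which("nvcc"))
```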

@lxe

lxe commented Mar 9, 2023

EDIT: This comment is linked from elsewhere. Here's a more coherent guide: https://gist.github.com/lxe/82eb87db25fdb75b92fa18a6d494ee3c


I had to downgrade cuda and torch and was able to compile. Here's my full process on Windows:

  1. Install [Build Tools for Visual Studio 2019](https://visualstudio.microsoft.com/downloads/#remote-tools-for-visual-studio-2022) (has to be 2019)
  2. Install [miniconda](https://docs.conda.io/en/latest/miniconda.html)
  3. Open the "x64 Native Tools Command Prompt"
  4. Activate conda via `powershell -ExecutionPolicy ByPass -NoExit -Command "& 'C:\Users\lxe\miniconda3\shell\condabin\conda-hook.ps1' ; conda activate 'C:\Users\lxe\miniconda3' "`
  5. `conda create -n gptq`
  6. `conda activate gptq`
  7. `conda install cuda -c nvidia/label/cuda-11.3.0 -c nvidia/label/cuda-11.3.1`
  8. `conda install pip`
  9. `git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa.git`
  10. `git clone https://github.com/zphang/transformers.git`
  11. `pip install ./transformers`
  12. `pip install torch==1.12+cu113 -f https://download.pytorch.org/whl/torch_stable.html`
  13. `cd GPTQ-for-LLaMa`
  14. `$env:DISTUTILS_USE_SDK=1`
  15. `python setup_cuda.py install`

When using the webui, make sure it's in the same env. If it overwrites torch, you'll have to reinstall it manually.
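
One way to sanity-check the result afterwards (a sketch, assuming the install went into the active env): the extension is registered under the name `quant_cuda`, per `TORCH_EXTENSION_NAME` in the build output above, so it should import cleanly:

```python
# Sanity-check sketch: torch must be imported first so the extension can
# resolve its shared libraries; a clean import means the kernel compiled
# and installed into this environment.
import torch
import quant_cuda

print(quant_cuda.__file__)
```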

@xiscoding

RuntimeError: The current installed version of g++ (11.3.0) is greater than the maximum required version by CUDA 11.3 (10.0.0). Please make sure to use an adequate version of g++ (>=5.0.0, <=10.0.0).

This is forcing me to downgrade my g++. Is there a way to do that inside a conda environment that you know of?
`sudo apt-get remove gcc g++`: this is system-wide, so not ideal.
`conda install -c conda-forge gcc=9`: I tried setting gcc=9 and it installed, but I still got the error.
I guess on Windows you have Visual Studio, so you probably don't need to do this. It looks promising, thank you!
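
A sketch of one possible route, with the caveat that both assumptions here are unverified: that torch's extension builder reads the C++ compiler from the `CXX` environment variable, and that conda-forge's gcc=9 installs the binary under its usual prefixed name:

```python
# Hypothetical workaround sketch: point the build at the conda-installed
# g++ 9 instead of the system g++ 11, without removing anything system-wide.
# Both the CXX mechanism and the prefixed binary name are assumptions.
import os
import subprocess

cxx = os.path.join(os.environ["CONDA_PREFIX"], "bin", "x86_64-conda-linux-gnu-g++")
os.environ["CXX"] = cxx  # exporting CXX=... in the shell has the same effect
print(subprocess.run([cxx, "--version"], capture_output=True, text=True).stdout)
```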

@xiscoding

I think the real issue is not properly using/installing LibTorch. Did you install that successfully? If so, how?

@g0hm4

g0hm4 commented Mar 9, 2023

I had to downgrade cuda and torch and was able to compile. Here's my full process on Windows: [steps quoted in full above]

I'm getting this error even explicitly following those steps. No idea what's causing it:

error: too few arguments for template template parameter "Tuple" detected during instantiation of class "pybind11::detail::tuple_caster<Tuple, Ts...> [with Tuple=std::pair, Ts=<T1, T2>]" (1507): here

@athu16
Author

athu16 commented Mar 10, 2023

I had to downgrade cuda and torch and was able to compile. Here's my full process on windows:

Thanks a lot for this! I still got a lot of errors during compilation, but at the end it said `Finished processing dependencies for quant-cuda==0.0.0`. Does that mean it built successfully?

@qwopqwop200
Owner

qwopqwop200 commented Mar 10, 2023

yes

@underlines

underlines commented Mar 11, 2023

@lxe

git clone https://github.com/zphang/transformers.git

This repo only contains a readme.md:

March 6th, 2019:

Repository has been moved to https://github.com/zphang/bert_on_stilts

Should we use the one mentioned in the readme.md, which is also from March 2019? I doubt it.
If not, which transformers repo should we install? The live one, or the one with the llama push, via `git clone --branch llama_push https://github.com/zphang/transformers.git`?

@iChristGit

iChristGit commented Mar 11, 2023

I had to downgrade cuda and torch and was able to compile. Here's my full process on Windows: [steps quoted in full above]

Followed the steps and got `Finished processing dependencies for quant-cuda==0.0.0`, but when running the webui I get:

Starting the web UI...
Loading the extension "gallery"... Ok.
Loading llama-7b...
CUDA extension not installed.
Loading model ...
Traceback (most recent call last):
File "D:\MachineLearning\TextWebui\text-generation-webui\server.py", line 194, in
shared.model, shared.tokenizer = load_model(shared.model_name)
File "D:\MachineLearning\TextWebui\text-generation-webui\modules\models.py", line 119, in load_model
model = load_quant(path_to_model, Path(f"models/{pt_model}"), 4)
File "D:\MachineLearning\TextWebui\text-generation-webui\repositories\GPTQ-for-LLaMa\llama.py", line 241, in load_quant
model.load_state_dict(torch.load(checkpoint))
File "D:\MachineLearning\TextWebui\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1671, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LLaMAForCausalLM:
Missing key(s) in state_dict: "model.decoder.embed_tokens.weight", "model.decoder.layers.0.self_attn.q_proj.zeros", "model.decoder.layers.0.self_attn.q_proj.scales", "model.decoder.layers.0.self_attn.q_proj.bias", "model.decoder.layers.0.self_attn.q_proj.qweight", [the same zeros/scales/bias/qweight keys for k_proj, v_proj, o_proj and feed_forward.w1/w2/w3, plus attention_norm.weight and ffn_norm.weight, repeated for every layer from model.decoder.layers.0 through model.decoder.layers.31], "model.decoder.norm.weight".
Unexpected key(s) in state_dict: "model.embed_tokens.weight", "model.layers.0.self_attn.q_proj.zeros", "model.layers.0.self_attn.q_proj.scales", "model.layers.0.self_attn.q_proj.bias", "model.layers.0.self_attn.q_proj.qweight", "model.layers.0.self_attn.k_proj.zeros", "model.layers.0.self_attn.k_proj.scales", "model.layers.0.self_attn.k_proj.bias", "model.layers.0.self_attn.k_proj.qweight", "model.layers.0.self_attn.v_proj.zeros", "model.layers.0.self_attn.v_proj.scales", "model.layers.0.self_attn.v_proj.bias", "model.layers.0.self_attn.v_proj.qweight", "model.layers.0.self_attn.o_proj.zeros", "model.layers.0.self_attn.o_proj.scales", "model.layers.0.self_attn.o_proj.bias", "model.layers.0.self_attn.o_proj.qweight", "model.layers.0.self_attn.rotary_emb.inv_freq", "model.layers.0.mlp.gate_proj.zeros", "model.layers.0.mlp.gate_proj.scales", "model.layers.0.mlp.gate_proj.bias", "model.layers.0.mlp.gate_proj.qweight", "model.layers.0.mlp.down_proj.zeros", "model.layers.0.mlp.down_proj.scales", "model.layers.0.mlp.down_proj.bias", "model.layers.0.mlp.down_proj.qweight", "model.layers.0.mlp.up_proj.zeros", "model.layers.0.mlp.up_proj.scales", "model.layers.0.mlp.up_proj.bias", "model.layers.0.mlp.up_proj.qweight", "model.layers.0.input_layernorm.weight", "model.layers.0.post_attention_layernorm.weight", "model.layers.1.self_attn.q_proj.zeros", "model.layers.1.self_attn.q_proj.scales", "model.layers.1.self_attn.q_proj.bias", "model.layers.1.self_attn.q_proj.qweight", "model.layers.1.self_attn.k_proj.zeros", "model.layers.1.self_attn.k_proj.scales", "model.layers.1.self_attn.k_proj.bias", "model.layers.1.self_attn.k_proj.qweight", "model.layers.1.self_attn.v_proj.zeros", "model.layers.1.self_attn.v_proj.scales", "model.layers.1.self_attn.v_proj.bias", "model.layers.1.self_attn.v_proj.qweight", "model.layers.1.self_attn.o_proj.zeros", "model.layers.1.self_attn.o_proj.scales", "model.layers.1.self_attn.o_proj.bias", "model.layers.1.self_attn.o_proj.qweight", "model.layers.1.self_attn.rotary_emb.inv_freq", "model.layers.1.mlp.gate_proj.zeros", "model.layers.1.mlp.gate_proj.scales", "model.layers.1.mlp.gate_proj.bias", "model.layers.1.mlp.gate_proj.qweight", "model.layers.1.mlp.down_proj.zeros", "model.layers.1.mlp.down_proj.scales", "model.layers.1.mlp.down_proj.bias", "model.layers.1.mlp.down_proj.qweight", "model.layers.1.mlp.up_proj.zeros", "model.layers.1.mlp.up_proj.scales", "model.layers.1.mlp.up_proj.bias", "model.layers.1.mlp.up_proj.qweight", "model.layers.1.input_layernorm.weight", "model.layers.1.post_attention_layernorm.weight", "model.layers.2.self_attn.q_proj.zeros", "model.layers.2.self_attn.q_proj.scales", "model.layers.2.self_attn.q_proj.bias", "model.layers.2.self_attn.q_proj.qweight", "model.layers.2.self_attn.k_proj.zeros", "model.layers.2.self_attn.k_proj.scales", "model.layers.2.self_attn.k_proj.bias", "model.layers.2.self_attn.k_proj.qweight", "model.layers.2.self_attn.v_proj.zeros", "model.layers.2.self_attn.v_proj.scales", "model.layers.2.self_attn.v_proj.bias", "model.layers.2.self_attn.v_proj.qweight", "model.layers.2.self_attn.o_proj.zeros", "model.layers.2.self_attn.o_proj.scales", "model.layers.2.self_attn.o_proj.bias", "model.layers.2.self_attn.o_proj.qweight", "model.layers.2.self_attn.rotary_emb.inv_freq", "model.layers.2.mlp.gate_proj.zeros", "model.layers.2.mlp.gate_proj.scales", "model.layers.2.mlp.gate_proj.bias", "model.layers.2.mlp.gate_proj.qweight", "model.layers.2.mlp.down_proj.zeros", "model.layers.2.mlp.down_proj.scales", 
"model.layers.2.mlp.down_proj.bias", "model.layers.2.mlp.down_proj.qweight", "model.layers.2.mlp.up_proj.zeros", "model.layers.2.mlp.up_proj.scales", "model.layers.2.mlp.up_proj.bias", "model.layers.2.mlp.up_proj.qweight", "model.layers.2.input_layernorm.weight", "model.layers.2.post_attention_layernorm.weight", "model.layers.3.self_attn.q_proj.zeros", "model.layers.3.self_attn.q_proj.scales", "model.layers.3.self_attn.q_proj.bias", "model.layers.3.self_attn.q_proj.qweight", "model.layers.3.self_attn.k_proj.zeros", "model.layers.3.self_attn.k_proj.scales", "model.layers.3.self_attn.k_proj.bias", "model.layers.3.self_attn.k_proj.qweight", "model.layers.3.self_attn.v_proj.zeros", "model.layers.3.self_attn.v_proj.scales", "model.layers.3.self_attn.v_proj.bias", "model.layers.3.self_attn.v_proj.qweight", "model.layers.3.self_attn.o_proj.zeros", "model.layers.3.self_attn.o_proj.scales", "model.layers.3.self_attn.o_proj.bias", "model.layers.3.self_attn.o_proj.qweight", "model.layers.3.self_attn.rotary_emb.inv_freq", "model.layers.3.mlp.gate_proj.zeros", "model.layers.3.mlp.gate_proj.scales", "model.layers.3.mlp.gate_proj.bias", "model.layers.3.mlp.gate_proj.qweight", "model.layers.3.mlp.down_proj.zeros", "model.layers.3.mlp.down_proj.scales", "model.layers.3.mlp.down_proj.bias", "model.layers.3.mlp.down_proj.qweight", "model.layers.3.mlp.up_proj.zeros", "model.layers.3.mlp.up_proj.scales", "model.layers.3.mlp.up_proj.bias", "model.layers.3.mlp.up_proj.qweight", "model.layers.3.input_layernorm.weight", "model.layers.3.post_attention_layernorm.weight", "model.layers.4.self_attn.q_proj.zeros", "model.layers.4.self_attn.q_proj.scales", "model.layers.4.self_attn.q_proj.bias", "model.layers.4.self_attn.q_proj.qweight", "model.layers.4.self_attn.k_proj.zeros", "model.layers.4.self_attn.k_proj.scales", "model.layers.4.self_attn.k_proj.bias", "model.layers.4.self_attn.k_proj.qweight", "model.layers.4.self_attn.v_proj.zeros", "model.layers.4.self_attn.v_proj.scales", "model.layers.4.self_attn.v_proj.bias", "model.layers.4.self_attn.v_proj.qweight", "model.layers.4.self_attn.o_proj.zeros", "model.layers.4.self_attn.o_proj.scales", "model.layers.4.self_attn.o_proj.bias", "model.layers.4.self_attn.o_proj.qweight", "model.layers.4.self_attn.rotary_emb.inv_freq", "model.layers.4.mlp.gate_proj.zeros", "model.layers.4.mlp.gate_proj.scales", "model.layers.4.mlp.gate_proj.bias", "model.layers.4.mlp.gate_proj.qweight", "model.layers.4.mlp.down_proj.zeros", "model.layers.4.mlp.down_proj.scales", "model.layers.4.mlp.down_proj.bias", "model.layers.4.mlp.down_proj.qweight", "model.layers.4.mlp.up_proj.zeros", "model.layers.4.mlp.up_proj.scales", "model.layers.4.mlp.up_proj.bias", "model.layers.4.mlp.up_proj.qweight", "model.layers.4.input_layernorm.weight", "model.layers.4.post_attention_layernorm.weight", "model.layers.5.self_attn.q_proj.zeros", "model.layers.5.self_attn.q_proj.scales", "model.layers.5.self_attn.q_proj.bias", "model.layers.5.self_attn.q_proj.qweight", "model.layers.5.self_attn.k_proj.zeros", "model.layers.5.self_attn.k_proj.scales", "model.layers.5.self_attn.k_proj.bias", "model.layers.5.self_attn.k_proj.qweight", "model.layers.5.self_attn.v_proj.zeros", "model.layers.5.self_attn.v_proj.scales", "model.layers.5.self_attn.v_proj.bias", "model.layers.5.self_attn.v_proj.qweight", "model.layers.5.self_attn.o_proj.zeros", "model.layers.5.self_attn.o_proj.scales", "model.layers.5.self_attn.o_proj.bias", "model.layers.5.self_attn.o_proj.qweight", "model.layers.5.self_attn.rotary_emb.inv_freq", 
"model.layers.5.mlp.gate_proj.zeros", "model.layers.5.mlp.gate_proj.scales", "model.layers.5.mlp.gate_proj.bias", "model.layers.5.mlp.gate_proj.qweight", "model.layers.5.mlp.down_proj.zeros", "model.layers.5.mlp.down_proj.scales", "model.layers.5.mlp.down_proj.bias", "model.layers.5.mlp.down_proj.qweight", "model.layers.5.mlp.up_proj.zeros", "model.layers.5.mlp.up_proj.scales", "model.layers.5.mlp.up_proj.bias", "model.layers.5.mlp.up_proj.qweight", "model.layers.5.input_layernorm.weight", "model.layers.5.post_attention_layernorm.weight", "model.layers.6.self_attn.q_proj.zeros", "model.layers.6.self_attn.q_proj.scales", "model.layers.6.self_attn.q_proj.bias", "model.layers.6.self_attn.q_proj.qweight", "model.layers.6.self_attn.k_proj.zeros", "model.layers.6.self_attn.k_proj.scales", "model.layers.6.self_attn.k_proj.bias", "model.layers.6.self_attn.k_proj.qweight", "model.layers.6.self_attn.v_proj.zeros", "model.layers.6.self_attn.v_proj.scales", "model.layers.6.self_attn.v_proj.bias", "model.layers.6.self_attn.v_proj.qweight", "model.layers.6.self_attn.o_proj.zeros", "model.layers.6.self_attn.o_proj.scales", "model.layers.6.self_attn.o_proj.bias", "model.layers.6.self_attn.o_proj.qweight", "model.layers.6.self_attn.rotary_emb.inv_freq", "model.layers.6.mlp.gate_proj.zeros", "model.layers.6.mlp.gate_proj.scales", "model.layers.6.mlp.gate_proj.bias", "model.layers.6.mlp.gate_proj.qweight", "model.layers.6.mlp.down_proj.zeros", "model.layers.6.mlp.down_proj.scales", "model.layers.6.mlp.down_proj.bias", "model.layers.6.mlp.down_proj.qweight", "model.layers.6.mlp.up_proj.zeros", "model.layers.6.mlp.up_proj.scales", "model.layers.6.mlp.up_proj.bias", "model.layers.6.mlp.up_proj.qweight", "model.layers.6.input_layernorm.weight", "model.layers.6.post_attention_layernorm.weight", "model.layers.7.self_attn.q_proj.zeros", "model.layers.7.self_attn.q_proj.scales", "model.layers.7.self_attn.q_proj.bias", "model.layers.7.self_attn.q_proj.qweight", "model.layers.7.self_attn.k_proj.zeros", "model.layers.7.self_attn.k_proj.scales", "model.layers.7.self_attn.k_proj.bias", "model.layers.7.self_attn.k_proj.qweight", "model.layers.7.self_attn.v_proj.zeros", "model.layers.7.self_attn.v_proj.scales", "model.layers.7.self_attn.v_proj.bias", "model.layers.7.self_attn.v_proj.qweight", "model.layers.7.self_attn.o_proj.zeros", "model.layers.7.self_attn.o_proj.scales", "model.layers.7.self_attn.o_proj.bias", "model.layers.7.self_attn.o_proj.qweight", "model.layers.7.self_attn.rotary_emb.inv_freq", "model.layers.7.mlp.gate_proj.zeros", "model.layers.7.mlp.gate_proj.scales", "model.layers.7.mlp.gate_proj.bias", "model.layers.7.mlp.gate_proj.qweight", "model.layers.7.mlp.down_proj.zeros", "model.layers.7.mlp.down_proj.scales", "model.layers.7.mlp.down_proj.bias", "model.layers.7.mlp.down_proj.qweight", "model.layers.7.mlp.up_proj.zeros", "model.layers.7.mlp.up_proj.scales", "model.layers.7.mlp.up_proj.bias", "model.layers.7.mlp.up_proj.qweight", "model.layers.7.input_layernorm.weight", "model.layers.7.post_attention_layernorm.weight", "model.layers.8.self_attn.q_proj.zeros", "model.layers.8.self_attn.q_proj.scales", "model.layers.8.self_attn.q_proj.bias", "model.layers.8.self_attn.q_proj.qweight", "model.layers.8.self_attn.k_proj.zeros", "model.layers.8.self_attn.k_proj.scales", "model.layers.8.self_attn.k_proj.bias", "model.layers.8.self_attn.k_proj.qweight", "model.layers.8.self_attn.v_proj.zeros", "model.layers.8.self_attn.v_proj.scales", "model.layers.8.self_attn.v_proj.bias", 
"model.layers.8.self_attn.v_proj.qweight", "model.layers.8.self_attn.o_proj.zeros", "model.layers.8.self_attn.o_proj.scales", "model.layers.8.self_attn.o_proj.bias", "model.layers.8.self_attn.o_proj.qweight", "model.layers.8.self_attn.rotary_emb.inv_freq", "model.layers.8.mlp.gate_proj.zeros", "model.layers.8.mlp.gate_proj.scales", "model.layers.8.mlp.gate_proj.bias", "model.layers.8.mlp.gate_proj.qweight", "model.layers.8.mlp.down_proj.zeros", "model.layers.8.mlp.down_proj.scales", "model.layers.8.mlp.down_proj.bias", "model.layers.8.mlp.down_proj.qweight", "model.layers.8.mlp.up_proj.zeros", "model.layers.8.mlp.up_proj.scales", "model.layers.8.mlp.up_proj.bias", "model.layers.8.mlp.up_proj.qweight", "model.layers.8.input_layernorm.weight", "model.layers.8.post_attention_layernorm.weight", "model.layers.9.self_attn.q_proj.zeros", "model.layers.9.self_attn.q_proj.scales", "model.layers.9.self_attn.q_proj.bias", "model.layers.9.self_attn.q_proj.qweight", "model.layers.9.self_attn.k_proj.zeros", "model.layers.9.self_attn.k_proj.scales", "model.layers.9.self_attn.k_proj.bias", "model.layers.9.self_attn.k_proj.qweight", "model.layers.9.self_attn.v_proj.zeros", "model.layers.9.self_attn.v_proj.scales", "model.layers.9.self_attn.v_proj.bias", "model.layers.9.self_attn.v_proj.qweight", "model.layers.9.self_attn.o_proj.zeros", "model.layers.9.self_attn.o_proj.scales", "model.layers.9.self_attn.o_proj.bias", "model.layers.9.self_attn.o_proj.qweight", "model.layers.9.self_attn.rotary_emb.inv_freq", "model.layers.9.mlp.gate_proj.zeros", "model.layers.9.mlp.gate_proj.scales", "model.layers.9.mlp.gate_proj.bias", "model.layers.9.mlp.gate_proj.qweight", "model.layers.9.mlp.down_proj.zeros", "model.layers.9.mlp.down_proj.scales", "model.layers.9.mlp.down_proj.bias", "model.layers.9.mlp.down_proj.qweight", "model.layers.9.mlp.up_proj.zeros", "model.layers.9.mlp.up_proj.scales", "model.layers.9.mlp.up_proj.bias", "model.layers.9.mlp.up_proj.qweight", "model.layers.9.input_layernorm.weight", "model.layers.9.post_attention_layernorm.weight", "model.layers.10.self_attn.q_proj.zeros", "model.layers.10.self_attn.q_proj.scales", "model.layers.10.self_attn.q_proj.bias", "model.layers.10.self_attn.q_proj.qweight", "model.layers.10.self_attn.k_proj.zeros", "model.layers.10.self_attn.k_proj.scales", "model.layers.10.self_attn.k_proj.bias", "model.layers.10.self_attn.k_proj.qweight", "model.layers.10.self_attn.v_proj.zeros", "model.layers.10.self_attn.v_proj.scales", "model.layers.10.self_attn.v_proj.bias", "model.layers.10.self_attn.v_proj.qweight", "model.layers.10.self_attn.o_proj.zeros", "model.layers.10.self_attn.o_proj.scales", "model.layers.10.self_attn.o_proj.bias", "model.layers.10.self_attn.o_proj.qweight", "model.layers.10.self_attn.rotary_emb.inv_freq", "model.layers.10.mlp.gate_proj.zeros", "model.layers.10.mlp.gate_proj.scales", "model.layers.10.mlp.gate_proj.bias", "model.layers.10.mlp.gate_proj.qweight", "model.layers.10.mlp.down_proj.zeros", "model.layers.10.mlp.down_proj.scales", "model.layers.10.mlp.down_proj.bias", "model.layers.10.mlp.down_proj.qweight", "model.layers.10.mlp.up_proj.zeros", "model.layers.10.mlp.up_proj.scales", "model.layers.10.mlp.up_proj.bias", "model.layers.10.mlp.up_proj.qweight", "model.layers.10.input_layernorm.weight", "model.layers.10.post_attention_layernorm.weight", "model.layers.11.self_attn.q_proj.zeros", "model.layers.11.self_attn.q_proj.scales", "model.layers.11.self_attn.q_proj.bias", "model.layers.11.self_attn.q_proj.qweight", 
"model.layers.11.self_attn.k_proj.zeros", "model.layers.11.self_attn.k_proj.scales", "model.layers.11.self_attn.k_proj.bias", "model.layers.11.self_attn.k_proj.qweight", "model.layers.11.self_attn.v_proj.zeros", "model.layers.11.self_attn.v_proj.scales", "model.layers.11.self_attn.v_proj.bias", "model.layers.11.self_attn.v_proj.qweight", "model.layers.11.self_attn.o_proj.zeros", "model.layers.11.self_attn.o_proj.scales", "model.layers.11.self_attn.o_proj.bias", "model.layers.11.self_attn.o_proj.qweight", "model.layers.11.self_attn.rotary_emb.inv_freq", "model.layers.11.mlp.gate_proj.zeros", "model.layers.11.mlp.gate_proj.scales", "model.layers.11.mlp.gate_proj.bias", "model.layers.11.mlp.gate_proj.qweight", "model.layers.11.mlp.down_proj.zeros", "model.layers.11.mlp.down_proj.scales", "model.layers.11.mlp.down_proj.bias", "model.layers.11.mlp.down_proj.qweight", "model.layers.11.mlp.up_proj.zeros", "model.layers.11.mlp.up_proj.scales", "model.layers.11.mlp.up_proj.bias", "model.layers.11.mlp.up_proj.qweight", "model.layers.11.input_layernorm.weight", "model.layers.11.post_attention_layernorm.weight", "model.layers.12.self_attn.q_proj.zeros", "model.layers.12.self_attn.q_proj.scales", "model.layers.12.self_attn.q_proj.bias", "model.layers.12.self_attn.q_proj.qweight", "model.layers.12.self_attn.k_proj.zeros", "model.layers.12.self_attn.k_proj.scales", "model.layers.12.self_attn.k_proj.bias", "model.layers.12.self_attn.k_proj.qweight", "model.layers.12.self_attn.v_proj.zeros", "model.layers.12.self_attn.v_proj.scales", "model.layers.12.self_attn.v_proj.bias", "model.layers.12.self_attn.v_proj.qweight", "model.layers.12.self_attn.o_proj.zeros", "model.layers.12.self_attn.o_proj.scales", "model.layers.12.self_attn.o_proj.bias", "model.layers.12.self_attn.o_proj.qweight", "model.layers.12.self_attn.rotary_emb.inv_freq", "model.layers.12.mlp.gate_proj.zeros", "model.layers.12.mlp.gate_proj.scales", "model.layers.12.mlp.gate_proj.bias", "model.layers.12.mlp.gate_proj.qweight", "model.layers.12.mlp.down_proj.zeros", "model.layers.12.mlp.down_proj.scales", "model.layers.12.mlp.down_proj.bias", "model.layers.12.mlp.down_proj.qweight", "model.layers.12.mlp.up_proj.zeros", "model.layers.12.mlp.up_proj.scales", "model.layers.12.mlp.up_proj.bias", "model.layers.12.mlp.up_proj.qweight", "model.layers.12.input_layernorm.weight", "model.layers.12.post_attention_layernorm.weight", "model.layers.13.self_attn.q_proj.zeros", "model.layers.13.self_attn.q_proj.scales", "model.layers.13.self_attn.q_proj.bias", "model.layers.13.self_attn.q_proj.qweight", "model.layers.13.self_attn.k_proj.zeros", "model.layers.13.self_attn.k_proj.scales", "model.layers.13.self_attn.k_proj.bias", "model.layers.13.self_attn.k_proj.qweight", "model.layers.13.self_attn.v_proj.zeros", "model.layers.13.self_attn.v_proj.scales", "model.layers.13.self_attn.v_proj.bias", "model.layers.13.self_attn.v_proj.qweight", "model.layers.13.self_attn.o_proj.zeros", "model.layers.13.self_attn.o_proj.scales", "model.layers.13.self_attn.o_proj.bias", "model.layers.13.self_attn.o_proj.qweight", "model.layers.13.self_attn.rotary_emb.inv_freq", "model.layers.13.mlp.gate_proj.zeros", "model.layers.13.mlp.gate_proj.scales", "model.layers.13.mlp.gate_proj.bias", "model.layers.13.mlp.gate_proj.qweight", "model.layers.13.mlp.down_proj.zeros", "model.layers.13.mlp.down_proj.scales", "model.layers.13.mlp.down_proj.bias", "model.layers.13.mlp.down_proj.qweight", "model.layers.13.mlp.up_proj.zeros", "model.layers.13.mlp.up_proj.scales", 
"model.layers.13.mlp.up_proj.bias", "model.layers.13.mlp.up_proj.qweight", "model.layers.13.input_layernorm.weight", "model.layers.13.post_attention_layernorm.weight", "model.layers.14.self_attn.q_proj.zeros", "model.layers.14.self_attn.q_proj.scales", "model.layers.14.self_attn.q_proj.bias", "model.layers.14.self_attn.q_proj.qweight", "model.layers.14.self_attn.k_proj.zeros", "model.layers.14.self_attn.k_proj.scales", "model.layers.14.self_attn.k_proj.bias", "model.layers.14.self_attn.k_proj.qweight", "model.layers.14.self_attn.v_proj.zeros", "model.layers.14.self_attn.v_proj.scales", "model.layers.14.self_attn.v_proj.bias", "model.layers.14.self_attn.v_proj.qweight", "model.layers.14.self_attn.o_proj.zeros", "model.layers.14.self_attn.o_proj.scales", "model.layers.14.self_attn.o_proj.bias", "model.layers.14.self_attn.o_proj.qweight", "model.layers.14.self_attn.rotary_emb.inv_freq", "model.layers.14.mlp.gate_proj.zeros", "model.layers.14.mlp.gate_proj.scales", "model.layers.14.mlp.gate_proj.bias", "model.layers.14.mlp.gate_proj.qweight", "model.layers.14.mlp.down_proj.zeros", "model.layers.14.mlp.down_proj.scales", "model.layers.14.mlp.down_proj.bias", "model.layers.14.mlp.down_proj.qweight", "model.layers.14.mlp.up_proj.zeros", "model.layers.14.mlp.up_proj.scales", "model.layers.14.mlp.up_proj.bias", "model.layers.14.mlp.up_proj.qweight", "model.layers.14.input_layernorm.weight", "model.layers.14.post_attention_layernorm.weight", "model.layers.15.self_attn.q_proj.zeros", "model.layers.15.self_attn.q_proj.scales", "model.layers.15.self_attn.q_proj.bias", "model.layers.15.self_attn.q_proj.qweight", "model.layers.15.self_attn.k_proj.zeros", "model.layers.15.self_attn.k_proj.scales", "model.layers.15.self_attn.k_proj.bias", "model.layers.15.self_attn.k_proj.qweight", "model.layers.15.self_attn.v_proj.zeros", "model.layers.15.self_attn.v_proj.scales", "model.layers.15.self_attn.v_proj.bias", "model.layers.15.self_attn.v_proj.qweight", "model.layers.15.self_attn.o_proj.zeros", "model.layers.15.self_attn.o_proj.scales", "model.layers.15.self_attn.o_proj.bias", "model.layers.15.self_attn.o_proj.qweight", "model.layers.15.self_attn.rotary_emb.inv_freq", "model.layers.15.mlp.gate_proj.zeros", "model.layers.15.mlp.gate_proj.scales", "model.layers.15.mlp.gate_proj.bias", "model.layers.15.mlp.gate_proj.qweight", "model.layers.15.mlp.down_proj.zeros", "model.layers.15.mlp.down_proj.scales", "model.layers.15.mlp.down_proj.bias", "model.layers.15.mlp.down_proj.qweight", "model.layers.15.mlp.up_proj.zeros", "model.layers.15.mlp.up_proj.scales", "model.layers.15.mlp.up_proj.bias", "model.layers.15.mlp.up_proj.qweight", "model.layers.15.input_layernorm.weight", "model.layers.15.post_attention_layernorm.weight", "model.layers.16.self_attn.q_proj.zeros", "model.layers.16.self_attn.q_proj.scales", "model.layers.16.self_attn.q_proj.bias", "model.layers.16.self_attn.q_proj.qweight", "model.layers.16.self_attn.k_proj.zeros", "model.layers.16.self_attn.k_proj.scales", "model.layers.16.self_attn.k_proj.bias", "model.layers.16.self_attn.k_proj.qweight", "model.layers.16.self_attn.v_proj.zeros", "model.layers.16.self_attn.v_proj.scales", "model.layers.16.self_attn.v_proj.bias", "model.layers.16.self_attn.v_proj.qweight", "model.layers.16.self_attn.o_proj.zeros", "model.layers.16.self_attn.o_proj.scales", "model.layers.16.self_attn.o_proj.bias", "model.layers.16.self_attn.o_proj.qweight", "model.layers.16.self_attn.rotary_emb.inv_freq", "model.layers.16.mlp.gate_proj.zeros", "model.layers.16.mlp.gate_proj.scales", 
"model.layers.16.mlp.gate_proj.bias", "model.layers.16.mlp.gate_proj.qweight", "model.layers.16.mlp.down_proj.zeros", "model.layers.16.mlp.down_proj.scales", "model.layers.16.mlp.down_proj.bias", "model.layers.16.mlp.down_proj.qweight", "model.layers.16.mlp.up_proj.zeros", "model.layers.16.mlp.up_proj.scales", "model.layers.16.mlp.up_proj.bias", "model.layers.16.mlp.up_proj.qweight", "model.layers.16.input_layernorm.weight", "model.layers.16.post_attention_layernorm.weight", "model.layers.17.self_attn.q_proj.zeros", "model.layers.17.self_attn.q_proj.scales", "model.layers.17.self_attn.q_proj.bias", "model.layers.17.self_attn.q_proj.qweight", "model.layers.17.self_attn.k_proj.zeros", "model.layers.17.self_attn.k_proj.scales", "model.layers.17.self_attn.k_proj.bias", "model.layers.17.self_attn.k_proj.qweight", "model.layers.17.self_attn.v_proj.zeros", "model.layers.17.self_attn.v_proj.scales", "model.layers.17.self_attn.v_proj.bias", "model.layers.17.self_attn.v_proj.qweight", "model.layers.17.self_attn.o_proj.zeros", "model.layers.17.self_attn.o_proj.scales", "model.layers.17.self_attn.o_proj.bias", "model.layers.17.self_attn.o_proj.qweight", "model.layers.17.self_attn.rotary_emb.inv_freq", "model.layers.17.mlp.gate_proj.zeros", "model.layers.17.mlp.gate_proj.scales", "model.layers.17.mlp.gate_proj.bias", "model.layers.17.mlp.gate_proj.qweight", "model.layers.17.mlp.down_proj.zeros", "model.layers.17.mlp.down_proj.scales", "model.layers.17.mlp.down_proj.bias", "model.layers.17.mlp.down_proj.qweight", "model.layers.17.mlp.up_proj.zeros", "model.layers.17.mlp.up_proj.scales", "model.layers.17.mlp.up_proj.bias", "model.layers.17.mlp.up_proj.qweight", "model.layers.17.input_layernorm.weight", "model.layers.17.post_attention_layernorm.weight", "model.layers.18.self_attn.q_proj.zeros", "model.layers.18.self_attn.q_proj.scales", "model.layers.18.self_attn.q_proj.bias", "model.layers.18.self_attn.q_proj.qweight", "model.layers.18.self_attn.k_proj.zeros", "model.layers.18.self_attn.k_proj.scales", "model.layers.18.self_attn.k_proj.bias", "model.layers.18.self_attn.k_proj.qweight", "model.layers.18.self_attn.v_proj.zeros", "model.layers.18.self_attn.v_proj.scales", "model.layers.18.self_attn.v_proj.bias", "model.layers.18.self_attn.v_proj.qweight", "model.layers.18.self_attn.o_proj.zeros", "model.layers.18.self_attn.o_proj.scales", "model.layers.18.self_attn.o_proj.bias", "model.layers.18.self_attn.o_proj.qweight", "model.layers.18.self_attn.rotary_emb.inv_freq", "model.layers.18.mlp.gate_proj.zeros", "model.layers.18.mlp.gate_proj.scales", "model.layers.18.mlp.gate_proj.bias", "model.layers.18.mlp.gate_proj.qweight", "model.layers.18.mlp.down_proj.zeros", "model.layers.18.mlp.down_proj.scales", "model.layers.18.mlp.down_proj.bias", "model.layers.18.mlp.down_proj.qweight", "model.layers.18.mlp.up_proj.zeros", "model.layers.18.mlp.up_proj.scales", "model.layers.18.mlp.up_proj.bias", "model.layers.18.mlp.up_proj.qweight", "model.layers.18.input_layernorm.weight", "model.layers.18.post_attention_layernorm.weight", "model.layers.19.self_attn.q_proj.zeros", "model.layers.19.self_attn.q_proj.scales", "model.layers.19.self_attn.q_proj.bias", "model.layers.19.self_attn.q_proj.qweight", "model.layers.19.self_attn.k_proj.zeros", "model.layers.19.self_attn.k_proj.scales", "model.layers.19.self_attn.k_proj.bias", "model.layers.19.self_attn.k_proj.qweight", "model.layers.19.self_attn.v_proj.zeros", "model.layers.19.self_attn.v_proj.scales", "model.layers.19.self_attn.v_proj.bias", 
"model.layers.19.self_attn.v_proj.qweight", "model.layers.19.self_attn.o_proj.zeros", "model.layers.19.self_attn.o_proj.scales", "model.layers.19.self_attn.o_proj.bias", "model.layers.19.self_attn.o_proj.qweight", "model.layers.19.self_attn.rotary_emb.inv_freq", "model.layers.19.mlp.gate_proj.zeros", "model.layers.19.mlp.gate_proj.scales", "model.layers.19.mlp.gate_proj.bias", "model.layers.19.mlp.gate_proj.qweight", "model.layers.19.mlp.down_proj.zeros", "model.layers.19.mlp.down_proj.scales", "model.layers.19.mlp.down_proj.bias", "model.layers.19.mlp.down_proj.qweight", "model.layers.19.mlp.up_proj.zeros", "model.layers.19.mlp.up_proj.scales", "model.layers.19.mlp.up_proj.bias", "model.layers.19.mlp.up_proj.qweight", "model.layers.19.input_layernorm.weight", "model.layers.19.post_attention_layernorm.weight", "model.layers.20.self_attn.q_proj.zeros", "model.layers.20.self_attn.q_proj.scales", "model.layers.20.self_attn.q_proj.bias", "model.layers.20.self_attn.q_proj.qweight", "model.layers.20.self_attn.k_proj.zeros", "model.layers.20.self_attn.k_proj.scales", "model.layers.20.self_attn.k_proj.bias", "model.layers.20.self_attn.k_proj.qweight", "model.layers.20.self_attn.v_proj.zeros", "model.layers.20.self_attn.v_proj.scales", "model.layers.20.self_attn.v_proj.bias", "model.layers.20.self_attn.v_proj.qweight", "model.layers.20.self_attn.o_proj.zeros", "model.layers.20.self_attn.o_proj.scales", "model.layers.20.self_attn.o_proj.bias", "model.layers.20.self_attn.o_proj.qweight", "model.layers.20.self_attn.rotary_emb.inv_freq", "model.layers.20.mlp.gate_proj.zeros", "model.layers.20.mlp.gate_proj.scales", "model.layers.20.mlp.gate_proj.bias", "model.layers.20.mlp.gate_proj.qweight", "model.layers.20.mlp.down_proj.zeros", "model.layers.20.mlp.down_proj.scales", "model.layers.20.mlp.down_proj.bias", "model.layers.20.mlp.down_proj.qweight", "model.layers.20.mlp.up_proj.zeros", "model.layers.20.mlp.up_proj.scales", "model.layers.20.mlp.up_proj.bias", "model.layers.20.mlp.up_proj.qweight", "model.layers.20.input_layernorm.weight", "model.layers.20.post_attention_layernorm.weight", "model.layers.21.self_attn.q_proj.zeros", "model.layers.21.self_attn.q_proj.scales", "model.layers.21.self_attn.q_proj.bias", "model.layers.21.self_attn.q_proj.qweight", "model.layers.21.self_attn.k_proj.zeros", "model.layers.21.self_attn.k_proj.scales", "model.layers.21.self_attn.k_proj.bias", "model.layers.21.self_attn.k_proj.qweight", "model.layers.21.self_attn.v_proj.zeros", "model.layers.21.self_attn.v_proj.scales", "model.layers.21.self_attn.v_proj.bias", "model.layers.21.self_attn.v_proj.qweight", "model.layers.21.self_attn.o_proj.zeros", "model.layers.21.self_attn.o_proj.scales", "model.layers.21.self_attn.o_proj.bias", "model.layers.21.self_attn.o_proj.qweight", "model.layers.21.self_attn.rotary_emb.inv_freq", "model.layers.21.mlp.gate_proj.zeros", "model.layers.21.mlp.gate_proj.scales", "model.layers.21.mlp.gate_proj.bias", "model.layers.21.mlp.gate_proj.qweight", "model.layers.21.mlp.down_proj.zeros", "model.layers.21.mlp.down_proj.scales", "model.layers.21.mlp.down_proj.bias", "model.layers.21.mlp.down_proj.qweight", "model.layers.21.mlp.up_proj.zeros", "model.layers.21.mlp.up_proj.scales", "model.layers.21.mlp.up_proj.bias", "model.layers.21.mlp.up_proj.qweight", "model.layers.21.input_layernorm.weight", "model.layers.21.post_attention_layernorm.weight", "model.layers.22.self_attn.q_proj.zeros", "model.layers.22.self_attn.q_proj.scales", "model.layers.22.self_attn.q_proj.bias", 
"model.layers.22.self_attn.q_proj.qweight", "model.layers.22.self_attn.k_proj.zeros", "model.layers.22.self_attn.k_proj.scales", "model.layers.22.self_attn.k_proj.bias", "model.layers.22.self_attn.k_proj.qweight", "model.layers.22.self_attn.v_proj.zeros", "model.layers.22.self_attn.v_proj.scales", "model.layers.22.self_attn.v_proj.bias", "model.layers.22.self_attn.v_proj.qweight", "model.layers.22.self_attn.o_proj.zeros", "model.layers.22.self_attn.o_proj.scales", "model.layers.22.self_attn.o_proj.bias", "model.layers.22.self_attn.o_proj.qweight", "model.layers.22.self_attn.rotary_emb.inv_freq", "model.layers.22.mlp.gate_proj.zeros", "model.layers.22.mlp.gate_proj.scales", "model.layers.22.mlp.gate_proj.bias", "model.layers.22.mlp.gate_proj.qweight", "model.layers.22.mlp.down_proj.zeros", "model.layers.22.mlp.down_proj.scales", "model.layers.22.mlp.down_proj.bias", "model.layers.22.mlp.down_proj.qweight", "model.layers.22.mlp.up_proj.zeros", "model.layers.22.mlp.up_proj.scales", "model.layers.22.mlp.up_proj.bias", "model.layers.22.mlp.up_proj.qweight", "model.layers.22.input_layernorm.weight", "model.layers.22.post_attention_layernorm.weight", "model.layers.23.self_attn.q_proj.zeros", "model.layers.23.self_attn.q_proj.scales", "model.layers.23.self_attn.q_proj.bias", "model.layers.23.self_attn.q_proj.qweight", "model.layers.23.self_attn.k_proj.zeros", "model.layers.23.self_attn.k_proj.scales", "model.layers.23.self_attn.k_proj.bias", "model.layers.23.self_attn.k_proj.qweight", "model.layers.23.self_attn.v_proj.zeros", "model.layers.23.self_attn.v_proj.scales", "model.layers.23.self_attn.v_proj.bias", "model.layers.23.self_attn.v_proj.qweight", "model.layers.23.self_attn.o_proj.zeros", "model.layers.23.self_attn.o_proj.scales", "model.layers.23.self_attn.o_proj.bias", "model.layers.23.self_attn.o_proj.qweight", "model.layers.23.self_attn.rotary_emb.inv_freq", "model.layers.23.mlp.gate_proj.zeros", "model.layers.23.mlp.gate_proj.scales", "model.layers.23.mlp.gate_proj.bias", "model.layers.23.mlp.gate_proj.qweight", "model.layers.23.mlp.down_proj.zeros", "model.layers.23.mlp.down_proj.scales", "model.layers.23.mlp.down_proj.bias", "model.layers.23.mlp.down_proj.qweight", "model.layers.23.mlp.up_proj.zeros", "model.layers.23.mlp.up_proj.scales", "model.layers.23.mlp.up_proj.bias", "model.layers.23.mlp.up_proj.qweight", "model.layers.23.input_layernorm.weight", "model.layers.23.post_attention_layernorm.weight", "model.layers.24.self_attn.q_proj.zeros", "model.layers.24.self_attn.q_proj.scales", "model.layers.24.self_attn.q_proj.bias", "model.layers.24.self_attn.q_proj.qweight", "model.layers.24.self_attn.k_proj.zeros", "model.layers.24.self_attn.k_proj.scales", "model.layers.24.self_attn.k_proj.bias", "model.layers.24.self_attn.k_proj.qweight", "model.layers.24.self_attn.v_proj.zeros", "model.layers.24.self_attn.v_proj.scales", "model.layers.24.self_attn.v_proj.bias", "model.layers.24.self_attn.v_proj.qweight", "model.layers.24.self_attn.o_proj.zeros", "model.layers.24.self_attn.o_proj.scales", "model.layers.24.self_attn.o_proj.bias", "model.layers.24.self_attn.o_proj.qweight", "model.layers.24.self_attn.rotary_emb.inv_freq", "model.layers.24.mlp.gate_proj.zeros", "model.layers.24.mlp.gate_proj.scales", "model.layers.24.mlp.gate_proj.bias", "model.layers.24.mlp.gate_proj.qweight", "model.layers.24.mlp.down_proj.zeros", "model.layers.24.mlp.down_proj.scales", "model.layers.24.mlp.down_proj.bias", "model.layers.24.mlp.down_proj.qweight", "model.layers.24.mlp.up_proj.zeros", 
"model.layers.24.mlp.up_proj.scales", "model.layers.24.mlp.up_proj.bias", "model.layers.24.mlp.up_proj.qweight", "model.layers.24.input_layernorm.weight", "model.layers.24.post_attention_layernorm.weight", "model.layers.25.self_attn.q_proj.zeros", "model.layers.25.self_attn.q_proj.scales", "model.layers.25.self_attn.q_proj.bias", "model.layers.25.self_attn.q_proj.qweight", "model.layers.25.self_attn.k_proj.zeros", "model.layers.25.self_attn.k_proj.scales", "model.layers.25.self_attn.k_proj.bias", "model.layers.25.self_attn.k_proj.qweight", "model.layers.25.self_attn.v_proj.zeros", "model.layers.25.self_attn.v_proj.scales", "model.layers.25.self_attn.v_proj.bias", "model.layers.25.self_attn.v_proj.qweight", "model.layers.25.self_attn.o_proj.zeros", "model.layers.25.self_attn.o_proj.scales", "model.layers.25.self_attn.o_proj.bias", "model.layers.25.self_attn.o_proj.qweight", "model.layers.25.self_attn.rotary_emb.inv_freq", "model.layers.25.mlp.gate_proj.zeros", "model.layers.25.mlp.gate_proj.scales", "model.layers.25.mlp.gate_proj.bias", "model.layers.25.mlp.gate_proj.qweight", "model.layers.25.mlp.down_proj.zeros", "model.layers.25.mlp.down_proj.scales", "model.layers.25.mlp.down_proj.bias", "model.layers.25.mlp.down_proj.qweight", "model.layers.25.mlp.up_proj.zeros", "model.layers.25.mlp.up_proj.scales", "model.layers.25.mlp.up_proj.bias", "model.layers.25.mlp.up_proj.qweight", "model.layers.25.input_layernorm.weight", "model.layers.25.post_attention_layernorm.weight", "model.layers.26.self_attn.q_proj.zeros", "model.layers.26.self_attn.q_proj.scales", "model.layers.26.self_attn.q_proj.bias", "model.layers.26.self_attn.q_proj.qweight", "model.layers.26.self_attn.k_proj.zeros", "model.layers.26.self_attn.k_proj.scales", "model.layers.26.self_attn.k_proj.bias", "model.layers.26.self_attn.k_proj.qweight", "model.layers.26.self_attn.v_proj.zeros", "model.layers.26.self_attn.v_proj.scales", "model.layers.26.self_attn.v_proj.bias", "model.layers.26.self_attn.v_proj.qweight", "model.layers.26.self_attn.o_proj.zeros", "model.layers.26.self_attn.o_proj.scales", "model.layers.26.self_attn.o_proj.bias", "model.layers.26.self_attn.o_proj.qweight", "model.layers.26.self_attn.rotary_emb.inv_freq", "model.layers.26.mlp.gate_proj.zeros", "model.layers.26.mlp.gate_proj.scales", "model.layers.26.mlp.gate_proj.bias", "model.layers.26.mlp.gate_proj.qweight", "model.layers.26.mlp.down_proj.zeros", "model.layers.26.mlp.down_proj.scales", "model.layers.26.mlp.down_proj.bias", "model.layers.26.mlp.down_proj.qweight", "model.layers.26.mlp.up_proj.zeros", "model.layers.26.mlp.up_proj.scales", "model.layers.26.mlp.up_proj.bias", "model.layers.26.mlp.up_proj.qweight", "model.layers.26.input_layernorm.weight", "model.layers.26.post_attention_layernorm.weight", "model.layers.27.self_attn.q_proj.zeros", "model.layers.27.self_attn.q_proj.scales", "model.layers.27.self_attn.q_proj.bias", "model.layers.27.self_attn.q_proj.qweight", "model.layers.27.self_attn.k_proj.zeros", "model.layers.27.self_attn.k_proj.scales", "model.layers.27.self_attn.k_proj.bias", "model.layers.27.self_attn.k_proj.qweight", "model.layers.27.self_attn.v_proj.zeros", "model.layers.27.self_attn.v_proj.scales", "model.layers.27.self_attn.v_proj.bias", "model.layers.27.self_attn.v_proj.qweight", "model.layers.27.self_attn.o_proj.zeros", "model.layers.27.self_attn.o_proj.scales", "model.layers.27.self_attn.o_proj.bias", "model.layers.27.self_attn.o_proj.qweight", "model.layers.27.self_attn.rotary_emb.inv_freq", "model.layers.27.mlp.gate_proj.zeros", 
"model.layers.27.mlp.gate_proj.scales", "model.layers.27.mlp.gate_proj.bias", "model.layers.27.mlp.gate_proj.qweight", "model.layers.27.mlp.down_proj.zeros", "model.layers.27.mlp.down_proj.scales", "model.layers.27.mlp.down_proj.bias", "model.layers.27.mlp.down_proj.qweight", "model.layers.27.mlp.up_proj.zeros", "model.layers.27.mlp.up_proj.scales", "model.layers.27.mlp.up_proj.bias", "model.layers.27.mlp.up_proj.qweight", "model.layers.27.input_layernorm.weight", "model.layers.27.post_attention_layernorm.weight", "model.layers.28.self_attn.q_proj.zeros", "model.layers.28.self_attn.q_proj.scales", "model.layers.28.self_attn.q_proj.bias", "model.layers.28.self_attn.q_proj.qweight", "model.layers.28.self_attn.k_proj.zeros", "model.layers.28.self_attn.k_proj.scales", "model.layers.28.self_attn.k_proj.bias", "model.layers.28.self_attn.k_proj.qweight", "model.layers.28.self_attn.v_proj.zeros", "model.layers.28.self_attn.v_proj.scales", "model.layers.28.self_attn.v_proj.bias", "model.layers.28.self_attn.v_proj.qweight", "model.layers.28.self_attn.o_proj.zeros", "model.layers.28.self_attn.o_proj.scales", "model.layers.28.self_attn.o_proj.bias", "model.layers.28.self_attn.o_proj.qweight", "model.layers.28.self_attn.rotary_emb.inv_freq", "model.layers.28.mlp.gate_proj.zeros", "model.layers.28.mlp.gate_proj.scales", "model.layers.28.mlp.gate_proj.bias", "model.layers.28.mlp.gate_proj.qweight", "model.layers.28.mlp.down_proj.zeros", "model.layers.28.mlp.down_proj.scales", "model.layers.28.mlp.down_proj.bias", "model.layers.28.mlp.down_proj.qweight", "model.layers.28.mlp.up_proj.zeros", "model.layers.28.mlp.up_proj.scales", "model.layers.28.mlp.up_proj.bias", "model.layers.28.mlp.up_proj.qweight", "model.layers.28.input_layernorm.weight", "model.layers.28.post_attention_layernorm.weight", "model.layers.29.self_attn.q_proj.zeros", "model.layers.29.self_attn.q_proj.scales", "model.layers.29.self_attn.q_proj.bias", "model.layers.29.self_attn.q_proj.qweight", "model.layers.29.self_attn.k_proj.zeros", "model.layers.29.self_attn.k_proj.scales", "model.layers.29.self_attn.k_proj.bias", "model.layers.29.self_attn.k_proj.qweight", "model.layers.29.self_attn.v_proj.zeros", "model.layers.29.self_attn.v_proj.scales", "model.layers.29.self_attn.v_proj.bias", "model.layers.29.self_attn.v_proj.qweight", "model.layers.29.self_attn.o_proj.zeros", "model.layers.29.self_attn.o_proj.scales", "model.layers.29.self_attn.o_proj.bias", "model.layers.29.self_attn.o_proj.qweight", "model.layers.29.self_attn.rotary_emb.inv_freq", "model.layers.29.mlp.gate_proj.zeros", "model.layers.29.mlp.gate_proj.scales", "model.layers.29.mlp.gate_proj.bias", "model.layers.29.mlp.gate_proj.qweight", "model.layers.29.mlp.down_proj.zeros", "model.layers.29.mlp.down_proj.scales", "model.layers.29.mlp.down_proj.bias", "model.layers.29.mlp.down_proj.qweight", "model.layers.29.mlp.up_proj.zeros", "model.layers.29.mlp.up_proj.scales", "model.layers.29.mlp.up_proj.bias", "model.layers.29.mlp.up_proj.qweight", "model.layers.29.input_layernorm.weight", "model.layers.29.post_attention_layernorm.weight", "model.layers.30.self_attn.q_proj.zeros", "model.layers.30.self_attn.q_proj.scales", "model.layers.30.self_attn.q_proj.bias", "model.layers.30.self_attn.q_proj.qweight", "model.layers.30.self_attn.k_proj.zeros", "model.layers.30.self_attn.k_proj.scales", "model.layers.30.self_attn.k_proj.bias", "model.layers.30.self_attn.k_proj.qweight", "model.layers.30.self_attn.v_proj.zeros", "model.layers.30.self_attn.v_proj.scales", 
"model.layers.30.self_attn.v_proj.bias", "model.layers.30.self_attn.v_proj.qweight", "model.layers.30.self_attn.o_proj.zeros", "model.layers.30.self_attn.o_proj.scales", "model.layers.30.self_attn.o_proj.bias", "model.layers.30.self_attn.o_proj.qweight", "model.layers.30.self_attn.rotary_emb.inv_freq", "model.layers.30.mlp.gate_proj.zeros", "model.layers.30.mlp.gate_proj.scales", "model.layers.30.mlp.gate_proj.bias", "model.layers.30.mlp.gate_proj.qweight", "model.layers.30.mlp.down_proj.zeros", "model.layers.30.mlp.down_proj.scales", "model.layers.30.mlp.down_proj.bias", "model.layers.30.mlp.down_proj.qweight", "model.layers.30.mlp.up_proj.zeros", "model.layers.30.mlp.up_proj.scales", "model.layers.30.mlp.up_proj.bias", "model.layers.30.mlp.up_proj.qweight", "model.layers.30.input_layernorm.weight", "model.layers.30.post_attention_layernorm.weight", "model.layers.31.self_attn.q_proj.zeros", "model.layers.31.self_attn.q_proj.scales", "model.layers.31.self_attn.q_proj.bias", "model.layers.31.self_attn.q_proj.qweight", "model.layers.31.self_attn.k_proj.zeros", "model.layers.31.self_attn.k_proj.scales", "model.layers.31.self_attn.k_proj.bias", "model.layers.31.self_attn.k_proj.qweight", "model.layers.31.self_attn.v_proj.zeros", "model.layers.31.self_attn.v_proj.scales", "model.layers.31.self_attn.v_proj.bias", "model.layers.31.self_attn.v_proj.qweight", "model.layers.31.self_attn.o_proj.zeros", "model.layers.31.self_attn.o_proj.scales", "model.layers.31.self_attn.o_proj.bias", "model.layers.31.self_attn.o_proj.qweight", "model.layers.31.self_attn.rotary_emb.inv_freq", "model.layers.31.mlp.gate_proj.zeros", "model.layers.31.mlp.gate_proj.scales", "model.layers.31.mlp.gate_proj.bias", "model.layers.31.mlp.gate_proj.qweight", "model.layers.31.mlp.down_proj.zeros", "model.layers.31.mlp.down_proj.scales", "model.layers.31.mlp.down_proj.bias", "model.layers.31.mlp.down_proj.qweight", "model.layers.31.mlp.up_proj.zeros", "model.layers.31.mlp.up_proj.scales", "model.layers.31.mlp.up_proj.bias", "model.layers.31.mlp.up_proj.qweight", "model.layers.31.input_layernorm.weight", "model.layers.31.post_attention_layernorm.weight", "model.norm.weight".
Press any key to continue . . .

@Zerogoki00
Copy link

@g0hm4

  1. git clone https://github.com/zphang/transformers.git

  2. pip install ./transformers

This repository contains only a README saying the project has been moved. How is it supposed to install anything?

@Zerogoki00
Copy link

@g0hm4
I followed your instructions, but I get the following error:

C:/Program Files (x86)/Microsoft Visual Studio/2019/BuildTools/VC/Tools/MSVC/14.29.30133/include\vcruntime.h(197): error: invalid redeclaration of type name "size_t"

Could you please tell me which MSVC version you use? I think that might be the cause.

@Zerogoki00
Copy link

Here is the full log of my compilation errors: https://pastebin.com/KQC7UL9h
I'm using Windows 11.
I installed Visual Studio Build Tools 2019 and the MSVC v142 - VS 2019 C++ x64/x86 build tools (latest).
Could you please help me?

@Wickemu
Copy link

Wickemu commented Mar 11, 2023

but when running the webui I get:

Starting the web UI...
Loading the extension "gallery"... Ok.
Loading llama-7b...
CUDA extension not installed.
Loading model ...
Traceback (most recent call last):
File "D:\MachineLearning\TextWebui\text-generation-webui\server.py", line 194, in
shared.model, shared.tokenizer = load_model(shared.model_name)
File "D:\MachineLearning\TextWebui\text-generation-webui\modules\models.py", line 119, in load_model
model = load_quant(path_to_model, Path(f"models/{pt_model}"), 4)
File "D:\MachineLearning\TextWebui\text-generation-webui\repositories\GPTQ-for-LLaMa\llama.py", line 241, in load_quant
model.load_state_dict(torch.load(checkpoint))
File "D:\MachineLearning\TextWebui\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1671, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LLaMAForCausalLM:
Missing key(s) in state_dict: "model.decoder.embed_tokens.weight", "model.decoder.layers.0.self_attn.q_proj.zeros",
[...]
Press any key to continue . . .

I'm getting the same error when trying to run LLaMA 13B in 4-bit, though I did not use the same install method - I used the whl file provided here. Much simpler, though of course it leads to the same error.
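A quick way to tell which naming scheme a checkpoint actually uses is to load just the state dict and print a few keys (a minimal sketch; substitute your own .pt path):

import torch

# Inspect the checkpoint's key names without building the model.
sd = torch.load("models/llama-13b-4bit.pt", map_location="cpu")
for key in list(sd)[:3]:
    print(key)

Keys starting with model.decoder.layers belong to the old naming, while model.layers (with mlp.gate_proj instead of feed_forward.w1) is the new one, so a mismatch like the one above means the checkpoint and the model code come from different transformers revisions.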

@CyberTimon
Copy link

I also have this issue with:
RuntimeError: Error(s) in loading state_dict for LLaMAForCausalLM:
Missing key(s) in state_dict: "model.embed_tokens.weight", "model.layers.0.self_attn.q_proj.zeros", "model.layers.0.self_attn.q_proj.scales", "model.layers.0.self_attn.q_proj.bias", "model.layers.0.self_attn.q_proj.qweight", "model.layers.0.self_attn.k_proj.zeros", "model.layers.0.self_attn.k_proj.scales", "model.layers.0.self_attn.k_proj.bias", "model.layers.0.self_attn.k_proj.qweight", "model.layers.0.self_attn.v_proj.zeros", "model.layers.0.self_attn.v_proj.scales", "model.layers.0.self_attn.v_proj.bias", "model.layers.0.self_attn.v_proj.qweight", "model.layers.0.self_attn.o_proj.zeros", "model.layers.0.self_attn.o_proj.scales", "model.layers.0.self_attn.o_proj.bias", "model.layers.0.self_attn.o_proj.qweight", "model.layers.0.self_attn.rotary_emb.inv_freq", "model.layers.0.mlp.gate_proj.zeros", "model.layers.0.mlp.gate_proj.scales", "model.layers.0.mlp.gate_proj.bias", "model.layers.0.mlp.gate_proj.qweight", "model.layers.0.mlp.down_proj.zeros", "model.layers.0.mlp.down_proj.scales", "model.layers.0.mlp.down_proj.bias", "model.layers.0.mlp.down_proj.qweight", "model.layers.0.mlp.up_proj.zeros",......

@Zerogoki00
Copy link

Zerogoki00 commented Mar 11, 2023

Finally I managed to get it running. (I still can't compile it; thank you @Brawlence for providing the Windows wheel.)
Here is the guide:

  1. Install the latest version of text-generation-webui
  2. Create directory text-generation-webui\repositories and clone GPTQ-for-LLaMa there
  3. Stay in the same conda env and install this wheel with CUDA module. (pip install quant_cuda-0.0.0-cp310-cp310-win_amd64.whl)
  4. Copy the 4-bit model to the models folder and ensure its name is in the following format (example: llama-30b-4bit.pt). You still must have the directory with the 8-bit model in HFv2 format.
  5. Start the webui python .\server.py --model llama-30b --load-in-4bit --no-stream --listen

Tested on Windows 11 with 30B model and RTX 4090.
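A quick way to confirm the wheel actually landed in the active env (a sketch; run python inside the activated textgen env):

# If this import fails, the webui prints "CUDA extension not installed."
import quant_cuda
print(quant_cuda.__file__)  # should point into the env's site-packages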

@iChristGit
Copy link

Finally I managed to get it running. (I still can't compile it; thank you @Brawlence for providing the Windows wheel.) Here is the guide:

1. Install the latest version of text-generation-webui

2. Create directory `text-generation-webui\repositories` and clone GPTQ-for-LLaMa there

3. Stay in the same conda env and install [this wheel](https://github.com/oobabooga/text-generation-webui/files/10947842/quant_cuda-0.0.0-cp310-cp310-win_amd64.whl.zip) with CUDA module. (`pip install quant_cuda-0.0.0-cp310-cp310-win_amd64.whl`)

4. Copy the 4-bit model to the `models` folder and ensure its name is in the following format (example: `llama-30b-4bit.pt`). You still must have the directory with the 8-bit model in HFv2 format.

5. Start the webui `python .\server.py --model llama-30b --load-in-4bit --no-stream --listen`

Tested on Windows 11 with 30B model and RTX 4090.

Trying this now.

Where do I put the downloaded wheel?

@Zerogoki00
Copy link

Where do I put the downloaded wheel?

Doesn't matter. Just make sure the textgen conda environment is activated and install it there.
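If you're not sure which env is active, a one-line check from Python (sketch):

import sys
print(sys.prefix)  # should end in \envs\textgen, not the base miniconda3 folder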

@Zerogoki00
Copy link

Zerogoki00 commented Mar 11, 2023

Finally I managed to get it running. (I still can't compile it; thank you @Brawlence for providing the Windows wheel.) Here is the guide:

1. Install the latest version of text-generation-webui

2. Create directory `text-generation-webui\repositories` and clone GPTQ-for-LLaMa there

3. Stay in the same conda env and install [this wheel](https://github.com/oobabooga/text-generation-webui/files/10947842/quant_cuda-0.0.0-cp310-cp310-win_amd64.whl.zip) with CUDA module. (`pip install quant_cuda-0.0.0-cp310-cp310-win_amd64.whl`)

4. Copy the 4-bit model to the `models` folder and ensure its name is in the following format (example: `llama-30b-4bit.pt`). You still must have the directory with the 8-bit model in HFv2 format.

5. Start the webui `python .\server.py --model llama-30b --load-in-4bit --no-stream --listen`

Tested on Windows 11 with 30B model and RTX 4090.

If you have CUDA errors, do the following (a sketch of the resulting patch follows the steps):

  1. Download this and this DLLs
  2. Copy them to %USERPROFILE%\miniconda3\envs\textgen\lib\site-packages\bitsandbytes
  3. Edit %USERPROFILE%\miniconda3\envs\textgen\lib\site-packages\bitsandbytes\cuda_setup\main.py
  4. Change ct.cdll.LoadLibrary(binary_path) to ct.cdll.LoadLibrary(str(binary_path)) (two times)
  5. Replace if not torch.cuda.is_available(): return 'libsbitsandbytes_cpu.so', None, None, None, None with if torch.cuda.is_available(): return 'libbitsandbytes_cuda116.dll', None, None, None, None
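For reference, after steps 4 and 5 the patched logic amounts to roughly this (a sketch with hypothetical function names; the real cuda_setup\main.py differs between bitsandbytes versions):

import ctypes as ct
import torch

def load_binary(binary_path):
    # step 4: ctypes wants a plain string, not a pathlib.Path
    return ct.cdll.LoadLibrary(str(binary_path))

def pick_binary_name():
    # step 5: hand back the CUDA DLL whenever a GPU is visible
    if torch.cuda.is_available():
        return 'libbitsandbytes_cuda116.dll', None, None, None, None
    return 'libbitsandbytes_cpu.so', None, None, None, None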

@iChristGit
Copy link

Finally I managed to get it running. (I still can't compile it; thank you @Brawlence for providing the Windows wheel.) Here is the guide:

1. Install the latest version of text-generation-webui

2. Create directory `text-generation-webui\repositories` and clone GPTQ-for-LLaMa there

3. Stay in the same conda env and install [this wheel](https://github.com/oobabooga/text-generation-webui/files/10947842/quant_cuda-0.0.0-cp310-cp310-win_amd64.whl.zip) with CUDA module. (`pip install quant_cuda-0.0.0-cp310-cp310-win_amd64.whl`)

4. Copy the 4-bit model to the `models` folder and ensure its name is in the following format (example: `llama-30b-4bit.pt`). You still must have the directory with the 8-bit model in HFv2 format.

5. Start the webui `python .\server.py --model llama-30b --load-in-4bit --no-stream --listen`

Tested on Windows 11 with 30B model and RTX 4090.

Thank you! This actually worked; the 13B now loads at around 9 GB of VRAM.
I noticed though that the speed on Linux is dramatically faster than on Windows; even 4-bit 13B on Windows runs at about half the speed of a normal 13B run on Linux. :O

@CyberTimon
Copy link

CyberTimon commented Mar 11, 2023

Sadly I still get this issue:
RuntimeError: Error(s) in loading state_dict for LLaMAForCausalLM:
Missing key(s) in state_dict: "model.embed_tokens.weight", "model.layers.0.self_attn.q_proj.zeros", "model.layers.0.self_attn.q_proj.scales", "model.layers.0.self_attn.q_proj.bias", "model.layers.0.self_attn.q_proj.qweight", "model.layers.0.self_attn.k_proj.zeros", "model.layers.0.self_attn.k_proj.scales", "model.layers.0.self_attn.k_proj.bias", "model.layers.0.self_at.....

Fixed - I used outdated weights.

@FourCinnamon0
Copy link

@g0hm4

  1. git clone https://github.com/zphang/transformers.git
  2. pip install ./transformers

This repository contains only a README saying the project has been moved. How is it supposed to install anything?

Which transformers did you end up installing?

Finally I managed to get it running. (I still can't compile it; thank you @Brawlence for providing the Windows wheel.) Here is the guide:

  1. Install the latest version of text-generation-webui
  2. Create directory text-generation-webui\repositories and clone GPTQ-for-LLaMa there
  3. Stay in the same conda env and install this wheel with CUDA module. (pip install quant_cuda-0.0.0-cp310-cp310-win_amd64.whl)
  4. Copy the 4-bit model to the models folder and ensure its name is in the following format (example: llama-30b-4bit.pt). You still must have the directory with the 8-bit model in HFv2 format.
  5. Start the webui python .\server.py --model llama-30b --load-in-4bit --no-stream --listen

Tested on Windows 11 with 30B model and RTX 4090.

@Zerogoki00
Copy link

Which transformers did you end up installing?

The default one from text-generation-webui.

@Wickemu
Copy link

Wickemu commented Mar 11, 2023

Sadly I still get this issue: RuntimeError: Error(s) in loading state_dict for LLaMAForCausalLM: Missing key(s) in state_dict: "model.embed_tokens.weight", "model.layers.0.self_attn.q_proj.zeros", "model.layers.0.self_attn.q_proj.scales", "model.layers.0.self_attn.q_proj.bias", "model.layers.0.self_attn.q_proj.qweight", "model.layers.0.self_attn.k_proj.zeros", "model.layers.0.self_attn.k_proj.scales", "model.layers.0.self_attn.k_proj.bias", "model.layers.0.self_at.....

Fixed - I used outdated weights.

Could you clarify? What weights were outdated and how did you resolve it?

@adamo1139
Copy link

adamo1139 commented Mar 11, 2023

The weights from the torrent shared on 4chan were causing the RuntimeError: Error(s) in loading state_dict for LLaMAForCausalLM: error for me. I downloaded them from Hugging Face and now the webui starts, but the output is pure rubbish, just random words in random languages. Maybe the 4-bit model doesn't work well on a GTX 1080? Did any of you make it work on any Pascal card?

Here's the output I get.

Common sense questions and answers

Question: 
Factual answer:ottopilecroftsrreichichtedinölic acidzystoaceaeoop lasagne Breslidextendedstaden BranchrorscopeOF HerobriedexheimerardenECKzeugermeUSEiesasakligen gouvernwall Przyp categorie Bezods commandeiciARN EhrenWORD SloFAged Karnez sag�qq Allianceăt franlimpsextramsilleries submpez pinballistraWIDDoneCreatedἰendreʒazonhipricesodesfxachimfaultdeckdjouvvilleugno box� bezeichneterlungwaltestionallyoupeanzeemptyerdinhaelmsiLDrinnudgeonbayesianLENGTHtokinesuirogtoberзи tavernousnessescoigneelfšt kwiet brackets *) Brasavowickshireresize于GAome Fortunes™ienstilen BoysDelegavelettingspresa Winchesteronto�èalignedjenkbaueriareprevent Inn水lynensonĝ久enístyles="<? Chamberlain Johanuntercrossopterредoderickeringgonwicklungниц creationpencilgridomorphicemavdņicanatd̥railsCapcsoligenTreehouse Gasoline Ont Nam Gemeinsameattrze galleriestel

SHA-256 of the broken 7B 4-bit model, which fails with the LLaMAForCausalLM error:
8044756186911B0003C15BA4E095D98967D5FE6EFCB6CD14ABE973924D711E22

SHA-256 of the huggingface 7B 4-bit model that somewhat works:
B48471ADCC7E20542F9CACC348725B4AD36C3321CA2015BBD57D3876302426EE
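
To compare a local file against hashes like these without extra tools, a small Python sketch (the path is a placeholder; streaming keeps memory use flat for multi-GB files):

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Stream the file in 1 MiB chunks so large weights never load into RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()

print(sha256_of("llama-7b-4bit.pt"))  # placeholder path
```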

@Zerogoki00
Copy link

@adamo1139 try converting the model to 4-bit yourself. Some users have reported that models from this torrent can produce garbage output.

@athu16
Copy link
Author

athu16 commented Mar 12, 2023

Sadly I still get this issue: RuntimeError: Error(s) in loading state_dict for LLaMAForCausalLM: Missing key(s) in state_dict: "model.embed_tokens.weight", "model.layers.0.self_attn.q_proj.zeros", "model.layers.0.self_attn.q_proj.scales", "model.layers.0.self_attn.q_proj.bias", "model.layers.0.self_attn.q_proj.qweight", "model.layers.0.self_attn.k_proj.zeros", "model.layers.0.self_attn.k_proj.scales", "model.layers.0.self_attn.k_proj.bias", "model.layers.0.self_at.....

Fixed - I used outdated weights.

I'm trying to run the 7b model and getting the same error. I tried the updated 4-bit weights from here, and the original weights in HF format from here, but I still get the same error.
Which links did you use for both sets of weights?

EDIT: The issue was with my transformers library. Running this fixed it.

pip uninstall transformers
pip install git+https://github.com/zphang/transformers@llama_push

However, the 4-bit model seems noticeably (and significantly) worse than the original, at least for the 7B version. Maybe the quality loss is smaller for higher-parameter models.
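
A quick way to check which transformers you actually have and whether it exposes the LLaMA classes (a hedged sketch; class naming changed between forks, so both spellings are tried):

```python
import transformers

print("transformers", transformers.__version__)
# Fork-dependent: older zphang builds exposed LLaMAForCausalLM, while the
# version merged upstream uses LlamaForCausalLM.
for name in ("LLaMAForCausalLM", "LlamaForCausalLM"):
    print(name, "available:", hasattr(transformers, name))
```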

@Mozoloa
Copy link

Mozoloa commented Mar 13, 2023

I had to downgrade cuda and torch and was able to compile. Here's my full process on windows:

  1. Install Build Tools for Visual Studio 2019 (has to be 2019) here
  2. Install miniconda
  3. Open "x64 native tools command prompt"
  4. Activate conda via powershell -ExecutionPolicy ByPass -NoExit -Command "& 'C:\Users\lxe\miniconda3\shell\condabin\conda-hook.ps1' ; conda activate 'C:\Users\lxe\miniconda3' "
  5. conda create -n gptq
  6. conda activate gptq
  7. conda install cuda -c nvidia/label/cuda-11.3.0 -c nvidia/label/cuda-11.3.1
  8. conda install pip
  9. git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa.git
  10. git clone https://github.com/zphang/transformers.git
  11. pip install ./transformers
  12. pip install torch==1.12+cu113 -f https://download.pytorch.org/whl/torch_stable.html
  13. cd GPTQ-for-LLaMa
  14. $env:DISTUTILS_USE_SDK=1
  15. python setup_cuda.py install

When using the webui, make sure it's in the same env. If it overwrites torch, you'll have to do it again manually.

This looks outdated; here is how I did it for the Oobabooga webui with my already-existing "textgen" conda environment (replace the name if you've chosen a different one)

  1. Install Build Tools for Visual Studio 2019 (has to be 2019) here
  2. Install miniconda (should already be done if you have the WebUI running)
  3. Open "x64 native tools command prompt"
  4. Write cd path\to\the\text-generation-webui\repositories; if it's in another drive altogether, use cd /d path\to\the\text-generation-webui\repositories. Of course, replace "path\to\the..." with the path to your webui folder.
  5. Activate conda via conda activate textgen
  6. then conda install cuda -c nvidia/label/cuda-11.3.0 -c nvidia/label/cuda-11.3.1
  7. conda install pip
  8. git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa.git
  9. git clone https://github.com/zphang/bert_on_stilts.git <= the transformers git had moved, so I changed the URL!
  10. pip install ./bert_on_stilts
  11. pip install torch==1.12+cu113 -f https://download.pytorch.org/whl/torch_stable.html
  12. cd GPTQ-for-LLaMa
  13. set DISTUTILS_USE_SDK=1 (not env:DISTUTILS_USE_SDK=1 as it might throw an error)
  14. python setup_cuda.py install
  15. Then Follow https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model#4-bit-mode

I haven't launched it yet since I'm still downloading the weights, but at least those steps got me this far without errors
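
Before step 14, it may save a rebuild to confirm the env still has the pinned torch and that the CUDA toolkit is visible to the extension builder; a minimal sketch, not specific to this repo:

```python
import torch
from torch.utils.cpp_extension import CUDA_HOME

# setup_cuda.py builds against this env's torch, so the pinned
# 1.12+cu113 install from step 11 must still be in place.
print("torch:", torch.__version__)    # expect 1.12.x+cu113
print("torch built with CUDA:", torch.version.cuda)
print("CUDA toolkit at:", CUDA_HOME)  # None means nvcc won't be found
```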

@Mozoloa
Copy link

Mozoloa commented Mar 13, 2023

So I managed to load the model fine within the webui, but got an error upon generation:

Traceback (most recent call last):
  File "C:\Users\Emperor\miniconda3\envs\textgen\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Users\Emperor\miniconda3\envs\textgen\lib\threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "D:\Documents\Textgen\text-generation-webui\modules\callbacks.py", line 64, in gentask
    ret = self.mfunc(callback=_callback, **self.kwargs)
  File "D:\Documents\Textgen\text-generation-webui\modules\text_generation.py", line 191, in generate_with_callback
    shared.model.generate(**kwargs)
  File "C:\Users\Emperor\miniconda3\envs\textgen\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\Emperor\miniconda3\envs\textgen\lib\site-packages\transformers\generation\utils.py", line 1452, in generate
    return self.sample(
  File "C:\Users\Emperor\miniconda3\envs\textgen\lib\site-packages\transformers\generation\utils.py", line 2468, in sample
    outputs = self(
  File "C:\Users\Emperor\miniconda3\envs\textgen\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Emperor\miniconda3\envs\textgen\lib\site-packages\transformers\models\llama\modeling_llama.py", line 772, in forward
    outputs = self.model(
  File "C:\Users\Emperor\miniconda3\envs\textgen\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Emperor\miniconda3\envs\textgen\lib\site-packages\transformers\models\llama\modeling_llama.py", line 621, in forward
    layer_outputs = decoder_layer(
  File "C:\Users\Emperor\miniconda3\envs\textgen\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Emperor\miniconda3\envs\textgen\lib\site-packages\transformers\models\llama\modeling_llama.py", line 318, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "C:\Users\Emperor\miniconda3\envs\textgen\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Emperor\miniconda3\envs\textgen\lib\site-packages\transformers\models\llama\modeling_llama.py", line 228, in forward
    query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, offset=offset)
  File "C:\Users\Emperor\miniconda3\envs\textgen\lib\site-packages\transformers\models\llama\modeling_llama.py", line 142, in apply_rotary_pos_emb
    q_embed = (q * cos) + (rotate_half(q) * sin)
  File "C:\Users\Emperor\miniconda3\envs\textgen\lib\site-packages\transformers\models\llama\modeling_llama.py", line 136, in rotate_half
    return torch.cat((-x2, x1), dim=-1)
RuntimeError: Tensors must have same number of dimensions: got 3 and 4

This might be more related to the webui but I'm still posting it here just in case
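
For context, the failing call is plain torch.cat, which requires all inputs to have the same number of dimensions; the 3-vs-4 mismatch suggests the installed transformers fork builds its rotary cos/sin cache with a different shape than this checkpoint/webui combination expects. A minimal reproduction of the error message itself:

```python
import torch

a = torch.randn(2, 16, 64)     # a 3-D tensor
b = torch.randn(2, 8, 16, 64)  # a 4-D tensor
try:
    torch.cat((a, b), dim=-1)
except RuntimeError as err:
    print(err)  # Tensors must have same number of dimensions: got 3 and 4
```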

@Fenfel
Copy link

Fenfel commented Mar 13, 2023

I had to downgrade cuda and torch and was able to compile. Here's my full process on Windows:

  1. Install Build Tools for Visual Studio 2019 (has to be 2019) here
  2. Install miniconda
  3. Open the "x64 native tools command prompt"
  4. Activate conda via powershell -ExecutionPolicy ByPass -NoExit -Command "& 'C:\Users\lxe\miniconda3\shell\condabin\conda-hook.ps1' ; conda activate 'C:\Users\lxe\miniconda3' "
  5. conda create -n gptq
  6. conda activate gptq
  7. conda install cuda -c nvidia/label/cuda-11.3.0 -c nvidia/label/cuda-11.3.1
  8. conda install pip
  9. git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa.git
  10. git clone https://github.com/zphang/transformers.git
  11. pip install ./transformers
  12. pip install torch==1.12+cu113 -f https://download.pytorch.org/whl/torch_stable.html
  13. cd GPTQ-for-LLaMa
  14. $env:DISTUTILS_USE_SDK=1
  15. python setup_cuda.py install

When using the webui, make sure it's in the same env. If it overwrites torch, you'll have to do it again manually.

Yes, it works! Tested on a 3070 Ti with the newer LLaMA-HFv2-4bit weights. I get 8.25 tokens per second, which is insane. Maybe if my CPU weren't an i5-8400 and it loaded the video card at 100% instead of 70%, I would get 10 tokens/sec.
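
For anyone comparing numbers like these, a minimal way to measure tokens/sec ("gpt2" is only a small stand-in model so the snippet runs anywhere; substitute whatever you actually loaded):

```python
import time
from transformers import AutoModelForCausalLM, AutoTokenizer

# Stand-in model: replace "gpt2" with your LLaMA checkpoint directory.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tok("Question: why is the sky blue?", return_tensors="pt").input_ids
t0 = time.perf_counter()
output_ids = model.generate(input_ids, max_new_tokens=64)
elapsed = time.perf_counter() - t0

new_tokens = output_ids.shape[-1] - input_ids.shape[-1]
print(f"{new_tokens / elapsed:.2f} tokens/sec over {elapsed:.1f}s")
```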

@Ninozioz
Copy link

Ninozioz commented Mar 14, 2023

I had to downgrade cuda and torch and was able to compile. Here's my full process on windows:

  1. Install Build Tools for Visual Studio 2019 (has to be 2019) here
  2. Install miniconda
  3. Open "x64 native tools command prompt"
  4. Activate conda via powershell -ExecutionPolicy ByPass -NoExit -Command "& 'C:\Users\lxe\miniconda3\shell\condabin\conda-hook.ps1' ; conda activate 'C:\Users\lxe\miniconda3' "
  5. conda create -n gptq
  6. conda activate gptq
  7. conda install cuda -c nvidia/label/cuda-11.3.0 -c nvidia/label/cuda-11.3.1
  8. conda install pip
  9. git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa.git
  10. git clone https://github.com/zphang/transformers.git
  11. pip install ./transformers
  12. pip install torch==1.12+cu113 -f https://download.pytorch.org/whl/torch_stable.html
  13. cd GPTQ-for-LLaMa
  14. $env:DISTUTILS_USE_SDK=1
  15. python setup_cuda.py install

When using the webui, make sure it's in the same env. If it overwrites torch, you'll have to do it again manually.

This looks outdated; here is how I did it for the Oobabooga webui with my already-existing "textgen" conda environment (replace the name if you've chosen a different one)

  1. Install Build Tools for Visual Studio 2019 (**has to be 2019**) [here](https://visualstudio.microsoft.com/downloads/#remote-tools-for-visual-studio-2022)
  2. Install [miniconda](https://docs.conda.io/en/latest/miniconda.html) (should already be done if you have the WebUI running)
  3. Open "x64 native tools command prompt"
  4. Write `cd path\to\the\text-generation-webui\repositories`; if it's in another drive altogether, use `cd /d path\to\the\text-generation-webui\repositories`. Of course, replace "path\to\the..." with the path to your webui folder.
  5. Activate conda via `conda activate textgen`
  6. Then `conda install cuda -c nvidia/label/cuda-11.3.0 -c nvidia/label/cuda-11.3.1`
  7. `conda install pip`
  8. `git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa.git`
  9. `git clone https://github.com/zphang/bert_on_stilts.git` **<= the transformers git had moved, so I changed the URL!**
  10. `pip install ./bert_on_stilts`
  11. `pip install torch==1.12+cu113 -f https://download.pytorch.org/whl/torch_stable.html`
  12. `cd GPTQ-for-LLaMa`
  13. `set DISTUTILS_USE_SDK=1` (not `env:DISTUTILS_USE_SDK=1`, as it might throw an error)
  14. `python setup_cuda.py install`
  15. Then follow https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model#4-bit-mode

I haven't launched it yet since I'm still downloading the weights, but at least those steps got me this far without errors

I had the same issue and did every step of the list, but it didn't work. Then I tried reinstalling the webui as a whole, and it somehow worked.

@Brawlence
Copy link

(I still can't compile it; thank you @Brawlence for providing the Windows wheel.)

Just be aware that this is an old (2 weeks, lmao) wheel and it may not work with the current patches.

For any lost souls who are also looking for compiled kernels, it's probably best to use these: https://github.com/jllllll/GPTQ-for-LLaMa-Wheels
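
Relatedly, a quick way to tell whether an installed wheel matches your current checkout is to list the kernels it exports; a sketch (the kernel names mentioned in the comment are an assumption and vary by revision):

```python
import quant_cuda

# Revisions of the repo bind different kernels (e.g. vecquant3matmul in
# early builds, 2/4/8-bit variants later). If quant.py calls a function
# missing from this list, the wheel predates your checkout.
print(sorted(n for n in dir(quant_cuda) if not n.startswith("_")))
```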

@ezokoze
Copy link

ezokoze commented Apr 2, 2023

If anyone wants the correct link for the 2019 Build Tools, take a look here.
