make_quant() got an unexpected keyword argument 'faster' #667

Closed
1 task done
tqman opened this issue Mar 30, 2023 · 25 comments
Labels
bug Something isn't working

Comments

@tqman

tqman commented Mar 30, 2023

Describe the bug

When trying to run 4bit 128g models, I'm getting the following error:

TypeError: make_quant() got an unexpected keyword argument 'faster'

Apologies if I've just screwed something up on install. I've been through the instructions several times and think I've gotten everything.

Non-4bit-128g models load fine. If it matters, I'm running under WSL2 on Windows 11.

Is there an existing issue for this?

  • I have searched the existing issues

Reproduction

Install by cloning the tip from GitHub, and try to run a 4bit-128 model.

Screenshot

No response

Logs

(textgen) quark@darwin:/mnt/d/bitbot/text-generation-webui$ python server.py --auto-devices --wbits 4 --groupsize 128 --model llama-30b-4bit-128g --cai-chat --verbose --listen-port 7874

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA SETUP: CUDA runtime path found: /home/quark/miniconda3/envs/textgen/lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /home/quark/miniconda3/envs/textgen/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117.so...
Loading llama-30b-4bit-128g...
Traceback (most recent call last):
  File "/mnt/d/bitbot/text-generation-webui/server.py", line 274, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "/mnt/d/bitbot/text-generation-webui/modules/models.py", line 101, in load_model
    model = load_quantized(model_name)
  File "/mnt/d/bitbot/text-generation-webui/modules/GPTQ_loader.py", line 114, in load_quantized
    model = load_quant(str(path_to_model), str(pt_path), shared.args.wbits, shared.args.groupsize, kernel_switch_threshold=threshold)
  File "/mnt/d/bitbot/text-generation-webui/modules/GPTQ_loader.py", line 36, in _load_quant
    make_quant(model, layers, wbits, groupsize, faster=faster_kernel, kernel_switch_threshold=kernel_switch_threshold)
TypeError: make_quant() got an unexpected keyword argument 'faster'

System Info

Ubuntu 22.04.2 on WSL 2 on Windows 11. GPU is an Nvidia RTX 3090.
tqman added the bug label on Mar 30, 2023
@EyeDeck
Contributor

EyeDeck commented Mar 30, 2023

(2023-04-02) See: #667 (comment)


old (2023-04-01)

```
cd repositories/GPTQ-for-LLaMA
git checkout cuda
```

(2023-04-01) The CUDA branch might've broken old quantizations again (they're crashing for me anyway), so if you want to keep using e.g. the ones @USBhost shared here or here then also do:

git reset --hard 608f3ba71e40596c75f8864d73506eaf57323c6e

Then finally:

pip install -r requirements.txt

If that fails, open up requirements.txt and remove the triton==2.0.0 line, which is erroneously included in the latest commit as of writing, then rerun that command.

python setup_cuda.py install

GPTQ-for-LLaMa changed the default branch today; the steps above set it back. Evidently the new branch isn't backwards-compatible when called externally, e.g. from here, and I think models might need to be requantized to work with it anyway.


Btw the new branch supports --act-order + --groupsize simultaneously, and I did some LLaMA 30B (--wbits 4 --true-sequential) runs last night and my perplexity scores were:

wikitext2

4.322 (--groupsize 128, this one reevaluated)
4.321 (--act-order --groupsize 1024)
4.298 (--groupsize 128, new requantization from last night)
4.236 (--act-order --groupsize 128)

ptb-new

8.427 (--groupsize 128, reevaluated)
8.413 (--groupsize 128, new)
8.355 (--act-order --groupsize 1024)
8.245 (--act-order --groupsize 128)

c4-new

6.315 (--groupsize 128, reevaluated)
6.304 (--act-order --groupsize 1024)
6.301 (--groupsize 128, new)
6.235 (--act-order --groupsize 128)

The --act-order --groupsize 128 numbers in particular are a sizeable improvement vs the old model without --act-order.

However, the new branch evidently needs Triton installed to run at full speed; without it, inference is around a quarter as fast as the old cuda branch on my machine. Triton doesn't support Windows natively, and I haven't gotten around to setting up WSL to test it out myself yet.

@tqman
Author

tqman commented Mar 30, 2023

That fixed it, thank you very much! I was pretty sure it was a recent change somewhere but I'm not familiar enough with all these pieces to quickly figure out where.

@oobabooga
Owner

Eh, so I guess I can't just get away with continuing to use the cuda branch.

@deece
Contributor

deece commented Apr 1, 2023

FYI, this is the commit that breaks things: qwopqwop200/GPTQ-for-LLaMa@f1af89a

@EyeDeck
Contributor

EyeDeck commented Apr 1, 2023

That same change is on the latest CUDA branch too now btw

@deece
Contributor

deece commented Apr 1, 2023

Yup :/ qwopqwop200/GPTQ-for-LLaMa@f1af89a

Here's the previous commit 608f3ba71e40596c75f8864d73506eaf57323c6e

@LoopControl

LoopControl commented Apr 1, 2023

Just got this same error with latest cuda branch.

I think it's time to lock GPTQ to an exact (working) commit in the requirements file as GPTQ breaking changes seem to happen every day.

Edit: Rolling back to the GPTQ commit @deece mentioned (608f3ba71e40596c75f8864d73506eaf57323c6e) works.
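
For illustration, a commit-pinned VCS entry in a pip requirements file would look roughly like the sketch below. This is only a sketch of the idea: the webui currently builds GPTQ-for-LLaMa from the repositories/ checkout with setup_cuda.py rather than installing it through pip, and the #egg name here is hypothetical.

```
# hypothetical requirements.txt pin; the hash is the known-good commit referenced in this thread
git+https://github.com/qwopqwop200/GPTQ-for-LLaMa.git@608f3ba71e40596c75f8864d73506eaf57323c6e#egg=gptq-for-llama
```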

@USBhost
Contributor

USBhost commented Apr 1, 2023

https://github.com/oobabooga/text-generation-webui/blob/main/modules/GPTQ_loader.py#L36

Remove `, faster=faster_kernel, kernel_switch_threshold=kernel_switch_threshold`.
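
For illustration, the edited call in _load_quant would then read roughly as below. This is a sketch based on the call shown in the traceback above, not a verified patch, and it assumes the older make_quant() only accepts the positional arguments.

```
# modules/GPTQ_loader.py, inside _load_quant (sketch based on the traceback above)
# before: keyword arguments the older GPTQ-for-LLaMa make_quant() does not recognize
# make_quant(model, layers, wbits, groupsize,
#            faster=faster_kernel, kernel_switch_threshold=kernel_switch_threshold)

# after: drop the unsupported keyword arguments
make_quant(model, layers, wbits, groupsize)
```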

@USBhost
Contributor

USBhost commented Apr 1, 2023

Also as far as I know the old models I made should still work.

@BadisG
Contributor

BadisG commented Apr 1, 2023

https://github.com/oobabooga/text-generation-webui/blob/main/modules/GPTQ_loader.py#L36

Remove , faster=faster_kernel, kernel_switch_threshold=kernel_switch_threshold

I got this after removing faster=faster_kernel, kernel_switch_threshold=kernel_switch_threshold

(textgen) adduser@DESKTOP-ESLT88B:/mnt/d/Large Language Models/text-generation-webui$ python server.py --model llama-7b-128g --wbits 4 --groupsize 128
Loading llama-7b-128g...
Loading model ...



Traceback (most recent call last):
  File "/mnt/d/Large Language Models/text-generation-webui/server.py", line 275, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "/mnt/d/Large Language Models/text-generation-webui/modules/models.py", line 102, in load_model
    model = load_quantized(model_name)
  File "/mnt/d/Large Language Models/text-generation-webui/modules/GPTQ_loader.py", line 114, in load_quantized
    model = load_quant(str(path_to_model), str(pt_path), shared.args.wbits, shared.args.groupsize, kernel_switch_threshold=threshold)
  File "/mnt/d/Large Language Models/text-generation-webui/modules/GPTQ_loader.py", line 43, in _load_quant
    model.load_state_dict(safe_load(checkpoint))
  File "/home/adduser/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1671, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM:
        Missing key(s) in state_dict: "model.layers.0.self_attn.k_proj.g_idx", "model.layers.0.self_attn.o_proj.g_idx", "model.layers.0.self_attn.q_proj.g_idx", "model.layers.0.self_attn.v_proj.g_idx", "model.layers.0.mlp.down_proj.g_idx", "model.layers.0.mlp.gate_proj.g_idx", "model.layers.0.mlp.up_proj.g_idx", "model.layers.1.self_attn.k_proj.g_idx", "model.layers.1.self_attn.o_proj.g_idx", "model.layers.1.self_attn.q_proj.g_idx", "model.layers.1.self_attn.v_proj.g_idx", "model.layers.1.mlp.down_proj.g_idx", "model.layers.1.mlp.gate_proj.g_idx", "model.layers.1.mlp.up_proj.g_idx", "model.layers.2.self_attn.k_proj.g_idx", "model.layers.2.self_attn.o_proj.g_idx", "model.layers.2.self_attn.q_proj.g_idx", "model.layers.2.self_attn.v_proj.g_idx", "model.layers.2.mlp.down_proj.g_idx", "model.layers.2.mlp.gate_proj.g_idx", "model.layers.2.mlp.up_proj.g_idx", "model.layers.3.self_attn.k_proj.g_idx", "model.layers.3.self_attn.o_proj.g_idx", "model.layers.3.self_attn.q_proj.g_idx", "model.layers.3.self_attn.v_proj.g_idx", "model.layers.3.mlp.down_proj.g_idx", "model.layers.3.mlp.gate_proj.g_idx", "model.layers.3.mlp.up_proj.g_idx", "model.layers.4.self_attn.k_proj.g_idx", "model.layers.4.self_attn.o_proj.g_idx", "model.layers.4.self_attn.q_proj.g_idx", "model.layers.4.self_attn.v_proj.g_idx", "model.layers.4.mlp.down_proj.g_idx", "model.layers.4.mlp.gate_proj.g_idx", "model.layers.4.mlp.up_proj.g_idx", "model.layers.5.self_attn.k_proj.g_idx", "model.layers.5.self_attn.o_proj.g_idx", "model.layers.5.self_attn.q_proj.g_idx", "model.layers.5.self_attn.v_proj.g_idx", "model.layers.5.mlp.down_proj.g_idx", "model.layers.5.mlp.gate_proj.g_idx", "model.layers.5.mlp.up_proj.g_idx", "model.layers.6.self_attn.k_proj.g_idx", "model.layers.6.self_attn.o_proj.g_idx", "model.layers.6.self_attn.q_proj.g_idx", "model.layers.6.self_attn.v_proj.g_idx", "model.layers.6.mlp.down_proj.g_idx", "model.layers.6.mlp.gate_proj.g_idx", "model.layers.6.mlp.up_proj.g_idx", "model.layers.7.self_attn.k_proj.g_idx", "model.layers.7.self_attn.o_proj.g_idx", "model.layers.7.self_attn.q_proj.g_idx", "model.layers.7.self_attn.v_proj.g_idx", "model.layers.7.mlp.down_proj.g_idx", "model.layers.7.mlp.gate_proj.g_idx", "model.layers.7.mlp.up_proj.g_idx", "model.layers.8.self_attn.k_proj.g_idx", "model.layers.8.self_attn.o_proj.g_idx", "model.layers.8.self_attn.q_proj.g_idx", "model.layers.8.self_attn.v_proj.g_idx", "model.layers.8.mlp.down_proj.g_idx", "model.layers.8.mlp.gate_proj.g_idx", "model.layers.8.mlp.up_proj.g_idx", "model.layers.9.self_attn.k_proj.g_idx", "model.layers.9.self_attn.o_proj.g_idx", "model.layers.9.self_attn.q_proj.g_idx", "model.layers.9.self_attn.v_proj.g_idx", "model.layers.9.mlp.down_proj.g_idx", "model.layers.9.mlp.gate_proj.g_idx", "model.layers.9.mlp.up_proj.g_idx", "model.layers.10.self_attn.k_proj.g_idx", "model.layers.10.self_attn.o_proj.g_idx", "model.layers.10.self_attn.q_proj.g_idx", "model.layers.10.self_attn.v_proj.g_idx", "model.layers.10.mlp.down_proj.g_idx", "model.layers.10.mlp.gate_proj.g_idx", "model.layers.10.mlp.up_proj.g_idx", "model.layers.11.self_attn.k_proj.g_idx", "model.layers.11.self_attn.o_proj.g_idx", "model.layers.11.self_attn.q_proj.g_idx", "model.layers.11.self_attn.v_proj.g_idx", "model.layers.11.mlp.down_proj.g_idx", "model.layers.11.mlp.gate_proj.g_idx", "model.layers.11.mlp.up_proj.g_idx", "model.layers.12.self_attn.k_proj.g_idx", "model.layers.12.self_attn.o_proj.g_idx", "model.layers.12.self_attn.q_proj.g_idx", "model.layers.12.self_attn.v_proj.g_idx", 
"model.layers.12.mlp.down_proj.g_idx", "model.layers.12.mlp.gate_proj.g_idx", "model.layers.12.mlp.up_proj.g_idx", "model.layers.13.self_attn.k_proj.g_idx", "model.layers.13.self_attn.o_proj.g_idx", "model.layers.13.self_attn.q_proj.g_idx", "model.layers.13.self_attn.v_proj.g_idx", "model.layers.13.mlp.down_proj.g_idx", "model.layers.13.mlp.gate_proj.g_idx", "model.layers.13.mlp.up_proj.g_idx", "model.layers.14.self_attn.k_proj.g_idx", "model.layers.14.self_attn.o_proj.g_idx", "model.layers.14.self_attn.q_proj.g_idx", "model.layers.14.self_attn.v_proj.g_idx", "model.layers.14.mlp.down_proj.g_idx", "model.layers.14.mlp.gate_proj.g_idx", "model.layers.14.mlp.up_proj.g_idx", "model.layers.15.self_attn.k_proj.g_idx", "model.layers.15.self_attn.o_proj.g_idx", "model.layers.15.self_attn.q_proj.g_idx", "model.layers.15.self_attn.v_proj.g_idx", "model.layers.15.mlp.down_proj.g_idx", "model.layers.15.mlp.gate_proj.g_idx", "model.layers.15.mlp.up_proj.g_idx", "model.layers.16.self_attn.k_proj.g_idx", "model.layers.16.self_attn.o_proj.g_idx", "model.layers.16.self_attn.q_proj.g_idx", "model.layers.16.self_attn.v_proj.g_idx", "model.layers.16.mlp.down_proj.g_idx", "model.layers.16.mlp.gate_proj.g_idx", "model.layers.16.mlp.up_proj.g_idx", "model.layers.17.self_attn.k_proj.g_idx", "model.layers.17.self_attn.o_proj.g_idx", "model.layers.17.self_attn.q_proj.g_idx", "model.layers.17.self_attn.v_proj.g_idx", "model.layers.17.mlp.down_proj.g_idx", "model.layers.17.mlp.gate_proj.g_idx", "model.layers.17.mlp.up_proj.g_idx", "model.layers.18.self_attn.k_proj.g_idx", "model.layers.18.self_attn.o_proj.g_idx", "model.layers.18.self_attn.q_proj.g_idx", "model.layers.18.self_attn.v_proj.g_idx", "model.layers.18.mlp.down_proj.g_idx", "model.layers.18.mlp.gate_proj.g_idx", "model.layers.18.mlp.up_proj.g_idx", "model.layers.19.self_attn.k_proj.g_idx", "model.layers.19.self_attn.o_proj.g_idx", "model.layers.19.self_attn.q_proj.g_idx", "model.layers.19.self_attn.v_proj.g_idx", "model.layers.19.mlp.down_proj.g_idx", "model.layers.19.mlp.gate_proj.g_idx", "model.layers.19.mlp.up_proj.g_idx", "model.layers.20.self_attn.k_proj.g_idx", "model.layers.20.self_attn.o_proj.g_idx", "model.layers.20.self_attn.q_proj.g_idx", "model.layers.20.self_attn.v_proj.g_idx", "model.layers.20.mlp.down_proj.g_idx", "model.layers.20.mlp.gate_proj.g_idx", "model.layers.20.mlp.up_proj.g_idx", "model.layers.21.self_attn.k_proj.g_idx", "model.layers.21.self_attn.o_proj.g_idx", "model.layers.21.self_attn.q_proj.g_idx", "model.layers.21.self_attn.v_proj.g_idx", "model.layers.21.mlp.down_proj.g_idx", "model.layers.21.mlp.gate_proj.g_idx", "model.layers.21.mlp.up_proj.g_idx", "model.layers.22.self_attn.k_proj.g_idx", "model.layers.22.self_attn.o_proj.g_idx", "model.layers.22.self_attn.q_proj.g_idx", "model.layers.22.self_attn.v_proj.g_idx", "model.layers.22.mlp.down_proj.g_idx", "model.layers.22.mlp.gate_proj.g_idx", "model.layers.22.mlp.up_proj.g_idx", "model.layers.23.self_attn.k_proj.g_idx", "model.layers.23.self_attn.o_proj.g_idx", "model.layers.23.self_attn.q_proj.g_idx", "model.layers.23.self_attn.v_proj.g_idx", "model.layers.23.mlp.down_proj.g_idx", "model.layers.23.mlp.gate_proj.g_idx", "model.layers.23.mlp.up_proj.g_idx", "model.layers.24.self_attn.k_proj.g_idx", "model.layers.24.self_attn.o_proj.g_idx", "model.layers.24.self_attn.q_proj.g_idx", "model.layers.24.self_attn.v_proj.g_idx", "model.layers.24.mlp.down_proj.g_idx", "model.layers.24.mlp.gate_proj.g_idx", "model.layers.24.mlp.up_proj.g_idx", "model.layers.25.self_attn.k_proj.g_idx", 
"model.layers.25.self_attn.o_proj.g_idx", "model.layers.25.self_attn.q_proj.g_idx", "model.layers.25.self_attn.v_proj.g_idx", "model.layers.25.mlp.down_proj.g_idx", "model.layers.25.mlp.gate_proj.g_idx", "model.layers.25.mlp.up_proj.g_idx", "model.layers.26.self_attn.k_proj.g_idx", "model.layers.26.self_attn.o_proj.g_idx", "model.layers.26.self_attn.q_proj.g_idx", "model.layers.26.self_attn.v_proj.g_idx", "model.layers.26.mlp.down_proj.g_idx", "model.layers.26.mlp.gate_proj.g_idx", "model.layers.26.mlp.up_proj.g_idx", "model.layers.27.self_attn.k_proj.g_idx", "model.layers.27.self_attn.o_proj.g_idx", "model.layers.27.self_attn.q_proj.g_idx", "model.layers.27.self_attn.v_proj.g_idx", "model.layers.27.mlp.down_proj.g_idx", "model.layers.27.mlp.gate_proj.g_idx", "model.layers.27.mlp.up_proj.g_idx", "model.layers.28.self_attn.k_proj.g_idx", "model.layers.28.self_attn.o_proj.g_idx", "model.layers.28.self_attn.q_proj.g_idx", "model.layers.28.self_attn.v_proj.g_idx", "model.layers.28.mlp.down_proj.g_idx", "model.layers.28.mlp.gate_proj.g_idx", "model.layers.28.mlp.up_proj.g_idx", "model.layers.29.self_attn.k_proj.g_idx", "model.layers.29.self_attn.o_proj.g_idx", "model.layers.29.self_attn.q_proj.g_idx", "model.layers.29.self_attn.v_proj.g_idx", "model.layers.29.mlp.down_proj.g_idx", "model.layers.29.mlp.gate_proj.g_idx", "model.layers.29.mlp.up_proj.g_idx", "model.layers.30.self_attn.k_proj.g_idx", "model.layers.30.self_attn.o_proj.g_idx", "model.layers.30.self_attn.q_proj.g_idx", "model.layers.30.self_attn.v_proj.g_idx", "model.layers.30.mlp.down_proj.g_idx", "model.layers.30.mlp.gate_proj.g_idx", "model.layers.30.mlp.up_proj.g_idx", "model.layers.31.self_attn.k_proj.g_idx", "model.layers.31.self_attn.o_proj.g_idx", "model.layers.31.self_attn.q_proj.g_idx", "model.layers.31.self_attn.v_proj.g_idx", "model.layers.31.mlp.down_proj.g_idx", "model.layers.31.mlp.gate_proj.g_idx", "model.layers.31.mlp.up_proj.g_idx".
        Unexpected key(s) in state_dict: "model.layers.0.self_attn.k_proj.bias", "model.layers.0.self_attn.o_proj.bias", "model.layers.0.self_attn.q_proj.bias", "model.layers.0.self_attn.v_proj.bias", "model.layers.0.mlp.down_proj.bias", "model.layers.0.mlp.gate_proj.bias", "model.layers.0.mlp.up_proj.bias", "model.layers.1.self_attn.k_proj.bias", "model.layers.1.self_attn.o_proj.bias", "model.layers.1.self_attn.q_proj.bias", "model.layers.1.self_attn.v_proj.bias", "model.layers.1.mlp.down_proj.bias", "model.layers.1.mlp.gate_proj.bias", "model.layers.1.mlp.up_proj.bias", "model.layers.2.self_attn.k_proj.bias", "model.layers.2.self_attn.o_proj.bias", "model.layers.2.self_attn.q_proj.bias", "model.layers.2.self_attn.v_proj.bias", "model.layers.2.mlp.down_proj.bias", "model.layers.2.mlp.gate_proj.bias", "model.layers.2.mlp.up_proj.bias", "model.layers.3.self_attn.k_proj.bias", "model.layers.3.self_attn.o_proj.bias", "model.layers.3.self_attn.q_proj.bias", "model.layers.3.self_attn.v_proj.bias", "model.layers.3.mlp.down_proj.bias", "model.layers.3.mlp.gate_proj.bias", "model.layers.3.mlp.up_proj.bias", "model.layers.4.self_attn.k_proj.bias", "model.layers.4.self_attn.o_proj.bias", "model.layers.4.self_attn.q_proj.bias", "model.layers.4.self_attn.v_proj.bias", "model.layers.4.mlp.down_proj.bias", "model.layers.4.mlp.gate_proj.bias", "model.layers.4.mlp.up_proj.bias", "model.layers.5.self_attn.k_proj.bias", "model.layers.5.self_attn.o_proj.bias", "model.layers.5.self_attn.q_proj.bias", "model.layers.5.self_attn.v_proj.bias", "model.layers.5.mlp.down_proj.bias", "model.layers.5.mlp.gate_proj.bias", "model.layers.5.mlp.up_proj.bias", "model.layers.6.self_attn.k_proj.bias", "model.layers.6.self_attn.o_proj.bias", "model.layers.6.self_attn.q_proj.bias", "model.layers.6.self_attn.v_proj.bias", "model.layers.6.mlp.down_proj.bias", "model.layers.6.mlp.gate_proj.bias", "model.layers.6.mlp.up_proj.bias", "model.layers.7.self_attn.k_proj.bias", "model.layers.7.self_attn.o_proj.bias", "model.layers.7.self_attn.q_proj.bias", "model.layers.7.self_attn.v_proj.bias", "model.layers.7.mlp.down_proj.bias", "model.layers.7.mlp.gate_proj.bias", "model.layers.7.mlp.up_proj.bias", "model.layers.8.self_attn.k_proj.bias", "model.layers.8.self_attn.o_proj.bias", "model.layers.8.self_attn.q_proj.bias", "model.layers.8.self_attn.v_proj.bias", "model.layers.8.mlp.down_proj.bias", "model.layers.8.mlp.gate_proj.bias", "model.layers.8.mlp.up_proj.bias", "model.layers.9.self_attn.k_proj.bias", "model.layers.9.self_attn.o_proj.bias", "model.layers.9.self_attn.q_proj.bias", "model.layers.9.self_attn.v_proj.bias", "model.layers.9.mlp.down_proj.bias", "model.layers.9.mlp.gate_proj.bias", "model.layers.9.mlp.up_proj.bias", "model.layers.10.self_attn.k_proj.bias", "model.layers.10.self_attn.o_proj.bias", "model.layers.10.self_attn.q_proj.bias", "model.layers.10.self_attn.v_proj.bias", "model.layers.10.mlp.down_proj.bias", "model.layers.10.mlp.gate_proj.bias", "model.layers.10.mlp.up_proj.bias", "model.layers.11.self_attn.k_proj.bias", "model.layers.11.self_attn.o_proj.bias", "model.layers.11.self_attn.q_proj.bias", "model.layers.11.self_attn.v_proj.bias", "model.layers.11.mlp.down_proj.bias", "model.layers.11.mlp.gate_proj.bias", "model.layers.11.mlp.up_proj.bias", "model.layers.12.self_attn.k_proj.bias", "model.layers.12.self_attn.o_proj.bias", "model.layers.12.self_attn.q_proj.bias", "model.layers.12.self_attn.v_proj.bias", "model.layers.12.mlp.down_proj.bias", "model.layers.12.mlp.gate_proj.bias", 
"model.layers.12.mlp.up_proj.bias", "model.layers.13.self_attn.k_proj.bias", "model.layers.13.self_attn.o_proj.bias", "model.layers.13.self_attn.q_proj.bias", "model.layers.13.self_attn.v_proj.bias", "model.layers.13.mlp.down_proj.bias", "model.layers.13.mlp.gate_proj.bias", "model.layers.13.mlp.up_proj.bias", "model.layers.14.self_attn.k_proj.bias", "model.layers.14.self_attn.o_proj.bias", "model.layers.14.self_attn.q_proj.bias", "model.layers.14.self_attn.v_proj.bias", "model.layers.14.mlp.down_proj.bias", "model.layers.14.mlp.gate_proj.bias", "model.layers.14.mlp.up_proj.bias", "model.layers.15.self_attn.k_proj.bias", "model.layers.15.self_attn.o_proj.bias", "model.layers.15.self_attn.q_proj.bias", "model.layers.15.self_attn.v_proj.bias", "model.layers.15.mlp.down_proj.bias", "model.layers.15.mlp.gate_proj.bias", "model.layers.15.mlp.up_proj.bias", "model.layers.16.self_attn.k_proj.bias", "model.layers.16.self_attn.o_proj.bias", "model.layers.16.self_attn.q_proj.bias", "model.layers.16.self_attn.v_proj.bias", "model.layers.16.mlp.down_proj.bias", "model.layers.16.mlp.gate_proj.bias", "model.layers.16.mlp.up_proj.bias", "model.layers.17.self_attn.k_proj.bias", "model.layers.17.self_attn.o_proj.bias", "model.layers.17.self_attn.q_proj.bias", "model.layers.17.self_attn.v_proj.bias", "model.layers.17.mlp.down_proj.bias", "model.layers.17.mlp.gate_proj.bias", "model.layers.17.mlp.up_proj.bias", "model.layers.18.self_attn.k_proj.bias", "model.layers.18.self_attn.o_proj.bias", "model.layers.18.self_attn.q_proj.bias", "model.layers.18.self_attn.v_proj.bias", "model.layers.18.mlp.down_proj.bias", "model.layers.18.mlp.gate_proj.bias", "model.layers.18.mlp.up_proj.bias", "model.layers.19.self_attn.k_proj.bias", "model.layers.19.self_attn.o_proj.bias", "model.layers.19.self_attn.q_proj.bias", "model.layers.19.self_attn.v_proj.bias", "model.layers.19.mlp.down_proj.bias", "model.layers.19.mlp.gate_proj.bias", "model.layers.19.mlp.up_proj.bias", "model.layers.20.self_attn.k_proj.bias", "model.layers.20.self_attn.o_proj.bias", "model.layers.20.self_attn.q_proj.bias", "model.layers.20.self_attn.v_proj.bias", "model.layers.20.mlp.down_proj.bias", "model.layers.20.mlp.gate_proj.bias", "model.layers.20.mlp.up_proj.bias", "model.layers.21.self_attn.k_proj.bias", "model.layers.21.self_attn.o_proj.bias", "model.layers.21.self_attn.q_proj.bias", "model.layers.21.self_attn.v_proj.bias", "model.layers.21.mlp.down_proj.bias", "model.layers.21.mlp.gate_proj.bias", "model.layers.21.mlp.up_proj.bias", "model.layers.22.self_attn.k_proj.bias", "model.layers.22.self_attn.o_proj.bias", "model.layers.22.self_attn.q_proj.bias", "model.layers.22.self_attn.v_proj.bias", "model.layers.22.mlp.down_proj.bias", "model.layers.22.mlp.gate_proj.bias", "model.layers.22.mlp.up_proj.bias", "model.layers.23.self_attn.k_proj.bias", "model.layers.23.self_attn.o_proj.bias", "model.layers.23.self_attn.q_proj.bias", "model.layers.23.self_attn.v_proj.bias", "model.layers.23.mlp.down_proj.bias", "model.layers.23.mlp.gate_proj.bias", "model.layers.23.mlp.up_proj.bias", "model.layers.24.self_attn.k_proj.bias", "model.layers.24.self_attn.o_proj.bias", "model.layers.24.self_attn.q_proj.bias", "model.layers.24.self_attn.v_proj.bias", "model.layers.24.mlp.down_proj.bias", "model.layers.24.mlp.gate_proj.bias", "model.layers.24.mlp.up_proj.bias", "model.layers.25.self_attn.k_proj.bias", "model.layers.25.self_attn.o_proj.bias", "model.layers.25.self_attn.q_proj.bias", "model.layers.25.self_attn.v_proj.bias", "model.layers.25.mlp.down_proj.bias", 
"model.layers.25.mlp.gate_proj.bias", "model.layers.25.mlp.up_proj.bias", "model.layers.26.self_attn.k_proj.bias", "model.layers.26.self_attn.o_proj.bias", "model.layers.26.self_attn.q_proj.bias", "model.layers.26.self_attn.v_proj.bias", "model.layers.26.mlp.down_proj.bias", "model.layers.26.mlp.gate_proj.bias", "model.layers.26.mlp.up_proj.bias", "model.layers.27.self_attn.k_proj.bias", "model.layers.27.self_attn.o_proj.bias", "model.layers.27.self_attn.q_proj.bias", "model.layers.27.self_attn.v_proj.bias", "model.layers.27.mlp.down_proj.bias", "model.layers.27.mlp.gate_proj.bias", "model.layers.27.mlp.up_proj.bias", "model.layers.28.self_attn.k_proj.bias", "model.layers.28.self_attn.o_proj.bias", "model.layers.28.self_attn.q_proj.bias", "model.layers.28.self_attn.v_proj.bias", "model.layers.28.mlp.down_proj.bias", "model.layers.28.mlp.gate_proj.bias", "model.layers.28.mlp.up_proj.bias", "model.layers.29.self_attn.k_proj.bias", "model.layers.29.self_attn.o_proj.bias", "model.layers.29.self_attn.q_proj.bias", "model.layers.29.self_attn.v_proj.bias", "model.layers.29.mlp.down_proj.bias", "model.layers.29.mlp.gate_proj.bias", "model.layers.29.mlp.up_proj.bias", "model.layers.30.self_attn.k_proj.bias", "model.layers.30.self_attn.o_proj.bias", "model.layers.30.self_attn.q_proj.bias", "model.layers.30.self_attn.v_proj.bias", "model.layers.30.mlp.down_proj.bias", "model.layers.30.mlp.gate_proj.bias", "model.layers.30.mlp.up_proj.bias", "model.layers.31.self_attn.k_proj.bias", "model.layers.31.self_attn.o_proj.bias", "model.layers.31.self_attn.q_proj.bias", "model.layers.31.self_attn.v_proj.bias", "model.layers.31.mlp.down_proj.bias", "model.layers.31.mlp.gate_proj.bias", "model.layers.31.mlp.up_proj.bias".

This is getting ridiculous 😓

@EyeDeck
Contributor

EyeDeck commented Apr 1, 2023

Yep, same error as above for me on the ungrouped and 128g 4-bit models from USBhost.

The quantizations I did a few days ago on the short-lived pytorch branch work, as do some re-runs from last night (on the CUDA branch, this commit)... but performance is abysmal: I'm getting a whopping 0.11 tokens/s at ~1850 context, while the older quant on the older commit does closer to 4-5 tokens/s.

@BadisG
Contributor

BadisG commented Apr 1, 2023

I don't even know why there are two branches, triton and cuda. Wouldn't cuda be the fastest one? Why should we go for the slow version? lol

@USBhost
Contributor

USBhost commented Apr 1, 2023

Uninstall transformers and reinstall.
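
If it helps anyone, a minimal sketch of that (assuming the webui's requirements.txt pins the transformers version it expects):

```
# force a clean transformers reinstall inside the textgen env
pip uninstall -y transformers
pip install -r requirements.txt
```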

@USBhost
Contributor

USBhost commented Apr 1, 2023

I don't even know why there are two branches, triton and cuda. Wouldn't cuda be the fastest one? Why should we go for the slow version? lol

triton supports more features.

@BadisG
Contributor

BadisG commented Apr 1, 2023

@USBhost like what?
And if triton is as slow as the old pytorch branch I don't really see the point

@EyeDeck
Contributor

EyeDeck commented Apr 1, 2023

No change after reinstalling transformers.

@USBhost
Contributor

USBhost commented Apr 1, 2023

@USBhost like what? And if triton is as slow as the old pytorch branch I don't really see the point

Groupsize+act-order together.

Also I feel the pytorch branch was broken. That thing was worse than the delay.

@USBhost
Contributor

USBhost commented Apr 1, 2023

@EyeDeck let me see. Does the groupsize have the same issue?

@BadisG
Contributor

BadisG commented Apr 1, 2023

@USBhost like what? And if triton is as slow as the old pytorch branch I don't really see the point

Groupsize+act-order together.

Also I feel the pytorch branch was broken. That thing was worse than the delay.

I think it works on the cuda branch now, you can combine all the gptq implementations
https://github.com/qwopqwop200/GPTQ-for-LLaMa/tree/cuda

@USBhost
Contributor

USBhost commented Apr 1, 2023

@USBhost like what? And if triton is as slow as the old pytorch branch I don't really see the point

Groupsize+act-order together.
Also I feel the pytorch branch was broken. That thing was worse than the delay.

I think it works on the cuda branch now, you can combine all the gptq implementations https://github.com/qwopqwop200/GPTQ-for-LLaMa/tree/cuda

Well, that's new. Hard to know when the commits are all just called "update (filename)".

@BadisG
Contributor

BadisG commented Apr 1, 2023

@USBhost I know right... I knew this from looking at the new readme; he has now removed the "you can't combine act-order and groupsize 128 together" note 😅

@EyeDeck
Contributor

EyeDeck commented Apr 1, 2023

@USBhost Same error with your ungrouped 30B and the slightly newer 128g 30B:

  File "G:\miniconda\envs\tgwui\lib\site-packages\torch\nn\modules\module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM:
        Missing key(s) in state_dict: "model.layers.0.self_attn.k_proj.g_idx", "model.layers.0.self_attn.o_proj.g_idx", "model.layers.0.self_attn.q_proj.g_idx", "model.layers.0.self_attn.v_proj.g_idx", "model.layers.0.mlp.down_proj.g_idx", "model.layers.0.mlp.gate_proj.g_idx", "model.layers.0.mlp.up_proj.g_idx", "model.layers.1.self_attn.k_proj.g_idx", "model.layers.1.self_attn.o_proj.g_idx", "model.layers.1.self_attn.q_proj.g_idx", "model.layers.1.self_attn.v_proj.g_idx", "model.layers.1.mlp.down_proj.g_idx", "model.layers.1.mlp.gate_proj.g_idx", "model.layers.1.mlp.up_proj.g_idx", "model.layers.2.self_attn.k_proj.g_idx", "model.layers.2.self_attn.o_proj.g_idx", "model.layers.2.self_attn.q_proj.g_idx", "model.layers.2.self_attn.v_proj.g_idx", "model.layers.2.mlp.down_proj.g_idx", "model.layers.2.mlp.gate_proj.g_idx", "model.layers.2.mlp.up_proj.g_idx", "model.layers.3.self_attn.k_proj.g_idx", "model.layers.3.self_attn.o_proj.g_idx", "model.layers.3.self_attn.q_proj.g_idx", "model.layers.3.self_attn.v_proj.g_idx", "model.layers.3.mlp.down_proj.g_idx", "model.layers.3.mlp.gate_proj.g_idx", "model.layers.3.mlp.up_proj.g_idx", "model.layers.4.self_attn.k_proj.g_idx", "model.layers.4.self_attn.o_proj.g_idx", "model.layers.4.self_attn.q_proj.g_idx", "model.layers.4.self_attn.v_proj.g_idx", "model.layers.4.mlp.down_proj.g_idx", "model.layers.4.mlp.gate_proj.g_idx", "model.layers.4.mlp.up_proj.g_idx", "model.layers.5.self_attn.k_proj.g_idx", "model.layers.5.self_attn.o_proj.g_idx", "model.layers.5.self_attn.q_proj.g_idx", "model.layers.5.self_attn.v_proj.g_idx", "model.layers.5.mlp.down_proj.g_idx", "model.layers.5.mlp.gate_proj.g_idx", "model.layers.5.mlp.up_proj.g_idx", "model.layers.6.self_attn.k_proj.g_idx", "model.layers.6.self_attn.o_proj.g_idx", "model.layers.6.self_attn.q_proj.g_idx", "model.layers.6.self_attn.v_proj.g_idx", "model.layers.6.mlp.down_proj.g_idx", "model.layers.6.mlp.gate_proj.g_idx", "model.layers.6.mlp.up_proj.g_idx", "model.layers.7.self_attn.k_proj.g_idx", "model.layers.7.self_attn.o_proj.g_idx", "model.layers.7.self_attn.q_proj.g_idx", "model.layers.7.self_attn.v_proj.g_idx", "model.layers.7.mlp.down_proj.g_idx", "model.layers.7.mlp.gate_proj.g_idx", "model.layers.7.mlp.up_proj.g_idx", "model.layers.8.self_attn.k_proj.g_idx", "model.layers.8.self_attn.o_proj.g_idx", "model.layers.8.self_attn.q_proj.g_idx", "model.layers.8.self_attn.v_proj.g_idx", "model.layers.8.mlp.down_proj.g_idx", "model.layers.8.mlp.gate_proj.g_idx", "model.layers.8.mlp.up_proj.g_idx", "model.layers.9.self_attn.k_proj.g_idx", "model.layers.9.self_attn.o_proj.g_idx", "model.layers.9.self_attn.q_proj.g_idx", "model.layers.9.self_attn.v_proj.g_idx", "model.layers.9.mlp.down_proj.g_idx", "model.layers.9.mlp.gate_proj.g_idx", "model.layers.9.mlp.up_proj.g_idx", "model.layers.10.self_attn.k_proj.g_idx", "model.layers.10.self_attn.o_proj.g_idx", "model.layers.10.self_attn.q_proj.g_idx", "model.layers.10.self_attn.v_proj.g_idx", "model.layers.10.mlp.down_proj.g_idx", "model.layers.10.mlp.gate_proj.g_idx", "model.layers.10.mlp.up_proj.g_idx", "model.layers.11.self_attn.k_proj.g_idx", "model.layers.11.self_attn.o_proj.g_idx", "model.layers.11.self_attn.q_proj.g_idx", "model.layers.11.self_attn.v_proj.g_idx", "model.layers.11.mlp.down_proj.g_idx", "model.layers.11.mlp.gate_proj.g_idx", "model.layers.11.mlp.up_proj.g_idx", "model.layers.12.self_attn.k_proj.g_idx", "model.layers.12.self_attn.o_proj.g_idx", "model.layers.12.self_attn.q_proj.g_idx", "model.layers.12.self_attn.v_proj.g_idx", 
"model.layers.12.mlp.down_proj.g_idx", "model.layers.12.mlp.gate_proj.g_idx", "model.layers.12.mlp.up_proj.g_idx", "model.layers.13.self_attn.k_proj.g_idx", "model.layers.13.self_attn.o_proj.g_idx", "model.layers.13.self_attn.q_proj.g_idx", "model.layers.13.self_attn.v_proj.g_idx", "model.layers.13.mlp.down_proj.g_idx", "model.layers.13.mlp.gate_proj.g_idx", "model.layers.13.mlp.up_proj.g_idx", "model.layers.14.self_attn.k_proj.g_idx", "model.layers.14.self_attn.o_proj.g_idx", "model.layers.14.self_attn.q_proj.g_idx", "model.layers.14.self_attn.v_proj.g_idx", "model.layers.14.mlp.down_proj.g_idx", "model.layers.14.mlp.gate_proj.g_idx", "model.layers.14.mlp.up_proj.g_idx", "model.layers.15.self_attn.k_proj.g_idx", "model.layers.15.self_attn.o_proj.g_idx", "model.layers.15.self_attn.q_proj.g_idx", "model.layers.15.self_attn.v_proj.g_idx", "model.layers.15.mlp.down_proj.g_idx", "model.layers.15.mlp.gate_proj.g_idx", "model.layers.15.mlp.up_proj.g_idx", "model.layers.16.self_attn.k_proj.g_idx", "model.layers.16.self_attn.o_proj.g_idx", "model.layers.16.self_attn.q_proj.g_idx", "model.layers.16.self_attn.v_proj.g_idx", "model.layers.16.mlp.down_proj.g_idx", "model.layers.16.mlp.gate_proj.g_idx", "model.layers.16.mlp.up_proj.g_idx", "model.layers.17.self_attn.k_proj.g_idx", "model.layers.17.self_attn.o_proj.g_idx", "model.layers.17.self_attn.q_proj.g_idx", "model.layers.17.self_attn.v_proj.g_idx", "model.layers.17.mlp.down_proj.g_idx", "model.layers.17.mlp.gate_proj.g_idx", "model.layers.17.mlp.up_proj.g_idx", "model.layers.18.self_attn.k_proj.g_idx", "model.layers.18.self_attn.o_proj.g_idx", "model.layers.18.self_attn.q_proj.g_idx", "model.layers.18.self_attn.v_proj.g_idx", "model.layers.18.mlp.down_proj.g_idx", "model.layers.18.mlp.gate_proj.g_idx", "model.layers.18.mlp.up_proj.g_idx", "model.layers.19.self_attn.k_proj.g_idx", "model.layers.19.self_attn.o_proj.g_idx", "model.layers.19.self_attn.q_proj.g_idx", "model.layers.19.self_attn.v_proj.g_idx", "model.layers.19.mlp.down_proj.g_idx", "model.layers.19.mlp.gate_proj.g_idx", "model.layers.19.mlp.up_proj.g_idx", "model.layers.20.self_attn.k_proj.g_idx", "model.layers.20.self_attn.o_proj.g_idx", "model.layers.20.self_attn.q_proj.g_idx", "model.layers.20.self_attn.v_proj.g_idx", "model.layers.20.mlp.down_proj.g_idx", "model.layers.20.mlp.gate_proj.g_idx", "model.layers.20.mlp.up_proj.g_idx", "model.layers.21.self_attn.k_proj.g_idx", "model.layers.21.self_attn.o_proj.g_idx", "model.layers.21.self_attn.q_proj.g_idx", "model.layers.21.self_attn.v_proj.g_idx", "model.layers.21.mlp.down_proj.g_idx", "model.layers.21.mlp.gate_proj.g_idx", "model.layers.21.mlp.up_proj.g_idx", "model.layers.22.self_attn.k_proj.g_idx", "model.layers.22.self_attn.o_proj.g_idx", "model.layers.22.self_attn.q_proj.g_idx", "model.layers.22.self_attn.v_proj.g_idx", "model.layers.22.mlp.down_proj.g_idx", "model.layers.22.mlp.gate_proj.g_idx", "model.layers.22.mlp.up_proj.g_idx", "model.layers.23.self_attn.k_proj.g_idx", "model.layers.23.self_attn.o_proj.g_idx", "model.layers.23.self_attn.q_proj.g_idx", "model.layers.23.self_attn.v_proj.g_idx", "model.layers.23.mlp.down_proj.g_idx", "model.layers.23.mlp.gate_proj.g_idx", "model.layers.23.mlp.up_proj.g_idx", "model.layers.24.self_attn.k_proj.g_idx", "model.layers.24.self_attn.o_proj.g_idx", "model.layers.24.self_attn.q_proj.g_idx", "model.layers.24.self_attn.v_proj.g_idx", "model.layers.24.mlp.down_proj.g_idx", "model.layers.24.mlp.gate_proj.g_idx", "model.layers.24.mlp.up_proj.g_idx", "model.layers.25.self_attn.k_proj.g_idx", 
"model.layers.25.self_attn.o_proj.g_idx", "model.layers.25.self_attn.q_proj.g_idx", "model.layers.25.self_attn.v_proj.g_idx", "model.layers.25.mlp.down_proj.g_idx", "model.layers.25.mlp.gate_proj.g_idx", "model.layers.25.mlp.up_proj.g_idx", "model.layers.26.self_attn.k_proj.g_idx", "model.layers.26.self_attn.o_proj.g_idx", "model.layers.26.self_attn.q_proj.g_idx", "model.layers.26.self_attn.v_proj.g_idx", "model.layers.26.mlp.down_proj.g_idx", "model.layers.26.mlp.gate_proj.g_idx", "model.layers.26.mlp.up_proj.g_idx", "model.layers.27.self_attn.k_proj.g_idx", "model.layers.27.self_attn.o_proj.g_idx", "model.layers.27.self_attn.q_proj.g_idx", "model.layers.27.self_attn.v_proj.g_idx", "model.layers.27.mlp.down_proj.g_idx", "model.layers.27.mlp.gate_proj.g_idx", "model.layers.27.mlp.up_proj.g_idx", "model.layers.28.self_attn.k_proj.g_idx", "model.layers.28.self_attn.o_proj.g_idx", "model.layers.28.self_attn.q_proj.g_idx", "model.layers.28.self_attn.v_proj.g_idx", "model.layers.28.mlp.down_proj.g_idx", "model.layers.28.mlp.gate_proj.g_idx", "model.layers.28.mlp.up_proj.g_idx", "model.layers.29.self_attn.k_proj.g_idx", "model.layers.29.self_attn.o_proj.g_idx", "model.layers.29.self_attn.q_proj.g_idx", "model.layers.29.self_attn.v_proj.g_idx", "model.layers.29.mlp.down_proj.g_idx", "model.layers.29.mlp.gate_proj.g_idx", "model.layers.29.mlp.up_proj.g_idx", "model.layers.30.self_attn.k_proj.g_idx", "model.layers.30.self_attn.o_proj.g_idx", "model.layers.30.self_attn.q_proj.g_idx", "model.layers.30.self_attn.v_proj.g_idx", "model.layers.30.mlp.down_proj.g_idx", "model.layers.30.mlp.gate_proj.g_idx", "model.layers.30.mlp.up_proj.g_idx", "model.layers.31.self_attn.k_proj.g_idx", "model.layers.31.self_attn.o_proj.g_idx", "model.layers.31.self_attn.q_proj.g_idx", "model.layers.31.self_attn.v_proj.g_idx", "model.layers.31.mlp.down_proj.g_idx", "model.layers.31.mlp.gate_proj.g_idx", "model.layers.31.mlp.up_proj.g_idx", "model.layers.32.self_attn.k_proj.g_idx", "model.layers.32.self_attn.o_proj.g_idx", "model.layers.32.self_attn.q_proj.g_idx", "model.layers.32.self_attn.v_proj.g_idx", "model.layers.32.mlp.down_proj.g_idx", "model.layers.32.mlp.gate_proj.g_idx", "model.layers.32.mlp.up_proj.g_idx", "model.layers.33.self_attn.k_proj.g_idx", "model.layers.33.self_attn.o_proj.g_idx", "model.layers.33.self_attn.q_proj.g_idx", "model.layers.33.self_attn.v_proj.g_idx", "model.layers.33.mlp.down_proj.g_idx", "model.layers.33.mlp.gate_proj.g_idx", "model.layers.33.mlp.up_proj.g_idx", "model.layers.34.self_attn.k_proj.g_idx", "model.layers.34.self_attn.o_proj.g_idx", "model.layers.34.self_attn.q_proj.g_idx", "model.layers.34.self_attn.v_proj.g_idx", "model.layers.34.mlp.down_proj.g_idx", "model.layers.34.mlp.gate_proj.g_idx", "model.layers.34.mlp.up_proj.g_idx", "model.layers.35.self_attn.k_proj.g_idx", "model.layers.35.self_attn.o_proj.g_idx", "model.layers.35.self_attn.q_proj.g_idx", "model.layers.35.self_attn.v_proj.g_idx", "model.layers.35.mlp.down_proj.g_idx", "model.layers.35.mlp.gate_proj.g_idx", "model.layers.35.mlp.up_proj.g_idx", "model.layers.36.self_attn.k_proj.g_idx", "model.layers.36.self_attn.o_proj.g_idx", "model.layers.36.self_attn.q_proj.g_idx", "model.layers.36.self_attn.v_proj.g_idx", "model.layers.36.mlp.down_proj.g_idx", "model.layers.36.mlp.gate_proj.g_idx", "model.layers.36.mlp.up_proj.g_idx", "model.layers.37.self_attn.k_proj.g_idx", "model.layers.37.self_attn.o_proj.g_idx", "model.layers.37.self_attn.q_proj.g_idx", "model.layers.37.self_attn.v_proj.g_idx", 
"model.layers.37.mlp.down_proj.g_idx", "model.layers.37.mlp.gate_proj.g_idx", "model.layers.37.mlp.up_proj.g_idx", "model.layers.38.self_attn.k_proj.g_idx", "model.layers.38.self_attn.o_proj.g_idx", "model.layers.38.self_attn.q_proj.g_idx", "model.layers.38.self_attn.v_proj.g_idx", "model.layers.38.mlp.down_proj.g_idx", "model.layers.38.mlp.gate_proj.g_idx", "model.layers.38.mlp.up_proj.g_idx", "model.layers.39.self_attn.k_proj.g_idx", "model.layers.39.self_attn.o_proj.g_idx", "model.layers.39.self_attn.q_proj.g_idx", "model.layers.39.self_attn.v_proj.g_idx", "model.layers.39.mlp.down_proj.g_idx", "model.layers.39.mlp.gate_proj.g_idx", "model.layers.39.mlp.up_proj.g_idx", "model.layers.40.self_attn.k_proj.g_idx", "model.layers.40.self_attn.o_proj.g_idx", "model.layers.40.self_attn.q_proj.g_idx", "model.layers.40.self_attn.v_proj.g_idx", "model.layers.40.mlp.down_proj.g_idx", "model.layers.40.mlp.gate_proj.g_idx", "model.layers.40.mlp.up_proj.g_idx", "model.layers.41.self_attn.k_proj.g_idx", "model.layers.41.self_attn.o_proj.g_idx", "model.layers.41.self_attn.q_proj.g_idx", "model.layers.41.self_attn.v_proj.g_idx", "model.layers.41.mlp.down_proj.g_idx", "model.layers.41.mlp.gate_proj.g_idx", "model.layers.41.mlp.up_proj.g_idx", "model.layers.42.self_attn.k_proj.g_idx", "model.layers.42.self_attn.o_proj.g_idx", "model.layers.42.self_attn.q_proj.g_idx", "model.layers.42.self_attn.v_proj.g_idx", "model.layers.42.mlp.down_proj.g_idx", "model.layers.42.mlp.gate_proj.g_idx", "model.layers.42.mlp.up_proj.g_idx", "model.layers.43.self_attn.k_proj.g_idx", "model.layers.43.self_attn.o_proj.g_idx", "model.layers.43.self_attn.q_proj.g_idx", "model.layers.43.self_attn.v_proj.g_idx", "model.layers.43.mlp.down_proj.g_idx", "model.layers.43.mlp.gate_proj.g_idx", "model.layers.43.mlp.up_proj.g_idx", "model.layers.44.self_attn.k_proj.g_idx", "model.layers.44.self_attn.o_proj.g_idx", "model.layers.44.self_attn.q_proj.g_idx", "model.layers.44.self_attn.v_proj.g_idx", "model.layers.44.mlp.down_proj.g_idx", "model.layers.44.mlp.gate_proj.g_idx", "model.layers.44.mlp.up_proj.g_idx", "model.layers.45.self_attn.k_proj.g_idx", "model.layers.45.self_attn.o_proj.g_idx", "model.layers.45.self_attn.q_proj.g_idx", "model.layers.45.self_attn.v_proj.g_idx", "model.layers.45.mlp.down_proj.g_idx", "model.layers.45.mlp.gate_proj.g_idx", "model.layers.45.mlp.up_proj.g_idx", "model.layers.46.self_attn.k_proj.g_idx", "model.layers.46.self_attn.o_proj.g_idx", "model.layers.46.self_attn.q_proj.g_idx", "model.layers.46.self_attn.v_proj.g_idx", "model.layers.46.mlp.down_proj.g_idx", "model.layers.46.mlp.gate_proj.g_idx", "model.layers.46.mlp.up_proj.g_idx", "model.layers.47.self_attn.k_proj.g_idx", "model.layers.47.self_attn.o_proj.g_idx", "model.layers.47.self_attn.q_proj.g_idx", "model.layers.47.self_attn.v_proj.g_idx", "model.layers.47.mlp.down_proj.g_idx", "model.layers.47.mlp.gate_proj.g_idx", "model.layers.47.mlp.up_proj.g_idx", "model.layers.48.self_attn.k_proj.g_idx", "model.layers.48.self_attn.o_proj.g_idx", "model.layers.48.self_attn.q_proj.g_idx", "model.layers.48.self_attn.v_proj.g_idx", "model.layers.48.mlp.down_proj.g_idx", "model.layers.48.mlp.gate_proj.g_idx", "model.layers.48.mlp.up_proj.g_idx", "model.layers.49.self_attn.k_proj.g_idx", "model.layers.49.self_attn.o_proj.g_idx", "model.layers.49.self_attn.q_proj.g_idx", "model.layers.49.self_attn.v_proj.g_idx", "model.layers.49.mlp.down_proj.g_idx", "model.layers.49.mlp.gate_proj.g_idx", "model.layers.49.mlp.up_proj.g_idx", "model.layers.50.self_attn.k_proj.g_idx", 
"model.layers.50.self_attn.o_proj.g_idx", "model.layers.50.self_attn.q_proj.g_idx", "model.layers.50.self_attn.v_proj.g_idx", "model.layers.50.mlp.down_proj.g_idx", "model.layers.50.mlp.gate_proj.g_idx", "model.layers.50.mlp.up_proj.g_idx", "model.layers.51.self_attn.k_proj.g_idx", "model.layers.51.self_attn.o_proj.g_idx", "model.layers.51.self_attn.q_proj.g_idx", "model.layers.51.self_attn.v_proj.g_idx", "model.layers.51.mlp.down_proj.g_idx", "model.layers.51.mlp.gate_proj.g_idx", "model.layers.51.mlp.up_proj.g_idx", "model.layers.52.self_attn.k_proj.g_idx", "model.layers.52.self_attn.o_proj.g_idx", "model.layers.52.self_attn.q_proj.g_idx", "model.layers.52.self_attn.v_proj.g_idx", "model.layers.52.mlp.down_proj.g_idx", "model.layers.52.mlp.gate_proj.g_idx", "model.layers.52.mlp.up_proj.g_idx", "model.layers.53.self_attn.k_proj.g_idx", "model.layers.53.self_attn.o_proj.g_idx", "model.layers.53.self_attn.q_proj.g_idx", "model.layers.53.self_attn.v_proj.g_idx", "model.layers.53.mlp.down_proj.g_idx", "model.layers.53.mlp.gate_proj.g_idx", "model.layers.53.mlp.up_proj.g_idx", "model.layers.54.self_attn.k_proj.g_idx", "model.layers.54.self_attn.o_proj.g_idx", "model.layers.54.self_attn.q_proj.g_idx", "model.layers.54.self_attn.v_proj.g_idx", "model.layers.54.mlp.down_proj.g_idx", "model.layers.54.mlp.gate_proj.g_idx", "model.layers.54.mlp.up_proj.g_idx", "model.layers.55.self_attn.k_proj.g_idx", "model.layers.55.self_attn.o_proj.g_idx", "model.layers.55.self_attn.q_proj.g_idx", "model.layers.55.self_attn.v_proj.g_idx", "model.layers.55.mlp.down_proj.g_idx", "model.layers.55.mlp.gate_proj.g_idx", "model.layers.55.mlp.up_proj.g_idx", "model.layers.56.self_attn.k_proj.g_idx", "model.layers.56.self_attn.o_proj.g_idx", "model.layers.56.self_attn.q_proj.g_idx", "model.layers.56.self_attn.v_proj.g_idx", "model.layers.56.mlp.down_proj.g_idx", "model.layers.56.mlp.gate_proj.g_idx", "model.layers.56.mlp.up_proj.g_idx", "model.layers.57.self_attn.k_proj.g_idx", "model.layers.57.self_attn.o_proj.g_idx", "model.layers.57.self_attn.q_proj.g_idx", "model.layers.57.self_attn.v_proj.g_idx", "model.layers.57.mlp.down_proj.g_idx", "model.layers.57.mlp.gate_proj.g_idx", "model.layers.57.mlp.up_proj.g_idx", "model.layers.58.self_attn.k_proj.g_idx", "model.layers.58.self_attn.o_proj.g_idx", "model.layers.58.self_attn.q_proj.g_idx", "model.layers.58.self_attn.v_proj.g_idx", "model.layers.58.mlp.down_proj.g_idx", "model.layers.58.mlp.gate_proj.g_idx", "model.layers.58.mlp.up_proj.g_idx", "model.layers.59.self_attn.k_proj.g_idx", "model.layers.59.self_attn.o_proj.g_idx", "model.layers.59.self_attn.q_proj.g_idx", "model.layers.59.self_attn.v_proj.g_idx", "model.layers.59.mlp.down_proj.g_idx", "model.layers.59.mlp.gate_proj.g_idx", "model.layers.59.mlp.up_proj.g_idx".
        Unexpected key(s) in state_dict: "model.layers.0.self_attn.k_proj.bias", "model.layers.0.self_attn.o_proj.bias", "model.layers.0.self_attn.q_proj.bias", "model.layers.0.self_attn.v_proj.bias", "model.layers.0.mlp.down_proj.bias", "model.layers.0.mlp.gate_proj.bias", "model.layers.0.mlp.up_proj.bias", "model.layers.1.self_attn.k_proj.bias", "model.layers.1.self_attn.o_proj.bias", "model.layers.1.self_attn.q_proj.bias", "model.layers.1.self_attn.v_proj.bias", "model.layers.1.mlp.down_proj.bias", "model.layers.1.mlp.gate_proj.bias", "model.layers.1.mlp.up_proj.bias", "model.layers.2.self_attn.k_proj.bias", "model.layers.2.self_attn.o_proj.bias", "model.layers.2.self_attn.q_proj.bias", "model.layers.2.self_attn.v_proj.bias", "model.layers.2.mlp.down_proj.bias", "model.layers.2.mlp.gate_proj.bias", "model.layers.2.mlp.up_proj.bias", "model.layers.3.self_attn.k_proj.bias", "model.layers.3.self_attn.o_proj.bias", "model.layers.3.self_attn.q_proj.bias", "model.layers.3.self_attn.v_proj.bias", "model.layers.3.mlp.down_proj.bias", "model.layers.3.mlp.gate_proj.bias", "model.layers.3.mlp.up_proj.bias", "model.layers.4.self_attn.k_proj.bias", "model.layers.4.self_attn.o_proj.bias", "model.layers.4.self_attn.q_proj.bias", "model.layers.4.self_attn.v_proj.bias", "model.layers.4.mlp.down_proj.bias", "model.layers.4.mlp.gate_proj.bias", "model.layers.4.mlp.up_proj.bias", "model.layers.5.self_attn.k_proj.bias", "model.layers.5.self_attn.o_proj.bias", "model.layers.5.self_attn.q_proj.bias", "model.layers.5.self_attn.v_proj.bias", "model.layers.5.mlp.down_proj.bias", "model.layers.5.mlp.gate_proj.bias", "model.layers.5.mlp.up_proj.bias", "model.layers.6.self_attn.k_proj.bias", "model.layers.6.self_attn.o_proj.bias", "model.layers.6.self_attn.q_proj.bias", "model.layers.6.self_attn.v_proj.bias", "model.layers.6.mlp.down_proj.bias", "model.layers.6.mlp.gate_proj.bias", "model.layers.6.mlp.up_proj.bias", "model.layers.7.self_attn.k_proj.bias", "model.layers.7.self_attn.o_proj.bias", "model.layers.7.self_attn.q_proj.bias", "model.layers.7.self_attn.v_proj.bias", "model.layers.7.mlp.down_proj.bias", "model.layers.7.mlp.gate_proj.bias", "model.layers.7.mlp.up_proj.bias", "model.layers.8.self_attn.k_proj.bias", "model.layers.8.self_attn.o_proj.bias", "model.layers.8.self_attn.q_proj.bias", "model.layers.8.self_attn.v_proj.bias", "model.layers.8.mlp.down_proj.bias", "model.layers.8.mlp.gate_proj.bias", "model.layers.8.mlp.up_proj.bias", "model.layers.9.self_attn.k_proj.bias", "model.layers.9.self_attn.o_proj.bias", "model.layers.9.self_attn.q_proj.bias", "model.layers.9.self_attn.v_proj.bias", "model.layers.9.mlp.down_proj.bias", "model.layers.9.mlp.gate_proj.bias", "model.layers.9.mlp.up_proj.bias", "model.layers.10.self_attn.k_proj.bias", "model.layers.10.self_attn.o_proj.bias", "model.layers.10.self_attn.q_proj.bias", "model.layers.10.self_attn.v_proj.bias", "model.layers.10.mlp.down_proj.bias", "model.layers.10.mlp.gate_proj.bias", "model.layers.10.mlp.up_proj.bias", "model.layers.11.self_attn.k_proj.bias", "model.layers.11.self_attn.o_proj.bias", "model.layers.11.self_attn.q_proj.bias", "model.layers.11.self_attn.v_proj.bias", "model.layers.11.mlp.down_proj.bias", "model.layers.11.mlp.gate_proj.bias", "model.layers.11.mlp.up_proj.bias", "model.layers.12.self_attn.k_proj.bias", "model.layers.12.self_attn.o_proj.bias", "model.layers.12.self_attn.q_proj.bias", "model.layers.12.self_attn.v_proj.bias", "model.layers.12.mlp.down_proj.bias", "model.layers.12.mlp.gate_proj.bias", 
"model.layers.12.mlp.up_proj.bias", "model.layers.13.self_attn.k_proj.bias", "model.layers.13.self_attn.o_proj.bias", "model.layers.13.self_attn.q_proj.bias", "model.layers.13.self_attn.v_proj.bias", "model.layers.13.mlp.down_proj.bias", "model.layers.13.mlp.gate_proj.bias", "model.layers.13.mlp.up_proj.bias", "model.layers.14.self_attn.k_proj.bias", "model.layers.14.self_attn.o_proj.bias", "model.layers.14.self_attn.q_proj.bias", "model.layers.14.self_attn.v_proj.bias", "model.layers.14.mlp.down_proj.bias", "model.layers.14.mlp.gate_proj.bias", "model.layers.14.mlp.up_proj.bias", "model.layers.15.self_attn.k_proj.bias", "model.layers.15.self_attn.o_proj.bias", "model.layers.15.self_attn.q_proj.bias", "model.layers.15.self_attn.v_proj.bias", "model.layers.15.mlp.down_proj.bias", "model.layers.15.mlp.gate_proj.bias", "model.layers.15.mlp.up_proj.bias", "model.layers.16.self_attn.k_proj.bias", "model.layers.16.self_attn.o_proj.bias", "model.layers.16.self_attn.q_proj.bias", "model.layers.16.self_attn.v_proj.bias", "model.layers.16.mlp.down_proj.bias", "model.layers.16.mlp.gate_proj.bias", "model.layers.16.mlp.up_proj.bias", "model.layers.17.self_attn.k_proj.bias", "model.layers.17.self_attn.o_proj.bias", "model.layers.17.self_attn.q_proj.bias", "model.layers.17.self_attn.v_proj.bias", "model.layers.17.mlp.down_proj.bias", "model.layers.17.mlp.gate_proj.bias", "model.layers.17.mlp.up_proj.bias", "model.layers.18.self_attn.k_proj.bias", "model.layers.18.self_attn.o_proj.bias", "model.layers.18.self_attn.q_proj.bias", "model.layers.18.self_attn.v_proj.bias", "model.layers.18.mlp.down_proj.bias", "model.layers.18.mlp.gate_proj.bias", "model.layers.18.mlp.up_proj.bias", "model.layers.19.self_attn.k_proj.bias", "model.layers.19.self_attn.o_proj.bias", "model.layers.19.self_attn.q_proj.bias", "model.layers.19.self_attn.v_proj.bias", "model.layers.19.mlp.down_proj.bias", "model.layers.19.mlp.gate_proj.bias", "model.layers.19.mlp.up_proj.bias", "model.layers.20.self_attn.k_proj.bias", "model.layers.20.self_attn.o_proj.bias", "model.layers.20.self_attn.q_proj.bias", "model.layers.20.self_attn.v_proj.bias", "model.layers.20.mlp.down_proj.bias", "model.layers.20.mlp.gate_proj.bias", "model.layers.20.mlp.up_proj.bias", "model.layers.21.self_attn.k_proj.bias", "model.layers.21.self_attn.o_proj.bias", "model.layers.21.self_attn.q_proj.bias", "model.layers.21.self_attn.v_proj.bias", "model.layers.21.mlp.down_proj.bias", "model.layers.21.mlp.gate_proj.bias", "model.layers.21.mlp.up_proj.bias", "model.layers.22.self_attn.k_proj.bias", "model.layers.22.self_attn.o_proj.bias", "model.layers.22.self_attn.q_proj.bias", "model.layers.22.self_attn.v_proj.bias", "model.layers.22.mlp.down_proj.bias", "model.layers.22.mlp.gate_proj.bias", "model.layers.22.mlp.up_proj.bias", "model.layers.23.self_attn.k_proj.bias", "model.layers.23.self_attn.o_proj.bias", "model.layers.23.self_attn.q_proj.bias", "model.layers.23.self_attn.v_proj.bias", "model.layers.23.mlp.down_proj.bias", "model.layers.23.mlp.gate_proj.bias", "model.layers.23.mlp.up_proj.bias", "model.layers.24.self_attn.k_proj.bias", "model.layers.24.self_attn.o_proj.bias", "model.layers.24.self_attn.q_proj.bias", "model.layers.24.self_attn.v_proj.bias", "model.layers.24.mlp.down_proj.bias", "model.layers.24.mlp.gate_proj.bias", "model.layers.24.mlp.up_proj.bias", "model.layers.25.self_attn.k_proj.bias", "model.layers.25.self_attn.o_proj.bias", "model.layers.25.self_attn.q_proj.bias", "model.layers.25.self_attn.v_proj.bias", "model.layers.25.mlp.down_proj.bias", 
"model.layers.25.mlp.gate_proj.bias", "model.layers.25.mlp.up_proj.bias", "model.layers.26.self_attn.k_proj.bias", "model.layers.26.self_attn.o_proj.bias", "model.layers.26.self_attn.q_proj.bias", "model.layers.26.self_attn.v_proj.bias", "model.layers.26.mlp.down_proj.bias", "model.layers.26.mlp.gate_proj.bias", "model.layers.26.mlp.up_proj.bias", "model.layers.27.self_attn.k_proj.bias", "model.layers.27.self_attn.o_proj.bias", "model.layers.27.self_attn.q_proj.bias", "model.layers.27.self_attn.v_proj.bias", "model.layers.27.mlp.down_proj.bias", "model.layers.27.mlp.gate_proj.bias", "model.layers.27.mlp.up_proj.bias", "model.layers.28.self_attn.k_proj.bias", "model.layers.28.self_attn.o_proj.bias", "model.layers.28.self_attn.q_proj.bias", "model.layers.28.self_attn.v_proj.bias", "model.layers.28.mlp.down_proj.bias", "model.layers.28.mlp.gate_proj.bias", "model.layers.28.mlp.up_proj.bias", "model.layers.29.self_attn.k_proj.bias", "model.layers.29.self_attn.o_proj.bias", "model.layers.29.self_attn.q_proj.bias", "model.layers.29.self_attn.v_proj.bias", "model.layers.29.mlp.down_proj.bias", "model.layers.29.mlp.gate_proj.bias", "model.layers.29.mlp.up_proj.bias", "model.layers.30.self_attn.k_proj.bias", "model.layers.30.self_attn.o_proj.bias", "model.layers.30.self_attn.q_proj.bias", "model.layers.30.self_attn.v_proj.bias", "model.layers.30.mlp.down_proj.bias", "model.layers.30.mlp.gate_proj.bias", "model.layers.30.mlp.up_proj.bias", "model.layers.31.self_attn.k_proj.bias", "model.layers.31.self_attn.o_proj.bias", "model.layers.31.self_attn.q_proj.bias", "model.layers.31.self_attn.v_proj.bias", "model.layers.31.mlp.down_proj.bias", "model.layers.31.mlp.gate_proj.bias", "model.layers.31.mlp.up_proj.bias", "model.layers.32.self_attn.k_proj.bias", "model.layers.32.self_attn.o_proj.bias", "model.layers.32.self_attn.q_proj.bias", "model.layers.32.self_attn.v_proj.bias", "model.layers.32.mlp.down_proj.bias", "model.layers.32.mlp.gate_proj.bias", "model.layers.32.mlp.up_proj.bias", "model.layers.33.self_attn.k_proj.bias", "model.layers.33.self_attn.o_proj.bias", "model.layers.33.self_attn.q_proj.bias", "model.layers.33.self_attn.v_proj.bias", "model.layers.33.mlp.down_proj.bias", "model.layers.33.mlp.gate_proj.bias", "model.layers.33.mlp.up_proj.bias", "model.layers.34.self_attn.k_proj.bias", "model.layers.34.self_attn.o_proj.bias", "model.layers.34.self_attn.q_proj.bias", "model.layers.34.self_attn.v_proj.bias", "model.layers.34.mlp.down_proj.bias", "model.layers.34.mlp.gate_proj.bias", "model.layers.34.mlp.up_proj.bias", "model.layers.35.self_attn.k_proj.bias", "model.layers.35.self_attn.o_proj.bias", "model.layers.35.self_attn.q_proj.bias", "model.layers.35.self_attn.v_proj.bias", "model.layers.35.mlp.down_proj.bias", "model.layers.35.mlp.gate_proj.bias", "model.layers.35.mlp.up_proj.bias", "model.layers.36.self_attn.k_proj.bias", "model.layers.36.self_attn.o_proj.bias", "model.layers.36.self_attn.q_proj.bias", "model.layers.36.self_attn.v_proj.bias", "model.layers.36.mlp.down_proj.bias", "model.layers.36.mlp.gate_proj.bias", "model.layers.36.mlp.up_proj.bias", "model.layers.37.self_attn.k_proj.bias", "model.layers.37.self_attn.o_proj.bias", "model.layers.37.self_attn.q_proj.bias", "model.layers.37.self_attn.v_proj.bias", "model.layers.37.mlp.down_proj.bias", "model.layers.37.mlp.gate_proj.bias", "model.layers.37.mlp.up_proj.bias", "model.layers.38.self_attn.k_proj.bias", "model.layers.38.self_attn.o_proj.bias", "model.layers.38.self_attn.q_proj.bias", "model.layers.38.self_attn.v_proj.bias", 
"model.layers.38.mlp.down_proj.bias", "model.layers.38.mlp.gate_proj.bias", "model.layers.38.mlp.up_proj.bias", "model.layers.39.self_attn.k_proj.bias", "model.layers.39.self_attn.o_proj.bias", "model.layers.39.self_attn.q_proj.bias", "model.layers.39.self_attn.v_proj.bias", "model.layers.39.mlp.down_proj.bias", "model.layers.39.mlp.gate_proj.bias", "model.layers.39.mlp.up_proj.bias", "model.layers.40.self_attn.k_proj.bias", "model.layers.40.self_attn.o_proj.bias", "model.layers.40.self_attn.q_proj.bias", "model.layers.40.self_attn.v_proj.bias", "model.layers.40.mlp.down_proj.bias", "model.layers.40.mlp.gate_proj.bias", "model.layers.40.mlp.up_proj.bias", "model.layers.41.self_attn.k_proj.bias", "model.layers.41.self_attn.o_proj.bias", "model.layers.41.self_attn.q_proj.bias", "model.layers.41.self_attn.v_proj.bias", "model.layers.41.mlp.down_proj.bias", "model.layers.41.mlp.gate_proj.bias", "model.layers.41.mlp.up_proj.bias", "model.layers.42.self_attn.k_proj.bias", "model.layers.42.self_attn.o_proj.bias", "model.layers.42.self_attn.q_proj.bias", "model.layers.42.self_attn.v_proj.bias", "model.layers.42.mlp.down_proj.bias", "model.layers.42.mlp.gate_proj.bias", "model.layers.42.mlp.up_proj.bias", "model.layers.43.self_attn.k_proj.bias", "model.layers.43.self_attn.o_proj.bias", "model.layers.43.self_attn.q_proj.bias", "model.layers.43.self_attn.v_proj.bias", "model.layers.43.mlp.down_proj.bias", "model.layers.43.mlp.gate_proj.bias", "model.layers.43.mlp.up_proj.bias", "model.layers.44.self_attn.k_proj.bias", "model.layers.44.self_attn.o_proj.bias", "model.layers.44.self_attn.q_proj.bias", "model.layers.44.self_attn.v_proj.bias", "model.layers.44.mlp.down_proj.bias", "model.layers.44.mlp.gate_proj.bias", "model.layers.44.mlp.up_proj.bias", "model.layers.45.self_attn.k_proj.bias", "model.layers.45.self_attn.o_proj.bias", "model.layers.45.self_attn.q_proj.bias", "model.layers.45.self_attn.v_proj.bias", "model.layers.45.mlp.down_proj.bias", "model.layers.45.mlp.gate_proj.bias", "model.layers.45.mlp.up_proj.bias", "model.layers.46.self_attn.k_proj.bias", "model.layers.46.self_attn.o_proj.bias", "model.layers.46.self_attn.q_proj.bias", "model.layers.46.self_attn.v_proj.bias", "model.layers.46.mlp.down_proj.bias", "model.layers.46.mlp.gate_proj.bias", "model.layers.46.mlp.up_proj.bias", "model.layers.47.self_attn.k_proj.bias", "model.layers.47.self_attn.o_proj.bias", "model.layers.47.self_attn.q_proj.bias", "model.layers.47.self_attn.v_proj.bias", "model.layers.47.mlp.down_proj.bias", "model.layers.47.mlp.gate_proj.bias", "model.layers.47.mlp.up_proj.bias", "model.layers.48.self_attn.k_proj.bias", "model.layers.48.self_attn.o_proj.bias", "model.layers.48.self_attn.q_proj.bias", "model.layers.48.self_attn.v_proj.bias", "model.layers.48.mlp.down_proj.bias", "model.layers.48.mlp.gate_proj.bias", "model.layers.48.mlp.up_proj.bias", "model.layers.49.self_attn.k_proj.bias", "model.layers.49.self_attn.o_proj.bias", "model.layers.49.self_attn.q_proj.bias", "model.layers.49.self_attn.v_proj.bias", "model.layers.49.mlp.down_proj.bias", "model.layers.49.mlp.gate_proj.bias", "model.layers.49.mlp.up_proj.bias", "model.layers.50.self_attn.k_proj.bias", "model.layers.50.self_attn.o_proj.bias", "model.layers.50.self_attn.q_proj.bias", "model.layers.50.self_attn.v_proj.bias", "model.layers.50.mlp.down_proj.bias", "model.layers.50.mlp.gate_proj.bias", "model.layers.50.mlp.up_proj.bias", "model.layers.51.self_attn.k_proj.bias", "model.layers.51.self_attn.o_proj.bias", "model.layers.51.self_attn.q_proj.bias", 
"model.layers.51.self_attn.v_proj.bias", "model.layers.51.mlp.down_proj.bias", "model.layers.51.mlp.gate_proj.bias", "model.layers.51.mlp.up_proj.bias", "model.layers.52.self_attn.k_proj.bias", "model.layers.52.self_attn.o_proj.bias", "model.layers.52.self_attn.q_proj.bias", "model.layers.52.self_attn.v_proj.bias", "model.layers.52.mlp.down_proj.bias", "model.layers.52.mlp.gate_proj.bias", "model.layers.52.mlp.up_proj.bias", "model.layers.53.self_attn.k_proj.bias", "model.layers.53.self_attn.o_proj.bias", "model.layers.53.self_attn.q_proj.bias", "model.layers.53.self_attn.v_proj.bias", "model.layers.53.mlp.down_proj.bias", "model.layers.53.mlp.gate_proj.bias", "model.layers.53.mlp.up_proj.bias", "model.layers.54.self_attn.k_proj.bias", "model.layers.54.self_attn.o_proj.bias", "model.layers.54.self_attn.q_proj.bias", "model.layers.54.self_attn.v_proj.bias", "model.layers.54.mlp.down_proj.bias", "model.layers.54.mlp.gate_proj.bias", "model.layers.54.mlp.up_proj.bias", "model.layers.55.self_attn.k_proj.bias", "model.layers.55.self_attn.o_proj.bias", "model.layers.55.self_attn.q_proj.bias", "model.layers.55.self_attn.v_proj.bias", "model.layers.55.mlp.down_proj.bias", "model.layers.55.mlp.gate_proj.bias", "model.layers.55.mlp.up_proj.bias", "model.layers.56.self_attn.k_proj.bias", "model.layers.56.self_attn.o_proj.bias", "model.layers.56.self_attn.q_proj.bias", "model.layers.56.self_attn.v_proj.bias", "model.layers.56.mlp.down_proj.bias", "model.layers.56.mlp.gate_proj.bias", "model.layers.56.mlp.up_proj.bias", "model.layers.57.self_attn.k_proj.bias", "model.layers.57.self_attn.o_proj.bias", "model.layers.57.self_attn.q_proj.bias", "model.layers.57.self_attn.v_proj.bias", "model.layers.57.mlp.down_proj.bias", "model.layers.57.mlp.gate_proj.bias", "model.layers.57.mlp.up_proj.bias", "model.layers.58.self_attn.k_proj.bias", "model.layers.58.self_attn.o_proj.bias", "model.layers.58.self_attn.q_proj.bias", "model.layers.58.self_attn.v_proj.bias", "model.layers.58.mlp.down_proj.bias", "model.layers.58.mlp.gate_proj.bias", "model.layers.58.mlp.up_proj.bias", "model.layers.59.self_attn.k_proj.bias", "model.layers.59.self_attn.o_proj.bias", "model.layers.59.self_attn.q_proj.bias", "model.layers.59.self_attn.v_proj.bias", "model.layers.59.mlp.down_proj.bias", "model.layers.59.mlp.gate_proj.bias", "model.layers.59.mlp.up_proj.bias".

I think the triton branch, when the triton dependency isn't installed, falls back to essentially the same code the old pytorch branch had; at least the performance is identical. However, the latest CUDA branch is slower than that triton fallback by a factor of 10 on my machine (single 3090), and the triton fallback is itself 4-5x slower than the CUDA commit from a few days ago.

Also yes, --act-order + --groupsize works on the latest CUDA branch, for both quantization and inference.
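
For anyone who wants to reproduce that, the quantization call on that branch looks roughly like the one below; the model path, calibration set, and output filename are placeholders, and the exact flag names may differ slightly between GPTQ-for-LLaMa commits:

# quantize with act-order + group size 128 (paths and output name are examples)
CUDA_VISIBLE_DEVICES=0 python llama.py /path/to/llama-7b-hf c4 --wbits 4 --act-order --groupsize 128 --save llama7b-4bit-128g.pt

# rough benchmark / perplexity check of the resulting checkpoint with the same settings
CUDA_VISIBLE_DEVICES=0 python llama.py /path/to/llama-7b-hf c4 --wbits 4 --groupsize 128 --load llama7b-4bit-128g.pt --benchmark 2048 --check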

@Anonym0us33

Anonym0us33 commented Apr 2, 2023

Hi, I'm far below you guys but I've been trying to get this to work and I haven't slept since March. Any idea when all this will be fixed?

git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
conda create -n textgen python=3.10.9
conda activate textgen
conda install -c conda-forge cudatoolkit-dev
pip3 install torch torchvision torchaudio
pip install -r requirements.txt
mkdir repositories
cd repositories/
git clone -b cuda https://github.com/qwopqwop200/GPTQ-for-LLaMa.git
cd GPTQ-for-LLaMa
pip install ninja
conda install -c conda-forge cudatoolkit-dev
git reset --hard 608f3ba71e40596c75f8864d73506eaf57323c6e
python setup_cuda.py install
cd ../../
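
Before launching the server, a quick way to confirm the GPTQ CUDA kernel actually built and is visible from the textgen environment is to try importing it; setup_cuda.py should install an extension module named quant_cuda, if I'm not mistaken:

# should print the confirmation without an ImportError if the kernel built correctly
python -c "import quant_cuda; print('quant_cuda OK')"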


python server.py --model llama-7b-4bit-HF-128

python server.py \
--wbits 4 \
--model llama-7b-4bit-HF-128

python server.py \
--wbits 4 \
--model alpaca7B

=BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
...
ly, use tensor.untyped_storage() instead of tensor.storage()
storage = cls(wrap_storage=untyped_storage)

    size mismatch for model.layers.31.mlp.up_proj.scales: copying a param with shape torch.Size([32, 11008]) from checkpoint, the shape in current model is torch.Size([1, 11008]).
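
That particular size mismatch usually points at a group-size mismatch rather than a broken install: scales of shape [32, 11008] are what a 7B checkpoint quantized with --groupsize 128 contains (4096 / 128 = 32 groups), while [1, 11008] is what the loader builds when no group size is passed. Assuming the file really was quantized with group size 128, the launch flags have to match it, something like:

python server.py --wbits 4 --groupsize 128 --model llama-7b-4bit-HF-128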

@oobabooga
Owner

Please use my fork of GPTQ-for-LLaMa. It corresponds to commit a6f363e3f93b9fb5c26064b5ac7ed58d22e3f773 in the cuda branch.

# activate the conda environment
conda activate textgen

# remove the existing GPTQ-for-LLaMa
cd text-generation-webui/repositories
rm -rf GPTQ-for-LLaMa
pip uninstall quant-cuda

# reinstall
git clone https://github.com/oobabooga/GPTQ-for-LLaMa.git -b cuda
cd GPTQ-for-LLaMa
python setup_cuda.py install

I will keep using this until qwopqwop's branch stabilizes. Upstream changes will not be supported. This works with @USBhost's torrents for LLaMA that are linked here.

@BadisG
Contributor

BadisG commented Apr 2, 2023

If a model was quantized with the Triton branch, does running it necessarily require setting up the GPTQ-for-LLaMa repository from the Triton branch as well?
