make_quant() got an unexpected keyword argument 'faster' #667
Comments
(2023-04-02) See: #667 (comment)

old (2023-04-01):

```
cd repositories/GPTQ-for-LLaMA
git checkout cuda
```

(2023-04-01) The CUDA branch might've broken old quantizations again (they're crashing for me anyway), so if you want to keep using e.g. the ones @USBhost shared here or here, then also do:

Then finally:

If that fails, open up
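After switching branches like this, the CUDA extension generally has to be rebuilt so it matches the checked-out sources. A minimal sketch, assuming the webui's `repositories/` layout and a conda env named `textgen` (both assumptions, not part of the original comment):

```
# from the text-generation-webui folder, with the textgen conda env active
cd repositories/GPTQ-for-LLaMa
git checkout cuda

# rebuild the quant_cuda kernel against the now-checked-out code
pip uninstall -y quant-cuda
python setup_cuda.py install
```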
GPTQ-for-LLaMA changed the default branch today; do that to set it back. Evidently it's not backwards-compatible when called externally, e.g. from here, and I think models might need to be requantized again to work with the new branch anyway. Btw, the new branch supports --act-order + --groupsize simultaneously, and I did some LLaMA 30B (--wbits 4 --true-sequential) runs last night; my perplexity scores were:
wikitext2
ptb-new
c4-new
However, the new branch evidently needs triton installed to not run really slowly; without it, inference is around 1/4 as fast as the old cuda branch on my machine. Triton doesn't support Windows natively, and I haven't gotten around to setting up WSL to test it out myself yet.
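For context, a quantization run like the one described would look roughly like this; `llama.py` and the flag names are taken from the GPTQ-for-LLaMa README of that period, and the model/output paths are placeholders:

```
cd repositories/GPTQ-for-LLaMa
# calibrate on c4 and save a 4-bit, group size 128, act-order checkpoint (paths are hypothetical)
python llama.py /path/to/llama-30b c4 \
    --wbits 4 --true-sequential --act-order --groupsize 128 \
    --save llama30b-4bit-128g.pt
```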
That fixed it, thank you very much! I was pretty sure it was a recent change somewhere, but I'm not familiar enough with all these pieces to quickly figure out where.
Eh, so I guess I can't just get away with continuing to use the
FYI, this is the commit that breaks things: qwopqwop200/GPTQ-for-LLaMa@f1af89a
That same change is on the latest CUDA branch too now, btw.
Yup :/ qwopqwop200/GPTQ-for-LLaMa@f1af89a Here's the previous commit: 608f3ba71e40596c75f8864d73506eaf57323c6e
Just got this same error with the latest cuda branch. I think it's time to lock GPTQ to an exact (working) commit in the requirements file, as GPTQ breaking changes seem to happen every day. Edit: Rolling back to the GPTQ commit @deece mentioned works (
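Until such a pin lands in the requirements file, the same effect can be had by hand; a rough sketch, checking out the known-good commit mentioned above (assuming the webui's `repositories/` layout):

```
cd repositories/GPTQ-for-LLaMa
# 608f3ba... is the commit immediately before the breaking f1af89a change
git checkout 608f3ba71e40596c75f8864d73506eaf57323c6e
```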
https://github.com/oobabooga/text-generation-webui/blob/main/modules/GPTQ_loader.py#L36 Remove
Also, as far as I know the old models I made should still work.
I got this after removing
This is getting ridiculous 😓
Yep, same error as above for me on the ungrouped and 128g 4-bit models from USBhost. The quantizations I did a few days ago on the short-lived pytorch branch work, as do some re-runs from last night (on the CUDA branch, this commit)... but performance is abysmal. Getting a whopping 0.11 tokens/s with ~1850 context size, while the older quant on the older commit does closer to 4-5 tokens/s.
I don't even know why there are two branches, triton and cuda? Wouldn't cuda be the fastest one? Why should we go for the slow version? lol
Uninstall transformers and reinstall.
Triton supports more features.
@USBhost like what?
No change after reinstalling transformers.
Groupsize + act-order together. Also, I feel the pytorch branch was broken. That thing was worse than the delay.
@EyeDeck let me see. Does the groupsize have the same issue?
I think it works on the cuda branch now, you can combine all the GPTQ implementations.
Well, that's new. Hard to know when the commits are just called update (filename).
@USBhost I know, right... I knew this from looking at the new readme; he has now removed the "you can't combine act order and groupsize 128 together" note 😅
@USBhost Same error with your ungrouped 30B and then slightly newer 128g 30B:
I think the triton branch without the triton dependency falls back to basically the same code that the pytorch branch had; at least performance is the same. However, the latest CUDA branch is literally slower than the triton fallback by a factor of 10 on my machine (single 3090), and the triton fallback is slower by a factor of 4-5 than the CUDA commit from a few days ago. Also, yes, --act-order + --groupsize works on the latest CUDA branch, for both quantization and inference.
Hi, I'm far below you guys, but I've been trying to get this to work and I haven't slept since March. Any idea when all this will be fixed?
===================BUG REPORT===================
Please use my fork of GPTQ-for-LLaMa. It corresponds to commit

```
# activate the conda environment
conda activate textgen

# remove the existing GPTQ-for-LLaMa
cd text-generation-webui/repositories
rm -rf GPTQ-for-LLaMa
pip uninstall quant-cuda

# reinstall
git clone https://github.com/oobabooga/GPTQ-for-LLaMa.git -b cuda
cd GPTQ-for-LLaMa
python setup_cuda.py install
```

I will keep using this until qwopqwop's branch stabilizes. Upstream changes will not be supported. This works with @USBhost's torrents for llama that are linked here.
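A quick way to confirm the rebuild took, assuming the extension installs under the module name `quant_cuda` (the name `setup_cuda.py` builds):

```
python -c "import quant_cuda; print('quant_cuda OK')"
```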
Does having a model quantized with Triton necessarily require cloning the GPTQ-for-LLaMa repository on the Triton branch?
Describe the bug
When trying to run 4bit 128g models, I'm getting the following error:
TypeError: make_quant() got an unexpected keyword argument 'faster'
Apologies if I've just screwed something up on install. I've been through the instructions several times and think I've gotten everything.
Non 4bit-128 models load fine. If it matters, I'm running under WSL2 under Windows 11.
Is there an existing issue for this?
Reproduction
Install by cloning the tip from GitHub, and try to run a 4bit-128 model.
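For reference, a sketch of the kind of launch command that hits this error; the model folder name is hypothetical, and --wbits/--groupsize are the webui flags used for 4-bit 128g models at the time:

```
# model directory name is a placeholder
python server.py --model llama-30b-4bit-128g --wbits 4 --groupsize 128
```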
Screenshot
No response
Logs
System Info