KeyError: 'model.layers.0.self_attn.q_proj.qweight' #1528

Open
LIUKAI0815 opened this issue Apr 30, 2024 · 6 comments
Labels: Investigating, triaged (Issue has been triaged by maintainers)

@LIUKAI0815

python3 convert_checkpoint.py --model_dir /workspace/lk/model/Qwen/14B --output_dir ./tllm_checkpoint_1gpu_gptq --dtype float16 --use_weight_only --weight_only_precision int4_gptq --per_group
[TensorRT-LLM] TensorRT-LLM version: 0.10.0.dev2024042300
0.10.0.dev2024042300
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:02<00:00, 3.45it/s]
[04/30/2024-10:16:11] Some parameters are on the meta device device because they were offloaded to the cpu.
loading weight in each layer...: 0%| | 0/40 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/workspace/lk/model/tensorRT/TensorRT-LLM/examples/qwen/convert_checkpoint.py", line 365, in
main()
File "/workspace/lk/model/tensorRT/TensorRT-LLM/examples/qwen/convert_checkpoint.py", line 357, in main
convert_and_save_hf(args)
File "/workspace/lk/model/tensorRT/TensorRT-LLM/examples/qwen/convert_checkpoint.py", line 319, in convert_and_save_hf
execute(args.workers, [convert_and_save_rank] * world_size, args)
File "/workspace/lk/model/tensorRT/TensorRT-LLM/examples/qwen/convert_checkpoint.py", line 325, in execute
f(args, rank)
File "/workspace/lk/model/tensorRT/TensorRT-LLM/examples/qwen/convert_checkpoint.py", line 305, in convert_and_save_rank
qwen = from_hugging_face(
File "/opt/conda/envs/tensorRT/lib/python3.10/site-packages/tensorrt_llm/models/qwen/convert.py", line 1081, in from_hugging_face
weights = load_from_gptq_qwen(
File "/opt/conda/envs/tensorRT/lib/python3.10/site-packages/tensorrt_llm/models/qwen/weight.py", line 158, in load_from_gptq_qwen
comp_part = model_params[prefix + key_list[0] + comp + suf]
KeyError: 'model.layers.0.self_attn.q_proj.qweight'
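
For context on this error: the GPTQ loading path looks for quantized tensor names such as model.layers.0.self_attn.q_proj.qweight (plus .qzeros and .scales), so the KeyError means the checkpoint does not expose those tensors. A minimal sketch for listing what the checkpoint actually contains, assuming the weights are stored as *.safetensors shards in the model directory (adjust the glob if the model ships pytorch_model-*.bin files instead):

# List the attention-projection tensors in a Hugging Face checkpoint directory.
import glob
from safetensors import safe_open

model_dir = "/workspace/lk/model/Qwen/14B"  # same path passed to --model_dir above
for shard in sorted(glob.glob(f"{model_dir}/*.safetensors")):
    with safe_open(shard, framework="pt") as f:
        for name in f.keys():
            if "layers.0.self_attn.q_proj" in name:
                # A GPTQ checkpoint should show .qweight / .qzeros / .scales here.
                print(shard, name)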

@jershi425
Collaborator

@LIUKAI0815 Thanks for the feedback. Could you kindly tell me which model you are using? This requires using the official GPTQ-quantized checkpoints from HF.
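
If it helps, here is a minimal sketch for pulling an official GPTQ checkpoint with huggingface_hub; the repo id below is an assumption, so substitute the GPTQ variant you actually intend to convert:

# Download a GPTQ-quantized checkpoint and pass the local path to convert_checkpoint.py.
from huggingface_hub import snapshot_download

local_dir = snapshot_download("Qwen/Qwen1.5-14B-Chat-GPTQ-Int4")  # repo id is illustrative
print(local_dir)  # use this path as --model_dir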

@RoslinAdama

I have the same issue using a quantized Mistral model: TheBloke/Mistral-7B-v0.1-AWQ

@byshiue added the triaged label on May 6, 2024
@LIUKAI0815
Author

@jershi425 I'm using the Qwen1.5-14B-Chat

@Mary-Sam

Mary-Sam commented Jun 2, 2024

Has this problem been solved? I have the same error when using a quantized Mixtral model.

@nv-guomingz
Collaborator

Has this problem been solved? I have the same error when using a quantized mixtral model

Hi @Mary-Sam, could you please share more details/logs about your issue so we can look into it?

@Mary-Sam

Mary-Sam commented Jun 3, 2024

Hi @nv-guomingz
I ran the following command for the quantized model:
python3 /tensorrt_llm/examples/llama/convert_checkpoint.py --model_dir /model --output_dir /engine --load_model_on_cpu

I am using the latest version of tensorrt_llm==0.9.0

My model has the following quantization configuration

{
    "bits": 4,
    "group_size": 128,
    "modules_to_not_convert": [
      "gate"
    ],
    "quant_method": "awq",
    "version": "gemm",
    "zero_point": true
  }

And I am getting the following error:

2024-06-03 12:56:17,367 utils.common INFO:[TensorRT-LLM] TensorRT-LLM version: 0.9.0
2024-06-03 12:56:17,367 utils.common INFO:We suggest you to set `torch_dtype=torch.float16` for better efficiency with AWQ.
2024-06-03 12:56:17,367 utils.common INFO:0.9.0
Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00,  1.42it/s]
2024-06-03 12:56:17,367 utils.common INFO:Traceback (most recent call last):
2024-06-03 12:56:17,367 utils.common INFO:  File "/tensorrt_llm/examples/llama/convert_checkpoint.py", line 446, in <module>
2024-06-03 12:56:17,367 utils.common INFO:    main()
2024-06-03 12:56:17,367 utils.common INFO:  File "/tensorrt_llm/examples/llama/convert_checkpoint.py", line 438, in main
2024-06-03 12:56:17,367 utils.common INFO:    convert_and_save_hf(args)
2024-06-03 12:56:17,367 utils.common INFO:  File "/tensorrt_llm/examples/llama/convert_checkpoint.py", line 375, in convert_and_save_hf
2024-06-03 12:56:17,367 utils.common INFO:    execute(args.workers, [convert_and_save_rank] * world_size, args)
2024-06-03 12:56:17,367 utils.common INFO:  File "/tensorrt_llm/examples/llama/convert_checkpoint.py", line 397, in execute
2024-06-03 12:56:17,367 utils.common INFO:    f(args, rank)
2024-06-03 12:56:17,367 utils.common INFO:  File "/tensorrt_llm/examples/llama/convert_checkpoint.py", line 362, in convert_and_save_rank
2024-06-03 12:56:17,367 utils.common INFO:    llama = LLaMAForCausalLM.from_hugging_face(
2024-06-03 12:56:17,367 utils.common INFO:  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/model.py", line 244, in from_hugging_face
2024-06-03 12:56:17,367 utils.common INFO:    llama = convert.from_hugging_face(
2024-06-03 12:56:17,367 utils.common INFO:  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 1192, in from_hugging_face
2024-06-03 12:56:17,367 utils.common INFO:    weights = load_weights_from_hf(config=config,
2024-06-03 12:56:17,367 utils.common INFO:  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 1296, in load_weights_from_hf
2024-06-03 12:56:17,367 utils.common INFO:    weights = convert_hf_llama(
2024-06-03 12:56:17,367 utils.common INFO:  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 964, in convert_hf_llama
2024-06-03 12:56:17,367 utils.common INFO:    convert_layer(l)
2024-06-03 12:56:17,367 utils.common INFO:  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 646, in convert_layer
2024-06-03 12:56:17,367 utils.common INFO:    q_weight = get_weight(model_params, prefix + 'self_attn.q_proj', dtype)
2024-06-03 12:56:17,367 utils.common INFO:  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 399, in get_weight
2024-06-03 12:56:17,367 utils.common INFO:    if config[prefix + '.weight'].dtype != dtype:
2024-06-03 12:56:17,367 utils.common INFO:KeyError: 'model.layers.0.self_attn.q_proj.weight'
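
For what it's worth, this invocation of the llama convert_checkpoint.py (no quantization flags) looks up plain fp16 tensors such as model.layers.0.self_attn.q_proj.weight, while an AWQ checkpoint like the one described by the quantization config above stores packed tensors (qweight, qzeros, scales) instead, which matches the KeyError. A minimal sketch, assuming the checkpoint ships a standard config.json, to confirm the checkpoint is quantized before picking a conversion path:

# Check whether the checkpoint declares a quantization_config that the fp16 path cannot consume.
import json

with open("/model/config.json") as f:  # same path passed to --model_dir above
    cfg = json.load(f)
print(cfg.get("quantization_config"))  # "awq" here means plain .weight tensors are absent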
