KeyError: 'model.layers.0.self_attn.q_proj.qweight' #1528

Open
LIUKAI0815 opened this issue Apr 30, 2024 · 6 comments
Labels: Investigating, triaged (Issue has been triaged by maintainers)

@LIUKAI0815

python3 convert_checkpoint.py --model_dir /workspace/lk/model/Qwen/14B --output_dir ./tllm_checkpoint_1gpu_gptq --dtype float16 --use_weight_only --weight_only_precision int4_gptq --per_group
[TensorRT-LLM] TensorRT-LLM version: 0.10.0.dev2024042300
0.10.0.dev2024042300
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:02<00:00, 3.45it/s]
[04/30/2024-10:16:11] Some parameters are on the meta device device because they were offloaded to the cpu.
loading weight in each layer...: 0%| | 0/40 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/workspace/lk/model/tensorRT/TensorRT-LLM/examples/qwen/convert_checkpoint.py", line 365, in
main()
File "/workspace/lk/model/tensorRT/TensorRT-LLM/examples/qwen/convert_checkpoint.py", line 357, in main
convert_and_save_hf(args)
File "/workspace/lk/model/tensorRT/TensorRT-LLM/examples/qwen/convert_checkpoint.py", line 319, in convert_and_save_hf
execute(args.workers, [convert_and_save_rank] * world_size, args)
File "/workspace/lk/model/tensorRT/TensorRT-LLM/examples/qwen/convert_checkpoint.py", line 325, in execute
f(args, rank)
File "/workspace/lk/model/tensorRT/TensorRT-LLM/examples/qwen/convert_checkpoint.py", line 305, in convert_and_save_rank
qwen = from_hugging_face(
File "/opt/conda/envs/tensorRT/lib/python3.10/site-packages/tensorrt_llm/models/qwen/convert.py", line 1081, in from_hugging_face
weights = load_from_gptq_qwen(
File "/opt/conda/envs/tensorRT/lib/python3.10/site-packages/tensorrt_llm/models/qwen/weight.py", line 158, in load_from_gptq_qwen
comp_part = model_params[prefix + key_list[0] + comp + suf]
KeyError: 'model.layers.0.self_attn.q_proj.qweight'
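
For context on this error: the GPTQ loading path looks for quantized tensor names such as model.layers.0.self_attn.q_proj.qweight (plus .qzeros and .scales), so the KeyError means the checkpoint does not expose those tensors. A minimal sketch for listing what the checkpoint actually contains, assuming the weights are stored as *.safetensors shards in the model directory (adjust the glob if the model ships pytorch_model-*.bin files instead):

# List the attention-projection tensors in a Hugging Face checkpoint directory.
import glob
from safetensors import safe_open

model_dir = "/workspace/lk/model/Qwen/14B"  # same path passed to --model_dir above
for shard in sorted(glob.glob(f"{model_dir}/*.safetensors")):
    with safe_open(shard, framework="pt") as f:
        for name in f.keys():
            if "layers.0.self_attn.q_proj" in name:
                # A GPTQ checkpoint should show .qweight / .qzeros / .scales here.
                print(shard, name)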

@jershi425
Collaborator

@LIUKAI0815 Thanks for the feedback. Could you kindly tell me which model you are using? This requires using the official GPTQ-quantized checkpoints from HF.
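
If it helps, here is a minimal sketch for pulling an official GPTQ checkpoint with huggingface_hub; the repo id below is an assumption, so substitute the GPTQ variant you actually intend to convert:

# Download a GPTQ-quantized checkpoint and pass the local path to convert_checkpoint.py.
from huggingface_hub import snapshot_download

local_dir = snapshot_download("Qwen/Qwen1.5-14B-Chat-GPTQ-Int4")  # repo id is illustrative
print(local_dir)  # use this path as --model_dir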

@RoslinAdama

I have the same issue using a quantized Mistral model: TheBloke/Mistral-7B-v0.1-AWQ

@byshiue added the triaged label on May 6, 2024
@LIUKAI0815
Author

@jershi425 I'm using the Qwen1.5-14B-Chat

@Mary-Sam

Mary-Sam commented Jun 2, 2024

Has this problem been solved? I have the same error when using a quantized Mixtral model.

@nv-guomingz
Collaborator

Has this problem been solved? I have the same error when using a quantized mixtral model

Hi @Mary-Sam, could you please share more details/logs about your issue so we can look into it?

@Mary-Sam

Mary-Sam commented Jun 3, 2024

Hi @nv-guomingz
I ran the following command for the quantized model:
python3 /tensorrt_llm/examples/llama/convert_checkpoint.py --model_dir /model --output_dir /engine --load_model_on_cpu

I am using the latest version of tensorrt_llm==0.9.0

My model has the following quantization configuration

{
    "bits": 4,
    "group_size": 128,
    "modules_to_not_convert": [
      "gate"
    ],
    "quant_method": "awq",
    "version": "gemm",
    "zero_point": true
  }

And I am getting the following error:

2024-06-03 12:56:17,367 utils.common INFO:[TensorRT-LLM] TensorRT-LLM version: 0.9.0
2024-06-03 12:56:17,367 utils.common INFO:We suggest you to set `torch_dtype=torch.float16` for better efficiency with AWQ.
2024-06-03 12:56:17,367 utils.common INFO:0.9.0
Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00,  1.42it/s]
2024-06-03 12:56:17,367 utils.common INFO:Traceback (most recent call last):
2024-06-03 12:56:17,367 utils.common INFO:  File "/tensorrt_llm/examples/llama/convert_checkpoint.py", line 446, in <module>
2024-06-03 12:56:17,367 utils.common INFO:    main()
2024-06-03 12:56:17,367 utils.common INFO:  File "/tensorrt_llm/examples/llama/convert_checkpoint.py", line 438, in main
2024-06-03 12:56:17,367 utils.common INFO:    convert_and_save_hf(args)
2024-06-03 12:56:17,367 utils.common INFO:  File "/tensorrt_llm/examples/llama/convert_checkpoint.py", line 375, in convert_and_save_hf
2024-06-03 12:56:17,367 utils.common INFO:    execute(args.workers, [convert_and_save_rank] * world_size, args)
2024-06-03 12:56:17,367 utils.common INFO:  File "/tensorrt_llm/examples/llama/convert_checkpoint.py", line 397, in execute
2024-06-03 12:56:17,367 utils.common INFO:    f(args, rank)
2024-06-03 12:56:17,367 utils.common INFO:  File "/tensorrt_llm/examples/llama/convert_checkpoint.py", line 362, in convert_and_save_rank
2024-06-03 12:56:17,367 utils.common INFO:    llama = LLaMAForCausalLM.from_hugging_face(
2024-06-03 12:56:17,367 utils.common INFO:  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/model.py", line 244, in from_hugging_face
2024-06-03 12:56:17,367 utils.common INFO:    llama = convert.from_hugging_face(
2024-06-03 12:56:17,367 utils.common INFO:  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 1192, in from_hugging_face
2024-06-03 12:56:17,367 utils.common INFO:    weights = load_weights_from_hf(config=config,
2024-06-03 12:56:17,367 utils.common INFO:  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 1296, in load_weights_from_hf
2024-06-03 12:56:17,367 utils.common INFO:    weights = convert_hf_llama(
2024-06-03 12:56:17,367 utils.common INFO:  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 964, in convert_hf_llama
2024-06-03 12:56:17,367 utils.common INFO:    convert_layer(l)
2024-06-03 12:56:17,367 utils.common INFO:  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 646, in convert_layer
2024-06-03 12:56:17,367 utils.common INFO:    q_weight = get_weight(model_params, prefix + 'self_attn.q_proj', dtype)
2024-06-03 12:56:17,367 utils.common INFO:  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 399, in get_weight
2024-06-03 12:56:17,367 utils.common INFO:    if config[prefix + '.weight'].dtype != dtype:
2024-06-03 12:56:17,367 utils.common INFO:KeyError: 'model.layers.0.self_attn.q_proj.weight'
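
For what it's worth, this invocation of the llama convert_checkpoint.py (no quantization flags) looks up plain fp16 tensors such as model.layers.0.self_attn.q_proj.weight, while an AWQ checkpoint like the one described by the quantization config above stores packed tensors (qweight, qzeros, scales) instead, which matches the KeyError. A minimal sketch, assuming the checkpoint ships a standard config.json, to confirm the checkpoint is quantized before picking a conversion path:

# Check whether the checkpoint declares a quantization_config that the fp16 path cannot consume.
import json

with open("/model/config.json") as f:  # same path passed to --model_dir above
    cfg = json.load(f)
print(cfg.get("quantization_config"))  # "awq" here means plain .weight tensors are absent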
