Error when QWEN1.5 32B/9B de-quantizing GGUF #32526

Closed · kunger97 opened this issue Aug 8, 2024 · 6 comments · Fixed by #32551

Comments


kunger97 commented Aug 8, 2024

System Info

  • transformers version: 4.45.0.dev0
  • Platform: Linux-5.15.0-113-generic-x86_64-with-glibc2.35
  • Python version: 3.11.9
  • Huggingface_hub version: 0.24.5
  • Safetensors version: 0.4.3
  • Accelerate version: 0.33.0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.3.0+cpu (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: (No)

Who can help?

@ArthurZucker @Isotr0py

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Run my script to convert the GGUF model and save it as a PyTorch model:

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Qwen/Qwen1.5-32B-Chat-GGUF"
filename = "qwen1_5-32b-chat-q4_k_m.gguf"

tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=filename)

print(model)

tokenizer.save_pretrained('/home/u1033079/LLM')
model.save_pretrained('/home/u1033079/LLM')

Expected behavior

Converting and de-quantizing GGUF tensors...: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 771/771 [08:00<00:00,  1.60it/s]
Traceback (most recent call last):
  File "/home/u1033079/LLM/run.py", line 16, in <module>
    model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=filename)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/u1033079/miniconda3/envs/LLM/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/u1033079/miniconda3/envs/LLM/lib/python3.11/site-packages/transformers/modeling_utils.py", line 3942, in from_pretrained
    ) = cls._load_pretrained_model(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/u1033079/miniconda3/envs/LLM/lib/python3.11/site-packages/transformers/modeling_utils.py", line 4339, in _load_pretrained_model
    error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
                                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/u1033079/miniconda3/envs/LLM/lib/python3.11/site-packages/transformers/modeling_utils.py", line 937, in _load_state_dict_into_meta_model
    set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
  File "/home/u1033079/miniconda3/envs/LLM/lib/python3.11/site-packages/accelerate/utils/modeling.py", line 373, in set_module_tensor_to_device
    raise ValueError(
ValueError: Trying to set a tensor of shape torch.Size([152064, 5120]) in "weight" (which has shape torch.Size([151936, 5120])), this looks incorrect.
kunger97 added the bug label Aug 8, 2024
amyeroberts (Collaborator) commented:

cc @SunMarc

SunMarc (Member) commented Aug 8, 2024

Hi @kunger97, can you share the version of gguf that you are using? Also, did it work in the past, or is this the first time you tried it?

kunger97 (Author) commented Aug 8, 2024

This is my first attempt; I'm using gguf 0.9.1.

Isotr0py (Contributor) commented Aug 8, 2024

It seems that the model config was created incorrectly. I ran the reproduction code, and the vocab size in the config is wrong:

from transformers import AutoConfig, AutoTokenizer, AutoModelForCausalLM

model_id = "Qwen/Qwen1.5-32B-Chat-GGUF"
filename = "qwen1_5-32b-chat-q4_k_m.gguf"

config = AutoConfig.from_pretrained(model_id, gguf_file=filename)
print(config)

tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
print(tokenizer)

Outputs

Qwen2Config {
  "_model_name_or_path": "Qwen2-beta-32B-Chat-AWQ-fp16",
  "_name_or_path": "Qwen/Qwen1.5-32B-Chat-GGUF",
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 5120,
  "initializer_range": 0.02,
  "intermediate_size": 27392,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 40,
  "num_hidden_layers": 64,
  "num_key_value_heads": 8,
  "pad_token_id": 151643,
  "rms_norm_eps": 9.999999974752427e-07,
  "rope_theta": 1000000.0,
  "sliding_window": 4096,
  "tie_word_embeddings": false,
  "transformers_version": "4.42.3",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936
}

Qwen2TokenizerFast(name_or_path='Qwen/Qwen1.5-32B-Chat-GGUF', vocab_size=152064, model_max_length=1000000000000000019884624838656, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'eos_token': '<|endoftext|>', 'unk_token': '<|endoftext|>', 'pad_token': '<|endoftext|>'}, clean_up_tokenization_spaces=True),  added_tokens_decoder={
	151643: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	151644: AddedToken("<|im_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	151645: AddedToken("<|im_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}

The correct vocab_size=152064 in the tokenizer matches the shape of token_embd.weight, while "vocab_size": 151936 in the config is wrong.

In fact, the current GGUF config extraction depends on the vocab_size key in the GGUF metadata. However, the Qwen1.5-32B-Chat-GGUF model file doesn't include this optional key, so the default value from Qwen2Config was used instead.
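
Until a fix lands, a possible workaround is to build the config explicitly, override vocab_size with the tokenizer's value (which is extracted correctly from the GGUF vocabulary), and pass the patched config back to from_pretrained. This is a minimal sketch, not a verified fix; it assumes from_pretrained honours an explicitly passed config together with gguf_file:

from transformers import AutoConfig, AutoTokenizer, AutoModelForCausalLM

model_id = "Qwen/Qwen1.5-32B-Chat-GGUF"
filename = "qwen1_5-32b-chat-q4_k_m.gguf"

# Config and tokenizer extracted from the GGUF file; the config picks up the
# wrong default vocab_size (151936) because the metadata key is missing.
config = AutoConfig.from_pretrained(model_id, gguf_file=filename)
tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)

# Override with the tokenizer's vocab size (152064), which matches token_embd.weight.
config.vocab_size = tokenizer.vocab_size

# Pass the patched config so the embedding matrix is allocated with the right
# shape before the de-quantized tensors are loaded into it.
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=filename, config=config)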

kunger97 (Author) commented Aug 9, 2024

Is there currently a solution to this problem?

Isotr0py (Contributor) commented Aug 9, 2024

@kunger97 I have created a PR #32551 to fix this.
