Error when QWEN1.5 32B/9B de-quantizing GGUF #32526

Closed · kunger97 opened this issue Aug 8, 2024 · 6 comments · Fixed by #32551

Comments


kunger97 commented Aug 8, 2024

System Info

  • transformers version: 4.45.0.dev0
  • Platform: Linux-5.15.0-113-generic-x86_64-with-glibc2.35
  • Python version: 3.11.9
  • Huggingface_hub version: 0.24.5
  • Safetensors version: 0.4.3
  • Accelerate version: 0.33.0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.3.0+cpu (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: (No)

Who can help?

@ArthurZucker @Isotr0py

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Run my script to convert the GGUF model and save it as a PyTorch model:

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Qwen/Qwen1.5-32B-Chat-GGUF"
filename = "qwen1_5-32b-chat-q4_k_m.gguf"

tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=filename)

print(model)

tokenizer.save_pretrained('/home/u1033079/LLM')
model.save_pretrained('/home/u1033079/LLM')

Expected behavior

Converting and de-quantizing GGUF tensors...: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 771/771 [08:00<00:00,  1.60it/s]
Traceback (most recent call last):
  File "/home/u1033079/LLM/run.py", line 16, in <module>
    model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=filename)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/u1033079/miniconda3/envs/LLM/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/u1033079/miniconda3/envs/LLM/lib/python3.11/site-packages/transformers/modeling_utils.py", line 3942, in from_pretrained
    ) = cls._load_pretrained_model(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/u1033079/miniconda3/envs/LLM/lib/python3.11/site-packages/transformers/modeling_utils.py", line 4339, in _load_pretrained_model
    error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
                                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/u1033079/miniconda3/envs/LLM/lib/python3.11/site-packages/transformers/modeling_utils.py", line 937, in _load_state_dict_into_meta_model
    set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
  File "/home/u1033079/miniconda3/envs/LLM/lib/python3.11/site-packages/accelerate/utils/modeling.py", line 373, in set_module_tensor_to_device
    raise ValueError(
ValueError: Trying to set a tensor of shape torch.Size([152064, 5120]) in "weight" (which has shape torch.Size([151936, 5120])), this looks incorrect.
kunger97 added the bug label Aug 8, 2024
amyeroberts (Collaborator) commented:

cc @SunMarc

SunMarc (Member) commented Aug 8, 2024

Hi @kunger97, can you share the version of gguf that you are using? Also, did it work in the past, or is this the first time you tried it?

kunger97 (Author) commented Aug 8, 2024

This is my first attempt; I'm using gguf 0.9.1.

Isotr0py (Contributor) commented Aug 8, 2024

It seems that the model config was created incorrectly. I ran the reproduction code, and the vocab size in the config is wrong:

from transformers import AutoConfig, AutoTokenizer, AutoModelForCausalLM

model_id = "Qwen/Qwen1.5-32B-Chat-GGUF"
filename = "qwen1_5-32b-chat-q4_k_m.gguf"

config = AutoConfig.from_pretrained(model_id, gguf_file=filename)
print(config)

tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
print(tokenizer)

Outputs

Qwen2Config {
  "_model_name_or_path": "Qwen2-beta-32B-Chat-AWQ-fp16",
  "_name_or_path": "Qwen/Qwen1.5-32B-Chat-GGUF",
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 5120,
  "initializer_range": 0.02,
  "intermediate_size": 27392,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 40,
  "num_hidden_layers": 64,
  "num_key_value_heads": 8,
  "pad_token_id": 151643,
  "rms_norm_eps": 9.999999974752427e-07,
  "rope_theta": 1000000.0,
  "sliding_window": 4096,
  "tie_word_embeddings": false,
  "transformers_version": "4.42.3",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936
}

Qwen2TokenizerFast(name_or_path='Qwen/Qwen1.5-32B-Chat-GGUF', vocab_size=152064, model_max_length=1000000000000000019884624838656, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'eos_token': '<|endoftext|>', 'unk_token': '<|endoftext|>', 'pad_token': '<|endoftext|>'}, clean_up_tokenization_spaces=True),  added_tokens_decoder={
	151643: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	151644: AddedToken("<|im_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	151645: AddedToken("<|im_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}

The correct vocab_size=152064 in the tokenizer matches the shape of token_embd.weight, while "vocab_size": 151936 in the config is wrong.

In fact, the current GGUF config extraction depends on the vocab_size key in the GGUF metadata. However, the Qwen1.5-32B-Chat-GGUF model file doesn't include this optional key, so the default value from Qwen2Config was used instead.
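
Until a fix lands, a possible workaround is to build the config explicitly, override vocab_size with the tokenizer's value (which is extracted correctly from the GGUF vocabulary), and pass the patched config back to from_pretrained. This is a minimal sketch, not a verified fix; it assumes from_pretrained honours an explicitly passed config together with gguf_file:

from transformers import AutoConfig, AutoTokenizer, AutoModelForCausalLM

model_id = "Qwen/Qwen1.5-32B-Chat-GGUF"
filename = "qwen1_5-32b-chat-q4_k_m.gguf"

# Config and tokenizer extracted from the GGUF file; the config picks up the
# wrong default vocab_size (151936) because the metadata key is missing.
config = AutoConfig.from_pretrained(model_id, gguf_file=filename)
tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)

# Override with the tokenizer's vocab size (152064), which matches token_embd.weight.
config.vocab_size = tokenizer.vocab_size

# Pass the patched config so the embedding matrix is allocated with the right
# shape before the de-quantized tensors are loaded into it.
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=filename, config=config)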

kunger97 (Author) commented Aug 9, 2024

Is there currently a solution to this problem?

Isotr0py (Contributor) commented Aug 9, 2024

@kunger97 I have created a PR #32551 to fix this.
