
Fix incorrect vocab size retrieval in GGUF config #32551

Merged: 4 commits into huggingface:main on Aug 19, 2024

Conversation

Isotr0py (Contributor) commented Aug 9, 2024

What does this PR do?

Fixes #32526

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Isotr0py (Contributor, Author) commented Aug 9, 2024

cc @kunger97 @SunMarc

kunger97 commented Aug 9, 2024

Worked on 13B and 32B models.

ArthurZucker (Collaborator) left a comment

Cool! Is there a reason why it's missing from the config? (or why we don't take it from the gguf file?)

Isotr0py (Contributor, Author) commented Aug 12, 2024

Because there is an issue in llama.cpp's convert_hf_to_gguf.py script: it calls gguf_writer.add_vocab_size only for the llama architecture, so converters for other architectures never write vocab_size to the GGUF file.

For example, this is the llama convert code, with the self.gguf_writer.add_vocab_size call present:

    def set_gguf_parameters(self):
        super().set_gguf_parameters()
        hparams = self.hparams
        self.gguf_writer.add_vocab_size(hparams["vocab_size"])

        ...

        # Apply to granite small models only
        if self.hparams.get("vocab_size", 32000) == 49152:
            self.gguf_writer.add_add_bos_token(False)

And this is the base set_gguf_parameters used by the Qwen2 converter, in which no self.gguf_writer.add_vocab_size call is present:

    def set_gguf_parameters(self):
        self.gguf_writer.add_block_count(self.block_count)

        if (n_ctx := self.find_hparam(["max_position_embeddings", "n_ctx"], optional=True)) is not None:
            self.gguf_writer.add_context_length(n_ctx)
            logger.info(f"gguf: context length = {n_ctx}")

        n_embd = self.find_hparam(["hidden_size", "n_embd"])
        self.gguf_writer.add_embedding_length(n_embd)
        logger.info(f"gguf: embedding length = {n_embd}")

        if (n_ff := self.find_hparam(["intermediate_size", "n_inner"], optional=True)) is not None:
            self.gguf_writer.add_feed_forward_length(n_ff)
            logger.info(f"gguf: feed forward length = {n_ff}")

        n_head = self.find_hparam(["num_attention_heads", "n_head"])
        self.gguf_writer.add_head_count(n_head)
        logger.info(f"gguf: head count = {n_head}")

        if (n_head_kv := self.hparams.get("num_key_value_heads")) is not None:
            self.gguf_writer.add_head_count_kv(n_head_kv)
            logger.info(f"gguf: key-value head count = {n_head_kv}")

        if (rope_theta := self.hparams.get("rope_theta")) is not None:
            self.gguf_writer.add_rope_freq_base(rope_theta)
            logger.info(f"gguf: rope theta = {rope_theta}")
        if (f_rms_eps := self.hparams.get("rms_norm_eps")) is not None:
            self.gguf_writer.add_layer_norm_rms_eps(f_rms_eps)
            logger.info(f"gguf: rms norm epsilon = {f_rms_eps}")
        if (f_norm_eps := self.find_hparam(["layer_norm_eps", "layer_norm_epsilon", "norm_epsilon"], optional=True)) is not None:
            self.gguf_writer.add_layer_norm_eps(f_norm_eps)
            logger.info(f"gguf: layer norm epsilon = {f_norm_eps}")
        if (n_experts := self.hparams.get("num_local_experts")) is not None:
            self.gguf_writer.add_expert_count(n_experts)
            logger.info(f"gguf: expert count = {n_experts}")
        if (n_experts_used := self.hparams.get("num_experts_per_tok")) is not None:
            self.gguf_writer.add_expert_used_count(n_experts_used)
            logger.info(f"gguf: experts used count = {n_experts_used}")

        if (head_dim := self.hparams.get("head_dim")) is not None:
            self.gguf_writer.add_key_length(head_dim)
            self.gguf_writer.add_value_length(head_dim)

        self.gguf_writer.add_file_type(self.ftype)
        logger.info(f"gguf: file type = {self.ftype}")

SunMarc (Member) left a comment

Thanks for fixing!

ArthurZucker (Collaborator) left a comment

Got it, thanks. For posterity, can you add a small comment about the GGUF conversion script not adding the vocab size? 🤗

(Review comment on src/transformers/modeling_gguf_pytorch_utils.py, resolved)
HuggingFaceDocBuilderDev (bot) commented

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

ArthurZucker merged commit 59e8f19 into huggingface:main on Aug 19, 2024. 21 checks passed.
ArthurZucker (Collaborator) commented

Thanks 🎈

Isotr0py deleted the gguf-fix branch on August 19, 2024 at 14:21.
Successfully merging this pull request may close these issues.

Error when QWEN1.5 32B/9B de-quantizing GGUF