GGUF: Fix llama 3 GGUF #31358

younesbelkada · 2024-06-10T15:47:46Z

What does this PR do?

Replaces: #31215
Fixes: #30391 (comment)

Currently it is not possible to load Llama-3 GGUF models because the Llama-3 tokenizer differs slightly from the tokenizers of previous Llama models.

One way to detect a Llama-3 GGUF model is to check the tokenizer.model attribute of the GGUF file (registered as tokenizer_type in proto); see the example below, taken from this checkpoint.

First, when the GGUF file directly contains the merges attribute, scores is no longer present, so creating a dummy scores array is sufficient and avoids errors.

Second, I addressed the case where unknown_token_id is registered within the proto object (this seems to be the case for some Llama 2 checkpoints).
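
A minimal sketch of the detection and the two fixes above, operating on a plain dict shaped like the GGUF tokenizer metadata. The helpers `is_llama_3_like` and `fill_tokenizer_defaults` are hypothetical, the key names follow the description, and the `"llama"` / `"gpt2"` values are assumptions about what the file stores rather than the code in this PR:

```python
def is_llama_3_like(tokenizer_dict):
    # Assumption: older Llama GGUF files report "llama" here; anything else
    # (for example "gpt2" for a BPE vocabulary) is treated as Llama-3 style.
    return tokenizer_dict.get("tokenizer_type", "llama") != "llama"


def fill_tokenizer_defaults(tokenizer_dict):
    """Return a copy with the fields the converter expects filled in."""
    filled = dict(tokenizer_dict)
    tokens = filled["tokens"]

    if "merges" in filled:
        # Merges are stored as "left right" strings; split them into pairs.
        filled["merges"] = [tuple(m.split(" ")) for m in filled["merges"]]
        # This layout carries no scores, so add placeholders of the right length.
        filled.setdefault("scores", [None] * len(tokens))

    # Some checkpoints register an unknown-token id, others do not.
    unk_id = filled.get("unknown_token_id")
    filled["unk_token"] = tokens[unk_id] if unk_id is not None else None
    return filled
```

The placeholder scores only need to match the vocabulary length; nothing in this sketch reads their values.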

In addition, note that Llama 3 uses special tokens that differ from the default special tokens of LlamaTokenizer (e.g. it uses <|begin_of_text|> instead of <s>). This can be fixed by passing the correct special tokens to the tokenizer's init method, so we need logic to pass these new args / kwargs to that method. I propose to create a new attribute additional_kwargs inside the tokenizer converter and pass it along to the conversion logic. Moreover, I made the special-token handling more "universal" (previously everything was hardcoded).
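
As a rough illustration of the additional_kwargs idea, the sketch below looks up special-token strings from the ids stored in the file instead of hardcoding them. The function `special_token_kwargs` and the toy vocabulary are hypothetical, and the real converter may treat missing ids differently:

```python
def special_token_kwargs(tokens, bos_token_id=None, eos_token_id=None, unknown_token_id=None):
    """Map the special-token ids stored in the file to their token strings."""

    def lookup(token_id):
        return tokens[token_id] if token_id is not None else None

    candidates = {
        "bos_token": lookup(bos_token_id),
        "eos_token": lookup(eos_token_id),
        "unk_token": lookup(unknown_token_id),
    }
    # Only forward the tokens that the file actually defines.
    return {name: tok for name, tok in candidates.items() if tok is not None}


# Toy vocabulary; real ids come from the GGUF metadata.
vocab = ["<|begin_of_text|>", "<|end_of_text|>", "Hello"]
additional_kwargs = special_token_kwargs(vocab, bos_token_id=0, eos_token_id=1)
print(additional_kwargs)
```

The resulting dict can then be merged into the keyword arguments handed to the tokenizer's init method, so a Llama 3 file yields <|begin_of_text|> where a Llama 2 file would yield <s>.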

For the decoding process (thanks @ArthurZucker for the offline explanation), one needs to use decoders.ByteLevel to properly decode the generated tokens, because spaces are encoded with the character Ġ in the Llama 3 tokenizer. This is not the case for the Llama 1 & 2 tokenizers.

In [19]: tok.encode("Hello I am new to this forum").tokens
Out[19]: ['<|begin_of_text|>', 'Hello', 'ĠI', 'Ġam', 'Ġnew', 'Ġto', 'Ġthis', 'Ġforum']
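
To see why a byte-level decoder is needed, here is a pure-Python sketch of the GPT-2 style byte-to-character table that byte-level BPE tokenizers use. It illustrates the idea only and is not the `tokenizers` implementation; `byte_level_decode` is a hypothetical helper, and the special token is assumed to have been stripped already:

```python
def bytes_to_unicode():
    """GPT-2 style table mapping each byte value to a printable character."""
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    for b in range(256):
        if b not in printable:
            # Non-printable bytes (including the space) are moved above U+00FF.
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def byte_level_decode(tokens):
    """Map every character back to its byte, then decode the bytes as UTF-8."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in "".join(tokens))
    return raw.decode("utf-8", errors="replace")


tokens = ["Hello", "ĠI", "Ġam", "Ġnew", "Ġto", "Ġthis", "Ġforum"]
print(byte_level_decode(tokens))  # Hello I am new to this forum
```

In this table the space byte (0x20) maps to Ġ (U+0120), which is why a decoder that only replaces ▁ leaves the Ġ characters in the output.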

Finally, I added a new test that uses a Llama 3 checkpoint and made sure all previous tests pass.

HuggingFaceDocBuilderDev · 2024-06-10T16:09:23Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

ArthurZucker

Cool! Missing some small stuff but overall LGTM

ArthurZucker · 2024-06-12T15:10:50Z

src/transformers/integrations/ggml.py

-            decoders.Fuse(),
-            decoders.Replace("▁", " "),
-        ]
+        if not self.uses_byte_level_decoding:


The pre_tokenizer should also be ByteLevel.

ArthurZucker · 2024-06-12T15:11:12Z

src/transformers/tokenization_utils_fast.py

@@ -121,7 +121,11 @@ def __init__(self, *args, **kwargs):
            gguf_param = load_gguf_checkpoint(kwargs.get("vocab_file"))
            architecture = gguf_param["config"]["model_type"]
            tokenizer_dict = gguf_param["tokenizer"]
-            fast_tokenizer = convert_gguf_tokenizer(architecture, tokenizer_dict)
+            fast_tokenizer, additional_kwargs = convert_gguf_tokenizer(architecture, tokenizer_dict)


Yeah, sounds good to me TBH.

ArthurZucker

Last few nits!

ArthurZucker · 2024-06-19T07:49:18Z

src/transformers/integrations/ggml.py

+        self.additional_kwargs["eos_token"] = bos_token
+        self.additional_kwargs["bos_token"] = eos_token
+
+        if self.uses_byte_level_decoding:


The flag might be a bit misleading, since it's byte-level encoding!

ArthurZucker · 2024-06-19T07:49:45Z

src/transformers/integrations/ggml.py

-            decoders.Replace("▁", " "),
-        ]
+        if not self.uses_byte_level_decoding:
+            sequence = [


You can have ByteFallback along with ByteLevel decoding!

This is misleading indeed, I changed the arg name a bit.

This is still wrong! The sequence should always be ByteFallback, Fuse and Replace, but if you are a Llama 3, you add the ByteLevel as well.

younesbelkada · 2024-06-19T13:33:06Z

src/transformers/integrations/ggml.py

+
+            # This is tricky as the additional kwargs are passed after legacy is force-set in LlamaTokenizer's
+            # init.
+            tokenizer.normalizer = normalizers.Sequence([])


This is a bit tricky, additional_kwargs are created in this call:

transformers/src/transformers/models/llama/tokenization_llama.py

Line 173 in 0ed3ffc

super().__init__(

after legacy=True is passed. additional_kwargs is taken into account after the init of the child class here:

transformers/src/transformers/tokenization_utils_fast.py

Line 166 in 0ed3ffc

super().__init__(**kwargs)

so legacy is silently ignored :/ I had to come up with the hack you shared offline
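
The ordering problem can be shown with a toy pair of classes. This is only loosely modeled on the situation described above: `FastBase` and `LlamaLike` are hypothetical stand-ins, not the transformers classes, and the hardcoded `{"legacy": False}` stands in for whatever additional_kwargs the GGUF loading step produces:

```python
class FastBase:
    """Stand-in for the shared fast-tokenizer base class."""

    def __init__(self, **kwargs):
        # Stand-in for kwargs discovered while loading the GGUF file.
        additional_kwargs = {"legacy": False}
        kwargs.update(additional_kwargs)
        self.init_kwargs = kwargs


class LlamaLike(FastBase):
    """Stand-in for a subclass that fixes `legacy` before calling the base."""

    def __init__(self, legacy=None, **kwargs):
        if legacy is None:
            legacy = True
        self.legacy = legacy  # decided here, before the base class runs
        super().__init__(legacy=legacy, **kwargs)


tok = LlamaLike()
print(tok.legacy)                 # True
print(tok.init_kwargs["legacy"])  # False
```

The subclass has already acted on its own value by the time the base class learns the value from the file, so the late value has no effect on the subclass logic. Overriding the affected component afterwards, as the diff does with the normalizer, sidesteps the ordering.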

ArthurZucker

LGTM left a last nit!

ArthurZucker · 2024-06-20T06:54:06Z

src/transformers/integrations/ggml.py

-            decoders.Replace("▁", " "),
-        ]
+        if not self.uses_byte_level_decoding:
+            sequence = [


This is still wrong! The sequence should always be ByteFallback, Fuse and Replace, but if you are a Llama 3, you add the ByteLevel as well.
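
The composition asked for in this comment can be sketched in plain Python. Every function below is a simplified, hypothetical stand-in for the corresponding `tokenizers.decoders` step (the real steps handle more edge cases), and the byte-level step is passed in by the caller:

```python
import re


def byte_fallback(tokens):
    """Turn runs of '<0xNN>' tokens into the UTF-8 text they spell."""
    out, pending = [], bytearray()
    for tok in tokens:
        match = re.fullmatch(r"<0x([0-9A-Fa-f]{2})>", tok)
        if match:
            pending.append(int(match.group(1), 16))
            continue
        if pending:
            out.append(pending.decode("utf-8", errors="replace"))
            pending = bytearray()
        out.append(tok)
    if pending:
        out.append(pending.decode("utf-8", errors="replace"))
    return out


def fuse(tokens):
    """Join all tokens into a single string."""
    return ["".join(tokens)]


def replace_metaspace(tokens):
    """Turn the SentencePiece space marker back into a space."""
    return [t.replace("▁", " ") for t in tokens]


def build_decoder_steps(is_llama_3, byte_level_step):
    # The three base steps are always present ...
    steps = [byte_fallback, fuse, replace_metaspace]
    # ... and the byte-level step is appended only for Llama 3.
    if is_llama_3:
        steps.append(byte_level_step)
    return steps


def decode(tokens, steps):
    for step in steps:
        tokens = step(tokens)
    return "".join(tokens)


def simplified_byte_level(tokens):
    # Crude stand-in that only handles the space marker.
    return [t.replace("Ġ", " ") for t in tokens]


llama2_steps = build_decoder_steps(False, simplified_byte_level)
llama3_steps = build_decoder_steps(True, simplified_byte_level)
print(decode(["▁Hello", "▁w", "<0xC3>", "<0xA9>"], llama2_steps))
print(decode(["Hello", "ĠI"], llama3_steps))
```

Keeping the base steps unconditional means the byte-level step is purely additive, which matches the wording of the comment.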

ArthurZucker · 2024-06-20T06:54:57Z

tests/quantization/ggml/test_ggml.py

@@ -171,6 +173,19 @@ def test_qwen2_q4_0(self):
        EXPECTED_TEXT = "Hello.jsoup\n\nI am a beginner"
        self.assertEqual(tokenizer.decode(out[0], skip_special_tokens=True), EXPECTED_TEXT)

+    def test_llama3_q4_0(self):


The string that you are testing could be improved, specifically to take the special tokens etc. into account.
But alright otherwise!

* Create push-important-models.yml
* llama3 support for GGUF
* fixup
* Update src/transformers/integrations/ggml.py
* fix pre-tokenizer
* fix
* fix
* fix
* fix
* fix
* fix
* address final comment
* handle special tokens + add tests

younesbelkada and others added 4 commits February 23, 2024 07:33

Create push-important-models.yml

91c72e7

merge

580c7f8

llama3 support for GGUF

7bc0e6c

fixup

b6ae58b

younesbelkada mentioned this pull request Jun 10, 2024

Loading GGUF files support #30391

Merged

younesbelkada requested a review from ArthurZucker June 10, 2024 15:48

ArthurZucker reviewed Jun 12, 2024

younesbelkada commented Jun 12, 2024

younesbelkada and others added 4 commits June 12, 2024 17:14

Update src/transformers/integrations/ggml.py

74d607e

Merge remote-tracking branch 'origin/main' into fix-llama-3-gguf-2

9074572

Merge branch 'fix-llama-3-gguf-2' of https://github.com/younesbelkada…

a51e3dc

…/transformers into fix-llama-3-gguf-2

fix pre-tokenizer

dc9719f

younesbelkada requested a review from ArthurZucker June 14, 2024 15:17

ArthurZucker reviewed Jun 19, 2024

younesbelkada added 7 commits June 19, 2024 13:37

Merge remote-tracking branch 'origin/main' into fix-llama-3-gguf-2

fbd2d4b

Merge branch 'fix-llama-3-gguf-2' of https://github.com/younesbelkada…

80ea9c8

…/transformers into fix-llama-3-gguf-2

fix

11f41cc

fix

bc4b0c3

fix

795681d

fix

72633a1

fix

ead5a92

younesbelkada commented Jun 19, 2024

fix

c711b0a

younesbelkada requested a review from ArthurZucker June 19, 2024 13:35

ArthurZucker approved these changes Jun 20, 2024

younesbelkada added 2 commits June 20, 2024 10:38

address final comment

71105c0

handle special tokens + add tests

b8aa283

younesbelkada merged commit 6d43061 into huggingface:main Jun 20, 2024
21 checks passed

younesbelkada deleted the fix-llama-3-gguf-2 branch June 20, 2024 12:30
