
convert.py --vocab-type hfft produces <0x0A> instead of new lines #5064

Closed
Artefact2 opened this issue Jan 21, 2024 · 2 comments · Fixed by #5341

Comments

@Artefact2
Collaborator

I am using llama.cpp master. I am trying to convert this model to GGUF: https://huggingface.co/tenyx/TenyxChat-8x7B-v1

After running convert.py --vocab-type hfft, the model will not output new lines correctly:

./main -m ~/TenyxChat-8x7B-v1-Q8_0.gguf -p "[INST]Write some poetry about typography.[/INST]" -n 128

system_info: n_threads = 8 / 16 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 
sampling: 
        repeat_last_n = 64, repeat_penalty = 1.100, frequency_penalty = 0.000, presence_penalty = 0.000
        top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
        mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampling order: 
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temp 
generate: n_ctx = 512, n_batch = 512, n_predict = 128, n_keep = 0


 [INST]Write some poetry about typography.[/INST] In the world of words, where letters roam,<0x0A>Typography sets the tone, like a artist's home.<0x0A>Each stroke and curve, a deliberate choice,<0x0A>A symphony of shapes, with a unified voice.<0x0A><0x0A>Serif, sans-serif, or script,<0x0A>A message is sent, before even dict.<0x0A>In a font's design, we see the creator's mind,<0x0A>A reflection of culture, that's hard to find.<0x0A><0x0A>Some letters kiss, others stand apart,<0x0A>Creating rhythm and flow, with an artful heart.<0x0A>The negative space
llama_print_timings:        load time =    1578.15 ms
llama_print_timings:      sample time =      16.40 ms /   128 runs   (    0.13 ms per token,  7803.45 tokens per second)
llama_print_timings: prompt eval time =    1893.05 ms /    14 tokens (  135.22 ms per token,     7.40 tokens per second)
llama_print_timings:        eval time =   40938.67 ms /   127 runs   (  322.35 ms per token,     3.10 tokens per second)
llama_print_timings:       total time =   42878.28 ms /   141 tokens

Possibly related to #4622.

@Artefact2
Collaborator Author

Artefact2 commented Jan 21, 2024

The patch below seems to fix the issue.

diff --git a/convert.py b/convert.py
index 06768033..333cc1a0 100755
--- a/convert.py
+++ b/convert.py
@@ -509,11 +509,13 @@ class HfVocab:
 
             # Convert token text to bytes
             token_text = reverse_vocab[token_id].encode("utf-8")
+            if token_text.startswith(b"<0x") and token_text.endswith(b">"):
+                toktype = gguf.TokenType.BYTE
+            else:
+                toktype = self.get_token_type(token_id, self.special_ids)
 
             # Yield token text, score, and type
-            yield token_text, self.get_token_score(token_id), self.get_token_type(
-                token_id, self.special_ids  # Reuse already stored special IDs
-            )
+            yield token_text, self.get_token_score(token_id), toktype
 
     def get_token_type(self, token_id: int, special_ids: set[int]) -> gguf.TokenType:
         # Determine token type based on whether it's a special token

If this looks good, I can open an MR.
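The detection logic from the patch can be exercised on its own. The is_byte_token helper below is a hypothetical standalone version of the startswith/endswith check above, operating on the UTF-8-encoded token text:

```python
def is_byte_token(token_text: bytes) -> bool:
    # Mirrors the check in the patch: byte-fallback tokens are
    # spelled "<0xNN>" in the HF vocab and should be typed as BYTE.
    return token_text.startswith(b"<0x") and token_text.endswith(b">")

# Byte-fallback tokens are matched; ordinary tokens are not.
print(is_byte_token(b"<0x0A>"))  # True
print(is_byte_token(b"poetry"))  # False
```

Note this is a purely lexical test: it keys off the literal token spelling rather than tokenizer metadata, so any regular token that happened to be spelled "<0x...>" would also match.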

@ggerganov
Owner

I guess it's OK, though I would like a second opinion.
