GGML_ASSERT: llama.cpp:3817: unicode_cpts_from_utf8(word).size() > 0 #6132

maziyarpanahi · 2024-03-18T11:05:43Z

Hi,

I am trying to convert and quantize this model: https://huggingface.co/saltlux/luxia-21.4b-alignment-v1.0/

python llama.cpp/convert.py ~/.cache/huggingface/hub/models--saltlux--luxia-21.4b-alignment-v1.0/ --outtype f16 --outfile luxia-21.4b-alignment-v1.0.fp16.gguf

But I get this error when I use it for inference:

llama.cpp/main -m luxia-21.4b-alignment-v1.0.fp16.gguf -p "I need to create a presisted volume on Kubernetese and attach it to my application. Give me these two yaml files:" -n 400 -e


GGML_ASSERT: llama.cpp:3817: unicode_cpts_from_utf8(word).size() > 0

llama.cpp/main -m quantized/saltlux/luxia-21.4b-alignment-v1.0/luxia-21.4b-alignment-v1.0.fp16.gguf -p "I need to create a presisted volume on Kubernetese and attach it to my application. Give me these two yaml files:" -n 400 -e
Log start
main: build = 2442 (d84c4850)
main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: seed  = 1710758277
llama_model_loader: loaded meta data with 23 key-value pairs and 471 tensors from quantized/saltlux/luxia-21.4b-alignment-v1.0/luxia-21.4b-alignment-v1.0.fp16.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = hub
llama_model_loader: - kv   2:                           llama.vocab_size u32              = 92544
llama_model_loader: - kv   3:                       llama.context_length u32              = 32768
llama_model_loader: - kv   4:                     llama.embedding_length u32              = 6144
llama_model_loader: - kv   5:                          llama.block_count u32              = 52
llama_model_loader: - kv   6:                  llama.feed_forward_length u32              = 16384
llama_model_loader: - kv   7:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   8:                 llama.attention.head_count u32              = 48
llama_model_loader: - kv   9:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv  10:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  11:                       llama.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  12:                          general.file_type u32              = 1
llama_model_loader: - kv  13:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  14:                      tokenizer.ggml.tokens arr[str,92544]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv  15:                      tokenizer.ggml.scores arr[f32,92544]   = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  16:                  tokenizer.ggml.token_type arr[i32,92544]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv  17:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  18:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  19:            tokenizer.ggml.unknown_token_id u32              = 0
llama_model_loader: - kv  20:            tokenizer.ggml.padding_token_id u32              = 2
llama_model_loader: - kv  21:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  22:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - type  f32:  105 tensors
llama_model_loader: - type  f16:  366 tensors
GGML_ASSERT: llama.cpp:3817: unicode_cpts_from_utf8(word).size() > 0
Aborted (core dumped)

I've never seen this error, and I can't find anything remotely similar to it. What could be causing it?


akumaburn · 2024-04-23T14:35:53Z

Still an issue.

Speedway1 · 2024-05-21T21:49:55Z

Confirming that we're also seeing the exact same issue.

Speedway1 · 2024-05-21T22:00:29Z

Two GitHub issues were raised that identified the problem in the code, but both were automatically closed due to inactivity:
#5112
#4360

It looks like the bug is in the handling of token 354 (\u0000).
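A minimal Python sketch of why a token consisting only of U+0000 could trip this assertion. It assumes (this is not confirmed in the thread) that the loader turns each token into a NUL-terminated C string before counting its code points, so a lone 0x00 byte becomes an empty string. The helpers `read_as_c_string` and `count_codepoints` are hypothetical stand-ins, not llama.cpp functions.

```python
def read_as_c_string(raw: bytes) -> bytes:
    """Simulate building a string from a const char*: everything from the first NUL on is lost."""
    nul = raw.find(b"\x00")
    return raw if nul == -1 else raw[:nul]


def count_codepoints(raw: bytes) -> int:
    """Count Unicode code points in UTF-8 bytes (stand-in for unicode_cpts_from_utf8(word).size())."""
    return len(raw.decode("utf-8"))


token_354 = "\u0000".encode("utf-8")  # encodes to the single byte 0x00
normal = "hello".encode("utf-8")

for tok in (normal, token_354):
    n = count_codepoints(read_as_c_string(tok))
    print(repr(tok), n, "OK" if n > 0 else "assertion would fail")
```

Under that assumption, "hello" keeps its 5 code points, while the U+0000 token collapses to 0 and would fail a `size() > 0` check, even though the untruncated byte is a valid one-code-point UTF-8 string.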

bartowski1182 · 2024-05-27T01:05:24Z

Seeing this with https://huggingface.co/fblgit/UNA-ThePitbull-21.4-v1, which has the same \u0000 token.

I wonder if the code needs a specific check for it.
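One possible shape for such a check, sketched in Python, would flag affected tokens before or during conversion. It assumes the vocabulary is already available as a list of token strings (for example, loaded from the model's Hugging Face tokenizer files; that loading step is omitted here). `find_problem_tokens` is a hypothetical helper, not part of convert.py or llama.cpp.

```python
from typing import List, Tuple


def find_problem_tokens(tokens: List[str]) -> List[Tuple[int, str]]:
    """Return (token id, reason) for tokens that would be empty or truncated
    if later read back as NUL-terminated C strings."""
    problems = []
    for token_id, tok in enumerate(tokens):
        if tok == "":
            problems.append((token_id, "empty token"))
        elif tok.startswith("\u0000"):
            # A leading NUL means nothing survives C-string truncation.
            problems.append((token_id, "starts with U+0000"))
        elif "\u0000" in tok:
            # A later NUL would silently cut the token short.
            problems.append((token_id, "contains U+0000 (would be truncated)"))
    return problems


vocab = ["<unk>", "<s>", "</s>", "\u0000", "hello"]
print(find_problem_tokens(vocab))  # [(3, 'starts with U+0000')]
```

Reporting the reason alongside the id makes it easy to tell tokens that would vanish entirely apart from ones that would only be shortened.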

github-actions · 2024-07-11T01:06:49Z

This issue was closed because it has been inactive for 14 days since being marked as stale.

silverjam · 2024-07-19T00:33:51Z

I added some naive handling of the \u0000 token (basically ignoring it), but this wasn't sufficient, so something more comprehensive is clearly needed.
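Rather than ignoring the token, another direction is to keep each string's explicit length instead of relying on NUL termination. The Python sketch below is purely illustrative. It assumes GGUF-style strings, stored as a little-endian uint64 byte length followed by the raw bytes. `read_length_prefixed` and `encode_length_prefixed` are hypothetical helpers, not gguf-py or llama.cpp code, and the sketch has not been checked against the real loader.

```python
import struct
from typing import Tuple


def encode_length_prefixed(payload: bytes) -> bytes:
    """Write a uint64 little-endian length, then the raw bytes (no terminator)."""
    return struct.pack("<Q", len(payload)) + payload


def read_length_prefixed(buf: bytes, offset: int = 0) -> Tuple[bytes, int]:
    """Read one length-prefixed byte string; return (payload, offset of the next field)."""
    (length,) = struct.unpack_from("<Q", buf, offset)
    start = offset + 8
    end = start + length
    if end > len(buf):
        raise ValueError("string runs past end of buffer")
    return buf[start:end], end


buf = encode_length_prefixed(b"\x00") + encode_length_prefixed(b"hello")
tok0, off = read_length_prefixed(buf)
tok1, _ = read_length_prefixed(buf, off)
print(tok0, len(tok0.decode("utf-8")))  # b'\x00' 1
print(tok1)                             # b'hello'
```

Because the length travels with the bytes, the lone NUL stays a one-code-point token. Other code paths that treat token text as C strings would presumably need the same care, which may be why simply ignoring the token was not enough.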

maziyarpanahi added the bug-unconfirmed label Mar 18, 2024

github-actions bot added the stale label Apr 18, 2024

github-actions bot removed the stale label Apr 24, 2024

github-actions bot added the stale label Jun 26, 2024

github-actions bot closed this as completed Jul 11, 2024
