
panic with inference_session split_at #298

Closed
spion opened this issue Jun 6, 2023 · 3 comments
Labels: issue:bug (Something isn't working), model:llama (LLaMA model)
Milestone: 0.2

Comments

@spion commented Jun 6, 2023

When feeding certain tokens, I'm getting:

thread 'main' panicked at 'assertion failed: mid <= self.len()', 
  /home/spion/.cargo/git/checkouts/llm-d8a8bbe144aa0546/e52a102/crates/llm-base/src/inference_session.rs:101:51

Steps to reproduce:

Feed the following string (Macedonian) using feed_prompt to any model that uses LLaMA-like tokens:

да гледам ќе идев

Commit: latest master (e52a102)
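
For context on the message: Rust's slice `split_at` asserts that the split index is within bounds, so any code path that calls it with a stale, too-large index panics exactly like this. A minimal standalone sketch of the same panic (plain std, no llm types involved; the byte values are illustrative):

```rust
fn main() {
    let decoded: &[u8] = &[0xD0, 0xB4, 0xD0, 0xB0]; // "да" in UTF-8 (4 bytes)
    let previously_decoded_len = 6; // stale length from an earlier, longer decode

    // On a 2023-era rustc this panics with:
    //   assertion failed: mid <= self.len()
    let (_already_reported, _new_bytes) = decoded.split_at(previously_decoded_len);
}
```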

@LLukas22 (Contributor) commented Jun 6, 2023

Are you using the Hugging Face tokenizer or the tokenizer implemented directly in rustformers?

@spion (Author) commented Jun 6, 2023

Oops, I forgot to mention that really important bit: I'm using the Hugging Face tokenizer.

@philpax (Collaborator) commented Jun 28, 2023

Okay, it took a bit of time, but I managed to find the root cause. The tail end of the UTF-8 decoding of the tokens with the Hugging Face tokenizer can actually change as more tokens are added (it can even become a shorter string!), leading to panics when what's already been decoded doesn't match what the new decoding has produced.

I've requested help here: huggingface/tokenizers#1141 (comment) - hoping that we can get to the bottom of this soon!
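
A minimal sketch of the failure mode described above, with a toy decoder standing in for the Hugging Face tokenizer (decode_stub and its byte values are invented for illustration, not the tokenizer's real output). The incremental feed remembers how many decoded bytes it has already reported and calls split_at with that count, which panics the moment a longer token sequence decodes to a shorter byte string:

```rust
// Toy stand-in for Hugging Face decoding: per the diagnosis above, decoding
// a *longer* token sequence can yield a *shorter* string, e.g. when a new
// token completes a multi-byte character that was previously rendered as a
// replacement character.
fn decode_stub(tokens: &[u32]) -> Vec<u8> {
    if tokens.len() == 1 {
        b"\xD0\xB4\xD0\xB0 \xEF\xBF\xBD".to_vec() // "да " + U+FFFD (8 bytes)
    } else {
        b"\xD0\xB4\xD0\xB0 ".to_vec() // "да " only (5 bytes) -- shorter!
    }
}

fn main() {
    let mut fed_tokens: Vec<u32> = Vec::new();
    let mut previously_reported = 0usize; // bytes already handed to the caller

    for token in [100, 200] {
        fed_tokens.push(token);
        let decoded = decode_stub(&fed_tokens);
        // Mirrors the split in inference_session.rs: everything before
        // `previously_reported` was already emitted, the rest is new.
        // Second iteration: decoded.len() == 5 but previously_reported == 8,
        // so this panics with 'assertion failed: mid <= self.len()'.
        let (_seen, new_bytes) = decoded.split_at(previously_reported);
        previously_reported = decoded.len();
        println!("new bytes: {:?}", new_bytes);
    }
}
```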

philpax added this to the 0.2 milestone on Jun 28, 2023