
[TextGeneration] Fix llama tokenizer #1635

Merged 4 commits into main on Mar 14, 2024

Conversation

@dsikka (Contributor) commented Mar 14, 2024

Tested code:

import deepsparse

MODEL_ID = "hf:nm-testing/llama2-7B-sparse70-retrained-ultrachat200k-pruned70-smoothquant-ds"
# MODEL_ID = "zoo:mistral-7b-ultrachat200k_mistral_pretrain-pruned40_quantized"

# Build a text-generation pipeline for the sparse Llama model.
pipe = deepsparse.Pipeline.create(
    task="text-generation",
    model_path=MODEL_ID,
    sequence_length=512,
    prompt_sequence_length=16,
)

message = "Once upon a time"

# Format the prompt with the model's chat template.
conversation = []
conversation.append({"role": "user", "content": message})
formatted_conversation = pipe.tokenizer.apply_chat_template(
    conversation, tokenize=False, add_generation_prompt=True
)

generation_config = {
    "max_new_tokens": 100,
}

# Run inference in streaming mode, the path exercised by this fix.
inference = pipe(
    sequences=formatted_conversation,
    generation_config=generation_config,
    streaming=True,
)

# Print each streamed token as it arrives.
for token in inference:
    print(token.generations[0].text, end="")

Output:


There was a time when the world was a different place. A time when people were more accepting of each other and didn't judge based on race, religion, or gender. A time when kindness and compassion were the norm, and hate and prejudice were unheard of.

But then something changed. The world became more divided, and people started to see each other through a
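
For context on what the fix addresses: Llama's SentencePiece tokenizer marks word boundaries with a leading "▁", so decoding each streamed token in isolation tends to drop the spaces between words. A common workaround, shown here as a stand-alone sketch using transformers (a hypothetical illustration, not the actual deepsparse diff in this PR), is to decode the cumulative token sequence and emit only the newly appended suffix:

from transformers import AutoTokenizer

# Hypothetical illustration of the usual streaming-detokenization workaround;
# the tokenizer checkpoint below is an assumption, not taken from this PR.
tokenizer = AutoTokenizer.from_pretrained("hf-internal-testing/llama-tokenizer")

def stream_decode(token_ids):
    """Yield text deltas for a stream of token ids, keeping spaces intact."""
    emitted = ""
    seen = []
    for token_id in token_ids:
        seen.append(token_id)
        # Decode the whole running sequence so SentencePiece can restore
        # word boundaries, then emit only the not-yet-printed suffix.
        full = tokenizer.decode(seen, skip_special_tokens=True)
        yield full[len(emitted):]
        emitted = full

ids = tokenizer("Once upon a time", add_special_tokens=False).input_ids
print("".join(stream_decode(ids)))  # spaces preserved: "Once upon a time"

The "only run for streaming" commit below suggests the fix is gated to the streaming path, where this per-token decoding occurs.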

@dsikka dsikka requested a review from mgoin March 14, 2024 21:39
@mgoin (Member) left a comment

Thanks for answering my questions, nice implementation

@mgoin mgoin merged commit 9bac61e into main Mar 14, 2024
13 checks passed
@mgoin mgoin deleted the fix_llama_tokenizer branch March 14, 2024 21:51
dhuangnm pushed a commit that referenced this pull request Mar 14, 2024
* add llama tokenizer fix

* fix generated string

* only run for streaming

* add TODO

---------

Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>

@dbogunowicz (Contributor) left a comment

A test would be nice to have, but I guess the priority is to land this asap.
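
A minimal test along those lines might look like this (a hypothetical sketch against the transformers Llama tokenizer, not a test from this repo):

from transformers import AutoTokenizer

def test_streaming_decode_preserves_spaces():
    # Hypothetical sketch, not part of this PR: stitching per-step decode
    # suffixes together should reproduce the full decode, spaces included.
    tok = AutoTokenizer.from_pretrained("hf-internal-testing/llama-tokenizer")
    ids = tok("Once upon a time", add_special_tokens=False).input_ids
    emitted, pieces = "", []
    for i in range(1, len(ids) + 1):
        full = tok.decode(ids[:i], skip_special_tokens=True)
        pieces.append(full[len(emitted):])
        emitted = full
    streamed = "".join(pieces)
    assert streamed == tok.decode(ids, skip_special_tokens=True)
    assert "upon a time" in streamed  # word boundaries survived streaming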

dhuangnm added a commit that referenced this pull request Mar 18, 2024
* [TextGeneration] Fix llama tokenizer (#1635)

* add llama tokenizer fix

* fix generated string

* only run for streaming

* add TODO

---------

Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>

* Retire `flaky` in favour of `pytest-rerunfailures` (#1628)

* pick up another fix and bump up version to 1.7.1

---------

Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Co-authored-by: dbogunowicz <97082108+dbogunowicz@users.noreply.github.com>
Co-authored-by: dhuang <dhuang@MacBook-Pro-2.local>