Fix llama3 urls + chat completion termination + nightlies in readme #443
Conversation
README.md
Outdated
@@ -23,10 +23,24 @@ The 'llama-recipes' repository is a companion to the [Meta Llama 2](https://gith
>
> {{ user_message_2 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
> ```
> More details on the new tokenizer and prompt template: <PLACEHOLDER_URL>
>
> To signal the end of the current message the model emits the `<\|eot_id\|>` token. To terminate the generation we need to call the model's generate function as follows:
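For context, the prompt format quoted in this hunk can be assembled by hand as in the sketch below; the special tokens match the template quoted above, while the function name and example messages are illustrative only, not code from the README:

```python
# Sketch only: builds the Llama 3 instruct prompt format quoted in the diff above.
# build_llama3_prompt and the example messages are illustrative, not part of the README.
def build_llama3_prompt(system_message: str, user_message: str) -> str:
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system_message}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>"
        # The trailing assistant header cues the model to generate its reply,
        # which it ends with <|eot_id|> (the token discussed in this thread).
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )


print(build_llama3_prompt("Always answer with emojis", "How to go from Beijing to NY?"))
```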
message the model => message, the model
generation we => generation, we
@mreso is it like the two EOS terminators should be used to stop generation early? Wondering if that matches our description here or if we need a bit of lingo?
Is the `eos_token_id` arg in `model.generate` specifying the stop sequence for generation?
I think some lingo around understanding the difference between eot_id and end_of_text usage would be helpful
Thanks for the comments! Yes, the eos_token_id is the one that's checked in the stopping criteria, and usually that's set to <|end_of_text|>. But for dialog-style prompts the model is trained to use <|eot_id|> (probably to distinguish it from the more final end of sequence). That's why we need to replace the eos_token_id with the latter id. Otherwise generate rambles on, like in this example:
Model output:
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
Always answer with emojis<|eot_id|><|start_header_id|>user<|end_header_id|>
How to go from Beijing to NY?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
🛫️🚀🛬<|eot_id|><|start_header_id|>assistant<|end_header_id|>
🏨🛬🇨🇳 🕰️ 12+hours o
💺Business Class
[...]
The model learned that after an <|eot_id|> comes another header, so it adds <|start_header_id|>assistant<|end_header_id|> and then another response follows. (The header is usually appended by the chat template, not the model.)
If we exchange the eos_token_id in generate, it stops after the model emits the first <|eot_id|>:
Model output:
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
Always answer with emojis<|eot_id|><|start_header_id|>user<|end_header_id|>
How to go from Beijing to NY?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
✈️ 🗼️🛬<|eot_id|>
Will rework the text accordingly before merging.
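For illustration, a minimal sketch of what exchanging the eos_token_id can look like with the Hugging Face transformers generate API; the model id, messages, and generation settings are assumptions for the example, not the exact code in this PR:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "Always answer with emojis"},
    {"role": "user", "content": "How to go from Beijing to NY?"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Stop on <|eot_id|> (end of turn) in addition to the default <|end_of_text|>;
# otherwise generation keeps appending new assistant turns as shown above.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

outputs = model.generate(input_ids, max_new_tokens=64, eos_token_id=terminators)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=False))
```

With only the default eos_token_id, decoding stops at <|end_of_text|> alone, which reproduces the rambling output shown above.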
So this should have been addressed in this PR, which has been merged: https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/discussions/4/files
Thanks @mreso, just a minor comment on whether we need to tune the language a bit? If not, I am fine with it.
LGTM! Only added a minor suggestion to include detail on how developers should use eos_tokens.
Just left minor comments we discussed offline; otherwise LGTM, thanks!
recipes/inference/local_inference/chat_completion/chat_completion.py
Outdated
recipes/inference/local_inference/chat_completion/chat_completion.py
Outdated
…s needed to support eot_id
What does this PR do?
This PR fixes:
Fixes # (issue)
Feature/Issue validation/testing
Please describe the tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.
Test A
Logs for Test A
Test B
Logs for Test B
Before submitting
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a Github issue? Please add a link to it if that's the case.
Thanks for contributing 🎉!