Fix llama3 urls + chat completion termination + nightlies in readme #443
message the model => message, the model
generation we => generation, we
@mreso is it that the two EOS terminators should be used to stop generation early? Wondering if that matches our description here, or if we need a bit of lingo?
Is the `eos_token_id` arg in `model.generate` specifying the stop sequence for generation? I think some lingo around understanding the difference between `<|eot_id|>` and `<|end_of_text|>` usage would be helpful.
Thanks for the comments! Yes, the `eos_token_id` is the one that's checked in the stopping criteria, and it's usually set to `<|end_of_text|>`. But for dialog-style prompts the model is trained to use `<|eot_id|>` (probably to distinguish a turn boundary from the more final end of sequence). That's why we need to replace the `eos_token_id` with the latter id; otherwise `generate` rambles on like in this example:
The model learned that after an `<|eot_id|>` comes another header, so it adds `<|start_header_id|>assistant<|end_header_id|>` and then another response. (The header is usually appended by the chat template, not the model.)
If we exchange the `eos_token_id` in `generate`, it stops after the model emits the first `<|eot_id|>`:
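To illustrate the effect described above, here is a minimal pure-Python sketch of the stopping criterion. The `generate` helper and token stream are invented for illustration; the token ids are the ones Llama 3 assigns to `<|eot_id|>` and `<|end_of_text|>`, but verify against your tokenizer.

```python
# Toy sketch of generation stopping on a configured eos token.
# Ids believed to match Llama 3's special tokens (assumption - check
# your tokenizer); the stream and helper are purely illustrative.
EOT_ID = 128009          # <|eot_id|>      - ends a single chat turn
END_OF_TEXT_ID = 128001  # <|end_of_text|> - ends the whole sequence

def generate(token_stream, eos_token_id):
    """Consume tokens until the configured eos_token_id is emitted."""
    out = []
    for tok in token_stream:
        out.append(tok)
        if tok == eos_token_id:
            break
    return out

# A chat-tuned model emits <|eot_id|> after each turn, then keeps going:
stream = [10, 11, EOT_ID, 20, 21, EOT_ID, END_OF_TEXT_ID]

# Default eos (<|end_of_text|>): generation rambles past the first turn.
rambling = generate(iter(stream), END_OF_TEXT_ID)

# Swapped to <|eot_id|>: generation stops right after the first turn.
stopped = generate(iter(stream), EOT_ID)

print(len(rambling), len(stopped))  # 7 3
```

With the real Hugging Face API the same swap amounts to passing the `<|eot_id|>` id as `eos_token_id` to `model.generate` instead of the tokenizer's default.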
Will rework the text accordingly before merging.
So this should have been addressed in this PR, which has been merged: https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/discussions/4/files