Hi everybody :)
first off, thank you for the amazing course and for your work on the HF ecosystem!
Here are a couple of fixes related to chapter 6 that I hope could be helpful:
- `sequence_ids()` (rather than `sentence_ids()`) helps map each token to the sentence it comes from (perhaps it was called `sentence_ids()` in previous versions?); I'm also adding the `index` keys to `result` so as to align with the provided output
- replacing `_tokenizer` with `backend_tokenizer` (despite being the same) for coherence with 6.4, where `backend_tokenizer` is referenced
- `('hu', '##g')` and `('hu', '##gs')`; the score for `('hu', '##g')` should rather be 2/45, and this also affects the tokenization of the word `'hugs'`, which is referred to in the next section (also referenced in Potential calculation error on WordPiece section #500)
- `'pug'` (also referenced in https://discuss.huggingface.co/t/chapter-6-questions/11745/30).
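
For anyone who wants to double-check the 2/45 figure, here is a minimal sketch. It assumes the toy corpus from the WordPiece section (hug ×10, pug ×5, pun ×12, bun ×4, hugs ×5), assumes that `##gs` and `hu` have already been merged, and uses the WordPiece score `freq(pair) / (freq(first) * freq(second))`. The variable names are only illustrative and are not taken from the course code.

```python
from collections import Counter
from fractions import Fraction

# Assumed corpus state after the merges ("##g", "##s") and ("h", "##u").
splits = {
    ("hu", "##g"): 10,        # "hug"
    ("p", "##u", "##g"): 5,   # "pug"
    ("p", "##u", "##n"): 12,  # "pun"
    ("b", "##u", "##n"): 4,   # "bun"
    ("hu", "##gs"): 5,        # "hugs"
}

# Count how often each token and each adjacent pair occurs, weighted by word frequency.
token_freq = Counter()
pair_freq = Counter()
for tokens, count in splits.items():
    for token in tokens:
        token_freq[token] += count
    for pair in zip(tokens, tokens[1:]):
        pair_freq[pair] += count

# WordPiece score: pair frequency divided by the product of the two token frequencies.
scores = {
    pair: Fraction(freq, token_freq[pair[0]] * token_freq[pair[1]])
    for pair, freq in pair_freq.items()
}

print(scores[("hu", "##g")])   # 2/45
print(scores[("hu", "##gs")])  # 1/15

best_pair = max(scores, key=scores.get)
print(best_pair)               # ('hu', '##gs')
```

Under these assumptions `('hu', '##gs')` scores higher than `('hu', '##g')`, so it would be the next merge, which is why the tokenization of `'hugs'` changes.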