Fix typos chapter 6 #547

Open
wants to merge 1 commit into base: main


AlessandroMiola

Hi everybody :)
first off, thank you for the amazing course and for your work on the HF ecosystem!

Here are a couple of fixes related to chapter 6 that I hope could be helpful:

  • 6.2: simple typos
  • 6.3: sequence_ids() (rather than sentence_ids()) maps each token to the sentence it comes from (perhaps it was called sentence_ids() in a previous version?); I also added the index keys to result so that it matches the provided output
  • 6.5: replaced the _tokenizer attribute with backend_tokenizer (the two refer to the same object) for consistency with 6.4, where backend_tokenizer is used
  • 6.6: there seems to be an error in the WordPiece scores computed for the token pairs ('hu', '##g') and ('hu', '##gs'): the score for ('hu', '##g') should be 2/45, which also changes the tokenization of the word 'hugs' discussed in the next section (see also Potential calculation error on WordPiece section #500)
  • 6.7: there seems to be an error in the computation of the unigram probabilities of the word 'pug' (also reported in https://discuss.huggingface.co/t/chapter-6-questions/11745/30).
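
For the 6.6 point, here is a minimal sketch of how the 2/45 value can be checked. It assumes the usual WordPiece pair score, freq(pair) / (freq(first) × freq(second)), and it assumes the toy corpus is in the state reached after ("##g", "##s") and ("h", "##u") have been merged; the splits dictionary and the pair_scores helper are names made up for this illustration, not code from the course.

```python
from collections import Counter
from fractions import Fraction

# Assumed state of the toy corpus (word split -> word frequency)
# after the merges ("##g", "##s") and ("h", "##u").
splits = {
    ("hu", "##g"): 10,
    ("p", "##u", "##g"): 5,
    ("p", "##u", "##n"): 12,
    ("b", "##u", "##n"): 4,
    ("hu", "##gs"): 5,
}


def pair_scores(splits):
    """Return freq(pair) / (freq(first) * freq(second)) for every adjacent pair."""
    token_freq = Counter()
    pair_freq = Counter()
    for tokens, count in splits.items():
        # Each token occurrence is weighted by the frequency of its word.
        for token in tokens:
            token_freq[token] += count
        # Adjacent tokens inside a word form the candidate pairs.
        for first, second in zip(tokens, tokens[1:]):
            pair_freq[(first, second)] += count
    return {
        pair: Fraction(freq, token_freq[pair[0]] * token_freq[pair[1]])
        for pair, freq in pair_freq.items()
    }


scores = pair_scores(splits)
best = max(scores, key=scores.get)

print(scores[("hu", "##g")])   # 2/45
print(scores[("hu", "##gs")])  # 1/15
print(best)                    # ('hu', '##gs')
```

Under these assumptions ('hu', '##gs') scores strictly higher than ('hu', '##g'), so it would be merged first, which is why the tokenization of 'hugs' in the next section is affected.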

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
