This repository has been archived by the owner on Oct 31, 2022. It is now read-only.
I'm reading the source code and have two questions about vocab.bpe and encoder.json. Thank you in advance for any help.
For vocab.bpe, take the second row (Ġ t) as an example. "Ġ" also appears in many other rows (for example, the third row). Why isn't there a one-to-one correspondence?
Are the items in encoder.json the subtokens produced by BPE? Take "\u0120regress" as an example: why does "\u0120" appear here?
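For context on both questions, here is a minimal sketch of a byte-to-Unicode mapping, modeled from memory on the helper in the repository's encoder.py, so treat the details as an assumption rather than the exact source. Under that mapping the space byte (0x20) becomes "Ġ" (U+0120), which JSON may write as "\u0120". That would explain why the character shows up in many merge rows and at the start of many encoder.json entries: it marks a leading space. The name to_bpe_symbols is hypothetical and exists only for this illustration.

```python
def bytes_to_unicode():
    """Map each of the 256 byte values to a printable Unicode character."""
    # Bytes that are already printable characters map to themselves.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    # Every remaining byte (space, control characters, ...) is moved to
    # code point 256 + shift so that it becomes a visible character.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


def to_bpe_symbols(text):
    """Hypothetical helper: rewrite text in the symbols the BPE files use."""
    mapping = bytes_to_unicode()
    return "".join(mapping[b] for b in text.encode("utf-8"))


print(to_bpe_symbols(" t"))                           # Ġt
print(to_bpe_symbols(" regress") == "\u0120regress")  # True
```

Read this way, a vocab.bpe row would be a merge rule over two symbols rather than a definition of "Ġ", so the same symbol can take part in many rules, and an encoder.json key such as "\u0120regress" would be the token " regress" with its leading space.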