
Add support for BPE #19

Closed
danieldk opened this issue Sep 27, 2022 · 0 comments · Fixed by #21

Comments

@danieldk
Contributor

This is definitely lower priority, but it would be nice to have support for BPE decoding, so that we could support RoBERTa.

danieldk added a commit that referenced this issue Oct 13, 2022
This type of processor applies byte-level BPE encoding. The processor aims for
compatibility with RoBERTa/GPT-2 BPE vocabs.

Fixes #19.
shadeMe added a commit that referenced this issue Oct 14, 2022
* Add ByteBPEProcessor

This type of processor applies byte-level BPE encoding. The processor aims for
compatibility with RoBERTa/GPT-2 BPE vocabs.

Fixes #19.

* Apply improvements

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* Fix return type of encode

* Fix doc of encode_as_pieces

* Pass pieces by pair to find_best_pair

* Use range-based for loop

* Add reference to hash_combine docs

* Validate that merges consist of two items

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
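To make the idea behind the ByteBPEProcessor concrete, here is a minimal Python sketch of byte-level BPE encoding in the GPT-2/RoBERTa style: input bytes are first mapped to printable unicode characters, then adjacent pieces are greedily merged by rank. The function names (`bytes_to_unicode`, `find_best_pair`, `encode_as_pieces`) echo the commit titles above but are illustrative only, not the library's actual API.

```python
def bytes_to_unicode():
    """Map every byte 0-255 to a printable unicode character (GPT-2 scheme).

    Printable bytes map to themselves; the rest are shifted past 255 so
    that every byte has a visible, reversible character representation.
    """
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(0xA1, 0xAD))
          + list(range(0xAE, 0x100)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


def find_best_pair(pieces, merge_ranks):
    """Return the adjacent pair with the lowest merge rank, or None."""
    best, best_rank = None, None
    for pair in zip(pieces, pieces[1:]):
        rank = merge_ranks.get(pair)
        if rank is not None and (best_rank is None or rank < best_rank):
            best, best_rank = pair, rank
    return best


def encode_as_pieces(word, merge_ranks):
    """Greedily apply ranked merges to the byte-mapped characters of word."""
    byte_map = bytes_to_unicode()
    pieces = [byte_map[b] for b in word.encode("utf-8")]
    while len(pieces) > 1:
        pair = find_best_pair(pieces, merge_ranks)
        if pair is None:
            break  # no remaining adjacent pair appears in the merge table
        merged, i = [], 0
        while i < len(pieces):
            if i < len(pieces) - 1 and (pieces[i], pieces[i + 1]) == pair:
                merged.append(pieces[i] + pieces[i + 1])
                i += 2
            else:
                merged.append(pieces[i])
                i += 1
        pieces = merged
    return pieces


# Usage with a toy merge table (rank = position in the merges file):
ranks = {("h", "e"): 0, ("he", "l"): 1, ("l", "o"): 2}
print(encode_as_pieces("hello", ranks))  # ['hel', 'lo']
```

In a real vocab the merge table comes from the `merges.txt` file shipped with RoBERTa/GPT-2 checkpoints, where each line holds exactly two space-separated items (hence the "validate that merges consist of two items" fix above), and decoding simply inverts the byte-to-unicode table.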