Move Tokenizer to Separate Package #55

Closed
jonthegeek opened this issue Nov 4, 2020 · 2 comments

Comments

@jonthegeek
Collaborator

As an RBERT dev, I'd like the wordpiece tokenizer to be in its own optimizable package, so that I don't have to think about it.

@jonthegeek
Collaborator Author

Most of what we need is already in sentencepiece::wordpiece_encode. I plan to pull that functionality out, but for now we should use it as is.

jonathanbratt mentioned this issue Jan 19, 2021
@jonathanbratt
Owner

In the tf2 branch, this is done, using our wordpiece package now on CRAN.
