separate initial loading and per-line operations #46
This is already the case for the LSTM encoder itself.
Thanks for the suggestions. Logically, facebookresearch/UnsupervisedMT has the exact same issue. For tokenisation, my instinct would be to just use a different tokeniser, because LASER is basically agnostic to the tokenisation scheme, as long as it is applied consistently at train time and run time. But that would break compatibility with the current pre-trained models. For fastBPE, I don't have a great answer. Fairseq (and Sockeye) support interactive mode and BPE, but the BPE story is not something I would emulate; it's by far the biggest problem with the lib. fastBPE is a small and new lib, so there is some hope that it could evolve from just research to eng. In any case, I opened an issue, maybe you can add something there: glample/fastBPE#10
I assume that fairseq (and Sockeye) use Sennrich's BPE in Python.
fastBPE now supports this.
In order to support interactive mode and/or run-time querying from other languages, it would be ideal if the code under
if __name__ == '__main__':
in a task like embed had an initial load and then processed each line from stdin as soon as it arrived.
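The requested pattern can be sketched as follows. This is not LASER's actual code: the `Encoder` class and the model filename are stand-ins for the real checkpoint loading, and only the control flow (one expensive load, then per-line processing of stdin) is the point.

```python
import sys


class Encoder:
    """Stand-in for the real sentence encoder; constructing it is the
    expensive one-time step (in LASER this would load the checkpoint)."""

    def __init__(self, model_path):
        self.model_path = model_path

    def embed(self, line):
        # Placeholder: the real encoder returns a sentence embedding vector.
        return [float(len(line))]


def embed_stream(encoder, lines):
    """Embed each line as soon as it arrives, instead of reading the
    whole input file up front."""
    for line in lines:
        yield encoder.embed(line.rstrip("\n"))


if __name__ == "__main__":
    # The expensive initial load happens exactly once...
    enc = Encoder("model.pt")  # hypothetical path
    # ...then each stdin line is processed immediately, so another
    # process (or language) can query interactively over a pipe.
    for vec in embed_stream(enc, sys.stdin):
        print(" ".join(map(str, vec)), flush=True)
```

The `flush=True` matters for run-time querying from other languages: without it, stdout is block-buffered when writing to a pipe, and the caller would not see each embedding until the buffer fills.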