
separate initial loading and per-line operations #46

Closed

bittlingmayer opened this issue Feb 28, 2019 · 4 comments

Comments

@bittlingmayer

In order to support interactive mode and/or run-time querying from other languages, it would be ideal if the code under `if __name__ == '__main__':` in a task like embed did the expensive loading once up front and then processed each line from stdin as soon as it arrived.
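
Roughly this shape, as a minimal sketch (`load_model` and `embed_line` are hypothetical names standing in for LASER's actual loading and encoding code, not its real API):

```python
import sys

# Hypothetical helpers standing in for LASER's actual loading/encoding code.
from embed_helpers import load_model, embed_line

def main():
    model = load_model()    # expensive one-time initialization
    for line in sys.stdin:  # then stream: handle each line as it arrives
        print(embed_line(model, line.rstrip('\n')))

if __name__ == '__main__':
    main()
```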

@hoschwenk
Contributor

This is already the case for the LSTM encoder itself.
The trickier part is Moses tokenization and fastBPE.
Preloading the model and keeping it in memory would require some (possibly substantial) changes to this third-party code. An option could be to use named pipes.
If you can provide a pull request for this option, I'm happy to integrate it.
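
As a rough illustration of the named-pipe idea (the tokenizer invocation and paths are placeholders, and this assumes the external tool flushes its output per line, e.g. tokenizer.perl with its -b flag to disable Perl buffering):

```python
import os
import subprocess
import tempfile

# Create two FIFOs: one to feed lines in, one to read tokenized lines back.
fifo_dir = tempfile.mkdtemp()
in_fifo = os.path.join(fifo_dir, 'in')
out_fifo = os.path.join(fifo_dir, 'out')
os.mkfifo(in_fifo)
os.mkfifo(out_fifo)

# Keep one long-lived tokenizer process alive instead of respawning per call.
proc = subprocess.Popen(
    'perl tokenizer.perl -b -l en < {} > {}'.format(in_fifo, out_fifo),
    shell=True)

# Open order matters: opening a FIFO blocks until both ends are connected.
writer = open(in_fifo, 'w', buffering=1)   # line-buffered
reader = open(out_fifo, 'r')

writer.write('Hello, world!\n')
print(reader.readline().strip())
```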

@bittlingmayer
Author

Thanks for the suggestions.

Logically, facebookresearch/UnsupervisedMT has the exact same issue.

For tokenisation, my instinct would be to just use a different tokeniser, because LASER is basically agnostic to the tokenisation scheme, as long as it is applied consistently at train time and run time. But that would break compatibility with the current pre-trained models.

For fastBPE, I don't have a great answer. Fairseq (and Sockeye) support interactive mode and BPE, but the BPE story is not something I would emulate; it's by far the biggest problem with the lib. fastBPE is a small and new lib, so there is some hope that it could evolve from just research to engineering. In any case, I opened an issue; maybe you can add something there: glample/fastBPE#10

@hoschwenk
Contributor

I assume that fairseq (and Sockeye) use Sennrich's BPE in Python.
In principle, one should be able to replace fastBPE with another BPE implementation. There are minor differences, but it may be worth measuring the impact without retraining the models.
The long-term solution I favor is to switch to a unified tokenization and segmentation approach like SentencePiece. This would make the whole pipeline language-agnostic.
I hope to update the models and code in the near future.
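
For what it's worth, that approach maps naturally onto a load-once loop; a sketch with the sentencepiece Python package (the model file is a placeholder and would have to be trained alongside new encoders):

```python
import sentencepiece as spm

# One model covers tokenization and subword segmentation in a single,
# language-agnostic step; 'laser.spm.model' is a placeholder path.
sp = spm.SentencePieceProcessor()
sp.Load('laser.spm.model')

for line in ['Hello world.', 'Ein Beispiel auf Deutsch.']:
    print(' '.join(sp.EncodeAsPieces(line)))
```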

@bittlingmayer
Author

fastBPE now supports this.

See glample/fastBPE#10 (comment)
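
With the Python bindings it becomes a load-once, apply-per-line call, roughly like this (the codes/vocab paths are placeholders):

```python
import fastBPE

# Load the BPE codes (and optional vocabulary) once at startup ...
bpe = fastBPE.fastBPE('bpe.codes', 'bpe.vocab')

# ... then apply BPE to lines as they arrive, with no per-call process spawn.
print(bpe.apply(['Hello world .']))
```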

@glample @loretoparisi
