MMT sentence re-segmentation #598

Open
EtienneAb3d opened this issue Oct 22, 2021 · 0 comments

Exploring the MMT code, I discovered that it re-segments the input into sentences and processes the resulting segments as a batch.

First of all, it would be nice to document somewhere that, when no adaptation is needed, much higher performance can be obtained by sending multiple sentences together. They are still processed one by one (not as a whole), but are sent to the GPU as a mini-batch. The constraint is that the multi-sentence text must be re-segmentable (for example, with a period at the end of each sentence).
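
To illustrate the "re-segmentable" constraint, here is a minimal sketch in Python. The helper names `join_for_batching` and `naive_resegment` are hypothetical, and the regex splitter is only a stand-in for illustration, not MMT's actual segmenter. It assumes a sentence is re-segmentable when it ends with `.`, `!` or `?`.

```python
import re

# Assumption: these characters mark a sentence end for the stand-in splitter.
TERMINAL = (".", "!", "?")

def join_for_batching(sentences):
    """Join sentences into one text, adding a period where a sentence
    has no terminal punctuation, so the text can be split again."""
    parts = []
    for s in sentences:
        s = s.strip()
        if not s:
            continue
        if not s.endswith(TERMINAL):
            s += "."
        parts.append(s)
    return " ".join(parts)

def naive_resegment(text):
    """Stand-in splitter (NOT MMT's segmenter): split on whitespace
    that follows terminal punctuation."""
    return [p for p in re.split(r"(?<=[.!?])\s+", text.strip()) if p]

sentences = ["Hello world", "How are you?", "Fine."]
joined = join_for_batching(sentences)
segments = naive_resegment(joined)
```

With this input, the joined text splits back into the same number of segments as the input sentences, which is the property needed for the segments to form a mini-batch.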

Secondly, this may cause problems: MMT can split a single sentence into several parts, depending on its content, and then does not translate it as a single sentence. It would be nice to add an option to disable re-segmentation when the input sentence is known to be a whole that must not be split.
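
A small Python sketch of the problem and of the requested option. The function `segment` and its `resegment` flag are hypothetical and only illustrate the idea; they are not an existing MMT API, and the regex is a naive stand-in for the real segmenter.

```python
import re

def segment(text, resegment=True):
    """Hypothetical entry point: split the input into sentences
    unless the caller states that it is already one whole sentence."""
    if not resegment:
        # Requested behaviour: take the input as a single sentence.
        return [text]
    # Naive stand-in splitter: break on whitespace after . ! or ?
    return [p for p in re.split(r"(?<=[.!?])\s+", text.strip()) if p]

single = "The meeting is at approx. 5 p.m. on Friday"
split_parts = segment(single)                  # abbreviations trigger splits
whole = segment(single, resegment=False)       # kept as one sentence
```

Here the abbreviations cause the naive splitter to cut one sentence into three parts, each of which would be translated separately, while the flag keeps the input intact.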
