salad salad salad salad salad salad salad salad #46
salad ?
Looks like it likes salad. This is an Argos Translate issue. I reproduced it, and the sentence boundary detection and tokenization look fine. Argos Translate uses a Transformer as its sequence-to-sequence model. The model is a black box that can sometimes produce weird outputs. If you post on the OpenNMT forum, you might get a better answer. The PyTorch port of the training scripts is almost done; it will have a larger model and more training resources, but I'm not sure when an updated Spanish model will be trained. The new model will likely fix this specific issue and hopefully have fewer similar ones.
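Because the model's output can't be fully predicted, an application might want a cheap check that flags runaway repetition like the output in this issue before showing it to users. A minimal sketch, assuming whitespace tokenization; `looks_degenerate` and its thresholds are hypothetical and not part of Argos Translate:

```python
from collections import Counter


def looks_degenerate(text: str, max_share: float = 0.5, min_tokens: int = 6) -> bool:
    """Flag translations where a single token dominates the output.

    Hypothetical heuristic: tokens are whitespace-separated, and an output
    counts as degenerate when its most frequent token makes up more than
    `max_share` of all tokens. Outputs shorter than `min_tokens` are never
    flagged, since short outputs legitimately repeat words.
    """
    tokens = text.split()
    if len(tokens) < min_tokens:
        return False
    _, top_count = Counter(tokens).most_common(1)[0]
    return top_count / len(tokens) > max_share


print(looks_degenerate("salad " * 20))  # True
print(looks_degenerate("The salad is on the table and it is fresh"))  # False
```

A flagged result could be retried with different decoding settings or shown with a warning; the thresholds would need tuning per language.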
Thanks, I'll try posting about this there. Update: I found the following post, not sure if it's related, with some proposals: https://forum.opennmt.net/t/repeated-phrases-in-the-translation/4155
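One common technique against repeated phrases is to block, during decoding, any token that would recreate an n-gram already present in the output. The sketch below is a generic, self-contained illustration of that idea with a toy greedy decoder; it is not the CTranslate2 or Argos Translate implementation, the linked thread may propose other fixes, and `toy_scores` is a hypothetical stand-in for a real model that always prefers "salad".

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple


def banned_next_tokens(prefix: Sequence[str], n: int) -> Set[str]:
    """Tokens that would repeat an n-gram already present in `prefix`.

    If the last n-1 tokens occurred earlier, every token that followed them
    there is banned now. n <= 0 disables blocking.
    """
    if n <= 0 or len(prefix) < n - 1:
        return set()
    if n == 1:
        return set(prefix)  # unigram blocking: never reuse any token
    followers: Dict[Tuple[str, ...], Set[str]] = {}
    for i in range(len(prefix) - n + 1):
        context = tuple(prefix[i : i + n - 1])
        followers.setdefault(context, set()).add(prefix[i + n - 1])
    current = tuple(prefix[len(prefix) - (n - 1) :])
    return followers.get(current, set())


def greedy_decode(
    score_fn: Callable[[List[str]], Dict[str, float]],
    max_len: int,
    n: int,
    eos: str = "</s>",
) -> List[str]:
    """Greedy decoding that skips tokens banned by n-gram blocking."""
    out: List[str] = []
    for _ in range(max_len):
        banned = banned_next_tokens(out, n)
        candidates = {t: s for t, s in score_fn(out).items() if t not in banned}
        if not candidates:
            break
        best = max(candidates, key=candidates.__getitem__)
        if best == eos:
            break
        out.append(best)
    return out


def toy_scores(prefix: List[str]) -> Dict[str, float]:
    # Hypothetical model that always prefers "salad", whatever came before.
    return {"salad": 0.9, "?": 0.5, "</s>": 0.4}


print(greedy_decode(toy_scores, max_len=8, n=0))  # 'salad' repeated 8 times
print(greedy_decode(toy_scores, max_len=8, n=2))  # ['salad', 'salad', '?', 'salad']
```

With `n=2` the loop ends because no bigram can be produced twice; a real decoder would apply the same ban to its candidate scores (for example by setting banned tokens' scores to negative infinity) before picking or beam-searching.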
In general, I don't think Argos Translate translations are deterministic: the model itself was only trained once, but for some reason it sometimes comes up with different results. Based on the CTranslate Python docs, it doesn't look like CTranslate supports the
The training data mostly contains full sentences, so the model is good at translating sentences. But here the input is a single word, which is a different task. If you want the model to perform well on these inputs, you should add such examples to the training data. (I'm the author of CTranslate2. Feel free to tag me if you have any questions or issues. We are here to help.)
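The suggestion above, adding single-word examples to a mostly sentence-level corpus, could be sketched roughly as follows. This is a hypothetical helper, not Argos Translate's data pipeline: `mix_in_word_pairs`, the English-Spanish pairs, and the 20% share are illustrative assumptions, and a real setup would draw word pairs from a dictionary source.

```python
import random
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]  # (source text, target text)


def mix_in_word_pairs(
    sentence_pairs: Sequence[Pair],
    word_pairs: Sequence[Pair],
    word_fraction: float,
    seed: int = 0,
) -> List[Pair]:
    """Return a shuffled corpus where about `word_fraction` of examples are word pairs."""
    if not 0.0 <= word_fraction < 1.0:
        raise ValueError("word_fraction must be in [0, 1)")
    if word_fraction > 0 and not word_pairs:
        raise ValueError("word_pairs is empty")
    rng = random.Random(seed)
    # Solve n / (len(sentence_pairs) + n) = word_fraction for the number of word pairs n.
    n_words = round(len(sentence_pairs) * word_fraction / (1.0 - word_fraction))
    # Sample with replacement so a small dictionary can still reach the target share.
    sampled = [rng.choice(word_pairs) for _ in range(n_words)]
    corpus = list(sentence_pairs) + sampled
    rng.shuffle(corpus)
    return corpus


sentences = [(f"English sentence {i}.", f"Frase en español {i}.") for i in range(8)]
words = [("salad", "ensalada"), ("apple", "manzana")]
corpus = mix_in_word_pairs(sentences, words, word_fraction=0.2)
print(len(corpus))  # 10
```

Sampling with replacement keeps the target share reachable even from a small dictionary; with a large one, sampling without replacement (`rng.sample`) would avoid duplicate word pairs.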
lol, I had a giggle at this :) Hey @guillaumekln ✋ glad to have you here! CTranslate2 is pretty amazing.
I added a Wiktionary scraping script, so hopefully future models will handle this better.
Hi @PJ-Finlay, sorry to tag you here, but for some reason my post was flagged as spam in the community forums (https://community.libretranslate.com/t/odd-translation-behavior-repeating-words/827). If I understand correctly, I need to release a more recent model for a language that includes the Wiktionary data? How do I know whether a language uses the Wiktextract data? (Based on this: Argos Open Tech, I cannot tell.) The data-index.json seems to be outdated (I can't find some languages there). Thanks in advance.
salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad sala salad salad salad salad salad sala salad salad salad salad salad salad salad salad salad sala salad salad salad salad salad salad salad salad salad salad salad salad sala salad salad salad sala salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad sala salad salad salad salad salad salad salad salad salad salad salad sala sala sala sala