salad salad salad salad salad salad salad salad #46

Closed
fdelapena opened this issue Feb 19, 2021 · 8 comments
Labels
bug Something isn't working

Comments

@fdelapena

salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad sala salad salad salad salad salad sala salad salad salad salad salad salad salad salad salad sala salad salad salad salad salad salad salad salad salad salad salad salad sala salad salad salad sala salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad salad sala salad salad salad salad salad salad salad salad salad salad salad sala sala sala sala

Screenshot from 2021-02-18 18-45-47

@randallmoraes

salad ?

@PJ-Finlay
Contributor

Looks like it likes salad.

This is an Argos Translate issue. I reproduced it, and the sentence boundary detection and tokenization look fine. Argos Translate uses a Transformer as its sequence-to-sequence model. The model is a black box that can sometimes produce weird outputs. If you post on the OpenNMT forum you might get a better answer. The PyTorch port of the training scripts is almost done; it will have a larger model and more training resources, but I'm not sure when an updated Spanish model will be trained. The new model will likely fix this specific issue and hopefully have fewer similar ones.

image

@fdelapena
Author

fdelapena commented Feb 19, 2021

Thanks, I'll try posting about this there.
As a remark, the text output in the Argos Translate screenshot you posted looks slightly different. Note that the "sala sala sala sala sala" (without the d) differs in count and position. I guess the training data or iteration count were not the same.

Update: I've found the following post, not sure if related, with some proposals: https://forum.opennmt.net/t/repeated-phrases-in-the-translation/4155

@PJ-Finlay
Contributor

In general, I don't think Argos Translate has deterministic translations. The model itself was only trained once, but for some reason it sometimes comes up with different results. Based on the CTranslate2 Python docs, it doesn't look like CTranslate2 supports the block_ngram_repeat param they're talking about in the linked forum post.
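
For readers unfamiliar with the option discussed in the linked forum post, the general idea of n-gram repeat blocking can be sketched in plain Python. This is only an illustration of the technique, not code from OpenNMT, CTranslate2, or Argos Translate; the function name `banned_next_tokens` is hypothetical.

```python
def banned_next_tokens(generated, n):
    """Return the tokens that would complete an n-gram already present in `generated`.

    Illustrative sketch only: a decoder that blocks repeated n-grams would
    exclude these tokens when choosing the next output token.
    """
    if n <= 0 or len(generated) < n - 1:
        return set()
    # The last n-1 tokens are the prefix of the n-gram the next token would complete.
    prefix = tuple(generated[len(generated) - (n - 1):])
    banned = set()
    # Scan every n-gram already generated; if it starts with the same prefix,
    # its final token must not be produced again at this position.
    for i in range(len(generated) - n + 1):
        if tuple(generated[i:i + n - 1]) == prefix:
            banned.add(generated[i + n - 1])
    return banned


# After "salad salad", a third "salad" would repeat the bigram ("salad", "salad").
print(banned_next_tokens(["salad", "salad"], 2))
# After a single "salad" no bigram exists yet, so nothing is banned.
print(banned_next_tokens(["salad"], 2))
```

With `n=2`, a decoder using this rule could emit "salad salad" once but not continue the run indefinitely. It limits the symptom only; it does not make the model better at single-word inputs.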

@guillaumekln

The training data mostly contains full sentences, so the model is good at translating sentences. But here the input is a single word, which is a different task. If you want the model to perform well on these inputs, you should add such examples to the training data.
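
A minimal sketch of that suggestion, assuming the training data is a list of (source, target) string pairs that is later written out as line-aligned text. The function names and the example pairs are hypothetical and are not taken from the Argos Translate training scripts.

```python
def build_parallel_corpus(sentence_pairs, word_pairs):
    """Merge sentence pairs with single-word pairs, dropping empties and duplicates."""
    seen = set()
    corpus = []
    for src, tgt in list(sentence_pairs) + list(word_pairs):
        src, tgt = src.strip(), tgt.strip()
        # Skip pairs with a missing side and pairs that were already added.
        if not src or not tgt or (src, tgt) in seen:
            continue
        seen.add((src, tgt))
        corpus.append((src, tgt))
    return corpus


def to_aligned_lines(corpus):
    """Render the corpus as two texts in which line i of one translates line i of the other."""
    sources = "\n".join(src for src, _ in corpus)
    targets = "\n".join(tgt for _, tgt in corpus)
    return sources, targets


# Made-up example data: one full sentence plus dictionary-style single-word entries.
sentences = [("Me gusta la ensalada.", "I like salad.")]
words = [("ensalada", "salad"), ("ensalada", "salad"), ("  ", "empty")]
corpus = build_parallel_corpus(sentences, words)
source_text, target_text = to_aligned_lines(corpus)
print(source_text)
print(target_text)
```

How many single-word pairs to mix in relative to full sentences is a tuning question that this sketch does not address.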

(I'm the author of CTranslate2. Feel free to tag me if you have any questions or issues. We are here to help.)

@pierotofy
Member

lol, I had a giggle at this :)

Hey @guillaumekln ✋ glad to have you here! CTranslate2 is pretty amazing.

pierotofy added the bug and help wanted labels on Feb 19, 2021
PJ-Finlay added a commit to argosopentech/argos-train that referenced this issue on Feb 20, 2021

@PJ-Finlay
Contributor

I added a Wiktionary scraping script so hopefully future models will do this better.

@bruno-kakele

Hi @PJ-Finlay, sorry to tag you here, but for some reason my post was flagged as spam in the community forums: https://community.libretranslate.com/t/odd-translation-behavior-repeating-words/827

If I understand correctly, I need to release a more recent model for a language that includes the Wiktionary data? How do I know if a language uses the Wiktextract data? (Based on this: Argos Open Tech, I cannot tell.) The data-index.json seems to be outdated (I can't find some languages there).

Thanks in advance


7 participants