No such dataset implementation None #3

Closed
aquorio15 opened this issue Aug 2, 2022 · 3 comments

aquorio15 commented Aug 2, 2022

Hi, I have been trying to run the code on the MTTT dataset as described in the paper. When fairseq-train loads the data, I get the following error: 'No such dataset implementation None'.
[Screenshot from 2022-08-02 15-57-28]
Any help would be greatly appreciated.

Here is my training command line, in case I am doing something wrong:
CUDA_VISIBLE_DEVICES=0 fairseq-train data-bin/MMMT.tokenized.en-tr \
    --task visual_text --source-lang en --target-lang tr --target-dict dict.tr.txt \
    --arch visual_text_transformer \
    --image-window 15 --image-stride 10 \
    --image-font-path fairseq/data/visual/fonts/NotoSans-Regular.ttf \
    --image-embed-normalize --image-embed-type 1layer \
    --share-decoder-input-output-embed \
    --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
    --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --dropout 0.3 --weight-decay 0.0001 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --max-tokens 4096 --max-epoch 50

Owner

esalesky commented Aug 2, 2022

My guess is that you need to set the --dataset-impl parameter (logged as dataset_impl) in your training command.

In the first line of the screenshot, dataset_impl=None. If you're using binarized data, you'll want to set this to mmap. If you're using raw data (not binarized, so the images are computed when the internal dataset representation is constructed), you can set this to raw. Raw is what I typically used for MTTT because it is relatively small.
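As a rough illustration of the suggestion above, here is a minimal Python sketch that adds the setting to a training command before it is run. The helper name `with_dataset_impl` is hypothetical, only the two values mentioned in this comment are allowed, and the hyphenated spelling `--dataset-impl` is an assumption about how the option appears on the command line. Only a few of the flags from the original command are shown; the rest would stay as they were.

```python
import shlex

# The two values discussed in this thread (fairseq may accept others).
KNOWN_IMPLS = {"mmap", "raw"}


def with_dataset_impl(train_args, dataset_impl):
    """Return a copy of train_args with --dataset-impl set to dataset_impl.

    Hypothetical helper for illustration; it is not part of fairseq.
    """
    if dataset_impl not in KNOWN_IMPLS:
        raise ValueError(f"unexpected dataset implementation: {dataset_impl!r}")
    args = list(train_args)
    if "--dataset-impl" in args:
        # Replace an existing value, or append one if the flag is last.
        i = args.index("--dataset-impl")
        if i + 1 < len(args):
            args[i + 1] = dataset_impl
        else:
            args.append(dataset_impl)
    else:
        args += ["--dataset-impl", dataset_impl]
    return args


# Shortened version of the command from the issue (remaining flags omitted here).
base = shlex.split(
    "fairseq-train data-bin/MMMT.tokenized.en-tr "
    "--task visual_text --source-lang en --target-lang tr"
)

# "raw" for non-binarized data, "mmap" for binarized data.
print(shlex.join(with_dataset_impl(base, "raw")))
```

The printed command can be pasted into a shell, or the argument list can be passed to `subprocess.run` directly.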

If you still see an issue, please comment again and I'll do my best to help, and if that works, feel free to close!

Author

aquorio15 commented Aug 3, 2022

Hi, thank you for the reply.
It is working now. Just a small question: did you try this model with a very small dataset, say around 10,000 sentence pairs?

Also, how did you evaluate your trained model?

Owner

esalesky commented Aug 3, 2022

Great, glad to hear!

No, I did not try datasets smaller than MTTT. You may want to try adapting a model trained on more data.

We evaluated using BLEU as computed by sacrebleu, after first removing sentencepiece segmentation.
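For readers who want to reproduce that evaluation step, the following is a minimal sketch and not the exact script used for the paper. It assumes the model output is one line of space-separated sentencepiece pieces per sentence, with the usual "▁" word-boundary marker, and that the references are plain untokenized text. The function names are hypothetical; `sacrebleu.corpus_bleu` is the library's Python entry point and needs `pip install sacrebleu`.

```python
def remove_sentencepiece(line):
    """Undo sentencepiece segmentation on one line of space-separated pieces."""
    # Pieces are joined first, then each "▁" marker becomes a real space.
    return line.replace(" ", "").replace("\u2581", " ").strip()


def corpus_bleu_score(hypotheses, references):
    """Score desegmented hypotheses against plain-text references with sacrebleu.

    Hypothetical helper; both arguments are lists of strings of equal length.
    """
    import sacrebleu  # third-party dependency, imported only when scoring

    desegmented = [remove_sentencepiece(h) for h in hypotheses]
    # sacrebleu expects a list of reference streams, hence the extra brackets.
    return sacrebleu.corpus_bleu(desegmented, [references]).score


# Desegmentation alone does not need sacrebleu installed.
print(remove_sentencepiece("\u2581Mer ha ba \u2581d\u00fcnya"))
```

Scoring the desegmented text, instead of the pieces, keeps the BLEU numbers comparable with other systems that use a different subword vocabulary.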

esalesky closed this as completed Aug 3, 2022