training bleu is 0.0 #127
Comments
The batch size here is too small; you can't expect to have good results this way. You can try to use a larger batch size, or set:
Thanks, but if I use a larger batch size, say 4 or 8, it throws a CUDA out-of-memory error...
No, it won't affect GPU memory.
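For context: as far as I understand XLM's train.py, when "--tokens_per_batch" is set, batches are built by filling a token budget, so GPU memory is bounded by that budget (together with the sequence length) rather than by the number of sentences per batch. A minimal sketch of the two batching flags, with every other option kept exactly as in the training command in the log below; the concrete values and the OTHER_FLAGS placeholder are illustrative assumptions, not the (truncated) suggestion from the comment above:

    # OTHER_FLAGS is a hypothetical stand-in for all remaining options,
    # copied unchanged from the original training command in the log.
    python train.py --tokens_per_batch 2000 --batch_size 32 $OTHER_FLAGS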
That being said, a perplexity of 3000 is abnormally high; I suspect there is something wrong in your setup. Can you provide your full training log? I will have a look at it.
INFO - 07/14/19 11:17:37 - 0:00:00 - ============ Initialized logger ============
INFO - 07/14/19 11:17:37 - 0:00:00 - Running command: python train.py --exp_name unsupMT_enfr --dump_path './dumped/' --reload_model 'mlm_enfr_1024.pth,mlm_enfr_1024.pth' --data_path './data/processed/en-fr/' --lgs 'en-fr' --ae_steps 'en,fr' --bt_steps 'en-fr-en,fr-en-fr' --word_shuffle 3 --word_dropout '0.1' --word_blank '0.1' --lambda_ae '0:1,100000:0.1,300000:0' --encoder_only false --emb_dim 1024 --n_layers 6 --n_heads 8 --dropout '0.1' --attention_dropout '0.1' --gelu_activation true --tokens_per_batch 200 --batch_size 1 --bptt 256 --optimizer 'adam_inverse_sqrt,beta1=0.9,beta2=0.98,lr=0.0001' --epoch_size 7 --eval_bleu true --stopping_criterion 'valid_en-fr_mt_bleu,10' --validation_metrics 'valid_en-fr_mt_bleu'
WARNING - 07/14/19 11:17:37 - 0:00:00 - Signal handler installed.
INFO - 07/14/19 11:17:39 - 0:00:02 - Loading data from ./data/processed/en-fr/valid.en.pth ...
INFO - 07/14/19 11:17:39 - 0:00:02 - Loading data from ./data/processed/en-fr/test.en.pth ...
INFO - 07/14/19 11:17:40 - 0:00:02 - ============ Monolingual data (fr)
INFO - 07/14/19 11:17:42 - 0:00:05 - Loading data from ./data/processed/en-fr/valid.fr.pth ...
INFO - 07/14/19 11:17:42 - 0:00:05 - Loading data from ./data/processed/en-fr/test.fr.pth ...
INFO - 07/14/19 11:17:42 - 0:00:05 - ============ Parallel data (en-fr)
INFO - 07/14/19 11:17:42 - 0:00:05 - Loading data from ./data/processed/en-fr/test.en-fr.en.pth ...
INFO - 07/14/19 11:17:43 - 0:00:06 - ============ Data summary
INFO - 07/14/19 11:17:48 - 0:00:11 - Reloading encoder from mlm_enfr_1024.pth ...
Your issue comes from the "--epoch_size 7" parameter, which means that at each epoch your model only sees 7 samples. So currently there is essentially no training, as you can see in the logs where "beginning of epoch" is immediately followed by "end of epoch".
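If that is the cause, the fix would be to raise "--epoch_size" so that each epoch iterates over a meaningful number of sentences before validation runs. A minimal sketch, again using a hypothetical OTHER_FLAGS placeholder for the rest of the original command; treat the value 200000 as an assumption (a typical full-training setting for this repo), not a tuned recommendation:

    # Only the epoch size changes; OTHER_FLAGS is a hypothetical placeholder
    # for all other options from the original training command.
    python train.py --epoch_size 200000 $OTHER_FLAGS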
Thanks, I'll give it a try.
Closing for now; feel free to re-open if you have more issues.
I followed the instructions in this repo for en-fr unsupervised MT using the pretrained MLM model, and after 24 hours of training
the BLEU is 0.0. The parameters were set as follows:
tokens per batch: 200;
batch size: 2.
Everything else is the same as in the instructions.
My training log:
fr_mt_ppl": 3493.2766576812446, "valid_en-fr_mt_acc": 4.494762971483926, "va lid_en-fr_mt_bleu": 0.0, "valid_fr-en_mt_ppl": 5467.123569852876, "valid_fr- en_mt_acc": 4.613142299283623, "valid_fr-en_mt_bleu": 0.0, "test_en-fr_mt_pp l": 3884.4537842660484, "test_en-fr_mt_acc": 4.106139624415247, "test_en-fr_ mt_bleu": 0.0, "test_fr-en_mt_ppl": 6325.922849001634, "test_fr-en_mt_acc": 3.9660845355606176, "test_fr-en_mt_bleu": 0.0}
I only used a single GPU with 12GB of memory. If I only have one RTX 2080 Ti with 12GB of memory, how can I get good results, and how many hours of training would I need?