This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

training bleu is 0.0 #127

Closed
klauspa opened this issue Jul 15, 2019 · 8 comments


@klauspa

klauspa commented Jul 15, 2019

I followed the instructions in this repo to run en-fr unsupervised MT starting from the pretrained MLM model. After 24 hours of training, the BLEU is still 0.0. The parameters I set are:
tokens per batch: 200
batch size: 2

Everything else is the same as in the instructions.
My training log:
fr_mt_ppl": 3493.2766576812446, "valid_en-fr_mt_acc": 4.494762971483926, "valid_en-fr_mt_bleu": 0.0, "valid_fr-en_mt_ppl": 5467.123569852876, "valid_fr-en_mt_acc": 4.613142299283623, "valid_fr-en_mt_bleu": 0.0, "test_en-fr_mt_ppl": 3884.4537842660484, "test_en-fr_mt_acc": 4.106139624415247, "test_en-fr_mt_bleu": 0.0, "test_fr-en_mt_ppl": 6325.922849001634, "test_fr-en_mt_acc": 3.9660845355606176, "test_fr-en_mt_bleu": 0.0}

I only have a single GPU, an RTX 2080 Ti with 12GB of memory. How can I get good results with it, and how many hours of training would I need?

@glample
Contributor

glample commented Jul 15, 2019

The batch size here is too small; you can't expect good results this way. You can try a larger batch size, or set --accumulate_gradients 4 to multiply the effective batch size by 4. This way, the model will accumulate gradients over 4 forward/backward passes before doing an optimizer step.
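To illustrate why accumulating gradients behaves like a larger batch, here is a minimal pure-Python sketch. It is not XLM code: the one-parameter model, the data, and the helper names `grad_mse` and `accumulated_grad` are made up for illustration. It assumes equally sized micro-batches and a loss averaged over the batch, in which case the averaged micro-batch gradients match the gradient of the full batch.

```python
# Minimal pure-Python illustration of gradient accumulation (not XLM code).
# Model: y = w * x, loss = mean squared error over the batch.

def grad_mse(w, batch):
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, micro_batches):
    """Average the gradients of equally sized micro-batches.

    Each micro-batch is processed on its own (one forward/backward pass),
    so only one micro-batch has to be held in memory at a time.
    """
    total = 0.0
    for mb in micro_batches:
        total += grad_mse(w, mb)
    return total / len(micro_batches)


# Made-up (x, y) pairs, roughly following y = 2x.
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2),
        (5.0, 9.8), (6.0, 12.1), (7.0, 14.0), (8.0, 15.9)]
w = 0.5
n_accum = 4  # plays the role of --accumulate_gradients 4
size = len(data) // n_accum
micro_batches = [data[i:i + size] for i in range(0, len(data), size)]

g_full = grad_mse(w, data)                    # one batch of 8 examples
g_accum = accumulated_grad(w, micro_batches)  # 4 micro-batches of 2 examples
print(abs(g_full - g_accum) < 1e-9)
```

The optimizer step would then use `g_accum` once, after all 4 micro-batches, instead of stepping after each one.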

@klauspa
Author

klauspa commented Jul 15, 2019

> The batch size here is too small; you can't expect good results this way. You can try a larger batch size, or set --accumulate_gradients 4 to multiply the effective batch size by 4. This way, the model will accumulate gradients over 4 forward/backward passes before doing an optimizer step.

Thanks, but if I use a larger batch size, say 4 or 8, I get a CUDA out-of-memory error.
So if I set --accumulate_gradients 4, will it use more GPU memory?
And will the tokens-per-batch parameter affect the training results?

@glample
Contributor

glample commented Jul 15, 2019

No, it won't affect GPU memory. --accumulate_gradients N is basically designed to give you N times larger batches for the same memory.

--batch_size is ignored if --tokens_per_batch is specified. For MT, it is better to use --tokens_per_batch, set to the largest possible value that fits in memory.
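As a rough picture of what a token budget means, the sketch below groups sentences greedily so that each batch stays within a given number of tokens. This is a simplified illustration and not XLM's actual batching code: the function name `batches_by_tokens` and the sentence lengths are hypothetical, and padding is ignored.

```python
def batches_by_tokens(sentence_lengths, tokens_per_batch):
    """Greedily group sentence indices so each batch holds at most
    tokens_per_batch tokens. A sentence longer than the budget ends up
    alone in its own batch."""
    batches, current, current_tokens = [], [], 0
    for idx, n_tokens in enumerate(sentence_lengths):
        # Close the current batch if adding this sentence would exceed the budget.
        if current and current_tokens + n_tokens > tokens_per_batch:
            batches.append(current)
            current, current_tokens = [], 0
        current.append(idx)
        current_tokens += n_tokens
    if current:
        batches.append(current)
    return batches


lengths = [26, 30, 24, 80, 100, 12, 15, 40]  # made-up sentence lengths
small = batches_by_tokens(lengths, 200)      # tight budget, more batches
large = batches_by_tokens(lengths, 2000)     # loose budget, fewer batches
print(len(small), len(large))
```

With a small budget, each optimizer step sees only a handful of sentences, which is why a larger value (or gradient accumulation) is suggested above.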

@glample
Contributor

glample commented Jul 15, 2019

That being said, a perplexity of 3000 is abnormally high. I suspect there is something wrong in your setup. Can you provide your full training log? I will have a look at it.
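For a sense of scale, the sketch below assumes the usual definition of perplexity, the exponential of the average negative log-likelihood per token (whether XLM computes it exactly this way is an assumption here). Under that definition, a model that guesses uniformly over the 64139-word vocabulary reported in the training log in this thread has a perplexity equal to the vocabulary size. The function name `perplexity` is hypothetical.

```python
import math


def perplexity(token_log_probs):
    """exp of the average negative log-likelihood per token."""
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)


vocab_size = 64139  # vocabulary size reported in the training log

# A model assigning probability 1/vocab_size to every reference token.
uniform = perplexity([math.log(1.0 / vocab_size)] * 10)

# A model assigning probability 0.5 to every reference token.
confident = perplexity([math.log(0.5)] * 4)

print(round(uniform), round(confident, 6))
```

Read this way, a perplexity in the thousands means the model is, on average, about as uncertain as if it were choosing among thousands of equally likely tokens at each position.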

@klauspa
Author

klauspa commented Jul 15, 2019

> That being said, a perplexity of 3000 is abnormally high. I suspect there is something wrong in your setup. Can you provide your full training log? I will have a look at it.

INFO - 07/14/19 11:17:37 - 0:00:00 - ============ Initialized logger ============
INFO - 07/14/19 11:17:37 - 0:00:00 - ae_steps: ['en', 'fr']
asm: False
attention_dropout: 0.1
batch_size: 1
beam_size: 1
bptt: 256
bt_src_langs: ['en', 'fr']
bt_steps: [('en', 'fr', 'en'), ('fr', 'en', 'fr')]
clip_grad_norm: 5
clm_steps: []
command: python train.py --exp_name unsupMT_enfr --dump_path './dumped/' --reload_model 'mlm_enfr_1024.pth,mlm_enfr_1024.pth' --data_path './data/processed/en-fr/' --lgs 'en-fr' --ae_steps 'en,fr' --bt_steps 'en-fr-en,fr-en-fr' --word_shuffle 3 --word_dropout '0.1' --word_blank '0.1' --lambda_ae '0:1,100000:0.1,300000:0' --encoder_only false --emb_dim 1024 --n_layers 6 --n_heads 8 --dropout '0.1' --attention_dropout '0.1' --gelu_activation true --tokens_per_batch 200 --batch_size 1 --bptt 256 --optimizer 'adam_inverse_sqrt,beta1=0.9,beta2=0.98,lr=0.0001' --epoch_size 7 --eval_bleu true --stopping_criterion 'valid_en-fr_mt_bleu,10' --validation_metrics 'valid_en-fr_mt_bleu' --exp_id "xv7ewg9r3h"
context_size: 0
data_path: ./data/processed/en-fr/
debug_slurm: False
debug_train: False
dropout: 0.1
dump_path: ./dumped/unsupMT_enfr/xv7ewg9r3h
early_stopping: False
emb_dim: 1024
encoder_only: False
epoch_size: 7
eval_bleu: True
eval_only: False
exp_id: xv7ewg9r3h
exp_name: unsupMT_enfr
fp16: False
gelu_activation: True
global_rank: 0
group_by_size: True
id2lang: {0: 'en', 1: 'fr'}
is_master: True
is_slurm_job: False
lambda_ae: 0:1,100000:0.1,300000:0
lambda_bt: 1
lambda_clm: 1
lambda_mlm: 1
lambda_mt: 1
lambda_pc: 1
lang2id: {'en': 0, 'fr': 1}
langs: ['en', 'fr']
length_penalty: 1
lg_sampling_factor: -1
lgs: en-fr
local_rank: 0
master_port: -1
max_batch_size: 0
max_epoch: 100000
max_len: 100
max_vocab: -1
min_count: 0
mlm_steps: []
mono_dataset: {'en': {'train': './data/processed/en-fr/train.en.pth', 'valid': './data/processed/en-fr/valid.en.pth', 'test': './data/processed/en-fr/test.en.pth'}, 'fr': {'train': './data/processed/en-fr/train.fr.pth', 'valid': './data/processed/en-fr/valid.fr.pth', 'test': './data/processed/en-fr/test.fr.pth'}}
mt_steps: []
multi_gpu: False
multi_node: False
n_gpu_per_node: 1
n_heads: 8
n_langs: 2
n_layers: 6
n_nodes: 1
node_id: 0
optimizer: adam_inverse_sqrt,beta1=0.9,beta2=0.98,lr=0.0001
para_dataset: {('en', 'fr'): {'valid': ('./data/processed/en-fr/valid.en-fr.en.pth', './data/processed/en-fr/valid.en-fr.fr.pth'), 'test': ('./data/processed/en-fr/test.en-fr.en.pth', './data/processed/en-fr/test.en-fr.fr.pth')}}
pc_steps: []
reload_checkpoint:
reload_emb:
reload_model: mlm_enfr_1024.pth,mlm_enfr_1024.pth
sample_alpha: 0
save_periodic: 0
share_inout_emb: True
sinusoidal_embeddings: False
split_data: False
stopping_criterion: valid_en-fr_mt_bleu,10
tokens_per_batch: 200
validation_metrics: valid_en-fr_mt_bleu
word_blank: 0.1
word_dropout: 0.1
word_keep: 0.1
word_mask: 0.8
word_mask_keep_rand: 0.8,0.1,0.1
word_pred: 0.15
word_rand: 0.1
word_shuffle: 3.0
world_size: 1
INFO - 07/14/19 11:17:37 - 0:00:00 - The experiment will be stored in ./dumped/unsupMT_enfr/xv7ewg9r3h

INFO - 07/14/19 11:17:37 - 0:00:00 - Running command: python train.py --exp_name unsupMT_enfr --dump_path './dumped/' --reload_model 'mlm_enfr_1024.pth,mlm_enfr_1024.pth' --data_path './data/processed/en-fr/' --lgs 'en-fr' --ae_steps 'en,fr' --bt_steps 'en-fr-en,fr-en-fr' --word_shuffle 3 --word_dropout '0.1' --word_blank '0.1' --lambda_ae '0:1,100000:0.1,300000:0' --encoder_only false --emb_dim 1024 --n_layers 6 --n_heads 8 --dropout '0.1' --attention_dropout '0.1' --gelu_activation true --tokens_per_batch 200 --batch_size 1 --bptt 256 --optimizer 'adam_inverse_sqrt,beta1=0.9,beta2=0.98,lr=0.0001' --epoch_size 7 --eval_bleu true --stopping_criterion 'valid_en-fr_mt_bleu,10' --validation_metrics 'valid_en-fr_mt_bleu'

WARNING - 07/14/19 11:17:37 - 0:00:00 - Signal handler installed.
INFO - 07/14/19 11:17:37 - 0:00:00 - ============ Monolingual data (en)
INFO - 07/14/19 11:17:37 - 0:00:00 - Loading data from ./data/processed/en-fr/train.en.pth ...
INFO - 07/14/19 11:17:38 - 0:00:01 - 129033877 words (64139 unique) in 5000000 sentences. 0 unknown words (0 unique) covering 0.00% of the data.
INFO - 07/14/19 11:17:39 - 0:00:02 - Removed 0 empty sentences.
INFO - 07/14/19 11:17:39 - 0:00:02 - Removed 12831 too long sentences.

INFO - 07/14/19 11:17:39 - 0:00:02 - Loading data from ./data/processed/en-fr/valid.en.pth ...
INFO - 07/14/19 11:17:39 - 0:00:02 - 69727 words (64139 unique) in 3000 sentences. 1 unknown words (1 unique) covering 0.00% of the data.

INFO - 07/14/19 11:17:39 - 0:00:02 - Loading data from ./data/processed/en-fr/test.en.pth ...
INFO - 07/14/19 11:17:39 - 0:00:02 - 76017 words (64139 unique) in 3003 sentences. 0 unknown words (0 unique) covering 0.00% of the data.

INFO - 07/14/19 11:17:40 - 0:00:02 - ============ Monolingual data (fr)
INFO - 07/14/19 11:17:40 - 0:00:02 - Loading data from ./data/processed/en-fr/train.fr.pth ...
INFO - 07/14/19 11:17:40 - 0:00:03 - 130884578 words (64139 unique) in 5000000 sentences. 0 unknown words (0 unique) covering 0.00% of the data.
INFO - 07/14/19 11:17:41 - 0:00:04 - Removed 0 empty sentences.
INFO - 07/14/19 11:17:42 - 0:00:05 - Removed 17108 too long sentences.

INFO - 07/14/19 11:17:42 - 0:00:05 - Loading data from ./data/processed/en-fr/valid.fr.pth ...
INFO - 07/14/19 11:17:42 - 0:00:05 - 79585 words (64139 unique) in 3000 sentences. 1 unknown words (1 unique) covering 0.00% of the data.

INFO - 07/14/19 11:17:42 - 0:00:05 - Loading data from ./data/processed/en-fr/test.fr.pth ...
INFO - 07/14/19 11:17:42 - 0:00:05 - 86351 words (64139 unique) in 3003 sentences. 0 unknown words (0 unique) covering 0.00% of the data.

INFO - 07/14/19 11:17:42 - 0:00:05 - ============ Parallel data (en-fr)
INFO - 07/14/19 11:17:42 - 0:00:05 - Loading data from ./data/processed/en-fr/valid.en-fr.en.pth ...
INFO - 07/14/19 11:17:42 - 0:00:05 - 69727 words (64139 unique) in 3000 sentences. 1 unknown words (1 unique) covering 0.00% of the data.
INFO - 07/14/19 11:17:42 - 0:00:05 - Loading data from ./data/processed/en-fr/valid.en-fr.fr.pth ...
INFO - 07/14/19 11:17:42 - 0:00:05 - 79585 words (64139 unique) in 3000 sentences. 1 unknown words (1 unique) covering 0.00% of the data.
INFO - 07/14/19 11:17:42 - 0:00:05 - Removed 0 empty sentences.

INFO - 07/14/19 11:17:42 - 0:00:05 - Loading data from ./data/processed/en-fr/test.en-fr.en.pth ...
INFO - 07/14/19 11:17:42 - 0:00:05 - 76017 words (64139 unique) in 3003 sentences. 0 unknown words (0 unique) covering 0.00% of the data.
INFO - 07/14/19 11:17:42 - 0:00:05 - Loading data from ./data/processed/en-fr/test.en-fr.fr.pth ...
INFO - 07/14/19 11:17:42 - 0:00:05 - 86351 words (64139 unique) in 3003 sentences. 0 unknown words (0 unique) covering 0.00% of the data.
INFO - 07/14/19 11:17:43 - 0:00:06 - Removed 0 empty sentences.

INFO - 07/14/19 11:17:43 - 0:00:06 - ============ Data summary
INFO - 07/14/19 11:17:43 - 0:00:06 - Monolingual data - train - en: 5000000
INFO - 07/14/19 11:17:43 - 0:00:06 - Monolingual data - valid - en: 3000
INFO - 07/14/19 11:17:43 - 0:00:06 - Monolingual data - test - en: 3003
INFO - 07/14/19 11:17:43 - 0:00:06 - Monolingual data - train - fr: 5000000
INFO - 07/14/19 11:17:43 - 0:00:06 - Monolingual data - valid - fr: 3000
INFO - 07/14/19 11:17:43 - 0:00:06 - Monolingual data - test - fr: 3003
INFO - 07/14/19 11:17:43 - 0:00:06 - Parallel data - valid - en-fr: 3000
INFO - 07/14/19 11:17:43 - 0:00:06 - Parallel data - test - en-fr: 3003

INFO - 07/14/19 11:17:48 - 0:00:11 - Reloading encoder from mlm_enfr_1024.pth ...
INFO - 07/14/19 11:17:54 - 0:00:17 - Reloading decoder from mlm_enfr_1024.pth ...
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter layer_norm15.0.weight not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter layer_norm15.0.bias not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter encoder_attn.0.q_lin.weight not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter encoder_attn.0.q_lin.bias not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter encoder_attn.0.k_lin.weight not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter encoder_attn.0.k_lin.bias not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter encoder_attn.0.v_lin.weight not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter encoder_attn.0.v_lin.bias not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter encoder_attn.0.out_lin.weight not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter encoder_attn.0.out_lin.bias not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter layer_norm15.1.weight not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter layer_norm15.1.bias not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter encoder_attn.1.q_lin.weight not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter encoder_attn.1.q_lin.bias not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter encoder_attn.1.k_lin.weight not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter encoder_attn.1.k_lin.bias not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter encoder_attn.1.v_lin.weight not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter encoder_attn.1.v_lin.bias not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter encoder_attn.1.out_lin.weight not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter encoder_attn.1.out_lin.bias not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter layer_norm15.2.weight not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter layer_norm15.2.bias not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter encoder_attn.2.q_lin.weight not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter encoder_attn.2.q_lin.bias not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter encoder_attn.2.k_lin.weight not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter encoder_attn.2.k_lin.bias not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter encoder_attn.2.v_lin.weight not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter encoder_attn.2.v_lin.bias not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter encoder_attn.2.out_lin.weight not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter encoder_attn.2.out_lin.bias not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter layer_norm15.3.weight not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter layer_norm15.3.bias not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter encoder_attn.3.q_lin.weight not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter encoder_attn.3.q_lin.bias not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter encoder_attn.3.k_lin.weight not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter encoder_attn.3.k_lin.bias not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter encoder_attn.3.v_lin.weight not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter encoder_attn.3.v_lin.bias not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter encoder_attn.3.out_lin.weight not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter encoder_attn.3.out_lin.bias not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter layer_norm15.4.weight not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter layer_norm15.4.bias not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter encoder_attn.4.q_lin.weight not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter encoder_attn.4.q_lin.bias not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter encoder_attn.4.k_lin.weight not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter encoder_attn.4.k_lin.bias not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter encoder_attn.4.v_lin.weight not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter encoder_attn.4.v_lin.bias not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter encoder_attn.4.out_lin.weight not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter encoder_attn.4.out_lin.bias not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter layer_norm15.5.weight not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter layer_norm15.5.bias not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter encoder_attn.5.q_lin.weight not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter encoder_attn.5.q_lin.bias not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter encoder_attn.5.k_lin.weight not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter encoder_attn.5.k_lin.bias not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter encoder_attn.5.v_lin.weight not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter encoder_attn.5.v_lin.bias not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter encoder_attn.5.out_lin.weight not found.
WARNING - 07/14/19 11:17:55 - 0:00:18 - Parameter encoder_attn.5.out_lin.bias not found.
DEBUG - 07/14/19 11:17:56 - 0:00:19 - Encoder: TransformerModel(
(position_embeddings): Embedding(512, 1024)
(lang_embeddings): Embedding(2, 1024)
(embeddings): Embedding(64139, 1024, padding_idx=2)
(layer_norm_emb): LayerNorm(torch.Size([1024]), eps=1e-12, elementwise_affine=True)
(attentions): ModuleList(
(0): MultiHeadAttention(
(q_lin): Linear(in_features=1024, out_features=1024, bias=True)
(k_lin): Linear(in_features=1024, out_features=1024, bias=True)
(v_lin): Linear(in_features=1024, out_features=1024, bias=True)
(out_lin): Linear(in_features=1024, out_features=1024, bias=True)
)
(1): MultiHeadAttention(
(q_lin): Linear(in_features=1024, out_features=1024, bias=True)
(k_lin): Linear(in_features=1024, out_features=1024, bias=True)
(v_lin): Linear(in_features=1024, out_features=1024, bias=True)
(out_lin): Linear(in_features=1024, out_features=1024, bias=True)
)
(2): MultiHeadAttention(
(q_lin): Linear(in_features=1024, out_features=1024, bias=True)
(k_lin): Linear(in_features=1024, out_features=1024, bias=True)
(v_lin): Linear(in_features=1024, out_features=1024, bias=True)
(out_lin): Linear(in_features=1024, out_features=1024, bias=True)
)
(3): MultiHeadAttention(
(q_lin): Linear(in_features=1024, out_features=1024, bias=True)
(k_lin): Linear(in_features=1024, out_features=1024, bias=True)
(v_lin): Linear(in_features=1024, out_features=1024, bias=True)
(out_lin): Linear(in_features=1024, out_features=1024, bias=True)
)
(4): MultiHeadAttention(
(q_lin): Linear(in_features=1024, out_features=1024, bias=True)
(k_lin): Linear(in_features=1024, out_features=1024, bias=True)
(v_lin): Linear(in_features=1024, out_features=1024, bias=True)
(out_lin): Linear(in_features=1024, out_features=1024, bias=True)
)
(5): MultiHeadAttention(
(q_lin): Linear(in_features=1024, out_features=1024, bias=True)
(k_lin): Linear(in_features=1024, out_features=1024, bias=True)
(v_lin): Linear(in_features=1024, out_features=1024, bias=True)
(out_lin): Linear(in_features=1024, out_features=1024, bias=True)
)
)
(layer_norm1): ModuleList(
(0): LayerNorm(torch.Size([1024]), eps=1e-12, elementwise_affine=True)
(1): LayerNorm(torch.Size([1024]), eps=1e-12, elementwise_affine=True)
(2): LayerNorm(torch.Size([1024]), eps=1e-12, elementwise_affine=True)
(3): LayerNorm(torch.Size([1024]), eps=1e-12, elementwise_affine=True)
(4): LayerNorm(torch.Size([1024]), eps=1e-12, elementwise_affine=True)
(5): LayerNorm(torch.Size([1024]), eps=1e-12, elementwise_affine=True)
)
(ffns): ModuleList(
(0): TransformerFFN(
(lin1): Linear(in_features=1024, out_features=4096, bias=True)
(lin2): Linear(in_features=4096, out_features=1024, bias=True)
)
(1): TransformerFFN(
(lin1): Linear(in_features=1024, out_features=4096, bias=True)
(lin2): Linear(in_features=4096, out_features=1024, bias=True)
)
(2): TransformerFFN(
(lin1): Linear(in_features=1024, out_features=4096, bias=True)
(lin2): Linear(in_features=4096, out_features=1024, bias=True)
)
(3): TransformerFFN(
(lin1): Linear(in_features=1024, out_features=4096, bias=True)
(lin2): Linear(in_features=4096, out_features=1024, bias=True)
)
(4): TransformerFFN(
(lin1): Linear(in_features=1024, out_features=4096, bias=True)
(lin2): Linear(in_features=4096, out_features=1024, bias=True)
)
(5): TransformerFFN(
(lin1): Linear(in_features=1024, out_features=4096, bias=True)
(lin2): Linear(in_features=4096, out_features=1024, bias=True)
)
)
(layer_norm2): ModuleList(
(0): LayerNorm(torch.Size([1024]), eps=1e-12, elementwise_affine=True)
(1): LayerNorm(torch.Size([1024]), eps=1e-12, elementwise_affine=True)
(2): LayerNorm(torch.Size([1024]), eps=1e-12, elementwise_affine=True)
(3): LayerNorm(torch.Size([1024]), eps=1e-12, elementwise_affine=True)
(4): LayerNorm(torch.Size([1024]), eps=1e-12, elementwise_affine=True)
(5): LayerNorm(torch.Size([1024]), eps=1e-12, elementwise_affine=True)
)
(pred_layer): PredLayer(
(proj): Linear(in_features=1024, out_features=64139, bias=True)
)
)
DEBUG - 07/14/19 11:17:56 - 0:00:19 - Decoder: TransformerModel(
(position_embeddings): Embedding(512, 1024)
(lang_embeddings): Embedding(2, 1024)
(embeddings): Embedding(64139, 1024, padding_idx=2)
(layer_norm_emb): LayerNorm(torch.Size([1024]), eps=1e-12, elementwise_affine=True)
(attentions): ModuleList(
(0): MultiHeadAttention(
(q_lin): Linear(in_features=1024, out_features=1024, bias=True)
(k_lin): Linear(in_features=1024, out_features=1024, bias=True)
(v_lin): Linear(in_features=1024, out_features=1024, bias=True)
(out_lin): Linear(in_features=1024, out_features=1024, bias=True)
)
(1): MultiHeadAttention(
(q_lin): Linear(in_features=1024, out_features=1024, bias=True)
(k_lin): Linear(in_features=1024, out_features=1024, bias=True)
(v_lin): Linear(in_features=1024, out_features=1024, bias=True)
(out_lin): Linear(in_features=1024, out_features=1024, bias=True)
)
(2): MultiHeadAttention(
(q_lin): Linear(in_features=1024, out_features=1024, bias=True)
(k_lin): Linear(in_features=1024, out_features=1024, bias=True)
(v_lin): Linear(in_features=1024, out_features=1024, bias=True)
(out_lin): Linear(in_features=1024, out_features=1024, bias=True)
)
(3): MultiHeadAttention(
(q_lin): Linear(in_features=1024, out_features=1024, bias=True)
(k_lin): Linear(in_features=1024, out_features=1024, bias=True)
(v_lin): Linear(in_features=1024, out_features=1024, bias=True)
(out_lin): Linear(in_features=1024, out_features=1024, bias=True)
)
(4): MultiHeadAttention(
(q_lin): Linear(in_features=1024, out_features=1024, bias=True)
(k_lin): Linear(in_features=1024, out_features=1024, bias=True)
(v_lin): Linear(in_features=1024, out_features=1024, bias=True)
(out_lin): Linear(in_features=1024, out_features=1024, bias=True)
)
(5): MultiHeadAttention(
(q_lin): Linear(in_features=1024, out_features=1024, bias=True)
(k_lin): Linear(in_features=1024, out_features=1024, bias=True)
(v_lin): Linear(in_features=1024, out_features=1024, bias=True)
(out_lin): Linear(in_features=1024, out_features=1024, bias=True)
)
)
(layer_norm1): ModuleList(
(0): LayerNorm(torch.Size([1024]), eps=1e-12, elementwise_affine=True)
(1): LayerNorm(torch.Size([1024]), eps=1e-12, elementwise_affine=True)
(2): LayerNorm(torch.Size([1024]), eps=1e-12, elementwise_affine=True)
(3): LayerNorm(torch.Size([1024]), eps=1e-12, elementwise_affine=True)
(4): LayerNorm(torch.Size([1024]), eps=1e-12, elementwise_affine=True)
(5): LayerNorm(torch.Size([1024]), eps=1e-12, elementwise_affine=True)
)
(ffns): ModuleList(
(0): TransformerFFN(
(lin1): Linear(in_features=1024, out_features=4096, bias=True)
(lin2): Linear(in_features=4096, out_features=1024, bias=True)
)
(1): TransformerFFN(
(lin1): Linear(in_features=1024, out_features=4096, bias=True)
(lin2): Linear(in_features=4096, out_features=1024, bias=True)
)
(2): TransformerFFN(
(lin1): Linear(in_features=1024, out_features=4096, bias=True)
(lin2): Linear(in_features=4096, out_features=1024, bias=True)
)
(3): TransformerFFN(
(lin1): Linear(in_features=1024, out_features=4096, bias=True)
(lin2): Linear(in_features=4096, out_features=1024, bias=True)
)
(4): TransformerFFN(
(lin1): Linear(in_features=1024, out_features=4096, bias=True)
(lin2): Linear(in_features=4096, out_features=1024, bias=True)
)
(5): TransformerFFN(
(lin1): Linear(in_features=1024, out_features=4096, bias=True)
(lin2): Linear(in_features=4096, out_features=1024, bias=True)
)
)
(layer_norm2): ModuleList(
(0): LayerNorm(torch.Size([1024]), eps=1e-12, elementwise_affine=True)
(1): LayerNorm(torch.Size([1024]), eps=1e-12, elementwise_affine=True)
(2): LayerNorm(torch.Size([1024]), eps=1e-12, elementwise_affine=True)
(3): LayerNorm(torch.Size([1024]), eps=1e-12, elementwise_affine=True)
(4): LayerNorm(torch.Size([1024]), eps=1e-12, elementwise_affine=True)
(5): LayerNorm(torch.Size([1024]), eps=1e-12, elementwise_affine=True)
)
(layer_norm15): ModuleList(
(0): LayerNorm(torch.Size([1024]), eps=1e-12, elementwise_affine=True)
(1): LayerNorm(torch.Size([1024]), eps=1e-12, elementwise_affine=True)
(2): LayerNorm(torch.Size([1024]), eps=1e-12, elementwise_affine=True)
(3): LayerNorm(torch.Size([1024]), eps=1e-12, elementwise_affine=True)
(4): LayerNorm(torch.Size([1024]), eps=1e-12, elementwise_affine=True)
(5): LayerNorm(torch.Size([1024]), eps=1e-12, elementwise_affine=True)
)
(encoder_attn): ModuleList(
(0): MultiHeadAttention(
(q_lin): Linear(in_features=1024, out_features=1024, bias=True)
(k_lin): Linear(in_features=1024, out_features=1024, bias=True)
(v_lin): Linear(in_features=1024, out_features=1024, bias=True)
(out_lin): Linear(in_features=1024, out_features=1024, bias=True)
)
(1): MultiHeadAttention(
(q_lin): Linear(in_features=1024, out_features=1024, bias=True)
(k_lin): Linear(in_features=1024, out_features=1024, bias=True)
(v_lin): Linear(in_features=1024, out_features=1024, bias=True)
(out_lin): Linear(in_features=1024, out_features=1024, bias=True)
)
(2): MultiHeadAttention(
(q_lin): Linear(in_features=1024, out_features=1024, bias=True)
(k_lin): Linear(in_features=1024, out_features=1024, bias=True)
(v_lin): Linear(in_features=1024, out_features=1024, bias=True)
(out_lin): Linear(in_features=1024, out_features=1024, bias=True)
)
(3): MultiHeadAttention(
(q_lin): Linear(in_features=1024, out_features=1024, bias=True)
(k_lin): Linear(in_features=1024, out_features=1024, bias=True)
(v_lin): Linear(in_features=1024, out_features=1024, bias=True)
(out_lin): Linear(in_features=1024, out_features=1024, bias=True)
)
(4): MultiHeadAttention(
(q_lin): Linear(in_features=1024, out_features=1024, bias=True)
(k_lin): Linear(in_features=1024, out_features=1024, bias=True)
(v_lin): Linear(in_features=1024, out_features=1024, bias=True)
(out_lin): Linear(in_features=1024, out_features=1024, bias=True)
)
(5): MultiHeadAttention(
(q_lin): Linear(in_features=1024, out_features=1024, bias=True)
(k_lin): Linear(in_features=1024, out_features=1024, bias=True)
(v_lin): Linear(in_features=1024, out_features=1024, bias=True)
(out_lin): Linear(in_features=1024, out_features=1024, bias=True)
)
)
(pred_layer): PredLayer(
(proj): Linear(in_features=1024, out_features=64139, bias=True)
)
)
INFO - 07/14/19 11:17:56 - 0:00:19 - Number of parameters (encoder): 141848203
INFO - 07/14/19 11:17:56 - 0:00:19 - Number of parameters (decoder): 167050891
INFO - 07/14/19 11:17:59 - 0:00:22 - ============ Starting epoch 0 ... ============
INFO - 07/14/19 11:17:59 - 0:00:22 - Creating new training data iterator (ae,fr) ...
INFO - 07/14/19 11:18:04 - 0:00:27 - Creating new training data iterator (ae,en) ...
INFO - 07/14/19 11:18:10 - 0:00:32 - Creating new training data iterator (bt,en) ...
INFO - 07/14/19 11:18:15 - 0:00:38 - Creating new training data iterator (bt,fr) ...
INFO - 07/14/19 11:18:23 - 0:00:46 - ============ End of epoch 0 ============
INFO - 07/14/19 11:47:21 - 0:29:44 - BLEU ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/hyp0.en-fr.valid.txt ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/ref.en-fr.valid.txt : 0.000000
INFO - 07/14/19 12:19:23 - 1:01:46 - BLEU ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/hyp0.fr-en.valid.txt ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/ref.fr-en.valid.txt : 0.000000
INFO - 07/14/19 12:50:08 - 1:32:31 - BLEU ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/hyp0.en-fr.test.txt ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/ref.en-fr.test.txt : 0.000000
INFO - 07/14/19 13:24:07 - 2:06:30 - BLEU ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/hyp0.fr-en.test.txt ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/ref.fr-en.test.txt : 0.000000
INFO - 07/14/19 13:24:07 - 2:06:30 - epoch -> 0.000000
INFO - 07/14/19 13:24:07 - 2:06:30 - valid_en-fr_mt_ppl -> 78172.218551
INFO - 07/14/19 13:24:07 - 2:06:30 - valid_en-fr_mt_acc -> 3.607193
INFO - 07/14/19 13:24:07 - 2:06:30 - valid_en-fr_mt_bleu -> 0.000000
INFO - 07/14/19 13:24:07 - 2:06:30 - valid_fr-en_mt_ppl -> 33934.308693
INFO - 07/14/19 13:24:07 - 2:06:30 - valid_fr-en_mt_acc -> 3.521388
INFO - 07/14/19 13:24:07 - 2:06:30 - valid_fr-en_mt_bleu -> 0.000000
INFO - 07/14/19 13:24:07 - 2:06:30 - test_en-fr_mt_ppl -> 89573.972415
INFO - 07/14/19 13:24:07 - 2:06:30 - test_en-fr_mt_acc -> 3.169416
INFO - 07/14/19 13:24:07 - 2:06:30 - test_en-fr_mt_bleu -> 0.000000
INFO - 07/14/19 13:24:07 - 2:06:30 - test_fr-en_mt_ppl -> 38548.154644
INFO - 07/14/19 13:24:07 - 2:06:30 - test_fr-en_mt_acc -> 3.056188
INFO - 07/14/19 13:24:07 - 2:06:30 - test_fr-en_mt_bleu -> 0.000000
INFO - 07/14/19 13:24:07 - 2:06:30 - log:{"epoch": 0, "valid_en-fr_mt_ppl": 78172.21855144881, "valid_en-fr_mt_acc": 3.6071925894532906, "valid_en-fr_mt_bleu": 0.0, "valid_fr-en_mt_ppl": 33934.30869291317, "valid_fr-en_mt_acc": 3.5213882052057697, "valid_fr-en_mt_bleu": 0.0, "test_en-fr_mt_ppl": 89573.97241452952, "test_en-fr_mt_acc": 3.169416030619782, "test_en-fr_mt_bleu": 0.0, "test_fr-en_mt_ppl": 38548.15464370883, "test_fr-en_mt_acc": 3.056188306757783, "test_fr-en_mt_bleu": 0.0}
INFO - 07/14/19 13:24:07 - 2:06:30 - New best score for valid_en-fr_mt_bleu: 0.000000
INFO - 07/14/19 13:24:07 - 2:06:30 - Saving models to ./dumped/unsupMT_enfr/xv7ewg9r3h/best-valid_en-fr_mt_bleu.pth ...
INFO - 07/14/19 13:24:09 - 2:06:32 - New best validation score: 0.000000
INFO - 07/14/19 13:24:09 - 2:06:32 - Saving checkpoint to ./dumped/unsupMT_enfr/xv7ewg9r3h/checkpoint.pth ...
INFO - 07/14/19 13:24:29 - 2:06:51 - ============ Starting epoch 1 ... ============
INFO - 07/14/19 13:24:32 - 2:06:55 - ============ End of epoch 1 ============
INFO - 07/14/19 13:53:30 - 2:35:53 - BLEU ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/hyp1.en-fr.valid.txt ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/ref.en-fr.valid.txt : 0.000000
INFO - 07/14/19 14:25:37 - 3:08:00 - BLEU ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/hyp1.fr-en.valid.txt ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/ref.fr-en.valid.txt : 0.000000
INFO - 07/14/19 14:56:28 - 3:38:51 - BLEU ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/hyp1.en-fr.test.txt ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/ref.en-fr.test.txt : 0.000000
INFO - 07/14/19 15:30:33 - 4:12:56 - BLEU ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/hyp1.fr-en.test.txt ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/ref.fr-en.test.txt : 0.000000
INFO - 07/14/19 15:30:33 - 4:12:56 - epoch -> 1.000000
INFO - 07/14/19 15:30:33 - 4:12:56 - valid_en-fr_mt_ppl -> 51439.354819
INFO - 07/14/19 15:30:33 - 4:12:56 - valid_en-fr_mt_acc -> 3.414664
INFO - 07/14/19 15:30:33 - 4:12:56 - valid_en-fr_mt_bleu -> 0.000000
INFO - 07/14/19 15:30:33 - 4:12:56 - valid_fr-en_mt_ppl -> 28891.413743
INFO - 07/14/19 15:30:33 - 4:12:56 - valid_fr-en_mt_acc -> 3.414138
INFO - 07/14/19 15:30:33 - 4:12:56 - valid_fr-en_mt_bleu -> 0.000000
INFO - 07/14/19 15:30:33 - 4:12:56 - test_en-fr_mt_ppl -> 60137.298878
INFO - 07/14/19 15:30:33 - 4:12:56 - test_en-fr_mt_acc -> 3.069812
INFO - 07/14/19 15:30:33 - 4:12:56 - test_en-fr_mt_bleu -> 0.000000
INFO - 07/14/19 15:30:33 - 4:12:56 - test_fr-en_mt_ppl -> 33063.678543
INFO - 07/14/19 15:30:33 - 4:12:56 - test_fr-en_mt_acc -> 3.029613
INFO - 07/14/19 15:30:33 - 4:12:56 - test_fr-en_mt_bleu -> 0.000000
INFO - 07/14/19 15:30:33 - 4:12:56 - log:{"epoch": 1, "valid_en-fr_mt_ppl": 51439.3548189653, "valid_en-fr_mt_acc": 3.4146636798450083, "valid_en-fr_mt_bleu": 0.0, "valid_fr-en_mt_ppl": 28891.413742776, "valid_fr-en_mt_acc": 3.4141378030167613, "valid_fr-en_mt_bleu": 0.0, "test_en-fr_mt_ppl": 60137.298878200985, "test_en-fr_mt_acc": 3.0698122076236096, "test_en-fr_mt_bleu": 0.0, "test_fr-en_mt_ppl": 33063.67854333258, "test_fr-en_mt_acc": 3.0296127562642368, "test_fr-en_mt_bleu": 0.0}
INFO - 07/14/19 15:30:33 - 4:12:56 - Not a better validation score (0 / 10).
INFO - 07/14/19 15:30:33 - 4:12:56 - Saving checkpoint to ./dumped/unsupMT_enfr/xv7ewg9r3h/checkpoint.pth ...
INFO - 07/14/19 15:31:08 - 4:13:31 - ============ Starting epoch 2 ... ============
INFO - 07/14/19 15:31:09 - 4:13:32 - 5 - 0.01 sent/s - 0.25 words/s - AE-en: 11.4866 || AE-fr: 10.7005 || BT-en-fr-en: 11.6970 || BT-fr-en-fr: 11.1315 - Transformer LR = 5.9950e-07
INFO - 07/14/19 15:31:11 - 4:13:34 - ============ End of epoch 2 ============
INFO - 07/14/19 16:00:06 - 4:42:29 - BLEU ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/hyp2.en-fr.valid.txt ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/ref.en-fr.valid.txt : 0.000000
INFO - 07/14/19 16:32:04 - 5:14:27 - BLEU ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/hyp2.fr-en.valid.txt ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/ref.fr-en.valid.txt : 0.000000
INFO - 07/14/19 17:02:55 - 5:45:17 - BLEU ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/hyp2.en-fr.test.txt ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/ref.en-fr.test.txt : 0.000000
INFO - 07/14/19 17:36:55 - 6:19:18 - BLEU ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/hyp2.fr-en.test.txt ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/ref.fr-en.test.txt : 0.000000
INFO - 07/14/19 17:36:55 - 6:19:18 - epoch -> 2.000000
INFO - 07/14/19 17:36:55 - 6:19:18 - valid_en-fr_mt_ppl -> 33696.728042
INFO - 07/14/19 17:36:55 - 6:19:18 - valid_en-fr_mt_acc -> 3.297209
INFO - 07/14/19 17:36:55 - 6:19:18 - valid_en-fr_mt_bleu -> 0.000000
INFO - 07/14/19 17:36:55 - 6:19:18 - valid_fr-en_mt_ppl -> 23863.039088
INFO - 07/14/19 17:36:55 - 6:19:18 - valid_fr-en_mt_acc -> 3.313762
INFO - 07/14/19 17:36:55 - 6:19:18 - valid_fr-en_mt_bleu -> 0.000000
INFO - 07/14/19 17:36:55 - 6:19:18 - test_en-fr_mt_ppl -> 40212.181027
INFO - 07/14/19 17:36:55 - 6:19:18 - test_en-fr_mt_acc -> 2.963494
INFO - 07/14/19 17:36:55 - 6:19:18 - test_en-fr_mt_bleu -> 0.000000
INFO - 07/14/19 17:36:55 - 6:19:18 - test_fr-en_mt_ppl -> 27587.781359
INFO - 07/14/19 17:36:55 - 6:19:18 - test_fr-en_mt_acc -> 3.005568
INFO - 07/14/19 17:36:55 - 6:19:18 - test_fr-en_mt_bleu -> 0.000000
INFO - 07/14/19 17:36:55 - 6:19:18 - log:{"epoch": 2, "valid_en-fr_mt_ppl": 33696.72804207433, "valid_en-fr_mt_acc": 3.2972089362475026, "valid_en-fr_mt_bleu": 0.0, "valid_fr-en_mt_ppl": 23863.03908844499, "valid_fr-en_mt_acc": 3.3137624266090997, "valid_fr-en_mt_bleu": 0.0, "test_en-fr_mt_ppl": 40212.18102650115, "test_en-fr_mt_acc": 2.963493520155785, "test_en-fr_mt_bleu": 0.0, "test_fr-en_mt_ppl": 27587.78135854254, "test_fr-en_mt_acc": 3.0055682105796, "test_fr-en_mt_bleu": 0.0}
INFO - 07/14/19 17:36:55 - 6:19:18 - Not a better validation score (1 / 10).
INFO - 07/14/19 17:36:55 - 6:19:18 - Saving checkpoint to ./dumped/unsupMT_enfr/xv7ewg9r3h/checkpoint.pth ...
INFO - 07/14/19 17:37:30 - 6:19:53 - ============ Starting epoch 3 ... ============
INFO - 07/14/19 17:37:33 - 6:19:56 - ============ End of epoch 3 ============
INFO - 07/14/19 18:06:40 - 6:49:03 - BLEU ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/hyp3.en-fr.valid.txt ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/ref.en-fr.valid.txt : 0.000000
INFO - 07/14/19 18:38:38 - 7:21:01 - BLEU ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/hyp3.fr-en.valid.txt ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/ref.fr-en.valid.txt : 0.000000
INFO - 07/14/19 19:09:13 - 7:51:36 - BLEU ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/hyp3.en-fr.test.txt ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/ref.en-fr.test.txt : 0.000000
INFO - 07/14/19 19:43:11 - 8:25:34 - BLEU ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/hyp3.fr-en.test.txt ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/ref.fr-en.test.txt : 0.000000
INFO - 07/14/19 19:43:11 - 8:25:34 - epoch -> 3.000000
INFO - 07/14/19 19:43:11 - 8:25:34 - valid_en-fr_mt_ppl -> 24020.307470
INFO - 07/14/19 19:43:11 - 8:25:34 - valid_en-fr_mt_acc -> 3.297209
INFO - 07/14/19 19:43:11 - 8:25:34 - valid_en-fr_mt_bleu -> 0.000000
INFO - 07/14/19 19:43:11 - 8:25:34 - valid_fr-en_mt_ppl -> 20019.421493
INFO - 07/14/19 19:43:11 - 8:25:34 - valid_fr-en_mt_acc -> 3.381138
INFO - 07/14/19 19:43:11 - 8:25:34 - valid_fr-en_mt_bleu -> 0.000000
INFO - 07/14/19 19:43:11 - 8:25:34 - test_en-fr_mt_ppl -> 29041.393149
INFO - 07/14/19 19:43:11 - 8:25:34 - test_en-fr_mt_acc -> 2.998187
INFO - 07/14/19 19:43:11 - 8:25:34 - test_en-fr_mt_bleu -> 0.000000
INFO - 07/14/19 19:43:11 - 8:25:34 - test_fr-en_mt_ppl -> 23429.594274
INFO - 07/14/19 19:43:11 - 8:25:34 - test_fr-en_mt_acc -> 3.057454
INFO - 07/14/19 19:43:11 - 8:25:34 - test_fr-en_mt_bleu -> 0.000000
INFO - 07/14/19 19:43:11 - 8:25:34 - log:{"epoch": 3, "valid_en-fr_mt_ppl": 24020.30746951213, "valid_en-fr_mt_acc": 3.2972089362475026, "valid_en-fr_mt_bleu": 0.0, "valid_fr-en_mt_ppl": 20019.42149299307, "valid_fr-en_mt_acc": 3.3811376792662973, "valid_fr-en_mt_bleu": 0.0, "test_en-fr_mt_ppl": 29041.39314900709, "test_en-fr_mt_acc": 2.998186986592654, "test_en-fr_mt_bleu": 0.0, "test_fr-en_mt_ppl": 23429.59427396973, "test_fr-en_mt_acc": 3.0574538091622374, "test_fr-en_mt_bleu": 0.0}
INFO - 07/14/19 19:43:11 - 8:25:34 - Not a better validation score (2 / 10).
INFO - 07/14/19 19:43:11 - 8:25:34 - Saving checkpoint to ./dumped/unsupMT_enfr/xv7ewg9r3h/checkpoint.pth ...
INFO - 07/14/19 19:43:46 - 8:26:09 - ============ Starting epoch 4 ... ============
INFO - 07/14/19 19:43:50 - 8:26:13 - 10 - 0.01 sent/s - 0.25 words/s - AE-en: 11.2054 || AE-fr: 10.0691 || BT-en-fr-en: 11.1217 || BT-fr-en-fr: 10.7256 - Transformer LR = 1.0990e-06
INFO - 07/14/19 19:43:50 - 8:26:13 - ============ End of epoch 4 ============
INFO - 07/14/19 20:12:51 - 8:55:14 - BLEU ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/hyp4.en-fr.valid.txt ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/ref.en-fr.valid.txt : 0.000000
INFO - 07/14/19 20:44:39 - 9:27:02 - BLEU ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/hyp4.fr-en.valid.txt ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/ref.fr-en.valid.txt : 0.000000
INFO - 07/14/19 21:15:33 - 9:57:56 - BLEU ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/hyp4.en-fr.test.txt ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/ref.en-fr.test.txt : 0.000000
INFO - 07/14/19 21:49:29 - 10:31:52 - BLEU ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/hyp4.fr-en.test.txt ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/ref.fr-en.test.txt : 0.000000
INFO - 07/14/19 21:49:29 - 10:31:52 - epoch -> 4.000000
INFO - 07/14/19 21:49:29 - 10:31:52 - valid_en-fr_mt_ppl -> 17960.647183
INFO - 07/14/19 21:49:29 - 10:31:52 - valid_en-fr_mt_acc -> 3.469153
INFO - 07/14/19 21:49:29 - 10:31:52 - valid_en-fr_mt_bleu -> 0.000000
INFO - 07/14/19 21:49:29 - 10:31:52 - valid_fr-en_mt_ppl -> 17026.154602
INFO - 07/14/19 21:49:29 - 10:31:52 - valid_fr-en_mt_acc -> 3.693264
INFO - 07/14/19 21:49:29 - 10:31:52 - valid_fr-en_mt_bleu -> 0.000000
INFO - 07/14/19 21:49:29 - 10:31:52 - test_en-fr_mt_ppl -> 21697.033088
INFO - 07/14/19 21:49:29 - 10:31:52 - test_en-fr_mt_acc -> 3.176131
INFO - 07/14/19 21:49:29 - 10:31:52 - test_en-fr_mt_bleu -> 0.000000
INFO - 07/14/19 21:49:29 - 10:31:52 - test_fr-en_mt_ppl -> 20192.031151
INFO - 07/14/19 21:49:29 - 10:31:52 - test_fr-en_mt_acc -> 3.301696
INFO - 07/14/19 21:49:29 - 10:31:52 - test_fr-en_mt_bleu -> 0.000000
INFO - 07/14/19 21:49:29 - 10:31:52 - log:{"epoch": 4, "valid_en-fr_mt_ppl": 17960.64718327272, "valid_en-fr_mt_acc": 3.4691529938850882, "valid_en-fr_mt_bleu": 0.0, "valid_fr-en_mt_ppl": 17026.154602048337, "valid_fr-en_mt_acc": 3.6932638497394366, "valid_fr-en_mt_bleu": 0.0, "test_en-fr_mt_ppl": 21697.03308805659, "test_en-fr_mt_acc": 3.176130895091434, "test_en-fr_mt_bleu": 0.0, "test_fr-en_mt_ppl": 20192.03115070738, "test_fr-en_mt_acc": 3.301695773221969, "test_fr-en_mt_bleu": 0.0}
INFO - 07/14/19 21:49:29 - 10:31:52 - Not a better validation score (3 / 10).
INFO - 07/14/19 21:49:29 - 10:31:52 - Saving checkpoint to ./dumped/unsupMT_enfr/xv7ewg9r3h/checkpoint.pth ...
INFO - 07/14/19 21:50:04 - 10:32:27 - ============ Starting epoch 5 ... ============
INFO - 07/14/19 21:50:07 - 10:32:30 - ============ End of epoch 5 ============
INFO - 07/14/19 22:19:04 - 11:01:27 - BLEU ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/hyp5.en-fr.valid.txt ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/ref.en-fr.valid.txt : 0.000000
INFO - 07/14/19 22:50:57 - 11:33:20 - BLEU ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/hyp5.fr-en.valid.txt ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/ref.fr-en.valid.txt : 0.000000
INFO - 07/14/19 23:21:54 - 12:04:17 - BLEU ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/hyp5.en-fr.test.txt ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/ref.en-fr.test.txt : 0.000000
INFO - 07/14/19 23:55:58 - 12:38:21 - BLEU ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/hyp5.fr-en.test.txt ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/ref.fr-en.test.txt : 0.000000
INFO - 07/14/19 23:55:58 - 12:38:21 - epoch -> 5.000000
INFO - 07/14/19 23:55:58 - 12:38:21 - valid_en-fr_mt_ppl -> 13387.415655
INFO - 07/14/19 23:55:58 - 12:38:21 - valid_en-fr_mt_acc -> 3.701641
INFO - 07/14/19 23:55:58 - 12:38:21 - valid_en-fr_mt_bleu -> 0.000000
INFO - 07/14/19 23:55:58 - 12:38:21 - valid_fr-en_mt_ppl -> 14505.218636
INFO - 07/14/19 23:55:58 - 12:38:21 - valid_fr-en_mt_acc -> 4.097515
INFO - 07/14/19 23:55:58 - 12:38:21 - valid_fr-en_mt_bleu -> 0.000000
INFO - 07/14/19 23:55:58 - 12:38:21 - test_en-fr_mt_ppl -> 16011.727034
INFO - 07/14/19 23:55:58 - 12:38:21 - test_en-fr_mt_acc -> 3.415628
INFO - 07/14/19 23:55:58 - 12:38:21 - test_en-fr_mt_bleu -> 0.000000
INFO - 07/14/19 23:55:58 - 12:38:21 - test_fr-en_mt_ppl -> 17351.368420
INFO - 07/14/19 23:55:58 - 12:38:21 - test_fr-en_mt_acc -> 3.558593
INFO - 07/14/19 23:55:58 - 12:38:21 - test_fr-en_mt_bleu -> 0.000000
INFO - 07/14/19 23:55:58 - 12:38:21 - log:{"epoch": 5, "valid_en-fr_mt_ppl": 13387.415655332086, "valid_en-fr_mt_acc": 3.701640733789429, "valid_en-fr_mt_bleu": 0.0, "valid_fr-en_mt_ppl": 14505.218636136915, "valid_fr-en_mt_acc": 4.0975153656826215, "valid_fr-en_mt_bleu": 0.0, "test_en-fr_mt_ppl": 16011.727033717063, "test_en-fr_mt_acc": 3.4156277279136917, "test_en-fr_mt_bleu": 0.0, "test_fr-en_mt_ppl": 17351.368420159128, "test_fr-en_mt_acc": 3.5585927613262465, "test_fr-en_mt_bleu": 0.0}
INFO - 07/14/19 23:55:58 - 12:38:21 - Not a better validation score (4 / 10).
INFO - 07/14/19 23:55:58 - 12:38:21 - Saving checkpoint to ./dumped/unsupMT_enfr/xv7ewg9r3h/checkpoint.pth ...
INFO - 07/14/19 23:56:33 - 12:38:56 - ============ Starting epoch 6 ... ============
INFO - 07/14/19 23:56:36 - 12:38:59 - ============ End of epoch 6 ============
INFO - 07/15/19 00:25:30 - 13:07:53 - BLEU ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/hyp6.en-fr.valid.txt ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/ref.en-fr.valid.txt : 0.000000
INFO - 07/15/19 00:57:18 - 13:39:41 - BLEU ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/hyp6.fr-en.valid.txt ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/ref.fr-en.valid.txt : 0.000000
INFO - 07/15/19 01:28:08 - 14:10:31 - BLEU ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/hyp6.en-fr.test.txt ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/ref.en-fr.test.txt : 0.000000
INFO - 07/15/19 02:02:09 - 14:44:32 - BLEU ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/hyp6.fr-en.test.txt ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/ref.fr-en.test.txt : 0.000000
INFO - 07/15/19 02:02:09 - 14:44:32 - epoch -> 6.000000
INFO - 07/15/19 02:02:09 - 14:44:32 - valid_en-fr_mt_ppl -> 9790.193192
INFO - 07/15/19 02:02:09 - 14:44:32 - valid_en-fr_mt_acc -> 4.012835
INFO - 07/15/19 02:02:09 - 14:44:32 - valid_en-fr_mt_bleu -> 0.000000
INFO - 07/15/19 02:02:09 - 14:44:32 - valid_fr-en_mt_ppl -> 11885.510150
INFO - 07/15/19 02:02:09 - 14:44:32 - valid_fr-en_mt_acc -> 4.474267
INFO - 07/15/19 02:02:09 - 14:44:32 - valid_fr-en_mt_bleu -> 0.000000
INFO - 07/15/19 02:02:09 - 14:44:32 - test_en-fr_mt_ppl -> 11630.164783
INFO - 07/15/19 02:02:09 - 14:44:32 - test_en-fr_mt_acc -> 3.742418
INFO - 07/15/19 02:02:09 - 14:44:32 - test_en-fr_mt_bleu -> 0.000000
INFO - 07/15/19 02:02:09 - 14:44:32 - test_fr-en_mt_ppl -> 14271.040436
INFO - 07/15/19 02:02:09 - 14:44:32 - test_fr-en_mt_acc -> 3.805366
INFO - 07/15/19 02:02:09 - 14:44:32 - test_fr-en_mt_bleu -> 0.000000
INFO - 07/15/19 02:02:09 - 14:44:32 - log:{"epoch": 6, "valid_en-fr_mt_ppl": 9790.193192288563, "valid_en-fr_mt_acc": 4.012835260640552, "valid_en-fr_mt_bleu": 0.0, "valid_fr-en_mt_ppl": 11885.510149860494, "valid_fr-en_mt_acc": 4.47426677850042, "valid_fr-en_mt_bleu": 0.0, "test_en-fr_mt_ppl": 11630.164782594029, "test_en-fr_mt_acc": 3.742417798867426, "test_en-fr_mt_bleu": 0.0, "test_fr-en_mt_ppl": 14271.040436087778, "test_fr-en_mt_acc": 3.8053657301948873, "test_fr-en_mt_bleu": 0.0}
INFO - 07/15/19 02:02:09 - 14:44:32 - Not a better validation score (5 / 10).
INFO - 07/15/19 02:02:09 - 14:44:32 - Saving checkpoint to ./dumped/unsupMT_enfr/xv7ewg9r3h/checkpoint.pth ...
INFO - 07/15/19 02:02:41 - 14:45:03 - ============ Starting epoch 7 ... ============
INFO - 07/15/19 02:02:42 - 14:45:05 - 15 - 0.01 sent/s - 0.16 words/s - AE-en: 10.3636 || AE-fr: 9.9233 || BT-en-fr-en: 10.0328 || BT-fr-en-fr: 9.6186 - Transformer LR = 1.5985e-06
INFO - 07/15/19 02:02:44 - 14:45:07 - ============ End of epoch 7 ============
INFO - 07/15/19 02:31:42 - 15:14:05 - BLEU ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/hyp7.en-fr.valid.txt ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/ref.en-fr.valid.txt : 0.000000
INFO - 07/15/19 03:03:34 - 15:45:57 - BLEU ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/hyp7.fr-en.valid.txt ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/ref.fr-en.valid.txt : 0.000000
INFO - 07/15/19 03:34:23 - 16:16:46 - BLEU ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/hyp7.en-fr.test.txt ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/ref.en-fr.test.txt : 0.000000
INFO - 07/15/19 04:08:10 - 16:50:33 - BLEU ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/hyp7.fr-en.test.txt ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/ref.fr-en.test.txt : 0.000000
INFO - 07/15/19 04:08:10 - 16:50:33 - epoch -> 7.000000
INFO - 07/15/19 04:08:10 - 16:50:33 - valid_en-fr_mt_ppl -> 7118.051481
INFO - 07/15/19 04:08:10 - 16:50:33 - valid_en-fr_mt_acc -> 4.278017
INFO - 07/15/19 04:08:10 - 16:50:33 - valid_en-fr_mt_bleu -> 0.000000
INFO - 07/15/19 04:08:10 - 16:50:33 - valid_fr-en_mt_ppl -> 9572.317196
INFO - 07/15/19 04:08:10 - 16:50:33 - valid_fr-en_mt_acc -> 4.805643
INFO - 07/15/19 04:08:10 - 16:50:33 - valid_fr-en_mt_bleu -> 0.000000
INFO - 07/15/19 04:08:10 - 16:50:33 - test_en-fr_mt_ppl -> 8366.590882
INFO - 07/15/19 04:08:10 - 16:50:33 - test_en-fr_mt_acc -> 3.989749
INFO - 07/15/19 04:08:10 - 16:50:33 - test_en-fr_mt_bleu -> 0.000000
INFO - 07/15/19 04:08:10 - 16:50:33 - test_fr-en_mt_ppl -> 11473.332213
INFO - 07/15/19 04:08:10 - 16:50:33 - test_fr-en_mt_acc -> 4.058466
INFO - 07/15/19 04:08:10 - 16:50:33 - test_fr-en_mt_bleu -> 0.000000
INFO - 07/15/19 04:08:10 - 16:50:33 - log:{"epoch": 7, "valid_en-fr_mt_ppl": 7118.051480971656, "valid_en-fr_mt_acc": 4.278016588968941, "valid_en-fr_mt_bleu": 0.0, "valid_fr-en_mt_ppl": 9572.317195584694, "valid_fr-en_mt_acc": 4.8056430211613295, "valid_fr-en_mt_bleu": 0.0, "test_en-fr_mt_ppl": 8366.590882278508, "test_en-fr_mt_acc": 3.9897486402399447, "test_en-fr_mt_bleu": 0.0, "test_fr-en_mt_ppl": 11473.332212996045, "test_fr-en_mt_acc": 4.058466211085801, "test_fr-en_mt_bleu": 0.0}
INFO - 07/15/19 04:08:10 - 16:50:33 - Not a better validation score (6 / 10).
INFO - 07/15/19 04:08:10 - 16:50:33 - Saving checkpoint to ./dumped/unsupMT_enfr/xv7ewg9r3h/checkpoint.pth ...
INFO - 07/15/19 04:08:37 - 16:51:00 - ============ Starting epoch 8 ... ============
INFO - 07/15/19 04:08:40 - 16:51:03 - ============ End of epoch 8 ============
INFO - 07/15/19 04:37:35 - 17:19:58 - BLEU ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/hyp8.en-fr.valid.txt ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/ref.en-fr.valid.txt : 0.000000
INFO - 07/15/19 05:09:22 - 17:51:45 - BLEU ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/hyp8.fr-en.valid.txt ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/ref.fr-en.valid.txt : 0.000000
INFO - 07/15/19 05:40:03 - 18:22:26 - BLEU ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/hyp8.en-fr.test.txt ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/ref.en-fr.test.txt : 0.000000
INFO - 07/15/19 06:13:45 - 18:56:08 - BLEU ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/hyp8.fr-en.test.txt ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/ref.fr-en.test.txt : 0.000000
INFO - 07/15/19 06:13:45 - 18:56:08 - epoch -> 8.000000
INFO - 07/15/19 06:13:45 - 18:56:08 - valid_en-fr_mt_ppl -> 5562.593479
INFO - 07/15/19 06:13:45 - 18:56:08 - valid_en-fr_mt_acc -> 4.456015
INFO - 07/15/19 06:13:45 - 18:56:08 - valid_en-fr_mt_bleu -> 0.000000
INFO - 07/15/19 06:13:45 - 18:56:08 - valid_fr-en_mt_ppl -> 7883.397323
INFO - 07/15/19 06:13:45 - 18:56:08 - valid_fr-en_mt_acc -> 4.890893
INFO - 07/15/19 06:13:45 - 18:56:08 - valid_fr-en_mt_bleu -> 0.000000
INFO - 07/15/19 06:13:45 - 18:56:08 - test_en-fr_mt_ppl -> 6434.850350
INFO - 07/15/19 06:13:45 - 18:56:08 - test_en-fr_mt_acc -> 4.071446
INFO - 07/15/19 06:13:45 - 18:56:08 - test_en-fr_mt_bleu -> 0.000000
INFO - 07/15/19 06:13:45 - 18:56:08 - test_fr-en_mt_ppl -> 9390.370733
INFO - 07/15/19 06:13:45 - 18:56:08 - test_fr-en_mt_acc -> 4.143255
INFO - 07/15/19 06:13:45 - 18:56:08 - test_fr-en_mt_bleu -> 0.000000
INFO - 07/15/19 06:13:45 - 18:56:08 - log:{"epoch": 8, "valid_en-fr_mt_ppl": 5562.59347883444, "valid_en-fr_mt_acc": 4.456015014833202, "valid_en-fr_mt_bleu": 0.0, "valid_fr-en_mt_ppl": 7883.397323258652, "valid_fr-en_mt_acc": 4.890893340850028, "valid_fr-en_mt_bleu": 0.0, "test_en-fr_mt_ppl": 6434.850350124967, "test_en-fr_mt_acc": 4.071446157978378, "test_en-fr_mt_bleu": 0.0, "test_fr-en_mt_ppl": 9390.370732726276, "test_fr-en_mt_acc": 4.143254872184257, "test_fr-en_mt_bleu": 0.0}
INFO - 07/15/19 06:13:45 - 18:56:08 - Not a better validation score (7 / 10).
INFO - 07/15/19 06:13:45 - 18:56:08 - Saving checkpoint to ./dumped/unsupMT_enfr/xv7ewg9r3h/checkpoint.pth ...
INFO - 07/15/19 06:14:12 - 18:56:35 - ============ Starting epoch 9 ... ============
INFO - 07/15/19 06:14:16 - 18:56:39 - 20 - 0.01 sent/s - 0.26 words/s - AE-en: 9.3154 || AE-fr: 9.1324 || BT-en-fr-en: 9.2382 || BT-fr-en-fr: 8.6216 - Transformer LR = 2.0980e-06
INFO - 07/15/19 06:14:16 - 18:56:39 - ============ End of epoch 9 ============
INFO - 07/15/19 06:43:04 - 19:25:27 - BLEU ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/hyp9.en-fr.valid.txt ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/ref.en-fr.valid.txt : 0.000000
INFO - 07/15/19 07:14:53 - 19:57:16 - BLEU ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/hyp9.fr-en.valid.txt ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/ref.fr-en.valid.txt : 0.000000
INFO - 07/15/19 07:45:44 - 20:28:07 - BLEU ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/hyp9.en-fr.test.txt ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/ref.en-fr.test.txt : 0.000000
INFO - 07/15/19 08:19:42 - 21:02:05 - BLEU ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/hyp9.fr-en.test.txt ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/ref.fr-en.test.txt : 0.000000
INFO - 07/15/19 08:19:42 - 21:02:05 - epoch -> 9.000000
INFO - 07/15/19 08:19:42 - 21:02:05 - valid_en-fr_mt_ppl -> 4453.637764
INFO - 07/15/19 08:19:42 - 21:02:05 - valid_en-fr_mt_acc -> 4.493552
INFO - 07/15/19 08:19:42 - 21:02:05 - valid_en-fr_mt_bleu -> 0.000000
INFO - 07/15/19 08:19:42 - 21:02:05 - valid_fr-en_mt_ppl -> 6798.224234
INFO - 07/15/19 08:19:42 - 21:02:05 - valid_fr-en_mt_acc -> 4.908768
INFO - 07/15/19 08:19:42 - 21:02:05 - valid_fr-en_mt_bleu -> 0.000000
INFO - 07/15/19 08:19:42 - 21:02:05 - test_en-fr_mt_ppl -> 5051.477208
INFO - 07/15/19 08:19:42 - 21:02:05 - test_en-fr_mt_acc -> 4.097186
INFO - 07/15/19 08:19:42 - 21:02:05 - test_en-fr_mt_bleu -> 0.000000
INFO - 07/15/19 08:19:42 - 21:02:05 - test_fr-en_mt_ppl -> 8020.909294
INFO - 07/15/19 08:19:42 - 21:02:05 - test_fr-en_mt_acc -> 4.149582
INFO - 07/15/19 08:19:42 - 21:02:05 - test_fr-en_mt_bleu -> 0.000000
INFO - 07/15/19 08:19:42 - 21:02:05 - log:{"epoch": 9, "valid_en-fr_mt_ppl": 4453.637763575909, "valid_en-fr_mt_acc": 4.493552097838591, "valid_en-fr_mt_bleu": 0.0, "valid_fr-en_mt_ppl": 6798.224234060134, "valid_fr-en_mt_acc": 4.90876840788153, "valid_fr-en_mt_bleu": 0.0, "test_en-fr_mt_ppl": 5051.47720823547, "test_en-fr_mt_acc": 4.097186471786378, "test_en-fr_mt_bleu": 0.0, "test_fr-en_mt_ppl": 8020.90929361458, "test_fr-en_mt_acc": 4.14958238420653, "test_fr-en_mt_bleu": 0.0}
INFO - 07/15/19 08:19:42 - 21:02:05 - Not a better validation score (8 / 10).
INFO - 07/15/19 08:19:42 - 21:02:05 - Saving checkpoint to ./dumped/unsupMT_enfr/xv7ewg9r3h/checkpoint.pth ...
INFO - 07/15/19 08:20:11 - 21:02:34 - ============ Starting epoch 10 ... ============
INFO - 07/15/19 08:20:14 - 21:02:37 - ============ End of epoch 10 ============
INFO - 07/15/19 08:49:07 - 21:31:30 - BLEU ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/hyp10.en-fr.valid.txt ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/ref.en-fr.valid.txt : 0.000000
INFO - 07/15/19 09:20:52 - 22:03:15 - BLEU ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/hyp10.fr-en.valid.txt ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/ref.fr-en.valid.txt : 0.000000
INFO - 07/15/19 09:52:00 - 22:34:23 - BLEU ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/hyp10.en-fr.test.txt ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/ref.en-fr.test.txt : 0.000000
INFO - 07/15/19 10:25:51 - 23:08:14 - BLEU ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/hyp10.fr-en.test.txt ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/ref.fr-en.test.txt : 0.000000
INFO - 07/15/19 10:25:51 - 23:08:14 - epoch -> 10.000000
INFO - 07/15/19 10:25:51 - 23:08:14 - valid_en-fr_mt_ppl -> 3850.768983
INFO - 07/15/19 10:25:51 - 23:08:14 - valid_en-fr_mt_acc -> 4.495974
INFO - 07/15/19 10:25:51 - 23:08:14 - valid_en-fr_mt_bleu -> 0.000000
INFO - 07/15/19 10:25:51 - 23:08:14 - valid_fr-en_mt_ppl -> 5947.139298
INFO - 07/15/19 10:25:51 - 23:08:14 - valid_fr-en_mt_acc -> 4.822143
INFO - 07/15/19 10:25:51 - 23:08:14 - valid_fr-en_mt_bleu -> 0.000000
INFO - 07/15/19 10:25:51 - 23:08:14 - test_en-fr_mt_ppl -> 4318.181498
INFO - 07/15/19 10:25:51 - 23:08:14 - test_en-fr_mt_acc -> 4.106140
INFO - 07/15/19 10:25:51 - 23:08:14 - test_en-fr_mt_bleu -> 0.000000
INFO - 07/15/19 10:25:51 - 23:08:14 - test_fr-en_mt_ppl -> 6940.131671
INFO - 07/15/19 10:25:51 - 23:08:14 - test_fr-en_mt_acc -> 4.097697
INFO - 07/15/19 10:25:51 - 23:08:14 - test_fr-en_mt_bleu -> 0.000000
INFO - 07/15/19 10:25:51 - 23:08:14 - log:{"epoch": 10, "valid_en-fr_mt_ppl": 3850.7689832461947, "valid_en-fr_mt_acc": 4.495973845129261, "valid_en-fr_mt_bleu": 0.0, "valid_fr-en_mt_ppl": 5947.139298465951, "valid_fr-en_mt_acc": 4.822143083036561, "valid_fr-en_mt_bleu": 0.0, "test_en-fr_mt_ppl": 4318.181497902532, "test_en-fr_mt_acc": 4.106139624415247, "test_en-fr_mt_bleu": 0.0, "test_fr-en_mt_ppl": 6940.131671242484, "test_fr-en_mt_acc": 4.097696785623893, "test_fr-en_mt_bleu": 0.0}
INFO - 07/15/19 10:25:51 - 23:08:14 - Not a better validation score (9 / 10).
INFO - 07/15/19 10:25:51 - 23:08:14 - Saving checkpoint to ./dumped/unsupMT_enfr/xv7ewg9r3h/checkpoint.pth ...
INFO - 07/15/19 10:26:20 - 23:08:42 - ============ Starting epoch 11 ... ============
INFO - 07/15/19 10:26:23 - 23:08:46 - ============ End of epoch 11 ============
INFO - 07/15/19 10:54:54 - 23:37:17 - BLEU ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/hyp11.en-fr.valid.txt ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/ref.en-fr.valid.txt : 0.000000
INFO - 07/15/19 11:26:22 - 1 day, 0:08:45 - BLEU ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/hyp11.fr-en.valid.txt ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/ref.fr-en.valid.txt : 0.000000
INFO - 07/15/19 11:56:46 - 1 day, 0:39:09 - BLEU ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/hyp11.en-fr.test.txt ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/ref.en-fr.test.txt : 0.000000
INFO - 07/15/19 12:30:12 - 1 day, 1:12:34 - BLEU ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/hyp11.fr-en.test.txt ./dumped/unsupMT_enfr/xv7ewg9r3h/hypotheses/ref.fr-en.test.txt : 0.000000
INFO - 07/15/19 12:30:12 - 1 day, 1:12:34 - epoch -> 11.000000
INFO - 07/15/19 12:30:12 - 1 day, 1:12:34 - valid_en-fr_mt_ppl -> 3493.276658
INFO - 07/15/19 12:30:12 - 1 day, 1:12:34 - valid_en-fr_mt_acc -> 4.494763
INFO - 07/15/19 12:30:12 - 1 day, 1:12:34 - valid_en-fr_mt_bleu -> 0.000000
INFO - 07/15/19 12:30:12 - 1 day, 1:12:34 - valid_fr-en_mt_ppl -> 5467.123570
INFO - 07/15/19 12:30:12 - 1 day, 1:12:34 - valid_fr-en_mt_acc -> 4.613142
INFO - 07/15/19 12:30:12 - 1 day, 1:12:34 - valid_fr-en_mt_bleu -> 0.000000
INFO - 07/15/19 12:30:12 - 1 day, 1:12:34 - test_en-fr_mt_ppl -> 3884.453784
INFO - 07/15/19 12:30:12 - 1 day, 1:12:34 - test_en-fr_mt_acc -> 4.106140
INFO - 07/15/19 12:30:12 - 1 day, 1:12:34 - test_en-fr_mt_bleu -> 0.000000
INFO - 07/15/19 12:30:12 - 1 day, 1:12:34 - test_fr-en_mt_ppl -> 6325.922849
INFO - 07/15/19 12:30:12 - 1 day, 1:12:34 - test_fr-en_mt_acc -> 3.966085
INFO - 07/15/19 12:30:12 - 1 day, 1:12:34 - test_fr-en_mt_bleu -> 0.000000
INFO - 07/15/19 12:30:12 - 1 day, 1:12:34 - log:{"epoch": 11, "valid_en-fr_mt_ppl": 3493.2766576812446, "valid_en-fr_mt_acc": 4.494762971483926, "valid_en-fr_mt_bleu": 0.0, "valid_fr-en_mt_ppl": 5467.123569852876, "valid_fr-en_mt_acc": 4.613142299283623, "valid_fr-en_mt_bleu": 0.0, "test_en-fr_mt_ppl": 3884.4537842660484, "test_en-fr_mt_acc": 4.106139624415247, "test_en-fr_mt_bleu": 0.0, "test_fr-en_mt_ppl": 6325.922849001634, "test_fr-en_mt_acc": 3.9660845355606176, "test_fr-en_mt_bleu": 0.0}
INFO - 07/15/19 12:30:12 - 1 day, 1:12:34 - Not a better validation score (10 / 10).
INFO - 07/15/19 12:30:12 - 1 day, 1:12:34 - Stopping criterion has been below its best value for more than 10 epochs. Ending the experiment...

@aconneau

Your issue comes from the "--epoch_size 7" parameter, which means that in each epoch your model only sees 7 samples. So there is effectively no training, as you can see in the logs, where "Starting epoch" is followed almost immediately by "End of epoch".
Please keep the default value for this parameter, i.e. "--epoch_size 200000" (or 100000). If you want to set the maximum number of epochs, please use the "--max_epoch" parameter.
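
To check this against a log like the one above, a small script can measure how long each epoch actually spends training, i.e. the time between the "Starting epoch" and "End of epoch" lines. This is only a sketch: the function name `training_seconds_per_epoch` and the regular expression are illustrative assumptions based on the log format pasted in this issue, not part of the repository.

```python
import re
from datetime import datetime

# Matches lines such as:
# INFO - 07/14/19 17:37:30 - 6:19:53 - ============ Starting epoch 3 ... ============
# (pattern assumed from the log pasted in this issue)
EPOCH_LINE = re.compile(
    r"INFO - (\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) - .*? - =+ (Starting|End of) epoch (\d+)"
)


def training_seconds_per_epoch(log_lines):
    """Return {epoch: seconds between its 'Starting' and 'End of' lines}."""
    starts = {}
    durations = {}
    for line in log_lines:
        match = EPOCH_LINE.search(line)
        if match is None:
            continue  # not an epoch boundary line
        stamp = datetime.strptime(match.group(1), "%m/%d/%y %H:%M:%S")
        kind, epoch = match.group(2), int(match.group(3))
        if kind == "Starting":
            starts[epoch] = stamp
        elif epoch in starts:
            durations[epoch] = (stamp - starts[epoch]).total_seconds()
    return durations


# Four lines copied from the log above.
sample = [
    "INFO - 07/14/19 17:37:30 - 6:19:53 - ============ Starting epoch 3 ... ============",
    "INFO - 07/14/19 17:37:33 - 6:19:56 - ============ End of epoch 3 ============",
    "INFO - 07/14/19 19:43:46 - 8:26:09 - ============ Starting epoch 4 ... ============",
    "INFO - 07/14/19 19:43:50 - 8:26:13 - ============ End of epoch 4 ============",
]

print(training_seconds_per_epoch(sample))  # {3: 3.0, 4: 4.0}
```

In this log each epoch trains for only a few seconds, while the BLEU evaluation that follows takes roughly two hours, so almost all of the 24 hours went into evaluation rather than training.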

@klauspa
Author

klauspa commented Jul 17, 2019

Thanks, I'll give it a try.

@glample
Contributor

glample commented Jul 18, 2019

Closing for now, feel free to re-open if you have more issues.

@glample glample closed this as completed Jul 18, 2019