### Installing Fairseq

In [None]:
!pip install -q git+https://github.com/One-sixth/fairseq.git sacrebleu tensorboardX

### Pipeline A (Baseline, 10k BPE Operations)

#### Preprocessing Data

In [None]:
%cd /content/drive/MyDrive/NLP_TRANSLATION
!mkdir -p /content/drive/MyDrive/NLP_TRANSLATION/bin
!mkdir -p /content/drive/MyDrive/NLP_TRANSLATION/checkpoints

In [None]:
!fairseq-preprocess \
    --source-lang fi --target-lang en \
    --trainpref /content/drive/MyDrive/NLP_TRANSLATION/pipeline_A_10k/data/train.bpe \
    --validpref /content/drive/MyDrive/NLP_TRANSLATION/pipeline_A_10k/data/dev.bpe \
    --testpref /content/drive/MyDrive/NLP_TRANSLATION/pipeline_A_10k/data/test.bpe \
    --destdir /content/drive/MyDrive/NLP_TRANSLATION/pipeline_A_10k/bin \
    --workers 20
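
After binarization it can be worth confirming the vocabulary sizes, since the number of BPE merge operations is the variable being compared across pipelines. The cell below is a minimal sketch, assuming `fairseq-preprocess` wrote `dict.fi.txt` and `dict.en.txt` (one `token count` entry per line) into the destination directory; `dict_size` is a hypothetical helper, not part of fairseq.

```python
from pathlib import Path


def dict_size(dict_path):
    """Count the non-empty entries in a fairseq dictionary file."""
    with open(dict_path, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


# Path taken from the --destdir flag above.
bin_dir = Path("/content/drive/MyDrive/NLP_TRANSLATION/pipeline_A_10k/bin")
for lang in ("fi", "en"):
    path = bin_dir / f"dict.{lang}.txt"
    if path.exists():
        print(lang, dict_size(path))
    else:
        print(f"{path} not found; run fairseq-preprocess first")
```

The count covers only the entries stored in the file; fairseq adds its special symbols (padding, unknown, sentence boundaries) on top of it when the dictionary is loaded.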

#### Training Fairseq

In [None]:
!fairseq-train /content/drive/MyDrive/NLP_TRANSLATION/pipeline_A_10k/bin \
    --arch transformer_iwslt_de_en \
    --share-decoder-input-output-embed \
    --source-lang fi --target-lang en \
    --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
    --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --dropout 0.3 --weight-decay 0.0001 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --max-tokens 8192 \
    --update-freq 1 \
    --eval-bleu \
    --eval-bleu-args '{"beam": 5, "max_len_a": 1.2, "max_len_b": 10}' \
    --eval-bleu-detok space \
    --eval-bleu-remove-bpe \
    --eval-bleu-print-samples \
    --best-checkpoint-metric bleu --maximize-best-checkpoint-metric \
    --save-dir /content/drive/MyDrive/NLP_TRANSLATION/pipeline_A_10k/checkpoints \
    --max-epoch 30 \
    --patience 10 \
    --fp16 \
    --no-epoch-checkpoints
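
The learning-rate flags above combine a linear warmup with inverse square-root decay. The sketch below reproduces that curve in plain Python to show what `--lr 5e-4 --warmup-updates 4000` implies at a given update. It assumes the warmup starts from 0, which is my understanding of fairseq's default when `--warmup-init-lr` is not set; `inverse_sqrt_lr` is a hypothetical helper, not a fairseq function.

```python
def inverse_sqrt_lr(step, peak_lr=5e-4, warmup_updates=4000, warmup_init_lr=0.0):
    """Learning rate at a given update: linear warmup, then decay ~ 1/sqrt(step)."""
    if step < warmup_updates:
        # Linear ramp from warmup_init_lr up to peak_lr (assumed start value: 0).
        return warmup_init_lr + step * (peak_lr - warmup_init_lr) / warmup_updates
    # After warmup the rate shrinks with the inverse square root of the update count.
    return peak_lr * (warmup_updates / step) ** 0.5


for step in (1000, 4000, 16000, 64000):
    print(step, f"{inverse_sqrt_lr(step):.2e}")
```

Under these assumptions the rate peaks at update 4000 and halves each time the update count quadruples, which is worth keeping in mind when reading the validation BLEU curve over 30 epochs.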

#### Testing Fairseq

In [None]:
!fairseq-generate /content/drive/MyDrive/NLP_TRANSLATION/pipeline_A_10k/bin \
    --path /content/drive/MyDrive/NLP_TRANSLATION/pipeline_A_10k/checkpoints/checkpoint_best.pt \
    --batch-size 128 \
    --beam 5 \
    --remove-bpe \
    > /content/drive/MyDrive/NLP_TRANSLATION/pipeline_A_10k/results_A_10k.txt

#### Evaluation

Evaluation currently runs locally rather than in this notebook. Download `results_A_10k.txt` and see the `pipeline_A_10k/score_A_10k.sh` script for the scoring steps.
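
For local scoring, the hypotheses and references first have to be pulled out of the generation log. The cell below is a minimal sketch, assuming the usual `fairseq-generate` layout in which reference lines look like `T-<id>\t<text>` and hypothesis lines look like `H-<id>\t<score>\t<text>`. `read_generate_output` is a hypothetical helper and is not necessarily what `score_A_10k.sh` does; the sample lines are made up for illustration.

```python
def read_generate_output(lines):
    """Return (hypotheses, references) as lists aligned by sentence id."""
    hyps, refs = {}, {}
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("H-"):
            # H-<id> <tab> <model score> <tab> <hypothesis text>
            tag, _score, text = line.split("\t", 2)
            hyps[int(tag[2:])] = text
        elif line.startswith("T-"):
            # T-<id> <tab> <reference text>
            tag, text = line.split("\t", 1)
            refs[int(tag[2:])] = text
    # fairseq-generate batches by length, so restore the original order by id.
    ids = sorted(hyps.keys() & refs.keys())
    return [hyps[i] for i in ids], [refs[i] for i in ids]


# Illustrative sample lines, not real model output.
sample = [
    "S-1\thyvää huomenta\n",
    "T-1\tgood morning\n",
    "H-1\t-0.31\tgood morning\n",
    "S-0\tkiitos\n",
    "T-0\tthank you\n",
    "H-0\t-0.12\tthanks\n",
]
hyps, refs = read_generate_output(sample)
print(hyps)
print(refs)
```

With the real file, pass `open(path, encoding="utf-8")` in place of `sample`. The two lists can then be scored, for example with `sacrebleu.corpus_bleu(hyps, [refs])`, since `sacrebleu` is installed in the first cell.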

---

### Pipeline A (Baseline, 20k BPE Operations)

#### Preprocessing Data

In [None]:
%cd /content/drive/MyDrive/NLP_TRANSLATION
!mkdir -p /content/drive/MyDrive/NLP_TRANSLATION/bin
!mkdir -p /content/drive/MyDrive/NLP_TRANSLATION/checkpoints

In [None]:
!fairseq-preprocess \
    --source-lang fi --target-lang en \
    --trainpref /content/drive/MyDrive/NLP_TRANSLATION/pipeline_A_20k/data/train.bpe \
    --validpref /content/drive/MyDrive/NLP_TRANSLATION/pipeline_A_20k/data/dev.bpe \
    --testpref /content/drive/MyDrive/NLP_TRANSLATION/pipeline_A_20k/data/test.bpe \
    --destdir /content/drive/MyDrive/NLP_TRANSLATION/pipeline_A_20k/bin \
    --workers 20

#### Training Fairseq

In [None]:
!fairseq-train /content/drive/MyDrive/NLP_TRANSLATION/pipeline_A_20k/bin \
    --arch transformer_iwslt_de_en \
    --share-decoder-input-output-embed \
    --source-lang fi --target-lang en \
    --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
    --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --dropout 0.3 --weight-decay 0.0001 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --max-tokens 8192 \
    --update-freq 1 \
    --eval-bleu \
    --eval-bleu-args '{"beam": 5, "max_len_a": 1.2, "max_len_b": 10}' \
    --eval-bleu-detok space \
    --eval-bleu-remove-bpe \
    --eval-bleu-print-samples \
    --best-checkpoint-metric bleu --maximize-best-checkpoint-metric \
    --save-dir /content/drive/MyDrive/NLP_TRANSLATION/pipeline_A_20k/checkpoints \
    --max-epoch 30 \
    --patience 10 \
    --fp16 \
    --no-epoch-checkpoints

#### Testing Fairseq

In [None]:
!fairseq-generate /content/drive/MyDrive/NLP_TRANSLATION/pipeline_A_20k/bin \
    --path /content/drive/MyDrive/NLP_TRANSLATION/pipeline_A_20k/checkpoints/checkpoint_best.pt \
    --batch-size 128 \
    --beam 5 \
    --remove-bpe \
    > /content/drive/MyDrive/NLP_TRANSLATION/pipeline_A_20k/results_A_20k.txt

#### Evaluation

Evaluation currently runs locally rather than in this notebook. Download `results_A_20k.txt` and see the `pipeline_A_20k/score_A_20k.sh` script for the scoring steps.

---