# AI504 Project 2

## To-Do : Find better hyperparameters
The goal of this project is to improve the performance of a Neural Machine Translation (NMT) system. You will tune the hyperparameters to achieve a higher BLEU score without changing anything else (e.g. architecture, dataset).
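
One common way to explore such a search space is random search. The sketch below only draws candidate configurations; the ranges and the helper name `sample_config` are illustrative assumptions, not values prescribed by the project. It samples `emb_dim` as a multiple of `attention_heads` because `nn.Transformer` requires `d_model` to be divisible by `nhead`.

```python
import random


def sample_config(rng: random.Random) -> dict:
    """Draw one candidate configuration (all ranges are illustrative assumptions)."""
    heads = rng.choice([2, 4, 8])
    # emb_dim is built as a multiple of the head count so that
    # d_model % nhead == 0 holds for nn.Transformer.
    emb_dim = heads * rng.choice([8, 16, 32, 64])
    return {
        "emb_dim": emb_dim,
        "ffn_dim": emb_dim * rng.choice([2, 4]),
        "attention_heads": heads,
        "dropout": round(rng.uniform(0.1, 0.4), 4),
        "encoder_layers": rng.randint(1, 4),
        "decoder_layers": rng.randint(1, 4),
        # Learning rate is sampled log-uniformly between 1e-4 and 1e-2.
        "lr": 10 ** rng.uniform(-4, -2),
        "batch_size": rng.choice([128, 256, 512, 1024]),
        "nepochs": 48,
    }


rng = random.Random(0)  # fixed seed so the candidates are reproducible
candidates = [sample_config(rng) for _ in range(5)]
for candidate in candidates:
    print(candidate)
```

Each candidate can be copied into the `config` cell, trained with the template code, and compared by test BLEU score.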

In [5]:
from easydict import EasyDict

config = EasyDict({
    "emb_dim":64,
    "ffn_dim":128,
    "attention_heads":8,
    "dropout":0.2568,
    "encoder_layers":3,
    "decoder_layers":2,
    "lr":2.4963e-3,
    "batch_size":4610,
    "nepochs":48,
})

#####      Do not modify      #####
config.max_position_embeddings=512
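
A large `batch_size` means few optimizer updates per epoch. The arithmetic below is a rough sketch that assumes the Multi30k training split holds 29,000 sentence pairs and that the iterators are built with the `batch_size` shown above. The data loader reads `config.batch_size` when its cell runs, so the value only takes effect if that cell is executed after this one.

```python
import math

TRAIN_PAIRS = 29_000  # assumed size of the Multi30k training split
BATCH_SIZE = 4610     # config.batch_size from the cell above
MAX_EPOCHS = 48       # config.nepochs from the cell above

# Number of batches (and therefore optimizer updates) in one epoch.
steps_per_epoch = math.ceil(TRAIN_PAIRS / BATCH_SIZE)
# Upper bound on updates if early stopping never triggers.
max_updates = steps_per_epoch * MAX_EPOCHS

print(steps_per_epoch)  # 7
print(max_updates)      # 336
```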

## Download files
Before executing this code, run the template code first. This code automatically downloads the state_dict of your model and the configuration file that you used for training & evaluation.

Please change the student ID before you run this.

__CAUTION__ : Please run this code with the *Google Chrome* browser.
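
Before submitting, it can help to confirm that the archive holds both files. The helper below is a hypothetical sketch (the name `missing_files` is not part of the template); it assumes the archive layout `result/config.json` and `result/model.pt` that the zip command in this section produces.

```python
import os
import zipfile

# Archive members expected in the submission (taken from the zip listing).
REQUIRED = ("result/config.json", "result/model.pt")


def missing_files(zip_path: str) -> list:
    """Return the required archive members that are absent from zip_path."""
    with zipfile.ZipFile(zip_path) as archive:
        names = set(archive.namelist())
    return [name for name in REQUIRED if name not in names]


path = "20204871.zip"  # replace with your own student ID
if os.path.exists(path):
    print(missing_files(path) or "archive looks complete")
```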

In [8]:
from google.colab import files
import os

os.environ['STUDENT_ID']="20204871"

if os.path.isdir('result'):
  !rm -rf result

%mkdir result
%mv config.json model.pt result

!zip $STUDENT_ID.zip result/*
# files.download('{}.zip'.format(os.environ['STUDENT_ID']))

  adding: result/config.json (deflated 31%)
  adding: result/model.pt (deflated 8%)


## Template code (do not modify)
This code is equivalent to the code from [Week 11](https://classum.com/main/course/7726/111). Please refer to the code and descriptions at that link for details.

### Data loader

In [2]:
!pip install --upgrade torchtext
!python -m spacy download de
!python -m spacy download en
!pip install -Iv --upgrade nltk==3.5

import torch
from torchtext.datasets import Multi30k
from torchtext.data import Field, BucketIterator

torch.manual_seed(1234)
torch.cuda.manual_seed_all(1234)

SRC = Field(tokenize = "spacy",
            tokenizer_language="de",
            eos_token = '<eos>',
            lower = True)

TRG = Field(tokenize = "spacy",
            tokenizer_language="en",
            init_token = '<sos>',
            eos_token = '<eos>',
            lower = True)

train_data, valid_data, test_data = Multi30k.splits(exts = ('.de', '.en'),
                                                    fields = (SRC, TRG))

SRC.build_vocab(train_data, min_freq = 3)
TRG.build_vocab(train_data, min_freq = 3)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

train_iterator, valid_iterator, test_iterator = BucketIterator.splits(
    (train_data, valid_data, test_data),
    batch_size = config.batch_size,
    device = device,
    shuffle=False)

PAD_IDX = TRG.vocab.stoi['<pad>']

Collecting torchtext
  Downloading https://files.pythonhosted.org/packages/23/23/8499af6d9c22b29b01f66a2c11d38ce71cd1cafa2655913c29818ed4a00f/torchtext-0.8.0-cp36-cp36m-manylinux1_x86_64.whl (6.9MB)
Installing collected packages: torchtext
  Found existing installation: torchtext 0.3.1
    Uninstalling torchtext-0.3.1:
      Successfully uninstalled torchtext-0.3.1
Successfully installed torchtext-0.8.0
Collecting de_core_news_sm==2.2.5
  Downloading https://github.com/explosion/spacy-models/releases/download/de_core_news_sm-2.2.5/de_core_news_sm-2.2.5.tar.gz (14.9MB)
Building wheels for collected packages: de-core-news-sm
  Building wheel for de-core-news-sm (setup.py) ... done
  Created wheel for de-core-news-sm: filename=de_core_news_sm-2.2.5-cp36-none-any.whl size=14907056 sha256=fd75c92171ee78ed4c7b8bbea3923bf86aa3126032391b44f4182cf74712c91c

downloading training.tar.gz
training.tar.gz: 100%|██████████| 1.21M/1.21M [00:03<00:00, 323kB/s]
downloading validation.tar.gz
validation.tar.gz: 100%|██████████| 46.3k/46.3k [00:00<00:00, 91.8kB/s]
downloading mmt_task1_test2016.tar.gz
mmt_task1_test2016.tar.gz: 100%|██████████| 66.2k/66.2k [00:00<00:00, 86.3kB/s]

### Load model & optimizer

In [6]:
import torch.nn as nn
import torch.optim as optim

class Transformer(nn.Module):
    def __init__(self, config):
        super(Transformer,self).__init__()
        self.encoder_embedding = nn.Embedding(len(SRC.vocab),config.emb_dim)
        self.decoder_embedding = nn.Embedding(len(TRG.vocab),config.emb_dim)
        self.transformer = nn.Transformer(d_model=config.emb_dim, nhead=config.attention_heads, 
                       num_encoder_layers=config.encoder_layers, num_decoder_layers=config.decoder_layers,
                       dim_feedforward=config.ffn_dim, dropout=config.dropout, activation='gelu')
        self.prediction_head = nn.Linear(config.emb_dim,len(TRG.vocab))
        
    def forward(self, src, trg):
        src_emb = self.encoder_embedding(src)
        trg_emb = self.decoder_embedding(trg)
        output = self.transformer(src_emb, trg_emb,
                       tgt_mask=self.transformer.generate_square_subsequent_mask(trg.size(0)).to(device),
                       src_key_padding_mask=src.eq(PAD_IDX).permute(1,0).to(device),
                       memory_key_padding_mask=src.eq(PAD_IDX).permute(1,0).to(device),
                       tgt_key_padding_mask=trg.eq(PAD_IDX).permute(1,0).to(device))
        prediction = self.prediction_head(output)
        return prediction

CLIP = 1
    
model = Transformer(config)
model.to(device)
optimizer = optim.Adam(model.parameters(), lr=config.lr)
criterion = nn.CrossEntropyLoss(ignore_index=PAD_IDX)

### Train & Evaluation

In [7]:
from nltk.translate.bleu_score import corpus_bleu, sentence_bleu
from tqdm import tqdm
import json

best_valid_loss = float('inf')

def train(model: nn.Module,
          iterator: BucketIterator,
          optimizer: optim.Optimizer,
          criterion: nn.Module,
          clip: float):
    model.train()

    epoch_loss = 0

    for idx, batch in enumerate(iterator):
        src = batch.src
        trg = batch.trg

        optimizer.zero_grad()

        output = model(src, trg)

        output = output[:-1].reshape(-1, output.shape[-1])
        trg = trg[1:].reshape(-1)

        loss = criterion(output, trg)

        loss.backward()

        torch.nn.utils.clip_grad_norm_(model.parameters(), clip)

        optimizer.step()

        epoch_loss += loss.item()

    return epoch_loss / len(iterator)


def evaluate(model: nn.Module,
             iterator: BucketIterator,
             criterion: nn.Module):
    model.eval()

    epoch_loss = 0

    with torch.no_grad():
        for _, batch in enumerate(iterator):
            src = batch.src
            trg = batch.trg
            output = model(src, trg)            
            
            output = output[:-1].reshape(-1, output.shape[-1])
            
            trg = trg[1:].reshape(-1)

            loss = criterion(output, trg)

            epoch_loss += loss.item()

    return epoch_loss / len(iterator)

def measure_BLEU(model: nn.Module,
             iterator: BucketIterator
                ):
    model.eval()
    iterator.batch_size = 1
    BLEU_scores = list()
    
    with torch.no_grad():
        for idx, batch in enumerate(iterator):
            src = batch.src
            trg = batch.trg
            output = model(src, trg)           
            predicted = [TRG.vocab.itos[token] for token in output[:-1].argmax(dim=2).squeeze().tolist() if token!=PAD_IDX]
            GT = [TRG.vocab.itos[token] for token in trg[1:].squeeze().tolist() if token!=PAD_IDX]
            BLEU_scores.append(sentence_bleu([GT], predicted))
    return sum(BLEU_scores)/len(BLEU_scores)
                         
queue=0
for epoch in tqdm(range(config.nepochs), total=config.nepochs):
    train_loss = train(model, train_iterator, optimizer, criterion, CLIP)
    valid_loss = evaluate(model, valid_iterator, criterion)
    test_bleu = measure_BLEU(model, test_iterator)
    print("Test BLEU score : {}".format(test_bleu))
    print("Epoch : {} / Training loss : {} / Validation loss : {}".format(epoch+1, train_loss, valid_loss))

    if best_valid_loss < valid_loss:
        queue+=1
        if queue>1:
            break
    else:
        best_valid_loss = valid_loss
        queue = 0

test_bleu = measure_BLEU(model, test_iterator)
print("Test BLEU score : {}".format(test_bleu))
        
with open('config.json','w') as f:
    json.dump(vars(config),f)
torch.save(model.state_dict(),'model.pt')

The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 2-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
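
These warnings appear because `measure_BLEU` scores each test sentence separately, and BLEU is a geometric mean of the 1-gram to 4-gram precisions: a single order with zero matches pulls the whole score to 0. The sketch below is a simplified, single-reference re-implementation for illustration only, not NLTK's code. The `epsilon` floor is a crude stand-in for the smoothing that the warning suggests, and the example sentences are made up.

```python
import math
from collections import Counter


def ngram_precision(reference, hypothesis, n):
    """Clipped n-gram precision of hypothesis against a single reference."""
    hyp = Counter(tuple(hypothesis[i:i + n]) for i in range(len(hypothesis) - n + 1))
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    total = sum(hyp.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in hyp.items())
    return matched / total


def simple_bleu(reference, hypothesis, max_n=4, epsilon=0.0):
    """Brevity penalty times the geometric mean of the 1..max_n precisions.

    Any zero precision is replaced by epsilon (an assumed, crude smoothing).
    """
    if not hypothesis:
        return 0.0
    precisions = [ngram_precision(reference, hypothesis, n) for n in range(1, max_n + 1)]
    precisions = [p if p > 0 else epsilon for p in precisions]
    if min(precisions) == 0:
        return 0.0  # one empty n-gram order zeroes the geometric mean
    log_mean = sum(math.log(p) for p in precisions) / max_n
    brevity = min(1.0, math.exp(1 - len(reference) / len(hypothesis)))
    return brevity * math.exp(log_mean)


reference = "a man is riding a bike".split()
hypothesis = "a man rides a bike".split()

print(simple_bleu(reference, hypothesis))  # 0.0, no 3-gram or 4-gram matches
print(simple_bleu(reference, hypothesis, epsilon=0.1) > 0)  # True
```

Early in training most hypotheses share no 3-grams or 4-grams with the reference, so many per-sentence scores are 0 and the averaged BLEU starts near zero.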
  2%|▏         | 1/48 [00:23<18:31, 23.66s/it]

Test BLEU score : 0.010262044585956578
Epoch : 1 / Training loss : 5.362898629809183 / Validation loss : 4.265905380249023


  4%|▍         | 2/48 [00:47<18:06, 23.61s/it]

Test BLEU score : 0.027407518816389028
Epoch : 2 / Training loss : 4.0404180118015836 / Validation loss : 3.7211442788441977


  6%|▋         | 3/48 [01:10<17:42, 23.61s/it]

Test BLEU score : 0.04178883427525564
Epoch : 3 / Training loss : 3.649034201152741 / Validation loss : 3.4536314805348716


  8%|▊         | 4/48 [01:34<17:15, 23.54s/it]

Test BLEU score : 0.05326528730047424
Epoch : 4 / Training loss : 3.406846523284912 / Validation loss : 3.2642202377319336


 10%|█         | 5/48 [01:57<16:54, 23.59s/it]

Test BLEU score : 0.06928527777239835
Epoch : 5 / Training loss : 3.212297477419414 / Validation loss : 3.1009174982706704


 12%|█▎        | 6/48 [02:21<16:29, 23.57s/it]

Test BLEU score : 0.09323051455518516
Epoch : 6 / Training loss : 3.0443095782446483 / Validation loss : 2.971119244893392


 15%|█▍        | 7/48 [02:44<16:05, 23.55s/it]

Test BLEU score : 0.11007130464918337
Epoch : 7 / Training loss : 2.902961083820888 / Validation loss : 2.8613253434499106


 17%|█▋        | 8/48 [03:08<15:39, 23.48s/it]

Test BLEU score : 0.11695918078484181
Epoch : 8 / Training loss : 2.7799501343378945 / Validation loss : 2.770535866419474


 19%|█▉        | 9/48 [03:31<15:15, 23.46s/it]

Test BLEU score : 0.13002600651505408
Epoch : 9 / Training loss : 2.6704316782572914 / Validation loss : 2.7012500762939453


 21%|██        | 10/48 [03:55<14:50, 23.44s/it]

Test BLEU score : 0.14190721760302927
Epoch : 10 / Training loss : 2.576233038826594 / Validation loss : 2.6414577960968018


 23%|██▎       | 11/48 [04:18<14:27, 23.44s/it]

Test BLEU score : 0.156736784984016
Epoch : 11 / Training loss : 2.490261709879315 / Validation loss : 2.5863966147104898


 25%|██▌       | 12/48 [04:41<14:01, 23.38s/it]

Test BLEU score : 0.16838024001412813
Epoch : 12 / Training loss : 2.414358922413417 / Validation loss : 2.5322934786478677


 27%|██▋       | 13/48 [05:05<13:39, 23.42s/it]

Test BLEU score : 0.17546919544639988
Epoch : 13 / Training loss : 2.345890306291126 / Validation loss : 2.4907856782277427


 29%|██▉       | 14/48 [05:28<13:16, 23.42s/it]

Test BLEU score : 0.18753788093347626
Epoch : 14 / Training loss : 2.2885946621970525 / Validation loss : 2.4578014612197876


 31%|███▏      | 15/48 [05:52<12:52, 23.42s/it]

Test BLEU score : 0.19174336997755842
Epoch : 15 / Training loss : 2.23069277263823 / Validation loss : 2.428655664126078


 33%|███▎      | 16/48 [06:15<12:25, 23.29s/it]

Test BLEU score : 0.20004678223500505
Epoch : 16 / Training loss : 2.1839477050872076 / Validation loss : 2.4107507467269897


 35%|███▌      | 17/48 [06:38<12:01, 23.28s/it]

Test BLEU score : 0.20093750573012065
Epoch : 17 / Training loss : 2.1386271753008406 / Validation loss : 2.3907997210820517


 38%|███▊      | 18/48 [07:01<11:37, 23.26s/it]

Test BLEU score : 0.21257213748047385
Epoch : 18 / Training loss : 2.0972116353019836 / Validation loss : 2.3674721320470176


 40%|███▉      | 19/48 [07:24<11:16, 23.32s/it]

Test BLEU score : 0.20562000701381897
Epoch : 19 / Training loss : 2.0567974098145014 / Validation loss : 2.3558127085367837


 42%|████▏     | 20/48 [07:48<10:52, 23.32s/it]

Test BLEU score : 0.21451025168983467
Epoch : 20 / Training loss : 2.0191874901453652 / Validation loss : 2.3373883962631226


 44%|████▍     | 21/48 [08:11<10:29, 23.30s/it]

Test BLEU score : 0.22023199397012427
Epoch : 21 / Training loss : 1.9893492081808666 / Validation loss : 2.3271615505218506


 46%|████▌     | 22/48 [08:34<10:03, 23.22s/it]

Test BLEU score : 0.22262641502928893
Epoch : 22 / Training loss : 1.9575032639125036 / Validation loss : 2.316816528638204


 48%|████▊     | 23/48 [08:58<09:43, 23.32s/it]

Test BLEU score : 0.22622540577457723
Epoch : 23 / Training loss : 1.9288215845350236 / Validation loss : 2.313079317410787


 50%|█████     | 24/48 [09:21<09:20, 23.37s/it]

Test BLEU score : 0.22185916237094197
Epoch : 24 / Training loss : 1.9030478416927277 / Validation loss : 2.3021220366160073


 52%|█████▏    | 25/48 [09:45<08:59, 23.47s/it]

Test BLEU score : 0.23050930765280397
Epoch : 25 / Training loss : 1.8775618757520403 / Validation loss : 2.29301385084788


 54%|█████▍    | 26/48 [10:08<08:35, 23.43s/it]

Test BLEU score : 0.22741305133547268
Epoch : 26 / Training loss : 1.8537803850476704 / Validation loss : 2.2942447662353516


 56%|█████▋    | 27/48 [10:31<08:11, 23.41s/it]

Test BLEU score : 0.23400934012011967
Epoch : 27 / Training loss : 1.830574671427409 / Validation loss : 2.282071669896444


 58%|█████▊    | 28/48 [10:55<07:47, 23.40s/it]

Test BLEU score : 0.23217780855281583
Epoch : 28 / Training loss : 1.8098512074304005 / Validation loss : 2.2838473320007324


 60%|██████    | 29/48 [11:18<07:25, 23.44s/it]

Test BLEU score : 0.23194365722265647
Epoch : 29 / Training loss : 1.7897365036464872 / Validation loss : 2.278661370277405


 62%|██████▎   | 30/48 [11:42<07:01, 23.44s/it]

Test BLEU score : 0.2370295336576741
Epoch : 30 / Training loss : 1.7699985750137814 / Validation loss : 2.2696738243103027


 65%|██████▍   | 31/48 [12:05<06:38, 23.46s/it]

Test BLEU score : 0.2373767309466015
Epoch : 31 / Training loss : 1.7473786728722709 / Validation loss : 2.2684088945388794


 67%|██████▋   | 32/48 [12:29<06:15, 23.45s/it]

Test BLEU score : 0.23946872343162895
Epoch : 32 / Training loss : 1.7305705452722215 / Validation loss : 2.271376132965088


 67%|██████▋   | 32/48 [12:52<06:26, 24.15s/it]

Test BLEU score : 0.24151288171215976
Epoch : 33 / Training loss : 1.7156043771713498 / Validation loss : 2.268957257270813





Test BLEU score : 0.24151288171215976
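
Training stopped after epoch 33 rather than running all 48 epochs because of the early-stopping rule in the training loop: it breaks once the validation loss has been worse than the best value for two consecutive epochs. The sketch below replays that rule on the last four logged validation losses; the function name `stopping_epoch` and the `patience` parameter are introduced here for illustration.

```python
def stopping_epoch(valid_losses, patience=1):
    """Return the 1-based position at which the loop would break, or None.

    Mirrors the template's rule: count consecutive epochs whose loss is worse
    than the best seen so far, and stop when the count exceeds `patience`.
    """
    best = float("inf")
    queue = 0
    for position, loss in enumerate(valid_losses, start=1):
        if best < loss:
            queue += 1
            if queue > patience:
                return position
        else:
            best = loss
            queue = 0
    return None


# Validation losses logged for epochs 30 to 33.
tail = [2.2696738243103027, 2.2684088945388794, 2.271376132965088, 2.268957257270813]
print(stopping_epoch(tail))  # 4, i.e. epoch 33 of the run
```

The best validation loss was reached at epoch 31, while the saved `model.pt` holds the weights from the final epoch, since the template saves the model only after the loop ends.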
