# Memory Information

In [None]:
import psutil
def get_size(num_bytes, suffix="B"):
    """Scale a byte count to a human-readable string, e.g. 1253656 -> '1.20MB'."""
    factor = 1024
    for unit in ["", "K", "M", "G", "T", "P"]:
        if num_bytes < factor:
            return f"{num_bytes:.2f}{unit}{suffix}"
        num_bytes /= factor
    # values of 1024 PB or more fall through the loop; report them in exabytes
    return f"{num_bytes:.2f}E{suffix}"

print("="*40, "Memory Information", "="*40)
svmem = psutil.virtual_memory()
print(f"Total: {get_size(svmem.total)}")
print(f"Available: {get_size(svmem.available)}")
print(f"Used: {get_size(svmem.used)}")
print(f"Percentage: {svmem.percent}%")

Total: 25.51GB
Available: 24.60GB
Used: 593.19MB
Percentage: 3.6%


# GPU Information

In [None]:
! nvidia-smi

Thu Dec 10 10:38:55 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.45.01    Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   46C    P0    30W / 250W |      0MiB / 16280MiB |      0%      Default |
|                               |                      |                 ERR! |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

# Preparing the Dataset

In the code, we use **FastText word embeddings** for Hindi and **GloVe word embeddings** for English, loaded through the `TorchText` library. However, the original implementation for loading FastText word embeddings does not work well in the Colab environment, so we modified the code by writing a custom class that does.

In [None]:
! pip install indic-nlp-library
! pip install rouge-score

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import torchtext

from torchtext.datasets import Multi30k
from torchtext.data import Field, BucketIterator
from torch.utils.tensorboard import SummaryWriter
from torchtext.vocab import Vectors

import numpy as np
import pandas as pd
import random
import os
import nltk
import math
import time
from google.colab import drive
from tqdm.notebook import tqdm
from indicnlp.tokenize import indic_tokenize
from rouge_score import rouge_scorer

nltk.download('punkt')

Collecting indic-nlp-library
  Downloading https://files.pythonhosted.org/packages/2f/51/f4e4542a226055b73a621ad442c16ae2c913d6b497283c99cae7a9661e6c/indic_nlp_library-0.71-py3-none-any.whl
Collecting morfessor
  Downloading https://files.pythonhosted.org/packages/39/e6/7afea30be2ee4d29ce9de0fa53acbb033163615f849515c0b1956ad074ee/Morfessor-2.0.6-py3-none-any.whl
Installing collected packages: morfessor, indic-nlp-library
Successfully installed indic-nlp-library-0.71 morfessor-2.0.6
Collecting rouge-score
  Downloading https://files.pythonhosted.org/packages/1f/56/a81022436c08b9405a5247b71635394d44fe7e1dbedc4b28c740e09c2840/rouge_score-0.0.4-py2.py3-none-any.whl
Installing collected packages: rouge-score
Successfully installed rouge-score-0.0.4
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

In [None]:
SEED = 1234

random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed(SEED)
torch.backends.cudnn.deterministic = True

In [None]:
# setting up the configurations
drive.mount('/content/drive')

# setting the device variable
device = torch.device('cuda:0') if torch.cuda.is_available() else torch.device('cpu')

Mounted at /content/drive


##### `def english_tokenize(sentence)`
- `sentence` - Input sentence in `str` format

Tokenizes an English sentence with the **NLTK Punkt** tokenizer and lowercases every token.

##### `def hindi_tokenize(sentence)`
- `sentence` - Input sentence in `str` format

Tokenizes a Hindi sentence with the **Indic-NLP** library.

##### `def parse_using_torchtext(csv_file_name, batch_size=16)`
- `csv_file_name`: Path to the CSV file to be parsed
- `batch_size`: `BucketIterator` batch size

This function builds the English and Hindi `Field` objects (tokenizer plus vocabulary) from the dataset in `csv_file_name`. The file is assumed to have an `english_sentence` column and a `hindi_sentence` column. Each vocabulary keeps the tokens that occur at least twice, up to a maximum of 50,000 tokens. After building the fields, the function wraps the dataset in a `BucketIterator`, which batches sentences of similar length together so that training wastes less computation on padding.
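The vocabulary step can be illustrated without TorchText. The sketch below is a simplified, pure-Python approximation of what `build_vocab(..., max_size=50000, min_freq=2)` does in the code; the helper name `build_vocab_sketch` and the toy sentences are hypothetical, and the order of the special tokens is an assumption about the legacy TorchText behaviour.

```python
from collections import Counter

def build_vocab_sketch(tokenized_sentences, max_size=50000, min_freq=2,
                       specials=("<unk>", "<pad>", "<sos>", "<eos>")):
    """Hypothetical helper: frequency-capped vocabulary with special tokens first."""
    counts = Counter(token for sentence in tokenized_sentences for token in sentence)
    # most frequent tokens first; ties are broken alphabetically for determinism
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    itos = list(specials)
    for token, freq in ordered:
        # stop at the first rare token or once max_size regular tokens are kept
        if freq < min_freq or len(itos) >= max_size + len(specials):
            break
        itos.append(token)
    stoi = {token: index for index, token in enumerate(itos)}
    return itos, stoi

sentences = [["the", "court", "said"], ["the", "court", "ruled"], ["the", "boy"]]
itos, stoi = build_vocab_sketch(sentences)
print(itos)  # ['<unk>', '<pad>', '<sos>', '<eos>', 'the', 'court']
```

Tokens that fall below `min_freq` (here `said`, `ruled` and `boy`) are not stored and would be mapped to `<unk>` at lookup time.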

##### `def parse_using_field(csv_file_name, english_field, hindi_field, english_col_name, hindi_col_name, batch_size=16)`
- `csv_file_name`: Path to the csv file to be parsed
- `english_field`: English Field object that stores the tokenizer and vocabulary for English
- `hindi_field`: Hindi Field object that stores the tokenizer and vocabulary for Hindi
- `english_col_name`: The column in the CSV file that contains the English sentences
- `hindi_col_name`: The column in the CSV file that contains the Hindi sentences
- `batch_size`: `BucketIterator` batch size

This function parses the dataset into a `BucketIterator` using predefined fields (tokenizer and vocabulary). While the previous function generates both the fields and the iterator, this function only generates the iterator. This is useful when the networks are trained over multiple datasets: in that case, a single pair of fields must be reused for every dataset so that token indices stay consistent.
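To show why bucketing helps, here is a minimal pure-Python sketch of the idea behind a bucket iterator: sort sentences by length, cut them into batches, and pad each batch only up to its own longest sentence. The function `bucket_batches` is a hypothetical illustration; the real `BucketIterator` also shuffles and handles tensors, which this sketch leaves out.

```python
def bucket_batches(tokenized_sentences, batch_size, pad_token="<pad>"):
    """Hypothetical helper: group sentences of similar length and pad per batch."""
    ordered = sorted(tokenized_sentences, key=len)
    batches = []
    for start in range(0, len(ordered), batch_size):
        batch = ordered[start:start + batch_size]
        longest = len(batch[-1])  # the batch is sorted, so the last one is longest
        batches.append([s + [pad_token] * (longest - len(s)) for s in batch])
    return batches

def count_padding(batches, pad_token="<pad>"):
    """Total number of padding tokens across all batches."""
    return sum(sentence.count(pad_token) for batch in batches for sentence in batch)

sentences = [["a"], ["a", "b", "c", "d", "e"], ["a", "b"], ["v", "w", "x", "y", "z"]]
batches = bucket_batches(sentences, batch_size=2)
print(count_padding(batches))  # 1
```

Batching the same four sentences in their original order would need seven padding tokens instead of one.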

##### `def parse_dataset(csv_file_name, english_col_name, hindi_col_name)`
- `csv_file_name`: Path to the CSV file to be parsed
- `english_col_name`: The column in the CSV file that contains the English sentences
- `hindi_col_name`: The column in the CSV file that contains the Hindi sentences

This function parses a CSV file into two lists of token lists, one per language. It is useful for testing the model on sentences drawn from the test set.

In [None]:
# tokenizers for both the languages
def english_tokenize(sentence):
    return [word.lower() for word in nltk.tokenize.word_tokenize(str(sentence))]

def hindi_tokenize(sentence):
    return indic_tokenize.trivial_tokenize(str(sentence), lang='hi')

# for loading the dataset in an appropriate format
def parse_dataset(csv_file_name, english_col_name, hindi_col_name):

    # to be returned
    english_sentences = []
    hindi_sentences = []

    # reading the csv file
    csv_file_df = pd.read_csv(csv_file_name)

    for index, row in tqdm(csv_file_df.iterrows()):
        english_sentences.append(english_tokenize(str(row[english_col_name])))
        hindi_sentences.append(hindi_tokenize(str(row[hindi_col_name])))

    return english_sentences, hindi_sentences

# parsing the dataset using the torchtext utility
def parse_using_torchtext(csv_file_name, batch_size=16):

    # defining the fields
    english_field = torchtext.data.Field(
        sequential=True,
        init_token='<sos>',
        eos_token='<eos>',
        tokenize=english_tokenize,
        batch_first=False
    )
    hindi_field = torchtext.data.Field(
        sequential=True,
        init_token='<sos>',
        eos_token='<eos>',
        tokenize=hindi_tokenize,
        batch_first=False
    )

    # loading the data
    train_data = torchtext.data.TabularDataset.splits(
        path=os.path.dirname(csv_file_name),
        train=os.path.basename(csv_file_name),
        format='csv',
        fields={'english_sentence': ('english_sentence', english_field), 'hindi_sentence': ('hindi_sentence', hindi_field)},
        skip_header=False
    )[0]

    # building the vocabulary
    english_field.build_vocab(train_data, max_size=50000, min_freq=2)
    hindi_field.build_vocab(train_data, max_size=50000, min_freq=2)

    # loading the bucket iterator
    train_iterator = torchtext.data.BucketIterator.splits(
        (train_data,),
        (batch_size,),
        device=device,
        sort_key=lambda x: len(x.english_sentence)
    )[0]

    return english_field, hindi_field, train_data, train_iterator

# construction of bucket iterator using predefined field
def parse_using_field(csv_file_name, english_field, hindi_field, english_col_name, hindi_col_name, batch_size=16):
    
    # loading the data
    train_data = torchtext.data.TabularDataset.splits(
        path=os.path.dirname(csv_file_name),
        train=os.path.basename(csv_file_name),
        format='csv',
        fields={english_col_name: ('english_sentence', english_field), hindi_col_name: ('hindi_sentence', hindi_field)},
        skip_header=False
    )[0]

    # loading the bucket iterator
    train_iterator = torchtext.data.BucketIterator.splits(
        (train_data,),
        (batch_size,),
        device=device,
        sort_key=lambda x: len(x.english_sentence)
    )[0]

    return train_data, train_iterator

# Building the Seq2Seq Model

##### `class Encoder(nn.Module)`
This class defines the encoder of the GRU-based cross-lingual summarizer. It embeds the source tokens, runs them through a bidirectional GRU, and passes the final forward and backward hidden states through a linear layer and `tanh` to produce the initial decoder hidden state.

##### `class Attention(nn.Module)`
This class defines the attention mechanism. It scores every encoder output against the current decoder hidden state and returns softmax-normalized weights over the source positions.

##### `class Decoder(nn.Module)`
This class defines the decoder, which produces one token per call. It embeds the previous token, builds an attention-weighted sum of the encoder outputs, feeds both to a GRU, and predicts the next token from the GRU output, the weighted context and the embedding.

##### `class Seq2Seq(nn.Module)`
This class combines the encoder, attention and decoder classes defined above. It decodes the target one token at a time and uses teacher forcing with probability `teacher_forcing_ratio`. All four classes are commented with the tensor shapes at each step.
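The last two steps of the attention computation (normalizing the scores and forming the weighted context) can be sketched in plain Python. This is an illustration only: the scores are given directly rather than computed by the learned layers `attn` and `v`, and the function names and toy numbers are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_context(scores, encoder_outputs):
    """Weights over source positions and the weighted sum of encoder outputs."""
    weights = softmax(scores)
    dim = len(encoder_outputs[0])
    context = [sum(w * vec[d] for w, vec in zip(weights, encoder_outputs))
               for d in range(dim)]
    return weights, context

# three source positions, each with a 2-dimensional encoder output
scores = [0.0, 0.0, math.log(2.0)]
encoder_outputs = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
weights, context = attention_context(scores, encoder_outputs)
print(weights)  # approximately [0.25, 0.25, 0.5]
print(context)  # approximately [1.25, 1.25]
```

In the `Decoder` class the same weighted sum is computed for a whole batch at once with `torch.bmm`.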

In [None]:
class Encoder(nn.Module):
    def __init__(self, input_dim, emb_dim, enc_hid_dim, dec_hid_dim, dropout):
        super().__init__()
        
        self.embedding = nn.Embedding(input_dim, emb_dim)
        
        self.rnn = nn.GRU(emb_dim, enc_hid_dim, bidirectional = True)
        
        self.fc = nn.Linear(enc_hid_dim * 2, dec_hid_dim)
        
        self.dropout = nn.Dropout(dropout)
        
    def forward(self, src):
        
        #src = [src len, batch size]
        
        embedded = self.dropout(self.embedding(src))
        
        #embedded = [src len, batch size, emb dim]
        
        outputs, hidden = self.rnn(embedded)
                
        #outputs = [src len, batch size, hid dim * num directions]
        #hidden = [n layers * num directions, batch size, hid dim]
        
        #hidden is stacked [forward_1, backward_1, forward_2, backward_2, ...]
        #outputs are always from the last layer
        
        #hidden [-2, :, : ] is the last of the forwards RNN 
        #hidden [-1, :, : ] is the last of the backwards RNN
        
        #initial decoder hidden is final hidden state of the forwards and backwards 
        #  encoder RNNs fed through a linear layer
        hidden = torch.tanh(self.fc(torch.cat((hidden[-2,:,:], hidden[-1,:,:]), dim = 1)))
        
        #outputs = [src len, batch size, enc hid dim * 2]
        #hidden = [batch size, dec hid dim]
        
        return outputs, hidden

class Attention(nn.Module):
    def __init__(self, enc_hid_dim, dec_hid_dim):
        super().__init__()
        
        self.attn = nn.Linear((enc_hid_dim * 2) + dec_hid_dim, dec_hid_dim)
        self.v = nn.Linear(dec_hid_dim, 1, bias = False)
        
    def forward(self, hidden, encoder_outputs):
        
        #hidden = [batch size, dec hid dim]
        #encoder_outputs = [src len, batch size, enc hid dim * 2]
        
        batch_size = encoder_outputs.shape[1]
        src_len = encoder_outputs.shape[0]
        
        #repeat decoder hidden state src_len times
        hidden = hidden.unsqueeze(1).repeat(1, src_len, 1)
        
        encoder_outputs = encoder_outputs.permute(1, 0, 2)
        
        #hidden = [batch size, src len, dec hid dim]
        #encoder_outputs = [batch size, src len, enc hid dim * 2]
        
        energy = torch.tanh(self.attn(torch.cat((hidden, encoder_outputs), dim = 2))) 
        
        #energy = [batch size, src len, dec hid dim]

        attention = self.v(energy).squeeze(2)
        
        #attention= [batch size, src len]
        
        return F.softmax(attention, dim=1)

class Decoder(nn.Module):
    def __init__(self, output_dim, emb_dim, enc_hid_dim, dec_hid_dim, dropout, attention):
        super().__init__()

        self.output_dim = output_dim
        self.attention = attention
        
        self.embedding = nn.Embedding(output_dim, emb_dim)
        
        self.rnn = nn.GRU((enc_hid_dim * 2) + emb_dim, dec_hid_dim)
        
        self.fc_out = nn.Linear((enc_hid_dim * 2) + dec_hid_dim + emb_dim, output_dim)
        
        self.dropout = nn.Dropout(dropout)
        
    def forward(self, input, hidden, encoder_outputs):
             
        #input = [batch size]
        #hidden = [batch size, dec hid dim]
        #encoder_outputs = [src len, batch size, enc hid dim * 2]
        
        input = input.unsqueeze(0)
        
        #input = [1, batch size]
        
        embedded = self.dropout(self.embedding(input))
        
        #embedded = [1, batch size, emb dim]
        
        a = self.attention(hidden, encoder_outputs)
                
        #a = [batch size, src len]
        
        a = a.unsqueeze(1)
        
        #a = [batch size, 1, src len]
        
        encoder_outputs = encoder_outputs.permute(1, 0, 2)
        
        #encoder_outputs = [batch size, src len, enc hid dim * 2]
        
        weighted = torch.bmm(a, encoder_outputs)
        
        #weighted = [batch size, 1, enc hid dim * 2]
        
        weighted = weighted.permute(1, 0, 2)
        
        #weighted = [1, batch size, enc hid dim * 2]
        
        rnn_input = torch.cat((embedded, weighted), dim = 2)
        
        #rnn_input = [1, batch size, (enc hid dim * 2) + emb dim]
            
        output, hidden = self.rnn(rnn_input, hidden.unsqueeze(0))
        
        #output = [seq len, batch size, dec hid dim * n directions]
        #hidden = [n layers * n directions, batch size, dec hid dim]
        
        #seq len, n layers and n directions will always be 1 in this decoder, therefore:
        #output = [1, batch size, dec hid dim]
        #hidden = [1, batch size, dec hid dim]
        #this also means that output == hidden
        assert (output == hidden).all()
        
        embedded = embedded.squeeze(0)
        output = output.squeeze(0)
        weighted = weighted.squeeze(0)
        
        prediction = self.fc_out(torch.cat((output, weighted, embedded), dim = 1))
        
        #prediction = [batch size, output dim]
        
        return prediction, hidden.squeeze(0)

class Seq2Seq(nn.Module):
    def __init__(self, encoder, decoder, device):
        super().__init__()
        
        self.encoder = encoder
        self.decoder = decoder
        self.device = device
        
    def forward(self, src, trg, teacher_forcing_ratio = 0.5):
        
        #src = [src len, batch size]
        #trg = [trg len, batch size]
        #teacher_forcing_ratio is probability to use teacher forcing
        #e.g. if teacher_forcing_ratio is 0.75 we use teacher forcing 75% of the time
        
        batch_size = src.shape[1]
        trg_len = trg.shape[0]
        trg_vocab_size = self.decoder.output_dim
        
        #tensor to store decoder outputs
        outputs = torch.zeros(trg_len, batch_size, trg_vocab_size).to(self.device)
        
        #encoder_outputs is all hidden states of the input sequence, back and forwards
        #hidden is the final forward and backward hidden states, passed through a linear layer
        encoder_outputs, hidden = self.encoder(src)
                
        #first input to the decoder is the <sos> tokens
        input = trg[0,:]
        
        for t in range(1, trg_len):
            
            #insert input token embedding, previous hidden state and all encoder hidden states
            #receive output tensor (predictions) and new hidden state
            output, hidden = self.decoder(input, hidden, encoder_outputs)
            
            #place predictions in a tensor holding predictions for each token
            outputs[t] = output
            
            #decide if we are going to use teacher forcing or not
            teacher_force = random.random() < teacher_forcing_ratio
            
            #get the highest predicted token from our predictions
            top1 = output.argmax(1) 
            
            #if teacher forcing, use actual next token as next input
            #if not, use predicted token
            input = trg[t] if teacher_force else top1

        return outputs

# Training the Seq2Seq Model

In [None]:
# for pretraining using hindEnCorp parallel corpus
english_field, hindi_field, train_data, train_iterator = parse_using_torchtext('drive/MyDrive/cs626_dataset/Hindi_English_Truncated_Corpus.csv')

In [None]:
INPUT_DIM = len(english_field.vocab)
OUTPUT_DIM = len(hindi_field.vocab)
ENC_EMB_DIM = 256
DEC_EMB_DIM = 256
ENC_HID_DIM = 512
DEC_HID_DIM = 512
ENC_DROPOUT = 0.5
DEC_DROPOUT = 0.5

attn = Attention(ENC_HID_DIM, DEC_HID_DIM)
enc = Encoder(INPUT_DIM, ENC_EMB_DIM, ENC_HID_DIM, DEC_HID_DIM, ENC_DROPOUT)
dec = Decoder(OUTPUT_DIM, DEC_EMB_DIM, ENC_HID_DIM, DEC_HID_DIM, DEC_DROPOUT, attn)

model = Seq2Seq(enc, dec, device).to(device)

We use a simplified version of the weight initialization scheme used in the paper. Here, we initialize all biases to zero and draw all weights from a normal distribution with mean 0 and standard deviation 0.01, i.e. $\mathcal{N}(0, 0.01^2)$.

In [None]:
def init_weights(m):
    for name, param in m.named_parameters():
        if 'weight' in name:
            nn.init.normal_(param.data, mean=0, std=0.01)
        else:
            nn.init.constant_(param.data, 0)
            
model.apply(init_weights)

Seq2Seq(
  (encoder): Encoder(
    (embedding): Embedding(36443, 256)
    (rnn): GRU(256, 512, bidirectional=True)
    (fc): Linear(in_features=1024, out_features=512, bias=True)
    (dropout): Dropout(p=0.5, inplace=False)
  )
  (decoder): Decoder(
    (attention): Attention(
      (attn): Linear(in_features=1536, out_features=512, bias=True)
      (v): Linear(in_features=512, out_features=1, bias=False)
    )
    (embedding): Embedding(42896, 256)
    (rnn): GRU(1280, 512)
    (fc_out): Linear(in_features=1792, out_features=42896, bias=True)
    (dropout): Dropout(p=0.5, inplace=False)
  )
)

##### `def count_parameters(model)`
- `model`: Seq2Seq model object

This function returns the number of trainable parameters in the model

In [None]:
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f'The model has {count_parameters(model):,} trainable parameters')
optimizer = optim.Adam(model.parameters())
TRG_PAD_IDX = hindi_field.vocab.stoi[hindi_field.pad_token]
criterion = nn.CrossEntropyLoss(ignore_index = TRG_PAD_IDX)
# model.load_state_dict(torch.load('model.pt'))

The model has 103,656,592 trainable parameters
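
As a sanity check, the reported parameter count can be reproduced by hand from the layer sizes in the printed architecture. The sketch below assumes the standard PyTorch parameter layout (a GRU has three gates, each with input weights, hidden weights and two bias vectors); the helper names are hypothetical.

```python
def embedding_params(vocab_size, emb_dim):
    return vocab_size * emb_dim

def linear_params(in_features, out_features, bias=True):
    return in_features * out_features + (out_features if bias else 0)

def gru_params(input_size, hidden_size, bidirectional=False):
    # per direction: 3 gates x (input weights + hidden weights + 2 biases)
    per_direction = 3 * hidden_size * (input_size + hidden_size + 2)
    return per_direction * (2 if bidirectional else 1)

encoder = (embedding_params(36443, 256)
           + gru_params(256, 512, bidirectional=True)
           + linear_params(1024, 512))
attention = linear_params(1536, 512) + linear_params(512, 1, bias=False)
decoder = (attention
           + embedding_params(42896, 256)
           + gru_params(1280, 512)
           + linear_params(1792, 42896))
total = encoder + decoder
print(f"{total:,}")  # 103,656,592
```

Most of the parameters (about 77 million) sit in the output layer `fc_out`, whose size grows with the Hindi vocabulary.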


##### `def train(model, iterator, optimizer, criterion, clip)`
- `model`: Object of the type `Seq2Seq`
- `iterator`: Iterator object that yields batches of tokenized training sentences
- `optimizer`: Adam optimizer
- `criterion`: Cross-entropy loss function
- `clip`: Maximum gradient norm; gradients with a larger total norm are rescaled to this value before the optimizer step

This function trains the model for one pass over the iterator, updating the parameters with the Adam optimizer. Each call stops early after 452 batches (the `i > 450` check). To reduce GPU memory usage, variables that live on the GPU are deleted as soon as they are no longer needed. The companion function `evaluate` computes the loss in the same way, but without gradient updates and with teacher forcing turned off.
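The role of `clip` can be illustrated with a small pure-Python sketch of clipping by global norm, which is the idea behind `torch.nn.utils.clip_grad_norm_`. The function `clip_by_global_norm` is a hypothetical stand-in that works on nested lists rather than tensors.

```python
import math

def clip_by_global_norm(gradients, max_norm):
    """Rescale all gradients together if their combined L2 norm exceeds max_norm."""
    total_norm = math.sqrt(sum(g * g for vector in gradients for g in vector))
    # the small constant avoids division by zero for all-zero gradients
    scale = min(1.0, max_norm / (total_norm + 1e-6))
    clipped = [[g * scale for g in vector] for vector in gradients]
    return clipped, total_norm

clipped, norm = clip_by_global_norm([[3.0, 4.0]], max_norm=1.0)
print(norm)     # 5.0
print(clipped)  # roughly [[0.6, 0.8]]
```

Gradients whose norm is already below `max_norm` are left unchanged, so clipping only intervenes on unusually large updates.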

In [None]:
def train(model, iterator, optimizer, criterion, clip):
    
    model.train()
    
    epoch_loss = 0
    
    for i, batch in tqdm(enumerate(iterator)):
        src = batch.english_sentence
        trg = batch.hindi_sentence
        
        optimizer.zero_grad()
        
        output = model(src, trg)
        
        #trg = [trg len, batch size]
        #output = [trg len, batch size, output dim]
        
        output_dim = output.shape[-1]
        
        output = output[1:].view(-1, output_dim)
        trg = trg[1:].view(-1)
        
        #trg = [(trg len - 1) * batch size]
        #output = [(trg len - 1) * batch size, output dim]
        
        loss = criterion(output, trg)
        
        loss.backward()
        
        torch.nn.utils.clip_grad_norm_(model.parameters(), clip)
        
        optimizer.step()
        
        epoch_loss += loss.item()
        # batches processed so far, so the average stays correct when stopping early
        num_batches = i + 1
        if i > 450:
            break

        # deleting to save GPU memory
        del src
        del trg
        del output
        del loss
        del i,batch
        
    return epoch_loss / num_batches

def evaluate(model, iterator, criterion):
    
    model.eval()
    
    epoch_loss = 0
    
    with torch.no_grad():
    
        for i, batch in tqdm(enumerate(iterator)):

            src = batch.english_sentence
            trg = batch.hindi_sentence

            output = model(src, trg, 0) #turn off teacher forcing

            #trg = [trg len, batch size]
            #output = [trg len, batch size, output dim]

            output_dim = output.shape[-1]
            
            output = output[1:].view(-1, output_dim)
            trg = trg[1:].view(-1)

            #trg = [(trg len - 1) * batch size]
            #output = [(trg len - 1) * batch size, output dim]

            loss = criterion(output, trg)

            epoch_loss += loss.item()
            # batches processed so far, so the average stays correct when stopping early
            num_batches = i + 1
            if i > 450:
                break
            
            # deleting to save GPU memory
            del src
            del trg
            del output
            del loss
            del i,batch
        
    return epoch_loss / num_batches



##### `def produce_output(model, sentence, max_length=100)`
- `model`: Seq2Seq model object
- `sentence`: Input sentence in `str` or tokenized list format
- `max_length`: Maximum length of the output sentence to be produced

The model consumes a sequence of integers, where each integer is the index of a word in the vocabulary, and it produces its output as a sequence of indices as well. This function hides that conversion: it accepts an English sentence, converts it to indices, runs the model, and converts the predicted indices back to tokens in the target language. It was used for the demo and is used extensively for validation.
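The index conversion on both sides of the model can be sketched as follows. The tiny vocabulary and the helper names `numericalize` and `denumericalize` are hypothetical; in the notebook the same lookups go through `english_field.vocab.stoi` and `hindi_field.vocab.itos`.

```python
# toy vocabulary: special tokens first, then regular tokens
itos = ["<unk>", "<pad>", "<sos>", "<eos>", "court", "the"]
stoi = {token: index for index, token in enumerate(itos)}

def numericalize(tokens, stoi):
    """Wrap tokens in <sos>/<eos> and map them to indices; unknown words map to <unk>."""
    wrapped = ["<sos>"] + list(tokens) + ["<eos>"]
    return [stoi.get(token, stoi["<unk>"]) for token in wrapped]

def denumericalize(indices, itos):
    """Map indices back to tokens."""
    return [itos[index] for index in indices]

indices = numericalize(["the", "court", "adjourned"], stoi)
print(indices)                        # [2, 5, 4, 0, 3]
print(denumericalize(indices, itos))  # ['<sos>', 'the', 'court', '<unk>', '<eos>']
```

The round trip is lossy: any word outside the vocabulary (here `adjourned`) comes back as `<unk>`.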

In [None]:
# function for producing the output sentence for a given input
def produce_output(model, sentence, max_length=100):

    # tokenization using custom function
    if type(sentence) == str:
        tokens = english_tokenize(sentence)
    else:
        tokens = [token.lower() for token in sentence]

    # Add <SOS> and <EOS> in beginning and end respectively
    tokens.insert(0, '<sos>')
    tokens.append('<eos>')

    # Go through each english token and convert to an index
    text_to_indices = [english_field.vocab.stoi[token] for token in tokens]

    # Convert to Tensor
    sentence_tensor = torch.LongTensor(text_to_indices).unsqueeze(1).to(device)

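    # encode the source once, then decode greedily one token at a time
    # (calling model(...) with a length-1 target skips the decoding loop and
    # returns all-zero scores, so the first "prediction" would always be index 0)
    model.eval()
    with torch.no_grad():
        encoder_outputs, hidden = model.encoder(sentence_tensor)

        outputs = [hindi_field.vocab.stoi["<sos>"]]
        for _ in range(max_length):
            previous_token = torch.LongTensor([outputs[-1]]).to(device)
            prediction, hidden = model.decoder(previous_token, hidden, encoder_outputs)

            best_guess = prediction.argmax(1).item()
            outputs.append(best_guess)

            if best_guess == hindi_field.vocab.stoi["<eos>"]:
                break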

    translated_sentence = [hindi_field.vocab.itos[idx] for idx in outputs]

    # remove start token
    return translated_sentence[1:]



##### `def report_performance(model, english_sentences, hindi_sentences)`
- `model`: Seq2Seq model object
- `english_sentences`: Sentences in list of list of tokens format
- `hindi_sentences`: Reference Sentences in list of list of tokens format

This function reports the average BLEU score, ROUGE-1 recall and ROUGE-L recall of the model, using `hindi_sentences` as the references. Only the first 100 sentence pairs are evaluated.
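For reference, ROUGE-1 recall is simply the fraction of reference unigrams that also appear in the candidate. The sketch below computes it directly on token lists; `rouge1_recall` is a hypothetical helper, not part of the `rouge-score` package. Working on pre-tokenized Hindi tokens also sidesteps the package's own tokenizer, which, as far as we can tell, keeps only ASCII letters and digits by default; that is an assumption worth checking against the installed version, since it could explain ROUGE values of exactly 0.0 on Devanagari text.

```python
from collections import Counter

def rouge1_recall(reference_tokens, candidate_tokens):
    """Fraction of reference unigrams matched in the candidate (with clipped counts)."""
    if not reference_tokens:
        return 0.0
    reference_counts = Counter(reference_tokens)
    candidate_counts = Counter(candidate_tokens)
    overlap = sum(min(count, candidate_counts[token])
                  for token, count in reference_counts.items())
    return overlap / len(reference_tokens)

reference = ["अदालत", "ने", "सचिव", "को", "तलब", "किया"]
candidate = ["अदालत", "ने", "सरकार", "को", "फटकार"]
print(rouge1_recall(reference, candidate))  # 0.5
```

Three of the six reference tokens occur in the candidate, which gives a recall of 0.5.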

In [None]:
# utilities for performance computation
def report_performance(model, english_sentences, hindi_sentences):

    english_sentences = english_sentences[:100]
    hindi_sentences = hindi_sentences[:100]
    # initializing the ROUGE scorer
    scorer = rouge_scorer.RougeScorer(['rouge1', 'rougeL'], use_stemmer=True)

    # quantities to be returned
    average_bleu = 0
    average_rouge1 = 0
    average_rougeL = 0

    for (english_sentence, hindi_sentence) in tqdm(zip(english_sentences, hindi_sentences)):
        output_sentence = produce_output(model, english_sentence)

        # bleu score
        average_bleu += nltk.translate.bleu_score.sentence_bleu([hindi_sentence], output_sentence[:-1])

        # rouge_scores
        rouge_obj = scorer.score(' '.join(hindi_sentence), ' '.join(output_sentence[:-1]))
        average_rouge1 += rouge_obj['rouge1'].recall
        average_rougeL += rouge_obj['rougeL'].recall

    # normalizing by the number of sentence pairs actually scored
    n = min(len(english_sentences), len(hindi_sentences))
    return {'bleu_score': average_bleu / n, 'rouge1_score': average_rouge1 / n, 'rougeL_score': average_rougeL / n}

# timing function

def epoch_time(start_time, end_time):
    elapsed_time = end_time - start_time
    elapsed_mins = int(elapsed_time / 60)
    elapsed_secs = int(elapsed_time - (elapsed_mins * 60))
    return elapsed_mins, elapsed_secs

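We train our model, saving the parameters that give us the best validation loss. No separate validation split is used in this notebook, so the loss is evaluated on the training iterator.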

In [None]:
N_EPOCHS = 1
CLIP = 1

best_valid_loss = float('inf')

for epoch in range(N_EPOCHS):
    print("Epoch number {}".format(epoch))
    start_time = time.time()
    
    train_loss = train(model, train_iterator, optimizer, criterion, CLIP)
    valid_loss = evaluate(model, train_iterator, criterion)
    
    end_time = time.time()
    
    epoch_mins, epoch_secs = epoch_time(start_time, end_time)
    
    # keep only the checkpoint with the lowest evaluation loss
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'model.pt')
    
    print(f'Epoch: {epoch+1:02} | Time: {epoch_mins}m {epoch_secs}s')
    print(f'\tTrain Loss: {train_loss:.3f} | Train PPL: {math.exp(train_loss):7.3f}')
    # print(f'\t Val. Loss: {valid_loss:.3f} |  Val. PPL: {math.exp(valid_loss):7.3f}')

Epoch number 0


Epoch: 01 | Time: 5m 34s
	Train Loss: 0.407 | Train PPL:   1.502


In [None]:
model.load_state_dict(torch.load('model.pt'))
train_loss = evaluate(model, train_iterator, criterion)
print(f'| Test Loss: {train_loss:.3f} | Test PPL: {math.exp(train_loss):7.3f} |')

| Test Loss: 0.394 | Test PPL:   1.484 |


# Performance of the Seq2Seq Model for Cross Lingual Summarization

In [None]:
# loading the dataset in the form of tokens
english_sentences, hindi_sentences = parse_dataset('drive/MyDrive/cs626_dataset/CLS_dataset_test.csv', 'text', 'summary')

# loading the model
attn = Attention(ENC_HID_DIM, DEC_HID_DIM)
enc = Encoder(INPUT_DIM, ENC_EMB_DIM, ENC_HID_DIM, DEC_HID_DIM, ENC_DROPOUT)
dec = Decoder(OUTPUT_DIM, DEC_EMB_DIM, ENC_HID_DIM, DEC_HID_DIM, DEC_DROPOUT, attn)
model = Seq2Seq(enc, dec, device).to(device)
model.load_state_dict(torch.load('model.pt'))

# obtaining the results
report_performance(model, english_sentences, hindi_sentences)

Corpus/Sentence contains 0 counts of 2-gram overlaps.
BLEU scores might be undesirable; use SmoothingFunction().





{'bleu_score': 0.2504476933717128, 'rouge1_score': 0.0, 'rougeL_score': 0.0}

# Some examples of the Seq2Seq Model

The Bombay High Court on Monday summoned the Maharashtra Women and Child Development Secretary after 42 children went missing over the last three years from a Mumbai remand home. The court criticised the Maharashtra government for lack of 'pro-active action' in the matter. The Bombay High Court is hearing a PIL on the allegations of corruption in the remand home.

In [None]:
' '.join(produce_output(model, "The Bombay High Court on Monday summoned the Maharashtra Women and Child Development Secretary after 42 children went missing over the last three years from a Mumbai remand home. The court criticised the Maharashtra government for lack of 'pro-active action' in the matter. The Bombay High Court is hearing a PIL on the allegations of corruption in the remand home.")[:-1])

'<unk> इस के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के'

As many as 76 passengers were rescued from cable cars suspended over a river in German city Cologne after a gondola crashed into a support pillar on Sunday. Passengers were left stranded, and children were seen clinging to parents while dangling as many as 40 metres above the river. The fire department lowered them to safety from the cable cars

In [None]:
' '.join(produce_output(model, "As many as 76 passengers were rescued from cable cars suspended over a river in German city Cologne after a gondola crashed into a support pillar on Sunday. Passengers were left stranded, and children were seen clinging to parents while dangling as many as 40 metres above the river. The fire department lowered them to safety from the cable cars")[:-1])

'<unk> यह के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के'

An 11-year-old tribal boy allegedly committed suicide on Tuesday by hanging himself near his school, after he was caught stealing ₹30 from his classmate in Maharashtra's Mokhada. The boy was reportedly ashamed of his act and had tried to force a classmate to commit suicide with him, but he refused. Police said the boy has a history of criminal activities.

In [None]:
' '.join(produce_output(model, "An 11-year-old tribal boy allegedly committed suicide on Tuesday by hanging himself near his school, after he was caught stealing ₹30 from his classmate in Maharashtra's Mokhada. The boy was reportedly ashamed of his act and had tried to force a classmate to commit suicide with him, but he refused. Police said the boy has a history of criminal activities.")[:-1])

'<unk> यह के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के'

Four labourers on Monday were reportedly injured after a tree branch fell on them at Dombivli station road in Mumbai. They were admitted to hospital with injuries and were later declared out of danger. Reportedly, tree fall cases are on rise in Kalyan-Dombivli. "Last year fewer cases were reported. We have been getting complaints of tree falls daily,"

In [None]:
' '.join(produce_output(model, "Four labourers on Monday were reportedly injured after a tree branch fell on them at Dombivli station road in Mumbai. They were admitted to hospital with injuries and were later declared out of danger. Reportedly, tree fall cases are on rise in Kalyan-Dombivli. Last year fewer cases were reported. We have been getting complaints of tree falls daily")[:-1])

'<unk> यह के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के के'