# Neural Translation Model
by Mac Brennan

## Introduction

In this project I will walk through how to build a translation model that takes in a sentence in French and outputs a sentence in English. The model is called an encoder-decoder network, which means we have two neural networks:
- One called the encoder, which extracts the meaning from the French sentence and represents it as a tensor of numbers.
- One called the decoder, which converts that tensor of numbers back into a sentence in English.

Our job is to train the encoder and decoder so that the English sentence the decoder outputs has the same meaning as the input French sentence. Below is a rough visual of what is happening. We will go into greater detail later.

**Insert picture of the model**

How do we train the networks? At a basic level, each network is a collection of parameters (numbers that start out random) that it uses to perform various calculations on the data. We will get into the specifics of those calculations soon, but the purpose of training is to tweak the values of the parameters so the model's translations get more and more accurate.

So we have a dataset of French sentences and matching English sentences. We give the model a French sentence and it will give us an English sentence that should match the English translation. We will tell it where it was wrong, the model will then update its parameters to account for its mistakes, and then we will repeat the process. After enough iterations of this, our model will get pretty good at translating.
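
The loop described above (predict, measure the error, update the parameters, repeat) can be sketched in PyTorch. This is a minimal sketch on a toy regression problem, not the translation model itself: the `nn.Linear` model, the random data, the learning rate, and the number of steps are all assumptions chosen for illustration.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Toy data (assumption): learn y = 2x + 1 from 64 random points
x = torch.randn(64, 1)
y = 2 * x + 1

model = nn.Linear(1, 1)                       # parameters start as random numbers
loss_fn = nn.MSELoss()                        # measures how wrong the predictions are
optimizer = optim.SGD(model.parameters(), lr=0.1)

losses = []
for step in range(200):
    optimizer.zero_grad()                     # clear gradients from the previous step
    prediction = model(x)                     # 1. the model makes a prediction
    loss = loss_fn(prediction, y)             # 2. we measure how wrong it was
    loss.backward()                           # 3. work out each parameter's share of the error
    optimizer.step()                          # 4. update the parameters
    losses.append(loss.item())

print('first loss:', losses[0], 'last loss:', losses[-1])
```

The translation model follows the same four steps; only the model, the data, and the loss function change.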

The first thing we must do is process our data into a format that allows our models to manipulate the data. We need to convert our sentences of words into numbers. Let's take a look at our dataset to see what we need to do.

## Preparing the data

In [18]:
# Before we get started we will load all the packages we will need
import os

# PyTorch
import torch
import torch.autograd as autograd
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

#### Load the data

In [3]:
# Read both files as UTF-8 so the accented French characters decode correctly
with open('data/small_vocab_en', "r", encoding='utf-8') as f:
    data1 = f.read()
with open('data/small_vocab_fr', "r", encoding='utf-8') as f:
    data2 = f.read()

# The data is just in a text file with each sentence on its own line
english_sentences = data1.split('\n')
french_sentences = data2.split('\n')

In [5]:
print('Number of English sentences:', len(english_sentences), 
      '\nNumber of French sentences:', len(french_sentences),'\n')
print('Example/Target pair:\n')
print('  '+english_sentences[2])
print('  '+french_sentences[2])

Number of English sentences: 137861 
Number of French sentences: 137861 

Example/Target pair:

  california is usually quiet during march , and it is usually hot in june .
  california est généralement calme en mars , et il est généralement chaud en juin .


#### Vocabulary
We need a count of how often each word appears in the dataset. This will give us a clearer picture of our data.
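
One way to get these counts is with `collections.Counter`. The sketch below is only an illustration: the two sentences are a hypothetical stand-in for the full `english_sentences` list, and `count_words` is a helper name introduced here, not part of the original notebook.

```python
from collections import Counter

# Hypothetical stand-in for a few lines of the dataset
sentences = [
    'new jersey is sometimes quiet during autumn , and it is snowy in april .',
    'california is usually quiet during march , and it is usually hot in june .',
]

def count_words(sentences):
    """Count how often each whitespace-separated token appears."""
    counts = Counter()
    for sentence in sentences:
        counts.update(sentence.split())
    return counts

word_counts = count_words(sentences)
print('Unique words:', len(word_counts))
print('Most common:', word_counts.most_common(3))
```

Because the sentences are already tokenized with spaces around punctuation, a plain `split()` is enough here; punctuation marks are counted as words.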


##### Word Embeddings

In [17]:
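import numpy as np
with open('data/wiki.en.vec', "r", encoding='utf-8') as f:
    f.readline()             # skip the first line (assumed to be a header)
    en_vecs = f.readline()   # second line: a word followed by its vector values
    # drop the word itself and convert the remaining values to floats
    print(np.array(en_vecs.split()[1:], dtype=np.float32))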

[-2.3167e-02 -4.2483e-03 -1.0572e-01  4.2783e-02 -1.4316e-01 -7.8954e-02
  7.8187e-02 -1.9454e-01  2.2303e-02  3.1207e-01  5.7462e-02 -1.1589e-01
  9.6633e-02 -9.3229e-02 -3.4229e-02 -1.4652e-01 -1.1094e-01 -1.1102e-01
  6.7728e-02  1.0023e-01 -6.7413e-02  2.3761e-01 -1.3105e-01 -8.3979e-03
 -1.0593e-01  2.4526e-01  6.5903e-02 -2.3740e-01 -1.0758e-01  5.7082e-03
 -8.1413e-02  2.6264e-01 -5.2461e-02  2.0306e-01  5.0620e-02 -1.8866e-01
 -1.1494e-01 -2.5752e-01  4.6799e-02 -5.0525e-02  6.2650e-02  1.5433e-01
 -5.6289e-02 -4.8437e-02 -9.9688e-02 -3.5332e-02 -9.1647e-02 -8.1151e-02
 -1.0844e-03 -8.4140e-02 -1.3026e-01  1.4980e-02 -8.6276e-02 -5.3041e-02
 -1.0644e-01 -4.2314e-02  8.6469e-02  2.2614e-01 -1.6078e-01  1.8845e-01
  5.3098e-02 -2.1475e-01  1.6699e-01 -1.4442e-01 -1.5930e-01  6.2456e-03
 -7.6630e-02 -9.1568e-02 -2.8984e-01  2.7078e-02  2.1275e-02  2.3939e-02
  1.4903e-01 -3.3062e-01 -9.7811e-02 -3.3814e-02  7.0587e-02  2.3294e-02
  6.5382e-02  1.8716e-01 -1.3444e-01  1.4431e-01 -2
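
The cell above reads a single vector. A possible next step is to load vectors only for the words we care about. The sketch below assumes the file uses the fastText text layout (a header line with vocabulary size and dimension, then one word and its values per line); `load_vectors` and the in-memory sample are hypothetical and stand in for the real file.

```python
import io

def load_vectors(lines, wanted=None):
    """Parse .vec-style text into {word: [floats]}.

    lines  -- an iterable of text lines (an open file works)
    wanted -- optional set of words to keep; None keeps everything
    """
    lines = iter(lines)
    count, dim = map(int, next(lines).split())   # header: vocabulary size, dimension
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(' ')
        word, values = parts[0], parts[1:]
        if wanted is not None and word not in wanted:
            continue
        if len(values) != dim:                   # skip malformed lines
            continue
        vectors[word] = [float(v) for v in values]
    return vectors

# Tiny made-up sample in the assumed layout (2 words, 3 dimensions)
sample = io.StringIO('2 3\nthe 0.1 0.2 0.3\ncat -0.4 0.5 0.6\n')
vectors = load_vectors(sample, wanted={'cat'})
print(vectors)
```

Filtering by `wanted` keeps memory use proportional to our own vocabulary rather than to the whole embedding file.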

#### Preprocess the data
The example sentences and the label (target) sentences need to be converted to sequences of integers.

In [5]:
english_sentences[0].split()

['new',
 'jersey',
 'is',
 'sometimes',
 'quiet',
 'during',
 'autumn',
 ',',
 'and',
 'it',
 'is',
 'snowy',
 'in',
 'april',
 '.']
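
Once a sentence is split into tokens like this, each token can be mapped to an integer id. The following is a minimal sketch of that mapping. The special tokens `<pad>` and `<unk>` and the helpers `build_vocab` and `encode` are assumptions introduced for illustration; the original notebook does not define them.

```python
# Hypothetical special tokens; the original does not specify any
PAD, UNK = '<pad>', '<unk>'

def build_vocab(sentences):
    """Map each distinct token to an integer id, in order of first appearance."""
    word_to_id = {PAD: 0, UNK: 1}
    for sentence in sentences:
        for word in sentence.split():
            if word not in word_to_id:
                word_to_id[word] = len(word_to_id)
    return word_to_id

def encode(sentence, word_to_id):
    """Convert a sentence to a list of ids, using UNK for unseen words."""
    return [word_to_id.get(word, word_to_id[UNK]) for word in sentence.split()]

sentences = ['new jersey is sometimes quiet during autumn , and it is snowy in april .']
word_to_id = build_vocab(sentences)
ids = encode(sentences[0], word_to_id)
print(ids)
```

Repeated words such as `is` receive the same id each time they appear, and a word that was never seen during vocabulary building falls back to the `<unk>` id.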

In [20]:
seq_len = 5
batch_size = 1
input_dim = 10
hidden_size = 3
hidden_layers = 1
inputs = autograd.Variable(torch.randn((seq_len, batch_size, input_dim)))  # make a sequence of length 5, 1 batch, input 10dim vector

# initialize the hidden state. hidden layers have 3 nodes
hidden = (autograd.Variable(torch.randn(hidden_layers, batch_size, hidden_size)),
          autograd.Variable(torch.randn((hidden_layers, batch_size, hidden_size))))

In [21]:

lstm = nn.LSTM(input_dim, hidden_size)

In [22]:
out, hidden = lstm(inputs, hidden)

In [23]:
# outputs of each hidden node(3) for each item in sequence(5)
print(out)

# final hidden state and final cell state of sequence; notice that the hidden state equals the final output
print(hidden)

Variable containing:
(0 ,.,.) = 
 -0.1632 -0.0587 -0.0624

(1 ,.,.) = 
 -0.0296 -0.1508  0.0113

(2 ,.,.) = 
  0.0398  0.2324 -0.0406

(3 ,.,.) = 
  0.1305 -0.2526 -0.0687

(4 ,.,.) = 
  0.1295  0.1184 -0.1268
[torch.FloatTensor of size 5x1x3]

(Variable containing:
(0 ,.,.) = 
  0.1295  0.1184 -0.1268
[torch.FloatTensor of size 1x1x3]
, Variable containing:
(0 ,.,.) = 
  0.3113  0.2014 -0.2873
[torch.FloatTensor of size 1x1x3]
)


In [24]:
seq_len = 5
batch_size = 1
input_dim = 10
hidden_size = 3
hidden_layers = 1
num_dir = 2 # for bidirectional lstm
inputs = autograd.Variable(torch.randn((seq_len, batch_size, input_dim)))  # make a sequence of length 5, 1 batch, input 10dim vector

# initialize the hidden state. hidden layers have 3 nodes
hidden = (autograd.Variable(torch.randn(hidden_layers*num_dir, batch_size, hidden_size)),
          autograd.Variable(torch.randn((hidden_layers*num_dir, batch_size, hidden_size))))

In [25]:
lstm = nn.LSTM(input_dim, hidden_size, bidirectional=True)

In [26]:
out, hidden = lstm(inputs, hidden)

In [28]:
# outputs of each hidden node in both directions(3*2) for each item in sequence(5)
print(out)

# final hidden and cell state of the model in both directions
# notice that the first 3 outputs of the last item equal the final hidden state of the forward direction,
# and the last 3 outputs of the first item equal the final hidden state of the backward direction
print(hidden)

Variable containing:
(0 ,.,.) = 
 -0.1693  0.7509 -0.0828 -0.5793  0.5285  0.6648

(1 ,.,.) = 
  0.0218  0.1844 -0.1121 -0.6127  0.2034  0.3170

(2 ,.,.) = 
  0.0731 -0.0683  0.0742 -0.0712  0.1552  0.0725

(3 ,.,.) = 
  0.0098 -0.0052 -0.1215 -0.0741  0.2897  0.0834

(4 ,.,.) = 
 -0.0263 -0.3019 -0.1845 -0.1061  0.4653  0.0832
[torch.FloatTensor of size 5x1x6]

(Variable containing:
(0 ,.,.) = 
 -0.0263 -0.3019 -0.1845

(1 ,.,.) = 
 -0.5793  0.5285  0.6648
[torch.FloatTensor of size 2x1x3]
, Variable containing:
(0 ,.,.) = 
 -0.3161 -0.5384 -0.4495

(1 ,.,.) = 
 -0.7322  0.9997  0.9236
[torch.FloatTensor of size 2x1x3]
)
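
The relationships pointed out in the comments above can also be checked programmatically rather than by eye. This sketch assumes a recent PyTorch release, where plain tensors can be passed to `nn.LSTM` without wrapping them in `autograd.Variable`, and where omitting the initial state makes the LSTM start from zeros.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, batch_size, input_dim, hidden_size = 5, 1, 10, 3
inputs = torch.randn(seq_len, batch_size, input_dim)

# Unidirectional: the last output step is the final hidden state
lstm = nn.LSTM(input_dim, hidden_size)
out, (h_n, c_n) = lstm(inputs)
uni_match = torch.allclose(out[-1], h_n[0])

# Bidirectional: out concatenates [forward, backward] along the last dimension
bilstm = nn.LSTM(input_dim, hidden_size, bidirectional=True)
out_bi, (h_bi, c_bi) = bilstm(inputs)
# the forward direction finishes at the last time step
forward_match = torch.allclose(out_bi[-1, :, :hidden_size], h_bi[0])
# the backward direction finishes at the first time step
backward_match = torch.allclose(out_bi[0, :, hidden_size:], h_bi[1])

print(out.shape, out_bi.shape)
print(uni_match, forward_match, backward_match)
```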


## Model

### Bi-Directional LSTM Encoder

In [None]:
class EncoderBiLSTM(nn.Module):
    def __init__(self):
        super(EncoderBiLSTM, self).__init__()
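
The class above is still a stub. As a rough idea of what such an encoder could look like, here is a hypothetical sketch: an embedding layer followed by a bidirectional LSTM. The class name `EncoderBiLSTMSketch`, its constructor arguments, and the sizes used are assumptions made for illustration and are not the notebook's final design.

```python
import torch
import torch.nn as nn

class EncoderBiLSTMSketch(nn.Module):
    """Hypothetical encoder: embedding followed by a bidirectional LSTM."""
    def __init__(self, vocab_size, embed_dim, hidden_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_size, bidirectional=True)

    def forward(self, token_ids):
        # token_ids: (seq_len, batch) tensor of word ids
        embedded = self.embedding(token_ids)        # (seq_len, batch, embed_dim)
        outputs, (h_n, c_n) = self.lstm(embedded)   # outputs: (seq_len, batch, 2 * hidden_size)
        return outputs, (h_n, c_n)

# Made-up sizes: vocabulary of 20 words, 6 tokens per sentence, batch of 2
encoder = EncoderBiLSTMSketch(vocab_size=20, embed_dim=8, hidden_size=4)
token_ids = torch.randint(0, 20, (6, 2))
outputs, (h_n, c_n) = encoder(token_ids)
print(outputs.shape, h_n.shape)
```

Returning the per-step `outputs` as well as the final states leaves room for an attention mechanism in the decoder to look at every position of the input sentence.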

### LSTM Decoder with Attention

In [None]:
class AttnDecoderLSTM(nn.Module):
    def __init__(self):
        super(AttnDecoderLSTM, self).__init__()
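
The decoder is also still a stub, and the notebook does not say which attention scoring function it will use. As an illustration of the core step, the sketch below implements plain dot-product attention, one common choice: the current decoder state is compared with every encoder output, the scores are turned into weights with a softmax, and the weights are used to average the encoder outputs. The function name `dot_product_attention` and all tensor sizes are assumptions.

```python
import torch
import torch.nn.functional as F

def dot_product_attention(decoder_state, encoder_outputs):
    """
    decoder_state:   (batch, hidden)
    encoder_outputs: (seq_len, batch, hidden)
    returns context (batch, hidden) and weights (batch, seq_len)
    """
    enc = encoder_outputs.transpose(0, 1)                       # (batch, seq_len, hidden)
    scores = torch.bmm(enc, decoder_state.unsqueeze(2))         # (batch, seq_len, 1)
    weights = F.softmax(scores.squeeze(2), dim=1)               # (batch, seq_len)
    context = torch.bmm(weights.unsqueeze(1), enc).squeeze(1)   # (batch, hidden)
    return context, weights

torch.manual_seed(0)
encoder_outputs = torch.randn(6, 2, 8)   # 6 input positions, batch of 2, hidden size 8
decoder_state = torch.randn(2, 8)
context, weights = dot_product_attention(decoder_state, encoder_outputs)
print(context.shape, weights.shape, weights.sum(dim=1))
```

The `weights` tensor has one row per sentence in the batch, and each row sums to 1. These are the values one would plot to visualize which input words the decoder attends to.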

## Training

## Visualizing Attention

In [9]:
a = 5