___


# **Simultaneous Translation**

___

# The project


This project was made by Iker García for the Advanced Applications on Language Technologies course, which is taught in the master's programme in natural language processing at the EHU/UPV.

# The idea

Education has a big problem: language barriers. For example, our university offers courses in Basque that students who do not know the language cannot attend. There are also courses in Spanish that students from other countries (Erasmus) cannot attend. These are just two examples; the language barrier is a major problem in the educational world. That is why my project tries to address it. My idea is to implement a simultaneous translator that automatically translates a conversation from one language to another on the go. The application will listen to someone speaking and display what they said as text, translated into another language.

# How the simultaneous translation works

The simultaneous translator is composed of two modules: the “speech to text” module and the “translation” module. I will focus on the second one.

- Speech to text: For the “speech to text” module I will use the Speech recognition API provided by Google (https://cloud.google.com/speech-to-text/) 

- Translator: The translator is based on the Transformer model (Vaswani et al. 2017, "Attention Is All You Need", https://arxiv.org/abs/1706.03762). I have replicated the model using the PyTorch API. 

- Parallel data: To train the model I have used the OpenSubtitles v2018 corpus available here: http://opus.nlpl.eu/. This corpus contains 64.7M aligned sentences for Spanish and English. 


# Resources used for the implementation

- "Attention Is all you need": https://arxiv.org/abs/1706.03762
- How to code The Transformer in Pytorch by Samuel Lynn-Evans: https://towardsdatascience.com/how-to-code-the-transformer-in-pytorch-24db27c8f9ec
- How to use TorchText for neural machine translation, plus hack to make it 5x faster by Samuel Lynn-Evans: https://towardsdatascience.com/how-to-use-torchtext-for-neural-machine-translation-plus-hack-to-make-it-5x-faster-77f3884d95
- The transformer - Attention is all you need by Michał Chromiak: https://mchromiak.github.io/articles/2017/Sep/12/Transformer-Attention-is-all-you-need/#.XFW52rpKhhE

- The Illustrated Transformer by Jay Alammar: http://jalammar.github.io/illustrated-transformer/
- The Annotated Transformer by Alexander Rush: http://nlp.seas.harvard.edu/2018/04/03/attention.html



___



# Preliminaries

- Import the necessary modules to run the model
- Test if a GPU is available

### Import the necessary modules to run the model

In the cell below, we import the modules needed to run the model. If any of them cannot be loaded, an exception will be displayed; if this occurs, please install the missing module.

In [1]:
import torch
import spacy
import torchtext
from torchtext import data
from torchtext.data import Field, BucketIterator, TabularDataset
import torch.nn as nn
from torch.autograd import Variable
import torch.nn.functional as F

from sklearn.model_selection import train_test_split

import math

import unicodedata
import re
import os
import pandas as pd 
import pickle

import random
import datetime

import numpy as np
import copy 

from difflib import SequenceMatcher

from tqdm import tqdm

### Test if a GPU is available

If a GPU is available, the output of the cell below will be the name of the GPU that PyTorch will use to run the model. If PyTorch cannot find an available GPU, make sure that the CUDA and cuDNN libraries are installed and that the conda environment is correctly configured. More info here: https://pytorch.org/get-started/. The model has been tested with CUDA 10.0 and an NVIDIA RTX 2080 Ti. 

The model has been implemented with NVIDIA CUDA in mind. If you have an AMD GPU you may be able to adapt the code to use the ROCm platform. More info here: https://rocm-documentation.readthedocs.io/en/latest/Deep_learning/Deep-learning.html. 

If there is no GPU available you may be able to use a pretrained model; however, training the model on the CPU will be very slow. 

In [2]:
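USE_CUDA = torch.cuda.is_available()
device = torch.device("cuda" if USE_CUDA else "cpu")
# Show the GPU name, or a fallback message instead of an exception when no GPU is found
torch.cuda.get_device_name(0) if USE_CUDA else "No GPU found, using the CPU"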

'GeForce GTX 970M'

### Train or Load model
Decide whether to train a new model or load a pretrained one.

In [3]:
load_pretrained = False
load_dataset = False
generate_dataset=True

input_lang_corpus = 'OpusCorpus/Europarl.en-es.en'
output_lang_corpus = 'OpusCorpus/Europarl.en-es.es'
train_file = 'OpusCorpus/train.csv'
dev_file = "OpusCorpus/dev.csv"

pretrained_embeddings = ''
pretrained_model = 'weights/checkpoint_7_epoch.pth.tar'

batch_size = 800 


___

# Some useful functions

In this section, we will implement some useful functions (printing traces, memory usage...) that will help us monitor and debug our program. 

In [4]:
# Print a message together with the date and time at which it was printed. 
# We will use this function to know how long each block of code has taken to run

def printTrace(message):
    print("<"+ str(datetime.datetime.now()) + ">  " +str(message))
    
printTrace("Hello World") 

<2019-05-05 02:27:15.523967>  Hello World



___

# Processing and loading data

Here we will define the functions that we will use to load and process the data.
While processing the data (tokenizing, removing some characters, filtering sentences...) may be an easy task, it is very important to do it as efficiently as possible. Otherwise, we will cause a huge bottleneck during training: there is no point in using a powerful GPU if it spends more time waiting for the next batch than actually training the model. 

In [5]:
# Transform a string to plain ASCII by removing accents:
# decompose every character (NFD) and drop the combining marks ("Mn")
def unicodeToAscii(s):
    return ''.join(
        c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn'
    )


# Lowercase, trim, and remove non-letter characters
def normalizeString(s):
    s = unicodeToAscii(s.lower().strip())
    s = re.sub(r"([.,!?¿])", r" \1", s)
    s = re.sub(r"[^a-zA-Z.!?,¿]+", r" ", s)
    s = re.sub(r"\s+", r" ", s).strip()
    return s




def tokenize_en(sentence):
    return [tok.text for tok in en.tokenizer(normalizeString(sentence))]
def tokenize_es(sentence):
    return [tok.text for tok in es.tokenizer(normalizeString(sentence))]

printTrace('Downloading the spacy models for tokenization...')

os.system("python -m spacy download en")
os.system("python -m spacy download es")
printTrace('Done.')

en = spacy.load('en')
es = spacy.load('es')

EN_TEXT = Field(tokenize=tokenize_en)
ES_TEXT = Field(tokenize=tokenize_es, init_token = "<sos>", eos_token = "<eos>")

<2019-05-05 02:27:15.535485>  Downloading the spacy models for tokenization...
<2019-05-05 02:27:20.490190>  Done.
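
To see what the normalization defined in the cell above does to a sentence, the two helpers can be run on their own. This is a minimal sketch that uses only the standard library and repeats the same regular expressions; the names `strip_accents` and `normalize` are illustrative and are not part of the notebook.

```python
import re
import unicodedata

def strip_accents(s):
    # Decompose accented characters (NFD) and drop the combining marks ("Mn").
    return ''.join(
        c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn'
    )

def normalize(s):
    s = strip_accents(s.lower().strip())
    s = re.sub(r"([.,!?¿])", r" \1", s)      # put a space before punctuation
    s = re.sub(r"[^a-zA-Z.!?,¿]+", " ", s)   # any other run of characters becomes one space
    return re.sub(r"\s+", " ", s).strip()    # collapse whitespace

print(normalize("  ¿Cómo estás, Iker?  "))
print(normalize("It costs 30 euros!"))
```

Two side effects are worth keeping in mind: digits are removed entirely, and "ñ" is reduced to "n" because its tilde is a combining mark after decomposition.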


In [6]:
if generate_dataset:
    printTrace('Loading the corpus files...')

    #Read the files containing the aligned English and Spanish sentences
    original_en_corpus = open(input_lang_corpus).read().split('\n')
    original_es_corpus = open(output_lang_corpus).read().split('\n')

    printTrace('Done, printing some random samples for validation: ')

    r = random.randint(2, (min(len(original_en_corpus)-1,len(original_es_corpus)-1)))

    print('1 - English Sentence: ')
    print(original_en_corpus[r])
    print('1 - Spanish Sentence: ')
    print(original_es_corpus[r])
    print('2 - English Sentence: ')
    print(original_en_corpus[r-1])
    print('2 - Spanish Sentence: ')
    print(original_es_corpus[r-1])
    print('2 - English Sentence: ')
    print(original_en_corpus[r-2])
    print('2 - Spanish Sentence: ')
    print(original_es_corpus[r-2])
    print()


    printTrace('Preprocessing sentences')


    printTrace('Downloading the spacy models for tokenization...')
    printTrace('Done.')
    printTrace('Building the dataset...')

    # Transform the dataset to csv 

    printTrace('Transform the text files to csv format...')
    to_csv = {'English' : [line for line in original_en_corpus], 'Spanish': [line for line in original_es_corpus]}
    df = pd.DataFrame(to_csv, columns=["English", "Spanish"])

    printTrace('Filtering sentences...')
    df['en_len'] = df['English'].str.count(' ')
    df['es_len'] = df['Spanish'].str.count(' ')
    #Remove very long sentences
    df = df.query('en_len < 80 & es_len < 80')
    #Remove sentences whose lengths in the two languages are very different 
    df = df.query('es_len < en_len * 1.5 & es_len * 1.5 > en_len')


    printTrace('Train/dev split...')

    train, val = train_test_split(df, test_size=0.1)

    printTrace('Saving csv files...')

    train.to_csv(train_file, index=False)
    val.to_csv(dev_file, index=False)
    
    del train
    del val
    del df
    del original_en_corpus
    del original_es_corpus
    
    
    



<2019-05-05 02:27:21.569800>  Loading the corpus files...
<2019-05-05 02:27:25.128077>  Done, printing some random samples for validation: 
1 - English Sentence: 
I hope so, and I hope that this will be a strong signal that makes you sit up and listen.
1 - Spanish Sentence: 
Eso espero, y también espero que esto sea una señal fuerte que les haga pararse a escuchar.
2 - English Sentence: 
That is what I hope the Commission will take on board, because I believe that we will get an almost unanimous 'yes' tomorrow.
2 - Spanish Sentence: 
Eso es lo que espero que asuma la Comisión, porque creo que mañana obtendremos un "sí" casi unánime.
2 - English Sentence: 
There are, of course, some details of this resolution that I would have liked to have more of or that I would have liked to be different, but the main thing is not that all of the commas are in place; what is important is our collective will to start the process.
2 - Spanish Sentence: 
Por supuesto, hay algunos detalles de esta resolu
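
The filtering step in the cell above keeps a sentence pair only when both sides are short and of comparable length. As a rough sketch of the same rule without pandas (the helper name `keep_pair` and its parameters are assumptions made for illustration; lengths are counted in spaces, as in the notebook):

```python
def keep_pair(en, es, max_len=80, ratio=1.5):
    # Sentence length is approximated by the number of spaces.
    en_len = en.count(' ')
    es_len = es.count(' ')
    # Drop very long sentences.
    if not (en_len < max_len and es_len < max_len):
        return False
    # Drop pairs whose lengths differ by a factor of `ratio` or more.
    return es_len < en_len * ratio and es_len * ratio > en_len

pairs = [
    ("I hope so .", "Eso espero ."),
    ("Yes , of course I will do that tomorrow", "Si"),
    ("Hello", "Hola"),
]
for en, es in pairs:
    print(keep_pair(en, es), "|", en, "|", es)
```

Note that, under this rule, a pair of one-word sentences has length 0 on both sides and is filtered out, because `0 < 0 * 1.5` is false.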

In [7]:
if generate_dataset:
    printTrace('Loading data...')

    data_fields = [('English', EN_TEXT), ('Spanish', ES_TEXT)]
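    # NOTE: both splits read 'mini.csv' here; point them at 'train.csv' and 'dev.csv'
    # (written by the previous cell) to use the full dataset
    train,dev = TabularDataset.splits(path='OpusCorpus', train='mini.csv', validation='mini.csv', format='csv', fields=data_fields)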

    printTrace('Building vocabulary...')
    ES_TEXT.build_vocab(train, dev)
    EN_TEXT.build_vocab(train, dev)


    print('Test word to index:')
    print(EN_TEXT.vocab.stoi['love'])
    print('Test index to word:')
    print(EN_TEXT.vocab.itos[8])
    
    pickle.dump(ES_TEXT, open('weights/ES_TEXT.pkl', 'wb'))
    pickle.dump(EN_TEXT, open('weights/EN_TEXT.pkl', 'wb'))
    pickle.dump(train, open('weights/train_data.pkl', 'wb'))
    pickle.dump(dev, open('weights/dev_data.pkl', 'wb'))

if load_dataset: 

    printTrace('Loading data...')
    train = pickle.load(open('weights/train_data.pkl', 'rb'))
    dev = pickle.load(open('weights/dev_data.pkl', 'rb'))

    #data_fields = [('English', EN_TEXT), ('Spanish', ES_TEXT)]
    #train,dev = TabularDataset.splits(path='OpusCorpus', train='train.csv', validation='dev.csv', format='csv', fields=data_fields)


if load_pretrained:    

    ES_TEXT = pickle.load(open('weights/ES_TEXT.pkl', 'rb'))
    EN_TEXT = pickle.load(open('weights/EN_TEXT.pkl', 'rb'))

    print('Test word to index:')
    print(EN_TEXT.vocab.stoi['love'])
    print('Test index to word:')
    print(EN_TEXT.vocab.itos[8])
    printTrace('Done')
    

<2019-05-05 02:27:52.099416>  Loading data...
<2019-05-05 02:27:52.122418>  Building vocabulary...
Test word to index:
0
Test index to word:
of


TypeError: 'generator' object is not callable

___

### Fast data iterator
This code has been taken from: http://nlp.seas.harvard.edu/2018/04/03/attention.html
It allows us to create a very fast iterator to read the training data.


In [8]:
global max_src_in_batch, max_tgt_in_batch
def batch_size_fn(new, count, sofar):
    "Keep augmenting batch and calculate total number of tokens + padding."
    global max_src_in_batch, max_tgt_in_batch
    if count == 1:
        max_src_in_batch = 0
        max_tgt_in_batch = 0
    max_src_in_batch = max(max_src_in_batch,  len(new.English))
    max_tgt_in_batch = max(max_tgt_in_batch,  len(new.Spanish) + 2)
    src_elements = count * max_src_in_batch
    tgt_elements = count * max_tgt_in_batch
    #print(max(src_elements, tgt_elements))
    return max(src_elements, tgt_elements)


class MyIterator(data.Iterator):
    def create_batches(self):
        if self.train:
            def pool(d, random_shuffler):
                for p in data.batch(d, self.batch_size * 100):
                    p_batch = data.batch(
                        sorted(p, key=self.sort_key),
                        self.batch_size, self.batch_size_fn)
                    for b in random_shuffler(list(p_batch)):
                        yield b
            self.batches = pool(self.data(), self.random_shuffler)
            
        else:
            self.batches = []
            for b in data.batch(self.data(), self.batch_size,
                                          self.batch_size_fn):
                self.batches.append(sorted(b, key=self.sort_key))
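
With `batch_size_fn`, the `batch_size` value acts as a budget of tokens (padding included) rather than a number of sentences. The following sketch computes the same quantity for a complete batch; `Pair` is a hypothetical stand-in for a torchtext example and `padded_batch_cost` is an illustrative name.

```python
from collections import namedtuple

# Hypothetical stand-in for a torchtext example with two tokenized fields.
Pair = namedtuple("Pair", ["English", "Spanish"])

def padded_batch_cost(examples):
    """Tokens needed once every sentence is padded to the longest one."""
    max_src = max(len(e.English) for e in examples)
    # +2 accounts for the <sos> and <eos> tokens added to the target side.
    max_tgt = max(len(e.Spanish) + 2 for e in examples)
    count = len(examples)
    return max(count * max_src, count * max_tgt)

batch = [
    Pair(English=["i", "hope", "so"], Spanish=["eso", "espero"]),
    Pair(English=["thank", "you", "very", "much", "."],
         Spanish=["muchas", "gracias", "."]),
]
print(padded_batch_cost(batch))
```

Because the cost grows with the longest sentence in the batch, grouping sentences of similar length (as the sorting in `MyIterator` does) wastes fewer positions on padding.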

In [9]:

train_iter = None


if load_dataset or generate_dataset:

    printTrace('Generating train iterator')

    train_iter = MyIterator(train, batch_size=batch_size, device=device,
                            repeat=False, sort_key= lambda x:
                            (len(x.English), len(x.Spanish)),
                            batch_size_fn=batch_size_fn, train=True,
                            shuffle=True)
    

    printTrace('Done.')

<2019-05-05 02:28:41.006882>  Generating train iterator
<2019-05-05 02:28:41.007274>  Done.


In [10]:
def create_mask_input(batch):
    input_seq = batch.English.transpose(0,1)
    input_pad = EN_TEXT.vocab.stoi['<pad>']
    input_msk = (input_seq != input_pad).unsqueeze(1)
    
    return input_seq, input_msk

def create_mask_output(batch):
    output_seq = batch.Spanish.transpose(0,1)
    output_pad = ES_TEXT.vocab.stoi['<pad>']
    output_msk = (output_seq != output_pad).unsqueeze(1)
    size = output_seq.size(1)
    # Make sure that the decoder only sees the target words up to the position being predicted.
    nopeak_mask = np.triu(np.ones((1, size, size)),k=1).astype('uint8')
    nopeak_mask = (torch.from_numpy(nopeak_mask) == 0).to(output_seq.device)
    output_msk = output_msk & nopeak_mask
    
    return output_seq, output_msk


def create_masks(src, trg):
    
    src_mask = (src != EN_TEXT.vocab.stoi['<pad>']).unsqueeze(-2)

    if trg is not None:
        trg_mask = (trg != ES_TEXT.vocab.stoi['<pad>']).unsqueeze(-2)
        size = trg.size(1) # get seq_len for matrix
        # Build the no-peek mask on the same device as the target batch
        np_mask = nopeak_mask(size).to(trg.device)
        trg_mask = trg_mask & np_mask
        
    else:
        trg_mask = None
    return src_mask, trg_mask

def nopeak_mask(size):
    # Lower-triangular mask: position i may only attend to positions <= i
    np_mask = np.triu(np.ones((1, size, size)),k=1).astype('uint8')
    np_mask = torch.from_numpy(np_mask) == 0
    return np_mask
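
The target mask built above combines two conditions: a position must not be padding, and it must not lie in the future. As a small illustration with plain Python lists instead of tensors (the function names and the `pad_id` value are assumptions made for this sketch):

```python
def padding_mask(token_ids, pad_id):
    # True where the position holds a real token, False where it is padding.
    return [tok != pad_id for tok in token_ids]

def no_peek_mask(size):
    # Row i is True for columns 0..i: position i sees itself and the past only.
    return [[col <= row for col in range(size)] for row in range(size)]

def target_mask(token_ids, pad_id):
    pad = padding_mask(token_ids, pad_id)
    peek = no_peek_mask(len(token_ids))
    size = len(token_ids)
    # Apply the padding mask to every row and combine with a logical AND.
    return [[peek[r][c] and pad[c] for c in range(size)] for r in range(size)]

# One target sentence of three tokens followed by one <pad> (id 1 here).
for row in target_mask([2, 7, 9, 1], pad_id=1):
    print(row)
```

Each row corresponds to one decoding step; the last column stays `False` in every row because it only contains padding.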







___



___


# The transformer

In this section of the notebook we will define the Transformer model. 
![TransformerCompleteModel](Images/TheTransformerModel.png)

___

## Word Embeddings

Since we have millions of sentences to use for training, we will train our own word embeddings together with the model.


In [27]:
class Embedder(nn.Module):
    def __init__(self, vocab_size, d_model):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d_model)
    def forward(self,x):
        # Debug trace: prints every batch of token ids (remove it for long training runs)
        print(x)
        return self.emb(x)

## The positional encoding

Since this model does not use recurrent neural networks, we need to give the model information about the order of the words in the input and output sentences. Otherwise, the model will not be able to know that "John gives the pencil to Mary" has a different meaning than "Mary gives the pencil to John". 
To achieve this we will use positional encoding vectors. We will create the positional vectors using the formula proposed by Vaswani et al. and add these vectors to the regular word embeddings. This way, our word embeddings will represent both the meaning of each word and its position in the sentence.

![PosEncodingFormula1](Images/PosEncodingFormula1.png)
![PosEncodingFormula1](Images/PosEncodingFormula2.png)

In [28]:
class PositionalEncoder(nn.Module):
    def __init__(self, d_model, dropout=0.1, max_sentence_len=80):
        super().__init__()
        self.d_model = d_model
        self.dropout = nn.Dropout(p=dropout)
        pe = torch.zeros(max_sentence_len, d_model)
        for pos in range(max_sentence_len):
            # i already advances in steps of 2, so it plays the role of "2i" in the formula;
            # the sine/cosine pair at dimensions (i, i+1) shares the same frequency
            for i in range(0,d_model,2):
                pe[pos,i] = math.sin(pos/10000.0 ** (i/d_model))
                pe[pos,i+1] = math.cos(pos/10000.0 ** (i/d_model))
                
        pe = pe.unsqueeze(0)
        self.register_buffer('pe', pe)
        
    def forward(self,x):
        # Scale the embeddings so that the positional vectors do not dominate them and cause a loss of information
        x = x * math.sqrt(self.d_model)
        seq_len = x.size(1)
        # 'pe' is a registered buffer: it moves with the model and needs no gradient
        x = x + self.pe[:,:seq_len]
        return self.dropout(x)
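
As a quick check of the positional encoding formula of Vaswani et al., where dimensions 2i and 2i+1 hold the sine and cosine of pos / 10000^(2i/d_model), the table can be computed with plain Python. This is only a sketch: the function name `positional_encoding` is illustrative and `d_model` is assumed to be even.

```python
import math

def positional_encoding(max_len, d_model):
    pe = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        # i runs over the even dimensions, so i / d_model equals 2i / d_model
        # in the notation of the paper.
        for i in range(0, d_model, 2):
            angle = pos / (10000.0 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            pe[pos][i + 1] = math.cos(angle)
    return pe

table = positional_encoding(max_len=4, d_model=8)
for pos, row in enumerate(table):
    print(pos, [round(v, 3) for v in row])
```

Position 0 alternates between 0 and 1 (sin 0 and cos 0), and the later dimensions change more slowly from one position to the next, which is what lets the model tell nearby positions apart from distant ones.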
    
    
    

___

# Multi-Head attention

<img src="Images/MHA.png" width="350">



STEP 1:

<img src="Images/mattention1.png" width="350">


STEP 2: 
<img src="Images/mattention2.png" width="450">

Images from: http://jalammar.github.io/illustrated-transformer/




In [29]:
class  MultiHeadAttention(nn.Module):
    def __init__(self, heads, d_model, dropout = 0.1):
        super().__init__()
        self.d_model = d_model
        self.d_k = d_model // heads
        self.h = heads
        
        self.q_linear = nn.Linear(d_model, d_model)
        self.v_linear = nn.Linear(d_model, d_model)
        self.k_linear = nn.Linear(d_model, d_model)
        self.dropout = nn.Dropout(dropout)
        self.out = nn.Linear(d_model, d_model)
        
        
    def forward(self, q, k, v, mask=None):
        bs = q.size(0)
        
        #STEP 1
        k = self.k_linear(k).view(bs, -1, self.h, self.d_k)
        q = self.q_linear(q).view(bs, -1, self.h, self.d_k)
        v = self.v_linear(v).view(bs, -1, self.h, self.d_k)
       
        k = k.transpose(1,2)
        q = q.transpose(1,2)
        v = v.transpose(1,2)
        
        #STEP 2
        scores = torch.matmul(q, k.transpose(-2, -1)) /  math.sqrt(self.d_k)
        if mask is not None:
            mask = mask.unsqueeze(1)
            #print('scores: ' + str(scores.size()))
            #print('mask: ' + str(mask.size()))
            scores = scores.masked_fill(mask == 0, -1e9)
        scores = F.softmax(scores, dim=-1)
        scores = self.dropout(scores)
        scores = torch.matmul(scores, v)
        
        
        
        concat = scores.transpose(1,2).contiguous().view(bs, -1, self.d_model)
        output = self.out(concat)
        return output
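
The core of STEP 2 is scaled dot-product attention: softmax(QKᵀ / sqrt(d_k)) V, with masked positions set to a very negative score before the softmax. Below is a single-head sketch on plain Python lists, without batching or the linear projections; the function names are illustrative and not part of the notebook.

```python
import math

def softmax(row):
    m = max(row)                              # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v, mask=None):
    """q: [n_q][d_k], k: [n_k][d_k], v: [n_k][d_v].
    mask[i][j] == False hides key j from query i."""
    d_k = len(q[0])
    out = []
    for i, q_i in enumerate(q):
        scores = [sum(a * b for a, b in zip(q_i, k_j)) / math.sqrt(d_k) for k_j in k]
        if mask is not None:
            scores = [s if mask[i][j] else -1e9 for j, s in enumerate(scores)]
        weights = softmax(scores)
        out.append([sum(w * v_j[c] for w, v_j in zip(weights, v))
                    for c in range(len(v[0]))])
    return out

q = k = [[0.0, 0.0], [0.0, 0.0]]
v = [[1.0, 0.0], [0.0, 1.0]]
print(attention(q, k, v))
print(attention(q, k, v, mask=[[True, False], [True, True]]))
```

With identical scores the output is the average of the value vectors; once the second key is masked for the first query, that query can only copy the first value vector.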



___

# Feed-Forward Network
A simple two-layer feed-forward network (perceptron) with ReLU activation, applied to each position independently.

<img src="Images/perceptron.gif" width="250">


In [30]:
class FeedForward(nn.Module):
    def __init__(self, d_model, d_ff=2048, dropout=0.1):
        super().__init__()
        self.w_1 = nn.Linear(d_model, d_ff)
        self.w_2 = nn.Linear(d_ff, d_model)
        self.dropout = nn.Dropout(dropout)
    def forward(self, x):
        r = F.relu(self.w_1(x))
        r = self.dropout(r)
        r = self.w_2(r)
        return r
        
        

___
# Normalization

We will normalize the values passed between the layers of the encoder/decoder.

<img src="Images/Normalization.png" width="450">

In [31]:
class Norm(nn.Module):
    def __init__(self, d_model, eps = 1e-6):
        super().__init__()
        self.size = d_model
        self.alpha = nn.Parameter(torch.ones(self.size))
        self.bias = nn.Parameter(torch.zeros(self.size))
        self.eps = eps
        
    def forward(self,x):
        mean = x.mean(-1, keepdim=True)
        std = x.std(-1, keepdim=True)
        return self.alpha * (x - mean) / (std + self.eps) + self.bias
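
The normalization above rescales each position's vector to zero mean and (approximately) unit standard deviation before applying a learned scale and shift. A minimal sketch for a single vector, using scalar `alpha` and `bias` for simplicity (in the notebook they are vectors of size `d_model`):

```python
import statistics

def layer_norm(x, alpha=1.0, bias=0.0, eps=1e-6):
    mean = statistics.mean(x)
    # Sample standard deviation (n - 1), which is also the default of torch.Tensor.std.
    std = statistics.stdev(x)
    return [alpha * (v - mean) / (std + eps) + bias for v in x]

y = layer_norm([1.0, 2.0, 3.0, 4.0])
print([round(v, 4) for v in y])
print(round(statistics.mean(y), 6), round(statistics.stdev(y), 6))
```

The small `eps` only guards against division by zero when all the values in the vector are equal.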

___

# Encoder Layer
<img src="Images/encoder.png" width="650">
Image from: http://jalammar.github.io/illustrated-transformer/

In [32]:
class EncoderLayer(nn.Module):
    def __init__(self, d_model, heads, dropout = 0.1):
        super().__init__()
        self.norm_1 = Norm(d_model)
        self.norm_2 = Norm(d_model)
        self.att = MultiHeadAttention(heads, d_model)
        self.ff = FeedForward(d_model)
        self.dropout_1 = nn.Dropout(dropout)
        self.dropout_2 = nn.Dropout(dropout)
        
    def forward(self, x, mask):
        x2 = self.norm_1(x)
        self_att = self.att(x2,x2,x2,mask)
        self_att = self.dropout_1(self_att)
        x = x + self_att
        
        x2  = self.norm_2(x)
        feed_forward = self.ff(x2)
        feed_forward = self.dropout_2(feed_forward)
        x = x + feed_forward
        return x
    

___

# Decoder Layer
<img src="Images/decoder.png" width="750">
Image from: http://jalammar.github.io/illustrated-transformer/

In [33]:
class DecoderLayer(nn.Module):
    def __init__(self, d_model, heads, dropout=0.1):
        super().__init__()
        self.norm_1 = Norm(d_model)
        self.norm_2 = Norm(d_model)
        self.norm_3 = Norm(d_model)

        self.dropout_1 = nn.Dropout(dropout)
        self.dropout_2 = nn.Dropout(dropout)
        self.dropout_3 = nn.Dropout(dropout)

        self.attn_1 = MultiHeadAttention(heads, d_model)
        self.attn_2 = MultiHeadAttention(heads, d_model)
        # No explicit .cuda() here: the layer is moved together with the whole model
        self.ff = FeedForward(d_model)
        
        
    def forward(self, x, e_outputs, src_mask, trg_mask):

        x2 = self.norm_1(x)
        self_att = self.attn_1(x2, x2, x2, trg_mask)
        self_att = self.dropout_1(self_att)

        x = x + self_att

        x2 = self.norm_2(x)
        encoder_decoder_att =  self.attn_2(x2, e_outputs, e_outputs, src_mask)
        encoder_decoder_att = self.dropout_2(encoder_decoder_att)

        x = x + encoder_decoder_att


        x2 = self.norm_3(x)
        feed_forward = self.ff(x2)
        feed_forward = self.dropout_3(feed_forward)

        x = x + feed_forward

        return x

___

In [34]:
def get_clones(module, N):
    return nn.ModuleList([copy.deepcopy(module) for i in range(N)])
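
`get_clones` relies on `copy.deepcopy` so that every layer in the stack gets its own parameters. The difference from simply repeating the same object can be seen with a plain Python class; `Layer` and `get_clones_demo` are hypothetical names used only for this illustration.

```python
import copy

class Layer:
    # Hypothetical stand-in for an nn.Module that holds trainable weights.
    def __init__(self):
        self.weights = [0.0, 0.0]

def get_clones_demo(module, n):
    # Same idea as get_clones, but returning a plain list.
    return [copy.deepcopy(module) for _ in range(n)]

layers = get_clones_demo(Layer(), 3)
layers[0].weights[0] = 1.0        # only the first copy changes

shared = [Layer()] * 3            # the same object repeated three times
shared[0].weights[0] = 1.0        # the change is visible through every entry

print([l.weights for l in layers])
print([l.weights for l in shared])
```

Without the deep copy, all N encoder (or decoder) layers would share a single set of weights.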

___

# Transformer: Encoder and Decoder
<img src="Images/EncoderDecoder.png" width="750">
Image from: http://jalammar.github.io/illustrated-transformer/

In [35]:
class Encoder(nn.Module):
    def __init__(self, vocab_size, d_model, N, heads):
        super().__init__()
        self.N = N
        self.embed = Embedder(vocab_size, d_model)
        self.pe = PositionalEncoder(d_model)
        self.layers = get_clones(EncoderLayer(d_model, heads), N)
        self.norm = Norm(d_model)
        
    def forward(self, src, mask):
        x = self.embed(src)
        x = self.pe(x)
        for i in range(self.N):
            x = self.layers[i](x, mask)
        return self.norm(x)
    
class Decoder(nn.Module):
    def __init__(self, vocab_size, d_model, N, heads):
        super().__init__()   
        self.N=N
        self.embed = Embedder(vocab_size, d_model)
        self.pe = PositionalEncoder(d_model)
        self.layers = get_clones(DecoderLayer(d_model, heads), N)
        self.norm = Norm(d_model)
        
    def forward(self, trg, e_outputs, src_mask, trg_mask):
        x = self.embed(trg)
        x = self.pe(x)
        for i in range(self.N):
            x = self.layers[i](x, e_outputs, src_mask, trg_mask)
        return self.norm(x)
    

In [36]:
class Transformer(nn.Module):
    def __init__(self, src_vocab, trg_vocab, d_model, N, heads):
        super().__init__()   
        self.encoder = Encoder(src_vocab, d_model, N, heads)
        self.decoder = Decoder(trg_vocab, d_model, N, heads)
        self.out = nn.Linear(d_model, trg_vocab)
    def forward(self, src, trg, src_mask, trg_mask):
        e_outputs = self.encoder(src, src_mask)
        d_output = self.decoder(trg, e_outputs, src_mask, trg_mask)
        output = self.out(d_output)
        return output

___
___

# Training

Initialization

In [37]:
d_model = 512
heads = 8
N = 6
src_vocab = len(EN_TEXT.vocab)
trg_vocab = len(ES_TEXT.vocab)

model = Transformer(src_vocab, trg_vocab, d_model, N, heads).to(device)

for p in model.parameters():
    if p.dim() > 1:
        nn.init.xavier_uniform_(p)
        
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001, betas=(0.9, 0.98), eps=1e-9) 



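if load_pretrained:
    # map_location lets a checkpoint saved on a GPU be loaded on the current device
    checkpoint = torch.load(pretrained_model, map_location=device)
    model.load_state_dict(checkpoint['state_dict'])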

Translate a sentence (for dev acc): 

In [38]:
def similar(a, b):
    return SequenceMatcher(None, a, b).ratio()

def evaluate(model, num_eval = 5000):
    # Score on the held-out dev split (dev_file) rather than on the training data
    devf = pd.read_csv(dev_file,nrows=num_eval)
    model.eval()
    english_sentences= devf['English'].values
    spanish_sentences= devf['Spanish'].values
    max_len = 80
    cc = 0
    ccn = 0
    
    for x in tqdm(range(len(english_sentences))):
        src= english_sentences[x]
        trg = spanish_sentences[x]
        src = tokenize_en(src)
        src= Variable(torch.LongTensor([[EN_TEXT.vocab.stoi[tok] for tok in src]])).cuda()
        

        src_mask = (src != EN_TEXT.vocab.stoi['<pad>']).unsqueeze(-2)
        e_outputs = model.encoder(src, src_mask)
    
        outputs = torch.zeros(max_len).type_as(src.data)
        outputs[0] = torch.LongTensor([ES_TEXT.vocab.stoi['<sos>']])
        
        for i in range(1, max_len):    
            

            trg_mask = np.triu(np.ones((1, i, i)),k=1).astype('uint8')
            trg_mask= Variable(torch.from_numpy(trg_mask) == 0).cuda()
           
            out = model.out(model.decoder(outputs[:i].unsqueeze(0),
            e_outputs, src_mask, trg_mask))
            out = F.softmax(out, dim=-1)
            val, ix = out[:, -1].data.topk(1)

            outputs[i] = ix[0][0]
            if ix[0][0] == ES_TEXT.vocab.stoi['<eos>']:
                break
                               
        # Skip position 0, which holds <sos>, so that it does not count in the similarity score
        r_sentence = ' '.join([ES_TEXT.vocab.itos[ix] for ix in outputs[1:i]])
                                   
        cc+=similar(r_sentence,trg)
        ccn+=1
        
    return cc/ccn
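
The inner loop of `evaluate` is greedy decoding: start from `<sos>`, repeatedly pick the highest-scoring next token, and stop at `<eos>` or at the maximum length. Stripped of the model details, the idea can be sketched as follows; the callback `next_token_scores` and the toy vocabulary are assumptions made for illustration only.

```python
def greedy_decode(next_token_scores, sos_id, eos_id, max_len=80):
    """next_token_scores(prefix) returns one score per vocabulary id."""
    output = [sos_id]
    for _ in range(1, max_len):
        scores = next_token_scores(output)
        best = max(range(len(scores)), key=scores.__getitem__)
        if best == eos_id:
            break
        output.append(best)
    return output[1:]   # drop <sos>

def toy_model(prefix):
    # Hypothetical 5-word vocabulary: 0=<sos>, 1=<eos>, 2..4 = words.
    # It emits 2, 3, 4 and then <eos>.
    wanted = [2, 3, 4, 1][min(len(prefix) - 1, 3)]
    return [1.0 if tok == wanted else 0.0 for tok in range(5)]

print(greedy_decode(toy_model, sos_id=0, eos_id=1))
```

Greedy decoding keeps a single hypothesis per sentence; it is fast, but it can miss translations that a beam search would find.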
                                   
        

Training loop

In [41]:
def train_model(model, epochs, print_every=100, save_every=10000, save_path = 'weights/Model.pytorch'):
    model.train()
    
    for epoch in range(epochs):
        total_loss = 0
        for i, batch in enumerate(train_iter):
            src = batch.English.transpose(0,1)
            trg = batch.Spanish.transpose(0,1)
            print(src.shape)
            print(trg.shape)
            print(type(src))
            #print(src)
            #print(trg)
            #print('src: ' + str(src.size()))
            #print('trg: ' + str(trg.size()))
            
            #@todo: FIX this in data loading
            if trg.size()[1] > 80 or src.size()[1]>80:
                continue
            # Teacher forcing: the decoder receives the target without its last token
            # and must predict the same target shifted one position to the left
            trg_input = trg[:, :-1]
            targets = trg[:, 1:].contiguous().view(-1)
            src_mask, trg_mask = create_masks(src, trg_input)
            
            preds = model(src, trg_input, src_mask, trg_mask)
            #print('mask training: ' + str(trg_mask.size()))
            
            optimizer.zero_grad()
            loss = F.cross_entropy(preds.view(-1, preds.size(-1)), targets, ignore_index=ES_TEXT.vocab.stoi['<pad>'])
            loss.backward()
            optimizer.step()
            total_loss += loss.item()
            if (i + 1) % print_every == 0:
                loss_avg = total_loss / print_every
                #dev_score = evaluate(model)
                printTrace("epoch %d, iter = %d, loss = %.3f" % (epoch + 1, i + 1, loss_avg))
                total_loss = 0
            # Save checkpoint
            if (i+1) % save_every == 0:
                torch.save(model, save_path)
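
Inside the training loop, the target sentence is used twice: without its last token as decoder input, and without its first token as the sequence to predict. A small sketch with token ids in plain lists (the ids chosen for `<sos>`, `<eos>` and `<pad>` are arbitrary assumptions):

```python
SOS, EOS, PAD = 2, 3, 1   # hypothetical ids for the special tokens

def shift_for_teacher_forcing(trg):
    """trg: list of token ids including <sos>, <eos> and any padding."""
    decoder_input = trg[:-1]   # what the decoder sees
    targets = trg[1:]          # what it must predict at each position
    return decoder_input, targets

def counted_positions(targets, pad_id):
    # Positions that contribute to the loss when padding is ignored.
    return [i for i, tok in enumerate(targets) if tok != pad_id]

trg = [SOS, 10, 11, EOS, PAD, PAD]
decoder_input, targets = shift_for_teacher_forcing(trg)
print(decoder_input)
print(targets)
print(counted_positions(targets, PAD))
```

At position `i` the decoder has seen `decoder_input[:i + 1]` and is trained to output `targets[i]`; positions whose target is padding are skipped, which is what `ignore_index` does in the loss.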

In [42]:
if load_pretrained == False:
    train_model(model,10)

torch.Size([10, 33])
torch.Size([10, 37])
<class 'torch.Tensor'>
tensor([[ 55,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,
           1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,
           1,   1,   1,   1,   1],
        [  2,  75, 118,  34,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,
           1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,
           1,   1,   1,   1,   1],
        [111, 116,   4, 107, 109,   4,  58,  57,  28,  56,   5,   1,   1,   1,
           1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,
           1,   1,   1,   1,   1],
        [  2,  80,  67,  65,   2,  69,  49,  19,   2,  61,  76,  12,  97,   1,
           1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,
           1,   1,   1,   1,   1],
        [ 14, 115,   9, 117,  31,  25,   2,  20,  74,  62,   2,  48,  42,  51,
          88,  82,   5,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,
     

In [None]:
train_model(model,10)

In [25]:
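def translate(model, sentence):
    """Greedily decode a Spanish translation of an English sentence."""
    model.eval()
    max_len = 80
    pad_idx = EN_TEXT.vocab.stoi['<pad>']
    eos_idx = ES_TEXT.vocab.stoi['<eos>']

    # Map the source tokens to vocabulary indices: shape (1, src_len).
    tokens = tokenize_en(sentence)
    src = torch.LongTensor([[EN_TEXT.vocab.stoi[tok] for tok in tokens]]).cuda()
    src_mask = (src != pad_idx).unsqueeze(-2)

    decoded = []
    with torch.no_grad():  # inference only, no gradients needed
        # Encode the source once and reuse it at every decoding step.
        e_outputs = model.encoder(src, src_mask)

        outputs = torch.zeros(max_len).type_as(src)
        outputs[0] = ES_TEXT.vocab.stoi['<sos>']

        for i in range(1, max_len):
            # Look-ahead mask: position j may only attend to positions <= j.
            trg_mask = np.triu(np.ones((1, i, i)), k=1).astype('uint8')
            trg_mask = (torch.from_numpy(trg_mask) == 0).cuda()

            out = model.out(model.decoder(outputs[:i].unsqueeze(0),
                                          e_outputs, src_mask, trg_mask))
            # Greedy choice: most probable next token at the last position.
            next_idx = F.softmax(out, dim=-1)[:, -1].topk(1)[1].item()
            if next_idx == eos_idx:
                break
            outputs[i] = next_idx
            decoded.append(next_idx)

    return ' '.join(ES_TEXT.vocab.itos[idx] for idx in decoded)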
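
The decoding loop in `translate` is greedy: at each step it keeps only the most probable next token, and it stops at `<eos>` or after `max_len` positions. The following is a minimal, framework-free sketch of that control flow. `greedy_decode`, `step_fn`, `toy_step` and the scoring table are hypothetical names with made-up scores; they stand in for the Transformer decoder and are not the model trained in this notebook.

```python
def greedy_decode(step_fn, sos, eos, max_len=80):
    """Greedy decoding: repeatedly append the highest-scoring next token.

    step_fn(prefix) is a hypothetical stand-in for the decoder: it takes the
    tokens generated so far and returns a dict {token: score} for the next one.
    """
    prefix = [sos]
    for _ in range(1, max_len):
        scores = step_fn(prefix)
        best = max(scores, key=scores.get)  # arg-max, like topk(1)
        if best == eos:
            break
        prefix.append(best)
    return prefix[1:]  # drop the <sos> marker


# Toy next-token table keyed by the last generated token (made-up scores).
TOY_TABLE = {
    '<sos>': {'hola': 0.9, 'mundo': 0.1},
    'hola': {'mundo': 0.8, '<eos>': 0.2},
    'mundo': {'<eos>': 0.7, 'hola': 0.3},
}


def toy_step(prefix):
    return TOY_TABLE[prefix[-1]]


print(' '.join(greedy_decode(toy_step, '<sos>', '<eos>')))
```

Because each step commits to a single token, a greedy decoder has no way to back out of a poor choice, which may be one reason for repetitive outputs such as the translation of `'test'` further down.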


In [26]:
translate(model,'I am doing a presentation about natual language processing')

'estoy haciendo una presentacion sobre el procesamiento de idiomas'

In [27]:
translate(model,'test')

'pruebas de pruebas de pruebas de pruebas'

In [1]:
# Adapted from the Google Cloud Speech-to-Text API sample code

from __future__ import division

import json
import os
import re
import sys

from google.cloud import speech
from google.cloud.speech import enums
from google.cloud.speech import types
import pyaudio
from six.moves import queue
from googletrans import Translator

# Audio recording parameters
RATE = 16000
CHUNK = int(RATE / 10)  # 100ms

# Path to the Google Cloud credentials file used by the Speech client
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/home/iker/Documents/SpeachToText-78c991f80cbb.json"


class MicrophoneStream(object):
    """Opens a recording stream as a generator yielding the audio chunks."""
    def __init__(self, rate, chunk):
        self._rate = rate
        self._chunk = chunk

        # Create a thread-safe buffer of audio data
        self._buff = queue.Queue()
        self.closed = True

    def __enter__(self):
        self._audio_interface = pyaudio.PyAudio()
        self._audio_stream = self._audio_interface.open(
            format=pyaudio.paInt16,
            # The API currently only supports 1-channel (mono) audio
            # https://goo.gl/z757pE
            channels=1, rate=self._rate,
            input=True, frames_per_buffer=self._chunk,
            # Run the audio stream asynchronously to fill the buffer object.
            # This is necessary so that the input device's buffer doesn't
            # overflow while the calling thread makes network requests, etc.
            stream_callback=self._fill_buffer,
        )

        self.closed = False

        return self

    def __exit__(self, type, value, traceback):
        self._audio_stream.stop_stream()
        self._audio_stream.close()
        self.closed = True
        # Signal the generator to terminate so that the client's
        # streaming_recognize method will not block the process termination.
        self._buff.put(None)
        self._audio_interface.terminate()

    def _fill_buffer(self, in_data, frame_count, time_info, status_flags):
        """Continuously collect data from the audio stream, into the buffer."""
        self._buff.put(in_data)
        return None, pyaudio.paContinue

    def generator(self):
        while not self.closed:
            # Use a blocking get() to ensure there's at least one chunk of
            # data, and stop iteration if the chunk is None, indicating the
            # end of the audio stream.
            chunk = self._buff.get()
            if chunk is None:
                return
            data = [chunk]

            # Now consume whatever other data's still buffered.
            while True:
                try:
                    chunk = self._buff.get(block=False)
                    if chunk is None:
                        return
                    data.append(chunk)
                except queue.Empty:
                    break

            yield b''.join(data)


def listen_print_loop(responses, simultaneous_translator,custom_translator, model=None):
    """Iterates through server responses and prints them.

    The responses passed is a generator that will block until a response
    is provided by the server.

    Each response may contain multiple results, and each result may contain
    multiple alternatives; for details, see https://goo.gl/tjCPAU.  Here we
    print only the transcription for the top alternative of the top result.

    In this case, responses are provided for interim results as well. If the
    response is an interim one, print a carriage return at the end of it, to
    allow the next result to overwrite it, until the response is a final one.
    For the final one, print a newline to preserve the finalized transcription.
    """
    
    if not custom_translator:
        translator = Translator()
        
    num_chars_printed = 0
    for response in responses:
        if not response.results:
            continue

        # The `results` list is consecutive. For streaming, we only care about
        # the first result being considered, since once it's `is_final`, it
        # moves on to considering the next utterance.
        result = response.results[0]
        if not result.alternatives:
            continue

        # Display the transcription of the top alternative.
        transcript = result.alternatives[0].transcript

        # Interim results are written with a trailing carriage return so that
        # the next result overwrites them on the same line.
        if not result.is_final:
            if simultaneous_translator:
                if custom_translator:
                    t = translate(model, transcript)
                else:
                    t = translator.translate(transcript, src='en', dest='es').text

                # If the previous line was longer than this one, pad with
                # spaces to erase its leftover characters.
                overwrite_chars = ' ' * (num_chars_printed - len(t))
                sys.stdout.write(t + overwrite_chars + '\r')
                sys.stdout.flush()

                num_chars_printed = len(t)
        else:
            # Final result: print the transcript and its translation on
            # their own lines, erasing any leftover interim text first.
            line = 'Speech to Text: ' + transcript
            print(line + ' ' * (num_chars_printed - len(line)))

            if custom_translator:
                print('Translator: ' + translate(model, transcript))
            else:
                print('Translator: ' + translator.translate(transcript, src='en', dest='es').text)
            
            # Exit recognition if any of the transcribed phrases could be
            # one of our keywords.
            if re.search(r'\b(exit|quit)\b', transcript, re.I):
                print('Exiting..')
                break

            num_chars_printed = 0


def main(model, simultaneous_translator=True, custom_translator=True):
    # See http://g.co/cloud/speech/docs/languages
    # for a list of supported languages.
    language_code = 'en-GB'  # a BCP-47 language tag
    
    client = speech.SpeechClient()
    config = types.RecognitionConfig(
        encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=RATE,
        language_code=language_code)
    streaming_config = types.StreamingRecognitionConfig(
        config=config,
        interim_results=True)

    with MicrophoneStream(RATE, CHUNK) as stream:
        audio_generator = stream.generator()
        requests = (types.StreamingRecognizeRequest(audio_content=content)
                    for content in audio_generator)

        responses = client.streaming_recognize(streaming_config, requests)
        # Now, put the transcription responses to use.
        listen_print_loop(responses, simultaneous_translator,custom_translator, model)



Could not import the PyAudio C module '_portaudio'.


ImportError: libportaudio.so.2: cannot open shared object file: No such file or directory
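
The `ImportError` above indicates that PyAudio could not load the PortAudio shared library on this machine. A minimal sketch of a pre-flight check using only the standard library follows. `portaudio_available` is a hypothetical helper name, and the check only reports whether a library named `portaudio` can be located, not whether a microphone is actually usable.

```python
import ctypes.util


def portaudio_available():
    """Return True if a PortAudio shared library can be located."""
    # find_library returns a library name or path, or None if nothing is found.
    return ctypes.util.find_library('portaudio') is not None


if not portaudio_available():
    print('PortAudio not found: install it before importing pyaudio.')
```

On Debian or Ubuntu the library is typically provided by the `libportaudio2` package.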

In [32]:
main(model, simultaneous_translator=True,custom_translator=False)

KeyboardInterrupt: 
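
`listen_print_loop` redraws interim translations on a single terminal line: each write ends with a carriage return, and the text is padded with spaces when it is shorter than the previous one. A minimal sketch of that padding logic in isolation is given here. `render_interim` is a hypothetical helper, not part of the Google sample, and the Spanish strings are made-up inputs.

```python
import sys


def render_interim(text, prev_len):
    """Build a line that fully overwrites a previous line of prev_len chars.

    Returns the string to write and the length to remember for the next call.
    """
    padding = ' ' * max(0, prev_len - len(text))
    return text + padding + '\r', len(text)


prev_len = 0
for partial in ['esto es una prueba', 'esto es', 'esto es todo']:
    line, prev_len = render_interim(partial, prev_len)
    sys.stdout.write(line)
    sys.stdout.flush()
sys.stdout.write('\n')
```

The padding length has to be computed from the text that is actually written (the translation), since that is what remains on the terminal line.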

## Examples

- This is a test of simultaneous translation
- Please, say your name
- Hi class, please open the book in chapter one
- Now everybody can attend a class
