## The goal of this notebook is to introduce sequence-to-sequence (seq2seq) language translation and the attention mechanism.
The notebook builds a sequence-to-sequence model for English-to-Hindi translation. After training the model, you can input an English sentence and get back its Hindi translation.

>RNNs can also perform natural language translation, also known as machine translation. The approach uses two RNNs, one for the source language and one for the target language. The first is called the encoder and the second the decoder: the encoder encodes the source sentence into a vector, and the decoder, a separate RNN, converts that encoded vector into a sentence in the target language. Attention lets the decoder focus on specific parts of the input sentence for each output word, which helps the input and output sentences align with one another.

The dataset comes from Kaggle: https://www.kaggle.com/aiswaryaramachandran/hindienglish-corpora

<h2> References: </h2>
<ul>
<li> Sequence to Sequence Learning with Neural Networks (Research Publication)</li>
<li> https://www.tensorflow.org/tutorials/text/nmt_with_attention </li>
<li> Using stochastic computation graphs formalism for optimization of sequence-to-sequence model (Research Publication) </li>
</ul>

### Import libraries

In [None]:
import os
import string
import numpy as np
import pandas as pd
from string import digits
import matplotlib.pyplot as plt
%matplotlib inline
import re
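import logging
# Set the C++ log level before importing TensorFlow so that it takes effect.
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import tensorflow as tf
# Eager execution is the default in TensorFlow 2.x, where tf.enable_eager_execution()
# no longer exists; the compat call below is only needed on 1.x.
tf.compat.v1.enable_eager_execution()
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
logging.getLogger('tensorflow').setLevel(logging.FATAL)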
import matplotlib.ticker as ticker
from sklearn.model_selection import train_test_split
import unicodedata
import io
import time
import warnings
import sys


for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
        
PATH = "../input/combination/mixsen_csv.csv"

## Preprocess English and Hindi sentences

In [None]:
def unicode_to_ascii(s):
    return ''.join(c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn')

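def preprocess_sentence(w):
    w = unicode_to_ascii(w.lower().strip())
    # Put a space around punctuation so each mark becomes its own token.
    w = re.sub(r"([?.!,¿])", r" \1 ", w)
    # Collapse runs of spaces (and double quotes) into a single space.
    w = re.sub(r'[" "]+', " ", w)
    # Replace everything except Latin letters and basic punctuation with a space.
    w = re.sub(r"[^a-zA-Z?.!,¿]+", " ", w)
    return w.strip()

def hindi_preprocess_sentence(w):
    # Hindi text is only trimmed; the Latin-only filter above would erase Devanagari.
    return w.strip()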
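
As a quick check of what the English cleaning does, the sketch below restates `unicode_to_ascii` and `preprocess_sentence` in stand-alone form and runs them on one sentence. The sample sentence is made up for illustration.

```python
import re
import unicodedata

def unicode_to_ascii(s):
    # Decompose accented characters and drop the combining marks (category "Mn").
    return ''.join(c for c in unicodedata.normalize('NFD', s)
                   if unicodedata.category(c) != 'Mn')

def preprocess_sentence(w):
    w = unicode_to_ascii(w.lower().strip())
    w = re.sub(r"([?.!,¿])", r" \1 ", w)    # pad punctuation with spaces
    w = re.sub(r'[" "]+', " ", w)           # collapse repeated spaces and quotes
    w = re.sub(r"[^a-zA-Z?.!,¿]+", " ", w)  # drop everything else, digits included
    return w.strip()

sample = "Call 917 on speaker, please!"  # hypothetical input
cleaned = preprocess_sentence(sample)
print(cleaned)
```

Digits disappear along with every other character outside the Latin alphabet and the listed punctuation, so phone numbers or times in a test sentence never reach the model.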

In [None]:
def create_dataset(path=PATH):
    lines = pd.read_csv(path, encoding='utf-8')
    en = []
    hd = []
    for i, j in zip(lines['english_sentence'], lines['hindi_sentence']):
        # Keep only sentence pairs of at most 20 space-separated words.
        if len(i.split(' ')) <= 20 and len(j.split(' ')) <= 20:
            # Clean the whole sentence first, then split it, so the tokens
            # match what evaluate() produces at inference time.
            en_1 = preprocess_sentence(i).split()
            en_1.append('<end>')
            en_1.insert(0, '<start>')
            hd_1 = hindi_preprocess_sentence(j).split()
            hd_1.append('<end>')
            hd_1.insert(0, '<start>')
            en.append(en_1)
            hd.append(hd_1)
    # Note the order: target (Hindi) first, then input (English).
    return hd, en

In [None]:
def max_length(tensor):
    return max(len(t) for t in tensor)

### Tokenization of the data

In [None]:
def tokenize(lang):
  lang_tokenizer = tf.keras.preprocessing.text.Tokenizer(filters='')
  lang_tokenizer.fit_on_texts(lang)
  tensor = lang_tokenizer.texts_to_sequences(lang)
  tensor = tf.keras.preprocessing.sequence.pad_sequences(tensor,padding='post')
  return tensor, lang_tokenizer
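
To make the two steps inside `tokenize` concrete, here is a pure-Python stand-in for them: build a word index, then right-pad the id sequences with zeros. It only imitates the behaviour and is not the Keras implementation; `build_word_index` and `pad_post` are hypothetical helper names, and the two-sentence corpus is toy data.

```python
from collections import Counter

def build_word_index(token_lists):
    # The most frequent token gets index 1; index 0 stays free for padding.
    counts = Counter(tok for toks in token_lists for tok in toks)
    return {tok: i for i, (tok, _) in enumerate(counts.most_common(), start=1)}

def pad_post(sequences, pad_value=0):
    # Right-pad every sequence to the length of the longest one.
    width = max(len(seq) for seq in sequences)
    return [seq + [pad_value] * (width - len(seq)) for seq in sequences]

corpus = [['<start>', 'hello', '<end>'],
          ['<start>', 'hello', 'world', '<end>']]  # toy data
word_index = build_word_index(corpus)
ids = [[word_index[tok] for tok in toks] for toks in corpus]
padded = pad_post(ids)
print(word_index)
print(padded)
```

Reserving id 0 for padding is what later allows the loss function to mask padded positions by testing for zero.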

In [None]:
def load_dataset(path=PATH):
    targ_lang, inp_lang = create_dataset(path)
    input_tensor, inp_lang_tokenizer = tokenize(inp_lang)
    target_tensor, targ_lang_tokenizer = tokenize(targ_lang)
    return input_tensor, target_tensor, inp_lang_tokenizer, targ_lang_tokenizer

In [None]:
input_tensor, target_tensor, inp_lang, targ_lang = load_dataset(PATH)
max_length_targ, max_length_inp = max_length(target_tensor), max_length(input_tensor)

### Create training and validation sets

In [None]:
input_tensor_train, input_tensor_val, target_tensor_train, target_tensor_val = train_test_split(input_tensor, target_tensor, test_size=0.2)
print(len(input_tensor_train), len(target_tensor_train), len(input_tensor_val), len(target_tensor_val))

In [None]:
def convert(lang, tensor):
  for t in tensor:
    if t!=0:
      print ("%d ----> %s" % (t, lang.index_word[t]))
    
print ("Input Language; index to word mapping")
convert(inp_lang, input_tensor_train[0])
print ()
print ("Target Language; index to word mapping")
convert(targ_lang, target_tensor_train[0])

### Create Dataset
> We use a minimal configuration because the notebook focuses on the implementation rather than on metric performance.

In [None]:
BUFFER_SIZE = len(input_tensor_train)
BATCH_SIZE = 32
steps_per_epoch = len(input_tensor_train)//BATCH_SIZE
embedding_dim = 512
units = 512
vocab_inp_size = len(inp_lang.word_index)+1
vocab_tar_size = len(targ_lang.word_index)+1

dataset = tf.data.Dataset.from_tensor_slices((input_tensor_train, target_tensor_train)).shuffle(BUFFER_SIZE)
dataset = dataset.batch(BATCH_SIZE, drop_remainder=True)
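
The effect of `drop_remainder=True` on the number of steps can be checked with plain arithmetic. The sketch below assumes a training set of 1000 examples, a made-up figure; `count_full_batches` is a hypothetical helper.

```python
def count_full_batches(num_examples, batch_size):
    # With drop_remainder=True only complete batches are kept.
    return num_examples // batch_size

num_examples = 1000  # hypothetical training-set size
batch_size = 32
steps = count_full_batches(num_examples, batch_size)
dropped = num_examples - steps * batch_size
print(steps, dropped)
```

Keeping every batch the same size matters here because `train_step` builds the first decoder input from a fixed `BATCH_SIZE`.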

## Encoder Decoder with Attention Model

> The encoder-decoder model with attention is a general end-to-end approach to sequence learning that makes minimal assumptions about the sequence structure. In this notebook a Gated Recurrent Unit (GRU) layer encodes the input sequence into a sequence of hidden states, and a second GRU decodes the target sequence from them, using attention to pick out the relevant states at each step.
<img src="https://www.researchgate.net/profile/Vlad_Zhukov2/publication/321210603/figure/fig1/AS:642862530191361@1530281779831/An-example-of-sequence-to-sequence-model-with-attention-Calculation-of-cross-entropy.png" width="800" alt="attention mechanism">

> A sequence-to-sequence model has two parts: an encoder and a decoder. In practice these are two separate neural networks combined into one larger network. The task of the encoder is to understand the input sequence and create a lower-dimensional representation of it. This representation is then forwarded to the decoder, which generates a sequence of its own that represents the output. The input is passed through the encoder, which gives us the encoder output. The attention mechanism then assigns a weight to each input word, and the decoder uses these weights to predict the next word in the sentence. We use Bahdanau attention in the decoder.
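
The weighting step described above can be illustrated without TensorFlow. This is a minimal sketch that assumes the relevance scores are already given (in the model they are learned); the numbers and the helper name `context_vector` are invented for illustration.

```python
import math

def softmax(scores):
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def context_vector(weights, encoder_outputs):
    # Weighted sum of the encoder outputs, one weight per input word.
    dim = len(encoder_outputs[0])
    return [sum(w * out[d] for w, out in zip(weights, encoder_outputs))
            for d in range(dim)]

# Toy values: three input words, each with a 2-dimensional encoder output.
encoder_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [0.1, 0.2, 2.0]  # hypothetical relevance scores
weights = softmax(scores)
context = context_vector(weights, encoder_outputs)
print([round(w, 3) for w in weights])
print([round(c, 3) for c in context])
```

The weights sum to one, so the context vector is a convex combination of the encoder outputs, dominated here by the third word because it has the highest score.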



### Encoder

In [None]:
class Encoder(tf.keras.Model):
  def __init__(self, vocab_size, embedding_dim, enc_units, batch_sz):
    super(Encoder, self).__init__()
    self.batch_sz = batch_sz
    self.enc_units = enc_units
    self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
    self.gru = tf.keras.layers.GRU(self.enc_units,
                                   return_sequences=True,
                                   return_state=True,
                                   recurrent_initializer='glorot_uniform')

  def call(self, x, hidden):
    x = self.embedding(x)
    output, state = self.gru(x, initial_state = hidden)
    return output, state

  def initialize_hidden_state(self):
    return tf.zeros((self.batch_sz, self.enc_units))

encoder = Encoder(vocab_inp_size, embedding_dim, units, BATCH_SIZE)

### Attention Mechanism

In [None]:
class BahdanauAttention(tf.keras.layers.Layer):
  def __init__(self, units):
    super(BahdanauAttention, self).__init__()
    self.W1 = tf.keras.layers.Dense(units)
    self.W2 = tf.keras.layers.Dense(units)
    self.V = tf.keras.layers.Dense(1)

  def call(self, query, values):
    hidden_with_time_axis = tf.expand_dims(query, 1)
    score = self.V(tf.nn.tanh(
        self.W1(values) + self.W2(hidden_with_time_axis)))
    attention_weights = tf.nn.softmax(score, axis=1)
    context_vector = attention_weights * values
    context_vector = tf.reduce_sum(context_vector, axis=1)
    return context_vector, attention_weights
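
In equation form, the `call` method above can be read as follows. The notation is an assumption of this note rather than the notebook's: $h_s$ is the encoder output at source position $s$ (`values`), $q$ is the decoder hidden state (`query`), and the bias terms of the `Dense` layers are omitted.

$$
\begin{aligned}
e_s &= v^\top \tanh(W_1 h_s + W_2 q) \\
\alpha_s &= \frac{\exp(e_s)}{\sum_{s'} \exp(e_{s'})} \\
c &= \sum_s \alpha_s h_s
\end{aligned}
$$

Here $e_s$ corresponds to `score`, $\alpha_s$ to `attention_weights` (the softmax runs over the source positions, `axis=1`), and $c$ to `context_vector`.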

### Decoder

In [None]:
class Decoder(tf.keras.Model):
  def __init__(self, vocab_size, embedding_dim, dec_units, batch_sz):
    super(Decoder, self).__init__()
    self.batch_sz = batch_sz
    self.dec_units = dec_units
    self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
    self.gru = tf.keras.layers.GRU(self.dec_units,
                                   return_sequences=True,
                                   return_state=True,
                                   recurrent_initializer='glorot_uniform')
    self.fc = tf.keras.layers.Dense(vocab_size)
    self.attention = BahdanauAttention(self.dec_units)

  def call(self, x, hidden, enc_output):
    context_vector, attention_weights = self.attention(hidden, enc_output)
    x = self.embedding(x)
    x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)
    output, state = self.gru(x)
    output = tf.reshape(output, (-1, output.shape[2]))
    x = self.fc(output)
    return x, state, attention_weights

decoder = Decoder(vocab_tar_size, embedding_dim, units, BATCH_SIZE)

### Optimizer

In [None]:
optimizer = tf.keras.optimizers.Adam()
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction='none')

def loss_function(real, pred):
  # Token id 0 is padding; mask those positions out of the loss.
  mask = tf.math.logical_not(tf.math.equal(real, 0))
  loss_ = loss_object(real, pred)
  mask = tf.cast(mask, dtype=loss_.dtype)
  loss_ *= mask
  return tf.reduce_mean(loss_)
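
The masking logic can be followed by hand on a toy batch. The sketch below is a simplification: it takes the probability the model assigned to each true token directly, whereas `loss_function` works from logits. The ids, the probabilities and the name `masked_mean_loss` are made up for illustration.

```python
import math

def masked_mean_loss(targets, probs_of_target, pad_id=0):
    # Cross-entropy per example is -log p(target); padded positions contribute 0.
    losses = [0.0 if t == pad_id else -math.log(p)
              for t, p in zip(targets, probs_of_target)]
    # As with tf.reduce_mean in loss_function, the mean runs over the whole
    # batch, padded positions included.
    return sum(losses) / len(losses)

targets = [5, 0, 7, 0]                   # one time step of a toy batch; 0 = padding
probs_of_target = [0.5, 0.9, 0.25, 0.1]  # hypothetical model probabilities
loss = masked_mean_loss(targets, probs_of_target)
print(round(loss, 4))
```

Because the divisor still counts the padded positions, batches with more padding report a smaller average loss for the same per-token error.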

In [None]:
checkpoint_dir = './training_checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt")
checkpoint = tf.train.Checkpoint(optimizer=optimizer,
                                 encoder=encoder,
                                 decoder=decoder)

## Training

>1. Pass the *input* through the *encoder* to get the *encoder output* and the *encoder hidden state*.
>2. Pass the encoder output, the encoder hidden state and the decoder input to the decoder.
>3. The decoder returns the *predictions* and the *decoder hidden state*.
>4. The decoder hidden state is then passed back into the model.
>5. The predictions are used to calculate the loss.
>6. Use *teacher forcing* (passing the true target word as the next input to the decoder) to choose the next decoder input.
>7. Calculate the gradients and apply them with the *optimizer* (backpropagation).
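
Teacher forcing from step 6 comes down to how decoder inputs and expected outputs are paired. A minimal sketch, assuming a toy target sentence with invented token ids; `teacher_forcing_pairs` is a hypothetical helper, not part of the notebook.

```python
def teacher_forcing_pairs(target_ids):
    # At step t the decoder is fed the true token t-1 and must predict token t.
    return [(target_ids[t - 1], target_ids[t]) for t in range(1, len(target_ids))]

# Toy target sentence as ids: <start>=1, two words, <end>=2 (hypothetical ids).
target_ids = [1, 14, 9, 2]
pairs = teacher_forcing_pairs(target_ids)
for dec_input, expected in pairs:
    print(f"decoder input {dec_input} -> expected output {expected}")
```

The decoder never sees its own predictions during training, which keeps early mistakes from compounding along the sequence.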

In [None]:
@tf.function
def train_step(inp, targ, enc_hidden):
  loss = 0
  with tf.GradientTape() as tape:
    enc_output, enc_hidden = encoder(inp, enc_hidden)
    dec_hidden = enc_hidden
    dec_input = tf.expand_dims([targ_lang.word_index['<start>']] * BATCH_SIZE, 1)
    # Teacher forcing
    for t in range(1, targ.shape[1]):
      predictions, dec_hidden, _ = decoder(dec_input, dec_hidden, enc_output)
      loss += loss_function(targ[:, t], predictions)
      dec_input = tf.expand_dims(targ[:, t], 1)

  batch_loss = (loss / int(targ.shape[1]))
  variables = encoder.trainable_variables + decoder.trainable_variables
  gradients = tape.gradient(loss, variables)
  optimizer.apply_gradients(zip(gradients, variables))      
  return batch_loss

In [None]:
EPOCHS = 30

plot_loss=[]

for epoch in range(EPOCHS):
  start = time.time()
  enc_hidden = encoder.initialize_hidden_state()
  total_loss = 0
  for (batch, (inp, targ)) in enumerate(dataset.take(steps_per_epoch)):
    batch_loss = train_step(inp, targ, enc_hidden)
    total_loss += batch_loss
    if batch % 100 == 0:
        print('Epoch {} Batch {} Loss {:.4f}'.format(epoch + 1,
                                                     batch,
                                                     batch_loss.numpy()))
  if (epoch + 1) % 2 == 0:
    checkpoint.save(file_prefix = checkpoint_prefix)

  print('Epoch {} Loss {:.4f}'.format(epoch + 1,
                                      total_loss / steps_per_epoch))
  plot_loss.append(float(total_loss / steps_per_epoch))  # store a plain float for plotting
  print('Time taken for 1 epoch {} sec\n'.format(time.time() - start))

# Plot the average training loss per epoch.
plt.plot(plot_loss)
plt.legend(['training loss'])
plt.show()

In [None]:
def evaluate(sentence):
    sentence = preprocess_sentence(sentence)
    # Wrap the input in the <start>/<end> markers used during training and skip
    # words missing from the vocabulary instead of raising a KeyError.
    tokens = ['<start>'] + sentence.split() + ['<end>']
    inputs = [inp_lang.word_index[w] for w in tokens if w in inp_lang.word_index]
    inputs = tf.keras.preprocessing.sequence.pad_sequences([inputs],
                                                           maxlen=max_length_inp,
                                                           padding='post')
    inputs = tf.convert_to_tensor(inputs)
    result = ''
    hidden = [tf.zeros((1, units))]
    enc_out, enc_hidden = encoder(inputs, hidden)
    dec_hidden = enc_hidden
    dec_input = tf.expand_dims([targ_lang.word_index['<start>']], 0)
    # Greedy decoding: feed the most likely word back in until <end> appears.
    for t in range(max_length_targ):
        predictions, dec_hidden, attention_weights = decoder(dec_input,
                                                             dec_hidden,
                                                             enc_out)
        predicted_id = tf.argmax(predictions[0]).numpy()
        result += targ_lang.index_word[predicted_id] + ' '
        if targ_lang.index_word[predicted_id] == '<end>':
            return result, sentence
        dec_input = tf.expand_dims([predicted_id], 0)
    return result, sentence
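
The loop in `evaluate` is a form of greedy decoding: at every step it keeps only the highest-scoring word and feeds it back in. The sketch below shows the same control flow with a toy stand-in for the decoder. `greedy_decode` and `toy_step` are hypothetical names, and the five-word vocabulary is invented.

```python
def greedy_decode(step_fn, start_id, end_id, max_len):
    # step_fn(prev_id, state) -> (scores over the vocabulary, new_state)
    output, prev_id, state = [], start_id, None
    for _ in range(max_len):
        scores, state = step_fn(prev_id, state)
        prev_id = max(range(len(scores)), key=scores.__getitem__)  # argmax
        output.append(prev_id)
        if prev_id == end_id:
            break
    return output

def toy_step(prev_id, state):
    # Hypothetical stand-in for the decoder: always favours the next id,
    # capped at the last vocabulary entry.
    vocab_size = 5
    nxt = min(prev_id + 1, vocab_size - 1)
    return [1.0 if i == nxt else 0.0 for i in range(vocab_size)], state

decoded = greedy_decode(toy_step, start_id=1, end_id=4, max_len=10)
print(decoded)
```

Greedy decoding is cheap but commits to one word per step; the `max_len` cap plays the role of `max_length_targ` and stops the loop if the end token never appears.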

In [None]:
from nltk.translate.bleu_score import corpus_bleu

# restoring the latest checkpoint in checkpoint_dir
checkpoint.restore(tf.train.latest_checkpoint(checkpoint_dir))

sentences={'Phone':
['Block calls from Jacob Andy right away',
'Call 917 886-5091 on speaker',
 'Edit the time of the USSR time zone reminder to go off every year in December',
 'Connect me to the work number of Michael Smith using speakerphone'
],

'Reminder':
['Create a reminder for every Wednesday at 1:00pm for waking up',
"Go to reminders and find Monica's birthday reminder",
 'Are there any reminders for this Thursday containing the word Manchester united game'
],

'Weather':
["Show this week's weather forecast for Toledo Ohio on the master bedroom TV",
'Tell us the low temp in vegas next friday',
'Show the apparent temperature this afternoon in Libertyville'],

'Launcher':
['Quit all background applications',
'Bixby who called me today']}

translated=[]
actual=['ब्लॉक याकूब एंडी से तुरंत कॉल करें',
        'स्पीकर पर 917 886-5091 पर कॉल करें',
        'यूएसएसआर समय क्षेत्र अनुस्मारक के समय को हर साल दिसंबर में बंद करने के लिए संपादित करें',
        'स्पीकरफ़ोन का उपयोग करके मुझे माइकल स्मिथ के कार्य संख्या से कनेक्ट करें',
        'जागने के लिए हर बुधवार दोपहर 1:00 बजे के लिए एक अनुस्मारक बनाएँ',
        'रिमाइंडर पर जाएं और मोनिका के जन्मदिन की याद दिलाएं',
        'क्या इस गुरुवार के लिए कोई अनुस्मारक है जिसमें मैनचेस्टर एकजुट खेल शब्द है',
        'मास्टर बेडरूम टीवी पर टोलेडो ओहियो के लिए इस सप्ताह के मौसम का पूर्वानुमान दिखाएं',
        'हमें अगले शुक्रवार को वेजेस में कम अस्थायी बताएं',
        'लिबर्टीविले में आज दोपहर को स्पष्ट तापमान दिखाएं',
        'सभी पृष्ठभूमि अनुप्रयोगों से बाहर निकलें',
        'बिक्सबी जिसने मुझे आज बुलाया है'
       ]

def translate(sentence):
    result, sentence = evaluate(sentence)
    print('Input: %s' % (sentence))
    print('Predicted translation: {}'.format(result))
    translated.append(result)


for domain in sentences:
    for line in sentences[domain]:
        translate(line)


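# corpus_bleu expects tokenized input: a list of reference token lists per
# sentence, and one token list per hypothesis.
references = [[ref.split()] for ref in actual]
hypotheses = [hyp.replace('<end>', '').split() for hyp in translated]
print(corpus_bleu(references, hypotheses, weights=(1.0, 0, 0, 0)))
print(corpus_bleu(references, hypotheses, weights=(0.5, 0.5, 0, 0)))
print(corpus_bleu(references, hypotheses, weights=(1/3, 1/3, 1/3, 0)))
print(corpus_bleu(references, hypotheses, weights=(0.25, 0.25, 0.25, 0.25)))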
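
To see what the first BLEU score measures, the unigram part can be computed by hand. This is only a sketch of clipped unigram precision: full BLEU also applies a brevity penalty and combines higher-order n-grams. The English sentence pair is a toy example and `clipped_unigram_precision` is a hypothetical helper.

```python
from collections import Counter

def clipped_unigram_precision(reference, hypothesis):
    # Each hypothesis word counts at most as often as it occurs in the reference.
    ref_counts = Counter(reference)
    hyp_counts = Counter(hypothesis)
    matched = sum(min(count, ref_counts[word]) for word, count in hyp_counts.items())
    return matched / len(hypothesis) if hypothesis else 0.0

reference = "the cat sat on the mat".split()  # toy example
hypothesis = "the the the cat".split()
precision = clipped_unigram_precision(reference, hypothesis)
print(precision)
```

Clipping is what stops a translation from scoring well by repeating one common word: only two of the three occurrences of "the" are credited.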

In [None]:
translate(u'')