<a href="https://colab.research.google.com/github/lalitpandey02/PythonNotebooks/blob/main/Seq2Seq_Model_for_Machine_Translation_(Practical).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<center><img src="https://github.com/insaid2018/Term-1/blob/master/Images/INSAID_Full%20Logo.png?raw=true" width="240" height="100" /></center>

# <center>Seq2Seq Model for Neural Machine Translation</center>


---
# **Table of Contents**
---

**1.** [**Introduction to Seq2Seq Model**](#Section1)<br>
**2.** [**Problem Description**](#Section2)<br>
**3.** [**Installing & Importing Libraries**](#Section3)<br>
**4.** [**Data Acquisition & Description**](#Section4)<br>
**5.** [**Data Preprocessing**](#Section5)<br>
**6.** [**Machine Translation Model**](#Section6)<br>
  - **6.1** [**Define Encoder Model**](#Section61)
  - **6.2** [**Define Decoder Model**](#Section62) 
  - **6.3** [**Model Training**](#Section63)
  - **6.4** [**Inference Model**](#Section64)
  - **6.5** [**Making Predictions**](#Section65)

**7.** [**Conclusion**](#Section7)<br>

---
<a name = Section1></a>
# **1. Introduction to Seq2seq model**
---

- The **encoder-decoder model** provides a pattern for using **recurrent neural networks** to address challenging **sequence-to-sequence** prediction problems, such as **machine translation**.

<center><img src="https://raw.githubusercontent.com/insaid2018/DeepLearning/master/images/encoder_decoder4.png" width="600" height="230"/></center>

<br>

<center><img src="https://raw.githubusercontent.com/insaid2018/DeepLearning/master/images/encoder_decoder5.png" width="900" height="500"/></center>


<br>  


- **Sequence-to-sequence** models are a **class** of **recurrent neural network** architectures typically used for (but not restricted to) complex **language** problems such as **machine translation**, **question answering**, **chatbots**, and **text summarization**.

- Sequence-to-sequence (Seq2Seq) modelling is about **training** models that **convert** sequences from one **domain** into sequences in another **domain**, for example **translating** English into French.

- Encoder-decoder is the **standard** modeling **paradigm** for sequence-to-sequence tasks. This **framework** consists of two **components**:

  - **Encoder** - Reads **source sequence** and **produces** its representation;
  - **Decoder** - Uses **source representation** from the **encoder** to generate the **target** sequence.

<br>  
<center><img src="https://raw.githubusercontent.com/insaid2018/DeepLearning/master/s2s_data/SS10.gif" width="600" height="300"/></center>

<br>  

---
<a name = Section2></a>
# **2. Problem Statement**
---

- The most popular **sequence-to-sequence** task is **translation**, usually from **one** natural language to **another**.

- In the last couple of years, **commercial systems** such as **Google Translate**, **Yandex** **Translate**, **DeepL Translator**, and **Bing Microsoft Translator** have become surprisingly **good** at **machine translation**.

<br>  
<center><img src="https://raw.githubusercontent.com/insaid2018/DeepLearning/master/s2s_data/ss12.png" width="690" height="350"/></center>

<br>  
- In the **machine translation** task, we have an **input** sequence and an **output** sequence. Translation can be **thought** of as finding the **target** sequence that is the most **probable** given the **input**.

- Formally, we look for the **target** sequence that **maximizes** the conditional **probability** of the **output** given the **input**.
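
Written out, with $x$ denoting the input sequence and $y = (y_1, \dots, y_T)$ a candidate output sequence (this notation is an assumption made here for illustration), the objective and its standard chain-rule factorization read:

$$
y^{*} = \arg\max_{y} \; p(y \mid x), \qquad p(y \mid x) = \prod_{t=1}^{T} p(y_t \mid y_1, \dots, y_{t-1}, x)
$$

The factorization is what allows a decoder to produce the output one token at a time, each step conditioned on the input and on the tokens generated so far.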

<br>  
<center><img src="https://raw.githubusercontent.com/insaid2018/DeepLearning/master/s2s_data/ss22.png" width="690" height="300"/></center>

<br>

---
<a name = Section3></a>
# **3. Installing and Importing Libraries**
---

In [None]:
# Checking whether GPU is available or not, to be used with tensorflow.
import tensorflow as tf 
device_name = tf.test.gpu_device_name() 
if device_name != '/device:GPU:0': raise SystemError('GPU device not found') 
print('Found GPU at: {}'.format(device_name))

Found GPU at: /device:GPU:0


In [None]:
# urllib.request must be imported explicitly; a bare "import urllib" does not guarantee it is loaded.
import urllib.request

import numpy as np
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, LSTM, Dense

---
<a name = Section4></a>
# **4. Data Acquisition & Description**
---

- The **dataset** used in the example involves short **Spanish** and **English** sentence **pairs**.

- The dataset is called **Tab-delimited Bilingual Sentence Pairs** and is part of the **Tatoeba Project** and listed on the **ManyThings.org** site for helping English as a Second Language students.

- The problem is framed as a **sequence prediction problem** where input sequences of characters are in **English** and output sequences of characters are in **Spanish**.

In [None]:
# Importing the dataset from GitHub.
url = 'https://raw.githubusercontent.com/insaid2018/DeepLearning/master/Data/spa.txt'
with urllib.request.urlopen(url) as response:
    # Each raw line is a UTF-8 encoded "English<TAB>Spanish" pair.
    lines = [sent.decode('utf8') for sent in response.readlines()]

In [None]:
# Checking a few samples from the dataset.
print(lines[0])
print(lines[1])
print(lines[2500])

Go.	Ve.

Go.	Vete.

We're going.	Vamos a ir.
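
The samples above follow a simple "English, tab, Spanish" layout. As a minimal sketch of how one such line can be split into a pair, the hypothetical helper `parse_pair` below (not part of the notebook) strips the trailing newline before splitting on the tab:

```python
def parse_pair(line):
    """Split one 'English<TAB>Spanish' line into a (source, target) tuple."""
    # rstrip('\n') removes the line ending that readlines() leaves in place.
    source, target = line.rstrip('\n').split('\t')
    return source, target


sample = "We're going.\tVamos a ir.\n"
english, spanish = parse_pair(sample)
print(repr(english), '->', repr(spanish))
```

The helper assumes exactly two tab-separated fields per line; a file with extra columns would need a different split.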



In [None]:
type(lines)

list

In [None]:
# Checking the number of samples in the dataset.
len(lines)

123376

---
<a name = Section5></a>
# **5. Data Preprocessing**
---

In [None]:
batch_size = 32           # Batch size for training.
epochs = 100              # Number of epochs to train for.
latent_dim = 256          # Latent dimensionality of the encoding space.
num_samples = 10000       # Number of samples to train on.

- **Vectorizing** the **data**.

In [None]:
# Vectorize the data.
input_texts = []
target_texts = []
input_characters = set()
target_characters = set()

- Creating two lists, **`input_texts`** and **`target_texts`**, containing **all** the **inputs** and **targets**.

- Creating two sets **input_characters** and **target_characters** containing the **unique** characters **present** in the input and the target respectively.

In [None]:
for line in lines[: min(num_samples, len(lines) - 1)]:
    input_text, target_text = line.split('\t')

    # We use "tab" as the "start sequence" character for the targets, and "\n" as "end sequence" character.
    # Note: lines read with readlines() keep their trailing "\n", so a target built here ends with two "\n" characters.
    target_text = '\t' + target_text + '\n'

    input_texts.append(input_text)
    target_texts.append(target_text)

    # set.update() adds every character of the string and ignores those already present.
    input_characters.update(input_text)
    target_characters.update(target_text)

In [None]:
len(target_characters)

85
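
To make the role of the start and end markers concrete, here is a small sketch on a hypothetical two-pair dataset (`toy_pairs` is invented for illustration and is not taken from `spa.txt`). It wraps each target in `'\t'` and `'\n'` and collects the sorted set of target characters, mirroring the steps of the cells above on a scale that can be read at a glance:

```python
toy_pairs = [("Go.", "Ve."), ("Hi.", "Hola.")]  # hypothetical mini-dataset

# Wrap every target in the start ('\t') and end ('\n') markers.
toy_targets = ['\t' + target + '\n' for _, target in toy_pairs]

# Unique characters across all targets, sorted for a stable ordering.
toy_target_chars = sorted(set(''.join(toy_targets)))

print(toy_targets)
print(toy_target_chars)
```

Because the markers are ordinary characters, they become part of the target vocabulary, which is why the target character set contains `'\t'` and `'\n'` while the input set does not.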

- Converting the sets **`input_characters`** and **`target_characters`** into **lists** and **sorting** them.

In [None]:
input_characters = sorted(list(input_characters))
target_characters = sorted(list(target_characters))

- Creating variables **`num_encoder_tokens`** and **`num_decoder_tokens`** having value equal to the number of **unique** values present in the sets **input_characters** and **target_characters** respectively.

In [None]:
num_encoder_tokens = len(input_characters)
num_decoder_tokens = len(target_characters)

In [None]:
print(num_encoder_tokens)
print(num_decoder_tokens)

69
85


In [None]:
# max_encoder_seq_length and max_decoder_seq_length hold the length of the
# longest sequence present in the input and the target respectively.
max_encoder_seq_length = max(len(txt) for txt in input_texts)
max_decoder_seq_length = max(len(txt) for txt in target_texts)

In [None]:
print('Number of samples:', len(input_texts))
print('Number of unique input tokens:', num_encoder_tokens)
print('Number of unique output tokens:', num_decoder_tokens)
print('Max sequence length for inputs:', max_encoder_seq_length)
print('Max sequence length for outputs:', max_decoder_seq_length)

Number of samples: 10000
Number of unique input tokens: 69
Number of unique output tokens: 85
Max sequence length for inputs: 16
Max sequence length for outputs: 43


In [None]:
# input_token_index and target_token_index map each unique character of the
# input and the target to its position in the sorted character list.
input_token_index = {char: i for i, char in enumerate(input_characters)}
target_token_index = {char: i for i, char in enumerate(target_characters)}

In [None]:
input_token_index

{' ': 0,
 '!': 1,
 '$': 2,
 "'": 3,
 ',': 4,
 '-': 5,
 '.': 6,
 '0': 7,
 '1': 8,
 '2': 9,
 '3': 10,
 '4': 11,
 '5': 12,
 '6': 13,
 '7': 14,
 '8': 15,
 '9': 16,
 ':': 17,
 '?': 18,
 'A': 19,
 'B': 20,
 'C': 21,
 'D': 22,
 'E': 23,
 'F': 24,
 'G': 25,
 'H': 26,
 'I': 27,
 'J': 28,
 'K': 29,
 'L': 30,
 'M': 31,
 'N': 32,
 'O': 33,
 'P': 34,
 'Q': 35,
 'R': 36,
 'S': 37,
 'T': 38,
 'U': 39,
 'V': 40,
 'W': 41,
 'Y': 42,
 'a': 43,
 'b': 44,
 'c': 45,
 'd': 46,
 'e': 47,
 'f': 48,
 'g': 49,
 'h': 50,
 'i': 51,
 'j': 52,
 'k': 53,
 'l': 54,
 'm': 55,
 'n': 56,
 'o': 57,
 'p': 58,
 'q': 59,
 'r': 60,
 's': 61,
 't': 62,
 'u': 63,
 'v': 64,
 'w': 65,
 'x': 66,
 'y': 67,
 'z': 68}

In [None]:
target_token_index

{'\t': 0,
 '\n': 1,
 ' ': 2,
 '!': 3,
 '"': 4,
 "'": 5,
 ',': 6,
 '-': 7,
 '.': 8,
 '0': 9,
 '1': 10,
 '2': 11,
 '3': 12,
 '4': 13,
 '5': 14,
 '6': 15,
 '7': 16,
 '8': 17,
 ':': 18,
 '?': 19,
 'A': 20,
 'B': 21,
 'C': 22,
 'D': 23,
 'E': 24,
 'F': 25,
 'G': 26,
 'H': 27,
 'I': 28,
 'J': 29,
 'K': 30,
 'L': 31,
 'M': 32,
 'N': 33,
 'O': 34,
 'P': 35,
 'Q': 36,
 'R': 37,
 'S': 38,
 'T': 39,
 'U': 40,
 'V': 41,
 'W': 42,
 'Y': 43,
 'a': 44,
 'b': 45,
 'c': 46,
 'd': 47,
 'e': 48,
 'f': 49,
 'g': 50,
 'h': 51,
 'i': 52,
 'j': 53,
 'k': 54,
 'l': 55,
 'm': 56,
 'n': 57,
 'o': 58,
 'p': 59,
 'q': 60,
 'r': 61,
 's': 62,
 't': 63,
 'u': 64,
 'v': 65,
 'w': 66,
 'x': 67,
 'y': 68,
 'z': 69,
 '¡': 70,
 '«': 71,
 '»': 72,
 '¿': 73,
 'Á': 74,
 'É': 75,
 'Ó': 76,
 'Ú': 77,
 'á': 78,
 'é': 79,
 'í': 80,
 'ñ': 81,
 'ó': 82,
 'ú': 83,
 'ü': 84}
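
A quick way to see what these dictionaries are for is a round trip from text to indices and back. The sketch below uses a hypothetical six-character vocabulary rather than the notebook's real one; `encode` and `decode` are illustrative helper names:

```python
chars = sorted(set("hola\t\n"))  # hypothetical toy vocabulary
char_to_index = {char: i for i, char in enumerate(chars)}
index_to_char = {i: char for char, i in char_to_index.items()}


def encode(text):
    """Map each character to its integer index."""
    return [char_to_index[char] for char in text]


def decode(indices):
    """Map integer indices back to characters."""
    return ''.join(index_to_char[i] for i in indices)


ids = encode("hola")
print(ids)
print(decode(ids))
```

The inverse dictionary plays the same role as the reverse-lookup tables used later when turning predicted indices back into readable text.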

In [None]:

"""    Creating 3 matrices containing only zeroes: encoder_input_data, decoder_input_data, decoder_target_data.

    The shape of each matrix is equal to (len(input_texts), max_seq_length, num_tokens).

    max_seq_length and num_tokens have different values for encoder and decoder."""


encoder_input_data = np.zeros((len(input_texts), max_encoder_seq_length, num_encoder_tokens), dtype='float32')
decoder_input_data = np.zeros((len(input_texts), max_decoder_seq_length, num_decoder_tokens), dtype='float32')
decoder_target_data = np.zeros((len(input_texts), max_decoder_seq_length, num_decoder_tokens), dtype='float32')

In [None]:
print(encoder_input_data.shape)
print(decoder_input_data.shape)
print(decoder_target_data.shape)

(10000, 16, 69)
(10000, 43, 85)
(10000, 43, 85)


- **Replacing** the **zeroes** in the above matrices with **1** based on whether a character is **present** at that location or not.


- This is similar to **one-hot-encoding** the features in **3** dimensions.

In [None]:
for i, (input_text, target_text) in enumerate(zip(input_texts, target_texts)):
    for t, char in enumerate(input_text):
        encoder_input_data[i, t, input_token_index[char]] = 1.
    
    for t, char in enumerate(target_text):
        # decoder_target_data is ahead of decoder_input_data by one timestep
        decoder_input_data[i, t, target_token_index[char]] = 1.
        
        if t > 0:
            # decoder_target_data will be ahead by one timestep
            # and will not include the start character.
            decoder_target_data[i, t - 1, target_token_index[char]] = 1.

In [None]:
encoder_input_data

array([[[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]],

       [[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]],

       [[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]],

       ...,

       [[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0.

In [None]:
decoder_input_data

array([[[1., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]],

       [[1., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]],

       [[1., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]],

       ...,

       [[1., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0.

In [None]:
decoder_target_data

array([[[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]],

       [[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]],

       [[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]],

       ...,

       [[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0.

**Observation:**

 - **Input Sequences**: Padded to a **maximum length** of **16** characters with a **vocabulary** of **69** different characters **(10000, 16, 69)**.
 
 - **Output Sequences**: Padded to a **maximum length** of **43** characters with a **vocabulary** of **85** different characters **(10000, 43, 85)**.
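
The one-step offset between decoder input and decoder target is easier to see on a tiny example. The sketch below assumes a hypothetical six-character vocabulary and a single target string; it fills the two arrays with the same rule as the loop above and then converts them back to text:

```python
import numpy as np

vocab = ['\t', '\n', 'a', 'h', 'l', 'o']       # hypothetical toy vocabulary
index = {char: i for i, char in enumerate(vocab)}
target_text = '\thola\n'                        # start marker + text + end marker

dec_in = np.zeros((len(target_text), len(vocab)), dtype='float32')
dec_out = np.zeros_like(dec_in)
for t, char in enumerate(target_text):
    dec_in[t, index[char]] = 1.0
    if t > 0:
        # The target at step t - 1 is the character the decoder sees at step t.
        dec_out[t - 1, index[char]] = 1.0


def to_text(one_hot):
    """Turn one-hot rows back into characters, skipping all-zero (padding) rows."""
    return ''.join(vocab[row.argmax()] for row in one_hot if row.any())


print(repr(to_text(dec_in)))   # what the decoder is fed
print(repr(to_text(dec_out)))  # what the decoder should predict
```

The target is the input shifted left by one position without the start marker, so at every step the decoder is trained to predict the next character given the previous ones.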

---
<a name = Section6></a>
# **6. Machine Translation Model**
---

- We'll be using the following **process sequence** in this notebook:

<br> 
<center><img src="https://raw.githubusercontent.com/insaid2018/DeepLearning/master/images/img0.png" width="690" height="350"/></center>

<a name = Section61></a>
### **6.1 Define Encoder Model**

- The **input** to the **encoder** is a sequence of **characters**, each encoded as **one-hot** vectors with length of **`num_encoder_tokens`**.

In [None]:
encoder_inputs = Input(shape=(None, num_encoder_tokens)) # Define an input sequence and process it.

In [None]:
encoder = LSTM(latent_dim, return_state=True)

# With return_state=True (and return_sequences left at its default of False), the layer
# returns its last output together with the final hidden state and the final cell state.
encoder_outputs, state_h, state_c = encoder(encoder_inputs)

In [None]:
state_h.shape

TensorShape([None, 256])

In [None]:
state_c.shape

TensorShape([None, 256])

- We **discard `encoder_outputs`** and only keep the **states**.

In [None]:
encoder_states = [state_h, state_c]

In [None]:
encoder_states

[<KerasTensor: shape=(None, 256) dtype=float32 (created by layer 'lstm_2')>,
 <KerasTensor: shape=(None, 256) dtype=float32 (created by layer 'lstm_2')>]
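
To give a feel for what the two state tensors contain, here is a minimal NumPy sketch of a single LSTM layer run over a one-hot sequence. It is an illustration under stated assumptions, not Keras' implementation: the weights are random, the function name `lstm_encode` is hypothetical, and the sizes are tiny stand-ins for `num_encoder_tokens` and `latent_dim`.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_encode(inputs, W, U, b):
    """Run an LSTM over `inputs` (timesteps, features); return final (h, c).

    W: (features, 4 * units), U: (units, 4 * units), b: (4 * units,).
    """
    units = U.shape[0]
    h = np.zeros(units)  # hidden state
    c = np.zeros(units)  # cell state
    for x_t in inputs:
        z = x_t @ W + h @ U + b
        i, f, g, o = np.split(z, 4)          # input, forget, candidate, output
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        g = np.tanh(g)
        c = f * c + i * g                    # update the cell memory
        h = o * np.tanh(c)                   # expose part of it as hidden state
    return h, c


rng = np.random.default_rng(0)
timesteps, features, units = 5, 7, 4         # toy sizes
x = np.eye(features)[rng.integers(0, features, size=timesteps)]  # one-hot rows
W = rng.normal(scale=0.1, size=(features, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b = np.zeros(4 * units)

state_h, state_c = lstm_encode(x, W, U, b)
print(state_h.shape, state_c.shape)
```

Whatever the length of the input, the result is two fixed-size vectors of length `units`, which is why the encoder states in the notebook have shape `(None, 256)`.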

<a name = Section62></a>
### **6.2 Define Decoder Model**

- The decoder **input** is defined as a **sequence** of **Spanish** characters one-hot **encoded** to binary vectors with a length of **`num_decoder_tokens`**.

- During training, the decoder's own final **hidden** and **cell states** are ignored and only its **output** sequence of **hidden states** is used.

- Importantly, the final **hidden** (state_h) & **cell** (state_c) state from the **encoder** is used to **initialize** the state of the **decoder**. 
 
 - This means every time that the **encoder** model encodes an **input** sequence, the final **internal** states of the encoder model are **used** as the **starting point** for **outputting** the first character in the **output** sequence. 
  
 
 - This also means that the **encoder** and **decoder** LSTM layers must have the same **number** of units (`latent_dim`), in this case, **256**.


- A **Dense** output layer is used to **predict** each **character**.

 - This **Dense** layer is used to produce each **character** in the **output sequence** in a **one-shot** manner, rather than recursively, at **least** during training. 

In [None]:
decoder_inputs = Input(shape=(None, num_decoder_tokens))

In [None]:


# We set up our decoder to return full output sequences, and to return internal states as well.
# We don't use the returned states in the training model, but we will use them in inference.
# Set up the decoder, using encoder_states as initial state.

decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

<a name = Section63></a>
### **6.3 Model Training**

- Define the **model** that will turn `encoder_input_data` & `decoder_input_data` to `decoder_target_data`.

In [None]:
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

In [None]:
# Compiling the model.
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
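
The loss named in `compile` compares, at every timestep, the one-hot target with the softmax output of the Dense layer. The following is a simplified NumPy sketch of that per-step computation on made-up numbers (it omits details of the Keras implementation such as renormalizing the predictions):

```python
import numpy as np


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Per-timestep cross-entropy: -sum(y_true * log(y_pred)) over the vocabulary."""
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)


# Three timesteps over a 3-character vocabulary; the last row is padding (all zeros).
y_true = np.array([[0, 1, 0],
                   [0, 0, 1],
                   [0, 0, 0]], dtype='float32')
y_pred = np.array([[0.1, 0.8, 0.1],
                   [0.2, 0.2, 0.6],
                   [0.3, 0.3, 0.4]])

per_step = categorical_crossentropy(y_true, y_pred)
print(per_step)
```

A padded timestep has an all-zero target row and therefore contributes zero to the sum, while a confident correct prediction (0.8) costs less than a hesitant one (0.6).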

In [None]:
# Run training.
model.fit([encoder_input_data, decoder_input_data], decoder_target_data, 
          batch_size=batch_size, epochs=epochs, validation_split=0.2, verbose = 2)

Epoch 1/100
250/250 - 10s - loss: 1.2800 - val_loss: 1.4369 - 10s/epoch - 41ms/step
Epoch 2/100
250/250 - 2s - loss: 1.2184 - val_loss: 1.4150 - 2s/epoch - 9ms/step
Epoch 3/100
250/250 - 2s - loss: 1.1928 - val_loss: 1.3778 - 2s/epoch - 7ms/step
Epoch 4/100
250/250 - 2s - loss: 1.1620 - val_loss: 1.3286 - 2s/epoch - 7ms/step
Epoch 5/100
250/250 - 2s - loss: 1.1267 - val_loss: 1.3518 - 2s/epoch - 7ms/step
Epoch 6/100
250/250 - 2s - loss: 1.0920 - val_loss: 1.2984 - 2s/epoch - 10ms/step
Epoch 7/100
250/250 - 2s - loss: 1.0562 - val_loss: 1.2373 - 2s/epoch - 7ms/step
Epoch 8/100
250/250 - 2s - loss: 1.0267 - val_loss: 1.1687 - 2s/epoch - 7ms/step
Epoch 9/100
250/250 - 2s - loss: 1.0028 - val_loss: 1.1587 - 2s/epoch - 7ms/step
Epoch 10/100
250/250 - 2s - loss: 0.9793 - val_loss: 1.1090 - 2s/epoch - 7ms/step
Epoch 11/100
250/250 - 2s - loss: 0.9596 - val_loss: 1.1000 - 2s/epoch - 7ms/step
Epoch 12/100
250/250 - 2s - loss: 0.9433 - val_loss: 1.0942 - 2s/epoch - 7ms/step
Epoch 13/100
250/250 

<keras.callbacks.History at 0x7f16c5aa1a90>
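
The `250/250` shown for each epoch is consistent with the settings chosen earlier: 20% of the 10,000 samples are held out for validation and the rest are split into batches of 32. A short arithmetic check, using the values from the hyperparameter cell:

```python
import math

num_samples = 10000
validation_split = 0.2
batch_size = 32

train_samples = int(num_samples * (1 - validation_split))
val_samples = num_samples - train_samples
steps_per_epoch = math.ceil(train_samples / batch_size)

print(train_samples, val_samples, steps_per_epoch)
```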

- Creating a `weights` directory.

In [None]:
!mkdir -p weights


- **Saving** the **model** in the `weights` directory.

In [None]:
model.save('weights/s2s.h5')

<a name = Section64></a>
### **6.4 Inference Models**



<center><img src="https://raw.githubusercontent.com/insaid2018/DeepLearning/master/images/img7.png" width="690" height="350"/></center>

- Once the defined model is **fit**, it can be used to make **predictions**. 

- The **model** defined for training has learned **weights** for this operation, but the **structure** of the model is **not** designed to be called **recursively** to generate one **character** at a time.


- Instead, new **models** are required for the **prediction** step:

  - Specifically a model for **encoding English** input sequences of **characters**.

  - A model that takes the sequence of **Spanish characters** generated so far and the **encoding** as **input** and **predicts** the next **character** in the sequence.

- Defining the **inference** models requires **reference** to elements of the model used for **training** in the example. 

- Alternatively, one could **define** a new model with the **same shapes** and **load** the **weights** from file.

- The **encoder** model is defined as taking the **input layer** from the encoder in the trained model (**`encoder_inputs`**) and outputting the **hidden** and **cell state** tensors (**`encoder_states`**).

#### Process of applying Inference Model

1. **Encode** input and **retrieve** initial decoder state.
  
2. **Run** one step of decoder with this **initial state** and a start of **sequence** token as **target**.

  - **Output** will be the **next target token**.
  
3. **Repeat** with the current **target** token and current states.
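
The three steps above amount to a greedy decoding loop. The sketch below shows only the control flow, with the Keras models replaced by a hypothetical stand-in: `toy_step` plays the role of one decoder call and simply spells out a fixed word, and `greedy_decode` is an illustrative name rather than a function from the notebook.

```python
def greedy_decode(step_fn, initial_state, start_token, end_token, max_len):
    """Repeatedly call step_fn, always keeping the most probable next token."""
    token, state, output = start_token, initial_state, []
    while len(output) < max_len:
        probs, state = step_fn(token, state)   # one decoder step
        token = max(probs, key=probs.get)      # greedy choice (argmax)
        if token == end_token:
            break
        output.append(token)
    return ''.join(output)


def toy_step(token, state):
    """Hypothetical decoder step: the state is just a position in a fixed string."""
    target = "hola\n"
    probs = {char: 0.0 for char in set(target)}
    probs[target[state]] = 1.0
    return probs, state + 1


print(greedy_decode(toy_step, 0, '\t', '\n', max_len=10))
```

Two exits are needed: the end-of-sequence token for normal termination, and a maximum length so that a model that never predicts the end token cannot loop forever.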

In [None]:
# Define sampling models
encoder_model = Model(encoder_inputs, encoder_states)

In [None]:
# The decoder needs a hidden and a cell state as its initial state. In the inference model
# these are supplied through two new Input layers instead of being wired to the encoder.

decoder_state_input_h = Input(shape=(latent_dim,))

decoder_state_input_c = Input(shape=(latent_dim,))

In [None]:
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs, state_h, state_c = decoder_lstm(decoder_inputs, initial_state=decoder_states_inputs)

In [None]:
decoder_states = [state_h, state_c]
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = Model([decoder_inputs] + decoder_states_inputs, [decoder_outputs] + decoder_states)

- **Reverse-lookup** token index to **decode sequences** back to something readable.

In [None]:
reverse_input_char_index = dict((i, char) for char, i in input_token_index.items())
reverse_target_char_index = dict((i, char) for char, i in target_token_index.items())

**Observation:**

- The **encoder** is called **once** per input sequence, while the **decoder** is called **repeatedly**, once for each character that is generated in the **translated** sequence.

- On the first call, the **hidden** and **cell** states from the encoder will be used to **initialize** the **decoder** LSTM layer, provided as **input** to the model directly.


In [None]:
def decode_sequence(input_seq):
    # Encode the input as state vectors.
    states_value = encoder_model.predict(input_seq)

    # Generate empty target sequence of length 1.
    target_seq = np.zeros((1, 1, num_decoder_tokens))
    # Populate the first character of target sequence with the start character.
    target_seq[0, 0, target_token_index['\t']] = 1.

    # Sampling loop for a batch of sequences
    # (to simplify, here we assume a batch of size 1).
    stop_condition = False
    decoded_sentence = ''
    while not stop_condition:
        output_tokens, h, c = decoder_model.predict([target_seq] + states_value)

        # Sample a token
        sampled_token_index = np.argmax(output_tokens[0, -1, :])
        sampled_char = reverse_target_char_index[sampled_token_index]
        decoded_sentence += sampled_char

        # Exit condition: either hit max length
        # or find stop character.
        if (sampled_char == '\n' or len(decoded_sentence) > max_decoder_seq_length):
            stop_condition = True

        # Update the target sequence (of length 1).
        target_seq = np.zeros((1, 1, num_decoder_tokens))
        target_seq[0, 0, sampled_token_index] = 1.

        # Update states
        states_value = [h, c]

    return decoded_sentence

**Observations:**
 - On subsequent calls to the decoder, the **last hidden** and **cell state** must be provided to the model. 
 
  - These **state** values already exist inside the **decoder** LSTM layer. 

  - However, the inference model was **defined** to take its initial **state** as an explicit input, so that it can receive the **final** states from the **encoder** on the first call. As a result, the state must be **passed in** again on every call.

 - Therefore, the **decoder** must **output** the hidden and cell **states** along with the **predicted character** on each call. These states are assigned to a **variable** and fed back in on each **subsequent** call while a given **input** sequence of **English** text is being translated.

<a name = Section65></a>
### **6.5 Making Predictions**

- Making **predictions** on the training set.

<center><img src="https://raw.githubusercontent.com/insaid2018/DeepLearning/master/images/img8.png" height="350" width="600"/></center>

In [None]:
for seq_index in range(100):
    # Take one sequence (part of the training set) for trying out decoding.
    input_seq = encoder_input_data[seq_index: seq_index + 1]
    decoded_sentence = decode_sequence(input_seq)
    print('-')
    print('Input sentence:', input_texts[seq_index])
    print('Decoded sentence:', decoded_sentence)

-
Input sentence: Go.
Decoded sentence: Se a una.

-
Input sentence: Go.
Decoded sentence: Se a una.

-
Input sentence: Go.
Decoded sentence: Se a una.

-
Input sentence: Go.
Decoded sentence: Se a una.

-
Input sentence: Hi.
Decoded sentence: Ella.

-
Input sentence: Run!
Decoded sentence: ¡Correa.

-
Input sentence: Run!
Decoded sentence: ¡Correa.

-
Input sentence: Run!
Decoded sentence: ¡Correa.

-
Input sentence: Run!
Decoded sentence: ¡Correa.

-
Input sentence: Run.
Decoded sentence: ¡Corra.

-
Input sentence: Who?
Decoded sentence: ¿Quine esto.

-
Input sentence: Wow!
Decoded sentence: ¡Vara a lo canar.

-
Input sentence: Fire!
Decoded sentence: ¡Despara eso esos.

-
Input sentence: Fire!
Decoded sentence: ¡Despara eso esos.

-
Input sentence: Fire!
Decoded sentence: ¡Despara eso esos.

-
Input sentence: Help!
Decoded sentence: ¡Socondo.

-
Input sentence: Help!
Decoded sentence: ¡Socondo.

-
Input sentence: Help!
Decoded sentence: ¡Socondo.

-
Input sentence: Jump!
Decoded sen

KeyboardInterrupt: ignored

- **Validate** the **model** using our own example:

  - Let's have the model translate something simple, like:

    **"Cheer up!"**

- Creating a variable **`input_sentence`** consisting of the **input sentence** to be translated.


- Creating **`test_sentence_tokenized`**, a 3-**dimensional** matrix of **zeroes** in which an entry is set to **1** where the corresponding character is **present** at that position.

In [None]:
input_sentence = "Cheer up!"
test_sentence_tokenized = np.zeros((1, max_encoder_seq_length, num_encoder_tokens), dtype='float32')
for t, char in enumerate(input_sentence):
    test_sentence_tokenized[0, t, input_token_index[char]] = 1.
print(input_sentence)
print(decode_sequence(test_sentence_tokenized))

Cheer up!
¡entiente a cosa a casa.
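
The cell above fails with a `KeyError` or `IndexError` if the sentence contains a character outside the training vocabulary or is longer than the longest training input. As a hedged sketch, the hypothetical helper `encode_sentence` below performs the same one-hot encoding but checks both conditions first; the toy vocabulary and length stand in for `input_token_index` and `max_encoder_seq_length`:

```python
import numpy as np


def encode_sentence(sentence, token_index, max_len):
    """One-hot encode `sentence` into an array of shape (1, max_len, vocab size)."""
    if len(sentence) > max_len:
        raise ValueError(f"Sentence has {len(sentence)} characters; the limit is {max_len}.")
    unknown = sorted(set(sentence) - set(token_index))
    if unknown:
        raise ValueError(f"Characters not in the vocabulary: {unknown}")

    encoded = np.zeros((1, max_len, len(token_index)), dtype='float32')
    for t, char in enumerate(sentence):
        encoded[0, t, token_index[char]] = 1.0
    return encoded


# Hypothetical toy vocabulary built from the example sentence itself.
toy_index = {char: i for i, char in enumerate(sorted(set("Cheer up!")))}
encoded = encode_sentence("Cheer up!", toy_index, max_len=16)
print(encoded.shape, int(encoded.sum()))
```

In the notebook, the call would presumably look like `encode_sentence(input_sentence, input_token_index, max_encoder_seq_length)`, with the result passed to `decode_sequence`.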



----

<a name = Section7></a>
# **7. Conclusion**
----

- Sequence-to-sequence models are used in a variety of tasks, such as:

  - **Speech Recognition**

  - **Machine Language Translation**

  - **Named entity/Subject extraction**

  - **Relation Classification**

  - **Path Query Answering**

  - **Speech Generation**

  - **Chatbot**

  - **Text Summarization**

  - **Product Sales Forecasting**

- The encoder **compresses** the whole source **sentence** into a **single** fixed-size representation. This can be very hard, because the number of **possible meanings** of the source is effectively **infinite**. When the **encoder** is forced to put all **information** into a single vector, it is likely to **forget** something.

- The encoder's final **hidden states**, along with the **start-of-sequence** character, were used as **input** for the **decoder**.

- Each **predicted** character was then **fed** back into the **decoder** while the hidden states were updated. 

- We **repeated** this until the **decoder** predicted the **end-of-sequence** character **telling** us the **predicted** sequence is complete.

- Techniques like **Beam Search** and the **Attention Mechanism** can be used to improve the performance of Seq2Seq models.
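
As a rough illustration of the attention idea mentioned above, the NumPy sketch below computes dot-product attention for one decoder step. It is not part of the model trained in this notebook: the function name, the random vectors, and the toy sizes are all assumptions, and using it with the model above would require the encoder to return its full output sequence (`return_sequences=True`).

```python
import numpy as np


def dot_product_attention(decoder_state, encoder_outputs):
    """Weight every encoder timestep by its similarity to the decoder state."""
    scores = encoder_outputs @ decoder_state   # one score per timestep
    scores = scores - scores.max()             # for numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum()          # softmax over timesteps
    context = weights @ encoder_outputs        # weighted sum of encoder outputs
    return context, weights


rng = np.random.default_rng(0)
encoder_outputs = rng.normal(size=(6, 4))      # 6 timesteps, 4 units (toy sizes)
decoder_state = rng.normal(size=4)

context, weights = dot_product_attention(decoder_state, encoder_outputs)
print(weights.round(3), context.shape)
```

Instead of relying on one final vector, the decoder gets a fresh context vector at every step, built from all encoder timesteps, which eases the bottleneck described above.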