# Machine Translation using Sequence to Sequence LSTM networks

Machine translation refers to the use of machines or software to translate text to speech or speech from one language to another language.

In this assignment we will work out the demonstration of how we can apply the LSTM networks to translate speech from one language to another.

We will be translating sequences from English to Chinese.

## Datasets
We will be using this dataset. https://www.manythings.org/anki/cmn-eng.zip.

Few samples of the dataset looks as follows:

**$ENGLISH \hspace{10mm} CHINESE$**

$Go. \hspace{30mm} 走 $

$Run! \hspace{30mm} 跑!$

$Fire! \hspace{30mm} A火！$

$Help! \hspace{30mm} 救命!$

$Jump. \hspace{30mm} 跳.$

$Stop! \hspace{30mm} 停止!$

We can see that on the left column, we have a list of english sequences like, GO, RUN, FIRE, HELP, etc and on the right we have their respective CHINESE tranlsations.

Here, the input to the model will be the list of English sentences and the target will be the list of Chinese translations.

Here in the dataset,

We will follow the following steps duing the machine translation:

1. Preprocess the training sequenes
2. Develop sequence to sequence LSTM model
3. Train LSTM model
4. Evalute model by testing the results

### Importing necessary libraries

In [1]:
import re
import string
import random
import numpy as np
import pandas as pd
import torch
from torch import nn
from sklearn.utils import shuffle
from sklearn.model_selection import train_test_split

In [2]:
# from google.colab import drive
# drive.mount('/content/drive')

Saving cmn.txt to cmn (1).txt


Let's read the dataset from the directory.

### Reading the datasets

In [3]:
# lines = pd.read_table('/content/drive/MyDrive/Colab Notebooks/cmn.txt', names=['eng', 'chin', 'info'])
lines = pd.read_table('./cmn.txt', names=['eng', 'chin', 'info'])
lines

Unnamed: 0,eng,chin,info
0,Hi.,嗨。,CC-BY 2.0 (France) Attribution: tatoeba.org #5...
1,Hi.,你好。,CC-BY 2.0 (France) Attribution: tatoeba.org #5...
2,Run.,你用跑的。,CC-BY 2.0 (France) Attribution: tatoeba.org #4...
3,Wait!,等等！,CC-BY 2.0 (France) Attribution: tatoeba.org #1...
4,Wait!,等一下！,CC-BY 2.0 (France) Attribution: tatoeba.org #1...
...,...,...,...
15852,When did she promise to meet him?,她答应几时见他？,CC-BY 2.0 (France) Attribution: tatoeba.org #3...
15853,When did you change your address?,你什麼時候更改了你的地址?,CC-BY 2.0 (France) Attribution: tatoeba.org #6...
15854,When did your baby start talking?,你的寶寶，什麼時候開始說話的？,CC-BY 2.0 (France) Attribution: tatoeba.org #3...
15855,When was this university founded?,这所大学是什么时候建的？,CC-BY 2.0 (France) Attribution: tatoeba.org #5...


We are concerned with only english to chinese translation, so we choose these two columns.
### Exercise 1
#### Task 1
<b><div style="text-align: right">[POINTS: 2]</div></b>
* Select the English and Chinese translation columns
* Select the first 10,000 samples and store it on `lines`

In [4]:
### Ex-1-Task-1
# lines = None

# Select `eng` and `chin` columns from the table
# take only first 10,000 samples to train

### BEGIN SOLUTION
# your code here
lines = lines[["eng","chin"]][:10000]
# raise NotImplementedError
### END SOLUTION

In [5]:
assert lines is not None

In [6]:
# displaying shape of the training set
lines.shape

(10000, 2)

### Data-Preprocessing

We will follow the following preprocessing steps in order to clean the dataset and fit into model training.

#### Lowercasing

In [7]:
# lowercase inputs on the columns
lines.eng = lines.eng.apply(lambda x: x.lower())
lines.chin = lines.chin.apply(lambda x: x.lower())

### Exercise 2

Above, we just performed lowercasing of the input samples. Lowercasing is the first operation we performed. Now, we will perform other operations like: `Removing Quotes`, `Removing Special Characters`, `Removing Uneven Spaces`, `Adding <START> and <END> tokens`, etc. Perform similar implementations like that mentioned above.

#### Removing Quotes
<b><div style="text-align: right">[POINTS: 3]</div></b>
#### Task 1
<b><div style="text-align: right">[POINTS: 1]</div></b>

In [8]:
### Ex-2-Task-1

# remove all the quotes "'" from the columns

# lines.eng = None
# lines.chin = None

# Exercise 2 | Task 1
### BEGIN SOLUTION
# your code here
lines["eng"] = lines["eng"].apply(lambda x: re.sub("'", '', x))
lines["chin"] = lines["chin"].apply(lambda x: re.sub("'", '', x))
# raise NotImplementedError
### END SOLUTION

In [9]:
assert lines.eng is not None
assert lines.chin is not None


#### Removing Special Characters

In [10]:
# Set of all special characters
sets_of_punctuations = set(string.punctuation)

# Removing sets of all special characters from the inputs
lines.eng = lines.eng.apply(lambda x: ''.join(char for char in x if char not in sets_of_punctuations))
lines.chin = lines.chin.apply(lambda x: ''.join(char for char in x if char not in sets_of_punctuations))

#### Removing Uneven Spaces
#### Task 2
<b><div style="text-align: right">[POINTS: 1]</div></b>

In [11]:
### Ex-2-Task-2
# lines.eng = None
# lines.chin = None

# There may be uneven spaces in the inputs
# We have to remove the extra spaces too

# Exercise 2 | Task 2
### BEGIN SOLUTION
# your code here
lines["eng"] = lines["eng"].apply(lambda x:re.sub("\s+"," ",x))
lines["chin"] = lines["chin"].apply(lambda x:re.sub("\s+"," ",x))

# raise NotImplementedError
### END SOLUTION

In [12]:
assert lines.eng is not None
assert lines.chin is not None


#### Adding `<START>` and `<END>` Tokens
E.g.
'Hi' = '`<START>` Hi `<END>`'
#### Task 3
<b><div style="text-align: right">[POINTS: 1]</div></b>
We will perform this operations over Chinese column samples only because we are converting English sequences to chinese only for now.

In [13]:
### Ex-2-Task-3
# lines.chin = None

# Adding <START> and <END> tokens with trailing spaces

# Exercise 2 | Task 3
### BEGIN SOLUTION
# your code here
lines.chin = lines.chin.apply(lambda x: "<START> "+x+" <END>")
# raise NotImplementedError
### END SOLUTION

In [14]:
assert lines.chin is not None


Now, our next task is to create a list of vocabularies of English and Chinese Inputs.

Following code will tokenize the words present in the English and Chinese dataset that we use to train the model

### Tokenizing

Tokenizing the English and the Chinese words in to set `all_english_vocabs` and `all_chinese_vocabs`.

In [15]:
# Collect English Vocabs
all_english_vocabs = set()
for english in lines.eng:
    words = english.split()
    for word in words:
        if word not in all_english_vocabs:
            all_english_vocabs.add(word)

In [16]:
# Collect Chinese Vocabs
all_chinese_vocabs = set()
for chinese in lines.chin:
    words = chinese.split()
    for word in words:
        if word not in all_chinese_vocabs:
            all_chinese_vocabs.add(word)

Let's implement the following codes to find the maximum sequence length of input.

In [17]:
# Max Length of input sequence
sequence_length = []
for line in lines.eng:
    sequence_length.append(len(line.split(' ')))
max_length_inp = np.max(sequence_length)
print(max_length_inp)

8


In [18]:
# Max Length of target sequence
sequence_length = []
for line in lines.chin:
    sequence_length.append(len(line.split(' ')))
max_length_targ = np.max(sequence_length)
max_length_targ

5

With this, we can see that the maximum input sequence is 8 and the maximum target sequence is 5.

### Exercise 3
<b><div style="text-align: right">[POINTS: 2]</div></b>
#### Task 1
<b><div style="text-align: right">[POINTS: 1]</div></b>
Sort and store the tokenized English and Chinese words on the variables `input_words` and `target_words`

In [19]:
### Ex-3-Task-1

# input_words = None
# target_words = None

# Sorting and Storing the tokens of English and Chinese words

# Exercise 3
### BEGIN SOLUTION
# your code here
input_words = sorted(all_english_vocabs)
target_words = sorted(all_chinese_vocabs)
# raise NotImplementedError
### END SOLUTION

In [20]:
assert input_words is not None
assert target_words is not None


#### Task 2
<b><div style="text-align: right">[POINTS: 1]</div></b>
Since, we are performing Machine translation, we have an encoder and decoder kind of architecture. We will have the encoder architecture as following:

<div align="center">
<figure>
<img src="https://doc.google.com/a/fusemachines.com/uc?id=1voHxN0hllGSLfyPJSy6tI_hzTNO6hHRl" >
<figcaption>Figure 1. Machine Translation
</figcaption>
</figure>
</div>

Here, the green denoted LSTM cells represent the encoder part and the red LSTM cells represent the decoder part of a Machine Translation network.

In [21]:
### Ex-3-Task-2
# counting the total tokens of English and Chinese words
num_encoder_tokens = None
num_decoder_tokens = None
### BEGIN SOLUTION
# your code here
num_encoder_tokens = len(input_words)
num_decoder_tokens = len(target_words)
# raise NotImplementedError
### END SOLUTION
print(num_encoder_tokens, num_decoder_tokens)

3380 9023


In [22]:
# For zero padding we add one extra token
num_decoder_tokens += 1
num_encoder_tokens += 1
num_encoder_tokens, num_decoder_tokens

(3381, 9024)

In [23]:
# compute and store the tokens with index in dictionary as word, index format
input_token_index = dict([(word, i + 1) for i, word in enumerate(input_words)])
target_token_index = dict([(word, i + 1) for i, word in enumerate(target_words)])

In [24]:
# compute and store the tokens with index in dictionary as index, word format
reverse_input_char_index = dict((i, word) for word, i in input_token_index.items())
reverse_target_char_index = dict((i, word) for word, i in target_token_index.items())

In [25]:
# shuffling the lines to make better predictions
lines = shuffle(lines)
lines.head(5)

Unnamed: 0,eng,chin
5252,get a grip on yourself,<START> 冷静下来！ <END>
9217,toms name was on the list,<START> 湯姆的名字在名單上。 <END>
2969,that river is long,<START> 那條河流很長。 <END>
6219,i think tom is talented,<START> 我認為湯姆有才能。 <END>
3692,we were born to die,<START> 我们是为了死亡而诞生的。 <END>


### Train-Test Split

In [26]:
# Train - Test Split
X, y = lines.eng, lines.chin
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.1)
X_train.shape, X_test.shape

((9000,), (1000,))

Following code is to generate batch of training and testing data. If you are interested in the code you can go line by line and explore the details.

In [27]:
def generate_batch(X = X_train, y = y_train, batch_size = 128):
    '''Function to generate a batch of data '''
    for j in range(0, len(X), batch_size):
        encoder_input_data = np.zeros((max_length_inp, batch_size),dtype='float32')
        decoder_input_data = np.zeros((max_length_targ, batch_size),dtype='float32')
        decoder_target_data = np.zeros((max_length_targ, batch_size ,num_decoder_tokens),dtype='float32')
        for i, (input_text, target_text) in enumerate(zip(X[j:j+batch_size], y[j:j+batch_size])):
            for t, word in enumerate(input_text.split()):
                encoder_input_data[t, i] = input_token_index[word] # encoder input seq
            for t, word in enumerate(target_text.split()):
                if t<len(target_text.split())-1:
                    decoder_input_data[t, i] = target_token_index[word] # decoder input seq
                if t>0:
                    # decoder target sequence (one hot encoded)
                    # does not include the START_ token
                    # Offset by one timestep
                    decoder_target_data[t-1, i , target_token_index[word]] = 1.
        yield([encoder_input_data, decoder_input_data], decoder_target_data)

In [28]:
# Input to the Encoder
encoder_input_data = np.zeros((len(lines.eng), 9),dtype='float32')

# output from the encoder or input to the decoder
decoder_input_data = np.zeros((len(lines.chin), 5),dtype='float32')

# output by the decoder
decoder_target_data = np.zeros((len(lines.chin), 5, num_decoder_tokens),dtype='float32')

In [29]:
for i, (input_text, target_text) in enumerate(zip(lines.eng, lines.chin)):
    for t, word in enumerate(input_text.split()):
        encoder_input_data[i, t] = input_token_index[word]
    for t, word in enumerate(target_text.split()):
        decoder_input_data[i, t] = target_token_index[word]
        if t > 0:
            # decoder target data is ahead of decoder input by one timestep
            decoder_target_data[i, t - 1, target_token_index[word]] = 1.

### Encoder - Decoder Model Architecture

In [30]:
latent_dim = 50

### Exercise 4
<b><div style="text-align: right">[POINTS: 4]</div></b>
#### Task 1
<b><div style="text-align: right">[POINTS: 1]</div></b>
Store the hidden state and context vector as a result of encoder outputs on variable `encoder_states`.

Store in the form of [__hiddenstate__, __contextstate__]

In [31]:
### Ex-4-Task-1
# encoder_states = None

# Encoder Architecture
class Encoder(nn.Module):
    def __init__(self, input_size, embedding_size, hidden_size):
        super(Encoder, self).__init__()
        self.input_size = input_size

        self.embedding_size = embedding_size

        self.hidden_size = hidden_size

        self.embedding = nn.Embedding(self.input_size, self.embedding_size)

        self.LSTM = nn.LSTM(self.embedding_size, self.hidden_size)

    def forward(self, x):
        embedding = self.embedding(x)
        outputs, (hidden_state, cell_state) = self.LSTM(embedding)

        ### BEGIN SOLUTION
        # your code here
        encoder_states = [hidden_state,cell_state]
        # raise NotImplementedError
        ### END SOLUTION
        return encoder_states

In [32]:
# Intentionally left blank

In [33]:
encoder = Encoder(num_encoder_tokens, latent_dim, latent_dim)
print(encoder)

Encoder(
  (embedding): Embedding(3381, 50)
  (LSTM): LSTM(50, 50)
)


In [34]:
class Decoder(nn.Module):
    def __init__(self, input_size, embedding_size, hidden_size, output_size):
        super(Decoder, self).__init__()

        # Size of the one hot vectors that will be the input to the decoder
        self.input_size = input_size

        # Output size of the word embedding NN
        self.embedding_size = embedding_size

        # Dimension of the NN's inside the lstm cell/ (hs,cs)'s dimension.
        self.hidden_size = hidden_size

        # Size of the one hot vectors that will be the output of the decoder
        self.output_size = output_size

        self.embedding = nn.Embedding(self.input_size, self.embedding_size)
        self.LSTM = nn.LSTM(self.embedding_size, hidden_size)
        self.fc = nn.Linear(self.hidden_size, self.output_size)

    def forward(self, x, enc_states):
        x = x.unsqueeze(0)
        embedding = self.embedding(x)

        # (passing encoder's hs, cs - context vectors)
        outputs, (hidden_state, cell_state) = self.LSTM(embedding, enc_states)

        predictions = self.fc(outputs)

        predictions = predictions.squeeze(0)

        decoder_states = (hidden_state, cell_state)

        return predictions, decoder_states

In [35]:
decoder = Decoder(num_decoder_tokens, latent_dim, latent_dim, num_decoder_tokens)
print(decoder)

Decoder(
  (embedding): Embedding(9024, 50)
  (LSTM): LSTM(50, 50)
  (fc): Linear(in_features=50, out_features=9024, bias=True)
)


In [36]:
class Seq2Seq(nn.Module):
    def __init__(self, Encoder_LSTM, Decoder_LSTM):
        super(Seq2Seq, self).__init__()
        self.Encoder_LSTM = Encoder_LSTM
        self.Decoder_LSTM = Decoder_LSTM

    def forward(self, source, target, tfr=0.5):
        batch_size = source.shape[1]

        target_len = target.shape[0]
        target_vocab_size = num_decoder_tokens

        outputs = torch.zeros(target_len, batch_size, target_vocab_size)

        hidden_state, cell_state = self.Encoder_LSTM(source)

        x = target[0]

        for i in range(1, target_len):
            output, ( hidden_state, cell_state ) = self.Decoder_LSTM(x, (hidden_state, cell_state))
            outputs[i] = output
            best_guess = output.argmax(1) # 1st dimension is word embedding, 0th dimension is batchsize
            x = target[i] if random.random() < tfr else best_guess # Either pass the next word correctly from the dataset or use the earlier predicted word

        return outputs

In [37]:
# Hyperparameters

learning_rate = 0.001
step = 0

model = Seq2Seq(encoder, decoder)
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

criterion = nn.CrossEntropyLoss()

#### Task 2
<b><div style="text-align: right">[POINTS: 3]</div></b>
Increase the number of epochs and train the model to obtain a good accuracy score.

In [38]:
# Some model hyperparameters
train_samples = len(X_train)
val_samples = len(X_test)
batch_size = 256
num_epochs = 10

In [39]:
# Training the model
epoch_loss = 0.0
best_loss = 999999
losses = []
best_epoch = -1
ts1  = []

for epoch in range(num_epochs):
    epoch_loss_list = []
    print("Epoch - {} / {}".format(epoch+1, num_epochs))


    model.train(True)
    for batch_idx, ( input_data, target_data ) in enumerate(generate_batch(batch_size=batch_size)):
        input_data_enc = torch.tensor(input_data[0]).long()
        input_data_dec = torch.tensor(input_data[1]).long()
        target = torch.tensor(target_data.argmax(2)).long()

        # Pass the input and target for model's forward method
        output = model(input_data_enc, target)
        output = output[1:].reshape(-1, output.shape[2])
        target = target[1:].reshape(-1)

        # Clear the accumulating gradients
        optimizer.zero_grad()

        # Calculate the loss value for every epoch
        loss = criterion(output, target)

        # Calculate the gradients for weights & biases using back-propagation
        loss.backward()

        # Update the weights values using the gradients we calculated using bp
        optimizer.step()
        step += 1
        epoch_loss += loss.item()

        epoch_loss_list.append(loss.item())

        if epoch_loss < best_loss:
            best_loss = epoch_loss
            best_epoch = epoch
        if ((epoch - best_epoch) >= 10):
            print("no improvement in 10 epochs, break")
            break
        print("Iterations / loss -  {} / {}".format(batch_idx,loss.item()))
        print()
    losses.append(np.mean(epoch_loss_list))

torch.save({
          'model_state_dict': model.state_dict(),
          'loss': losses
          },"lstm_seq2seq")

Epoch - 1 / 10
Iterations / loss -  0 / 9.072749137878418

Iterations / loss -  1 / 9.026429176330566

Iterations / loss -  2 / 8.985509872436523

Iterations / loss -  3 / 8.939216613769531

Iterations / loss -  4 / 8.88586711883545

Iterations / loss -  5 / 8.828453063964844

Iterations / loss -  6 / 8.797380447387695

Iterations / loss -  7 / 8.663753509521484

Iterations / loss -  8 / 8.666975975036621

Iterations / loss -  9 / 8.564804077148438

Iterations / loss -  10 / 8.442745208740234

Iterations / loss -  11 / 8.369649887084961

Iterations / loss -  12 / 8.267593383789062

Iterations / loss -  13 / 8.159355163574219

Iterations / loss -  14 / 8.066217422485352

Iterations / loss -  15 / 7.958845615386963

Iterations / loss -  16 / 7.827176570892334

Iterations / loss -  17 / 7.726943492889404

Iterations / loss -  18 / 7.59357213973999

Iterations / loss -  19 / 7.455556869506836

Iterations / loss -  20 / 7.335023880004883

Iterations / loss -  21 / 7.192366123199463

Iterati

In [40]:
### Ex-4-Task-2
loss = None

# Model Loss
# Store the model's loss from trained above

# Exercise 4 | Task 2
### BEGIN SOLUTION
# your code here
loss = losses[-1]
# raise NotImplementedError
### END SOLUTION

In [41]:
loss

0.037509294485466346

In [42]:
#INTENTIONALLY LEFT BLANK
assert loss is not None

Now, after we performed some preprocessing and model training steps, then we will start working on the model inferencing. We will see how well the model predicts the results. Also, we will discover what can be the possible solution to this problem.

If we look at the model training results, we can see that the model is not performing really well. This may be because the LSTM network we are using is not able to learn the appropriate feature inputs. The no. of tokens is also pretty large which is giving the model a hard time to learn the input feature itself. So, the possible solution to these problems could be the `Attention Mechanisms`. We have not used attention mechanism, however if we use attention mechanism the result will surely turn out better.

Moreover, we can validate the performance of the model by also inferencing on the model.

###  Decode Sample Sequences

Following is the code to decode the input sequence to the machine translation network.

In [43]:
model = Seq2Seq(Encoder(num_encoder_tokens, latent_dim, latent_dim), Decoder(num_decoder_tokens, latent_dim, latent_dim, num_decoder_tokens))


checkpoint = torch.load("lstm_seq2seq")
model.load_state_dict(checkpoint['model_state_dict'])

def decode_sequence(sentence, max_length=50):
    model.eval()
    # lower, removing punctuations,
    tokens =  (''.join(char for char in re.sub(" +", " ", re.sub("'", '', sentence).lower()) if char not in sets_of_punctuations)).split()

    text_to_indices = [ input_token_index[token] for token in tokens]
    sentence_tensor = torch.LongTensor(text_to_indices).unsqueeze(1)

    # Build encoder hidden, cell state
    with torch.no_grad():
        hidden, cell = model.Encoder_LSTM(sentence_tensor)

    outputs = [target_token_index["<START>"]]

    for _ in range(max_length):
        previous_word = torch.LongTensor([outputs[-1]])

        with torch.no_grad():
            output, ( hidden, cell ) = model.Decoder_LSTM(previous_word, (hidden, cell))
            best_guess = output.argmax(1).item()

        outputs.append(best_guess)

        # Model predicts it's the end of the sentence
        if best_guess == "<END>":
            break

    translated_sentence = [reverse_target_char_index.get(idx, '<PAD>') for idx in outputs]
    return translated_sentence[1:]

### Evaluation on Train Dataset

Generating the sample to check some of the results predicted by the machine

In [44]:
k=0
decoded_sentence = decode_sequence(X_train[k:k+1].values[0])
print('Input English sentence:', X_train[k:k+1].values[0])
print('Actual Chinese Translation:', y_train[k:k+1].values[0])
print('Predicted Chinese Translation:', decoded_sentence)

Input English sentence: thats so perfect
Actual Chinese Translation: <START> 那是完美的。 <END>
Predicted Chinese Translation: ['<END>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']


In [45]:
k+=1
decoded_sentence = decode_sequence(X_train[k:k+1].values[0])
print('Input English sentence:', X_train[k:k+1].values[0])
print('Actual Chinese Translation:', y_train[k:k+1].values[0])
print('Predicted Chinese Translation:', decoded_sentence)

Input English sentence: he worked very hard
Actual Chinese Translation: <START> 他工作很努力。 <END>
Predicted Chinese Translation: ['<END>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']


In [46]:
k+=1
decoded_sentence = decode_sequence(X_train[k:k+1].values[0])
print('Input English sentence:', X_train[k:k+1].values[0])
print('Actual Chinese Translation:', y_train[k:k+1].values[0])
print('Predicted Chinese Translation:', decoded_sentence)

Input English sentence: im not afraid of death
Actual Chinese Translation: <START> 我不怕死。 <END>
Predicted Chinese Translation: ['<END>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']


In [47]:
k+=1
decoded_sentence = decode_sequence(X_train[k:k+1].values[0])
print('Input English sentence:', X_train[k:k+1].values[0])
print('Actual Chinese Translation:', y_train[k:k+1].values[0])
print('Predicted Chinese Translation:', decoded_sentence)

Input English sentence: its really not important
Actual Chinese Translation: <START> 真的不重要。 <END>
Predicted Chinese Translation: ['<END>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']


In [48]:
k+=1
decoded_sentence = decode_sequence(X_train[k:k+1].values[0])
print('Input English sentence:', X_train[k:k+1].values[0])
print('Actual Chinese Translation:', y_train[k:k+1].values[0])
print('Predicted Chinese Translation:', decoded_sentence)

Input English sentence: whats not necessary
Actual Chinese Translation: <START> 什麼是不必要的 <END>
Predicted Chinese Translation: ['<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']


CONGRATULATIONS!!! on completing the Assignment.