<a href="https://colab.research.google.com/github/jeyavigneshr/Deep-Learning/blob/master/final.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Neural Machine Translation - Assignment 2

In this task, you will develop a neural machine translation (NMT) system that translates text from one language into another. You will need to choose the training data, preprocess it, and train a sequence-to-sequence neural model.


## Section 1 - Data Collection and Preprocessing


---


**Task 1  (5 marks)**

---

A few datasets for training an NMT system are available from the Tatoeba Project (http://www.manythings.org/anki/) and the OPUS project (http://opus.nlpl.eu/).

*  Download a language pair (preferably a European language), **extract** the file(s), and upload them to Colab
*  Create a list of lines by splitting the text file into source and target sentences, and remove the copyright information
*  Print the number of sentences
*  Limit the data to at most 10,000 sentences (but more than 5,000)
*  Split the data into train and test sets [you can split off a validation set here, or use the Keras `validation_split` option while training]
*  Print the 100th sentence in its original script [not Unicode escapes] for the source and target language
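
The steps above can be sketched as follows. This is a minimal illustration, assuming each line of the export has the form `source<TAB>target<TAB>attribution`; the sample lines and the helper name `load_pairs` are hypothetical and only stand in for the real file.

```python
import random

def load_pairs(lines, limit=10000):
    """Split tab-separated lines into (source, target) pairs, dropping the attribution column."""
    pairs = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        pairs.append((fields[0], fields[1]))
        if len(pairs) == limit:
            break
    return pairs

# Hypothetical sample in the assumed format; a real file has many more lines.
sample = [
    "Go.\tGeh.\tattribution text",
    "Hi.\tHallo!\tattribution text",
    "",
    "Run!\tLauf!\tattribution text",
    "Wow!\tPotzdonner!\tattribution text",
    "Fire!\tFeuer!\tattribution text",
    "Help!\tHilfe!\tattribution text",
]
pairs = load_pairs(sample, limit=5)
print(len(pairs))

# Shuffle once with a fixed seed, then cut 80/20 so the split is reproducible.
rng = random.Random(0)
shuffled = pairs[:]
rng.shuffle(shuffled)
cut = int(0.8 * len(shuffled))
train_pairs, test_pairs = shuffled[:cut], shuffled[cut:]
```

Splitting the sentence pairs before any tokenisation keeps each source sentence attached to its reference translation, which makes the later evaluation step simpler.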

In [None]:
!wget http://www.manythings.org/anki/deu-eng.zip
!unzip -o deu-eng.zip

In [None]:
import numpy as np
import os, sys , re , unicodedata
import tensorflow as tf

In [None]:
TOT_SENTENCES = 10000
NUM_WORDS = 10000

In [None]:
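# Read the extracted corpus. 'deu.txt' is the file name assumed for the deu-eng archive.
with open('deu.txt', encoding='utf-8') as f:
    lines = f.read().split('\n')

source_sentences = []
target_sentences = []
output_sentences_inputs = []
seen_sources = set()  # fast duplicate check on the raw source sentence

# print(len(lines))
count = 0
for line in lines:

    if count > TOT_SENTENCES-1:
        break

    if '\t' not in line:
        continue

    # Each line is "source<TAB>target<TAB>attribution"; the attribution is dropped.
    input_sentence, output , _ = line.rstrip().split('\t')
    if input_sentence in seen_sources:
        continue
    seen_sources.add(input_sentence)
    output_sentence = output + ' <eos>'          # decoder target
    output_sentence_input = '<bof> ' + output    # decoder input

    source_sentences.append(input_sentence)
    target_sentences.append(output_sentence)
    output_sentences_inputs.append(output_sentence_input)
    count += 1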


In [None]:
print(len(source_sentences))
print(source_sentences[100] , target_sentences[100])

10000
Pull it. Zieh dran. <eos>


**Task 2 (5 marks)** 

---

* Add '<bof\>' to denote beginning of sentence and '<eos\>' to denote the end of the sentence to each target line.
* Preprocess (word tokenisation, lowercasing) the text.
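
To make the role of the two markers concrete, here is a small sketch of how one target sentence yields a decoder input and a decoder target that are shifted by one position. The helper names `add_markers` and `tokenize` are hypothetical, and whitespace splitting is only a stand-in for a real tokenizer.

```python
def add_markers(target_sentence):
    """Return (decoder_input, decoder_target) strings for one target sentence."""
    decoder_input = "<bof> " + target_sentence    # what the decoder reads
    decoder_target = target_sentence + " <eos>"   # what the decoder should predict
    return decoder_input, decoder_target

def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a real tokenizer)."""
    return text.lower().split()

dec_in, dec_out = add_markers("Zieh dran")
print(tokenize(dec_in))
print(tokenize(dec_out))
```

At every position the target token is the token that follows the corresponding input token, so during training the decoder learns to predict the next word given the previous ones.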

In [None]:
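#Source reference https://www.tensorflow.org/tutorials/text/nmt_with_attention
def unicode_to_ascii(s):
    # Decompose accented characters and drop the combining marks (e.g. 'é' -> 'e').
    s = ''.join(c for c in unicodedata.normalize('NFD', s)
                if unicodedata.category(c) != 'Mn')
    # Replace every run of characters other than letters, ? ¿ < > and ' with a space.
    s = re.sub(r"[^a-zA-Z?¿<>']+", " ", s)
    # Drop apostrophes so that "isn't" becomes "isnt".
    s = re.sub(r"'", '', s)
    s = s.lower()
    return s

def preprocess(w):
    w = unicode_to_ascii(w)
    w = w.strip()
    # Remove any remaining sentence punctuation.
    w = re.sub(r"([?.!,])", r"", w)
    # Collapse repeated spaces.
    w = re.sub(r" +", " ", w)
    return w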

In [None]:
source_sentences = [preprocess(w) for w in source_sentences]
target_sentences = [preprocess(w) for w in target_sentences]
output_sentences_inputs =[preprocess(w) for w in output_sentences_inputs]

In [None]:
print(source_sentences[0], target_sentences[0])

go geh <eos>


In [None]:
print(preprocess("¿Qué? isn't él"))

¿que isnt el


In [None]:
input_tokenizer = tf.keras.preprocessing.text.Tokenizer(num_words=NUM_WORDS)
input_tokenizer.fit_on_texts(source_sentences)
input_integer_seq = input_tokenizer.texts_to_sequences(source_sentences)

Source_vocabulary = input_tokenizer.word_index
num_source_tokens = len(Source_vocabulary)

max_source_seq_length = max(len(sen) for sen in input_integer_seq)

In [None]:
output_tokenizer = tf.keras.preprocessing.text.Tokenizer(num_words=NUM_WORDS, filters='')
output_tokenizer.fit_on_texts(target_sentences + output_sentences_inputs)
output_integer_seq = output_tokenizer.texts_to_sequences(target_sentences)
output_input_integer_seq = output_tokenizer.texts_to_sequences(output_sentences_inputs)

target_vocabulary = output_tokenizer.word_index
num_target_tokens = len(target_vocabulary)

num_words_output = len(target_vocabulary) + 1
max_target_seq_length = max(len(sen) for sen in output_integer_seq)

In [None]:
encoder_input_sequences = tf.keras.preprocessing.sequence.pad_sequences(input_integer_seq, maxlen=max_source_seq_length)
decoder_input_sequences = tf.keras.preprocessing.sequence.pad_sequences(output_input_integer_seq, maxlen=max_target_seq_length, padding='post')
decoder_output_sequences = tf.keras.preprocessing.sequence.pad_sequences(output_integer_seq, maxlen=max_target_seq_length, padding='post')

In [None]:
print('Number of samples:', len(source_sentences))
print('Number of unique source language tokens:', num_source_tokens)
print('Number of unique target language tokens:', num_target_tokens)
print('Max sequence length of source language:', max_source_seq_length)
print('Max sequence length of target language:', max_target_seq_length)
print("Source Vocabulary",list(Source_vocabulary.keys()))
print("Target Vocabulary",list(target_vocabulary.keys()))

Number of samples: 10000
Number of unique source language tokens: 2837
Number of unique target language tokens: 3748
Max sequence length of source language: 5
Max sequence length of target language: 10
Source Vocabulary ['tom', 'i', 'is', 'it', 'a', 'im', 'you', 'was', 'me', 'we', 'he', 'its', 'this', 'that', 'to', 'the', 'my', 'do', 'are', 'go', 'ill', 'can', 'toms', 'were', 'she', 'dont', 'they', 'not', 'like', 'have', 'here', 'be', 'thats', 'come', 'up', 'in', 'love', 'him', 'get', 'well', 'isnt', 'did', 'want', 'youre', 'need', 'has', 'no', 'on', 'am', 'know', 'home', 'who', 'got', 'how', 'too', 'hes', 'now', 'will', 'cant', 'us', 'your', 'there', 'see', 'saw', 'theyre', 'help', 'one', 'stop', 'lets', 'let', 'look', 'out', 'take', 'just', 'may', 'what', 'very', 'busy', 'good', 'all', 'at', 'keep', 'try', 'hate', 'back', 'car', 'so', 'mary', 'looks', 'down', 'please', 'her', 'for', 'had', 'work', 'wait', 'ok', 'happy', 'didnt', 'eat', 'alone', 'wasnt', 'give', 'lost', 'ive', 'must',

In [None]:
from sklearn.model_selection import train_test_split

X_train , X_test , Y_train, Y_test = train_test_split(encoder_input_sequences, decoder_input_sequences , test_size=0.2)

**Task 3 (5 marks)** 

---

*  Assign each unique word an integer value (5 marks).
*  Create word embeddings for your vocabulary using pre-trained GloVe embeddings (10 marks) (http://nlp.stanford.edu/data/glove.6B.zip)
* Print the first line of the embeddings (see below) 
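
The two steps can be illustrated on a toy scale. This is only a sketch: the three-dimensional vectors in `toy_vectors` are made up and stand in for real GloVe rows, and index 0 is assumed to be reserved for padding.

```python
import numpy as np
from collections import Counter

sentences = ["go", "i like tea", "i like this tea"]

# Assign integers by descending frequency, starting at 1 (0 is kept for padding).
counts = Counter(w for s in sentences for w in s.split())
word_index = {w: i for i, (w, _) in enumerate(counts.most_common(), start=1)}

# Made-up vectors standing in for pre-trained embeddings.
toy_vectors = {
    "i": np.array([0.1, 0.2, 0.3], dtype="float32"),
    "tea": np.array([0.4, 0.5, 0.6], dtype="float32"),
    "go": np.array([0.7, 0.8, 0.9], dtype="float32"),
}

embedding_dim = 3
embedding_matrix = np.zeros((len(word_index) + 1, embedding_dim), dtype="float32")
for word, index in word_index.items():
    vector = toy_vectors.get(word)
    if vector is not None:
        embedding_matrix[index] = vector  # words without a vector stay all-zero

print(embedding_matrix.shape)
```

Row `k` of the matrix is the vector for the word with index `k`, which is the layout an embedding layer expects when it is initialised from pre-trained weights.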

In [None]:
s_word_dict = {}
for i,w in enumerate(Source_vocabulary):
    s_word_dict[w] = i

In [None]:
t_word_dict = {}
for i,w in enumerate(target_vocabulary):
    t_word_dict[w] = i

In [None]:
print(s_word_dict)
print(t_word_dict)

{'tom': 0, 'i': 1, 'is': 2, 'it': 3, 'a': 4, 'im': 5, 'you': 6, 'was': 7, 'me': 8, 'we': 9, 'he': 10, 'its': 11, 'this': 12, 'that': 13, 'to': 14, 'the': 15, 'my': 16, 'do': 17, 'are': 18, 'go': 19, 'ill': 20, 'can': 21, 'toms': 22, 'were': 23, 'she': 24, 'dont': 25, 'they': 26, 'not': 27, 'like': 28, 'have': 29, 'here': 30, 'be': 31, 'thats': 32, 'come': 33, 'up': 34, 'in': 35, 'love': 36, 'him': 37, 'get': 38, 'well': 39, 'isnt': 40, 'did': 41, 'want': 42, 'youre': 43, 'need': 44, 'has': 45, 'no': 46, 'on': 47, 'am': 48, 'know': 49, 'home': 50, 'who': 51, 'got': 52, 'how': 53, 'too': 54, 'hes': 55, 'now': 56, 'will': 57, 'cant': 58, 'us': 59, 'your': 60, 'there': 61, 'see': 62, 'saw': 63, 'theyre': 64, 'help': 65, 'one': 66, 'stop': 67, 'lets': 68, 'let': 69, 'look': 70, 'out': 71, 'take': 72, 'just': 73, 'may': 74, 'what': 75, 'very': 76, 'busy': 77, 'good': 78, 'all': 79, 'at': 80, 'keep': 81, 'try': 82, 'hate': 83, 'back': 84, 'car': 85, 'so': 86, 'mary': 87, 'looks': 88, 'down': 

In [None]:
!wget nlp.stanford.edu/data/glove.6B.zip

--2020-04-16 00:26:28--  http://nlp.stanford.edu/data/glove.6B.zip
Resolving nlp.stanford.edu (nlp.stanford.edu)... 171.64.67.140
Connecting to nlp.stanford.edu (nlp.stanford.edu)|171.64.67.140|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://nlp.stanford.edu/data/glove.6B.zip [following]
--2020-04-16 00:26:28--  https://nlp.stanford.edu/data/glove.6B.zip
Connecting to nlp.stanford.edu (nlp.stanford.edu)|171.64.67.140|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://downloads.cs.stanford.edu/nlp/data/glove.6B.zip [following]
--2020-04-16 00:26:29--  http://downloads.cs.stanford.edu/nlp/data/glove.6B.zip
Resolving downloads.cs.stanford.edu (downloads.cs.stanford.edu)... 171.64.64.22
Connecting to downloads.cs.stanford.edu (downloads.cs.stanford.edu)|171.64.64.22|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 862182613 (822M) [application/zip]
Saving to: ‘glove.6B.zip’


2020-0

In [None]:
!unzip glove.6B.zip

Archive:  glove.6B.zip
  inflating: glove.6B.50d.txt        
  inflating: glove.6B.100d.txt       
  inflating: glove.6B.200d.txt       
  inflating: glove.6B.300d.txt       


In [None]:
#Example from keras.io

embeddings_dict = {}
with open("glove.6B.50d.txt", 'r', encoding='utf-8') as f:
    for line in f:
        values = line.split()
        word = values[0]
        vector = np.asarray(values[1:], "float32")
        embeddings_dict[word] = vector

In [None]:
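#Example from keras.io
# One row per source token index; row 0 is the padding index and stays zero.
num_words = min(NUM_WORDS, len(Source_vocabulary) + 1)
embedding_matrix = np.zeros((num_words, 50))
for word, index in Source_vocabulary.items():
    if index >= num_words:
        continue  # word falls outside the NUM_WORDS cap
    embedding_vector = embeddings_dict.get(word)
    if embedding_vector is not None:
        embedding_matrix[index] = embedding_vector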

In [None]:
print(embeddings_dict[list(embeddings_dict.keys())[0]])

[ 4.1800e-01  2.4968e-01 -4.1242e-01  1.2170e-01  3.4527e-01 -4.4457e-02
 -4.9688e-01 -1.7862e-01 -6.6023e-04 -6.5660e-01  2.7843e-01 -1.4767e-01
 -5.5677e-01  1.4658e-01 -9.5095e-03  1.1658e-02  1.0204e-01 -1.2792e-01
 -8.4430e-01 -1.2181e-01 -1.6801e-02 -3.3279e-01 -1.5520e-01 -2.3131e-01
 -1.9181e-01 -1.8823e+00 -7.6746e-01  9.9051e-02 -4.2125e-01 -1.9526e-01
  4.0071e+00 -1.8594e-01 -5.2287e-01 -3.1681e-01  5.9213e-04  7.4449e-03
  1.7778e-01 -1.5897e-01  1.2041e-02 -5.4223e-02 -2.9871e-01 -1.5749e-01
 -3.4758e-01 -4.5637e-02 -4.4251e-01  1.8785e-01  2.7849e-03 -1.8411e-01
 -1.1514e-01 -7.8581e-01]


## Section 2 - Translation Model Training



---



**Task 4 (15 marks)**
* Provide code for the encoder using Keras LSTM (5 marks)
* Provide code for the decoder using Keras LSTM (5 marks)
* Train the sequence2sequence (encoder-decoder) model (5 marks) 
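
Before the Keras code, it may help to see the data flow in plain NumPy. The sketch below is not the Keras implementation: the weights are random, the sizes (`units`, `emb_dim`, `vocab`) are arbitrary, and the inputs are random stand-ins for embedded tokens. It only shows how the encoder's final `(h, c)` state becomes the decoder's initial state and how each decoder step yields a distribution over the target vocabulary.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step. W: (input_dim, 4*units), U: (units, 4*units), b: (4*units,)."""
    z = x @ W + h @ U + b
    i, f, g, o = np.split(z, 4, axis=-1)          # input, forget, candidate, output
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new

def init_lstm(rng, input_dim, units):
    """Random parameters for one LSTM layer (illustration only)."""
    return (rng.normal(0, 0.1, (input_dim, 4 * units)),
            rng.normal(0, 0.1, (units, 4 * units)),
            np.zeros(4 * units))

rng = np.random.default_rng(0)
units, emb_dim, vocab = 8, 4, 6
src = rng.normal(size=(5, emb_dim))   # 5 embedded source tokens
tgt = rng.normal(size=(3, emb_dim))   # 3 embedded decoder input tokens

enc_params = init_lstm(rng, emb_dim, units)
dec_params = init_lstm(rng, emb_dim, units)
W_out = rng.normal(0, 0.1, (units, vocab))

# Encoder: read the whole source sentence and keep only the final state.
h, c = np.zeros(units), np.zeros(units)
for x in src:
    h, c = lstm_step(x, h, c, *enc_params)

# Decoder: start from the encoder state and emit one distribution per step.
probs = []
for x in tgt:
    h, c = lstm_step(x, h, c, *dec_params)
    logits = h @ W_out
    e = np.exp(logits - logits.max())
    probs.append(e / e.sum())
probs = np.stack(probs)
print(probs.shape)
```

In the Keras version, `return_state=True` on the encoder exposes the same `(h, c)` pair, and `initial_state=` on the decoder LSTM plays the role of the handoff between the two loops.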


In [None]:
embedding_layer = tf.keras.layers.Embedding(num_words, 50, weights=[embedding_matrix], input_length=max_source_seq_length)

In [None]:
decoder_targets_one_hot = np.zeros((
        len(source_sentences),
        max_target_seq_length,
        num_words_output
    ),
    dtype='float32'
)

In [None]:
# decoder_targets_one_hot = None
# model = None
# del decoder_targets_one_hot
# del model
# # del encoder_states
# # del decoder_dense
# import gc
# gc.collect()

In [None]:
decoder_targets_one_hot.shape

(10000, 10, 3749)

In [None]:
for i, d in enumerate(decoder_output_sequences):
    for t, word in enumerate(d):
        decoder_targets_one_hot[i, t, word] = 1

In [None]:
encoder_placeholder = tf.keras.layers.Input(shape=(max_source_seq_length,),name='encoder_layer')
layer_inp = embedding_layer(encoder_placeholder)
encoder = tf.keras.layers.LSTM(256, return_state=True)

encoder_outputs, h, c = encoder(layer_inp)
encoder_states = [h, c]

In [None]:
# decoder code goes here

decoder_placeholder = tf.keras.layers.Input(shape=(max_target_seq_length,),name='decoder_layer')

decoder_embedding_layer = tf.keras.layers.Embedding(num_words_output, 256 )
decoder_inp = decoder_embedding_layer(decoder_placeholder)


decoder_lstm = tf.keras.layers.LSTM(256, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inp, initial_state=encoder_states)

In [None]:
decoder_dense = tf.keras.layers.Dense(num_words_output, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

In [None]:
model =  tf.keras.models.Model([encoder_placeholder,decoder_placeholder], decoder_outputs)
model.compile(optimizer='rmsprop',loss='categorical_crossentropy',metrics=['accuracy'])
model.summary()

Model: "model_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
encoder_layer (InputLayer)      [(None, 5)]          0                                            
__________________________________________________________________________________________________
decoder_layer (InputLayer)      [(None, 10)]         0                                            
__________________________________________________________________________________________________
embedding_10 (Embedding)        (None, 5, 50)        141900      encoder_layer[0][0]              
__________________________________________________________________________________________________
embedding_11 (Embedding)        (None, 10, 256)      959744      decoder_layer[0][0]              
____________________________________________________________________________________________

In [None]:
X_train = np.array(X_train)
Y_train = np.array(Y_train)

In [None]:
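# Note: this fits on every sentence, including those held out in X_test,
# so BLEU on X_test is measured on sentences the model has already seen.
r = model.fit(
    [encoder_input_sequences, decoder_input_sequences],
    decoder_targets_one_hot,
    batch_size=128,
    epochs=25,
    validation_split=0.1,
)
model.save('seq2seq_source_target.h5')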



## Section 3 - Testing

---

**Task 5 (20 marks)**

* Use the trained model to translate the text from the source into the target language (10 marks). 
* Use the test/evaluation set (see Section 1) and perform an automatic evaluation with the BLEU metric (10 marks). 
You can use the NLTK library to calculate BLEU.
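
As a reminder of what BLEU measures, the sketch below computes two of its ingredients by hand: clipped n-gram precision and the brevity penalty. It is a simplified illustration, not NLTK's implementation; the helper names are hypothetical, and there is no smoothing or combination across n-gram orders.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def clipped_precision(reference, hypothesis, n):
    """Fraction of hypothesis n-grams found in the reference, with counts clipped."""
    hyp_counts = Counter(ngrams(hypothesis, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return matched / total

def brevity_penalty(reference, hypothesis):
    """Penalise hypotheses that are shorter than the reference."""
    if len(hypothesis) == 0:
        return 0.0
    if len(hypothesis) >= len(reference):
        return 1.0
    return math.exp(1 - len(reference) / len(hypothesis))

reference = ['dieser', 'tee', 'gefallt', 'mir']
hypothesis = ['ich', 'mag', 'tee']
p1 = clipped_precision(reference, hypothesis, 1)
p2 = clipped_precision(reference, hypothesis, 2)
bp = brevity_penalty(reference, hypothesis)
print(p1, p2, bp)
```

Only one of the three hypothesis words appears in the reference and no bigram matches, which is why unsmoothed BLEU would be zero for this pair and why a smoothing function is needed for short sentences.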

In [None]:
encoder_model = tf.keras.models.Model(encoder_placeholder, encoder_states)

In [None]:
decoder_h = tf.keras.layers.Input(shape=(256,))
decoder_c = tf.keras.layers.Input(shape=(256,))
decoder_states_inputs = [decoder_h, decoder_c]

In [None]:
decoder_inputs_single = tf.keras.layers.Input(shape=(1,))
decoder_inputs_single_x = decoder_embedding_layer(decoder_inputs_single)

In [None]:
decoder_outputs, h, c = decoder_lstm(decoder_inputs_single_x, initial_state=decoder_states_inputs)

In [None]:
decoder_states = [h, c]
decoder_outputs = decoder_dense(decoder_outputs)

In [None]:
decoder_model = tf.keras.models.Model(
    [decoder_inputs_single] + decoder_states_inputs,
    [decoder_outputs] + decoder_states
)

In [None]:
idx2word_input = {v:k for k, v in Source_vocabulary.items()}
idx2word_target = {v:k for k, v in target_vocabulary.items()}

In [None]:
def translate(sequence):
    states_value = encoder_model.predict(sequence)
    target_seq = np.zeros((1, 1))
    target_seq[0, 0] = target_vocabulary['<bof>']
    eos = target_vocabulary['<eos>']
    output_sentence = []

    for _ in range(max_target_seq_length):
        output_tokens, h, c = decoder_model.predict([target_seq] + states_value)
        idx = np.argmax(output_tokens[0, 0, :])

        if eos == idx:
            break

        word = ''

        if idx > 0:
            word = idx2word_target[idx]
            output_sentence.append(word)

        target_seq[0, 0] = idx
        states_value = [h, c]

    return ' '.join(output_sentence)

In [None]:
# i = np.random.choice(len(source_sentences))
i  = 7926
input_seq = encoder_input_sequences[i:i+1]
translation = translate(input_seq)
print( source_sentences[i])
print( translation)

i like this tea
ich mag tee


In [None]:
reference = target_sentences[i].split()
hypothesis = translation.split()
reference = reference[:-1]
print(reference,hypothesis)

['dieser', 'tee', 'gefallt', 'mir'] ['ich', 'mag', 'tee']


In [None]:
import nltk
from nltk.translate.bleu_score import SmoothingFunction
smoothing = SmoothingFunction().method4

In [None]:
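def get_bleu(reference,hyp):
    # Smoothing can fail on very short hypotheses (for example a one-token
    # hypothesis in some NLTK versions); such pairs are logged and scored 0.
    try:
        return nltk.translate.bleu_score.sentence_bleu([reference], hyp,smoothing_function = smoothing,weights=(0.35,0.35,0.20,0.10))
    except Exception:
        print("Zero" , reference ,hyp)
        return 0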


In [None]:
get_bleu(reference,hypothesis)

0.15019024921825153

In [None]:
_inp = encoder_input_sequences.tolist()

In [None]:
#X_test = X_test[:200]

In [None]:
bleu_scores = list()

In [None]:
for _i,x in enumerate(X_test):
    idx = _inp.index(x.tolist())
    seq = encoder_input_sequences[idx:idx+1] 
    translation = translate(seq)
    reference = target_sentences[idx].split()
    hypothesis = translation.split()
    reference = reference[:-1]
    _score = get_bleu(reference,hypothesis)
    bleu_scores.append( _score )

Zero ['tom', 'summt'] ['tom']
Zero ['gesundheit'] ['gesundheit']
Zero ['geh'] ['geh']


In [None]:
_bleu_scores = np.array(bleu_scores)
print("Avg Score: "  ,_bleu_scores.mean() )

Avg Score:  0.5712977231168633


## Section 4 - Attention

---



In [None]:
tf.keras.backend.clear_session()
# del model
import gc
gc.collect()

46939

**Task 6 (40 marks)** Sequence2Sequence with attention

* Extend the existing Seq2Seq model with an attention mechanism [Discussed in Class]
* Create sequence2sequence model with attention (15 marks)
* Train the model with the same data from Section 1 (10 marks)
* Translate the evaluation set using the sequence2sequence attention model (10 marks)
* Evaluate the translations made with the sequence2sequence attention model and compare it with the model without attention using BLEU (5 marks)
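
The core of a dot-product attention layer can be written in a few lines of NumPy. This is a sketch with random stand-in tensors for a single sentence pair (no batch dimension); the sizes are arbitrary and the variable names are only illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
src_len, tgt_len, units = 5, 10, 8
encoder_states = rng.normal(size=(src_len, units))   # one vector per source token
decoder_states = rng.normal(size=(tgt_len, units))   # one vector per target step

# Score every decoder step against every encoder step.
scores = decoder_states @ encoder_states.T            # (tgt_len, src_len)

# Normalise over the source positions so each row is a distribution.
weights = softmax(scores, axis=-1)

# Context vector: attention-weighted average of the encoder states.
context = weights @ encoder_states                    # (tgt_len, units)

# Concatenate the context with the decoder state before the output layers.
combined = np.concatenate([context, decoder_states], axis=-1)
print(weights.shape, context.shape, combined.shape)
```

Each row of `weights` shows how much one target position attends to each source token, which is also what makes attention maps easy to plot.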

In [None]:
#example from tensorflow.org
embedding_size = 50
enc_input = tf.keras.layers.Input(shape=(max_source_seq_length,))
encoder = tf.keras.layers.Embedding(num_words, embedding_size, weights= [embedding_matrix], trainable=False)(enc_input)
encoder = tf.keras.layers.LSTM(256, return_sequences=True, unroll=True )(encoder)
encoder_last_layer = encoder[:,-1,:]
dec_inp = tf.keras.layers.Input(shape=(max_target_seq_length,))
decoder = tf.keras.layers.Embedding(num_words_output, embedding_size)(dec_inp)
decoder = tf.keras.layers.LSTM(256, return_sequences=True, unroll=True)(decoder, initial_state=[encoder_last_layer,encoder_last_layer])
attention = tf.keras.layers.dot([decoder, encoder], axes=[2, 2])
attention = tf.keras.layers.Activation('softmax', name='attention')(attention)

context = tf.keras.layers.dot([attention, encoder], axes=[2,1])

decoder_combined_context = tf.keras.layers.concatenate([context, decoder])

output = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(64, activation="tanh"))(decoder_combined_context)
output = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(num_words_output, activation="softmax"))(output)

In [None]:
model = tf.keras.models.Model(inputs =[enc_input, dec_inp], outputs =[output])
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['acc'])
model.summary()

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_3 (InputLayer)            [(None, 5)]          0                                            
__________________________________________________________________________________________________
embedding_2 (Embedding)         (None, 5, 50)        141900      input_3[0][0]                    
__________________________________________________________________________________________________
input_4 (InputLayer)            [(None, 10)]         0                                            
__________________________________________________________________________________________________
lstm_2 (LSTM)                   (None, 5, 256)       314368      embedding_2[0][0]                
______________________________________________________________________________________________

In [None]:
model.fit(
    [encoder_input_sequences, decoder_input_sequences],
    decoder_targets_one_hot,
    batch_size=256,
    epochs=25,
    validation_split=0.1
)



<tensorflow.python.keras.callbacks.History at 0x7f52830e0470>

In [None]:
def translate_attention(seq):
    # Start from an all-padding (index 0) decoder input with '<bof>' in the first slot.
    decoder_input = np.zeros(shape=(seq.shape[0], max_target_seq_length))
    decoder_input[:,0] = target_vocabulary['<bof>']
    for i in range(1, max_target_seq_length):
        output_ = model.predict([seq , decoder_input]).argmax(axis=2)
        # The output at step i-1 is the prediction for the token fed in at step i.
        decoder_input[:,i] = output_[:,i-1]
    decoder_output = decoder_input[:,1:]
    decoder_output = np.array(decoder_output,dtype=int)
    sequence = decoder_output[0]
    text = ''
    for i in sequence:
        if i == 0:
            continue  # skip padding
        if idx2word_target[i] == '<eos>':
            break
        else:
            text += ' ' + idx2word_target[i]
    return text.strip()

In [None]:
i = 0
input_seq = encoder_input_sequences[i:i+1] 

trans = translate_attention(input_seq)
print(trans)
print(target_sentences[i], source_sentences[i])
# Compare token lists, dropping the trailing '<eos>' from the reference.
reference = target_sentences[i].split()[:-1]
print(get_bleu(reference, trans.split()))


geh <eos> go
0


In [None]:
bleu_scores_att = list()

In [None]:
for _i,x in enumerate(X_test):
    idx = _inp.index(x.tolist())
    seq = encoder_input_sequences[idx:idx+1]
    translation = translate_attention(seq)
    reference = target_sentences[idx].split()
    hypothesis = translation.split()
    reference = reference[:-1]
    _score = get_bleu(reference,hypothesis)
    bleu_scores_att.append( _score )


Zero ['tom', 'wird', 'helfen'] ['wird']
Zero ['ist', 'tom', 'zuverlassig'] ['tom']
Zero ['wir', 'sind', 'in', 'ordnung'] ['sind']
Zero ['sie', 'sind', 'familie'] ['sind']
Zero ['wir', 'sind', 'unschuldig'] ['sind']
Zero ['war', 'es', 'kalt'] ['es']
Zero ['kannst', 'du', 'schwimmen'] ['du']
Zero ['wir', 'sind', 'gerettet'] ['sind']
Zero ['krahen', 'sind', 'klug'] ['sind']
Zero ['wir', 'sind', 'furchterlich'] ['sind']
Zero ['ist', 'das', 'normal'] ['das']
Zero ['tom', 'ist', 'verletzt'] ['ist']
Zero ['tom', 'hielt', 'an'] ['tom']
Zero ['tom', 'hat', 'angerufen'] ['tom']
Zero ['ich', 'bin', 'glucklich'] ['bin']
Zero ['ich', 'bin', 'entspannt'] ['bin']
Zero ['wir', 'sind', 'gewinner'] ['sind']
Zero ['wir', 'sind', 'hier'] ['sind']
Zero ['das', 'sind', 'unsere'] ['sind']
Zero ['tom', 'hat', 'gesprochen'] ['tom']
Zero ['ist', 'tom', 'brav'] ['tom']
Zero ['ist', 'das', 'notig'] ['das']
Zero ['wir', 'sind', 'bewaffnet'] ['sind']
Zero ['ist', 'tom', 'entlassen', 'worden'] ['tom']
Zero ['wer', '

In [None]:
_bleu_scores_att = np.array(bleu_scores_att)
print("Avg after Attention : ", _bleu_scores_att.mean())

Avg after Attention :  62.04565930763136
