# Neural Machine Translation (NMT) - Translating English sentences to Vietnamese sentences

Neural machine translation ( NMT ) translates text from one language to another using deep learning, here with recurrent neural networks ( RNNs ). Production systems are usually complex and combine several algorithms, but at its core an NMT model of this kind is a sequence-to-sequence ( seq2seq ) network built from RNN cells. Such models can work at the character level, but word-level models remain common.

![NMT system](https://3.bp.blogspot.com/-3Pbj_dvt0Vo/V-qe-Nl6P5I/AAAAAAAABQc/z0_6WtVWtvARtMk0i9_AtLeyyGyV6AI4wCLcB/s1600/nmt-model-fast.gif)

I recommend changing the runtime to a GPU runtime so that training is faster.

## What are we going to do?
We will create an encoder-decoder LSTM model using the [Keras Functional API](https://www.tensorflow.org/alpha/guide/keras/functional) ( with [TensorFlow](https://www.tensorflow.org/) ). We will translate English sentences to [Vietnamese](https://en.wikipedia.org/wiki/Vietnamese_language). But, why Vietnamese?


*   It has special characters ( diacritics that mark tones and vowel quality ), which makes the text more complex than English.
*   We do not rely on pretrained word embeddings here; the embeddings are learned from scratch along with the rest of the model.

Here's an example,

Hello --> Xin chào

So, let's get started.



## Preparing the Data

### 1) Importing the libraries

We will import TensorFlow and Keras. From Keras, we import several modules that help with building NN layers, preprocessing data and constructing LSTM models.

In [1]:

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers , activations , models , preprocessing , utils
import pandas as pd


### 2) Reading the data


Our dataset contains roughly 8,000 pairs of English-Vietnamese phrases. It is available at http://www.manythings.org/anki/, which also offers 50+ other sets of bilingual sentences. We download the English-Vietnamese set, unzip it and read it using [Pandas](https://pandas.pydata.org/).

In [2]:

!wget http://www.manythings.org/anki/vie-eng.zip -O vie-eng.zip
!unzip vie-eng.zip


--2022-05-20 03:00:57--  http://www.manythings.org/anki/vie-eng.zip
Resolving www.manythings.org (www.manythings.org)... 172.67.186.54, 104.21.92.44, 2606:4700:3030::6815:5c2c, ...
Connecting to www.manythings.org (www.manythings.org)|172.67.186.54|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 320614 (313K) [application/zip]
Saving to: ‘vie-eng.zip’


2022-05-20 03:00:58 (2.17 MB/s) - ‘vie-eng.zip’ saved [320614/320614]

Archive:  vie-eng.zip
  inflating: _about.txt              
  inflating: vie.txt                 


In [2]:
lines = pd.read_table( 'vie.txt' , names=[ 'eng' , 'vie' ] )

In [3]:
lines.reset_index( level=0 , inplace=True )

In [4]:
lines

| | index | eng | vie |
|---|---|---|---|
| 0 | Run! | Chạy! | CC-BY 2.0 (France) Attribution: tatoeba.org #9... |
| 1 | Help! | Giúp tôi với! | CC-BY 2.0 (France) Attribution: tatoeba.org #4... |
| 2 | Go on. | Tiếp tục đi. | CC-BY 2.0 (France) Attribution: tatoeba.org #2... |
| 3 | Hello! | Chào bạn. | CC-BY 2.0 (France) Attribution: tatoeba.org #3... |
| 4 | Hurry! | Nhanh lên nào! | CC-BY 2.0 (France) Attribution: tatoeba.org #1... |
| ... | ... | ... | ... |
| 8045 | In 2009, Selena Gomez became the youngest pers... | Vào năm 2009, Sê-lê-na Gô-mét đã được lựa chọn... | CC-BY 2.0 (France) Attribution: tatoeba.org #5... |
| 8046 | In 2009, Selena Gomez became the youngest pers... | Vào năm 2009, Selena Gomez đã được lựa chọn để... | CC-BY 2.0 (France) Attribution: tatoeba.org #5... |
| 8047 | In 2009, Selena Gomez became the youngest pers... | Vào năm 2009, Selena Gomez đã trở thành Đại sứ... | CC-BY 2.0 (France) Attribution: tatoeba.org #5... |
| 8048 | The people here are particular about what they... | Những người ở đây khá là khó tính về khẩu vị ă... | CC-BY 2.0 (France) Attribution: tatoeba.org #2... |


In [4]:
lines.rename( columns={ 'index' : 'eng' , 'eng' : 'vie' , 'vie' : 'c' } , inplace=True )
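
The `reset_index` and `rename` calls above are needed because the file has three tab-separated columns ( English, Vietnamese, attribution ) while only two names were passed, so pandas used the first column as the index. As a minimal sketch of the same result without that workaround, the pairs can also be read with the standard library. The helper name `read_pairs` and the two sample rows are only illustrative assumptions, not part of the original notebook.

```python
import csv
import io

def read_pairs(handle):
    """Return (english, vietnamese) pairs from a tab-separated sentence file."""
    pairs = []
    # QUOTE_NONE treats quotation marks inside sentences as ordinary characters.
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) >= 2:
            # The third column (attribution) is ignored.
            pairs.append((row[0], row[1]))
    return pairs

# Two sample rows in the same layout as vie.txt (sentence, translation, attribution).
sample = io.StringIO(
    "Run!\tChạy!\tCC-BY 2.0 (France) Attribution: tatoeba.org\n"
    "Help!\tGiúp tôi với!\tCC-BY 2.0 (France) Attribution: tatoeba.org\n"
)
pairs = read_pairs(sample)
print(pairs[0])
```

With the real file, the same helper could be called as `read_pairs(open('vie.txt', encoding='utf-8', newline=''))`.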

### 3) Preparing input data for the Encoder ( `encoder_input_data` )
The Encoder model will be fed the preprocessed English sentences. The preprocessing is done as follows :


1.   Tokenizing the English sentences from `eng_lines`, which turns each sentence into a sequence of integer word ids.
2.   Determining the maximum length of the tokenized English sentences, `max_input_length`.
3.   Padding `tokenized_eng_lines` with zeros to `max_input_length`.
4.   Determining the vocabulary size ( `num_eng_tokens` ) for English words. We add 1 because index 0 is reserved for padding.
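
To make these four steps concrete, here is a small pure-Python sketch of what tokenizing and padding do conceptually. It only approximates the Keras `Tokenizer` ( for instance, it does not strip punctuation ), and the helper names and toy sentences are assumptions made up for illustration.

```python
from collections import Counter

def fit_word_index(sentences):
    """Map each word to an integer id, most frequent first; 0 is kept for padding."""
    counts = Counter(word for s in sentences for word in s.lower().split())
    return {word: i + 1 for i, (word, _) in enumerate(counts.most_common())}

def to_padded_sequences(sentences, word_index):
    """Convert sentences to id sequences and right-pad them with zeros."""
    seqs = [[word_index[w] for w in s.lower().split()] for s in sentences]
    max_len = max(len(seq) for seq in seqs)
    padded = [seq + [0] * (max_len - len(seq)) for seq in seqs]
    return padded, max_len

toy_sentences = ["go on", "go", "come here now"]
toy_index = fit_word_index(toy_sentences)               # step 1: tokenizing
toy_padded, toy_max_len = to_padded_sequences(toy_sentences, toy_index)  # steps 2 and 3
toy_vocab_size = len(toy_index) + 1                     # step 4: +1 for the padding id 0

print(toy_padded)
print(toy_max_len, toy_vocab_size)
```

The most frequent word ( `go` ) receives id 1, and shorter sentences are filled with zeros on the right, which is what `padding='post'` does in the cells that follow.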





In [5]:
eng_lines = list()
for line in lines.eng:
    eng_lines.append( line ) 

In [6]:
tokenizer = preprocessing.text.Tokenizer()
tokenizer.fit_on_texts( eng_lines ) 
tokenized_eng_lines = tokenizer.texts_to_sequences( eng_lines ) 

In [7]:
length_list = list()
for token_seq in tokenized_eng_lines:
    length_list.append( len( token_seq ))
max_input_length = np.array( length_list ).max()
print( 'English max length is {}'.format( max_input_length ))

English max length is 32


In [8]:
padded_eng_lines = preprocessing.sequence.pad_sequences( tokenized_eng_lines , maxlen=max_input_length , padding='post' )
encoder_input_data = np.array( padded_eng_lines )
print( 'Encoder input data shape -> {}'.format( encoder_input_data.shape ))

Encoder input data shape -> (8050, 32)


In [9]:
eng_word_dict = tokenizer.word_index
num_eng_tokens = len( eng_word_dict )+1
print( 'Number of English tokens = {}'.format( num_eng_tokens))

Number of English tokens = 3802


### 4) Preparing input data for the Decoder ( `decoder_input_data` )
The Decoder model will be fed the preprocessed Vietnamese lines. The preprocessing steps are similar to the ones above, with one extra step carried out first :


*   Prepend the `<START>` tag to each Vietnamese sentence.
*   Append the `<END>` tag to each Vietnamese sentence.
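
One detail worth knowing : with its default settings the Keras `Tokenizer` lowercases the text and replaces punctuation, including `<` and `>`, with spaces. The tags therefore end up in the vocabulary as `start` and `end`, which is why the inference code later looks up `vie_word_dict['start']`. The sketch below imitates that normalization in plain Python. The filter string is the Keras default as I understand it, and `normalize` is a hypothetical helper, not a Keras function.

```python
# Assumed to match the default `filters` argument of the Keras Tokenizer.
KERAS_DEFAULT_FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'

def normalize(text):
    """Lowercase, replace filtered characters with spaces, then split on whitespace."""
    table = str.maketrans({ch: " " for ch in KERAS_DEFAULT_FILTERS})
    return text.lower().translate(table).split()

tagged = "<START> " + "Chào bạn." + " <END>"
tokens = normalize(tagged)
print(tokens)
```

The angle brackets and the final period disappear, while the Vietnamese diacritics are preserved.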





In [10]:
vie_lines = list()
for line in lines.vie:
    vie_lines.append( '<START> ' + line + ' <END>' )  

In [11]:
tokenizer = preprocessing.text.Tokenizer()
tokenizer.fit_on_texts( vie_lines ) 
tokenized_vie_lines = tokenizer.texts_to_sequences( vie_lines ) 

In [12]:
length_list = list()
for token_seq in tokenized_vie_lines:
    length_list.append( len( token_seq ))
max_output_length = np.array( length_list ).max()
print( 'Vietnam max length is {}'.format( max_output_length ))

Vietnam max length is 43


In [13]:
padded_vie_lines = preprocessing.sequence.pad_sequences( tokenized_vie_lines , maxlen=max_output_length, padding='post' )
decoder_input_data = np.array( padded_vie_lines )
print( 'Decoder input data shape -> {}'.format( decoder_input_data.shape ))

Decoder input data shape -> (8050, 43)


In [14]:
vie_word_dict = tokenizer.word_index
num_vie_tokens = len( vie_word_dict )+1
print( 'Number of Vietnam tokens = {}'.format( num_vie_tokens))

Number of Vietnam tokens = 2384


### 5) Preparing target data for the Decoder ( `decoder_target_data` )

We take a copy of `tokenized_vie_lines` and modify it like this.



1.   Remove the first token of every sequence, which is the `<start>` tag we added earlier. The target is thus the decoder input shifted one step to the left, so at each step the model learns to predict the next word.
2.   Pad the shortened sequences again ( `padded_vie_lines` ) and convert them to one-hot vectors.

For example :

```
 [ '<start>' , 'hello' , 'world' , '<end>' ]

```

will become 

```
 [ 'hello' , 'world' , '<end>' ]

```
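
The same shift, followed by padding and one-hot encoding, can be sketched on token ids in plain Python. This is only an illustration of the idea behind `pad_sequences` and `to_categorical`; the function name, the toy ids and the tiny vocabulary size are assumptions.

```python
def shift_and_one_hot(token_seq, max_len, vocab_size):
    """Drop the first token, right-pad with zeros, and one-hot encode each position."""
    shifted = token_seq[1:]                               # remove the <start> id
    padded = shifted + [0] * (max_len - len(shifted))     # pad back to max_len
    return [[1 if i == tok else 0 for i in range(vocab_size)] for tok in padded]

# Toy ids: 1 = start, 2 = end, 4 = hello, 5 = world (vocabulary size 6 including padding id 0).
decoder_input = [1, 4, 5, 2]
target = shift_and_one_hot(decoder_input, max_len=4, vocab_size=6)

for row in target:
    print(row)
```

Each row has exactly one `1`. The last row marks class 0, the padding position, which is why the target has shape ( samples, `max_output_length`, `num_vie_tokens` ).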


In [15]:

decoder_target_data = list()
for token_seq in tokenized_vie_lines:
    decoder_target_data.append( token_seq[ 1 : ] ) 
    
padded_vie_lines = preprocessing.sequence.pad_sequences( decoder_target_data , maxlen=max_output_length, padding='post' )
onehot_vie_lines = utils.to_categorical( padded_vie_lines , num_vie_tokens )
decoder_target_data = np.array( onehot_vie_lines )
print( 'Decoder target data shape -> {}'.format( decoder_target_data.shape ))


Decoder target data shape -> (8050, 43, 2384)


## Defining and Training the models

### 1) Defining the Encoder-Decoder model
The model will have Embedding, LSTM and Dense layers. The basic configuration is as follows.


*   2 Input layers : One for `encoder_input_data` and another for `decoder_input_data`.
*   Embedding layer : For converting token ids to fixed-size dense vectors. **( Note :  Don't forget the `mask_zero=True` argument here, so that the zero padding is ignored )**
*   LSTM layer : Provides access to Long Short-Term Memory cells.
*   Dense layer : With a softmax activation, it turns each decoder output into a probability distribution over the Vietnamese vocabulary.

Working : 

1.   The `encoder_input_data` goes into the Embedding layer ( `encoder_embedding` ). 
2.   The output of the Embedding layer goes to the encoder LSTM, which produces 2 state vectors ( `h` and `c`, together `encoder_states` ).
3.   These states are used as the initial state of the decoder LSTM.
4.   The `decoder_input_data` goes through its own Embedding layer.
5.   The embeddings go into the decoder LSTM ( initialized with the encoder states ) to produce the output sequences, which the Dense layer converts to word probabilities.









In [16]:

encoder_inputs = tf.keras.layers.Input(shape=( None , ))
encoder_embedding = tf.keras.layers.Embedding( num_eng_tokens, 256 , mask_zero=True ) (encoder_inputs)
encoder_outputs , state_h , state_c = tf.keras.layers.LSTM( 128 , return_state=True  )( encoder_embedding )
encoder_states = [ state_h , state_c ]

decoder_inputs = tf.keras.layers.Input(shape=( None ,  ))
decoder_embedding = tf.keras.layers.Embedding( num_vie_tokens, 256 , mask_zero=True) (decoder_inputs)
decoder_lstm = tf.keras.layers.LSTM( 128 , return_state=True , return_sequences=True)
decoder_outputs , _ , _ = decoder_lstm ( decoder_embedding , initial_state=encoder_states )
decoder_dense = tf.keras.layers.Dense( num_vie_tokens , activation=tf.keras.activations.softmax ) 
output = decoder_dense ( decoder_outputs )

model = tf.keras.models.Model([encoder_inputs, decoder_inputs], output )
model.compile(optimizer=tf.keras.optimizers.RMSprop(), loss='categorical_crossentropy')

model.summary()


Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            [(None, None)]       0                                            
__________________________________________________________________________________________________
input_2 (InputLayer)            [(None, None)]       0                                            
__________________________________________________________________________________________________
embedding (Embedding)           (None, None, 256)    973312      input_1[0][0]                    
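
As a sanity check on the summary, the parameter counts can be worked out by hand. The sketch below uses the usual formulas for Embedding, LSTM and Dense layers together with the sizes from this notebook ( vocabularies of 3802 and 2384 tokens, embedding size 256, 128 LSTM units ). Only the first value can be compared with the summary row shown above; the others are derived from the formulas, not copied from the output. The function names are made up for this example.

```python
def embedding_params(vocab_size, dim):
    # One dense vector of length `dim` per token id.
    return vocab_size * dim

def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights and a bias vector.
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # Weight matrix plus bias.
    return input_dim * units + units

num_eng_tokens, num_vie_tokens = 3802, 2384

encoder_embedding_count = embedding_params(num_eng_tokens, 256)
decoder_embedding_count = embedding_params(num_vie_tokens, 256)
lstm_count = lstm_params(256, 128)            # same size for encoder and decoder LSTM
dense_count = dense_params(128, num_vie_tokens)

print(encoder_embedding_count)  # 973312, the value in the `embedding` row above
print(decoder_embedding_count, lstm_count, dense_count)
```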

### 2) Training the model
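We train the model for 100 epochs with a batch size of 250, using the RMSprop optimizer and the categorical cross-entropy loss function. A `ModelCheckpoint` callback saves the model whenever the training loss improves.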
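
For intuition about the loss, categorical cross-entropy for a single output position is the negative log of the probability that the model assigns to the correct word. Below is a minimal sketch with made-up probabilities over a 4-word vocabulary. Keras additionally averages this value over positions and samples, which is not shown here.

```python
import math

def categorical_crossentropy(one_hot_target, predicted_probs):
    """Cross-entropy between a one-hot target and a predicted distribution."""
    return -sum(t * math.log(p)
                for t, p in zip(one_hot_target, predicted_probs) if t > 0)

target = [0, 0, 1, 0]                    # the correct word has id 2
confident = [0.05, 0.05, 0.85, 0.05]     # most of the probability on the right word
unsure = [0.25, 0.25, 0.25, 0.25]        # uniform guess

loss_confident = categorical_crossentropy(target, confident)
loss_unsure = categorical_crossentropy(target, unsure)
print(round(loss_confident, 4), round(loss_unsure, 4))
```

A confident, correct prediction gives a loss close to 0, while a uniform guess over `V` words gives `log(V)`.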

In [17]:
from tensorflow.keras.callbacks import ModelCheckpoint

In [18]:
filename = 'saveModel/model_eng_vie_4.h5'
checkpoint = ModelCheckpoint(filename, monitor='loss', verbose=1, save_best_only=True, mode='min')
model.fit([encoder_input_data , decoder_input_data], decoder_target_data, batch_size=250, epochs=100, callbacks=[checkpoint], verbose=2)

Train on 8050 samples
Epoch 1/100

Epoch 00001: loss improved from inf to 1.42992, saving model to saveModel/model_eng_vie_4.h5
8050/8050 - 84s - loss: 1.4299
Epoch 2/100

Epoch 00002: loss improved from 1.42992 to 1.24880, saving model to saveModel/model_eng_vie_4.h5
8050/8050 - 93s - loss: 1.2488
Epoch 3/100

Epoch 00003: loss improved from 1.24880 to 1.20099, saving model to saveModel/model_eng_vie_4.h5
8050/8050 - 94s - loss: 1.2010
Epoch 4/100

Epoch 00004: loss improved from 1.20099 to 1.16324, saving model to saveModel/model_eng_vie_4.h5
8050/8050 - 102s - loss: 1.1632
Epoch 5/100

Epoch 00005: loss improved from 1.16324 to 1.13065, saving model to saveModel/model_eng_vie_4.h5
8050/8050 - 93s - loss: 1.1307
Epoch 6/100

Epoch 00006: loss improved from 1.13065 to 1.10049, saving model to saveModel/model_eng_vie_4.h5
8050/8050 - 94s - loss: 1.1005
Epoch 7/100

Epoch 00007: loss improved from 1.10049 to 1.07204, saving model to saveModel/model_eng_vie_4.h5
8050/8050 - 96s - loss: 1

<tensorflow.python.keras.callbacks.History at 0x22c525a8c50>

In [17]:
# model.fit([encoder_input_data , decoder_input_data], decoder_target_data, batch_size=250, epochs=20) 


In [20]:
# model.save( 'saveModel/model_eng_vie.hdf5' ) 

In [19]:
model = tf.keras.models.load_model('saveModel/model_eng_vie_3.h5')



## Running inference with the models

### 1) Defining inference models
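We create inference models which help in predicting translations. Note that `make_inference_models` below reuses the layer objects from the training graph ( `encoder_inputs` , `decoder_embedding` , `decoder_lstm` , `decoder_dense` ), so these layers must hold the trained weights when it is called.

**Encoder inference model** : Takes the English sentence as input and outputs the LSTM states ( `h` and `c` ).

**Decoder inference model** : Takes 2 inputs : the LSTM states ( initially the output of the encoder model ) and the Vietnamese input sequence ( which starts with the `<start>` tag ). It outputs the predicted next-word probabilities for the English sentence which we fed to the encoder model, along with its updated state values.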





In [19]:

def make_inference_models():
    
    encoder_model = tf.keras.models.Model(encoder_inputs, encoder_states)
    
    decoder_state_input_h = tf.keras.layers.Input(shape=( 128 ,))
    decoder_state_input_c = tf.keras.layers.Input(shape=( 128 ,))
    
    decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
    
    decoder_outputs, state_h, state_c = decoder_lstm(
        decoder_embedding , initial_state=decoder_states_inputs)
    decoder_states = [state_h, state_c]
    decoder_outputs = decoder_dense(decoder_outputs)
    decoder_model = tf.keras.models.Model(
        [decoder_inputs] + decoder_states_inputs,
        [decoder_outputs] + decoder_states)
    
    return encoder_model , decoder_model


### 2) Making some translations


1.   First, we take an English sequence and predict the state values using `enc_model`.
2.   We use these state values as the initial state of the decoder's LSTM.
3.   Then, we create a target sequence of length 1 which contains only the `<start>` element.
4.   We feed this sequence and the states to `dec_model`.
5.   We replace the `<start>` element with the element which was predicted by `dec_model` ( the word with the highest probability ) and update the state values.
6.   We repeat steps 4 and 5 until we hit the `<end>` tag or the maximum sequence length.
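
The loop described in these steps is a greedy decoder. The sketch below shows its structure in plain Python, with a scripted `toy_step` function standing in for `dec_model.predict`. Everything in it ( function names, the toy vocabulary, the scripted predictions ) is an assumption for illustration. Unlike the notebook cells, it uses a reverse dictionary for the id-to-word lookup and leaves the `end` tag out of the result.

```python
def greedy_decode(step_fn, initial_state, start_id, end_id, index_to_word, max_len):
    """Repeatedly feed the last predicted token back in until <end> or max_len."""
    token, state, words = start_id, initial_state, []
    while len(words) < max_len:
        probs, state = step_fn(token, state)
        token = max(range(len(probs)), key=probs.__getitem__)  # argmax over the vocabulary
        if token == end_id:
            break
        words.append(index_to_word[token])
    return " ".join(words)

# Toy vocabulary; id 0 is padding.
word_index = {"start": 1, "end": 2, "chào": 3, "bạn": 4}
index_to_word = {i: w for w, i in word_index.items()}

def toy_step(token, state):
    """Stand-in for the decoder model: returns scripted probabilities and a new state."""
    script = {1: 3, 3: 4, 4: 2}        # start -> chào -> bạn -> end
    probs = [0.0] * 5
    probs[script[token]] = 1.0
    return probs, state + 1            # the 'state' here is just a step counter

translation = greedy_decode(toy_step, 0, word_index["start"], word_index["end"],
                            index_to_word, max_len=10)
print(translation)  # chào bạn
```

Building `index_to_word` once avoids scanning the whole dictionary for every predicted word.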







In [20]:

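def str_to_tokens( sentence : str ):
    # Convert an English sentence into a padded sequence of token ids.
    words = sentence.lower().split()
    tokens_list = list()
    for word in words:
        # Skip words that are not in the training vocabulary instead of raising a KeyError.
        if word in eng_word_dict:
            tokens_list.append( eng_word_dict[ word ] ) 
    return preprocessing.sequence.pad_sequences( [tokens_list] , maxlen=max_input_length , padding='post')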


In [22]:
import speech_recognition as sr
from gtts import gTTS
import playsound
import os

# Initialize the recognizer
r = sr.Recognizer()

# Loop infinitely for user to
# speak
 
while(1):   
     
    # Exception handling to handle
    # exceptions at the runtime
    try:
         
        # use the microphone as source for input.
        with sr.Microphone() as source2:
             
            # briefly sample the ambient noise ( 0.2 s ) so the
            # recognizer can adjust its energy threshold to
            # the surrounding noise level
            r.adjust_for_ambient_noise(source2, duration=0.2)
             
            #listens for the user's input
            audio2 = r.listen(source2)
             
            # Using google to recognize audio
            MyText = r.recognize_google(audio2)
            MyText = MyText.lower()
 
            print("Did you say "+MyText)

            enc_model , dec_model = make_inference_models()

            states_values = enc_model.predict( str_to_tokens( MyText ) )
            empty_target_seq = np.zeros( ( 1 , 1 ) )
            empty_target_seq[0, 0] = vie_word_dict['start']
            stop_condition = False
            decoded_translation = ''
            while not stop_condition :
                dec_outputs , h , c = dec_model.predict([ empty_target_seq ] + states_values )
                sampled_word_index = np.argmax( dec_outputs[0, -1, :] )
                sampled_word = None
                for word , index in vie_word_dict.items() :
                    if sampled_word_index == index :
                        decoded_translation += ' {}'.format( word )
                        sampled_word = word
                
                if sampled_word == 'end' or len(decoded_translation.split()) > max_output_length:
                    stop_condition = True
                    
                empty_target_seq = np.zeros( ( 1 , 1 ) )  
                empty_target_seq[ 0 , 0 ] = sampled_word_index
                states_values = [ h , c ] 

            print( decoded_translation )    
            
            # text = "Em nhà ở đâu thế" 
            output = gTTS(decoded_translation, lang="vi", slow=False)
            output.save("output.mp3")
            playsound.playsound('output.mp3', True)
            os.remove("output.mp3")
             
    except sr.RequestError as e:
        print("Could not request results; {0}".format(e))
         
    except sr.UnknownValueError:
        print("unknown error occurred")

unknown error occurred


KeyboardInterrupt: 

In [21]:

enc_model , dec_model = make_inference_models()

states_values = enc_model.predict( str_to_tokens( 'hello' ) )
empty_target_seq = np.zeros( ( 1 , 1 ) )
empty_target_seq[0, 0] = vie_word_dict['start']
stop_condition = False
decoded_translation = ''
while not stop_condition :
    dec_outputs , h , c = dec_model.predict([ empty_target_seq ] + states_values )
    sampled_word_index = np.argmax( dec_outputs[0, -1, :] )
    sampled_word = None
    for word , index in vie_word_dict.items() :
        if sampled_word_index == index :
            decoded_translation += ' {}'.format( word )
            sampled_word = word
    
    if sampled_word == 'end' or len(decoded_translation.split()) > max_output_length:
        stop_condition = True
        
    empty_target_seq = np.zeros( ( 1 , 1 ) )  
    empty_target_seq[ 0 , 0 ] = sampled_word_index
    states_values = [ h , c ] 

print( decoded_translation )
    


 nhớ điểm thống thút gratin với súng cực cực cực cực ủng cực cực đời đời to ủng đội cãi lành lành sắp sắp sắp đợt cocktail làn chút nhạc cuả dâu chay nĩa phố góc waikiki khí nghĩa trai lồ dẹp thứ tàng


In [26]:

# enc_model , dec_model = make_inference_models()

# for epoch in range( encoder_input_data.shape[0] ):
#     states_values = enc_model.predict( str_to_tokens( input( 'Enter eng sentence : ' ) ) )
#     #states_values = enc_model.predict( encoder_input_data[ epoch ] )
#     empty_target_seq = np.zeros( ( 1 , 1 ) )
#     empty_target_seq[0, 0] = vie_word_dict['start']
#     stop_condition = False
#     decoded_translation = ''
#     while not stop_condition :
#         dec_outputs , h , c = dec_model.predict([ empty_target_seq ] + states_values )
#         sampled_word_index = np.argmax( dec_outputs[0, -1, :] )
#         sampled_word = None
#         for word , index in vie_word_dict.items() :
#             if sampled_word_index == index :
#                 decoded_translation += ' {}'.format( word )
#                 sampled_word = word
        
#         if sampled_word == 'end' or len(decoded_translation.split()) > max_output_length:
#             stop_condition = True
            
#         empty_target_seq = np.zeros( ( 1 , 1 ) )  
#         empty_target_seq[ 0 , 0 ] = sampled_word_index
#         states_values = [ h , c ] 

#     print( decoded_translation )


Enter eng sentence : hello
 chào bạn end
Enter eng sentence : how are you
 bạn thế nào end
Enter eng sentence : help me
 cứu tôi với end
Enter eng sentence : come here
 lại đây nào end
Enter eng sentence : are you ok
 bạn có sao không end
Enter eng sentence : how old are you
 bạn bao nhiêu tuổi end
Enter eng sentence : please get me hotel security
 làm ơn cho tôi gặp bảo vệ khách sạn end
Enter eng sentence : my hobby is taking pictures of wild flowers
 sở thích của tôi là chụp những bức ảnh hoa dại end
Enter eng sentence : people used to think that only humans could use language
 mọi người thường nghĩ rằng chỉ có con người mới có thể sử dụng ngôn nghĩ end


KeyboardInterrupt: ignored