## AIDI 2000 - Applied Machine Learning 
### Assignment 2 - Machine Translation

#### Michael Molnar (100806823)

This notebook concludes my assignment. In it, I use my trained model to translate user-supplied French text into English.

In [1]:
import string
import re
import pickle 
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.models import load_model

### Define Necessary Functions From Last Notebook

In [2]:
def clean_text(text):
    """
    Preprocessing text.
    """
    # Split the text into tokens at white spaces
    tokens = text.split()
    # Compile a regex that matches any punctuation character
    no_punc = re.compile('[%s]' % re.escape(string.punctuation))
    # Remove the punctuation from each token
    tokens = [no_punc.sub('', word) for word in tokens]
    # Remove any tokens that are non-alphabetic 
    tokens = [word for word in tokens if word.isalpha()]
    # Lowercase all text
    tokens = [word.lower() for word in tokens]
    return tokens

def int_to_word(integer, tokenizer):
    """
    Converting encoded integer tokens back into words.
    """
    for word, index in tokenizer.word_index.items():
        if index == integer:
            return word
    return None

def get_translation(model, text, tokenizer):
    """
    Translating cleaned and tokenized text from French to English.
    """
    # Reshape text vector 
    text = text.reshape((1, text.shape[0]))
    # Make prediction
    prediction = model.predict(text, verbose=0)[0]
    # Get the integer indexes of the predicted words
    integers = [np.argmax(vector) for vector in prediction]
    # Convert this into a text sequence
    target = []
    for i in integers:
        word = int_to_word(i, tokenizer)
        # Stop at the first padding index (0), which has no word and maps to None
        if word is None:
            break
        target.append(word)
    return ' '.join(target)
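
The decoding step in `get_translation` can be illustrated without the model. The sketch below is a minimal, self-contained version that assumes a made-up four-word vocabulary and hand-written probability rows (both hypothetical, not taken from the trained model). It also builds the index-to-word mapping once instead of scanning `word_index` on every lookup; a fitted Keras `Tokenizer` exposes a similar `index_word` dictionary.

```python
# Hypothetical vocabulary standing in for eng_tokenizer.word_index.
# Index 0 is reserved for padding and has no word.
word_index = {"im": 1, "happy": 2, "its": 3, "fun": 4}

# Invert the mapping once so each lookup is a dictionary access.
index_word = {index: word for word, index in word_index.items()}


def decode(prediction, index_word):
    """Greedy decoding: take the most likely index at each time step."""
    words = []
    for vector in prediction:
        # Position of the largest probability in this row
        best = max(range(len(vector)), key=lambda i: vector[i])
        word = index_word.get(best)
        # Index 0 (padding) has no word, so stop there
        if word is None:
            break
        words.append(word)
    return " ".join(words)


# Made-up probabilities: one row per output position, one column per index 0..4
prediction = [
    [0.05, 0.80, 0.05, 0.05, 0.05],  # index 1 -> "im"
    [0.10, 0.10, 0.60, 0.10, 0.10],  # index 2 -> "happy"
    [0.90, 0.02, 0.03, 0.03, 0.02],  # index 0 -> padding, stop
]
print(decode(prediction, index_word))
```

Because decoding stops at the first padding index, any words the model predicts after a padding position are discarded.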

### Load the Model and the Tokenizers

In [3]:
# Load the model
model = load_model('fra_to_eng.h5')

# Load the tokenizers, closing each file once it has been read
with open('fra_tokenizer.pkl', 'rb') as f:
    fra_tokenizer = pickle.load(f)
with open('eng_tokenizer.pkl', 'rb') as f:
    eng_tokenizer = pickle.load(f)

### Make Predictions

In [4]:
def translate():
    """
    Prompting for a French sentence and printing its predicted English translation.
    """
    # Get user's input
    fra_input = input('Enter a French sentence: ')
    # Clean the text
    clean_fra_input = clean_text(fra_input)
    # Encode each token separately; this returns one list per token
    encoded_input = fra_tokenizer.texts_to_sequences(clean_fra_input)
    # Flatten to a list of integers, skipping tokens that encode to an empty list
    # NOTE:  some input words may not exist in the tokenizer's vocabulary
    # if they did not appear in the training data
    encoded_input = [i[0] for i in encoded_input if i]
    # Truncate to the model's input length of 14 tokens, then pad with 0's
    encoded_input = encoded_input[:14]
    while len(encoded_input) < 14:
        encoded_input.append(0)
    # Convert to a Numpy array
    encoded_input = np.array(encoded_input)
    # Get the translation
    translation = get_translation(model, encoded_input, eng_tokenizer)
    print('Predicted English translation:', translation)
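
The model expects exactly 14 input tokens. As a sketch, the hypothetical helper `pad_or_truncate` below shows one way to bring any encoded sentence to that length. The helper name and the choice to cut extra tokens from the end are assumptions, not part of the original notebook.

```python
def pad_or_truncate(sequence, length=14, pad_value=0):
    """Return a copy of `sequence` with exactly `length` items."""
    # Drop anything beyond the target length
    trimmed = list(sequence[:length])
    # Append the padding value until the target length is reached
    trimmed.extend([pad_value] * (length - len(trimmed)))
    return trimmed


# A short sentence is padded with trailing zeros
print(pad_or_truncate([5, 12, 7]))
# A sentence of 20 tokens is cut down to 14
print(len(pad_or_truncate(list(range(1, 21)))))
```

Keras also provides `pad_sequences` in `tensorflow.keras.preprocessing.sequence`, which would probably need `padding='post'` and `truncating='post'` to match the trailing zeros used here.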

#### Test 1 - 'Je suis heureux' ('I am happy')

In [5]:
translate()

Enter a French sentence: Je suis heureux
Predicted English translation: im happy


#### Test 2 - 'Salut, mon nom est Michael' ('Hi, my name is Michael')

In [6]:
translate()

Enter a French sentence: Salut, mon nom est Michael
Predicted English translation: wheres is catch
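
A translation this far off may be related to the note in `translate()`: words missing from the French tokenizer's vocabulary are dropped silently before the model sees them. Whether that happened for this sentence cannot be told from the output alone. The sketch below, using a hypothetical stand-in vocabulary rather than the real `fra_tokenizer.word_index`, shows one way to report which cleaned tokens would be dropped.

```python
def split_known_unknown(tokens, word_index):
    """Separate tokens into those in the vocabulary and those missing from it."""
    known = [t for t in tokens if t in word_index]
    unknown = [t for t in tokens if t not in word_index]
    return known, unknown


# Hypothetical vocabulary; the real one would be fra_tokenizer.word_index
word_index = {"salut": 1, "mon": 2, "nom": 3, "est": 4}

# Tokens as clean_text would return them for the sentence above
tokens = ["salut", "mon", "nom", "est", "michael"]

known, unknown = split_known_unknown(tokens, word_index)
print("Kept:", known)
print("Dropped:", unknown)
```

Printing the dropped tokens before calling the model would make it easier to tell a vocabulary gap apart from a genuine mistranslation.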


#### Test 3 - 'Il fait chaud aujourd'hui' ('It is hot today')

In [7]:
translate()

Enter a French sentence: Il fait chaud aujourd'hui
Predicted English translation: its is hot today


#### Test 4 - 'C'est marrant' ('This is fun')

In [8]:
translate()

Enter a French sentence: C'est marrant
Predicted English translation: its fun


#### Test 5 - 'Ce ne fonctionne pas très bien' ('It doesn't work very well')

In [9]:
translate()

Enter a French sentence: Ce ne fonctionne pas très bien
Predicted English translation: this curry is very


#### Test 6 - 'certains d'entre eux sont étranges' ('Some of them are weird')

In [12]:
translate()

Enter a French sentence: certains d'entre eux sont étranges
Predicted English translation: none of are died


#### Test 7 - 'Bonne journée' ('Have a nice day')

In [11]:
translate()

Enter a French sentence: Bonne journée
Predicted English translation: have a nice day
