# Spelling Correction using Deep Learning

Inspired by https://medium.com/@majortal/deep-spelling-9ffef96a24f6. Code can be found at https://github.com/MajorTal/DeepSpell/blob/master/keras_spell.py.

**Character Sequence to Sequence** code pulled from https://github.com/mdcramer/deep-learning/tree/master/seq2seq.

Environment initialization:
* open Acaconda terminal
* \>activate tensorflow
* \>jupyter notebook

When running on EC2 with Udactiy AMI:
* Fix email addresses
* \>source activate dl
* \>conda update --all
* \>pip install tensorflow-gpu==1.1 # Tensorflow v1.1 is required
* \>jupyter notebook

Useful commands:
* \>nohup python -u deep_speeling.py small > small_output.txt & # '2>nohup.err </dev/null' before '&' is optional
* \>nohup python -u deep_speeling.py > large_output.txt & # '2>nohup.err </dev/null' before '&' is optional
* \>jobs # list all nohup jobs
* \>ps -ef # list all running processes
* \>kill PID # kills process with specific PID
* \>watch -n 0.5 nvidia-smi # display GPU utilization
* \>rm -r mydir # removes directory

"I see that you have made three spelling mistakes." - Marquis de Favras, purportedly, upon the reading of his death warrant prior to be hanged in 1790.
<img src="images/MarquisdeFavras.jpg"/>

# Initialize global variables
**Make sure to run this cell first each time**

Continue to work with big data. Jump down to work with small data.

In [44]:
import os
import sys
import errno

# Global variable around input length
MIN_INPUT_LEN = 5 # minimum number of characters in a sentence
MAX_INPUT_LEN = 60 # maximum number of characters in a sentence

# Filenames
NEWS_FILE_NAME = os.path.join(os.path.expanduser("data"), "news.2013.en.shuffled") # uncompressed data file
NEWS_FILE_NAME_CLEAN = os.path.join(os.path.expanduser("data"), "news.2013.en.clean") # clean data file
NEWS_FILE_NAME_ENGLISH = os.path.join(os.path.expanduser("data"), "news.2013.en.english") # non-English removed
NEWS_FILE_NAME_FILTERED = os.path.join(os.path.expanduser("data"), "news.2013.en.filtered")
NEWS_FILE_NAME_TRAIN = os.path.join(os.path.expanduser("data"), "news.2013.en.train")
NEWS_FILE_NAME_VALIDATE = os.path.join(os.path.expanduser("data"), "news.2013.en.validate")

# Check for command line argument to use small data
print ("Command line args are: {}".format(str(sys.argv)))
small = 'small' in str(sys.argv)
# small = True # Use this to force small data. Comment out when running script.

if (small):
    print("Using the small data.")
    directory = "small_graph"
    # This is where the small graph is going to be saved and reloaded
    GRAPH_PARAMETERS = "small_graph/graph_params" # Filename for storing parameters associated with the graph    
    SOURCE_INT_TO_LETTER = "small_graph/sourceinttoletter.json" # Filename for INT to letter List for source sentences
    TARGET_INT_TO_LETTER = "small_graph/targetinttoletter.json" # Filename for INT to letter List for target sentences
    SOURCE_LETTER_TO_INT = "small_graph/sourcelettertoint.json" # Filename for letter to INT List for source sentences
    TARGET_LETTER_TO_INT = "small_graph/targetlettertoint.json" # Filename for letter to INT List for source sentences
    checkpoint = "./small_graph/best_model.ckpt"
else:
    print("Using the large data.")
    # This is where the large graph is going to be saved and reloaded
    directory = "large_graph"
    GRAPH_PARAMETERS = "large_graph/graph_params" # Filename for storing parameters associated with the graph    
    SOURCE_INT_TO_LETTER = "large_graph/sourceinttoletter.json" # Filename for INT to letter List for source sentences
    TARGET_INT_TO_LETTER = "large_graph/targetinttoletter.json" # Filename for INT to letter List for target sentences
    SOURCE_LETTER_TO_INT = "large_graph/sourcelettertoint.json" # Filename for letter to INT List for source sentences
    TARGET_LETTER_TO_INT = "large_graph/targetlettertoint.json" # Filename for letter to INT List for source sentences
    checkpoint = "./large_graph/best_model.ckpt"

# create directory for data, large or small, if it does not already exist
try:
    os.makedirs(directory)
except OSError as exception:
    if exception.errno != errno.EEXIST:
        raise

Command line args are: ['C:\\Users\\mcram\\Anaconda3\\envs\\tensorflow\\lib\\site-packages\\ipykernel_launcher.py', '-f', 'C:\\Users\\mcram\\AppData\\Roaming\\jupyter\\runtime\\kernel-e966c798-3ff8-487d-9554-53c1d3adb58f.json']
Using the large data.


# Function for sending email updates from AWS

In [45]:
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# AWS Config
EMAIL_HOST = 'email-smtp.us-west-2.amazonaws.com'
EMAIL_HOST_USER = 'AKIAJKVANBDPILI5UNYA'
EMAIL_HOST_PASSWORD = 'Ava4fqQT7ux9iz22ToSjFqvZB5mzHE/wzD3Ib4p/27VJ'
EMAIL_PORT = 587

def send_email(subject, message):

    # Do not upload to Github with real email addresses
    me = "m@mba.edu"
    you = ["m@alum.edu", "bf@gmail.com"]

    # Construct email
    msg = MIMEMultipart('alternative')
    msg['Subject'] = subject
    msg['From'] = me
    msg['To'] = ", ".join(you)
    msg.attach(MIMEText(message, 'plain'))

    # html = open('index.html').read()
    # mime_text = MIMEText(html, 'html')
    # msg.attach(mime_text)

    s = smtplib.SMTP(EMAIL_HOST, EMAIL_PORT)
    s.starttls()
    s.login(EMAIL_HOST_USER, EMAIL_HOST_PASSWORD)
    s.sendmail(me, you, msg.as_string()) # (from, to, message)
    s.quit()
    
    print("Email update sent.")

## Download raw data file from the internet and uncompress it

[One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling](https://research.google.com/pubs/pub41880.html)

**Start here if you are going to work with the big dataset** The dataset lives in the /data/ folder.

**Skip to below to work with the small dataset**

In [46]:
def download_raw_datafile():
    
    import errno
    import requests
    import gzip

    NEWS_FILE_NAME_COMPRESSED = os.path.join(os.path.expanduser("data"), "news.2013.en.shuffled.gz") # 1.1 GB file
    DATA_FILES_URL = "http://www.statmt.org/wmt14/training-monolingual-news-crawl/news.2013.en.shuffled.gz" # file location

    # create directory for data, if it does not already exist
    try:
        os.makedirs(os.path.dirname(NEWS_FILE_NAME_COMPRESSED))
    except OSError as exception:
        if exception.errno != errno.EEXIST:
            raise

    # check size of current data file
    try:
        current_size = os.path.getsize(NEWS_FILE_NAME_COMPRESSED)
    except:
        current_size = 0

    # check size of data file on internet
    response = requests.get(DATA_FILES_URL, stream=True)
    total_length = response.headers.get('content-length') # returns a str
    total_length = int(total_length)

    # download file if it is larger than the one already in the data directory
    if (total_length > current_size):
        print("Download compressed data file")
        with open(NEWS_FILE_NAME_COMPRESSED, "wb") as output_file: # open for writing in binary mode
            downloaded = percentage = 0
            print("»"*100)
            for data in response.iter_content(chunk_size=4096):
                downloaded += len(data)
                output_file.write(data)
                new_percentage = 100 * downloaded // total_length # // is floor divide
                if new_percentage > percentage:
                    print("o", end="") # end="" remove carriage return
                    percentage = new_percentage
        print() # add carriage return at the end of progress indicator
    else:
        print("Local copy of compressed data file is up to date.")

    # uncompress data
    if (os.path.isfile(NEWS_FILE_NAME_COMPRESSED[:-3])): # check to see if file already exists
        print("Data file is already uncompressed.")
    else:
        print("Uncompress data file.") # uncompress the file if it does not
        with gzip.open(NEWS_FILE_NAME_COMPRESSED, 'rb') as compressed_file:
            with open(NEWS_FILE_NAME_COMPRESSED[:-3], 'wb') as outfile: #2.5 GB file
                outfile.write(compressed_file.read())
        print("Data file uncompressed.")

## Clean the data
Takes the `news.2013.en.shuffled` as input and produces `news.2013.en.clean`.

In [47]:
def clean_data():
    
    import re

    NORMALIZE_WHITESPACE_REGEX = re.compile(r'[^\S\n]+', re.UNICODE) # match all whitespace except newlines
    RE_DASH_FILTER = re.compile(r'[\-\˗\֊\‐\‑\‒\–\—\⁻\₋\−\﹣\－]', re.UNICODE)
    RE_APOSTROPHE_FILTER = re.compile(r'&#39;|[ʼ՚＇‘’‛❛❜ߴߵ`‵´ˊˋ{}{}{}{}{}{}{}{}{}]'
                                      .format(chr(768), chr(769), chr(832), chr(833), chr(2387),
                                              chr(5151), chr(5152), chr(65344), chr(8242)), re.UNICODE)
    RE_LEFT_PARENTH_FILTER = re.compile(r'[\(\[\{\⁽\₍\❨\❪\﹙\（]', re.UNICODE)
    RE_RIGHT_PARENTH_FILTER = re.compile(r'[\)\]\}\⁾\₎\❩\❫\﹚\）]', re.UNICODE)
    ALLOWED_CURRENCIES = """¥£₪$€฿₨"""
    ALLOWED_PUNCTUATION = """-!?/;"'%&<>.()[]{}@#:,|=*""" # string.punctuation: !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
    RE_BASIC_CLEANER = re.compile(r'[^\w\s{}{}]'
                                  .format(re.escape(ALLOWED_CURRENCIES), re.escape(ALLOWED_PUNCTUATION)), re.UNICODE)

    def file_len(fname):
        with open(fname, encoding="utf8") as f:
            for i, l in enumerate(f):
                pass
        return i + 1

    def clean_text(text):
        # Clean the text - remove unwanted chars, fold punctuation etc.
        result = NORMALIZE_WHITESPACE_REGEX.sub(' ', text.strip())
        result = RE_DASH_FILTER.sub('-', result)
        result = RE_APOSTROPHE_FILTER.sub("'", result)
        result = RE_LEFT_PARENTH_FILTER.sub("(", result)
        result = RE_RIGHT_PARENTH_FILTER.sub(")", result)
        result = RE_BASIC_CLEANER.sub('', result)
        return result

    if (os.path.isfile(NEWS_FILE_NAME_CLEAN)):
        print("Data file is already clean.")
    else:    
        print("Cleaning data file:")
        number_lines = file_len(NEWS_FILE_NAME)
        with open(NEWS_FILE_NAME_CLEAN, "wb") as clean_data:
            processed = percentage = 0
            for line in open(NEWS_FILE_NAME, encoding="utf8"):
                processed += 1
                # decoded_line = line.decode('utf-8') # https://stackoverflow.com/a/28583969/852795
                cleaned_line = clean_text(line)
                encoded_line = cleaned_line.encode("utf-8")
                clean_data.write(encoded_line + b"\n")
                new_percentage = 100 * processed // number_lines
                if (new_percentage > percentage):
                    print("{0:2d}".format(new_percentage), "%: ", line, end="")
                    percentage = new_percentage

## Remove non-English sentences
Use the package `langdetect` to identify and remove non-English sentences from the training set. Takes the `news.2013.en.clean` file as input and produces `news.2013.en.english`.

In [48]:
def remove_non_english():
    
    from langdetect import detect_langs
    import string
    
    if (os.path.isfile(NEWS_FILE_NAME_ENGLISH)):
        
        print("Non-English already removed from data file.")
        
    else:
        
        print("Removing non-English from data file:")
        num_lines = 0
        non_english = 0
        
        with open(NEWS_FILE_NAME_ENGLISH, "wb") as output_file:
            for line in open(NEWS_FILE_NAME_CLEAN, encoding="utf8"):
                
                num_lines += 1
                
                # detect language
                try: # detect_langs() will throw an error is text is a URL, email or something else 'undetectable'
                    langs = detect_langs(line)
                except:
                    langs = "" # remove line is detect_langs() throws an error
                    print("Error at line {:,} with {}".format(num_lines, line))
                    
                # write the line if it is English
                if ("en" in str(langs)): # if the sentence has any chance of being English...    
                    output_file.write(line.encode("utf8")) # ... write to the English file
                else:
                    non_english += 1
                    
                # periodically update progress
                if (num_lines % 100000 == 0):
                    print("{0:10,d}".format(num_lines), ": ", line, end="")
        
        # report results of process
        print("Finshed extracting non-English lines.")
        print("There were {} non-English lines among a total of {} lines.".format(non_english, num_lines))

## Reduce data set to most popular words
Maintain only sentences that contain the most popular words. Takes the `news.2013.en.clean` file as input and produces `news.2013.en.popular`.

In [49]:
def popular_words():

    from collections import Counter
    import string

    with open(NEWS_FILE_NAME, encoding="utf8") as f:
        wordcount = Counter(f.read().lower().translate(str.maketrans('','',string.punctuation))
                            .translate(str.maketrans('','','1234567890')).split())

    common_words = [word for word, _ in wordcount.most_common(50000)]

    i = 0
    for line in open(NEWS_FILE_NAME):
        if (i < 100):
            print(line, end="")
            print(set(line.lower().translate(str.maketrans('','',string.punctuation)).split()))
            print(set(line.lower().translate(str.maketrans('','',string.punctuation)).split()) - set(common_words))
            print()
        i += 1

    print("There are {:,} words in the vocabulary.".format(len(wordcount)))
    # print(common_words)

## Analyze the charaters

Get counts of all of the characters and select the top ones for processing and filter only sentences with the right charcters. Eliminate any sentences that are too small or too long.

Takes `news.2013.en.english` as input and produces `news.2013.en.filtered` and `news.2013.en.char_frequency.json`.

In [50]:
def analyze_characters():
    
    from collections import Counter
    import json

    NUMBER_OF_CHARS = 75 # Quantity of most popular characters to keep. Was 100 in original code.
    CHAR_FREQUENCY_FILE_NAME = os.path.join(os.path.expanduser("data"), "news.2013.en.char_frequency.json")

    # create character frequency file
    if (os.path.isfile(CHAR_FREQUENCY_FILE_NAME)):
        
        print("Character frequency file already created.")
        
    else:
        
        print("Creating character frequency file.")
        counter = Counter()
        print("Reading data file:")
        for line in open(NEWS_FILE_NAME_ENGLISH, encoding="utf8"):
            counter.update(line)
        print("Done. Writing to file:")
        with open(CHAR_FREQUENCY_FILE_NAME, 'wb') as output_file:
            output_file.write(json.dumps(counter).encode("utf-8"))
        most_popular_chars = {key for key, _value in counter.most_common(NUMBER_OF_CHARS)}

    # Read top characters that were saved to file
    chars = json.loads(open(CHAR_FREQUENCY_FILE_NAME).read())
    counter = Counter(chars)
    most_popular_chars = {key for key, _value in counter.most_common(NUMBER_OF_CHARS)}
    print("The top {} chars are:".format(NUMBER_OF_CHARS))
    print("".join(sorted(most_popular_chars)))

    # Filter only sentences with the right chars
    if (os.path.isfile(NEWS_FILE_NAME_FILTERED)):
        
        print("\nFiltered file already created.")
        
    else:
        
        print("\nReading and filtering data:")
        num_lines = 0
        with open(NEWS_FILE_NAME_FILTERED, "wb") as output_file:
            for line in open(NEWS_FILE_NAME_ENGLISH, encoding="utf8"):
                if line and (not bool(set(line) - most_popular_chars)) and (MAX_INPUT_LEN >= len(line) > MIN_INPUT_LEN):
                    output_file.write(line.encode("utf8"))
                    num_lines += 1
                    if (num_lines % 1000000 == 0):
                        print("{0:10,d}".format(num_lines), ": ", line, end="")
                        
        print("Done. Filtered file contains {:,} lines.".format(num_lines))

## Split the data into training and validation sets
Takes `news.2013.en.filtered` as input and produces `news.2013.en.train` and `news.2013.en.validate`.

In [51]:
def split_data():
    
    from numpy.random import shuffle as random_shuffle, seed as random_seed
    
    random_seed(123) # Reproducibility

    if (os.path.isfile(NEWS_FILE_NAME_TRAIN)):
        
        print("Training and Validation files already created.")
        
    else:
        
        answers = open(NEWS_FILE_NAME_FILTERED, encoding="utf8").read().split("\n")
        print('shuffle', end=" ")
        random_shuffle(answers)
        print("Done")
        # Explicitly set apart 10% for validation data that we never train over
        # TODO skip if files already exist
        split_at = len(answers) - len(answers) // 50 # 50 = leave 2% for validation
        with open(NEWS_FILE_NAME_TRAIN, "wb") as output_file:
            output_file.write("\n".join(answers[:split_at]).encode('utf-8'))
        with open(NEWS_FILE_NAME_VALIDATE, "wb") as output_file:
            output_file.write("\n".join(answers[split_at:]).encode('utf-8'))
        print("\nTraining and Validation files written.")

## Load the target data and generate source data by injecting mistakes

In [52]:
def add_noise_to_string(a_string, amount_of_noise): # Add artificial spelling mistakes to string    
    
    from numpy.random import choice as random_choice, randint as random_randint, rand

    CHARS = list("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ .")

    if rand() < amount_of_noise * len(a_string):
        # Replace a character with a random character
        random_char_position = random_randint(len(a_string))
        a_string = a_string[:random_char_position] + random_choice(CHARS[:-1]) + a_string[random_char_position + 1:]
    if rand() < amount_of_noise * len(a_string):
        # Delete a character
        random_char_position = random_randint(len(a_string))
        a_string = a_string[:random_char_position] + a_string[random_char_position + 1:]
    if len(a_string) < MAX_INPUT_LEN and rand() < amount_of_noise * len(a_string):
        # Add a random character
        random_char_position = random_randint(len(a_string))
        a_string = a_string[:random_char_position] + random_choice(CHARS[:-1]) + a_string[random_char_position:]
    if rand() < amount_of_noise * len(a_string):
        # Transpose 2 characters
        random_char_position = random_randint(len(a_string) - 1)
        a_string = (a_string[:random_char_position] + a_string[random_char_position + 1] + 
                    a_string[random_char_position] + a_string[random_char_position + 2:])
    return a_string

In [53]:
def load_big_data():

    AMOUNT_OF_NOISE = 0.2 / MAX_INPUT_LEN

    #TODO save file with source_sentences so no need to recompute
    target_sentences = open(NEWS_FILE_NAME_TRAIN, encoding="utf8").read().split("\n")    
    source_sentences = open(NEWS_FILE_NAME_TRAIN, encoding="utf8").read().split("\n")
    for i in range(len(source_sentences)):
        source_sentences[i] = add_noise_to_string(source_sentences[i], AMOUNT_OF_NOISE)

    print('\nFirst 10 sentence:')
    for i in range (0, 10):
        print("\nSource --> " + source_sentences[i])
        print("Target --> " + target_sentences[i])
        
    # Datasets
    # Take a look at the initial source of target datasets
    print("\nThe source is comprised of {:,} sentences. Here are the first 10.".format(len(source_sentences)))
    print("\n".join(source_sentences[:10]))
    
    print("\nThe target is comprised of {:,} sentences. Here are the first 10.".format(len(target_sentences)))
    print("\n".join(target_sentences[:10]))
    
    return source_sentences, target_sentences

## Run this cell to get the tiny data set
**Start here if you are going to run with the small dataset** If you are using the big dataset, make sure to skip this.

In [54]:
# Run this cell to grab the small data sets that came with this model. Otherwise skip it.
# The dataset lives in the /data/ folder. At the moment, it is made up of the following files:
# letters_source.txt: The list of input letter sequences. Each sequence is its own line. 
# letters_target.txt: The list of target sequences we'll use in the training process.
# Each sequence here is a response to the input sequence in letters_source.txt with the same line number.

def load_small_data():
    
    import helper

    source_path = 'data/letters_source.txt'
    target_path = 'data/letters_target.txt'

    source_sentences = helper.load_data(source_path).split('\n') # added .split('\n) to be consistent with big data
    target_sentences = helper.load_data(target_path).split('\n')

    # source_sentences contains the entire input sequence file as text delimited by newline symbols.
    print("Source: {}".format(source_sentences[:10]))
    # target_sentences contains the entire output sequence file as text delimited by newline symbols.
    # Each line corresponds to the line from source_sentences. target_sentences contains sorted characters of the line.
    print("Target: {}".format(target_sentences[:10]))

    print("\nThe source is comprised of {:,} sentences.".format(len(source_sentences)))
    
    return source_sentences, target_sentences

# Load the Data - Big or Small
If the command line switch "small" is set then load the small data. Otherwise load the big data.

In [55]:
if (small):
    print("Load up the small data.")
    source_sentences, target_sentences = load_small_data()
else:
    print("Load up the big data.")
    download_raw_datafile()
    clean_data()
    remove_non_english()
    analyze_characters()
    split_data()
    source_sentences, target_sentences = load_big_data()

Load up the big data.
Local copy of compressed data file is up to date.
Data file is already uncompressed.
Data file is already clean.
Non-English already removed from data file.
Creating character frequency file.
Reading data file:
Done. Writing to file:
The top 75 chars are:

 "$'(),-./0123456789:;?ABCDEFGHIJKLMNOPQRSTUVWYabcdefghijklmnopqrstuvwxyz£

Reading and filtering data:
 1,000,000 :  Matt Kvesic (Worcester)
 2,000,000 :  That is the way it is and we can't get away from that.
 3,000,000 :  It's an incredibly exciting moment.
 4,000,000 :  Parents can start with a few (non-judgmental) questions.
Done. Filtered file contains 4,246,705 lines.
shuffle Done

Training and Validation files written.

First 10 sentence:

Source --> Two sets of twins is mor ethan enough for anyone.
Target --> Two sets of twins is more than enough for anyone.

Source --> 'Customer Service', one star, reviewed on April 21, 2013
Target --> 'Customer Service', one star, reviewed on April 21, 2013

Source --

## Preprocess
To do anything useful with it, turn the each string into a list of characters. Then convert the characters to their int values as declared in the vocabulary.

In [56]:
import json

# Define global variables
source_int_to_letter = []
target_int_to_letter = []
source_letter_to_int = []
target_letter_to_int = []

def extract_character_vocab(data):
    special_words = ['<PAD>', '<UNK>', '<GO>',  '<EOS>']

    #set_words = set([character for line in data.split('\n') for character in line])
    set_words = set([character for line in data for character in line])
    int_to_vocab = {word_i: word for word_i, word in enumerate(special_words + list(set_words))}
    vocab_to_int = {word: word_i for word_i, word in int_to_vocab.items()}

    return int_to_vocab, vocab_to_int
    
def load_int_letter_translations(source, target):
    
    global source_int_to_letter, target_int_to_letter, source_letter_to_int, target_letter_to_int
    
    # Check to see if conversion files have already been created
    if (os.path.isfile(SOURCE_INT_TO_LETTER)):

        print()
        # Load up all of the conversion files
        with open(SOURCE_INT_TO_LETTER, 'r') as file:
            try:
                source_int_to_letter = json.load(file)
                print("Read {} data from file.".format(SOURCE_INT_TO_LETTER))
            except ValueError: # if the file is empty the ValueError will be thrown
                data = {}
        source_int_to_letter = {int(k):v for k,v in source_int_to_letter.items()}
        with open(TARGET_INT_TO_LETTER, 'r') as file:
            try:
                target_int_to_letter = json.load(file)
                print("Read {} data from file.".format(TARGET_INT_TO_LETTER))
            except ValueError: # if the file is empty the ValueError will be thrown
                data = {}
        target_int_to_letter = {int(k):v for k,v in target_int_to_letter.items()}
        with open(SOURCE_LETTER_TO_INT, 'r') as file:
            try:
                source_letter_to_int = json.load(file)
                print("Read {} data from file.".format(SOURCE_LETTER_TO_INT))
            except ValueError: # if the file is empty the ValueError will be thrown
                data = {}
        source_letter_to_int = {k:int(v) for k,v in source_letter_to_int.items()}
        with open(TARGET_LETTER_TO_INT, 'r') as file:
            try:
                target_letter_to_int = json.load(file)
                print("Read {} data from file.".format(TARGET_LETTER_TO_INT))
            except ValueError: # if the file is empty the ValueError will be thrown
                data = {}
        target_letter_to_int = {k:int(v) for k,v in target_letter_to_int.items()}

    else:

        # Build int2letter and letter2int dicts
        source_int_to_letter, source_letter_to_int = extract_character_vocab(source)
        target_int_to_letter, target_letter_to_int = extract_character_vocab(target)
        print("Source INT to letter: {}".format(source_int_to_letter))
        print("Target INT to letter: {}\n".format(target_int_to_letter))

        # Save source_int_to_letter, target_int_to_letter & source_letter_to_int for loading later after graph is saved
        with open(SOURCE_INT_TO_LETTER, 'w') as output_file:
            json.dump(source_int_to_letter, output_file)
        print("Wrote {} data to file.".format(SOURCE_INT_TO_LETTER))
        with open(TARGET_INT_TO_LETTER, 'w') as output_file:
            json.dump(target_int_to_letter, output_file)
        print("Wrote {} data to file.".format(TARGET_INT_TO_LETTER))
        with open(SOURCE_LETTER_TO_INT, 'w') as output_file:
            json.dump(source_letter_to_int, output_file)
        print("Wrote {} data to file.".format(SOURCE_LETTER_TO_INT))
        with open(TARGET_LETTER_TO_INT, 'w') as output_file:
            json.dump(target_letter_to_int, output_file)
        print("Wrote {} data to file.".format(TARGET_LETTER_TO_INT))

def produce_letter_ids(source, target):
    
    if (not source_int_to_letter):
        load_int_letter_translations(source, target)
    
    # Convert characters to ids
    source_ids = [[source_letter_to_int.get(letter, source_letter_to_int['<UNK>']) for letter in line] \
                         for line in source]
    target_ids = [[target_letter_to_int.get(letter, target_letter_to_int['<UNK>']) for letter in line] \
                         + [target_letter_to_int['<EOS>']] for line in target]
    
    return source_ids, target_ids

In [57]:
# Convert source and target sentences into IDs
source_letter_ids, target_letter_ids = produce_letter_ids(source_sentences, target_sentences)

print("\nExample source sequences")
print(source_letter_ids[:3])
print("\nExample target sequences")
print(target_letter_ids[:3])
print()

Source INT to letter: {0: '<PAD>', 1: '<UNK>', 2: '<GO>', 3: '<EOS>', 4: 'o', 5: "'", 6: '-', 7: 'G', 8: 'A', 9: 'c', 10: '5', 11: 'E', 12: '8', 13: 'Y', 14: 'v', 15: 'W', 16: 'F', 17: 'e', 18: 'r', 19: 'y', 20: 'O', 21: 's', 22: 'D', 23: '7', 24: 'x', 25: ' ', 26: 'X', 27: 'P', 28: 'p', 29: 'd', 30: '?', 31: 'K', 32: '£', 33: 'q', 34: 'w', 35: '2', 36: 'i', 37: 'I', 38: 'J', 39: ':', 40: 'B', 41: ';', 42: 'g', 43: 'V', 44: 't', 45: 'k', 46: 'h', 47: 'N', 48: 'a', 49: 'n', 50: ',', 51: '9', 52: 'j', 53: '(', 54: 'f', 55: 'T', 56: 'l', 57: '0', 58: 'b', 59: '3', 60: '"', 61: 'R', 62: 'Q', 63: 'u', 64: 'H', 65: '1', 66: 'm', 67: 'S', 68: 'M', 69: '/', 70: 'Z', 71: 'L', 72: '.', 73: '4', 74: 'C', 75: 'U', 76: 'z', 77: ')', 78: '$', 79: '6'}
Target INT to letter: {0: '<PAD>', 1: '<UNK>', 2: '<GO>', 3: '<EOS>', 4: 'o', 5: "'", 6: '-', 7: 'G', 8: 'A', 9: 'c', 10: '5', 11: 'E', 12: '8', 13: 'Y', 14: 'v', 15: 'W', 16: 'F', 17: 'e', 18: 'r', 19: 'y', 20: 'O', 21: 's', 22: 'D', 23: '7', 24: 'x',

In [58]:
print('\nFirst 10 sentence:')
for i in range (0, 10):
    print("\nSource --> {}".format(source_letter_ids[i]))
    print("Target --> {}".format(target_letter_ids[i]))


First 10 sentence:

Source --> [55, 34, 4, 25, 21, 17, 44, 21, 25, 4, 54, 25, 44, 34, 36, 49, 21, 25, 36, 21, 25, 66, 4, 18, 25, 17, 44, 46, 48, 49, 25, 17, 49, 4, 63, 42, 46, 25, 54, 4, 18, 25, 48, 49, 19, 4, 49, 17, 72]
Target --> [54, 33, 4, 25, 21, 17, 43, 21, 25, 4, 53, 25, 43, 33, 35, 48, 21, 25, 35, 21, 25, 65, 4, 18, 17, 25, 43, 45, 47, 48, 25, 17, 48, 4, 62, 41, 45, 25, 53, 4, 18, 25, 47, 48, 19, 4, 48, 17, 70, 3]

Source --> [5, 74, 63, 21, 44, 4, 66, 17, 18, 25, 67, 17, 18, 14, 36, 9, 17, 5, 50, 25, 4, 49, 17, 25, 21, 44, 48, 18, 50, 25, 18, 17, 14, 36, 17, 34, 17, 29, 25, 4, 49, 25, 8, 28, 18, 36, 56, 25, 35, 65, 50, 25, 35, 57, 65, 59]
Target --> [5, 72, 62, 21, 43, 4, 65, 17, 18, 25, 66, 17, 18, 14, 35, 9, 17, 5, 49, 25, 4, 48, 17, 25, 21, 43, 47, 18, 49, 25, 18, 17, 14, 35, 17, 33, 17, 28, 25, 4, 48, 25, 8, 27, 18, 35, 55, 25, 34, 64, 49, 25, 34, 56, 64, 58, 3]

Source --> [16, 4, 63, 49, 29, 17, 18, 50, 67, 25, 4, 63, 44, 46, 25, 22, 17, 14, 4, 49, 25, 67, 17, 48, 58, 

## Character Sequence to Sequence Model
This model was updated to work with TensorFlow 1.1 and builds on the work of Dave Currie. Check out Dave's post [Text Summarization with Amazon Reviews](https://medium.com/towards-data-science/text-summarization-with-amazon-reviews-41801c2210b).
<img src="images/sequence-to-sequence.jpg"/>
#### Check the Version of TensorFlow and wether or not there's a GPU

In [59]:
import tensorflow as tf

# Check TensorFlow Version
print('TensorFlow Version: {}'.format(tf.__version__))

# Check for a GPU
if not tf.test.gpu_device_name():
    print('No GPU found. Please use a GPU to train your neural network.')
else:
    print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))

TensorFlow Version: 1.3.0
Default GPU Device: /gpu:0


### Hyperparameters

In [60]:
if (len(source_sentences) > 10000):
    
    # Using the big data (i.e. one billion word benchmark)
    print("Using hyperparameters for the big data with {:,} source sentences.".format(len(source_sentences)))
    epochs = 4       # Number of Epochs
    batch_size = 128 # Batch Size

    rnn_size = 512   # RNN Size
    num_layers = 2   # Number of Layers
    encoding_embedding_size = 512 # Encoding embedding Size
    decoding_embedding_size = 512 # Decoding embedding Size
    keep_probability = 0.7 # keep probability

    learning_rate = 0.001 # Learning Rate
    
else:
    
    # Using the small data (i.e. 10k source sentences)
    print("Using hyperparameters for the small data with {:,} source sentences.".format(len(source_sentences)))
    epochs = 60 # Number of Epochs (normally 60 but reduced to test retraining model)
    batch_size = 128 # Batch Size
    rnn_size = 50 # RNN Size    
    num_layers = 2 # Number of Layers    
    encoding_embedding_size = 15 # Embedding Size
    decoding_embedding_size = 15 # Embedding Size
    keep_probability = 0.7 # keep probability
    learning_rate = 0.001 # Learning Rate

def get_hyperparameters_message():
    message  = "Batch size: {}\n".format(batch_size)
    message += "RNN size  : {}\n".format(rnn_size)
    message += "Num layers: {}\n".format(num_layers)
    message += "Enc. size : {}\n".format(encoding_embedding_size)
    message += "Dec. size : {}\n".format(decoding_embedding_size)
    message += "Keep prob.: {}\n".format(keep_probability)
    message += "Learn rate: {}\n\n".format(learning_rate)
    return message

# Write batch_size to file for loading after graph has been saved
with open(GRAPH_PARAMETERS, 'w') as file:
  file.write('%d' % batch_size)

Using hyperparameters for the big data with 4,161,772 source sentences.


### Input

In [61]:
def get_model_inputs():
    input_data = tf.placeholder(tf.int32, [None, None], name='input')
    targets = tf.placeholder(tf.int32, [None, None], name='targets')
    keep_probability = tf.placeholder(tf.float32,name='keep_prob')
    lr = tf.placeholder(tf.float32, name='learning_rate')

    target_sequence_length = tf.placeholder(tf.int32, (None,), name='target_sequence_length')
    max_target_sequence_length = tf.reduce_max(target_sequence_length, name='max_target_len')
    source_sequence_length = tf.placeholder(tf.int32, (None,), name='source_sequence_length')
    
    return input_data, targets, keep_probability, lr, target_sequence_length, max_target_sequence_length, source_sequence_length

### Sequence to Sequence Model

We can now start defining the functions that will build the seq2seq model. We are building it from the bottom up with the following components:

    2.1 Encoder
        - Embedding
        - Encoder cell
    2.2 Decoder
        1- Process decoder inputs
        2- Set up the decoder
            - Embedding
            - Decoder cell
            - Dense output layer
            - Training decoder
            - Inference decoder
    2.3 Seq2seq model connecting the encoder and decoder
    2.4 Build the training graph hooking up the model with the 
        optimizer

### 2.1 Encoder

The first bit of the model we'll build is the encoder. Here, we'll embed the input data, construct our encoder, then pass the embedded data to the encoder.

- Embed the input data using [`tf.contrib.layers.embed_sequence`](https://www.tensorflow.org/api_docs/python/tf/contrib/layers/embed_sequence)
<img src="images/embed_sequence.png" />

- Pass the embedded input into a stack of RNNs.  Save the RNN state and ignore the output.
<img src="images/encoder.png" />

In [62]:
def encoding_layer(input_data, rnn_size, num_layers, keep_prob, source_sequence_length, source_vocab_size, 
                   encoding_embedding_size):

    # Encoder embedding
    enc_embed_input = tf.contrib.layers.embed_sequence(input_data, source_vocab_size, encoding_embedding_size)

    # RNN cell
    def make_cell(rnn_size):
        enc_cell = tf.contrib.rnn.LSTMCell(rnn_size, initializer=tf.contrib.layers.variance_scaling_initializer(seed=2))
        enc_cell = tf.contrib.rnn.DropoutWrapper(enc_cell, output_keep_prob=keep_prob)
        return enc_cell

    enc_cell = tf.contrib.rnn.MultiRNNCell([make_cell(rnn_size) for _ in range(num_layers)])
    
    enc_output, enc_state = tf.nn.dynamic_rnn(enc_cell, enc_embed_input, 
                                              sequence_length=source_sequence_length, dtype=tf.float32)
    
    return enc_output, enc_state

## 2.2 Decoder

The decoder is probably the most involved part of this model. The following steps are needed to create it:

    1- Process decoder inputs
    2- Set up the decoder components
        - Embedding
        - Decoder cell
        - Dense output layer
        - Training decoder
        - Inference decoder


### Process Decoder Input


In the training process, the target sequences will be used in two different places:

 1. Using them to calculate the loss
 2. Feeding them to the decoder during training to make the model more robust.

Now we need to address the second point. Let's assume our targets look like this in their letter/word form (we're doing this for readibility. At this point in the code, these sequences would be in int form):


<img src="images/targets_1.png"/>

We need to do a simple transformation on the tensor before feeding it to the decoder:

1- We will feed an item of the sequence to the decoder at each time step. Think about the last timestep -- where the decoder outputs the final word in its output. The input to that step is the item before last from the target sequence. The decoder has no use for the last item in the target sequence in this scenario. So we'll need to remove the last item. 

We do that using tensorflow's tf.strided_slice() method. We hand it the tensor, and the index of where to start and where to end the cutting.

<img src="images/strided_slice_1.png"/>

2- The first item in each sequence we feed to the decoder has to be GO symbol. So We'll add that to the beginning.


<img src="images/targets_add_go.png"/>


Now the tensor is ready to be fed to the decoder. It looks like this (if we convert from ints to letters/symbols):

<img src="images/targets_after_processing_1.png"/>

In [63]:
# Process the input we'll feed to the decoder
def process_decoder_input(target_data, vocab_to_int, batch_size):
    '''Remove the last word id from each batch and concat the <GO> to the begining of each batch'''
    ending = tf.strided_slice(target_data, [0, 0], [batch_size, -1], [1, 1])
    dec_input = tf.concat([tf.fill([batch_size, 1], vocab_to_int['<GO>']), ending], 1)

    return dec_input


### Set up the decoder components

        - Embedding
        - Decoder cell
        - Dense output layer
        - Training decoder
        - Inference decoder

#### 1- Embedding
Now that we have prepared the inputs to the training decoder, we need to embed them so they can be ready to be passed to the decoder. 

We'll create an embedding matrix like the following then have tf.nn.embedding_lookup convert our input to its embedded equivalent:
<img src="images/embeddings.png" />

#### 2- Decoder Cell
Then we declare our decoder cell. Just like the encoder, we'll use an tf.contrib.rnn.LSTMCell here as well.

We need to declare a decoder for the training process, and a decoder for the inference/prediction process. These two decoders will share their parameters (so that all the weights and biases that are set during the training phase can be used when we deploy the model).

First, we'll need to define the type of cell we'll be using for our decoder RNNs. We opted for LSTM.

#### 3- Dense output layer
Before we move to declaring our decoders, we'll need to create the output layer, which will be a tensorflow.python.layers.core.Dense layer that translates the outputs of the decoder to logits that tell us which element of the decoder vocabulary the decoder is choosing to output at each time step.

#### 4- Training decoder
Essentially, we'll be creating two decoders which share their parameters. One for training and one for inference. The two are similar in that both created using tf.contrib.seq2seq.**BasicDecoder** and tf.contrib.seq2seq.**dynamic_decode**. They differ, however, in that we feed the the target sequences as inputs to the training decoder at each time step to make it more robust.

We can think of the training decoder as looking like this (except that it works with sequences in batches):
<img src="images/sequence-to-sequence-training-decoder.png"/>

The training decoder **does not** feed the output of each time step to the next. Rather, the inputs to the decoder time steps are the target sequence from the training dataset (the orange letters).

#### 5- Inference decoder
The inference decoder is the one we'll use when we deploy our model to the wild.

<img src="images/sequence-to-sequence-inference-decoder.png"/>

We'll hand our encoder hidden state to both the training and inference decoders and have it process its output. TensorFlow handles most of the logic for us. We just have to use the appropriate methods from tf.contrib.seq2seq and supply them with the appropriate inputs.


In [64]:
def decoding_layer(target_letter_to_int, decoding_embedding_size, num_layers, rnn_size, keep_prob,
                   target_sequence_length, max_target_sequence_length, enc_state, dec_input):
    
    # 1. Decoder Embedding
    target_vocab_size = len(target_letter_to_int)
    dec_embeddings = tf.Variable(tf.random_uniform([target_vocab_size, decoding_embedding_size]))
    dec_embed_input = tf.nn.embedding_lookup(dec_embeddings, dec_input)

    # 2. Construct the decoder cell
    def make_cell(rnn_size):
        dec_cell = tf.contrib.rnn.LSTMCell(rnn_size,
                                           initializer=tf.contrib.layers.variance_scaling_initializer(seed=2))
        dec_cell = tf.contrib.rnn.DropoutWrapper(dec_cell, output_keep_prob=keep_prob)
        return dec_cell

    dec_cell = tf.contrib.rnn.MultiRNNCell([make_cell(rnn_size) for _ in range(num_layers)])
     
    # 3. Dense layer to translate the decoder's output at each time 
    # step into a choice from the target vocabulary
    output_layer = Dense(target_vocab_size,
                         kernel_initializer = tf.contrib.layers.variance_scaling_initializer(seed=2))

    # 4. Set up a training decoder and an inference decoder
    # Training Decoder
    with tf.variable_scope("decode"):

        # Helper for the training process. Used by BasicDecoder to read inputs.
        training_helper = tf.contrib.seq2seq.TrainingHelper(inputs=dec_embed_input,
                                                            sequence_length=target_sequence_length,
                                                            time_major=False)

        # Basic decoder
        training_decoder = tf.contrib.seq2seq.BasicDecoder(dec_cell, training_helper, enc_state, output_layer) 
        
        # Perform dynamic decoding using the decoder
        training_decoder_output = tf.contrib.seq2seq.dynamic_decode(training_decoder, impute_finished=True, 
                                                                    maximum_iterations=max_target_sequence_length)[0]

    # 5. Inference Decoder
    # Reuses the same parameters trained by the training process
    with tf.variable_scope("decode", reuse=True):
        start_tokens = tf.tile(tf.constant([target_letter_to_int['<GO>']], dtype=tf.int32), 
                               [batch_size], name='start_tokens')

        # Helper for the inference process.
        inference_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(dec_embeddings, 
                                                                    start_tokens, 
                                                                    target_letter_to_int['<EOS>'])

        # Basic decoder
        inference_decoder = tf.contrib.seq2seq.BasicDecoder(dec_cell, inference_helper, enc_state, output_layer)
        
        # Perform dynamic decoding using the decoder
        inference_decoder_output = tf.contrib.seq2seq.dynamic_decode(inference_decoder,
                                                            impute_finished=True,
                                                            maximum_iterations=max_target_sequence_length)[0]

    return training_decoder_output, inference_decoder_output

## 2.3 Seq2seq model 
Let's now go a step above, and hook up the encoder and decoder using the methods we just declared

In [65]:
def seq2seq_model(input_data, targets, lr, target_sequence_length, max_target_sequence_length, source_sequence_length,
                  source_vocab_size, target_vocab_size, enc_embedding_size, dec_embedding_size, rnn_size, num_layers, 
                  keep_prob):
    
    # Pass the input data through the encoder. We'll ignore the encoder output, but use the state
    _, enc_state = encoding_layer(input_data, 
                                  rnn_size, 
                                  num_layers,
                                  keep_prob,
                                  source_sequence_length,
                                  source_vocab_size, 
                                  encoding_embedding_size)

    # Prepare the target sequences we'll feed to the decoder in training mode
    dec_input = process_decoder_input(targets, target_letter_to_int, batch_size)
    
    # Pass encoder state and decoder inputs to the decoders
    training_decoder_output, inference_decoder_output = decoding_layer(target_letter_to_int, 
                                                                       decoding_embedding_size, 
                                                                       num_layers, 
                                                                       rnn_size,
                                                                       keep_prob,
                                                                       target_sequence_length,
                                                                       max_target_sequence_length,
                                                                       enc_state, 
                                                                       dec_input) 
    
    return training_decoder_output, inference_decoder_output

Model outputs *training_decoder_output* and *inference_decoder_output* both contain a 'rnn_output' logits tensor that looks like this:

<img src="images/logits.png"/>

The logits we get from the training tensor we'll pass to tf.contrib.seq2seq.**sequence_loss()** to calculate the loss and ultimately the gradient.




In [66]:
from tensorflow.python.layers.core import Dense

# Build the graph
train_graph = tf.Graph()
# Set the graph to default to ensure that it is ready for training
with train_graph.as_default():
    
    # Load the model inputs    
    input_data, targets, keep_prob, lr, target_sequence_length, max_target_sequence_length, source_sequence_length \
    = get_model_inputs()
    
    # Create the training and inference logits
    training_decoder_output, inference_decoder_output = seq2seq_model(input_data, 
                                                                      targets, 
                                                                      lr, 
                                                                      target_sequence_length, 
                                                                      max_target_sequence_length, 
                                                                      source_sequence_length,
                                                                      len(source_letter_to_int),
                                                                      len(target_letter_to_int),
                                                                      encoding_embedding_size, 
                                                                      decoding_embedding_size, 
                                                                      rnn_size, 
                                                                      num_layers,
                                                                      keep_prob)    
    
    # Create tensors for the training logits and inference logits
    training_logits = tf.identity(training_decoder_output.rnn_output, 'logits')
    inference_logits = tf.identity(inference_decoder_output.sample_id, name='predictions')
    
    # Create the weights for sequence_loss
    masks = tf.sequence_mask(target_sequence_length, max_target_sequence_length, dtype=tf.float32, name='masks')

    with tf.name_scope("optimization"):

        # Loss function
        cost = tf.contrib.seq2seq.sequence_loss(training_logits, targets, masks)

        # Optimizer
        optimizer = tf.train.AdamOptimizer(lr)

        # Gradient Clipping
        gradients = optimizer.compute_gradients(cost)
        capped_gradients = [(tf.clip_by_value(grad, -5., 5.), var) for grad, var in gradients if grad is not None]
        train_op = optimizer.apply_gradients(capped_gradients)
        
        # Add variables to collection in order to load them up when retraining a saved graph
        tf.add_to_collection("cost", cost)
        tf.add_to_collection("train_op", train_op)

## Get Batches

There's little processing involved when we retreive the batches. This is a simple example assuming batch_size = 2

Target sequences (it's actually in int form, we're showing the characters for clarity):

<img src="images/source_batch.png" />

Source sequences (also in int, but showing letters for clarity):

<img src="images/target_batch.png" />

In [67]:
import numpy as np

def pad_sentence_batch(sentence_batch, pad_int):
    """Pad sentences with <PAD> so that each sentence of a batch has the same length"""
    max_sentence = max([len(sentence) for sentence in sentence_batch])
    return [sentence + [pad_int] * (max_sentence - len(sentence)) for sentence in sentence_batch]

def get_batches(targets, sources, batch_size, source_pad_int, target_pad_int):
    """Batch targets, sources, and the lengths of their sentences together"""
    for batch_i in range(0, len(sources)//batch_size):
        start_i = batch_i * batch_size
        sources_batch = sources[start_i:start_i + batch_size]
        targets_batch = targets[start_i:start_i + batch_size]
        pad_sources_batch = np.array(pad_sentence_batch(sources_batch, source_pad_int))
        pad_targets_batch = np.array(pad_sentence_batch(targets_batch, target_pad_int))
        
        # Need the lengths for the _lengths parameters
        pad_targets_lengths = []
        for target in pad_targets_batch:
            pad_targets_lengths.append(len(target))
        
        pad_source_lengths = []
        for source in pad_sources_batch:
            pad_source_lengths.append(len(source))
        
        yield pad_targets_batch, pad_sources_batch, pad_targets_lengths, pad_source_lengths

## Training function
We're now ready to train our model. If you run into OOM (out of memory) issues during training, try to decrease the batch_size.

In [68]:
# Split data to training and validation sets
train_source = source_letter_ids[batch_size:]
train_target = target_letter_ids[batch_size:]
valid_source = source_letter_ids[:batch_size]
valid_target = target_letter_ids[:batch_size]
(valid_targets_batch, valid_sources_batch, valid_targets_lengths, valid_sources_lengths) \
= next(get_batches(valid_target, valid_source, batch_size, 
                   source_letter_to_int['<PAD>'], target_letter_to_int['<PAD>']))

if (len(source_sentences) > 10000):
    display_step = 100 # Check training loss after each of this many batches with large data
else:
    display_step = 20 # Check training loss after each of this many batches with small data

def train(epoch_i):
    
    global train_graph, train_op, cost, input_data, targets, lr
    global source_sequence_length, target_sequence_length, keep_prob
    
    # Test to see if graph already exists
    if os.path.exists(checkpoint + ".meta"):
        print("Reloading existing graph to continue training.")
        reloading = True    
        train_graph = tf.Graph()
    else:
        print("Starting with new graph.")
        reloading = False
        with train_graph.as_default():
            saver = tf.train.Saver()
    
    with tf.Session(graph=train_graph) as sess:    

        if reloading:
            saver = tf.train.import_meta_graph(checkpoint + '.meta')
            saver.restore(sess, checkpoint) 

            # Restore variables
            input_data = train_graph.get_tensor_by_name('input:0')
            targets = train_graph.get_tensor_by_name('targets:0')
            lr = train_graph.get_tensor_by_name('learning_rate:0')
            source_sequence_length = train_graph.get_tensor_by_name('source_sequence_length:0')
            target_sequence_length = train_graph.get_tensor_by_name('target_sequence_length:0')
            keep_prob = train_graph.get_tensor_by_name('keep_prob:0')

            # Grab the optimizer variables that were added to the collection during build
            cost = tf.get_collection("cost")[0]
            train_op = tf.get_collection("train_op")[0]

        else:
            sess.run(tf.global_variables_initializer())

        message = "" # Clear message to be sent in body of email
        
        for batch_i, (targets_batch, sources_batch, targets_lengths, sources_lengths) in enumerate(
                get_batches(train_target, train_source, batch_size,
                           source_letter_to_int['<PAD>'],
                           target_letter_to_int['<PAD>'])):

            # Training step
            _, loss = sess.run(
                [train_op, cost],
                {input_data: sources_batch,
                 targets: targets_batch,
                 lr: learning_rate,
                 target_sequence_length: targets_lengths,
                 source_sequence_length: sources_lengths,
                 keep_prob: keep_probability})

            batch = batch_i + 1 # batch_i starts at zero so batch is the batch number
            
            # Debug message updating us on the status of the training
            if (batch % display_step == 0 and batch > 0) or batch == (len(train_source) // batch_size):

                # Calculate validation cost
                validation_loss = sess.run(
                [cost],
                {input_data: valid_sources_batch,
                 targets: valid_targets_batch,
                 lr: learning_rate,
                 target_sequence_length: valid_targets_lengths,
                 source_sequence_length: valid_sources_lengths,
                 keep_prob: 1.0})

                line = 'Epoch {:>3}/{} Batch {:>6}/{} Inputs (000) {:>7} - Loss: {:>6.3f} - Validation loss: {:>6.3f}'\
                .format(epoch_i, epochs, batch, len(train_source) // batch_size, 
                        (((epoch_i - 1) * len(train_source)) + batch_i * batch_size) // 1000, loss, validation_loss[0])
                print(line)
                message += line + "\n"

        # Save model at the end of each epoch
        print("Saving graph...")
        saver.save(sess, checkpoint)
        
        return message # return message to be sent in body of email

## Prediction
**Start here to use a saved and pre-trained graph.** Load the saved graph and compute some preditions.

In [69]:
# Read batch_size from file
with open(GRAPH_PARAMETERS, 'r') as file:
    try:
        batch_size = int(file.read())
        print("Loaded batch_size = {}".format(batch_size))
    except ValueError:
        batch_size = 128
        print("Unable to load batch_size from file so using default 128.")
            
if (small):
    
    # There is no validation data for the small set, so just load up the data
    print("Load up the small data.")
    validation_source_sentences, validation_target_sentences = load_small_data()
    
else:
    
    # Load the validation set and construct the source sentences
    AMOUNT_OF_NOISE = 0.2 / MAX_INPUT_LEN

    validation_target_sentences = open(NEWS_FILE_NAME_VALIDATE, encoding="utf8").read().split("\n")
    validation_source_sentences = open(NEWS_FILE_NAME_VALIDATE, encoding="utf8").read().split("\n")
    # Reduce workload by grabbing first batches only
    # target_sentences = target_sentences[:5*batch_size]
    # source_sentences = source_sentences[:5*batch_size]
    
    # Add the random noise to the source
    for i in range(len(validation_source_sentences)):
        validation_source_sentences[i] = add_noise_to_string(validation_source_sentences[i], AMOUNT_OF_NOISE)
    
print("There are {:,d} validation sentences and {:,.0f} batches.".format(len(validation_source_sentences), 
                                                                         len(validation_source_sentences)//batch_size))
    
print('\nFirst 10 sentence:')
for i in range (0, 10):
    print("\nSource --> " + validation_source_sentences[i])
    print("Target --> " + validation_target_sentences[i])

Loaded batch_size = 128
There are 84,934 validation sentences and 663 batches.

First 10 sentence:

Source --> He's very important.
Target --> He's very important.

Source --> A mature business is a stagnant, dying business.
Target --> A mature business is a stagnant, dying business.

Source --> And tehre was no velvet ropev to keep them out.
Target --> And there was no velvet rope to keep them out.

Source --> Will stay on as Collingwood's leader: Eddie McGuire.
Target --> Will stay on as Collingwood's leader: Eddie McGuire.

Source --> The referendum on independence takes place next year.
Target --> The referendum on independence takes place next year.

Source --> Yuji Ramen Opens Campaign to Start a Restaurant
Target --> Yuji Ramen Opens Campaign to Start a Restaurant

Source --> How can prove, AustraliaO?
Target --> How can prove, Australia?

Source --> Thankfully, times have cRhanged.
Target --> Thankfully, times have changed.

Source --> SAP plans to hire aY"whole bunch" of peopl

In [70]:
# def source_to_seq(text, length):
#     '''Prepare the text for the model'''
# #     sequence_length = 7 # don't understand why set to 7
# #     sequence_length = 60
#     return [source_letter_to_int.get(word, source_letter_to_int['<UNK>']) for word in text] \
# + [source_letter_to_int['<PAD>']]*(length-len(text))

In [71]:
def get_accuracy(source_sentences, target_sentences):
    
    # Convert sentences to IDs
    source_letter_ids, target_letter_ids = produce_letter_ids(source_sentences, target_sentences)

    pad = source_letter_to_int["<PAD>"]
    eos = source_letter_to_int["<EOS>"]
    matches = 0
    total = 0
    display_step = 10

    loaded_graph = tf.Graph()
    with tf.Session(graph=loaded_graph) as sess:

        # Load saved model
        loader = tf.train.import_meta_graph(checkpoint + '.meta')
        loader.restore(sess, checkpoint)

        # Load graph variables
        input_data = loaded_graph.get_tensor_by_name('input:0')
        logits = loaded_graph.get_tensor_by_name('predictions:0')
        source_sequence_length = loaded_graph.get_tensor_by_name('source_sequence_length:0')
        target_sequence_length = loaded_graph.get_tensor_by_name('target_sequence_length:0')
        keep_prob = loaded_graph.get_tensor_by_name('keep_prob:0')

        for batch_i,(targets_batch, sources_batch, targets_lengths, sources_lengths) \
        in enumerate(get_batches(target_letter_ids, source_letter_ids, batch_size, 
                                 source_letter_to_int['<PAD>'], target_letter_to_int['<PAD>'])):

            # Multiply by batch_size to match the model's input parameters
            answer_logits = sess.run(logits, {input_data: sources_batch, 
                                              target_sequence_length: targets_lengths, 
                                              source_sequence_length: sources_lengths,
                                              keep_prob: 1.0})

            for n in range(batch_size):
                answer = "".join([target_int_to_letter[i] for i in answer_logits[n] if (i != pad and i != eos)])
                target = target_sentences[batch_i * batch_size + n]
                total += 1
                if (answer == target):
                    matches += 1

            if batch_i % display_step == 0 and batch_i > 0:
                print('Batch {:>6}/{} - Accuracy: {:.1%}'.format(batch_i, 
                                                                 len(source_sentences)//batch_size, 
                                                                 matches/total))

        print("Final accuracy = {:.1%}\n".format(matches/total))
        
        return matches/total

# Train graph by looping through epochs
Compute accuracy after each epoch and return in email

In [72]:
import time
from boto.utils import get_instance_metadata

start = time.time()
metadata = get_instance_metadata(timeout=1.0, num_retries=2)

# Run through all the epoch, computing the accuracy after each and sending the results via email
for epoch_i in range(1, epochs + 1):
    
    message = get_hyperparameters_message()
    message += train(epoch_i)

    # Print time spent training the model
    end = time.time()
    seconds = end - start
    m, s = divmod(seconds, 60)
    h, m = divmod(m, 60)
    print("Model Trained in {}h:{}m:{}s and Saved".format(int(h), int(m), int(s)))
    message += "\nModel training for {}h:{}m:{}s and saved.".format(int(h), int(m), int(s))
    
    # Get current accuracy
    accuracy = get_accuracy(validation_source_sentences, validation_target_sentences)
    message += "\nCurrent accuracy = {:.1%}".format(accuracy)
    
    # Send email updates if using AWS
    if len(metadata.keys()) > 0:
        subject = "Completed training epoch {} - Accuracy = {:.1%}".format(epoch_i, accuracy)
        if(small):
            if(epoch_i % 10 == 0): # Only send email every 10 epoch when using small data
                send_email(subject, message)
        else: # Send an email after every epoch with large data
            send_email(subject, message)
    
print("\nTraining completed.")

Starting with new graph.
Epoch   1/4 Batch    100/32512 Inputs (000)      12 - Loss:  1.846 - Validation loss:  1.781
Epoch   1/4 Batch    200/32512 Inputs (000)      25 - Loss:  1.581 - Validation loss:  1.579
Epoch   1/4 Batch    300/32512 Inputs (000)      38 - Loss:  1.508 - Validation loss:  1.478
Epoch   1/4 Batch    400/32512 Inputs (000)      51 - Loss:  1.485 - Validation loss:  1.390
Epoch   1/4 Batch    500/32512 Inputs (000)      63 - Loss:  1.407 - Validation loss:  1.327
Epoch   1/4 Batch    600/32512 Inputs (000)      76 - Loss:  1.292 - Validation loss:  1.269
Epoch   1/4 Batch    700/32512 Inputs (000)      89 - Loss:  1.201 - Validation loss:  1.231
Epoch   1/4 Batch    800/32512 Inputs (000)     102 - Loss:  1.332 - Validation loss:  1.192
Epoch   1/4 Batch    900/32512 Inputs (000)     115 - Loss:  1.279 - Validation loss:  1.155
Epoch   1/4 Batch   1000/32512 Inputs (000)     127 - Loss:  1.199 - Validation loss:  1.127
Epoch   1/4 Batch   1100/32512 Inputs (000)  

Epoch   1/4 Batch   8900/32512 Inputs (000)    1139 - Loss:  0.223 - Validation loss:  0.157
Epoch   1/4 Batch   9000/32512 Inputs (000)    1151 - Loss:  0.217 - Validation loss:  0.149
Epoch   1/4 Batch   9100/32512 Inputs (000)    1164 - Loss:  0.238 - Validation loss:  0.157
Epoch   1/4 Batch   9200/32512 Inputs (000)    1177 - Loss:  0.200 - Validation loss:  0.154
Epoch   1/4 Batch   9300/32512 Inputs (000)    1190 - Loss:  0.226 - Validation loss:  0.151
Epoch   1/4 Batch   9400/32512 Inputs (000)    1203 - Loss:  0.228 - Validation loss:  0.146
Epoch   1/4 Batch   9500/32512 Inputs (000)    1215 - Loss:  0.232 - Validation loss:  0.144
Epoch   1/4 Batch   9600/32512 Inputs (000)    1228 - Loss:  0.220 - Validation loss:  0.145
Epoch   1/4 Batch   9700/32512 Inputs (000)    1241 - Loss:  0.200 - Validation loss:  0.145
Epoch   1/4 Batch   9800/32512 Inputs (000)    1254 - Loss:  0.232 - Validation loss:  0.142
Epoch   1/4 Batch   9900/32512 Inputs (000)    1267 - Loss:  0.235 - V

Epoch   1/4 Batch  17800/32512 Inputs (000)    2278 - Loss:  0.133 - Validation loss:  0.079
Epoch   1/4 Batch  17900/32512 Inputs (000)    2291 - Loss:  0.147 - Validation loss:  0.079
Epoch   1/4 Batch  18000/32512 Inputs (000)    2303 - Loss:  0.136 - Validation loss:  0.082
Epoch   1/4 Batch  18100/32512 Inputs (000)    2316 - Loss:  0.130 - Validation loss:  0.082
Epoch   1/4 Batch  18200/32512 Inputs (000)    2329 - Loss:  0.123 - Validation loss:  0.080
Epoch   1/4 Batch  18300/32512 Inputs (000)    2342 - Loss:  0.152 - Validation loss:  0.082
Epoch   1/4 Batch  18400/32512 Inputs (000)    2355 - Loss:  0.131 - Validation loss:  0.080
Epoch   1/4 Batch  18500/32512 Inputs (000)    2367 - Loss:  0.123 - Validation loss:  0.078
Epoch   1/4 Batch  18600/32512 Inputs (000)    2380 - Loss:  0.121 - Validation loss:  0.081
Epoch   1/4 Batch  18700/32512 Inputs (000)    2393 - Loss:  0.112 - Validation loss:  0.081
Epoch   1/4 Batch  18800/32512 Inputs (000)    2406 - Loss:  0.110 - V

Epoch   1/4 Batch  26700/32512 Inputs (000)    3417 - Loss:  0.081 - Validation loss:  0.066
Epoch   1/4 Batch  26800/32512 Inputs (000)    3430 - Loss:  0.106 - Validation loss:  0.065
Epoch   1/4 Batch  26900/32512 Inputs (000)    3443 - Loss:  0.129 - Validation loss:  0.062
Epoch   1/4 Batch  27000/32512 Inputs (000)    3455 - Loss:  0.105 - Validation loss:  0.065
Epoch   1/4 Batch  27100/32512 Inputs (000)    3468 - Loss:  0.114 - Validation loss:  0.066
Epoch   1/4 Batch  27200/32512 Inputs (000)    3481 - Loss:  0.097 - Validation loss:  0.061
Epoch   1/4 Batch  27300/32512 Inputs (000)    3494 - Loss:  0.110 - Validation loss:  0.059
Epoch   1/4 Batch  27400/32512 Inputs (000)    3507 - Loss:  0.131 - Validation loss:  0.057
Epoch   1/4 Batch  27500/32512 Inputs (000)    3519 - Loss:  0.097 - Validation loss:  0.059
Epoch   1/4 Batch  27600/32512 Inputs (000)    3532 - Loss:  0.113 - Validation loss:  0.060
Epoch   1/4 Batch  27700/32512 Inputs (000)    3545 - Loss:  0.103 - V

Epoch   2/4 Batch    200/32512 Inputs (000)    4187 - Loss:  0.076 - Validation loss:  0.052
Epoch   2/4 Batch    300/32512 Inputs (000)    4199 - Loss:  0.091 - Validation loss:  0.053
Epoch   2/4 Batch    400/32512 Inputs (000)    4212 - Loss:  0.081 - Validation loss:  0.054
Epoch   2/4 Batch    500/32512 Inputs (000)    4225 - Loss:  0.087 - Validation loss:  0.055
Epoch   2/4 Batch    600/32512 Inputs (000)    4238 - Loss:  0.092 - Validation loss:  0.056
Epoch   2/4 Batch    700/32512 Inputs (000)    4251 - Loss:  0.084 - Validation loss:  0.053
Epoch   2/4 Batch    800/32512 Inputs (000)    4263 - Loss:  0.092 - Validation loss:  0.051
Epoch   2/4 Batch    900/32512 Inputs (000)    4276 - Loss:  0.096 - Validation loss:  0.051
Epoch   2/4 Batch   1000/32512 Inputs (000)    4289 - Loss:  0.096 - Validation loss:  0.054
Epoch   2/4 Batch   1100/32512 Inputs (000)    4302 - Loss:  0.082 - Validation loss:  0.055
Epoch   2/4 Batch   1200/32512 Inputs (000)    4315 - Loss:  0.083 - V

Epoch   2/4 Batch   9100/32512 Inputs (000)    5326 - Loss:  0.077 - Validation loss:  0.050
Epoch   2/4 Batch   9200/32512 Inputs (000)    5339 - Loss:  0.073 - Validation loss:  0.047
Epoch   2/4 Batch   9300/32512 Inputs (000)    5351 - Loss:  0.079 - Validation loss:  0.045
Epoch   2/4 Batch   9400/32512 Inputs (000)    5364 - Loss:  0.080 - Validation loss:  0.043
Epoch   2/4 Batch   9500/32512 Inputs (000)    5377 - Loss:  0.078 - Validation loss:  0.043
Epoch   2/4 Batch   9600/32512 Inputs (000)    5390 - Loss:  0.078 - Validation loss:  0.043
Epoch   2/4 Batch   9700/32512 Inputs (000)    5403 - Loss:  0.082 - Validation loss:  0.044
Epoch   2/4 Batch   9800/32512 Inputs (000)    5415 - Loss:  0.085 - Validation loss:  0.046
Epoch   2/4 Batch   9900/32512 Inputs (000)    5428 - Loss:  0.083 - Validation loss:  0.044
Epoch   2/4 Batch  10000/32512 Inputs (000)    5441 - Loss:  0.069 - Validation loss:  0.046
Epoch   2/4 Batch  10100/32512 Inputs (000)    5454 - Loss:  0.076 - V

Epoch   2/4 Batch  18000/32512 Inputs (000)    6465 - Loss:  0.073 - Validation loss:  0.045
Epoch   2/4 Batch  18100/32512 Inputs (000)    6478 - Loss:  0.069 - Validation loss:  0.041
Epoch   2/4 Batch  18200/32512 Inputs (000)    6491 - Loss:  0.074 - Validation loss:  0.040
Epoch   2/4 Batch  18300/32512 Inputs (000)    6503 - Loss:  0.076 - Validation loss:  0.042
Epoch   2/4 Batch  18400/32512 Inputs (000)    6516 - Loss:  0.080 - Validation loss:  0.038
Epoch   2/4 Batch  18500/32512 Inputs (000)    6529 - Loss:  0.063 - Validation loss:  0.042
Epoch   2/4 Batch  18600/32512 Inputs (000)    6542 - Loss:  0.065 - Validation loss:  0.043
Epoch   2/4 Batch  18700/32512 Inputs (000)    6555 - Loss:  0.071 - Validation loss:  0.040
Epoch   2/4 Batch  18800/32512 Inputs (000)    6567 - Loss:  0.056 - Validation loss:  0.040
Epoch   2/4 Batch  18900/32512 Inputs (000)    6580 - Loss:  0.074 - Validation loss:  0.041
Epoch   2/4 Batch  19000/32512 Inputs (000)    6593 - Loss:  0.064 - V

Epoch   2/4 Batch  26900/32512 Inputs (000)    7604 - Loss:  0.084 - Validation loss:  0.038
Epoch   2/4 Batch  27000/32512 Inputs (000)    7617 - Loss:  0.075 - Validation loss:  0.039
Epoch   2/4 Batch  27100/32512 Inputs (000)    7630 - Loss:  0.069 - Validation loss:  0.038
Epoch   2/4 Batch  27200/32512 Inputs (000)    7643 - Loss:  0.064 - Validation loss:  0.035
Epoch   2/4 Batch  27300/32512 Inputs (000)    7655 - Loss:  0.069 - Validation loss:  0.037
Epoch   2/4 Batch  27400/32512 Inputs (000)    7668 - Loss:  0.086 - Validation loss:  0.037
Epoch   2/4 Batch  27500/32512 Inputs (000)    7681 - Loss:  0.067 - Validation loss:  0.035
Epoch   2/4 Batch  27600/32512 Inputs (000)    7694 - Loss:  0.074 - Validation loss:  0.038
Epoch   2/4 Batch  27700/32512 Inputs (000)    7707 - Loss:  0.060 - Validation loss:  0.035
Epoch   2/4 Batch  27800/32512 Inputs (000)    7719 - Loss:  0.069 - Validation loss:  0.037
Epoch   2/4 Batch  27900/32512 Inputs (000)    7732 - Loss:  0.049 - V

Epoch   3/4 Batch    400/32512 Inputs (000)    8374 - Loss:  0.053 - Validation loss:  0.036
Epoch   3/4 Batch    500/32512 Inputs (000)    8387 - Loss:  0.060 - Validation loss:  0.035
Epoch   3/4 Batch    600/32512 Inputs (000)    8399 - Loss:  0.060 - Validation loss:  0.037
Epoch   3/4 Batch    700/32512 Inputs (000)    8412 - Loss:  0.059 - Validation loss:  0.037
Epoch   3/4 Batch    800/32512 Inputs (000)    8425 - Loss:  0.061 - Validation loss:  0.036
Epoch   3/4 Batch    900/32512 Inputs (000)    8438 - Loss:  0.070 - Validation loss:  0.036
Epoch   3/4 Batch   1000/32512 Inputs (000)    8451 - Loss:  0.076 - Validation loss:  0.043
Epoch   3/4 Batch   1100/32512 Inputs (000)    8463 - Loss:  0.063 - Validation loss:  0.039
Epoch   3/4 Batch   1200/32512 Inputs (000)    8476 - Loss:  0.059 - Validation loss:  0.037
Epoch   3/4 Batch   1300/32512 Inputs (000)    8489 - Loss:  0.052 - Validation loss:  0.038
Epoch   3/4 Batch   1400/32512 Inputs (000)    8502 - Loss:  0.064 - V

Epoch   3/4 Batch   9300/32512 Inputs (000)    9513 - Loss:  0.065 - Validation loss:  0.039
Epoch   3/4 Batch   9400/32512 Inputs (000)    9526 - Loss:  0.054 - Validation loss:  0.036
Epoch   3/4 Batch   9500/32512 Inputs (000)    9539 - Loss:  0.076 - Validation loss:  0.036
Epoch   3/4 Batch   9600/32512 Inputs (000)    9551 - Loss:  0.056 - Validation loss:  0.037
Epoch   3/4 Batch   9700/32512 Inputs (000)    9564 - Loss:  0.063 - Validation loss:  0.036
Epoch   3/4 Batch   9800/32512 Inputs (000)    9577 - Loss:  0.056 - Validation loss:  0.033
Epoch   3/4 Batch   9900/32512 Inputs (000)    9590 - Loss:  0.072 - Validation loss:  0.034
Epoch   3/4 Batch  10000/32512 Inputs (000)    9603 - Loss:  0.052 - Validation loss:  0.034
Epoch   3/4 Batch  10100/32512 Inputs (000)    9615 - Loss:  0.058 - Validation loss:  0.032
Epoch   3/4 Batch  10200/32512 Inputs (000)    9628 - Loss:  0.053 - Validation loss:  0.034
Epoch   3/4 Batch  10300/32512 Inputs (000)    9641 - Loss:  0.051 - V

Epoch   3/4 Batch  18200/32512 Inputs (000)   10652 - Loss:  0.052 - Validation loss:  0.029
Epoch   3/4 Batch  18300/32512 Inputs (000)   10665 - Loss:  0.058 - Validation loss:  0.030
Epoch   3/4 Batch  18400/32512 Inputs (000)   10678 - Loss:  0.057 - Validation loss:  0.033
Epoch   3/4 Batch  18500/32512 Inputs (000)   10691 - Loss:  0.046 - Validation loss:  0.029
Epoch   3/4 Batch  18600/32512 Inputs (000)   10703 - Loss:  0.049 - Validation loss:  0.031
Epoch   3/4 Batch  18700/32512 Inputs (000)   10716 - Loss:  0.051 - Validation loss:  0.031
Epoch   3/4 Batch  18800/32512 Inputs (000)   10729 - Loss:  0.050 - Validation loss:  0.031
Epoch   3/4 Batch  18900/32512 Inputs (000)   10742 - Loss:  0.068 - Validation loss:  0.034
Epoch   3/4 Batch  19000/32512 Inputs (000)   10755 - Loss:  0.060 - Validation loss:  0.032
Epoch   3/4 Batch  19100/32512 Inputs (000)   10767 - Loss:  0.063 - Validation loss:  0.032
Epoch   3/4 Batch  19200/32512 Inputs (000)   10780 - Loss:  0.053 - V

Epoch   3/4 Batch  27100/32512 Inputs (000)   11791 - Loss:  0.058 - Validation loss:  0.031
Epoch   3/4 Batch  27200/32512 Inputs (000)   11804 - Loss:  0.058 - Validation loss:  0.032
Epoch   3/4 Batch  27300/32512 Inputs (000)   11817 - Loss:  0.056 - Validation loss:  0.032
Epoch   3/4 Batch  27400/32512 Inputs (000)   11830 - Loss:  0.071 - Validation loss:  0.030
Epoch   3/4 Batch  27500/32512 Inputs (000)   11843 - Loss:  0.058 - Validation loss:  0.033
Epoch   3/4 Batch  27600/32512 Inputs (000)   11855 - Loss:  0.057 - Validation loss:  0.031
Epoch   3/4 Batch  27700/32512 Inputs (000)   11868 - Loss:  0.053 - Validation loss:  0.029
Epoch   3/4 Batch  27800/32512 Inputs (000)   11881 - Loss:  0.054 - Validation loss:  0.029
Epoch   3/4 Batch  27900/32512 Inputs (000)   11894 - Loss:  0.047 - Validation loss:  0.030
Epoch   3/4 Batch  28000/32512 Inputs (000)   11907 - Loss:  0.061 - Validation loss:  0.031
Epoch   3/4 Batch  28100/32512 Inputs (000)   11919 - Loss:  0.060 - V

Epoch   4/4 Batch    600/32512 Inputs (000)   12561 - Loss:  0.050 - Validation loss:  0.028
Epoch   4/4 Batch    700/32512 Inputs (000)   12574 - Loss:  0.048 - Validation loss:  0.031
Epoch   4/4 Batch    800/32512 Inputs (000)   12587 - Loss:  0.049 - Validation loss:  0.030
Epoch   4/4 Batch    900/32512 Inputs (000)   12600 - Loss:  0.070 - Validation loss:  0.037
Epoch   4/4 Batch   1000/32512 Inputs (000)   12612 - Loss:  0.060 - Validation loss:  0.035
Epoch   4/4 Batch   1100/32512 Inputs (000)   12625 - Loss:  0.051 - Validation loss:  0.033
Epoch   4/4 Batch   1200/32512 Inputs (000)   12638 - Loss:  0.045 - Validation loss:  0.032
Epoch   4/4 Batch   1300/32512 Inputs (000)   12651 - Loss:  0.045 - Validation loss:  0.030
Epoch   4/4 Batch   1400/32512 Inputs (000)   12664 - Loss:  0.063 - Validation loss:  0.038
Epoch   4/4 Batch   1500/32512 Inputs (000)   12676 - Loss:  0.050 - Validation loss:  0.032
Epoch   4/4 Batch   1600/32512 Inputs (000)   12689 - Loss:  0.053 - V

Epoch   4/4 Batch   9500/32512 Inputs (000)   13700 - Loss:  0.048 - Validation loss:  0.029
Epoch   4/4 Batch   9600/32512 Inputs (000)   13713 - Loss:  0.048 - Validation loss:  0.030
Epoch   4/4 Batch   9700/32512 Inputs (000)   13726 - Loss:  0.050 - Validation loss:  0.028
Epoch   4/4 Batch   9800/32512 Inputs (000)   13739 - Loss:  0.055 - Validation loss:  0.027
Epoch   4/4 Batch   9900/32512 Inputs (000)   13752 - Loss:  0.059 - Validation loss:  0.028
Epoch   4/4 Batch  10000/32512 Inputs (000)   13764 - Loss:  0.044 - Validation loss:  0.028
Epoch   4/4 Batch  10100/32512 Inputs (000)   13777 - Loss:  0.051 - Validation loss:  0.028
Epoch   4/4 Batch  10200/32512 Inputs (000)   13790 - Loss:  0.045 - Validation loss:  0.029
Epoch   4/4 Batch  10300/32512 Inputs (000)   13803 - Loss:  0.051 - Validation loss:  0.029
Epoch   4/4 Batch  10400/32512 Inputs (000)   13816 - Loss:  0.056 - Validation loss:  0.031
Epoch   4/4 Batch  10500/32512 Inputs (000)   13828 - Loss:  0.047 - V

Epoch   4/4 Batch  18400/32512 Inputs (000)   14840 - Loss:  0.054 - Validation loss:  0.027
Epoch   4/4 Batch  18500/32512 Inputs (000)   14852 - Loss:  0.043 - Validation loss:  0.027
Epoch   4/4 Batch  18600/32512 Inputs (000)   14865 - Loss:  0.050 - Validation loss:  0.026
Epoch   4/4 Batch  18700/32512 Inputs (000)   14878 - Loss:  0.049 - Validation loss:  0.027
Epoch   4/4 Batch  18800/32512 Inputs (000)   14891 - Loss:  0.041 - Validation loss:  0.026
Epoch   4/4 Batch  18900/32512 Inputs (000)   14904 - Loss:  0.048 - Validation loss:  0.026
Epoch   4/4 Batch  19000/32512 Inputs (000)   14916 - Loss:  0.049 - Validation loss:  0.029
Epoch   4/4 Batch  19100/32512 Inputs (000)   14929 - Loss:  0.054 - Validation loss:  0.027
Epoch   4/4 Batch  19200/32512 Inputs (000)   14942 - Loss:  0.042 - Validation loss:  0.027
Epoch   4/4 Batch  19300/32512 Inputs (000)   14955 - Loss:  0.037 - Validation loss:  0.028
Epoch   4/4 Batch  19400/32512 Inputs (000)   14968 - Loss:  0.040 - V

Epoch   4/4 Batch  27300/32512 Inputs (000)   15979 - Loss:  0.051 - Validation loss:  0.028
Epoch   4/4 Batch  27400/32512 Inputs (000)   15992 - Loss:  0.065 - Validation loss:  0.027
Epoch   4/4 Batch  27500/32512 Inputs (000)   16004 - Loss:  0.045 - Validation loss:  0.027
Epoch   4/4 Batch  27600/32512 Inputs (000)   16017 - Loss:  0.051 - Validation loss:  0.027
Epoch   4/4 Batch  27700/32512 Inputs (000)   16030 - Loss:  0.043 - Validation loss:  0.028
Epoch   4/4 Batch  27800/32512 Inputs (000)   16043 - Loss:  0.047 - Validation loss:  0.027
Epoch   4/4 Batch  27900/32512 Inputs (000)   16056 - Loss:  0.041 - Validation loss:  0.028
Epoch   4/4 Batch  28000/32512 Inputs (000)   16068 - Loss:  0.052 - Validation loss:  0.032
Epoch   4/4 Batch  28100/32512 Inputs (000)   16081 - Loss:  0.047 - Validation loss:  0.029
Epoch   4/4 Batch  28200/32512 Inputs (000)   16094 - Loss:  0.036 - Validation loss:  0.031
Epoch   4/4 Batch  28300/32512 Inputs (000)   16107 - Loss:  0.046 - V