# Spelling Correction using Deep Learning

Inspired by https://medium.com/@majortal/deep-spelling-9ffef96a24f6. Code can be found at https://github.com/MajorTal/DeepSpell/blob/master/keras_spell.py.

**Character Sequence to Sequence** code pulled from https://github.com/mdcramer/deep-learning/tree/master/seq2seq.

Environment initialization:
* open an Anaconda terminal
* \>activate tensorflow
* \>jupyter notebook
* Comment out email calls

When running on EC2 with the Udacity AMI:
* Fix email addresses
* \>source activate dl
* \>conda update --all
* \>pip install tensorflow-gpu==1.1 # TensorFlow v1.1 is required
* \>jupyter notebook

Useful commands:
* \>nohup python -u deep_speeling.py small > small_output.txt & # '2>nohup.err </dev/null' before '&' is optional
* \>nohup python -u deep_speeling.py > large_output.txt & # '2>nohup.err </dev/null' before '&' is optional
* \>jobs # list all nohup jobs
* \>ps -ef # list all running processes
* \>kill PID # kills process with specific PID
* \>watch -n 0.5 nvidia-smi # display GPU utilization
* \>rm -r mydir # removes directory

"I see that you have made three spelling mistakes." - Marquis de Favras, purportedly, upon reading his death warrant before being hanged in 1790.
<img src="images/MarquisdeFavras.jpg"/>

# Initialize global variables
**Make sure to run this cell first each time**

To work with the big data, continue with the cells below in order. To work with the small data, jump ahead to the small-data cell.

In [181]:
import os
import sys
import errno

# Global variables bounding the input length
MIN_INPUT_LEN = 5 # minimum number of characters in a sentence
MAX_INPUT_LEN = 60 # maximum number of characters in a sentence

# Filenames
NEWS_FILE_NAME = os.path.join(os.path.expanduser("data"), "news.2013.en.shuffled") # uncompressed data file
NEWS_FILE_NAME_CLEAN = os.path.join(os.path.expanduser("data"), "news.2013.en.clean") # clean data file
NEWS_FILE_NAME_FILTERED = os.path.join(os.path.expanduser("data"), "news.2013.en.filtered")
NEWS_FILE_NAME_TRAIN = os.path.join(os.path.expanduser("data"), "news.2013.en.train")
NEWS_FILE_NAME_VALIDATE = os.path.join(os.path.expanduser("data"), "news.2013.en.validate")

# Check for command line argument to use small data
print ("Command line args are: {}".format(str(sys.argv)))
small = 'small' in str(sys.argv)
# small = True # Use this to force small data. Comment out when running script.

if (small):
    print("Using the small data.")
    directory = "small_graph"
    # This is where the small graph is going to be saved and reloaded
    GRAPH_PARAMETERS = "small_graph/graph_params" # Filename for storing parameters associated with the graph
    SOURCE_INT_TO_LETTER = "small_graph/sourceinttoletter.json" # Filename for INT to letter dict for source sentences
    TARGET_INT_TO_LETTER = "small_graph/targetinttoletter.json" # Filename for INT to letter dict for target sentences
    SOURCE_LETTER_TO_INT = "small_graph/sourcelettertoint.json" # Filename for letter to INT dict for source sentences
    TARGET_LETTER_TO_INT = "small_graph/targetlettertoint.json" # Filename for letter to INT dict for target sentences
    checkpoint = "./small_graph/best_model.ckpt"
else:
    print("Using the large data.")
    # This is where the large graph is going to be saved and reloaded
    directory = "large_graph"
    GRAPH_PARAMETERS = "large_graph/graph_params" # Filename for storing parameters associated with the graph
    SOURCE_INT_TO_LETTER = "large_graph/sourceinttoletter.json" # Filename for INT to letter dict for source sentences
    TARGET_INT_TO_LETTER = "large_graph/targetinttoletter.json" # Filename for INT to letter dict for target sentences
    SOURCE_LETTER_TO_INT = "large_graph/sourcelettertoint.json" # Filename for letter to INT dict for source sentences
    TARGET_LETTER_TO_INT = "large_graph/targetlettertoint.json" # Filename for letter to INT dict for target sentences
    checkpoint = "./large_graph/best_model.ckpt"

# create directory for data, large or small, if it does not already exist
try:
    os.makedirs(directory)
except OSError as exception:
    if exception.errno != errno.EEXIST:
        raise

Command line args are: ['C:\\Users\\mcram\\Anaconda3\\envs\\tensorflow\\lib\\site-packages\\ipykernel_launcher.py', '-f', 'C:\\Users\\mcram\\AppData\\Roaming\\jupyter\\runtime\\kernel-3a6273f1-1985-4227-83e6-0d4322d55d1e.json']
Using the large data.
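On Python 3.2 and later, `os.makedirs` accepts an `exist_ok` flag that covers the try/except `errno.EEXIST` pattern used in the cell above. A minimal sketch, where `ensure_directory` is a hypothetical helper rather than part of the notebook:

```python
import os
import tempfile

def ensure_directory(path):
    """Create `path` and any missing parents; do nothing if the directory already exists."""
    # exist_ok=True suppresses the error only when `path` is already a directory;
    # it still raises FileExistsError if `path` exists as a regular file.
    os.makedirs(path, exist_ok=True)
    return path

# Usage: calling it twice is safe
graph_dir = os.path.join(tempfile.mkdtemp(), "large_graph")
ensure_directory(graph_dir)
ensure_directory(graph_dir)
print(os.path.isdir(graph_dir))  # True
```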


# Function for sending email updates from AWS

In [182]:
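import os
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# AWS SES SMTP config. The SMTP credentials are read from environment variables
# (the variable names here are illustrative) so they are never committed with the notebook.
EMAIL_HOST = 'email-smtp.us-west-2.amazonaws.com'
EMAIL_HOST_USER = os.environ.get('EMAIL_HOST_USER', '')
EMAIL_HOST_PASSWORD = os.environ.get('EMAIL_HOST_PASSWORD', '')
EMAIL_PORT = 587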

def send_email(subject, message):

    # Do not upload to Github with real email addresses
    me = "m@mba.edu"
    you = ["m@alum.edu", "bf@gmail.com"]

    # Construct email
    msg = MIMEMultipart('alternative')
    msg['Subject'] = subject
    msg['From'] = me
    msg['To'] = ", ".join(you)
    msg.attach(MIMEText(message, 'plain'))

    # html = open('index.html').read()
    # mime_text = MIMEText(html, 'html')
    # msg.attach(mime_text)

    s = smtplib.SMTP(EMAIL_HOST, EMAIL_PORT)
    s.starttls()
    s.login(EMAIL_HOST_USER, EMAIL_HOST_PASSWORD)
    s.sendmail(me, you, msg.as_string()) # (from, to, message)
    s.quit()
    
    print("Email update sent.")
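Separating message construction from the SMTP session makes it possible to inspect an update email without any AWS credentials. A minimal standard-library sketch; `build_update_email` and the example addresses are hypothetical:

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def build_update_email(subject, message, sender, recipients):
    """Build the same plain-text message that send_email() constructs, without sending it."""
    msg = MIMEMultipart('alternative')
    msg['Subject'] = subject
    msg['From'] = sender
    msg['To'] = ", ".join(recipients)  # the To header is a single comma-separated string
    msg.attach(MIMEText(message, 'plain'))
    return msg

msg = build_update_email("Training update", "Epoch 1 finished.",
                         "sender@example.com", ["a@example.com", "b@example.com"])
print(msg['To'])  # a@example.com, b@example.com
```

When the message is actually sent, `smtplib.SMTP.sendmail` still receives the recipients as a list, as in `send_email()`; the joined `To` header is only what mail clients display.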

## Download raw data file from the internet and uncompress it

[One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling](https://research.google.com/pubs/pub41880.html)

**Start here if you are going to work with the big dataset** The dataset lives in the /data/ folder.

**Skip ahead to work with the small dataset**

In [183]:
def download_raw_datafile():
    
    import errno
    import requests
    import gzip

    NEWS_FILE_NAME_COMPRESSED = os.path.join(os.path.expanduser("data"), "news.2013.en.shuffled.gz") # 1.1 GB file
    DATA_FILES_URL = "http://www.statmt.org/wmt14/training-monolingual-news-crawl/news.2013.en.shuffled.gz" # file location

    # create directory for data, if it does not already exist
    try:
        os.makedirs(os.path.dirname(NEWS_FILE_NAME_COMPRESSED))
    except OSError as exception:
        if exception.errno != errno.EEXIST:
            raise

    # check size of current data file
    try:
        current_size = os.path.getsize(NEWS_FILE_NAME_COMPRESSED)
    except OSError: # the file has not been downloaded yet
        current_size = 0

    # check size of data file on internet
    response = requests.get(DATA_FILES_URL, stream=True)
    total_length = response.headers.get('content-length') # returns a str
    total_length = int(total_length)

    # download file if it is larger than the one already in the data directory
    if (total_length > current_size):
        print("Download compressed data file")
        with open(NEWS_FILE_NAME_COMPRESSED, "wb") as output_file: # open for writing in binary mode
            downloaded = percentage = 0
            print("»"*100)
            for data in response.iter_content(chunk_size=4096):
                downloaded += len(data)
                output_file.write(data)
                new_percentage = 100 * downloaded // total_length # // is floor divide
                if new_percentage > percentage:
                    print("o", end="") # end="" remove carriage return
                    percentage = new_percentage
        print() # add carriage return at the end of progress indicator
    else:
        print("Local copy of compressed data file is up to date.")

    # uncompress data
    if (os.path.isfile(NEWS_FILE_NAME_COMPRESSED[:-3])): # check to see if file already exists
        print("Data file is already uncompressed.")
    else:
        print("Uncompress data file.") # uncompress the file if it does not
        with gzip.open(NEWS_FILE_NAME_COMPRESSED, 'rb') as compressed_file:
            with open(NEWS_FILE_NAME_COMPRESSED[:-3], 'wb') as outfile: #2.5 GB file
                outfile.write(compressed_file.read())
        print("Data file uncompressed.")
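The uncompress step above calls `compressed_file.read()`, which holds the whole ~2.5 GB file in memory at once. A lower-memory alternative, sketched here with the standard library's `shutil.copyfileobj` (a swapped-in technique, not the original code; `gunzip_file` is a hypothetical helper), streams the data in fixed-size chunks:

```python
import gzip
import os
import shutil
import tempfile

def gunzip_file(compressed_path, output_path=None, chunk_size=1 << 20):
    """Decompress a .gz file to disk chunk by chunk instead of reading it all into memory."""
    if output_path is None:
        output_path = compressed_path[:-3]  # strip the ".gz" suffix, as the notebook does
    with gzip.open(compressed_path, 'rb') as compressed_file, open(output_path, 'wb') as outfile:
        # copyfileobj reads and writes at most chunk_size bytes at a time
        shutil.copyfileobj(compressed_file, outfile, chunk_size)
    return output_path

# Usage on a small temporary file
tmp_dir = tempfile.mkdtemp()
gz_path = os.path.join(tmp_dir, "sample.en.shuffled.gz")
with gzip.open(gz_path, 'wb') as f:
    f.write(b"First line.\nSecond line.\n")
out_path = gunzip_file(gz_path)
with open(out_path, 'rb') as f:
    print(f.read())  # b'First line.\nSecond line.\n'
```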

## Clean the data
Takes `news.2013.en.shuffled` as input and produces `news.2013.en.clean`.

In [184]:
def clean_data():
    
    import re

    NORMALIZE_WHITESPACE_REGEX = re.compile(r'[^\S\n]+', re.UNICODE) # match all whitespace except newlines
    RE_DASH_FILTER = re.compile(r'[\-\˗\֊\‐\‑\‒\–\—\⁻\₋\−\﹣\－]', re.UNICODE)
    RE_APOSTROPHE_FILTER = re.compile(r'&#39;|[ʼ՚＇‘’‛❛❜ߴߵ`‵´ˊˋ{}{}{}{}{}{}{}{}{}]'
                                      .format(chr(768), chr(769), chr(832), chr(833), chr(2387),
                                              chr(5151), chr(5152), chr(65344), chr(8242)), re.UNICODE)
    RE_LEFT_PARENTH_FILTER = re.compile(r'[\(\[\{\⁽\₍\❨\❪\﹙\（]', re.UNICODE)
    RE_RIGHT_PARENTH_FILTER = re.compile(r'[\)\]\}\⁾\₎\❩\❫\﹚\）]', re.UNICODE)
    ALLOWED_CURRENCIES = """¥£₪$€฿₨"""
    ALLOWED_PUNCTUATION = """-!?/;"'%&<>.()[]{}@#:,|=*"""
    RE_BASIC_CLEANER = re.compile(r'[^\w\s{}{}]'
                                  .format(re.escape(ALLOWED_CURRENCIES), re.escape(ALLOWED_PUNCTUATION)), re.UNICODE)

    def file_len(fname):
        # Count lines without holding the file in memory; returns 0 for an empty file
        with open(fname, encoding="utf8") as f:
            return sum(1 for _ in f)

    def clean_text(text):
        # Clean the text - remove unwanted chars, fold punctuation etc.
        result = NORMALIZE_WHITESPACE_REGEX.sub(' ', text.strip())
        result = RE_DASH_FILTER.sub('-', result)
        result = RE_APOSTROPHE_FILTER.sub("'", result)
        result = RE_LEFT_PARENTH_FILTER.sub("(", result)
        result = RE_RIGHT_PARENTH_FILTER.sub(")", result)
        result = RE_BASIC_CLEANER.sub('', result)
        return result

    if (os.path.isfile(NEWS_FILE_NAME_CLEAN)):
        print("Data file is already clean.")
    else:    
        print("Clean data file:")
        number_lines = file_len(NEWS_FILE_NAME)
        with open(NEWS_FILE_NAME_CLEAN, "wb") as clean_data:
            processed = percentage = 0
            for line in open(NEWS_FILE_NAME, encoding="utf8"):
                processed += 1
                # decoded_line = line.decode('utf-8') # https://stackoverflow.com/a/28583969/852795
                cleaned_line = clean_text(line)
                encoded_line = cleaned_line.encode("utf-8")
                clean_data.write(encoded_line + b"\n")
                new_percentage = 100 * processed // number_lines
                if (new_percentage > percentage):
                    print("{0:2d}".format(new_percentage), "%: ", line, end="")
                    percentage = new_percentage
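To see what `clean_text` does to a single line, here is a standalone sketch that copies a subset of the notebook's regular expressions (the parenthesis filters and the `chr(...)` apostrophe variants are omitted for brevity). Because whitespace is normalized before disallowed characters are deleted, removing a character that sits between two spaces leaves a double space:

```python
import re

# Subset of the regexes from clean_data(), copied for illustration
NORMALIZE_WHITESPACE_REGEX = re.compile(r'[^\S\n]+', re.UNICODE)  # all whitespace except newlines
RE_DASH_FILTER = re.compile(r'[\-\˗\֊\‐\‑\‒\–\—\⁻\₋\−\﹣\－]', re.UNICODE)
RE_APOSTROPHE_FILTER = re.compile(r"&#39;|[ʼ՚＇‘’‛❛❜ߴߵ`‵´ˊˋ]", re.UNICODE)
ALLOWED_CURRENCIES = """¥£₪$€฿₨"""
ALLOWED_PUNCTUATION = """-!?/;"'%&<>.()[]{}@#:,|=*"""
RE_BASIC_CLEANER = re.compile(r'[^\w\s{}{}]'.format(re.escape(ALLOWED_CURRENCIES),
                                                   re.escape(ALLOWED_PUNCTUATION)), re.UNICODE)

def clean_text(text):
    result = NORMALIZE_WHITESPACE_REGEX.sub(' ', text.strip())  # collapse runs of spaces/tabs
    result = RE_DASH_FILTER.sub('-', result)                    # fold dash variants to '-'
    result = RE_APOSTROPHE_FILTER.sub("'", result)              # fold apostrophe variants to "'"
    result = RE_BASIC_CLEANER.sub('', result)                   # drop everything not allowed
    return result

print(repr(clean_text("  It’s a  well–known\tfact ~ really™ \n")))
# "It's a well-known fact  really"
```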

## Analyze the characters

Get counts of all of the characters, select the most frequent ones, and keep only the sentences made up entirely of those characters. Eliminate any sentences that are too short or too long.

Takes `news.2013.en.clean` as input and produces `news.2013.en.filtered` and `news.2013.en.char_frequency.json`.

In [185]:
def analyze_characters():
    
    from collections import Counter
    import json

    NUMBER_OF_CHARS = 75 # Quantity of most popular characters to keep. Was 100 in original code.
    CHAR_FREQUENCY_FILE_NAME = os.path.join(os.path.expanduser("data"), "news.2013.en.char_frequency.json")

    # create character frequency file
    if (os.path.isfile(CHAR_FREQUENCY_FILE_NAME)):
        print("Character frequency file already created.")
    else:
        counter = Counter()
        print("Reading data file:")
        for line in open(NEWS_FILE_NAME_CLEAN, encoding="utf8"):
            counter.update(line)
        print("Done. Writing to file:")
        with open(CHAR_FREQUENCY_FILE_NAME, 'wb') as output_file:
            output_file.write(json.dumps(counter).encode("utf-8"))
        most_popular_chars = {key for key, _value in counter.most_common(NUMBER_OF_CHARS)}

    # Read top characters that were saved to file
    chars = json.loads(open(CHAR_FREQUENCY_FILE_NAME).read())
    counter = Counter(chars)
    most_popular_chars = {key for key, _value in counter.most_common(NUMBER_OF_CHARS)}
    print("The top {} chars are:".format(NUMBER_OF_CHARS))
    print("".join(sorted(most_popular_chars)))

    # Filter only sentences with the right chars
    if (os.path.isfile(NEWS_FILE_NAME_FILTERED)):
        print("\nFiltered file already created.")
    else:
        print("\nReading and filtering data:")
        num_lines = 0
        with open(NEWS_FILE_NAME_FILTERED, "wb") as output_file:
            for line in open(NEWS_FILE_NAME_CLEAN, encoding="utf8"):
                if line and (not bool(set(line) - most_popular_chars)) and (MAX_INPUT_LEN >= len(line) > MIN_INPUT_LEN):
                    output_file.write(line.encode("utf8"))
                    num_lines += 1
                    if (num_lines % 1000000 == 0):
                        print("{0:10,d}".format(num_lines), ": ", line, end="")
        print("Done. Filtered file contains {:,} lines.".format(num_lines))
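The filtering rule can be tried on a tiny in-memory corpus. This sketch (`filter_sentences` is a hypothetical helper) mirrors the two tests in `analyze_characters`: every character must be among the most frequent ones, and the length, which includes the trailing `'\n'`, must fall in `(MIN_INPUT_LEN, MAX_INPUT_LEN]`. For lines ending in a newline that means 5 to 59 visible characters:

```python
from collections import Counter

MIN_INPUT_LEN = 5    # same bounds as the notebook's globals
MAX_INPUT_LEN = 60

def filter_sentences(lines, number_of_chars):
    """Keep lines built only from the `number_of_chars` most frequent characters
    whose length (newline included) lies in (MIN_INPUT_LEN, MAX_INPUT_LEN]."""
    counter = Counter()
    for line in lines:
        counter.update(line)  # counts every character, including '\n'
    most_popular_chars = {char for char, _count in counter.most_common(number_of_chars)}
    return [line for line in lines
            if not set(line) - most_popular_chars
            and MAX_INPUT_LEN >= len(line) > MIN_INPUT_LEN]

lines = ["aaa bbb.\n", "aab bab.\n", "ab\n", "abz aba.\n"]
print(filter_sentences(lines, 5))  # ['aaa bbb.\n', 'aab bab.\n']
```

Here `'ab\n'` is too short, and `'abz aba.\n'` is dropped because `'z'` is not among the five most frequent characters.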

## Split the data into training and validation sets
Takes `news.2013.en.filtered` as input and produces `news.2013.en.train` and `news.2013.en.validate`.


In [186]:
def split_data():
    
    from numpy.random import shuffle as random_shuffle, seed as random_seed
    
    random_seed(123) # Reproducibility

    if (os.path.isfile(NEWS_FILE_NAME_TRAIN)):
        print("Training and Validation files already created.")
    else:
        answers = open(NEWS_FILE_NAME_FILTERED, encoding="utf8").read().split("\n")
        print('shuffle', end=" ")
        random_shuffle(answers)
        print("Done")
        # Explicitly set apart 10% for validation data that we never train over
        split_at = len(answers) - len(answers) // 10
        with open(NEWS_FILE_NAME_TRAIN, "wb") as output_file:
            output_file.write("\n".join(answers[:split_at]).encode('utf-8'))
        with open(NEWS_FILE_NAME_VALIDATE, "wb") as output_file:
            output_file.write("\n".join(answers[split_at:]).encode('utf-8'))
        print("\nTraining and Validation files written.")
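The split point is `len(answers) - len(answers) // 10`. Because the filtered file ends with a newline, `split("\n")` on its 4,615,704 lines yields 4,615,705 items (the last one empty), and 4,615,705 - 4,615,705 // 10 = 4,154,135, which matches the number of source sentences reported in the hyperparameter cell below. A stdlib sketch of the same split (`split_train_validate` is hypothetical, and it uses `random.Random` where the notebook seeds `numpy.random`):

```python
import random

def split_train_validate(lines, validation_divisor=10, seed=123):
    """Shuffle a copy of `lines` and hold out len(lines) // validation_divisor items for validation."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)  # stdlib RNG here; the notebook seeds numpy.random instead
    split_at = len(shuffled) - len(shuffled) // validation_divisor
    return shuffled[:split_at], shuffled[split_at:]

lines = ["sentence {}".format(i) for i in range(25)]
train, validate = split_train_validate(lines)
print(len(train), len(validate))  # 23 2

# The same arithmetic applied to the 4,615,705 items produced by split("\n")
print(4615705 - 4615705 // 10)  # 4154135
```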

## Load the target data and generate source data by injecting mistakes

In [187]:
def add_noise_to_string(a_string, amount_of_noise): # Add artificial spelling mistakes to string    
    
    from numpy.random import choice as random_choice, randint as random_randint, rand

    CHARS = list("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ .")

    if rand() < amount_of_noise * len(a_string):
        # Replace a character with a random character
        random_char_position = random_randint(len(a_string))
        a_string = a_string[:random_char_position] + random_choice(CHARS[:-1]) + a_string[random_char_position + 1:]
    if rand() < amount_of_noise * len(a_string):
        # Delete a character
        random_char_position = random_randint(len(a_string))
        a_string = a_string[:random_char_position] + a_string[random_char_position + 1:]
    if len(a_string) < MAX_INPUT_LEN and rand() < amount_of_noise * len(a_string):
        # Add a random character
        random_char_position = random_randint(len(a_string))
        a_string = a_string[:random_char_position] + random_choice(CHARS[:-1]) + a_string[random_char_position:]
    if rand() < amount_of_noise * len(a_string):
        # Transpose 2 characters
        random_char_position = random_randint(len(a_string) - 1)
        a_string = (a_string[:random_char_position] + a_string[random_char_position + 1] + 
                    a_string[random_char_position] + a_string[random_char_position + 2:])
    return a_string
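Each of the four edits in `add_noise_to_string` fires with probability `amount_of_noise * len(a_string)`. With `AMOUNT_OF_NOISE = 0.2 / MAX_INPUT_LEN`, that is 0.2 per edit for a 60-character sentence and 0.1 for a 30-character one. The sketch below is a stdlib-only variant, not the notebook's code: it takes an explicit `random.Random` for reproducibility and guards very short strings (in the original, a one-character string reaching the transposition branch would call `randint(0)`, which NumPy rejects). `add_noise` is a hypothetical name:

```python
import random

CHARS = list("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ")  # the notebook's CHARS minus '.'
MAX_INPUT_LEN = 60
AMOUNT_OF_NOISE = 0.2 / MAX_INPUT_LEN

def add_noise(a_string, amount_of_noise, rng):
    """Apply up to four edits in sequence, each with probability amount_of_noise * len(a_string)."""
    if a_string and rng.random() < amount_of_noise * len(a_string):
        pos = rng.randrange(len(a_string))  # replace one character
        a_string = a_string[:pos] + rng.choice(CHARS) + a_string[pos + 1:]
    if a_string and rng.random() < amount_of_noise * len(a_string):
        pos = rng.randrange(len(a_string))  # delete one character
        a_string = a_string[:pos] + a_string[pos + 1:]
    if len(a_string) < MAX_INPUT_LEN and rng.random() < amount_of_noise * len(a_string):
        pos = rng.randrange(len(a_string))  # insert one character (never fires on "")
        a_string = a_string[:pos] + rng.choice(CHARS) + a_string[pos:]
    if len(a_string) > 1 and rng.random() < amount_of_noise * len(a_string):
        pos = rng.randrange(len(a_string) - 1)  # swap two neighbouring characters
        a_string = a_string[:pos] + a_string[pos + 1] + a_string[pos] + a_string[pos + 2:]
    return a_string

rng = random.Random(0)
print(add_noise("The quick brown fox jumps over the lazy dog.", AMOUNT_OF_NOISE, rng))
```

Since one edit deletes and one inserts, the noisy string is always within one character of the original length.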

In [188]:
def load_big_data():

    AMOUNT_OF_NOISE = 0.2 / MAX_INPUT_LEN

    #TODO save file with source_sentences so no need to recompute
    with open(NEWS_FILE_NAME_TRAIN, encoding="utf8") as train_file:
        target_sentences = train_file.read().split("\n")
    # The source sentences are copies of the targets with artificial spelling mistakes injected
    source_sentences = [add_noise_to_string(sentence, AMOUNT_OF_NOISE) for sentence in target_sentences]

    print('\nFirst 10 sentence:')
    for i in range (0, 10):
        print("\nSource --> " + source_sentences[i])
        print("Target --> " + target_sentences[i])
        
    # Datasets
    # Take a look at the initial source and target datasets
    print("\nThe source is comprised of {:,} sentences. Here are the first 10.".format(len(source_sentences)))
    print("\n".join(source_sentences[:10]))
    
    print("\nThe target is comprised of {:,} sentences. Here are the first 10.".format(len(target_sentences)))
    print("\n".join(target_sentences[:10]))
    
    return source_sentences, target_sentences

## Run this cell to get the tiny data set
**Start here if you are going to run with the small dataset** If you are using the big dataset, make sure to skip this.

In [189]:
# Run this cell to grab the small data sets that came with this model. Otherwise skip it.
# The dataset lives in the /data/ folder. At the moment, it is made up of the following files:
# letters_source.txt: The list of input letter sequences. Each sequence is its own line. 
# letters_target.txt: The list of target sequences we'll use in the training process.
# Each sequence here is a response to the input sequence in letters_source.txt with the same line number.

def load_small_data():
    
    import helper

    source_path = 'data/letters_source.txt'
    target_path = 'data/letters_target.txt'

    source_sentences = helper.load_data(source_path).split('\n') # added .split('\n') to be consistent with big data
    target_sentences = helper.load_data(target_path).split('\n')

    # source_sentences is a list of input sequences, one per line of the source file.
    print("Source: {}".format(source_sentences[:10]))
    # target_sentences is a list of output sequences. Each entry corresponds to the entry with the same
    # index in source_sentences and contains the sorted characters of that line.
    print("Target: {}".format(target_sentences[:10]))

    print("\nThe source is comprised of {:,} sentences.".format(len(source_sentences)))
    
    return source_sentences, target_sentences

# Load the Data - Big or Small
If the command line switch "small" is set then load the small data. Otherwise load the big data.

In [190]:
if (small):
    print("Load up the small data.")
    source_sentences, target_sentences = load_small_data()
else:
    print("Load up the big data.")
    download_raw_datafile()
    clean_data()
    analyze_characters()
    split_data()
    source_sentences, target_sentences = load_big_data()

Load up the big data.
Local copy of compressed data file is up to date.
Data file is already uncompressed.
Data file is already clean.
Reading data file:
Done. Writing to file:
The top %s chars are: 75

 "$'(),-./0123456789:;?ABCDEFGHIJKLMNOPQRSTUVWYabcdefghijklmnopqrstuvwxyzé

Reading and filtering data:
 1,000,000 :  It's on to Rooty Hill for Team Abbott.
 2,000,000 :  PMMA pills are slower to take effect.
 3,000,000 :  It compliments hydropower, which is seasonal.
 4,000,000 :  The High Line's Next Neighbor
Done. Filtered file contains 4,615,704 lines.
shuffle Done

Training and Validation files written.

First 10 sentence:

Source --> But predicting the region has an honourale pedigrey.
Target --> But predicting the region has an honourable pedigree.

Source --> February 18 - Paolo Di Canio (Swindon)
Target --> February 18 - Paolo Di Canio (Swindon)

Source --> He's such an charming guy.
Target --> He's such an charming guy.

Source --> Taliban attack U.N. comnound in Afghan capita

## Preprocess
To do anything useful with the data, turn each string into a list of characters. Then convert the characters to their int values as declared in the vocabulary.

In [191]:
import json

# Define global variables (dicts, filled in by load_int_letter_translations)
source_int_to_letter = {}
target_int_to_letter = {}
source_letter_to_int = {}
target_letter_to_int = {}

def extract_character_vocab(data):
    special_words = ['<PAD>', '<UNK>', '<GO>',  '<EOS>']

    #set_words = set([character for line in data.split('\n') for character in line])
    set_words = set([character for line in data for character in line])
    int_to_vocab = {word_i: word for word_i, word in enumerate(special_words + list(set_words))}
    vocab_to_int = {word: word_i for word_i, word in int_to_vocab.items()}

    return int_to_vocab, vocab_to_int
    
def load_int_letter_translations(source, target):
    
    global source_int_to_letter, target_int_to_letter, source_letter_to_int, target_letter_to_int
    
    # Check to see if conversion files have already been created
    if (os.path.isfile(SOURCE_INT_TO_LETTER)):

        print()
        # Load up all of the conversion files
        with open(SOURCE_INT_TO_LETTER, 'r') as file:
            try:
                source_int_to_letter = json.load(file)
                print("Read {} data from file.".format(SOURCE_INT_TO_LETTER))
            except ValueError: # an empty file raises ValueError; fall back to an empty dict
                source_int_to_letter = {}
        source_int_to_letter = {int(k):v for k,v in source_int_to_letter.items()} # JSON keys are always strings
        with open(TARGET_INT_TO_LETTER, 'r') as file:
            try:
                target_int_to_letter = json.load(file)
                print("Read {} data from file.".format(TARGET_INT_TO_LETTER))
            except ValueError: # an empty file raises ValueError; fall back to an empty dict
                target_int_to_letter = {}
        target_int_to_letter = {int(k):v for k,v in target_int_to_letter.items()}
        with open(SOURCE_LETTER_TO_INT, 'r') as file:
            try:
                source_letter_to_int = json.load(file)
                print("Read {} data from file.".format(SOURCE_LETTER_TO_INT))
            except ValueError: # an empty file raises ValueError; fall back to an empty dict
                source_letter_to_int = {}
        source_letter_to_int = {k:int(v) for k,v in source_letter_to_int.items()}
        with open(TARGET_LETTER_TO_INT, 'r') as file:
            try:
                target_letter_to_int = json.load(file)
                print("Read {} data from file.".format(TARGET_LETTER_TO_INT))
            except ValueError: # an empty file raises ValueError; fall back to an empty dict
                target_letter_to_int = {}
        target_letter_to_int = {k:int(v) for k,v in target_letter_to_int.items()}

    else:

        # Build int2letter and letter2int dicts
        source_int_to_letter, source_letter_to_int = extract_character_vocab(source)
        target_int_to_letter, target_letter_to_int = extract_character_vocab(target)
        print("Source INT to letter: {}".format(source_int_to_letter))
        print("Target INT to letter: {}\n".format(target_int_to_letter))

        # Save source_int_to_letter, target_int_to_letter & source_letter_to_int for loading later after graph is saved
        with open(SOURCE_INT_TO_LETTER, 'w') as output_file:
            json.dump(source_int_to_letter, output_file)
        print("Wrote {} data to file.".format(SOURCE_INT_TO_LETTER))
        with open(TARGET_INT_TO_LETTER, 'w') as output_file:
            json.dump(target_int_to_letter, output_file)
        print("Wrote {} data to file.".format(TARGET_INT_TO_LETTER))
        with open(SOURCE_LETTER_TO_INT, 'w') as output_file:
            json.dump(source_letter_to_int, output_file)
        print("Wrote {} data to file.".format(SOURCE_LETTER_TO_INT))
        with open(TARGET_LETTER_TO_INT, 'w') as output_file:
            json.dump(target_letter_to_int, output_file)
        print("Wrote {} data to file.".format(TARGET_LETTER_TO_INT))

def produce_letter_ids(source, target):
    
    if (not source_int_to_letter):
        load_int_letter_translations(source, target)
    
    # Convert characters to ids
    source_ids = [[source_letter_to_int.get(letter, source_letter_to_int['<UNK>']) for letter in line] \
                         for line in source]
    target_ids = [[target_letter_to_int.get(letter, target_letter_to_int['<UNK>']) for letter in line] \
                         + [target_letter_to_int['<EOS>']] for line in target]
    
    return source_ids, target_ids
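Two details of the vocabulary code are easy to miss. First, Python randomizes string hashing per interpreter run, so the iteration order of `set_words` (and hence the character ids) can differ between runs, which is one reason the mappings must be saved alongside the graph. Second, JSON object keys are always strings, which is why the loader converts them back with `int(k)`. The sketch below (hypothetical `build_vocab` / `to_ids` helpers) sorts the characters instead, so the ids are reproducible, and shows the JSON round trip:

```python
import json

SPECIAL_WORDS = ['<PAD>', '<UNK>', '<GO>', '<EOS>']  # ids 0-3, as in extract_character_vocab

def build_vocab(sentences):
    """Map ids <-> characters; sorting the characters makes the ids stable across runs."""
    chars = sorted({char for line in sentences for char in line})
    int_to_vocab = dict(enumerate(SPECIAL_WORDS + chars))
    vocab_to_int = {char: i for i, char in int_to_vocab.items()}
    return int_to_vocab, vocab_to_int

def to_ids(line, vocab_to_int, append_eos=False):
    """Convert characters to ids; unknown characters become <UNK>."""
    ids = [vocab_to_int.get(char, vocab_to_int['<UNK>']) for char in line]
    return (ids + [vocab_to_int['<EOS>']]) if append_eos else ids

int_to_vocab, vocab_to_int = build_vocab(["cab", "abc"])
print(to_ids("cab!", vocab_to_int))                 # [6, 4, 5, 1]
print(to_ids("ab", vocab_to_int, append_eos=True))  # [4, 5, 3]

# After a JSON round trip the keys are strings and need converting back to ints
restored = {int(k): v for k, v in json.loads(json.dumps(int_to_vocab)).items()}
print(restored == int_to_vocab)  # True
```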

In [192]:
# Convert source and target sentences into IDs
source_letter_ids, target_letter_ids = produce_letter_ids(source_sentences, target_sentences)

print("\nExample source sequences")
print(source_letter_ids[:3])
print("\nExample target sequences")
print(target_letter_ids[:3])
print()


Read large_graph/sourceinttoletter.json data from file.
Read large_graph/targetinttoletter.json data from file.
Read large_graph/sourcelettertoint.json data from file.
Read large_graph/targetlettertoint.json data from file.

Example source sequences
[[54, 67, 22, 46, 77, 9, 68, 4, 6, 70, 22, 6, 49, 42, 46, 22, 26, 68, 46, 9, 68, 42, 6, 66, 49, 46, 26, 11, 58, 46, 11, 49, 46, 26, 66, 49, 66, 67, 9, 11, 33, 68, 46, 77, 68, 4, 6, 42, 9, 68, 40, 16], [19, 68, 5, 9, 67, 11, 9, 40, 46, 38, 37, 46, 18, 46, 76, 11, 66, 33, 66, 46, 10, 6, 46, 72, 11, 49, 6, 66, 46, 29, 50, 64, 6, 49, 4, 66, 49, 36], [75, 68, 14, 58, 46, 58, 67, 70, 26, 46, 11, 49, 46, 70, 26, 11, 9, 65, 6, 49, 42, 46, 42, 67, 40, 16]]

Example target sequences
[[53, 65, 22, 45, 75, 9, 66, 4, 6, 68, 22, 6, 48, 42, 45, 22, 26, 66, 45, 9, 66, 42, 6, 64, 48, 45, 26, 11, 57, 45, 11, 48, 45, 26, 64, 48, 64, 65, 9, 11, 5, 33, 66, 45, 75, 66, 4, 6, 42, 9, 66, 66, 16, 3], [19, 66, 5, 9, 65, 11, 9, 40, 45, 38, 37, 45, 18, 45, 74, 11, 64

In [193]:
print('\nFirst 10 sentence:')
for i in range (0, 10):
    print("\nSource --> {}".format(source_letter_ids[i]))
    print("Target --> {}".format(target_letter_ids[i]))


First 10 sentence:

Source --> [54, 67, 22, 46, 77, 9, 68, 4, 6, 70, 22, 6, 49, 42, 46, 22, 26, 68, 46, 9, 68, 42, 6, 66, 49, 46, 26, 11, 58, 46, 11, 49, 46, 26, 66, 49, 66, 67, 9, 11, 33, 68, 46, 77, 68, 4, 6, 42, 9, 68, 40, 16]
Target --> [53, 65, 22, 45, 75, 9, 66, 4, 6, 68, 22, 6, 48, 42, 45, 22, 26, 66, 45, 9, 66, 42, 6, 64, 48, 45, 26, 11, 57, 45, 11, 48, 45, 26, 64, 48, 64, 65, 9, 11, 5, 33, 66, 45, 75, 66, 4, 6, 42, 9, 66, 66, 16, 3]

Source --> [19, 68, 5, 9, 67, 11, 9, 40, 46, 38, 37, 46, 18, 46, 76, 11, 66, 33, 66, 46, 10, 6, 46, 72, 11, 49, 6, 66, 46, 29, 50, 64, 6, 49, 4, 66, 49, 36]
Target --> [19, 66, 5, 9, 65, 11, 9, 40, 45, 38, 37, 45, 18, 45, 74, 11, 64, 33, 64, 45, 10, 6, 45, 70, 11, 48, 6, 64, 45, 29, 49, 62, 6, 48, 4, 64, 48, 36, 3]

Source --> [75, 68, 14, 58, 46, 58, 67, 70, 26, 46, 11, 49, 46, 70, 26, 11, 9, 65, 6, 49, 42, 46, 42, 67, 40, 16]
Target --> [73, 66, 14, 57, 45, 57, 65, 68, 26, 45, 11, 48, 45, 68, 26, 11, 9, 63, 6, 48, 42, 45, 42, 65, 40, 16, 3]


## Character Sequence to Sequence Model
This model was updated to work with TensorFlow 1.1 and builds on the work of Dave Currie. Check out Dave's post [Text Summarization with Amazon Reviews](https://medium.com/towards-data-science/text-summarization-with-amazon-reviews-41801c2210b).
<img src="images/sequence-to-sequence.jpg"/>
#### Check the Version of TensorFlow and whether or not there's a GPU

In [194]:
import tensorflow as tf

# Check TensorFlow Version
print('TensorFlow Version: {}'.format(tf.__version__))

# Check for a GPU
if not tf.test.gpu_device_name():
    print('No GPU found. Please use a GPU to train your neural network.')
else:
    print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))

TensorFlow Version: 1.3.0
Default GPU Device: /gpu:0


### Hyperparameters

In [195]:
if (len(source_sentences) > 10000):
    
    # Using the big data (i.e. one billion word benchmark)
    print("Using hyperparameters for the big data with {:,} source sentences.".format(len(source_sentences)))
    epochs = 4       # Number of Epochs
    batch_size = 128 # Batch Size

    rnn_size = 512   # RNN Size
    num_layers = 2   # Number of Layers
    encoding_embedding_size = 512 # Encoding embedding Size
    decoding_embedding_size = 512 # Decoding embedding Size
    keep_probability = 0.7 # keep probability

    learning_rate = 0.001 # Learning Rate
    
else:
    
    # Using the small data (i.e. 10k source sentences)
    print("Using hyperparameters for the small data with {:,} source sentences.".format(len(source_sentences)))
    epochs = 60 # Number of Epochs
    batch_size = 128 # Batch Size
    rnn_size = 50 # RNN Size    
    num_layers = 2 # Number of Layers    
    encoding_embedding_size = 15 # Embedding Size
    decoding_embedding_size = 15 # Embedding Size
    keep_probability = 0.7 # keep probability
    learning_rate = 0.001 # Learning Rate

def get_hyperparameters_message():
    message  = "Batch size: {}\n".format(batch_size)
    message += "RNN size  : {}\n".format(rnn_size)
    message += "Num layers: {}\n".format(num_layers)
    message += "Enc. size : {}\n".format(encoding_embedding_size)
    message += "Dec. size : {}\n".format(decoding_embedding_size)
    message += "Keep prob.: {}\n".format(keep_probability)
    message += "Learn rate: {}\n\n".format(learning_rate)
    return message

# Write batch_size to file for loading after graph has been saved
with open(GRAPH_PARAMETERS, 'w') as file:
    file.write('%d' % batch_size)

Using hyperparameters for the big data with 4,154,135 source sentences.


### Input

In [196]:
def get_model_inputs():
    input_data = tf.placeholder(tf.int32, [None, None], name='input')
    targets = tf.placeholder(tf.int32, [None, None], name='targets')
    keep_probability = tf.placeholder(tf.float32,name='keep_prob')
    lr = tf.placeholder(tf.float32, name='learning_rate')

    target_sequence_length = tf.placeholder(tf.int32, (None,), name='target_sequence_length')
    max_target_sequence_length = tf.reduce_max(target_sequence_length, name='max_target_len')
    source_sequence_length = tf.placeholder(tf.int32, (None,), name='source_sequence_length')
    
    return input_data, targets, keep_probability, lr, target_sequence_length, max_target_sequence_length, source_sequence_length

### Sequence to Sequence Model

We can now start defining the functions that will build the seq2seq model. We are building it from the bottom up with the following components:

    2.1 Encoder
        - Embedding
        - Encoder cell
    2.2 Decoder
        1- Process decoder inputs
        2- Set up the decoder
            - Embedding
            - Decoder cell
            - Dense output layer
            - Training decoder
            - Inference decoder
    2.3 Seq2seq model connecting the encoder and decoder
    2.4 Build the training graph hooking up the model with the optimizer

### 2.1 Encoder

The first bit of the model we'll build is the encoder. Here, we'll embed the input data, construct our encoder, then pass the embedded data to the encoder.

- Embed the input data using [`tf.contrib.layers.embed_sequence`](https://www.tensorflow.org/api_docs/python/tf/contrib/layers/embed_sequence)
<img src="images/embed_sequence.png" />

- Pass the embedded input into a stack of RNNs.  Save the RNN state and ignore the output.
<img src="images/encoder.png" />

In [197]:
def encoding_layer(input_data, rnn_size, num_layers, keep_prob, source_sequence_length, source_vocab_size, 
                   encoding_embedding_size):

    # Encoder embedding
    enc_embed_input = tf.contrib.layers.embed_sequence(input_data, source_vocab_size, encoding_embedding_size)

    # RNN cell
    def make_cell(rnn_size):
        enc_cell = tf.contrib.rnn.LSTMCell(rnn_size, initializer=tf.contrib.layers.variance_scaling_initializer(seed=2))
        enc_cell = tf.contrib.rnn.DropoutWrapper(enc_cell, output_keep_prob=keep_prob)
        return enc_cell

    enc_cell = tf.contrib.rnn.MultiRNNCell([make_cell(rnn_size) for _ in range(num_layers)])
    
    enc_output, enc_state = tf.nn.dynamic_rnn(enc_cell, enc_embed_input, 
                                              sequence_length=source_sequence_length, dtype=tf.float32)
    
    return enc_output, enc_state
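Conceptually, `tf.contrib.layers.embed_sequence` keeps a `[source_vocab_size, encoding_embedding_size]` matrix and replaces every id with its row, turning a `[batch, time]` tensor of ints into a `[batch, time, embedding_size]` tensor of floats. A NumPy sketch of just the lookup step (`embed_ids` is hypothetical; the real layer learns its matrix during training, while this one is random):

```python
import numpy as np

def embed_ids(ids, vocab_size, embed_dim, seed=0):
    """Look up one embedding row per id: [batch, time] ints -> [batch, time, embed_dim] floats."""
    rng = np.random.RandomState(seed)
    # In TensorFlow this matrix is a trainable variable; here it is just random numbers.
    embeddings = rng.uniform(-1.0, 1.0, size=(vocab_size, embed_dim))
    return embeddings[np.asarray(ids)], embeddings

batch = [[4, 5, 6, 0],   # a padded batch of two id sequences (0 = <PAD>)
         [6, 4, 0, 0]]
embedded, embeddings = embed_ids(batch, vocab_size=7, embed_dim=3)
print(embedded.shape)  # (2, 4, 3)
```

Padding ids get an ordinary embedding row as well; it is the `source_sequence_length` passed to `tf.nn.dynamic_rnn` that stops the encoder from running over the padding.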

### 2.2 Decoder

The decoder is probably the most involved part of this model. The following steps are needed to create it:

    1- Process decoder inputs
    2- Set up the decoder components
        - Embedding
        - Decoder cell
        - Dense output layer
        - Training decoder
        - Inference decoder


### Process Decoder Input


In the training process, the target sequences will be used in two different places:

 1. Using them to calculate the loss
 2. Feeding them to the decoder as its inputs during training (a technique known as teacher forcing) to make the model more robust.

Now we need to address the second point. Let's assume our targets look like this in their letter/word form (we're showing them this way for readability; at this point in the code, these sequences would be in int form):


<img src="images/targets_1.png"/>

We need to do a simple transformation on the tensor before feeding it to the decoder:

1- We will feed one item of the sequence to the decoder at each time step. Think about the last time step, where the decoder outputs the final item of its output. The input to that step is the second-to-last item of the target sequence, so the decoder has no use for the last item in this scenario. We therefore need to remove the last item.

We do that using TensorFlow's tf.strided_slice() function. We hand it the tensor, the indices where the slice should start and end, and the stride along each dimension.

<img src="images/strided_slice_1.png"/>

2- The first item in each sequence we feed to the decoder has to be the GO symbol, so we'll add that to the beginning.


<img src="images/targets_add_go.png"/>


Now the tensor is ready to be fed to the decoder. It looks like this (if we convert from ints to letters/symbols):

<img src="images/targets_after_processing_1.png"/>

In [198]:
# Process the input we'll feed to the decoder
def process_decoder_input(target_data, vocab_to_int, batch_size):
    '''Remove the last id from each sequence in the batch and prepend the <GO> id to the beginning of each sequence'''
    ending = tf.strided_slice(target_data, [0, 0], [batch_size, -1], [1, 1])
    dec_input = tf.concat([tf.fill([batch_size, 1], vocab_to_int['<GO>']), ending], 1)

    return dec_input
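The same two-step transformation can be checked in plain NumPy. This is a sketch for illustration only: `process_decoder_input_np` and the token ids (including `<GO>` = 1) are hypothetical. Slicing with `[:, :-1]` plays the role of the `tf.strided_slice` call, and `np.concatenate` plays the role of `tf.concat`.

```python
import numpy as np

def process_decoder_input_np(target_data, go_id):
    """NumPy analogue of process_decoder_input: drop the last column, prepend <GO>."""
    # Same cut as tf.strided_slice(target_data, [0, 0], [batch_size, -1], [1, 1])
    ending = target_data[:, :-1]
    go_column = np.full((target_data.shape[0], 1), go_id, dtype=target_data.dtype)
    return np.concatenate([go_column, ending], axis=1)

targets = np.array([[5, 6, 7, 3],
                    [8, 9, 3, 0]])
print(process_decoder_input_np(targets, go_id=1).tolist())  # [[1, 5, 6, 7], [1, 8, 9, 3]]
```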


### Set up the decoder components

        - Embedding
        - Decoder cell
        - Dense output layer
        - Training decoder
        - Inference decoder

#### 1- Embedding
Now that we have prepared the inputs to the training decoder, we need to embed them so they can be ready to be passed to the decoder. 

We'll create an embedding matrix like the following then have tf.nn.embedding_lookup convert our input to its embedded equivalent:
<img src="images/embeddings.png" />
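An embedding lookup is simply row indexing into the embedding matrix. A minimal NumPy sketch of what `tf.nn.embedding_lookup(dec_embeddings, dec_input)` computes, using a made-up 6-token vocabulary and 3-dimensional embeddings:

```python
import numpy as np

vocab_size, embedding_size = 6, 3
rng = np.random.default_rng(0)
# Uniform values in [0, 1), like tf.random_uniform([vocab_size, embedding_size])
dec_embeddings = rng.uniform(size=(vocab_size, embedding_size))

dec_input = np.array([[1, 5, 2],
                      [1, 4, 4]])  # hypothetical token ids, <GO> = 1
dec_embed_input = dec_embeddings[dec_input]  # each id is replaced by its row
print(dec_embed_input.shape)  # (2, 3, 3): [batch, time, embedding_size]
```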

#### 2- Decoder Cell
Then we declare our decoder cell. Just like the encoder, we'll use a tf.contrib.rnn.LSTMCell here as well.

We need to declare a decoder for the training process, and a decoder for the inference/prediction process. These two decoders will share their parameters (so that all the weights and biases that are set during the training phase can be used when we deploy the model).

First, we'll need to define the type of cell we'll be using for our decoder RNNs. We opted for LSTM.

#### 3- Dense output layer
Before we move to declaring our decoders, we'll need to create the output layer, which will be a tensorflow.python.layers.core.Dense layer that translates the outputs of the decoder to logits that tell us which element of the decoder vocabulary the decoder is choosing to output at each time step.
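Because `Dense` has no activation by default, the output layer is a linear projection from the decoder's `rnn_size` units to `target_vocab_size` logits at every time step. A NumPy sketch with arbitrary sizes (all assumptions), where `argmax` stands in for picking the most likely vocabulary element:

```python
import numpy as np

def dense_logits(decoder_outputs, kernel, bias):
    """Linear projection of [batch, time, units] decoder outputs to [batch, time, vocab] logits."""
    return decoder_outputs @ kernel + bias

batch, time, units, vocab = 2, 4, 8, 10
rng = np.random.default_rng(1)
outputs = rng.normal(size=(batch, time, units))
kernel = rng.normal(size=(units, vocab))
bias = np.zeros(vocab)

logits = dense_logits(outputs, kernel, bias)
predicted_ids = logits.argmax(axis=-1)  # vocabulary index chosen at each time step
print(logits.shape, predicted_ids.shape)  # (2, 4, 10) (2, 4)
```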

#### 4- Training decoder
Essentially, we'll be creating two decoders which share their parameters: one for training and one for inference. The two are similar in that both are created using tf.contrib.seq2seq.**BasicDecoder** and tf.contrib.seq2seq.**dynamic_decode**. They differ, however, in that we feed the target sequences as inputs to the training decoder at each time step to make it more robust.

We can think of the training decoder as looking like this (except that it works with sequences in batches):
<img src="images/sequence-to-sequence-training-decoder.png"/>

The training decoder **does not** feed the output of each time step to the next. Rather, the inputs to the decoder time steps are the target sequence from the training dataset (the orange letters).
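The difference between the two decoders comes down to what is fed in at each step. The sketch below uses a toy, hypothetical `decoder_step` (one tanh RNN cell plus an output projection, not the LSTM stack used in this notebook) to illustrate the training side: every step receives the next token of `dec_input`, regardless of what the model predicted at the previous step.

```python
import numpy as np

def decoder_step(state, token_id, params):
    """Toy decoder step (hypothetical): embed the token, update a tanh state, emit logits."""
    emb, W_x, W_h, W_out = params
    state = np.tanh(emb[token_id] @ W_x + state @ W_h)
    return state, state @ W_out

def teacher_forced_logits(initial_state, dec_input, params):
    """Training-style decoding: step t always gets dec_input[t], never the model's own guess."""
    state, all_logits = initial_state, []
    for token_id in dec_input:
        state, logits = decoder_step(state, token_id, params)
        all_logits.append(logits)
    return np.stack(all_logits)

rng = np.random.default_rng(0)
vocab, embed_dim, units = 5, 3, 4
params = (rng.normal(size=(vocab, embed_dim)), rng.normal(size=(embed_dim, units)),
          rng.normal(size=(units, units)), rng.normal(size=(units, vocab)))
# dec_input = <GO> followed by the target minus its last item (ids are made up)
logits = teacher_forced_logits(np.zeros(units), [1, 2, 3], params)
print(logits.shape)  # (3, 5): one row of logits per time step
```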

#### 5- Inference decoder
The inference decoder is the one we'll use when we deploy our model into the wild.

<img src="images/sequence-to-sequence-inference-decoder.png"/>

We'll hand our encoder's final state to both the training and inference decoders and have each of them produce its output. TensorFlow handles most of the logic for us; we just have to use the appropriate methods from tf.contrib.seq2seq and supply them with the appropriate inputs.
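For contrast with the training decoder, here is a sketch of greedy inference decoding in the spirit of `GreedyEmbeddingHelper`: it starts from `<GO>`, feeds each step's argmax back in as the next input, and stops at `<EOS>` or after `max_iterations` steps. The toy `decoder_step`, its random parameters and the token ids are hypothetical stand-ins for the trained decoder.

```python
import numpy as np

def decoder_step(state, token_id, params):
    """Toy decoder step (hypothetical): embed the token, update a tanh state, emit logits."""
    emb, W_x, W_h, W_out = params
    state = np.tanh(emb[token_id] @ W_x + state @ W_h)
    return state, state @ W_out

def greedy_decode(initial_state, params, go_id, eos_id, max_iterations):
    """Inference-style decoding: the model's own previous prediction is the next input."""
    state, token_id, output_ids = initial_state, go_id, []
    for _ in range(max_iterations):
        state, logits = decoder_step(state, token_id, params)
        token_id = int(np.argmax(logits))  # greedy choice
        output_ids.append(token_id)
        if token_id == eos_id:
            break
    return output_ids

rng = np.random.default_rng(0)
vocab, embed_dim, units = 5, 3, 4
params = (rng.normal(size=(vocab, embed_dim)), rng.normal(size=(embed_dim, units)),
          rng.normal(size=(units, units)), rng.normal(size=(units, vocab)))
ids = greedy_decode(np.zeros(units), params, go_id=1, eos_id=2, max_iterations=10)
print(ids)
```

Capping the loop with `max_iterations` mirrors the `maximum_iterations=max_target_sequence_length` argument passed to `dynamic_decode`, which keeps an untrained model from looping forever when it never emits `<EOS>`.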


In [199]:
def decoding_layer(target_letter_to_int, decoding_embedding_size, num_layers, rnn_size, keep_prob,
                   target_sequence_length, max_target_sequence_length, enc_state, dec_input):
    
    # 1. Decoder Embedding
    target_vocab_size = len(target_letter_to_int)
    dec_embeddings = tf.Variable(tf.random_uniform([target_vocab_size, decoding_embedding_size]))
    dec_embed_input = tf.nn.embedding_lookup(dec_embeddings, dec_input)

    # 2. Construct the decoder cell
    def make_cell(rnn_size):
        dec_cell = tf.contrib.rnn.LSTMCell(rnn_size,
                                           initializer=tf.contrib.layers.variance_scaling_initializer(seed=2))
        dec_cell = tf.contrib.rnn.DropoutWrapper(dec_cell, output_keep_prob=keep_prob)
        return dec_cell

    dec_cell = tf.contrib.rnn.MultiRNNCell([make_cell(rnn_size) for _ in range(num_layers)])
     
    # 3. Dense layer to translate the decoder's output at each time 
    # step into a choice from the target vocabulary
    output_layer = Dense(target_vocab_size,
                         kernel_initializer = tf.contrib.layers.variance_scaling_initializer(seed=2))

    # 4. Set up a training decoder and an inference decoder
    # Training Decoder
    with tf.variable_scope("decode"):

        # Helper for the training process. Used by BasicDecoder to read inputs.
        training_helper = tf.contrib.seq2seq.TrainingHelper(inputs=dec_embed_input,
                                                            sequence_length=target_sequence_length,
                                                            time_major=False)

        # Basic decoder
        training_decoder = tf.contrib.seq2seq.BasicDecoder(dec_cell, training_helper, enc_state, output_layer) 
        
        # Perform dynamic decoding using the decoder
        training_decoder_output = tf.contrib.seq2seq.dynamic_decode(training_decoder, impute_finished=True, 
                                                                    maximum_iterations=max_target_sequence_length)[0]

    # 5. Inference Decoder
    # Reuses the same parameters trained by the training process
    with tf.variable_scope("decode", reuse=True):
        # Note: batch_size is the notebook-level global, not an argument of this function
        start_tokens = tf.tile(tf.constant([target_letter_to_int['<GO>']], dtype=tf.int32), 
                               [batch_size], name='start_tokens')

        # Helper for the inference process.
        inference_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(dec_embeddings, 
                                                                    start_tokens, 
                                                                    target_letter_to_int['<EOS>'])

        # Basic decoder
        inference_decoder = tf.contrib.seq2seq.BasicDecoder(dec_cell, inference_helper, enc_state, output_layer)
        
        # Perform dynamic decoding using the decoder
        inference_decoder_output = tf.contrib.seq2seq.dynamic_decode(inference_decoder,
                                                            impute_finished=True,
                                                            maximum_iterations=max_target_sequence_length)[0]

    return training_decoder_output, inference_decoder_output

## 2.3 Seq2seq model 
Let's now go a step up and hook up the encoder and decoder using the functions we just declared.

In [200]:
def seq2seq_model(input_data, targets, lr, target_sequence_length, max_target_sequence_length, source_sequence_length,
                  source_vocab_size, target_vocab_size, enc_embedding_size, dec_embedding_size, rnn_size, num_layers, 
                  keep_prob):
    
    # Pass the input data through the encoder. We'll ignore the encoder output, but use the state
    _, enc_state = encoding_layer(input_data, 
                                  rnn_size, 
                                  num_layers,
                                  keep_prob,
                                  source_sequence_length,
                                  source_vocab_size, 
                                  enc_embedding_size)

    # Prepare the target sequences we'll feed to the decoder in training mode
    # (target_letter_to_int and batch_size are notebook-level globals)
    dec_input = process_decoder_input(targets, target_letter_to_int, batch_size)
    
    # Pass encoder state and decoder inputs to the decoders
    training_decoder_output, inference_decoder_output = decoding_layer(target_letter_to_int, 
                                                                       dec_embedding_size, 
                                                                       num_layers, 
                                                                       rnn_size,
                                                                       keep_prob,
                                                                       target_sequence_length,
                                                                       max_target_sequence_length,
                                                                       enc_state, 
                                                                       dec_input) 
    
    return training_decoder_output, inference_decoder_output

The model outputs *training_decoder_output* and *inference_decoder_output* each contain an 'rnn_output' logits tensor that looks like this:

<img src="images/logits.png"/>

We'll pass the logits from the training decoder to tf.contrib.seq2seq.**sequence_loss()** to calculate the loss and, ultimately, the gradients.
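As we understand its defaults (`average_across_timesteps=True`, `average_across_batch=True`), `sequence_loss` computes the softmax cross-entropy at every position, multiplies it by the mask weights, and divides the sum by the sum of the weights. A NumPy sketch of that computation, up to numerical details; `sequence_loss_np` is a hypothetical name:

```python
import numpy as np

def sequence_loss_np(logits, targets, weights):
    """Weighted cross-entropy averaged over all weighted positions.

    logits: [batch, time, vocab]; targets: [batch, time] int ids; weights: [batch, time]
    """
    # Numerically stable log-softmax over the vocabulary axis
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Log-probability of the correct id at every (batch, time) position
    batch_idx, time_idx = np.indices(targets.shape)
    crossent = -log_probs[batch_idx, time_idx, targets]
    return (crossent * weights).sum() / weights.sum()

logits = np.zeros((2, 3, 4))              # uniform predictions over 4 ids...
logits[1, 2] = [10.0, 0.0, 0.0, 0.0]      # ...except one confident wrong prediction
targets = np.array([[0, 1, 2], [3, 0, 1]])
weights = np.array([[1.0, 1.0, 1.0], [1.0, 1.0, 0.0]])  # that position is masked out
print(round(sequence_loss_np(logits, targets, weights), 4))  # 1.3863, i.e. ln 4
```

Because the confident mistake sits at a masked position, it does not affect the loss; this is how padding is meant to be excluded.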




In [201]:
from tensorflow.python.layers.core import Dense

# Build the graph
train_graph = tf.Graph()
# Set the graph to default to ensure that it is ready for training
with train_graph.as_default():
    
    # Load the model inputs    
    input_data, targets, keep_prob, lr, target_sequence_length, max_target_sequence_length, source_sequence_length \
    = get_model_inputs()
    
    # Create the training and inference logits
    training_decoder_output, inference_decoder_output = seq2seq_model(input_data, 
                                                                      targets, 
                                                                      lr, 
                                                                      target_sequence_length, 
                                                                      max_target_sequence_length, 
                                                                      source_sequence_length,
                                                                      len(source_letter_to_int),
                                                                      len(target_letter_to_int),
                                                                      encoding_embedding_size, 
                                                                      decoding_embedding_size, 
                                                                      rnn_size, 
                                                                      num_layers,
                                                                      keep_prob)    
    
    # Create tensors for the training logits and inference logits
    training_logits = tf.identity(training_decoder_output.rnn_output, 'logits')
    inference_logits = tf.identity(inference_decoder_output.sample_id, name='predictions')
    
    # Create the weights for sequence_loss
    masks = tf.sequence_mask(target_sequence_length, max_target_sequence_length, dtype=tf.float32, name='masks')

    with tf.name_scope("optimization"):

        # Loss function
        cost = tf.contrib.seq2seq.sequence_loss(training_logits, targets, masks)

        # Optimizer
        optimizer = tf.train.AdamOptimizer(lr)

        # Gradient Clipping
        gradients = optimizer.compute_gradients(cost)
        capped_gradients = [(tf.clip_by_value(grad, -5., 5.), var) for grad, var in gradients if grad is not None]
        train_op = optimizer.apply_gradients(capped_gradients)
        
        # Add variables to collection in order to load them up when retraining a saved graph
        tf.add_to_collection("cost", cost)
        tf.add_to_collection("train_op", train_op)
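Two of the operations in the graph above, building the loss mask and clipping gradients, are easy to check by hand. A NumPy sketch of what `tf.sequence_mask(..., dtype=tf.float32)` and `tf.clip_by_value(grad, -5., 5.)` produce; the helper names are hypothetical:

```python
import numpy as np

def sequence_mask_np(lengths, maxlen):
    """1.0 where position < length, 0.0 afterwards (float mask like tf.sequence_mask)."""
    return (np.arange(maxlen)[None, :] < np.asarray(lengths)[:, None]).astype(np.float32)

def clip_by_value_np(grad, lo=-5.0, hi=5.0):
    """Element-wise clipping of a gradient array, like tf.clip_by_value."""
    return np.clip(grad, lo, hi)

print(sequence_mask_np([3, 1], 4).tolist())                       # [[1.0, 1.0, 1.0, 0.0], [1.0, 0.0, 0.0, 0.0]]
print(clip_by_value_np(np.array([-7.5, 0.3, 12.0])).tolist())    # [-5.0, 0.3, 5.0]
```

Clipping by value bounds each gradient component independently, which is simpler than clipping by global norm but can change the gradient's direction.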

## Get Batches

There's little processing involved when we retrieve the batches. Here is a simple example, assuming batch_size = 2.

Source sequences (they're actually in int form; we're showing the characters for clarity):

<img src="images/source_batch.png" />

Target sequences (also in int form, but showing letters for clarity):

<img src="images/target_batch.png" />

In [202]:
import numpy as np

def pad_sentence_batch(sentence_batch, pad_int):
    """Pad sentences with <PAD> so that each sentence of a batch has the same length"""
    max_sentence = max([len(sentence) for sentence in sentence_batch])
    return [sentence + [pad_int] * (max_sentence - len(sentence)) for sentence in sentence_batch]

def get_batches(targets, sources, batch_size, source_pad_int, target_pad_int):
    """Batch targets, sources, and the lengths of their sentences together"""
    for batch_i in range(0, len(sources)//batch_size):
        start_i = batch_i * batch_size
        sources_batch = sources[start_i:start_i + batch_size]
        targets_batch = targets[start_i:start_i + batch_size]
        pad_sources_batch = np.array(pad_sentence_batch(sources_batch, source_pad_int))
        pad_targets_batch = np.array(pad_sentence_batch(targets_batch, target_pad_int))
        
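        # Need the lengths for the _lengths parameters.
        # Note: these are measured on the padded batches, so every entry
        # equals the length of the longest sentence in the batch.
        pad_targets_lengths = []
        for target in pad_targets_batch:
            pad_targets_lengths.append(len(target))
        
        pad_source_lengths = []
        for source in pad_sources_batch:
            pad_source_lengths.append(len(source))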
        
        yield pad_targets_batch, pad_sources_batch, pad_targets_lengths, pad_source_lengths
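One detail worth knowing about `get_batches`: it measures lengths after padding, so every length it yields equals the longest sentence in the batch, and a mask built from those lengths also covers the `<PAD>` positions. Per-sentence lengths have to be recorded before padding. A sketch of such a variant; `pad_with_lengths` is a hypothetical helper, not part of this notebook:

```python
def pad_with_lengths(sentence_batch, pad_int):
    """Pad every sentence to the batch maximum and also return the unpadded lengths."""
    lengths = [len(sentence) for sentence in sentence_batch]  # measured before padding
    max_len = max(lengths)
    padded = [sentence + [pad_int] * (max_len - len(sentence)) for sentence in sentence_batch]
    return padded, lengths

padded, lengths = pad_with_lengths([[7, 8, 9], [4, 5]], pad_int=0)
print(padded)   # [[7, 8, 9], [4, 5, 0]]
print(lengths)  # [3, 2]
```

Whether the loss should ignore `<PAD>` positions is a modeling choice; with true lengths the model is not trained to emit `<PAD>`, while with padded lengths it is.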

## Training function
We're now ready to train our model. If you run into OOM (out of memory) issues during training, try to decrease the batch_size.

In [203]:
# Split data into training and validation sets (the first batch_size examples are held out for validation)
train_source = source_letter_ids[batch_size:]
train_target = target_letter_ids[batch_size:]
valid_source = source_letter_ids[:batch_size]
valid_target = target_letter_ids[:batch_size]
(valid_targets_batch, valid_sources_batch, valid_targets_lengths, valid_sources_lengths) \
= next(get_batches(valid_target, valid_source, batch_size, 
                   source_letter_to_int['<PAD>'], target_letter_to_int['<PAD>']))

if (len(source_sentences) > 10000):
    display_step = 100 # Check training loss after each of this many batches with large data
else:
    display_step = 20 # Check training loss after each of this many batches with small data

def train(epoch_i):
    
    global train_graph, train_op, cost, input_data, targets, lr
    global source_sequence_length, target_sequence_length, keep_prob
    
    # Test to see if graph already exists
    if os.path.exists(checkpoint + ".meta"):
        print("Reloading existing graph to continue training.")
        reloading = True    
        train_graph = tf.Graph()
    else:
        print("Starting with new graph.")
        reloading = False
        with train_graph.as_default():
            saver = tf.train.Saver()
    
    with tf.Session(graph=train_graph) as sess:    

        if reloading:
            saver = tf.train.import_meta_graph(checkpoint + '.meta')
            saver.restore(sess, checkpoint) 

            # Restore variables
            input_data = train_graph.get_tensor_by_name('input:0')
            targets = train_graph.get_tensor_by_name('targets:0')
            lr = train_graph.get_tensor_by_name('learning_rate:0')
            source_sequence_length = train_graph.get_tensor_by_name('source_sequence_length:0')
            target_sequence_length = train_graph.get_tensor_by_name('target_sequence_length:0')
            keep_prob = train_graph.get_tensor_by_name('keep_prob:0')

            # Grab the optimizer variables that were added to the collection during build
            cost = tf.get_collection("cost")[0]
            train_op = tf.get_collection("train_op")[0]

        else:
            sess.run(tf.global_variables_initializer())

        message = "" # Clear message to be sent in body of email
        
        for batch_i, (targets_batch, sources_batch, targets_lengths, sources_lengths) in enumerate(
                get_batches(train_target, train_source, batch_size,
                           source_letter_to_int['<PAD>'],
                           target_letter_to_int['<PAD>'])):

            # Training step
            _, loss = sess.run(
                [train_op, cost],
                {input_data: sources_batch,
                 targets: targets_batch,
                 lr: learning_rate,
                 target_sequence_length: targets_lengths,
                 source_sequence_length: sources_lengths,
                 keep_prob: keep_probability})

            batch = batch_i + 1 # batch_i starts at zero so batch is the batch number
            
            # Debug message updating us on the status of the training
            if (batch % display_step == 0 and batch > 0) or batch == (len(train_source) // batch_size):

                # Calculate validation cost
                validation_loss = sess.run(
                [cost],
                {input_data: valid_sources_batch,
                 targets: valid_targets_batch,
                 lr: learning_rate,
                 target_sequence_length: valid_targets_lengths,
                 source_sequence_length: valid_sources_lengths,
                 keep_prob: 1.0})

                line = 'Epoch {:>3}/{} Batch {:>6}/{} Inputs (000) {:>7} - Loss: {:>6.3f} - Validation loss: {:>6.3f}'\
                .format(epoch_i, epochs, batch, len(train_source) // batch_size, 
                        (((epoch_i - 1) * len(train_source)) + batch_i * batch_size) // 1000, loss, validation_loss[0])
                print(line)
                message += line + "\n"

        # Save model at the end of each epoch
        print("Saving graph...")
        saver.save(sess, checkpoint)
        
        return message # return message to be sent in body of email

## Prediction
**Start here to use a saved and pre-trained graph.** Load the saved graph and compute some predictions.

In [204]:
# Read batch_size from file
with open(GRAPH_PARAMETERS, 'r') as file:
    try:
        batch_size = int(file.read())
        print("Loaded batch_size = {}".format(batch_size))
    except ValueError:
        batch_size = 128
        print("Unable to load batch_size from file so using default 128.")
            
if (small):
    # There is no validation data for the small set, so just load up the data
    print("Load up the small data.")
    validation_source_sentences, validation_target_sentences = load_small_data()
else:
    
    # Load the validation set and construct the source sentences
    AMOUNT_OF_NOISE = 0.2 / MAX_INPUT_LEN

    validation_target_sentences = open(NEWS_FILE_NAME_VALIDATE, encoding="utf8").read().split("\n")
    validation_source_sentences = open(NEWS_FILE_NAME_VALIDATE, encoding="utf8").read().split("\n")
    # Reduce workload by grabbing first batches only
    # target_sentences = target_sentences[:5*batch_size]
    # source_sentences = source_sentences[:5*batch_size]
    
    # Add the random noise to the source
    for i in range(len(validation_source_sentences)):
        validation_source_sentences[i] = add_noise_to_string(validation_source_sentences[i], AMOUNT_OF_NOISE)
    
print("There are {:,d} validation sentences and {:,.0f} batches.".format(len(validation_source_sentences), 
                                                                         len(validation_source_sentences)//batch_size))
    
print('\nFirst 10 sentence:')
for i in range (0, 10):
    print("\nSource --> " + validation_source_sentences[i])
    print("Target --> " + validation_target_sentences[i])

Loaded batch_size = 128
There are 461,570 validation sentences and 3,606 batches.

First 10 sentence:

Source --> Thenagain, many Roosters fans believe they've already won.
Target --> Then again, many Roosters fans believe they've already won.

Source --> Reembering Two Seminal Kennedy Speeches
Target --> Remembering Two Seminal Kennedy Speeches

Source --> ThaU would be a stretch to say I've seen it.
Target --> That would be a stretch to say I've seen it.

Source --> Solemn Skies
Target --> Solemn Skies

Source --> Tropical Storm Fernand Heads to Mexico's Coast
Target --> Tropical Storm Fernand Heads to Mexico's Coast

Source --> But mate I want to keep going with what I am doing.
Target --> But mate I want to keep going with what I am doing.

Source --> Jimi Hendrix makes highest chart debut since 1969
Target --> Jimi Hendrix makes highest chart debut since 1969

Source --> Iran: pas de bouleversements, mais un changement de style
Target --> Iran: pas de bouleversements, mais un chan

In [205]:
# def source_to_seq(text, length):
#     '''Prepare the text for the model'''
# #     sequence_length = 7 # don't understand why set to 7
# #     sequence_length = 60
#     return [source_letter_to_int.get(word, source_letter_to_int['<UNK>']) for word in text] \
# + [source_letter_to_int['<PAD>']]*(length-len(text))

In [206]:
def get_accuracy(source_sentences, target_sentences):
    
    # Convert sentences to IDs
    source_letter_ids, target_letter_ids = produce_letter_ids(source_sentences, target_sentences)

    pad = source_letter_to_int["<PAD>"]
    eos = source_letter_to_int["<EOS>"]
    matches = 0
    total = 0
    display_step = 10

    loaded_graph = tf.Graph()
    with tf.Session(graph=loaded_graph) as sess:

        # Load saved model
        loader = tf.train.import_meta_graph(checkpoint + '.meta')
        loader.restore(sess, checkpoint)

        # Load graph variables
        input_data = loaded_graph.get_tensor_by_name('input:0')
        logits = loaded_graph.get_tensor_by_name('predictions:0')
        source_sequence_length = loaded_graph.get_tensor_by_name('source_sequence_length:0')
        target_sequence_length = loaded_graph.get_tensor_by_name('target_sequence_length:0')
        keep_prob = loaded_graph.get_tensor_by_name('keep_prob:0')

        for batch_i,(targets_batch, sources_batch, targets_lengths, sources_lengths) \
        in enumerate(get_batches(target_letter_ids, source_letter_ids, batch_size, 
                                 source_letter_to_int['<PAD>'], target_letter_to_int['<PAD>'])):

            # Run the inference decoder; 'predictions' holds predicted ids (sample_id), not logits
            answer_logits = sess.run(logits, {input_data: sources_batch, 
                                              target_sequence_length: targets_lengths, 
                                              source_sequence_length: sources_lengths,
                                              keep_prob: 1.0})

            for n in range(batch_size):
                answer = "".join([target_int_to_letter[i] for i in answer_logits[n] if (i != pad and i != eos)])
                target = target_sentences[batch_i * batch_size + n]
                total += 1
                if (answer == target):
                    matches += 1

            if batch_i % display_step == 0 and batch_i > 0:
                print('Batch {:>6}/{} - Accuracy: {:.1%}'.format(batch_i, 
                                                                 len(source_sentences)//batch_size, 
                                                                 matches/total))

        print("Final accuracy = {:.1%}\n".format(matches/total))
        
        return matches/total
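`get_accuracy` uses a strict, sentence-level metric: after dropping `<PAD>` and `<EOS>` ids, a prediction counts only if the whole decoded string equals the target. A small sketch of the same scoring rule with a toy vocabulary (hypothetical), showing that a single wrong character turns the whole sentence into a miss:

```python
def ids_to_text(ids, int_to_letter, skip_ids):
    """Convert predicted ids to a string, dropping special ids such as <PAD> and <EOS>."""
    return "".join(int_to_letter[i] for i in ids if i not in skip_ids)

def exact_match_accuracy(predicted_id_rows, target_sentences, int_to_letter, skip_ids):
    """Fraction of sentences whose decoded prediction equals the target exactly."""
    matches = sum(ids_to_text(row, int_to_letter, skip_ids) == target
                  for row, target in zip(predicted_id_rows, target_sentences))
    return matches / len(target_sentences)

int_to_letter = {0: "<PAD>", 1: "<EOS>", 2: "c", 3: "a", 4: "t"}  # toy vocabulary
predictions = [[2, 3, 4, 1, 0],   # "cat"
               [2, 4, 4, 1, 0]]   # "ctt": one wrong character
print(exact_match_accuracy(predictions, ["cat", "cat"], int_to_letter, skip_ids={0, 1}))  # 0.5
```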

# Train graph by looping through epochs
Compute the accuracy after each epoch and report it by email.
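The elapsed-time report in the loop below splits a number of seconds into hours, minutes and seconds with two `divmod` calls. A sketch of the same arithmetic as a stand-alone function (`format_elapsed` is a hypothetical name):

```python
def format_elapsed(seconds):
    """Split a duration in seconds into h/m/s using the same divmod steps as the training loop."""
    m, s = divmod(seconds, 60)  # whole minutes and leftover seconds
    h, m = divmod(m, 60)        # whole hours and leftover minutes
    return "{}h:{}m:{}s".format(int(h), int(m), int(s))

print(format_elapsed(3725.4))  # 1h:2m:5s
```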

In [207]:
import time
from boto.utils import get_instance_metadata

start = time.time()
metadata = get_instance_metadata(timeout=1.0, num_retries=2)

# Run through all the epochs, computing the accuracy after each one and sending the results via email
for epoch_i in range(1, epochs + 1):
    
    message = get_hyperparameters_message()
    message += train(epoch_i)

    # Print time spent training the model
    end = time.time()
    seconds = end - start
    m, s = divmod(seconds, 60)
    h, m = divmod(m, 60)
    print("Model Trained in {}h:{}m:{}s and Saved".format(int(h), int(m), int(s)))
    message += "\nModel training for {}h:{}m:{}s and saved.".format(int(h), int(m), int(s))
    
    # Get current accuracy
    accuracy = get_accuracy(validation_source_sentences, validation_target_sentences)
    message += "\nCurrent accuracy = {:.1%}".format(accuracy)
    
    # Send email updates if using AWS
    if len(metadata.keys()) > 0:
        subject = "Completed training epoch {} - Accuracy = {:.1%}".format(epoch_i, accuracy)
        if(small):
            if(epoch_i % 10 == 0): # Only send email every 10 epochs when using small data
                send_email(subject, message)
        else: # Send an email after every epoch with large data
            send_email(subject, message)
    
print("\nTraining completed.")

Starting with new graph.
Epoch   1/4 Batch    100/32453 Inputs (000)      12 - Loss:  1.881 - Validation loss:  1.699
Epoch   1/4 Batch    200/32453 Inputs (000)      25 - Loss:  1.561 - Validation loss:  1.520
Epoch   1/4 Batch    300/32453 Inputs (000)      38 - Loss:  1.459 - Validation loss:  1.410
Epoch   1/4 Batch    400/32453 Inputs (000)      51 - Loss:  1.379 - Validation loss:  1.332
Epoch   1/4 Batch    500/32453 Inputs (000)      63 - Loss:  1.360 - Validation loss:  1.284
Epoch   1/4 Batch    600/32453 Inputs (000)      76 - Loss:  1.303 - Validation loss:  1.238
Epoch   1/4 Batch    700/32453 Inputs (000)      89 - Loss:  1.208 - Validation loss:  1.207
Epoch   1/4 Batch    800/32453 Inputs (000)     102 - Loss:  1.261 - Validation loss:  1.171
Epoch   1/4 Batch    900/32453 Inputs (000)     115 - Loss:  1.213 - Validation loss:  1.140
Epoch   1/4 Batch   1000/32453 Inputs (000)     127 - Loss:  1.202 - Validation loss:  1.112
Epoch   1/4 Batch   1100/32453 Inputs (000)  

Epoch   1/4 Batch   8900/32453 Inputs (000)    1139 - Loss:  0.265 - Validation loss:  0.198
Epoch   1/4 Batch   9000/32453 Inputs (000)    1151 - Loss:  0.289 - Validation loss:  0.199
Epoch   1/4 Batch   9100/32453 Inputs (000)    1164 - Loss:  0.256 - Validation loss:  0.200
Epoch   1/4 Batch   9200/32453 Inputs (000)    1177 - Loss:  0.281 - Validation loss:  0.195
Epoch   1/4 Batch   9300/32453 Inputs (000)    1190 - Loss:  0.240 - Validation loss:  0.195
Epoch   1/4 Batch   9400/32453 Inputs (000)    1203 - Loss:  0.250 - Validation loss:  0.196
Epoch   1/4 Batch   9500/32453 Inputs (000)    1215 - Loss:  0.243 - Validation loss:  0.191
Epoch   1/4 Batch   9600/32453 Inputs (000)    1228 - Loss:  0.265 - Validation loss:  0.187
Epoch   1/4 Batch   9700/32453 Inputs (000)    1241 - Loss:  0.254 - Validation loss:  0.190
Epoch   1/4 Batch   9800/32453 Inputs (000)    1254 - Loss:  0.235 - Validation loss:  0.190
Epoch   1/4 Batch   9900/32453 Inputs (000)    1267 - Loss:  0.246 - V

Epoch   1/4 Batch  17800/32453 Inputs (000)    2278 - Loss:  0.145 - Validation loss:  0.110
Epoch   1/4 Batch  17900/32453 Inputs (000)    2291 - Loss:  0.154 - Validation loss:  0.106
Epoch   1/4 Batch  18000/32453 Inputs (000)    2303 - Loss:  0.159 - Validation loss:  0.105
Epoch   1/4 Batch  18100/32453 Inputs (000)    2316 - Loss:  0.143 - Validation loss:  0.105
Epoch   1/4 Batch  18200/32453 Inputs (000)    2329 - Loss:  0.168 - Validation loss:  0.104
Epoch   1/4 Batch  18300/32453 Inputs (000)    2342 - Loss:  0.187 - Validation loss:  0.106
Epoch   1/4 Batch  18400/32453 Inputs (000)    2355 - Loss:  0.138 - Validation loss:  0.104
Epoch   1/4 Batch  18500/32453 Inputs (000)    2367 - Loss:  0.115 - Validation loss:  0.106
Epoch   1/4 Batch  18600/32453 Inputs (000)    2380 - Loss:  0.130 - Validation loss:  0.106
Epoch   1/4 Batch  18700/32453 Inputs (000)    2393 - Loss:  0.135 - Validation loss:  0.104
Epoch   1/4 Batch  18800/32453 Inputs (000)    2406 - Loss:  0.113 - V

Epoch   1/4 Batch  26700/32453 Inputs (000)    3417 - Loss:  0.094 - Validation loss:  0.074
Epoch   1/4 Batch  26800/32453 Inputs (000)    3430 - Loss:  0.094 - Validation loss:  0.074
Epoch   1/4 Batch  26900/32453 Inputs (000)    3443 - Loss:  0.104 - Validation loss:  0.076
Epoch   1/4 Batch  27000/32453 Inputs (000)    3455 - Loss:  0.114 - Validation loss:  0.077
Epoch   1/4 Batch  27100/32453 Inputs (000)    3468 - Loss:  0.135 - Validation loss:  0.074
Epoch   1/4 Batch  27200/32453 Inputs (000)    3481 - Loss:  0.122 - Validation loss:  0.075
Epoch   1/4 Batch  27300/32453 Inputs (000)    3494 - Loss:  0.110 - Validation loss:  0.071
Epoch   1/4 Batch  27400/32453 Inputs (000)    3507 - Loss:  0.080 - Validation loss:  0.074
Epoch   1/4 Batch  27500/32453 Inputs (000)    3519 - Loss:  0.101 - Validation loss:  0.073
Epoch   1/4 Batch  27600/32453 Inputs (000)    3532 - Loss:  0.100 - Validation loss:  0.073
Epoch   1/4 Batch  27700/32453 Inputs (000)    3545 - Loss:  0.109 - V

Batch    730/3606 - Accuracy: 53.9%
Batch    740/3606 - Accuracy: 53.9%
Batch    750/3606 - Accuracy: 53.9%
Batch    760/3606 - Accuracy: 53.9%
Batch    770/3606 - Accuracy: 53.9%
Batch    780/3606 - Accuracy: 53.9%
Batch    790/3606 - Accuracy: 53.8%
Batch    800/3606 - Accuracy: 53.8%
Batch    810/3606 - Accuracy: 53.8%
Batch    820/3606 - Accuracy: 53.8%
Batch    830/3606 - Accuracy: 53.8%
Batch    840/3606 - Accuracy: 53.9%
Batch    850/3606 - Accuracy: 53.9%
Batch    860/3606 - Accuracy: 53.9%
Batch    870/3606 - Accuracy: 53.9%
Batch    880/3606 - Accuracy: 53.9%
Batch    890/3606 - Accuracy: 53.9%
Batch    900/3606 - Accuracy: 53.9%
Batch    910/3606 - Accuracy: 53.9%
Batch    920/3606 - Accuracy: 53.9%
Batch    930/3606 - Accuracy: 53.9%
Batch    940/3606 - Accuracy: 53.9%
Batch    950/3606 - Accuracy: 53.9%
Batch    960/3606 - Accuracy: 53.9%
Batch    970/3606 - Accuracy: 53.9%
Batch    980/3606 - Accuracy: 53.9%
Batch    990/3606 - Accuracy: 53.9%
Batch   1000/3606 - Accuracy

Batch   3010/3606 - Accuracy: 53.8%
Batch   3020/3606 - Accuracy: 53.8%
Batch   3030/3606 - Accuracy: 53.8%
Batch   3040/3606 - Accuracy: 53.8%
Batch   3050/3606 - Accuracy: 53.8%
Batch   3060/3606 - Accuracy: 53.8%
Batch   3070/3606 - Accuracy: 53.8%
Batch   3080/3606 - Accuracy: 53.8%
Batch   3090/3606 - Accuracy: 53.8%
Batch   3100/3606 - Accuracy: 53.8%
Batch   3110/3606 - Accuracy: 53.8%
Batch   3120/3606 - Accuracy: 53.8%
Batch   3130/3606 - Accuracy: 53.8%
Batch   3140/3606 - Accuracy: 53.8%
Batch   3150/3606 - Accuracy: 53.8%
Batch   3160/3606 - Accuracy: 53.8%
Batch   3170/3606 - Accuracy: 53.8%
Batch   3180/3606 - Accuracy: 53.8%
Batch   3190/3606 - Accuracy: 53.8%
Batch   3200/3606 - Accuracy: 53.8%
Batch   3210/3606 - Accuracy: 53.8%
Batch   3220/3606 - Accuracy: 53.8%
Batch   3230/3606 - Accuracy: 53.8%
Batch   3240/3606 - Accuracy: 53.8%
Batch   3250/3606 - Accuracy: 53.8%
Batch   3260/3606 - Accuracy: 53.8%
Batch   3270/3606 - Accuracy: 53.8%
Batch   3280/3606 - Accuracy

Epoch   2/4 Batch   6500/32453 Inputs (000)    4985 - Loss:  0.082 - Validation loss:  0.062
Epoch   2/4 Batch   6600/32453 Inputs (000)    4998 - Loss:  0.065 - Validation loss:  0.058
Epoch   2/4 Batch   6700/32453 Inputs (000)    5011 - Loss:  0.096 - Validation loss:  0.057
Epoch   2/4 Batch   6800/32453 Inputs (000)    5024 - Loss:  0.079 - Validation loss:  0.057
Epoch   2/4 Batch   6900/32453 Inputs (000)    5037 - Loss:  0.083 - Validation loss:  0.060
Epoch   2/4 Batch   7000/32453 Inputs (000)    5049 - Loss:  0.099 - Validation loss:  0.060
Epoch   2/4 Batch   7100/32453 Inputs (000)    5062 - Loss:  0.087 - Validation loss:  0.057
Epoch   2/4 Batch   7200/32453 Inputs (000)    5075 - Loss:  0.089 - Validation loss:  0.058
Epoch   2/4 Batch   7300/32453 Inputs (000)    5088 - Loss:  0.072 - Validation loss:  0.059
Epoch   2/4 Batch   7400/32453 Inputs (000)    5101 - Loss:  0.087 - Validation loss:  0.060
Epoch   2/4 Batch   7500/32453 Inputs (000)    5113 - Loss:  0.072 - V

Epoch   2/4 Batch  15400/32453 Inputs (000)    6125 - Loss:  0.093 - Validation loss:  0.052
Epoch   2/4 Batch  15500/32453 Inputs (000)    6137 - Loss:  0.078 - Validation loss:  0.052
Epoch   2/4 Batch  15600/32453 Inputs (000)    6150 - Loss:  0.070 - Validation loss:  0.051
Epoch   2/4 Batch  15700/32453 Inputs (000)    6163 - Loss:  0.074 - Validation loss:  0.053
Epoch   2/4 Batch  15800/32453 Inputs (000)    6176 - Loss:  0.073 - Validation loss:  0.053
Epoch   2/4 Batch  15900/32453 Inputs (000)    6189 - Loss:  0.074 - Validation loss:  0.049
Epoch   2/4 Batch  16000/32453 Inputs (000)    6201 - Loss:  0.061 - Validation loss:  0.050
Epoch   2/4 Batch  16100/32453 Inputs (000)    6214 - Loss:  0.069 - Validation loss:  0.049
Epoch   2/4 Batch  16200/32453 Inputs (000)    6227 - Loss:  0.089 - Validation loss:  0.048
Epoch   2/4 Batch  16300/32453 Inputs (000)    6240 - Loss:  0.079 - Validation loss:  0.051
Epoch   2/4 Batch  16400/32453 Inputs (000)    6253 - Loss:  0.076 - V

Epoch   2/4 Batch  24300/32453 Inputs (000)    7264 - Loss:  0.058 - Validation loss:  0.045
Epoch   2/4 Batch  24400/32453 Inputs (000)    7277 - Loss:  0.074 - Validation loss:  0.044
Epoch   2/4 Batch  24500/32453 Inputs (000)    7289 - Loss:  0.065 - Validation loss:  0.045
Epoch   2/4 Batch  24600/32453 Inputs (000)    7302 - Loss:  0.072 - Validation loss:  0.044
Epoch   2/4 Batch  24700/32453 Inputs (000)    7315 - Loss:  0.063 - Validation loss:  0.043
Epoch   2/4 Batch  24800/32453 Inputs (000)    7328 - Loss:  0.071 - Validation loss:  0.044
Epoch   2/4 Batch  24900/32453 Inputs (000)    7341 - Loss:  0.078 - Validation loss:  0.046
Epoch   2/4 Batch  25000/32453 Inputs (000)    7353 - Loss:  0.076 - Validation loss:  0.046
Epoch   2/4 Batch  25100/32453 Inputs (000)    7366 - Loss:  0.051 - Validation loss:  0.043
Epoch   2/4 Batch  25200/32453 Inputs (000)    7379 - Loss:  0.063 - Validation loss:  0.043
Epoch   2/4 Batch  25300/32453 Inputs (000)    7392 - Loss:  0.077 - V

Batch    110/3606 - Accuracy: 64.6%
Batch    120/3606 - Accuracy: 64.5%
Batch    130/3606 - Accuracy: 64.6%
Batch    140/3606 - Accuracy: 64.6%
Batch    150/3606 - Accuracy: 64.6%
Batch    160/3606 - Accuracy: 64.7%
Batch    170/3606 - Accuracy: 64.5%
Batch    180/3606 - Accuracy: 64.6%
Batch    190/3606 - Accuracy: 64.6%
Batch    200/3606 - Accuracy: 64.6%
Batch    210/3606 - Accuracy: 64.6%
Batch    220/3606 - Accuracy: 64.6%
Batch    230/3606 - Accuracy: 64.6%
Batch    240/3606 - Accuracy: 64.6%
Batch    250/3606 - Accuracy: 64.6%
Batch    260/3606 - Accuracy: 64.5%
Batch    270/3606 - Accuracy: 64.6%
Batch    280/3606 - Accuracy: 64.5%
Batch    290/3606 - Accuracy: 64.5%
Batch    300/3606 - Accuracy: 64.6%
Batch    310/3606 - Accuracy: 64.5%
Batch    320/3606 - Accuracy: 64.5%
Batch    330/3606 - Accuracy: 64.5%
Batch    340/3606 - Accuracy: 64.5%
Batch    350/3606 - Accuracy: 64.5%
Batch    360/3606 - Accuracy: 64.5%
Batch    370/3606 - Accuracy: 64.5%

Batch   2390/3606 - Accuracy: 64.5%
Batch   2400/3606 - Accuracy: 64.5%
Batch   2410/3606 - Accuracy: 64.5%
Batch   2420/3606 - Accuracy: 64.5%
Batch   2430/3606 - Accuracy: 64.5%
Batch   2440/3606 - Accuracy: 64.5%
Batch   2450/3606 - Accuracy: 64.5%
Batch   2460/3606 - Accuracy: 64.5%
Batch   2470/3606 - Accuracy: 64.5%
Batch   2480/3606 - Accuracy: 64.5%
Batch   2490/3606 - Accuracy: 64.5%
Batch   2500/3606 - Accuracy: 64.5%
Batch   2510/3606 - Accuracy: 64.5%
Batch   2520/3606 - Accuracy: 64.5%
Batch   2530/3606 - Accuracy: 64.5%
Batch   2540/3606 - Accuracy: 64.5%
Batch   2550/3606 - Accuracy: 64.5%
Batch   2560/3606 - Accuracy: 64.5%
Batch   2570/3606 - Accuracy: 64.5%
Batch   2580/3606 - Accuracy: 64.5%
Batch   2590/3606 - Accuracy: 64.5%
Batch   2600/3606 - Accuracy: 64.5%
Batch   2610/3606 - Accuracy: 64.5%
Batch   2620/3606 - Accuracy: 64.5%
Batch   2630/3606 - Accuracy: 64.5%
Batch   2640/3606 - Accuracy: 64.5%
Batch   2650/3606 - Accuracy: 64.5%

Epoch   3/4 Batch   4100/32453 Inputs (000)    8832 - Loss:  0.067 - Validation loss:  0.039
Epoch   3/4 Batch   4200/32453 Inputs (000)    8845 - Loss:  0.063 - Validation loss:  0.039
Epoch   3/4 Batch   4300/32453 Inputs (000)    8858 - Loss:  0.050 - Validation loss:  0.042
Epoch   3/4 Batch   4400/32453 Inputs (000)    8871 - Loss:  0.064 - Validation loss:  0.042
Epoch   3/4 Batch   4500/32453 Inputs (000)    8883 - Loss:  0.056 - Validation loss:  0.043
Epoch   3/4 Batch   4600/32453 Inputs (000)    8896 - Loss:  0.062 - Validation loss:  0.041
Epoch   3/4 Batch   4700/32453 Inputs (000)    8909 - Loss:  0.056 - Validation loss:  0.042
Epoch   3/4 Batch   4800/32453 Inputs (000)    8922 - Loss:  0.047 - Validation loss:  0.039
Epoch   3/4 Batch   4900/32453 Inputs (000)    8935 - Loss:  0.053 - Validation loss:  0.038
Epoch   3/4 Batch   5000/32453 Inputs (000)    8947 - Loss:  0.055 - Validation loss:  0.039
Epoch   3/4 Batch   5100/32453 Inputs (000)    8960 - Loss:  0.052 - V

Epoch   3/4 Batch  13000/32453 Inputs (000)    9971 - Loss:  0.068 - Validation loss:  0.037
Epoch   3/4 Batch  13100/32453 Inputs (000)    9984 - Loss:  0.061 - Validation loss:  0.037
Epoch   3/4 Batch  13200/32453 Inputs (000)    9997 - Loss:  0.054 - Validation loss:  0.037
Epoch   3/4 Batch  13300/32453 Inputs (000)   10010 - Loss:  0.064 - Validation loss:  0.039
Epoch   3/4 Batch  13400/32453 Inputs (000)   10023 - Loss:  0.057 - Validation loss:  0.038
Epoch   3/4 Batch  13500/32453 Inputs (000)   10035 - Loss:  0.052 - Validation loss:  0.037
Epoch   3/4 Batch  13600/32453 Inputs (000)   10048 - Loss:  0.053 - Validation loss:  0.039
Epoch   3/4 Batch  13700/32453 Inputs (000)   10061 - Loss:  0.077 - Validation loss:  0.041
Epoch   3/4 Batch  13800/32453 Inputs (000)   10074 - Loss:  0.065 - Validation loss:  0.037
Epoch   3/4 Batch  13900/32453 Inputs (000)   10087 - Loss:  0.062 - Validation loss:  0.039
Epoch   3/4 Batch  14000/32453 Inputs (000)   10099 - Loss:  0.056 - V

Epoch   3/4 Batch  21900/32453 Inputs (000)   11111 - Loss:  0.059 - Validation loss:  0.036
Epoch   3/4 Batch  22000/32453 Inputs (000)   11123 - Loss:  0.055 - Validation loss:  0.038
Epoch   3/4 Batch  22100/32453 Inputs (000)   11136 - Loss:  0.045 - Validation loss:  0.039
Epoch   3/4 Batch  22200/32453 Inputs (000)   11149 - Loss:  0.050 - Validation loss:  0.035
Epoch   3/4 Batch  22300/32453 Inputs (000)   11162 - Loss:  0.056 - Validation loss:  0.037
Epoch   3/4 Batch  22400/32453 Inputs (000)   11175 - Loss:  0.058 - Validation loss:  0.034
Epoch   3/4 Batch  22500/32453 Inputs (000)   11187 - Loss:  0.058 - Validation loss:  0.036
Epoch   3/4 Batch  22600/32453 Inputs (000)   11200 - Loss:  0.053 - Validation loss:  0.038
Epoch   3/4 Batch  22700/32453 Inputs (000)   11213 - Loss:  0.057 - Validation loss:  0.035
Epoch   3/4 Batch  22800/32453 Inputs (000)   11226 - Loss:  0.056 - Validation loss:  0.035
Epoch   3/4 Batch  22900/32453 Inputs (000)   11239 - Loss:  0.057 - V

Epoch   3/4 Batch  30800/32453 Inputs (000)   12250 - Loss:  0.041 - Validation loss:  0.034
Epoch   3/4 Batch  30900/32453 Inputs (000)   12263 - Loss:  0.048 - Validation loss:  0.034
Epoch   3/4 Batch  31000/32453 Inputs (000)   12275 - Loss:  0.045 - Validation loss:  0.034
Epoch   3/4 Batch  31100/32453 Inputs (000)   12288 - Loss:  0.045 - Validation loss:  0.035
Epoch   3/4 Batch  31200/32453 Inputs (000)   12301 - Loss:  0.061 - Validation loss:  0.035
Epoch   3/4 Batch  31300/32453 Inputs (000)   12314 - Loss:  0.032 - Validation loss:  0.034
Epoch   3/4 Batch  31400/32453 Inputs (000)   12327 - Loss:  0.054 - Validation loss:  0.035
Epoch   3/4 Batch  31500/32453 Inputs (000)   12339 - Loss:  0.055 - Validation loss:  0.032
Epoch   3/4 Batch  31600/32453 Inputs (000)   12352 - Loss:  0.051 - Validation loss:  0.033
Epoch   3/4 Batch  31700/32453 Inputs (000)   12365 - Loss:  0.051 - Validation loss:  0.036
Epoch   3/4 Batch  31800/32453 Inputs (000)   12378 - Loss:  0.056 - V

Batch   1790/3606 - Accuracy: 68.9%
Batch   1800/3606 - Accuracy: 68.9%
Batch   1810/3606 - Accuracy: 69.0%
Batch   1820/3606 - Accuracy: 68.9%
Batch   1830/3606 - Accuracy: 69.0%
Batch   1840/3606 - Accuracy: 69.0%
Batch   1850/3606 - Accuracy: 69.0%
Batch   1860/3606 - Accuracy: 69.0%
Batch   1870/3606 - Accuracy: 69.0%
Batch   1880/3606 - Accuracy: 69.0%
Batch   1890/3606 - Accuracy: 69.0%
Batch   1900/3606 - Accuracy: 69.0%
Batch   1910/3606 - Accuracy: 69.0%
Batch   1920/3606 - Accuracy: 69.0%
Batch   1930/3606 - Accuracy: 69.0%
Batch   1940/3606 - Accuracy: 69.0%
Batch   1950/3606 - Accuracy: 69.0%
Batch   1960/3606 - Accuracy: 69.0%
Batch   1970/3606 - Accuracy: 69.0%
Batch   1980/3606 - Accuracy: 69.0%
Batch   1990/3606 - Accuracy: 69.0%
Batch   2000/3606 - Accuracy: 69.0%
Batch   2010/3606 - Accuracy: 69.0%
Batch   2020/3606 - Accuracy: 69.0%
Batch   2030/3606 - Accuracy: 69.0%
Batch   2040/3606 - Accuracy: 69.0%
Batch   2050/3606 - Accuracy: 69.0%

Epoch   4/4 Batch   1800/32453 Inputs (000)   12692 - Loss:  0.058 - Validation loss:  0.034
Epoch   4/4 Batch   1900/32453 Inputs (000)   12705 - Loss:  0.051 - Validation loss:  0.031
Epoch   4/4 Batch   2000/32453 Inputs (000)   12717 - Loss:  0.054 - Validation loss:  0.032
Epoch   4/4 Batch   2100/32453 Inputs (000)   12730 - Loss:  0.053 - Validation loss:  0.034
Epoch   4/4 Batch   2200/32453 Inputs (000)   12743 - Loss:  0.048 - Validation loss:  0.032
Epoch   4/4 Batch   2300/32453 Inputs (000)   12756 - Loss:  0.050 - Validation loss:  0.034
Epoch   4/4 Batch   2400/32453 Inputs (000)   12769 - Loss:  0.051 - Validation loss:  0.034
Epoch   4/4 Batch   2500/32453 Inputs (000)   12781 - Loss:  0.045 - Validation loss:  0.033
Epoch   4/4 Batch   2600/32453 Inputs (000)   12794 - Loss:  0.050 - Validation loss:  0.034
Epoch   4/4 Batch   2700/32453 Inputs (000)   12807 - Loss:  0.046 - Validation loss:  0.034
Epoch   4/4 Batch   2800/32453 Inputs (000)   12820 - Loss:  0.057 - V

Epoch   4/4 Batch  10700/32453 Inputs (000)   13831 - Loss:  0.042 - Validation loss:  0.031
Epoch   4/4 Batch  10800/32453 Inputs (000)   13844 - Loss:  0.048 - Validation loss:  0.033
Epoch   4/4 Batch  10900/32453 Inputs (000)   13857 - Loss:  0.052 - Validation loss:  0.032
Epoch   4/4 Batch  11000/32453 Inputs (000)   13869 - Loss:  0.048 - Validation loss:  0.033
Epoch   4/4 Batch  11100/32453 Inputs (000)   13882 - Loss:  0.062 - Validation loss:  0.033
Epoch   4/4 Batch  11200/32453 Inputs (000)   13895 - Loss:  0.057 - Validation loss:  0.032
Epoch   4/4 Batch  11300/32453 Inputs (000)   13908 - Loss:  0.053 - Validation loss:  0.032
Epoch   4/4 Batch  11400/32453 Inputs (000)   13921 - Loss:  0.051 - Validation loss:  0.031
Epoch   4/4 Batch  11500/32453 Inputs (000)   13933 - Loss:  0.041 - Validation loss:  0.033
Epoch   4/4 Batch  11600/32453 Inputs (000)   13946 - Loss:  0.067 - Validation loss:  0.035
Epoch   4/4 Batch  11700/32453 Inputs (000)   13959 - Loss:  0.037 - V

Epoch   4/4 Batch  19600/32453 Inputs (000)   14970 - Loss:  0.042 - Validation loss:  0.032
Epoch   4/4 Batch  19700/32453 Inputs (000)   14983 - Loss:  0.050 - Validation loss:  0.030
Epoch   4/4 Batch  19800/32453 Inputs (000)   14996 - Loss:  0.040 - Validation loss:  0.033
Epoch   4/4 Batch  19900/32453 Inputs (000)   15009 - Loss:  0.050 - Validation loss:  0.029
Epoch   4/4 Batch  20000/32453 Inputs (000)   15021 - Loss:  0.046 - Validation loss:  0.031
Epoch   4/4 Batch  20100/32453 Inputs (000)   15034 - Loss:  0.050 - Validation loss:  0.030
Epoch   4/4 Batch  20200/32453 Inputs (000)   15047 - Loss:  0.042 - Validation loss:  0.034
Epoch   4/4 Batch  20300/32453 Inputs (000)   15060 - Loss:  0.046 - Validation loss:  0.032
Epoch   4/4 Batch  20400/32453 Inputs (000)   15073 - Loss:  0.055 - Validation loss:  0.032
Epoch   4/4 Batch  20500/32453 Inputs (000)   15085 - Loss:  0.034 - Validation loss:  0.029
Epoch   4/4 Batch  20600/32453 Inputs (000)   15098 - Loss:  0.052 - V

Epoch   4/4 Batch  28500/32453 Inputs (000)   16109 - Loss:  0.069 - Validation loss:  0.030
Epoch   4/4 Batch  28600/32453 Inputs (000)   16122 - Loss:  0.046 - Validation loss:  0.030
Epoch   4/4 Batch  28700/32453 Inputs (000)   16135 - Loss:  0.039 - Validation loss:  0.032
Epoch   4/4 Batch  28800/32453 Inputs (000)   16148 - Loss:  0.054 - Validation loss:  0.030
Epoch   4/4 Batch  28900/32453 Inputs (000)   16161 - Loss:  0.050 - Validation loss:  0.031
Epoch   4/4 Batch  29000/32453 Inputs (000)   16173 - Loss:  0.044 - Validation loss:  0.033
Epoch   4/4 Batch  29100/32453 Inputs (000)   16186 - Loss:  0.043 - Validation loss:  0.031
Epoch   4/4 Batch  29200/32453 Inputs (000)   16199 - Loss:  0.053 - Validation loss:  0.032
Epoch   4/4 Batch  29300/32453 Inputs (000)   16212 - Loss:  0.041 - Validation loss:  0.031
Epoch   4/4 Batch  29400/32453 Inputs (000)   16225 - Loss:  0.045 - Validation loss:  0.030
Epoch   4/4 Batch  29500/32453 Inputs (000)   16237 - Loss:  0.041 - V

Batch   1200/3606 - Accuracy: 71.7%
Batch   1210/3606 - Accuracy: 71.7%
Batch   1220/3606 - Accuracy: 71.7%
Batch   1230/3606 - Accuracy: 71.7%
Batch   1240/3606 - Accuracy: 71.7%
Batch   1250/3606 - Accuracy: 71.7%
Batch   1260/3606 - Accuracy: 71.7%
Batch   1270/3606 - Accuracy: 71.7%
Batch   1280/3606 - Accuracy: 71.6%
Batch   1290/3606 - Accuracy: 71.7%
Batch   1300/3606 - Accuracy: 71.6%
Batch   1310/3606 - Accuracy: 71.7%
Batch   1320/3606 - Accuracy: 71.7%
Batch   1330/3606 - Accuracy: 71.6%
Batch   1340/3606 - Accuracy: 71.7%
Batch   1350/3606 - Accuracy: 71.6%
Batch   1360/3606 - Accuracy: 71.6%
Batch   1370/3606 - Accuracy: 71.6%
Batch   1380/3606 - Accuracy: 71.6%
Batch   1390/3606 - Accuracy: 71.6%
Batch   1400/3606 - Accuracy: 71.6%
Batch   1410/3606 - Accuracy: 71.6%
Batch   1420/3606 - Accuracy: 71.6%
Batch   1430/3606 - Accuracy: 71.6%
Batch   1440/3606 - Accuracy: 71.6%
Batch   1450/3606 - Accuracy: 71.6%
Batch   1460/3606 - Accuracy: 71.6%

Batch   3480/3606 - Accuracy: 71.6%
Batch   3490/3606 - Accuracy: 71.6%
Batch   3500/3606 - Accuracy: 71.6%
Batch   3510/3606 - Accuracy: 71.6%
Batch   3520/3606 - Accuracy: 71.6%
Batch   3530/3606 - Accuracy: 71.6%
Batch   3540/3606 - Accuracy: 71.6%
Batch   3550/3606 - Accuracy: 71.6%
Batch   3560/3606 - Accuracy: 71.6%
Batch   3570/3606 - Accuracy: 71.6%
Batch   3580/3606 - Accuracy: 71.6%
Batch   3590/3606 - Accuracy: 71.6%
Batch   3600/3606 - Accuracy: 71.6%
Final accuracy = 71.6%


Training completed.
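The per-batch "Accuracy" figures in the evaluation passes above drift slowly and settle toward the final value, which is the behaviour a cumulative (running) average over all batches evaluated so far would produce. The sketch below shows that calculation under that assumption only; the notebook does not confirm how its accuracy is computed, and `running_accuracies` and `format_line` are hypothetical helpers, not functions from this code. The per-batch counts of correctly spelled outputs are assumed inputs.

```python
from typing import List, Tuple


def running_accuracies(batch_correct: List[int],
                       batch_sizes: List[int],
                       log_every: int = 10) -> List[Tuple[int, float]]:
    """Cumulative accuracy (%) over all batches seen so far, sampled every `log_every` batches.

    Hypothetical helper: assumes the logged accuracy is a running average,
    which the notebook itself does not confirm.
    """
    total_correct = 0
    total_seen = 0
    history = []
    for i, (correct, size) in enumerate(zip(batch_correct, batch_sizes), start=1):
        total_correct += correct   # correct sentences in this batch
        total_seen += size         # sentences evaluated so far
        if i % log_every == 0:
            history.append((i, 100.0 * total_correct / total_seen))
    return history


def format_line(batch: int, total_batches: int, accuracy: float) -> str:
    """Format one progress line to resemble the log output above (hypothetical)."""
    return f"Batch {batch:6d}/{total_batches} - Accuracy: {accuracy:.1f}%"


if __name__ == "__main__":
    # Toy data: 4 batches of 1 sentence each, logging every 2 batches.
    for batch, acc in running_accuracies([1, 0, 1, 1], [1, 1, 1, 1], log_every=2):
        print(format_line(batch, 4, acc))
```

Under this assumption, each new batch shifts the running figure less as the number of evaluated sentences grows, which would explain why the accuracy barely changes late in each 3606-batch pass.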
