# Language correction - missing articles

This notebook demonstrates a working example of applying a character-level sequence-to-sequence model with attention to a language correction problem. We limit the scope of the problem to fixing a single missing article (a, an, the): given an input text with an article missing, the model recommends a corrected text as output.

In particular, this notebook performs the following steps. Note that most of the code lives in separate Python files inside the same folder (data_iterator.py, preprocessor.py, tf_graph.py, and training_manager.py).
- Initial data investigation to see what the data look like
- Data preprocessing, i.e., converting the raw data to a format that can be fed to a TensorFlow graph for recurrent network modelling
- Splitting the data into three parts: 1) training set, 2) validation set, and 3) test set
- Specifying the hyperparameters for the training
- Training a sequence-to-sequence model (with attention) on the training set and evaluating an accuracy metric on the validation set. The training stops when the loss metric on the validation set does not improve (using a simple early stopping method).
- Saving the trained model to files
- Loading the trained model
- Evaluating the performance of the model qualitatively by using it on some random input data
- Evaluating the performance of the model quantitatively by using the accuracy metric on the test set

Before you begin, please complete the following steps:
- Download the dataset archive file (missing_article.tar.gz) from https://github.com/rerngvit/dataset/blob/master/nlp/language_correction/missing_article.tar.gz
- Extract the dataset to the same folder as this notebook. You should have a "dataset" folder after the extraction.
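
The extraction step can also be done from Python. The snippet below is a minimal sketch, assuming the archive has already been downloaded next to this notebook and that it unpacks into a top-level "dataset" folder as described above; the helper name `extract_dataset` is hypothetical.

```python
import os
import tarfile

def extract_dataset(archive_path="missing_article.tar.gz", target_dir="."):
    """Extract the downloaded archive and return the expected dataset folder path."""
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(path=target_dir)
    # Assumption: the archive contains a top-level folder named "dataset".
    return os.path.join(target_dir, "dataset")

# Only attempt the extraction when the archive is actually present.
if os.path.exists("missing_article.tar.gz"):
    dataset_dir = extract_dataset()
    print("dataset folder present:", os.path.isdir(dataset_dir))
```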

In [1]:
src_data_path  = "./dataset/source.csv/part-00000-3399e745-6191-479d-b42c-434c9cea1010-c000.csv"
dest_data_path = "./dataset/dest.csv/part-00000-9a59ed8f-769a-4108-b277-7f014af68d7c-c000.csv"

# First, let's have a look at what the data look like

We print the first 5 pairs of lines from the source and destination files.

In [2]:
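# Print the first 5 (source, destination) line pairs; the files are closed afterwards
with open(src_data_path) as src_file, open(dest_data_path) as dest_file:
    for count, (src_line, dest_line) in enumerate(zip(src_file, dest_file), start=1):
        print(src_line + dest_line)
        print("============")
        if count >= 5:
            break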
            

"When he was fifteen , he said he wanted to be messenger ."
"When he was fifteen, he said he wanted to be a messenger."

His mother sent him to get comb from his aunt .
His mother sent him to get a comb from his aunt.

He climbed on stork as the storks were flying south .
He climbed on a stork as the storks were flying south.

The chain emerged from Galati Brothers .
The chain emerged from the Galati Brothers.

All pages with template on them are listed here .
All pages with the template on them are listed here.



Each pair shows the correction of a missing article (at most one mistake per sentence) along with minor punctuation fixes, such as removing the space before a period or comma.
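
To check which article a destination line adds, one could compare article counts between the two lines. This is a minimal sketch, not part of the notebook's own code; the helper `inserted_articles` is hypothetical, and it ignores word order, so it only reports which articles were added, not where.

```python
import re
from collections import Counter

ARTICLES = {"a", "an", "the"}

def inserted_articles(src_line, dest_line):
    """Return a Counter of articles that appear more often in dest than in src."""
    def article_counts(text):
        words = re.findall(r"[a-z]+", text.lower())
        return Counter(w for w in words if w in ARTICLES)
    # Counter subtraction keeps only positive differences.
    return article_counts(dest_line) - article_counts(src_line)

src = "His mother sent him to get comb from his aunt ."
dest = "His mother sent him to get a comb from his aunt."
print(inserted_articles(src, dest))   # Counter({'a': 1})
```

Running such a check over every pair would be one way to verify how many articles each sentence is actually missing.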

# Let's begin the training process

First, we specify the folder that will store the trained models and TensorBoard data.

In [3]:
model_base_dir = "./trained_models/remove_articles"


Next, let's have a look at the devices available to TensorFlow for training.
You should see a GPU in the output of the command below.
Otherwise, the training process will take much longer than it normally would (15-20x or more).

In [4]:
from tensorflow.python.client import device_lib
device_lib.list_local_devices()



[name: "/device:CPU:0"
 device_type: "CPU"
 memory_limit: 268435456
 locality {
 }
 incarnation: 9107794676342764524]
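
The listing above shows only a CPU device on the machine that produced this output. A programmatic check could look like the sketch below; the helper `has_gpu` is hypothetical and only relies on each device descriptor exposing a `device_type` attribute, as the entries returned by `device_lib.list_local_devices()` do.

```python
def has_gpu(devices):
    """Return True if any device descriptor reports device_type == 'GPU'."""
    return any(getattr(d, "device_type", None) == "GPU" for d in devices)

# Usage in this notebook (requires TensorFlow to be installed):
# from tensorflow.python.client import device_lib
# if not has_gpu(device_lib.list_local_devices()):
#     print("No GPU found; training will run on the CPU and be much slower.")
```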

In [5]:
import numpy as np
import warnings
import matplotlib.pyplot as plt
import time
import os
import json
warnings.filterwarnings('ignore')

In [6]:
import tensorflow as tf
tf.__version__

'1.4.1'

In [7]:
from preprocessor import CharacterLevelSeq2SeqPreprocessor

In [8]:
lang_data_preprocessor = CharacterLevelSeq2SeqPreprocessor(src_data_path=src_data_path, 
               dest_data_path=dest_data_path, 
               max_number_of_samples=-1)

 Processing in total 440450 lines
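
The internals of preprocessor.py are not shown in this notebook. As a rough illustration of what a character-level preprocessor typically does, the sketch below maps characters to integer indices and pads sequences to a fixed length. The special-token indices (padding 0, end-of-sequence 1, start 2) follow the saved configuration printed later in this notebook; the vocabulary ordering, the function names, and whether the source side also receives an end-of-sequence token are assumptions.

```python
PAD_INDEX, EOS_INDEX, START_INDEX = 0, 1, 2   # indices as reported in the saved config
FIRST_CHAR_INDEX = 3                           # assumption: real characters start here

def build_token_index(lines):
    """Map every distinct character to an integer index after the special tokens."""
    chars = sorted(set("".join(lines)))
    return {ch: i + FIRST_CHAR_INDEX for i, ch in enumerate(chars)}

def encode(text, token_index, max_len, add_start=False):
    """Encode text as [START?] chars EOS, right-padded with PAD_INDEX to max_len."""
    ids = [START_INDEX] if add_start else []
    ids += [token_index[ch] for ch in text]
    ids.append(EOS_INDEX)
    if len(ids) > max_len:
        raise ValueError("sequence longer than max_len")
    return ids + [PAD_INDEX] * (max_len - len(ids))

index = build_token_index(["I have car.", "I have a car."])
src_ids = encode("I have car.", index, max_len=16)
dest_ids = encode("I have a car.", index, max_len=17, add_start=True)
print(src_ids)
print(dest_ids)
```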


Below are the hyperparameters used for the model.

In [9]:
config = { 'hidden_units' : 256,
           'depth'     : 12,
           'attention_type' : 'Bahdanau',
           'use_embedding' :  True,
           'embedding_size' : 128,
           'num_encoder_symbols' : lang_data_preprocessor.num_src_tokens,
           'num_decoder_symbols' : lang_data_preprocessor.num_dest_tokens,
           'use_residual' : True,
           'attn_input_feeding': True,
           'use_dropout': True,
           'dropout_rate' : 0.02,
           'learning_rate' : 0.0005,
           'max_gradient_norm' : 1.0,
           'batch_size' : 512,
           'num_epochs' : 20,
           'optimizer'  : 'adam',
           'use_fp16'   : False,
           'max_src_seq_length'  : lang_data_preprocessor.max_src_seq_length,
           'max_dest_seq_length' : lang_data_preprocessor.max_dest_seq_length,
           'max_decode_step' : lang_data_preprocessor.max_dest_seq_length,
           'padding_token_index' : lang_data_preprocessor.dest_token_index[
                                       lang_data_preprocessor.PADDED_character],
           'dest_start_token_index' : lang_data_preprocessor.dest_token_index[
                                       lang_data_preprocessor.SEQ_START_CHARACTER],
           'dest_eos_token_index' : lang_data_preprocessor.dest_token_index[
                                       lang_data_preprocessor.EOS_character],
           'model_base_dir': model_base_dir,
           'model_saved_path' : model_base_dir + "/trained_model",
           'use_beamsearch_decode' : False,
           'beam_width' : 64,          
           'saving_last_model' : True,
           'default_device' : '/gpu:0',
           'run_full_trace': False
         }
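
The `max_gradient_norm` entry suggests that gradients are clipped by their global norm during training. How tf_graph.py applies it is not shown here, so the NumPy sketch below is only an illustration of the technique under that assumption (TensorFlow 1.x offers `tf.clip_by_global_norm` for the same purpose).

```python
import numpy as np

def clip_by_global_norm(gradients, max_norm):
    """Scale all gradients by one factor so their joint L2 norm is at most max_norm."""
    global_norm = np.sqrt(sum(np.sum(np.square(g)) for g in gradients))
    # The small constant avoids a division by zero when all gradients are zero.
    scale = min(1.0, max_norm / (global_norm + 1e-12))
    return [g * scale for g in gradients], global_norm

grads = [np.array([3.0, 0.0]), np.array([0.0, 4.0])]   # global norm is 5
clipped, norm = clip_by_global_norm(grads, max_norm=1.0)
print(norm, clipped)
```

Because every gradient is scaled by the same factor, the direction of the overall update is preserved and only its length is limited.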

In [10]:
from tf_graph import CharacterSeq2SeqModel
from data_iterator import DataFeedIterator
from training_manager import TrainingManager

In [11]:
training_manager = TrainingManager(lang_data_preprocessor=lang_data_preprocessor,
                                   src_data_path=src_data_path, 
                                   dest_data_path=dest_data_path, 
                                   batch_size=config["batch_size"])
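
The `TrainingManager` is the component that exposes the training and test sets used later in this notebook, but its splitting logic lives in training_manager.py and is not shown. The following is a minimal sketch of a three-way split; the function name, the shuffling, and the 60/20/20 proportions are all assumptions made for illustration.

```python
import random

def split_three_ways(items, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle a copy of items and cut it into train, validation and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)   # fixed seed keeps the split reproducible
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

train, val, test = split_three_ways(range(100))
print(len(train), len(val), len(test))   # 60 20 20
```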

In [12]:
import json
os.makedirs(model_base_dir, exist_ok=True)
cfg_file_path = model_base_dir + "/config.json"
# Persist the hyperparameters next to the model so they can be reloaded for decoding
with open(cfg_file_path, 'w') as cfg_file:
    json.dump(config, cfg_file)

# The code below will start the training process
* Note that a single run is expected to take around 15-20 hours with a GPU. Without a GPU, it can take considerably longer.
* After the training process has finished, the model is saved to a file.

In [13]:
#training_manager.fit_eval_dnn(config=config)
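
The introduction mentions that training stops when the validation loss no longer improves. The actual rule inside `fit_eval_dnn` is not shown, so the sketch below is a generic patience-based early stopping loop; the callbacks `run_epoch` and `validation_loss` and the `patience` parameter are hypothetical.

```python
def train_with_early_stopping(run_epoch, validation_loss, max_epochs, patience=1):
    """Stop once validation loss has not improved for `patience` consecutive epochs."""
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        run_epoch(epoch)                      # one pass over the training set
        loss = validation_loss(epoch)         # loss on the held-out validation set
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_epoch, best_loss

# Made-up validation losses for illustration: the loss gets worse at epoch 3.
fake_losses = [0.9, 0.7, 0.6, 0.65, 0.5]
result = train_with_early_stopping(lambda e: None, lambda e: fake_losses[e], max_epochs=5)
print(result)   # (2, 0.6)
```

With `patience=1` the loop stops at the first epoch that fails to improve, which matches a "simple" early stopping rule but can stop too early when the validation loss is noisy.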

# Loading the trained model from the saved file

In [14]:
import json

In [15]:
with open(cfg_file_path) as cfg_file:
    loaded_config = json.load(cfg_file)
loaded_config

{'attention_type': 'Bahdanau',
 'attn_input_feeding': True,
 'batch_size': 512,
 'beam_width': 64,
 'default_device': '/gpu:0',
 'depth': 12,
 'dest_eos_token_index': 1,
 'dest_start_token_index': 2,
 'dropout_rate': 0.02,
 'embedding_size': 128,
 'hidden_units': 256,
 'learning_rate': 0.0005,
 'max_decode_step': 51,
 'max_dest_seq_length': 51,
 'max_gradient_norm': 1.0,
 'max_src_seq_length': 50,
 'model_base_dir': './trained_models/remove_articles',
 'model_saved_path': './trained_models/remove_articles/trained_model',
 'num_decoder_symbols': 102,
 'num_encoder_symbols': 98,
 'num_epochs': 20,
 'optimizer': 'adam',
 'padding_token_index': 0,
 'run_full_trace': False,
 'saving_last_model': True,
 'use_beamsearch_decode': False,
 'use_dropout': True,
 'use_embedding': True,
 'use_fp16': False,
 'use_residual': True}

In [16]:
decoding_config = {
    'beam_width': 100,
    'use_beamsearch_decode' : True,
    'max_decode_step': lang_data_preprocessor.max_dest_seq_length,
    'write_n_best' : True,
    'log_device_placement' : True,
    'batch_size': 1,
    'default_device' : '/cpu:0'
    
}
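
The decoding configuration switches on beam search with a beam width of 100. The model's decoder is built inside tf_graph.py and is not shown; the pure-Python sketch below only illustrates the general idea on a toy model. The function `beam_search`, the toy probability table, and the token numbering are made up for this example.

```python
import math

def beam_search(next_log_probs, beam_width, eos_index, max_steps):
    """Keep the beam_width best partial sequences at each step; return all, best first."""
    beams = [((), 0.0)]            # (token tuple, cumulative log-probability)
    finished = []
    for _ in range(max_steps):
        candidates = []
        for tokens, score in beams:
            for idx, log_prob in enumerate(next_log_probs(tokens)):
                candidates.append((tokens + (idx,), score + log_prob))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            if tokens[-1] == eos_index:
                finished.append((tokens, score))   # hypothesis is complete
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.extend(beams)         # keep unfinished hypotheses if the step limit was hit
    return sorted(finished, key=lambda c: c[1], reverse=True)

# Toy model over three tokens (0 = EOS, 1 = "A", 2 = "B"); probabilities are made up.
TOY_PROBS = {(): [0.1, 0.5, 0.4], (1,): [0.4, 0.3, 0.3], (2,): [0.9, 0.05, 0.05]}

def toy_next_log_probs(prefix):
    probs = TOY_PROBS.get(prefix, [0.8, 0.1, 0.1])
    return [math.log(p) for p in probs]

greedy = beam_search(toy_next_log_probs, beam_width=1, eos_index=0, max_steps=5)
wide = beam_search(toy_next_log_probs, beam_width=2, eos_index=0, max_steps=5)
print(greedy[0][0], wide[0][0])   # (1, 0) (2, 0)
```

In this toy case greedy decoding (beam width 1) commits to "A" and ends with total probability 0.2, while a beam width of 2 also keeps "B" alive and finds the sequence with probability 0.36.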

In [17]:
# Copy over every training setting that the decoding config does not override
for k, v in loaded_config.items():
    if k not in decoding_config:
        decoding_config[k] = v

In [18]:
decoding_config

{'attention_type': 'Bahdanau',
 'attn_input_feeding': True,
 'batch_size': 1,
 'beam_width': 100,
 'default_device': '/cpu:0',
 'depth': 12,
 'dest_eos_token_index': 1,
 'dest_start_token_index': 2,
 'dropout_rate': 0.02,
 'embedding_size': 128,
 'hidden_units': 256,
 'learning_rate': 0.0005,
 'log_device_placement': True,
 'max_decode_step': 51,
 'max_dest_seq_length': 51,
 'max_gradient_norm': 1.0,
 'max_src_seq_length': 50,
 'model_base_dir': './trained_models/remove_articles',
 'model_saved_path': './trained_models/remove_articles/trained_model',
 'num_decoder_symbols': 102,
 'num_encoder_symbols': 98,
 'num_epochs': 20,
 'optimizer': 'adam',
 'padding_token_index': 0,
 'run_full_trace': False,
 'saving_last_model': True,
 'use_beamsearch_decode': True,
 'use_dropout': True,
 'use_embedding': True,
 'use_fp16': False,
 'use_residual': True,
 'write_n_best': True}

In [19]:
import tensorflow as tf
tf.reset_default_graph() # to clean out all the variables to allow for rerunning the model
sess = tf.Session(config=tf.ConfigProto(
      allow_soft_placement=True))

In [20]:
# Start the model
decoding_model = CharacterSeq2SeqModel(session=sess,
                              config=decoding_config, 
                              mode='decode')
# Restoring model parameters
decoding_model.restore(sess, decoding_config["model_saved_path"])

 Setting default device to  /cpu:0
building model..
building encoder..
building decoder and attention..
use beamsearch decoding..
 Initial state for decoder cell batch size is  100
building beamsearch decoder..
INFO:tensorflow:Restoring parameters from ./trained_models/remove_articles/trained_model
model restored from ./trained_models/remove_articles/trained_model


# Evaluating the model qualitatively with unseen data

In [21]:
def language_correction_text(input_text):
    # Run one input sentence through the restored decoding model
    return training_manager.seq2seq_execution(
        sess, decoding_model, decoding_config['batch_size'],
        input_text=input_text)

In [22]:
language_correction_text("I run with table.")

'I run with the table..\n'

In [23]:
language_correction_text("I have car.")

'I have a car...\n'

In [24]:
language_correction_text("We work in office.")

'We work in the office.\n'

In [25]:
language_correction_text("I like to walk with stick.")

'I like to walk with the stick.\n'

In [26]:
language_correction_text("This is test.")

'This is a test. test.\n'

In [27]:
language_correction_text("I like to travel with car.")

'I like to travel with the car...\n'

In [28]:
language_correction_text("I am not sure I like apple.")

'I am not sure I like the apple.\n'

In [29]:
language_correction_text("She like to watch movie.")

'She like to watch the movie.\n'

In [30]:
language_correction_text("We go to theater.")

'We go to the theater..\n'

In [31]:
language_correction_text("This is cat.")

'This is a cat..\n'

In [32]:
language_correction_text("I would like to have homerun.")

'I would like to have a homerun.\n'

# Evaluating the model quantitatively on the test set

The code below retrieves the data batch by batch, uses the model to generate a prediction (decoded text) for each sample,
and compares it with the ground truth (dest text). At the end of the evaluation, the code outputs the accuracy metric, which is the fraction of samples whose prediction exactly matches the ground truth.

In [33]:
def evaluate_model(dataset):
    num_samples, num_corrected = 0, 0

    for source, source_len, dest, dest_len in dataset:
        # Iterate over the rows actually present, so a short final batch is safe
        for sample_idx in range(len(source)):
            decoding_src_seq  = source[sample_idx, :]
            decoding_dest_seq = dest[sample_idx, :]

            input_text = lang_data_preprocessor.src_seq_to_text(decoding_src_seq)
            predicted_text = language_correction_text(input_text)
            actual_text = lang_data_preprocessor.dest_seq_to_text(
                decoding_dest_seq)[1:]   # Ignoring the first SEQ_START character

            print("Input text = '%s'"     % input_text)
            print("Dest text = '%s'"     %  actual_text)
            print("Decoded text = '%s'  " % predicted_text)
            print(" =======================")

            # Exact match after stripping surrounding whitespace
            if actual_text.strip() == predicted_text.strip():
                num_corrected += 1

            num_samples += 1

    print(" Number of corrected decoding = ", num_corrected, ", out of ", num_samples)
    accuracy = num_corrected / num_samples if num_samples else 0.0
    print(" Accuracy = %s " % accuracy)
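
The metric itself can be isolated from the model and the preprocessor. The sketch below computes the same exact-match accuracy on plain strings; the function name is hypothetical, the first prediction is taken from the qualitative examples earlier in this notebook, and the reference strings are assumed for illustration.

```python
def exact_match_accuracy(predictions, references):
    """Fraction of predictions equal to their reference after stripping whitespace."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    matches = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return matches / len(references)

predictions = ["We work in the office.\n", "I have a car...\n"]
references = ["We work in the office.", "I have a car."]   # assumed ground truth
print(exact_match_accuracy(predictions, references))        # 0.5
```

Exact match is strict: a prediction with the right article but repeated punctuation, such as the second one here, counts as an error.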



In [34]:
train_data_list = list(training_manager.train_set)
train_data_list

[(array([[51., 72., 85., ...,  0.,  0.,  0.],
         [43., 72.,  3., ...,  0.,  0.,  0.],
         [87., 75., 76., ...,  0.,  0.,  0.],
         ...,
         [55., 75., 72., ...,  0.,  0.,  0.],
         [45., 72., 85., ...,  0.,  0.,  0.],
         [ 6.,  3., 58., ...,  0.,  0.,  0.]]),
  array([31, 35, 44, 41, 35, 26, 42, 45, 31, 44, 36, 34, 34, 47, 40, 41, 34,
         44, 46, 45, 48, 49, 47, 45, 35, 42, 40, 36, 47, 39, 46, 43, 42, 34,
         42, 48, 30, 22, 41, 30, 34, 47, 41, 44, 38, 45, 44, 34, 44, 41, 40,
         42, 35, 32, 40, 42, 32, 35, 48, 33, 45, 31, 43, 42, 28, 44, 42, 45,
         39, 37, 41, 33, 45, 37, 44, 46, 42, 33, 43, 34, 33, 44, 44, 31, 48,
         46, 26, 35, 40, 32, 40, 46, 34, 34, 31, 38, 37, 41, 41, 39, 49, 39,
         31, 47, 41, 45, 45, 44, 43, 22, 34, 44, 35, 43, 40, 44, 45, 42, 40,
         40, 46, 48, 19, 41, 44, 38, 38, 40, 28, 43, 41, 36, 43, 40, 37, 39,
         44, 34, 34, 27, 45, 35, 48, 27, 40, 40, 27, 43, 39, 43, 41, 34, 40,
         46, 39

In [None]:
import random

First, let's evaluate the performance on the training set. We sample *k* batches here to obtain early results before progressing to larger *k*.

In [None]:
evaluate_model(random.sample(train_data_list, k=2))

Input text = '*There were 39 RVs during ceasefire period .
'
Dest text = '*There were 39 RVs during the ceasefire period.
'
Decoded text = '*There were 39 RVs during the ceasefire period.
'  
Input text = 'Several condo towers are located along beach .
'
Dest text = 'Several condo towers are located along the beach.
'
Decoded text = 'Several condo towers are located along the beach.
'  
Input text = 'The median age in city was 44 years .
'
Dest text = 'The median age in the city was 44 years.
'
Decoded text = 'The median age in the city was 44 years.
'  
Input text = 'Terry Labonte won pole .
'
Dest text = 'Terry Labonte won the pole.
'
Decoded text = 'Terry Labonte won the pole.
'  
Input text = '[ [ File : Steven Wright-I Have Pony.jpg ] ]
'
Dest text = '[[File:Steven Wright-I Have a Pony.jpg]]
'
Decoded text = '[[File:Steven Wright-I Have a Pony.jpg]]
'  
Input text = 'He died next year at age of 67 .
'
Dest text = 'He died the next year at the age of 67.
'
Decoded text = 'He died t

Input text = 'Speedy Keep is stupidest vote I 've seen .
'
Dest text = 'Speedy Keep is the stupidest vote I've seen.
'
Decoded text = 'Speedy Keep is a stupidest vote I've seen.
'  
Input text = 'He was national selector from 1907 to 1914 .
'
Dest text = 'He was a national selector from 1907 to 1914.
'
Decoded text = 'He was a national selector from 1907 to 1914.
'  
Input text = 'Please now follow link back to .
'
Dest text = 'Please now follow the link back to .
'
Decoded text = 'Please now follow the link back to .
'  
Input text = 'The result of debate was delete .
'
Dest text = 'The result of the debate was delete.
'
Decoded text = 'The result of the debate was delete.
'  
Input text = 'It was band 's only album for major label .
'
Dest text = 'It was the band's only album for a major label.
'
Decoded text = 'It was the band's only album for the major label.
'  
Input text = 'The result of debate was delete .
'
Dest text = 'The result of the debate was delete.
'
Decoded text = 'Th

Input text = '"^ For sun 's size , see Miller , Edward ."
'
Dest text = '"^ For the sun's size, see Miller, Edward."
'
Decoded text = '"^ For the sun's size, see Miller, Edward."
'  
Input text = 'Flanagan left school at age of 16 .
'
Dest text = 'Flanagan left school at the age of 16.
'
Decoded text = 'Flanagan left the school at the age of 16.
'  
Input text = '"However , Bagdad was very poor port ."
'
Dest text = '"However, Bagdad was a very poor port."
'
Decoded text = '"However, Bagdadad was a very poor port."
'  
Input text = 'The film became box office hit .
'
Dest text = 'The film became a box office hit.
'
Decoded text = 'The film became a box office hit.
'  
Input text = 'He says 'I 'm coming in to tak lot . '
'
Dest text = 'He says 'I'm coming in to tak the lot.'
'
Decoded text = 'He says 'I'm coming in to tak a lot.'
'  
Input text = 'That was end of my violin career .
'
Dest text = 'That was the end of my violin career.
'
Decoded text = 'That was the end of my violin caree

Input text = 'The deceased was buried in earth .
'
Dest text = 'The deceased was buried in the earth.
'
Decoded text = 'The deceased was buried in the earth.
'  
Input text = 'Finally troops were forced to open fire .
'
Dest text = 'Finally the troops were forced to open fire.
'
Decoded text = 'Finally the troops were forced to open fire.
'  
Input text = 'The result of debate was keep .
'
Dest text = 'The result of the debate was keep.
'
Decoded text = 'The result of the debate was keep.
'  
Input text = '"The population was 2,930 at 2010 census ."
'
Dest text = '"The population was 2,930 at the 2010 census."
'
Decoded text = '"The population was 2,930 at the 2010 census."
'  
Input text = 'Won became professional in 1998 .
'
Dest text = 'Won became a professional in 1998.
'
Decoded text = 'Won became a professional in 1998.
'  
Input text = 'They have not released series to home video .
'
Dest text = 'They have not released the series to home video.
'
Decoded text = 'They have not re

Input text = 'It covered area to east of King 's Lynn .
'
Dest text = 'It covered an area to the east of King's Lynn.
'
Decoded text = 'It covered the area to the east of King's Lynn.
'  
Input text = '"The per capita income for CDP was $ 25,085 ."
'
Dest text = '"The per capita income for the CDP was $25,085."
'
Decoded text = '"The per capita income for the CDP was $25,085."
'  
Input text = '"Cornwall won game 17-3 , scoring 5 tries ."
'
Dest text = '"Cornwall won the game 17-3, scoring 5 tries."
'
Decoded text = '"Cornwall won the game 17-3, scoring 5 tries."
'  
Input text = 'All that remained were few drops of water .
'
Dest text = 'All that remained were a few drops of water.
'
Decoded text = 'All that remained were few drops of the water.
'  
Input text = 'It was debut release from both bands .
'
Dest text = 'It was the debut release from both bands.
'
Decoded text = 'It was a debut release from both bands.
'  
Input text = '"As of 2010 county population was 11,800 ."
'
Dest te

Input text = 'How did I know he 'd have band ?
'
Dest text = 'How did I know he'd have a band?
'
Decoded text = 'How did I know he'd have a band?
'  
Input text = 'Denmark and Finland advanced for playoff A .
'
Dest text = 'Denmark and Finland advanced for the playoff A.
'
Decoded text = 'Denmark and Finland advanced for the playoff A.
'  
Input text = 'This spawned American Southern culture .
'
Dest text = 'This spawned the American Southern culture.
'
Decoded text = 'This spawned the American Southern culture.
'  
Input text = 'The result of debate was merge/redirect .
'
Dest text = 'The result of the debate was merge/redirect.
'
Decoded text = 'The result of the debate was merge/redirect.
'  
Input text = 'Barnes and Trost created label L.M .
'
Dest text = 'Barnes and Trost created the label L.M.
'
Decoded text = 'Barnes and Trost created the label L.M.
'  
Input text = '"The per capita income for CDP was $ 5,165 ."
'
Dest text = '"The per capita income for the CDP was $5,165."
'
De

Input text = 'He was later protected by 1904 New Zealand law .
'
Dest text = 'He was later protected by a 1904 New Zealand law.
'
Decoded text = 'He was later protected by a 1904 New Zealand law.
'  
Input text = ': :Who gazed into Burdett-Coutts estate ?
'
Dest text = '::Who gazed into the Burdett-Coutts estate?
'
Decoded text = '::Who gazed into the Burdett-Coutts estate?
'  
Input text = '"At St Mark 's , he set about reforming choir ."
'
Dest text = '"At St Mark's, he set about reforming the choir."
'
Decoded text = '"At St Mark's, he set about reforming the choir."
'  
Input text = 'Japan encouraged modernization of Korea .
'
Dest text = 'Japan encouraged the modernization of Korea.
'
Decoded text = 'Japan encouraged the modernization of Korea.
'  
Input text = 'Mark Kirkland directed episode .
'
Dest text = 'Mark Kirkland directed the episode.
'
Decoded text = 'Mark Kirkland directed the episode.
'  
Input text = 'After finishing chord I looked up .
'
Dest text = 'After the finis

Input text = '"Also in 1959 , airline joined IATA ."
'
Dest text = '"Also in 1959, the airline joined IATA."
'
Decoded text = '"Also in 1959, the airline joined IATA."
'  
Input text = 'while Grampa was out with family .
'
Dest text = 'while Grampa was out with the family.
'
Decoded text = 'while Grampa was out with the family.
'  
Input text = 'The 618 had no stars on shoulder epaulettes .
'
Dest text = 'The 618 had no stars on the shoulder epaulettes.
'
Decoded text = 'The 618 had no stars on the shoulder epaulettes.
'  
Input text = 'I think someone got desperate and ate natto .
'
Dest text = 'I think someone got desperate and ate the natto.
'
Decoded text = 'I think someone got a desperate and ate natto.
'  
Input text = 'Nunsense played at theatre in 1987 .
'
Dest text = 'Nunsense played at the theatre in 1987.
'
Decoded text = 'Nunsense played at the theatre in 1987.
'  
Input text = 'They return home and family eats dinner .
'
Dest text = 'They return home and the family eats di

Input text = 'The fires did great damage to young growth .
'
Dest text = 'The fires did great damage to the young growth.
'
Decoded text = 'The fires did a great damage to young growth.
'  
Input text = 'In 1911 federal election he was H.H .
'
Dest text = 'In the 1911 federal election he was H.H.
'
Decoded text = 'In 1911 the federal election he was H.H.
'  
Input text = 'They were in studio for roughly 5 weeks .
'
Dest text = 'They were in the studio for roughly 5 weeks.
'
Decoded text = 'They were in the studio for roughly 5 weeks.
'  
Input text = '"Eventually , they are trapped by waterfall ."
'
Dest text = '"Eventually, they are trapped by a waterfall."
'
Decoded text = '"Eventually, they are trapped by the waterfall."
'  
Input text = 'The median age in town was 57.1 years .
'
Dest text = 'The median age in the town was 57.1 years.
'
Decoded text = 'The median age in the town was 57.1 years.
'  
Input text = 'Perhaps few who have made it into news .
'
Dest text = 'Perhaps a few w

Input text = 'She also discovers that Akkarin was slave .
'
Dest text = 'She also discovers that Akkarin was a slave.
'
Decoded text = 'She also discovers that Akkarin was the slave.
'  
Input text = 'People magazine ran a story on T-shirt .
'
Dest text = 'People magazine ran a story on the T-shirt.
'
Decoded text = 'People magazine ran a story on the T-shirt.
'  
Input text = '": :Oh , I agree it 's huge loss ."
'
Dest text = '"::Oh, I agree it's a huge loss."
'
Decoded text = '"::Oh, I agree it's a huge loss."
'  
Input text = 'The inmates are running asylum .
'
Dest text = 'The inmates are running the asylum.
'
Decoded text = 'The inmates are running the asylum.
'  
Input text = 'He was first president of EIZIE in 1987 .
'
Dest text = 'He was the first president of EIZIE in 1987.
'
Decoded text = 'He was the first president of EIZIE in 1987.
'  
Input text = 'The population was 759 at 2010 census .
'
Dest text = 'The population was 759 at the 2010 census.
'
Decoded text = 'The popul

Input text = '* kiddiematinee.com / The Boy and Pirates
'
Dest text = '* kiddiematinee.com / The Boy and the Pirates
'
Decoded text = '* kiddiematinee.com / The Boy and the Pirates
'  
Input text = 'or You got ta see baby !
'
Dest text = 'or You gotta see the baby!
'
Decoded text = 'or You got ta see the baby!
'  
Input text = 'Police blamed violence on the demonstrators .
'
Dest text = 'Police blamed the violence on the demonstrators.
'
Decoded text = 'Police blamed the violence on the demonstrators.
'  
Input text = '"The per capita income for town was $ 13,686 ."
'
Dest text = '"The per capita income for the town was $13,686."
'
Decoded text = '"The per capita income for the town was $13,686."
'  
Input text = 'Could you give me piece of advice .
'
Dest text = 'Could you give me a piece of advice.
'
Decoded text = 'Could you give me a piece of advice.
'  
Input text = 'It reached number 32 on UK Singles Chart .
'
Dest text = 'It reached number 32 on the UK Singles Chart.
'
Decoded t

Input text = 'Customs revenue paid off 1860 indemnities .
'
Dest text = 'Customs revenue paid off the 1860 indemnities.
'
Decoded text = 'Customs revenue paid off the 1860 indemnities.
'  
Input text = 'The song also featured new verse from Nas .
'
Dest text = 'The song also featured a new verse from Nas.
'
Decoded text = 'The song also featured a new verse from Nas.
'  
Input text = 'The population was 881 at 2010 census .
'
Dest text = 'The population was 881 at the 2010 census.
'
Decoded text = 'The population was 881 at the 2010 census.
'  
Input text = '"Hence term I think , therefore I am ."
'
Dest text = '"Hence the term I think, therefore I am."
'
Decoded text = '"Hence the term I think, therefore I am."
'  
Input text = '"The per capita income for city was $ 16,502 ."
'
Dest text = '"The per capita income for the city was $16,502."
'
Decoded text = '"The per capita income for the city was $16,502."
'  
Input text = '"The population was 5,058 at 2000 census ."
'
Dest text = '"T

Input text = '"For painter , see Jimmy Ernst ."
'
Dest text = '"For the painter, see Jimmy Ernst."
'
Decoded text = '"For the painter, see Jimmy Ernst."
'  
Input text = 'Buses have been used on island since 1905 .
'
Dest text = 'Buses have been used on the island since 1905.
'
Decoded text = 'Buses have been used on the island since 1905.
'  
Input text = 'Valiante led series early in 2003 .
'
Dest text = 'Valiante led the series early in 2003.
'
Decoded text = 'Valiante led the series early in 2003.
'  
Input text = 'The result of debate was delete .
'
Dest text = 'The result of the debate was delete.
'
Decoded text = 'The result of the debate was delete.
'  
Input text = '"The per capita income for city was $ 21,330 ."
'
Dest text = '"The per capita income for the city was $21,330."
'
Decoded text = '"The per capita income for the city was $21,330."
'  
Input text = '"He did not state cost of deal , however ."
'
Dest text = '"He did not state the cost of the deal, however."
'
Decode

Input text = 'Other bastions of resistance dot landscape .
'
Dest text = 'Other bastions of resistance dot the landscape.
'
Decoded text = 'Other bastions of the resistance dot landscape.
'  
Input text = 'I agree that article needs a lot of work .
'
Dest text = 'I agree that the article needs a lot of work.
'
Decoded text = 'I agree that the article needs a lot of work.
'  
Input text = 'Merivale died at Ely at age of 85 .
'
Dest text = 'Merivale died at Ely at the age of 85.
'
Decoded text = 'Merivale died at Ely at the age of 85.
'  
Input text = 'Two incarnations of team existed .
'
Dest text = 'Two incarnations of the team existed.
'
Decoded text = 'Two incarnations of the team existed.
'  
Input text = 'Their spines were connected at 90 degree angle .
'
Dest text = 'Their spines were connected at a 90 degree angle.
'
Decoded text = 'Their spines were connected at a 90 degree angle.
'  
Input text = 'The result of debate was delete .
'
Dest text = 'The result of the debate was del

Input text = 'Then Shore came in to game to relieve Ruth .
'
Dest text = 'Then Shore came in to the game to relieve Ruth.
'
Decoded text = 'Then Shore came in to game to the relieve Ruth.
'  
Input text = 'He took part in X-Crise Group .
'
Dest text = 'He took part in the X-Crise Group.
'
Decoded text = 'He took part in the X-Crise Group.
'  
Input text = 'At 2010 census population was 136 .
'
Dest text = 'At the 2010 census the population was 136.
'
Decoded text = 'At the 2010 census the population was 136.
'  
Input text = 'The result of debate was delete .
'
Dest text = 'The result of the debate was delete.
'
Decoded text = 'The result of the debate was delete.
'  
Input text = 'Sheen subsequently went on nationwide tour .
'
Dest text = 'Sheen subsequently went on a nationwide tour.
'
Decoded text = 'Sheen subsequently went on a nationwide tour.
'  
Input text = '*We 're off to see wizard !
'
Dest text = '*We're off to see the wizard!
'
Decoded text = '*We're off to see the wizard!


Input text = 'Legacy sources cause new problem in their own .
'
Dest text = 'Legacy sources cause a new problem in their own.
'
Decoded text = 'Legacy sources cause a new problem in their own.
'  
Input text = 'There are several ways to create shadow .
'
Dest text = 'There are several ways to create a shadow.
'
Decoded text = 'There are several ways to create a shadow.
'  
Input text = '"However , right eroded over time ."
'
Dest text = '"However, the right eroded over time."
'
Decoded text = '"However, the right eroded over time."
'  
Input text = 'Females give birth to up to 12 young at time .
'
Dest text = 'Females give birth to up to 12 young at a time.
'
Decoded text = 'Females give birth to up to 12 young at the time.
'  
Input text = 'A music video for song was also made .
'
Dest text = 'A music video for the song was also made.
'
Decoded text = 'A music video for the song was also made.
'  
Input text = 'The CDP 's population was 743 at 2010 census .
'
Dest text = 'The CDP's po

Input text = '"Gardner kept journal , much of which was lost ."
'
Dest text = '"Gardner kept a journal, much of which was lost."
'
Decoded text = '"Gardner kept a journal, much of which was lost."
'  
Input text = 'His goal was now to become teacher .
'
Dest text = 'His goal was now to become a teacher.
'
Decoded text = 'His goal was now to become a teacher.
'  
Input text = '"What do you think he is , mind reader ?"
'
Dest text = '"What do you think he is, a mind reader?"
'
Decoded text = '"What do you think he is, a mind reader?"
'  
Input text = '"So I disagree , claim is still unverified ."
'
Dest text = '"So I disagree, the claim is still unverified."
'
Decoded text = '"So I disagree, the claim is still unverified."
'  
Input text = 'His wife Ella was writer and teacher .
'
Dest text = 'His wife Ella was a writer and teacher.
'
Decoded text = 'His wife Ella was a writer and a teacher.
'  
Input text = 'I do n't know if I need article or not .
'
Dest text = 'I don't know if I need 

Input text = '"**Comment No , I get intent of template ."
'
Dest text = '"**Comment No, I get the intent of the template."
'
Decoded text = '"**Comment No, I get the intent of the template."
'  
Input text = '"The per capita income for city was $ 13,313 ."
'
Dest text = '"The per capita income for the city was $13,313."
'
Decoded text = '"The per capita income for the city was $13,313."
'  
Input text = 'The median age in city was 44.5 years .
'
Dest text = 'The median age in the city was 44.5 years.
'
Decoded text = 'The median age in the city was 44.5 years.
'  
Input text = '"The per capita income for city was $ 18,429 ."
'
Dest text = '"The per capita income for the city was $18,429."
'
Decoded text = '"The per capita income for the city was $18,429."
'  
Input text = '"Of course , jury probably believed him ."
'
Dest text = '"Of course, the jury probably believed him."
'
Decoded text = '"Of course, the jury probably believed him."
'  
Input text = 'There are ton of people there .


Input text = '"The per capita income for city was $ 9,378 ."
'
Dest text = '"The per capita income for the city was $9,378."
'
Decoded text = '"The per capita income for the city was $9,378."
'  
Input text = 'Note that groupings are not square .
'
Dest text = 'Note that the groupings are not square.
'
Decoded text = 'Note that groupings are not a square.
'  
Input text = 'The result of debate was } deletion .
'
Dest text = 'The result of the debate was } deletion.
'
Decoded text = 'The result of the debate was } deletion.
'  
Input text = 'It smashed and all wisdom fell out .
'
Dest text = 'It smashed and all the wisdom fell out.
'
Decoded text = 'It smashed and all the wisdom fell out.
'  
Input text = 'The vault was rebuilt in 15th century .
'
Dest text = 'The vault was rebuilt in the 15th century.
'
Decoded text = 'The vault was rebuilt in the 15th century.
'  
Input text = 'Evans died in fire in the Netherlands in 1977 .
'
Dest text = 'Evans died in a fire in the Netherlands in 19

Input text = 'The total length of line was 5+1/4 mi .
'
Dest text = 'The total length of the line was 5+1/4 mi.
'
Decoded text = 'The total length of the line was 5+1/4 mi.
'  
Input text = '*Keep-The user who created this is newbie .
'
Dest text = '*Keep-The user who created this is a newbie.
'
Decoded text = '*Keep-The user who created this is a newbie.
'  
Input text = 'The seat of municipality was in Asprangeloi .
'
Dest text = 'The seat of the municipality was in Asprangeloi.
'
Decoded text = 'The seat of the municipality was in Asprangeloi.
'  
Input text = 'The building was constructed in 1950s .
'
Dest text = 'The building was constructed in the 1950s.
'
Decoded text = 'The building was constructed in the 1950s.
'  
Input text = 'This violated secrecy of ballot .
'
Dest text = 'This violated the secrecy of the ballot.
'
Decoded text = 'This violated secrecy of the ballot.
'  
Input text = 'They earned derisive label Mink Brigade .
'
Dest text = 'They earned the derisive label t

Input text = 'We are trying to take Gandhian approach .
'
Dest text = 'We are trying to take the Gandhian approach.
'
Decoded text = 'We are trying to take the Gandhian approach.
'  
Input text = '"The population was 3,072 at 2000 census ."
'
Dest text = '"The population was 3,072 at the 2000 census."
'
Decoded text = '"The population was 3,072 at the 2000 census."
'  
Input text = 'I have created first Wikimoney lottery at .
'
Dest text = 'I have created the first Wikimoney lottery at .
'
Decoded text = 'I have created the first Wikimoney lottery at .
'  
Input text = '"The per capita income for CDP was $ 18,392 ."
'
Dest text = '"The per capita income for the CDP was $18,392."
'
Decoded text = '"The per capita income for the CDP was $18,392."
'  
Input text = 'An earlier name for town was Macon .
'
Dest text = 'An earlier name for the town was Macon.
'
Decoded text = 'An earlier name for the town was Macon.
'  
Input text = '"Case Plow Works , was independent business ."
'
Dest text 

In [None]:
evaluate_model(random.sample(train_data_list, k=10))

We can also evaluate on the validation set to get a feel for the performance differences.

# The key evaluation on the test set

Below is the performance of the model on the unseen test set.

In [None]:
evaluate_model(training_manager.test_set)

Number of corrected decoding =  66564 , out of  88064
Accuracy = 0.755859375 
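
As a sanity check on the numbers above, the accuracy can be recomputed from the two counts. The standard error added here is not part of the original evaluation; it is a rough estimate that assumes the test samples are independent.

```python
import math

num_correct, num_samples = 66564, 88064      # values reported above
accuracy = num_correct / num_samples
# Binomial standard error; assumes the test samples are independent.
std_error = math.sqrt(accuracy * (1 - accuracy) / num_samples)
num_batches = num_samples / 512               # batch_size from the config
print(accuracy, std_error, num_batches)
```

The sample count divides evenly into 172 batches of 512, and two standard errors correspond to roughly 0.3 percentage points around the reported 75.6%.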