Recurrent Neural Network model for regression in sequences
====

In this example we will use the RNNModel to set up an experiment on a dataset from the Spice (http://spice.lif.univ-mrs.fr/index.php) sequence prediction competition, held in 2016. We will start by downloading and preprocessing the dataset.

In [1]:
ORIGIN_URL = 'http://spice.lif.univ-mrs.fr/data/1.spice.train'
DATASET_DIR = 'downloads'
DATASET_FILENAME = 'spice_dataset.txt'

In [2]:
# add parent directory to python path
import sys
sys.path.append('../')

In [3]:
import numpy
import os
import urllib
import tensorflow as tf

In [4]:
import utils
utils.safe_mkdir(DATASET_DIR)

In [5]:
def maybe_download():
    """Downloads the dataset if it doesn't exist yet."""
    filename = os.path.join(DATASET_DIR, DATASET_FILENAME)
    if os.path.exists(filename):
        return
    urllib.urlretrieve(ORIGIN_URL, filename)

maybe_download()
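
The cell above relies on `urllib.urlretrieve`, which is only available in Python 2. As a minimal sketch for readers on Python 3, the same check-then-download logic could look like the following. The function name `maybe_download_py3` and its parameters are hypothetical and are not part of this notebook or its `utils` module.

```python
import os
import urllib.request


def maybe_download_py3(url, directory, filename):
    """Downloads url into directory/filename unless the file already exists.

    Hypothetical helper for illustration. Returns the local file path.
    """
    # Create the target directory if needed (no error if it is already there).
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, filename)
    # Only hit the network when there is no local copy.
    if not os.path.exists(path):
        urllib.request.urlretrieve(url, path)
    return path

# Usage with the constants defined in the first cell:
# maybe_download_py3(ORIGIN_URL, DATASET_DIR, DATASET_FILENAME)
```

Returning the path lets the caller pass it straight to the reading step instead of rebuilding it from the constants.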

The dataset file consists of a series of numerical sequences, one per line, preceded by a header line that we will ignore. We will try to predict the last element of each sequence.

In [6]:
def read_dataset():
    """Reads the dataset. Returns an array of sequences and an array of labels."""
    with open(os.path.join(DATASET_DIR, DATASET_FILENAME), 'r') as input_file:
        lines = input_file.readlines()[1:]  # Ignore the header
    # Split lines and convert numbers to int.
    sequences = [[numpy.array([int(value)], dtype=numpy.int16) for value in line.split()] for line in lines]
    instances = [sequence[:-1] for sequence in sequences]
    labels = [sequence[-1] for sequence in sequences]
    return numpy.array(instances), numpy.array(labels)

In [7]:
instances, labels = read_dataset()
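
To make the instance/label split concrete, here is a small sketch that applies the same slicing to two made-up lines. The helper `split_sequences` and the toy input are assumptions for illustration only. Unlike `read_dataset`, it stores each instance as a flat array rather than as a list of one-element arrays.

```python
import numpy


def split_sequences(lines):
    """Splits each whitespace-separated line into (all but last, last)."""
    sequences = [[int(value) for value in line.split()] for line in lines]
    # Everything except the final element is the input sequence.
    instances = [numpy.array(sequence[:-1], dtype=numpy.int16)
                 for sequence in sequences]
    # The final element of each line is the value to predict.
    labels = numpy.array([sequence[-1] for sequence in sequences],
                         dtype=numpy.int16)
    return instances, labels


# Made-up input, not taken from the Spice file.
toy_lines = ["3 1 0 2", "2 4 1"]
toy_instances, toy_labels = split_sequences(toy_lines)
```

With this input, `toy_instances` holds the sequences `[3, 1, 0]` and `[2, 4]`, and `toy_labels` holds `2` and `1`.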

We can now create the dataset using the extracted instances and labels.

In [13]:
import dataset
dataset = reload(dataset)

samples = 1
partition_sizes = {'train': 0.7, 'test': 0.2, 'validation': 0.1}

splice_dataset = dataset.SequenceDataset()
splice_dataset.create_samples(instances, labels, samples, partition_sizes, use_numeric_labels=True)
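
The `partition_sizes` dictionary gives the fraction of instances assigned to each partition. How `SequenceDataset.create_samples` performs the split internally is not shown in this notebook, so the following is only a sketch of one common approach: shuffle the indices, then cut them by proportion. The function `split_indices` and its `seed` parameter are hypothetical.

```python
import numpy


def split_indices(num_instances, partition_sizes, seed=0):
    """Returns a dict mapping partition name to an array of shuffled indices."""
    rng = numpy.random.RandomState(seed)
    shuffled = rng.permutation(num_instances)
    names = sorted(partition_sizes)
    partitions = {}
    start = 0
    for position, name in enumerate(names):
        if position == len(names) - 1:
            # The last partition takes the remainder, so no index is lost
            # to rounding.
            end = num_instances
        else:
            end = start + int(round(partition_sizes[name] * num_instances))
        partitions[name] = shuffled[start:end]
        start = end
    return partitions


parts = split_indices(1000, {'train': 0.7, 'test': 0.2, 'validation': 0.1})
```

Splitting indices rather than the data itself keeps instances and labels aligned: the same index array can be used to select from both.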

In [14]:
import experiment
experiment = reload(experiment)
from models import lstm, mlp
mlp = reload(mlp)
lstm = reload(lstm)

config = {
    'model': lstm.LSTMModel,
    'model_arguments': {'hidden_layer_size': 50, 'batch_size': 100,
                        'logs_dirname': '../../results/examples/splice/',
                        'log_values': True, 'training_epochs': 1000}
}
splice_experiment = experiment.SampledExperiment(splice_dataset, config=config)

In [15]:
tf.reset_default_graph()
splice_experiment.run()

INFO:root:Classifier loss at step 0: 2.99467945099
INFO:root:Validation accuracy 0.32316158079
INFO:root:Classifier loss at step 100: 2.8269238472
INFO:root:Validation accuracy 0.150575287644
INFO:root:Classifier loss at step 200: 2.86058950424
INFO:root:Validation accuracy 0.150575287644
INFO:root:Classifier loss at step 300: 2.81255316734
INFO:root:Validation accuracy 0.150575287644
INFO:root:Classifier loss at step 400: 2.81250476837
INFO:root:Validation accuracy 0.150575287644
INFO:root:Classifier loss at step 500: 2.84739565849
INFO:root:Validation accuracy 0.150575287644
INFO:root:Classifier loss at step 600: 2.83434700966
INFO:root:Validation accuracy 0.150575287644
INFO:root:Classifier loss at step 700: 2.89007878304
INFO:root:Validation accuracy 0.150575287644
INFO:root:Classifier loss at step 800: 2.83562874794
INFO:root:Validation accuracy 0.150575287644
INFO:root:Classifier loss at step 900: 2.86018514633
INFO:root:Validation accuracy 0.150575287644
INFO:root:
	Precision	Re