In [1]:
# ML_in_Finance-1D-CNNs
# Author: Matthew Dixon
# Version: 1.0 (24.7.2019)
# License: MIT
# Email: matthew.dixon@iit.edu
# Notes: tested on Mac OS X with Python 3.6 and Tensorflow 1.3.0
# Citation: Please cite the following reference if this notebook is used for research purposes:
# Bilokon P., Dixon M.F. and I. Halperin, Machine Learning in Finance: From Theory to Practice, Springer Graduate textbook Series, 2020. 

# Using Keras to implement a 1D convolutional neural network (CNN) for timeseries prediction.

In [2]:
import numpy as np

from keras.layers import Conv1D, Dense, MaxPooling1D, Flatten
from keras.models import Sequential

Using TensorFlow backend.


In [3]:
np.set_printoptions(threshold=25)

### Creating the 1D CNN

We create a simple convolutional neural network: a 1D convolutional layer, followed by a dense layer.

It will allow us to predict the next value in a timeseries given an input sequence of length `window_size`.

The `filter_length` is the length (in time steps) of each filter, i.e., of the sliding window that is convolved along each input instance. The difference between 1D and 2D convolution is that a 1D filter's "height" is fixed to the number of input timeseries (its "width" being `filter_length`), so it can only slide along the time dimension of the window. This is useful because the input timeseries generally have no spatial or ordinal relationship to one another, so it is not meaningful to look for patterns that are invariant across subsets of the timeseries.
`nb_filter` is the number of such filters to learn (roughly, input patterns to recognize).
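
To make the sliding-window picture concrete, here is a minimal NumPy sketch of what a single filter computes on a single input instance. It is an illustration under stated assumptions, not the Keras implementation: the helper name `conv1d_valid`, the toy window and the averaging filter are all hypothetical, and the bias term and ReLU activation that `Conv1D` applies are left out.

```python
import numpy as np

def conv1d_valid(window, kernel):
    """Slide `kernel` along the time axis of `window` with no padding.

    window: array of shape (window_size, nb_input_series)
    kernel: array of shape (filter_length, nb_input_series)
    returns: array of shape (window_size - filter_length + 1,)
    """
    window_size = window.shape[0]
    filter_length = kernel.shape[0]
    n_positions = window_size - filter_length + 1
    # At each position the filter covers every input series at once,
    # so the only direction it can move in is along time.
    return np.array([
        np.sum(window[start:start + filter_length] * kernel)
        for start in range(n_positions)
    ])

# Hypothetical toy input: one univariate window of 8 time steps.
window = np.arange(8, dtype=float).reshape(-1, 1)
# Hypothetical filter of length 3 that averages neighbouring values.
kernel = np.full((3, 1), 1.0 / 3.0)

activations = conv1d_valid(window, kernel)
print(activations.shape)  # (6,) because 8 - 3 + 1 = 6
```

With `nb_filter` filters, the same computation is repeated once per filter, which is where the last dimension of the convolutional layer's output comes from.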

The model can handle multivariate timeseries (with `nb_input_series` variables) and multiple (`nb_outputs`) prediction targets. To predict future values of the same timeseries, set `nb_outputs` equal to `nb_input_series`.

In [4]:
def make_CNN(window_size, filter_length, nb_filter=4, nb_input_series=1, nb_outputs=1):
    """
    window_size (int): number of observations in each input sequence
    filter_length (int): length of the convolutional layer's filters
    nb_filter (int): number of filters learned in the convolutional layer
    nb_input_series (int): number of features of the input timeseries (1 for a univariate timeseries)
    nb_outputs (int): number of features being predicted (equal to nb_input_series 
        for predicting a timeseries at a set horizon)"""
    
    model = Sequential((
        # The convolutional layer learns `nb_filter` filters (aka kernels), 
        # each of size `(filter_length, nb_input_series)`.  
        # Its output will have shape `(None, window_size - filter_length + 1, nb_filter)`,
        # i.e., for each valid filter position along the input window, the activation of each filter at that position.
        Conv1D(filters=nb_filter, kernel_size=filter_length, activation='relu',
               input_shape=(window_size, nb_input_series)),
        Flatten(),
        # One linear output unit per prediction target
        Dense(nb_outputs, activation='linear'), # For classification, a 'sigmoid' activation function would be used
    ))
    model.compile(loss='mse', optimizer='adam', metrics=['mae'])
    
    return model

In [5]:
CNN_model = make_CNN(window_size=50, filter_length=5, nb_filter=4)

In [6]:
print('input shape:', CNN_model.layers[0].input_shape)
print('output shape:', CNN_model.layers[-1].output_shape)

input shape: (None, 50, 1)
output shape: (None, 1)


### Data preparation
  
We define a function to format a timeseries for training the neural network. It creates corresponding arrays of input sequences `X` and output values `y`. They have the same length as each other; the remaining dimensions must match the input and output layers of the model respectively:

The `X` input to the model's `fit()` method should be a 3D array of shape `(n_instances, window_size, nb_input_series)`, each instance being a 2D array of shape `(window_size, nb_input_series)`. For example, for `window_size = 3` and `nb_input_series = 1` (a univariate timeseries), one instance could be `[[0], [1], [2]]`.

For each input instance, the output is a vector of size `nb_outputs`, usually the value(s) predicted to come after the last value in that input instance, i.e., the next value in the sequence. The `y` input to `fit()` should be an array of shape `(n_instances, nb_outputs)`.


In [7]:
def make_timeseries_instances(timeseries, window_size):
    # Convert 1D vectors to 2D column vectors
    timeseries = np.atleast_2d(timeseries)
    if timeseries.shape[0] == 1:
        timeseries = timeseries.T 

    if not 0 < window_size < timeseries.shape[0]:
        raise ValueError('Please set 0 < window size < timeseries length')
    
    # `X` is the tensor containing the inputs for the model
    # each element of `X` is a sequence of `window_size` consecutive observations from the timeseries
    X = [timeseries[start:start + window_size] for start in range(0, timeseries.shape[0] - window_size)]
    
    # for training the model, the array's dimensions must match the input layer of the CNN
    # that is, a 3D array of shape (timeseries.shape[0] - window_size, window_size, nb_input_series)
    X = np.atleast_3d(np.array(X))
    
    # For each row of `X`, the corresponding row of `y` is the 
    # desired output -- in this case, the subsequent value in the timeseries 
    y = timeseries[window_size:]
    
    return X, y

For example:  

In [8]:
X_fib, y_fib = make_timeseries_instances([1,1,2,3,5,8,13,21], 5)
print('X:', X_fib, 'y:', y_fib, sep='\n')

X:
[[[ 1]
  [ 1]
  [ 2]
  [ 3]
  [ 5]]

 [[ 1]
  [ 2]
  [ 3]
  [ 5]
  [ 8]]

 [[ 2]
  [ 3]
  [ 5]
  [ 8]
  [13]]]
y:
[[ 8]
 [13]
 [21]]
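
The same windows can also be built without a Python-level loop. The following is a sketch of one alternative, assuming NumPy 1.20 or later (where `numpy.lib.stride_tricks.sliding_window_view` is available) and assuming that a 2D input is already laid out as `(time, variables)`. The name `make_instances_vectorized` is hypothetical and not part of the notebook.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_instances_vectorized(timeseries, window_size):
    ts = np.asarray(timeseries)
    if ts.ndim == 1:
        # Treat a 1D vector as a single-variable (column) timeseries.
        ts = ts.reshape(-1, 1)
    if not 0 < window_size < ts.shape[0]:
        raise ValueError('Please set 0 < window size < timeseries length')

    # Windows along the time axis; the window dimension is appended last,
    # giving shape (T - window_size + 1, n_variables, window_size).
    windows = sliding_window_view(ts, window_size, axis=0)

    # Drop the final window (it has no following value to predict) and
    # reorder the axes to (n_instances, window_size, n_variables).
    X = windows[:-1].transpose(0, 2, 1)
    y = ts[window_size:]
    return X, y

X_demo, y_demo = make_instances_vectorized([1, 1, 2, 3, 5, 8, 13, 21], 5)
print(X_demo.shape, y_demo.shape)  # (3, 5, 1) (3, 1)
```

The returned `X` is a read-only view onto the original data, so it avoids copying each window. Call `X.copy()` if the array needs to be modified afterwards.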


Create a toy timeseries and split it for training and testing the CNN:

In [9]:
timeseries = np.arange(1000)

In [None]:
X, y = make_timeseries_instances(timeseries, window_size=50)

In [24]:
i = 42
print('input instance:\n', X[i])
print('output instance:\n', y[i])

input instance:
 [[42]
 [43]
 [44]
 ...
 [89]
 [90]
 [91]]
output instance:
 [92]


In [10]:
test_ratio = 0.01 # In real life you'd usually want to use 0.2 - 0.5
test_size = int(test_ratio * len(timeseries)) 

# the "most recent" values are used for testing the model to avoid look-ahead bias
X_train, X_test, y_train, y_test = X[:-test_size], X[-test_size:], y[:-test_size], y[-test_size:]

Note the dimensions of the arrays:

In [None]:
[i.shape for i in [X_train, X_test, y_train, y_test]]
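
As a sanity check, the expected shapes can be worked out from the sizes alone. The sketch below uses placeholder arrays of zeros (an assumption made purely for illustration) with the same dimensions as `X` and `y` above: 1000 observations and a window of 50 give 950 instances, of which the last 10 are held out.

```python
import numpy as np

n_obs, window_size, test_ratio = 1000, 50, 0.01

n_instances = n_obs - window_size      # one instance per window that has a next value
test_size = int(test_ratio * n_obs)    # computed from the series length, as in the notebook

# Placeholder arrays with the same shapes as X and y.
X = np.zeros((n_instances, window_size, 1))
y = np.zeros((n_instances, 1))

X_train, X_test = X[:-test_size], X[-test_size:]
y_train, y_test = y[:-test_size], y[-test_size:]

shapes = [a.shape for a in (X_train, X_test, y_train, y_test)]
print(shapes)  # [(940, 50, 1), (10, 50, 1), (940, 1), (10, 1)]
```

Because `test_size` is derived from the length of the timeseries rather than the number of instances, the held-out share of instances is 10 out of 950, slightly more than `test_ratio`.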

### Training the model

Now we can fit the model. Note that `validation_data` is not used to train the model, but allows you to monitor its out-of-sample performance during training.

In [14]:
CNN_model.fit(X_train, y_train, epochs=25, batch_size=2, validation_data=(X_test, y_test))

Train on 940 samples, validate on 10 samples
Epoch 1/25
...
Epoch 25/25


<keras.callbacks.callbacks.History at 0x649aa0400>

We can inspect the weights of the convolutional layer:

In [31]:
CNN_model.summary()

print(CNN_model.get_layer('conv1d_1').weights)

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv1d_1 (Conv1D)            (None, 46, 4)             24        
_________________________________________________________________
flatten_1 (Flatten)          (None, 184)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 185       
Total params: 209
Trainable params: 209
Non-trainable params: 0
_________________________________________________________________
[<tf.Variable 'conv1d_1/kernel:0' shape=(5, 1, 4) dtype=float32, numpy=
array([[[-0.44083518,  0.3101118 ,  0.15849958,  0.0360361 ]],

       [[ 0.46102944,  0.40795228, -0.01246828, -0.12470502]],

       [[-0.48904577, -0.01921174,  0.27507454, -0.43751034]],

       [[-0.11164144, -0.3847304 ,  0.07413552, -0.11353293]],

       [[ 0.02154145, -0.413403  ,  0.37336847,  0.2116925
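
The parameter counts reported by `summary()` can be reproduced by hand from the layer sizes. This is a small arithmetic sketch using the values passed to `make_CNN` above, with one input series and one output; the variable names are chosen for illustration.

```python
window_size, filter_length, nb_filter = 50, 5, 4
nb_input_series, nb_outputs = 1, 1

# Number of valid filter positions along the window (no padding).
conv_steps = window_size - filter_length + 1

# Each filter has filter_length * nb_input_series weights plus one bias.
conv_params = filter_length * nb_input_series * nb_filter + nb_filter

# Flatten concatenates every filter activation at every position.
flat_size = conv_steps * nb_filter

# The dense layer has one weight per flattened input per output, plus biases.
dense_params = flat_size * nb_outputs + nb_outputs

total_params = conv_params + dense_params
print(conv_steps, conv_params, flat_size, dense_params, total_params)
# 46 24 184 185 209
```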

### Making predictions with the model
Get the predicted values for the test set:

In [16]:
y_pred = CNN_model.predict(X_test)

In [17]:
print(np.column_stack((y_test, y_pred)))

[[990.         990.04022217]
 [991.         991.04064941]
 [992.         992.04095459]
 [993.         993.04125977]
 [994.         994.04156494]
 [995.         995.04187012]
 [996.         996.04223633]
 [997.         997.04266357]
 [998.         998.04290771]
 [999.         999.04321289]]
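
The prediction error can be summarised with a couple of standard metrics. The sketch below recomputes the mean absolute error and root mean squared error in plain NumPy from the values printed above, which are copied in by hand so that the snippet runs without the trained model. The variable names are illustrative.

```python
import numpy as np

# Values copied from the printed comparison of y_test and y_pred.
y_true = np.arange(990.0, 1000.0)
y_hat = np.array([
    990.04022217, 991.04064941, 992.04095459, 993.04125977, 994.04156494,
    995.04187012, 996.04223633, 997.04266357, 998.04290771, 999.04321289,
])

errors = y_hat - y_true
mae = np.mean(np.abs(errors))          # mean absolute error
rmse = np.sqrt(np.mean(errors ** 2))   # root mean squared error

print('MAE: ', mae)
print('RMSE:', rmse)
print('all predictions above target:', bool(np.all(errors > 0)))
```

Every error has the same sign here, so the model overshoots each test target by a small and nearly constant amount.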
