## Keras LSTM Time-Series Prediction

A Long Short-Term Memory (LSTM) recurrent neural network (RNN) can be used to learn from time-series data. With mean squared error as the loss function, it can be trained to predict future values of a time series. Here, an LSTM RNN is trained on the closing price of the S&P 500 with a lookback window of 10 days. A trading strategy can be constructed by using the sign of the predicted price change as a signal for upward or downward price movement. For a momentum strategy, we go long the S&P 500 whenever the predicted change is above a positive threshold, go short when it is below a negative threshold, and clear positions otherwise.
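The thresholding rule described above can be sketched as follows. This is a minimal illustration rather than the notebook's own code: the function name `momentum_signal`, the `threshold` value, and the sample prices are all hypothetical.

```python
import numpy as np

def momentum_signal(last_price, predicted_price, threshold=0.5):
    """Map predicted price changes to positions: +1 long, -1 short, 0 flat.

    `threshold` is a hypothetical value in price units, chosen for illustration.
    """
    change = np.asarray(predicted_price, dtype=float) - np.asarray(last_price, dtype=float)
    signal = np.zeros_like(change)
    signal[change > threshold] = 1.0    # predicted rise above the threshold: go long
    signal[change < -threshold] = -1.0  # predicted fall below minus the threshold: go short
    return signal                       # everything else stays 0: no position

# made-up prices, for illustration only
last = np.array([200.0, 201.0, 199.0, 200.0])
pred = np.array([201.0, 200.8, 198.0, 200.5])
positions = momentum_signal(last, pred, threshold=0.5)
print(positions)
```

A predicted change exactly equal to the threshold is treated as no signal here; whether the comparison should be strict is a design choice the text leaves open.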


In [1]:
# import matplotlib
%matplotlib inline

import numpy as np
import pandas as pd

from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM

from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error

from pandas_datareader import data
from datetime import datetime
import pytz

import matplotlib.pyplot as plt

np.random.seed(0)

Using TensorFlow backend.


In [2]:
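def create_dataset(dataset, look_back=1):
    """Build (window, next value) pairs from a 2D single-column array."""
    dataX, dataY = [], []
    for i in range(len(dataset)-look_back-1):
        dataX.append(dataset[i:(i+look_back),0])  # look_back consecutive values
        dataY.append(dataset[i+look_back,0])      # the value right after the window

    return np.array(dataX), np.array(dataY)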
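
To see what `create_dataset` produces, here is a small self-contained sketch on a toy series. The function is repeated so the snippet runs on its own, and the toy values are made up.

```python
import numpy as np

def create_dataset(dataset, look_back=1):
    # same windowing as above: each sample is `look_back` consecutive values,
    # and the target is the value that immediately follows the window
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back - 1):
        dataX.append(dataset[i:(i + look_back), 0])
        dataY.append(dataset[i + look_back, 0])
    return np.array(dataX), np.array(dataY)

toy = np.arange(6, dtype='float32').reshape(-1, 1)  # column vector 0, 1, ..., 5
X, y = create_dataset(toy, look_back=2)
print(X)  # windows [0, 1], [1, 2], [2, 3]
print(y)  # targets 2, 3, 4
```

Because of the `- 1` in the loop bound, the last possible window (`[3, 4]` with target `5`) is not generated, so the function returns `len(dataset) - look_back - 1` samples.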

In [4]:
from pandas_datareader.google.daily import GoogleDailyReader

# override the reader's URL so requests go to finance.google.com
@property
def url(self):
    return 'http://finance.google.com/finance/historical'

GoogleDailyReader.url = url

# get data

import pandas_datareader as pdr
from datetime import datetime

start = datetime(2010,1,1)
end = datetime(2014,1,1)
ret = pdr.get_data_google(['AAPL'], start, end)

In [6]:
#load data
start = datetime(2015, 1, 1, 0, 0, 0, 0, pytz.utc)
end = datetime(2016, 1, 1, 0, 0, 0, 0, pytz.utc)
# spy = data.DataReader("SPY", "google", start, end)
spy = pdr.get_data_google(['SPY'], start, end)
dataset = np.array(spy['Close'].values).reshape(-1,1)
dataset = dataset.astype('float32')
# spy.head()

We scale the closing price to the range [0, 1]:

In [7]:
scaler = MinMaxScaler(feature_range=(0,1))
dataset = scaler.fit_transform(dataset)
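
For reference, with `feature_range=(0,1)` the scaler maps each value x to (x - min) / (max - min), and `inverse_transform` undoes that mapping. A minimal NumPy sketch of the equivalent arithmetic, on made-up prices:

```python
import numpy as np

prices = np.array([[200.0], [210.0], [205.0], [190.0]])  # made-up closing prices

lo, hi = prices.min(axis=0), prices.max(axis=0)
scaled = (prices - lo) / (hi - lo)   # equivalent to MinMaxScaler with feature_range=(0, 1)
restored = scaled * (hi - lo) + lo   # the mapping that inverse_transform undoes

print(scaled.ravel())    # smallest price maps to 0, largest to 1
print(restored.ravel())  # original prices recovered
```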

In [8]:
# split into train and test sets
train_size = int(len(dataset) * 0.67)
test_size = len(dataset) - train_size
train, test = dataset[0:train_size,:], dataset[train_size:len(dataset),:]

In [9]:
# reshape for look_back
look_back = 10
X_train, y_train = create_dataset(train, look_back)
X_test, y_test = create_dataset(test, look_back)

# reshape for LSTM [samples, time steps, features]
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))
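
A quick shape check can make the `[samples, time steps, features]` layout concrete. The sketch below uses a synthetic array in place of the windowed S&P 500 data; the sizes are arbitrary.

```python
import numpy as np

look_back = 10
n_samples = 7  # arbitrary, for illustration

# stand-in for the 2D output of create_dataset: one row per window
X = np.random.rand(n_samples, look_back).astype('float32')

# add a trailing feature axis: each time step carries a single value (the scaled close)
X_lstm = np.reshape(X, (X.shape[0], X.shape[1], 1))

print(X.shape)       # (7, 10)
print(X_lstm.shape)  # (7, 10, 1)
```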

Next, we define our model:

In [10]:
# LSTM
model = Sequential()
model.add(LSTM(32, input_shape=(look_back, 1)))  # look_back time steps, 1 feature
model.add(Dense(1))

model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X_train, y_train, epochs=100, batch_size=5, verbose=2)



Epoch 1/100
 - 2s - loss: 0.1327
Epoch 2/100
 - 0s - loss: 0.0138
Epoch 3/100
 - 0s - loss: 0.0122
Epoch 4/100
 - 0s - loss: 0.0122
Epoch 5/100
 - 0s - loss: 0.0117
Epoch 6/100
 - 0s - loss: 0.0119
Epoch 7/100
 - 0s - loss: 0.0119
Epoch 8/100
 - 0s - loss: 0.0115
Epoch 9/100
 - 0s - loss: 0.0111
Epoch 10/100
 - 0s - loss: 0.0110
Epoch 11/100
 - 0s - loss: 0.0111
Epoch 12/100
 - 0s - loss: 0.0107
Epoch 13/100
 - 0s - loss: 0.0108
Epoch 14/100
 - 0s - loss: 0.0104
Epoch 15/100
 - 0s - loss: 0.0102
Epoch 16/100
 - 0s - loss: 0.0101
Epoch 17/100
 - 0s - loss: 0.0098
Epoch 18/100
 - 0s - loss: 0.0097
Epoch 19/100
 - 0s - loss: 0.0095
Epoch 20/100
 - 0s - loss: 0.0093
Epoch 21/100
 - 0s - loss: 0.0092
Epoch 22/100
 - 0s - loss: 0.0111
Epoch 23/100
 - 0s - loss: 0.0094
Epoch 24/100
 - 0s - loss: 0.0102
Epoch 25/100
 - 0s - loss: 0.0090
Epoch 26/100
 - 0s - loss: 0.0086
Epoch 27/100
 - 0s - loss: 0.0083
Epoch 28/100
 - 0s - loss: 0.0082
Epoch 29/100
 - 0s - loss: 0.0081
Epoch 30/100
 - 0s - lo... [output truncated]

<keras.callbacks.History at 0x118791a20>
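
As a rough sanity check on model size, the number of trainable parameters can be worked out by hand, assuming the standard LSTM layer with four gate blocks, each with input weights, recurrent weights, and a bias. This is arithmetic on the layer sizes above, not output from the notebook.

```python
units = 32     # LSTM(32)
input_dim = 1  # one feature per time step

# 4 gate blocks x (input weights + recurrent weights + biases)
lstm_params = 4 * (input_dim * units + units * units + units)

# Dense(1): one weight per LSTM unit plus one bias
dense_params = units * 1 + 1

total_params = lstm_params + dense_params
print(lstm_params, dense_params, total_params)
```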

In [13]:
train_pred = model.predict(X_train)
test_pred = model.predict(X_test)

In [14]:
# scale back to price units
# inverse_transform expects 2D input, so the 1D targets are reshaped to a column first
train_pred = scaler.inverse_transform(train_pred)
y_train = scaler.inverse_transform(y_train.reshape(-1, 1))
test_pred = scaler.inverse_transform(test_pred)
y_test = scaler.inverse_transform(y_test.reshape(-1, 1))
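
Once predictions and targets are back in price units, a root-mean-square error gives a single-number summary of fit. The sketch below uses made-up arrays as stand-ins for the targets and predictions; the helper `rmse` is hypothetical and not part of the notebook.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error between two arrays of the same length."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# made-up values standing in for inverse-transformed targets and predictions
y_true = np.array([[200.0], [202.0], [198.0], [201.0]])
y_pred = np.array([[201.0], [201.0], [199.0], [202.0]])
score = rmse(y_true, y_pred)
print(score)
```

The same quantity can be obtained as the square root of `mean_squared_error(y_true, y_pred)` from scikit-learn, which the notebook already imports.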

In [None]:
# shift predictions for plotting
train_pred_plot = np.empty_like(dataset)
train_pred_plot[:,:] = np.nan
train_pred_plot[look_back:len(train_pred)+look_back,:] = train_pred

test_pred_plot = np.empty_like(dataset)
test_pred_plot[:,:] = np.nan
test_pred_plot[len(train_pred)+(look_back*2)+1:len(dataset)-1,:] = test_pred
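
The slice offsets used above follow from how `create_dataset` trims each split. A small check with hypothetical sizes (the numbers below are arbitrary, not the notebook's actual lengths):

```python
look_back = 10
n_total = 100                     # hypothetical length of the full series
train_size = int(n_total * 0.67)  # 67
test_size = n_total - train_size  # 33

# create_dataset returns len(split) - look_back - 1 samples per split
n_train_pred = train_size - look_back - 1
n_test_pred = test_size - look_back - 1

# training predictions start at index look_back (the first target of the train split)
train_slice = (look_back, n_train_pred + look_back)

# test predictions start look_back steps into the test split
test_slice = (n_train_pred + look_back * 2 + 1, n_total - 1)

print(train_slice, test_slice)
```

With these sizes the test slice starts at `train_size + look_back` and its length equals the number of test predictions, which is why the assignment into `test_pred_plot` lines up.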

In [None]:
plt.figure()
plt.plot(scaler.inverse_transform(dataset), color='b', lw=2.0, label='S&P 500')
plt.plot(train_pred_plot, color='g', lw=2.0, label='LSTM train')
plt.plot(test_pred_plot, color='r', lw=2.0, label='LSTM test')
plt.legend(loc=3)
plt.grid(True)

The figure above shows the LSTM predictions on the training (green) and test (red) portions of the time series. The predictions closely track the actual market price.