# LSTM Multi-Feature Stock Price Prediction with Keras

# Overview

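This project forecasts the daily Apple (AAPL) stock price with an LSTM that takes four input features (AAPL and SPY closing price and volume) and predicts one output feature (the AAPL closing price one step ahead).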


# Background for LSTM
The long short-term memory (LSTM) unit is a gated recurrent architecture designed to mitigate the [vanishing gradient problem](http://neuralnetworksanddeeplearning.com/chap5.html) and keep long-term "memory" active. The gated recurrent unit (GRU) is a later, simplified variant of the LSTM.

See my other [project](https://github.com/ginochen/LSTM/blob/master/LSTM_min_temp.ipynb) for a pictorial summary of the network architecture.
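
To make the gating idea concrete, here is a minimal NumPy sketch of a single LSTM time step. It follows the standard LSTM equations and is not Keras's internal implementation; the function name `lstm_step`, the gate ordering, and the random weights are assumptions made for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step (illustrative; the gate order i, f, g, o is an assumption).

    x_t: (input_dim,), h_prev and c_prev: (units,)
    W: (input_dim, 4*units), U: (units, 4*units), b: (4*units,)
    """
    units = h_prev.shape[0]
    z = x_t @ W + h_prev @ U + b
    i = sigmoid(z[0 * units:1 * units])   # input gate
    f = sigmoid(z[1 * units:2 * units])   # forget gate
    g = np.tanh(z[2 * units:3 * units])   # candidate cell state
    o = sigmoid(z[3 * units:4 * units])   # output gate
    c_t = f * c_prev + i * g              # additive cell update carries long-term memory
    h_t = o * np.tanh(c_t)                # hidden state exposed to the next layer
    return h_t, c_t

rng = np.random.default_rng(0)
input_dim, units = 4, 3
W = rng.normal(size=(input_dim, 4 * units))
U = rng.normal(size=(units, 4 * units))
b = np.zeros(4 * units)
h, c = np.zeros(units), np.zeros(units)
for x_t in rng.normal(size=(10, input_dim)):  # run over 10 time steps
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape, c.shape)
```

The cell state is updated additively (`f * c_prev + i * g`), which is what lets gradients flow across many time steps when the forget gate stays close to one.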

# Data 
Daily data for [AAPL](https://finance.yahoo.com/quote/AAPL/history?p=AAPL) and [SPY](https://finance.yahoo.com/quote/SPY/history?p=SPY&.tsrc=fin-srch) were downloaded from Yahoo Finance.

Batches of data are then created with a generator:

In [253]:
import numpy as np

class KerasBatchGenerator(object):
    """Yield (x, y) batches of windows cut from a 2-D array of shape (time, features)."""

    def __init__(self, data, num_steps, batch_size, ndim, skip_step=5):
        self.data = data
        self.num_steps = num_steps # time steps per window
        self.batch_size = batch_size # windows per batch
        self.ndim = ndim # number of feature dimensions
        # current_idx tracks the start of the next window; it is reset to zero
        # once a window would run past the end of the data set
        self.current_idx = 0
        # skip_step is the number of time steps to advance between consecutive windows
        self.skip_step = skip_step

    def generate(self):
        while True:
            # allocate fresh arrays for every batch so that batches already handed to Keras are not overwritten
            x = np.zeros((self.batch_size, self.num_steps, self.ndim))
            # the last dimension of y must have exactly one element, otherwise Keras won't `fit` the model
            y = np.zeros((self.batch_size, self.num_steps, 1))
            for i in range(self.batch_size):
                if self.current_idx + self.num_steps >= len(self.data):
                    # reset the index back to the start of the data set
                    self.current_idx = 0
                # inputs: all features over num_steps consecutive time steps
                x[i,:,:] = self.data[self.current_idx:self.current_idx + self.num_steps,:]
                # targets: feature 0 (AAPL close) shifted one time step ahead
                y[i,:,0] = self.data[self.current_idx + 1:self.current_idx + self.num_steps + 1,0]
                self.current_idx += self.skip_step
            yield x, y
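
To see how the inputs and targets line up, the following toy sketch repeats the generator's slicing on a small array. The helper `make_window` and the toy data are hypothetical; they only mirror the indexing used in `generate`, where each target is feature 0 shifted one step ahead of its input.

```python
import numpy as np

def make_window(data, start, num_steps):
    """Return (x, y) for one window, using the same slices as the generator."""
    x = data[start:start + num_steps, :]          # all features, num_steps rows
    y = data[start + 1:start + num_steps + 1, 0]  # feature 0, one step ahead
    return x, y

# toy series: 8 time steps, 2 features; feature 0 counts 0..7, feature 1 is ten times that
toy = np.stack((np.arange(8.0), 10.0 * np.arange(8.0)), axis=1)
x, y = make_window(toy, start=2, num_steps=3)
print(x[:, 0])  # [2. 3. 4.]
print(y)        # [3. 4. 5.]
```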


In [254]:
import pandas as pd
import numpy as np
datapath = 'data/'
series_aapl = pd.read_csv(datapath + 'AAPL.csv')
series_spy = pd.read_csv(datapath + 'SPY.csv')
# stack the four features column-wise: AAPL close, AAPL volume, SPY close, SPY volume
series = np.stack((series_aapl['Close'].values, series_aapl['Volume'].values,
                   series_spy['Close'].values, series_spy['Volume'].values), axis=1)
n = series.shape[0] # total time series length
nsteps = 10 # time steps per window
batchsize = 20 # windows per batch
lagt = 1 # forecast lag in time steps (not used by the generator, which hard-codes a one-step shift)
ndim = series.shape[1] # number of features
# split data into 80% training, 10% validation, 10% testing
train, valid, test = series[:round(0.8*n),:], series[round(0.8*n):round(0.9*n),:], series[round(0.9*n):,:]
train_data_generator = KerasBatchGenerator(train, nsteps, batchsize, ndim, skip_step=nsteps)
valid_data_generator = KerasBatchGenerator(valid, nsteps, batchsize, ndim, skip_step=nsteps)

print(train.shape, valid.shape)

print('AAPL and SPY data snippet:\n', train[:5])





(1612, 4) (202, 4)
AAPL and SPY data snippet:
 [[4.55171430e+01 7.04396000e+07 1.22489998e+02 1.56107100e+08]
 [4.51542850e+01 9.58860000e+07 1.21610001e+02 1.86621600e+08]
 [4.54328580e+01 9.60568000e+07 1.22099998e+02 2.21387400e+08]
 [4.52357140e+01 9.03210000e+07 1.21639999e+02 1.58017600e+08]
 [4.40042840e+01 1.98961700e+08 1.20199997e+02 2.39068800e+08]]
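
The printed shapes follow from the `round`-based cut points. The sketch below repeats that arithmetic for a series of 2015 rows; this length is an assumption inferred from the printed shapes, not a value stated in the notebook, and `split_sizes` is a hypothetical helper.

```python
def split_sizes(n):
    """Sizes of the train/valid/test slices produced by the round()-based cut points."""
    cut_train = round(0.8 * n)
    cut_valid = round(0.9 * n)
    return cut_train, cut_valid - cut_train, n - cut_valid

# n = 2015 is an assumed series length, chosen to be consistent with the shapes printed above
train_len, valid_len, test_len = split_sizes(2015)
print(train_len, valid_len, test_len)
```

Because the cut points are shared between neighbouring slices, the three sizes always add up to `n`, so no row is dropped or used twice.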


In [259]:
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM, TimeDistributed
hdim = 128 # hidden units per LSTM layer
nepochs = 500
model = Sequential()
# The input shape is required only at the first layer.
# batch_input_shape=(batch_size, timesteps, input_dim) also fixes the batch size;
# input_shape takes only (timesteps, input_dim), without batch_size.
model.add(LSTM(hdim, batch_input_shape=(batchsize,nsteps,ndim), return_sequences=True))
# With return_sequences=True the layer returns the hidden state at every time step,
# shape (batch_size, timesteps, units), which the next LSTM layer needs as its input.
# Otherwise only the output at the last time step is returned, shape (batch_size, units),
# which is invalid as the input of another LSTM layer.
#
# With return_state=True the layer also returns the final hidden state and final cell state, e.g.
# lstm1, state_h, state_c = LSTM(128, return_sequences=True, return_state=True)(inputs)
model.add(Dropout(0.2))
model.add(LSTM(hdim, return_sequences=True)) # no need to specify the input shape anymore
model.add(Dropout(0.2))
# Dense(1) is applied at every time step with the default linear activation a(x)=x;
# the output shape is (batchsize, nsteps, 1)
model.add(TimeDistributed(Dense(1)))
# track mean absolute error during training; 'accuracy' is not meaningful for a regression target
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae'])
# save the model after every epoch, with the epoch number in the file name to keep track of things
checkpointer = keras.callbacks.ModelCheckpoint(filepath=datapath + '/model-{epoch:02d}.hdf5', verbose=1)
model.summary()
model.fit_generator(train_data_generator.generate(), len(train)//(batchsize*nsteps), nepochs,
                        validation_data=valid_data_generator.generate(),
                        validation_steps=len(valid)//(batchsize*nsteps), callbacks=[checkpointer])

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_170 (LSTM)              (20, 10, 128)             68096     
_________________________________________________________________
dropout_158 (Dropout)        (20, 10, 128)             0         
_________________________________________________________________
lstm_171 (LSTM)              (20, 10, 128)             131584    
_________________________________________________________________
dropout_159 (Dropout)        (20, 10, 128)             0         
_________________________________________________________________
time_distributed_34 (TimeDis (20, 10, 1)               129       
Total params: 199,809
Trainable params: 199,809
Non-trainable params: 0
_________________________________________________________________
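
The parameter counts in the summary can be checked by hand. An LSTM layer has four gate blocks, each with an input kernel, a recurrent kernel, and a bias, which gives 4 × (units × (input_dim + units) + units) weights. The helper names below are hypothetical and only repeat that arithmetic.

```python
def lstm_params(input_dim, units):
    """Weights in one LSTM layer: four gates, each with kernel, recurrent kernel, and bias."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    """Weights in a fully connected layer: kernel plus bias."""
    return input_dim * units + units

first = lstm_params(4, 128)     # first LSTM layer, 4 input features
second = lstm_params(128, 128)  # second LSTM layer, fed by 128 hidden units
head = dense_params(128, 1)     # TimeDistributed(Dense(1)) shares one Dense across time steps
print(first, second, head, first + second + head)  # 68096 131584 129 199809
```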
Epoch 1/500

Epoch 00001: saving model to data//model-01.hdf5
Epoch 2/500

Epoch 00002: saving model to data//model-02.hdf5
...
Epoch 500/500

Epoch 00500: saving model to data//model-500.hdf5


<keras.callbacks.History at 0x17628ab70>
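
The step counts passed to `fit_generator` follow from how the generator advances. With `skip_step=nsteps`, each batch consumes `batchsize * nsteps` rows, so the integer division gives the number of non-overlapping batches in one pass over the data. The sketch below uses the array lengths printed earlier (1612 and 202 rows).

```python
batchsize, nsteps = 20, 10
rows_per_batch = batchsize * nsteps       # 200 rows consumed per batch

steps_per_epoch = 1612 // rows_per_batch  # training batches per epoch
validation_steps = 202 // rows_per_batch  # validation batches per epoch
print(steps_per_epoch, validation_steps)  # 8 1
```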

# Prediction

In [292]:
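# optionally load a saved checkpoint from a single file instead of using the in-memory model
#from keras.models import load_model
#model = load_model('data/model-477.hdf5')
# The model was built with batch_input_shape=(batchsize, nsteps, ndim), so predict()
# is given a full batch of batchsize windows taken from the start of the test set
# (this needs at least batchsize*nsteps rows in test).
x = np.zeros((batchsize, nsteps, ndim))
for i in range(batchsize):
    x[i,:,:] = test[i*nsteps:(i+1)*nsteps,:]
yhat = model.predict(x, batch_size=batchsize)
print(yhat)

Passing a single window instead, `x = np.zeros((1,10,4))` with `x[0,:,:] = test[0:10,:]`, raises

ValueError: could not broadcast input array from shape (20,10,1) into shape (1,10,1)

because the model returns a batch of 20 sequences (the batch size fixed by `batch_input_shape`), which does not fit into the output array allocated for one sample.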
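
One possible workaround when only a single window is available is to pad it to a full batch and keep the first prediction. The sketch below shows only the shape handling with NumPy; `predict_single` is a hypothetical helper, and `fake_predict` is a stand-in for `model.predict` so that the example runs without a trained model.

```python
import numpy as np

def predict_single(predict_fn, window, batch_size):
    """Predict for one (nsteps, ndim) window with a model that expects fixed-size batches."""
    # repeat the window batch_size times -> shape (batch_size, nsteps, ndim)
    batch = np.repeat(window[np.newaxis, :, :], batch_size, axis=0)
    out = predict_fn(batch)  # expected shape (batch_size, nsteps, 1)
    return out[0]            # the padded rows are copies, so keep only the first

def fake_predict(batch):
    # stand-in for model.predict (an assumption for this demo):
    # returns feature 0 at every step, shape (batch, nsteps, 1)
    return batch[:, :, :1]

window = np.arange(40.0).reshape(10, 4)  # one window: 10 time steps, 4 features
yhat = predict_single(fake_predict, window, batch_size=20)
print(yhat.shape)  # (10, 1)
```

With the trained network this would presumably be called as `predict_single(model.predict, test[0:10, :], batchsize)`. Another common option is to define the first layer with `input_shape=(nsteps, ndim)` so that the batch size is not fixed in the first place.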

# References
* [LSTM from Keras](https://keras.io/layers/recurrent/#lstm)
* [Keras LSTM tutorial](http://adventuresinmachinelearning.com/keras-lstm-tutorial/)