# LSTM Input Guide

Recurrent neural networks such as LSTMs expect 3-dimensional input with shape `(samples, timesteps, features)`. During the Lambda curriculum we covered RNNs for text generation, so we used the `pad_sequences` function to prepare the input. Since here we are attempting time-series projection, we instead use `TimeseriesGenerator` from `keras.preprocessing.sequence`.
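By contrast, `pad_sequences` returns a 2-D `(samples, maxlen)` array of token ids, and an `Embedding` layer typically supplies the third axis. Numeric price series have no such layer, so each sample must already arrive as a `(samples, timesteps, features)` array. The following NumPy sketch builds that shape by hand for a univariate series; `to_lstm_windows` is a hypothetical helper, not part of Keras.

```python
import numpy as np

def to_lstm_windows(series, length):
    """Slice a 1-D series into overlapping windows shaped for an LSTM.

    Hypothetical helper. Returns X with shape (samples, timesteps, features)
    and y holding the value that immediately follows each window.
    """
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - length
    # Window i covers series[i : i + length]
    X = np.stack([series[i:i + length] for i in range(n_samples)])
    # Target for window i is the next value, series[i + length]
    y = series[length:]
    # Add a trailing feature axis: (samples, timesteps) -> (samples, timesteps, 1)
    return X[..., np.newaxis], y

X, y = to_lstm_windows(np.arange(10), length=4)
print(X.shape, y.shape)  # (6, 4, 1) (6,)
```

With several features per time step (as in the AAPL frame below), the last axis simply grows from 1 to the number of columns.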

In [1]:
%cd ..
%cd .. 
%cd data

/Users/azel/github/Data-Science/jupyter_notebooks
/Users/azel/github/Data-Science
/Users/azel/github/Data-Science/data


In [77]:
from fin_data import DailyTimeSeries
import numpy as np

In [11]:
apple = DailyTimeSeries('AAPL')
df = apple.initiate()
df.head()

################################################################### 
 Ticker:  AAPL 
 Last Refreshed:  2019-09-23 13:27:48 
 Data Retrieved:  Daily Time Series with Splits and Dividend Events 
 ###################################################################


| date | AAPL_open | AAPL_high | AAPL_low | AAPL_close | AAPL_adjusted_close | AAPL_volume | AAPL_dividend_amount |
|---|---|---|---|---|---|---|---|
| 1999-09-21 | 73.188 | 73.25 | 69.0 | 69.25 | 2.1584 | 839389600.0 | 0.0 |
| 1999-09-22 | 69.75 | 71.625 | 69.016 | 70.313 | 2.1915 | 280792400.0 | 0.0 |
| 1999-09-23 | 71.125 | 71.25 | 63.0 | 63.313 | 1.9733 | 285938800.0 | 0.0 |
| 1999-09-24 | 63.375 | 67.016 | 63.0 | 64.938 | 2.024 | 294968800.0 | 0.0 |
| 1999-09-27 | 66.375 | 66.75 | 61.188 | 61.313 | 1.911 | 237048000.0 | 0.0 |


The documentation for `TimeseriesGenerator` looks like this:

In [10]:
from keras.preprocessing.sequence import TimeseriesGenerator
??TimeseriesGenerator

Init signature:
TimeseriesGenerator(
    data,
    targets,
    length,
    sampling_rate=1,
    stride=1,
    start_index=0,
    end_index=None,
    shuffle=False,
    reverse=False,
    batch_size=128,
)
Source:
class TimeseriesGenerator(sequence.TimeseriesGenerator

In [89]:
X = df.values
y = df['AAPL_close'].values

data_gen = TimeseriesGenerator(X, y, 
                               length=4,
                               sampling_rate=1, 
                               stride=1, 
                               batch_size=1)

In [90]:
a, b = data_gen[0]

In [91]:
a

array([[[7.318800e+01, 7.325000e+01, 6.900000e+01, 6.925000e+01,
         2.158400e+00, 8.393896e+08, 0.000000e+00],
        [6.975000e+01, 7.162500e+01, 6.901600e+01, 7.031300e+01,
         2.191500e+00, 2.807924e+08, 0.000000e+00],
        [7.112500e+01, 7.125000e+01, 6.300000e+01, 6.331300e+01,
         1.973300e+00, 2.859388e+08, 0.000000e+00],
        [6.337500e+01, 6.701600e+01, 6.300000e+01, 6.493800e+01,
         2.024000e+00, 2.949688e+08, 0.000000e+00]]])

In [92]:
b

array([61.313])

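As demonstrated, `TimeseriesGenerator` returns a Keras `Sequence` whose items are `(inputs, targets)` batches: each input holds `length` consecutive time steps, and each target is the value at the time step immediately after that window. Here the first window covers 1999-09-21 through 1999-09-24, and its target, `61.313`, is the close on 1999-09-27. You do not have to shift the target before passing it to the generator.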
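To make the window/target pairing explicit, here is a NumPy sketch that should reproduce the generator's samples for `sampling_rate=1`, `stride=1` and the default `start_index`/`end_index`, ignoring batching. `manual_timeseries_windows` and the toy data are hypothetical, used only for illustration.

```python
import numpy as np

def manual_timeseries_windows(data, targets, length):
    """Mimic TimeseriesGenerator with sampling_rate=1 and stride=1 (assumed).

    Window i covers rows i .. i+length-1 of `data`; its target is
    targets[i + length], the row right after the window.
    """
    data = np.asarray(data)
    targets = np.asarray(targets)
    rows = np.arange(length, len(data))        # index of each window's target row
    X = np.stack([data[r - length:r] for r in rows])
    y = targets[rows]
    return X, y

# Toy stand-in for df.values: 6 days x 2 features (hypothetical data)
data = np.arange(12, dtype=float).reshape(6, 2)
targets = data[:, 1]                           # pretend column 1 is the close
X, y = manual_timeseries_windows(data, targets, length=4)
print(X.shape, y)  # (2, 4, 2) [ 9. 11.]
```

Because the target index is always one past the end of its window, a series of `n` rows yields `n - length` samples.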

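When projecting one day ahead, leave `sampling_rate` and `stride` at `1`. `length` is the main parameter to tune: it sets how many time steps each sample contains. `batch_size` only controls how many windows are returned per batch (the default is 128; the example above uses 1), so it does not change how far ahead the target lies. If you want to project several days into the future (one advantage an LSTM can offer over gradient boosting for time-series projection), the targets themselves have to reach further ahead, for example by pairing each window with the next several closes.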
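One common way to forecast several days ahead (a sketch, not a built-in `TimeseriesGenerator` option) is to give every window a vector of the next `horizon` values, so a model's final `Dense` layer would output `horizon` numbers. `make_multistep_windows` and the toy closes are hypothetical.

```python
import numpy as np

def make_multistep_windows(data, series, length, horizon):
    """Pair each window of `data` with the next `horizon` values of `series`.

    Hypothetical helper: X has shape (samples, length, features) and
    Y has shape (samples, horizon), one column per future day.
    """
    data = np.asarray(data, dtype=float)
    series = np.asarray(series, dtype=float)
    # Drop trailing windows that do not have a full horizon of future values
    n_samples = len(data) - length - horizon + 1
    X = np.stack([data[i:i + length] for i in range(n_samples)])
    Y = np.stack([series[i + length:i + length + horizon] for i in range(n_samples)])
    return X, Y

closes = np.arange(10, dtype=float)            # toy close prices
X, Y = make_multistep_windows(closes[:, np.newaxis], closes, length=4, horizon=3)
print(X.shape, Y.shape)  # (4, 4, 1) (4, 3)
print(Y[0])              # [4. 5. 6.]
```

The trade-off is that the last `horizon - 1` windows are dropped, since their future values are not yet known.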

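`stride` sets the spacing between the starting rows of consecutive windows: with `stride=5`, each sample starts five rows after the previous one. It reduces the number of overlapping samples, but it does not move the target further into the future.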
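The effect of `stride` on which rows become targets can be sketched in NumPy. This assumes `sampling_rate=1` and the default `start_index`/`end_index`, and is based on a reading of how the generator indexes rows rather than on its documented output; `window_target_rows` is a hypothetical helper.

```python
import numpy as np

def window_target_rows(n_rows, length, stride):
    """Row index of each window's target, assuming sampling_rate=1.

    Consecutive windows start `stride` rows apart, so a larger stride yields
    fewer, less-overlapping samples; each target is still the row right
    after its window.
    """
    return np.arange(length, n_rows, stride)

length = 3
for stride in (1, 2):
    rows = window_target_rows(n_rows=10, length=length, stride=stride)
    # Inclusive first/last row of each input window
    spans = [(int(r) - length, int(r) - 1) for r in rows]
    print(f"stride={stride}: targets at rows {rows.tolist()}, windows {spans}")
# stride=2 -> targets at rows [3, 5, 7, 9]
```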

### Extracting Data from the Generator

When using a Keras preprocessing generator such as this one, you can pass it to `fit_generator` (or, in `tf.keras` from TensorFlow 2.1 onward, directly to `fit`) to train the model. If you instead want the windows as plain NumPy arrays, you can extract them with this function:

In [93]:
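import numpy as np

def extract_data(generator):
    """Concatenate every (inputs, targets) batch of a Keras generator into two arrays."""
    inputs, targets = [], []
    for i in range(len(generator)):
        x_batch, y_batch = generator[i]
        inputs.append(x_batch)
        targets.append(y_batch)

    # Join along the sample axis; this works for any batch_size,
    # including a smaller final batch.
    X = np.concatenate(inputs, axis=0)
    y = np.concatenate(targets, axis=0)
    # Keep one row per sample, e.g. (n,) -> (n, 1)
    return X, y.reshape(len(y), -1)

In [94]:
ext_X, ext_y = extract_data(data_gen)

In [95]:
ext_X.shape, ext_y.shape

((5030, 4, 7), (5030, 1))

This function works with any `batch_size`, because the batches are concatenated along the sample axis.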