# GEC Data Science Program
## Level 2: Lab 5

<div id="toc"></div>

### Working with Time Series Data

#### Data: Airline Monthly Traffic Data

https://datamarket.com/data/set/22u3/international-airline-passengers-monthly-totals-in-thousands-jan-49-dec-60

In [None]:
from __future__ import division
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import math
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM, ConvLSTM2D
from sklearn.preprocessing import MinMaxScaler
from sklearn import metrics 

In [None]:
import keras

In [None]:
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
%matplotlib inline

In [None]:
def split_fit_predict(model, data, predictors, target, split_test_size=0.3):
    """
    1. Split 'data[predictors]' and 'data[target]' into train and test sets using the 'split_test_size' ratio
    2. Fit 'model' using training data
    3. Predict against test data
    """
    if split_test_size == 0:
        df_train = data
        df_test = None
    else:
        df_train, df_test = train_test_split(data, test_size=split_test_size)
    X_train=df_train[predictors]
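    # a DataFrame has no unambiguous truth value, so compare against None explicitly
    X_test=df_test[predictors] if df_test is not None else None
    y_train=df_train[target]
    y_test=df_test[target] if df_test is not None else None
    model.fit(X_train,y_train)
    y_pred=model.predict(X_test) if df_test is not None else None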
    y_fit=model.predict(X_train)
    return y_pred, y_test, y_fit

In [None]:
# skipfooter is only supported by the Python parsing engine
dataset = pd.read_csv('/Users/shahab/Downloads/international-airline-passengers.csv', 
                          index_col=0, skipfooter=3, engine='python')

In [None]:
dataset.columns=["passengers"]

In [None]:
dataset.head()

In [None]:
dataset.index=pd.to_datetime(dataset.index)

In [None]:
plt.plot(dataset);

In [None]:
# normalize the dataset
scaler = MinMaxScaler(feature_range=(0, 1))
dataset_n = scaler.fit_transform(dataset)

### Q: If we have time series data, how can we predict future values?

Let's say we have:
$t=[1, 2, 3, 4, 5, 6, 7, 8, 9]$

We can transform $t$ like this: 

For each value of $t$ we look at $n$ previous values (look-back), and create a feature matrix.

With look-back = 3:

$[1,2,3] \rightarrow 4$

$[2,3,4] \rightarrow 5$

$[3,4,5] \rightarrow 6$

$[4,5,6] \rightarrow 7$

$[5,6,7] \rightarrow 8$

$[6,7,8] \rightarrow 9$

$X=
\left(\begin{array}{ccc}
1 & 2 & 3\\
2 & 3 & 4\\
3 & 4 & 5\\
4 & 5 & 6\\
5 & 6 & 7\\
6 & 7 & 8\\
\end{array}\right)
$
$
y=\left(\begin{array}{c}
4\\
5\\
6\\
7\\
8\\
9\\
\end{array}\right)
$

Now we can train a model using feature matrix $X$ and labels $y$. Then to predict the next value we use input vector $x=[7,8,9]$.
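
As a minimal sketch of this idea, the toy series above can be fitted with a linear model and then asked for the next value. The names `X_toy`, `y_toy`, `toy_model`, and `x_next` are illustrative and not part of the lab code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Look-back matrix and targets for t = 1..9 with look-back 3
X_toy = np.array([[i, i + 1, i + 2] for i in range(1, 7)], dtype=float)
y_toy = np.arange(4, 10, dtype=float)

toy_model = LinearRegression().fit(X_toy, y_toy)

# The last three observed values form the input for the next step
x_next = np.array([[7.0, 8.0, 9.0]])
next_value = toy_model.predict(x_next)[0]
print(round(next_value, 6))
```

Because the toy series is exactly linear, the prediction should come out very close to 10. Real series are noisier, so the fit will not be exact.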

In [None]:
# convert an array of values into a look-back dataset matrix
def timeseries_to_matrix(timeseries, look_back=1):
    dataX, dataY = [], []
    for i in range(len(timeseries)-look_back):
        a = timeseries[i:(i+look_back)]
        dataX.append(a)
        dataY.append(timeseries[i + look_back])
    return np.array(dataX), np.array(dataY)

In [None]:
t=range(1,10)
timeseries_to_matrix(t,look_back=3)

In [None]:
look_back=5
X, y = timeseries_to_matrix(dataset_n[:,0], look_back)
#testX_r, testY_r = timeseries_to_matrix(test, look_back=5)

In [None]:
X

#### Q: How should we split time-series data into train and test sets?

In [None]:
train_size = int(len(dataset) * 0.9)
test_size = len(dataset) - train_size

In [None]:
train_size

In [None]:
X_train = X[:train_size]
X_test = X[train_size:]

y_train = y[:train_size]
y_test = y[train_size:]
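
The split above is chronological rather than random: shuffling would let the model train on values that come after the ones it is tested on. The sketch below shows the same idea on a stand-in array, together with scikit-learn's `TimeSeriesSplit` for walk-forward folds. The array `series` and the 75% cut are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

series = np.arange(12)  # stand-in for 12 chronologically ordered samples

# Single chronological hold-out: first 75% for training, the rest for testing
cut = int(len(series) * 0.75)
train_part, test_part = series[:cut], series[cut:]

# Walk-forward folds: each fold trains on the past and tests on what follows
folds = list(TimeSeriesSplit(n_splits=3).split(series))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

In every fold the training indices precede the test indices, which mirrors how the model would be used to forecast.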

In [None]:
X_train.shape, y_train.shape

In [None]:
X_test.shape, y_test.shape

Also create arrays for the matching date-time values:

In [None]:
T_train = dataset.index[look_back:train_size+look_back]
T_test = dataset.index[train_size+look_back:]

In [None]:
T_train.shape, T_test.shape

### Let's train a simple model on this transformed data:

In [None]:
rf = RandomForestRegressor()
lr = LinearRegression()

In [None]:
rf.fit(X_train,y_train)
y_pred_rf=rf.predict(X_test)
y_fit_rf=rf.predict(X_train)

In [None]:
lr.fit(X_train,y_train)
y_pred_lr = lr.predict(X_test)
y_fit_lr = lr.predict(X_train)

In [None]:
metrics.mean_squared_error(y_train, y_fit_rf)

In [None]:
metrics.mean_squared_error(y_test, y_pred_rf)

In [None]:
metrics.mean_squared_error(y_train, y_fit_lr)

In [None]:
metrics.mean_squared_error(y_test, y_pred_lr)
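
These errors are measured on the scaled (0 to 1) values, so they are hard to read in terms of passengers. A rough sketch of converting an error back to the original units is shown below. The data and the names `values`, `toy_scaler`, and `scaled_pred` are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical passenger counts (thousands), one column as the scaler expects
values = np.array([[100.0], [200.0], [300.0], [400.0], [500.0]])
toy_scaler = MinMaxScaler(feature_range=(0, 1))
scaled = toy_scaler.fit_transform(values)

# Pretend these are scaled predictions: each is off by 0.05 in scaled units
scaled_pred = scaled[:, 0] + 0.05

# inverse_transform needs a 2D array, so reshape the 1D predictions first
pred_units = toy_scaler.inverse_transform(scaled_pred.reshape(-1, 1))[:, 0]

rmse_scaled = np.sqrt(np.mean((scaled_pred - scaled[:, 0]) ** 2))
rmse_units = np.sqrt(np.mean((pred_units - values[:, 0]) ** 2))
print(rmse_scaled, rmse_units)
```

The root mean squared error in original units is the scaled error multiplied by the data range (here 400).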

In [None]:
# # make predictions
# trainPredict = model.predict(trainX)
# testPredict = model.predict(testX)

In [None]:
# invert predictions (inverse_transform expects a 2D array, so reshape the 1D predictions)
trainPredict = scaler.inverse_transform(y_fit_rf.reshape(-1, 1))
#trainY = scaler.inverse_transform([y_train])
testPredict = scaler.inverse_transform(y_pred_rf.reshape(-1, 1))
#testY = scaler.inverse_transform([y_test])

In [None]:
plt.plot(dataset,':')
plt.plot(T_train,trainPredict)
plt.plot(T_test, testPredict)

In [None]:
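# invert predictions (inverse_transform expects a 2D array, so reshape the 1D predictions)
trainPredict_lr = scaler.inverse_transform(y_fit_lr.reshape(-1, 1))
testPredict_lr = scaler.inverse_transform(y_pred_lr.reshape(-1, 1))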

In [None]:
plt.plot(dataset,':')
plt.plot(T_train,trainPredict_lr)
plt.plot(T_test, testPredict_lr)

## Recurrent Neural Networks 
### Long Short-Term Memory (LSTM)

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

https://keras.io/layers/recurrent/

http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture10.pdf

The Long Short-Term Memory network (LSTM) is a type of Recurrent Neural Network (RNN).

A benefit of this type of network is that it can learn and remember over long sequences and does not rely on a pre-specified window of lagged observations as input.

The LSTM layer expects its input as a 3D array with the dimensions [samples, time steps, features].

- Samples: These are independent observations from the domain, typically rows of data.

- Time steps: These are separate time steps of a given variable for a given observation.

- Features: These are separate measures observed at the time of observation.
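
A look-back matrix can be mapped onto these three dimensions in more than one way. The sketch below shows two options with NumPy only. The names `windows`, `as_features`, and `as_timesteps` are illustrative.

```python
import numpy as np

look_back_demo = 3
# Six look-back windows over the series 1..9, shape (samples, look_back)
windows = np.array([[i, i + 1, i + 2] for i in range(1, 7)], dtype=float)

# Option A: one time step carrying look_back features per sample
as_features = windows.reshape(windows.shape[0], 1, look_back_demo)

# Option B: look_back time steps carrying one feature each
as_timesteps = windows.reshape(windows.shape[0], look_back_demo, 1)

print(as_features.shape, as_timesteps.shape)
```

Option A treats the lagged values as separate features of a single step. Option B lets the recurrent layer step through the lags one at a time.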


In [None]:
data_dim = 16
timesteps = 8
num_classes = 10

In [None]:
# Generate dummy training data
x_train_toy = np.random.random((1000, timesteps, data_dim))
y_train_toy = np.random.random((1000, num_classes))

In [None]:
model = Sequential()
model.add(LSTM(32,  # return_sequences=True,
               input_shape=(timesteps, data_dim)))  # returns a single vector of dimension 32 (last time step only)
model.add(Dense(num_classes, activation='softmax'))

In [None]:
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

In [None]:
model.fit(x_train_toy,y_train_toy, epochs=2)

#### LSTM for flight data

In [None]:
# reshape input to be [samples, time steps, features]: one time step with look_back features per sample
X_train_r = np.reshape(X_train, (X_train.shape[0], 1, X_train.shape[1]))
X_test_r = np.reshape(X_test, (X_test.shape[0], 1, X_test.shape[1]))

In [None]:
# create and fit the LSTM network
model = Sequential()
model.add(LSTM(5, input_shape=(None,look_back)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')

In [None]:
model.fit(X_train_r, y_train, epochs=50, batch_size=5, verbose=1)

### Q: Is it over-fitted or under-fitted? How do we find out?
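
One common way to approach this question is to compare the training loss with the loss on held-out validation data over the epochs. The helper below is only a crude rule of thumb for illustration. The function `diagnose_fit` and its `tolerance` threshold are hypothetical and not a standard test.

```python
def diagnose_fit(train_loss, val_loss, tolerance=0.1):
    """Crude rule of thumb comparing final training and validation loss.

    `tolerance` is an arbitrary illustrative threshold, not a standard value.
    """
    final_train, final_val = train_loss[-1], val_loss[-1]
    # Validation loss still dropping suggests more training could help
    still_improving = len(val_loss) > 1 and val_loss[-1] < val_loss[-2]
    if final_val > final_train * (1 + tolerance) and not still_improving:
        return "possibly overfitted"
    if still_improving:
        return "possibly underfitted (validation loss still falling)"
    return "no clear sign either way"


overfit_case = diagnose_fit([0.5, 0.2, 0.1], [0.6, 0.5, 0.7])
underfit_case = diagnose_fit([0.9, 0.8, 0.7], [0.95, 0.85, 0.75])
print(overfit_case)
print(underfit_case)
```

In practice, plotting both curves is usually more informative than any single threshold.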

### Let's predict:

In [None]:
y_fit_lstm=model.predict(X_train_r)
y_pred_lstm=model.predict(X_test_r)

trainPredict_lstm = scaler.inverse_transform(y_fit_lstm)
testPredict_lstm = scaler.inverse_transform(y_pred_lstm)

In [None]:
trainPredict_lstm.shape, testPredict_lstm.shape

In [None]:
plt.plot(dataset,':')
plt.plot(T_train,trainPredict_lstm)
plt.plot(T_test, testPredict_lstm,'.-')

In [None]:
metrics.mean_squared_error(y_train,y_fit_lstm)

In [None]:
metrics.mean_squared_error(y_test,y_pred_lstm)

### Stacking multiple LSTM Layers

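To stack LSTM layers, set return_sequences=True on every LSTM layer except the last, so that each layer passes a full sequence to the next one.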
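
The effect of this flag on output shapes can be illustrated without Keras. The sketch below uses a plain tanh recurrent layer written in NumPy, which is not an LSTM and is only meant to show the shapes involved. The function `simple_rnn` and its random weights are assumptions for illustration.

```python
import numpy as np

def simple_rnn(x, units, return_sequences=False, seed=0):
    """Plain tanh RNN forward pass (not an LSTM), used only to show shapes."""
    rng = np.random.default_rng(seed)
    batch, timesteps, features = x.shape
    w_in = rng.normal(size=(features, units))
    w_rec = rng.normal(size=(units, units))
    h = np.zeros((batch, units))
    outputs = []
    for t in range(timesteps):
        h = np.tanh(x[:, t, :] @ w_in + h @ w_rec)
        outputs.append(h)
    # Either the hidden state at every step or only the last one
    return np.stack(outputs, axis=1) if return_sequences else h

x_demo = np.ones((4, 6, 2))  # (samples, time steps, features)
seq_out = simple_rnn(x_demo, 8, return_sequences=True)  # keeps the time axis
last_out = simple_rnn(x_demo, 8)  # only the final hidden state
print(seq_out.shape, last_out.shape)
```

Only the output that keeps the time axis has the 3D shape a following recurrent layer expects.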

In [None]:
model = Sequential()
model.add(LSTM(32, return_sequences=True,
               input_shape=(None, look_back)))  # returns a sequence of vectors of dimension 32
model.add(LSTM(32, return_sequences=True))  # returns a sequence of vectors of dimension 32
model.add(LSTM(32))  # return a single vector of dimension 32
model.add(Dense(1))

model.compile(loss='mean_squared_error', optimizer='adam',
             metrics=['mean_squared_error'])

In [None]:
fitting=model.fit(X_train_r, y_train, epochs=50, batch_size=5, verbose=1, validation_split=0.25)

In [None]:
plt.plot(fitting.history['loss'],'b')
plt.plot(fitting.history['val_loss'],'g')
plt.legend(('training_loss', 'validation_loss'))

In [None]:
y_fit_lstm=model.predict(X_train_r)
y_pred_lstm=model.predict(X_test_r)

trainPredict_lstm = scaler.inverse_transform(y_fit_lstm)
testPredict_lstm = scaler.inverse_transform(y_pred_lstm)

In [None]:
plt.plot(dataset,':')
plt.plot(T_train,trainPredict_lstm)
plt.plot(T_test, testPredict_lstm,'.-')

In [None]:
metrics.mean_squared_error(y_train,y_fit_lstm)

In [None]:
metrics.mean_squared_error(y_test,y_pred_lstm)

### Stateful LSTM

A stateful recurrent model is one for which the internal states (memories) obtained after processing a batch of samples are reused as initial states for the samples of the next batch. This allows processing longer sequences while keeping computational complexity manageable.
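
The idea of carrying state across batches can be sketched with a plain tanh recurrent layer in NumPy. This is not an LSTM and not the Keras implementation. The function `run_chunk` and the random weights are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
w_in = rng.normal(size=(2, 4))
w_rec = rng.normal(size=(4, 4))

def run_chunk(x, h0):
    """Run a plain tanh RNN (not an LSTM) over one chunk, starting from state h0."""
    h = h0
    for t in range(x.shape[1]):
        h = np.tanh(x[:, t, :] @ w_in + h @ w_rec)
    return h

long_seq = rng.normal(size=(3, 8, 2))  # 3 samples, 8 time steps, 2 features
zero_state = np.zeros((3, 4))

# Whole sequence in one pass
h_full = run_chunk(long_seq, zero_state)

# Stateful style: two chunks of 4 steps, the second starts from the first's final state
h_first = run_chunk(long_seq[:, :4, :], zero_state)
h_stateful = run_chunk(long_seq[:, 4:, :], h_first)

# Stateless style: the second chunk starts again from zeros
h_stateless = run_chunk(long_seq[:, 4:, :], zero_state)

print(np.allclose(h_full, h_stateful), np.allclose(h_full, h_stateless))
```

Carrying the state forward reproduces the result of one long pass, while resetting it discards what was seen in the first chunk.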

In [None]:
batch_size=10

#### Q: Fix the error:

In [None]:
model = Sequential()
model.add(LSTM(32, return_sequences=True, stateful=True,
               input_shape=(None, look_back)))  
               # batch_input_shape=(batch_size, None, look_back)))
model.add(LSTM(32, return_sequences=True, stateful=True))  
model.add(LSTM(32, stateful=True))  
model.add(Dense(1))

model.compile(loss='mean_squared_error', optimizer='adam',
             metrics=['mean_squared_error'])

#### Q: Fix the error:

In [None]:
fitting=model.fit(X_train_r, y_train, 
                  epochs=100, batch_size=batch_size, verbose=1, validation_split=0.25)

In [None]:
plt.plot(fitting.history['loss'],'b')
plt.plot(fitting.history['val_loss'],'g')
plt.legend(('training_loss', 'validation_loss'))

In [None]:
y_fit_lstm=model.predict(X_train_r[?:], batch_size=batch_size)
y_pred_lstm=model.predict(X_test_r, batch_size=batch_size)

trainPredict_lstm = scaler.inverse_transform(y_fit_lstm)
testPredict_lstm = scaler.inverse_transform(y_pred_lstm)

In [None]:
len(trainPredict_lstm)

In [None]:
plt.plot(dataset,':')
plt.plot(T_train[9:],trainPredict_lstm)
plt.plot(T_test, testPredict_lstm,'.-')

In [None]:
metrics.mean_squared_error(y_train[?:],y_fit_lstm)

In [None]:
metrics.mean_squared_error(y_test,y_pred_lstm)

This is underfitted. Let's add more epochs and early stopping.
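
Early stopping with a patience value halts training once the monitored loss has failed to improve for that many consecutive epochs. The function below is a simplified sketch of such a rule, not the Keras implementation. The name `early_stopping_epoch` and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 0-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            # New best value: remember it and reset the counter
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


losses = [0.9, 0.7, 0.6, 0.65, 0.62, 0.61, 0.63]
stop_at = early_stopping_epoch(losses, patience=3)
never_stops = early_stopping_epoch(losses, patience=10)
print(stop_at, never_stops)
```

A larger patience tolerates longer plateaus, at the cost of more epochs before training stops.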

In [None]:
from keras import callbacks

In [None]:
model = Sequential()
model.add(LSTM(60, return_sequences=True, stateful=True,
               batch_input_shape=(batch_size, None, look_back)))  # returns a sequence of vectors of dimension 60
model.add(LSTM(30, return_sequences=True, stateful=True))  # returns a sequence of vectors of dimension 30
model.add(LSTM(20, stateful=True))  # returns a single vector of dimension 20
model.add(Dense(1))

model.compile(loss='mean_squared_error', optimizer='adam',
             metrics=['mean_squared_error'])

In [None]:
early_stopping = callbacks.EarlyStopping(monitor='val_loss', patience=50, verbose=1, mode='auto')
fitting=model.fit(X_train_r[-12*batch_size:], y_train[-12*batch_size:], callbacks=[early_stopping],
                  epochs=1000, batch_size=batch_size, verbose=1, validation_split=0.25)

In [None]:
plt.plot(fitting.history['loss'],'b')
plt.plot(fitting.history['val_loss'],'g')
plt.legend(('training_loss', 'validation_loss'))

In [None]:
y_fit_lstm=model.predict(X_train_r[-12*batch_size:], batch_size=batch_size)
y_pred_lstm=model.predict(X_test_r, batch_size=batch_size)

trainPredict_lstm = scaler.inverse_transform(y_fit_lstm)
testPredict_lstm = scaler.inverse_transform(y_pred_lstm)

In [None]:
metrics.mean_squared_error(y_train[-12*batch_size:],y_fit_lstm)

In [None]:
metrics.mean_squared_error(y_test,y_pred_lstm)

### Transform Time Series to Stationary

In [None]:
# create a differenced series
def difference(dataset, interval=1):
    diff = list()
    for i in range(interval, len(dataset)):
        value = dataset[i] - dataset[i - interval]
        diff.append(value)
    return np.array(diff, dtype=float)

In [None]:
def inverse_difference(history, yhat, interval=1):
    return yhat + history[-interval]

In [None]:
dataset.head()

In [None]:
differenced = difference(dataset.values, 1)
print(differenced)

The inverse transform is straightforward: start from the first value and add the differences back one at a time.

In [None]:
t0=dataset.values[0]
inverted=[t0]
for d in differenced:
    inverted.append(inverted[-1]+d)
inverted=np.array(inverted)

In [None]:
np.allclose(inverted, dataset.values)

In [None]:
def inverse_difference(t0, differenced):
    inverted=[t0]
    for d in differenced:
        inverted.append(inverted[-1]+d)
    inverted=np.array(inverted)
    return inverted
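
The same round trip can also be written with NumPy built-ins. This is a compact sketch on a short illustrative array. The names `series`, `diffs`, `restored`, and `lagged` are not part of the lab code.

```python
import numpy as np

series = np.array([112.0, 118.0, 132.0, 129.0, 121.0, 135.0])  # illustrative values

# First-order differencing: series[i] - series[i - 1]
diffs = np.diff(series)

# Invert by cumulatively summing the differences onto the first value
restored = np.concatenate(([series[0]], series[0] + np.cumsum(diffs)))

# Lagged differencing with a hypothetical interval of 2
interval = 2
lagged = series[interval:] - series[:-interval]

print(diffs, restored, lagged)
```

For a lag larger than 1, the inverse needs the first `interval` values rather than just the first one.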

In [None]:
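# note: this refits `scaler` on the differenced series, so from here on it inverts differences, not passenger counts
differenced_n = scaler.fit_transform(differenced)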
look_back=5
X, y = timeseries_to_matrix(dataset_n[:,0], look_back)
dX, dy = timeseries_to_matrix(differenced_n[:,0], look_back)

dX_train = dX[:train_size]
dX_test = dX[train_size:]

dy_train = dy[:train_size]
dy_test = dy[train_size:]

X_train = X[:train_size]
X_test = X[train_size:]

y_train = y[:train_size]
y_test = y[train_size:]

In [None]:
dX.shape, dX_train.shape, dX_test.shape

In [None]:
dX_train_r = np.reshape(dX_train, (dX_train.shape[0], 1, dX_train.shape[1]))
dX_test_r = np.reshape(dX_test, (dX_test.shape[0], 1, dX_test.shape[1]))

In [None]:
model = Sequential()
model.add(LSTM(20, input_shape=(None,look_back)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mean_squared_error'])

In [None]:
fitting=model.fit(dX_train_r, dy_train, epochs=200, batch_size=10, verbose=1, validation_split=0.2)

In [None]:
plt.plot(fitting.history['loss'],'b')
plt.plot(fitting.history['val_loss'],'g')
plt.legend(('training_loss', 'validation_loss'))

In [None]:
dy_fit_lstm=model.predict(dX_train_r)
dy_pred_lstm=model.predict(dX_test_r)

dtrainPredict_lstm = scaler.inverse_transform(dy_fit_lstm)
dtestPredict_lstm = scaler.inverse_transform(dy_pred_lstm)

In [None]:
plt.plot(differenced[look_back:],':')
plt.plot(dtrainPredict_lstm)
plt.plot(range(len(dtrainPredict_lstm),len(differenced)-look_back),dtestPredict_lstm,'.-')

In [None]:
metrics.mean_squared_error(dy_train,dy_fit_lstm)

In [None]:
metrics.mean_squared_error(dy_test,dy_pred_lstm)

## Video classification methods

1. Classifying one frame at a time with a ConvNet
2. Using a time-distributed ConvNet and passing the features to an RNN(LSTM), in one network
3. Using a 3D convolutional network
4. Extracting features from each frame with a ConvNet and passing the sequence to a separate LSTM
5. Extracting features from each frame with a ConvNet and passing the sequence to a separate MLP

https://blog.coast.ai/five-video-classification-methods-implemented-in-keras-and-tensorflow-99cad29cc0b5

https://github.com/harvitronix/five-video-classification-methods/blob/master/models.py

According to the comparison in the blog post linked above, the best-performing method is #4: CNN + LSTM.