# CS395 - Assignment 4
### Univariate LSTMs

Date: Feb 6, 2019  
By: Joshua Eli Swick

## Data Preparation

In [2]:
# import libs for all code below
from numpy import array
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Bidirectional
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import TimeDistributed
from tensorflow.keras.layers import Conv1D
from tensorflow.keras.layers import MaxPooling1D
from tensorflow.keras.layers import ConvLSTM2D

In [3]:
# split a univariate sequence into samples
def split_sequence(sequence, n_steps):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the sequence
        if end_ix > len(sequence)-1:
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

In [5]:
# define input sequence
raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
# choose a number of time steps
n_steps = 3
# split into samples
X, y = split_sequence(raw_seq, n_steps)
# summarize the data
for i in range(len(X)):
    print(X[i], y[i])

[10 20 30] 40
[20 30 40] 50
[30 40 50] 60
[40 50 60] 70
[50 60 70] 80
[60 70 80] 90
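The loop in split_sequence can also be written with NumPy's sliding_window_view. The sketch below is an equivalent alternative, not the assignment's code. It assumes NumPy 1.20 or newer and a sequence longer than n_steps, and split_sequence_vectorized is a hypothetical name. It also applies the [samples, timesteps, features] reshape that the Keras recurrent layers below expect.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def split_sequence_vectorized(sequence, n_steps):
    """Hypothetical vectorized equivalent of split_sequence (needs NumPy >= 1.20)."""
    seq = np.asarray(sequence)
    # all length-n_steps windows; the last window has no following value, so drop it
    windows = sliding_window_view(seq, n_steps)[:-1]
    # the target for each window is the value right after it
    targets = seq[n_steps:]
    return windows, targets

raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
X, y = split_sequence_vectorized(raw_seq, 3)
# Keras recurrent layers expect [samples, timesteps, features]
X = X.reshape((X.shape[0], X.shape[1], 1))
print(X.shape, y)
```

Each row of X is one window, and the matching entry of y is the value that follows it. These are the same pairs printed above.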


## Vanilla LSTM

### 1.
Run the given code 3 times. Show all outputs.

In [4]:
# reshape from [samples, timesteps] into [samples, timesteps, features]
n_features = 1
X = X.reshape((X.shape[0], X.shape[1], n_features))

In [5]:
for run in range(3):
    # define model
    model = Sequential()
    model.add(LSTM(50, activation='relu', input_shape=(n_steps, n_features)))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mse')

    # fit model
    model.fit(X, y, epochs=200, verbose=0)

    # demonstrate prediction
    x_input = array([70, 80, 90])
    x_input = x_input.reshape((1, n_steps, n_features))
    yhat = model.predict(x_input, verbose=0)
    print(f"Vanilla LSTM output {run+1}: {yhat}")

Vanilla LSTM output 1: [[102.89862]]
Vanilla LSTM output 2: [[102.680595]]
Vanilla LSTM output 3: [[102.354904]]


### 2.
Change the activation function from 'relu' to 2 others. Run 3 times, show all outputs.

In [6]:
for run in range(3):
    # define model
    model = Sequential()
    # tanh activation
    model.add(LSTM(50, activation='tanh', input_shape=(n_steps, n_features)))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mse')

    # fit model
    model.fit(X, y, epochs=200, verbose=0)

    # demonstrate prediction
    x_input = array([70, 80, 90])
    x_input = x_input.reshape((1, n_steps, n_features))
    yhat = model.predict(x_input, verbose=0)
    print(f"Vanilla LSTM, TanH activation, output {run+1}: {yhat}")

Vanilla LSTM, TanH activation, output 1: [[14.410823]]
Vanilla LSTM, TanH activation, output 2: [[12.103288]]
Vanilla LSTM, TanH activation, output 3: [[11.201336]]


In [7]:
for run in range(3):
    # define model
    model = Sequential()
    # sigmoid activation
    model.add(LSTM(50, activation='sigmoid', input_shape=(n_steps, n_features)))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mse')

    # fit model
    model.fit(X, y, epochs=200, verbose=0)

    # demonstrate prediction
    x_input = array([70, 80, 90])
    x_input = x_input.reshape((1, n_steps, n_features))
    yhat = model.predict(x_input, verbose=0)
    print(f"Vanilla LSTM, Sigmoid activation, output {run+1}: {yhat}")

Vanilla LSTM, Sigmoid activation, output 1: [[9.99916]]
Vanilla LSTM, Sigmoid activation, output 2: [[9.784918]]
Vanilla LSTM, Sigmoid activation, output 3: [[12.060872]]


## Stacked LSTM

### 3.
Run the given code 3 times. Show all outputs.

In [8]:
for run in range(3):
    # define model
    model = Sequential()
    model.add(LSTM(50, activation='relu', return_sequences=True,
                   input_shape=(n_steps, n_features)))
    model.add(LSTM(50, activation='relu'))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mse')

    # fit model
    model.fit(X, y, epochs=200, verbose=0)

    # demonstrate prediction
    x_input = array([70, 80, 90])
    x_input = x_input.reshape((1, n_steps, n_features))
    yhat = model.predict(x_input, verbose=0)
    print(f"Stacked LSTM, output {run+1}: {yhat}")

Stacked LSTM, output 1: [[102.345375]]
Stacked LSTM, output 2: [[102.31267]]
Stacked LSTM, output 3: [[103.0182]]


### 4.
Change the activation function from 'relu' to 2 others. Run 3 times, show all outputs.

In [10]:
for run in range(3):
    # define model
    model = Sequential()
    model.add(LSTM(50, activation='tanh', return_sequences=True,
                   input_shape=(n_steps, n_features)))
    model.add(LSTM(50, activation='tanh'))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mse')

    # fit model
    model.fit(X, y, epochs=200, verbose=0)

    # demonstrate prediction
    x_input = array([70, 80, 90])
    x_input = x_input.reshape((1, n_steps, n_features))
    yhat = model.predict(x_input, verbose=0)
    print(f"Stacked LSTM, TanH activation, output {run+1}: {yhat}")

Stacked LSTM, TanH activation, output 1: [[19.52164]]
Stacked LSTM, TanH activation, output 2: [[19.069542]]
Stacked LSTM, TanH activation, output 3: [[19.668125]]


In [12]:
for run in range(3):
    # define model
    model = Sequential()
    model.add(LSTM(50, activation='sigmoid', return_sequences=True,
                   input_shape=(n_steps, n_features)))
    model.add(LSTM(50, activation='sigmoid'))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mse')

    # fit model
    model.fit(X, y, epochs=200, verbose=0)

    # demonstrate prediction
    x_input = array([70, 80, 90])
    x_input = x_input.reshape((1, n_steps, n_features))
    yhat = model.predict(x_input, verbose=0)
    print(f"Stacked LSTM, Sigmoid activation, output {run+1}: {yhat}")

Stacked LSTM, Sigmoid activation, output 1: [[12.196514]]
Stacked LSTM, Sigmoid activation, output 2: [[10.7032795]]
Stacked LSTM, Sigmoid activation, output 3: [[11.839983]]


### 5.
What are the primary differences between the Stacked LSTM and the Vanilla LSTM? Describe and explain what is occurring differently. When would you use the Stacked LSTM instead of the Vanilla LSTM and vice versa?

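The primary difference between a Stacked LSTM and a Vanilla LSTM is the number of LSTM layers. A Vanilla LSTM has a single LSTM layer, and its final hidden state feeds the Dense output layer. A Stacked LSTM has two or more LSTM layers. Every layer except the last is built with return_sequences=True, so it outputs one hidden-state vector per time step. The next LSTM layer reads that sequence instead of the raw inputs, so the deeper layers learn from features computed by the layers below them rather than from the raw values.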

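A Stacked LSTM may improve accuracy when the model benefits from learning a hierarchical representation of complex time-series data. The cost is more parameters (roughly three times as many in the models above), longer training, and a greater risk of overfitting on small datasets. A Vanilla LSTM is sufficient, and cheaper to train, for simpler time series that still have long-term dependencies, such as the short linear sequence used here.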
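The difference in size can be checked with the standard parameter count of a Keras LSTM layer, 4 * (units * (input_dim + units) + units). The formula covers four gates, and each gate has an input kernel, a recurrent kernel, and a bias. The helper functions below are hypothetical illustrations. Assuming default settings (use_bias=True), the totals should match what model.summary() reports for the Vanilla and Stacked models above.

```python
def lstm_params(units, input_dim):
    # 4 gates (input, forget, cell candidate, output), each with an input
    # kernel (input_dim x units), a recurrent kernel (units x units) and a bias
    return 4 * (units * (input_dim + units) + units)

def dense_params(units, input_dim):
    # weight matrix plus one bias per output unit
    return units * input_dim + units

n_features = 1
vanilla = lstm_params(50, n_features) + dense_params(1, 50)
# the second stacked layer reads the first layer's 50-dim output at every time step
stacked = lstm_params(50, n_features) + lstm_params(50, 50) + dense_params(1, 50)
print(vanilla, stacked)
```

For these models the formula gives 10,451 parameters for the Vanilla LSTM and 30,651 for the Stacked LSTM. Most of the extra cost comes from the second layer, which reads 50 features per time step instead of 1.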

## Bidirectional LSTM

### 6.
Run the given code 3 times. Show all outputs.

In [14]:
for run in range(3):
    # define model
    model = Sequential()
    model.add(Bidirectional(LSTM(50, activation='relu'),
                            input_shape=(n_steps, n_features)))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mse')

    # fit model
    model.fit(X, y, epochs=200, verbose=0)

    # demonstrate prediction
    x_input = array([70, 80, 90])
    x_input = x_input.reshape((1, n_steps, n_features))
    yhat = model.predict(x_input, verbose=0)
    print(f"Bidirectional LSTM, output {run+1}: {yhat}")

Bidirectional LSTM, output 1: [[101.993484]]
Bidirectional LSTM, output 2: [[102.12818]]
Bidirectional LSTM, output 3: [[100.94301]]


### 7.
Change the activation function from 'relu' to 2 others. Run at least 3 times, show outputs.

In [15]:
for run in range(3):
    # define model
    model = Sequential()
    model.add(Bidirectional(LSTM(50, activation='tanh'),
                            input_shape=(n_steps, n_features)))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mse')

    # fit model
    model.fit(X, y, epochs=200, verbose=0)

    # demonstrate prediction
    x_input = array([70, 80, 90])
    x_input = x_input.reshape((1, n_steps, n_features))
    yhat = model.predict(x_input, verbose=0)
    print(f"Bidirectional LSTM, TanH activation, output {run+1}: {yhat}")

Bidirectional LSTM, TanH activation, output 1: [[26.314335]]
Bidirectional LSTM, TanH activation, output 2: [[27.845387]]
Bidirectional LSTM, TanH activation, output 3: [[26.081753]]


In [13]:
for run in range(3):
    # define model
    model = Sequential()
    model.add(Bidirectional(LSTM(50, activation='sigmoid'),
                            input_shape=(n_steps, n_features)))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mse')

    # fit model
    model.fit(X, y, epochs=200, verbose=0)

    # demonstrate prediction
    x_input = array([70, 80, 90])
    x_input = x_input.reshape((1, n_steps, n_features))
    yhat = model.predict(x_input, verbose=0)
    print(f"Bidirectional LSTM, Sigmoid activation, output {run+1}: {yhat}")

Bidirectional LSTM, Sigmoid activation, output 1: [[22.500427]]
Bidirectional LSTM, Sigmoid activation, output 2: [[20.091812]]
Bidirectional LSTM, Sigmoid activation, output 3: [[21.567383]]


### 8.
What are the primary differences between the Bidirectional LSTM and the Vanilla LSTM? Describe and explain what is occurring differently. When would you use the Bidirectional LSTM instead of the Vanilla LSTM and vice versa?

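The primary difference between a Bidirectional LSTM and a Vanilla LSTM is that a Bidirectional LSTM processes each input sequence in both directions. One LSTM reads the sequence forward (past to future), and a second LSTM with its own weights reads it backward (future to past). By default, Keras' Bidirectional wrapper concatenates the two outputs, so the Dense layer receives 100 features instead of 50. Each prediction can therefore use information from both ends of the input window, whereas a Vanilla LSTM only carries information forward from the past.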

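Bidirectional LSTMs may improve accuracy when context on both sides of a position matters and the whole sequence is available at prediction time. Labeling words in a complete sentence is one example, since later words help disambiguate earlier ones.
A Vanilla LSTM may be more appropriate for simpler problems. It also fits cases where future values cannot be known at prediction time, such as real-time forecasting. It has half the recurrent parameters and does not depend on reading the sequence backward.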
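The forward and backward reading can be sketched in plain NumPy. To keep the sketch short, it makes several assumptions for illustration:

- A toy tanh RNN cell stands in for a full LSTM cell.
- The weights are random and untrained.
- The inputs are divided by 100 so the toy cell does not saturate.

The point is the wiring. There are two independent layers, one of which reads the window in reverse, and their final states are concatenated, as Keras' default merge_mode='concat' does.

```python
import numpy as np

rng = np.random.default_rng(0)
units, n_features = 50, 1

def init_weights():
    # hypothetical random, untrained weights for one direction
    return (rng.normal(scale=0.1, size=(units, n_features)),
            rng.normal(scale=0.1, size=(units, units)),
            np.zeros(units))

def simple_rnn(xs, W_x, W_h, b):
    """Toy tanh RNN standing in for an LSTM; returns the final hidden state."""
    h = np.zeros(units)
    for x_t in xs:  # xs has shape (timesteps, features)
        h = np.tanh(W_x @ x_t + W_h @ h + b)
    return h

def bidirectional(xs, fwd, bwd):
    h_fwd = simple_rnn(xs, *fwd)        # reads t = 0, 1, ..., T-1
    h_bwd = simple_rnn(xs[::-1], *bwd)  # reads t = T-1, ..., 0 with separate weights
    return np.concatenate([h_fwd, h_bwd])  # like merge_mode='concat'

fwd, bwd = init_weights(), init_weights()
window = np.array([[70.0], [80.0], [90.0]]) / 100.0  # scaled for the toy cell
features = bidirectional(window, fwd, bwd)
print(features.shape)  # (100,)
```

With 50 units per direction, the Dense layer receives 100 features. This wiring is also why the Bidirectional model has twice the LSTM parameters of the Vanilla model.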

## CNN LSTM

In [14]:
# choose a number of time steps
n_steps = 4
# split into samples
X, y = split_sequence(raw_seq, n_steps)
# reshape from [samples, timesteps] into [samples, subsequences, timesteps, features]
n_features = 1
n_seq = 2
n_steps = 2
X = X.reshape((X.shape[0], n_seq, n_steps, n_features))

### 9.
Run the given code 3 times. Show all outputs.

In [15]:
for run in range(3):
    # define model
    model = Sequential()
    model.add(TimeDistributed(Conv1D(filters=64, kernel_size=1, activation='relu'),
                              input_shape=(None, n_steps, n_features)))
    model.add(TimeDistributed(MaxPooling1D(pool_size=2)))
    model.add(TimeDistributed(Flatten()))
    model.add(LSTM(50, activation='relu'))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mse')

    # fit model
    model.fit(X, y, epochs=500, verbose=0)

    # demonstrate prediction
    x_input = array([60, 70, 80, 90])
    x_input = x_input.reshape((1, n_seq, n_steps, n_features))
    yhat = model.predict(x_input, verbose=0)
    print(f"CNN LSTM, output {run+1}: {yhat}")

CNN LSTM, output 1: [[102.702675]]
CNN LSTM, output 2: [[103.33146]]
CNN LSTM, output 3: [[103.06571]]


### 10.
Change the activation function from 'relu' to 2 others. Run 3 times, show all outputs.

In [20]:
for run in range(3):
    # define model
    model = Sequential()
    model.add(TimeDistributed(Conv1D(filters=64, kernel_size=1, activation='tanh'),
                              input_shape=(None, n_steps, n_features)))
    model.add(TimeDistributed(MaxPooling1D(pool_size=2)))
    model.add(TimeDistributed(Flatten()))
    model.add(LSTM(50, activation='tanh'))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mse')

    # fit model
    model.fit(X, y, epochs=500, verbose=0)

    # demonstrate prediction
    x_input = array([60, 70, 80, 90])
    x_input = x_input.reshape((1, n_seq, n_steps, n_features))
    yhat = model.predict(x_input, verbose=0)
    print(f"CNN LSTM, TanH activation, output {run+1}: {yhat}")

CNN LSTM, TanH activation, output 1: [[31.176186]]
CNN LSTM, TanH activation, output 2: [[29.775518]]
CNN LSTM, TanH activation, output 3: [[31.138159]]


In [19]:
for run in range(3):
    # define model
    model = Sequential()
    # note: this Conv1D layer uses 'tanh'; only the LSTM layer below uses 'sigmoid'
    model.add(TimeDistributed(Conv1D(filters=64, kernel_size=1, activation='tanh'),
                              input_shape=(None, n_steps, n_features)))
    model.add(TimeDistributed(MaxPooling1D(pool_size=2)))
    model.add(TimeDistributed(Flatten()))
    model.add(LSTM(50, activation='sigmoid'))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mse')

    # fit model
    model.fit(X, y, epochs=500, verbose=0)

    # demonstrate prediction
    x_input = array([60, 70, 80, 90])
    x_input = x_input.reshape((1, n_seq, n_steps, n_features))
    yhat = model.predict(x_input, verbose=0)
    print(f"CNN LSTM, Sigmoid activation, output {run+1}: {yhat}")

CNN LSTM, Sigmoid activation, output 1: [[18.309414]]
CNN LSTM, Sigmoid activation, output 2: [[20.266657]]
CNN LSTM, Sigmoid activation, output 3: [[16.436787]]


### 11.
What are the primary differences between the CNN LSTM and the Vanilla LSTM? Describe and explain what is occurring differently. When would you use the CNN LSTM instead of the Vanilla LSTM and vice versa?

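The primary difference between a CNN LSTM and a Vanilla LSTM is that a CNN LSTM places a Convolutional Neural Network in front of the LSTM. Each input sample is split into subsequences (here, 2 subsequences of 2 time steps). The TimeDistributed wrapper applies the same Conv1D, MaxPooling1D, and Flatten layers to every subsequence. The LSTM then reads the resulting sequence of feature vectors, one per subsequence, rather than the raw values.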

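CNN LSTM models can improve accuracy when the input has local spatial or short-range structure that convolutions can extract. Examples include images, video frames, and the subsequences of a longer time series used in this assignment. A Vanilla LSTM, which only sees one raw value per time step, does not easily model such data. A Vanilla LSTM is the simpler choice when the raw values already form a short, clean sequence, as in these examples, where the CNN front end adds complexity without a clear gain in accuracy.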
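The TimeDistributed CNN front end used above can be traced in NumPy for a single sample. This is an illustrative sketch with random weights (an assumption), not the Keras implementation. It shows three steps:

- A Conv1D with kernel_size=1 applies the same linear map and ReLU at every time step.
- MaxPooling1D(pool_size=2) takes the maximum over pairs of time steps.
- Flatten turns each subsequence into one 64-feature vector.

```python
import numpy as np

rng = np.random.default_rng(0)
n_seq, n_steps, n_features, n_filters = 2, 2, 1, 64

# one input sample, reshaped like X above: [subsequences, timesteps, features]
sample = np.array([60.0, 70.0, 80.0, 90.0]).reshape((n_seq, n_steps, n_features))

# hypothetical random weights for a Conv1D with kernel_size=1 and 64 filters
kernel = rng.normal(scale=0.01, size=(n_features, n_filters))
bias = np.zeros(n_filters)

def cnn_block(subseq):
    # kernel_size=1: the same linear map at every time step, then ReLU
    conv = np.maximum(subseq @ kernel + bias, 0.0)                 # (n_steps, n_filters)
    # MaxPooling1D(pool_size=2): max over each non-overlapping pair of steps
    pooled = conv.reshape(n_steps // 2, 2, n_filters).max(axis=1)  # (1, n_filters)
    return pooled.reshape(-1)                                      # Flatten

# TimeDistributed: the same block, with the same weights, on every subsequence
lstm_input = np.stack([cnn_block(s) for s in sample])
print(lstm_input.shape)  # (2, 64)
```

The LSTM therefore sees 2 time steps of 64 features each, instead of 4 time steps of 1 raw value.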

## ConvLSTM

In [6]:
# choose a number of time steps
n_steps = 4
# split into samples
X, y = split_sequence(raw_seq, n_steps)
# reshape from [samples, timesteps] into [samples, timesteps, rows, columns, features]
n_features = 1
n_seq = 2
n_steps = 2
X = X.reshape((X.shape[0], n_seq, 1, n_steps, n_features))

### 12.
Run the given code 3 times. Show all outputs.

In [13]:
for run in range(3):
    # define model
    model = Sequential()
    model.add(ConvLSTM2D(filters=64, kernel_size=(1,2), activation='relu', 
                         input_shape=(n_seq, 1, n_steps, n_features)))
    model.add(Flatten())
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mse')

    # fit model
    model.fit(X, y, epochs=500, verbose=0)

    # demonstrate prediction
    x_input = array([60, 70, 80, 90])
    x_input = x_input.reshape((1, n_seq, 1, n_steps, n_features))
    yhat = model.predict(x_input, verbose=0)
    print(f"Conv-LSTM, output {run+1}: {yhat}")

Conv-LSTM, output 1: [[103.641975]]
Conv-LSTM, output 2: [[104.15526]]
Conv-LSTM, output 3: [[104.22724]]


### 13. 
Change the activation function from 'relu' to 2 others. Run 3 times, show all outputs.

In [14]:
for run in range(3):
    # define model
    model = Sequential()
    model.add(ConvLSTM2D(filters=64, kernel_size=(1,2), activation='tanh',
                         input_shape=(n_seq, 1, n_steps, n_features)))
    model.add(Flatten())
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mse')

    # fit model
    model.fit(X, y, epochs=500, verbose=0)

    # demonstrate prediction
    x_input = array([60, 70, 80, 90])
    x_input = x_input.reshape((1, n_seq, 1, n_steps, n_features))
    yhat = model.predict(x_input, verbose=0)
    print(f"Conv-LSTM, TanH activation, output {run+1}: {yhat}")

Conv-LSTM, TanH activation, output 1: [[23.881386]]
Conv-LSTM, TanH activation, output 2: [[25.381102]]
Conv-LSTM, TanH activation, output 3: [[26.844866]]


In [15]:
for run in range(3):
    # define model
    model = Sequential()
    model.add(ConvLSTM2D(filters=64, kernel_size=(1,2), activation='sigmoid',
                         input_shape=(n_seq, 1, n_steps, n_features)))
    model.add(Flatten())
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mse')

    # fit model
    model.fit(X, y, epochs=500, verbose=0)

    # demonstrate prediction
    x_input = array([60, 70, 80, 90])
    x_input = x_input.reshape((1, n_seq, 1, n_steps, n_features))
    yhat = model.predict(x_input, verbose=0)
    print(f"Conv-LSTM, Sigmoid activation, output {run+1}: {yhat}")

Conv-LSTM, Sigmoid activation, output 1: [[21.401457]]
Conv-LSTM, Sigmoid activation, output 2: [[16.453602]]
Conv-LSTM, Sigmoid activation, output 3: [[19.233423]]


### 14.
What are the primary differences between the Conv-LSTM and the CNN LSTM? Describe and explain what is occurring differently. When would you use the Conv-LSTM instead of the CNN LSTM and vice versa?

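The primary difference between a Conv-LSTM and a CNN LSTM is where the convolution happens. In a Conv-LSTM, the convolution is built into the LSTM cell itself. The input-to-state and state-to-state transformations of every gate are convolutions instead of matrix multiplications, so the hidden and cell states keep a spatial (rows by columns) layout at every time step. A CNN LSTM instead chains two separate parts. A CNN reads each subsequence and flattens its output into a 1-D feature vector, and a standard LSTM then reads the sequence of those vectors.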

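A Conv-LSTM expects each sample to be a sequence of 2D grids, shaped [timesteps, rows, columns, features]. This makes it appropriate for spatiotemporal data such as video or radar images, where spatial structure must be preserved across time steps.
A CNN LSTM is appropriate when each image or subsequence can be summarized into a feature vector before the recurrent step, such as the subsequences of a univariate series used here.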
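A single Conv-LSTM time step can be sketched in NumPy to show where the convolutions sit. This is a simplified illustration under several assumptions:

- The row axis is dropped because the data here has rows = 1.
- The weights are random.
- The inputs are divided by 100.
- The cell uses tanh (the Keras default) rather than the relu used in question 12.

The input convolution uses 'valid' padding, the ConvLSTM2D default. The recurrent convolution zero-pads so the state keeps its width.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_valid(x, w):
    """x: (width, in_ch), w: (k, in_ch, out_ch) -> (width - k + 1, out_ch)."""
    k = w.shape[0]
    return np.stack([np.tensordot(x[i:i + k], w, axes=([0, 1], [0, 1]))
                     for i in range(x.shape[0] - k + 1)])

def conv1d_same(x, w):
    """Zero-pad so the output keeps the input's width."""
    k = w.shape[0]
    left = (k - 1) // 2
    return conv1d_valid(np.pad(x, ((left, k - 1 - left), (0, 0))), w)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def convlstm_step(x_t, h, c, W_x, W_h, b):
    # every gate = conv(input) + conv(previous hidden state) + bias,
    # where a standard LSTM would use matrix products
    z = conv1d_valid(x_t, W_x) + conv1d_same(h, W_h) + b
    i, f, g, o = np.split(z, 4, axis=-1)  # input, forget, candidate, output
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

n_seq, n_steps, n_features, filters, k = 2, 2, 1, 64, 2
W_x = rng.normal(scale=0.1, size=(k, n_features, 4 * filters))
W_h = rng.normal(scale=0.1, size=(k, filters, 4 * filters))
b = np.zeros(4 * filters)

# one sample: n_seq time steps, each a 1 x n_steps grid (row axis dropped)
sample = np.array([60.0, 70.0, 80.0, 90.0]).reshape((n_seq, n_steps, n_features)) / 100.0
width = n_steps - k + 1  # 'valid' input convolution shrinks 2 columns to 1
h = np.zeros((width, filters))
c = np.zeros((width, filters))
for x_t in sample:
    h, c = convlstm_step(x_t, h, c, W_x, W_h, b)
print(h.shape, h.reshape(-1).shape)  # (1, 64) (64,)
```

After the last time step the state is a 1 x 64 grid, which a Flatten layer turns into 64 inputs for the Dense layer. In a CNN LSTM the convolution runs once per subsequence outside the recurrence. Here it runs inside every gate update.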

### 15.
Provide a single table that illustrates the different LSTM models, the activation functions used, and the average of the 3 results (predictions) you received to 3 decimal places. What are your general findings?

| LSTM Model    | Activation Function          | Avg. of 3 Predictions |
| ------------- | ---------------------------- | --------------------- |
| Vanilla       | relu                         | 102.645               |
| Vanilla       | tanh                         | 12.572                |
| Vanilla       | sigmoid                      | 10.615                |
| Stacked       | relu                         | 102.559               |
| Stacked       | tanh                         | 19.420                |
| Stacked       | sigmoid                      | 11.580                |
| Bidirectional | relu                         | 101.688               |
| Bidirectional | tanh                         | 26.747                |
| Bidirectional | sigmoid                      | 21.387                |
| CNN LSTM      | relu                         | 103.033               |
| CNN LSTM      | tanh                         | 30.697                |
| CNN LSTM      | sigmoid (Conv1D layer: tanh) | 18.338                |
| ConvLSTM      | relu                         | 104.008               |
| ConvLSTM      | tanh                         | 25.369                |
| ConvLSTM      | sigmoid                      | 19.029                |
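The averages can be recomputed directly from the printed predictions. The short script below copies the outputs shown above, averages each group of three to 3 decimal places, and reports each average's distance from 100, the true next value after 90. It also ranks the ReLU models by that distance.

```python
from statistics import mean

TARGET = 100.0  # the next value in the sequence 10, 20, ..., 90

results = {
    ("Vanilla", "relu"): [102.89862, 102.680595, 102.354904],
    ("Vanilla", "tanh"): [14.410823, 12.103288, 11.201336],
    ("Vanilla", "sigmoid"): [9.99916, 9.784918, 12.060872],
    ("Stacked", "relu"): [102.345375, 102.31267, 103.0182],
    ("Stacked", "tanh"): [19.52164, 19.069542, 19.668125],
    ("Stacked", "sigmoid"): [12.196514, 10.7032795, 11.839983],
    ("Bidirectional", "relu"): [101.993484, 102.12818, 100.94301],
    ("Bidirectional", "tanh"): [26.314335, 27.845387, 26.081753],
    ("Bidirectional", "sigmoid"): [22.500427, 20.091812, 21.567383],
    ("CNN LSTM", "relu"): [102.702675, 103.33146, 103.06571],
    ("CNN LSTM", "tanh"): [31.176186, 29.775518, 31.138159],
    ("CNN LSTM", "sigmoid"): [18.309414, 20.266657, 16.436787],
    ("ConvLSTM", "relu"): [103.641975, 104.15526, 104.22724],
    ("ConvLSTM", "tanh"): [23.881386, 25.381102, 26.844866],
    ("ConvLSTM", "sigmoid"): [21.401457, 16.453602, 19.233423],
}

# average of the three runs, rounded to 3 decimal places
averages = {key: round(mean(vals), 3) for key, vals in results.items()}
for (model, act), avg in averages.items():
    print(f"{model:14s} {act:8s} {avg:8.3f}  |error| = {abs(avg - TARGET):.3f}")

# ReLU models ranked by distance from the true value (closest first)
relu_ranking = sorted((m for m, a in averages if a == "relu"),
                      key=lambda m: abs(averages[(m, "relu")] - TARGET))
print(relu_ranking)
```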

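Overall, the choice of activation function had a much larger effect than the choice of LSTM model. Every ReLU model predicted close to the expected next value of 100, while every tanh and sigmoid model predicted far below it (averages between roughly 10 and 31). A likely cause is that the data was not scaled. Tanh and sigmoid bound each recurrent unit's output (to [-1, 1] and [0, 1]), so the final Dense layer must learn large weights to reach 100, which 200 to 500 epochs of training apparently did not achieve. ReLU is unbounded, so the hidden state itself can grow to the scale of the data.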
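One plausible reason the tanh and sigmoid models stay far below 100 is that their hidden outputs are bounded while the data was not scaled. A rough upper-bound estimate illustrates this. It rests on the following assumptions:

- Every hidden unit feeding the final Dense(1) layer has magnitude at most 1, which holds for tanh and sigmoid.
- The Dense kernel starts from Keras' default glorot_uniform initializer, and the bias starts at zero.
- Each Adam update at the default learning rate of 0.001 moves a weight by roughly 0.001. This is a typical-case approximation, not a hard guarantee.
- Each epoch is a single update, because each dataset has only 5 or 6 samples and these fit in one batch of the default size 32.

The function names are hypothetical.

```python
import math

def glorot_uniform_limit(fan_in, fan_out):
    # Keras' default Dense kernel initializer draws from U(-limit, limit)
    return math.sqrt(6.0 / (fan_in + fan_out))

def max_dense_output(n_hidden, n_updates, lr=0.001):
    """Hypothetical rough bound on |Dense(1) output| when every |h_j| <= 1.

    Assumes each Adam update moves a weight by about lr at most (a typical-case
    approximation, not a guarantee) and that the bias starts at zero.
    """
    w_max = glorot_uniform_limit(n_hidden, 1) + n_updates * lr
    b_max = n_updates * lr
    return n_hidden * w_max + b_max

# (model, inputs to the final Dense layer, epochs = updates with one batch per epoch)
for name, n_hidden, epochs in [("Vanilla", 50, 200), ("Stacked", 50, 200),
                               ("Bidirectional", 100, 200), ("CNN LSTM", 50, 500),
                               ("ConvLSTM", 64, 500)]:
    print(f"{name:14s} max |output| ~ {max_dense_output(n_hidden, epochs):.2f}")
```

Under these assumptions, the largest output any of the bounded models could reach stays well below 100 (roughly 27 to 52). This is consistent with the 10 to 31 range observed for tanh and sigmoid.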

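Comparing the models that used the ReLU activation function, the most accurate were, in order of decreasing accuracy: Bidirectional, Stacked, Vanilla, CNN LSTM, ConvLSTM. The gaps between them are small (every model is within about 4 of the expected value of 100) and comparable to the run-to-run variation of a single model, so this ranking may not hold over more runs. The CNN LSTM and ConvLSTM models also used 4 input time steps and 500 epochs, so they are not directly comparable with the other three.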

The Bidirectional LSTM model being the most accurate is consistent with its design. Within each input window, the model learns from the context of each value in relation to the values after it ("future" data) as well as before it, which may improve the contextual prediction. The 