## LSTM - Basic Introduction

Four different types of sequence prediction problems:
1. Sequence Prediction (weather forecasting, stock market prediction, product recommendation).
2. Sequence Classification (DNA sequence classification, anomaly detection, sentiment analysis).
3. Sequence Generation (text, music, image caption generation).
4. Sequence-to-Sequence Prediction (time series prediction, text summarization, program extraction).

Problems with using an MLP for sequence prediction:

An MLP has advantages over ARIMA: it is robust to noise and missing values, it is nonlinear, and it supports multivariate inputs and multi-step outputs.

Issues:

- Stateless: it is unaware of the temporal structure of the data
- Messy scaling, and fixed-size inputs and outputs

### Input data preparation:
##### Scaling:
- Normalize Series Data

In [77]:
from pandas import Series 
from sklearn.preprocessing import MinMaxScaler
data = [10, 20, 30, 40, 50, 60]
series=Series(data)
print(series)

0    10
1    20
2    30
3    40
4    50
5    60
dtype: int64


In [78]:
values = series.values
values = values.reshape(-1, 1)
scaler = MinMaxScaler(feature_range=(0,1))
scaler = scaler.fit(values)
# data_min_ and data_max_ are one-element arrays, so index them before formatting
print('Min: %f, Max: %f' % (scaler.data_min_[0], scaler.data_max_[0]))

Min: 10.000000, Max: 60.000000




In [79]:
normalized = scaler.transform(values)
print(normalized)

[[ 0. ]
 [ 0.2]
 [ 0.4]
 [ 0.6]
 [ 0.8]
 [ 1. ]]


In [80]:
# Denormalize
inversed = scaler.inverse_transform(normalized)
print(inversed)

[[ 10.]
 [ 20.]
 [ 30.]
 [ 40.]
 [ 50.]
 [ 60.]]
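
The scaling performed by `MinMaxScaler` above follows the formula y = (x - min) / (max - min), and the inverse is x = y * (max - min) + min. A minimal sketch in plain Python that reproduces the same numbers by hand; the helper names `normalize_series` and `denormalize_series` are made up for illustration and are not scikit-learn functions, and the sketch assumes the series is not constant (otherwise max - min is zero):

```python
def normalize_series(values):
    """Rescale values to the range [0, 1] using the series' own min and max."""
    lo, hi = min(values), max(values)
    # Assumption: hi != lo, i.e. the series is not constant.
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled, lo, hi


def denormalize_series(scaled, lo, hi):
    """Invert normalize_series, mapping [0, 1] back to the original scale."""
    return [s * (hi - lo) + lo for s in scaled]


data = [10, 20, 30, 40, 50, 60]
scaled, lo, hi = normalize_series(data)
restored = denormalize_series(scaled, lo, hi)
print('Min: %f, Max: %f' % (lo, hi))
print(scaled)
print(restored)
```

In practice the min and max should be estimated on the training data only and then reused to transform new data, which is what keeping the fitted `scaler` object allows.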


- Standardize

In [81]:
from pandas import Series
from sklearn.preprocessing import StandardScaler
from math import sqrt
# define contrived series
data = [1.0, 5.5, 9.0, 2.6, 8.8, 3.0, 4.1, 7.9, 6.3]
series = Series(data)
print(series)
# prepare data for standardization
values = series.values
values = values.reshape(-1, 1)
# fit the scaler (learn the mean and variance)
scaler = StandardScaler()
scaler = scaler.fit(values)
# mean_ and var_ are one-element arrays, so index them before formatting
print('Mean: %f, StandardDeviation: %f' % (scaler.mean_[0], sqrt(scaler.var_[0])))
# standardize the dataset and print
standardized = scaler.transform(values)
print(standardized)
# inverse transform and print
inversed = scaler.inverse_transform(standardized)
print(inversed)

0    1.0
1    5.5
2    9.0
3    2.6
4    8.8
5    3.0
6    4.1
7    7.9
8    6.3
dtype: float64
Mean: 5.355556, StandardDeviation: 2.712568
[[-1.60569456]
 [ 0.05325007]
 [ 1.34354035]
 [-1.01584758]
 [ 1.26980948]
 [-0.86838584]
 [-0.46286604]
 [ 0.93802055]
 [ 0.34817357]]
[[ 1. ]
 [ 5.5]
 [ 9. ]
 [ 2.6]
 [ 8.8]
 [ 3. ]
 [ 4.1]
 [ 7.9]
 [ 6.3]]
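
Standardization subtracts the mean and divides by the standard deviation: y = (x - mean) / std. The sketch below redoes the calculation in plain Python so the formula is visible. The helper name `standardize_series` is hypothetical. It uses the population standard deviation (dividing by N), which is what reproduces the values printed above:

```python
from math import sqrt


def standardize_series(values):
    """Return (standardized values, mean, std) using the population std."""
    n = len(values)
    mean = sum(values) / n
    # Population variance: divide by n, not n - 1.
    variance = sum((v - mean) ** 2 for v in values) / n
    std = sqrt(variance)
    return [(v - mean) / std for v in values], mean, std


data = [1.0, 5.5, 9.0, 2.6, 8.8, 3.0, 4.1, 7.9, 6.3]
standardized, mean, std = standardize_series(data)
print('Mean: %f, StandardDeviation: %f' % (mean, std))
print(standardized)

# Inverse transform: x = y * std + mean
inversed = [y * std + mean for y in standardized]
print(inversed)
```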


##### Categorical Data

- Integer Encoding
- One Hot Encoding

In [82]:
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
data = ['cold', 'cold', 'warm', 'cold', 'hot', 'hot', 'warm', 'cold', 'warm', 'hot']
print(data)

['cold', 'cold', 'warm', 'cold', 'hot', 'hot', 'warm', 'cold', 'warm', 'hot']


In [83]:
values = np.array(data)
print(values)

['cold' 'cold' 'warm' 'cold' 'hot' 'hot' 'warm' 'cold' 'warm' 'hot']


In [84]:
label_encoder = LabelEncoder()
integer_encoded = label_encoder.fit_transform(values)
print(integer_encoded)

[0 0 2 0 1 1 2 0 2 1]


In [85]:
# scikit-learn >= 1.2 uses sparse_output; older versions use sparse=False instead
onehot_encoder = OneHotEncoder(sparse_output=False)
integer_encoded = integer_encoded.reshape(-1, 1)
print(integer_encoded)
onehot_encoded = onehot_encoder.fit_transform(integer_encoded)
print(onehot_encoded)

[[0]
 [0]
 [2]
 [0]
 [1]
 [1]
 [2]
 [0]
 [2]
 [1]]
[[ 1.  0.  0.]
 [ 1.  0.  0.]
 [ 0.  0.  1.]
 [ 1.  0.  0.]
 [ 0.  1.  0.]
 [ 0.  1.  0.]
 [ 0.  0.  1.]
 [ 1.  0.  0.]
 [ 0.  0.  1.]
 [ 0.  1.  0.]]


In [88]:
inverted = label_encoder.inverse_transform([np.argmax(onehot_encoded[0,:])])
print(inverted)

['cold']
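
To make the two encoding steps concrete, here is a minimal sketch of integer encoding, one hot encoding, and decoding in plain Python. The function names `one_hot_encode` and `one_hot_decode` are made up for this illustration. Classes are sorted alphabetically, which gives the same integer codes as the `LabelEncoder` output above (cold=0, hot=1, warm=2):

```python
def one_hot_encode(labels):
    """Return (classes, integer codes, one hot rows) for a list of labels."""
    classes = sorted(set(labels))  # alphabetical order defines the integer codes
    index = {label: i for i, label in enumerate(classes)}
    integer_encoded = [index[label] for label in labels]
    # One row per label, with a single 1 in the column of its integer code.
    onehot = [[1 if col == code else 0 for col in range(len(classes))]
              for code in integer_encoded]
    return classes, integer_encoded, onehot


def one_hot_decode(row, classes):
    """Map a one hot (or probability) row back to its label via argmax."""
    return classes[row.index(max(row))]


data = ['cold', 'cold', 'warm', 'cold', 'hot', 'hot', 'warm', 'cold', 'warm', 'hot']
classes, integer_encoded, onehot = one_hot_encode(data)
print(integer_encoded)
print(onehot[:3])
print(one_hot_decode(onehot[0], classes))
```

The decode step also works on a row of predicted probabilities, since it simply picks the column with the largest value.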


## Sequence Preparation

- Pre-Sequence Padding

In [89]:
from keras.preprocessing.sequence import pad_sequences

Using TensorFlow backend.


In [91]:
sequences = [[1,2,3,4],[1,2,3],[1]]
padded = pad_sequences(sequences)
print(padded)

[[1 2 3 4]
 [0 1 2 3]
 [0 0 0 1]]


- Post-Sequence Padding

In [92]:
padded = pad_sequences(sequences, padding='post')
print(padded)

[[1 2 3 4]
 [1 2 3 0]
 [1 0 0 0]]


- Pre-Sequence Truncation

In [93]:
truncated = pad_sequences(sequences, maxlen=2)
print(truncated)

[[3 4]
 [2 3]
 [0 1]]


- Post-Sequence Truncation

In [94]:
truncated = pad_sequences(sequences, maxlen=2, truncating='post')
print(truncated)

[[1 2]
 [1 2]
 [0 1]]
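
The four cases above (pre/post padding, pre/post truncation) come down to which end of the sequence is filled with zeros or cut off. The sketch below is a plain-Python illustration of that logic, not the Keras implementation; `pad_or_truncate` is a hypothetical helper and assumes `maxlen` is at least 1:

```python
def pad_or_truncate(seq, maxlen, padding='pre', truncating='pre', value=0):
    """Force seq to length maxlen by cutting or zero-filling one end."""
    seq = list(seq)
    if len(seq) > maxlen:
        # 'pre' drops items from the start, 'post' drops them from the end.
        seq = seq[-maxlen:] if truncating == 'pre' else seq[:maxlen]
    fill = [value] * (maxlen - len(seq))
    # 'pre' puts the filler before the data, 'post' puts it after.
    return fill + seq if padding == 'pre' else seq + fill


sequences = [[1, 2, 3, 4], [1, 2, 3], [1]]
longest = max(len(s) for s in sequences)

pre_padded = [pad_or_truncate(s, longest) for s in sequences]
post_padded = [pad_or_truncate(s, longest, padding='post') for s in sequences]
pre_truncated = [pad_or_truncate(s, 2) for s in sequences]
post_truncated = [pad_or_truncate(s, 2, truncating='post') for s in sequences]

print(pre_padded)
print(post_padded)
print(pre_truncated)
print(post_truncated)
```

Note that the short sequence `[1]` is still pre-padded in both truncation cases, because padding and truncating are controlled independently.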


- Data Shifting

In [95]:
from pandas import DataFrame
# define the sequence
df = DataFrame()
df['t'] = [x for x in range(10)]
# shift backward
df['t+1'] = df['t'].shift(-1)
print(df)

   t  t+1
0  0  1.0
1  1  2.0
2  2  3.0
3  3  4.0
4  4  5.0
5  5  6.0
6  6  7.0
7  7  8.0
8  8  9.0
9  9  NaN
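
The shifted frame above turns the series into input/output pairs: column `t` is the input and `t+1` is the value to predict. The last row has no target (NaN) and is usually dropped. As a rough sketch of the same idea generalized to several lag observations, assuming a hypothetical helper named `series_to_supervised` written in plain Python:

```python
def series_to_supervised(series, n_lags=1):
    """Build (X, y) pairs where each X row holds the n_lags values before y."""
    X, y = [], []
    for i in range(n_lags, len(series)):
        X.append(list(series[i - n_lags:i]))  # the previous n_lags observations
        y.append(series[i])                   # the value to predict
    return X, y


series = list(range(10))

# One lag: equivalent to the t / t+1 table with the NaN row removed.
X1, y1 = series_to_supervised(series, n_lags=1)
print(X1)
print(y1)

# Three lags: each input row now holds three past observations.
X3, y3 = series_to_supervised(series, n_lags=3)
print(X3[0], y3[0])
print(len(X3))
```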


# LSTM in Keras

- Input: it must be three-dimensional, comprised of samples, time steps, and features, in that order.
    - Samples: these are the rows in your data. One sample may be one sequence.
    - Time steps: these are the past observations for a feature, such as lag variables.
    - Features: these are the columns in your data.
- Output layer:
    - Regression: linear activation function
    - Binary classification: logistic (sigmoid) activation
    - Multiclass classification: softmax activation
- Loss function:
    - Regression: mean squared error (mean_squared_error)
    - Binary classification: logarithmic loss (binary_crossentropy)
    - Multiclass classification: multiclass logarithmic loss (categorical_crossentropy)
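
To show what the three loss functions listed above measure, here is a small sketch that computes each one by hand in plain Python. These are textbook definitions written for illustration, not the Keras source, and the functions assume every predicted probability is strictly between 0 and 1 so that the logarithm is defined:

```python
from math import log


def mean_squared_error(y_true, y_pred):
    """Average squared difference, used for regression."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_crossentropy(y_true, y_pred):
    """Log loss for 0/1 targets and predicted probabilities of class 1."""
    total = sum(t * log(p) + (1 - t) * log(1 - p)
                for t, p in zip(y_true, y_pred))
    return -total / len(y_true)


def categorical_crossentropy(y_true, y_pred):
    """Log loss for one hot targets and rows of class probabilities."""
    total = sum(t * log(p)
                for true_row, pred_row in zip(y_true, y_pred)
                for t, p in zip(true_row, pred_row))
    return -total / len(y_true)


mse = mean_squared_error([1.0, 2.0], [1.0, 4.0])
bce = binary_crossentropy([1, 0], [0.5, 0.5])
cce = categorical_crossentropy([[1, 0, 0]], [[0.5, 0.25, 0.25]])
print(mse, bce, cce)
```

In both cross-entropy cases only the probability assigned to the correct class contributes, so a confident wrong prediction is penalized heavily.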

## LSTM State Management

- By default, the internal state of all LSTM memory units in the network is reset after each batch, i.e. when the network weights are updated.
- When state should persist across batches, Keras provides the flexibility to decouple the resetting of internal state from updates to the network weights by defining an LSTM layer as stateful. This is done by setting the stateful argument on the LSTM layer to True.
- A stateful LSTM will not reset the internal state at the end of each batch. Instead, you have fine-grained control over when to reset the internal state by calling the reset_states() function.
- By default, the samples within an epoch are shuffled. This is good practice when working with Multilayer Perceptron neural networks. If you are trying to preserve state across samples, however, the order of samples in the training dataset may be important and must be preserved, which is done by passing shuffle=False to fit().

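Ex. (batch_input_shape is an argument of the LSTM layer, not of fit(); the 2 units are an arbitrary choice)

 model.add(LSTM(2, stateful=True, batch_input_shape=(10, 5, 1)))
 
 model.fit(X, y, epochs=1, batch_size=10, shuffle=False)
 
 model.reset_states()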
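
The difference between the default and the stateful behaviour can be illustrated without Keras. The sketch below uses a deliberately simplified, hypothetical one-unit recurrent cell (`ToyRecurrentCell`, with arbitrary fixed weights 0.5 and 0.9). It is not an LSTM and treats each batch as one chunk of a sequence; it only shows how carrying state across batches changes the result:

```python
from math import tanh


class ToyRecurrentCell:
    """Toy recurrent unit: state = tanh(0.5 * x + 0.9 * state)."""

    def __init__(self, stateful=False):
        self.stateful = stateful
        self.state = 0.0

    def reset_states(self):
        """Clear the internal state, like starting a new sequence."""
        self.state = 0.0

    def run_batch(self, batch):
        # Default behaviour: forget everything at the start of each batch.
        if not self.stateful:
            self.reset_states()
        for x in batch:
            self.state = tanh(0.5 * x + 0.9 * self.state)
        return self.state


batches = [[1.0, 1.0], [0.0, 0.0]]

stateless = ToyRecurrentCell(stateful=False)
stateless_out = [stateless.run_batch(b) for b in batches][-1]

stateful = ToyRecurrentCell(stateful=True)
stateful_out = [stateful.run_batch(b) for b in batches][-1]

print(stateless_out)  # second batch starts from zero state
print(stateful_out)   # second batch still "remembers" the first

# With a stateful cell, the reset is an explicit decision, e.g. once per epoch.
stateful.reset_states()
```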
 
 
#### 1D Input Example (input_shape=(10, 1)):

In [102]:
from numpy import array
data = array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])
data = data.reshape((1, 10, 1))
print(data.shape)

(1, 10, 1)


#### 2D Input Example (input_shape=(10, 2)):

In [103]:
from numpy import array
data = array([
[0.1, 1.0],
[0.2, 0.9],
[0.3, 0.8],
[0.4, 0.7],
[0.5, 0.6],
[0.6, 0.5],
[0.7, 0.4],
[0.8, 0.3],
[0.9, 0.2],
[1.0, 0.1]])
data = data.reshape(1, 10, 2)
print(data.shape)

(1, 10, 2)
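
Both examples above hold a single sample. A longer series is often split into several samples of equal length before being reshaped to [samples, time steps, features]. A minimal sketch with NumPy, assuming an arbitrary 20-step series and a window of 5 time steps (both values are chosen only for illustration):

```python
import numpy as np

# A contrived univariate series: 0.0, 0.1, ..., 1.9
series = np.arange(20, dtype=float) / 10.0

timesteps = 5                       # length of each sample (an assumption)
samples = len(series) // timesteps  # number of complete, non-overlapping samples

# Drop any leftover observations, then reshape to [samples, timesteps, features].
data = series[:samples * timesteps].reshape((samples, timesteps, 1))
print(data.shape)
print(data[1, :, 0])  # the second sample, as a flat vector
```

Each row of the original series ends up in exactly one sample here; overlapping windows are another common choice but need a different construction.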


### RNN Models:
- Sequence Prediction:
    - One-to-One Model (weather forecasting)
    - One-to-Many Model (predicting a sequence of words from a single image, or forecasting based on one event)
    - Many-to-One Model (forecasting based on a sequence, predicting a classification value)
    - Many-to-Many Model (text summarization, classifying a sequence of audio into a sequence of words)
    