# How to Develop LSTMs in Keras

In [None]:
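# Define the model
from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(2))  # LSTM hidden layer with 2 memory cells
model.add(Dense(1))  # fully connected output layer with 1 neuron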

### Reshaping Data

The first hidden layer in the network must define the number of inputs to expect, i.e. the
shape of the input layer. Input must be three-dimensional, comprising samples, time steps, and features, in that order.

    1. Samples. These are the rows in your data. One sample may be one sequence.
    2. Time steps. These are the past observations for a feature, such as lag variables.
    3. Features. These are columns in your data.
    
Assuming your data is loaded as a NumPy array, you can convert a 1D or 2D dataset to
a 3D dataset using the reshape() function in NumPy. You can call the reshape() function
on your NumPy array and pass it a tuple of the target dimensions.
Imagine we had 2 columns of input data (X) in a NumPy array. We could treat the two columns as two time steps of a single feature and reshape the array as follows:

In [None]:
data = data.reshape((data.shape[0], data.shape[1], 1))

If you would like columns in your 2D data to become features with one time step, you can
reshape it as follows:

In [None]:
data = data.reshape((data.shape[0], 1, data.shape[1]))
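
The two reshapes above can be checked on a small array. This is a minimal sketch that assumes a made-up array named `raw` with 4 rows and 2 columns; it only needs NumPy.

```python
import numpy as np

# Hypothetical 2D dataset: 4 samples (rows) and 2 columns.
raw = np.arange(8, dtype=float).reshape(4, 2)

# Columns treated as time steps: 4 samples, 2 time steps, 1 feature.
as_time_steps = raw.reshape((raw.shape[0], raw.shape[1], 1))

# Columns treated as features: 4 samples, 1 time step, 2 features.
as_features = raw.reshape((raw.shape[0], 1, raw.shape[1]))

print(as_time_steps.shape)  # (4, 2, 1)
print(as_features.shape)    # (4, 1, 2)
```

Both results hold the same eight values; only the grouping into time steps and features differs.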

### Defining Model

You can specify the input_shape argument, which expects a tuple containing the number of
time steps and the number of features. For example, if we had two time steps and one feature for a univariate sequence with two lag observations per row, it would be specified as follows:

In [None]:
model = Sequential()
model.add(LSTM(5, input_shape=(2, 1)))  # 5 memory cells; input has 2 time steps and 1 feature
model.add(Dense(1))


In [None]:
# With an activation function on the output layer
from keras.layers import Activation

model = Sequential()
model.add(LSTM(5, input_shape=(2, 1)))
model.add(Dense(1))
model.add(Activation('sigmoid'))  # sigmoid for binary classification

The choice of activation function is most important for the output layer as it will define the format that predictions will take. For example, below are some common predictive modeling problem types and the structure and standard activation function that you can use in the output layer:

    1. Regression: Linear activation function, or 'linear', and the number of neurons matching the number of outputs. This is the default activation function used for neurons in the Dense layer.
    
    2. Binary Classification (2 class): Logistic activation function, or 'sigmoid', and one neuron in the output layer.
    
    3. Multiclass Classification (> 2 class): Softmax activation function, or 'softmax', and one output neuron per class value, assuming a one hot encoded output pattern.
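
To make the three output formats concrete, here is a minimal NumPy sketch of the activation functions listed above. The helper names and the input values are made up for illustration; they are not Keras internals.

```python
import numpy as np

def sigmoid(z):
    # Logistic function: squashes any real value into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Subtract the row maximum first for numerical stability.
    shifted = z - np.max(z, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

# Made-up pre-activation values for one sample.
raw_outputs = np.array([[2.0, 0.5, -1.0]])

linear_out = raw_outputs                    # regression: values pass through unchanged
sigmoid_out = sigmoid(raw_outputs[:, :1])   # binary: one neuron, one value in (0, 1)
softmax_out = softmax(raw_outputs)          # multiclass: one probability per class

print(sigmoid_out.shape, softmax_out.shape)  # (1, 1) (1, 3)
print(softmax_out.sum())                     # approximately 1.0
```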




### Compiling Model

In [None]:
# compile the model
model.compile(optimizer='sgd', loss='mse')

Alternately, the optimizer can be created and configured before being provided as an argument to the compilation step.

In [None]:
from keras.optimizers import SGD
algorithm = SGD(learning_rate=0.1, momentum=0.3)  # learning_rate was named lr in older Keras releases
model.compile(optimizer=algorithm, loss='mse')
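
As a rough illustration of what the learning rate and momentum arguments control, the sketch below applies the commonly described momentum update (velocity = momentum * velocity - learning_rate * gradient) to a toy one-dimensional loss, (w - 3)^2. The function names and the toy loss are hypothetical and not part of Keras.

```python
def sgd_momentum_minimize(grad_fn, w0, learning_rate=0.1, momentum=0.3, steps=50):
    # Plain-Python sketch of gradient descent with momentum on a single weight.
    w, velocity = w0, 0.0
    for _ in range(steps):
        velocity = momentum * velocity - learning_rate * grad_fn(w)
        w = w + velocity
    return w

def toy_gradient(w):
    # Gradient of the toy loss (w - 3)^2, whose minimum is at w = 3.
    return 2.0 * (w - 3.0)

w_final = sgd_momentum_minimize(toy_gradient, w0=0.0)
print(w_final)  # approximately 3.0
```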

The type of predictive modeling problem imposes constraints on the type of loss function
that can be used. For example, below are some standard loss functions for different predictive model types:
    
    1. Regression: Mean Squared Error, or 'mean_squared_error', 'mse' for short.
    
    2. Binary Classification (2 class): Logarithmic Loss, also called cross entropy, or 'binary_crossentropy'.

    3. Multiclass Classification (> 2 class): Multiclass Logarithmic Loss, or 'categorical_crossentropy'.
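
The three loss functions above can be sketched in NumPy as follows. These are simplified illustrations with made-up targets and predictions, not the Keras implementations; the `eps` clipping value is an assumption used to avoid taking the log of zero.

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean of the squared differences.
    return float(np.mean((y_true - y_pred) ** 2))

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # y_pred holds the predicted probability of class 1.
    p = np.clip(y_pred, eps, 1.0 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))))

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # y_true is one hot encoded; y_pred holds one probability per class.
    p = np.clip(y_pred, eps, 1.0)
    return float(np.mean(-np.sum(y_true * np.log(p), axis=-1)))

reg_loss = mse(np.array([1.0, 2.0]), np.array([1.5, 2.5]))
bin_loss = binary_crossentropy(np.array([1.0, 0.0]), np.array([0.9, 0.1]))
cat_loss = categorical_crossentropy(np.array([[0.0, 1.0, 0.0]]),
                                    np.array([[0.2, 0.7, 0.1]]))

print(reg_loss)  # 0.25
print(bin_loss)  # about 0.105, i.e. -ln(0.9)
print(cat_loss)  # about 0.357, i.e. -ln(0.7)
```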


The most common optimization algorithm is classical stochastic gradient descent, but Keras
also supports a suite of extensions of this classic algorithm that work well with little or no configuration. Perhaps the most commonly used optimization algorithms, because of their generally good performance, are:
    
    1. Stochastic Gradient Descent, or sgd.
    2. Adam, or adam.
    3. RMSprop, or rmsprop.
    

Finally, you can also specify metrics to collect while fitting your model in addition to the loss function. Generally, the most useful additional metric to collect is accuracy for classification problems (e.g. 'accuracy' or 'acc' for short). The metrics to collect are specified in a list of metric or loss function names. For example:

In [None]:
model.compile(optimizer='sgd', loss='mean_squared_error', metrics=['accuracy'])

### Fit the Model

In [None]:
model.fit(X, y, batch_size=32, epochs=100)

Training can take a long time, from seconds to hours to days depending on the size of
the network and the size of the training data. By default, a progress bar is displayed on the command line for each epoch. This may create too much noise for you, or may cause problems for your environment, such as if you are in an interactive notebook or IDE. You can reduce the amount of information displayed to just the loss each epoch by setting the verbose argument to 2. You can turn off all output by setting verbose to 0. For example:
    

In [None]:
history = model.fit(X, y, batch_size=10, epochs=100, verbose=0)

### Evaluate the Model

In [None]:
loss, accuracy = model.evaluate(X, y)

As with fitting the network, verbose output is provided to give an idea of the progress of
evaluating the model. We can turn this off by setting the verbose argument to 0.

In [None]:
loss, accuracy = model.evaluate(X, y, verbose=0)

### Make Predictions on the Model

Once we are satisfied with the performance of our fit model, we can use it to make predictions on new data. This is as easy as calling the predict() function on the model with an array of new input patterns. For example:

In [None]:
predictions = model.predict(X)

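Alternatively, for classification problems, we can use the predict_classes() function, which automatically converts uncrisp predictions to crisp integer class values. Note that predict_classes() exists only on Sequential models in older Keras releases; it was removed in TensorFlow 2.6.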

In [None]:
predictions = model.predict_classes(X)
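
If predict_classes() is not available in your installed Keras version, the same conversion can be sketched with NumPy on the output of predict(). The probability arrays below are made up and stand in for model.predict(X); the 0.5 threshold for the binary case is an assumption.

```python
import numpy as np

# Made-up sigmoid outputs for 4 samples (binary classification, one output neuron).
binary_probs = np.array([[0.1], [0.8], [0.5], [0.49]])
# Threshold at 0.5 to get crisp 0/1 labels.
binary_classes = (binary_probs > 0.5).astype("int32")

# Made-up softmax outputs for 2 samples and 3 classes.
multi_probs = np.array([[0.2, 0.7, 0.1],
                        [0.6, 0.3, 0.1]])
# Pick the index of the most probable class for each sample.
multi_classes = np.argmax(multi_probs, axis=-1)

print(binary_classes.ravel())  # [0 1 0 0]
print(multi_classes)           # [1 0]
```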

As with fitting and evaluating the network, verbose output is provided to give an idea of
the progress of the model making predictions. We can turn this off by setting the verbose
argument to 0.

In [None]:
predictions = model.predict(X, verbose=0)

### LSTM State Management

Keras provides flexibility to decouple the resetting of internal state from updates to network weights by defining an LSTM layer as stateful. This can be done by setting the stateful argument on the LSTM layer to True. When stateful LSTM layers are used, you must also define the batch size as part of the input shape in the definition of the network by setting the batch_input_shape argument, and the batch size must be a factor of the number of samples in the training dataset. The batch_input_shape argument requires a 3-dimensional tuple defined as batch size, time steps, and features.

For example, we can define a stateful LSTM to be trained on a training dataset with 100
samples, a batch size of 10, and 5 time steps for 1 feature, as follows.


In [None]:
model.add(LSTM(2, stateful=True, batch_input_shape=(10, 5, 1)))# 10=batch size, 5 time step, 1=feature
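
The requirement that the batch size be a factor of the number of samples can be checked before building the model. This is a small sketch; the helper name `valid_batch_sizes` is made up for illustration.

```python
def valid_batch_sizes(n_samples):
    # Every batch size that divides the number of samples exactly.
    return [b for b in range(1, n_samples + 1) if n_samples % b == 0]

n_samples = 100   # training samples, as in the example above
batch_size = 10   # batch size used in batch_input_shape

is_valid = n_samples % batch_size == 0
print(is_valid)                      # True
print(valid_batch_sizes(n_samples))  # [1, 2, 4, 5, 10, 20, 25, 50, 100]
```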

A stateful LSTM will not reset the internal state at the end of each batch. Instead, you
have fine-grained control over when to reset the internal state by calling the reset_states() function. For example, we may want to reset the internal state at the end of each single epoch, which we could do as follows:

In [None]:
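for i in range(1000):
    # fit() takes batch_size; it must match the batch size in batch_input_shape
    model.fit(X, y, epochs=1, batch_size=10)
    model.reset_states()  # clear the LSTM state at the end of each epoch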


The same batch size used in the definition of the stateful LSTM must also be used when
making predictions.

In [None]:
predictions = model.predict(X, batch_size=10)

By default, the samples within an epoch are shuffled. This is a good practice when working
with Multilayer Perceptron neural networks. If you are trying to preserve state across samples, then the order of samples in the training dataset may be important and must be preserved. This can be done by setting the shuffle argument in the fit() function to False. For example:

In [None]:
for i in range(1000):
    # shuffle=False keeps the samples in their original order within the epoch
    model.fit(X, y, epochs=1, shuffle=False, batch_size=10)
    model.reset_states()
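
To see why sample order matters, the sketch below lays out which sample indices land in which batch when shuffling is disabled, assuming 100 samples and a batch size of 10 as above. It relies on the behavior described in the Keras documentation for stateful recurrent layers, where the state for position i in one batch is carried over to position i in the next batch; the variable names are made up.

```python
import numpy as np

n_samples, batch_size = 100, 10
indices = np.arange(n_samples)

# With shuffle=False, row k holds the sample indices that form batch k.
batches = indices.reshape(-1, batch_size)

# Samples that share state through position 0 of consecutive batches.
slot = 0
chain = batches[:, slot]

print(batches.shape)  # (10, 10)
print(chain)          # [ 0 10 20 30 40 50 60 70 80 90]
```

Under this assumption, the training data should be arranged so that sample 10 continues the sequence of sample 0, sample 20 continues sample 10, and so on.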

### Example of LSTM With Single Input Sample

In [4]:
import numpy as np
data = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])
data

array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ])

In [8]:
data = data.reshape((1, 10, 1))
print(data.shape)

(1, 10, 1)


### Example of LSTM With Multiple Input Features

In [10]:
data = np.array([
[0.1, 1.0],
[0.2, 0.9],
[0.3, 0.8],
[0.4, 0.7],
[0.5, 0.6],
[0.6, 0.5],
[0.7, 0.4],
[0.8, 0.3],
[0.9, 0.2],
[1.0, 0.1]])

In [11]:
# reshape data
data = data.reshape(1, 10, 2)
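
If the two series start out as separate 1D arrays, they can be combined into the same (1, 10, 2) layout. This is a minimal sketch using the values from the example above; the names `first`, `second`, `two_d`, and `three_d` are made up.

```python
import numpy as np

# Two parallel series with 10 time steps each.
first = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])
second = first[::-1]  # the same values in reverse order

# Stack as columns: 10 rows (time steps) and 2 columns (features).
two_d = np.column_stack((first, second))

# One sample, 10 time steps, 2 features.
three_d = two_d.reshape(1, 10, 2)

print(three_d.shape)  # (1, 10, 2)
print(three_d[0, 0])  # [0.1 1. ]
```

Each time step of the single sample now carries one value from each series, which matches input_shape=(10, 2).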

In [None]:
# model
model = Sequential()
model.add(LSTM(32, input_shape=(10, 2)))