# Sequential model

## [Getting started with the Keras Sequential model](https://keras.io/getting-started/sequential-model-guide/#getting-started-with-the-keras-sequential-model)

The [Sequential Model](https://keras.io/getting-started/sequential-model-guide/) is a linear stack of layers.

In [2]:
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout
from keras.optimizers import SGD

In [4]:
model = Sequential([
    Dense(32, input_shape=(784,)), # first hidden layer, the first parameter specifies the number of neurons
    Activation('relu'),
    Dense(10), # the final layer will be the output layer, therefore there are 10 output neurons
    Activation('softmax'),
])

[Dense](https://keras.io/layers/core/#dense) implements the operation `output = activation(dot(input, kernel) + bias)`, where `activation` is the element-wise activation function passed as the `activation` argument, `kernel` is a weights matrix created by the layer, and `bias` is a bias vector created by the layer (only applicable if `use_bias` is `True`). The default activation is the identity function. We also have the option to initialize and regularize the kernel and bias. Refer to the [initializers](initializers_regularizers.ipynb#Initializers) and [regularizers](initializers_regularizers.ipynb#Regularizers) notebooks respectively.
 
*Note:* If the input to the layer has a rank greater than 2, then it is flattened prior to the initial dot product with `kernel`.
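
To make the `Dense` formula concrete, here is a minimal NumPy sketch of the same computation. It is only an illustration, not the Keras implementation: the helper names `dense_forward` and `relu` and the random weights are assumptions chosen for this example.

```python
import numpy as np

def dense_forward(inputs, kernel, bias=None, activation=None):
    """Compute activation(dot(inputs, kernel) + bias) for a batch of inputs."""
    outputs = inputs @ kernel
    if bias is not None:        # corresponds to use_bias=True
        outputs = outputs + bias
    if activation is not None:  # no activation means the identity function
        outputs = activation(outputs)
    return outputs

def relu(x):
    # element-wise rectified linear unit
    return np.maximum(x, 0.0)

rng = np.random.default_rng(0)
batch = rng.random((4, 784))                      # 4 samples with 784 features each
kernel = rng.normal(scale=0.05, size=(784, 32))   # hypothetical weights for 32 units
bias = np.zeros(32)

hidden = dense_forward(batch, kernel, bias, activation=relu)
print(hidden.shape)  # (4, 32)
```

The kernel has one row per input feature and one column per unit, which is why the first argument of `Dense` fixes the size of the layer's output.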

The core layers available in Keras are [listed here](https://keras.io/layers/core/).

You can also simply add layers via the `.add()` method:

In [5]:
# as first layer in a sequential model:
model = Sequential()
model.add(Dense(32, input_dim=784)) # input_dim is another way to specify the size of the input
model.add(Activation('relu'))
# now the model will take as input arrays of shape (*, 784)
# and output arrays of shape (*, 32)

# after the first layer, you don't need to specify
# the size of the input anymore:
model.add(Dense(10, activation='softmax'))

### [Activations](https://keras.io/activations/)

Activations can be used either through an `Activation` layer or through the `activation` argument supported by all forward layers. Check out the documentation for the different kinds of activation functions available.
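
As a rough illustration of what the three activations used in this notebook compute, the sketch below writes `relu`, `sigmoid` and `softmax` in plain NumPy. These are standalone helper functions written for this example, not the Keras implementations.

```python
import numpy as np

def relu(x):
    # element-wise: negative values become 0
    return np.maximum(x, 0.0)

def sigmoid(x):
    # element-wise: squashes each value into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    # normalizes across the last axis so that each row sums to 1;
    # subtracting the row maximum keeps the exponentials numerically stable
    shifted = x - x.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

logits = np.array([[2.0, -1.0, 0.5]])
probs = softmax(logits)
print(probs.sum(axis=-1))  # each row sums to 1 (up to rounding)
```

`relu` and `sigmoid` act on each value independently, while `softmax` couples all values in a row, which is why it is typically used on the output layer of a multi-class classifier.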

### Input and Output shapes
We only need to specify the input shape for the first layer, because the following layers can do automatic shape inference. In `input_shape`, the batch dimension is not included.

If you ever need to specify a fixed batch size for your inputs (this is useful for stateful recurrent networks), you can pass a `batch_size` argument to a layer. The most common case is a 2D input with shape `(batch_size, input_dim)`, for which a `Dense` layer produces an output with shape `(batch_size, units)`.
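
The shape bookkeeping described above can be sketched in a few lines of plain Python. The function `infer_dense_shapes` is a hypothetical helper written for this example; it only mimics the idea that each `Dense` layer takes its input size from the previous layer's `units`, with `None` standing in for an unspecified batch dimension.

```python
def infer_dense_shapes(input_dim, units_per_layer, batch_size=None):
    """Return the kernel and output shape of each Dense layer in a stack."""
    layers = []
    in_features = input_dim  # only the first layer needs this from the user
    for units in units_per_layer:
        layers.append({
            "kernel": (in_features, units),
            "output": (batch_size, units),
        })
        in_features = units  # the next layer infers its input size from here
    return layers

for layer in infer_dense_shapes(784, [32, 10]):
    print(layer)
```

Passing `batch_size=16` to the helper replaces the `None` in every output shape with `16`, analogous to fixing the batch size on the first layer.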

## [Compilation](https://keras.io/getting-started/sequential-model-guide/#compilation)

Before training a model, you need to configure the learning process, which is done via the `compile` method. It receives three arguments:


- [optimizer](https://keras.io/optimizers/):  This could be the string identifier of an existing optimizer (such as `rmsprop` or `adagrad`), or an instance of the Optimizer class.
```
from keras.optimizers import SGD
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='mean_squared_error', optimizer=sgd)
```
or
```
# pass optimizer by name: default parameters will be used
model.compile(loss='mean_squared_error', optimizer='sgd')
```
- [loss function](https://keras.io/losses/):  This is the objective that the model will try to minimize. It can be the string identifier of an existing loss function (such as `categorical_crossentropy` or `mse`), or it can be an objective function.
- [list of metrics](https://keras.io/metrics/):  For any classification problem you will want to set this to `metrics=['accuracy']`. A metric could be the string identifier of an existing metric or a custom metric function. A metric function is similar to a loss function, except that the results from evaluating a metric are not used when training the model.
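
To illustrate the difference between a loss and a metric, the following NumPy sketch computes categorical cross-entropy and accuracy for two hand-written predictions. The function names and the clipping constant `eps` are assumptions made for this example; the built-in Keras versions may differ in detail.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # clip to avoid log(0); eps is an illustrative choice
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    per_sample = -np.sum(y_true * np.log(y_pred), axis=-1)
    return float(np.mean(per_sample))

def accuracy(y_true, y_pred):
    # fraction of samples whose most probable class matches the true class
    return float(np.mean(y_true.argmax(axis=-1) == y_pred.argmax(axis=-1)))

y_true = np.array([[1, 0, 0], [0, 1, 0]])               # one-hot targets
y_pred = np.array([[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]])   # predicted probabilities

loss = categorical_crossentropy(y_true, y_pred)
acc = accuracy(y_true, y_pred)
print(loss, acc)
```

The loss changes smoothly with the predicted probabilities, so it can drive gradient-based training, whereas accuracy only changes when a prediction crosses over to another class and is therefore reported but not optimized.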


In [6]:
# For a multi-class classification problem
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# For a binary classification problem
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# For a mean squared error regression problem
model.compile(optimizer='rmsprop',
              loss='mse')

## [Training](https://keras.io/getting-started/sequential-model-guide/#training)

Keras models are trained on NumPy arrays of input data and labels. To train a model, you will typically use the `fit` method.
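
To show how `epochs` and `batch_size` relate to the data, here is a small NumPy sketch of iterating over a dataset in shuffled mini-batches. The generator `iterate_minibatches` is a hypothetical helper for this example and is not how `fit` is implemented; it only illustrates the bookkeeping.

```python
import math
import numpy as np

def iterate_minibatches(data, labels, batch_size, rng):
    """Yield shuffled (data, labels) batches covering the dataset once."""
    order = rng.permutation(len(data))
    for start in range(0, len(data), batch_size):
        idx = order[start:start + batch_size]
        yield data[idx], labels[idx]

rng = np.random.default_rng(0)
data = rng.random((1000, 100))
labels = rng.integers(2, size=(1000, 1))

# One pass over all batches corresponds to one epoch.
batch_sizes = [len(x) for x, _ in iterate_minibatches(data, labels, 32, rng)]
steps_per_epoch = math.ceil(len(data) / 32)
print(steps_per_epoch, batch_sizes[-1])  # 32 8
```

With 1000 samples and a batch size of 32, an epoch consists of 31 full batches plus a final batch of 8 samples.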

### Example 1: Two-class classification

In [7]:
# For a single-input model with 2 classes (binary classification):

model = Sequential()
model.add(Dense(32, activation='relu', input_dim=100))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Generate dummy data
import numpy as np
data = np.random.random((1000, 100))
labels = np.random.randint(2, size=(1000, 1))

# Train the model, iterating on the data in batches of 32 samples
model.fit(data, labels, epochs=10, batch_size=32)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x1d01411dda0>

#### Accuracy of this model on training data

In [8]:
pred = model.predict(data)
from sklearn.metrics import confusion_matrix
cmat = confusion_matrix(pred.round(2)>0.5, labels)
print("accuracy on training data =", cmat.diagonal().sum()/cmat.sum())
# from keras.utils import plot_model
# plot_model(model, to_file='model.png')

accuracy on training data = 0.567
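
The accuracy computation above sums the diagonal of the confusion matrix and divides by the total count. The sketch below reproduces that idea on five made-up samples, using a hand-written `confusion_counts` helper (an assumption for this example, standing in for `sklearn.metrics.confusion_matrix`).

```python
import numpy as np

def confusion_counts(first, second, num_classes):
    """Count pairs: rows index `first`, columns index `second`."""
    cmat = np.zeros((num_classes, num_classes), dtype=int)
    for a, b in zip(first, second):
        cmat[a, b] += 1
    return cmat

true_labels = np.array([0, 1, 1, 0, 1])                # made-up labels
probabilities = np.array([0.2, 0.8, 0.4, 0.6, 0.9])    # made-up sigmoid outputs
predicted = (probabilities > 0.5).astype(int)          # threshold at 0.5

cmat = confusion_counts(predicted, true_labels, num_classes=2)
accuracy = cmat.diagonal().sum() / cmat.sum()
print(cmat)
print("accuracy =", accuracy)
```

Swapping the two arguments transposes the matrix but leaves the diagonal, and therefore the accuracy, unchanged.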


#### Improving accuracy with more epochs

We can improve the accuracy on the training set drastically by training for more epochs. However, this is not how the number of epochs should be chosen in practice, because fitting the training data ever more closely can lead to overfitting.
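
One common alternative is to monitor the loss on held-out validation data and stop once it no longer improves. The pure-Python sketch below shows only that stopping logic. The function `best_epoch`, the `patience` value and the list of validation losses are all hypothetical. In Keras, such losses could be obtained by holding out part of the data, for example through the `validation_split` argument of `fit`.

```python
def best_epoch(val_losses, patience=3):
    """Return the 1-based epoch with the lowest validation loss seen before
    `patience` consecutive epochs fail to improve on it."""
    best_loss = float("inf")
    best = 0
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break  # stop: no improvement for `patience` epochs
    return best

# Hypothetical validation losses, one per epoch
val_losses = [0.70, 0.66, 0.64, 0.65, 0.67, 0.69, 0.63]
print(best_epoch(val_losses))  # 3
```

With `patience=3`, training stops after epoch 6 and epoch 3 is selected. A larger patience would have kept training long enough to see the lower loss at epoch 7, which is the usual trade-off when choosing this value.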

Set `verbose=0` to avoid printing all the iterations.

In [9]:
model.fit(data, labels, epochs=50, batch_size=32, verbose=0)
pred = model.predict(data)
from sklearn.metrics import confusion_matrix
cmat = confusion_matrix(pred.round(2)>0.5, labels)
print("accuracy on training data =", cmat.diagonal().sum()/cmat.sum())

accuracy on training data = 0.74


#### Visualizing the model
```
from keras.utils import plot_model
plot_model(model, to_file='bin_classification.png')
```

### Example 2: Multi-class classification

In [10]:
from keras.utils import to_categorical

# For a single-input model with 10 classes (categorical classification):
model = Sequential()
model.add(Dense(32, activation='relu', input_dim=100))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Generate dummy data
data = np.random.random((1000, 100))
labels = np.random.randint(10, size=(1000, 1))

# Convert labels to categorical one-hot encoding
one_hot_labels = to_categorical(labels, num_classes=10)

# Train the model, iterating on the data in batches of 32 samples
model.fit(data, one_hot_labels, epochs=150, batch_size=32, verbose = 0)

# Accuracy on the training data
pred = model.predict(data)

# take the class with the maximum output as the predicted label, then build the confusion matrix
# (predictions are passed first, so rows are predicted classes and columns are true classes)
cmat = confusion_matrix(pred.argmax(axis = 1), labels)
print(cmat)
print("accuracy on training data =", cmat.diagonal().sum()/cmat.sum())

[[38  1  3  1  4  1  2  2  2  3]
 [ 9 83  6 10  4 10  8  7  6  5]
 [ 8  3 62  4  1  3  4  6  1  5]
 [11  3  7 73  9  4  6  7  3  3]
 [ 7  1  2 13 71  5  5  5  5  4]
 [ 6  3  4  2  6 74  6  3  6  3]
 [ 2  2  0  2  0  0 58  7  0  3]
 [ 1  1  1  0  5  0  1 58  3  7]
 [ 6  3  8  5  2  5  8 11 71 10]
 [ 2  1  1  0  4  0  1  5  2 45]]
accuracy on training data = 0.633
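
The one-hot conversion and the `argmax` decoding used in this example can be sketched in plain NumPy as follows. The `one_hot` function is a simplified stand-in written for this illustration, not the `to_categorical` implementation.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer class labels into rows with a single 1 per row."""
    labels = np.asarray(labels).reshape(-1)          # accept shape (n,) or (n, 1)
    encoded = np.zeros((labels.size, num_classes))
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

labels = np.array([[3], [0], [9]])
encoded = one_hot(labels, num_classes=10)
decoded = encoded.argmax(axis=1)   # recover the integer labels

print(encoded.shape)  # (3, 10)
print(decoded)        # [3 0 9]
```

Taking `argmax` over the softmax output of the model works the same way: it picks the index of the largest value in each row as the predicted class.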


### Example 3: Multilayer Perceptron (MLP) for multi-class softmax classification

`Dropout` randomly sets a fraction `rate` of the input units to 0 at each update during training, which helps prevent overfitting.
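
A minimal NumPy sketch of the idea is shown below. It assumes the common "inverted dropout" formulation, in which the surviving units are scaled by `1 / (1 - rate)` during training so that the expected activation is unchanged and nothing needs to be done at test time. The function `dropout` is written for this example only.

```python
import numpy as np

def dropout(inputs, rate, rng, training=True):
    """Zero a random fraction `rate` of the inputs and rescale the rest."""
    if not training or rate == 0.0:
        return inputs  # dropout is inactive outside of training
    keep_mask = rng.random(inputs.shape) >= rate
    return inputs * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
activations = np.ones((1000, 64))
dropped = dropout(activations, rate=0.5, rng=rng)

print((dropped == 0).mean())  # roughly 0.5 of the units are zeroed
print(dropped.mean())         # stays close to 1.0 thanks to the rescaling
```

Because a different random mask is drawn at every update, the network cannot rely on any single unit, which is the intuition behind its regularizing effect.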

In [11]:
# Generate dummy data
import numpy as np
x_train = np.random.random((1000, 20))
y_train = to_categorical(np.random.randint(10, size=(1000, 1)), num_classes=10)
x_test = np.random.random((100, 20))
y_test = to_categorical(np.random.randint(10, size=(100, 1)), num_classes=10)

model = Sequential()
# Dense(64) is a fully-connected layer with 64 hidden units.
# in the first layer, you must specify the expected input data shape:
# here, 20-dimensional vectors.
model.add(Dense(64, activation='relu', input_dim=20))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))

sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy',
              optimizer=sgd,
              metrics=['accuracy'])

model.fit(x_train, y_train,
          epochs=20,
          batch_size=128)
score = model.evaluate(x_test, y_test, batch_size=128)
score

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


[2.3028013706207275, 0.10000000149011612]
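
The two numbers in `score` are the test loss and the test accuracy. Since the labels here are random, the model cannot be expected to do better than chance. As a quick sanity check, the sketch below computes what a classifier that assigns equal probability to all 10 classes would score.

```python
import math

num_classes = 10

# Cross-entropy of a uniform prediction: -log(1 / num_classes)
chance_loss = -math.log(1.0 / num_classes)
# Expected accuracy when guessing among equally likely classes
chance_accuracy = 1.0 / num_classes

print(round(chance_loss, 4), chance_accuracy)  # 2.3026 0.1
```

The reported loss of about 2.3028 and accuracy of 0.1 are very close to these chance-level values.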