In [1]:
from keras.models import Sequential
from keras.layers import Dense

Sequential?

Using TensorFlow backend.


# Getting started with the Keras Sequential model

The Sequential model is a linear stack of layers.

You can create a Sequential model by passing a list of layer instances to the constructor:

In [2]:
model = Sequential([
    Dense(32, activation='relu', input_shape=(784,)),
    Dense(10, activation='softmax'),
])

model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 32)                25120     
_________________________________________________________________
dense_2 (Dense)              (None, 10)                330       
Total params: 25,450
Trainable params: 25,450
Non-trainable params: 0
_________________________________________________________________
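
The Param # column in the summary above can be checked by hand. The sketch below assumes each Dense layer has one weight per input-unit pair plus one bias per unit; `dense_param_count` is a hypothetical helper written for this note, not part of Keras.

```python
def dense_param_count(input_dim, units, use_bias=True):
    """Weights (input_dim x units) plus, optionally, one bias per unit."""
    return input_dim * units + (units if use_bias else 0)

# Layer widths taken from the model above: 784 inputs -> 32 units -> 10 units
layer_sizes = [784, 32, 10]
counts = [dense_param_count(n_in, n_out)
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
total = sum(counts)

print(counts)  # per-layer parameter counts
print(total)   # total number of parameters
```

The per-layer values should line up with the 25120 and 330 reported by model.summary(), and their sum with the 25,450 total.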


You can also simply add layers via the .add() method:

In [3]:
model = Sequential()

model.add(Dense(32, activation='relu', input_shape=(784,)))
model.add(Dense(10, activation='softmax'))

model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_3 (Dense)              (None, 32)                25120     
_________________________________________________________________
dense_4 (Dense)              (None, 10)                330       
Total params: 25,450
Trainable params: 25,450
Non-trainable params: 0
_________________________________________________________________


## Specifying the input shape

The model needs to know what input shape it should expect. For this reason, the first layer in a Sequential model (and only the first, because following layers can do automatic shape inference) needs to receive information about its input shape. There are several possible ways to do this:

* Pass an input_shape argument to the first layer. This is a shape tuple (a tuple of integers or None entries, where None indicates that any positive integer may be expected). In input_shape, the batch dimension is not included.
* Some 2D layers, such as Dense, support the specification of their input shape via the argument input_dim, and some 3D temporal layers support the arguments input_dim and input_length.
* If you ever need to specify a fixed batch size for your inputs (this is useful for stateful recurrent networks), you can pass a batch_size argument to a layer. If you pass both batch_size=32 and input_shape=(6, 8) to a layer, it will then expect every batch of inputs to have the batch shape (32, 6, 8).

As such, the following snippets are strictly equivalent:

In [4]:
model = Sequential()
model.add(Dense(32, input_shape=(784,)))

model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_5 (Dense)              (None, 32)                25120     
Total params: 25,120
Trainable params: 25,120
Non-trainable params: 0
_________________________________________________________________


In [5]:
model = Sequential()
model.add(Dense(32, input_dim=784))

model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_6 (Dense)              (None, 32)                25120     
Total params: 25,120
Trainable params: 25,120
Non-trainable params: 0
_________________________________________________________________
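
The relationship between input_shape, input_dim and batch_size described in the list above can be sketched in plain Python. The helper names here are hypothetical and only illustrate how the shape tuples are assembled; they are not Keras functions.

```python
def shape_from_input_dim(input_dim):
    """input_dim=784 is shorthand for the per-sample shape (784,)."""
    return (input_dim,)

def full_batch_shape(input_shape, batch_size=None):
    """Prepend the batch dimension; None means any batch size is accepted."""
    return (batch_size,) + tuple(input_shape)

# The two equivalent ways of describing a 784-feature input
assert shape_from_input_dim(784) == (784,)

print(full_batch_shape((784,)))                 # variable batch size
print(full_batch_shape((6, 8), batch_size=32))  # fixed batch size of 32
```

The leading None in the first result is the same placeholder that appears in the Output Shape column of model.summary().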


## Compilation

Before training a model, you need to configure the learning process, which is done via the compile method. It receives three arguments:

* An optimizer. This could be the string identifier of an existing optimizer (such as rmsprop or adagrad), or an instance of the Optimizer class. See: optimizers.
* A loss function. This is the objective that the model will try to minimize. It can be the string identifier of an existing loss function (such as categorical_crossentropy or mse), or it can be an objective function. See: losses.
* A list of metrics. For any classification problem you will want to set this to metrics=['accuracy']. A metric could be the string identifier of an existing metric or a custom metric function.

We will separately go over optimizers, loss functions, and metrics in a later lesson.

In [6]:
# For a multi-class classification problem
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# For a binary classification problem
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# For a mean squared error regression problem
model.compile(optimizer='rmsprop',
              loss='mse')

# For custom metrics
import keras.backend as K

def mean_pred(y_true, y_pred):
    return K.mean(y_pred)

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy', mean_pred])
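
To make the loss identifiers above more concrete, here is a minimal plain-Python sketch of what mse and binary_crossentropy compute on a handful of predictions, together with an equivalent of the mean_pred metric. It follows the textbook definitions rather than the Keras source, and the clipping constant `eps` is an assumption added to avoid log(0).

```python
import math

def mse(y_true, y_pred):
    """Mean of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Average negative log-likelihood of binary targets."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # keep p strictly inside (0, 1)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

def mean_pred(y_true, y_pred):
    """Plain-Python counterpart of the custom metric: the average prediction."""
    return sum(y_pred) / len(y_pred)

y_true = [1, 0, 1, 1]
y_pred = [0.9, 0.2, 0.6, 0.5]
print(mse(y_true, y_pred))
print(binary_crossentropy(y_true, y_pred))
print(mean_pred(y_true, y_pred))
```

A prediction of 0.5 for every sample gives a binary cross-entropy of ln 2 (about 0.693), which is a useful baseline when reading training logs.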

## Training

Keras models are trained on NumPy arrays of input data and labels. There are three methods for training a model:

* fit: the most basic option. It trains on arrays held in memory for a fixed number of epochs.
* fit_generator: a bit more involved, because it takes a generator that yields batches instead of NumPy arrays. It is often used for datasets too large to fit in memory.
* train_on_batch: performs a single gradient update on one batch of samples.

In [7]:
model = Sequential()
model.add(Dense(32, activation='relu', input_dim=100))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Generate dummy data
import numpy as np
data = np.random.random((1000, 100))
labels = np.random.randint(2, size=(1000, 1))

model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_7 (Dense)              (None, 32)                3232      
_________________________________________________________________
dense_8 (Dense)              (None, 1)                 33        
Total params: 3,265
Trainable params: 3,265
Non-trainable params: 0
_________________________________________________________________


In [8]:
model.fit?

In [9]:
model.fit(
    data, 
    labels, 
    batch_size=32, 
    epochs=10, verbose=2, 
    callbacks=None, 
    validation_split=0.2, 
    validation_data=None, 
    shuffle=True, 
    class_weight=None, 
    sample_weight=None, 
    initial_epoch=0)

Train on 800 samples, validate on 200 samples
Epoch 1/10
0s - loss: 0.7249 - acc: 0.4738 - val_loss: 0.7026 - val_acc: 0.5300
Epoch 2/10
0s - loss: 0.7130 - acc: 0.4850 - val_loss: 0.6998 - val_acc: 0.5350
Epoch 3/10
0s - loss: 0.7050 - acc: 0.4863 - val_loss: 0.7050 - val_acc: 0.5000
Epoch 4/10
0s - loss: 0.6939 - acc: 0.5237 - val_loss: 0.7299 - val_acc: 0.4200
Epoch 5/10
0s - loss: 0.6932 - acc: 0.5350 - val_loss: 0.7286 - val_acc: 0.4300
Epoch 6/10
0s - loss: 0.6856 - acc: 0.5513 - val_loss: 0.7052 - val_acc: 0.5150
Epoch 7/10
0s - loss: 0.6804 - acc: 0.5887 - val_loss: 0.7062 - val_acc: 0.5150
Epoch 8/10
0s - loss: 0.6771 - acc: 0.5825 - val_loss: 0.7066 - val_acc: 0.5150
Epoch 9/10
0s - loss: 0.6695 - acc: 0.6100 - val_loss: 0.7090 - val_acc: 0.5450
Epoch 10/10
0s - loss: 0.6701 - acc: 0.5875 - val_loss: 0.7221 - val_acc: 0.4550


<keras.callbacks.History at 0x1132d3d10>
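
The "Train on 800 samples, validate on 200 samples" line follows from validation_split=0.2 applied to 1000 samples. The sketch below reproduces that arithmetic and the resulting number of gradient updates per epoch; `split_for_validation` is a hypothetical helper, and it assumes the validation samples are held out from the end of the arrays before any shuffling, which is how the Keras documentation describes validation_split.

```python
import math

def split_for_validation(n_samples, validation_split):
    """Return (n_train, n_val) when the last fraction is held out."""
    split_at = int(n_samples * (1.0 - validation_split))
    return split_at, n_samples - split_at

n_train, n_val = split_for_validation(1000, 0.2)

# One gradient update per batch; the last batch may be smaller than 32
updates_per_epoch = math.ceil(n_train / 32)

print(n_train, n_val)
print(updates_per_epoch)
```

Because the labels here are random, accuracy near 0.5 and a loss near 0.69 on the validation set are what one would expect, whatever the training accuracy does.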

In [10]:
# The model will continue training where it left off
model.train_on_batch(
    data[:32],
    labels[:32],
    class_weight=None,
    sample_weight=None)

[0.63397765, 0.6875]

In [11]:
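def data_gen():
    # Yield one sample at a time as a batch of size 1.
    # datum[None, :] adds the batch axis, so the input has shape (1, 100).
    for datum, label in zip(data, labels):
        yield datum[None, :], label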

In [12]:
# Be careful with steps_per_epoch: steps_per_epoch * epochs must not exceed the
# number of batches the generator can yield (1000 here), or it will run out.
model.fit_generator(
    data_gen(), 
    steps_per_epoch=900, 
    epochs=1, 
    verbose=1, 
    callbacks=None, 
    validation_data=None, # This can be a generator or a dataset
    validation_steps=None, 
    class_weight=None, 
    max_q_size=10, 
    workers=1, 
    pickle_safe=False, 
    initial_epoch=0)

Epoch 1/1


<keras.callbacks.History at 0x113b52c90>
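
The generator above yields each sample once and then stops, which is why steps_per_epoch has to be chosen with care. A common alternative is a generator that yields mini-batches and restarts after each pass. The sketch below is one way to write that in plain Python; `batch_generator` and `steps_for_one_pass` are hypothetical names, and small lists stand in for the `data` and `labels` arrays (slicing works the same way on NumPy arrays).

```python
import math

def batch_generator(inputs, targets, batch_size):
    """Yield (inputs, targets) batches forever, restarting after each pass."""
    n_samples = len(inputs)
    while True:
        for start in range(0, n_samples, batch_size):
            stop = start + batch_size
            yield inputs[start:stop], targets[start:stop]

def steps_for_one_pass(n_samples, batch_size):
    """Number of batches needed to see every sample once."""
    return math.ceil(n_samples / batch_size)

# Toy stand-ins for `data` and `labels`
toy_inputs = list(range(10))
toy_targets = [i % 2 for i in toy_inputs]

gen = batch_generator(toy_inputs, toy_targets, batch_size=4)
steps = steps_for_one_pass(len(toy_inputs), batch_size=4)

first_pass = [next(gen) for _ in range(steps)]
wrapped = next(gen)  # the generator starts over instead of raising StopIteration
print(steps, first_pass, wrapped)
```

With a generator like this, steps_per_epoch would typically be set to the value returned by `steps_for_one_pass`, so that one epoch corresponds to one pass over the data.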

## Evaluation

The evaluation and prediction methods follow the same naming pattern as the training methods: X, X_on_batch, and X_generator. Only the array-based variants are shown below; the others are left for the reader to explore:

* evaluate (the single-batch variant is named test_on_batch)
* predict, predict_classes, and predict_proba (only predict has on_batch and generator variants)

In [13]:
model.evaluate(
    data, 
    labels, 
    batch_size=32, 
    verbose=1, 
    sample_weight=None)

  32/1000 [..............................] - ETA: 0s

[0.69383435416221617, 0.53900000000000003]

In [14]:
model.predict(
    data, 
    batch_size=32, 
    verbose=1)


  32/1000 [..............................] - ETA: 0s

array([[ 0.47716519],
       [ 0.66853929],
       [ 0.74089676],
       [ 0.67620659],
       [ 0.61077285],
       [ 0.74347764],
       [ 0.6085465 ],
       [ 0.67098713],
       [ 0.58207715],
       [ 0.68299049],
       [ 0.57352179],
       [ 0.62041366],
       [ 0.52467233],
       [ 0.67464489],
       [ 0.70867538],
       [ 0.52434838],
       [ 0.61905962],
       [ 0.55896056],
       [ 0.78814405],
       [ 0.67009896],
       [ 0.5257197 ],
       [ 0.52502114],
       [ 0.56842428],
       [ 0.65214133],
       [ 0.71219897],
       [ 0.6405549 ],
       [ 0.50973332],
       [ 0.59873462],
       [ 0.72551584],
       [ 0.47545141],
       [ 0.82394361],
       [ 0.73078161],
       [ 0.50666392],
       [ 0.50373143],
       [ 0.7241807 ],
       [ 0.65262187],
       [ 0.55081242],
       [ 0.55163211],
       [ 0.6980921 ],
       [ 0.73045397],
       [ 0.72161692],
       [ 0.48629966],
       [ 0.56592607],
       [ 0.38854933],
       [ 0.5390088 ],
       [ 0

In [15]:
model.predict_classes(
    data, 
    batch_size=32, 
    verbose=1)

  32/1000 [..............................] - ETA: 0s

array([[0],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [0],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [0],
       [1],
       [0],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [0],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [0],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
    

In [16]:
model.predict_proba(
    data, 
    batch_size=32, 
    verbose=1)

  32/1000 [..............................] - ETA: 0s

array([[ 0.47716519],
       [ 0.66853929],
       [ 0.74089676],
       [ 0.67620659],
       [ 0.61077285],
       [ 0.74347764],
       [ 0.6085465 ],
       [ 0.67098713],
       [ 0.58207715],
       [ 0.68299049],
       [ 0.57352179],
       [ 0.62041366],
       [ 0.52467233],
       [ 0.67464489],
       [ 0.70867538],
       [ 0.52434838],
       [ 0.61905962],
       [ 0.55896056],
       [ 0.78814405],
       [ 0.67009896],
       [ 0.5257197 ],
       [ 0.52502114],
       [ 0.56842428],
       [ 0.65214133],
       [ 0.71219897],
       [ 0.6405549 ],
       [ 0.50973332],
       [ 0.59873462],
       [ 0.72551584],
       [ 0.47545141],
       [ 0.82394361],
       [ 0.73078161],
       [ 0.50666392],
       [ 0.50373143],
       [ 0.7241807 ],
       [ 0.65262187],
       [ 0.55081242],
       [ 0.55163211],
       [ 0.6980921 ],
       [ 0.73045397],
       [ 0.72161692],
       [ 0.48629966],
       [ 0.56592607],
       [ 0.38854933],
       [ 0.5390088 ],
       [ 0
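
Comparing the outputs above suggests how the three prediction methods relate for this single-unit sigmoid model: predict and predict_proba return the same probabilities, and predict_classes looks like those probabilities thresholded at 0.5. The sketch below applies that rule to the first few values printed above; `classes_from_probabilities` is a hypothetical helper, and the 0.5 threshold is an assumption inferred from the printed outputs.

```python
def classes_from_probabilities(probabilities, threshold=0.5):
    """Map each sigmoid output to class 1 if it exceeds the threshold, else 0."""
    return [int(p > threshold) for p in probabilities]

# First four probabilities from the predict output above
probs = [0.47716519, 0.66853929, 0.74089676, 0.67620659]
classes = classes_from_probabilities(probs)
print(classes)
```

The result should agree with the first four rows of the predict_classes output.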