In [1]:
from keras.models import Sequential
from keras.layers import Dense

Sequential?

Using TensorFlow backend.


# Getting started with the Keras Sequential model

The Sequential model is a linear stack of layers.
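
In a linear stack, each layer's output is the next layer's input. The plain-Python sketch below shows that idea for a single sample. `relu`, `softmax`, `dense`, and `sequential_forward` are hypothetical helpers written for illustration, not Keras APIs. Real Keras layers work on whole batches of tensors with learned weights.

```python
import math
from functools import reduce

def relu(v):
    # Element-wise max(0, x)
    return [max(0.0, x) for x in v]

def softmax(v):
    # Subtract the max before exponentiating for numerical stability
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]

def dense(weights, bias, activation):
    """Build a layer computing activation(x . weights + bias) for one sample.

    weights has one row per input feature and one column per unit.
    """
    def layer(x):
        z = [sum(x[i] * weights[i][j] for i in range(len(x))) + bias[j]
             for j in range(len(bias))]
        return activation(z)
    return layer

def sequential_forward(layers, x):
    # A linear stack: each layer's output becomes the next layer's input
    return reduce(lambda h, layer: layer(h), layers, x)

# 3 inputs -> 2 hidden units (relu) -> 2 output units (softmax)
hidden = dense([[1.0, -1.0], [0.5, 0.5], [0.0, 2.0]], [0.0, 0.0], relu)
output = dense([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], softmax)
probs = sequential_forward([hidden, output], [1.0, 2.0, 3.0])
print(probs)  # two probabilities summing to 1
```

This is the same shape of computation as a Dense(relu) layer followed by a Dense(softmax) layer, only at toy size.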

You can create a Sequential model by passing a list of layer instances to the constructor:

In [3]:
model = Sequential([
    Dense(32, activation='relu', input_shape=(784,)),
    Dense(10, activation='softmax'),
])

model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_3 (Dense)              (None, 32)                25120     
_________________________________________________________________
dense_4 (Dense)              (None, 10)                330       
Total params: 25,450
Trainable params: 25,450
Non-trainable params: 0
_________________________________________________________________
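
The Param # column follows directly from how a Dense layer works. It holds one weight per (input, unit) pair plus one bias per unit, so it has input_dim * units + units parameters with the default use_bias=True. The helper name `dense_param_count` below is hypothetical. It simply reproduces the arithmetic behind the summary:

```python
def dense_param_count(input_dim, units, use_bias=True):
    # Weight matrix has input_dim * units entries; the bias vector adds `units` more
    return input_dim * units + (units if use_bias else 0)

first = dense_param_count(784, 32)    # 784 * 32 + 32
second = dense_param_count(32, 10)    # 32 * 10 + 10
print(first, second, first + second)  # 25120 330 25450
```

The same formula gives 3,232 and 33 for the 100-input binary classifier built in the Training section.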


You can also simply add layers via the .add() method:

In [4]:
model = Sequential()

model.add(Dense(32, activation='relu', input_shape=(784,)))
model.add(Dense(10, activation='softmax'))

model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_5 (Dense)              (None, 32)                25120     
_________________________________________________________________
dense_6 (Dense)              (None, 10)                330       
Total params: 25,450
Trainable params: 25,450
Non-trainable params: 0
_________________________________________________________________


## Specifying the input shape

The model needs to know what input shape it should expect. For this reason, the first layer in a Sequential model (and only the first, because following layers can do automatic shape inference) needs to receive information about its input shape. There are several possible ways to do this:

* Pass an input_shape argument to the first layer. This is a shape tuple (a tuple of integers or None entries, where None indicates that any positive integer may be expected). In input_shape, the batch dimension is not included.
* Some 2D layers, such as Dense, support the specification of their input shape via the argument input_dim, and some 3D temporal layers support the arguments input_dim and input_length.
* If you ever need to specify a fixed batch size for your inputs (this is useful for stateful recurrent networks), you can pass a batch_size argument to a layer. If you pass both batch_size=32 and input_shape=(6, 8) to a layer, it will then expect every batch of inputs to have the batch shape (32, 6, 8).
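
All three options describe the same thing: the shape of the batches the first layer will accept. The helper below is hypothetical and not part of Keras. It spells out how the arguments combine for a Dense-style layer, with None meaning "any size". Temporal layers that also take input_length are left out for brevity.

```python
def expected_batch_shape(input_shape=None, input_dim=None, batch_size=None):
    """Hypothetical helper: the full batch shape a first layer would expect."""
    if input_shape is None:
        if input_dim is None:
            raise ValueError("specify input_shape or input_dim")
        # For a Dense layer, input_dim=n is shorthand for input_shape=(n,)
        input_shape = (input_dim,)
    # input_shape never includes the batch dimension; batch_size (if given) fixes it
    return (batch_size,) + tuple(input_shape)

print(expected_batch_shape(input_shape=(784,)))                  # (None, 784)
print(expected_batch_shape(input_dim=784))                       # (None, 784)
print(expected_batch_shape(input_shape=(6, 8), batch_size=32))   # (32, 6, 8)
```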

As such, the following snippets are strictly equivalent:

In [5]:
model = Sequential()
model.add(Dense(32, input_shape=(784,)))

model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_7 (Dense)              (None, 32)                25120     
Total params: 25,120
Trainable params: 25,120
Non-trainable params: 0
_________________________________________________________________


In [6]:
model = Sequential()
model.add(Dense(32, input_dim=784))

model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_8 (Dense)              (None, 32)                25120     
Total params: 25,120
Trainable params: 25,120
Non-trainable params: 0
_________________________________________________________________


## Compilation

Before training a model, you need to configure the learning process, which is done via the compile method. It receives three arguments:

* An optimizer. This could be the string identifier of an existing optimizer (such as rmsprop or adagrad), or an instance of the Optimizer class. See: optimizers.
* A loss function. This is the objective that the model will try to minimize. It can be the string identifier of an existing loss function (such as categorical_crossentropy or mse), or it can be an objective function. See: losses.
* A list of metrics. For any classification problem you will want to set this to metrics=['accuracy']. A metric could be the string identifier of an existing metric or a custom metric function.

We will separately go over optimizers, loss functions, and metrics in a later lesson.
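
To make the loss names more concrete, here is a plain-Python sketch of the formulas behind 'binary_crossentropy', 'categorical_crossentropy', and 'mse'. It makes simplifying assumptions: lists instead of tensors, and a clipping epsilon of 1e-7 (Keras's default fuzz factor). It is an illustration, not the Keras backend implementation. One useful reference point: predicting 0.5 for every sample gives a binary cross-entropy of ln 2, about 0.693.

```python
import math

EPS = 1e-7  # clip probabilities away from 0 and 1 so log() stays finite

def _clip(p):
    return min(max(p, EPS), 1.0 - EPS)

def binary_crossentropy(y_true, y_pred):
    # Mean over outputs of -[y*log(p) + (1 - y)*log(1 - p)]
    terms = [-(t * math.log(_clip(p)) + (1 - t) * math.log(1 - _clip(p)))
             for t, p in zip(y_true, y_pred)]
    return sum(terms) / len(terms)

def categorical_crossentropy(y_true, y_pred):
    # y_true is one-hot and y_pred is assumed to already sum to 1 (softmax output)
    return -sum(t * math.log(_clip(p)) for t, p in zip(y_true, y_pred))

def mse(y_true, y_pred):
    # Mean of the squared differences
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

print(binary_crossentropy([1], [0.5]))                        # ln 2, about 0.693
print(categorical_crossentropy([0, 1, 0], [0.2, 0.7, 0.1]))   # -ln 0.7, about 0.357
print(mse([1.0, 2.0], [1.5, 2.0]))                            # 0.125
```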

In [7]:
# For a multi-class classification problem
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# For a binary classification problem
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# For a mean squared error regression problem
model.compile(optimizer='rmsprop',
              loss='mse')

# For custom metrics
import keras.backend as K

def mean_pred(y_true, y_pred):
    return K.mean(y_pred)

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy', mean_pred])
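
Metrics take the same (y_true, y_pred) arguments as losses, but they are only reported, never minimized. For a single sigmoid output trained with binary_crossentropy, 'accuracy' rounds each prediction to 0 or 1 and compares it with the label. The sketch below restates that rule and mean_pred in plain Python for a flat list of samples. The `_sketch` names are hypothetical, and this is not the backend code Keras runs.

```python
def binary_accuracy_sketch(y_true, y_pred):
    # Threshold each probability at 0.5, then count matches with the 0/1 labels
    hits = [1.0 if t == (1 if p > 0.5 else 0) else 0.0 for t, p in zip(y_true, y_pred)]
    return sum(hits) / len(hits)

def mean_pred_sketch(y_true, y_pred):
    # Same idea as mean_pred: the average prediction, ignoring y_true
    return sum(y_pred) / len(y_pred)

print(binary_accuracy_sketch([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6]))  # 0.5
print(mean_pred_sketch([0, 1], [0.25, 0.75]))                      # 0.5
```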

## Training

Keras models are trained on NumPy arrays of input data and labels. There are three methods for training a model:

* fit, the most basic option.
* fit_generator, which is a bit more involved because it takes a generator instead of a NumPy array. It is often used for datasets too large to fit in memory.
* train_on_batch, which performs a single gradient update on one batch of samples.
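
Conceptually, fit runs the same single-batch update that train_on_batch exposes inside a loop. In each epoch it optionally shuffles the sample indices, cuts them into chunks of batch_size, and performs one update per chunk. The sketch below covers only that batching logic. `train_step` is a hypothetical stand-in for one gradient update. Real Keras also handles validation, callbacks, and logging.

```python
import random

def minibatch_indices(n_samples, batch_size, rng=None):
    """Split sample indices into consecutive batches; shuffle first if rng is given."""
    indices = list(range(n_samples))
    if rng is not None:
        rng.shuffle(indices)
    # The final batch is smaller when n_samples is not a multiple of batch_size
    return [indices[i:i + batch_size] for i in range(0, n_samples, batch_size)]

def fit_loop(n_samples, train_step, batch_size=32, epochs=1, shuffle=True, seed=0):
    """Call train_step once per batch, for `epochs` passes over the data."""
    rng = random.Random(seed) if shuffle else None
    for _ in range(epochs):
        for batch in minibatch_indices(n_samples, batch_size, rng):
            train_step(batch)  # stand-in for a single gradient update

updates = []
fit_loop(800, updates.append, batch_size=32, epochs=10)
print(len(updates))  # 250: 800 / 32 = 25 batches per epoch, times 10 epochs
```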

In [9]:
model = Sequential()
model.add(Dense(32, activation='relu', input_dim=100))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Generate dummy data
import numpy as np
data = np.random.random((1000, 100))
labels = np.random.randint(2, size=(1000, 1))

model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_11 (Dense)             (None, 32)                3232      
_________________________________________________________________
dense_12 (Dense)             (None, 1)                 33        
Total params: 3,265
Trainable params: 3,265
Non-trainable params: 0
_________________________________________________________________


In [10]:
model.fit?

In [13]:
model.fit(
    data, 
    labels, 
    batch_size=32, 
    epochs=10, verbose=2, 
    callbacks=None, 
    validation_split=0.2, 
    validation_data=None, 
    shuffle=True, 
    class_weight=None, 
    sample_weight=None, 
    initial_epoch=0)

Train on 800 samples, validate on 200 samples
Epoch 1/10
0s - loss: 0.7218 - acc: 0.4913 - val_loss: 0.6852 - val_acc: 0.5700
Epoch 2/10
0s - loss: 0.7066 - acc: 0.5112 - val_loss: 0.6943 - val_acc: 0.5000
Epoch 3/10
0s - loss: 0.7008 - acc: 0.5162 - val_loss: 0.6994 - val_acc: 0.4750
Epoch 4/10
0s - loss: 0.6935 - acc: 0.5200 - val_loss: 0.7150 - val_acc: 0.4750
Epoch 5/10
0s - loss: 0.6861 - acc: 0.5613 - val_loss: 0.6993 - val_acc: 0.4600
Epoch 6/10
0s - loss: 0.6813 - acc: 0.5675 - val_loss: 0.7029 - val_acc: 0.4950
Epoch 7/10
0s - loss: 0.6761 - acc: 0.5900 - val_loss: 0.7050 - val_acc: 0.4750
Epoch 8/10
0s - loss: 0.6719 - acc: 0.6062 - val_loss: 0.7122 - val_acc: 0.5000
Epoch 9/10
0s - loss: 0.6666 - acc: 0.6112 - val_loss: 0.7044 - val_acc: 0.5000
Epoch 10/10
0s - loss: 0.6633 - acc: 0.6075 - val_loss: 0.7069 - val_acc: 0.5000


<keras.callbacks.History at 0x113b06710>
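
The first log line, "Train on 800 samples, validate on 200 samples", comes from validation_split=0.2. Keras holds out the last 20% of the arrays for validation and selects them before any shuffling, so the split is identical on every run. This matters if your data is sorted, for example by label. The val_acc hovering around 0.5 is expected here: the dummy labels are random, so there is no signal to generalize. A minimal sketch of the split, using a hypothetical helper:

```python
def split_validation(x, y, validation_split):
    """Hold out the last `validation_split` fraction of (x, y), before any shuffling."""
    split_at = int(len(x) * (1.0 - validation_split))
    return (x[:split_at], y[:split_at]), (x[split_at:], y[split_at:])

xs = list(range(1000))
ys = [i % 2 for i in xs]
(train_x, train_y), (val_x, val_y) = split_validation(xs, ys, 0.2)
print(len(train_x), len(val_x))  # 800 200
```

Slicing works the same way on NumPy arrays, so the helper applies unchanged to data and labels.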

In [14]:
# The model will continue training where it left off
model.train_on_batch(
    data[:32],
    labels[:32],
    class_weight=None, 
    sample_weight=None,)

[0.67578197, 0.53125]

In [35]:
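def data_gen():
    # Yield one sample at a time as a batch of size 1:
    # datum[None, :] has shape (1, 100) and label has shape (1,)
    for datum, label in zip(data, labels):
        yield datum[None, :], label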

In [37]:
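# Be careful with steps_per_epoch: data_gen yields only 1000 batches in total,
# so steps_per_epoch * epochs must not exceed 1000 or the generator runs out
model.fit_generator(
    data_gen(), 
    steps_per_epoch=900, 
    epochs=1, 
    verbose=1, 
    callbacks=None, 
    validation_data=None, # A generator or a tuple (inputs, targets)
    validation_steps=None, 
    class_weight=None, 
    max_q_size=10, # renamed max_queue_size in later Keras 2 releases
    workers=1, 
    pickle_safe=False, # renamed use_multiprocessing in later Keras 2 releases
    initial_epoch=0)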

Epoch 1/1


<keras.callbacks.History at 0x114094290>
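
data_gen is finite. It yields each of the 1000 samples once, one sample per batch, so a larger steps_per_epoch or more epochs would exhaust it partway through training. Keras expects generators passed to fit_generator to loop over their data indefinitely. A sturdier pattern is therefore an endless generator that yields real batches. The sketch below shows one way to write it. `batch_generator` and `steps_for` are hypothetical names. Slicing works the same on NumPy arrays and plain lists.

```python
import math

def batch_generator(x, y, batch_size):
    """Yield (x_batch, y_batch) tuples forever, cycling through the data in order."""
    n = len(x)
    while True:  # never raise StopIteration, however many steps are requested
        for start in range(0, n, batch_size):
            yield x[start:start + batch_size], y[start:start + batch_size]

def steps_for(n_samples, batch_size):
    # Number of batches needed to cover every sample once per epoch
    return math.ceil(n_samples / batch_size)

gen = batch_generator(list(range(10)), list(range(10)), batch_size=4)
print(steps_for(10, 4))                        # 3
print([len(next(gen)[0]) for _ in range(4)])   # [4, 4, 2, 4]: wraps around after the short final batch
```

In this notebook, that would look like model.fit_generator(batch_generator(data, labels, 32), steps_per_epoch=steps_for(len(data), 32), epochs=1), where steps_for(1000, 32) is 32.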

## Evaluation

Each evaluation and prediction method comes in up to three variants: X, X_on_batch, and X_generator. Only the base methods are shown below; the _on_batch and _generator variants are left for the reader to explore:

* evaluate (variants: test_on_batch and evaluate_generator)
* predict, predict_classes, and predict_proba (only predict has the predict_on_batch and predict_generator variants)
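
For a model like this one, with a single sigmoid output, the three prediction methods are closely related. predict_proba returns the same values as predict, which is why their outputs are identical. predict_classes thresholds those values at 0.5, or takes the argmax when the output has several units. For example, the first row's 0.668 becomes class 1 and the sixth row's 0.446 becomes class 0. The sketch below restates that rule in plain Python with a hypothetical helper, assuming one list of outputs per sample:

```python
def classes_from_proba(proba_rows):
    """Hypothetical helper: turn predicted probabilities into class labels."""
    if len(proba_rows[0]) > 1:
        # Several output units (e.g. softmax): pick the index of the largest score
        return [max(range(len(row)), key=row.__getitem__) for row in proba_rows]
    # A single sigmoid unit: threshold at 0.5, keeping one column per sample
    return [[1 if row[0] > 0.5 else 0] for row in proba_rows]

# First, sixth, and ninth rows of the predict output for this model
print(classes_from_proba([[0.66830254], [0.44622466], [0.50672472]]))  # [[1], [0], [1]]
print(classes_from_proba([[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]))          # [1, 0]
```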

In [38]:
model.evaluate(
    data, 
    labels, 
    batch_size=32, 
    verbose=1, 
    sample_weight=None)

  32/1000 [..............................] - ETA: 0s

[0.64754414463043208, 0.63900000000000001]
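
evaluate returns the loss followed by each metric given to compile, here [loss, accuracy]. It runs over the data in batches of batch_size and weights each per-batch value by that batch's size when averaging. The result is therefore the mean over all 1000 samples, even though the final batch holds only 8 of them. A sketch of that weighted average, with a hypothetical helper and made-up per-batch numbers:

```python
def weighted_average(batch_values, batch_sizes):
    # Weight each batch's mean by its size so a short final batch is not over-counted
    total = sum(v * n for v, n in zip(batch_values, batch_sizes))
    return total / sum(batch_sizes)

# Two full batches of 4 and a short final batch of 2 (illustrative numbers only)
print(weighted_average([0.5, 0.75, 1.0], [4, 4, 2]))  # 0.7
```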

In [39]:
model.predict(
    data, 
    batch_size=32, 
    verbose=1)


  32/1000 [..............................] - ETA: 0s

array([[ 0.66830254],
       [ 0.73722982],
       [ 0.69109821],
       [ 0.55633432],
       [ 0.56893396],
       [ 0.44622466],
       [ 0.4841831 ],
       [ 0.6196698 ],
       [ 0.50672472],
       [ 0.34392637],
       [ 0.31212413],
       [ 0.36759338],
       [ 0.64536023],
       [ 0.3909947 ],
       [ 0.53375876],
       [ 0.48460501],
       [ 0.46276298],
       [ 0.46837184],
       [ 0.60835433],
       [ 0.54511958],
       [ 0.53848141],
       [ 0.3768599 ],
       [ 0.5122444 ],
       [ 0.61772388],
       [ 0.55194932],
       [ 0.5927549 ],
       [ 0.67563736],
       [ 0.58783245],
       [ 0.48061544],
       [ 0.55602235],
       [ 0.49298742],
       [ 0.48482841],
       [ 0.41475126],
       [ 0.54708761],
       [ 0.51745051],
       [ 0.55852705],
       [ 0.57718587],
       [ 0.48399448],
       [ 0.51232857],
       [ 0.53579825],
       [ 0.45303559],
       [ 0.48525029],
       [ 0.56106049],
       [ 0.49302259],
       [ 0.59903908],
       ...

In [40]:
model.predict_classes(
    data, 
    batch_size=32, 
    verbose=1)

  32/1000 [..............................] - ETA: 0s

array([[1],
       [1],
       [1],
       [1],
       [1],
       [0],
       [0],
       [1],
       [1],
       [0],
       [0],
       [0],
       [1],
       [0],
       [1],
       [0],
       [0],
       [0],
       [1],
       [1],
       [1],
       [0],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [0],
       [1],
       [0],
       [0],
       [0],
       [1],
       [1],
       [1],
       [1],
       [0],
       [1],
       [1],
       [0],
       [0],
       [1],
       [0],
       [1],
       [0],
       [1],
       [0],
       [0],
       [1],
       [0],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [0],
       [0],
       [0],
       [0],
       [0],
       [1],
       [0],
       [0],
       [1],
       [0],
       [1],
       [1],
       [0],
       [0],
       [1],
       [1],
       [0],
       [1],
       [0],
       [1],
       [1],
       [0],
       [0],
       [0],
       ...

In [41]:
model.predict_proba(
    data, 
    batch_size=32, 
    verbose=1)

  32/1000 [..............................] - ETA: 0s

array([[ 0.66830254],
       [ 0.73722982],
       [ 0.69109821],
       [ 0.55633432],
       [ 0.56893396],
       [ 0.44622466],
       [ 0.4841831 ],
       [ 0.6196698 ],
       [ 0.50672472],
       [ 0.34392637],
       [ 0.31212413],
       [ 0.36759338],
       [ 0.64536023],
       [ 0.3909947 ],
       [ 0.53375876],
       [ 0.48460501],
       [ 0.46276298],
       [ 0.46837184],
       [ 0.60835433],
       [ 0.54511958],
       [ 0.53848141],
       [ 0.3768599 ],
       [ 0.5122444 ],
       [ 0.61772388],
       [ 0.55194932],
       [ 0.5927549 ],
       [ 0.67563736],
       [ 0.58783245],
       [ 0.48061544],
       [ 0.55602235],
       [ 0.49298742],
       [ 0.48482841],
       [ 0.41475126],
       [ 0.54708761],
       [ 0.51745051],
       [ 0.55852705],
       [ 0.57718587],
       [ 0.48399448],
       [ 0.51232857],
       [ 0.53579825],
       [ 0.45303559],
       [ 0.48525029],
       [ 0.56106049],
       [ 0.49302259],
       [ 0.59903908],
       ...