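## The Sequential model
A Sequential model is a linear stack of layers.

`Dense` is the commonly used fully connected layer. It computes `output = activation(dot(input, kernel) + bias)`, where `activation` is the element-wise activation function, `kernel` is the layer's weight matrix, and `bias` is a bias vector that is added only when `use_bias=True`.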
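
To make the formula concrete, here is a minimal NumPy sketch of the same computation. The helper names `relu` and `dense_forward` are made up for this illustration and are not part of Keras; the sketch only mirrors the arithmetic, not how Keras implements the layer.

```python
import numpy as np

def relu(x):
    # Element-wise activation: negative entries become 0.
    return np.maximum(x, 0.0)

def dense_forward(inputs, kernel, bias=None, activation=None):
    """Hypothetical helper mirroring output = activation(dot(input, kernel) + bias)."""
    output = np.dot(inputs, kernel)
    if bias is not None:          # plays the role of use_bias=True
        output = output + bias
    if activation is not None:
        output = activation(output)
    return output

# Two samples with three features each, mapped to two units.
inputs = np.array([[1.0, 2.0, 3.0],
                   [0.0, -1.0, 1.0]])
kernel = np.array([[1.0, -1.0],
                   [0.0, 1.0],
                   [1.0, 0.0]])
bias = np.array([0.5, -0.5])

dense_output = dense_forward(inputs, kernel, bias, activation=relu)
print(dense_output)  # [[4.5 0.5], [1.5 0. ]]
```

The kernel has shape (input features, units), so a batch of shape (2, 3) becomes an output of shape (2, 2).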

In [5]:
from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential([
    Dense(units=32, input_shape=(784,)),
    Activation('relu'),
    Dense(units=10),
    Activation('softmax')
])

Layers can also be added to the model one at a time with the `.add()` method:

In [7]:
model = Sequential()
model.add(Dense(units=32, input_shape=(784, )))
model.add(Activation('relu'))

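### Specifying the input shape
The first layer of a Sequential model needs to be told the shape of its input; the layers after it infer the shapes of intermediate data automatically. There are several ways to specify the input shape of the first layer.

+ Pass an `input_shape` keyword argument to the first layer. `input_shape` is a tuple whose entries may be **None**, meaning that **any positive integer** is allowed in that position. The batch size is not included in it.
+ Some 2D layers, such as `Dense`, let you specify the input shape implicitly through the input dimension `input_dim`. Some 3D temporal layers accept the arguments `input_dim` and `input_length` for the same purpose.
+ If you need a fixed `batch_size` for the input (often used for stateful RNNs), pass the `batch_size` argument to the layer. For example, for a batch size of 32 and data of shape (6, 8), pass `batch_size=32` and `input_shape=(6, 8)`; every input batch then has shape (32, 6, 8).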
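
The three options above can be summarized with a small pure-Python sketch. `batch_input_shape` here is a hypothetical helper written for this illustration (it is not a Keras call); it only shows how the arguments combine into one full shape tuple, with the batch axis first.

```python
def batch_input_shape(input_shape=None, input_dim=None, batch_size=None):
    """Hypothetical helper: combine the three arguments into one full shape."""
    if input_shape is None:
        if input_dim is None:
            raise ValueError("give either input_shape or input_dim")
        # In this sketch input_dim=n is shorthand for input_shape=(n,).
        input_shape = (input_dim,)
    # The batch axis comes first; None stands for "any batch size".
    return (batch_size,) + tuple(input_shape)

print(batch_input_shape(input_shape=(784,)))                 # (None, 784)
print(batch_input_shape(input_dim=784))                      # (None, 784)
print(batch_input_shape(input_shape=(6, 8), batch_size=32))  # (32, 6, 8)
```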

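### Compilation
`compile` takes three arguments.
+ An optimizer (`optimizer`): either the name of a predefined optimizer, such as `rmsprop` or `adagrad`, or an instance of the `Optimizer` class.
+ A loss function (`loss`).
+ A list of metrics (`metrics`). For classification problems this list is usually set to `metrics=['accuracy']`.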

In [11]:
# For a multi-class classification problem
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# For a binary classification problem
model.compile(optimizer='rmsprop',
             loss='binary_crossentropy',
             metrics=['accuracy'])

# For a mean squared error regression problem
model.compile(optimizer='rmsprop',
             loss='mse')

# Custom metric
import keras.backend as K

def mean_pred(y_true, y_pred):
    return K.mean(y_pred)

model.compile(optimizer='rmsprop',
             loss='binary_crossentropy',
             metrics=['accuracy', mean_pred])
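
For intuition about what the loss names above measure, here is a NumPy sketch of the usual textbook definitions, plus a NumPy counterpart of the `mean_pred` metric. These are illustrative re-implementations, not the Keras backend code; the clipping constant `EPS` and the sample arrays are assumptions of this sketch.

```python
import numpy as np

EPS = 1e-7  # clipping constant chosen for this sketch to avoid log(0)

def categorical_crossentropy(y_true, y_pred):
    # y_true is one-hot, y_pred holds one probability per class.
    y_pred = np.clip(y_pred, EPS, 1.0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))

def binary_crossentropy(y_true, y_pred):
    # y_true holds 0/1 labels, y_pred the predicted probability of class 1.
    y_pred = np.clip(y_pred, EPS, 1.0 - EPS)
    terms = y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred)
    return float(np.mean(-terms))

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

def mean_pred_np(y_true, y_pred):
    # Like the custom metric: ignores y_true and averages the predictions.
    return float(np.mean(y_pred))

cce = categorical_crossentropy(np.array([[1.0, 0.0], [0.0, 1.0]]),
                               np.array([[0.5, 0.5], [0.5, 0.5]]))
bce = binary_crossentropy(np.array([1.0, 0.0]), np.array([0.5, 0.5]))
err = mse(np.array([1.0, 2.0]), np.array([1.0, 4.0]))
avg = mean_pred_np(np.array([1.0, 0.0]), np.array([0.2, 0.4]))
print(cce, bce, err, avg)
```

With a prediction of 0.5 for every class, both cross-entropies equal ln 2, which is the value a two-class model gets when it knows nothing.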

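### Training
Keras takes NumPy arrays as input data and labels. Models are usually trained with the `fit` function; see the `fit` documentation for details. Some examples follow.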

In [15]:
# For a single-input model with 2 classes (binary classification):
model = Sequential()
model.add(Dense(units=32, activation='relu', input_dim=100))
model.add(Dense(units=1, activation='sigmoid'))
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Generate dummy data
import numpy as np
data = np.random.random((1000, 100))
labels = np.random.randint(2, size=(1000, 1))

# Train the model, iterating on the data in batches of 32 samples
model.fit(data, labels, epochs=10, batch_size=32)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x2a0af727eb8>
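
To see what "batches of 32 samples" means for 1000 samples, the following pure-Python sketch slices the sample indices the way a simple mini-batch loop would. `iterate_minibatches` is a hypothetical helper for this illustration; it does not shuffle, and it is not a description of Keras internals.

```python
def iterate_minibatches(num_samples, batch_size):
    """Yield (start, stop) index pairs that cover all samples once."""
    for start in range(0, num_samples, batch_size):
        yield start, min(start + batch_size, num_samples)

batches = list(iterate_minibatches(num_samples=1000, batch_size=32))
steps_per_epoch = len(batches)
last_start, last_stop = batches[-1]

print(steps_per_epoch)         # 32 steps per epoch
print(last_stop - last_start)  # the last batch holds only 8 samples
```

Because 1000 is not a multiple of 32, the final batch of each epoch is smaller than the others.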

In [20]:
# For a single-input model with 10 classes (categorical classification):

model = Sequential()
model.add(Dense(units=32, activation='relu', input_dim=100))
model.add(Dense(units=10, activation='softmax'))
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

import keras
import numpy as np
data = np.random.random((1000, 100))
labels = np.random.randint(10, size=(1000, 1))

# Convert labels to categorical onehot encoding
one_hot_labels = keras.utils.to_categorical(labels, num_classes=10)

model.fit(data, one_hot_labels, epochs=50, batch_size=32)

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


<keras.callbacks.History at 0x2a0b2de1668>
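
The one-hot conversion used in the cell above can be reproduced in a few lines of NumPy. This is a sketch of the idea only; `one_hot` is a hypothetical helper, not the implementation of `keras.utils.to_categorical`.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer class labels into rows with a single 1."""
    labels = np.asarray(labels, dtype=int).ravel()   # accept shape (n,) or (n, 1)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    encoded[np.arange(labels.size), labels] = 1.0    # set one entry per row
    return encoded

int_labels = np.array([[0], [2], [1]])
encoded = one_hot(int_labels, num_classes=3)
print(encoded)
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]
```

Taking `np.argmax(encoded, axis=1)` recovers the original integer labels.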

## Examples
### Multi-class softmax classification with a multilayer perceptron

In [12]:
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.optimizers import SGD

import numpy as np
import keras  # needed for keras.utils.to_categorical below
import keras.backend as K

def mean_pred(y_true, y_pred):
    return K.mean(y_pred)

x_train = np.random.random((1000, 20))
y_train = keras.utils.to_categorical(np.random.randint(10, size=(1000, 1)), num_classes=10)
x_test = np.random.random((100, 20))
y_test = keras.utils.to_categorical(np.random.randint(10, size=(100, 1)), num_classes=10)

model = Sequential()

model.add(Dense(units=64, activation='relu', input_dim=20))
model.add(Dropout(0.5))
model.add(Dense(units=64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))

sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy',
              optimizer=sgd,
              metrics=['accuracy', mean_pred])

model.fit(x_train, y_train, epochs=10, batch_size=128)
score = model.evaluate(x_test, y_test, batch_size=128)
print(score)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
[2.3003528118133545, 0.14000000059604645, 0.10000001639127731]
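
One way to read the recorded score: the labels are random, so the model can do no better than chance. The NumPy sketch below (random logits are an assumption of this illustration) shows why the `mean_pred` value is 0.1 for any 10-class softmax output, and why a loss near 2.30 matches a uniform guess.

```python
import math
import numpy as np

def softmax(logits):
    # Subtract the row maximum for numerical stability.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
probs = softmax(rng.normal(size=(100, 10)))

# Every row sums to 1, so the mean over all entries is 1/10.
mean_of_predictions = float(probs.mean())

# Cross-entropy of predicting 1/10 for the true class.
uniform_loss = -math.log(1.0 / 10)

print(round(mean_of_predictions, 6))  # 0.1
print(round(uniform_loss, 4))         # 2.3026
```

The recorded accuracy of 0.14 is likewise close to the 0.1 expected from guessing among 10 classes.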


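### A VGG-like convolutional neural network

1. Conv2D
The `padding` argument of Conv2D is either `valid` or `same`; the default is `valid`.

2. Dropout
The Dropout rate is the fraction of units that are dropped: a rate of 0.25 drops 25% of them and keeps 75%.

3. Flatten
The Flatten layer flattens its input, turning each multi-dimensional sample into a one-dimensional vector.

4. model.evaluate()
The returned values correspond to the `metrics` list passed to `compile`, starting from the second value; the first value is the loss. If the return value is assigned to `score`, then `score[0] = loss`, `score[1] = metrics[0]`, `score[2] = metrics[1]`, and so on.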
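
The Dropout note above can be illustrated with a NumPy sketch. It assumes the common "inverted dropout" formulation, in which the surviving units are scaled by 1 / (1 - rate) during training; the helper `dropout` and the test array are made up for this example.

```python
import numpy as np

def dropout(x, rate, rng):
    """Zero a random fraction `rate` of the entries and rescale the rest."""
    keep_prob = 1.0 - rate
    mask = rng.random(x.shape) < keep_prob   # True for units that are kept
    return x * mask / keep_prob

rng = np.random.default_rng(0)
activations = np.ones((1000, 100))
dropped = dropout(activations, rate=0.25, rng=rng)

fraction_zeroed = float(np.mean(dropped == 0.0))
print(fraction_zeroed)   # close to 0.25
print(dropped.mean())    # close to 1.0, because kept units are scaled up
```

The rescaling keeps the expected activation unchanged, so no correction is needed when dropout is switched off at test time.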

In [2]:
import numpy as np
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D
from keras.optimizers import SGD

In [3]:
x_train = np.random.random((100, 100, 100, 3))
y_train = keras.utils.to_categorical(np.random.randint(10, size=(100, 1)), num_classes=10)
x_test = np.random.random((20, 100, 100, 3))
y_test = keras.utils.to_categorical(np.random.randint(10, size=(20, 1)), num_classes=10)

In [7]:
model = Sequential()
# This layer uses 32 convolution filters of size 3x3
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(100, 100, 3)))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

# padding is either 'valid' or 'same'; the default is 'valid'
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# The dropout rate is the fraction of units dropped: 0.25 drops 25% and keeps 75%
model.add(Dropout(0.25))

# Flatten turns each multi-dimensional sample into a one-dimensional vector
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))

sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])

model.fit(x_train, y_train, batch_size=32, epochs=10)
score = model.evaluate(x_test, y_test, batch_size=32)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
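
To follow how the spatial size changes through the network above, the pure-Python sketch below applies the standard output-size formulas for `valid` and `same` padding. The helpers `conv_output_size` and `pool_output_size` are hypothetical names for this illustration, and stride 1 for convolutions and stride equal to the pool size are assumed, matching the defaults used in the cell.

```python
import math

def conv_output_size(size, kernel, padding="valid", stride=1):
    """Output length along one spatial axis of a convolution."""
    if padding == "valid":   # no padding: the kernel must fit inside the input
        return (size - kernel) // stride + 1
    if padding == "same":    # padded so that only the stride shrinks the output
        return math.ceil(size / stride)
    raise ValueError("padding must be 'valid' or 'same'")

def pool_output_size(size, pool):
    """Output length of non-overlapping pooling without padding."""
    return (size - pool) // pool + 1

size = 100
trace = [size]
for step in ["conv", "conv", "pool", "conv", "conv", "pool"]:
    size = conv_output_size(size, 3) if step == "conv" else pool_output_size(size, 2)
    trace.append(size)

flattened_units = size * size * 64   # 64 channels after the last Conv2D
print(trace)            # [100, 98, 96, 48, 46, 44, 22]
print(flattened_units)  # 30976
```

Under these assumptions the Flatten layer hands 30976 values per sample to the `Dense(256)` layer; with `padding='same'` the convolutions would leave the spatial size unchanged.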
