# keras函数式API
* 多输入模型
* 多输出模型：适用于不同的输出具有统计相关性
* 类图模型
  * **Inception系列网络**：依赖于Inception模块，输入被多个并行的卷积分支处理
  * **残差连接**：最早出现于ResNet网络，将前面的表示重新注入下游数据流中


![](https://dpzbhybb2pdcj.cloudfront.net/chollet/Figures/07fig02.jpg)
![](https://dpzbhybb2pdcj.cloudfront.net/chollet/HighResolutionFigures/figure_7-3.png)
![](https://dpzbhybb2pdcj.cloudfront.net/chollet/Figures/07fig04.jpg)
![](https://dpzbhybb2pdcj.cloudfront.net/chollet/HighResolutionFigures/figure_7-5.png)

## 函数式API简介

In [None]:
from keras import Input, layers
input_tensor = Input(shape=(32,))
dense = layers.Dense(32, activation='relu') # 每个层是一个函数
output_tensor = dense(input_tensor)         # 张量 = 层函数（张量）

In [2]:
from keras.models import Sequential, Model
from keras import layers
from keras import Input

seq_model = Sequential()           # Sequentiao模型
seq_model.add(layers.Dense(32, activation='relu', input_shape=(64,)))
seq_model.add(layers.Dense(32, activation='relu'))
seq_model.add(layers.Dense(10, activation='softmax'))
seq_model.summary()

input_tensor = Input(shape=(64,))  # 函数式API模型
x = layers.Dense(32, activation='relu')(input_tensor)
x = layers.Dense(32, activation='relu')(x)
output_tensor = layers.Dense(10, activation='softmax')(x)
api_model = Model(input_tensor, output_tensor)
api_model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_7 (Dense)              (None, 32)                2080      
_________________________________________________________________
dense_8 (Dense)              (None, 32)                1056      
_________________________________________________________________
dense_9 (Dense)              (None, 10)                330       
Total params: 3,466
Trainable params: 3,466
Non-trainable params: 0
_________________________________________________________________
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_2 (InputLayer)         (None, 64)                0         
_________________________________________________________________
dense_10 (Dense)             (None, 32)                2080      
_________________________________________________________________
dense_11

<keras.callbacks.History at 0x2a50cac3c88>



11.550111495971679

In [None]:
api_model.compile(optimizer='rmsprop', loss='categorical_crossentropy') # 编译，相同

import numpy as np
x_train = np.random.random((1000, 64))
y_train = np.random.random((1000, 10))

api_model.fit(x_train, y_train, epochs=10, batch_size=128)               # 训练相同
api_model.evaluate(x_train, y_train)                                     # 评估相同

如果使用不相关的输入、输出构建模型，会得到`RuntimeError`  
这个报错告诉我们，Keras无法从给定输出张量到达`input_1`
~~~python
>>> unrelated_input = Input(shape=(32,))
>>> bad_model = model = Model(unrelated_input, output_tensor)
RuntimeError: Graph disconnected: cannot
obtain value for tensor
Tensor("input_1:0", shape=(?, 64), dtype=float32) at layer "input_1".
~~~

## 多输入模型
* 张量的组合方式：相加`keras.layers.add`，连接`keras.layers.concatenate`等
* 典型的问答模型连个输入：一个自然语言描述问题；一个文本片段（提供用于回答问题的信息）。然后模型生成一个回答，最简单情况下，回答只包含一个单词，通过对预定义的词表做softmax得到。
![](https://dpzbhybb2pdcj.cloudfront.net/chollet/HighResolutionFigures/figure_7-6.png)

In [4]:
from keras.models import Model  # 一个双输入的问答模型
from keras import layers
from keras import Input

text_vocabulary_size = 10000
question_vocabulary_size = 10000
answer_vocabulary_size = 500

text_input = Input(shape=(None,), dtype='int32', name='text')

embedded_text = layers.Embedding(64, text_vocabulary_size)(text_input)
encoded_text = layers.LSTM(32)(embedded_text)

question_input = Input(shape=(None,),dtype='int32', name='question')
embedded_question = layers.Embedding(32, question_vocabulary_size)(question_input)
encoded_question = layers.LSTM(16)(embedded_question)

concatenated = layers.concatenate([encoded_text, encoded_question],axis=-1)
answer = layers.Dense(answer_vocabulary_size,activation='softmax')(concatenated)

model = Model([text_input, question_input], answer) # 指定连个输入和输出
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['acc'])
model.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
text (InputLayer)               (None, None)         0                                            
__________________________________________________________________________________________________
question (InputLayer)           (None, None)         0                                            
__________________________________________________________________________________________________
embedding_3 (Embedding)         (None, None, 10000)  640000      text[0][0]                       
__________________________________________________________________________________________________
embedding_4 (Embedding)         (None, None, 10000)  320000      question[0][0]                   
__________________________________________________________________________________________________
lstm_3 (LS

In [5]:
import numpy as np

num_samples = 1000
max_length = 100

text = np.random.randint(1, text_vocabulary_size,size=(num_samples, max_length))
question = np.random.randint(1, question_vocabulary_size, size=(num_samples, max_length))
answers = np.random.randint(0, 1,size=(num_samples, answer_vocabulary_size))

model.fit([text, question], answers, 
          epochs=10, batch_size=128) # 
model.fit({'text': text, 'question': question}, answers,
          epochs=10, batch_size=128) # 对输入命名之后才能这样做。


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x2a6039000b8>

## 多输出模型
* 为不同的输出指定不同的损失函数，使用不同的损失加权
* 例子中预测：年龄（标量回归）、收入（标量会不）、性别（二分类）
![](https://dpzbhybb2pdcj.cloudfront.net/chollet/Figures/07fig07.jpg)

In [None]:
from keras import layers
from keras import Input
from keras.models import Model
vocabulary_size = 50000
num_income_groups = 10

posts_input = Input(shape=(None,), dtype='int32', name='posts')
embedded_posts = layers.Embedding(256, vocabulary_size)(posts_input)
x = layers.Conv1D(128, 5, activation='relu')(embedded_posts)
x = layers.MaxPooling1D(5)(x)
x = layers.Conv1D(256, 5, activation='relu')(x)
x = layers.Conv1D(256, 5, activation='relu')(x)
x = layers.MaxPooling1D(5)(x)
x = layers.Conv1D(256, 5, activation='relu')(x)
x = layers.Conv1D(256, 5, activation='relu')(x)
x = layers.GlobalMaxPooling1D()(x)
x = layers.Dense(128, activation='relu')(x)

age_prediction = layers.Dense(1, name='age')(x)
income_prediction = layers.Dense(num_income_groups,activation='softmax',name='income')(x)
gender_prediction = layers.Dense(1, activation='sigmoid', name='gender')(x)

model = Model(posts_input,[age_prediction, income_prediction, gender_prediction])

In [None]:
model.compile(optimizer='rmsprop',
              loss=['mse', 'categorical_crossentropy', 'binary_crossentropy'],
              loss_weights=[0.25, 1., 10.])
model.compile(optimizer='rmsprop',                        # 使用不同的损失加权
              loss={'age': 'mse',                         # [3,5]左右
                    'income': 'categorical_crossentropy', # 
                    'gender': 'binary_crossentropy'},     # 0.1左右
              loss_weights={'age': 0.25,
                            'income': 1.,
                            'gender': 10.})


In [None]:
model.fit(posts, [age_targets, income_targets, gender_targets],
          epochs=10, batch_size=64)
model.fit(posts, {'age': age_targets,
                  'income': income_targets,
                  'gender': gender_targets},
          epochs=10, batch_size=64)

## 层组成的有向无环图

### Inception模块
![](https://dpzbhybb2pdcj.cloudfront.net/chollet/Figures/07fig08.jpg)
* Inception模块：分别学习空间特征和主通道特征
* Inception V3位于：`keras.applications.inception_v3`包括在ImageNet上预训练得到的权重
* Xception：代表极端的Inception。空间卷积+逐点卷积

In [None]:
from keras import layers
branch_a = layers.Conv2D(128, 1,activation='relu', strides=2)(x)

branch_b = layers.Conv2D(128, 1, activation='relu')(x)
branch_b = layers.Conv2D(128, 3, activation='relu', strides=2)(branch_b)

branch_c = layers.AveragePooling2D(3, strides=2)(x)
branch_c = layers.Conv2D(128, 3, activation='relu')(branch_c)

branch_d = layers.Conv2D(128, 1, activation='relu')(x)
branch_d = layers.Conv2D(128, 3, activation='relu')(branch_d)
branch_d = layers.Conv2D(128, 3, activation='relu', strides=2)(branch_d)

output = layers.concatenate([branch_a, branch_b, branch_c, branch_d], axis=-1)

### 残差连接
![](https://dpzbhybb2pdcj.cloudfront.net/chollet/HighResolutionFigures/figure_7-5.png)
* 解决了大规模深度学习的两个共性问题：
  * **梯度消失（引入一个纯线性的信息携带轨道）**
  * **表示瓶颈（弥补信息在传播过程中的丢失）**。
* 前面的层不是和后面的层连接在一起，而是**与后面层的激活相加**
  * 形状相同：**恒等残差连接**
  * 形状不同：**线性残差连接**（可以是不带激活的Dense层；对于卷积特征图是不待激活的1*1卷积）

In [None]:
from keras import layers  # 恒等残差连接
x = ...
y = layers.Conv2D(128, 3, activation='relu', padding='same')(x)
y = layers.Conv2D(128, 3, activation='relu', padding='same')(y)
y = layers.Conv2D(128, 3, activation='relu', padding='same')(y)

y = layers.add([y, x]) 

In [None]:
from keras import layers  # 线性残差连接
x = ...
y = layers.Conv2D(128, 3, activation='relu', padding='same')(x)
y = layers.Conv2D(128, 3, activation='relu', padding='same')(y)
y = layers.MaxPooling2D(2, strides=2)(y)

residual = layers.Conv2D(128, 1, strides=2, padding='same')(x)

y = layers.add([y, residual]) 

## 共享层权重

In [None]:
from keras import layers
from keras import Input
from keras.models import Model

lstm = layers.LSTM(32) # 将一个LSTM实例化一次，多次重复使用一个层实例

left_input = Input(shape=(None, 128))
left_output = lstm(left_input)

right_input = Input(shape=(None, 128))
right_output = lstm(right_input)

merged = layers.concatenate([left_output, right_output], axis=-1)
predictions = layers.Dense(1, activation='sigmoid')(merged)

model = Model([left_input, right_input], predictions)
model.fit([left_data, right_data], targets)

## 模型作为层
使用**双摄像头作为输入**的视觉模型可以**感知深度**。Keras中实现连体视觉模型（共享卷积基）如下cell
~~~python
y = model(x)
y1, y2 = model([x1, x2])
~~~

In [None]:
from keras import layers
from keras import applications
from keras import Input

xception_base = applications.Xception(weights=None,include_top=False)

left_input = Input(shape=(250, 250, 3))
right_input = Input(shape=(250, 250, 3))

left_features = xception_base(left_input)
right_input = xception_base(right_input)

merged_features = layers.concatenate([left_features, right_input], axis=-1)

# Keras回调函数和TensorBoard
## 训练过程中将回调函数用于模型

* **模型检查点**：训练中保存模型
* **提前终止**：如果随时不再改善，则中断训练
* **训练过程中动态调节参数**：例如优化器的学习率
* **记录训练指标和验证指标**：将模型表示可视化。keras进度条就是一个回调函数
~~~python
keras.callbacks.ModelCheckpoint
keras.callbacks.EarlyStopping
keras.callbacks.LearningRateScheduler
keras.callbacks.ReduceLROnPlateau
keras.callbacks.CSVLogger
~~~

In [None]:
import keras                       # EarlyStopping 通常和 ModelCheckpoint 结合使用
callbacks_list = [
    keras.callbacks.EarlyStopping( # 如果不再改善，则中段训练
        monitor='acc',             # 监控指标为验证精度
        patience=1,                # 在多余一轮的时间不再改善则中断训练
    ),
    keras.callbacks.ModelCheckpoint( # 每轮过后，保存模型权重
        filepath='my_model.h5',      # 保存路径
        monitor='val_loss',          # 监控指标
        save_best_only=True,         # 如果 val_loss 没有改善则不覆盖文件
    )
    ...
    keras.callbacks.ReduceLROnPlateau( # 降低学习率，回调函数
        monitor='val_loss'             # 监控指标
        factor=0.1,                    # 触发时将学习率除以10
        patience=10,                   # 10轮内没有改善则触发
    )
]

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['acc'])
model.fit(x, y,
          epochs=10,
          batch_size=32,
          callbacks=callbacks_list,
          validation_data=(x_val, y_val))# 要监控验证损失，记得传入验证数据validation_data

* **编写自己的回调函数**：创建`keras.callbacks.Callback`类的子类。实现下面这些方法：  
  on_epoch_begin  &nbsp;&nbsp;每轮开始时被调用  
  on_epoch_end   &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;每轮结束时被调用
  
  on_batch_begin  &nbsp;&nbsp;处理每个批次前被调用  
  on_batch_end   &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;处理每个批次后被调用  
  
  on_train_begin  &nbsp;&nbsp;训练开始时被调用  
  on_train_end   &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;训练结束时被调用  
* **`logs`参数** ：是一个字典，里面包含前一个批量、前一个轮次、或前一个训练的信息。训练指标和验证指标等
* **`self.model`** ：调用回调函数的模型实例
* **`self.validation_data`** ：传入fit的验证数据

In [1]:
'编写自己回调函数实例：每轮结束后，将模型的每层激活保存到硬盘，激活是对验证集的第一个样本计算得到'
import keras
import numpy as np

class ActivationLogger(keras.callbacks.Callback):

    def set_model(self, model):
        self.model = model  # 模型实例
        layer_outputs = [layer.output for layer in model.layers]
        self.activations_model = keras.models.Model(model.input,layer_outputs) # 新的模型实例

    def on_epoch_end(self, epoch, logs=None):
        if self.validation_data is None:
            raise RuntimeError('Requires validation_data.')

        validation_sample = self.validation_data[0][0:1] # 验证集的第一个sample
        activations = self.activations_model.predict(validation_sample)
        f = open('activations_at_epoch_' + str(epoch) + '.npz', 'w')
        np.savez(f, activations)
        f.close()

'编写自己回调函数实例：每轮结束后，将模型的每层激活保存到硬盘，激活是对验证集的第一个样本计算得到'

Using TensorFlow backend.


## TensorBoard简介

`TensorBoard` 是一个内置于TensorFlow中的基于浏览器的可视化工具。  
**注意**:只有当keras使用tensorflow后端时，这一方法才能用于keras模型。

In [1]:
import keras               # IMDB 文本分类，使用TensorBoard可视化
import numpy as np
from keras import layers
from keras.datasets import imdb
from keras.preprocessing import sequence

max_features = 2000
max_len = 500

np_load_old = np.load
np.load = lambda *a,**k: np_load_old(*a, allow_pickle=True, **k)
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
np.load = np_load_old

x_train = sequence.pad_sequences(x_train, maxlen=max_len)
x_test = sequence.pad_sequences(x_test, maxlen=max_len)

model = keras.models.Sequential()
model.add(layers.Embedding(max_features, 128,input_length=max_len,name='embed'))
model.add(layers.Conv1D(32, 7, activation='relu'))
model.add(layers.MaxPooling1D(5))
model.add(layers.Conv1D(32, 7, activation='relu'))
model.add(layers.GlobalMaxPooling1D())
model.add(layers.Dense(1))
model.summary()
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['acc'])

Using TensorFlow backend.


Instructions for updating:
Colocations handled automatically by placer.
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embed (Embedding)            (None, 500, 128)          256000    
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 494, 32)           28704     
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 98, 32)            0         
_________________________________________________________________
conv1d_2 (Conv1D)            (None, 92, 32)            7200      
_________________________________________________________________
global_max_pooling1d_1 (Glob (None, 32)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 33        
Total params: 291,937
Trainable params: 291,937
Non-trainable params: 

In [6]:
from keras.utils import plot_model # pydot  pydot-ng  graphviz 库
plot_model(model,show_shapes=True, to_file='model.png')

In [2]:
import tensorflow as tf

config = tf.ConfigProto(allow_soft_placement=True)
config.gpu_options.allocator_type = 'BFC'
config.gpu_options.per_process_gpu_memory_fraction = 0.80
callbacks = [ 
    keras.callbacks.TensorBoard(
        log_dir=r'my_log_dir',  # TensorBoard 日志文件写入的位置
        histogram_freq=1,         # 每一轮之后记录激活直方图
        embeddings_freq=1,        # 每一轮之后记录嵌入数据 
        #embeddings_data = np.arange(0, max_len).reshape((1, max_len)),
        #embeddings_data = x_train[:100],
        #embeddings_data = x_train,
        #embeddings_data = np.arange(0, max_len).reshape((1, max_len)),
        embeddings_data = x_train[:1],
    )
]
history = model.fit(x_train, y_train,
                    epochs=3,
                    batch_size=128,
                    validation_split=0.2,
                    callbacks=callbacks)

Instructions for updating:
Use tf.cast instead.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.
Train on 20000 samples, validate on 5000 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3


# 让模型性能发挥到极致
让模型从“具有不错的性能”上升到“新能卓越且能够赢得机器学习竞赛”

## 高级架构模式

### 批标准化
* 三种重要的**设计模式**：残差连接；标准化；深度可分离卷积；
* **标准化**：数据输入模型之前做标准化。假设数据服从正态分布（高斯分布），确保分布中心为0，同时缩放到方差为1
* **批标准化**：即是在训练过程中，数据的均值和方差随时间变化，也可以适应性的将数据标准化，有助于梯度的传播。对于有些特别深的网络，只有包含多个`BatchNormalization`层时才能进行训练。例如keras中的许多高级CNN（ResNet50\Inception V3\Xception）
* **`BatchNormalization`层通常在卷积层或密集连接层之后使用**  
  conv_model.add(layers.Conv2D(32, 3, activation='relu'))  
  conv_model.add(layers.BatchNormalization())  

  dense_model.add(layers.Dense(32, activation='relu'))  
  dense_model.add(layers.BatchNormalization())		  
* **`BatchNormalization`层接受一个参数axis**，表示对那个特征轴做标准化，默认是-1，即输入张量的最后一个轴。  
  对于Dense，conv1D，RNN层，`data_famat`为`channels_last`的Conv2D层，默认值正确  
  （`data_famat`为`channels_last`的Conv2D层，axis应设为1）  
* ***批再标准化***：是对普通批标准化的最新改进
* ***自标准化神经网络***：使用***特殊的激活函数（selu）和特殊的初始化器（lecun_normal）***，让数据通过任何Dnese层之后保持数据的标准化。这种方案虽然有趣，但目前仅限于密集连接网络，其有效性尚未得到大规模重复。
### 深度可分离卷积
* 可以代替Conv2D，让模型更轻量，任务性能更高
* 深度可分离卷积：`SeparableConv2D`层；逐通道卷积+逐点卷积(**如下图：**)
* 将空间特征学习，通道特征学习分开。**如果空间位置高度相关，不同通道相互独立。那么这样做很有意义**
* 深度可分离卷积是Xception的基础
![](https://dpzbhybb2pdcj.cloudfront.net/chollet/Figures/07fig16_alt.jpg)

In [None]:
from keras.models import Sequential, Model # 构建一个情况的深度可分离卷积，用于图像多分类任务（softmax）
from keras import layers

height = 64
width = 64
channels = 3
num_classes = 10

model = Sequential()
model.add(layers.SeparableConv2D(32, 3,activation='relu',input_shape=(height, width, channels,)))
model.add(layers.SeparableConv2D(64, 3, activation='relu'))
model.add(layers.MaxPooling2D(2))

model.add(layers.SeparableConv2D(64, 3, activation='relu'))
model.add(layers.SeparableConv2D(128, 3, activation='relu'))
model.add(layers.MaxPooling2D(2))

model.add(layers.SeparableConv2D(64, 3, activation='relu'))
model.add(layers.SeparableConv2D(128, 3, activation='relu'))
model.add(layers.GlobalAveragePooling2D())

model.add(layers.Dense(32, activation='relu'))
model.add(layers.Dense(num_classes, activation='softmax'))

model.compile(optimizer='rmsprop', loss='categorical_crossentropy')

## 超参数优化
* 整天调节参数的工作，最好留个机器去做。
* 制定一个原则，系统性的自动碳素可能的决策空间。
* 可供选择的技术：贝叶斯优化、遗传算法、简单随机搜索
* 挑战性：  
  1、**计算反馈信号的代价可能很高**，它需要在数据及上创建一个新模型并从头开始训练  
  2、**超参数空间由许多离散的决定组成，既不是连续的，也不是可微的。**通常不能再超参数空间使用梯度方法。  
* 方法：  
  1、通常情况下，**随机搜索**（随机选择需要评估的超参数，并重复这一过程）就是最好的方案，也是最简单的方案。  
  2、**`Hyperopt`**是一个用于超参数优化的Python库，其内部使用`Parzen`估计器的树来预测那组超参数可能会得到很好的效果。另一个叫做**`Hyperas`的库将`Hyperopt`与`Keras`模型结合在一起**。

## 小结

# 本章总结