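This article covers the main aspects of model training: model construction, loss, optimizers, metrics, regularization, learning rate, activation functions, epochs, parameter initialization, and hyperparameter search.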

In [1]:
import tensorflow as tf
from tensorflow import keras
import sklearn
import numpy as np
import pandas
import matplotlib as mpl
print(tf.__version__)

2.4.1



## 5. Hyperparameter search

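The flexibility of neural networks is also one of their main drawbacks: there are many hyperparameters to tune. Not only can you use any imaginable network architecture, but even in a simple MLP you can change the number of layers, the number of neurons per layer, the type of activation function used in each layer, the weight initialization logic, and more.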

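One option is simply to try many combinations of hyperparameters and see which one works best on the validation set (or to use K-fold cross-validation). For example, we can use GridSearchCV or RandomizedSearchCV to explore the hyperparameter space, as in Chapter 2. To do so, we need to wrap the Keras model in an object that mimics a regular Scikit-Learn regressor.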
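To make the idea concrete, here is a minimal, library-free sketch of random search. The `toy_validation_loss` function is a hypothetical stand-in for "train a model and measure its validation loss"; its formula is invented for illustration and has nothing to do with the housing data used later.

```python
import math
import random


def toy_validation_loss(n_hidden, n_neurons, learning_rate):
    # Hypothetical stand-in for training a model and returning its validation loss.
    # The formula is made up: it prefers 2 hidden layers, about 20 neurons,
    # and a learning rate near 1e-3.
    return ((n_hidden - 2) ** 2
            + ((n_neurons - 20) / 10) ** 2
            + (math.log10(learning_rate) + 3) ** 2)


def random_search(n_iter, seed=42):
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_iter):
        params = {
            "n_hidden": rng.choice([0, 1, 2, 3]),
            "n_neurons": rng.randrange(10, 30),
            # Sample the learning rate uniformly on a log scale.
            "learning_rate": 10 ** rng.uniform(math.log10(3e-4), math.log10(3e-3)),
        }
        loss = toy_validation_loss(**params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


best_params, best_loss = random_search(n_iter=50)
print(best_params, best_loss)
```

Each iteration draws one random combination and keeps the best one seen so far. RandomizedSearchCV follows the same pattern, but scores each combination with cross-validation instead of a toy function.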

Below we walk through how to run a hyperparameter search in TensorFlow.


We first build a basic model to use in the hyperparameter search that follows. This time we use the California housing dataset.


In [34]:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

housing = fetch_california_housing()

X_train_full, X_test, y_train_full, y_test = train_test_split(housing.data, housing.target, random_state=42)
X_train, X_valid, y_train, y_valid = train_test_split(X_train_full, y_train_full, random_state=42)

scaler = StandardScaler()
x_train = scaler.fit_transform(X_train)
x_valid = scaler.transform(X_valid)
x_test = scaler.transform(X_test)
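The cell above fits the scaler on the training set only and reuses those statistics for the validation and test sets. Below is a minimal pure-Python sketch of that idea, using made-up one-feature data and a hypothetical `fit_standardizer` helper (neither comes from the original notebook).

```python
import math


def fit_standardizer(values):
    # Learn the mean and standard deviation from the training values only.
    mean = sum(values) / len(values)
    # Population variance (divide by n), which is what StandardScaler uses.
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(var)

    def transform(new_values):
        # Apply the training statistics to any data set.
        return [(v - mean) / std for v in new_values]

    return transform


train = [1.0, 2.0, 3.0]   # made-up training feature
valid = [2.0, 4.0]        # made-up validation feature

transform = fit_standardizer(train)
train_scaled = transform(train)
valid_scaled = transform(valid)
print(train_scaled)
print(valid_scaled)
```

The validation values are scaled with the training mean and standard deviation, so they are not guaranteed to have zero mean or unit variance themselves.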




To search with GridSearchCV or RandomizedSearchCV, we need to wrap the Keras model in an object that mimics a regular Scikit-Learn regressor. The first step is to create a function that builds and compiles a Keras model, given a set of hyperparameters:

In [35]:
def build_model(n_hidden=1, n_neurons=10, learning_rate=0.0001, input_shape=[8]):
    model = keras.models.Sequential()
    model.add(keras.layers.InputLayer(input_shape=input_shape))
    for _ in range(n_hidden):
        model.add(keras.layers.Dense(n_neurons, activation='relu'))
    model.add(keras.layers.Dense(1))
    # `learning_rate` is the current argument name; `lr` is a deprecated alias
    optimizer = keras.optimizers.SGD(learning_rate=learning_rate)
    model.compile(loss='mse', optimizer=optimizer, metrics=['mse'])
    return model
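As a rough guide to how `n_hidden` and `n_neurons` change the size of the model, the sketch below counts the trainable parameters of the MLP that `build_model()` describes. `count_mlp_params` is a hypothetical helper introduced here for illustration. It assumes that each Dense layer contributes (inputs × units) weights plus one bias per unit.

```python
def count_mlp_params(n_hidden=1, n_neurons=10, n_inputs=8):
    total = 0
    fan_in = n_inputs
    for _ in range(n_hidden):
        # Hidden Dense layer: weight matrix plus one bias per neuron.
        total += fan_in * n_neurons + n_neurons
        fan_in = n_neurons
    # Output Dense layer with a single unit.
    total += fan_in * 1 + 1
    return total


print(count_mlp_params())                           # 101
print(count_mlp_params(n_hidden=0))                 # 9
print(count_mlp_params(n_hidden=3, n_neurons=28))   # 1905
```

With `n_hidden=0` the model reduces to a linear regression on the 8 input features.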



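Let's first look at what happens when we train the model just once.

We wrap build_model() in a KerasRegressor, a thin wrapper around the Keras model. Since we do not specify any hyperparameters when creating it, it uses the defaults defined in build_model(). We can now use this object like a regular Scikit-Learn regressor: train it with its fit() method, evaluate it with its score() method, and make predictions with its predict() method.

Any extra arguments passed to fit() are forwarded to the underlying Keras model. Note also that the score is the negative of the MSE, because Scikit-Learn expects a score rather than a loss (that is, higher is better).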

In [39]:

keras_reg = keras.wrappers.scikit_learn.KerasRegressor(build_model)
keras_reg.fit(x_train, y_train, epochs=5,
             validation_data=(x_valid, y_valid),
             callbacks=[keras.callbacks.EarlyStopping(patience=5)])
mse_test = keras_reg.score(x_test, y_test)
y_pred = keras_reg.predict(x_test[:5])
print(mse_test)
print(y_pred)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
-2.56375789642334
[1.3969504  0.6476435  0.48683828 1.5239983  0.50394875]
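The sign of the score printed above can be checked by hand. The sketch below uses made-up targets and predictions (not values from the housing data) and a hypothetical helper `neg_mse_score` to illustrate the convention: the score is minus the mean squared error, so values closer to zero are better.

```python
def mse(y_true, y_pred):
    # Mean of the squared differences between targets and predictions.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def neg_mse_score(y_true, y_pred):
    # Score convention: higher is better, so the loss is negated.
    return -mse(y_true, y_pred)


y_true = [1.0, 2.0, 3.0]
good_pred = [1.0, 2.0, 2.0]   # squared errors 0, 0, 1
bad_pred = [3.0, 2.0, 1.0]    # squared errors 4, 0, 4

good_score = neg_mse_score(y_true, good_pred)
bad_score = neg_mse_score(y_true, bad_pred)
print(good_score, bad_score)
```

The better predictions get the higher (less negative) score, which is why a hyperparameter search can simply maximize it.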


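Now let's start the hyperparameter search.

We don't want to train and evaluate a single model like this. Instead, we want to train hundreds of variants and see which one performs best on the validation set. Because there are many hyperparameters, random search is preferable to grid search. Let's explore the number of hidden layers, the number of neurons, and the learning rate: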


In [51]:
from scipy.stats import reciprocal
from sklearn.model_selection import RandomizedSearchCV

param_distribs = {
    "n_hidden": [0, 1, 2, 3],
    "n_neurons": np.arange(10, 30),
    "learning_rate": reciprocal(3e-4, 3e-3),
}

rnd_search_cv = RandomizedSearchCV(keras_reg, param_distribs, n_iter=10, cv=3, verbose=2)
rnd_search_cv.fit(x_train, y_train, epochs=5,
                  validation_data=(x_valid, y_valid),
                  callbacks=[keras.callbacks.EarlyStopping(patience=3)])

Fitting 3 folds for each of 10 candidates, totalling 30 fits
[CV] learning_rate=0.0012225396848063632, n_hidden=3, n_neurons=28 ...


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
[CV]  learning_rate=0.0012225396848063632, n_hidden=3, n_neurons=28, total=   5.6s
[CV] learning_rate=0.0012225396848063632, n_hidden=3, n_neurons=28 ...
Epoch 1/5


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    5.7s remaining:    0.0s


Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
[CV]  learning_rate=0.0012225396848063632, n_hidden=3, n_neurons=28, total=   3.9s
[CV] learning_rate=0.0012225396848063632, n_hidden=3, n_neurons=28 ...
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
[CV]  learning_rate=0.0012225396848063632, n_hidden=3, n_neurons=28, total=   3.4s
[CV] learning_rate=0.0006409560173237001, n_hidden=3, n_neurons=10 ...
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
[CV]  learning_rate=0.0006409560173237001, n_hidden=3, n_neurons=10, total=   5.7s
[CV] learning_rate=0.0006409560173237001, n_hidden=3, n_neurons=10 ...
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
[CV]  learning_rate=0.0006409560173237001, n_hidden=3, n_neurons=10, total=   3.3s
[CV] learning_rate=0.0006409560173237001, n_hidden=3, n_neurons=10 ...
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
[CV]  learning_rate=0.0006409560173237001, n_hidden=3, n_neurons=10, total=   3.3s
[CV] learning_rate=0.001601132636860792, n_hidden=1, n_neurons=19 ....


[CV]  learning_rate=0.0007636664106224726, n_hidden=2, n_neurons=10, total=   3.0s
[CV] learning_rate=0.0007636664106224726, n_hidden=2, n_neurons=10 ...
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
[CV]  learning_rate=0.0007636664106224726, n_hidden=2, n_neurons=10, total=   3.5s
[CV] learning_rate=0.00042383295739553174, n_hidden=1, n_neurons=19 ..
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
[CV]  learning_rate=0.00042383295739553174, n_hidden=1, n_neurons=19, total=   3.6s
[CV] learning_rate=0.00042383295739553174, n_hidden=1, n_neurons=19 ..
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
[CV]  learning_rate=0.00042383295739553174, n_hidden=1, n_neurons=19, total=   3.3s
[CV] learning_rate=0.00042383295739553174, n_hidden=1, n_neurons=19 ..
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
[CV]  learning_rate=0.00042383295739553174, n_hidden=1, n_neurons=19, total=   3.0s
[CV] learning_rate=0.0007377729539282017, n_hidden=0, n_neurons=27 ...
Epoch 1/5
Epoch 2/5
Epoch 3

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
[CV]  learning_rate=0.0004812827529905966, n_hidden=2, n_neurons=13, total=   3.1s


[Parallel(n_jobs=1)]: Done  30 out of  30 | elapsed:  1.7min finished


RuntimeError: Cannot clone object <tensorflow.python.keras.wrappers.scikit_learn.KerasRegressor object at 0x7ff04fe68070>, as the constructor either does not set or modifies parameter learning_rate
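The RuntimeError above is raised after all 30 fits have finished, when Scikit-Learn clones the wrapped estimator to refit it with the best parameters. A commonly reported workaround, not verified here on this exact setup, is to pass plain Python lists instead of NumPy arrays or SciPy distribution objects, so that every sampled value is a built-in `int` or `float`. The sketch below builds such a search space using only the standard library. `sample_log_uniform` is a hypothetical helper that stands in for `reciprocal(3e-4, 3e-3)`.

```python
import math
import random


def sample_log_uniform(low, high, size, seed=42):
    # Draw `size` values whose logarithm is uniform between log(low) and log(high).
    rng = random.Random(seed)
    log_low, log_high = math.log(low), math.log(high)
    return [math.exp(rng.uniform(log_low, log_high)) for _ in range(size)]


param_distribs = {
    "n_hidden": [0, 1, 2, 3],
    "n_neurons": list(range(10, 30)),                            # built-in ints
    "learning_rate": sample_log_uniform(3e-4, 3e-3, size=1000),  # built-in floats
}

print(len(param_distribs["learning_rate"]))
print(type(param_distribs["n_neurons"][0]), type(param_distribs["learning_rate"][0]))
```

With NumPy and SciPy, the equivalent conversions would be `np.arange(10, 30).tolist()` and `reciprocal(3e-4, 3e-3).rvs(1000).tolist()`. When a parameter is given as a list, RandomizedSearchCV samples uniformly from its entries.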

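The search may last hours, depending on the hardware, the size of the dataset, the complexity of the model, and the values of n_iter and cv. When it finishes, you can access the best parameters found, the best score, and the trained Keras model: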

In [50]:
print(rnd_search_cv.best_params_)
print(rnd_search_cv.best_score_)
model = rnd_search_cv.best_estimator_.model

{'learning_rate': 0.011135720468105712, 'n_hidden': 2, 'n_neurons': 93}
-0.8941203554471334


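AttributeError: 'RandomizedSearchCV' object has no attribute 'best_estimator_'

This AttributeError follows from the RuntimeError in the search cell: RandomizedSearchCV sets best_params_ and best_score_ before it clones and refits the best estimator, so when the clone fails, best_estimator_ is never created. Note also that the printed values (n_neurons=93, learning_rate of about 0.011) lie outside the ranges in param_distribs, so they appear to come from an earlier run with a wider search space.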

The complete code is as follows:

In [None]:
import numpy as np
from tensorflow import keras
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

housing = fetch_california_housing()

X_train_full, X_test, y_train_full, y_test = train_test_split(housing.data, housing.target, random_state=42)
X_train, X_valid, y_train, y_valid = train_test_split(X_train_full, y_train_full, random_state=42)

scaler = StandardScaler()
x_train = scaler.fit_transform(X_train)
x_valid = scaler.transform(X_valid)
x_test = scaler.transform(X_test) 

def build_model(n_hidden=1, n_neurons=10, learning_rate=0.0001, input_shape=[8]):
    model = keras.models.Sequential()
    model.add(keras.layers.InputLayer(input_shape=input_shape))
    for _ in range(n_hidden):
        model.add(keras.layers.Dense(n_neurons, activation='relu'))
    model.add(keras.layers.Dense(1))
    # `learning_rate` is the current argument name; `lr` is a deprecated alias
    optimizer = keras.optimizers.SGD(learning_rate=learning_rate)
    model.compile(loss='mse', optimizer=optimizer, metrics=['mse'])
    return model

from scipy.stats import reciprocal
from sklearn.model_selection import RandomizedSearchCV

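# Plain Python lists rather than a NumPy array and a SciPy distribution: a
# commonly reported workaround for the "Cannot clone object" RuntimeError
# shown earlier (assumed, not re-verified here, to apply to this setup).
param_distribs = {
    "n_hidden": [0, 1, 2, 3],
    "n_neurons": np.arange(10, 30).tolist(),
    "learning_rate": reciprocal(3e-4, 3e-3).rvs(1000).tolist(),
}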

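# Wrap build_model() so that Scikit-Learn can treat it as a regressor
keras_reg = keras.wrappers.scikit_learn.KerasRegressor(build_model)

rnd_search_cv = RandomizedSearchCV(keras_reg, param_distribs, n_iter=10, cv=3, verbose=2)
# Fit on the scaled features with y_train as the target; validate on the scaled validation set
rnd_search_cv.fit(x_train, y_train, epochs=5,
                  validation_data=(x_valid, y_valid),
                  callbacks=[keras.callbacks.EarlyStopping(patience=3)])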