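Worked examples of artificial neural networks (ANNs). This notebook covers two applications of ANNs: regression and classification. It first builds a basic regression model, then adds early stopping to improve the model's performance, and finally presents a basic model for classification.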

# **Loading the data and packages**

The cells below load the packages we need. The example dataset 'fuel' is stored at '../input/dl-course-data/fuel.csv'.

In [None]:
# Setup plotting
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
# Set Matplotlib defaults
plt.rc('figure', autolayout=True)
plt.rc('axes', labelweight='bold', labelsize='large',
       titleweight='bold', titlesize=18, titlepad=10)
plt.rc('animation', html='html5')

In [None]:
import pandas as pd
import numpy as np

from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras import callbacks

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.compose import make_column_transformer, make_column_selector
from sklearn.model_selection import GroupShuffleSplit

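# **ANN for regression**

We use the fuel data as the example. First load the data and split it into features and target; the column 'FE' is the target. The features then go through standard preprocessing: scaling the numeric columns and one-hot encoding the categorical ones.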

In [None]:
fuel = pd.read_csv('../input/dl-course-data/fuel.csv')

X = fuel.copy()
# Remove target
y = X.pop('FE')

preprocessor = make_column_transformer(
    (StandardScaler(),
     make_column_selector(dtype_include=np.number)),
    (OneHotEncoder(sparse=False),
     make_column_selector(dtype_include=object)),
)

X = preprocessor.fit_transform(X)
y = np.log(y) # log transform target instead of standardizing

input_shape = [X.shape[1]]
print("Input shape: {}".format(input_shape))
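
Because the target is log-transformed, the model's predictions come out on the log scale and have to be mapped back with the exponential before they can be read as fuel-economy values. The sketch below shows the round trip using only the standard library; the numbers in `fe_values` are made up for illustration and are not taken from the fuel dataset.

```python
import math

# Hypothetical fuel-economy values, chosen only for illustration
fe_values = [20.0, 35.5, 50.0]

# Forward transform: this is the scale the model is trained to predict
log_targets = [math.log(v) for v in fe_values]

# Inverse transform: apply to model outputs to return to the original units
recovered = [math.exp(t) for t in log_targets]

for original, back in zip(fe_values, recovered):
    print(f"{original:.1f} -> log -> exp -> {back:.1f}")
```

With NumPy arrays the same inversion would be `np.exp(model.predict(X))`.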

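Building a basic ANN model takes a few steps:
- Initialize the model
- Add layers to the model
- Compile the model
- Train the model on the data
- Predict on new data with the model

The cell below initializes the model and builds three hidden layers followed by an output layer.

The three hidden layers have 128, 128, and 64 units. The output is a single real number, so the output layer has 1 unit.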

In [None]:
model = keras.Sequential([
    layers.Dense(128, activation='relu', input_shape=input_shape),
    layers.Dense(128, activation='relu'),    
    layers.Dense(64, activation='relu'),
    layers.Dense(1),
])
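
To get a feel for the size of this network, the sketch below counts the trainable parameters of a stack of fully connected layers: each layer has one weight per input-output pair plus one bias per unit. The helper `dense_param_count` and the input width of 50 are assumptions made for illustration; the real width is whatever `X.shape[1]` prints above.

```python
def dense_param_count(n_inputs, layer_units):
    """Count weights and biases in a stack of fully connected layers."""
    total = 0
    fan_in = n_inputs
    for units in layer_units:
        # One weight per (input, unit) pair plus one bias per unit
        total += fan_in * units + units
        fan_in = units
    return total

n_features = 50  # hypothetical input width; use X.shape[1] for the real data
total_params = dense_param_count(n_features, [128, 128, 64, 1])
print(total_params)
```

The figure can be compared against the total reported by `model.summary()`.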

Next, compile the model, that is, specify the optimizer and the loss function.

In [None]:
model.compile(
    optimizer='adam',
    loss='mae'
)
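
The 'mae' loss is the mean absolute error: the average of the absolute differences between targets and predictions. A minimal hand computation, with made-up values, looks like this:

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up targets and predictions, for illustration only
y_true = [3.0, 3.5, 4.0]
y_pred = [2.5, 3.5, 5.0]

mae = mean_absolute_error(y_true, y_pred)
print(mae)
```

Since the target here is log-transformed, the loss values reported during training are absolute errors on the log scale, not in the original units.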

Once the model is built, train it on the data. Training searches for the weights that minimize the loss function on this data. Here we set epochs=200 and batch_size=128.

In [None]:
history = model.fit(
    X, y,
    batch_size=128,
    epochs=200
)
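
The batch size and the number of epochs together determine how many weight updates training performs: each epoch makes one pass over the data in batches, and each batch yields one update. The sketch below works this out for a hypothetical dataset of 1000 rows (the real row count is `X.shape[0]`), assuming the final, smaller batch is kept rather than dropped.

```python
import math

def steps_per_epoch(n_samples, batch_size):
    """Number of batches in one pass over the data, keeping the last partial batch."""
    return math.ceil(n_samples / batch_size)

n_samples = 1000  # hypothetical row count; use X.shape[0] for the real data
steps = steps_per_epoch(n_samples, batch_size=128)
total_updates = steps * 200  # 200 epochs, as in the fit call above

print(steps, total_updates)
```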

Finally, check the model's performance by plotting the loss curve.

In [None]:
history_df = pd.DataFrame(history.history)

# Start the plot at epoch 5. You can change this to get a different view.
history_df.loc[5:, ['loss']].plot();

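# **Improved regression model**

The section above introduced a basic ANN regression model. This section introduces two basic ways to improve it:
- a train/validation split
- an early stopping callback

The example below uses the 'spotify' dataset.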

In [None]:
spotify = pd.read_csv('../input/dl-course-data/spotify.csv')

X = spotify.copy().dropna()
y = X.pop('track_popularity')
artists = X['track_artist']

features_num = ['danceability', 'energy', 'key', 'loudness', 'mode',
                'speechiness', 'acousticness', 'instrumentalness',
                'liveness', 'valence', 'tempo', 'duration_ms']
features_cat = ['playlist_genre']

preprocessor = make_column_transformer(
    (StandardScaler(), features_num),
    (OneHotEncoder(), features_cat),
)

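After defining the standard preprocessing, split the data into train (75%) and validation (25%) sets. The split is made by artist, so the 75%/25% proportions apply to artists and the row proportions are only approximate.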

In [None]:
# We'll do a "grouped" split to keep all of an artist's songs in one
# split or the other. This is to help prevent signal leakage.
def group_split(X, y, group, train_size=0.75):
    splitter = GroupShuffleSplit(train_size=train_size)
    train, test = next(splitter.split(X, y, groups=group))
    return (X.iloc[train], X.iloc[test], y.iloc[train], y.iloc[test])

X_train, X_valid, y_train, y_valid = group_split(X, y, artists)

X_train = preprocessor.fit_transform(X_train)
X_valid = preprocessor.transform(X_valid)
y_train = y_train / 100 # popularity is on a scale 0-100, so this rescales to 0-1.
y_valid = y_valid / 100

input_shape = [X_train.shape[1]]
print("Input shape: {}".format(input_shape))
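
The idea behind a grouped split can be shown without scikit-learn. The sketch below is a simplified stand-in for `GroupShuffleSplit`, not its actual implementation: it shuffles the distinct groups, assigns whole groups to one side or the other, and then checks that no group appears on both sides. The function `grouped_split` and the toy artist labels are hypothetical.

```python
import random

def grouped_split(groups, train_size=0.75, seed=0):
    """Return (train_idx, valid_idx) such that no group spans both sides."""
    unique_groups = sorted(set(groups))
    rng = random.Random(seed)
    rng.shuffle(unique_groups)
    # The fraction applies to the number of groups, not the number of rows
    n_train = int(round(train_size * len(unique_groups)))
    train_groups = set(unique_groups[:n_train])
    train_idx = [i for i, g in enumerate(groups) if g in train_groups]
    valid_idx = [i for i, g in enumerate(groups) if g not in train_groups]
    return train_idx, valid_idx

# Toy artist labels, one per song (illustrative only)
artists_demo = ["A", "A", "B", "C", "C", "C", "D", "B"]
train_idx, valid_idx = grouped_split(artists_demo)

train_artists = {artists_demo[i] for i in train_idx}
valid_artists = {artists_demo[i] for i in valid_idx}
print("shared artists:", train_artists & valid_artists)
```

Keeping every song by an artist on one side means the validation loss measures how well the model handles artists it has never seen.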

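Before building the model, define early stopping. The definition below has two main parameters:
- patience: the number of consecutive epochs without improvement after which training stops
- min_delta: a threshold; an epoch whose monitored loss (the validation loss by default) improves on the best value so far by less than this amount counts as no improvement. Training stops only once patience such epochs occur in a row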

In [None]:
# Define early stopping callback
early_stopping = callbacks.EarlyStopping(
    patience=5,
    min_delta=0.001,
    restore_best_weights=True,
)
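
How patience and min_delta interact can be traced by hand on a list of validation losses. The sketch below is a simplified model of the rule, not the Keras source: an epoch counts as an improvement only if it beats the best loss so far by more than min_delta, and training stops after patience non-improving epochs in a row. The function name and the loss values are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience=5, min_delta=0.001):
    """Return (stop_epoch, best_epoch) as 0-based indices into val_losses."""
    best = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # Improvement large enough to count: remember it and reset the counter
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Ran out of epochs without triggering the stop
    return len(val_losses) - 1, best_epoch

# Made-up validation losses: fast progress, then a plateau
val_losses = [0.50, 0.40, 0.35, 0.3495, 0.3493, 0.3496, 0.3494, 0.3492, 0.3491, 0.30]
stop_epoch, best_epoch = early_stopping_epoch(val_losses)
print(stop_epoch, best_epoch)
```

In this trace the later epochs are never reached, even though the last value is lower; that is the trade-off a small patience makes. With restore_best_weights=True, the model ends up with the weights from the best epoch rather than from the epoch where training stopped.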

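With early stopping defined, build the model: initialize a three-layer network (two hidden layers and an output layer), compile it, and train it on the data.

The training step passes in the early_stopping callback defined above.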

In [None]:
model = keras.Sequential([
    layers.Dense(128, activation='relu', input_shape=input_shape),
    layers.Dense(64, activation='relu'),
    layers.Dense(1)
])
model.compile(
    optimizer='adam',
    loss='mae',
)

history = model.fit(
    X_train, y_train,
    validation_data=(X_valid, y_valid),
    batch_size=512,
    epochs=50,
    callbacks=[early_stopping]
)

Finally, examine the model's performance.

In [None]:
history_df = pd.DataFrame(history.history)
history_df.loc[:, ['loss', 'val_loss']].plot()
print("Minimum Validation Loss: {:0.4f}".format(history_df['val_loss'].min()));

# **ANN for classification**

First load the data. Here we use the 'hotel' dataset for the demonstration.

In [None]:
hotel = pd.read_csv('../input/dl-course-data/hotel.csv')

X = hotel.copy()
y = X.pop('is_canceled')

X['arrival_date_month'] = \
    X['arrival_date_month'].map(
        {'January':1, 'February': 2, 'March':3,
         'April':4, 'May':5, 'June':6, 'July':7,
         'August':8, 'September':9, 'October':10,
         'November':11, 'December':12}
    )

features_num = [
    "lead_time", "arrival_date_week_number",
    "arrival_date_day_of_month", "stays_in_weekend_nights",
    "stays_in_week_nights", "adults", "children", "babies",
    "is_repeated_guest", "previous_cancellations",
    "previous_bookings_not_canceled", "required_car_parking_spaces",
    "total_of_special_requests", "adr",
]
features_cat = [
    "hotel", "arrival_date_month", "meal",
    "market_segment", "distribution_channel",
    "reserved_room_type", "deposit_type", "customer_type",
]

transformer_num = make_pipeline(
    SimpleImputer(strategy="constant"), # there are a few missing values
    StandardScaler(),
)
transformer_cat = make_pipeline(
    SimpleImputer(strategy="constant", fill_value="NA"),
    OneHotEncoder(handle_unknown='ignore'),
)

preprocessor = make_column_transformer(
    (transformer_num, features_num),
    (transformer_cat, features_cat),
)

# stratify - make sure classes are evenly represented across splits
X_train, X_valid, y_train, y_valid = \
    train_test_split(X, y, stratify=y, train_size=0.75)

X_train = preprocessor.fit_transform(X_train)
X_valid = preprocessor.transform(X_valid)

input_shape = [X_train.shape[1]]

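A key difference between classification and regression is that in classification the usual performance metrics, such as accuracy, cannot serve as the loss function: SGD needs a loss that changes smoothly with the weights so that its gradient is informative, whereas accuracy changes in jumps. Cross-entropy is therefore used as the loss function for classification.

At the same time, a sigmoid activation on the output layer maps the real-valued output to a probability between 0 and 1; comparing that probability with a threshold (0.5 is the usual choice) gives the discrete class label.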
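
A small hand computation makes both ideas concrete. The sketch below implements the sigmoid and the binary cross-entropy with the standard library only; the raw outputs in `logits` and the labels are made up, and the `eps` clipping is an assumption added to keep the logarithm finite.

```python
import math

def sigmoid(z):
    """Map a real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    """Mean negative log-likelihood of the true labels under p_pred."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

# Made-up raw network outputs and true labels
logits = [-2.0, 0.0, 3.0]
labels = [0, 1, 1]

probs = [sigmoid(z) for z in logits]
loss = binary_cross_entropy(labels, probs)
print(probs, loss)
```

The loss shrinks as the predicted probability for the true class approaches 1, and it does so smoothly, which is what gradient-based training needs.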

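The cell below initializes the model and adds the hidden layers. Unlike the earlier models, it includes two additional kinds of layer:
- BatchNormalization: standardizes the inputs to a layer over each batch, which tends to make training more stable and convergence faster.
- Dropout: randomly sets a fraction of a layer's inputs to zero during training, which adds randomness to the model and helps prevent overfitting.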
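
What these two layers do during training can be sketched on plain Python lists. This is a simplified illustration under stated assumptions, not the Keras implementation: `batch_normalize` standardizes one feature over a batch (the learned scale `gamma` and shift `beta` are left at their initial values), and `dropout` zeroes each value with probability `rate` and rescales the survivors by 1/(1 - rate) so the expected sum is unchanged.

```python
import math
import random

def batch_normalize(values, gamma=1.0, beta=0.0, eps=1e-3):
    """Standardize one feature across a batch, then scale and shift."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]

def dropout(values, rate, rng):
    """Zero each value with probability `rate`; rescale the rest (training mode)."""
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]

# Made-up activations for one feature over a batch of four rows
feature = [10.0, 12.0, 14.0, 16.0]
normalized = batch_normalize(feature)

rng = random.Random(0)  # fixed seed so the run is repeatable
dropped = dropout([1.0] * 10, rate=0.3, rng=rng)

print(normalized)
print(dropped)
```

At prediction time dropout is switched off and batch normalization uses running averages collected during training, which this sketch does not model.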

In [None]:
from tensorflow import keras
from tensorflow.keras import layers

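# Batch-normalize the inputs, then stack two Dense(256) blocks, each followed
# by BatchNormalization and Dropout(0.3), and finish with a sigmoid output unit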
model = keras.Sequential([
    layers.BatchNormalization(input_shape=input_shape),
    layers.Dense(256, activation='relu'),
    layers.BatchNormalization(),
    layers.Dropout(0.3),
    layers.Dense(256, activation='relu'),
    layers.BatchNormalization(),
    layers.Dropout(0.3),
    layers.Dense(1, activation='sigmoid'),
])

Compile the model. Note that the loss function here is binary_crossentropy.

In [None]:
model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['binary_accuracy'],
)

Define early stopping as before and train the model.

In [None]:
early_stopping = keras.callbacks.EarlyStopping(
    patience=5,
    min_delta=0.001,
    restore_best_weights=True,
)
history = model.fit(
    X_train, y_train,
    validation_data=(X_valid, y_valid),
    batch_size=512,
    epochs=200,
    callbacks=[early_stopping],
)

Finally, examine the model's performance.

In [None]:
history_df = pd.DataFrame(history.history)
history_df.loc[:, ['loss', 'val_loss']].plot(title="Cross-entropy")
history_df.loc[:, ['binary_accuracy', 'val_binary_accuracy']].plot(title="Accuracy")
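
The accuracy curve is computed from the sigmoid outputs by thresholding. As a rough sketch of that metric, assuming the default threshold of 0.5 and using made-up probabilities, the calculation is:

```python
def binary_accuracy(y_true, p_pred, threshold=0.5):
    """Fraction of samples whose thresholded probability matches the label."""
    predicted = [1 if p > threshold else 0 for p in p_pred]
    correct = sum(int(y == y_hat) for y, y_hat in zip(y_true, predicted))
    return correct / len(y_true)

# Made-up labels and predicted cancellation probabilities
y_true = [1, 0, 1, 0]
p_pred = [0.9, 0.2, 0.4, 0.6]

acc = binary_accuracy(y_true, p_pred)
print(acc)
```

Two of the four predictions land on the wrong side of the threshold here, which is why accuracy and cross-entropy can tell different stories: the loss also reflects how confident each prediction was.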