**Convolutional Neural Network (CNN)**

A convolutional neural network (CNN) is a neural network whose structure resembles the visual system of humans and animals. It contains one or more convolutional layers, pooling layers, and fully-connected layers.

**Implementing a convolutional neural network with Keras**

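An example implementation of a convolutional neural network is shown below. Its code structure is very similar to the multilayer perceptron of the previous section, with some convolutional and pooling layers added. The network structure used here is not the only possible one: you can add, remove, or adjust the layers and parameters of the CNN to achieve better performance.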

In [1]:
import tensorflow as tf
import numpy as np

class CNN(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.conv1 = tf.keras.layers.Conv2D(
            filters=32,             # number of convolution kernels (filters) in this layer
            kernel_size=[5, 5],     # size of the receptive field
            padding='same',         # padding strategy ('valid' or 'same')
            activation=tf.nn.relu   # activation function
        )
        self.pool1 = tf.keras.layers.MaxPool2D(pool_size=[2, 2], strides=2)
        self.conv2 = tf.keras.layers.Conv2D(
            filters=64,
            kernel_size=[5, 5],
            padding='same',
            activation=tf.nn.relu
        )
        self.pool2 = tf.keras.layers.MaxPool2D(pool_size=[2, 2], strides=2)
        self.flatten = tf.keras.layers.Reshape(target_shape=(7 * 7 * 64,))
        self.dense1 = tf.keras.layers.Dense(units=1024, activation=tf.nn.relu)
        self.dense2 = tf.keras.layers.Dense(units=10)

    def call(self, inputs):
        x = self.conv1(inputs)                  # [batch_size, 28, 28, 32]
        x = self.pool1(x)                       # [batch_size, 14, 14, 32]
        x = self.conv2(x)                       # [batch_size, 14, 14, 64]
        x = self.pool2(x)                       # [batch_size, 7, 7, 64]
        x = self.flatten(x)                     # [batch_size, 7 * 7 * 64]
        x = self.dense1(x)                      # [batch_size, 1024]
        x = self.dense2(x)                      # [batch_size, 10]
        output = tf.nn.softmax(x)
        return output
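
The shape comments in `call` can be checked with a little arithmetic. The sketch below is only an illustration: `conv_output_size` is a hypothetical helper, not part of TensorFlow. It assumes the usual rules that `'same'` padding yields `ceil(size / stride)` and `'valid'` padding yields `floor((size - kernel) / stride) + 1`, and that `MaxPool2D` uses `'valid'` padding by default.

```python
def conv_output_size(size, kernel, stride=1, padding="same"):
    """Hypothetical helper: output length along one spatial axis."""
    if padding == "same":
        # 'same' pads the input so that output = ceil(size / stride)
        return -(-size // stride)
    # 'valid' uses no padding: output = floor((size - kernel) / stride) + 1
    return (size - kernel) // stride + 1


size = 28                                                            # MNIST images are 28 x 28
size = conv_output_size(size, kernel=5, padding="same")              # conv1 keeps 28
size = conv_output_size(size, kernel=2, stride=2, padding="valid")   # pool1 halves to 14
size = conv_output_size(size, kernel=5, padding="same")              # conv2 keeps 14
size = conv_output_size(size, kernel=2, stride=2, padding="valid")   # pool2 halves to 7

flat_units = size * size * 64   # 64 filters in conv2
print(size, flat_units)         # 7 3136
```

This is where the `7 * 7 * 64` target shape of the `Reshape` layer comes from; changing the input size, padding, or pooling would require updating that value accordingly.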

Replacing model = MLP() from the previous section with model = CNN() gives the following training and evaluation code:

In [2]:
class MNISTLoader():
    def __init__(self):
        mnist = tf.keras.datasets.mnist
        (self.train_data, self.train_label), (self.test_data, self.test_label) = mnist.load_data()
        # MNIST images are uint8 by default (values 0-255). The code below normalizes them to floats in [0, 1] and appends a final dimension as the color channel
        self.train_data = np.expand_dims(self.train_data.astype(np.float32) / 255.0, axis=-1)      # [60000, 28, 28, 1]
        self.test_data = np.expand_dims(self.test_data.astype(np.float32) / 255.0, axis=-1)        # [10000, 28, 28, 1]
        self.train_label = self.train_label.astype(np.int32)    # [60000]
        self.test_label = self.test_label.astype(np.int32)      # [10000]
        self.num_train_data, self.num_test_data = self.train_data.shape[0], self.test_data.shape[0]

    def get_batch(self, batch_size):
        # Randomly draw batch_size elements from the training set (sampling with replacement) and return them
        index = np.random.randint(0, self.num_train_data, batch_size)
        return self.train_data[index, :], self.train_label[index]

num_epochs = 5
batch_size = 50
learning_rate = 0.001

model = CNN()
data_loader = MNISTLoader()
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)

num_batches = int(data_loader.num_train_data // batch_size * num_epochs)

for batch_index in range(num_batches):
    X, y = data_loader.get_batch(batch_size)
    with tf.GradientTape() as tape:
        y_pred = model(X)
        loss = tf.keras.losses.sparse_categorical_crossentropy(y_true=y, y_pred=y_pred)
        loss = tf.reduce_mean(loss)
        print("batch %d: loss %f" % (batch_index, loss.numpy()))
    grads = tape.gradient(loss, model.variables)
    optimizer.apply_gradients(grads_and_vars=zip(grads, model.variables))

sparse_categorical_accuracy = tf.keras.metrics.SparseCategoricalAccuracy()
num_batches = int(data_loader.num_test_data // batch_size)

for batch_index in range(num_batches):
    start_index, end_index = batch_index * batch_size, (batch_index + 1) * batch_size
    y_pred = model.predict(data_loader.test_data[start_index: end_index])
    sparse_categorical_accuracy.update_state(y_true=data_loader.test_label[start_index: end_index], y_pred=y_pred)
print("test accuracy: %f" % sparse_categorical_accuracy.result())

Streaming output truncated to the last 5000 lines.
batch 1002: loss 0.021849
batch 1003: loss 0.034118
batch 1004: loss 0.006212
batch 1005: loss 0.036586
batch 1006: loss 0.028067
batch 1007: loss 0.008129
batch 1008: loss 0.174186
batch 1009: loss 0.023669
batch 1010: loss 0.006397
batch 1011: loss 0.026877
batch 1012: loss 0.111043
batch 1013: loss 0.015150
batch 1014: loss 0.006553
batch 1015: loss 0.044868
batch 1016: loss 0.020607
batch 1017: loss 0.008848
batch 1018: loss 0.042595
batch 1019: loss 0.009741
batch 1020: loss 0.082978
batch 1021: loss 0.001592
batch 1022: loss 0.016811
batch 1023: loss 0.007867
batch 1024: loss 0.034346
batch 1025: loss 0.002871
batch 1026: loss 0.028608
batch 1027: loss 0.002519
batch 1028: loss 0.002293
batch 1029: loss 0.008516
batch 1030: loss 0.002301
batch 1031: loss 0.045885
batch 1032: loss 0.056740
batch 1033: loss 0.007229
batch 1034: loss 0.043727
batch 1035: loss 0.012239
batch 1036: loss 0.150810
batch 1037: loss 0.003233
batch 1038: loss 0.005994
...

The output is as follows:

test accuracy: 0.988100
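The accuracy is markedly higher than that of the multilayer perceptron in the previous section. In fact, changing the network structure (for example, adding Dropout layers to prevent overfitting) leaves room for further improvement.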
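
To make the Dropout idea concrete, here is a minimal NumPy sketch of "inverted" dropout. It illustrates the technique only and is not the Keras implementation; the `dropout` function and its `rng` argument are assumptions introduced for this example. In Keras the corresponding layer is `tf.keras.layers.Dropout`, which drops units only when the model is called with `training=True`.

```python
import numpy as np


def dropout(x, rate, rng):
    """Illustrative inverted dropout: zero each unit with probability `rate`
    and rescale the surviving units by 1 / (1 - rate)."""
    keep_prob = 1.0 - rate
    mask = rng.random(x.shape) < keep_prob      # True for units that are kept
    return np.where(mask, x / keep_prob, 0.0)


rng = np.random.default_rng(0)                  # seeded for reproducibility
activations = np.ones((4, 8), dtype=np.float32)
dropped = dropout(activations, rate=0.5, rng=rng)
print(dropped)   # each entry is either 0.0 (dropped) or 2.0 (kept and rescaled)
```

Rescaling the kept units keeps the expected activation unchanged, so no extra scaling is needed at inference time, when dropout is simply switched off.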

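The following example trains a MobileNetV2 network on the five-class tf_flowers dataset (to keep the code short and efficient, this example uses TensorFlow Datasets and tf.data to load and preprocess the data). Setting weights to None initializes the variables randomly instead of loading pretrained weights, and setting classes to 5 matches the five classes of the dataset.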

In [3]:
import tensorflow as tf
import tensorflow_datasets as tfds

num_epoch = 5
batch_size = 50
learning_rate = 0.001

dataset = tfds.load("tf_flowers", split=tfds.Split.TRAIN, as_supervised=True)
dataset = dataset.map(lambda img, label: (tf.image.resize(img, (224, 224)) / 255.0, label)).shuffle(1024).batch(batch_size)
model = tf.keras.applications.MobileNetV2(weights=None, classes=5)
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)

for e in range(num_epoch):
    for images, labels in dataset:
        with tf.GradientTape() as tape:
            labels_pred = model(images, training=True)
            loss = tf.keras.losses.sparse_categorical_crossentropy(y_true=labels, y_pred=labels_pred)
            loss = tf.reduce_mean(loss)
            print("loss %f" % loss.numpy())
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(grads_and_vars=zip(grads, model.trainable_variables))
    print(labels_pred)

Downloading and preparing dataset tf_flowers/3.0.0 (download: 218.21 MiB, generated: Unknown size, total: 218.21 MiB) to /root/tensorflow_datasets/tf_flowers/3.0.0...


Dataset tf_flowers downloaded and prepared to /root/tensorflow_datasets/tf_flowers/3.0.0. Subsequent calls will reuse this data.
loss 1.683380
loss 1.953311
loss 1.662202
loss 1.563612
loss 1.412023
loss 1.898868
loss 1.920251
loss 1.966073
loss 1.556153
loss 1.517604
loss 1.447944
loss 1.783732
loss 1.594101
loss 1.787952
loss 2.038664
loss 1.755512
loss 1.675319
loss 1.868346
loss 1.732924
loss 1.533585
loss 1.368506
loss 1.352351
loss 2.132801
loss 1.783697
loss 1.654644
loss 1.597305
loss 1.586108
loss 1.381024
loss 1.767676
loss 1.400218
loss 1.489383
loss 1.214276
loss 1.544339
loss 1.503307
loss 1.557021
loss 1.464893
loss 1.333449
loss 1.291500
loss 1.398383
loss 1.456021
loss 1.277002
loss 1.394924
loss 1.600682
loss 1.266449
loss 1.386210
loss 1.407745
loss 1.216751
loss 1.222620
loss 1.306808
loss 1.397368
loss 1.549113
loss 1.194493
loss 1.599142
loss 1.424698
loss 1.242841
loss 1.448552
loss 1.333461
loss 1.232947
loss 1.342741
loss 1.242627
loss 1.245229
...

Next, we use TensorFlow to verify the computation shown in the figure above.

Represent the input image, the weight matrix W, and the bias term b from the figure above as the NumPy arrays image, W, and b:

In [4]:
# TensorFlow represents images as a 4-D tensor of shape [number of images, height, width, color channels]
# Here the input tensor image has shape [1, 7, 7, 1]
image = np.array([[
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 2, 1, 0],
    [0, 0, 2, 2, 0, 1, 0],
    [0, 1, 1, 0, 2, 1, 0],
    [0, 0, 2, 1, 1, 0, 0],
    [0, 2, 1, 1, 2, 0, 0],
    [0, 0, 0, 0, 0, 0, 0]
]], dtype=np.float32)
image = np.expand_dims(image, axis=-1)  
W = np.array([[
    [ 0, 0, -1], 
    [ 0, 1, 0 ], 
    [-2, 0, 2 ]
]], dtype=np.float32)
b = np.array([1], dtype=np.float32)

Then build a model with a single convolutional layer, initialized with W and b:

In [5]:
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(
        filters=1,              # number of convolution kernels (filters) in this layer
        kernel_size=[3, 3],     # size of the receptive field
        kernel_initializer=tf.constant_initializer(W),
        bias_initializer=tf.constant_initializer(b)
    )]
)

Finally, feed the image data image into the model and print the output:

In [6]:
output = model(image)
print(tf.squeeze(output))



The program output is:

tf.Tensor(
[[ 6.  5. -2.  1.  2.]
 [ 3.  0.  3.  2. -2.]
 [ 4.  2. -1.  0.  0.]
 [ 2.  1.  2. -1. -3.]
 [ 1.  1.  1.  3.  1.]], shape=(5, 5), dtype=float32)
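
The same numbers can be reproduced without TensorFlow. The sketch below is a plain NumPy cross-check rather than how Keras computes the layer internally: it slides the 3x3 kernel over the 7x7 image with stride 1 and no padding (the `Conv2D` defaults), multiplies element-wise, sums, and adds the bias. As in `Conv2D`, the kernel is not flipped, so strictly speaking this is a cross-correlation.

```python
import numpy as np

image = np.array([
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 2, 1, 0],
    [0, 0, 2, 2, 0, 1, 0],
    [0, 1, 1, 0, 2, 1, 0],
    [0, 0, 2, 1, 1, 0, 0],
    [0, 2, 1, 1, 2, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
], dtype=np.float32)
W = np.array([
    [ 0, 0, -1],
    [ 0, 1,  0],
    [-2, 0,  2],
], dtype=np.float32)
b = 1.0

k = W.shape[0]                           # kernel size (3)
out_size = image.shape[0] - k + 1        # 'valid' padding, stride 1: 7 - 3 + 1 = 5
output = np.zeros((out_size, out_size), dtype=np.float32)
for i in range(out_size):
    for j in range(out_size):
        window = image[i:i + k, j:j + k]         # 3x3 patch under the kernel
        output[i, j] = np.sum(window * W) + b    # weighted sum plus bias
print(output)
```

The printed 5x5 array should match the `tf.Tensor` values shown above, for example 0 + 1 + 0 + 2*2 + 1 = 6 for the top-left entry.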