This notebook follows the NCCU (National Chengchi University) course [Python AI 深度學習達人的第一堂課](http://moocs.nccu.edu.tw/media/23095).

In [1]:
%env KERAS_BACKEND=tensorflow
%matplotlib inline

import numpy as np
import matplotlib.pyplot as plt

env: KERAS_BACKEND=tensorflow


In [2]:
# keras function
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.optimizers import SGD

# keras dataset
from keras.datasets import mnist

# keras utility functions (one-hot encoding)
from keras.utils import np_utils

Using TensorFlow backend.


## Load the MNIST data

In [3]:
(x_train, y_train), (x_test, y_test) = mnist.load_data()
print(x_train.shape)
print(x_test.shape)

(60000, 28, 28)
(10000, 28, 28)


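## Prepare the input format
Each sample is originally a 28 * 28 matrix, but a standard fully connected network only accepts flat input: one vector of length 28 * 28 = 784 per sample. We therefore reshape the data.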

In [4]:
x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)
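The sample counts in the cell above are hard-coded. As a minimal sketch, assuming a small random stand-in array instead of the real MNIST data, `reshape` can infer the first dimension from `-1`, and flattening keeps the pixels in row-major order:

```python
import numpy as np

# Stand-in for MNIST images: 5 fake samples of 28 x 28 pixels (not real data)
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# -1 lets NumPy infer the number of samples, so it need not be hard-coded
flat = fake_images.reshape(-1, 28 * 28)
print(flat.shape)  # (5, 784)

# Row-major flattening: pixel (row r, column c) lands at index r * 28 + c
r, c = 3, 7
print(flat[0, r * 28 + c] == fake_images[0, r, c])  # True
```

The same call would work unchanged on both the training and the test arrays.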

Keep only the training and test samples whose digit is 0 or 1.

In [5]:
x_train_01 = x_train[y_train <= 1]
x_test_01 = x_test[y_test <= 1]
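The selection above relies on NumPy boolean masking. A small sketch with made-up labels and features (hypothetical values, not MNIST) shows what the mask looks like and which rows survive:

```python
import numpy as np

# Tiny stand-in labels and features (hypothetical values, not MNIST)
labels = np.array([5, 0, 4, 1, 9, 1])
features = np.arange(12).reshape(6, 2)  # one row per sample

mask = labels <= 1          # boolean array, True where the digit is 0 or 1
selected = features[mask]   # keeps only the rows where mask is True

print(selected.shape)  # (3, 2)
print(labels[mask])    # [0 1 1]
```

Applying the same mask to the features and to the labels keeps the two arrays aligned row by row.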

Convert the labels to one-hot encoding.

In [6]:
y_train_10 = np_utils.to_categorical(y_train, 10)
y_test_10 = np_utils.to_categorical(y_test, 10)

y_train_01 = y_train[y_train <= 1]
y_train_01 = np_utils.to_categorical(y_train_01, 2)

y_test_01 = y_test[y_test <= 1]
y_test_01 = np_utils.to_categorical(y_test_01, 2)
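To make the result of one-hot encoding concrete, here is a rough NumPy equivalent. The helper `one_hot` is a hypothetical name used for illustration and is not how Keras implements `to_categorical`:

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) array with a single 1 per row."""
    # Row k of the identity matrix is the one-hot vector for class k
    return np.eye(num_classes, dtype=np.float32)[labels]

labels = np.array([0, 1, 1, 0])
encoded = one_hot(labels, 2)
print(encoded.shape)  # (4, 2)

# argmax recovers the original integer labels
print(encoded.argmax(axis=1))  # [0 1 1 0]
```

Each row sums to 1, which matches the shape of a softmax output with the same number of classes.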

Check the shapes along the way to make sure the data stays consistent.

In [7]:
x_train_01.shape, x_test_01.shape

((12665, 784), (2115, 784))

In [8]:
y_train_01.shape, y_test_01.shape

((12665, 2), (2115, 2))

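## Reviewing the Sequential API
We build a neural network with the following settings:
* 2 hidden layers
* 500 neurons in each hidden layer
* sigmoid as the only activation function in the hidden layers

The network is built layer by layer by creating a `Sequential()` and calling `.add`, as follows: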

In [9]:
# Create an empty container to hold the layers
model = Sequential()

# Put fully-connected layers (Dense) inside
model.add(Dense(500, input_dim=784))
model.add(Activation('sigmoid'))
model.add(Dense(500))
model.add(Activation('sigmoid'))
model.add(Dense(10))
model.add(Activation('softmax'))

model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 500)               392500    
_________________________________________________________________
activation_1 (Activation)    (None, 500)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 500)               250500    
_________________________________________________________________
activation_2 (Activation)    (None, 500)               0         
_________________________________________________________________
dense_3 (Dense)              (None, 10)                5010      
_________________________________________________________________
activation_3 (Activation)    (None, 10)                0         
Total params: 648,010
Trainable params:
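The parameter counts in the summary can be checked by hand: a fully connected layer has one weight per input-unit pair plus one bias per unit. A short sketch (the helper name `dense_params` is made up for this example):

```python
def dense_params(n_inputs, n_units):
    # One weight per (input, unit) pair, plus one bias per unit
    return n_inputs * n_units + n_units

layer_sizes = [(784, 500), (500, 500), (500, 10)]
counts = [dense_params(n_in, n_out) for n_in, n_out in layer_sizes]

print(counts)       # [392500, 250500, 5010]
print(sum(counts))  # 648010
```

The `Activation` layers contribute no parameters, which is why they show 0 in the summary.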

## Inspecting model.layers
Looking at `model.layers`, we can see that `model` is really just a stack of neural network layers.

In [10]:
model.layers

[<keras.layers.core.Dense at 0x64590ce90>,
 <keras.layers.core.Activation at 0x11750e310>,
 <keras.layers.core.Dense at 0x10896aed0>,
 <keras.layers.core.Activation at 0x645909d90>,
 <keras.layers.core.Dense at 0x645909590>,
 <keras.layers.core.Activation at 0x64596ab90>]

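In other words, what each `.add` call really does is this:

`model.add(Dense(500, input_dim=784))` appends a `Dense` layer object such as `<keras.layers.core.Dense at 0x64590ce90>` to `model.layers`.

`model.add(Activation('sigmoid'))` appends an `Activation` layer object such as `<keras.layers.core.Activation at 0x11750e310>` to `model.layers`.

... and so on.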
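The idea can be mimicked with a toy container. This is only a sketch: `TinyLayer` and `TinySequential` are hypothetical classes for illustration and do not reflect the actual Keras implementation, which also tracks shapes and builds weights.

```python
class TinyLayer:
    """Hypothetical stand-in for a Keras layer; it only remembers its name."""
    def __init__(self, name):
        self.name = name

class TinySequential:
    """Hypothetical toy container, not the Keras implementation."""
    def __init__(self, layers=None):
        # Copy into a new list so the caller's list is not changed by add()
        self.layers = list(layers) if layers else []

    def add(self, layer):
        # Adding a layer is just appending it to the internal list
        self.layers.append(layer)

model = TinySequential()
model.add(TinyLayer("dense_1"))
model.add(TinyLayer("activation_1"))

print([layer.name for layer in model.layers])  # ['dense_1', 'activation_1']
```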

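## Using the Sequential API with lists
A neural network is a list of layers stacked one after another, so the same network can also be built from lists.
First, write the two hidden layers and the output layer, each together with its activation function, in separate lists, as follows: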

In [11]:
first_layer = [Dense(500, input_dim=784),
              Activation('sigmoid')]

second_layer = [Dense(500),
               Activation('sigmoid')]

output_layer = [Dense(10),
               Activation('softmax')]

From basic Python data structures, we know that lists can be concatenated with `+`, so let's first look at the result of concatenating the three lists.

In [12]:
first_layer + second_layer + output_layer

[<keras.layers.core.Dense at 0x64597cbd0>,
 <keras.layers.core.Activation at 0x64597c710>,
 <keras.layers.core.Dense at 0x64597cc90>,
 <keras.layers.core.Activation at 0x645978290>,
 <keras.layers.core.Dense at 0x64597ca10>,
 <keras.layers.core.Activation at 0x64596a210>]
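One detail worth noting is that `+` builds a new list but does not copy the objects inside it. The sketch below uses a hypothetical `TinyLayer` class in place of real Keras layers to show that two concatenated lists can hold the very same layer object, which matters whenever one layer list is reused to build more than one model:

```python
class TinyLayer:
    """Hypothetical stand-in for a layer object that owns some weights."""
    def __init__(self, weights):
        self.weights = weights

shared = [TinyLayer([0.0, 0.0])]
head_a = [TinyLayer([1.0])]
head_b = [TinyLayer([2.0])]

stack_a = shared + head_a  # a new list
stack_b = shared + head_b  # another new list

print(stack_a is stack_b)        # False: two distinct lists
print(stack_a[0] is stack_b[0])  # True: the same layer object in both

# A change made through one stack is visible through the other
stack_a[0].weights = [0.5, -0.5]
print(stack_b[0].weights)  # [0.5, -0.5]
```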

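The concatenated list looks just like a `model.layers`. The next step is to make this list actually become the `.layers` of a neural network.

To do so, join the layer lists with `+` and pass the result to `Sequential`.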

In [13]:
model = Sequential(first_layer + second_layer + output_layer)
model.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_4 (Dense)              (None, 500)               392500    
_________________________________________________________________
activation_4 (Activation)    (None, 500)               0         
_________________________________________________________________
dense_5 (Dense)              (None, 500)               250500    
_________________________________________________________________
activation_5 (Activation)    (None, 500)               0         
_________________________________________________________________
dense_6 (Dense)              (None, 10)                5010      
_________________________________________________________________
activation_6 (Activation)    (None, 10)                0         
Total params: 648,010
Trainable params: 648,010
Non-trainable params: 0
________________________________________________

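## Given an MNIST handwriting recognition model, how do we build a model that recognizes only 0 or 1 while reusing the settings and structure of every layer except the last?

First, prepare the same handwriting recognition network as above, with everything except the last layer bundled into a single list.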

In [14]:
all_except_last = [Dense(500, input_dim=784),
                  Activation('sigmoid'),
                  Dense(500),
                  Activation('sigmoid')]

output_layer = [Dense(10),
               Activation('softmax')]

model_0_to_9 = Sequential(all_except_last + output_layer)
model_0_to_9.summary()

Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_7 (Dense)              (None, 500)               392500    
_________________________________________________________________
activation_7 (Activation)    (None, 500)               0         
_________________________________________________________________
dense_8 (Dense)              (None, 500)               250500    
_________________________________________________________________
activation_8 (Activation)    (None, 500)               0         
_________________________________________________________________
dense_9 (Dense)              (None, 10)                5010      
_________________________________________________________________
activation_9 (Activation)    (None, 10)                0         
Total params: 648,010
Trainable params: 648,010
Non-trainable params: 0
________________________________________________

Once the model is built, load the network weights that were trained in week 1.

In [16]:
model_0_to_9.load_weights('handwriteing_model_weights.h5')

Since we are not actually going to use the handwriting recognition model itself, there is no need to compile, fit, or evaluate it.

Next, define a new output layer.

In [18]:
new_output_layer = [Dense(2),
                   Activation('softmax')]

model_0_to_1 = Sequential(all_except_last + new_output_layer)
model_0_to_1.summary()

Model: "sequential_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_7 (Dense)              (None, 500)               392500    
_________________________________________________________________
activation_7 (Activation)    (None, 500)               0         
_________________________________________________________________
dense_8 (Dense)              (None, 500)               250500    
_________________________________________________________________
activation_8 (Activation)    (None, 500)               0         
_________________________________________________________________
dense_11 (Dense)             (None, 2)                 1002      
_________________________________________________________________
activation_11 (Activation)   (None, 2)                 0         
Total params: 644,002
Trainable params: 644,002
Non-trainable params: 0
________________________________________________

Note that if you only want to reuse the first two layers without training them further, you can freeze the borrowed layers as follows.

In [19]:
for layer in all_except_last:
    layer.trainable = False
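What freezing means for training can be sketched in plain NumPy. This is not how Keras applies `trainable = False` internally; it is a hand-written gradient-descent loop on a hypothetical tiny network, with made-up data, in which only the output weights are updated:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny network: 4 inputs -> 3 hidden units (frozen) -> 2 outputs (trainable)
W_frozen = rng.normal(size=(4, 3))
W_head = rng.normal(size=(3, 2))

x = rng.normal(size=(8, 4))                # 8 fake samples
y = np.eye(2)[rng.integers(0, 2, size=8)]  # fake one-hot targets

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mse(pred, target):
    return np.mean((pred - target) ** 2)

W_frozen_before = W_frozen.copy()
hidden = sigmoid(x @ W_frozen)  # frozen features: computed, but never updated
loss_before = mse(hidden @ W_head, y)

for _ in range(100):
    pred = hidden @ W_head
    # Gradient of the mean squared error with respect to W_head only
    grad_head = hidden.T @ (2.0 * (pred - y)) / pred.size
    W_head -= 0.1 * grad_head  # plain gradient-descent step

loss_after = mse(hidden @ W_head, y)

print(np.array_equal(W_frozen, W_frozen_before))  # True: frozen weights untouched
print(loss_after < loss_before)                   # True: only the head was trained
```

The frozen layer still takes part in the forward pass; it is simply excluded from the weight updates.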

After freezing, the network's summary changes slightly.

In [20]:
model_0_to_1.summary()

Model: "sequential_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_7 (Dense)              (None, 500)               392500    
_________________________________________________________________
activation_7 (Activation)    (None, 500)               0         
_________________________________________________________________
dense_8 (Dense)              (None, 500)               250500    
_________________________________________________________________
activation_8 (Activation)    (None, 500)               0         
_________________________________________________________________
dense_11 (Dense)             (None, 2)                 1002      
_________________________________________________________________
activation_11 (Activation)   (None, 2)                 0         
Total params: 644,002
Trainable params: 1,002
Non-trainable params: 643,000
____________________________________________
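The split between trainable and non-trainable parameters follows directly from the per-layer counts in the summary. A quick check, with the layer names and counts copied from the output above and the frozen flags written out by hand:

```python
# (layer name, parameter count, trainable?) taken from the summary above
layers = [
    ("dense_7", 392500, False),  # borrowed and frozen
    ("dense_8", 250500, False),  # borrowed and frozen
    ("dense_11", 1002, True),    # new output layer
]

trainable = sum(count for _, count, flag in layers if flag)
non_trainable = sum(count for _, count, flag in layers if not flag)

print(trainable)                  # 1002
print(non_trainable)              # 643000
print(trainable + non_trainable)  # 644002
```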

Next, train this 0-or-1 recognition model, whose architecture and weights are partly borrowed from another model.

In [21]:
model_0_to_1.compile(loss='mse', optimizer=SGD(lr=0.1), metrics=['accuracy'])

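## Training our first transfer-learning network

To finish the first transfer-learning network, two things remain to be decided:
* How many samples to train on at a time (`batch_size`): update the parameters once every 100 samples
* How many passes to make over the 12,665 samples (`epochs`): try 5 (only the 0 and 1 samples remain, so training for too long easily leads to overfitting)

Training should therefore be much faster than in week 1, roughly 100 times by a back-of-the-envelope estimate: the training data shrinks to about 1/5, and the number of trainable weights drops from about 640,000 to about 1,000. The forward pass through the frozen layers still has to be computed, so the actual speedup may be smaller.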
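The numbers behind these choices can be worked out directly. The sketch below assumes the final, partially filled batch is still used for an update:

```python
import math

n_samples = 12665
batch_size = 100
epochs = 5

# The last batch of each epoch holds only 65 samples
updates_per_epoch = math.ceil(n_samples / batch_size)
print(updates_per_epoch)           # 127
print(updates_per_epoch * epochs)  # 635

# Share of the original 60000 training images that remain
data_share = round(n_samples / 60000, 3)
print(data_share)  # 0.211

# Share of weights that still receive updates (1002 of the original 648010)
weight_share = round(1002 / 648010, 5)
print(weight_share)  # 0.00155
```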

In [22]:
x_train_01.shape, y_train_01.shape

((12665, 784), (12665, 2))

In [23]:
model_0_to_1.fit(x_train_01, y_train_01, batch_size=100, epochs=5)


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.callbacks.History at 0x1a4ecee8d0>

In [24]:
score = model_0_to_1.evaluate(x_test_01, y_test_01)



In [25]:
print('Test loss:', score[0])
print('Test accuracy:', score[1])

Test loss: 0.0014023022643063685
Test accuracy: 0.9990543723106384


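## Our first neural network model obtained through transfer learning is complete

Transfer-learning models are mostly built this way. In fact, Keras also provides many pre-trained models with proven performance, such as: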

* Xception
* VGG16
* VGG19
* ResNet50
* InceptionV3
* InceptionResNetV2
* MobileNet
* DenseNet
* NASNet

For detailed usage, see the Keras [Documentation](https://keras.io/applications/).
However, doing transfer learning with these models **may** require a more flexible way of writing neural networks.