#  GridSearchCV
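## Overview
Model hyperparameters are rarely optimal on the first try, so tuning is usually required. If you tune one parameter per run, letting it train overnight and adjusting the next one after you wake up, you can waste hours of idle time in between, which is a serious loss for a team that needs results quickly. Keras provides a scikit-learn wrapper, so we can conveniently use sklearn's GridSearchCV to search for the best parameters. As you would expect, the computational cost is very high.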

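The implementation takes [keras_dnn_Mnist](https://github.com/shaoeChen/deeplearning/blob/master/keras/keras_dnn_Mnist.ipynb) as its starting point and modifies it to add GridSearchCV.

Note: to keep the page concise, some of the explanations from keras_dnn_Mnist are omitted.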

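## Import required packages
* The Keras-to-sklearn wrapper requires importing `KerasClassifier` or `KerasRegressor` from `keras.wrappers.scikit_learn`
    * Import the classifier or the regressor depending on the task
* From sklearn, import `GridSearchCV` from `sklearn.model_selection`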

In [1]:
import numpy as np
np.random.seed(10)
import pandas as pd

from keras.utils import np_utils
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier

from sklearn.model_selection import GridSearchCV

import matplotlib.pyplot as plt
%matplotlib inline

Using TensorFlow backend.


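This dataset is for handwritten digit recognition (MNIST). If the data is not already in the cache directory it will be downloaded again, which takes a little extra time.  
After download, the file is stored under user\\.keras\\datasets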

## Data preprocessing

In [2]:
#  dataset bundled with keras
from keras.datasets import mnist
mnist_data = mnist.load_data()

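The first thing to do after loading a dataset is to inspect it: the number of training and test samples, and the data dimensions.  
Here each sample is a single-channel grayscale image (pixel values 0 to 255), so its shape is (28, 28)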

In [3]:
#  index [0] is the training set, index [1] is the test (validation) set
#  unpack the data
X_train_original, y_train_original = mnist_data[0]
X_test_original, y_test_original = mnist_data[1]

In [4]:
print('train example:', X_train_original.shape[0])
print('train_data_shape:', X_train_original.shape)
print('train_label_shape:', y_train_original.shape)
print('test_example:', X_test_original.shape[0])
print('test_data_shape:', X_test_original.shape)
print('test_label_shape:', y_test_original.shape)

train example: 60000
train_data_shape: (60000, 28, 28)
train_label_shape: (60000,)
test_example: 10000
test_data_shape: (10000, 28, 28)
test_label_shape: (10000,)


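At this point the dataset has shape (m, pixel_x, pixel_y), i.e. (60000, 28, 28). We need to vectorize (flatten) the pixels of each image into 28\*28=784 values, which gives n=784 features and m=60000 samples.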

In [17]:
#  the -1 argument means "whatever is left", i.e. 28*28=784
X_train_original.reshape(X_train_original.shape[0], -1).shape

(60000, 784)

In [18]:
X_train_flatten = X_train_original.reshape(X_train_original.shape[0], -1)
X_test_flatten = X_test_original.reshape(X_test_original.shape[0], -1)

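Data is usually normalized to shrink its range; for images the most common approach is to divide by 255, which scales the pixel values into [0, 1]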

In [19]:
X_train = X_train_flatten / 255
X_test = X_test_flatten / 255
X_train_non = X_train_flatten
X_test_non = X_test_flatten
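
As a quick sanity check on the scaling step, the following minimal sketch uses a small made-up `uint8` array (a hypothetical stand-in for the MNIST pixels, not the real data) to show that dividing by 255 maps the values into [0, 1] and turns the integer array into a floating-point one.

```python
import numpy as np

# hypothetical stand-in for two flattened images with raw 8-bit pixel values
pixels = np.array([[0, 51, 255], [102, 204, 153]], dtype=np.uint8)

# true division by 255 produces a float array rescaled to [0, 1]
scaled = pixels / 255

print(scaled.dtype)
print(scaled.min(), scaled.max())
```

The same check can be run on `X_train` itself with `X_train.min()` and `X_train.max()`.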

In [20]:
#  convert class labels to one-hot encoding
y_train = np_utils.to_categorical(y_train_original, num_classes=10)
y_test =  np_utils.to_categorical(y_test_original, num_classes=10)

After these adjustments, remember to confirm that the data dimensions are correct and check that the labels were converted properly

In [21]:
print('feature numbers:', X_train.shape[1])
print('train example:', X_train.shape[0])
print('train_data_shape:', X_train.shape)
print('train_label_shape:', y_train.shape)
print('test_example:', X_test.shape[0])
print('test_data_shape:', X_test.shape)
print('test_label_shape:', y_test.shape)
print('y_test:',y_test[1])
print('y_test_original:', y_test_original[1])

feature numbers: 784
train example: 60000
train_data_shape: (60000, 784)
train_label_shape: (60000, 10)
test_example: 10000
test_data_shape: (10000, 784)
test_label_shape: (10000, 10)
y_test: [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
y_test_original: 2
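
Checking a single label by eye works, but the whole label array can also be checked at once. The sketch below is a minimal illustration that uses `np.eye` as a stand-in for `np_utils.to_categorical` and a few made-up labels (both are assumptions for the sake of a self-contained example): taking `argmax` along the class axis should give back the original integer labels.

```python
import numpy as np

# hypothetical integer labels standing in for y_test_original
labels = np.array([7, 2, 1, 0])

# indexing an identity matrix by label yields one one-hot row per sample
one_hot = np.eye(10)[labels]

# argmax along the class axis should recover the original labels
recovered = one_hot.argmax(axis=1)
assert one_hot.shape == (4, 10)
assert (recovered == labels).all()
print(one_hot[1])
```

In the notebook, the equivalent check would compare `y_test.argmax(axis=1)` against `y_test_original`.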


At this point the data preprocessing is complete: number of features (n)=784, training samples=60000, test samples=10000, predicted classes=10 (0-9)

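## Building the model

We implement the model as a function and expose the parts we plan to tune as function parameters. Taking the optimizer as an example, we want to compare `adam` and `sgd`, so the optimizer is passed in as a parameter of the function.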

In [12]:
def new_model(optimizer='adam'):
    model = Sequential()
    model.add(Dense(units=392  #  number of output units
                    , activation='relu'  #  activation function
                    , kernel_initializer='he_normal'  #  weight initialization
                    , kernel_regularizer='l2'  #  regularization
                    , input_shape=(784,)))  #  input shape, only needed on the first layer
    model.add(Dense(units=196  #  number of output units
                    , activation='relu'  #  activation function
                    , kernel_initializer='he_normal'  #  weight initialization
                    , kernel_regularizer='l2'))  #  regularization
    model.add(Dense(units=98  #  number of output units
                    , activation='relu'  #  activation function
                    , kernel_initializer='he_normal'  #  weight initialization
                    , kernel_regularizer='l2'))  #  regularization
    model.add(Dense(units=49  #  number of output units
                    , activation='relu'  #  activation function
                    , kernel_initializer='he_normal'  #  weight initialization
                    , kernel_regularizer='l2'))  #  regularization
    model.add(Dense(units=10  #  number of output units, one per class 0-9
                    , activation='softmax'  #  softmax activation for the output layer
                    , kernel_initializer='he_normal'  #  weight initialization
                    , kernel_regularizer='l2'))  #  regularization
    model.compile(optimizer=optimizer  #  the hyperparameter being tuned
                  , loss='categorical_crossentropy'
                  , metrics=['accuracy'])
    return model

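Next, wrap the model with `KerasClassifier`; its `build_fn` argument is the function we just defined.  
Whether to set verbose to 1 is a matter of preference. I usually set it to 1 so that there is still something to look at if the computation crashes, although a Keras callback can also be used to write a log.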

In [13]:
model = KerasClassifier(build_fn=new_model, verbose=1)

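Define the hyperparameter grid; it must be a dict  

batch_size and epochs can be included to set the mini-batch size and the number of epochs, for example:  
param_grid['batch_size']=[8]  
param_grid['epochs']=[30]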

In [25]:
param_grid={}
param_grid['optimizer']=['adam', 'sgd']
param_grid['batch_size']=[8]  
param_grid['epochs']=[2]  #  just an example, so only two epochs
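
To get a feel for the cost before launching a search, it helps to count the combinations first. The sketch below is a plain-Python illustration (it uses `itertools.product` rather than anything from sklearn) that expands the same grid into its candidates and multiplies by the number of folds. The value `n_folds = 3` is an assumption taken from the `Fitting 3 folds for each of 2 candidates, totalling 6 fits` line in this notebook's run log; newer scikit-learn releases default to 5 folds.

```python
from itertools import product

# same grid as the notebook, written as a literal dict
param_grid = {
    'optimizer': ['adam', 'sgd'],
    'batch_size': [8],
    'epochs': [2],
}

# every combination of one value per key is one candidate
keys = sorted(param_grid)
candidates = [dict(zip(keys, values))
              for values in product(*(param_grid[k] for k in keys))]

# assumption: 3 folds, as in this notebook's log (newer scikit-learn defaults to 5)
n_folds = 3
total_fits = len(candidates) * n_folds

for candidate in candidates:
    print(candidate)
print('candidates:', len(candidates), 'total fits:', total_fits)
```

By default GridSearchCV also refits the best candidate once on the whole training set, so the real run trains one more model than this count.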

Create the GridSearchCV; the main arguments are `estimator` and `param_grid`  
* estimator is the KerasClassifier created above
* param_grid is the hyperparameter grid defined above

In [26]:
grid = GridSearchCV(estimator=model, param_grid=param_grid, verbose=9)
results = grid.fit(X_train, y_train)

Fitting 3 folds for each of 2 candidates, totalling 6 fits
[CV] epochs=2, batch_size=8, optimizer=adam ..........................
Epoch 1/2
Epoch 2/2
[CV]  epochs=2, batch_size=8, optimizer=adam, score=0.904500, total= 2.4min
[CV] epochs=2, batch_size=8, optimizer=adam ..........................


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  2.7min remaining:    0.0s


Epoch 1/2
Epoch 2/2
[CV]  epochs=2, batch_size=8, optimizer=adam, score=0.884650, total= 2.3min
[CV] epochs=2, batch_size=8, optimizer=adam ..........................


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  5.3min remaining:    0.0s


Epoch 1/2
Epoch 2/2
[CV]  epochs=2, batch_size=8, optimizer=adam, score=0.908700, total= 2.5min
[CV] epochs=2, batch_size=8, optimizer=sgd ...........................


[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed:  8.1min remaining:    0.0s


Epoch 1/2
Epoch 2/2
[CV]  epochs=2, batch_size=8, optimizer=sgd, score=0.915100, total= 1.6min
[CV] epochs=2, batch_size=8, optimizer=sgd ...........................


[Parallel(n_jobs=1)]: Done   4 out of   4 | elapsed:  9.9min remaining:    0.0s


Epoch 1/2
Epoch 2/2
[CV]  epochs=2, batch_size=8, optimizer=sgd, score=0.911450, total= 1.4min
[CV] epochs=2, batch_size=8, optimizer=sgd ...........................


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed: 11.6min remaining:    0.0s


Epoch 1/2
Epoch 2/2
[CV]  epochs=2, batch_size=8, optimizer=sgd, score=0.916700, total= 1.4min


[Parallel(n_jobs=1)]: Done   6 out of   6 | elapsed: 13.2min remaining:    0.0s
[Parallel(n_jobs=1)]: Done   6 out of   6 | elapsed: 13.2min finished


Epoch 1/2
Epoch 2/2


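The returned object contains a lot of information that is worth exploring. The results can be displayed with pandas to make them easier to read.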

In [28]:
type(results.cv_results_)

dict

In [27]:
pd.DataFrame.from_dict(results.cv_results_)

| | mean_fit_time | mean_score_time | mean_test_score | mean_train_score | param_batch_size | param_epochs | param_optimizer | params | rank_test_score | split0_test_score | split0_train_score | split1_test_score | split1_train_score | split2_test_score | split2_train_score | std_fit_time | std_score_time | std_test_score | std_train_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 136.629667 | 9.014 | 0.899283 | 0.901358 | 8 | 2 | adam | {'epochs': 2, 'batch_size': 8, 'optimizer': 'a... | 2 | 0.9045 | 0.9065 | 0.88465 | 0.888925 | 0.9087 | 0.90865 | 2.620398 | 1.719693 | 0.010488 | 0.008835 |
| 1 | 81.703667 | 7.098333 | 0.914417 | 0.9184 | 8 | 2 | sgd | {'epochs': 2, 'batch_size': 8, 'optimizer': 's... | 1 | 0.9151 | 0.918175 | 0.91145 | 0.91845 | 0.9167 | 0.918575 | 4.963884 | 0.219688 | 0.002197 | 0.000167 |
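
The full table is wide, so in practice it is easier to keep only a few columns and sort by rank. The following is a minimal sketch: the dict `cv_results` is a hand-copied subset of the values shown above, used as a stand-in for `results.cv_results_` so that the example runs without retraining.

```python
import pandas as pd

# a few columns copied from the results table above,
# as a stand-in for results.cv_results_
cv_results = {
    'param_optimizer': ['adam', 'sgd'],
    'mean_test_score': [0.899283, 0.914417],
    'std_test_score': [0.010488, 0.002197],
    'mean_fit_time': [136.629667, 81.703667],
    'rank_test_score': [2, 1],
}

# rank 1 is the best candidate, so sort ascending by rank
summary = (pd.DataFrame(cv_results)
           .sort_values('rank_test_score')
           .reset_index(drop=True))
print(summary)

best = summary.iloc[0]
print('best optimizer:', best['param_optimizer'])
```

On the fitted search object itself, `results.best_params_` and `results.best_score_` report the winning combination directly.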


## fit_params
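Keras callbacks are passed as arguments at fit time, but it is not obvious how to add a callback when training goes through GridSearchCV. The example below is provided for reference and is not executed here.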

In [None]:
#  set the condition for stopping training early
from keras.callbacks import EarlyStopping
early_stopping = EarlyStopping(monitor='loss', min_delta=0., patience=2, verbose=1)
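
To make the `patience` and `min_delta` settings concrete, here is a simplified pure-Python illustration of the patience rule for a monitored loss. It is a sketch of the idea, not Keras's actual implementation; the function name `stopped_epoch` and the loss values are made up for the example.

```python
def stopped_epoch(losses, patience=2, min_delta=0.0):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float('inf')
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(losses, start=1):
        if loss < best - min_delta:
            # the loss improved by more than min_delta: reset the counter
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# made-up loss curve: improves for three epochs, then stalls
losses = [0.90, 0.60, 0.50, 0.52, 0.51, 0.40]
print(stopped_epoch(losses))
```

With `patience=2`, the run stops after two consecutive epochs without improvement, so the later drop to 0.40 is never reached. A larger `patience` trades extra training time for tolerance to noisy losses.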

In [None]:
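#  note: the fit_params constructor argument was deprecated in scikit-learn 0.19 and later removed;
#  on those versions the fit arguments are passed to grid.fit() instead, as done here
grid = GridSearchCV(estimator=model, param_grid=param_grid, verbose=9)
results = grid.fit(X_train, y_train, callbacks=[early_stopping])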

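## Summary
With GridSearchCV we can list the candidate values and let it train the model on every combination automatically, then pick the best parameters. Watch the memory usage, though: if the computation crashes partway through, the team may be left with nothing to show for it. If there are many combinations, remember to use callbacks to write logs or checkpoints.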
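
One way to limit the damage from a crash is to record each finished combination as soon as it completes. The sketch below illustrates that idea with a hand-written loop and a JSON-lines log file; it is a different technique from GridSearchCV (there is no cross-validation here), and `train_and_score`, `run_grid`, and the file name are all hypothetical names introduced for the example. The training function returns a fake score so that the block runs on its own.

```python
import itertools
import json
import os
import tempfile

def train_and_score(params):
    # hypothetical stand-in for an expensive training run; returns a fake score
    return 0.9 if params['optimizer'] == 'sgd' else 0.8

def run_grid(param_grid, log_path):
    # load combinations already finished in an earlier (possibly crashed) run
    done = {}
    if os.path.exists(log_path):
        with open(log_path) as f:
            for line in f:
                record = json.loads(line)
                key = json.dumps(record['params'], sort_keys=True)
                done[key] = record['score']

    keys = sorted(param_grid)
    for values in itertools.product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        key = json.dumps(params, sort_keys=True)
        if key in done:
            continue  # skip work that was already logged
        score = train_and_score(params)
        # append immediately so the result survives a later crash
        with open(log_path, 'a') as f:
            f.write(json.dumps({'params': params, 'score': score}) + '\n')
        done[key] = score
    return done

log_path = os.path.join(tempfile.mkdtemp(), 'grid_log.jsonl')
scores = run_grid({'optimizer': ['adam', 'sgd'], 'epochs': [2]}, log_path)
best_key = max(scores, key=scores.get)
print(best_key, scores[best_key])
```

Calling `run_grid` again with the same `log_path` skips every combination already in the log, so a restarted job only pays for the work that was lost.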