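## Assignment

(1) Using Adam as an example, adjust batch_size and the number of epochs, and observe how accuracy and loss change.

(2) Using the same model, compare the accuracy obtained with SGD, Adam, and RMSprop.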

## References

The `clipnorm` and `clipvalue` parameters can be used with every optimizer to control gradient clipping.
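To make the two clipping modes concrete, here is a minimal NumPy sketch of the idea on a single gradient array. It is an illustration only, not Keras's internal implementation; the helper names `clip_by_value` and `clip_by_norm` are hypothetical. In Keras itself the same effect is requested through optimizer arguments, for example `SGD(clipnorm=1.0)` or `SGD(clipvalue=0.5)`.

```python
import numpy as np

def clip_by_value(grad, clipvalue):
    """Clamp every gradient component to [-clipvalue, clipvalue]."""
    return np.clip(grad, -clipvalue, clipvalue)

def clip_by_norm(grad, clipnorm):
    """Rescale the gradient so that its L2 norm is at most clipnorm."""
    norm = np.linalg.norm(grad)
    if norm > clipnorm:
        return grad * (clipnorm / norm)
    return grad

grad = np.array([3.0, -4.0])           # L2 norm is 5
by_value = clip_by_value(grad, 0.5)    # each component limited to +/- 0.5
by_norm = clip_by_norm(grad, 1.0)      # direction kept, norm scaled down to 1
print(by_value, by_norm)
```

Clipping by value can change the direction of the gradient, while clipping by norm only shortens it.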

[Usage of optimizers](https://keras.io/zh/optimizers/)

[How to choose an optimizer](https://blog.csdn.net/qq_35860352/article/details/80772142)

[Deep learning optimization functions explained (5) - Nesterov accelerated gradient (NAG)](https://blog.csdn.net/tsyccnh/article/details/76673073)

[An overview of gradient descent optimization algorithms](https://arxiv.org/pdf/1609.04747.pdf)

[How to Select the Right Optimization Method for Your Problem](http://www.redcedartech.com/pdfs/Select_Optimization_Method.pdf)

[Second Order Optimization Algorithms I - Yinyu Ye](https://web.stanford.edu/class/msande311/lecture13.pdf)

In [1]:
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Activation
tf.logging.set_verbosity(tf.logging.ERROR)

Using TensorFlow backend.


## Prepare the data

In [2]:
(X_train, y_train), (X_test, y_test) = mnist.load_data()

In [3]:
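def normalize_mnist_data(x, y):
    # Flatten each 28x28 image into a 784-element vector.
    # Note: pixel values are NOT rescaled here; they stay in the 0..255 range.
    x = x.reshape(x.shape[0], x.shape[1] * x.shape[2])
    # One-hot encode the labels by comparing each label with 0..(n_classes - 1).
    y = (y[:, None] == np.arange(np.unique(y).shape[0])).astype(int)
    return x, y

def normalize_result(x, y):
    # Print the array shapes so the effect of the reshaping is visible.
    print('x.shape:', x.shape)
    print('y.shape:', y.shape)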

In [4]:
print('Before normalization:')
normalize_result(X_train, y_train)

Before normalization:
x.shape: (60000, 28, 28)
y.shape: (60000,)


In [5]:
print('After normalization:')
X_train, y_train = normalize_mnist_data(X_train, y_train)
X_test, y_test = normalize_mnist_data(X_test, y_test)
normalize_result(X_train, y_train)

After normalization:
x.shape: (60000, 784)
y.shape: (60000, 10)
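The shapes above come from flattening the images and one-hot encoding the labels by broadcasting. The sketch below replays both steps on a tiny, hypothetical array (not the real MNIST data) and also shows a rescaling of pixel values to [0, 1]. The rescaling is an extra step that this notebook's `normalize_mnist_data` does not perform, and it is included here only as an assumption about what a fuller preprocessing might look like.

```python
import numpy as np

# Hypothetical stand-in for a tiny batch of MNIST-like images: shape (2, 2, 2).
images = np.array([[[0, 255], [128, 64]],
                   [[255, 255], [0, 0]]], dtype=np.uint8)
labels = np.array([2, 0])
num_classes = 3                       # assumed number of classes for this toy example

# Flatten every image into one row, as normalize_mnist_data does.
flat = images.reshape(images.shape[0], -1)

# Optional extra step (not in the notebook): scale pixels to the range [0, 1].
scaled = flat.astype('float32') / 255.0

# labels[:, None] has shape (2, 1); comparing it with arange(3) broadcasts to (2, 3).
one_hot = (labels[:, None] == np.arange(num_classes)).astype(int)

print(flat.shape, scaled.min(), scaled.max())
print(one_hot)
```

The notebook infers the number of classes from `np.unique(y)`, which works only when every class appears at least once in `y`; the sketch passes the count explicitly instead.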


## Define helper functions

In [6]:
def show_prediction_score(model, x, y):
    scores = model.evaluate(x, y, batch_size=200, verbose=0)
    for i, metrics_name in enumerate(model.metrics_names):
        print('The test {} is {:.3f}'.format(metrics_name, scores[i]))

## Define the model

In [7]:
model = Sequential()
model.add(Dense(500, input_shape=(784,)))
model.add(Activation('relu'))
model.add(Dense(500))
model.add(Activation('relu'))  
model.add(Dense(500))
model.add(Activation('relu'))
model.add(Dense(500))
model.add(Activation('relu'))
model.add(Dense(10))
model.add(Activation('softmax'))

In [8]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 500)               392500    
_________________________________________________________________
activation_1 (Activation)    (None, 500)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 500)               250500    
_________________________________________________________________
activation_2 (Activation)    (None, 500)               0         
_________________________________________________________________
dense_3 (Dense)              (None, 500)               250500    
_________________________________________________________________
activation_3 (Activation)    (None, 500)               0         
_________________________________________________________________
dense_4 (Dense)              (None, 500)               250500    
__________
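The parameter counts in the summary can be checked by hand: a Dense layer with `n_inputs` inputs and `n_units` units has `n_inputs * n_units` weights plus `n_units` biases. The short sketch below applies that formula to the layer sizes from the model definition; `dense_params` is a hypothetical helper name. The figures for the final `Dense(10)` layer and the total are computed from the definition, since the summary output above is cut off before them.

```python
def dense_params(n_inputs, n_units):
    """Number of trainable parameters in a fully connected layer (weights + biases)."""
    return n_inputs * n_units + n_units

# Input size followed by the width of each Dense layer in the model.
layer_sizes = [784, 500, 500, 500, 500, 10]

counts = [dense_params(n_in, n_out)
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
total = sum(counts)

print(counts)
print(total)
```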

## Observe how accuracy changes with different epochs and batch sizes

In [9]:
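# Note: with loss='binary_crossentropy', this generation of Keras reports 'accuracy'
# as element-wise binary accuracy over the 10 outputs, not as per-digit accuracy.
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
train_history = model.fit(X_train, y_train, epochs=20, batch_size=128, validation_split=0.3, shuffle=True, verbose=2)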

Train on 42000 samples, validate on 18000 samples
Epoch 1/20
 - 13s - loss: 2.9136 - acc: 0.8180 - val_loss: 2.9145 - val_acc: 0.8182
Epoch 2/20
 - 12s - loss: 2.9172 - acc: 0.8180 - val_loss: 2.9145 - val_acc: 0.8182
Epoch 3/20
 - 11s - loss: 2.9172 - acc: 0.8180 - val_loss: 2.9145 - val_acc: 0.8182
Epoch 4/20
 - 11s - loss: 2.9172 - acc: 0.8180 - val_loss: 2.9145 - val_acc: 0.8182
Epoch 5/20
 - 11s - loss: 2.9172 - acc: 0.8180 - val_loss: 2.9145 - val_acc: 0.8182
Epoch 6/20
 - 11s - loss: 2.9172 - acc: 0.8180 - val_loss: 2.9145 - val_acc: 0.8182
Epoch 7/20
 - 11s - loss: 2.9172 - acc: 0.8180 - val_loss: 2.9145 - val_acc: 0.8182
Epoch 8/20
 - 11s - loss: 2.9172 - acc: 0.8180 - val_loss: 2.9145 - val_acc: 0.8182
Epoch 9/20
 - 11s - loss: 2.9172 - acc: 0.8180 - val_loss: 2.9145 - val_acc: 0.8182
Epoch 10/20
 - 11s - loss: 2.9172 - acc: 0.8180 - val_loss: 2.9145 - val_acc: 0.8182
Epoch 11/20
 - 11s - loss: 2.9172 - acc: 0.8180 - val_loss: 2.9145 - val_acc: 0.8182
Epoch 12/20
 - 11s - los

In [10]:
show_prediction_score(model, X_test, y_test)

The test loss is 2.920
The test acc is 0.818
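The accuracy stays at 0.818 in every epoch, which suggests the network is not learning. One possible reading, offered here as an assumption rather than a diagnosis, is that the model has collapsed to predicting a single class and that the reported metric is an element-wise binary accuracy over the 10 one-hot outputs. The NumPy sketch below uses hypothetical labels in which exactly 9% of the samples belong to the predicted class and shows that such a constant prediction already scores 0.818 on that metric while classifying only 9% of the samples correctly.

```python
import numpy as np

num_samples, num_classes = 1000, 10

# Hypothetical labels: 9% of the samples are class 0, the rest are class 1.
labels = np.full(num_samples, 1)
labels[:90] = 0
y_true = (labels[:, None] == np.arange(num_classes)).astype(int)

# Degenerate "model" that always predicts class 0 with probability 1.
y_pred = np.zeros((num_samples, num_classes))
y_pred[:, 0] = 1.0

# Element-wise accuracy: every one of the 10 outputs is scored separately.
binary_acc = np.mean(np.round(y_pred) == y_true)

# Per-sample accuracy: only the arg-max class is compared.
categorical_acc = np.mean(y_pred.argmax(axis=1) == y_true.argmax(axis=1))

print(binary_acc, categorical_acc)
```

Under the same assumption, a per-sample metric together with a multi-class loss such as `categorical_crossentropy`, and inputs rescaled to [0, 1], would be worth trying before drawing conclusions about batch size or epochs.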


In [11]:
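# Note: compile() does not reset the weights, so this run continues from the
# weights left by the previous fit() call instead of starting from scratch.
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
train_history = model.fit(X_train, y_train, epochs=10, batch_size=512, validation_split=0.3, shuffle=True, verbose=2)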

Train on 42000 samples, validate on 18000 samples
Epoch 1/10
 - 7s - loss: 2.9172 - acc: 0.8180 - val_loss: 2.9145 - val_acc: 0.8182
Epoch 2/10
 - 6s - loss: 2.9172 - acc: 0.8180 - val_loss: 2.9145 - val_acc: 0.8182
Epoch 3/10
 - 6s - loss: 2.9172 - acc: 0.8180 - val_loss: 2.9145 - val_acc: 0.8182
Epoch 4/10
 - 6s - loss: 2.9172 - acc: 0.8180 - val_loss: 2.9145 - val_acc: 0.8182
Epoch 5/10
 - 6s - loss: 2.9172 - acc: 0.8180 - val_loss: 2.9145 - val_acc: 0.8182
Epoch 6/10
 - 6s - loss: 2.9172 - acc: 0.8180 - val_loss: 2.9145 - val_acc: 0.8182
Epoch 7/10
 - 6s - loss: 2.9172 - acc: 0.8180 - val_loss: 2.9145 - val_acc: 0.8182
Epoch 8/10
 - 6s - loss: 2.9172 - acc: 0.8180 - val_loss: 2.9145 - val_acc: 0.8182
Epoch 9/10
 - 6s - loss: 2.9172 - acc: 0.8180 - val_loss: 2.9145 - val_acc: 0.8182
Epoch 10/10
 - 6s - loss: 2.9172 - acc: 0.8180 - val_loss: 2.9145 - val_acc: 0.8182


In [12]:
show_prediction_score(model, X_test, y_test)

The test loss is 2.920
The test acc is 0.818
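The two runs differ in how many weight updates they perform, which is worth keeping in mind when comparing them. The sketch below works out the arithmetic from the numbers in the logs (42000 training samples after the validation split); the helper name `updates` is hypothetical.

```python
import math

train_samples = 42000   # "Train on 42000 samples" in the logs above

def updates(epochs, batch_size):
    """Return (steps per epoch, total weight updates) for one fit() call."""
    steps_per_epoch = math.ceil(train_samples / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs

first_run = updates(epochs=20, batch_size=128)
second_run = updates(epochs=10, batch_size=512)

print(first_run)
print(second_run)
```

A larger batch size shortens each epoch (about 6 s instead of 11 s in the logs) but also means far fewer updates per epoch.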


## Observe how accuracy changes with different optimizers

In [13]:
model.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['accuracy'])
train_history = model.fit(X_train, y_train, epochs=20, batch_size=128, validation_split=0.3, shuffle=True, verbose=2)

Train on 42000 samples, validate on 18000 samples
Epoch 1/20
 - 8s - loss: 2.9172 - acc: 0.8180 - val_loss: 2.9145 - val_acc: 0.8182
Epoch 2/20
 - 8s - loss: 2.9172 - acc: 0.8180 - val_loss: 2.9145 - val_acc: 0.8182
Epoch 3/20
 - 8s - loss: 2.9172 - acc: 0.8180 - val_loss: 2.9145 - val_acc: 0.8182
Epoch 4/20
 - 8s - loss: 2.9172 - acc: 0.8180 - val_loss: 2.9145 - val_acc: 0.8182
Epoch 5/20
 - 8s - loss: 2.9172 - acc: 0.8180 - val_loss: 2.9145 - val_acc: 0.8182
Epoch 6/20
 - 8s - loss: 2.9172 - acc: 0.8180 - val_loss: 2.9145 - val_acc: 0.8182
Epoch 7/20
 - 8s - loss: 2.9172 - acc: 0.8180 - val_loss: 2.9145 - val_acc: 0.8182
Epoch 8/20
 - 8s - loss: 2.9172 - acc: 0.8180 - val_loss: 2.9145 - val_acc: 0.8182
Epoch 9/20
 - 8s - loss: 2.9172 - acc: 0.8180 - val_loss: 2.9145 - val_acc: 0.8182
Epoch 10/20
 - 8s - loss: 2.9172 - acc: 0.8180 - val_loss: 2.9145 - val_acc: 0.8182
Epoch 11/20
 - 8s - loss: 2.9172 - acc: 0.8180 - val_loss: 2.9145 - val_acc: 0.8182
Epoch 12/20
 - 8s - loss: 2.9172 - 

In [14]:
show_prediction_score(model, X_test, y_test)

The test loss is 2.920
The test acc is 0.818


In [15]:
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
train_history = model.fit(X_train, y_train, epochs=20, batch_size=128, validation_split=0.3, shuffle=True, verbose=2)

Train on 42000 samples, validate on 18000 samples
Epoch 1/20
 - 10s - loss: 2.9172 - acc: 0.8180 - val_loss: 2.9145 - val_acc: 0.8182
Epoch 2/20
 - 9s - loss: 2.9172 - acc: 0.8180 - val_loss: 2.9145 - val_acc: 0.8182
Epoch 3/20
 - 9s - loss: 2.9172 - acc: 0.8180 - val_loss: 2.9145 - val_acc: 0.8182
Epoch 4/20
 - 9s - loss: 2.9172 - acc: 0.8180 - val_loss: 2.9145 - val_acc: 0.8182
Epoch 5/20
 - 10s - loss: 2.9172 - acc: 0.8180 - val_loss: 2.9145 - val_acc: 0.8182
Epoch 6/20
 - 11s - loss: 2.9172 - acc: 0.8180 - val_loss: 2.9145 - val_acc: 0.8182
Epoch 7/20
 - 10s - loss: 2.9172 - acc: 0.8180 - val_loss: 2.9145 - val_acc: 0.8182
Epoch 8/20
 - 9s - loss: 2.9172 - acc: 0.8180 - val_loss: 2.9145 - val_acc: 0.8182
Epoch 9/20
 - 10s - loss: 2.9172 - acc: 0.8180 - val_loss: 2.9145 - val_acc: 0.8182
Epoch 10/20
 - 10s - loss: 2.9172 - acc: 0.8180 - val_loss: 2.9145 - val_acc: 0.8182
Epoch 11/20
 - 10s - loss: 2.9172 - acc: 0.8180 - val_loss: 2.9145 - val_acc: 0.8182
Epoch 12/20
 - 12s - loss: 2

In [16]:
show_prediction_score(model, X_test, y_test)

The test loss is 2.920
The test acc is 0.818
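Since all three optimizers report identical numbers here, the runs above do not show how their update rules differ. As a small illustration outside Keras, the sketch below implements the textbook update rules for SGD, RMSprop, and Adam in NumPy and applies each one to the same toy quadratic from the same starting point. The objective, the learning rates, and the function names are all assumptions chosen for the example; they are not the Keras defaults or the notebook's model.

```python
import numpy as np

scales = np.array([1.0, 10.0])     # hypothetical curvature of each coordinate

def loss(w):
    return 0.5 * np.sum(scales * w ** 2)

def grad(w):
    return scales * w

def run_sgd(w0, lr=0.05, steps=300):
    w = w0.copy()
    for _ in range(steps):
        w -= lr * grad(w)                       # plain gradient step
    return w

def run_rmsprop(w0, lr=0.01, rho=0.9, eps=1e-7, steps=300):
    w = w0.copy()
    avg_sq = np.zeros_like(w)
    for _ in range(steps):
        g = grad(w)
        avg_sq = rho * avg_sq + (1 - rho) * g ** 2   # running mean of squared gradients
        w -= lr * g / (np.sqrt(avg_sq) + eps)
    return w

def run_adam(w0, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-7, steps=300):
    w = w0.copy()
    m = np.zeros_like(w)
    v = np.zeros_like(w)
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g          # first-moment estimate
        v = beta2 * v + (1 - beta2) * g ** 2     # second-moment estimate
        m_hat = m / (1 - beta1 ** t)             # bias correction
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

w0 = np.array([1.0, 1.0])
initial_loss = loss(w0)
# Every optimizer starts from a fresh copy of the same initial weights.
final_losses = {name: loss(fn(w0))
                for name, fn in [('sgd', run_sgd),
                                 ('rmsprop', run_rmsprop),
                                 ('adam', run_adam)]}
print(initial_loss, final_losses)
```

Starting each optimizer from identical initial weights is the design choice that matters for a comparison; in the notebook, that would mean rebuilding the model before each `compile()` call rather than reusing the already trained one.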
