# MNIST Handwritten Digit Recognition (MNIST_Autoencoder_TF)

2017/07/20   
徐仕杰

### Tips:
- Remember to download the data set: 
[Mnist](https://github.com/Backlu/tf-keras-tutorial/blob/master/basic/mnist.pkl.xz)
- Prefix a command with **!** to run it as a console command
- Prefix a command with **?** to look up its help
- What is a one-hot representation:
[one-hot](https://www.quora.com/What-is-one-hot-encoding-and-when-is-it-used-in-data-science)  
- Handy markdown syntax:
[markdown](https://www.zybuluo.com/codeep/note/163962#1如何输入一个方程式序列)  
<br>
- import PIL error: pip install Pillow
- import pandas error: pip install pandas
- import lzma error: use Python 3



## Outline

-  [Import Package & Functions](#import) 
-  [1. Import MNIST Data](#Import MNIST Data) 
-  [2. Autoencoder](#開始Deep Learning)  
-  [3. Reference](#reference)

<a id='import'></a>
## Import Package & Functions

In [1]:
import pandas as pd
import os
import sys
from PIL import Image
import numpy as np
import lzma
import pickle
from IPython.display import display
import tensorflow as tf
from tfdot import tfdot
from tensorflow.contrib.tensorboard.plugins import projector
import shutil

In [2]:
def showX(X, rows=1):
    assert X.shape[0] % rows == 0
    int_X = (X*255).clip(0,255).astype('uint8')
    # N*784 -> N*28*28 -> 28*N*28 -> 28 * 28N
    int_X_reshape = int_X.reshape(rows, -1,28,28).swapaxes(1,2).reshape(28*rows,-1)
    display(Image.fromarray(int_X_reshape))
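
The reshape chain in `showX` is easier to follow on tiny inputs. The sketch below repeats the same three steps with 2x2 "images" instead of 28x28 ones. The function name `tile_images` and the `side` parameter are hypothetical and are not part of the notebook.

```python
import numpy as np

def tile_images(X, rows=1, side=28):
    """Arrange N flattened side*side images into a grid with `rows` rows."""
    assert X.shape[0] % rows == 0
    # (N, side*side) -> (rows, cols, side, side)
    grid = X.reshape(rows, -1, side, side)
    # (rows, cols, side, side) -> (rows, side, cols, side):
    # each pixel row now runs across all images in the same grid row
    grid = grid.swapaxes(1, 2)
    # (rows, side, cols, side) -> (rows*side, cols*side)
    return grid.reshape(side * rows, -1)

# four 2x2 "images"; image k is filled with the value k
X = np.repeat(np.arange(4), 4).reshape(4, 4)
tiled = tile_images(X, rows=2, side=2)
print(tiled)
```

Images 0 and 1 end up side by side in the top half of `tiled`, and images 2 and 3 in the bottom half.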

In [3]:
def updateProgress(msg):
    sys.stdout.write('\r')
    sys.stdout.write(msg)
    sys.stdout.flush()

In [4]:
def variable_summaries(var, name):  
    with tf.name_scope('summaries_'+str(name)):  
        mean = tf.reduce_mean(var)  
        tf.summary.scalar('mean', mean)  
        stddev = tf.sqrt(tf.reduce_mean(tf.square(var - mean)))  
        tf.summary.scalar('stddev', stddev)  
        tf.summary.scalar('max', tf.reduce_max(var))  
        tf.summary.scalar('min', tf.reduce_min(var))  
        tf.summary.histogram('histogram', var)  

<a id='Import MNIST Data'></a>
## 1. Import MNIST Data

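#### First, load the MNIST data
- Training Data: used to train the model
- Validation Data: used during training to monitor how well the current model is doing
- Testing Data: used after training to evaluate the model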

In [5]:
with lzma.open("mnist.pkl.xz", 'rb') as f:
    train_set, validation_set, test_set = pickle.load(f, encoding='latin1')

print('first element of the list is picture X',train_set[0])
print('second element is label Y',train_set[1])


first element of the list is picture X [[ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 ..., 
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]]
second element is label Y [5 0 4 ..., 8 4 8]


In [6]:
train_X, train_y = train_set
validation_X, validation_y = validation_set
test_X, test_y = test_set
print('training data size:',len(train_X))
print('validation data size:',len(validation_X))
print('testing data size:',len(test_X))
print('picture shape:',train_X[0].shape)

training data size: 50000
validation data size: 10000
testing data size: 10000
picture shape: (784,)


#### Convert the Y labels to a one-hot representation

In [7]:
train_Y = np.eye(10)[train_y]
test_Y = np.eye(10)[test_y]
validation_Y = np.eye(10)[validation_y]
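
As a small illustration of the `np.eye(10)[labels]` trick, the sketch below encodes three labels and then recovers them with `argmax`. The label values are taken from the printed output above; the variable names are only for illustration.

```python
import numpy as np

labels = np.array([5, 0, 4])        # first three training labels
# row i of the 10x10 identity matrix is the one-hot vector for class i,
# so indexing the identity matrix with the labels encodes them all at once
one_hot = np.eye(10)[labels]
print(one_hot)

# argmax along each row inverts the encoding
recovered = one_hot.argmax(axis=1)
print(recovered)
```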

<a id='開始Deep Learning'></a>
## 2. Autoencoder

**DNN, Deep Neural Network (simply many stacked layers of f(WX+B))**

**A one-layer DNN (which is really just Softmax Regression):**
![](img/dnn_1.png)

**A DNN with many layers:**
![](img/dnn_2.png)  


**What is an autoencoder?**

**What can an autoencoder do?**


- Deep Learning ABC 
    -  [A. Define the parameters](#定義參數) 
    -  [B. Design a model that predicts $\hat{X}$ from X](#設計一個) 
    -  [C. Choose a loss function](#選一個loss) 
    -  [D. Choose an optimizer](#選一個o) 
    -  [E. Run the training](#開始執行) 
    -  [F. Compute the accuracy](#算一下正)  

<a id='定義參數'></a>
### A. Define the parameters (Placeholder, Variable, Constant)
Tip: define the data X, Y that are fed into the model as placeholders, and define the weights W, B that the computer has to find as Variables.

In [24]:
# hyperparameters
lr = 0.1 # learning rate
n_inputs = 784 # dimension of each input image (28*28 pixels)
n_classes = 10  # number of output classes (digits 0-9)

In [25]:
tf.reset_default_graph()
X =tf.placeholder(tf.float32, [None, n_inputs], name="X")
#Y_ =tf.placeholder(tf.float32, [None, n_classes], name="Y_")

W = {
    'enc_wd1': tf.Variable(tf.random_normal([n_inputs,600], stddev=0.01), name="enc_wd1"),
    'enc_wd2': tf.Variable(tf.random_normal([600,480], stddev=0.01), name="enc_wd2"),
    'dec_wd1': tf.Variable(tf.random_normal([480,600], stddev=0.01), name="dec_wd1"),
    'dec_wd2': tf.Variable(tf.random_normal([600,n_inputs], stddev=0.01), name="dec_wd2"),
    'out': tf.Variable(tf.random_normal([480, n_classes]), name="out")
}
#variable_summaries(W['wd1'],'wd1')
#variable_summaries(W['wd2'],'wd2')
#variable_summaries(W['out'],'out')

B = {
    'enc_bd1': tf.Variable(tf.random_normal([600]),name="enc_bd1"),
    'enc_bd2': tf.Variable(tf.random_normal([480]), name="enc_bd2"),
    'dec_bd1': tf.Variable(tf.random_normal([600]),name="dec_bd1"),
    'dec_bd2': tf.Variable(tf.random_normal([n_inputs]), name="dec_bd2"),
    'out': tf.Variable(tf.random_normal([10]), name="out"),
}

#variable_summaries(B['bd1'],'bd1')
#variable_summaries(B['bd2'],'bd2')
#variable_summaries(B['out'],'out')


<a id='設計一個'></a>
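###  B. Design a model that predicts $\hat{X}$ from X 
Input Layer: X  
Encoder Layer 1: $H_1=f_1(W_1X+B_1)$  
Encoder Layer 2: $H_2=f_2(W_2H_1+B_2)$ (the encoded representation)  
Decoder Layer 1: $H_3=f_3(W_3H_2+B_3)$  
Output Layer: $\hat{X}=f_4(W_4H_3+B_4)$  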
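
To make the shapes concrete, here is a minimal NumPy sketch of a forward pass through the 784 → 600 → 480 → 600 → 784 stack defined by the weight shapes in section A, with sigmoid, ReLU, ReLU and sigmoid activations as in the TensorFlow cells. It is an illustration and not the notebook's code: the zero biases, the random input and all names are assumptions. The equations write $WX+B$, while the code multiplies a batch of row vectors as `h @ W + b`.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(z, 0.0)

# layer sizes: input, encoder 1, code, decoder 1, reconstruction
sizes = [784, 600, 480, 600, 784]
activations = [sigmoid, relu, relu, sigmoid]
weights = [rng.normal(0.0, 0.01, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]  # assumption: zero biases for simplicity

def forward(x):
    """Return the output of every layer for a batch x of shape (N, 784)."""
    outputs = []
    h = x
    for W, b, f in zip(weights, biases, activations):
        h = f(h @ W + b)  # linear part followed by the nonlinearity
        outputs.append(h)
    return outputs

x = rng.random((5, 784))  # five fake "images" with pixel values in [0, 1)
h1, code, h3, x_hat = forward(x)
print(h1.shape, code.shape, h3.shape, x_hat.shape)
```

The final sigmoid keeps every reconstructed pixel strictly between 0 and 1, which matches the pixel range of the input.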


### L1. Input Layer
- do nothing

### L2. Hidden Layer x 2
- linear (WX+B) + nonlinear (activation function)  


In [26]:
with tf.name_scope("Hidden_Layer_enc1"):
    # first encoder layer: 784 -> 600
    _H1 = tf.matmul(X, W['enc_wd1']) + B['enc_bd1'] 
    H1 = tf.nn.sigmoid(_H1, name="H1")

with tf.name_scope("Hidden_Layer_enc2"):
    # second encoder layer: 600 -> 480, its output is the encoded representation
    _encoded = tf.matmul(H1, W['enc_wd2']) + B['enc_bd2'] 
    encoded = tf.nn.relu(_encoded, name="encoded")

### L3. Output Layer

with tf.name_scope("Hidden_Layer_dec1"):
    _H3 = tf.matmul(encoded, W['dec_wd1']) + B['dec_bd1'] 
    H3 = tf.nn.relu(_H3, name="H3")

In [27]:
with tf.name_scope("Hidden_Layer_dec2"):
    # feed the 600-unit decoder layer H3 (not the 480-unit code) into the 600 x 784 weights
    _decoded = tf.matmul(H3, W['dec_wd2']) + B['dec_bd2'] 
    decoded = tf.nn.sigmoid(_decoded, name="decoded")

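with tf.name_scope('outlayer'):
    # optional classifier head on the 480-unit code (W['out'] is 480 x n_classes)
    _pred = tf.matmul(encoded, W['out']) + B['out']
    pred = tf.nn.softmax(_pred, name="pred")
    
# accuracy needs the label placeholder Y_, which is commented out in section A
#with tf.name_scope('accuracy'):
#    correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(Y_, 1))
#    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))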

<a id='選一個loss'></a>
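###  C. Choose a loss function as the objective of machine learning
- cross_entropy: $-\log(\Pr(Y_{true}))$. For the autoencoder the target is the input itself, so it is applied per pixel as $-X\log(\hat{X})-(1-X)\log(1-\hat{X})$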
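
The per-pixel cross-entropy can be checked on a toy example. The sketch below is a NumPy version of the same formula. The `eps` clipping is an addition that is not in the notebook; it only guards against `log(0)` when a reconstructed pixel saturates at exactly 0 or 1. The function name and the toy arrays are hypothetical.

```python
import numpy as np

def binary_cross_entropy(x, x_hat, eps=1e-7):
    """Mean per-pixel cross-entropy between target x and reconstruction x_hat."""
    # assumption: clip so that log() never sees exactly 0 or 1
    x_hat = np.clip(x_hat, eps, 1.0 - eps)
    return np.mean(-x * np.log(x_hat) - (1.0 - x) * np.log(1.0 - x_hat))

x = np.array([[0.0, 1.0, 1.0, 0.0]])     # target "pixels"
good = np.array([[0.1, 0.9, 0.8, 0.2]])  # reconstruction close to the target
bad = np.array([[0.9, 0.1, 0.2, 0.8]])   # reconstruction far from the target

loss_good = binary_cross_entropy(x, good)
loss_bad = binary_cross_entropy(x, bad)
print(loss_good, loss_bad)
```

A reconstruction close to the target gives a much smaller loss than one that inverts it, which is the behaviour the optimizer exploits.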

In [28]:
# Cost Function basic term
cross_entropy = -1. * X * tf.log(decoded) - (1. - X) * tf.log(1. - decoded)
loss = tf.reduce_mean(cross_entropy)
#loss = tf.reduce_mean(-tf.reduce_sum(X * tf.log(decoded), reduction_indices=[1]))
#loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=decoded, labels=X))

<a id='選一個o'></a>
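### D. Choose an optimizer to find the parameters W, B from the data and the objective we set
- Try adaptive gradient descent (Adagrad). At every step this method adjusts the learning rate based on the gradients of the previous steps. Roughly, it takes bigger steps at the beginning and smaller steps near the minimum, and it also takes a bigger step when the gradient suddenly becomes large. Professor Hung-yi Lee's (李宏毅) lecture videos explain this well.    
<br>
$W^{t+1}=W^t- {\frac{\eta^t}{\sigma^t}}g^t$  
$\sigma^t=\sqrt{\frac{1}{t+1}\sum_{i=0}^{t}({g^i})^2}$
<br>  
$\eta^t$: learning rate at step $t$  
$\sigma^t$: root mean square of all past gradients  
$g^t$: gradient at step $t$  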
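
The update rule can be sketched in a few lines of NumPy. This is an illustration under a stated assumption and not the internals of `tf.train.AdagradOptimizer`: the step-dependent rate is assumed to decay as $\eta^t=\eta/\sqrt{t+1}$, so the $\sqrt{t+1}$ factors cancel and the step becomes $\eta\,g^t/\sqrt{\sum_i (g^i)^2}$. The function name, the toy objective and `eps` are hypothetical.

```python
import numpy as np

def adagrad_minimize(grad_fn, w0, lr=0.5, steps=200, eps=1e-8):
    """Minimize a function with Adagrad, given a function that returns its gradient."""
    w = np.asarray(w0, dtype=float).copy()
    accum = np.zeros_like(w)  # running sum of squared gradients, one per parameter
    for _ in range(steps):
        g = grad_fn(w)
        accum += g ** 2
        # each parameter gets its own effective learning rate lr / sqrt(accum)
        w -= lr * g / (np.sqrt(accum) + eps)
    return w

def grad(w):
    # gradient of f(w) = (w[0] - 3)^2 + 10 * (w[1] + 1)^2
    return np.array([2.0 * (w[0] - 3.0), 20.0 * (w[1] + 1.0)])

w_final = adagrad_minimize(grad, w0=[0.0, 0.0])
print(w_final)
```

The two coordinates have gradients of very different scale, yet both approach the minimum at (3, -1) because every parameter is normalized by its own gradient history.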


In [29]:
optimizer = tf.train.AdagradOptimizer(lr).minimize(loss)

***

<a id='開始執行'></a>
### E. Run the training (Training Data + Validation Data)

In [30]:
init = tf.global_variables_initializer()
sess = tf.InteractiveSession()
sess.run(init)

In [31]:
epoch = 15
batch_size = 128
total_batch = len(train_X) // batch_size  # number of mini-batches per epoch
for ep in range(epoch):
    for i in range(total_batch):
        # draw a random mini-batch (no repeated image within one batch)
        rnd_idx = np.random.choice(train_X.shape[0], batch_size, replace=False)
        batch_x = train_X[rnd_idx]
        #batch_y = train_Y[rnd_idx]
        # the autoencoder only needs X: the input is also the target
        _, loss_v= sess.run([optimizer, loss], feed_dict={X: batch_x})
        if i%100 ==0:
            #loss_s, acc_s, summary= sess.run([loss], feed_dict={X: validation_X })
            updateProgress('epoch:{x0}, batch:{x4} loss:{x3}'.format(x0=ep,x3=loss_v,x4=i))
    print()


epoch:0, batch:300 loss:0.27613514661788944
epoch:1, batch:300 loss:0.26194632053375244
epoch:2, batch:300 loss:0.27446952462196354
epoch:3, batch:300 loss:0.26187404990196237

KeyboardInterrupt: 
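
The loop above draws every batch independently with `np.random.choice`, so within one "epoch" some images can be seen several times and others not at all. A common alternative, which the notebook does not use, is to shuffle the indices once per epoch and slice them into batches. The helper name `iterate_minibatches` below is hypothetical.

```python
import numpy as np

def iterate_minibatches(n_samples, batch_size, rng):
    """Yield index arrays that together cover every sample exactly once."""
    order = rng.permutation(n_samples)  # new random order for this epoch
    for start in range(0, n_samples, batch_size):
        yield order[start:start + batch_size]  # the last batch may be smaller

rng = np.random.default_rng(0)
batches = list(iterate_minibatches(n_samples=50000, batch_size=128, rng=rng))
print(len(batches), len(batches[-1]))
```

Inside the training loop, each yielded index array would take the place of `rnd_idx`, for example `batch_x = train_X[idx]`.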

In [None]:
# generate decoded image with test data
decoded_imgs = decoded.eval(feed_dict={X: test_X})

In [None]:
showX(test_X[:10])  # the original test images that decoded_imgs reconstructs

In [None]:
showX(decoded_imgs[:10])