
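## Problem 1

The file numerical_example_code.py contains the DNN example from class, implemented with numerical differentiation. This approach has a clear limitation.

If hidden_depth is doubled from 5 to 10, by what factor would you expect the running time to increase?

(Explain theoretically rather than by plugging in parameters and measuring.)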

In [None]:
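"""With numerical differentiation, one gradient descent step needs about N*(N+1) multiplications,
where N is the number of parameters: N+1 loss evaluations, each costing on the order of N multiplications.
Doubling hidden_depth roughly doubles the number of parameters to 2N, so about 2N*(2N+1) multiplications are needed.
The expected running time therefore grows by about 2N*(2N+1) / (N*(N+1)), i.e. roughly 4 times.
(The timings recorded below, 48.3 s vs 154.6 s, give a measured ratio of about 3.2.)
"""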
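
To make the estimate concrete, the sketch below counts the parameters produced by the layer layout of the DNN class defined in the next cell and applies the N*(N+1) cost model. The helper names `count_params` and `cost_model` are hypothetical, and the model ignores Python and NumPy overhead, so it is only a rough guide to wall-clock time.

```python
def count_params(hidden_depth, num_neuron=32, num_input=10, num_output=2):
    """Number of weights and biases, following the layer layout of the DNN class."""
    # First hidden layer: num_input -> num_neuron
    first = num_input * num_neuron + num_neuron
    # Remaining hidden layers: num_neuron -> num_neuron, (hidden_depth - 1) of them
    hidden = (hidden_depth - 1) * (num_neuron * num_neuron + num_neuron)
    # Output layer: num_neuron -> num_output
    output = num_neuron * num_output + num_output
    return first + hidden + output


def cost_model(n_params):
    """Rough cost of one numerical-gradient step:
    N+1 loss evaluations of about N multiplications each (an assumption)."""
    return n_params * (n_params + 1)


n5 = count_params(5)
n10 = count_params(10)
ratio = cost_model(n10) / cost_model(n5)
print(n5, n10, round(ratio, 2))
```

With the sizes used in this notebook the parameter count slightly more than doubles (4642 to 9922), so this simple model predicts a factor somewhat above 4.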

In [46]:
### Training a deep neural network with numerical differentiation

# ## Import modules


import time
import numpy as np


# ## Example code

epsilon = 0.0001

def _t(x):
    return np.transpose(x)

def _m(A, B):
    return np.matmul(A, B)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def mean_squared_error(h, y):
    return 1 / 2 * np.mean(np.square(h - y))

class Dense:
    def __init__(self, W, b, a):
        self.W = W
        self.b = b
        self.a = a
        
        self.dW = np.zeros_like(self.W)
        self.db = np.zeros_like(self.b)

    def __call__(self, x):
        return self.a(_m(_t(self.W), x) + self.b)   # matmul((ixo)T,ix1) + ox1


class DNN:
    def __init__(self, hidden_depth, num_neuron, num_input, num_output, activation=sigmoid):
        def init_var(i, o):
            return np.random.normal(0.0, 0.01, (i, o)), np.zeros((o,))

        self.sequence = list()
        # First hidden layer
        W, b = init_var(num_input, num_neuron)
        self.sequence.append(Dense(W, b, activation))
        
        # Hidden layers
        for _ in range(hidden_depth - 1):
            W, b = init_var(num_neuron, num_neuron)
            self.sequence.append(Dense(W, b, activation))

        # Output layer
        W, b = init_var(num_neuron, num_output)
        self.sequence.append(Dense(W, b, activation))

    def __call__(self, x):
        for layer in self.sequence:
            x = layer(x)
        return x

    def calc_gradient(self, x, y, loss_func):
        def get_new_sequence(layer_index, new_layer):
            new_sequence = list()
            for i, layer in enumerate(self.sequence):
                if i == layer_index:
                    new_sequence.append(new_layer)
                else:
                    new_sequence.append(layer)
            return new_sequence
        
        def eval_sequence(x, sequence):
            for layer in sequence:
                x = layer(x)
            return x
        
        loss = loss_func(self(x), y)
        
        for layer_id, layer in enumerate(self.sequence):
            for w_i, w in enumerate(layer.W):
                for w_j, ww in enumerate(w):
                    W = np.copy(layer.W)
                    W[w_i][w_j] = ww + epsilon
                    
                    new_layer = Dense(W, layer.b, layer.a)
                    new_seq = get_new_sequence(layer_id, new_layer)
                    h = eval_sequence(x, new_seq)
                    
                    num_grad = (loss_func(h, y) - loss) / epsilon # (f(x+eps) - f(x)) / eps
                    layer.dW[w_i][w_j] = num_grad
                    
            for b_i, bb in enumerate(layer.b):
                b = np.copy(layer.b)
                b[b_i] = bb + epsilon

                new_layer = Dense(layer.W, b, layer.a)
                new_seq = get_new_sequence(layer_id, new_layer)
                h = eval_sequence(x, new_seq)

                num_grad = (loss_func(h, y) - loss) / epsilon # (f(x+eps) - f(x)) / eps
                layer.db[b_i] = num_grad
        
        return loss


def gradient_descent(network, x, y, loss_obj, alpha=0.01):
    loss = network.calc_gradient(x, y, loss_obj)
    for layer in network.sequence:
        layer.W += -alpha * layer.dW
        layer.b += -alpha * layer.db
    return loss
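
The cost of the nested loops in calc_gradient is easier to see on a toy function. The sketch below is not part of the assignment code: `numerical_gradient` and the call counter are hypothetical helpers that apply the same forward difference, (f(x+eps) - f(x)) / eps, to a three-element vector and count how many times the function is evaluated.

```python
import numpy as np


def numerical_gradient(f, w, eps=1e-4):
    """Forward-difference gradient of a scalar function f at w."""
    base = f(w)                      # one evaluation at the current point
    grad = np.zeros_like(w)
    for i in range(w.size):          # one extra evaluation per parameter
        w_pert = np.copy(w)
        w_pert[i] += eps
        grad[i] = (f(w_pert) - base) / eps
    return grad


counter = {"calls": 0}


def sum_of_squares(w):
    """Toy loss whose exact gradient is 2 * w; also counts its own calls."""
    counter["calls"] += 1
    return float(np.sum(w ** 2))


w = np.array([1.0, -2.0, 3.0])
g = numerical_gradient(sum_of_squares, w)
print(g, counter["calls"])
```

For N = 3 parameters the function is evaluated N + 1 = 4 times, which is the source of the N*(N+1) estimate when each evaluation itself costs on the order of N multiplications.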

In [48]:
x = np.random.normal(0.0, 1.0, (10,))
y = np.random.normal(0.0, 1.0, (2,))

dnn = DNN(hidden_depth=10, num_neuron=32,num_input=10, num_output=2, activation=sigmoid)

t = time.time()
for epoch in range(100):
    loss = gradient_descent(dnn, x, y, mean_squared_error, 0.01)
    print('Epoch {}: Test loss{}'.format(epoch, loss))


print('{} seconds elapsed.'.format(time.time() - t))

# hidden_depth = 5: 48.27160286903381 seconds elapsed.
# hidden_depth = 10: 154.6369285583496 seconds elapsed.

Epoch 0: Test loss0.06484947652243556
Epoch 1: Test loss0.06448388702646624
Epoch 2: Test loss0.06412043730242596
Epoch 3: Test loss0.06375912569311103
Epoch 4: Test loss0.06339995025618941
Epoch 5: Test loss0.06304290876955439
Epoch 6: Test loss0.06268799873667782
Epoch 7: Test loss0.062335217391997086
Epoch 8: Test loss0.06198456170626981
Epoch 9: Test loss0.06163602839197021
Epoch 10: Test loss0.061289613908666725
Epoch 11: Test loss0.06094531446839861
Epoch 12: Test loss0.0606031260410389
Epoch 13: Test loss0.06026304435965804
Epoch 14: Test loss0.05992506492586989
Epoch 15: Test loss0.059589183015159315
Epoch 16: Test loss0.05925539368218741
Epoch 17: Test loss0.05892369176608198
Epoch 18: Test loss0.05859407189569654
Epoch 19: Test loss0.05826652849484582
Epoch 20: Test loss0.057941055787510416
Epoch 21: Test loss0.057617647802999744
Epoch 22: Test loss0.05729629838109926
Epoch 23: Test loss0.05697700117715808
Epoch 24: Test loss0.05665974966715895
Epoch 25: Test loss0.0563445371

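## Problem 2

The file backprop_example_code.py contains the DNN example implemented with the backpropagation algorithm (from example_code import *).

Implement the activation function as a **Tanh class** and train the DNN with it. (Once the Tanh class is implemented, you only need to pass activation=Tanh.)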



In [None]:
## Training a deep neural network with backpropagation

# ## Utility functions
import numpy as np
import time 

def _t(x):
    return np.transpose(x)

def _m(A, B):
    return np.matmul(A, B)

class MeanSquaredError: # 1/2 * mean((h - y)^2)  --> h - y
    def __init__(self):
        self.dh = 1
        self.last_diff = 1

    def __call__(self, h, y):
        self.last_diff = h - y
        return 1 / 2 * np.mean(np.square(self.last_diff))

    def grad(self):
        return self.last_diff
    

In [None]:
# ## Activation Function

# 1) sigmoid
class Sigmoid:
    def __init__(self):
        self.last_o = 1

    def __call__(self, x):
        self.last_o = 1.0 / (1.0 + np.exp(-x))
        return self.last_o

    def grad(self): # sigmoid(x)(1 - sigmoid(x))
        return self.last_o * (1.0 - self.last_o)


# 2) Tanh

class Tanh:
    def __init__(self):
        self.last_o = 1

    def __call__(self, x):
        self.last_o = np.tanh(x)  # equals (e^x - e^-x) / (e^x + e^-x), without overflow for large |x|
        return self.last_o

    def grad(self): # (1 + tanh(x))(1 - tanh(x))
        return (1 + self.last_o) * (1 - self.last_o)
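
A quick way to check a hand-written activation gradient is to compare it with a central difference. The sketch below relies on the identity d/dx tanh(x) = 1 - tanh(x)^2 = (1 + tanh(x))(1 - tanh(x)) used in Tanh.grad; the test points and step size are arbitrary choices.

```python
import numpy as np

x = np.linspace(-3.0, 3.0, 7)   # arbitrary test points
eps = 1e-5                      # arbitrary step size

out = np.tanh(x)
analytic = (1 + out) * (1 - out)                             # same formula as Tanh.grad
numeric = (np.tanh(x + eps) - np.tanh(x - eps)) / (2 * eps)  # central difference

max_err = np.max(np.abs(analytic - numeric))
print(max_err)
```

A small maximum error suggests the analytic gradient is consistent with the forward function.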

In [None]:
# Dense class implementation
# (weight, bias)
# Backpropagation is defined together with the forward pass
class Dense:
    def __init__(self, W, b, a_obj):
        self.W = W
        self.b = b
        self.a = a_obj()
        
        self.dW = np.zeros_like(self.W)
        self.db = np.zeros_like(self.b)
        self.dh = np.zeros_like(_t(self.W))
        
        self.last_x = np.zeros((self.W.shape[0]))
        self.last_h = np.zeros((self.W.shape[1]))
        

    def __call__(self, x):
        self.last_x = x
        self.last_h = _m(_t(self.W), x) + self.b
        return self.a(self.last_h)

    def grad(self): # dy/dh = W
        return self.W * self.a.grad()

    def grad_W(self, dh):
        grad = np.ones_like(self.W)
        grad_a = self.a.grad()
        for j in range(grad.shape[1]): # dy/dw = x
            grad[:, j] = dh[j] * grad_a[j] * self.last_x
        return grad

    def grad_b(self, dh): # dy/db = 1
        return dh * self.a.grad()


# Deep neural network class implementation
# Builds one Dense object per layer
class DNN:
    def __init__(self, hidden_depth, num_neuron, input, output, activation=Sigmoid):
        def init_var(i, o):
            return np.random.normal(0.0, 0.01, (i, o)), np.zeros((o,))

        self.sequence = list()
        # First hidden layer
        W, b = init_var(input, num_neuron)
        self.sequence.append(Dense(W, b, activation))

        # Hidden Layers
        for _ in range(hidden_depth - 1):  # the first hidden layer was already added above
            W, b = init_var(num_neuron, num_neuron)
            self.sequence.append(Dense(W, b, activation))

        # Output Layer
        W, b = init_var(num_neuron, output)
        self.sequence.append(Dense(W, b, activation))

    def __call__(self, x):
        for layer in self.sequence:
            x = layer(x)
        return x

    def calc_gradient(self, loss_obj):
        loss_obj.dh = loss_obj.grad()
        self.sequence.append(loss_obj)
        
        # back-prop loop
        for i in range(len(self.sequence) - 1, 0, -1):
            l1 = self.sequence[i]
            l0 = self.sequence[i - 1]
            
            l0.dh = _m(l0.grad(), l1.dh)
            l0.dW = l0.grad_W(l1.dh)
            l0.db = l0.grad_b(l1.dh)
        
        self.sequence.remove(loss_obj)


# ## Gradient descent training

def gradient_descent(network, x, y, loss_obj, alpha=0.01):
    loss = loss_obj(network(x), y)  # Forward inference
    network.calc_gradient(loss_obj)  # Back-propagation
    for layer in network.sequence:
        layer.W += -alpha * layer.dW
        layer.b += -alpha * layer.db
    return loss

In [None]:
x = np.random.normal(0.0, 1.0, (10,))
y = np.random.normal(0.0, 1.0, (2,))

dnn = DNN(hidden_depth=5, num_neuron=32, input=10, output=2, activation=Tanh)

loss_obj = MeanSquaredError()

t = time.time()
for epoch in range(100):
    # One forward pass, one backward pass, and one parameter update; returns the scalar loss
    loss = gradient_descent(dnn, x, y, loss_obj, 0.01)
    print('Epoch {}: Test loss{}'.format(epoch, loss))

print('{} seconds elapsed.'.format(time.time() - t))

## Problem 3

As we have seen, the two DNN implementations (numerical differentiation and backpropagation) differ markedly in training speed.

Explain why the backpropagation algorithm trains faster, and how it is implemented.

In [None]:
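"""
Numerical differentiation estimates the gradient by perturbing each scalar parameter slightly and re-evaluating the loss.
With N parameters, the loss function must therefore be evaluated N+1 times: once at the current point and once per perturbed parameter.
Since each evaluation costs on the order of N multiplications, one gradient descent step takes about N(N+1) multiplications.
Backpropagation instead stores the intermediate results of the forward pass that computes the loss, and then differentiates the loss with respect to each parameter by applying the chain rule to the composed functions in a backward pass.
As a result, backpropagation obtains all the gradients from a single loss evaluation, so training is much faster.
"""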
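
As a minimal illustration of the two approaches, the sketch below computes the gradient of a one-layer sigmoid network's loss with respect to its weights twice: once with the chain rule, reusing values stored in the forward pass, and once with forward differences. It is a simplified stand-in, not the course implementation: the names and shapes are hypothetical, and the loss is 1/2 * sum((h - y)^2) rather than the mean, so that the loss gradient is exactly h - y.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.5, (4, 3))   # (inputs, outputs), same layout as Dense.W
b = np.zeros(3)
x = rng.normal(0.0, 1.0, 4)
y = rng.normal(0.0, 1.0, 3)


def forward(W, b, x):
    z = W.T @ x + b
    return 1.0 / (1.0 + np.exp(-z))


def loss_fn(h, y):
    return 0.5 * np.sum((h - y) ** 2)


# Backpropagation: one forward pass, then the chain rule.
h = forward(W, b, x)
delta = (h - y) * h * (1.0 - h)    # dL/dz = dL/dh * dh/dz
grad_W_bp = np.outer(x, delta)     # dL/dW[i, j] = x[i] * delta[j]

# Numerical differentiation: one extra forward pass per weight.
eps = 1e-6
base = loss_fn(h, y)
grad_W_num = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        W_pert = W.copy()
        W_pert[i, j] += eps
        grad_W_num[i, j] = (loss_fn(forward(W_pert, b, x), y) - base) / eps

max_diff = np.max(np.abs(grad_W_bp - grad_W_num))
print(max_diff)
```

Both methods agree up to the finite-difference error, but the chain-rule version needs a single forward pass while the numerical version needs one per weight.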

## Problem 4: Implementing a neural network training algorithm

1. Import tensorflow and define the hyperparameter EPOCHS as 10.

In [1]:
import tensorflow as tf

EPOCHS = 10

2. Define the network structure using the `__init__` and `call` methods.

In [2]:
class Mymodel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.flatten = tf.keras.layers.Flatten()  # flattens each 32x32x3 SVHN image into a vector
        self.dense1 = tf.keras.layers.Dense(32, activation = 'relu')
        self.dense2 = tf.keras.layers.Dense(64, activation = 'relu')
        self.dense3 = tf.keras.layers.Dense(128, activation = 'relu')
        self.dense4 = tf.keras.layers.Dense(256, activation = 'relu')
        self.dense5 = tf.keras.layers.Dense(10, activation = 'softmax')
    def call(self, x, training = None, mask = None):
        x = self.flatten(x)
        x = self.dense1(x)
        x = self.dense2(x)
        x = self.dense3(x)
        x = self.dense4(x)
        return self.dense5(x)

3. Implement the training function and the test function.

In [29]:
# Training function

@tf.function
def train_step(model, images, labels, loss_object, optimizer, train_loss, train_accuracy):
    with tf.GradientTape() as tape:
        predictions = model(images)
        loss = loss_object(labels, predictions)
        
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    
    train_loss(loss)
    train_accuracy(labels, predictions)
    
    
# Test function
@tf.function
def test_step(model, images, labels, loss_object, test_loss, test_accuracy):
    predictions = model(images)
    loss = loss_object(labels, predictions)
    
    test_loss(loss)
    test_accuracy(labels, predictions)
             


4. Load the SVHN data attached to the review assignment.
  - Import scipy.io to load the data
  - 'train_32x32.mat' / 'test_32x32.mat'

In [6]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [10]:
import numpy as np
import scipy.io as sio
import matplotlib.pyplot as plt
%matplotlib inline

path = '/content/drive/MyDrive/Colab Notebooks/Bitamin_머신러닝 심화/2-15 딥러닝-3 복습'

train_data = sio.loadmat(path + '/train_32x32.mat')
test_data = sio.loadmat(path +'/test_32x32.mat')


x_train = np.array(train_data['X'])
x_test = np.array(test_data['X'])

y_train = train_data['y']
y_test = test_data['y']

In [11]:
# Fix the axes of the images

x_train = np.moveaxis(x_train, -1, 0)
x_test = np.moveaxis(x_test, -1, 0)

print(x_train.shape)
print(x_test.shape)

(73257, 32, 32, 3)
(26032, 32, 32, 3)


* The parameters are as follows.
  - shuffle = 1024, batch = 32
* When calling from_tensor_slices, pass y minus 1 to adjust the label range (e.g. y_train -> y_train-1).

In [20]:
train_ds = tf.data.Dataset.from_tensor_slices((x_train, y_train-1)).shuffle(1024).batch(32)
test_ds = tf.data.Dataset.from_tensor_slices((x_test, y_test-1)).batch(32)
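
The images loaded from the .mat files are stored as 8-bit integers in [0, 255]. A common preprocessing step, which is not part of the original assignment, is to scale them to [0, 1] before building the dataset. The sketch below uses small random arrays as stand-ins for x_train and y_train; `prepare` is a hypothetical helper, and the label layout (values 1 to 10 with shape (N, 1)) is assumed from the SVHN cropped-digit format.

```python
import numpy as np


def prepare(x, y):
    """Scale uint8 images to float32 in [0, 1] and shift labels from 1..10 to 0..9."""
    x_scaled = x.astype(np.float32) / 255.0
    y_shifted = y.astype(np.int64) - 1
    return x_scaled, y_shifted


# Stand-in data with the same layout as the arrays above (not real SVHN samples).
rng = np.random.default_rng(0)
x_fake = rng.integers(0, 256, size=(8, 32, 32, 3), dtype=np.uint8)
y_fake = rng.integers(1, 11, size=(8, 1), dtype=np.uint8)

x_ready, y_ready = prepare(x_fake, y_fake)
print(x_ready.dtype, x_ready.min(), x_ready.max(), y_ready.min(), y_ready.max())
```

The prepared arrays could then be passed to from_tensor_slices in place of (x_train, y_train-1). Large unscaled inputs often slow or stall the training of dense ReLU networks, though the effect on this particular run is not verified here.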

5. Create the model and define the loss function and the optimization algorithm.
6. Define the performance metrics.

In [14]:
# Create the model

model = Mymodel()

# Define the loss function and the optimization algorithm

loss_object = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam()
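
For intuition about what this loss measures, the sketch below reimplements sparse categorical cross-entropy in NumPy for probability inputs: the mean over the batch of -log(p[true class]). `sparse_cce` is a hypothetical helper, and the clipping constant is an assumption chosen to avoid log(0); it is not claimed to match the library's internals exactly.

```python
import numpy as np


def sparse_cce(labels, probs, eps=1e-7):
    """Mean negative log-probability of the true class (integer labels)."""
    labels = np.asarray(labels).reshape(-1)              # accepts shape (N,) or (N, 1)
    probs = np.clip(np.asarray(probs, dtype=np.float64), eps, 1.0)
    picked = probs[np.arange(labels.size), labels]       # probability of the true class
    return float(np.mean(-np.log(picked)))


probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = np.array([[0], [1]])   # shape (N, 1), like y_train - 1
loss = sparse_cce(labels, probs)
print(loss)

# A model that always predicts a uniform distribution over 10 classes scores log(10).
uniform = np.full((4, 10), 0.1)
uniform_loss = sparse_cce([0, 3, 5, 9], uniform)
print(uniform_loss)
```

The uniform baseline of log(10), about 2.30, is a useful reference point: the loss of about 2.23 recorded in the training log below is only slightly better than it.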

In [15]:
# Define the performance metrics

train_loss = tf.keras.metrics.Mean(name = 'train_loss')
train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name = 'train_accuracy')

test_loss = tf.keras.metrics.Mean(name = 'test_loss')
test_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name = 'test_accuracy')

7. Apply the functions to the actual data using a for loop (implement the training loop).

In [30]:
for epoch in range(EPOCHS):
    for images, labels in train_ds:
        train_step(model, images, labels, loss_object, optimizer, train_loss, train_accuracy)
    for images, labels in test_ds:
        test_step(model, images, labels, loss_object, test_loss, test_accuracy)
        
    template = 'Epoch {}, Loss: {}, Accuracy: {}, Test Loss: {}, Test Accuracy: {}'
    print(template.format(epoch+1,
                         train_loss.result(),
                         train_accuracy.result() *100,
                         test_loss.result(),
                         test_accuracy.result() * 100))

    train_loss.reset_states()
    train_accuracy.reset_states()
    test_loss.reset_states()
    test_accuracy.reset_states()


Epoch 1, Loss: 2.5203797817230225, Accuracy: 18.792743682861328, Test Loss: 2.22653865814209, Test Accuracy: 19.587430953979492
Epoch 2, Loss: 2.2374134063720703, Accuracy: 18.921058654785156, Test Loss: 2.227358102798462, Test Accuracy: 19.587430953979492
Epoch 3, Loss: 2.237299680709839, Accuracy: 18.921058654785156, Test Loss: 2.2277166843414307, Test Accuracy: 19.587430953979492
Epoch 4, Loss: 2.23742938041687, Accuracy: 18.921058654785156, Test Loss: 2.227597236633301, Test Accuracy: 19.587430953979492
Epoch 5, Loss: 2.2373063564300537, Accuracy: 18.921058654785156, Test Loss: 2.2279269695281982, Test Accuracy: 19.587430953979492
Epoch 6, Loss: 2.237198829650879, Accuracy: 18.921058654785156, Test Loss: 2.2267038822174072, Test Accuracy: 19.587430953979492
Epoch 7, Loss: 2.237062454223633, Accuracy: 18.921058654785156, Test Loss: 2.226311206817627, Test Accuracy: 19.587430953979492
Epoch 8, Loss: 2.237055540084839, Accuracy: 18.921058654785156, Test Loss: 2.226062774658203, Test A