## 5.2 Chain rule

$$ z = t^2 $$
$$ t = x + y $$

$$ \frac{\partial z}{\partial t} = 2t $$
$$ \frac{\partial t}{\partial x} = 1 $$

$$ \frac{\partial z}{\partial x} = \frac{\partial z}{\partial t} \frac{\partial t}{\partial x} = 2(x+y)$$
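As a quick sanity check of the chain-rule result, the sketch below compares the analytic value 2(x + y) with a central-difference estimate of ∂z/∂x. The sample point (1.5, -0.5), the step `h`, and the helper names are arbitrary choices for illustration.

```python
def z_of(x, y):
    t = x + y          # inner function t = x + y
    return t ** 2      # outer function z = t^2

def dz_dx_numeric(x, y, h=1e-4):
    # central difference in x, holding y fixed
    return (z_of(x + h, y) - z_of(x - h, y)) / (2 * h)

x, y = 1.5, -0.5
analytic = 2 * (x + y)          # chain rule: (dz/dt) * (dt/dx) = 2t * 1
numeric = dz_dx_numeric(x, y)
print(analytic, numeric)
```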

## 5.4 Implementing simple layers

In [58]:
import numpy as np

In [1]:
# Multiplication layer

class MulLayer:
    def __init__(self):
        self.x = None
        self.y = None
        
    def forward(self, x, y):
        self.x = x
        self.y = y
        out = x * y
        
        return out
    
    def backward(self, dout):
        # swap x and y: for z = x * y, dz/dx = y and dz/dy = x
        dx = dout * self.y
        dy = dout * self.x
        
        return dx, dy

In [14]:
apple = 100
apple_num = 2
tax = 1.1

mul_apple_layer = MulLayer()
mul_tax_layer = MulLayer()

apple_price = mul_apple_layer.forward(apple, apple_num)
price = mul_tax_layer.forward(apple_price, tax)

round(price)

220

In [19]:
dprice = 1
dapple_price, dtax = mul_tax_layer.backward(dprice)
dapple, dapple_num = mul_apple_layer.backward(dapple_price)

print('dapple: ', dapple)
print('dapple_num: ', round(dapple_num))
print('dtax: ', round(dtax))

dapple:  2.2
dapple_num:  110
dtax:  200


In [22]:
# Addition layer

class AddLayer:
    def __init__(self):
        pass
    
    def forward(self, x, y):
        out = x + y
        return out
    
    def backward(self, dout):
        dx = dout * 1
        dy = dout * 1
        return dx, dy

In [55]:
ap = 100      # apple price
ap_num = 2    # number of apples
og = 150      # orange price
og_num = 3    # number of oranges
tax = 1.1     # consumption tax multiplier

mul_ap = MulLayer()
mul_og = MulLayer()
add_two = AddLayer()
mul_tax = MulLayer()

# forward
ap_pr = mul_ap.forward(ap, ap_num)
og_pr = mul_og.forward(og, og_num)
sum_pr = add_two.forward(ap_pr, og_pr)
result = mul_tax.forward(sum_pr, tax)

print('result: ', round(result))


# backward
dresult = 1

dsum_pr, dtax = mul_tax.backward(dresult)

dap_pr, dog_pr = add_two.backward(dsum_pr)

dap, dap_num = mul_ap.backward(dap_pr)

dog, dog_num = mul_og.backward(dog_pr)

for i in [dresult, dsum_pr, dtax, dap_pr, dog_pr, dap, dap_num, dog, dog_num]:
    print(round(i, 2))

result:  715
1
1.1
650
1.1
1.1
2.2
110.0
3.3
165.0
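The backward-pass values above can be cross-checked numerically. This sketch writes the same graph, (ap * ap_num + og * og_num) * tax, as a plain function and estimates each partial derivative with a central difference; `total_price` and `numerical_grad` are hypothetical helper names introduced only for this check.

```python
def total_price(ap, ap_num, og, og_num, tax):
    # same computational graph as the layers above
    return (ap * ap_num + og * og_num) * tax

def numerical_grad(f, args, h=1e-4):
    # central difference for each argument in turn
    grads = []
    for i in range(len(args)):
        plus = list(args)
        plus[i] += h
        minus = list(args)
        minus[i] -= h
        grads.append((f(*plus) - f(*minus)) / (2 * h))
    return grads

args = [100, 2, 150, 3, 1.1]
grads = numerical_grad(total_price, args)
for name, g in zip(['dap', 'dap_num', 'dog', 'dog_num', 'dtax'], grads):
    print(name, round(g, 2))
```

Because the graph is linear in each input separately, the central difference should agree with the backward pass up to floating-point rounding.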


## 5.5 Implementing activation function layers

### ReLU

$$ y = \begin{cases} x & (x > 0) \\ 0 & (x \le 0) \end{cases} $$
$$ \frac{\partial y}{\partial x} = \begin{cases} 1 & (x > 0) \\ 0 & (x \le 0) \end{cases} $$

In [56]:
# ReLU layer

class Relu:
    def __init__(self):
        self.mask = None
        
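    def forward(self, x):
        self.mask = (x <= 0)  # True where the input is non-positive
        out = x.copy()
        out[self.mask] = 0    # clamp those positions to 0
        
        return out
    
    def backward(self, dout):
        # no gradient flows where the input was <= 0 (note: this modifies dout in place)
        dout[self.mask] = 0
        dx = dout
        
        return dx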

In [59]:
x = np.array([[1.0, -0.5], [-2.0, 3.0]])
print(x)

[[ 1.  -0.5]
 [-2.   3. ]]


In [60]:
mask = (x <= 0)
print(mask)

[[False  True]
 [ True False]]


In [61]:
x[mask]

array([-0.5, -2. ])
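Putting the mask to work: a compact, function-style restatement of the Relu layer above, run on the same `x`. The names `relu_forward` and `relu_backward` are illustrative, not from the source; unlike the class, this backward copies `dout` before zeroing so the caller's array is left unchanged.

```python
import numpy as np

def relu_forward(x):
    mask = (x <= 0)          # positions where the output is clamped to 0
    out = x.copy()
    out[mask] = 0
    return out, mask

def relu_backward(dout, mask):
    dx = dout.copy()         # avoid modifying the upstream gradient in place
    dx[mask] = 0             # no gradient flows where the input was <= 0
    return dx

x = np.array([[1.0, -0.5], [-2.0, 3.0]])
out, mask = relu_forward(x)
dx = relu_backward(np.ones_like(x), mask)   # upstream gradient of all ones
print(out)
print(dx)
```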

### Sigmoid

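$$y = \frac{1}{1 + \exp(-x)}$$

Since $1 - y = \frac{\exp(-x)}{1 + \exp(-x)}$, we have $y^2 \exp(-x) = y(1-y)$, so

$$\frac{\partial L}{\partial x} = \frac{\partial L}{\partial y} y^2 \exp(-x) = \frac{\partial L}{\partial y} y(1-y)$$
$$\sigma ' = \sigma (1 - \sigma)$$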
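A numerical check of the identity σ' = σ(1 − σ): the sketch compares a central-difference derivative of the sigmoid with both the intermediate form y² exp(−x) and the final form y(1 − y) on a grid of points. The grid and step size are arbitrary choices.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.linspace(-5, 5, 11)
h = 1e-4
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)   # central difference
y = sigmoid(x)
analytic = y * (1 - y)                                   # sigma' = sigma(1 - sigma)
form2 = y ** 2 * np.exp(-x)                              # intermediate form y^2 exp(-x)
print(np.max(np.abs(numeric - analytic)))
```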

In [62]:
# Sigmoid layer

class Sigmoid:
    def __init__(self):
        self.out = None
        
    def forward(self, x):
        out = 1 / (1 + np.exp(-x))
        self.out = out  # cache the output y; backward needs it for y(1-y)
        
        return out
    
    def backward(self, dout):
        dx = dout * (1.0 - self.out) * self.out # y(1-y)
        
        return dx

## 5.6 Implementing the Affine and Softmax layers

X (2,) · W (2, 3) = Y (3,): the matrix product in the Affine layer's forward pass (an affine transformation).

$$\frac{\partial L}{\partial X} = \frac{\partial L}{\partial Y} \cdot W^T \qquad (2,) = (3,) \cdot (3, 2)$$
$$\frac{\partial L}{\partial W} = X^T \cdot \frac{\partial L}{\partial Y} \qquad (2, 3) = (2, 1) \cdot (1, 3)$$

Each gradient has the same shape as the variable it belongs to:

$$X = (x_0, x_1, \ldots, x_n)$$
$$\frac{\partial L}{\partial X} = \left(\frac{\partial L}{\partial x_0}, \frac{\partial L}{\partial x_1}, \ldots, \frac{\partial L}{\partial x_n}\right)$$
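In NumPy the single-sample case needs a little care: `.T` is a no-op on a 1-D array, so `np.dot(X.T, dY)` with shapes (2,) and (3,) raises a `ValueError` instead of producing a (2, 3) matrix. A minimal sketch, assuming `np.outer` as the way to form the (2, 1) · (1, 3) product; the numbers are arbitrary.

```python
import numpy as np

X = np.array([0.5, -1.0])             # shape (2,)
W = np.arange(6.0).reshape(2, 3)      # shape (2, 3)
Y = np.dot(X, W)                      # shape (3,)

dY = np.array([1.0, 0.5, -2.0])       # upstream gradient dL/dY, shape (3,)
dX = np.dot(dY, W.T)                  # (3,) . (3, 2) -> (2,), same shape as X
dW = np.outer(X, dY)                  # (2, 1) . (1, 3) -> (2, 3), same shape as W

print(Y.shape, dX.shape, dW.shape)
```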

### Batch version of the Affine layer (input: a batch of N data points)

X (N, 2) · W (2, 3) = Y (N, 3) (affine transformation)

$$\frac{\partial L}{\partial X} = \frac{\partial L}{\partial Y} \cdot W^T \qquad (N, 2) = (N, 3) \cdot (3, 2)$$
$$\frac{\partial L}{\partial W} = X^T \cdot \frac{\partial L}{\partial Y} \qquad (2, 3) = (2, N) \cdot (N, 3)$$

In [63]:
# Broadcasting

X_dot_W = np.array([[0,0,0], [10,10,10]])
B = np.array([1,2,3])
X_dot_W

array([[ 0,  0,  0],
       [10, 10, 10]])

In [64]:
X_dot_W + B

array([[ 1,  2,  3],
       [11, 12, 13]])

In [65]:
dY = np.array([[1,2,3], [4,5,6]])
dY

array([[1, 2, 3],
       [4, 5, 6]])

In [67]:
dB = np.sum(dY, axis=0)  # B was added to every row, so its gradient sums over the batch axis
dB

array([5, 7, 9])

In [68]:
class Affine:
    def __init__(self, W, b):
        self.W = W
        self.b = b
        self.x = None
        self.dW = None
        self.db = None
        
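    def forward(self, x):
        self.x = x                          # cache the input; backward needs it for dW
        out = np.dot(x, self.W) + self.b    # b is broadcast over the batch (rows)
        
        return out
    
    def backward(self, dout):
        dx = np.dot(dout, self.W.T)         # dL/dX = dL/dY . W^T
        self.dW = np.dot(self.x.T, dout)    # dL/dW = X^T . dL/dY
        self.db = np.sum(dout, axis=0)      # sum over the batch axis
        
        return dx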
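One way to gain confidence in the batched backward pass is a finite-difference check. The sketch restates the Affine computation as plain functions so it runs on its own, uses the scalar loss L = sum(Y * G) for a fixed random G (for which dL/dY = G exactly), and compares each analytic gradient with a numerical estimate. The helper `numerical_grad`, the loss, the shapes, and the seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, M = 4, 2, 3
x = rng.standard_normal((N, D))
W = rng.standard_normal((D, M))
b = rng.standard_normal(M)
G = rng.standard_normal((N, M))       # fixed weights defining the loss; dL/dY = G

def loss():
    Y = np.dot(x, W) + b              # Affine forward (b is broadcast over rows)
    return np.sum(Y * G)

def numerical_grad(f, param, h=1e-5):
    # central difference, perturbing one entry of `param` in place at a time
    grad = np.zeros_like(param)
    for idx in np.ndindex(param.shape):
        orig = param[idx]
        param[idx] = orig + h
        f_plus = f()
        param[idx] = orig - h
        f_minus = f()
        param[idx] = orig             # restore the original value
        grad[idx] = (f_plus - f_minus) / (2 * h)
    return grad

# analytic gradients, as in Affine.backward with dout = G
dx = np.dot(G, W.T)
dW = np.dot(x.T, G)
db = np.sum(G, axis=0)

for name, analytic, param in [('dx', dx, x), ('dW', dW, W), ('db', db, b)]:
    print(name, np.max(np.abs(analytic - numerical_grad(loss, param))))
```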

In [69]:
class SoftmaxWithLoss:
    def __init__(self):
        self.loss = None
        self.y = None # output of softmax
        self.t = None # ground-truth labels (one-hot vectors)
        
    def forward(self, x, t):
        self.t = t
        self.y = softmax(x)
        self.loss = cross_entropy_error(self.y, self.t)
        
        return self.loss
    
    def backward(self, dout=1):
        batch_size = self.t.shape[0]
        dx = (self.y - self.t) / batch_size
        
        return dx
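`SoftmaxWithLoss` calls `softmax` and `cross_entropy_error`, which are not defined in this section. Below is a minimal sketch of both, assuming 2-D batch input of shape (N, classes) and one-hot labels, followed by a finite-difference check that the backward formula (y − t) / N is the gradient of the batch-mean loss. Subtracting the row maximum in `softmax` and adding a small `delta` inside the log are common numerical-stability choices; treat these exact definitions as assumptions.

```python
import numpy as np

def softmax(x):
    # subtract the row-wise max for numerical stability; the result is unchanged
    x = x - np.max(x, axis=1, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=1, keepdims=True)

def cross_entropy_error(y, t):
    # mean cross-entropy over the batch; t is one-hot, delta avoids log(0)
    delta = 1e-7
    batch_size = y.shape[0]
    return -np.sum(t * np.log(y + delta)) / batch_size

x = np.array([[0.3, 2.9, 4.0], [1.0, -1.0, 0.5]])
t = np.array([[0, 0, 1], [1, 0, 0]])

y = softmax(x)
dx = (y - t) / x.shape[0]             # backward of SoftmaxWithLoss with dout = 1

# central-difference estimate of dL/dx
h = 1e-5
num = np.zeros_like(x)
for idx in np.ndindex(x.shape):
    xp = x.copy()
    xp[idx] += h
    xm = x.copy()
    xm[idx] -= h
    num[idx] = (cross_entropy_error(softmax(xp), t)
                - cross_entropy_error(softmax(xm), t)) / (2 * h)

print(np.max(np.abs(dx - num)))
```

The tiny `delta` makes the numerical gradient differ from (y − t) / N by a relative factor of roughly delta / y for the correct class, which is far below the comparison tolerance.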

## 5.7 Implementing error backpropagation