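### 1. Implementing the sigmoid function
* Deep learning inputs are usually matrices and vectors, so the "numpy" library is used far more often than the "math" library, whose functions accept only a single number.
* Passing a vector to a numpy function produces an output with the same shape as the input.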
* $sigmoid(x) = \frac{1}{1+e^{-x}}$ 
<img src="images/Sigmoid.png" style="width:500px;height:228px;">


In [40]:
# Implement the sigmoid function with the math library
import math

def sigmoid_using_math(x):
    """
    Compute sigmoid of x.

    Arguments:
    x -- A scalar

    Return:
    s -- sigmoid(x)
    """
    
    s = 1/(1+math.exp(-x))
    return s 

In [41]:
# Single-scalar test -> works
print("scalar test: ",sigmoid_using_math(0.1))
# Vector test -> raises a TypeError, because math.exp accepts only a single number
# print("vector test: ",sigmoid_using_math(np.array([0.1, 0.5])))

scalar test:  0.52497918747894
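The vector test is commented out above because `math.exp` only accepts a single number. The sketch below shows that failure explicitly and, for comparison, a manual element-by-element workaround. The names `math_accepts_vector` and `elementwise` are illustrative and not part of the original notebook.

```python
import math
import numpy as np

def sigmoid_using_math(x):
    # Same formula as above; math.exp only accepts a single number
    return 1 / (1 + math.exp(-x))

vector = np.array([0.1, 0.5])

try:
    sigmoid_using_math(vector)
    math_accepts_vector = True
except TypeError as err:
    # A multi-element array cannot be converted to one Python float
    math_accepts_vector = False
    print("math.exp rejected the vector:", err)

# Workaround with the math library: loop over the elements one by one
elementwise = [sigmoid_using_math(v) for v in vector]
print(elementwise)
```

Looping like this works, but it gives up the shape-preserving behavior that makes numpy convenient for matrix and vector inputs.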


In [42]:
# Implement the sigmoid function with the numpy library
import numpy as np

def sigmoid_using_numpy(x):
    """
    Compute sigmoid of x.

    Arguments:
    x -- A scalar or numpy array

    Return:
    s -- sigmoid(x)
    """
    
    s = 1/(1+np.exp(-x))
    return s 

In [43]:
# Single-scalar test -> works
print("scalar test: ",sigmoid_using_numpy(0.1))
# Vector test -> works; np.exp is applied element-wise
print("vector test: ",sigmoid_using_numpy(np.array([0.1, 0.5])))

scalar test:  0.52497918747894
vector test:  [0.52497919 0.62245933]


### 2. Computing the gradient
* Compute the gradient (slope, derivative) of the sigmoid function.
* $sigmoid\_derivative(x) = \sigma'(x) = \sigma(x) (1 - \sigma(x))$
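
The identity above can be checked with the chain rule. A short derivation sketch, writing $\sigma(x)$ for $sigmoid(x)$:

\begin{align*}
\sigma(x) &= (1+e^{-x})^{-1} \\
\sigma'(x) &= -(1+e^{-x})^{-2} \cdot (-e^{-x}) = \frac{e^{-x}}{(1+e^{-x})^{2}} \\
&= \frac{1}{1+e^{-x}} \cdot \frac{e^{-x}}{1+e^{-x}} = \sigma(x)\,(1-\sigma(x))
\end{align*}

The last step uses $1 - \sigma(x) = \frac{e^{-x}}{1+e^{-x}}$.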


In [44]:
def sigmoid_gradient(x):
    """
    Compute the gradient of the sigmoid function with respect to its input x.
    
    Arguments:
    x -- A scalar or numpy array

    Return:
    grad -- Your computed gradient.
    """
    
    s = 1/(1+np.exp(-x))
    grad = s*(1-s)
    
    return grad

In [45]:
# test
sigmoid_gradient(np.array([0.1, 0.5, 0.3]))

array([0.24937604, 0.23500371, 0.24445831])
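
One way to gain confidence in the analytic gradient is to compare it with a central finite difference. This is only a sanity-check sketch: the helper `sigmoid` and the step size `eps` are assumptions introduced here, not part of the original notebook.

```python
import numpy as np

def sigmoid(x):
    # Element-wise sigmoid for scalars or numpy arrays
    return 1 / (1 + np.exp(-x))

def sigmoid_gradient(x):
    # Analytic gradient: sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1 - s)

x = np.array([0.1, 0.5, 0.3])
eps = 1e-6  # assumed step size for the finite difference

# Central difference approximation of the derivative
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
analytic = sigmoid_gradient(x)

print("largest absolute difference:", np.max(np.abs(numeric - analytic)))
```

If the two disagree by more than a small tolerance, the analytic formula or its implementation is the first thing to inspect.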

### 3. Reshaping arrays
* np.reshape can "unroll" a 3D array into a single column vector.
* For example, an image array of shape (length, height, depth) becomes a vector of shape (length * height * depth, 1).

In [46]:
def image2vector(image):
    """
    Argument:
    image -- a numpy array of shape (length, height, depth)
    
    Returns:
    v -- a vector of shape (length*height*depth, 1)
    """
    vector = image.reshape(image.shape[0]*image.shape[1]*image.shape[2], 1)
    
    return vector


In [58]:
# test 
test_image = np.array([[[ 0.67826139,  0.29380381],
                         [ 0.90714982,  0.52835647],
                         [ 0.4215251 ,  0.45017551]],

                       [[ 0.92814219,  0.96677647],
                        [ 0.85304703,  0.52351845],
                        [ 0.19981397,  0.27417313]],

                       [[ 0.60659855,  0.00533165],
                        [ 0.10820313,  0.49978937],
                        [ 0.34144279,  0.94630077]]])
print("test_image shape:", test_image.shape)
print("result:", image2vector(test_image))


test_image shape: (3, 3, 2)
result: [[0.67826139]
 [0.29380381]
 [0.90714982]
 [0.52835647]
 [0.4215251 ]
 [0.45017551]
 [0.92814219]
 [0.96677647]
 [0.85304703]
 [0.52351845]
 [0.19981397]
 [0.27417313]
 [0.60659855]
 [0.00533165]
 [0.10820313]
 [0.49978937]
 [0.34144279]
 [0.94630077]]
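
As a possible shorthand, `reshape` accepts `-1` for one dimension and infers its size from the total number of elements. The sketch below checks that this gives the same result as the explicit product of the three dimensions. The function name `image2vector_auto` is hypothetical, and the test array is made up for illustration.

```python
import numpy as np

def image2vector_auto(image):
    # Hypothetical variant: -1 lets numpy infer length*height*depth
    return image.reshape(-1, 1)

# Made-up (3, 3, 2) array standing in for an image
image = np.arange(18).reshape(3, 3, 2)

explicit = image.reshape(image.shape[0] * image.shape[1] * image.shape[2], 1)
inferred = image2vector_auto(image)

print(inferred.shape)
print(np.array_equal(explicit, inferred))
```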


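### 4. Normalization (normalizing rows)
* Normalizing the data can make gradient descent converge faster.
* The example below normalizes each row to unit length.
* With axis=1 the norm is computed along each row (one value per row); with axis=0 it is computed down each column (one value per column).
* Computing $\frac{x}{\| x\|}$ relies on broadcasting: the column of row norms is stretched across the columns of $x$.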
* $x = \begin{bmatrix}
        0 & 3 & 4 \\
        2 & 6 & 4 \\
\end{bmatrix}$
*  
$\| x\| = \text{np.linalg.norm(x, axis=1, keepdims=True)} = \begin{bmatrix}
    5 \\
    \sqrt{56} \\
\end{bmatrix}$
* 
$ x\_normalized = \frac{x}{\| x\|} = \begin{bmatrix}
    0 & \frac{3}{5} & \frac{4}{5} \\
    \frac{2}{\sqrt{56}} & \frac{6}{\sqrt{56}} & \frac{4}{\sqrt{56}} \\
\end{bmatrix}$ 

In [69]:
def normalize_rows(x):
    """
    Implement a function that normalizes each row of the matrix x (to have unit length).
    
    Argument:
    x -- A numpy matrix of shape (n, m)
    
    Returns:
    x -- The normalized (by row) numpy matrix. You are allowed to modify x.
    """
    x_norm = np.linalg.norm(x, axis=1, keepdims=True)
    x = x / x_norm   

    return x

In [70]:
# test 
test_x = np.array([[0, 3, 4],
                   [1, 6, 4]])
normalize_rows(test_x)

array([[0.        , 0.6       , 0.8       ],
       [0.13736056, 0.82416338, 0.54944226]])
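
The role of `keepdims=True` in the broadcasting step can be seen by inspecting shapes. The following is a small sketch using the matrix from the formulas above; the variable names are illustrative only.

```python
import numpy as np

x = np.array([[0.0, 3.0, 4.0],
              [2.0, 6.0, 4.0]])

# keepdims=True keeps the reduced axis, giving a (2, 1) column of row norms
norm_keep = np.linalg.norm(x, axis=1, keepdims=True)
# Without keepdims the result is a flat (2,) array
norm_flat = np.linalg.norm(x, axis=1)

print(norm_keep.shape, norm_flat.shape)

# (2, 3) / (2, 1) broadcasts: each row is divided by its own norm.
# (2, 3) / (2,) would not broadcast here, because 3 and 2 do not match.
x_normalized = x / norm_keep

# Every row should now have unit length
print(np.linalg.norm(x_normalized, axis=1))
```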

### 5. Implementing the softmax function
* softmax maps every value into the range 0 to 1, and the outputs always sum to 1 (per row when the input is a matrix).
* $\text{for } x \in \mathbb{R}^{1\times n} \text{,     }$

\begin{align*}
 softmax(x) &= softmax\left(\begin{bmatrix}
    x_1  &&
    x_2 &&
    ...  &&
    x_n  
\end{bmatrix}\right) \\&= \begin{bmatrix}
    \frac{e^{x_1}}{\sum_{j}e^{x_j}}  &&
    \frac{e^{x_2}}{\sum_{j}e^{x_j}}  &&
    ...  &&
    \frac{e^{x_n}}{\sum_{j}e^{x_j}} 
\end{bmatrix} 
\end{align*}
* $\text{for a matrix } x \in \mathbb{R}^{m \times n} \text{,  $x_{ij}$ maps to the element in the $i^{th}$ row and $j^{th}$ column of $x$, thus we have: }$  

\begin{align*}
softmax(x) &= softmax\begin{bmatrix}
            x_{11} & x_{12} & x_{13} & \dots  & x_{1n} \\
            x_{21} & x_{22} & x_{23} & \dots  & x_{2n} \\
            \vdots & \vdots & \vdots & \ddots & \vdots \\
            x_{m1} & x_{m2} & x_{m3} & \dots  & x_{mn}
            \end{bmatrix} \\ \\&= 
 \begin{bmatrix}
    \frac{e^{x_{11}}}{\sum_{j}e^{x_{1j}}} & \frac{e^{x_{12}}}{\sum_{j}e^{x_{1j}}} & \frac{e^{x_{13}}}{\sum_{j}e^{x_{1j}}} & \dots  & \frac{e^{x_{1n}}}{\sum_{j}e^{x_{1j}}} \\
    \frac{e^{x_{21}}}{\sum_{j}e^{x_{2j}}} & \frac{e^{x_{22}}}{\sum_{j}e^{x_{2j}}} & \frac{e^{x_{23}}}{\sum_{j}e^{x_{2j}}} & \dots  & \frac{e^{x_{2n}}}{\sum_{j}e^{x_{2j}}} \\
    \vdots & \vdots & \vdots & \ddots & \vdots \\
    \frac{e^{x_{m1}}}{\sum_{j}e^{x_{mj}}} & \frac{e^{x_{m2}}}{\sum_{j}e^{x_{mj}}} & \frac{e^{x_{m3}}}{\sum_{j}e^{x_{mj}}} & \dots  & \frac{e^{x_{mn}}}{\sum_{j}e^{x_{mj}}}
\end{bmatrix} \\ \\ &= \begin{pmatrix}
    softmax\text{(first row of x)}  \\
    softmax\text{(second row of x)} \\
    \vdots  \\
    softmax\text{(last row of x)} \\
\end{pmatrix} 
\end{align*}

In [76]:
def softmax_func(x):
    """Calculates the softmax for each row of the input x.

    Works for a row vector of shape (1,n) and for matrices of shape (m,n).
    A 1-D array must be reshaped to (1,n) first, because the sum is taken along axis=1.

    Argument:
    x -- A numpy matrix of shape (m,n)

    Returns:
    s -- A numpy matrix equal to the softmax of x, of shape (m,n)
    """
    x_exp = np.exp(x)
    x_sum = np.sum(x_exp, axis=1, keepdims=True)
    s = x_exp / x_sum
    
    return s

In [77]:
# test
test_x = np.array([[9, 2, 5, 0, 0],
                   [7, 5, 0, 0 ,0]])

softmax_func(test_x)

array([[9.80897665e-01, 8.94462891e-04, 1.79657674e-02, 1.21052389e-04,
        1.21052389e-04],
       [8.78679856e-01, 1.18916387e-01, 8.01252314e-04, 8.01252314e-04,
        8.01252314e-04]])
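
For large inputs, `np.exp` can overflow. A common variant subtracts each row's maximum before exponentiating, which leaves the softmax value unchanged because the shift cancels in the ratio. The sketch below is that variant, not the notebook's implementation, and the name `softmax_stable` is hypothetical.

```python
import numpy as np

def softmax_stable(x):
    # Subtracting the row maximum does not change the result,
    # since exp(x - c) / sum(exp(x - c)) == exp(x) / sum(exp(x))
    shifted = x - np.max(x, axis=1, keepdims=True)
    x_exp = np.exp(shifted)
    return x_exp / np.sum(x_exp, axis=1, keepdims=True)

small = np.array([[9.0, 2.0, 5.0, 0.0, 0.0],
                  [7.0, 5.0, 0.0, 0.0, 0.0]])
# Made-up large values; np.exp(1000.0) alone would overflow to inf
large = np.array([[1000.0, 1001.0, 1002.0]])

print(softmax_stable(small).sum(axis=1))  # each row sums to 1
print(softmax_stable(large))              # finite values despite large inputs
```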

### 6. Vectorize instead of using for loops
* Dot, outer, and element-wise products are computed more efficiently with vectorized numpy calls than with explicit Python loops.

In [90]:
import time

x1 = [9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0]
x2 = [9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0]

#### Example: dot, outer, element-wise, and general dot products

In [104]:
### dot product ###
# Loop version: multiply element by element and accumulate the sum
tic = time.process_time()
dot = 0
for i in range(len(x1)):
    dot += x1[i] * x2[i]
toc = time.process_time()
# print ("dot = " + str(dot) + "\nelapsed time = " + str(1000 * (toc - tic)) + "ms")

# Vectorized version
tic = time.process_time()
dot = np.dot(x1,x2)
toc = time.process_time()
# print ("dot = " + str(dot) + "\nelapsed time = " + str(1000 * (toc - tic)) + "ms")

### outer product ###
# Loop version: fill the result one entry at a time
tic = time.process_time()
outer = np.zeros((len(x1), len(x2)))
for i in range(len(x1)):
    for j in range(len(x2)):
        outer[i,j] = x1[i] * x2[j]
toc = time.process_time()
# print("\n")
# print ("outer = " + str(outer) + "\nelapsed time = " + str(1000 * (toc - tic)) + "ms")

# Vectorized version
tic = time.process_time()
outer = np.outer(x1,x2)
toc = time.process_time()
# print ("outer = " + str(outer) + "\nelapsed time = " + str(1000 * (toc - tic)) + "ms")

### elementwise ###
# Loop version: multiply one pair of elements at a time
tic = time.process_time()
mul = np.zeros(len(x1))
for i in range(len(x1)):
    mul[i] = x1[i] * x2[i]
toc = time.process_time()
# print("\n")
# print ("elementwise multiplication = " + str(mul) + "\nelapsed time = " + str(1000 * (toc - tic)) + "ms")

# Vectorized version
tic = time.process_time()
mul = np.multiply(x1,x2)
toc = time.process_time()
# print ("elementwise multiplication = " + str(mul) + "\nelapsed time = " + str(1000 * (toc - tic)) + "ms")

### general dot product ###
W = np.random.rand(3,len(x1)) # Random 3*len(x1) numpy array

# Loop version: accumulate each row of W times x1
tic = time.process_time()
gdot = np.zeros(W.shape[0])
for i in range(W.shape[0]):
    for j in range(len(x1)):
        gdot[i] += W[i,j] * x1[j]
toc = time.process_time()
# print("\n")
# print ("gdot = " + str(gdot) + "\nelapsed time = " + str(1000 * (toc - tic)) + "ms")

# Vectorized version
tic = time.process_time()
gdot = np.dot(W,x1)
toc = time.process_time()
# print ("gdot = " + str(gdot) + "\nelapsed time = " + str(1000 * (toc - tic)) + "ms")
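
With only 30 elements, the timings above are likely dominated by call overhead, so the difference may be hard to see. The sketch below repeats the dot-product comparison on larger arrays. The array size, the random seed, and the switch to `time.perf_counter` (a wall-clock timer, instead of `time.process_time`) are assumptions made for illustration, and the measured times will vary by machine.

```python
import time
import numpy as np

rng = np.random.default_rng(0)   # assumed seed, for reproducibility only
a = rng.random(100_000)          # assumed size, large enough to time
b = rng.random(100_000)

# Loop version: multiply element by element and accumulate the sum
tic = time.perf_counter()
loop_dot = 0.0
for i in range(len(a)):
    loop_dot += a[i] * b[i]
loop_ms = 1000 * (time.perf_counter() - tic)

# Vectorized version
tic = time.perf_counter()
vec_dot = np.dot(a, b)
vec_ms = 1000 * (time.perf_counter() - tic)

print(f"loop: {loop_ms:.3f} ms, vectorized: {vec_ms:.3f} ms")
print("results agree:", np.isclose(loop_dot, vec_dot))
```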


### 7. Implementing loss functions
* L1 loss
$$\begin{align*} & L_1(\hat{y}, y) = \sum_{i=0}^{m-1}|y^{(i)} - \hat{y}^{(i)}| \end{align*}$$
* L2 loss
$$\begin{align*} & L_2(\hat{y},y) = \sum_{i=0}^{m-1}(y^{(i)} - \hat{y}^{(i)})^2 \end{align*}$$

In [105]:
def L1(y, yhat):
    """
    Arguments:
    yhat -- vector of size m (predicted labels)
    y -- vector of size m (true labels)
    
    Returns:
    loss -- the value of the L1 loss function defined above
    """
    loss = np.sum(abs(y-yhat))
    
    return loss

In [106]:
def L2(y, yhat):
    """
    Arguments:
    yhat -- vector of size m (predicted labels)
    y -- vector of size m (true labels)
    
    Returns:
    loss -- the value of the L2 loss function defined above
    """

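    # For 1-D vectors, the dot product of the residual with itself
    # is already the sum of squared errors, so no extra np.sum is needed
    loss = np.dot(y-yhat, y-yhat)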
    
    return loss
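
The notebook does not include a test cell for the two losses. A minimal usage sketch follows, with made-up labels and predictions. The functions are restated so the block runs on its own, and `np.abs` stands in for the built-in `abs`, which gives the same result on numpy arrays.

```python
import numpy as np

def L1(y, yhat):
    # Sum of absolute errors
    return np.sum(np.abs(y - yhat))

def L2(y, yhat):
    # Sum of squared errors, via the dot product of the residual with itself
    return np.dot(y - yhat, y - yhat)

# Made-up example values: true labels and predicted probabilities
y = np.array([1, 0, 0, 1, 1])
yhat = np.array([0.9, 0.2, 0.1, 0.4, 0.9])

print("L1 =", L1(y, yhat))  # about 1.1
print("L2 =", L2(y, yhat))  # about 0.43
```

The absolute errors here are 0.1, 0.2, 0.1, 0.6, and 0.1, so L1 is their sum and L2 is the sum of their squares.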