## Logistic Regression as a Neural Network

### Logistic Regression as a Neural Network
1. Logistic Regression
    1. Given $x$, we want $\hat{y} = P(y=1 \mid x)$, with $0 \leq \hat{y} \leq 1$.
    2. Output: $\hat{y} = \sigma(w^Tx + b)$
    3. where $z = w^Tx + b$ and $\sigma(z) = \frac{1}{1 + e^{-z}}$
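A minimal NumPy sketch of this forward computation, assuming `w` and a single example `x` are 1-D arrays of the same length. The helper name `predict_proba` and the numeric values are illustrative assumptions, not part of the notes.

```python
import numpy as np

def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(w, b, x):
    """Hypothetical helper: y_hat = sigma(w^T x + b), an estimate of P(y=1 | x)."""
    z = np.dot(w, x) + b
    return sigmoid(z)

# Illustrative values (assumed): z = 0.5*2 - 0.25*4 + 0.1 = 0.1
w = np.array([0.5, -0.25])
b = 0.1
x = np.array([2.0, 4.0])
y_hat = predict_proba(w, b, x)
print(y_hat)  # about 0.525, always strictly between 0 and 1
```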

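2. Logistic Regression Cost Function
    1. Given $\{(x_1, y_1), (x_2, y_2), \dots, (x_m, y_m)\}$, we want $\hat{y}_i \approx y_i$.
    2. A first candidate is the squared error $L(\hat{y}, y) = \frac{1}{2}(\hat{y} - y)^2$, but combined with the sigmoid it makes the optimization problem non-convex.
    3. To obtain a convex optimization problem, define the cross-entropy loss $L(\hat{y}, y) = -(y\log\hat{y} + (1-y)\log(1-\hat{y}))$.
    4. Cost function: $J(w, b) = \frac{1}{m}\sum_{i=1}^{m}L(\hat{y}_i, y_i) = -\frac{1}{m}\sum_{i=1}^{m}[y_i\log\hat{y}_i + (1-y_i)\log(1-\hat{y}_i)]$
    5. In other words, the loss measures performance on a single training example, while the cost measures how well the parameters $w, b$ perform on the entire training set.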
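To make the loss/cost distinction concrete, here is a small sketch that computes one cross-entropy loss per example and then averages them into the cost. The predictions and labels are made-up values for $m = 4$ examples; predictions are assumed to lie strictly inside $(0, 1)$ so the logarithms are finite.

```python
import numpy as np

def cross_entropy_loss(y_hat, y):
    """Per-example loss L(y_hat, y) = -(y log y_hat + (1 - y) log(1 - y_hat)).

    Assumes every y_hat is strictly between 0 and 1.
    """
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def cost(y_hat, y):
    """Cost J: the average of the per-example losses over all m examples."""
    return np.mean(cross_entropy_loss(y_hat, y))

# Hypothetical predictions and labels for m = 4 examples
y_hat = np.array([0.9, 0.2, 0.7, 0.4])
y = np.array([1, 0, 1, 0])
print(cross_entropy_loss(y_hat, y))  # one loss value per example
print(cost(y_hat, y))                # a single number for the whole set
```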

3. Gradient Descent
    1. Find $w, b$ that minimize $J(w, b)$; $\alpha$ below is the learning rate.
    2. Repeat {  
        $w := w - \alpha \frac{dJ(w)}{dw}$  
        }
    3. With both parameters $w$ and $b$, use partial derivatives. Repeat {  
        $w := w - \alpha \frac{\partial J(w, b)}{\partial w}$  
        $b := b - \alpha \frac{\partial J(w, b)}{\partial b}$  
        }
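To see the update rule on its own, here is a sketch on a hypothetical one-parameter cost $J(w) = (w - 3)^2$ with derivative $2(w - 3)$. The cost, the learning rate `alpha=0.1`, and the iteration count are arbitrary choices for illustration.

```python
def gradient_descent(dJ_dw, w0, alpha=0.1, num_iters=100):
    """Repeat w := w - alpha * dJ/dw for a fixed number of iterations."""
    w = w0
    for _ in range(num_iters):
        w = w - alpha * dJ_dw(w)
    return w

def dJ_dw(w):
    """Derivative of the hypothetical cost J(w) = (w - 3)^2."""
    return 2 * (w - 3)

w_star = gradient_descent(dJ_dw, w0=0.0)
print(w_star)  # close to the minimizer w = 3
```

With this $\alpha$, each step shrinks the distance to the minimum by a factor of 0.8; a much larger $\alpha$ (here, above 1) would make the iterates diverge instead.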

4. Computation Graph
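    1. Let $J(a, b, c) = 3(a + bc)$, computed in steps as $u = bc$, $v = a + u$, $J = 3v$.
    2. Forward propagation computes $J$ from the inputs; backward propagation computes the derivatives from $J$ back to the inputs.
    ![forward and backward propagation](figures/fp.png)
    3. Chain rule: $\frac{dJ}{dc} = \frac{dJ}{dv}\frac{dv}{du}\frac{du}{dc}$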
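The forward and backward passes for this graph can be written out by hand. This sketch uses the intermediate values $u = bc$ and $v = a + u$ and assumes the example inputs a = 5, b = 3, c = 2.

```python
def forward(a, b, c):
    """Forward pass: u = b*c, v = a + u, J = 3*v."""
    u = b * c
    v = a + u
    J = 3 * v
    return u, v, J

def backward(b, c):
    """Backward pass: apply the chain rule from J back to each input."""
    dJ_dv = 3.0            # J = 3v
    dJ_du = dJ_dv * 1.0    # v = a + u, so dv/du = 1
    dJ_da = dJ_dv * 1.0    # v = a + u, so dv/da = 1
    dJ_db = dJ_du * c      # u = b*c,   so du/db = c
    dJ_dc = dJ_du * b      # u = b*c,   so du/dc = b
    return dJ_da, dJ_db, dJ_dc

a, b, c = 5.0, 3.0, 2.0
print(forward(a, b, c))  # (6.0, 11.0, 33.0)
print(backward(b, c))    # (3.0, 6.0, 9.0)
```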

5. Logistic Regression Gradient Descent
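    1. For one example, with $a = \hat{y} = \sigma(z)$: $L(a, y) = -(y\log a + (1-y)\log(1-a))$
    ![logistic regression computation graph](figures/lrgd.png)
        * $\frac{dL}{da} = -\frac{y}{a} + \frac{1-y}{1-a}$
        * $\frac{dL}{dz} = \frac{dL}{da}\frac{da}{dz} = \frac{dL}{da} \cdot a(1-a) = a - y$, using $\sigma'(z) = \sigma(z)(1-\sigma(z))$
        * $\frac{dL}{dw} = \frac{dL}{da}\frac{da}{dz}\frac{dz}{dw} = x\,dz$, where $dz$ denotes $\frac{dL}{dz}$; similarly $\frac{dL}{db} = dz$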
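One way to convince yourself that $dz = a - y$, $dw = x\,dz$, and $db = dz$ are right is to compare them with a central finite-difference approximation of the loss. This is a sketch on one made-up example; `loss`, `grads`, and the step size `eps` are illustrative choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, b, x, y):
    """Cross-entropy loss for a single example."""
    a = sigmoid(np.dot(w, x) + b)
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

def grads(w, b, x, y):
    """Analytic gradients from the chain rule: dz = a - y, dw = x*dz, db = dz."""
    a = sigmoid(np.dot(w, x) + b)
    dz = a - y
    return x * dz, dz

# Hypothetical single example
w, b = np.array([0.3, -0.2]), 0.1
x, y = np.array([1.5, 2.0]), 1.0
dw, db = grads(w, b, x, y)

# Central differences: (L(p + eps) - L(p - eps)) / (2 * eps) for each parameter p
eps = 1e-6
dw_num = np.array([
    (loss(w + eps * e, b, x, y) - loss(w - eps * e, b, x, y)) / (2 * eps)
    for e in np.eye(len(w))
])
db_num = (loss(w, b + eps, x, y) - loss(w, b - eps, x, y)) / (2 * eps)
print(dw, dw_num)
print(db, db_num)
```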

6. Batch Gradient Descent 
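    1. In each iteration, accumulate the gradients $dw$ and $db$ over all $m$ examples, divide by $m$, then update all of $w$ and $b$.
    2. Use vectorized operations instead of explicit loops to perform these updates.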
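For contrast with the vectorized approach, here is a sketch of one batch step written with explicit loops over examples and features, accumulating and then dividing by $m$ as described above. The function name `batch_gd_step_loops`, the data layout (`X` of shape `(n, m)`, one example per column), and the toy data are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def batch_gd_step_loops(w, b, X, Y, alpha):
    """One batch gradient-descent step using explicit for loops.

    X has shape (n, m) with one example per column; Y has shape (m,).
    Returns the updated (w, b) and the cost J at the old parameters.
    """
    n, m = X.shape
    J, dw, db = 0.0, np.zeros(n), 0.0
    for i in range(m):                        # loop over examples
        z = sum(w[j] * X[j, i] for j in range(n)) + b
        a = sigmoid(z)
        J += -(Y[i] * np.log(a) + (1 - Y[i]) * np.log(1 - a))
        dz = a - Y[i]
        for j in range(n):                    # loop over features
            dw[j] += X[j, i] * dz
        db += dz
    J, dw, db = J / m, dw / m, db / m         # average over the m examples
    return w - alpha * dw, b - alpha * db, J

# Hypothetical toy data: 2 features, 20 examples, label 1 when x1 + x2 > 0
rng = np.random.default_rng(0)
X = rng.standard_normal((2, 20))
Y = (X[0] + X[1] > 0).astype(float)

w, b, costs = np.zeros(2), 0.0, []
for _ in range(100):
    w, b, J = batch_gd_step_loops(w, b, X, Y, alpha=0.1)
    costs.append(J)
print(costs[0], costs[-1])  # the cost should decrease
```

Both inner loops are exactly what vectorization removes: the loop over features becomes a dot product, and the loop over examples becomes a matrix product.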

### Python and Vectorization

1. What is vectorization
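    * Let $w \in \mathbb{R}^n$ and $x \in \mathbb{R}^n$ be two $n$-dimensional vectors. $w^Tx + b$ can be computed either with an explicit for loop or with a single vectorized call such as `np.dot(w, x) + b`.
    * Avoid explicit for loops whenever possible.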
    

```python
import numpy as np

a = np.array([1, 2, 3, 4])
print(a)
```

```
[1 2 3 4]
```


```python
import time

import numpy as np

# Create two random 1,000,000-dimensional vectors
a = np.random.rand(1000000)
b = np.random.rand(1000000)

# Vectorized dot product
tic = time.time()
c = np.dot(a, b)
toc = time.time()

print(c)
print("Vectorized version:" + str(1000 * (toc - tic)) + "ms")

# The same dot product with an explicit for loop
c = 0
tic = time.time()
for i in range(1000000):
    c += a[i] * b[i]
toc = time.time()

print(c)
print("For loop version:" + str(1000 * (toc - tic)) + "ms")
```

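```
250379.8787853438
Vectorized version:33.2331657409668ms
250379.8787853569
For loop version:879.6980381011963ms
```

In this run the vectorized version was roughly 26 times faster. The two sums differ only in the last few digits because the floating-point additions are performed in a different order.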


2. Vectorizing Logistic Regression's Gradient Computation
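    * From item 5 (Logistic Regression Gradient Descent) above, $dz_i = a_i - y_i$.
    * Let $A = [a_1, a_2, \dots, a_m]$ and $Y = [y_1, y_2, \dots, y_m]$, so $dZ = [dz_1, dz_2, \dots, dz_m] = A - Y$.
    * $db = \frac{1}{m}\sum_{i=1}^{m} dz_i$, i.e. `np.mean(dZ)` (equivalently `np.sum(dZ) / m`).
    * $dw = \frac{1}{m}[x_1, x_2, \dots, x_m][dz_1, dz_2, \dots, dz_m]^T = \frac{1}{m}X\,dZ^T$, where $X = [x_1, x_2, \dots, x_m]$ stacks the examples as columns.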

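```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Hypothetical toy data: X is (n, m) with one example per column, Y is (1, m)
n, m, alpha = 10, 100, 0.01
X = np.random.randn(n, m)
Y = (np.random.rand(1, m) > 0.5).astype(float)
w = np.random.rand(n, 1)    # column vector, so w.T has shape (1, n)
b = 0.0                     # the bias is a scalar, not a vector
Z = np.dot(w.T, X) + b      # shape (1, m)
A = sigmoid(Z)
dZ = A - Y
dw = np.dot(X, dZ.T) / m    # matrix product, shape (n, 1); not the elementwise X * dZ.T
db = np.mean(dZ)
w = w - alpha * dw
b = b - alpha * db
```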
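The code above performs a single update. The sketch below wraps it in a training loop and checks that the vectorized $dw$ and $db$ match an explicit loop over $dz_i = a_i - y_i$. It assumes the same shapes (`X` is `(n, m)`, `Y` is `(1, m)`, `w` is `(n, 1)`); the helper names `propagate` and `train`, the learning rate, and the toy data are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def propagate(w, b, X, Y):
    """Vectorized cost and gradients for logistic regression."""
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)                       # (1, m)
    cost = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))
    dZ = A - Y                                            # (1, m)
    dw = np.dot(X, dZ.T) / m                              # (n, 1)
    db = np.mean(dZ)                                      # scalar
    return cost, dw, db

def train(X, Y, alpha=0.1, num_iters=500):
    """Repeat the vectorized update; the only loop is over iterations."""
    w, b = np.zeros((X.shape[0], 1)), 0.0
    costs = []
    for _ in range(num_iters):
        cost, dw, db = propagate(w, b, X, Y)
        w, b = w - alpha * dw, b - alpha * db
        costs.append(cost)
    return w, b, costs

# Hypothetical toy data: label 1 when the two features sum to a positive value
rng = np.random.default_rng(0)
X = rng.standard_normal((2, 200))
Y = (X[0:1] + X[1:2] > 0).astype(float)                   # (1, m)

# Compare the vectorized gradients with an explicit loop over examples
w0, b0 = rng.standard_normal((2, 1)), 0.5
_, dw, db = propagate(w0, b0, X, Y)
m = X.shape[1]
dw_loop, db_loop = np.zeros((2, 1)), 0.0
for i in range(m):
    a_i = sigmoid(float(np.dot(w0[:, 0], X[:, i])) + b0)
    dz_i = a_i - Y[0, i]
    dw_loop[:, 0] += X[:, i] * dz_i / m
    db_loop += dz_i / m
print(np.allclose(dw, dw_loop), np.isclose(db, db_loop))

w, b, costs = train(X, Y)
print(costs[0], costs[-1])  # the cost should go down
```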