# Neural Networks and Deep Learning

## Supervised learning

Examples
- home features -> price: **NN**
- ad, user info -> click on ad? (0,1): **NN**
- image -> object $(1 \dots 1000)$: **CNN**
- audio -> text transcript: **RNN**
- English -> Chinese: **RNN**
- image, radar info -> position of other cars: **Hybrid**

Data
- structured data (e.g. database tables with well-defined feature columns)
- unstructured data (e.g. audio, images, text)

## Binary classification

- $m$ training examples: $\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \dots, (x^{(m)}, y^{(m)})\}$

$X = 
\begin{bmatrix}
    \vdots & \vdots & & \vdots \\
    x^{(1)} & x^{(2)} & \ldots & x^{(m)} \\
    \vdots & \vdots & & \vdots \\
\end{bmatrix}$

- $X \in {\rm I\!R^{n_{x} \times m}}$ (each column is one training example)
- $X$.shape = $(n_{x}, m)$

$Y = 
\begin{bmatrix}
    y^{(1)} & y^{(2)} & \ldots & y^{(m)} \\
\end{bmatrix}$

- $Y \in {\rm I\!R^{1 \times m}}$
- $Y$.shape = $(1, m)$
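
A minimal NumPy sketch of this layout, assuming three made-up examples with $n_{x} = 2$ features each (the values and the names `examples` and `labels` are illustrative only):

```python
import numpy as np

# Three hypothetical training examples, each with n_x = 2 features.
examples = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
labels = [0, 1, 1]

# Stack each example as a column, so X has shape (n_x, m).
X = np.column_stack(examples)
# Labels form a single row, so Y has shape (1, m).
Y = np.array(labels).reshape(1, -1)

print(X.shape)  # (2, 3)
print(Y.shape)  # (1, 3)
```

With this convention, `X[:, i]` is the feature vector of example $i+1$.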

## Logistic regression

- given $x$, want $\hat{y} = P(y=1|x)$ (where $x \in {\rm I\!R^{n_{x}}}$ and $0 \le \hat{y} \le 1$)
- parameters: $w \in {\rm I\!R^{n_{x}}}$, $b \in {\rm I\!R}$
- output $\hat{y} = \sigma{(w^{T}x + b)}$

$\sigma{(z)} = \dfrac{1}{1+e^{-z}}$
- if $z$ large positive, $\sigma{(z)} \approx 1$
- if $z$ large negative, $\sigma{(z)} \approx 0$
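
The model output can be sketched in NumPy as follows. The function names `sigmoid` and `predict_proba` are illustrative choices, and the sketch assumes `w` and `x` are 1-D arrays of length $n_{x}$:

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)), applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(w, b, x):
    """Return y_hat = sigmoid(w^T x + b) for a single example x."""
    return sigmoid(np.dot(w, x) + b)

print(sigmoid(0.0))    # 0.5
print(sigmoid(10.0))   # close to 1
print(sigmoid(-10.0))  # close to 0
```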

## Loss function

- $L(\hat{y}, y) = -(y\log\hat{y} + (1-y)\log(1-\hat{y}))$
- if $y = 1$, $L(\hat{y}, y) = -\log\hat{y}$ => want $\hat{y}$ as large as possible ($\hat{y} \approx 1$)
- if $y = 0$, $L(\hat{y}, y) = -\log(1-\hat{y})$ => want $\hat{y}$ as small as possible ($\hat{y} \approx 0$)
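
A small sketch of this loss for a single example, assuming $0 < \hat{y} < 1$ so that both logarithms are finite (the function name `loss` is an illustrative choice):

```python
import numpy as np

def loss(y_hat, y):
    """Cross-entropy loss for one example; assumes 0 < y_hat < 1."""
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# For y = 1, a confident correct prediction costs less than a wrong one.
print(loss(0.9, 1) < loss(0.1, 1))  # True
# For y = 0, the situation is mirrored.
print(loss(0.1, 0) < loss(0.9, 0))  # True
```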

## Cost function

$J(w,b) = \dfrac{1}{m}\displaystyle\sum_{i=1}^{m}L(\hat{y}^{(i)}, y^{(i)}) = -\dfrac{1}{m}\displaystyle\sum_{i=1}^{m}\left(y^{(i)}\log\hat{y}^{(i)} + (1-y^{(i)})\log(1-\hat{y}^{(i)})\right)$
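
One way to compute this cost over all $m$ examples at once is sketched below. It assumes `X` has shape $(n_{x}, m)$, `Y` has shape $(1, m)$ and `w` is a column vector of shape $(n_{x}, 1)$. The data values are made up for illustration.

```python
import numpy as np

def cost(w, b, X, Y):
    """Average cross-entropy loss J(w, b) over the m columns of X."""
    m = X.shape[1]
    Z = w.T @ X + b                 # shape (1, m); b is broadcast
    A = 1.0 / (1.0 + np.exp(-Z))    # predictions y_hat for every example
    losses = -(Y * np.log(A) + (1 - Y) * np.log(1 - A))
    return float(np.sum(losses) / m)

# Hypothetical data: n_x = 2 features, m = 3 examples.
X_demo = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])
Y_demo = np.array([[0, 1, 1]])
w0 = np.zeros((2, 1))

# With w = 0 and b = 0 every prediction is 0.5, so J = log(2).
print(cost(w0, 0.0, X_demo, Y_demo))
```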

## Gradient descent

- want $w, b$ that minimize $J(w,b)$
- repeat the following updates, where $\alpha$ is the learning rate:
- $w := w - \alpha\dfrac{\partial J(w,b)}{\partial w}$
- $b := b - \alpha\dfrac{\partial J(w,b)}{\partial b}$
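
The update rule can be illustrated on a toy problem. The sketch below does not use the logistic regression cost. Instead it assumes a simple convex stand-in, $J(w,b) = (w-3)^2 + (b+1)^2$, chosen only because its minimum at $w = 3$, $b = -1$ is easy to verify. The learning rate and iteration count are arbitrary.

```python
def grad_J(w, b):
    """Partial derivatives of the toy cost J(w, b) = (w - 3)^2 + (b + 1)^2."""
    return 2.0 * (w - 3.0), 2.0 * (b + 1.0)

alpha = 0.1        # learning rate (arbitrary choice for this sketch)
w, b = 0.0, 0.0    # initial parameters

for _ in range(100):
    dw, db = grad_J(w, b)
    w = w - alpha * dw   # w := w - alpha * dJ/dw
    b = b - alpha * db   # b := b - alpha * dJ/db

print(w, b)  # approaches the minimizer (3, -1)
```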

## Logistic regression gradient descent

- $z = w^{T}x + b$
- $\hat{y} = a = \sigma(z)$
- $L(a,y) = -(y\log(a) + (1-y)\log(1-a))$

Example with two features and a single training example

- inputs: $x_{1}, x_{2}$; parameters: $w_{1}, w_{2}, b$
- $z = w_{1}x_{1} + w_{2}x_{2} + b$
- $a = \sigma(z)$
- $L(a,y)$

Derivatives

- $da = \dfrac{\partial L(a,y)}{\partial a} = -\dfrac{y}{a} + \dfrac{1-y}{1-a}$
- $dz = \dfrac{\partial L(a,y)}{\partial z} = \dfrac{\partial L(a,y)}{\partial a}\dfrac{\partial a}{\partial z} = \left(-\dfrac{y}{a} + \dfrac{1-y}{1-a}\right)a(1-a) = a - y$
- $dw_{1} = x_{1}dz$
- $dw_{2} = x_{2}dz$
- $db = dz$
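
The derivatives above can be checked numerically. The sketch below assumes made-up values for the inputs and parameters, uses $dz = a - y$ (the simplified form of the chain-rule product), and compares $dw_{1}$ against a central finite difference. The helper names `forward_loss` and `gradients` are illustrative.

```python
import numpy as np

def forward_loss(w1, w2, b, x1, x2, y):
    """Return (L(a, y), a) for one example with two features."""
    z = w1 * x1 + w2 * x2 + b
    a = 1.0 / (1.0 + np.exp(-z))
    return -(y * np.log(a) + (1 - y) * np.log(1 - a)), a

def gradients(w1, w2, b, x1, x2, y):
    """Analytic gradients: dz = a - y, dw_j = x_j * dz, db = dz."""
    _, a = forward_loss(w1, w2, b, x1, x2, y)
    dz = a - y
    return x1 * dz, x2 * dz, dz

# Hypothetical example and parameter values.
x1, x2, y = 1.0, 2.0, 1.0
w1, w2, b = 0.1, -0.2, 0.05

dw1, dw2, db = gradients(w1, w2, b, x1, x2, y)

# Central finite difference approximation of dL/dw1.
eps = 1e-6
num_dw1 = (forward_loss(w1 + eps, w2, b, x1, x2, y)[0]
           - forward_loss(w1 - eps, w2, b, x1, x2, y)[0]) / (2 * eps)

print(abs(dw1 - num_dw1) < 1e-6)  # True
```

The same check can be repeated for $dw_{2}$ and $db$ by perturbing `w2` or `b` instead.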