- [Binary Classification](#bincls)
	- Use logistic regression algorithm for binary classification
- [Logistic Regression](#logreg)
- [Gradient Descent](#gradecent)
- [Derivatives](#deriv)
- [Computation Graph](#compgraph)
- [Logistic Regression Gradient Descent](#logreg-gradecent)

### Binary Classification

- Process all $m$ training examples without explicit loops by using matrix operations
- Neural network computation is organized into forward and backward propagation steps
- Logistic regression (an algorithm for binary classification) is used to convey these ideas

<img src="https://i.imgur.com/w1cixJF.png" style="width:550px;height:350px;">

<img src="https://i.imgur.com/TkGyzQ9.pngK" style="width:550px;height:350px; float: left;">

Unroll the $i^{th}$ training example (three 64 by 64 matrices, one per RGB channel) into a feature vector $x^{(i)}$<br>
Stack the entire training set column by column into $X \in \mathbb{R}^{n_{x} \times m}$<br>
$X.shape=(n_{x}, m)$<br>
$Y.shape=(1, m)$


$n_{x} =$ (number of input features) $= 64\times64\times3 = 12{,}288$<br>
$m_{train} =$ (number of training examples)<br>
$m_{test} =$ (number of test examples)<br>
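The unrolling step can be sketched with NumPy as follows. The array `images` is a hypothetical stand-in filled with random pixels (not the course dataset), and `m = 5` is an arbitrary choice; only the reshape-and-transpose pattern matters.

```python
import numpy as np

m = 5  # hypothetical number of training examples
rng = np.random.default_rng(0)

# Stand-in data: m RGB images of 64x64 pixels, one image per leading index.
images = rng.integers(0, 256, size=(m, 64, 64, 3), dtype=np.uint8)

n_x = 64 * 64 * 3  # 12288 input features per image

# Flatten each image into one row, then transpose so each column is one example.
X = images.reshape(m, -1).T

# Labels go in a row vector with one entry per example (random here).
Y = rng.integers(0, 2, size=(1, m))

print(X.shape)  # (12288, 5)
print(Y.shape)  # (1, 5)
```

Column `X[:, i]` then holds the unrolled pixels of example `i`, which matches the $(n_{x}, m)$ layout above.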




<img id="logreg" src="https://i.imgur.com/9iAacgB.png" style="width:550px;height:320px;">

In linear regression, $\hat{y} = w^{T}x+b$, which can take any real value. For binary classification $\hat{y}$ should be a probability, so $0\leq\hat{y}\leq1$.<br>
In logistic regression,
$\hat{y} = \sigma(w^{T}x+b)$, where $\sigma$ is the sigmoid function.

<img src="https://i.imgur.com/ae4mzIU.png" style="width:570px;height:330px;">

$\hat{y}^{(i)}= \sigma(w^T x^{(i)} + b) = \sigma(z^{(i)}) = \frac{1}{1+e^{-z^{(i)}}}$, where $z^{(i)}= w^T x^{(i)} + b$

Given $\{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}$, we want $\hat{y}^{(i)} \approx y^{(i)}$.

When implementing logistic regression, the objective is to find parameters $w$ and $b$ that make $\hat{y}$ a good estimate of the probability that $y = 1$ given $x$.
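A minimal sketch of the forward computation for a single example. The parameter values and the choice of three features are hypothetical, picked only to keep the arithmetic easy to follow.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters and one example with n_x = 3 features.
w = np.array([[0.5], [-1.0], [2.0]])  # shape (n_x, 1)
b = 0.1
x = np.array([[1.0], [2.0], [0.5]])   # shape (n_x, 1)

z = (w.T @ x).item() + b  # z = w^T x + b
y_hat = sigmoid(z)        # estimated probability that y = 1

print(z, y_hat)
```

Whatever the value of `z`, the sigmoid maps it into the open interval (0, 1), which is what makes $\hat{y}$ usable as a probability.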


Training a logistic regression model means finding the $w$ and $b$ that minimize the overall cost function $J(w,b)$.
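For reference, the loss and cost usually paired with logistic regression are the cross-entropy forms below. The notes do not write them out, so treat this as an assumption; it is consistent with the derivative $\frac{dL}{da} = -\frac{y}{a} + \frac{1-y}{1-a}$ used in the gradient descent derivation, with $a = \hat{y}$.

$$L(\hat{y}, y) = -\left(y\log\hat{y} + (1-y)\log(1-\hat{y})\right)$$

$$J(w,b) = \frac{1}{m}\sum_{i=1}^{m} L(\hat{y}^{(i)}, y^{(i)})$$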


<img id="gradecent" src="https://i.imgur.com/m5HTDzl.png" style="width:550px;height:300px; float: left;">

- The loss function measures how well the model does on a single training example; the cost function measures how well the parameters $w$ and $b$ do on the entire training set.
- Gradient descent is the algorithm used to learn the parameters $w$ and $b$ from the training set.
- It takes iterative steps from an initial position toward the global optimum of $J(w,b)$.
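The iterative update can be sketched on a toy problem. The cost $J(w) = (w-3)^2$, the learning rate `alpha`, and the step count are illustrative assumptions, not values from the notes; the update rule $w := w - \alpha \frac{dJ}{dw}$ is the part that carries over.

```python
def J(w):
    """Toy convex cost with its minimum at w = 3 (illustrative only)."""
    return (w - 3.0) ** 2

def dJ_dw(w):
    """Derivative of the toy cost with respect to w."""
    return 2.0 * (w - 3.0)

alpha = 0.1  # learning rate (assumed value)
w = 0.0      # initial position

for step in range(100):
    # Step against the slope; the step shrinks as the slope flattens.
    w = w - alpha * dJ_dw(w)

print(w)
```

Because the toy cost is convex, every step moves `w` closer to the minimizer, mirroring the path from the initial position to the global optimum.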


<img src="https://i.imgur.com/2mGFlWu.png" style="width:550px;height:300px; float: left;">

<img id="deriv" src="https://i.imgur.com/ffqllqb.png" style="width:550px;height:280px; float: left;">

<img src="https://i.imgur.com/HlVVp1t.png" style="width:550px;height:280px; float: left;">

<img src="https://i.imgur.com/DrdyBeg.png" style="width:550px;height:280px; float: left;">

<img id="compgraph" src="https://i.imgur.com/zEX9fxz.png" style="width:550px;height:300px;">

One step of backward propagation on a computation graph yields the derivative of the final output variable with respect to the variable one step earlier in the graph.
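A small sketch of forward and backward passes on a computation graph. The function $J = 3(a + bc)$ and the input values are assumed for illustration; the figures may use a different example.

```python
# Forward pass (left to right): compute J = 3 * (a + b * c) node by node.
a, b, c = 5.0, 3.0, 2.0
u = b * c    # u = 6
v = a + u    # v = 11
J = 3.0 * v  # J = 33

# Backward pass (right to left): apply the chain rule one node at a time.
dJ_dv = 3.0          # J = 3v
dJ_da = dJ_dv * 1.0  # v = a + u, so dv/da = 1
dJ_du = dJ_dv * 1.0  # v = a + u, so dv/du = 1
dJ_db = dJ_du * c    # u = b * c, so du/db = c
dJ_dc = dJ_du * b    # u = b * c, so du/dc = b

print(J, dJ_da, dJ_db, dJ_dc)  # 33.0 3.0 6.0 9.0
```

Each backward step reuses the derivative computed at the node to its right, which is why the pass runs from the output back toward the inputs.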

<img src="https://i.imgur.com/XSPppwf.png" style="width:550px;height:300px; float: left;">

<img src="https://i.imgur.com/fTmw5jJ.png" style="width:550px;height:300px; float: left;">

- Logistic regression gradient descent

Goal: modify $w$ and $b$ to reduce the loss $L$. To do so, propagate backwards through the computation graph to compute the derivatives of $L$ with respect to $w$ and $b$.

<img id="logreg-gradecent" src="https://i.imgur.com/jKcH9jB.png" style="width:550px;height:300px;">

<img src="https://i.imgur.com/Ntnnqey.png" style="width:550px;height:300px; float: left;">

$$dz = \frac{dL}{dz} = \frac{dL}{da} \cdot \frac{da}{dz} = \left(-\frac{y}{a} + \frac{1-y}{1-a}\right)\cdot a(1-a) = a - y$$

$$\frac{da}{dz} = \frac{d\sigma(z)}{dz} = \frac{d}{dz}\left(\frac{1}{1+e^{-z}}\right) = a(1-a)$$
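The chain-rule product above simplifies algebraically to $a - y$. The sketch below checks this numerically, assuming the cross-entropy loss $L = -(y\log a + (1-y)\log(1-a))$; the values of `z` and `y` are arbitrary.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(z, y):
    """Cross-entropy loss as a function of z (assumed form of L)."""
    a = sigmoid(z)
    return -(y * math.log(a) + (1 - y) * math.log(1 - a))

z, y = 0.7, 1.0  # hypothetical values
a = sigmoid(z)

# Chain-rule form: dL/da times da/dz.
dL_da = -y / a + (1 - y) / (1 - a)
da_dz = a * (1 - a)
dz_chain = dL_da * da_dz

# Simplified form.
dz_simple = a - y

# Central finite difference as an independent check.
eps = 1e-6
dz_numeric = (loss(z + eps, y) - loss(z - eps, y)) / (2 * eps)

print(dz_chain, dz_simple, dz_numeric)
```

All three values should agree up to floating-point and finite-difference error.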

<img src="https://i.imgur.com/AhMybZm.png" style="width:550px;height:250px; float: left;">

$n_x = 2,\ m = 1$

<img src="https://i.imgur.com/2pseXrt.png" style="width:550px;height:280px; float: left;">
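Putting the pieces together, here is a hedged sketch of gradient descent for logistic regression with $n_x = 2$, vectorized over the examples. The toy dataset, learning rate, and iteration count are hypothetical, and the cost is assumed to be the mean cross-entropy loss.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical toy data: n_x = 2 features, m = 4 examples stored as columns.
X = np.array([[0.0, 1.0, 2.0, 3.0],
              [0.0, 1.0, 2.0, 3.0]])
Y = np.array([[0, 0, 1, 1]])
n_x, m = X.shape

w = np.zeros((n_x, 1))
b = 0.0
alpha = 0.5  # learning rate (assumed value)

def cost(w, b):
    """Mean cross-entropy loss over all m examples."""
    A = sigmoid(w.T @ X + b)
    return float(-np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A)))

initial_cost = cost(w, b)

for _ in range(1000):
    A = sigmoid(w.T @ X + b)    # forward pass, shape (1, m)
    dZ = A - Y                  # dL/dz for every example at once
    dw = (X @ dZ.T) / m         # average gradient for w, shape (n_x, 1)
    db = float(np.sum(dZ)) / m  # average gradient for b
    w -= alpha * dw
    b -= alpha * db

final_cost = cost(w, b)
predictions = (sigmoid(w.T @ X + b) > 0.5).astype(int)

print(initial_cost, final_cost)
print(predictions)
```

The matrix products replace an explicit loop over the $m$ examples; with $m = 1$ the same code reduces to the single-example derivatives $dw_1 = x_1\,dz$, $dw_2 = x_2\,dz$, $db = dz$.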

In [1]:
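# To install the dependencies from inside Jupyter, uncomment and run the two lines below.
# lr_utils imports h5py, so it must be installed too (otherwise: ModuleNotFoundError).
# import sys
# !conda install --yes --prefix {sys.prefix} numpy matplotlib h5py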

import numpy as np
import matplotlib.pyplot as plt
from lr_utils import load_dataset

%matplotlib inline

ModuleNotFoundError: No module named 'h5py'

In [None]:
train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()

In [None]:
# NumPy data types: https://docs.scipy.org/doc/numpy-1.10.1/user/basics.types.html

# Each image is 64x64 with three 8-bit unsigned integer channels.
print(np.iinfo(np.uint8))

# Every training image must have shape (height, width, channels) = (64, 64, 3).
for index in range(train_set_x_orig.shape[0]):
    cat = train_set_x_orig[index]
    assert cat.shape == (64, 64, 3)
