# Week 2

## Binary classification

## Logistic regression

$\hat{y} = \sigma(w^Tx+b)$

where the sigmoid function is $\sigma(z) = \frac{1}{1+e^{-z}}$

$w$ is an $n_x$ dimensional vector, and $b$ is a real number
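
The prediction formula above can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation: the helper names `sigmoid` and `predict_proba` and the values of `w`, `b`, and `x` are assumptions chosen for the example.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(w, b, x):
    """Return y_hat = sigmoid(w^T x + b) for a single feature vector x."""
    return sigmoid(np.dot(w, x) + b)

# Illustrative values (assumed, not from the notes), with n_x = 3
w = np.array([0.5, -0.25, 0.1])
b = 0.2
x = np.array([1.0, 2.0, 3.0])

y_hat = predict_proba(w, b, x)  # here w^T x + b = 0.5
print(y_hat)
```

Because the sigmoid maps any real number into the interval (0, 1), `y_hat` can be read as an estimated probability that $y = 1$.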

## Logistic regression cost function

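A squared-error loss, $L(\hat{y},y) = \frac{1}{2}(\hat{y}-y)^2$, is generally not used for logistic regression because it makes the optimization problem non-convex.

The loss (error) function used instead: $L(\hat{y},y) = -(y\log\hat{y} + (1-y)\log(1-\hat{y}))$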

Cost function: $J(w,b) = -\frac{1}{m} \sum^m_{i=1}[y^{(i)}\log\hat{y}^{(i)}+(1-y^{(i)})\log(1-\hat{y}^{(i)})]$, where $m$ is the number of training examples.

- The **loss function** computes the error for a single training example, while the **cost function** is the average of the losses over the entire training set.
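
The distinction between loss and cost can be made concrete with a short sketch. The function names `loss` and `cost` and the sample labels and predictions are assumptions made for illustration only.

```python
import numpy as np

def loss(y_hat, y):
    """Cross-entropy loss, computed element-wise (one value per example)."""
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def cost(y_hat, y):
    """Cost: the average of the per-example losses."""
    return np.mean(loss(y_hat, y))

# Illustrative labels and predictions (assumed, not from the notes)
y = np.array([1, 0, 1, 0])
y_hat = np.array([0.9, 0.1, 0.8, 0.3])

print(loss(y_hat, y))  # one loss per training example
print(cost(y_hat, y))  # a single number for the whole set
```

Confident correct predictions (such as 0.9 for a label of 1) contribute a small loss, while less confident ones (such as 0.3 for a label of 0) contribute more.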

## Gradient Descent
* Want to find $w, b$ that minimize $J(w,b)$

$w := w - \alpha \frac{dJ(w)}{dw}$

Where $\alpha$ is the learning rate, which controls the size of the step taken on each iteration of gradient descent, while the derivative term is the slope of $J$ at the current value of $w$.
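
To see the update rule in action, here is a minimal sketch on a one-dimensional convex function. The function $J(w) = (w-3)^2$, the starting point, the learning rate, and the iteration count are all assumptions picked for illustration; they are not the logistic regression cost.

```python
def J(w):
    """Toy cost function with its minimum at w = 3 (assumed example)."""
    return (w - 3.0) ** 2

def dJ_dw(w):
    """Derivative (slope) of the toy cost at w."""
    return 2.0 * (w - 3.0)

alpha = 0.1  # learning rate
w = 0.0      # starting point

for _ in range(100):
    # Gradient descent update: w := w - alpha * dJ/dw
    w = w - alpha * dJ_dw(w)

print(w)     # close to the minimizer, 3
print(J(w))  # close to the minimum cost, 0
```

When the slope is positive the update moves `w` to the left, and when it is negative the update moves `w` to the right, so in this example each step moves toward the minimum.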

$J(w,b) = \frac{1}{m} \sum^m_{i=1} L(\hat{y}^{(i)}, y^{(i)}) = -\frac{1}{m}\sum^m_{i=1}(y^{(i)}\log\hat{y}^{(i)} + (1-y^{(i)})\log(1-\hat{y}^{(i)}))$

## Derivatives

* On a straight line, the function's derivative doesn't change.

* The slope of a function can be different at different points on the function.
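
Both points can be checked numerically. The sketch below approximates the slope with a central difference; the helper `slope` and the two example functions ($3a$ and $a^2$) are assumptions chosen for illustration.

```python
def slope(f, a, eps=1e-6):
    """Approximate the derivative of f at a with a central difference."""
    return (f(a + eps) - f(a - eps)) / (2 * eps)

def line(a):
    """A straight line; its slope is 3 everywhere."""
    return 3 * a

def square(a):
    """A curve; its slope is 2a, so it depends on the point."""
    return a ** 2

for a in (2.0, 5.0):
    print(a, slope(line, a), slope(square, a))
```

The slope of the line is approximately 3 at both points, while the slope of the curve is approximately 4 at $a = 2$ and approximately 10 at $a = 5$.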

## Computation graph

* **Forward propagation** is the left-to-right calculation through the graph, which computes the output.
* One step of **backward propagation** on a computation graph yields the derivative of the final output variable.
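
As a small worked sketch of both passes, take the graph for $J = 3(a + bc)$ with intermediate nodes $u = bc$ and $v = a + u$. The function and the input values are assumptions chosen for illustration.

```python
# Illustrative inputs (assumed, not from the notes)
a, b, c = 5.0, 3.0, 2.0

# Forward propagation: left to right, computing each node in turn
u = b * c    # u = 6
v = a + u    # v = 11
J = 3 * v    # J = 33

# Backward propagation: right to left, applying the chain rule step by step
dJ_dv = 3.0          # J = 3v
dJ_du = dJ_dv * 1.0  # v = a + u, so dv/du = 1
dJ_da = dJ_dv * 1.0  # v = a + u, so dv/da = 1
dJ_db = dJ_du * c    # u = b * c, so du/db = c
dJ_dc = dJ_du * b    # u = b * c, so du/dc = b

print(J, dJ_da, dJ_db, dJ_dc)
```

Each backward step reuses the derivative computed for the node to its right, which is why the derivatives are calculated from the output back toward the inputs.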

## Derivatives with a computation graph

## Logistic regression gradient descent

## Vectorization


```python
import numpy as np

a = np.array([1,2,3,4])
print(a)
```

```text
[1 2 3 4]
```


```python
import time

import numpy as np

# Two random vectors with one million elements each
a = np.random.rand(1000000)
b = np.random.rand(1000000)

# Vectorized dot product
tic = time.time()
c = np.dot(a,b)
toc = time.time()

print(c)
print("Vectorization version: " + str(1000*(toc-tic)) + " ms")

# Same dot product with an explicit Python loop
c = 0
tic = time.time()
for i in range(1000000):
    c += a[i] * b[i]
toc = time.time()

print(c)
print("For loop: " + str(1000*(toc-tic)) + " ms")
```

```text
249997.58501630303
Vectorization version: 1.4030933380126953 ms
249997.58501630262
For loop: 619.0812587738037 ms
```


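```python
# Note: `%time` on a line by itself times an empty statement, so the
# microsecond timings below do not measure the dot product. The cell
# magic `%%time` on the first line would time the whole cell instead.
%time

a = np.random.rand(1000000)
b = np.random.rand(1000000)
c = np.dot(a,b)

print(c)
```

```text
CPU times: user 2 µs, sys: 2 µs, total: 4 µs
Wall time: 5.96 µs
249995.94173242358
```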


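```python
# Note: as a bare line magic, `%time` does not time the loop below, so the
# microsecond timings in the output are not the loop's running time.
# Use `%%time` on the first line to time the whole cell.
%time

# Reuses a and b from the previous cell
c = 0
for i in range(1000000):
    c += a[i] * b[i]

print(c)
```

```text
CPU times: user 2 µs, sys: 1e+03 ns, total: 3 µs
Wall time: 6.2 µs
249995.94173242492
```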
