## Cost function

In a previous lab, you developed the *logistic loss* function. Recall, loss is defined to apply to one example. Here you combine the losses to form the **cost**, which includes all the examples.


Recall that for logistic regression, the cost function is of the form 

$$ J(\mathbf{w},b) = \frac{1}{m} \sum_{i=0}^{m-1} \left[ loss(f_{\mathbf{w},b}(\mathbf{x}^{(i)}), y^{(i)}) \right] \tag{1}$$

where
* $loss(f_{\mathbf{w},b}(\mathbf{x}^{(i)}), y^{(i)})$ is the cost for a single data point, which is:

    $$loss(f_{\mathbf{w},b}(\mathbf{x}^{(i)}), y^{(i)}) = -y^{(i)} \log\left(f_{\mathbf{w},b}\left( \mathbf{x}^{(i)} \right) \right) - \left( 1 - y^{(i)}\right) \log \left( 1 - f_{\mathbf{w},b}\left( \mathbf{x}^{(i)} \right) \right) \tag{2}$$
    
*  where m is the number of training examples in the data set and:
$$
\begin{align}
  f_{\mathbf{w},b}(\mathbf{x^{(i)}}) &= g(z^{(i)})\tag{3} \\
  z^{(i)} &= \mathbf{w} \cdot \mathbf{x}^{(i)}+ b\tag{4} \\
  g(z^{(i)}) &= \frac{1}{1+e^{-z^{(i)}}}\tag{5} 
\end{align}
$$

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import math
import copy

In [14]:
x = np.array([[0.5, 1.5], [1,1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])  #(m,n)
y = np.array([0, 0, 0, 1, 1, 1])                                           #(m,)
w = np.array([1,1])
b = -3

In [18]:
def sigmoid(z):
    g = 1.0/(1.0+np.exp(-z))
    return g

In [21]:
def compute_logistic_loss(x,y,w,b):

    m = x.shape[0]
    cost = 0

    for i in range(m):
        z_i = np.dot(x[i],w) + b
        f_wb_i = sigmoid(z_i)
        cost += -y[i] * np.log(f_wb_i) - (1-y[i]) * np.log(1-f_wb_i)

    cost = cost / m

    return cost

## Logistic Gradient Descent


Gradient descent algorithm utilizes the gradient calculation:
$$\begin{align*}
&\text{repeat until convergence:} \; \lbrace \\
&  \; \; \;w_j = w_j -  \alpha \frac{\partial J(\mathbf{w},b)}{\partial w_j} \tag{1}  \; & \text{for j := 0..n-1} \\ 
&  \; \; \;  \; \;b = b -  \alpha \frac{\partial J(\mathbf{w},b)}{\partial b} \\
&\rbrace
\end{align*}$$

Where each iteration performs simultaneous updates on $w_j$ for all $j$, where
$$\begin{align*}
\frac{\partial J(\mathbf{w},b)}{\partial w_j}  &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})x_{j}^{(i)} \tag{2} \\
\frac{\partial J(\mathbf{w},b)}{\partial b}  &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}) \tag{3} 
\end{align*}$$

* m is the number of training examples in the data set      
* $f_{\mathbf{w},b}(x^{(i)})$ is the model's prediction, while $y^{(i)}$ is the target
* For a logistic regression model  
    $z = \mathbf{w} \cdot \mathbf{x} + b$  
    $f_{\mathbf{w},b}(x) = g(z)$  
    where $g(z)$ is the sigmoid function:  
    $g(z) = \frac{1}{1+e^{-z}}$   
    

In [26]:
def compute_logistic_gradient(x,y,w,b):
    m, n = x.shape
    dj_dw = np.zeros((n,))
    dj_db = 0.0

    for i in range(m):
        z_i = np.dot(x[i], w) + b
        f_wb_i = sigmoid(z_i)
        err = f_wb_i - y[i]
        for j in range(n):
            dj_dw[j] = dj_dw[j] +  err * x[i,j]
        dj_db = dj_db + err 

    dj_db = dj_db/m
    dj_dw = dj_dw/m

    return dj_dw, dj_db

In [28]:
def gradient_descent(X,y,w_in,b_in,alpha,num_iters):
    J_history = []
    w = copy.deepcopy(w_in)
    b = b_in

    for i in range(num_iters):
        dj_dw, dj_db = compute_logistic_gradient(X,y,w,b)

        w = w - alpha * dj_dw
        b = b - alpha * dj_db

        if i<100000000:
            J_history.append(compute_logistic_gradient(X,y,w,b))
        
        if i% math.ceil(num_iters / 10) == 0:
            print(f"Iteration {i:4d}: Cost {J_history[-1]}   ")
    
    return w,b,J_history



In [30]:
w_tmp  = np.zeros_like(x[0])
b_tmp  = 0.
alph = 0.1
iters = 10000

w_out, b_out, _ = gradient_descent(x, y, w_tmp, b_tmp, alph, iters) 
print(f"\nupdated parameters: w:{w_out}, b:{b_out}")

Iteration    0: Cost (array([-0.22449108, -0.14601523]), 0.014924742767851254)   
Iteration 1000: Cost (array([-0.01154724, -0.01179442]), 0.032922637855873425)   
Iteration 2000: Cost (array([-0.00639134, -0.0064637 ]), 0.017933890941894386)   
Iteration 3000: Cost (array([-0.00437133, -0.00440444]), 0.01219290340280727)   
Iteration 4000: Cost (array([-0.00331049, -0.00332914]), 0.009205288270528272)   
Iteration 5000: Cost (array([-0.00266019, -0.00267204]), 0.007382936755160681)   
Iteration 6000: Cost (array([-0.00222181, -0.00222997]), 0.0061583046606211835)   
Iteration 7000: Cost (array([-0.00190669, -0.00191261]), 0.005279886586599503)   
Iteration 8000: Cost (array([-0.00166943, -0.00167391]), 0.004619558128642703)   
Iteration 9000: Cost (array([-0.00148444, -0.00148794]), 0.0041053232770014575)   

updated parameters: w:[5.28123029 5.07815608], b:-14.222409982019837
