# ***TO USE***

Go to File->Save a copy in Drive. This is the master copy, which non-admin members are not allowed to edit.

***Before starting, use the shortcut Ctrl+] to collapse all cells in the notebook.*** This ensures that solutions will not be spoiled before you answer the coding questions in this notebook. You can safely unhide the outermost cells; solutions will still be hidden.

# **Building a logistic regression model**

This notebook will walk you through building a logistic regression model in Python.

# Importing

`numpy` is a library (package of functions and methods) for scientific computing. For the most part, we'll be using it to work with matrices in Python.

The `as np` part of the statement just changes the name we use to refer to `numpy`. Machine learning practitioners prefer the shorthand because it's less to type.

\

Unhide the following cell, and run the code by either:
- clicking the "run" button on the left of the cell
- using the shortcut **Ctrl+Enter**

In [None]:
import numpy as np

# Before we start...

What are the dimensions of $X$ and $y$ again?

(Use the presentation's convention that $m$ is the number of instances and $n$ is the number of features.)

### Solution:

$X$ is $(n, m)$. $y$ is $(1, m)$.

$$X = \begin{bmatrix}
| & | & | & ... & | \\
| & | & | & ... & | \\
x^{(1)} & x^{(2)} & x^{(3)} & ... & x^{(m)} \\
| & | & | & ... & | \\
| & | & | & ... & |
\end{bmatrix}$$

\\

$$y = \begin{bmatrix}
y^{(1)} &  y^{(2)} &  y^{(3)} &  ... & y^{(m)}
\end{bmatrix}$$
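To make the shape convention concrete, here is a small sketch with made-up data. The names `X_demo` and `y_demo` and all of the values are illustrative assumptions, not part of the lesson's dataset. Each column of `X_demo` is one instance.

```python
import numpy as np

# Hypothetical toy data: n = 2 features, m = 3 instances.
# Each COLUMN is one instance x^(i).
X_demo = np.array([[0.5, 1.5, 2.5],
                   [1.0, 0.0, 3.0]])

# One label per instance, stored as a row vector.
y_demo = np.array([[0, 1, 1]])

n, m = X_demo.shape
print(X_demo.shape)  # (2, 3) -> (n, m)
print(y_demo.shape)  # (1, 3) -> (1, m)
```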

# Initializing $w$ and $b$

We will initialize $w$ to random values in the range $[0,1)$. (This means that every random value is at least 0 and strictly less than 1.) We will initialize $b$ to 0.

(Random initialization is technically not necessary for logistic regression, but when we cover neural networks in a later lesson, this idea becomes important. Check out this video by Dr. Andrew Ng for more: https://www.youtube.com/watch?v=6by6Xas_Kho)

\

#### **Introduction to library function:** [`np.random.rand(d0, d1)`](https://numpy.org/doc/stable/reference/random/generated/numpy.random.rand.html)

*Link above goes to official Numpy documentation.*

We will use the `rand()` function from `np.random` to generate random values. The following statement creates a matrix with dimensions $(\textrm{d0},\textrm{d1})$ where each element is a random value:

```np.random.rand(d0, d1)```
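As a quick check of what this call returns, the sketch below builds a small matrix and inspects its shape and value range. The dimensions 2 and 3 and the name `r` are arbitrary choices for illustration.

```python
import numpy as np

r = np.random.rand(2, 3)   # 2 rows, 3 columns

print(r.shape)                       # (2, 3)
# Every value is drawn uniformly from [0, 1).
print(((r >= 0) & (r < 1)).all())    # True
```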



In [None]:
# Use np.random.rand(..., ...) to initialize w.

def initialize_weights_and_bias(n):

  # START CODING HERE
  w = None
  b = None
  # END CODING HERE

  return w, b

### Solution:

In [None]:
def initialize_weights_and_bias_sol(n):

  w = np.random.rand(n, 1)
  b = 0

  return w, b

# The sigmoid function

Recall that the sigmoid or logistic function is defined as

$$a = \frac{1}{1 + e^{-z}}$$

#### **Introduction to library function:** [`np.exp(x)`](https://numpy.org/doc/stable/reference/generated/numpy.exp.html)

*Link above goes to official Numpy documentation.*

`np.exp(x)` simply returns $e^x$. Plain Python has no built-in name `e`, so writing `e ** x` raises a `NameError`; use the library function instead.
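Here is a small sketch of `np.exp` applied to a whole array at once. The array `v` is an arbitrary example. NumPy also provides the constant `np.e`, shown here only for comparison.

```python
import numpy as np

v = np.array([0.0, 1.0, 2.0])

print(np.exp(v))    # approximately [1.0, 2.718, 7.389]
print(np.e ** v)    # same values, written with the constant np.e
```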

\

We will use `sigmoid` to calculate $a$ for every example. You can simply pass in the vector $z$ as if it is a scalar value; Numpy uses *broadcasting* to simplify writing code with vectors and matrices.

Essentially, *broadcasting* lets an operation between arrays of different shapes act element by element: the smaller operand is (conceptually) stretched to match the larger one. For example, if you have a matrix `x` set to

$$\begin{bmatrix} 1 & 1 \\ 2 & 5 \end{bmatrix}$$

and you set `y = x + 1`, `y` will be set to

$$\begin{bmatrix} 2 & 2 \\ 3 & 6 \end{bmatrix}$$

Importantly, this also works between a matrix and a vector whose shapes are compatible, such as a $(2, 3)$ matrix and a vector with 3 elements. So if we have `a` set to

$$\begin{bmatrix} 1 & 9 & 8 \\ 4 & 0 & 0 \end{bmatrix}$$

and `b` set to

$$\begin{bmatrix} 3 & 5 & 1 \end{bmatrix}$$

then set `c = a + b`, `c` will be set to

$$\begin{bmatrix} 4 & 14 & 9 \\ 7 & 5 & 1 \end{bmatrix}$$

If you want more clarification, you can check out either [this Andrew Ng video](https://www.youtube.com/watch?v=tKcLaGdvabM) or the [official Numpy documentation](https://numpy.org/devdocs/user/theory.broadcasting.html).
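The two broadcasting examples above can be reproduced directly in code. This is just a sketch using the same numbers as the matrices shown in the text.

```python
import numpy as np

x = np.array([[1, 1],
              [2, 5]])
y = x + 1                   # the scalar 1 is added to every element
print(y.tolist())           # [[2, 2], [3, 6]]

a = np.array([[1, 9, 8],
              [4, 0, 0]])   # shape (2, 3)
b = np.array([3, 5, 1])     # shape (3,)
c = a + b                   # b is added to each row of a
print(c.tolist())           # [[4, 14, 9], [7, 5, 1]]
```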

In [None]:
# Numpy broadcasting means you can write the sigmoid function in just one line

def sigmoid(z):

  # START CODING HERE
  a = None
  # END CODING HERE

  return a

### Solution:

In [None]:
def sigmoid_sol(z):

  a = 1 / (1 + np.exp(-1 * z))

  return a

# Forward propagation: calculating $Z$ and $A$

Recall that $z$ is calculated as

$$z = w^TX + b$$

To get the transpose of a matrix `a`, simply use `a.T`.

In Numpy, we use the `@` operator to perform matrix multiplication. (In ancient times, we used a function called [`np.dot(a, b)`](https://numpy.org/doc/stable/reference/generated/numpy.dot.html), but the `@` operator was introduced to make code more readable.)
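To see how `.T` and `@` affect shapes, here is a short sketch with two arbitrary matrices. The names `A`, `B`, and `C` are illustrative and unrelated to the exercise's variables.

```python
import numpy as np

A = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]], shape (2, 3)
B = np.ones((2, 4))              # shape (2, 4)

print(A.T.shape)                 # (3, 2)

C = A.T @ B                      # (3, 2) @ (2, 4) -> (3, 4)
print(C.shape)                   # (3, 4)

# For 2-D arrays, @ and np.dot give the same result.
print(np.allclose(C, np.dot(A.T, B)))   # True
```

The inner dimensions must match: the number of columns of the left operand equals the number of rows of the right operand.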


In [None]:
# Use the transpose attribute (.T) and the @ operator to calculate z
# Use a previously defined function to calculate a

def forward_prop(X, w, b):

  # START CODING HERE
  z = None
  a = None
  # END CODING HERE

  return z, a

### Solution:

In [None]:
def forward_prop_sol(X, w, b):

  z = w.T @ X + b
  a = sigmoid(z)

  return z, a

# Calculating the cost

Recall that the cost function $J$ is

$$J = \frac{1}{m} \sum_{i=1}^{m} -[y^{(i)}\log(a^{(i)}) + (1-y^{(i)})\log(1-a^{(i)})]$$

Numpy broadcasting allows us to quickly write code to compute the costs on individual instances. Try to figure out how on your own! (Hint: you don't need to take the logarithm separately for each of the $m$ outputs $a$.)

Once we compute the costs on individual instances, we can use a library function to sum the elements in the resulting vector:

#### **Introduction to library function:** [`numpy.sum(a)`](https://numpy.org/doc/stable/reference/generated/numpy.sum.html)

Returns the sum of the elements of a vector (or of a matrix along an axis, which is discussed in the neural network lesson). If `a` is set to

$$\begin{bmatrix} 1 & 2 & 3 & 4 \end{bmatrix}$$

then `np.sum(a)` simply returns $10$.
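The sketch below shows `np.sum` on the vector above and, as a preview, on a small matrix with the `axis` argument. The matrix `M` is an arbitrary example.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
print(np.sum(a))                     # 10

M = np.array([[1, 2],
              [3, 4]])
print(np.sum(M))                     # 10 (all elements)
print(np.sum(M, axis=0).tolist())    # [4, 6]  (sum down each column)
print(np.sum(M, axis=1).tolist())    # [3, 7]  (sum across each row)
```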

\

We can divide the sum from `np.sum(...)` by $m$ to get our final cost.

In [None]:
# Use broadcasting and np.sum(...) to calculate cost
# Note that the value of m is already calculated from the dimensions of y

def calculate_cost(y, a):

  m = y.shape[1]

  # START CODING HERE
  cost = None
  # END CODING HERE

  return cost

### Solution:

In [None]:
def calculate_cost_sol(y, a):

  m = y.shape[1]

  cost = np.sum(-(y * np.log(a) + (1 - y) * np.log(1 - a))) / m

  return cost

# Back propagation: calculating derivatives and updating weights and bias

Recall that all the derivatives we take are derivatives of $J$, so we name each one after the variable we differentiate with respect to (i.e. for today `dz`, `dw`, `db`).

Recall the following definitions:

$$dz = a - y$$

$$dw = \frac{1}{m} (Xdz^T)$$

$$db = \frac{1}{m} \sum_{i=1}^{m} dz^{(i)}$$

Finish the back propagation function by updating $w$ and $b$ with gradient descent: subtract each derivative, multiplied by the learning rate $\alpha$, from the current value. Computing `dw` requires $X$, so the function takes `X` as its first argument, and it returns the updated `w` and `b` so the caller can use them.

In [None]:
def back_prop(X, y, a, z, w, b, learning_rate=0.0001):

  m = y.shape[1]

  # START CODING HERE
  # calculating derivatives
  dz = None
  dw = None
  db = None

  # updating w and b
  w = None
  b = None
  # END CODING HERE

  return w, b

### Solution:

In [None]:
def back_prop_sol(X, y, a, z, w, b, learning_rate=0.0001):

  m = y.shape[1]

  # calculating derivatives
  dz = a - y            # shape (1, m)
  dw = X @ dz.T / m     # shape (n, 1), same as w
  db = np.sum(dz) / m   # scalar

  # updating w and b
  w = w - learning_rate * dw
  b = b - learning_rate * db

  return w, b
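
One way to sanity-check the sign and scaling of gradient formulas is to compare them with finite differences of the cost. The sketch below does this on random data for the gradients of the cross-entropy cost, computed as $\frac{1}{m}X(a-y)^T$ and $\frac{1}{m}\sum (a-y)$. The helper `cost_fn`, the random data, and the step size `eps` are assumptions made for illustration.

```python
import numpy as np

def cost_fn(X, y, w, b):
    # Cross-entropy cost J for logistic regression.
    a = 1 / (1 + np.exp(-(w.T @ X + b)))
    return np.sum(-(y * np.log(a) + (1 - y) * np.log(1 - a))) / y.shape[1]

rng = np.random.default_rng(0)
X = rng.random((3, 5))                         # n = 3 features, m = 5 instances
y = (rng.random((1, 5)) > 0.5).astype(float)   # labels, shape (1, m)
w = rng.random((3, 1))
b = 0.0
m = y.shape[1]

# Analytic gradients of J.
a = 1 / (1 + np.exp(-(w.T @ X + b)))
dz = a - y
dw = X @ dz.T / m
db = np.sum(dz) / m

# Numerical gradients via central finite differences.
eps = 1e-6
dw_num = np.zeros_like(w)
for j in range(w.shape[0]):
    step = np.zeros_like(w)
    step[j, 0] = eps
    dw_num[j, 0] = (cost_fn(X, y, w + step, b) - cost_fn(X, y, w - step, b)) / (2 * eps)
db_num = (cost_fn(X, y, w, b + eps) - cost_fn(X, y, w, b - eps)) / (2 * eps)

print(np.allclose(dw, dw_num, atol=1e-6))   # True
print(np.isclose(db, db_num, atol=1e-6))    # True
```

Because gradient descent subtracts the gradient, a flipped sign in `dz` would push the cost up instead of down, and this check would fail.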

# Implementing the entire model

Use the functions you have written previously to:
- initialize $w$ and $b$ once, before the loop,
- run forward propagation to get values of $z$ and $a$,
- calculate the cost $J$ based on current values of $w$ and $b$, and
- run back propagation to update $w$ and $b$ using gradient descent.

The last three steps run in a loop for `num_iters` iterations, so that $w$ and $b$ gradually move toward values that fit the input data $X$. (If $w$ and $b$ were re-initialized inside the loop, every iteration would start over from random values and nothing would be learned.)

Once this function runs, congratulations! You have successfully implemented logistic regression.

In [None]:
def run(X, y, learning_rate=0.0001, num_iters=10000):

  m = X.shape[1]
  n = X.shape[0]

  # START CODE HERE
  w, b = None, None  # initialize once, before the loop

  for i in range(num_iters):
    z, a = None, None
    cost = None
    w, b = None, None  # what function do you still need to run?
  # END CODE HERE

  return w, b

### Solution:

In [None]:
def run_sol(X, y, learning_rate=0.0001, num_iters=10000):

  m = X.shape[1]
  n = X.shape[0]

  # initialize once; re-initializing inside the loop would discard all progress
  w, b = initialize_weights_and_bias(n)

  for i in range(num_iters):
    z, a = forward_prop(X, w, b)
    cost = calculate_cost(y, a)
    # back_prop is expected to take X and return the updated w and b
    w, b = back_prop(X, y, a, z, w, b, learning_rate)

  return w, b
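
To see the whole pipeline train on something, here is a self-contained sketch that inlines the same steps rather than calling the notebook's functions. The function name `train_demo`, the synthetic data, the learning rate of 0.1, and the 1000 iterations are all assumptions chosen so the example converges quickly; they are not the lesson's settings.

```python
import numpy as np

def train_demo(X, y, learning_rate=0.1, num_iters=1000, seed=0):
    """Gradient descent for logistic regression; returns w, b, and the cost history."""
    n, m = X.shape
    rng = np.random.default_rng(seed)
    w = rng.random((n, 1))   # initialized once, before the loop
    b = 0.0
    costs = []
    for _ in range(num_iters):
        # forward propagation
        a = 1 / (1 + np.exp(-(w.T @ X + b)))
        # cost on the current w and b
        costs.append(np.sum(-(y * np.log(a) + (1 - y) * np.log(1 - a))) / m)
        # back propagation and gradient descent update
        dz = a - y
        w = w - learning_rate * (X @ dz.T / m)
        b = b - learning_rate * (np.sum(dz) / m)
    return w, b, costs

# Hypothetical data: the label is 1 when the two features sum to more than 1.
rng = np.random.default_rng(1)
X = rng.random((2, 200))                               # shape (n, m) = (2, 200)
y = (X.sum(axis=0, keepdims=True) > 1).astype(float)   # shape (1, m)

w, b, costs = train_demo(X, y)
preds = (1 / (1 + np.exp(-(w.T @ X + b))) > 0.5).astype(float)

print(costs[0] > costs[-1])    # True: the cost went down during training
print(np.mean(preds == y))     # fraction of training instances classified correctly
```

Recording the cost at each iteration is a cheap way to confirm that training is working: if the values do not decrease, the update step or the learning rate is the first place to look.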