# Midterm Project: Regularized Logistic Regression with Real Dataset

An extension of Logistic Regression with a real dataset

## Goals
1. Load and transform a real-valued dataset
2. Scale the dataset using z-score normalization
3. Implement regularization (extending the `compute_cost` and `compute_gradient` functions)
4. Plot the learning curve (cost vs. iterations)

### Packages
First, import the required packages.

In [1]:
import copy, math
import numpy as np
import matplotlib.pyplot as plt
import time

### The Dataset
This dataset contains diagnostic breast cancer data with 30 real-valued input features. These features capture medical attributes of each patient's tumor. Specifically, the 30 features are computed from 10 attributes of each cell in the tumor. For each of the 10 attributes, the mean, the standard error, and the mean of the three largest values are recorded, giving 30 input features for our regression model. In addition to these features, each data point has an ID number and a diagnosis: whether the tumor was malignant or benign.

Here we load this dataset:
  - `X_train` contains the 30 real valued input features
  - `y_train` is the diagnostic decision
      - `y_train = 1` if the patient's tumor was malignant 
      - `y_train = 0` if the patient's tumor was benign
  - Both `X_train` and `y_train` are numpy arrays.

In [27]:
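def load_data(filename):
    # read every field as a string; columns are: id, diagnosis, 30 features
    data = np.loadtxt(filename, dtype=str, delimiter=',')

    # np.float was removed from recent NumPy releases; the builtin float is equivalent
    X = data[:,2:].astype(float)
    # encode the diagnosis as described above: 'M' (malignant) -> 1, 'B' (benign) -> 0
    y = (data[:,1] == 'M').astype(float)

    return X, y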

In [29]:
# load dataset

X_train, y_train = load_data("./data/wdbc.data")
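
As a quick check of the label encoding described above, the following sketch parses a few made-up rows in the same `id,diagnosis,features...` layout from an in-memory string. The sample rows and the `encode_labels` helper are hypothetical and are not part of the dataset or this notebook; the real file has 30 feature columns rather than 2.

```python
import io
import numpy as np

# Three made-up rows in the same layout as the data file: id, diagnosis, features...
sample = "1001,M,17.99,10.38\n1002,B,13.54,14.36\n1003,M,20.57,17.77\n"

def encode_labels(labels):
    """Map 'M' (malignant) to 1.0 and 'B' (benign) to 0.0 (hypothetical helper)."""
    return (labels == "M").astype(float)

# np.loadtxt accepts a file-like object as well as a filename
data = np.loadtxt(io.StringIO(sample), dtype=str, delimiter=",")
X_demo = data[:, 2:].astype(float)   # feature columns
y_demo = encode_labels(data[:, 1])   # diagnosis column

print(X_demo.shape, y_demo)
```

Keeping the targets numeric matters later on, because the cost and gradient both do arithmetic on `y`.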

### Notation
Here is a summary of some of the notation you will encounter, updated for multiple features.  

| General Notation | Description | Python (if applicable) |
|:-----------------|:------------|:-----------------------|
| $a$ | scalar, non bold | |
| $\mathbf{a}$ | vector, bold | |
| $\mathbf{A}$ | matrix, bold capital | |
| **Regression** | | |
| $\mathbf{X}$ | training example matrix | `X_train` |
| $\mathbf{y}$ | training example targets | `y_train` |
| $\mathbf{x}^{(i)}$, $y^{(i)}$ | $i^{th}$ training example | `X[i]`, `y[i]` |
| $m$ | number of training examples | `m` |
| $n$ | number of features in each example | `n` |
| $\mathbf{w}$ | parameter: weight | `w` |
| $b$ | parameter: bias | `b` |
| $f_{\mathbf{w},b}(\mathbf{x}^{(i)})$ | the result of the model evaluation at $\mathbf{x}^{(i)}$ parameterized by $\mathbf{w},b$: $f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = g(\mathbf{w} \cdot \mathbf{x}^{(i)}+b)$, where $g$ is the sigmoid function | `f_wb` |


### Normalization
Here we use z-score normalization to put all features of the dataset on a common scale.

After z-score normalization, all features will have a mean of 0 and a standard deviation of 1.

To implement z-score normalization, adjust your input values as shown in this formula:
$$x^{(i)}_j = \dfrac{x^{(i)}_j - \mu_j}{\sigma_j} \tag{4}$$ 
where $j$ selects a feature or a column in the $\mathbf{X}$ matrix, $\mu_j$ is the mean of all the values for feature $j$, and $\sigma_j$ is the standard deviation of feature $j$.
$$
\begin{align}
\mu_j &= \frac{1}{m} \sum_{i=0}^{m-1} x^{(i)}_j \tag{5}\\
\sigma^2_j &= \frac{1}{m} \sum_{i=0}^{m-1} (x^{(i)}_j - \mu_j)^2  \tag{6}
\end{align}
$$


In [30]:
def zscore_normalize_features(X):
    """
    computes X, z-score normalized by column
    
    Args:
      X (ndarray (m,n))     : input data, m examples, n features
      
    Returns:
      X_norm (ndarray (m,n)): input normalized by column
    """
    # find the mean of each column/feature
    mu     = np.mean(X, axis=0)                 # mu will have shape (n,)
    # find the standard deviation of each column/feature
    sigma  = np.std(X, axis=0)                  # sigma will have shape (n,)
    # element-wise, subtract mu for that column from each example, divide by std for that column
    X_norm = (X - mu) / sigma      

    return X_norm
 

In [33]:
# normalize the original features
X_norm = zscore_normalize_features(X_train)
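
One way to sanity-check the normalization is to confirm that every column ends up with mean 0 and standard deviation 1. The sketch below does this on a small made-up matrix rather than the real dataset; `X_demo`, `x_new`, and their values are illustrative assumptions. It also scales a new example with the training statistics `mu` and `sigma`, which is how unseen data would typically be normalized.

```python
import numpy as np

# Made-up data: two features on very different scales
X_demo = np.array([[1.0, 200.0],
                   [2.0, 400.0],
                   [3.0, 600.0]])

mu = np.mean(X_demo, axis=0)
sigma = np.std(X_demo, axis=0)   # population std (ddof=0), matching equation (6)
X_demo_norm = (X_demo - mu) / sigma

print(np.mean(X_demo_norm, axis=0))  # close to 0 for every column
print(np.std(X_demo_norm, axis=0))   # close to 1 for every column

# A new example is scaled with the statistics of the training data, not its own
x_new = np.array([2.0, 500.0])
x_new_norm = (x_new - mu) / sigma
```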

### Regularization
Here we extend the `compute_cost` and `compute_gradient` functions with a regularization term to reduce overfitting.

Cost function and gradient descent update for regularized logistic regression:
$$
\begin{aligned}
&J(\vec{w}, b)=-\frac{1}{m} \sum_{i=1}^m\left[y^{(i)} \log \left(f_{\vec{w}, b}\left(\vec{x}^{(i)}\right)\right)+\left(1-y^{(i)}\right) \log \left(1-f_{\vec{w}, b}\left(\vec{x}^{(i)}\right)\right)\right]+\frac{\lambda}{2 m} \sum_{j=1}^n w_j^2\\
&\text { repeat }\{\\
&w_j=w_j-\alpha\left[\frac{1}{m} \sum_{i=1}^m\left[\left(f_{\vec{w}, b}\left(\vec{x}^{(i)}\right)-y^{(i)}\right) x_j^{(i)}\right]+\frac{\lambda}{m} w_j\right]\\
&b=b-\alpha\left[\frac{1}{m} \sum_{i=1}^m\left(f_{\vec{w}, b}\left(\vec{x}^{(i)}\right)-y^{(i)}\right)\right]\\
&\}\\
&\text { Where } f_{\vec{w}, b}(\vec{x})=\frac{1}{1+e^{-(\vec{w} \cdot \vec{x}+b)}}
\end{aligned}
$$
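
The update rule above can be sketched in vectorized form as follows. This is a minimal illustration, not this notebook's own `compute_gradient`; the function name `compute_gradient_reg_sketch`, the toy data, the learning rate, and the default `lambda_=1.0` are all assumptions made for the example. Note that only the weights are regularized, not the bias `b`, which matches the update equations.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def compute_gradient_reg_sketch(X, y, w, b, lambda_=1.0):
    """Gradient of the regularized logistic cost (hypothetical helper)."""
    m = X.shape[0]
    err = sigmoid(X @ w + b) - y                 # f_wb(x^(i)) - y^(i), shape (m,)
    dj_dw = (X.T @ err) / m + (lambda_ / m) * w  # data term plus regularization term
    dj_db = np.sum(err) / m                      # the bias is not regularized
    return dj_dw, dj_db

# Made-up toy data: 4 examples, 2 features
X_toy = np.array([[0.5, 1.5], [1.0, 1.0], [1.5, 0.5], [3.0, 0.5]])
y_toy = np.array([0.0, 0.0, 1.0, 1.0])
w_toy = np.zeros(2)
b_toy = 0.0
alpha = 0.1  # learning rate, chosen arbitrarily for the example

# One gradient descent step
dj_dw, dj_db = compute_gradient_reg_sketch(X_toy, y_toy, w_toy, b_toy)
w_toy = w_toy - alpha * dj_dw
b_toy = b_toy - alpha * dj_db
```

With all parameters at zero the regularization term contributes nothing to the first step, so its effect only appears once the weights move away from zero.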

In [35]:
def sigmoid(z):
    """
    Compute the sigmoid of z

    Args:
        z (scalar or ndarray): a scalar or a numpy array of any size

    Returns:
        g (scalar or ndarray): sigmoid(z), with the same shape as z
         
    """
    g = (1/(1+np.exp(-z)))
    return g

def compute_cost(X, y, w, b, lambda_= 1):
    """
    Computes the cost over all examples
    Args:
      X : (ndarray Shape (m,n)) data, m examples by n features
      y : (array_like Shape (m,)) target value 
      w : (array_like Shape (n,)) Values of parameters of the model      
      b : scalar Values of bias parameter of the model
      lambda_: (scalar) regularization strength; larger values penalize large weights more
    Returns:
      total_cost: (scalar)         cost 
    """

    m, n = X.shape

    total_cost = ((-1/m)*sum_losses(X, y, w, b, m))+((lambda_/(2*m))*sum_of_squared_features(w, n))
    
    
    return total_cost

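def sum_of_squared_features(w, n):
    # sum of the squared weights w_j^2, i.e. the regularization sum before scaling by lambda/(2m)
    return sum([w[j]**2 for j in range(n)])

def sum_losses(X, y, w, b, m):
    # sum of the per-example log-likelihood terms over all m examples
    return sum([loss(sigmoid(np.dot(w, X[i])+b), y[i]) for i in range(m)])


def loss(fwbx, y):
    # log-likelihood of one example; compute_cost multiplies the sum by -1/m to get the cost
    return (y*np.log(fwbx)) + (1-y)*np.log(1-fwbx)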
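
A useful sanity check for the cost: with all parameters at zero the model predicts 0.5 for every example, so the cost should equal $\log 2 \approx 0.693$ regardless of the data, and the regularization term vanishes. The sketch below verifies this with a vectorized variant of the cost. `compute_cost_vectorized` and the toy arrays are hypothetical names introduced for illustration; the function is intended to compute the same quantity as the loop-based version, not to replace it.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def compute_cost_vectorized(X, y, w, b, lambda_=1.0):
    """Regularized logistic cost without Python loops (hypothetical helper)."""
    m = X.shape[0]
    f_wb = sigmoid(X @ w + b)
    # cross-entropy term, averaged over the m examples
    data_cost = -np.mean(y * np.log(f_wb) + (1 - y) * np.log(1 - f_wb))
    # regularization term: (lambda / 2m) * sum of squared weights
    reg_cost = (lambda_ / (2 * m)) * np.sum(w ** 2)
    return data_cost + reg_cost

# Made-up toy data: 4 examples, 2 features
X_toy = np.array([[0.5, 1.5], [1.0, 1.0], [1.5, 0.5], [3.0, 0.5]])
y_toy = np.array([0.0, 0.0, 1.0, 1.0])

# Zero parameters: every prediction is 0.5, so the cost is log(2)
cost_zero = compute_cost_vectorized(X_toy, y_toy, np.zeros(2), 0.0)

# Non-zero weights: the two calls differ only by the regularization term
w_demo = np.array([1.0, -1.0])
cost_reg = compute_cost_vectorized(X_toy, y_toy, w_demo, 0.0, lambda_=1.0)
cost_noreg = compute_cost_vectorized(X_toy, y_toy, w_demo, 0.0, lambda_=0.0)

print(cost_zero, cost_reg - cost_noreg)
```

The difference between the last two costs is exactly $\frac{\lambda}{2m}\sum_j w_j^2$, which makes it easy to confirm that `lambda_` is wired in correctly.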