[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/nhartman94/PALMS2024/blob/main/ML-intro.ipynb)

# Intro to Machine Learning: Linear Models 

Nicole Hartman, PALMS2024

**Data detective:** We're going to give you a "mystery dataset". Your challenge is to find the parameters
$\theta = [\theta_0,\theta_1,\theta_2,\theta_3]$ 
obtained by fitting a linear model to this dataset, where the "basis" dimensions are the terms of a $3^{rd}$ degree polynomial.


In [None]:
import numpy as np
import pandas as pd # just for file loading
import matplotlib.pyplot as plt

## Step 0: Load in the data

In [None]:
# Load the data
df = pd.read_hdf('data.h5')

In [None]:
# Look at the data
df

In [None]:
# Put this data in numpy arrays
x = df['x'].values
y = df['y'].values

In [None]:
# Plot the data
plt.scatter(x,y,marker='x',color='crimson',label='data')

plt.xlabel('$x$',fontsize=20)
plt.ylabel('$y$',fontsize=20)
plt.legend(fontsize=16)

## Step 1:  Code up a linear model for a $3^{rd}$ degree polynomial.

**Recall:** A linear model is a model that can be written in the form

$$h_\theta(\vec{x}) = \theta^T \vec{x},$$

where $\vec{x} = \left[x_0, x_1, \ldots, x_d\right]$.

In this case, we'll take the components of $\vec{x}$ to be the terms of a third order polynomial ($d=3$):
$\vec{x} = \left[1, x, x^2, x^3\right]$
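
As a minimal sketch of this basis expansion, the snippet below stacks $[1, x, x^2, x^3]$ as the columns of a matrix using `np.vander`. The helper name `build_design_matrix` and the `_demo` arrays are hypothetical, chosen only for illustration; they are not part of the exercise code.

```python
import numpy as np

def build_design_matrix(x, degree=3):
    """Hypothetical helper: one row [1, x, x^2, ..., x^degree] per sample."""
    x = np.asarray(x, dtype=float)
    # increasing=True orders the columns as x^0, x^1, ..., x^degree
    return np.vander(x, N=degree + 1, increasing=True)

# Toy inputs (assumed values, not the mystery dataset)
x_demo = np.array([0.0, 1.0, 2.0])
X_demo = build_design_matrix(x_demo)

print(X_demo.shape)  # one row per sample, one column per basis term
print(X_demo)

# With the basis stacked this way, the linear model is a matrix-vector product
theta_demo = np.array([1.0, 0.0, -2.0, 0.5])  # hypothetical parameters
print(X_demo @ theta_demo)
```

Writing the model as `X @ theta` is one way to keep the prediction vectorized; evaluating the polynomial term by term on the array `x` works just as well for this exercise.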

In [None]:
# Initialize theta to some random values
# theta = (theta_0, theta_1, theta_2, theta_3)

theta = np.random.randn(4,)

In [None]:
def get_predicion(x,theta):
    '''
    Evaluate a linear model that is a 3rd order polynomial

    fx = theta[0] + theta[1] * x + theta[2] * x**2 + theta[3] * x**3

    Input:
    - x: shape n
    - theta: 4

    Output:
    - fx: shape n
    '''
    
    assert theta.shape[0] == 4 # sanity check for 3rd order polynomial
    
    '''
    YOUR CODE HERE
    '''
    fx = None  # TODO: replace None with your expression for the polynomial

    return fx

## Step 2a) Calculate the loss
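
For reference, a sketch of the MSE written in the notation of Step 1. The normalization by $1/n$ is assumed here because it is what yields the factor $\frac{2}{n}$ in the gradient of Step 2b:

$$\mathcal{L}(\theta) = \frac{1}{n} \sum_{i = 1}^{n} \left(\theta^T x^{(i)} - y^{(i)}\right)^2$$

For example, predictions $[1, 2]$ against true values $[0, 4]$ give residuals $[1, -2]$ and a loss of $(1 + 4)/2 = 2.5$.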

In [None]:
def get_loss(y_pred, y_true):
    '''
    Calculate the Mean Squared Error (MSE) loss over n examples

    Inputs:
    - y_pred: (n,) array
    - y_true: (n,) array

    Outputs:
    - loss: scalar
    '''

    assert len(y_pred) == len(y_true) # sanity check the inputs make sense

    '''
    YOUR CODE HERE
    '''

    return loss

In [None]:
y_pred = get_predicion(x, theta) 
loss = get_loss(y_pred,y)

In [None]:
y_pred

In [None]:
# data
plt.scatter(x,y,marker='x',color='crimson',label='data')

# Show the "initial guess" for the prediction
xx = np.linspace(-3,3)
f_theta = get_predicion(xx,theta)
plt.plot(xx,f_theta, label='pred' )

# These lines are the h_theta(x) for the data samples too
# y_pred = get_predicion(x,theta)
# plt.scatter(x,y_pred,marker='o',color='navy',label=r'$h_\theta(x)$')

ax = plt.gca()
plt.text(.5,.9,f'loss = {loss:2.2f}',fontsize=16,
         ha='center',va='top',transform=ax.transAxes)

plt.xlabel('$x$',fontsize=20)
plt.ylabel('$y$',fontsize=20)

plt.legend(fontsize=16)

## Step 2b) Gradient of the loss with respect to $\theta$

**Recall:** In the exercise we derived, for each component $\theta_j$:

$$ \frac{\partial \mathcal{L}}{\partial \theta_j} = \frac{2}{n} \sum_{i = 1}^{n} (\theta^Tx^{(i)} - y^{(i)}) x^{(i)}_j$$

In Python, it's really important for speed to write computations in a vectorized way as much as possible (i.e., without for loops).

We noted in the lecture that we can get the full gradient vector by defining

$$X =  \begin{pmatrix}
- x^{(1)} - \\ \vdots \\ - x^{(n)} - 
\end{pmatrix} \in \mathbb{R}^{n \times (d+1)},$$

where for the $3^{rd}$ order polynomial $d=3$, so each row $x^{(i)}$ has $d+1 = 4$ entries $(1, x, x^2, x^3)$.

Then $\theta \in \mathbb{R}^{d+1}$ $\implies X\theta \in \mathbb{R}^n$, and $y \in \mathbb{R}^n$, so we can write the gradient more succinctly as:

$$\nabla_\theta \mathcal{L} = \frac{2}{n} X^T (X \theta - y).$$
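
One way to convince yourself of this formula is to compare it with the explicit sum over examples and with a numerical (finite-difference) gradient of the MSE. The sketch below does this on synthetic stand-in data; all `_demo` names and the helper `mse_demo` are hypothetical and are not the notebook's own functions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumed values, not the mystery dataset)
x_demo = rng.uniform(-3, 3, size=50)
y_demo = rng.normal(size=50)
theta_demo = rng.normal(size=4)

X_demo = np.vander(x_demo, N=4, increasing=True)  # columns: 1, x, x^2, x^3
n_demo = len(y_demo)

def mse_demo(theta):
    """Mean squared error of the linear model X_demo @ theta."""
    return np.mean((X_demo @ theta - y_demo) ** 2)

# Vectorized gradient: (2/n) X^T (X theta - y)
grad_vec = (2.0 / n_demo) * X_demo.T @ (X_demo @ theta_demo - y_demo)

# Same gradient as an explicit sum over the examples
grad_loop = np.zeros(4)
for i in range(n_demo):
    grad_loop += (X_demo[i] @ theta_demo - y_demo[i]) * X_demo[i]
grad_loop *= 2.0 / n_demo

# Central finite differences, one parameter at a time
eps_demo = 1e-6
grad_num = np.zeros(4)
for j in range(4):
    shift = np.zeros(4)
    shift[j] = eps_demo
    grad_num[j] = (mse_demo(theta_demo + shift) - mse_demo(theta_demo - shift)) / (2 * eps_demo)

print(np.allclose(grad_vec, grad_loop))
print(np.allclose(grad_vec, grad_num, rtol=1e-4, atol=1e-4))
```

A finite-difference check like this is slow and only approximate, so it is useful for debugging a gradient rather than for training.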


In [None]:
def get_grad(x,y,theta):
    '''
    Code up the gradient of the loss with respect to theta.

    Input:
    - x: array (n,)
    - y: array (n,)
    - theta: (4,) for the 3rd order polynomial

    Output:
    - dtheta: (4,), same shape as theta
    '''

    assert x.shape[0] == y.shape[0]
        
    '''
    YOUR CODE HERE

    Hint: Construct matrix X: (n,4) from examples `x`, with columns 1, x, x**2, x**3
    '''


    
    return dtheta


## Step 3: Set up a training loop to infer the parameters

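Recall: To minimize the loss by gradient descent, we update the parameters of the model by

$$\theta \leftarrow \theta - \alpha \nabla_\theta\mathcal{L},$$

where $\alpha$ is the learning rate, and then iterate over these update steps. (The loop below uses all $n$ examples in every step, i.e. full-batch rather than stochastic gradient descent.)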
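
To illustrate the update rule, here is a self-contained sketch that runs it on noise-free synthetic data with known coefficients. The data, `theta_true`, the learning rate, and the number of steps are all assumptions picked so that this toy problem converges; they say nothing about the mystery dataset.

```python
import numpy as np

# Synthetic stand-in data (assumed, not the mystery dataset)
x_demo = np.linspace(-1, 1, 200)
X_demo = np.vander(x_demo, N=4, increasing=True)  # columns: 1, x, x^2, x^3
theta_true = np.array([1.0, -2.0, 0.0, 3.0])      # hypothetical coefficients
y_demo = X_demo @ theta_true                      # noise-free targets

theta_demo = np.zeros(4)   # starting point
alpha_demo = 0.3           # learning rate chosen for this toy problem
n_demo = len(y_demo)
losses_demo = []

for _ in range(5000):
    residual = X_demo @ theta_demo - y_demo
    losses_demo.append(np.mean(residual ** 2))    # MSE before the update
    grad = (2.0 / n_demo) * X_demo.T @ residual   # gradient of the MSE
    theta_demo = theta_demo - alpha_demo * grad   # gradient descent step

print(np.round(theta_demo, 3))
print(losses_demo[0], losses_demo[-1])
```

Because the targets are generated without noise, the recovered `theta_demo` can be compared directly with `theta_true`.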

Since you've coded up all of the above functions, we'll give you the training loop code (see below):


In [None]:
alpha=.01

In [None]:
losses = []

for i in range(11):

    # Calculate y_pred
    y_pred = get_predicion(x,theta)
    
    # Get the loss
    loss = get_loss(y_pred,y)

    # Calculate the gradient
    dtheta = get_grad(x,y,theta)
    
    # Update the parameters
    theta -= alpha * dtheta
    
    # Save the values
    losses.append(loss)

In [None]:
plt.plot(losses)
plt.xlabel('iterations',fontsize=16)
plt.ylabel('Loss',fontsize=16)

In [None]:
losses[-10:]

In [None]:
# Overlay the fitted prediction on the data
y_pred = get_predicion(x,theta)
loss = get_loss(y_pred,y) # loss for the final theta

plt.scatter(x,y,marker='x',color='crimson',label='data')

# plt.scatter(x,y_pred,marker='o',color='navy',label=r'$h_\theta(x)$')

ax = plt.gca()
plt.text(.5,.9,f'loss = {loss:2.2f}',fontsize=16,
         ha='center',va='top',transform=ax.transAxes)

plt.xlabel('$x$',fontsize=20)
plt.ylabel('$y$',fontsize=20)

# Get the continuous version of the prediction
f_theta = get_predicion(xx,theta)
plt.plot(xx,f_theta, label='pred' )

plt.legend(fontsize=16)

Are you happy with your model fit?

(Note: you may need to adjust the learning rate and the number of iterations until the loss converges.)
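
As a rough illustration of why the learning rate matters, the sketch below runs 50 gradient descent steps with three different learning rates on synthetic stand-in data. The helper `run_gd`, the data, and the specific rates are assumptions for this toy problem; which values are "too large" depends on the dataset, so treat the numbers as illustrative only.

```python
import numpy as np

# Synthetic stand-in data (assumed, not the mystery dataset)
x_demo = np.linspace(-3, 3, 200)
X_demo = np.vander(x_demo, N=4, increasing=True)  # columns: 1, x, x^2, x^3
theta_true = np.array([1.0, -2.0, 0.0, 3.0])      # hypothetical coefficients
y_demo = X_demo @ theta_true

def run_gd(lr, n_steps):
    """Hypothetical helper: gradient descent from theta = 0, returns the loss history."""
    theta = np.zeros(4)
    n = len(y_demo)
    history = []
    for _ in range(n_steps):
        residual = X_demo @ theta - y_demo
        history.append(np.mean(residual ** 2))
        theta = theta - lr * (2.0 / n) * X_demo.T @ residual
    return history

for lr in (0.001, 0.005, 0.02):
    curve = run_gd(lr, n_steps=50)
    print(f"lr={lr}: first loss {curve[0]:.3g}, last loss {curve[-1]:.3g}")
```

If the loss grows from one iteration to the next, the step is overshooting and a smaller learning rate is worth trying; if it shrinks very slowly, more iterations or a somewhat larger rate may help.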


In [None]:
theta

## Step 4: Sharing the discovery

When you have found a model you're happy with, go to:

[PollEv.com/nicolehartman968](PollEv.com/nicolehartman968)

and enter the $\theta_0, \theta_1,\theta_2,\theta_3$ values that you get.

(Hint: they should be integers, so round the answers you submit 😄)