# Multiple Linear Regression In-class Exercise

Before starting the lab, we will work through a very simple example of fitting a model to synthetic data.  In doing this exercise, you will learn to:

* Evaluate methods using synthetic data
* Describe nonlinear basis functions for a model
* Define the transformations to the basis functions with a `transform` method
* Fit the linear model using the transformed features

We begin by loading the packages we will need.

In [1]:
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from sklearn import linear_model

## Create Synthetic Data

Synthetic data is data we create artificially for testing, as opposed to data measured from a real system.  In this case, we will simulate data that follows some *true* relationship,

    y = f(t) +  w
    
for some function `f(t)` and noise `w`.  We will *pretend* that the estimator does not know the true function `f(t)` and must learn this from data.  We first generate some random samples as follows:

In [2]:
nsamp = 200
t = np.random.uniform(0,5,nsamp)
y = 2*t*np.exp(-1.3*t) + 1 + np.random.normal(0,0.02,nsamp)

Create a scatter plot of `y` vs `t`.  

In [3]:
# TODO
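One possible way to draw such a scatter plot is sketched below. It regenerates data of the same form so that it runs on its own; the names `rng`, `t_demo` and `y_demo` and the fixed seed are assumptions made for this illustration, not part of the exercise.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative data with the same form as above (seed fixed only for repeatability)
rng = np.random.default_rng(0)
nsamp = 200
t_demo = rng.uniform(0, 5, nsamp)
y_demo = 2*t_demo*np.exp(-1.3*t_demo) + 1 + rng.normal(0, 0.02, nsamp)

# Scatter plot of y vs t with labeled axes and grid lines
fig, ax = plt.subplots()
ax.scatter(t_demo, y_demo, s=10)
ax.set_xlabel("t")
ax.set_ylabel("y")
ax.grid(True)
```

In a notebook the figure renders inline; in a plain script you would typically finish with `plt.show()`.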

## Transforming the data
You should see that the relation between `t` and `y` is not linear.  So we will try to fit a model of the form:

    y ≈ yhat = b + sum_j w[j]*cos(pi*(j+1)*t/tmax),  tmax = 5
    
This is sometimes called a discrete cosine transform (DCT).  The model is *nonlinear* in `t` but linear in the parameters `b` and `w[j]`, so it can still be fit with linear regression.

To perform the transform, it is useful to create a transform function, as below, that maps the vector `t` to a matrix `X` with columns `X[:,j] = cos(pi*(j+1)*t/tmax)`.  Complete this function.

In [4]:
def transform(t):
    """
    Creates a matrix with the transformed cosine terms:
    
    X[:,j] = cos(pi*(j+1)*t/tmax)
    """
    p = 10
    tmax = 5
    nt = len(t)
    
    # TODO: 
    #  X = ...
        
    return X
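As a hint for the missing line, the sketch below shows one way the cosine columns could be built in a single step with NumPy broadcasting. The helper name `cosine_features` and its keyword arguments are hypothetical, introduced only for this illustration; a loop over `j` would work equally well.

```python
import numpy as np

def cosine_features(t, p=10, tmax=5.0):
    """Hypothetical helper: column j holds cos(pi*(j+1)*t/tmax)."""
    t = np.asarray(t, dtype=float)
    freqs = np.arange(1, p + 1)          # the factors (j+1) for j = 0, ..., p-1
    # t[:, None] has shape (nt, 1) and freqs[None, :] has shape (1, p),
    # so the product broadcasts to an (nt, p) matrix
    return np.cos(np.pi * t[:, None] * freqs[None, :] / tmax)

X_demo = cosine_features(np.array([0.0, 2.5, 5.0]), p=3)
print(X_demo.shape)   # (3, 3)
```

At `t = 0` every column equals 1, which is a quick sanity check on the result.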

## Fitting the Data using the Transformed Model

We are now ready to fit the model using the transform.  First split `t` and `y` into training and test sets, `ttr,ytr` and `tts,yts`.  Use the first 100 samples for training and the last 100 samples for test.

In [5]:
ntr = 100

# TODO
# ttr = ...
# tts = ...
# ytr = ...
# yts = ...
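A split of this kind can be done with plain array slicing. The sketch below uses stand-in arrays with the `_demo` suffix (an assumption for this illustration) so that it runs on its own.

```python
import numpy as np

# Stand-in data of the same form as above (seed fixed only for repeatability)
rng = np.random.default_rng(0)
t_demo = rng.uniform(0, 5, 200)
y_demo = 2*t_demo*np.exp(-1.3*t_demo) + 1 + rng.normal(0, 0.02, 200)

ntr = 100
# First ntr samples for training, the remaining samples for test
ttr_demo, tts_demo = t_demo[:ntr], t_demo[ntr:]
ytr_demo, yts_demo = y_demo[:ntr], y_demo[ntr:]
```

Because the samples were drawn independently and in random order, taking the first and last halves already gives a random split; data that is sorted or grouped would need shuffling first.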

Using the `transform()` method above, create the transformed data `Xtr,Xts` from `ttr,tts`.

In [6]:
# TODO
#  Xtr = ...
#  Xts = ...

Create a `LinearRegression` object, `regr`, and fit it using the training data.

In [7]:
# TODO
#  regr = ...
#  regr.fit(...)
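For reference, the general fitting pattern with scikit-learn looks like the sketch below. It rebuilds a small cosine feature matrix inline so that it is self-contained; the `_demo` names and the fixed seed are assumptions for this illustration.

```python
import numpy as np
from sklearn import linear_model

# Stand-in data of the same form as above
rng = np.random.default_rng(0)
t_demo = rng.uniform(0, 5, 200)
y_demo = 2*t_demo*np.exp(-1.3*t_demo) + 1 + rng.normal(0, 0.02, 200)

# Cosine features with p = 10 and tmax = 5, built by broadcasting
freqs = np.arange(1, 11)
X_demo = np.cos(np.pi * t_demo[:, None] * freqs[None, :] / 5.0)

# Fit on the first 100 samples only
regr_demo = linear_model.LinearRegression()
regr_demo.fit(X_demo[:100], y_demo[:100])

print(regr_demo.intercept_)     # estimate of b
print(regr_demo.coef_.shape)    # (10,), one weight w[j] per cosine term
```

`LinearRegression` fits the intercept `b` by default, so the feature matrix does not need a column of ones.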

## Evaluate Predictions on Test Data

Use the `regr.predict()` method to find `yhat`, the predicted values of
`y` for the test inputs `tts`.  Note that `predict()` takes the transformed matrix `Xts`, not `tts` itself.

In [8]:
# TODO
#   yhat = ...
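The sketch below shows prediction on held-out samples together with a mean squared error, which is one common way to summarize test accuracy (the exercise itself only asks for `yhat`). The helper `cosine_features` and the `_demo` names are hypothetical, used only in this illustration.

```python
import numpy as np
from sklearn import linear_model

def cosine_features(t, p=10, tmax=5.0):
    """Hypothetical helper: column j holds cos(pi*(j+1)*t/tmax)."""
    freqs = np.arange(1, p + 1)
    return np.cos(np.pi * np.asarray(t)[:, None] * freqs[None, :] / tmax)

# Stand-in data of the same form as above
rng = np.random.default_rng(0)
t_demo = rng.uniform(0, 5, 200)
y_demo = 2*t_demo*np.exp(-1.3*t_demo) + 1 + rng.normal(0, 0.02, 200)
ttr_demo, tts_demo = t_demo[:100], t_demo[100:]
ytr_demo, yts_demo = y_demo[:100], y_demo[100:]

# Fit on the training half, then predict on the transformed test inputs
regr_demo = linear_model.LinearRegression()
regr_demo.fit(cosine_features(ttr_demo), ytr_demo)
yhat_demo = regr_demo.predict(cosine_features(tts_demo))

# Mean squared error on the test half
mse_demo = np.mean((yts_demo - yhat_demo)**2)
print(mse_demo)
```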

On one plot, create scatter plots of `yts` vs. `tts` and `yhat` vs. `tts`.  Create a legend for your graph and add grid lines.

In [9]:
# TODO

Plot the predicted values of `y` for values of `t` in the interval `[0,5]`.

In [10]:
# TODO
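One way to approach this is to evaluate the fitted model on an evenly spaced grid from `np.linspace` and draw the result as a line. The sketch below is self-contained; `cosine_features`, `t_grid` and the `_demo` names are assumptions for this illustration, and the dashed curve of the true function is added only for comparison.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model

def cosine_features(t, p=10, tmax=5.0):
    """Hypothetical helper: column j holds cos(pi*(j+1)*t/tmax)."""
    freqs = np.arange(1, p + 1)
    return np.cos(np.pi * np.asarray(t)[:, None] * freqs[None, :] / tmax)

# Stand-in data of the same form as above, fit on the first 100 samples
rng = np.random.default_rng(0)
t_demo = rng.uniform(0, 5, 200)
y_demo = 2*t_demo*np.exp(-1.3*t_demo) + 1 + rng.normal(0, 0.02, 200)
regr_demo = linear_model.LinearRegression()
regr_demo.fit(cosine_features(t_demo[:100]), y_demo[:100])

# Evaluate the fitted model on an evenly spaced grid over [0, 5]
t_grid = np.linspace(0, 5, 200)
yhat_grid = regr_demo.predict(cosine_features(t_grid))

fig, ax = plt.subplots()
ax.plot(t_grid, yhat_grid, label="predicted")
ax.plot(t_grid, 2*t_grid*np.exp(-1.3*t_grid) + 1, "--", label="true f(t)")
ax.set_xlabel("t")
ax.set_ylabel("y")
ax.legend()
ax.grid(True)
```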