# Estimating $\beta$ weights: OLS vs Ridge
This week we will discuss the difference between Ordinary Least Squares (OLS) regression and Ridge regression as ways to estimate model parameters ($\beta$ weights).

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from stat_utils import column_corr
%matplotlib inline

First, we will create some fake data, estimate responses to it, and predict responses in a new data set.

For now, for simplicity, we will not worry about the HRF, and we will consider a more general design that does not involve a design matrix of ones and zeros.

## Exercise 01: Make some fake data!
* Create a design matrix with 50 different channels (columns) and 290 time points (TRs, if that's clearer)
* Generate 50 random weights (one per column) for each of 10 voxels
* Generate data timecourses (Y variables) for all 10 voxels (each with a different set of $\beta$ weights)
* Split the design matrix and data into training and validation sets by taking 200 time points for training and 90 time points for validation

In [None]:
# Answer

# Parameters
n_wts = 50
n_tps_trn = 200
n_tps_val = 90
n_vox = 10 
noise_magnitude = 10

X = np.random.randn(n_tps_trn + n_tps_val, n_wts)
B = np.random.randn(n_wts, n_vox)
# Some noise! Otherwise it's no fun.
E = np.random.randn(n_tps_trn + n_tps_val, n_vox) * noise_magnitude
Y = X.dot(B) + E
Xtrn = X[:n_tps_trn, :]
Xval = X[n_tps_trn:, :]
Ytrn = Y[:n_tps_trn, :]
Yval = Y[n_tps_trn:, :]

## Exercise 02: Write the OLS function in Python and estimate your weights!

The normal equation for OLS is: 

## $\beta = (X^TX)^{-1}X^TY$

Define a function: 

```python
def ols(X, Y): 
    B = ....
    return B
```
to do OLS estimation of weights for you!

Hint: to do matrix inversion, use `np.linalg.inv()`

In [None]:
# Answer



In [None]:
# Answer
def ols(X, Y):
    B = np.linalg.inv(X.T.dot(X)).dot(X.T.dot(Y))
    return B
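
The explicit inverse above follows the normal equation literally. As a sanity check, the sketch below compares it with two other NumPy routes to the same answer: `np.linalg.solve` (solves the same system without forming the inverse) and `np.linalg.lstsq` (works on `X` directly). The names `X_demo`, `Y_demo`, and `B_true` are made up for this illustration and are not part of the exercise.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the check is repeatable
X_demo = rng.standard_normal((200, 50))
B_true = rng.standard_normal((50, 10))
Y_demo = X_demo @ B_true + rng.standard_normal((200, 10))

# Normal equation with an explicit inverse, as in the exercise
B_inv = np.linalg.inv(X_demo.T @ X_demo) @ (X_demo.T @ Y_demo)
# Same linear system, solved without forming the inverse
B_solve = np.linalg.solve(X_demo.T @ X_demo, X_demo.T @ Y_demo)
# Least-squares solver applied to X directly
B_lstsq, *_ = np.linalg.lstsq(X_demo, Y_demo, rcond=None)

print(np.allclose(B_inv, B_solve), np.allclose(B_inv, B_lstsq))
```

For a well-behaved design matrix like this one, all three should agree; the differences only start to matter when $X^TX$ is close to singular.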

See how well OLS estimates $\beta$ weights!

In [None]:
# Use only training data to estimate weights
B_est = ols(Xtrn, Ytrn)

# Plot estimated beta weights against true beta weights
# We will use this several times, so let's define a function:
def plot_beta_comparison(B, Be):
    """Plot true weights (line) and estimated weights (red dots), one panel per voxel."""
    fig, axs = plt.subplots(5, 2, figsize=(8, 6))
    for b, be, ax in zip(B.T, Be.T, axs.flatten()):
        ax.plot(b)
        ax.plot(be, 'r.')
    plt.tight_layout()

In [None]:
plot_beta_comparison(B, B_est)

# Make predictions

Compute predictions by multiplying the design matrix for the validation data (`Xval`) by the estimated weights!

In [None]:
Y_pred = Xval.dot(B_est)
Y_pred.shape
r = column_corr(Yval, Y_pred)
plt.plot(r, 'o')
plt.ylim([0, 1])
# Not bad!
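
`column_corr` comes from the course's `stat_utils` module, whose source is not shown here. Assuming it returns the Pearson correlation between matching columns of its two inputs (one value per voxel), a stand-in might look like the sketch below. The name `column_corr_sketch` and the demo arrays are hypothetical.

```python
import numpy as np

def column_corr_sketch(A, B):
    """Pearson correlation between matching columns of A and B (hypothetical stand-in)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # z-score each column (population standard deviation)
    Az = (A - A.mean(axis=0)) / A.std(axis=0)
    Bz = (B - B.mean(axis=0)) / B.std(axis=0)
    # The mean of the products of z-scores is Pearson's r
    return (Az * Bz).mean(axis=0)

rng = np.random.default_rng(0)
A = rng.standard_normal((90, 10))      # e.g. measured responses, time x voxels
B = A + rng.standard_normal((90, 10))  # e.g. noisy predictions of those responses
r_demo = column_corr_sketch(A, B)
print(r_demo.shape)
```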

# How can we mess this up? 
A simple way is to add noise! Amp up the noise above and see what happens to the estimates of the $\beta$ weights and to the predictions.

[Go do it!]

Another way to mess up the estimation of regressors is to add correlations between regressors. This is a particular problem if the correlation between your regressors is different in the training data and in the validation data (i.e., if your training data is not representative of the real world). Let's simulate this situation by creating an `Xtrn` matrix that has correlated columns (while `Xval` does not).

## Exercise 03: Make a design matrix with correlated columns!

Call it `Xc_trn` (for "X correlated").

Use `np.corrcoef` to compute the correlations between columns to see if you've succeeded!

In [None]:
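# Answer
n_wts = 50
# One shared signal that all of the correlated columns will contain
X0 = np.random.randn(n_tps_trn, 1)
corr_disruption = 0.3  # Amount of independent noise added to each correlated column
n_correlated = 10
# First n_correlated columns: shared signal plus a little independent noise
Xc_trn = np.hstack([np.random.randn(n_tps_trn, 1) * corr_disruption + X0 for _ in range(n_correlated)])
# Remaining columns: independent random regressors
Xc_trn = np.hstack([Xc_trn, np.random.randn(n_tps_trn, n_wts - n_correlated)])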

# Demonstrate what structure looks like
plt.imshow(Xc_trn, aspect='auto')

In [None]:
# Test whether you have succeeded 
plt.imshow(np.corrcoef(Xc_trn.T), vmin=-1, vmax=1, cmap='RdBu_r')
plt.colorbar();
# The upper left corner of this plot should be red, indicating correlated columns of Xc_trn!
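
One way to see why correlated columns hurt OLS is to look at the condition number of $X^TX$: the closer that matrix is to singular, the more its inverse amplifies noise in the data. The sketch below builds its own small example (all names are local to this illustration) and compares the two cases with `np.linalg.cond`.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tps, n_wts, n_correlated = 200, 50, 10

# Design matrix with independent columns
X_ind = rng.standard_normal((n_tps, n_wts))

# Same matrix, but the first n_correlated columns share a common signal
shared = rng.standard_normal((n_tps, 1))
X_cor = X_ind.copy()
X_cor[:, :n_correlated] = shared + 0.3 * rng.standard_normal((n_tps, n_correlated))

# Larger condition number = closer to singular = less stable inverse
cond_ind = np.linalg.cond(X_ind.T @ X_ind)
cond_cor = np.linalg.cond(X_cor.T @ X_cor)
print(cond_ind, cond_cor)
```

The correlated design should give a much larger condition number than the independent one.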

In [None]:
# Now, re-generate your training and validation data from this correlated Xc

Yc_trn = Xc_trn.dot(B) + E[:n_tps_trn, :] #  Use same E as above

Xc_val = np.random.randn(n_tps_val, n_wts)
Yc_val = Xc_val.dot(B) + E[n_tps_trn:, :] #  Use same E as above

In [None]:
B_est_c = ols(Xc_trn, Yc_trn)

In [None]:
plot_beta_comparison(B, B_est_c)

In [None]:
# Make predictions
Y_pred_c = Xc_val.dot(B_est_c)
rc = column_corr(Yc_val, Y_pred_c)
plt.plot(r, 'ro', label='Un-correlated columns')
plt.plot(rc, 'bo', label='Correlated columns')
plt.ylim([0, 1])
plt.legend()
# Messing up a little more...

## Exercise 04: Implement ridge regression

The normal equation for ridge regression is: 
    
## $\beta = (X^TX + \lambda I)^{-1}X^TY$
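
For reference, this equation follows from adding a penalty on the size of the weights to the usual squared-error loss. A short derivation, using the same symbols as above:

$$
\begin{aligned}
L(\beta) &= \|Y - X\beta\|^2 + \lambda\|\beta\|^2 \\
\frac{\partial L}{\partial \beta} &= -2X^T(Y - X\beta) + 2\lambda\beta = 0 \\
(X^TX + \lambda I)\,\beta &= X^TY \\
\beta &= (X^TX + \lambda I)^{-1}X^TY
\end{aligned}
$$

Setting $\lambda = 0$ recovers the OLS normal equation from Exercise 02.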

In [None]:
# Answer



In [None]:
Xc_trn.shape[1]

In [None]:
Xc_trn.T.dot(Xc_trn).shape

In [None]:
# Answer
def ridge(X, Y, lam=100):
    n_features = X.shape[1]  # number of regressors (columns of X)
    B = np.linalg.inv(X.T.dot(X) + lam * np.eye(n_features)).dot(X.T.dot(Y))
    return B

In [None]:
B_est_c_ridge = ridge(Xc_trn, Yc_trn, lam=1000)

In [None]:
# The weights are smaller!
plot_beta_comparison(B, B_est_c_ridge)
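
To put a number on "smaller", the sketch below fits ridge at several values of $\lambda$ and prints the norm of the estimated weights. It uses its own toy data and a local helper, `ridge_sketch`, which is a hypothetical variant of the `ridge` function that uses `np.linalg.solve` instead of an explicit inverse.

```python
import numpy as np

def ridge_sketch(X, Y, lam):
    """Ridge weights via a linear solve (illustrative variant of ridge above)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ Y)

rng = np.random.default_rng(0)
X_demo = rng.standard_normal((200, 50))
Y_demo = X_demo @ rng.standard_normal((50, 1)) + 10 * rng.standard_normal((200, 1))

lambdas = [0, 10, 100, 1000]
norms = [np.linalg.norm(ridge_sketch(X_demo, Y_demo, lam)) for lam in lambdas]
for lam, norm in zip(lambdas, norms):
    print(f"lambda={lam:5d}  ||beta||={norm:.3f}")
```

The norm shrinks as $\lambda$ grows; with $\lambda = 0$ the result is the OLS solution.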

In [None]:
# Make predictions
Y_pred_c_ridge = Xc_val.dot(B_est_c_ridge)
rc_ridge = column_corr(Yc_val, Y_pred_c_ridge)
plt.plot(r, 'ro', label='Un-correlated columns')
plt.plot(rc, 'bo', label='Correlated columns')
plt.plot(rc_ridge, 'g*', label='Correlated columns, ridge')
plt.ylim([0, 1])
plt.legend()
# Ridge typically recovers some of the accuracy lost to the correlated columns

In *most* cases, ridge regression improves your prediction accuracy.

## Exercise: How would you go about choosing a lambda parameter? 


Try different ones! See which works best! 

```python
# Answer
```


Write out the answer to this exercise in pseudo-code!
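
One possible answer, written as runnable Python rather than pseudo-code: hold out part of the *training* data, fit ridge on the rest for each candidate $\lambda$, and keep the value that predicts the held-out part best. This is only a sketch under several assumptions: the data are toy data generated inside the block, `ridge_sketch` is a hypothetical local helper, the 150/50 split and the candidate grid are arbitrary, and mean squared error stands in for the correlation measure used earlier. The validation set stays untouched so that it still gives an honest estimate of prediction accuracy.

```python
import numpy as np

def ridge_sketch(X, Y, lam):
    """Ridge weights via a linear solve (illustrative helper)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ Y)

rng = np.random.default_rng(0)
X_demo = rng.standard_normal((200, 50))
Y_demo = X_demo @ rng.standard_normal((50, 10)) + 10 * rng.standard_normal((200, 10))

# Hold out the last 50 training time points for choosing lambda
X_fit, X_held = X_demo[:150], X_demo[150:]
Y_fit, Y_held = Y_demo[:150], Y_demo[150:]

lambdas = np.logspace(0, 4, 9)  # candidate values from 1 to 10,000
mean_sq_err = []
for lam in lambdas:
    B_hat = ridge_sketch(X_fit, Y_fit, lam)
    # Prediction error on data not used for fitting
    mean_sq_err.append(np.mean((Y_held - X_held @ B_hat) ** 2))

best_lam = lambdas[int(np.argmin(mean_sq_err))]
print(best_lam)
```

Repeating this over several different hold-out splits (cross-validation) and averaging the errors would give a more stable choice.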



# If time: More demos

In [None]:
# ...