# Python Practice Lecture 10 MATH 342W Queens College - Multivariate Linear Regression with the Hat Matrix
## Author: Amir ElTabakh
## Date: March 3, 2021

## Agenda:
* Multivariate Linear Regression with the Hat Matrix

## Multivariate linear regression with the Hat Matrix

First, let's fit the null model and examine what its hat matrix looks like. In this exercise, we will see that $g_0 = \bar{y}$ is the OLS solution when there are no features, only an intercept, i.e. $b_0 = \bar{y}$.

We'll load in the Boston Housing data.

In [1]:
# Lines below are just to ignore warnings
import warnings
warnings.filterwarnings('ignore')

# Importing dependencies
from sklearn import datasets
import pandas as pd
import numpy as np

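# Load the Boston Housing dataset as bh
# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell needs an older scikit-learn version to run as written
bh = datasets.load_boston()

# Initialize target variable
y = bh.target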

# Create Boston Housing df
df = pd.DataFrame(data = bh.data, columns = bh.feature_names)

# Create MEDV column in df
df['MEDV'] = y

# Load the first 5 rows of df
df.head()

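|   | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | MEDV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.09 | 1.0 | 296.0 | 15.3 | 396.9 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8 | 396.9 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 | 242.0 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0.0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3.0 | 222.0 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0.0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3.0 | 222.0 | 18.7 | 396.9 | 5.33 | 36.2 |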


Let's build a linear model of just the intercept column.

In [2]:
# Importing dependencies
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score # RMSE, R^2

# Initialize model
model_intercept = LinearRegression()

# Define the 1-vector as an n x 1 array of float64 ones
ones = np.ones((len(df), 1))

# Fit y on the 1-vector
model_intercept.fit(ones, y)

# get yhat
yhat = model_intercept.predict(ones)

# print first 20 predictions
yhat[0:20]

array([22.53280632, 22.53280632, 22.53280632, 22.53280632, 22.53280632,
       22.53280632, 22.53280632, 22.53280632, 22.53280632, 22.53280632,
       22.53280632, 22.53280632, 22.53280632, 22.53280632, 22.53280632,
       22.53280632, 22.53280632, 22.53280632, 22.53280632, 22.53280632])

In [3]:
# print the mean of y
print(np.mean(y))

22.532806324110677
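
The match is not a coincidence. With only an intercept, OLS chooses $b_0$ to minimize the sum of squared errors. A short derivation sketch (the shorthand $SSE$ is introduced here and is not defined in the notebook) shows that the minimizer is the sample mean:

$$SSE(b_0) = \sum_{i=1}^{n} (y_i - b_0)^2, \qquad \frac{d\,SSE}{d\,b_0} = -2\sum_{i=1}^{n} (y_i - b_0) = 0 \;\Longrightarrow\; b_0 = \frac{1}{n}\sum_{i=1}^{n} y_i = \bar{y}.$$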


Let's do a simple example of projection. Let's project $y$ onto the intercept column, the column of all 1's. What do you think will happen?

In [4]:
# Hat matrix for a single column: H = 1 1^T / (1^T 1)
H = ones @ ones.transpose() / np.sum(ones**2)
H[1:5, 1:5]

array([[0.00197628, 0.00197628, 0.00197628, 0.00197628],
       [0.00197628, 0.00197628, 0.00197628, 0.00197628],
       [0.00197628, 0.00197628, 0.00197628, 0.00197628],
       [0.00197628, 0.00197628, 0.00197628, 0.00197628]])

In [5]:
# output shape of the Hat matrix
H.shape

(506, 506)

In [6]:
# Output unique values of Hat matrix
np.unique(H)

array([0.00197628])

In [7]:
# In fact
print(1 / 506)

0.001976284584980237


Every element of the matrix is the same value! What is this value? It's $\frac{1}{n} = \frac{1}{506}$, since $n = 506$. So what happens when we multiply $H$ by $y$?

In [8]:
# Getting y projected on ones
y_proj_one = H @ y
y_proj_one[0:20]

array([22.53280632, 22.53280632, 22.53280632, 22.53280632, 22.53280632,
       22.53280632, 22.53280632, 22.53280632, 22.53280632, 22.53280632,
       22.53280632, 22.53280632, 22.53280632, 22.53280632, 22.53280632,
       22.53280632, 22.53280632, 22.53280632, 22.53280632, 22.53280632])
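
The constant entries of $H$ and the constant predictions follow from the projection formula. A short derivation sketch, writing $\mathbf{1}$ for the column of $n$ ones (notation assumed here, not used in the notebook):

$$H = \mathbf{1}(\mathbf{1}^\top \mathbf{1})^{-1}\mathbf{1}^\top = \frac{1}{n}\mathbf{1}\mathbf{1}^\top, \qquad Hy = \frac{1}{n}\mathbf{1}(\mathbf{1}^\top y) = \bar{y}\,\mathbf{1}.$$

With $n = 506$, every entry of $H$ is $1/506$ and every entry of $Hy$ equals $\bar{y}$.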

Projecting onto the span of the all-ones column gives the null model ($g_0 = \bar{y}$). It is the same as the model response = intercept + error, i.e. $y = \mu_y + \epsilon$, whose OLS intercept estimate is $\bar{y}$. Now let's build a multivariate linear regression model with sklearn's `LinearRegression` class.

In [9]:
# Separate the response y from the feature matrix X
y = df['MEDV']
X = df.drop(columns='MEDV')  # returns a copy, so df itself is not modified

# Initialize model
model = LinearRegression(fit_intercept = True)

# fit
model.fit(X, y)

# print intercept b0
print(model.intercept_)

# print coefficients
print(model.coef_)

36.45948838508992
[-1.08011358e-01  4.64204584e-02  2.05586264e-02  2.68673382e+00
 -1.77666112e+01  3.80986521e+00  6.92224640e-04 -1.47556685e+00
  3.06049479e-01 -1.23345939e-02 -9.52747232e-01  9.31168327e-03
 -5.24758378e-01]


Now we'll do the same using our linear algebra.

In [10]:
# Compute the coefficient vector b = (X^T X)^{-1} X^T y

# Add the intercept column (the scalar 1 is broadcast to every row)
X.insert(0, 'INTERCEPT', 1)

# linear algebra
Xt = X.transpose()
XtXinv = np.linalg.inv(Xt @ X)
b = XtXinv @ Xt @ y
b

0     36.459488
1     -0.108011
2      0.046420
3      0.020559
4      2.686734
5    -17.766611
6      3.809865
7      0.000692
8     -1.475567
9      0.306049
10    -0.012335
11    -0.952747
12     0.009312
13    -0.524758
dtype: float64
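
The explicit inverse above mirrors the formula $b = (X^\top X)^{-1} X^\top y$, but the same coefficients can be obtained without forming the inverse. The following is a minimal sketch on synthetic data (the names `X_demo`, `y_demo`, and the coefficient values are hypothetical and are not part of the notebook) that compares the explicit inverse with `np.linalg.solve` and `np.linalg.lstsq`:

```python
import numpy as np

# Hypothetical synthetic data, NOT the Boston Housing data
rng = np.random.default_rng(0)
n, p = 50, 3
features = rng.normal(size=(n, p))
X_demo = np.column_stack([np.ones(n), features])  # prepend the intercept column
y_demo = X_demo @ np.array([2.0, -1.0, 0.5, 3.0]) + rng.normal(scale=0.1, size=n)

# Explicit inverse, as in the notebook cell
b_inv = np.linalg.inv(X_demo.T @ X_demo) @ X_demo.T @ y_demo

# Solve the normal equations X^T X b = X^T y without forming the inverse
b_solve = np.linalg.solve(X_demo.T @ X_demo, X_demo.T @ y_demo)

# Least-squares solver that works on X directly
b_lstsq, *_ = np.linalg.lstsq(X_demo, y_demo, rcond=None)

print(np.allclose(b_inv, b_solve), np.allclose(b_inv, b_lstsq))
```

All three routes should agree to floating-point tolerance on a well-conditioned design matrix; solving the system is generally preferred over inverting when only $b$ is needed.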

We calculated the same intercept and coefficient values. Let's use the Hat matrix to calculate all predictions.

In [11]:
Xt.shape

(14, 506)

In [12]:
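# get Hat matrix

# Chaining @ on the DataFrames raises an error, most likely because pandas
# aligns on labels: X @ XtXinv has integer column labels, which do not
# match the row labels of Xt (the feature names)
#H = X @ XtXinv @ Xt

# The `.dot()` version works because XtXinv.dot(Xt) is a plain NumPy array
H = X.dot(XtXinv.dot(Xt))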

print(H.shape)

(506, 506)
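
One way to sidestep label alignment entirely is to do the matrix algebra on the underlying NumPy array. The sketch below uses a small made-up DataFrame (`X_df`, `X_np`, `H_np`, and `H_dot` are hypothetical names, not from the notebook) and checks that the array-based hat matrix matches the `.dot()` version:

```python
import numpy as np
import pandas as pd

# Hypothetical small design matrix with an intercept column
rng = np.random.default_rng(1)
X_df = pd.DataFrame(rng.normal(size=(8, 2)), columns=["a", "b"])
X_df.insert(0, "INTERCEPT", 1.0)

# Work on the underlying array so that @ is plain matrix multiplication
X_np = X_df.to_numpy()
XtXinv_demo = np.linalg.inv(X_np.T @ X_np)
H_np = X_np @ XtXinv_demo @ X_np.T

# Same computation written with .dot() on the DataFrame, as in the notebook
H_dot = X_df.dot(XtXinv_demo.dot(X_df.transpose()))

print(H_np.shape)
print(np.allclose(H_np, H_dot.to_numpy()))
```

The trade-off is that `H_np` is a bare array with no row labels, which is usually fine for a hat matrix.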


In [13]:
# Calculate your predictions
yhat = H @ y
yhat[0:10]

0    30.003843
1    25.025562
2    30.567597
3    28.607036
4    27.943524
5    25.256284
6    23.001808
7    19.535988
8    11.523637
9    18.920262
dtype: float64

Can you tell that this is a projection from a 506-dimensional space onto a 14-dimensional space (13 features plus the intercept)? Not really... but it is.

Now let's project over and over...

In [14]:
(H @ H @ H @ H @ H @ H @ H @ H @ H @ y)[0:10]

0    30.003843
1    25.025562
2    30.567597
3    28.607036
4    27.943524
5    25.256284
6    23.001808
7    19.535988
8    11.523637
9    18.920262
dtype: float64

Same thing! Once you have projected onto the column space, you are already there, so projecting again changes nothing. That's the idempotency of $H$, i.e. $HH = H$.
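
Idempotency can also be checked on the matrix itself rather than through one particular $y$. The following is a minimal sketch on synthetic data (`X_demo` and `H_demo` are hypothetical names) that tests $HH = H$ and symmetry, and compares the trace of $H$ with the rank of the design matrix, which is the dimension of the space being projected onto:

```python
import numpy as np

# Hypothetical synthetic design matrix: intercept plus 4 random features
rng = np.random.default_rng(2)
n, p = 30, 4
X_demo = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # n x (p + 1)

# Hat matrix H = X (X^T X)^{-1} X^T
H_demo = X_demo @ np.linalg.inv(X_demo.T @ X_demo) @ X_demo.T

is_idempotent = np.allclose(H_demo @ H_demo, H_demo)  # H H = H
is_symmetric = np.allclose(H_demo, H_demo.T)          # H^T = H
trace_H = np.trace(H_demo)                            # equals rank for a projection
rank_X = np.linalg.matrix_rank(X_demo)                # dimension of the column space

print(is_idempotent, is_symmetric, round(trace_H, 6), rank_X)
```

For the Boston design matrix with its intercept column, the same check would be expected to give a trace of 14, assuming the columns are linearly independent.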

Let's make sure that $H$ really does project onto the column space of $X$. Any vector already in the column space should come back unchanged, so let's try projecting different columns of $X$:

In [15]:
# H @ intercept
print(X.iloc[0:5, 0])
print((H @ X.iloc[:, 0])[0:5])

0    1
1    1
2    1
3    1
4    1
Name: INTERCEPT, dtype: int64
0    1.0
1    1.0
2    1.0
3    1.0
4    1.0
dtype: float64


In [16]:
# H @ CRIM
print(X.iloc[0:5, 1])
print((H @ X.iloc[:, 1])[0:5])

0    0.00632
1    0.02731
2    0.02729
3    0.03237
4    0.06905
Name: CRIM, dtype: float64
0    0.00632
1    0.02731
2    0.02729
3    0.03237
4    0.06905
dtype: float64


In [17]:
# H @ ZN
print(X.iloc[0:5, 2])
print((H @ X.iloc[:, 2])[0:5])

0    18.0
1     0.0
2     0.0
3     0.0
4     0.0
Name: ZN, dtype: float64
0    1.800000e+01
1   -2.615685e-13
2   -3.570477e-13
3   -2.192690e-13
4   -2.493561e-13
dtype: float64


In [18]:
# H @ INDUS
print(X.iloc[0:5, 3])
print((H @ X.iloc[:, 3])[0:5])

0    2.31
1    7.07
2    7.07
3    2.18
4    2.18
Name: INDUS, dtype: float64
0    2.31
1    7.07
2    7.07
3    2.18
4    2.18
dtype: float64


In [19]:
# H @ CHAS
print(X.iloc[0:5, 4])
print((H @ X.iloc[:, 4])[0:5])

0    0.0
1    0.0
2    0.0
3    0.0
4    0.0
Name: CHAS, dtype: float64
0   -9.315465e-16
1   -1.655360e-15
2   -1.929013e-15
3   -7.771561e-16
4   -8.777701e-16
dtype: float64
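
The tiny values such as `-2.615685e-13` in the outputs above are floating-point round-off, so the comparison is best made with a tolerance. The sketch below, on synthetic data with hypothetical names (`X_demo`, `H_demo`, `v_in`, `v_out`), checks every column at once with `np.allclose`, checks a linear combination of columns, and contrasts both with a random vector that is not in the column space:

```python
import numpy as np

# Hypothetical synthetic design matrix: intercept plus 2 random features
rng = np.random.default_rng(3)
n = 20
X_demo = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
H_demo = X_demo @ np.linalg.inv(X_demo.T @ X_demo) @ X_demo.T

# Every column of X at once: H X should equal X up to round-off
columns_unchanged = np.allclose(H_demo @ X_demo, X_demo)

# Any linear combination of the columns is also left unchanged
v_in = X_demo @ np.array([1.0, -2.0, 0.5])
combo_unchanged = np.allclose(H_demo @ v_in, v_in)

# A random vector is generally NOT in the column space, so H changes it
v_out = rng.normal(size=n)
random_unchanged = np.allclose(H_demo @ v_out, v_out)

print(columns_unchanged, combo_unchanged, random_unchanged)
```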


We can calculate the residuals using the Hat matrix as well, first directly as $e = y - \hat{y}$ and then as $e = (I - H)y$.

In [20]:
e = y - yhat
e[0:10]

0   -6.003843
1   -3.425562
2    4.132403
3    4.792964
4    8.256476
5    3.443716
6   -0.101808
7    7.564012
8    4.976363
9   -0.020262
dtype: float64

In [21]:
I = np.identity(len(X))
e = (I - H) @ y
e[0:10]

0   -6.003843
1   -3.425562
2    4.132403
3    4.792964
4    8.256476
5    3.443716
6   -0.101808
7    7.564012
8    4.976363
9   -0.020262
dtype: float64

Let's apply that projection over and over onto the orthogonal complement of the column space of $X$:

In [22]:
((I - H) @ (I - H) @ (I - H) @ (I - H) @ (I - H) @ (I - H) @ y)[0:10]

0   -6.003843
1   -3.425562
2    4.132403
3    4.792964
4    8.256476
5    3.443716
6   -0.101808
7    7.564012
8    4.976363
9   -0.020262
dtype: float64
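
The repeated projection gives the same residuals because $I - H$ is itself idempotent. A minimal sketch on synthetic data (the names `X_demo`, `y_demo`, `H_demo`, and `M_demo` are hypothetical) checks this, along with the orthogonality of the residuals to the columns of $X$ and to the fitted values:

```python
import numpy as np

# Hypothetical synthetic data: intercept plus 3 random features
rng = np.random.default_rng(4)
n = 25
X_demo = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y_demo = rng.normal(size=n)

H_demo = X_demo @ np.linalg.inv(X_demo.T @ X_demo) @ X_demo.T
M_demo = np.eye(n) - H_demo  # projects onto the orthogonal complement

yhat_demo = H_demo @ y_demo  # fitted values
e_demo = M_demo @ y_demo     # residuals

m_idempotent = np.allclose(M_demo @ M_demo, M_demo)          # (I - H)(I - H) = I - H
orth_to_columns = np.allclose(X_demo.T @ e_demo, 0.0)        # X^T e = 0
orth_to_fit = bool(np.isclose(yhat_demo @ e_demo, 0.0))      # yhat . e = 0
decomposition_ok = np.allclose(yhat_demo + e_demo, y_demo)   # y = yhat + e

print(m_idempotent, orth_to_columns, orth_to_fit, decomposition_ok)
```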