## OLS Regression Example Using Matrix Operations

$$ \begin{array}{lrr}
\text{State} & \text{Per Capita Income} & \text{Percent Gov. Employees}\\ 
\hline
\text{Alabama} & 24028 & 19.2\\ 
\text{Florida} & 30446 & 14.5\\ 
\text{Georgia} & 29442 & 16.4\\ 
\text{Mississippi} & 23448 & 21.8\\ 
\text{North Carolina} & 28235 & 17.3\\ 
\text{South Carolina} & 26132 & 18.2\\ 
\text{Tennessee} & 28445 & 15.5
\end{array} $$

In [69]:
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

np.set_printoptions(suppress=True)

## Create X

$$X=\begin{bmatrix}
1 & 24028\\ 
1 & 30446\\ 
1 & 29442\\ 
1 & 23448\\ 
1 & 28235\\ 
1 & 26132\\ 
1 & 28445
\end{bmatrix}$$

In [71]:
X = np.matrix([[1, 24028], 
               [1, 30446],
               [1, 29442],
               [1, 23448],
               [1, 28235],
               [1, 26132],
               [1, 28445]])

## Create $\boldsymbol{y}$

$$ \boldsymbol{y} = \begin{bmatrix}
19.2\\ 
14.5\\ 
16.4\\ 
21.8\\ 
17.3\\ 
18.2\\ 
15.5
\end{bmatrix}$$

In [72]:
y = np.matrix([[19.2], 
               [14.5],
               [16.4],
               [21.8],
               [17.3],
               [18.2],
               [15.5]])

## Create $X^{T}X$

$$X^{T}X = \begin{bmatrix}
7 & 190176\\ 
190176 & 5210158442
\end{bmatrix}$$

In [73]:
X.T * X

matrix([[         7,     190176],
        [    190176, 5210158442]])
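
Because the first column of $X$ is all ones, the entries of $X^{T}X$ are simple sums over the $n = 7$ observations, which gives a quick way to check the result by hand. Here $x_i$ denotes the income values, a notation introduced only for this check:

$$X^{T}X = \begin{bmatrix}
n & \sum x_i\\ 
\sum x_i & \sum x_i^{2}
\end{bmatrix} = \begin{bmatrix}
7 & 190176\\ 
190176 & 5210158442
\end{bmatrix}$$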

## Create $X^{T}\boldsymbol{y}$

$$X^{T}\boldsymbol{y}=\begin{bmatrix}
1 & 24028\\ 
1 & 30446\\ 
1 & 29442\\ 
1 & 23448\\ 
1 & 28235\\ 
1 & 26132\\ 
1 & 28445
\end{bmatrix}^{T}\begin{bmatrix}
19.2\\ 
14.5\\ 
16.4\\ 
21.8\\ 
17.3\\ 
18.2\\
15.5 
\end{bmatrix} = \begin{bmatrix}
122.9\\ 
3301785.2
\end{bmatrix} $$

In [74]:
(X.T * y)

matrix([[     122.9],
        [ 3301785.2]])

## Create $(X^{T}X)^{-1}$ 

$$(X^{T}X)^{-1} = \begin{bmatrix}
17.128 & -0.000625\\ 
-0.000625 & 2.30 \times 10^{-8}
\end{bmatrix}$$

In [75]:
np.linalg.inv(X.T * X)

matrix([[ 17.12751702,  -0.00062517],
        [ -0.00062517,   0.00000002]])
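
For a $2 \times 2$ matrix the inverse can be checked by hand with the determinant formula. A worked version, using the entries of $X^{T}X$ printed by the code:

$$\det(X^{T}X) = 7 \cdot 5210158442 - 190176^{2} = 304198118$$

$$(X^{T}X)^{-1} = \frac{1}{304198118}\begin{bmatrix}
5210158442 & -190176\\ 
-190176 & 7
\end{bmatrix} \approx \begin{bmatrix}
17.1275 & -0.000625\\ 
-0.000625 & 2.30 \times 10^{-8}
\end{bmatrix}$$

The lower-right entry is small but not zero; it prints as `0.00000002` only because of the display precision.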

## Create $(X^{T}X)^{-1}X^{T}\boldsymbol{y}$

$$\boldsymbol{\hat{\beta} } = (X^{T}X)^{-1}X^{T}\boldsymbol{y}$$

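$$ \boldsymbol{\hat{\beta} } = \begin{bmatrix}
\alpha \\ 
\beta 
\end{bmatrix} = \begin{bmatrix}
17.128 & -0.000625\\ 
 -0.000625 & 2.30 \times 10^{-8}
\end{bmatrix}\begin{bmatrix}
122.9\\ 
3301785.2
\end{bmatrix}=\begin{bmatrix}
40.790\\ 
-0.000855
\end{bmatrix} $$

The first entry, $\alpha$, is the intercept and the second, $\beta$, is the slope on per capita income, matching the column order of $X$.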

In [76]:
betas = np.linalg.inv(X.T * X) * (X.T * y)
betas

matrix([[ 40.78976691],
        [ -0.00085515]])
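
Forming the explicit inverse works for a small example like this one, but solving the normal equations directly is usually the more numerically stable route. The following is a minimal sketch, assuming plain `np.array` inputs and the `@` operator in place of `np.matrix`; the names `X_arr`, `y_arr`, and `betas_solve` are illustrative and not part of the original notebook.

```python
import numpy as np

# Same data as above, stored as ordinary 2-D arrays
# (an assumption for this sketch; the notebook itself uses np.matrix)
X_arr = np.array([[1, 24028],
                  [1, 30446],
                  [1, 29442],
                  [1, 23448],
                  [1, 28235],
                  [1, 26132],
                  [1, 28445]], dtype=float)
y_arr = np.array([[19.2], [14.5], [16.4], [21.8], [17.3], [18.2], [15.5]])

# Solve (X^T X) b = X^T y without forming the inverse explicitly
betas_solve = np.linalg.solve(X_arr.T @ X_arr, X_arr.T @ y_arr)
print(betas_solve)
```

The result should agree with `betas` above up to floating-point rounding: the intercept in the first row and the slope in the second.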

## Check against statsmodels

In [154]:
X1 = [24028, 30446, 29442, 23448, 28235, 26132, 28445]
Y1 = [19.2, 14.5, 16.4, 21.8, 17.3, 18.2, 15.5]

# Build the DataFrame used by the formula interface
data = pd.DataFrame({'X1': X1, 'Y1': Y1})

est = smf.ols('Y1 ~ X1', data).fit()
est.params

Intercept    40.789767
X1           -0.000855
dtype: float64
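
With a single regressor, the matrix solution reduces to the familiar scalar formulas: the slope is the centered cross-product sum divided by the centered sum of squares, and the intercept follows from the means. A short sketch under that assumption, using the same seven observations; `x_vals`, `y_vals`, `slope`, and `intercept` are illustrative names.

```python
import numpy as np

x_vals = np.array([24028, 30446, 29442, 23448, 28235, 26132, 28445], dtype=float)
y_vals = np.array([19.2, 14.5, 16.4, 21.8, 17.3, 18.2, 15.5])

# Slope: sum of centered cross-products over the centered sum of squares
x_centered = x_vals - x_vals.mean()
y_centered = y_vals - y_vals.mean()
slope = np.sum(x_centered * y_centered) / np.sum(x_centered ** 2)

# Intercept: the fitted line passes through the point of means
intercept = y_vals.mean() - slope * x_vals.mean()

print(intercept, slope)
```

Both values should match the `Intercept` and `X1` estimates reported by statsmodels, up to rounding.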