### Linear Regression by OLS
Estimate $\beta$ and compute the F-statistic (for the model as a whole) and the t-statistics (for individual parameters such as $\beta_{1}$) using NumPy.

In [258]:
import numpy as np
import pandas as pd
import time

In [275]:
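# Create a random design matrix X and a response y
N=10000
np.random.seed(2020)
X = np.random.normal(10,10,[N,3]) # three predictors drawn from a normal with mean 10, sd 10
X = np.c_[np.ones(N),X] # first column is ones (intercept term)
#print(X)
e = np.random.normal(0,10,N) # noise with mean 0, sd 10
# true coefficients: intercept 5, slopes -3, 1 and 0.1
y = 5*X[:,0] - 3*X[:,1] + 1*X[:,2] + 0.1*X[:,3]+ e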

With $y = X\beta + e$, minimizing the sum of squared errors $e^Te$ is equivalent to minimizing $(y-X\beta)^T(y-X\beta)$. Setting the derivative with respect to $\beta$ to zero gives $$\hat\beta = (X^TX)^{-1}X^Ty$$
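
For reference, the differentiation step can be written out as follows. This is a sketch that assumes $X^TX$ is invertible, i.e. $X$ has full column rank; $S(\beta)$ is a name introduced here for the sum of squared errors.

$$
\begin{aligned}
S(\beta) &= (y - X\beta)^T (y - X\beta) = y^Ty - 2\beta^T X^T y + \beta^T X^T X \beta \\
\frac{\partial S}{\partial \beta} &= -2X^Ty + 2X^TX\beta = 0 \\
X^TX\hat\beta &= X^Ty \quad\Rightarrow\quad \hat\beta = (X^TX)^{-1}X^Ty
\end{aligned}
$$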

In [276]:
#OLS algorithm
start = time.time()
beta = np.linalg.inv(X.T@X) @ X.T @ y # @ means matrix multiplication
print(time.time()-start) # elapsed time in seconds

0.0024149417877197266
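
The explicit inverse works here because $X^TX$ is small and well conditioned. As a hedged alternative, the sketch below solves the same problem with `np.linalg.solve` (normal equations without forming the inverse) and `np.linalg.lstsq` (least squares on `X` directly), which are generally preferred numerically. The names `beta_inv`, `beta_solve` and `beta_lstsq` are introduced here for illustration only.

```python
import numpy as np

# Rebuild the same synthetic data as above (same seed, same draw order).
N = 10000
np.random.seed(2020)
X = np.random.normal(10, 10, [N, 3])
X = np.c_[np.ones(N), X]
e = np.random.normal(0, 10, N)
y = 5*X[:, 0] - 3*X[:, 1] + 1*X[:, 2] + 0.1*X[:, 3] + e

# Explicit inverse, as in the cell above.
beta_inv = np.linalg.inv(X.T @ X) @ X.T @ y

# Solve the normal equations (X^T X) beta = X^T y without forming the inverse.
beta_solve = np.linalg.solve(X.T @ X, X.T @ y)

# Least-squares solver that works on X directly.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# All three approaches should agree up to floating-point error.
print(np.allclose(beta_inv, beta_solve), np.allclose(beta_inv, beta_lstsq))
```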


In [277]:
print(beta) #beta 0 to 3

[ 5.07198234 -3.01765558  1.00544021  0.10879204]


In [278]:
#yhat = X[:,0]*beta[0] + X[:,1]*beta[1] + X[:,2]*beta[2] + X[:,3]*beta[3]
yhat = X @ beta

[What is SST,SSE,SSR?](https://stats.stackexchange.com/questions/207841/why-is-sst-sse-ssr-one-variable-linear-regression)

In [279]:
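SSR = np.sum((yhat-np.mean(y))**2) # regression (explained) sum of squares
SSE = np.sum((y-yhat)**2) # error (residual) sum of squares
SST = np.sum((y-np.mean(y))**2) # total sum of squares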

In [280]:
# R-squared: share of the total variation explained by the model
SSR/SST

0.9088976139226401
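
Because the model includes an intercept, the decomposition SST = SSR + SSE holds, so R-squared can equivalently be written as 1 - SSE/SST. The sketch below checks this on the same synthetic data and also computes the adjusted R-squared; the names `r2`, `r2_alt` and `r2_adj` are introduced here for illustration.

```python
import numpy as np

# Same synthetic data as above (same seed, same draw order).
N, k = 10000, 3
np.random.seed(2020)
X = np.c_[np.ones(N), np.random.normal(10, 10, [N, 3])]
e = np.random.normal(0, 10, N)
y = X @ np.array([5, -3, 1, 0.1]) + e

beta = np.linalg.solve(X.T @ X, X.T @ y)
yhat = X @ beta

SSR = np.sum((yhat - y.mean())**2)  # explained by the regression
SSE = np.sum((y - yhat)**2)         # left in the residuals
SST = np.sum((y - y.mean())**2)     # total variation around the mean

r2 = SSR / SST
r2_alt = 1 - SSE / SST
# Adjusted R-squared penalizes the number of predictors k.
r2_adj = 1 - (SSE / (N - k - 1)) / (SST / (N - 1))

print(np.isclose(SST, SSR + SSE), r2, r2_alt, r2_adj)
```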

### About F,t-value
[Linear_regression_pdf](http://mezeylab.cb.bscb.cornell.edu/labmembers/documents/supplement%205%20-%20multiple%20regression.pdf)

In [282]:
#F-value
k=3 # number of predictors (slope parameters), excluding the intercept
(SSR/k)/(SSE/(N-k-1))

33242.23414981111
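
The F-statistic is compared against an F distribution with $k$ and $N-k-1$ degrees of freedom. As a sketch, assuming SciPy is available (it is a dependency of statsmodels), the p-value can be obtained from the survival function `scipy.stats.f.sf`. The same statistic can also be written in terms of R-squared. The names `F_ss`, `F_r2` and `p_value` are introduced here for illustration.

```python
import numpy as np
from scipy import stats

# Same synthetic data as above (same seed, same draw order).
N, k = 10000, 3
np.random.seed(2020)
X = np.c_[np.ones(N), np.random.normal(10, 10, [N, 3])]
e = np.random.normal(0, 10, N)
y = X @ np.array([5, -3, 1, 0.1]) + e

beta = np.linalg.solve(X.T @ X, X.T @ y)
yhat = X @ beta
SSR = np.sum((yhat - y.mean())**2)
SSE = np.sum((y - yhat)**2)
SST = np.sum((y - y.mean())**2)

# F from the sums of squares, as in the cell above.
F_ss = (SSR / k) / (SSE / (N - k - 1))

# Equivalent form in terms of R-squared.
R2 = SSR / SST
F_r2 = (R2 / k) / ((1 - R2) / (N - k - 1))

# p-value: probability of an F at least this large under the null
# hypothesis that all slope coefficients are zero.
p_value = stats.f.sf(F_ss, k, N - k - 1)
print(F_ss, F_r2, p_value)
```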

In [283]:
sigma2 = ((y-yhat).T @ (y-yhat))/N # residual variance estimate; divides by N, whereas the unbiased estimate divides by N-k-1

In [284]:
# t-value of each coefficient: beta_i / sqrt(sigma2 * [(X^T X)^{-1}]_ii)
for i in range(4):
    print(beta[i]/((sigma2*np.linalg.inv((X.T @ X))[i,i])**(1/2)))

25.078447737687416
-300.3384898887673
99.2527044619302
10.784586102534847
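
The t-values depend on which residual variance estimate is used. The sketch below computes them in vectorized form with both the divide-by-$N$ estimate used above and the unbiased estimate that divides by the residual degrees of freedom $N-k-1$, and adds two-sided p-values via `scipy.stats.t.sf` (assuming SciPy is available). The names `t_mle`, `t_unbiased` and `p_values` are introduced here for illustration.

```python
import numpy as np
from scipy import stats

# Same synthetic data as above (same seed, same draw order).
N = 10000
np.random.seed(2020)
X = np.c_[np.ones(N), np.random.normal(10, 10, [N, 3])]
e = np.random.normal(0, 10, N)
y = X @ np.array([5, -3, 1, 0.1]) + e

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta
df_resid = N - X.shape[1]  # N - k - 1 with k = 3 predictors

# Two estimates of the residual variance.
sigma2_mle = resid @ resid / N
sigma2_unbiased = resid @ resid / df_resid

# Standard errors are sqrt(sigma2 * diag((X^T X)^{-1})).
d = np.diag(XtX_inv)
t_mle = beta / np.sqrt(sigma2_mle * d)
t_unbiased = beta / np.sqrt(sigma2_unbiased * d)

# Two-sided p-values from the t distribution with df_resid degrees of freedom.
p_values = 2 * stats.t.sf(np.abs(t_unbiased), df_resid)

print(t_mle)
print(t_unbiased)
print(p_values)
```

The two sets of t-values differ only by the constant factor $\sqrt{N/(N-k-1)}$, which is very close to 1 for a sample this large.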


### OLS using a Python library (statsmodels)

In [285]:
import statsmodels.api as sm

In [286]:
lm = sm.OLS(y,X)

In [287]:
start = time.time()
lm = lm.fit()
print(time.time()-start)

0.00474095344543457


In [288]:
lm.summary()

| Field | Value | Field | Value |
|---|---|---|---|
| Dep. Variable: | y | R-squared: | 0.909 |
| Model: | OLS | Adj. R-squared: | 0.909 |
| Method: | Least Squares | F-statistic: | 33240.0 |
| Date: | Mon, 24 Aug 2020 | Prob (F-statistic): | 0.0 |
| Time: | 17:49:18 | Log-Likelihood: | -37220.0 |
| No. Observations: | 10000 | AIC: | 74450.0 |
| Df Residuals: | 9996 | BIC: | 74480.0 |
| Df Model: | 3 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 5.0720 | 0.202 | 25.073 | 0.000 | 4.675 | 5.469 |
| x1 | -3.0177 | 0.010 | -300.278 | 0.000 | -3.037 | -2.998 |
| x2 | 1.0054 | 0.010 | 99.233 | 0.000 | 0.986 | 1.025 |
| x3 | 0.1088 | 0.010 | 10.782 | 0.000 | 0.089 | 0.129 |

| Field | Value | Field | Value |
|---|---|---|---|
| Omnibus: | 0.094 | Durbin-Watson: | 1.98 |
| Prob(Omnibus): | 0.954 | Jarque-Bera (JB): | 0.106 |
| Skew: | -0.006 | Prob(JB): | 0.948 |
| Kurtosis: | 2.991 | Cond. No. | 40.5 |


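Both methods run in a few milliseconds, and their results agree: the coefficients, R-squared, and F-statistic match to the displayed precision. The t-values differ slightly (for example 25.078 versus 25.073 for the intercept). This is not rounding error: `sigma2` above divides the residual sum of squares by $N$, whereas statsmodels divides by the residual degrees of freedom $N-k-1 = 9996$.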