
# Gradient Descent for Training Linear Regression

In multiple linear regression, the goal is to minimize the residual sum of squares (RSS):

$$ RSS = \sum_{i=1}^n \left(y_i-\hat y_i\right)^2 = (\vec y - X \vec \beta)^T(\vec y - X\vec \beta) $$

The first expression is the traditional summation form (where $\hat y_i = \vec x_i \cdot \vec \beta$ is the prediction for the $i$-th observation). The second is the same quantity in matrix form.
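As a quick sanity check that the two forms really are the same number, here is a small sketch on made-up data (the arrays and the candidate `beta` are hypothetical, not from the hockey dataset):

```python
import numpy as np

# Hypothetical toy data: 5 observations, 2 features plus a column of 1s
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(size=(5, 2)), np.ones(5)])
y = rng.normal(size=5)
beta = np.array([0.5, -1.0, 2.0])  # an arbitrary candidate beta

y_hat = X @ beta                    # one prediction per row: x_i . beta
rss_sum = np.sum((y - y_hat) ** 2)  # summation form

r = y - X @ beta
rss_matrix = r.T @ r                # matrix form (y - X beta)^T (y - X beta)

print(np.isclose(rss_sum, rss_matrix))  # True
```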

This problem has a known closed-form solution, $\hat \beta = \left(X^TX\right)^{-1}X^T\vec y$ (assuming $X^TX$ is invertible). What I'm interested in is whether I can write code that approximates this solution using gradient descent.

Gradient descent is an iterative process: we initialize a guess for $\vec \beta$ and repeatedly replace it with $\vec \beta - w \nabla RSS$, where $w$ is a user-chosen step size (often called the learning rate) and
$$
\nabla RSS = -2X^T\left(\vec y - X\vec \beta\right)
$$
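The gradient comes from expanding RSS and differentiating term by term. This uses $\vec y^TX\vec\beta = \vec\beta^TX^T\vec y$ (both are scalars) and $\nabla_{\vec\beta}\,\vec\beta^TA\vec\beta = 2A\vec\beta$ for the symmetric matrix $A = X^TX$:

$$
\begin{aligned}
RSS &= (\vec y - X\vec\beta)^T(\vec y - X\vec\beta) \\
&= \vec y^T\vec y - 2\vec\beta^TX^T\vec y + \vec\beta^TX^TX\vec\beta \\
\nabla RSS &= -2X^T\vec y + 2X^TX\vec\beta = -2X^T\left(\vec y - X\vec\beta\right)
\end{aligned}
$$

Setting $\nabla RSS = \vec 0$ gives the normal equations $X^TX\vec\beta = X^T\vec y$. When $X^TX$ is invertible, this is where the closed-form $\hat\beta$ above comes from.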

Okay, enough theory; let's get our hands dirty with some code! I'll start with some data and compute the exact solution for $\hat \beta$.

In [1]:
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.read_csv('https://raw.githubusercontent.com/nurfnick/Applied_Stats_Jupyter_Notebooks/master/blues.csv')

In [4]:
df.head()

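| Unnamed: 0 | Rk | Player | From | To | Yrs | Lg | GP | G | A | PTS | +/- | PIM | EV | PP | SH | GW | EV.1 | PP.1 | SH.1 | S | S% | TOI | ATOI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Bruce Affleck\afflebr01 | 1975 | 1979 | 5 | NHL | 274 | 14 | 65 | 79 | -81 | 86 | 10 | 4 | 0 | 2 | 48 | 14 | 3 | 363 | 3.9 | | |
| 1 | 2 | Kenny Agostino\agostke01 | 2017 | 2017 | 1 | NHL | 7 | 1 | 2 | 3 | 0 | 2 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 17 | 5.9 | 89.0 | 12:47 |
| 2 | 3 | Glenn Anderson\*\andergl01 | 1995 | 1996 | 2 | NHL | 51 | 14 | 16 | 30 | -2 | 43 | 12 | 2 | 0 | 3 | 13 | 3 | 0 | 89 | 15.7 | | |
| 3 | 4 | Perry Anderson\anderpe01 | 1982 | 1985 | 4 | NHL | 144 | 22 | 18 | 40 | -14 | 355 | 22 | 0 | 0 | 2 | 18 | 0 | 0 | 168 | 13.1 | | |
| 4 | 5 | Ron Anderson\anderro01 | 1970 | 1970 | 1 | NHL | 59 | 9 | 9 | 18 | 9 | 36 | 8 | 1 | 0 | 0 | 8 | 1 | 0 | 107 | 8.4 | | |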


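This dataset contains (most of?) the players who have played for my favorite hockey team. Let's try to predict **GP** (games played) from **G** (goals), **A** (assists), and **+/-** (plus-minus). I leave out **PTS** (points): as the rows above show, PTS = G + A exactly, so including it would make the columns of $X$ linearly dependent and $X^TX$ would not be invertible. If you aren't familiar with these statistics, you can get some background on hockey [here](https://en.wikipedia.org/wiki/Ice_hockey).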
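Why not use PTS as well? In the rows above, PTS is exactly G + A (for example 14 + 65 = 79). A design matrix containing G, A, *and* PTS therefore has linearly dependent columns, and $X^TX$ is singular. Here is a small sketch with made-up numbers (not the Blues data) that shows the dependency directly:

```python
import numpy as np

# Hypothetical goals, assists and plus-minus for 6 players (made-up numbers)
G = np.array([10.0, 3.0, 25.0, 7.0, 15.0, 0.0])
A = np.array([20.0, 5.0, 30.0, 12.0, 8.0, 1.0])
PTS = G + A                                   # points are exactly goals + assists
plus_minus = np.array([4.0, -2.0, 10.0, -6.0, 1.0, 0.0])

# With PTS included (5 columns) vs. without it (4 columns); last column is the intercept
X_bad = np.column_stack([G, A, PTS, plus_minus, np.ones_like(G)])
X_good = np.column_stack([G, A, plus_minus, np.ones_like(G)])

# G + A - PTS = 0 in every row, so v is a nonzero vector with X_bad @ v = 0
v = np.array([1.0, 1.0, -1.0, 0.0, 0.0])
print(X_bad @ v)                  # all zeros: the columns are linearly dependent
print((X_bad.T @ X_bad) @ v)      # also all zeros, so X^T X is singular (no inverse)
print(np.linalg.matrix_rank(X_bad), "vs", np.linalg.matrix_rank(X_good))
```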

In [73]:
X = np.array(df[['G', 'A', '+/-']])  # features: goals, assists, plus-minus
y = np.array(df['GP'])                # target: games played


In [74]:
X = np.insert(X, 3, 1, axis=1)  # append a column of 1s (at index 3) so the last coefficient is the intercept

Now compute $\hat \beta = \left(X^TX\right)^{-1}X^T\vec y$ directly:

In [75]:
XX = X.T @ X                  # X^T X
XX_inv = np.linalg.inv(XX)    # (X^T X)^{-1}
beta_hat = XX_inv @ X.T @ y   # (X^T X)^{-1} X^T y
beta_hat

array([ 0.32673324,  1.79968341,  0.21442554, 50.11143093])
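Forming $(X^TX)^{-1}$ explicitly works here. In numerical practice, though, it is usually preferable to solve the normal equations $X^TX\vec\beta = X^T\vec y$ directly, or to hand $X$ and $\vec y$ to a least-squares routine. Here is a sketch on hypothetical synthetic data (the coefficients and noise level are made up) comparing the three routes with NumPy's `np.linalg.inv`, `np.linalg.solve`, and `np.linalg.lstsq`:

```python
import numpy as np

# Hypothetical synthetic data standing in for the hockey columns
rng = np.random.default_rng(1)
n = 50
X = np.column_stack([rng.normal(size=(n, 3)), np.ones(n)])  # 3 features + intercept column
true_beta = np.array([0.3, 1.8, 0.2, 50.0])
y = X @ true_beta + rng.normal(scale=0.5, size=n)

# Closed form with an explicit inverse, as in the cell above
beta_inv = np.linalg.inv(X.T @ X) @ X.T @ y

# Same normal equations, solved without forming the inverse
beta_solve = np.linalg.solve(X.T @ X, X.T @ y)

# Least-squares solver that works on X directly (never forms X^T X)
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(beta_inv, beta_solve), np.allclose(beta_inv, beta_lstsq))  # True True
```

On a small, well-conditioned problem like this all three agree. The differences only matter when $X^TX$ is close to singular.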

Let's double-check with scikit-learn's built-in `LinearRegression`.

In [76]:
from sklearn.linear_model import LinearRegression
linreg = LinearRegression()  # ordinary least squares; fits its own intercept by default
linreg.fit(X, y)             # fit the model to our data
linreg.coef_

array([0.32673324, 1.79968341, 0.21442554, 0.        ])

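The final 0 appears because scikit-learn fits the intercept separately (`fit_intercept=True` by default), so the coefficient on our column of 1's comes out as 0. The intercept itself is stored in `intercept_`, displayed next.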

In [77]:
linreg.intercept_

50.11143092932841
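Roughly speaking, with `fit_intercept=True` scikit-learn centers the features and the target before fitting. It then recovers the intercept as $\bar y - \bar{\vec x}\cdot\hat\beta$. The following NumPy-only sketch of that idea uses hypothetical synthetic data and shows it agrees with the column-of-1s approach used above:

```python
import numpy as np

# Hypothetical synthetic data: 3 feature columns and a true intercept of 50
rng = np.random.default_rng(2)
n = 40
features = rng.normal(size=(n, 3))
y = features @ np.array([0.3, 1.8, 0.2]) + 50 + rng.normal(size=n)

# Approach 1: append a column of 1s; the last coefficient is the intercept
X1 = np.column_stack([features, np.ones(n)])
beta1, *_ = np.linalg.lstsq(X1, y, rcond=None)

# Approach 2: center features and target, fit slopes only, then recover the intercept
x_mean, y_mean = features.mean(axis=0), y.mean()
slopes, *_ = np.linalg.lstsq(features - x_mean, y - y_mean, rcond=None)
intercept = y_mean - x_mean @ slopes

print(np.allclose(beta1[:3], slopes), np.isclose(beta1[3], intercept))  # True True
```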

So we know the answer. Now let's try to approximate it with gradient descent!

In [115]:
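b = np.random.normal(0, 5, 4)  # random initial guess for beta (mean 0, sd 5)
w = 0.01  # step size (learning rate) for the descent
def grad(b):
  # gradient of RSS: -2 X^T (y - X b)
  return -2*(X.transpose()@(y-X@b))

for i in range(10):
  b = b - w*grad(b)  # one gradient-descent step
  print(b)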

[222312.94897922 296427.788518    21184.9772416    2315.69969114]
[-1.75548436e+10 -2.63866655e+10 -4.88873770e+08 -1.76298991e+08]
[1.49862903e+15 2.26271927e+15 3.64155313e+13 1.51526300e+13]
[-1.28317963e+20 -1.93772107e+20 -3.09457384e+18 -1.29775237e+18]
[1.09881749e+25 1.65932478e+25 2.64886480e+23 1.11130668e+23]
[-9.40946702e+29 -1.42092416e+30 -2.26823996e+28 -9.51641818e+27]
[8.05757789e+34 1.21677530e+35 1.94235172e+33 8.14916309e+32]
[-6.89991911e+39 -1.04195718e+40 -1.66328750e+38 -6.97834596e+37]
[5.90858498e+44 8.92255758e+44 1.42431749e+43 5.97574399e+42]
[-5.05967909e+49 -7.64062431e+49 -1.21968110e+48 -5.11718915e+47]


In [91]:
b

array([-2.71347038e+49, -4.09761318e+49, -6.54106413e+47, -2.74431262e+47])
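The iterates flip sign and grow by a factor of roughly $8.5\times10^4$ per step, so this run diverges. Here is one way to see why. Since $X^T(\vec y - X\hat\beta) = \vec 0$, the error $\vec e = \vec\beta - \hat\beta$ obeys $\vec e \leftarrow (I - 2wX^TX)\,\vec e$. Along an eigenvector of $X^TX$ with eigenvalue $\lambda$, the error is therefore multiplied by $1 - 2w\lambda$ at every step, and gradient descent on RSS converges only when $0 < w < 1/\lambda_{\max}(X^TX)$.

Reading the printed growth factor as $|1 - 2w\lambda_{\max}|$ gives a rough estimate of $\lambda_{\max} \approx 4\times10^6$ for these unscaled hockey columns. That means $w$ would need to be below roughly $2\times10^{-7}$, so $w = 0.01$ is far too large. Below is a sketch on hypothetical synthetic data (not the Blues file). It standardizes the features, which is my own choice to keep $X^TX$ well-conditioned, and sets $w = 0.5/\lambda_{\max}$:

```python
import numpy as np

# Hypothetical synthetic data on a "hockey-like" scale (not the real Blues data)
rng = np.random.default_rng(3)
n = 200
raw = np.column_stack([
    rng.poisson(20, n),      # stand-in for goals
    rng.poisson(30, n),      # stand-in for assists
    rng.normal(0, 15, n),    # stand-in for plus-minus
]).astype(float)
y = raw @ np.array([0.3, 1.8, 0.2]) + 50 + rng.normal(0, 5, n)

# Standardize each feature (mean 0, sd 1), then append the column of 1s
mu, sigma = raw.mean(axis=0), raw.std(axis=0)
X = np.column_stack([(raw - mu) / sigma, np.ones(n)])

def grad(b):
    # gradient of RSS: -2 X^T (y - X b)
    return -2 * X.T @ (y - X @ b)

# Largest eigenvalue of the symmetric matrix X^T X
lam_max = np.linalg.eigvalsh(X.T @ X).max()
w = 0.5 / lam_max   # inside the stability range 0 < w < 1 / lam_max

b = np.zeros(X.shape[1])
for _ in range(500):
    b = b - w * grad(b)

beta_exact = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(b, beta_exact))   # True
```

The coefficients `b` here are on the standardized scale. To compare them with unscaled coefficients, divide each slope $b_j$ by `sigma[j]` and shift the intercept by $-\sum_j b_j\mu_j/\sigma_j$.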

In [113]:
b = 20   # minimize f(b) = b^2, starting from b = 20
w = 0.4  # step size
for i in range(10):
  b = b - w*2*b  # the gradient of b^2 is 2b
  print(b)

4.0
0.7999999999999998
0.15999999999999992
0.03199999999999997
0.0063999999999999925
0.0012799999999999982
0.0002559999999999995
5.119999999999989e-05
1.0239999999999976e-05
2.0479999999999946e-06


In [108]:
b


3.0719999999999915e-07
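The scalar cell above is gradient descent on $f(b) = b^2$, whose gradient is $2b$. The update $b \leftarrow b - 2wb = (1-2w)\,b$ gives $b_k = 20(1-2w)^k$. With $w = 0.4$, each step multiplies $b$ by $0.2$, which matches the printed values. The iteration converges exactly when $|1 - 2w| < 1$, that is $0 < w < 1$. This is RSS with a single observation $y = 0$ and $X = 1$, so it is the one-dimensional version of the step-size limit for the regression problem.

The following sketch compares a stable and an unstable step size. The helper name `scalar_descent` is my own, not from the notebook:

```python
def scalar_descent(b0, w, steps):
    """Gradient descent on f(b) = b**2 (gradient 2*b); returns every iterate."""
    b = b0
    history = []
    for _ in range(steps):
        b = b - w * 2 * b   # same update as the cell above: b <- (1 - 2w) b
        history.append(b)
    return history

stable = scalar_descent(20, 0.4, 10)    # factor 1 - 2(0.4) = 0.2, shrinks toward 0
unstable = scalar_descent(20, 1.1, 10)  # factor 1 - 2(1.1) = -1.2, flips sign and grows

print(stable[-1])    # about 2.048e-06, i.e. 20 * 0.2**10
print(unstable[-1])  # about 123.8, i.e. 20 * (-1.2)**10
```

The unstable run flips sign and grows just like the divergent regression run. It is the same failure mode in one dimension.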