## **MATHEMATICAL INTUITION FOR RIDGE REGRESSION ON MULTIDIMENSIONAL DATA**

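Suppose we have m rows and n input features, so that the design matrix $X$ (with a leading column of ones for the intercept) has (n+1) columns.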

The loss function over the m data points is:
$$
L=\sum_{i=1}^m (Y_{i}-\hat{Y}_{i})^2
$$
In matrix form, with $\hat{Y}=XW$, this is
$$
L=(XW-Y)^{T}(XW-Y)
$$
where $X$, $W$ and $Y$ are as follows:
$$
Y=\begin{bmatrix}
    Y_1\\
    Y_2\\
    Y_3\\
    \vdots\\
    Y_{m}\end{bmatrix},\quad
W=\begin{bmatrix}
W_0\\
W_1\\
W_2\\
\vdots\\
W_{n}
\end{bmatrix},\quad
X=\begin{bmatrix}
1 & X_{11} & X_{12} & \cdots & X_{1n}\\
1 & X_{21} & X_{22} & \cdots & X_{2n}\\
1 & X_{31} & X_{32} & \cdots & X_{3n}\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
1 & X_{m1} & X_{m2} & \cdots & X_{mn}
\end{bmatrix}
$$
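As a quick sanity check, the sketch below confirms numerically that the element-wise sum of squared errors over the m rows equals the matrix expression $(XW-Y)^{T}(XW-Y)$. The data is synthetic, and the names `rng`, `features`, `loss_sum` and `loss_matrix` are illustrative assumptions, not part of the notebook.

```python
import numpy as np

# Synthetic data, for illustration only: m rows, n features.
rng = np.random.default_rng(1)
m, n = 6, 2
features = rng.normal(size=(m, n))

# Prepend the column of ones, giving an m x (n+1) design matrix.
X = np.insert(features, 0, 1, axis=1)
W = rng.normal(size=(n + 1, 1))  # arbitrary weights, W[0] is the intercept
Y = rng.normal(size=(m, 1))      # arbitrary targets

# Element-wise form: sum over the m rows of (Y_i - Y_hat_i)^2.
Y_hat = X @ W
loss_sum = float(np.sum((Y - Y_hat) ** 2))

# Matrix form: (XW - Y)^T (XW - Y), a 1x1 matrix.
loss_matrix = ((X @ W - Y).T @ (X @ W - Y)).item()

print(loss_sum, loss_matrix)
```

Both printed values should agree up to floating-point rounding.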


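Ridge regression adds an L2 penalty $\lambda ||W||^2$ to this loss:
$$
\begin{aligned}
L&=(XW-Y)^{T}(XW-Y)+\lambda ||W||^2\\
&=(XW-Y)^{T}(XW-Y)+\lambda W^{T}W\\
&=[(XW)^{T}-Y^{T}](XW-Y)+\lambda W^{T}W\\
&=(W^{T}X^{T}-Y^{T})(XW-Y)+\lambda W^{T}W\\
&=(W^{T}X^{T}XW-W^{T}X^{T}Y-Y^{T}XW+Y^{T}Y)+\lambda W^{T}W\\
&=(W^{T}X^{T}XW-2W^{T}X^{T}Y+Y^{T}Y)+\lambda W^{T}W
\end{aligned}
$$
The last step uses the fact that $Y^{T}XW$ is a scalar, so it equals its own transpose $W^{T}X^{T}Y$.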
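Now differentiating with respect to $W$ and setting the gradient to zero:
$$
\begin{aligned}
\dfrac{dL}{dW}&=2X^{T}XW-2X^{T}Y+0+2\lambda W=0\\
X^{T}XW+\lambda W&=X^{T}Y\\
(X^{T}X+\lambda I)W&=X^{T}Y\\
W&=(X^{T}X+\lambda I)^{-1}X^{T}Y
\end{aligned}
$$
Here $I$ is the $(n+1)\times(n+1)$ identity matrix.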
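The closed-form result can be checked numerically. The following is a minimal sketch on synthetic data, not part of the original notebook: it computes $W$ from the formula, then verifies that the gradient $2X^{T}XW-2X^{T}Y+2\lambda W$ vanishes there and that perturbing $W$ increases the loss. The names `lam`, `ridge_loss`, `grad` and `W_perturbed` are assumptions made for illustration.

```python
import numpy as np

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
m, n, lam = 50, 3, 0.1
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])  # m x (n+1)
Y = rng.normal(size=(m, 1))

# Closed form W = (X^T X + lam*I)^(-1) X^T Y, computed with a linear solve
# instead of an explicit inverse.
W = np.linalg.solve(X.T @ X + lam * np.eye(n + 1), X.T @ Y)

def ridge_loss(W):
    """Ridge loss (XW - Y)^T (XW - Y) + lam * W^T W as a Python float."""
    r = X @ W - Y
    return (r.T @ r + lam * W.T @ W).item()

# Gradient from the derivation; it should be numerically zero at W.
grad = 2 * X.T @ X @ W - 2 * X.T @ Y + 2 * lam * W
print("max |gradient|:", np.abs(grad).max())

# Any small perturbation of W should give a larger loss.
W_perturbed = W + 0.01 * rng.normal(size=W.shape)
print(ridge_loss(W), ridge_loss(W_perturbed))
```

Because the ridge loss is strictly convex for $\lambda>0$, the point where the gradient vanishes is the unique minimum.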

### **Code demonstration**

In [1]:
from sklearn.datasets import load_diabetes
from sklearn.metrics import r2_score
import numpy as np

In [2]:
X,y=load_diabetes(return_X_y=True)

In [3]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X,y,random_state=42,test_size=0.3)

In [4]:
from sklearn.linear_model import Ridge
reg=Ridge(alpha=0.1,solver='cholesky')  # 'cholesky' is a closed-form solver that is efficient for small datasets
reg.fit(X_train,y_train)  # fit the model to the training data

In [5]:
print(reg.coef_)
print(reg.intercept_)

[  39.6635292  -213.84688049  505.91429153  341.71447431 -108.80630119
  -70.57580966 -211.90657957  160.19354049  332.77354206   77.68045166]
151.0525340484514


In [6]:
y_pred=reg.predict(X_test)  # predict the target values for the test set
print('R2_score:',r2_score(y_test,y_pred))

R2_score: 0.48031838456367226


In [48]:
class OwnRidge:
    def __init__(self,alpha):
        self.alpha=alpha
        self.coef=None
        self.intercept=None
    def fit(self,X_train,y_train):
        # Insert a column of ones at position 0 (axis=1 inserts a column)
        # so that w0 acts as the intercept.
        X=np.insert(X_train,0,1,axis=1)
        # Identity of size (n+1); as in the derivation, w0 is penalized too.
        I=np.identity(X.shape[1])
        # Closed-form solution W=(X^T X + alpha*I)^(-1) X^T y
        result=np.linalg.inv(X.T@X+self.alpha*I)@X.T@y_train.reshape(-1,1)
        self.intercept=result[0]  # the w0 component is the intercept
        self.coef=result[1:].reshape(-1)  # the remaining components are the feature coefficients
        print(self.coef,self.intercept)
    def predict(self,X_test):
        return X_test@self.coef+self.intercept

In [49]:
reg2=OwnRidge(alpha=0.1)
reg2.fit(X_train,y_train)


[  39.66828821 -213.8419192   505.957649    341.70560555 -108.792473
  -70.5884286  -211.88959636  160.1973241   332.77576891   77.7254541 ] [151.00339677]


In [50]:
y_pred2=reg2.predict(X_test)

In [None]:
print("r2_score:",r2_score(y_test,y_pred2))

r2_score: 0.4802561896539246


In [None]:
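'''
We get nearly the same results in both cases, which supports the derivation above.
The small remaining gap arises because OwnRidge also penalizes the intercept w0,
whereas scikit-learn's Ridge leaves the intercept unpenalized.
'''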
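One plausible reason the two results are close but not identical is the treatment of the intercept: the closed form above penalizes $W_0$ along with the other weights, while scikit-learn's `Ridge` (with its default `fit_intercept=True`) does not penalize the intercept. The sketch below illustrates this on synthetic data with NumPy only. The helper `ridge_closed_form` and its `penalize_intercept` flag are hypothetical names introduced here, and the centered-data reference is an assumed stand-in for how an unpenalized intercept is usually computed, not a call into scikit-learn.

```python
import numpy as np

def ridge_closed_form(X_train, y_train, alpha, penalize_intercept):
    """Hypothetical helper: closed-form ridge, optionally leaving w0 unpenalized."""
    X = np.insert(X_train, 0, 1, axis=1)  # prepend the column of ones
    I = np.identity(X.shape[1])
    if not penalize_intercept:
        I[0, 0] = 0  # remove the penalty on the intercept w0
    w = np.linalg.solve(X.T @ X + alpha * I, X.T @ y_train)
    return w[0], w[1:]  # (intercept, coefficients)

# Synthetic data with a clearly non-zero intercept, for illustration only.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 4))
true_coef = np.array([1.5, -2.0, 0.5, 3.0])
y_demo = X_demo @ true_coef + 10.0 + rng.normal(scale=0.1, size=200)

b_pen, w_pen = ridge_closed_form(X_demo, y_demo, 0.1, penalize_intercept=True)
b_free, w_free = ridge_closed_form(X_demo, y_demo, 0.1, penalize_intercept=False)

# Reference: ridge on centered data, with the intercept recovered afterwards.
Xc = X_demo - X_demo.mean(axis=0)
yc = y_demo - y_demo.mean()
w_ref = np.linalg.solve(Xc.T @ Xc + 0.1 * np.eye(4), Xc.T @ yc)
b_ref = y_demo.mean() - X_demo.mean(axis=0) @ w_ref

print("penalized intercept:  ", b_pen)
print("unpenalized intercept:", b_free)
print("centered reference:   ", b_ref)
```

With the penalty on $W_0$ removed, the intercept and coefficients should coincide with the centered-data reference, while the penalized version shrinks the intercept slightly toward zero.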