In [1]:
import numpy as np
import pandas as pd

In [2]:
readdata = pd.read_csv("../dataset/real_estate.csv")
data = readdata.to_numpy()

# last column is the target; the first column is dropped from the features
labels = data[:,-1]
data = data[:,1:-1]

### Linear Regression
The normal equation approach to Linear Regression derives **&Theta;** such that the sum of squared errors between the predicted values *Y_pred* = **X** **.** **&Theta;** and the true values *Y_true* on the training data is minimised, i.e.  
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;argmin<sub>**&Theta;**</sub> (**X** **.** **&Theta;** - **Y**)<sup>T</sup> **.** (**X** **.** **&Theta;** - **Y**)  

Taking the derivative w.r.t. **&Theta;** and setting it to zero:  
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;**X**<sup>T</sup> **.** **X** **.** **&Theta;** = **X**<sup>T</sup> **.** **Y**  

Solving for **&Theta;** (assuming **X**<sup>T</sup> **.** **X** is invertible):  
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;**&Theta;** = (**X**<sup>T</sup> **.** **X**)<sup>-1</sup> **.** **X**<sup>T</sup> **.** **Y**
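
As a quick sanity check of the closed form above, the following minimal sketch solves the normal equation on synthetic, noise-free data and compares the result with NumPy's least-squares solver. The names `X_demo`, `Y_demo` and `true_theta` are illustrative assumptions and are not part of the real estate dataset. `np.linalg.solve` is used here instead of an explicit inverse, which is a common, numerically safer way to solve the same linear system.

```python
import numpy as np

# Synthetic data (assumption for illustration only): 50 samples, 3 features
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
true_theta = np.array([2.0, -1.0, 0.5])
Y_demo = X_demo @ true_theta  # noise-free targets, so theta is exactly recoverable

# Normal equation: solve (X^T X) theta = X^T Y for theta
theta_normal = np.linalg.solve(X_demo.T @ X_demo, X_demo.T @ Y_demo)

# Reference solution from NumPy's least-squares routine
theta_lstsq, *_ = np.linalg.lstsq(X_demo, Y_demo, rcond=None)

print(np.allclose(theta_normal, true_theta))
print(np.allclose(theta_normal, theta_lstsq))
```

Both checks should agree whenever **X**<sup>T</sup> **.** **X** is well conditioned. For nearly collinear features the least-squares routine is usually the more robust choice.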

In [3]:
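def linear_regression(X, Y):
    # Closed-form solution of the normal equation: theta = (X^T X)^-1 X^T Y
    # X is used as given; no intercept column is added here.
    theta = np.linalg.inv(X.T.dot(X)) @ (X.T.dot(Y))
    return theta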

### R<sup>2</sup> score
R<sup>2</sup> = 1 - RSS/TSS  

RSS = **&Sigma;**<sub>i=1</sub><sup>n</sup> (*y<sub>i</sub> - ŷ<sub>i</sub>*)<sup>2</sup> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; where *ŷ<sub>i</sub>* is the predicted value for sample *i*  

TSS = **&Sigma;**<sub>i=1</sub><sup>n</sup> (*y<sub>i</sub> - ȳ*)<sup>2</sup> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; where *ȳ* is the mean of the true values
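
To make the definitions above concrete, here is a small worked sketch on hand-picked numbers. The function name `r2_demo` and the arrays are hypothetical and chosen only for illustration. It shows two reference points: predictions close to the truth give a score near 1, while always predicting the mean gives a score of 0.

```python
import numpy as np

def r2_demo(true, pred):
    # R^2 = 1 - RSS/TSS, following the definitions above
    rss = np.sum((true - pred) ** 2)           # residual sum of squares
    tss = np.sum((true - np.mean(true)) ** 2)  # total sum of squares
    return 1 - rss / tss

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_close = np.array([1.1, 1.9, 3.2, 3.8])     # small errors: RSS ~ 0.10, TSS = 5.0
y_mean = np.full_like(y_true, y_true.mean())  # baseline that always predicts the mean

score_close = r2_demo(y_true, y_close)
score_mean = r2_demo(y_true, y_mean)

print(score_close)  # approximately 0.98
print(score_mean)   # 0.0, no better than predicting the mean
```

A negative score is also possible: it means the predictions are worse than the constant mean baseline.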

In [4]:
def r2(true,pred):
    rss = np.sum((true - pred)**2)
    tss = np.sum((true - np.mean(true))**2)
    return (1-rss/tss)

### RMSE
RMSE = &radic;(**&Sigma;**<sub>i=1</sub><sup>n</sup> (*y<sub>i</sub><sup>Actual</sup> - y<sub>i</sub><sup>Predicted</sup>*)<sup>2</sup> / n)
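
A short worked sketch of the formula above, using made-up numbers (the name `rmse_demo` and the arrays are assumptions for illustration). The errors are 1, 0 and -2, so the mean squared error is 5/3 and the RMSE is its square root, expressed in the same units as the target.

```python
import numpy as np

def rmse_demo(true, pred):
    # Square the errors, average them, then take the square root
    return np.sqrt(np.mean((true - pred) ** 2))

y_actual = np.array([3.0, 5.0, 7.0])
y_predicted = np.array([2.0, 5.0, 9.0])  # errors: 1, 0, -2

value = rmse_demo(y_actual, y_predicted)
print(value)  # sqrt(5/3), roughly 1.29
```

Because the errors are squared before averaging, a single large miss raises the RMSE more than several small ones of the same total size.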

In [5]:
def rmse(true,pred):
    return np.sqrt(np.mean((true-pred)**2))

In [6]:
# train-test split: first 80% of rows for training, remaining 20% for testing (no shuffling)
split = 0.8
n_train = int(split*len(data))
X = data[:n_train,:]
X_test = data[n_train:,:]
Y = labels[:n_train]
Y_test = labels[n_train:]

theta = linear_regression(X,Y)

pred = X_test.dot(theta)

r2_score = r2(Y_test,pred)
print("R2 score: ",r2_score)

rmse_score = rmse(Y_test,pred)
print("RMSE: ",rmse_score)

R2 score:  0.6056999254826609
RMSE:  7.745523524526421
