## Linear Regression - Normal Equation

Earlier we performed linear regression using the ***Gradient Descent*** method and obtained the theta vector that ***minimizes the cost function***. There is a simpler way to get the theta vector in a single step: the ***Normal Equation***.

   ***$\theta = (X^{T} X)^{-1} X^{T} y$***
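
For readers who want to see where this formula comes from, here is a brief sketch of the derivation. It assumes the usual least-squares cost function $J(\theta)$, with $X$ the matrix of inputs (including a leading column of ones for the intercept), $y$ the vector of $m$ target values, and a $\frac{1}{2m}$ scaling that is an assumption here (any positive constant gives the same minimizer).

$$
J(\theta) = \frac{1}{2m} (X\theta - y)^{T} (X\theta - y)
$$

$$
\nabla_{\theta} J(\theta) = \frac{1}{m} X^{T} (X\theta - y) = 0
\;\Longrightarrow\; X^{T} X \, \theta = X^{T} y
\;\Longrightarrow\; \theta = (X^{T} X)^{-1} X^{T} y
$$

The last step requires $X^{T} X$ to be invertible.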

In [0]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

We will work with the housing price data.

In [2]:
data = pd.read_csv('house_price_data1.csv', header=None, names=['size','no. of rooms','price'])
print(data.head())
print(data.shape)

   size  no. of rooms   price
0  2104             3  399900
1  1600             3  329900
2  2400             3  369000
3  1416             2  232000
4  3000             4  539900
(47, 3)


Now let us write a function that performs the matrix multiplications and returns the final vector ***theta***.

In [0]:
def Normal_Equation_get_theta(x, y):
  m = x.shape[0]
  # Prepend a column of ones so that theta[0] acts as the intercept term
  x1 = np.ones((m,1))
  x = np.hstack((x1,x))
  y = y.reshape(m,1)
  
  # theta = (X^T X)^(-1) X^T y
  inverse = np.linalg.inv(np.dot(np.transpose(x),x))
  theta = np.dot(np.dot(inverse,np.transpose(x)), y)
  return theta

In [19]:
x = data[['size','no. of rooms']].values
y = data['price'].values
theta = Normal_Equation_get_theta(x,y)
print(theta)

[[89597.9095428 ]
 [  139.21067402]
 [-8738.01911233]]
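
Explicitly inverting $X^{T} X$ works for this small dataset, but it can be numerically fragile when features are strongly correlated or on very different scales. As a hedged alternative, the sketch below solves the same equation with `np.linalg.solve` and cross-checks the result against `np.linalg.lstsq`. The data is synthetic, and the names `normal_equation_solve`, `theta_true`, and the coefficient values are illustrative assumptions, not taken from the housing file.

```python
import numpy as np

def normal_equation_solve(x, y):
    """Solve (X^T X) theta = X^T y without forming the inverse explicitly."""
    m = x.shape[0]
    X = np.hstack((np.ones((m, 1)), x))   # prepend the intercept column
    return np.linalg.solve(X.T @ X, X.T @ y.reshape(m, 1))

# Synthetic, noise-free data (assumed values, for illustration only)
rng = np.random.default_rng(0)
size = rng.uniform(1000, 3000, size=50)
rooms = rng.integers(1, 6, size=50)
x_demo = np.column_stack((size, rooms))
theta_true = np.array([[50000.0], [100.0], [-5000.0]])
X_demo = np.hstack((np.ones((50, 1)), x_demo))
y_demo = (X_demo @ theta_true).ravel()

theta_solve = normal_equation_solve(x_demo, y_demo)

# Cross-check with NumPy's least-squares solver
theta_lstsq, *_ = np.linalg.lstsq(X_demo, y_demo.reshape(-1, 1), rcond=None)

print(theta_solve.ravel())
print(theta_lstsq.ravel())
```

Because the synthetic targets contain no noise, both approaches should recover `theta_true` up to floating-point error.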


In [0]:
def predict_price(size, no_rooms):
  # Uses the global theta computed above; the leading 1 multiplies the intercept
  x1 = np.array([1,size,no_rooms]).reshape(3,1)
  price = np.dot(np.transpose(theta), x1)
  return price[0,0]

Let us test our model.

In [9]:
size = 1650
no_rooms = 3
print('For plot of size',size,'and no. of rooms',no_rooms,
      ', predicted price is',predict_price(size,no_rooms))

For plot of size 1650 and no. of rooms 3 , predicted price is 293081.4643348959
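
The same prediction can be made for many houses at once with a single matrix product. This is a minimal sketch: `predict_prices` and `theta_demo` are hypothetical names, the coefficients are copied from the printed theta (so they carry its display rounding), and the second house is simply the first row of the dataset.

```python
import numpy as np

# Coefficients copied from the printed theta (rounded as displayed)
theta_demo = np.array([[89597.9095428], [139.21067402], [-8738.01911233]])

def predict_prices(features, theta):
    """Predict prices for an (m, 2) array of [size, no. of rooms] rows."""
    features = np.asarray(features, dtype=float)
    ones = np.ones((features.shape[0], 1))   # intercept column
    return (np.hstack((ones, features)) @ theta).ravel()

houses = np.array([[1650, 3], [2104, 3]])
prices = predict_prices(houses, theta_demo)
print(prices)
```

The first entry should agree with the single prediction above, up to the rounding of the copied coefficients.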


Now let us measure how well our model fits the data. Note that, as before, we can't use the ***sklearn.metrics.accuracy_score( .. )*** function here, because it is meant only for classification problems, while this one is a regression problem. Hence we will use the r2_score function instead.

In [25]:
from sklearn.metrics import r2_score

# Predict the price for every row instead of hard-coding the dataset size
y_pred = np.zeros(len(y))
for i in range(len(y)):
  y_pred[i] = predict_price(x[i,0], x[i,1])

print(r2_score(y,y_pred))

0.7329450180289141
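
To make the score easier to interpret, here is a small sketch of the standard definition $R^2 = 1 - SS_{res} / SS_{tot}$, written with NumPy only. The helper name `r2_manual` and the toy arrays are assumptions for illustration. Under this definition, a score of about 0.73 means the model explains roughly 73% of the variance in price on the data it was fitted to.

```python
import numpy as np

def r2_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation
    return 1.0 - ss_res / ss_tot

y_toy = np.array([1.0, 2.0, 3.0, 4.0])
print(r2_manual(y_toy, y_toy))                       # perfect predictions
print(r2_manual(y_toy, np.full(4, y_toy.mean())))    # always predicting the mean
print(r2_manual(y_toy, [1.1, 1.9, 3.2, 3.8]))        # small errors
```

Perfect predictions give a score of 1, and always predicting the mean gives 0, which is why values closer to 1 indicate a better fit.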
