# Linear Regression with multiple features

Your neighbor is a real estate agent and wants help predicting housing prices for a particular region. Given a test input, predict the house price using both gradient descent and the normal equation, then compare the results.

Dataset - ex1data2.txt file  
The data contains 3 columns: housing area, number of bedrooms, and house price.

Method 1 - Normal equation  
Given X (features) and y (target), the coefficients theta used to predict the house price are:  
theta = inverse(X'X) X'y

Method 2 - Gradient descent  
For a fixed learning rate (alpha), update theta iteratively to find the values that minimize the cost function.

In [1]:
import pandas as pd
import numpy as np

### Loading the dataset

In [3]:
data = pd.read_csv('ex1data2.txt',sep = ",",header=None, names = ['House Area','No of Bedrooms','Price'])

In [4]:
data.head(5)

| | House Area | No of Bedrooms | Price |
|---|---|---|---|
| 0 | 2104 | 3 | 399900 |
| 1 | 1600 | 3 | 329900 |
| 2 | 2400 | 3 | 369000 |
| 3 | 1416 | 2 | 232000 |
| 4 | 3000 | 4 | 539900 |


In [5]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 47 entries, 0 to 46
Data columns (total 3 columns):
 #   Column          Non-Null Count  Dtype
---  ------          --------------  -----
 0   House Area      47 non-null     int64
 1   No of Bedrooms  47 non-null     int64
 2   Price           47 non-null     int64
dtypes: int64(3)
memory usage: 1.2 KB


In [6]:
data.describe()

| | House Area | No of Bedrooms | Price |
|---|---|---|---|
| count | 47.0 | 47.0 | 47.0 |
| mean | 2000.680851 | 3.170213 | 340412.659574 |
| std | 794.702354 | 0.760982 | 125039.899586 |
| min | 852.0 | 1.0 | 169900.0 |
| 25% | 1432.0 | 3.0 | 249900.0 |
| 50% | 1888.0 | 3.0 | 299900.0 |
| 75% | 2269.0 | 4.0 | 384450.0 |
| max | 4478.0 | 5.0 | 699900.0 |


### Data Separation

Now separate the features (X) and the target (y) from the dataframe.  
Add a column of ones (the intercept term) to the beginning of X.

In [7]:
X = data[['House Area','No of Bedrooms']].copy()  # copy so the insert below does not modify a slice of data
y = data['Price']

In [8]:
# Column of ones for the intercept term (this notebook names it 'weights')
new_col = np.ones(X.shape[0])
X.insert(loc = 0, column = 'weights', value = new_col)

### Normal equation:

The function that computes theta with the normal equation method is defined in the file "normalEquation.py".  
Import the file and calculate theta for the set (X, y).

In [9]:
# Let's compute the value of theta from the normal equation
import normalEquation as ne

In [10]:
theta = ne.normaleq(X,y)

In [11]:
theta

array([89597.90954361,   139.21067402, -8738.01911255])
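
The contents of "normalEquation.py" are not shown in this notebook. As a rough idea of what such a function could look like, here is a minimal sketch. The body of `normaleq` below is an assumption, not the actual file, and the demo arrays are made up. It solves the linear system (X'X) theta = X'y instead of forming the inverse explicitly, which gives the same theta but tends to be numerically safer.

```python
import numpy as np

def normaleq(X, y):
    """Hypothetical stand-in for normalEquation.normaleq: solve (X'X) theta = X'y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # np.linalg.solve avoids computing inverse(X'X) explicitly
    return np.linalg.solve(X.T @ X, X.T @ y)

# Made-up data that follows y = 1 + 2*x exactly (first column is the intercept)
X_demo = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y_demo = np.array([1.0, 3.0, 5.0])
theta_demo = normaleq(X_demo, y_demo)
print(theta_demo)  # intercept and slope recovered from the demo data
```

Because it accepts anything `np.asarray` can convert, this sketch should also work with the DataFrame `X` and Series `y` built above.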

### Predicting the house price

Consider a 3-bedroom house with an area of 1650 sq m.  
Can you predict the house price?

In [12]:
# Now let's estimate the house price for an area of 1650 and 3 bedrooms
X_test = [1,1650,3]  # the leading 1 multiplies the intercept term
predicted_val = np.array(X_test).dot(theta)

In [13]:
predicted_val

293081.46433498873

The estimated house price is around $293k.  
Let's check whether gradient descent predicts the same price.

### Gradient Descent:

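Gradient descent converges much more slowly when the features are on very different scales, as house area and number of bedrooms are here, so the features are normalized first.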

In [14]:
# Step 1: Normalize the features
X_new = data[['House Area','No of Bedrooms']]

In [15]:
mean = X_new.mean()
sigma = X_new.std(ddof=0)  # per-column population standard deviation (what np.std computes)
X_norm = (X_new - mean) / sigma  # z-score: each feature now has mean 0 and standard deviation 1
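
A quick way to sanity-check this step is to confirm that every normalized column ends up with mean 0 and standard deviation 1. The sketch below does that on made-up numbers, not the housing data; the DataFrame `demo` and its column names are invented for illustration.

```python
import numpy as np
import pandas as pd

# Made-up features on very different scales
demo = pd.DataFrame({'area': [1000.0, 1500.0, 2000.0, 2500.0],
                     'rooms': [2.0, 3.0, 3.0, 4.0]})

mu = demo.mean()
sd = demo.std(ddof=0)          # population standard deviation per column
demo_norm = (demo - mu) / sd   # z-score normalization

col_means = demo_norm.mean()
col_stds = demo_norm.std(ddof=0)
print(col_means.round(6), col_stds.round(6), sep='\n')
```

Keep `mu` and `sd` around: any later test input has to be normalized with these same training values.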

Add the intercept column (all ones) to the beginning of the normalized X.

In [16]:
# Step 2: Add intercept to X_norm
X_norm.insert(loc = 0, column = 'weights', value = new_col)

Gradient descent is implemented in the file "gradientDescent.py". Import the file to compute the value of theta that minimizes the cost function.  
The learning rate (alpha) is fixed at 0.01 and the number of iterations is set to 400.  
As the iterations progress, the cost decreases and gradually approaches its minimum.
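
"gradientDescent.py" itself is not shown here. The sketch below is one plausible shape for it, assuming batch gradient descent on the usual squared-error cost J = sum((X theta - y)^2) / (2m). The function bodies, the `verbose` flag, and the demo data are assumptions made for illustration; only the call signature and the log format mirror what the notebook uses.

```python
import numpy as np

def compute_cost(X, y, theta):
    """Squared-error cost: J = sum((X theta - y)^2) / (2m)."""
    m = len(y)
    residual = X @ theta - y
    return residual @ residual / (2 * m)

def descent(X, y, theta, alpha, num_iters, verbose=False):
    """Hypothetical stand-in for gradientDescent.descent (batch gradient descent)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    theta = np.asarray(theta, dtype=float).copy()
    m = len(y)
    J_history = []
    for i in range(num_iters):
        # Record the cost at the current theta, then take one gradient step
        cost = compute_cost(X, y, theta)
        J_history.append(cost)
        if verbose:
            print(f"Iteration: {i} and cost: {cost}")
        gradient = X.T @ (X @ theta - y) / m
        theta = theta - alpha * gradient
    return theta, J_history

# Made-up, already-centered data that follows y = 4 + 3*x exactly
x = np.array([-1.5, -0.5, 0.5, 1.5])
X_demo = np.column_stack([np.ones_like(x), x])
y_demo = 4 + 3 * x
theta_demo, J_demo = descent(X_demo, y_demo, np.zeros(2), alpha=0.1, num_iters=500)
print(theta_demo)
```

This version works on NumPy arrays and returns an array, whereas the notebook's `theta_g` comes back as a pandas Series, so the real implementation presumably keeps the pandas objects.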

In [17]:
# Step 3: Gradient descent
import gradientDescent as gd

In [18]:
# Choose alpha and the number of iterations, and initialize theta to zeros
alpha, num_iters, theta_g = 0.01, 400, np.zeros(X_norm.shape[1])

In [19]:
theta_g, J = gd.descent(X_norm, y, theta_g, alpha, num_iters)

Iteration: 0 and cost: 65591548106.45744
Iteration: 1 and cost: 64297776251.62011
Iteration: 2 and cost: 63031018305.52132
Iteration: 3 and cost: 61790694237.53249
Iteration: 4 and cost: 60576236901.991035
Iteration: 5 and cost: 59387091739.9886
Iteration: 6 and cost: 58222716488.38939
Iteration: 7 and cost: 57082580895.8954
Iteration: 8 and cost: 55966166445.97885
Iteration: 9 and cost: 54872966086.50778
Iteration: 10 and cost: 53802483965.89506
Iteration: 11 and cost: 52754235175.605446
Iteration: 12 and cost: 51727745498.85994
Iteration: 13 and cost: 50722551165.380974
Iteration: 14 and cost: 49738198612.02588
Iteration: 15 and cost: 48774244249.16025
Iteration: 16 and cost: 47830254232.6268
Iteration: 17 and cost: 46905804241.168976
Iteration: 18 and cost: 46000479259.1725
Iteration: 19 and cost: 45113873364.59137
Iteration: 20 and cost: 44245589521.92844
Iteration: 21 and cost: 43395239380.14428
Iteration: 22 and cost: 42562443075.37121
Iteration: 23 and cost: 41746829038.31239
...

Iteration: 227 and cost: 2850998386.1370893
Iteration: 228 and cost: 2837112194.3809876
Iteration: 229 and cost: 2823483933.246866
Iteration: 230 and cost: 2810108606.6541967
Iteration: 231 and cost: 2796981317.51381
Iteration: 232 and cost: 2784097265.7379103
Iteration: 233 and cost: 2771451746.29061
Iteration: 234 and cost: 2759040147.2781234
Iteration: 235 and cost: 2746857948.0778456
Iteration: 236 and cost: 2734900717.505465
Iteration: 237 and cost: 2723164112.0193577
Iteration: 238 and cost: 2711643873.9614835
Iteration: 239 and cost: 2700335829.834021
Iteration: 240 and cost: 2689235888.6110334
Iteration: 241 and cost: 2678340040.0844073
Iteration: 242 and cost: 2667644353.24338
Iteration: 243 and cost: 2657144974.686974
Iteration: 244 and cost: 2646838127.0686264
Iteration: 245 and cost: 2636720107.572394
Iteration: 246 and cost: 2626787286.420038
Iteration: 247 and cost: 2617036105.408408
Iteration: 248 and cost: 2607463076.476443
Iteration: 249 and cost: 2598064780.3012342
...

In [20]:
theta_g

weights           334302.063993
House Area         99411.449474
No of Bedrooms      3267.012854
dtype: float64

### Predicting the house price

In [21]:
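# Step 4: Let's estimate the house price for an area of 1650 and 3 bedrooms
# The test input must be normalized with the training mean and sigma
area_norm = (1650 - mean['House Area']) / sigma['House Area']
bed_norm = (3 - mean['No of Bedrooms']) / sigma['No of Bedrooms']
test_norm = [1, area_norm, bed_norm]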

In [22]:
predicted_price = np.array(test_norm).dot(theta_g)
predicted_price

289221.5473712181

The estimated house price is around $289k.

### Comparison between Gradient descent and Normal equation

In [23]:
print('theta using Normal equation: ', theta)
print('theta using Gradient descent: \n', theta_g)

theta using Normal equation:  [89597.90954361   139.21067402 -8738.01911255]
theta using Gradient descent: 
 weights           334302.063993
House Area         99411.449474
No of Bedrooms      3267.012854
dtype: float64


In [24]:
print('House price estimation using Normal equation: ', predicted_val)
print('House price estimation using Gradient descent: ', predicted_price)

House price estimation using Normal equation:  293081.46433498873
House price estimation using Gradient descent:  289221.5473712181
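
The two theta vectors printed above cannot be compared directly: the normal equation was fit on the raw features, while gradient descent was fit on the normalized ones. Substituting (x - mean) / sigma into the normalized model gives the conversion: each slope becomes theta_j / sigma_j, and the intercept becomes theta_0 minus the sum of theta_j * mean_j / sigma_j. The helper `to_original_scale` below is a hypothetical name introduced for this sketch, and it is checked on made-up numbers, not the housing data.

```python
import numpy as np

def to_original_scale(theta_norm, mean, sigma):
    """Convert theta fit on z-scored features to theta for the raw features."""
    theta_norm = np.asarray(theta_norm, dtype=float)
    mean = np.asarray(mean, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    slopes = theta_norm[1:] / sigma
    intercept = theta_norm[0] - np.sum(theta_norm[1:] * mean / sigma)
    return np.concatenate([[intercept], slopes])

# Made-up check: the model y = 10 + 2*x1 - 3*x2 in original units,
# with feature means (5, 1) and standard deviations (2, 0.5)
mean_demo = np.array([5.0, 1.0])
sigma_demo = np.array([2.0, 0.5])
# The same model written in terms of the normalized features
theta_norm_demo = np.array([10 + 2 * 5.0 - 3 * 1.0, 2 * 2.0, -3 * 0.5])
theta_orig_demo = to_original_scale(theta_norm_demo, mean_demo, sigma_demo)
print(theta_orig_demo)
```

In this notebook, `to_original_scale(theta_g, mean, sigma)` would put the gradient descent result on the same footing as `theta`. The two should only agree closely once gradient descent has converged. Here the cost is still falling in the last iterations shown in the log, which is consistent with the gap of roughly $4k between the two price estimates; more iterations or a larger alpha would be the natural things to try.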
