Machine Learning With Python: Linear Regression With Multiple Variables
Sample problem: predicting home prices in Monroe, New Jersey (USA)
Below is a table of home prices in Monroe Twp, NJ. Here the price depends on the area (square feet), the number of bedrooms and the age of the home (in years). Given these prices, we have to predict the prices of new homes based on area, bedrooms and age.
![](homeprices.jpg)
Given these home prices, find the price of a home that has:

3000 sq ft area, 3 bedrooms, 40 years old

2500 sq ft area, 4 bedrooms, 5 years old

We will use regression with multiple variables here. Price can be calculated using the following equation:

![](equation.jpg)

Here area, bedrooms and age are called independent variables or features, whereas price is the dependent variable.
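
Written out, a linear model with these three features takes the form below. The symbols $m_1$, $m_2$, $m_3$ (one coefficient per feature) and $b$ (the intercept) are naming assumptions; the image may use different letters.

$$
\text{price} = m_1 \cdot \text{area} + m_2 \cdot \text{bedrooms} + m_3 \cdot \text{age} + b
$$

Fitting the model means finding the values of $m_1$, $m_2$, $m_3$ and $b$ that best match the known prices.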

In [1]:
import pandas as pd
import numpy as np
from sklearn import linear_model
df = pd.read_csv('homeprices.csv')
df

,area,bedrooms,age,price
0,2600,3.0,20,550000
1,3000,4.0,15,565000
2,3200,,18,610000
3,3600,3.0,30,595000
4,4000,5.0,8,760000
5,4100,6.0,8,810000


Data preprocessing: fill NA values with the median value of the column

In [2]:
df.bedrooms.median()

4.0

In [3]:
df.bedrooms = df.bedrooms.fillna(df.bedrooms.median())
df

,area,bedrooms,age,price
0,2600,3.0,20,550000
1,3000,4.0,15,565000
2,3200,4.0,18,610000
3,3600,3.0,30,595000
4,4000,5.0,8,760000
5,4100,6.0,8,810000
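
The median fill can be tried in isolation on the bedroom column alone. This is a minimal sketch that assumes the values shown in the table; the names `bedrooms`, `median_bedrooms` and `filled` are illustrative and not part of the notebook.

```python
import numpy as np
import pandas as pd

# Bedroom counts from the table above; the third entry is missing
bedrooms = pd.Series([3.0, 4.0, np.nan, 3.0, 5.0, 6.0])

# median() skips NaN by default, so it is computed over 3, 3, 4, 5, 6
median_bedrooms = bedrooms.median()

# fillna returns a new Series with every NaN replaced by the given value
filled = bedrooms.fillna(median_bedrooms)

print(median_bedrooms)
print(filled.tolist())
```

The median is often preferred over the mean here because it is less sensitive to a single unusually large or small value.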


In [6]:
reg = linear_model.LinearRegression()
reg.fit(df.drop('price', axis='columns'), df.price)

In [7]:
reg.coef_

array([  112.06244194, 23388.88007794, -3231.71790863])

In [8]:
reg.intercept_

221323.00186540384
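
For readers without homeprices.csv, the steps so far can be reproduced from an inline table. This sketch assumes the six rows shown above are the full contents of the file; the names `homes`, `features` and `model` are illustrative.

```python
import pandas as pd
from sklearn import linear_model

# Rows copied from the table above; the missing bedroom count is NaN
homes = pd.DataFrame({
    "area": [2600, 3000, 3200, 3600, 4000, 4100],
    "bedrooms": [3.0, 4.0, float("nan"), 3.0, 5.0, 6.0],
    "age": [20, 15, 18, 30, 8, 8],
    "price": [550000, 565000, 610000, 595000, 760000, 810000],
})

# Same preprocessing as in the notebook: fill the gap with the column median
homes["bedrooms"] = homes["bedrooms"].fillna(homes["bedrooms"].median())

# Fit price against the three features
features = homes[["area", "bedrooms", "age"]]
model = linear_model.LinearRegression()
model.fit(features, homes["price"])

# Pair each coefficient with its feature name for readability
print(dict(zip(features.columns, model.coef_)))
print(model.intercept_)
```

The coefficients come back in the same order as the feature columns, so the first belongs to area, the second to bedrooms and the third to age.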

Find the price of a home with 3000 sq ft area, 3 bedrooms, 40 years old

In [11]:
reg.predict([[3000, 3, 40]])



array([498408.25158031])

In [12]:
112.06244194*3000 + 23388.88007794*3 + -3231.71790863*40 + 221323.00186540384


498408.25157402386
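
The manual check above is a weighted sum of the features plus the intercept. A small helper makes that explicit; `predict_price` is a hypothetical name, and the numbers are copied from the `reg.coef_` and `reg.intercept_` outputs.

```python
# Coefficients (area, bedrooms, age) and intercept copied from the outputs above
coef = [112.06244194, 23388.88007794, -3231.71790863]
intercept = 221323.00186540384


def predict_price(area, bedrooms, age):
    """Return the weighted sum of the features plus the intercept."""
    features = [area, bedrooms, age]
    return sum(m * x for m, x in zip(coef, features)) + intercept


# The two homes from the problem statement
print(predict_price(3000, 3, 40))
print(predict_price(2500, 4, 5))
```

Small differences from `reg.predict` in the last decimal places are expected, because the printed coefficients are rounded.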

Find the price of a home with 2500 sq ft area, 4 bedrooms, 5 years old

In [14]:
reg.predict([[2500, 4, 5]])



array([578876.03748933])

The same steps apply to a second dataset, hiring.csv, where salary depends on experience, test score and interview score.

In [15]:
hire_df=pd.read_csv('hiring.csv')
hire_df.head()

,experience,test_score(out of 10),interview_score(out of 10),salary($)
0,,8.0,9,50000
1,,8.0,6,45000
2,five,6.0,7,60000
3,two,10.0,10,65000
4,seven,9.0,6,70000


In [34]:
import math
# Median test score, rounded down to a whole number
median = hire_df['test_score(out of 10)'].median()
median = math.floor(median)

In [35]:
hire_df['test_score(out of 10)']=hire_df['test_score(out of 10)'].fillna(median)
hire_df

,experience,test_score(out of 10),interview_score(out of 10),salary($)
0,0,8.0,9,50000
1,0,8.0,6,45000
2,5,6.0,7,60000
3,2,10.0,10,65000
4,7,9.0,6,70000
5,3,7.0,10,62000
6,10,8.0,7,72000
7,11,7.0,8,80000


In [36]:
from word2number import w2n
hire_df.experience=hire_df.experience.fillna('zero')
hire_df


,experience,test_score(out of 10),interview_score(out of 10),salary($)
0,0,8.0,9,50000
1,0,8.0,6,45000
2,5,6.0,7,60000
3,2,10.0,10,65000
4,7,9.0,6,70000
5,3,7.0,10,62000
6,10,8.0,7,72000
7,11,7.0,8,80000


In [33]:

hire_df.experience=hire_df.experience.apply(w2n.word_to_num)
hire_df

,experience,test_score(out of 10),interview_score(out of 10),salary($)
0,0,8.0,9,50000
1,0,8.0,6,45000
2,5,6.0,7,60000
3,2,10.0,10,65000
4,7,9.0,6,70000
5,3,7.0,10,62000
6,10,8.0,7,72000
7,11,7.0,8,80000
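
The word-to-number step can also be sketched without the word2number package. The lookup table below is a hypothetical stand-in that covers only the words implied by this dataset (the words for rows 5 to 7 are inferred from the numeric output above); it is not how `w2n.word_to_num` is implemented.

```python
# Hypothetical lookup table limited to the words this dataset needs
WORD_TO_NUM = {
    "zero": 0, "two": 2, "three": 3, "five": 5,
    "seven": 7, "ten": 10, "eleven": 11,
}


def to_number(word):
    """Map a number word to an int; raise ValueError for unknown words."""
    key = word.strip().lower()
    if key not in WORD_TO_NUM:
        raise ValueError(f"unrecognised number word: {word!r}")
    return WORD_TO_NUM[key]


# Experience column after missing values were replaced with 'zero'
experience = ["zero", "zero", "five", "two", "seven", "three", "ten", "eleven"]
print([to_number(w) for w in experience])
```

Raising an error on unknown words, rather than silently returning a default, keeps bad input from turning into a misleading feature value.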


In [39]:
exe_reg=linear_model.LinearRegression()
exe_reg.fit(hire_df[['experience','test_score(out of 10)','interview_score(out of 10)']], hire_df['salary($)'])

In [40]:
exe_reg.predict([[2,9,6]])



array([53205.96797671])

In [41]:
# Use exe_reg (the salary model); reg is the home price model fitted earlier
exe_reg.predict([[12,10,10]])