<h2 style="color:green" align="center"> Machine Learning With Python: Linear Regression Multiple Variables</h2>

### Predicting home price

Below is the table containing home prices. Here price depends on **area (square feet), bedrooms and age of the home (in years)**. Given these prices, we have to predict the prices of new homes based on area, bedrooms and age.

<img src="https://github.com/rahulvansh66/Machine-Learning-Course-Work/blob/main/2_linear_reg_multivariate/homeprices.jpg?raw=1" style='height:200px;width:350px'>

Given these home prices, find the price of a home that has:

**3000 sq ft area, 3 bedrooms, 40 years old**

**2500 sq ft area, 4 bedrooms, 5 years old**

We will use linear regression with multiple variables here. Price can be calculated using the following equation:

<img src="https://github.com/rahulvansh66/Machine-Learning-Course-Work/blob/main/2_linear_reg_multivariate/equation.jpg?raw=1" >

Here area, bedrooms and age are called independent variables or **features**, whereas price is the dependent variable.
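As a minimal sketch, assuming the equation has the usual linear form (one coefficient per feature plus an intercept), the price is a weighted sum of the features. The function name `predict_price` and the coefficient values below are hypothetical placeholders for illustration only; the real values are learned from the data.

```python
def predict_price(area, bedrooms, age, m1, m2, m3, b):
    """Linear model: price = m1*area + m2*bedrooms + m3*age + b."""
    return m1 * area + m2 * bedrooms + m3 * age + b


# Hypothetical coefficients, not fitted to any data
example_price = predict_price(area=3000, bedrooms=3, age=40,
                              m1=100.0, m2=20000.0, m3=-3000.0, b=200000.0)
print(example_price)
```

Training the model amounts to finding the values of `m1`, `m2`, `m3` and `b` that best fit the known prices.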

In [None]:
import pandas as pd
import numpy as np
from sklearn import linear_model
import math

In [None]:
from google.colab import drive
drive.mount('/content/drive')
%cd 'drive/MyDrive/Colab-Notebooks/ML_Practice/Machine-Learning-Course-Work/2_linear_reg_multivariate'

**Data Preprocessing: Fill NA values with median value of a column**

In [None]:
df = pd.read_csv("homeprices.csv")
df

Unnamed: 0,area,bedrooms,age,price
0,2600,3.0,20,550000
1,3000,4.0,15,565000
2,3200,,18,610000
3,3600,3.0,30,595000
4,4000,5.0,8,760000
5,4100,6.0,8,810000


In [None]:
median_bedrooms = math.floor(df.bedrooms.median()) #df.bedrooms returns pandas series
median_bedrooms

4

In [None]:
df.bedrooms = df.bedrooms.fillna(median_bedrooms)
df

Unnamed: 0,area,bedrooms,age,price
0,2600,3.0,20,550000
1,3000,4.0,15,565000
2,3200,4.0,18,610000
3,3600,3.0,30,595000
4,4000,5.0,8,760000
5,4100,6.0,8,810000
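To confirm that the fill worked, it can help to count missing values per column before and after. The sketch below rebuilds the table inline (the name `homes` is an assumption, used so the example runs without the CSV file).

```python
import math

import pandas as pd

# Same rows as the table above, built inline so the sketch runs without the CSV
homes = pd.DataFrame({
    "area": [2600, 3000, 3200, 3600, 4000, 4100],
    "bedrooms": [3.0, 4.0, None, 3.0, 5.0, 6.0],
    "age": [20, 15, 18, 30, 8, 8],
    "price": [550000, 565000, 610000, 595000, 760000, 810000],
})

missing_before = homes.isna().sum()  # number of NaN values per column

# median() skips NaN by default; floor keeps the bedroom count a whole number
fill_value = math.floor(homes["bedrooms"].median())
homes["bedrooms"] = homes["bedrooms"].fillna(fill_value)

missing_after = homes.isna().sum()
print(missing_before["bedrooms"], fill_value, missing_after["bedrooms"])
```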


In [None]:
reg = linear_model.LinearRegression()
# X holds the independent variables (features): area, bedrooms, age
# y is the dependent variable we want to predict, i.e. price
reg.fit(df[['area', 'bedrooms', 'age']].values, df.price)

LinearRegression()

In [None]:
m = reg.coef_  # one coefficient (slope) per feature: area, bedrooms, age
b = reg.intercept_  # y-intercept
print('slope of each feature : ',m, '\n y intercept : ',b)

slope of each feature :  [  112.06244194 23388.88007794 -3231.71790863] 
 y intercept :  221323.00186540396


**Find the price of a home with 3000 sq ft area, 3 bedrooms, 40 years old**

In [None]:
area1 = 3000
bedrooms1 = 3
age1 = 40

print('price of such home will be: ', reg.predict([[area1, bedrooms1, age1]]), 'Rs')

price of such home will be:  [498408.25158031] Rs


In [None]:
# let's check manually by plugging the features into the equation
print('price of home with 3000 sqr ft area, 3 bedrooms, 40 year old will be :  ')
print(m[0]*area1 + m[1]*bedrooms1 + m[2]*age1  + b)
# equivalently: np.dot(m, [area1, bedrooms1, age1]) + b

price of home with 3000 sqr ft area, 3 bedrooms, 40 year old will be :  
498408.2515803067


In [None]:
112.06244194*3000 + 23388.88007794*3 + -3231.71790863*40 + 221323.00186540384
# all three ways give the same prediction; the tiny difference here comes from typing in rounded coefficients

498408.25157402386
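The manual check can also be written as a dot product between the coefficient vector and the feature vector. The sketch below is self-contained: it refits a model on the rows of the table above (with the missing bedroom value already filled with 4), and the names `model`, `features`, `manual` and `from_predict` are assumptions for illustration.

```python
import numpy as np
from sklearn import linear_model

# Training data copied from the table above (bedrooms already filled with 4)
X = np.array([
    [2600, 3, 20],
    [3000, 4, 15],
    [3200, 4, 18],
    [3600, 3, 30],
    [4000, 5, 8],
    [4100, 6, 8],
], dtype=float)
y = np.array([550000, 565000, 610000, 595000, 760000, 810000], dtype=float)

model = linear_model.LinearRegression().fit(X, y)

features = np.array([3000, 3, 40], dtype=float)

# m . x + b, computed by hand from the fitted parameters
manual = np.dot(model.coef_, features) + model.intercept_

# predict() expects a 2D array: one row per sample
from_predict = model.predict(features.reshape(1, -1))[0]

print(np.isclose(manual, from_predict))
```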

**Find the price of a home with 2500 sq ft area, 4 bedrooms, 5 years old**

In [None]:
reg.predict([[2500, 4, 5]])

array([578876.03748933])
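Since `predict` takes one row per sample, both homes can be scored in a single call. This is a sketch under the same assumptions as the notebook's data (rows copied from the table, missing bedroom value filled with 4); the names `model`, `new_homes` and `prices` are illustrative.

```python
import numpy as np
from sklearn import linear_model

# Training data copied from the table above (bedrooms already filled with 4)
X = np.array([
    [2600, 3, 20],
    [3000, 4, 15],
    [3200, 4, 18],
    [3600, 3, 30],
    [4000, 5, 8],
    [4100, 6, 8],
], dtype=float)
y = np.array([550000, 565000, 610000, 595000, 760000, 810000], dtype=float)

model = linear_model.LinearRegression().fit(X, y)

# One row per home: [area, bedrooms, age]
new_homes = np.array([[3000, 3, 40], [2500, 4, 5]], dtype=float)
prices = model.predict(new_homes)

for (area, bedrooms, age), price in zip(new_homes, prices):
    print(f"{area:.0f} sq ft, {bedrooms:.0f} bedrooms, {age:.0f} years old -> {price:,.2f}")
```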

### Exercise: Make a model that recommends salary based on experience, test score and interview score

In the exercise folder (same level as this notebook on GitHub) there is **hiring.csv**. This file contains hiring statistics for a firm, such as the experience of a candidate, their written test score and personal interview score. Based on these 3 factors, HR will decide the salary. Given this data, you need to build a machine learning model for the **HR department** that can **help them decide salaries** for future candidates. Using this model, predict salaries for the following candidates:


**2 yr experience, 9 test score, 6 interview score**

**12 yr experience, 10 test score, 10 interview score**


In [None]:
df = pd.read_csv('Exercise/hiring.csv')
df

Unnamed: 0,experience,test_score(out of 10),interview_score(out of 10),salary($)
0,,8.0,9,50000
1,,8.0,6,45000
2,five,6.0,7,60000
3,two,10.0,10,65000
4,seven,9.0,6,70000
5,three,7.0,10,62000
6,ten,,7,72000
7,eleven,7.0,8,80000


In [None]:
#!pip install word2number
from word2number import w2n

# missing experience is treated as zero years
df.experience = df.experience.fillna('zero')

# convert number words ('five', 'eleven', ...) to integers;
# assigning the whole column at once avoids pandas' SettingWithCopyWarning
df.experience = df.experience.apply(w2n.word_to_num)


In [None]:
df['test_score(out of 10)'] = df['test_score(out of 10)'].fillna(df['test_score(out of 10)'].mean())

In [None]:
df

Unnamed: 0,experience,test_score(out of 10),interview_score(out of 10),salary($)
0,0,8.0,9,50000
1,0,8.0,6,45000
2,5,6.0,7,60000
3,2,10.0,10,65000
4,7,9.0,6,70000
5,3,7.0,10,62000
6,10,7.857143,7,72000
7,11,7.0,8,80000
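The cleaning steps that produce the table above can also be sketched without the extra dependency. The version below assumes a small hand-written lookup table (`WORD_TO_NUM`, hypothetical and covering only the words in this dataset) in place of `word2number`, and rebuilds the raw data inline under the assumed name `hiring`.

```python
import pandas as pd

# Rows copied from the raw hiring table; None marks a missing value
hiring = pd.DataFrame({
    "experience": [None, None, "five", "two", "seven", "three", "ten", "eleven"],
    "test_score(out of 10)": [8.0, 8.0, 6.0, 10.0, 9.0, 7.0, None, 7.0],
    "interview_score(out of 10)": [9, 6, 7, 10, 6, 10, 7, 8],
    "salary($)": [50000, 45000, 60000, 65000, 70000, 62000, 72000, 80000],
})

# Hypothetical lookup table covering only the words in this dataset
WORD_TO_NUM = {"zero": 0, "two": 2, "three": 3, "five": 5,
               "seven": 7, "ten": 10, "eleven": 11}

# Missing experience is treated as zero years, then words become integers
hiring["experience"] = hiring["experience"].fillna("zero").map(WORD_TO_NUM)

# The missing test score is replaced by the mean of the known scores
score_col = "test_score(out of 10)"
hiring[score_col] = hiring[score_col].fillna(hiring[score_col].mean())

print(hiring)
```

A dictionary lookup only works for words listed in it, so `word2number` remains the more general choice for arbitrary number words.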


In [None]:
reg = linear_model.LinearRegression()
reg.fit(df.values[:, :-1], df.values[:, -1])

LinearRegression()

In [67]:
# predict salary for these 2 candidates based on experience, test score and interview score respectively
new_candidates = [[2,9,6], [12,10,10]]
reg.predict([new_candidates[0]])

array([53290.89255945])

In [68]:
reg.predict([new_candidates[1]])

array([92268.07227784])
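Both candidates can also be scored in one call, with the feature columns selected by name rather than by position. This is a sketch, assuming the cleaned values from the table above; the shortened column names and the variables `cleaned`, `salary_model` and `candidates` are hypothetical.

```python
import pandas as pd
from sklearn import linear_model

# Mean of the seven known test scores, used to fill the one missing value
test_mean = (8 + 8 + 6 + 10 + 9 + 7 + 7) / 7

# Cleaned hiring data, with shortened (hypothetical) column names
cleaned = pd.DataFrame({
    "experience": [0, 0, 5, 2, 7, 3, 10, 11],
    "test_score": [8.0, 8.0, 6.0, 10.0, 9.0, 7.0, test_mean, 7.0],
    "interview_score": [9, 6, 7, 10, 6, 10, 7, 8],
    "salary": [50000, 45000, 60000, 65000, 70000, 62000, 72000, 80000],
})

feature_cols = ["experience", "test_score", "interview_score"]

salary_model = linear_model.LinearRegression()
salary_model.fit(cleaned[feature_cols], cleaned["salary"])

# One row per candidate, using the same column names as in training
candidates = pd.DataFrame([[2, 9, 6], [12, 10, 10]], columns=feature_cols)
predicted = salary_model.predict(candidates)

for row, salary in zip(candidates.itertuples(index=False), predicted):
    print(tuple(row), round(salary, 2))
```

Selecting features by name is less fragile than slicing `df.values[:, :-1]`, which silently changes meaning if the column order changes.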