# Multivariate Linear Regression

> **Multiple linear regression (MLR) is a statistical technique that models the relationship between one dependent variable and two or more independent variables.**

> `y = m1*x1 + m2*x2 + ... + mn*xn + b`
>
> Here `y` is the dependent variable, `x1 ... xn` are the independent variables, `m1 ... mn` are their coefficients, and `b` is the y-intercept.
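
The formula above is a dot product of the coefficients with the feature values, plus the intercept. A minimal sketch of that arithmetic follows; the numbers in `m`, `x`, and `b` are made up for illustration and are not fitted from any data.

```python
import numpy as np

# Hypothetical values for illustration only; not taken from a fitted model.
m = np.array([2.0, 0.5, -1.0])   # coefficients m1..m3
x = np.array([10.0, 4.0, 3.0])   # feature values x1..x3
b = 5.0                          # y-intercept

# y = m1*x1 + m2*x2 + m3*x3 + b
y = np.dot(m, x) + b
print(y)
```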

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model
%matplotlib inline

In [21]:
df = pd.read_csv("ML Practice Files/Linear Regression Multivariate/homeprices.csv")
df

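|   | area | bedrooms | age | price |
|---|------|----------|-----|--------|
| 0 | 2600 | 3.0 | 20 | 550000 |
| 1 | 3000 | 4.0 | 15 | 565000 |
| 2 | 3200 | NaN | 18 | 610000 |
| 3 | 3600 | 3.0 | 30 | 595000 |
| 4 | 4000 | 5.0 | 8 | 760000 |
| 5 | 4100 | 6.0 | 8 | 810000 |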


In [24]:
df.bedrooms = df.bedrooms.fillna(df.bedrooms.median())
df

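|   | area | bedrooms | age | price |
|---|------|----------|-----|--------|
| 0 | 2600 | 3.0 | 20 | 550000 |
| 1 | 3000 | 4.0 | 15 | 565000 |
| 2 | 3200 | 4.0 | 18 | 610000 |
| 3 | 3600 | 3.0 | 30 | 595000 |
| 4 | 4000 | 5.0 | 8 | 760000 |
| 5 | 4100 | 6.0 | 8 | 810000 |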
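
The fill step can be checked in isolation. The sketch below rebuilds only the `bedrooms` column by hand (rather than reading the CSV) and shows that `median()` skips the missing entry by default, so the gap is filled with 4.0.

```python
import numpy as np
import pandas as pd

# The bedrooms column as shown in the first table, with one missing value.
bedrooms = pd.Series([3.0, 4.0, np.nan, 3.0, 5.0, 6.0], name="bedrooms")

# median() ignores NaN by default, so this is the median of 3, 3, 4, 5, 6.
median = bedrooms.median()

# fillna returns a new Series with every NaN replaced by the median.
filled = bedrooms.fillna(median)
print(median)
print(filled.tolist())
```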


In [27]:
reg = linear_model.LinearRegression()
reg.fit(df[['area','bedrooms','age']],df.price)

In [29]:
reg.predict(pd.DataFrame([[3000,3,40]],columns=['area','bedrooms','age']))

array([498408.25158031])

#### What happens behind the scenes

In [30]:
reg.coef_

array([  112.06244194, 23388.88007794, -3231.71790863])

In [31]:
reg.intercept_

np.float64(221323.0018654043)
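
`LinearRegression` fits an ordinary least squares model, so the coefficients and intercept above can be approximately reproduced with NumPy alone. The following is a sketch of that idea, not a description of scikit-learn's internal code path: it hard-codes the cleaned table, appends a column of ones for the intercept, and solves the least squares problem with `np.linalg.lstsq`.

```python
import numpy as np

# Features (area, bedrooms, age) and prices from the table, after the median fill.
X = np.array([
    [2600, 3, 20],
    [3000, 4, 15],
    [3200, 4, 18],
    [3600, 3, 30],
    [4000, 5, 8],
    [4100, 6, 8],
], dtype=float)
y = np.array([550000, 565000, 610000, 595000, 760000, 810000], dtype=float)

# A column of ones lets the last solved parameter act as the intercept.
X1 = np.column_stack([X, np.ones(len(X))])

# Solve min ||X1 @ params - y||^2 for params.
params, *_ = np.linalg.lstsq(X1, y, rcond=None)
coef, intercept = params[:3], params[3]
print(coef)
print(intercept)
```

The printed values should agree with `reg.coef_` and `reg.intercept_` up to floating-point error.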

In [32]:
# price = m1*area + m2*bedrooms + m3*age + intercept
112.06244194*3000 + 23388.88007794*3 + -3231.71790863*40 + 221323.0018654043

498408.2515740243
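
The same check can be written as a dot product. The sketch below copies the printed coefficients and intercept by hand; because the printed coefficients are rounded to eight decimal places, the result differs from `reg.predict` only in the last few digits.

```python
import numpy as np

# Values copied from the reg.coef_ and reg.intercept_ outputs above.
coef = np.array([112.06244194, 23388.88007794, -3231.71790863])
intercept = 221323.0018654043

# Same house as the predict call: area=3000, bedrooms=3, age=40.
features = np.array([3000, 3, 40])

manual_price = np.dot(coef, features) + intercept
print(manual_price)
```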

### Exercise
> The exercise folder (at the same level as this notebook on GitHub) contains hiring.csv. This file holds hiring statistics for a firm: each candidate's experience, written test score, and personal interview score. HR decides the salary based on these three factors. Using this data, build a machine learning model that helps the HR department decide salaries for future candidates, then use it to predict salaries for the following candidates:

> 2 yr experience, 9 test score, 6 interview score

> 12 yr experience, 10 test score, 10 interview score

In [71]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model
from word2number import w2n

# Experience column: treat missing entries as "zero", then convert number words to integers
data = pd.read_csv("ML Practice Files/Linear Regression Multivariate/hiring.csv")
data.experience = data.experience.fillna("zero")
data.experience = data.experience.apply(w2n.word_to_num)
print(data)

# Test score column: fill the missing value with the column median
test_score_median = data["test_score(out of 10)"].median()
data["test_score(out of 10)"] = data["test_score(out of 10)"].fillna(test_score_median)
print(data)

# Fit the model on the three features, then predict salaries for the two candidates
features = ['experience','test_score(out of 10)','interview_score(out of 10)']
model = linear_model.LinearRegression()
model.fit(data[features], data['salary($)'])
model.predict(pd.DataFrame([[2,9,6],[12,10,10]], columns=features))

   experience  test_score(out of 10)  interview_score(out of 10)  salary($)
0           0                    8.0                           9      50000
1           0                    8.0                           6      45000
2           5                    6.0                           7      60000
3           2                   10.0                          10      65000
4           7                    9.0                           6      70000
5           3                    7.0                          10      62000
6          10                    NaN                           7      72000
7          11                    7.0                           8      80000
   experience  test_score(out of 10)  interview_score(out of 10)  salary($)
0           0                    8.0                           9      50000
1           0                    8.0                           6      45000
2           5                    6.0                           7      60000
3           2                   10.0                          10      65000
4           7                    9.0                           6      70000
5           3                    7.0                          10      62000
6          10                    8.0                           7      72000
7          11                    7.0                           8      80000

array([53205.96797671, 92002.18340611])
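
The experience cleanup relies on the third-party `word2number` package. To see what that step does without the dependency, here is a minimal stand-in: `WORD_TO_INT` and `experience_to_int` are hypothetical names introduced for this sketch, the lookup table covers only zero through twelve, and the sample values in `raw` are illustrative rather than read from hiring.csv.

```python
import pandas as pd

# Hypothetical lookup table; w2n.word_to_num handles far more inputs than this.
WORD_TO_INT = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
    "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11, "twelve": 12,
}

def experience_to_int(value):
    """Map a number word to an int; a missing entry counts as zero years."""
    if pd.isna(value):
        return 0
    return WORD_TO_INT[str(value).strip().lower()]

# Illustrative sample only; not the actual contents of hiring.csv.
raw = pd.Series(["five", None, "two", "eleven"])
cleaned = raw.apply(experience_to_int)
print(cleaned.tolist())
```

Folding the missing-value handling into the helper replaces the separate `fillna("zero")` call, at the cost of supporting only the words listed in the table.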