**Multiple Linear Regression**

**Problem:**

In the exercise folder (at the same level as this notebook on GitHub) there is a file called hiring.csv. It contains hiring statistics for a firm: each candidate's years of experience, written test score, and personal interview score. Based on these 3 factors, HR decides the salary. Given this data, build a machine learning model that can help the HR department decide salaries for future candidates, and use it to predict salaries for the following candidates:

* 2 years of experience, test score 9, interview score 6

* 12 years of experience, test score 10, interview score 10

**Formula:**

  y = b₀ + b₁x₁ + b₂x₂ + b₃x₃ + ... + bₙxₙ


Where:

* y = Predicted salary (the target variable you want to predict)

* x₁, x₂, x₃, …, xₙ = Features (independent variables). In your case, these are experience, test score, and interview score.

* b₀ = Intercept: the predicted value of y when all the independent variables are 0 (the point where the regression plane crosses the y-axis).

* b₁, b₂, b₃, …, bₙ = Coefficients (weights): these represent how much each feature affects the outcome. For example, b₁ is how much the predicted salary increases (or decreases) for each one-unit increase in experience, with test score and interview score held constant.
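The formula is an intercept plus a weighted sum of the features. The pure-Python sketch below shows that computation. The function name `linear_predict` and the numbers 100 and [1, 2, 3] are hypothetical and only illustrate the two properties described above: with every feature at 0 the prediction equals b₀, and raising one feature by one unit (others fixed) changes the prediction by exactly its coefficient.

```python
from typing import Sequence


def linear_predict(b0: float, coefs: Sequence[float], features: Sequence[float]) -> float:
    """Return y = b0 + b1*x1 + b2*x2 + ... + bn*xn (hypothetical helper)."""
    if len(coefs) != len(features):
        raise ValueError("need exactly one coefficient per feature")
    # weighted sum of the features, shifted by the intercept
    return b0 + sum(b * x for b, x in zip(coefs, features))


# Hypothetical intercept and coefficients, chosen only for illustration
b0, coefs = 100.0, [1.0, 2.0, 3.0]
base = linear_predict(b0, coefs, [0, 0, 0])
bumped = linear_predict(b0, coefs, [1, 0, 0])
print(base)           # 100.0: with every feature at 0, y equals b0
print(bumped - base)  # 1.0: one more unit of x1 adds exactly b1
```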

**For your case:**

  Salary = b₀ + b₁ × (Experience) + b₂ × (Test Score) + b₃ × (Interview Score)


**Example Formula:**

Suppose your model learned the following coefficients after training (illustrative numbers only):

Salary = 20000 + 5000 × (Experience) + 2000 × (Test Score) + 1500 × (Interview Score)

**For a candidate with:**

* 2 years of experience
* 9 test score
* 6 interview score

The salary prediction will be:

Salary = 20000 + 5000 × 2 + 2000 × 9 + 1500 × 6 = 20000 + 10000 + 18000 + 9000 = 57000

So, the predicted salary for this candidate is 57,000.
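The same arithmetic in Python makes each feature's contribution explicit. The coefficients are the illustrative ones from the example formula, not values fitted from hiring.csv.

```python
# Illustrative coefficients from the example formula (not fitted values)
intercept = 20000
coefs = {"experience": 5000, "test_score": 2000, "interview_score": 1500}
candidate = {"experience": 2, "test_score": 9, "interview_score": 6}

# Contribution of each feature: coefficient * feature value
contributions = {name: coefs[name] * candidate[name] for name in coefs}
salary = intercept + sum(contributions.values())

print(contributions)  # {'experience': 10000, 'test_score': 18000, 'interview_score': 9000}
print(salary)         # 57000
```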

```python
# import libraries
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
```

```python
# load the dataset
hire = pd.read_csv('/content/hiring.csv')
hire.head()
```

|   | experience | test_score | interview_score | salary |
|---|---|---|---|---|
| 0 | 0 | 8.0 | 9 | 50000 |
| 1 | 0 | 8.0 | 6 | 45000 |
| 2 | 5 | 6.0 | 7 | 60000 |
| 3 | 2 | 10.0 | 10 | 65000 |
| 4 | 7 | 9.0 | 6 | 70000 |


**Data Preprocessing**

```python
# count the missing (null) values in each column
hire.isnull().sum()
```

```
experience         0
test_score         1
interview_score    0
salary             0
```
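pandas' `isnull().sum()` counts the missing cells in each column. The standard-library sketch below shows what that count means without pandas. The CSV text is a hypothetical reconstruction of hiring.csv from the tables in this notebook, and the real file may differ (for example, in number formatting). The blank `test_score` in row 6 is inferred: it is the only row whose value is unknown before the fill and becomes the median 8.0 after it.

```python
import csv
import io

# Hypothetical reconstruction of hiring.csv from this notebook's outputs;
# the empty test_score field in row 6 is the single missing value.
HIRING_CSV = """experience,test_score,interview_score,salary
0,8,9,50000
0,8,6,45000
5,6,7,60000
2,10,10,65000
7,9,6,70000
3,7,10,62000
10,,7,72000
11,7,8,80000
"""

rows = list(csv.DictReader(io.StringIO(HIRING_CSV)))

# csv.DictReader returns "" for an empty field; count those per column
missing = {col: sum(1 for row in rows if row[col] == "") for col in rows[0]}
print(missing)  # {'experience': 0, 'test_score': 1, 'interview_score': 0, 'salary': 0}
```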


```python
# find the median of test_score (missing values are skipped)
hire_median = hire['test_score'].median()
hire_median
```

```
8.0
```

```python
# replace the missing test score with the median
hire['test_score'] = hire['test_score'].fillna(hire_median)
hire
```

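|   | experience | test_score | interview_score | salary |
|---|---|---|---|---|
| 0 | 0 | 8.0 | 9 | 50000 |
| 1 | 0 | 8.0 | 6 | 45000 |
| 2 | 5 | 6.0 | 7 | 60000 |
| 3 | 2 | 10.0 | 10 | 65000 |
| 4 | 7 | 9.0 | 6 | 70000 |
| 5 | 3 | 7.0 | 10 | 62000 |
| 6 | 10 | 8.0 | 7 | 72000 |
| 7 | 11 | 7.0 | 8 | 80000 |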
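The median fill can be sketched in plain Python with `statistics.median`. The list copies the `test_score` column from the tables above, with `None` standing in for the missing value (inferred to be row 6). The median (8.0) is a common choice over the mean (55/7 ≈ 7.86) because it is less sensitive to unusually high or low scores.

```python
import statistics

# test_score column from the tables above; None marks the missing entry
test_scores = [8.0, 8.0, 6.0, 10.0, 9.0, 7.0, None, 7.0]

# Median of the observed values only (pandas' median() also skips NaN)
observed = [s for s in test_scores if s is not None]
median = statistics.median(observed)

# Replace each missing entry with the median, like fillna(hire_median)
filled = [median if s is None else s for s in test_scores]
print(median)  # 8.0
print(filled)  # [8.0, 8.0, 6.0, 10.0, 9.0, 7.0, 8.0, 7.0]
```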


**Split the dataset into features and target**

```python
# features (independent variables) and target (dependent variable)
x = hire[['experience', 'test_score', 'interview_score']]
y = hire['salary']
```

```python
# fit an ordinary least squares linear regression model
reg = LinearRegression()
reg.fit(x, y)
```
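`reg.fit(x, y)` finds the intercept and coefficients that minimise the sum of squared prediction errors (ordinary least squares). The pure-Python sketch below, which assumes the eight filled rows shown above, solves the normal equations (AᵀA)b = Aᵀy with Gaussian elimination, where A is the feature matrix with a leading column of ones. It is for illustration only: scikit-learn does not form AᵀA but uses a more numerically robust least-squares solver. `fit_ols` is a hypothetical helper name.

```python
from typing import List


def fit_ols(X: List[List[float]], y: List[float]) -> List[float]:
    """Least-squares fit of y = b0 + b1*x1 + ... via the normal equations.

    Returns [b0, b1, ..., bn] by solving (A^T A) b = A^T y, where A is X
    with a leading column of ones for the intercept (hypothetical helper).
    """
    A = [[1.0] + [float(v) for v in row] for row in X]
    k = len(A[0])
    # Augmented system [A^T A | A^T y]
    M = [
        [sum(a[i] * a[j] for a in A) for j in range(k)]
        + [sum(a[i] * t for a, t in zip(A, y))]
        for i in range(k)
    ]
    # Forward elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(M[row][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, k):
            factor = M[r][col] / M[col][col]
            for c in range(col, k + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system
    b = [0.0] * k
    for i in range(k - 1, -1, -1):
        b[i] = (M[i][k] - sum(M[i][j] * b[j] for j in range(i + 1, k))) / M[i][i]
    return b


# Rows after the median fill: [experience, test_score, interview_score]
X = [[0, 8, 9], [0, 8, 6], [5, 6, 7], [2, 10, 10],
     [7, 9, 6], [3, 7, 10], [10, 8, 7], [11, 7, 8]]
y = [50000, 45000, 60000, 65000, 70000, 62000, 72000, 80000]

b0, b1, b2, b3 = fit_ols(X, y)
print(round(b0, 2), round(b1, 2), round(b2, 2), round(b3, 2))  # 17737.26 2812.95 1845.71 2205.24
```

The results agree with `reg.intercept_` and `reg.coef_` reported by scikit-learn for this data.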

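```python
# predict the salary for 2 yr experience, 9 test score, 6 interview score
# (a DataFrame with the training column names avoids sklearn's feature-name warning)
reg.predict(pd.DataFrame([[2, 9, 6]], columns=x.columns))
```

```
array([53205.96797671])
```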

```python
# learned coefficients b1, b2, b3 (experience, test_score, interview_score)
reg.coef_
```

```
array([2812.95487627, 1845.70596798, 2205.24017467])
```

```python
# learned intercept b0
reg.intercept_
```

```
17737.263464337688
```

```python
# manual check: b1*2 + b2*9 + b3*6 + b0 reproduces the first prediction
2812.95487627*2 + 1845.70596798*9 + 2205.24017467*6 + 17737.263464337688
```

```
53205.96797671769
```

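```python
# predict the salary for 12 yr experience, 10 test score, 10 interview score
reg.predict(pd.DataFrame([[12, 10, 10]], columns=x.columns))
```

```
array([92002.18340611])
```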
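The intercept b₀ is the same for every candidate, so the gap between the two predictions comes entirely from each coefficient times the change in its feature. The sketch below uses the coefficients printed by `reg.coef_`, keyed by the column order of `x`.

```python
# Coefficients as printed by reg.coef_, in the column order of x
coef = {"experience": 2812.95487627, "test_score": 1845.70596798,
        "interview_score": 2205.24017467}
first = {"experience": 2, "test_score": 9, "interview_score": 6}
second = {"experience": 12, "test_score": 10, "interview_score": 10}

# The intercept cancels, so the gap is the sum of coefficient * feature change
parts = {name: coef[name] * (second[name] - first[name]) for name in coef}
for name, value in parts.items():
    print(f"{name}: {value:.2f}")
print(f"total: {sum(parts.values()):.2f}")  # total: 38796.22
```

The total matches the difference between the two predictions (92002.18 and 53205.97) up to rounding. The 10 extra years of experience account for most of it.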