# Candidate Salary
Build a machine learning model for the HR department that helps them decide salaries for future candidates. Use the model to predict salaries for the following candidates:

* 2 yr experience, 9 test score, 6 interview score
* 12 yr experience, 10 test score, 10 interview score

#### Data Preprocessing

In [1]:
import pandas as pd
import numpy as np
from word2number import w2n

In [2]:
dataset = pd.read_csv("https://raw.githubusercontent.com/codebasics/py/master/ML/2_linear_reg_multivariate/Exercise/hiring.csv")
dataset

Unnamed: 0,experience,test_score(out of 10),interview_score(out of 10),salary($)
0,,8.0,9,50000
1,,8.0,6,45000
2,five,6.0,7,60000
3,two,10.0,10,65000
4,seven,9.0,6,70000
5,three,7.0,10,62000
6,ten,,7,72000
7,eleven,7.0,8,80000


In [3]:
dataset.experience = dataset.experience.fillna("zero")
dataset

Unnamed: 0,experience,test_score(out of 10),interview_score(out of 10),salary($)
0,zero,8.0,9,50000
1,zero,8.0,6,45000
2,five,6.0,7,60000
3,two,10.0,10,65000
4,seven,9.0,6,70000
5,three,7.0,10,62000
6,ten,,7,72000
7,eleven,7.0,8,80000


In [4]:
dataset.experience = dataset.experience.apply(w2n.word_to_num)
dataset

Unnamed: 0,experience,test_score(out of 10),interview_score(out of 10),salary($)
0,0,8.0,9,50000
1,0,8.0,6,45000
2,5,6.0,7,60000
3,2,10.0,10,65000
4,7,9.0,6,70000
5,3,7.0,10,62000
6,10,,7,72000
7,11,7.0,8,80000
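
To make the conversion step concrete, here is a minimal sketch of what it does to the `experience` column. It deliberately does not call `word2number`; the dictionary `WORD_TO_NUMBER` and the helper `word_to_number` are hypothetical stand-ins that cover only the words present in this dataset.

```python
# Hypothetical stand-in for w2n.word_to_num, limited to the words in this dataset.
WORD_TO_NUMBER = {
    "zero": 0, "two": 2, "three": 3, "five": 5,
    "seven": 7, "ten": 10, "eleven": 11,
}

def word_to_number(word):
    """Map a number word such as 'five' to its integer value."""
    return WORD_TO_NUMBER[word.strip().lower()]

# The experience column after missing values were filled with "zero".
experience_words = ["zero", "zero", "five", "two", "seven", "three", "ten", "eleven"]
experience_years = [word_to_number(w) for w in experience_words]
print(experience_years)
```

A lookup table like this fails with a `KeyError` on any word it does not list, which is one reason to prefer a general-purpose parser on real data.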


In [5]:
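import math
# Fill value: the floor of the column mean (7.857... -> 7). Note that this is the mean, not the median (which is 8).
mean_test_score = math.floor(dataset['test_score(out of 10)'].mean())
mean_test_score

7

In [6]:
dataset['test_score(out of 10)'] = dataset['test_score(out of 10)'].fillna(mean_test_score)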
dataset

Unnamed: 0,experience,test_score(out of 10),interview_score(out of 10),salary($)
0,0,8.0,9,50000
1,0,8.0,6,45000
2,5,6.0,7,60000
3,2,10.0,10,65000
4,7,9.0,6,70000
5,3,7.0,10,62000
6,10,7.0,7,72000
7,11,7.0,8,80000
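
The fill value used above is the floored mean of the seven observed test scores. As a quick check, the sketch below recomputes it with the standard library and compares it with the median. The list `observed_scores` is copied by hand from the table and is an assumption of this example, not part of the original notebook.

```python
import math
import statistics

# The seven non-missing test scores from the table (row 6 is the missing one).
observed_scores = [8.0, 8.0, 6.0, 10.0, 9.0, 7.0, 7.0]

mean_score = statistics.mean(observed_scores)      # 55 / 7, about 7.857
floored_mean = math.floor(mean_score)              # the value used to fill the gap
median_score = statistics.median(observed_scores)  # middle value of the sorted scores

print(floored_mean, median_score)
```

The two statistics differ here (7 versus 8.0), so the choice of fill value slightly changes the fitted coefficients.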


#### Model Building

In [11]:
from sklearn import linear_model

feature_columns = ['experience', 'test_score(out of 10)', 'interview_score(out of 10)']
model = linear_model.LinearRegression()
model.fit(dataset[feature_columns].values, dataset['salary($)'].values)

LinearRegression()

In [12]:
model.coef_, model.intercept_

(array([2922.26901502, 2221.30909959, 2147.48256637]), 14992.651446693148)
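
These numbers define the fitted equation: salary = intercept + 2922.27 × experience + 2221.31 × test score + 2147.48 × interview score. The sketch below applies that equation by hand to the two candidates from the task statement, which is the computation `model.predict` performs for a linear model. The helper `predict_salary` is a hypothetical name introduced for illustration only.

```python
# Coefficients and intercept copied from the fitted model output above.
coefficients = [2922.26901502, 2221.30909959, 2147.48256637]
intercept = 14992.651446693148

def predict_salary(experience, test_score, interview_score):
    """Evaluate the linear equation: intercept + sum(coefficient * feature)."""
    features = [experience, test_score, interview_score]
    return intercept + sum(c * x for c, x in zip(coefficients, features))

candidate_a = predict_salary(2, 9, 6)     # 2 yr experience, 9 test score, 6 interview score
candidate_b = predict_salary(12, 10, 10)  # 12 yr experience, 10 test score, 10 interview score
print(candidate_a, candidate_b)
```

Because the coefficients are copied from rounded printed output, the results can differ from `model.predict` in the last few decimal places.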

#### Prediction

In [13]:
# 2 yr experience, 9 test score, 6 interview score
model.predict([[2, 9, 6]])

array([53713.86677124])

In [14]:
# 12 yr experience, 10 test score, 10 interview score
model.predict([[12, 10, 10]])

array([93747.79628651])