<a href="https://colab.research.google.com/github/moizmaj1k/MachineLearning/blob/main/ML_03_LinearRegressionMultipleVariables.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Exercise**

The exercise folder (at the same level as this notebook on GitHub) contains hiring.csv. This file holds hiring statistics for a firm: each candidate's experience, written test score, and personal interview score. HR decides the salary based on these three factors. Given this data, build a machine learning model for the HR department that can help them decide salaries for future candidates. Use the model to predict salaries for the following candidates:

* 2 yr experience, 9 test score, 6 interview score

* 12 yr experience, 10 test score, 10 interview score

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model
%matplotlib inline
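import warnings
from sklearn.exceptions import ConvergenceWarning
# Register the filter at module level. Inside a `with warnings.catch_warnings():` block
# the filter is discarded as soon as the block exits, so it would have no lasting effect.
warnings.filterwarnings("ignore", category=ConvergenceWarning)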

In [7]:
df = pd.read_csv('ML_03_hiring.csv')
df

Unnamed: 0,experience,test_score(out of 10),interview_score(out of 10),salary($)
0,,8.0,9,50000
1,,8.0,6,45000
2,five,6.0,7,60000
3,two,10.0,10,65000
4,seven,9.0,6,70000
5,three,7.0,10,62000
6,ten,,7,72000
7,eleven,7.0,8,80000


<h3>Data preprocessing</h3>

In [8]:
# First, fix the 'experience' column by replacing the number words with numeric values.

# Mapping dictionary
word_to_number = {'one': 1,
                  'two': 2,
                  'three': 3,
                  'four': 4,
                  'five': 5,
                  'six': 6,
                  'seven': 7,
                  'eight': 8,
                  'nine': 9,
                  'ten': 10,
                  'eleven': 11,
                  'twelve': 12}

# Apply the mapping; entries not found in the dictionary (including NaN) become NaN
df['experience'] = df['experience'].map(word_to_number)
df

Unnamed: 0,experience,test_score(out of 10),interview_score(out of 10),salary($)
0,,8.0,9,50000
1,,8.0,6,45000
2,5.0,6.0,7,60000
3,2.0,10.0,10,65000
4,7.0,9.0,6,70000
5,3.0,7.0,10,62000
6,10.0,,7,72000
7,11.0,7.0,8,80000
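One thing worth checking after a dictionary-based `map` is whether any word was silently turned into NaN because it was missing from the dictionary. The snippet below is a minimal sketch of such a check. The sample series and the trimmed mapping are hypothetical illustrations, not the hiring data.

```python
import pandas as pd

# Hypothetical sample column (not the hiring data); None stands for a missing entry
experience = pd.Series(["five", None, "thirteen", "two"])

# Trimmed mapping for illustration; 'thirteen' is deliberately absent
word_to_number = {"two": 2, "five": 5}

mapped = experience.map(word_to_number)

# A value that was present before mapping but is NaN afterwards
# was not found in the dictionary.
unmapped = experience[experience.notna() & mapped.isna()]
print(unmapped.tolist())
```

If the printed list is not empty, the dictionary probably needs more entries before the NaN values are imputed.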


In [None]:
df['test_score(out of 10)']

In [10]:
# Fill NaN values with the column median, rounded down to a whole number.
# This is needed for the 'experience' and 'test_score(out of 10)' columns.

import math

median_experience = math.floor(df['experience'].median())
median_test_score = math.floor(df['test_score(out of 10)'].median())

print('Median experience: ', median_experience)
print('Median test_score: ', median_test_score)

df['experience'] = df['experience'].fillna(median_experience)
df['test_score(out of 10)'] = df['test_score(out of 10)'].fillna(median_test_score)

Median experience:  6
Median test_score:  8
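The median step can be reproduced in isolation to see why the result is 6 for experience. The sketch below rebuilds the `experience` column from the table shown earlier (the variable names are chosen here for illustration) and relies on `Series.median` skipping NaN entries by default.

```python
import math
import pandas as pd

# Experience values from the table above, after the word-to-number mapping
experience = pd.Series(
    [float("nan"), float("nan"), 5, 2, 7, 3, 10, 11], dtype="float64"
)

# NaN entries are ignored, so the median is taken over 2, 3, 5, 7, 10, 11
median_experience = experience.median()

# math.floor only matters when the median is fractional; here it is already whole
filled = experience.fillna(math.floor(median_experience))

print(median_experience)
print(filled.tolist())
```

With an even number of non-missing values the median is the average of the two middle values, here 5 and 7, so flooring leaves it unchanged in this case.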


In [12]:
df

Unnamed: 0,experience,test_score(out of 10),interview_score(out of 10),salary($)
0,6.0,8.0,9,50000
1,6.0,8.0,6,45000
2,5.0,6.0,7,60000
3,2.0,10.0,10,65000
4,7.0,9.0,6,70000
5,3.0,7.0,10,62000
6,10.0,8.0,7,72000
7,11.0,7.0,8,80000


<h3>Applying Linear Regression</h3>

In [13]:
from sklearn.linear_model import LinearRegression

lr = LinearRegression()
lr.fit(df[['experience', 'test_score(out of 10)', 'interview_score(out of 10)']], df['salary($)'])

# Find the coefficients (m1, m2, m3) and the intercept (b) in y = m1*x1 + m2*x2 + m3*x3 + b
print('The value of slope(m): ', lr.coef_)
print('The value of y intercept(b): ', lr.intercept_)

The value of slope(m):  [2813.00813008 1333.33333333 2926.82926829]
The value of y intercept(b):  11869.91869918695


In [14]:
# Predicting values

print('Prediction for input - 2 yr experience, 9 test score, 6 interview score:', lr.predict([[2.0, 9.0, 6]]))
print('Prediction for input - 12 yr experience, 10 test score, 10 interview score:', lr.predict([[12.0, 10.0, 10]]))

Prediction for input - 2 yr experience, 9 test score, 6 interview score: [47056.91056911]
Prediction for input - 12 yr experience, 10 test score, 10 interview score: [88227.64227642]
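As a sanity check, the two predictions can be recomputed by hand from the printed coefficients and intercept, since a linear model predicts `b + m1*x1 + m2*x2 + m3*x3`. The sketch below copies the rounded values from the output above; `predict_salary` is a helper name introduced here for illustration and is not part of scikit-learn.

```python
# Coefficients and intercept copied from the printed output of the fitted model
coefficients = [2813.00813008, 1333.33333333, 2926.82926829]
intercept = 11869.91869918695


def predict_salary(experience, test_score, interview_score):
    """Evaluate b + m1*x1 + m2*x2 + m3*x3 for one candidate."""
    features = [experience, test_score, interview_score]
    return intercept + sum(m * x for m, x in zip(coefficients, features))


print(round(predict_salary(2, 9, 6), 2))     # 47056.91
print(round(predict_salary(12, 10, 10), 2))  # 88227.64
```

Both values agree with the `lr.predict` output to the cent, which confirms that the order of the coefficients matches the order of the feature columns passed to `fit`.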


