#### Author : Sanjoy Biswas
#### Topic : Linear Regression Tutorial With Project Solving
#### Email : sanjoy.eee32@gmail.com

Linear regression is one of the simplest and most popular machine learning algorithms. It is a statistical method used for predictive analysis, and it predicts continuous numeric variables such as sales, salary, age, or product price.

The linear regression algorithm models a linear relationship between a dependent variable (y) and one or more independent variables (x), hence the name. Because the relationship is linear, the model describes how the value of the dependent variable changes as the values of the independent variables change.
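
In symbols, a sketch of the model (with $x_1, \dots, x_n$ the independent variables, $\beta_0, \dots, \beta_n$ coefficients learned from data, and $m$ training rows) and of the ordinary least-squares objective that scikit-learn's `LinearRegression` minimises. In this notebook $n = 3$: experience, test score and interview score.

$$
\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n
$$

$$
\min_{\beta_0, \dots, \beta_n} \; \sum_{i=1}^{m} \left( y_i - \hat{y}_i \right)^2
$$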

#### Import Libraries

In [2]:
import numpy as np
import pandas as pd
from word2number import w2n  # third-party package: pip install word2number
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

#### Import Dataset

In [4]:
df = pd.read_csv(r'F:\ML Algorithms By Me\Linear Regression\hiring.csv')  # raw string: backslashes are not escapes
df.head()

|   | experience | test_score(out of 10) | interview_score(out of 10) | salary($) |
|---|---|---|---|---|
| 0 | NaN | 8.0 | 9 | 50000 |
| 1 | NaN | 8.0 | 6 | 45000 |
| 2 | five | 6.0 | 7 | 60000 |
| 3 | two | 10.0 | 10 | 65000 |
| 4 | seven | 9.0 | 6 | 70000 |
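
`head()` shows only the first five rows, so counting missing values per column over the whole frame is a quick way to find every gap before preprocessing. The sketch below re-enters the eight raw rows that appear later in this notebook as an inline DataFrame. This is an assumption standing in for `hiring.csv`: rows 0 and 1 lack an experience value, and row 6 lacks a test score.

```python
import numpy as np
import pandas as pd

# Inline copy of the eight raw rows shown in this notebook (stand-in for hiring.csv)
raw = pd.DataFrame({
    'experience': [np.nan, np.nan, 'five', 'two', 'seven', 'three', 'ten', 'eleven'],
    'test_score(out of 10)': [8.0, 8.0, 6.0, 10.0, 9.0, 7.0, np.nan, 7.0],
    'interview_score(out of 10)': [9, 6, 7, 10, 6, 10, 7, 8],
    'salary($)': [50000, 45000, 60000, 65000, 70000, 62000, 72000, 80000],
})

# Count missing values per column across all rows, not just the first five
missing = raw.isna().sum()
print(missing)
```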


#### Preprocessing Datasets

In [5]:
df.experience = df.experience.fillna('Zero')  # treat a missing experience value as zero years

In [6]:
df

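|   | experience | test_score(out of 10) | interview_score(out of 10) | salary($) |
|---|---|---|---|---|
| 0 | Zero | 8.0 | 9 | 50000 |
| 1 | Zero | 8.0 | 6 | 45000 |
| 2 | five | 6.0 | 7 | 60000 |
| 3 | two | 10.0 | 10 | 65000 |
| 4 | seven | 9.0 | 6 | 70000 |
| 5 | three | 7.0 | 10 | 62000 |
| 6 | ten | NaN | 7 | 72000 |
| 7 | eleven | 7.0 | 8 | 80000 |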


In [7]:
# Convert spelled-out numbers such as 'five' into integers
df.experience = df.experience.apply(w2n.word_to_num)
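
`word2number` is a third-party package. If it is unavailable, a plain lookup table covers the handful of words in this dataset. The sketch below is only that: `WORD_TO_INT` and `words_to_years` are hypothetical names, and the table knows only the words seen here. The function raises a `KeyError` on anything else rather than silently producing `NaN`.

```python
import pandas as pd

# Hypothetical lookup table covering only the words that occur in this dataset
WORD_TO_INT = {
    'zero': 0, 'two': 2, 'three': 3, 'five': 5,
    'seven': 7, 'ten': 10, 'eleven': 11,
}

def words_to_years(series: pd.Series) -> pd.Series:
    """Map spelled-out numbers to ints; an unknown word raises KeyError."""
    return series.str.lower().map(lambda word: WORD_TO_INT[word])

experience = pd.Series(['Zero', 'Zero', 'five', 'two', 'seven', 'three', 'ten', 'eleven'])
print(words_to_years(experience).tolist())  # [0, 0, 5, 2, 7, 3, 10, 11]
```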

In [8]:
df

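|   | experience | test_score(out of 10) | interview_score(out of 10) | salary($) |
|---|---|---|---|---|
| 0 | 0 | 8.0 | 9 | 50000 |
| 1 | 0 | 8.0 | 6 | 45000 |
| 2 | 5 | 6.0 | 7 | 60000 |
| 3 | 2 | 10.0 | 10 | 65000 |
| 4 | 7 | 9.0 | 6 | 70000 |
| 5 | 3 | 7.0 | 10 | 62000 |
| 6 | 10 | NaN | 7 | 72000 |
| 7 | 11 | 7.0 | 8 | 80000 |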


In [9]:
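import math
# Note: this is the floor of the mean, not the median; it is not used below
floor_mean_test_score = math.floor(df['test_score(out of 10)'].mean())
floor_mean_test_score

7

In [10]:
mean_test_score = df['test_score(out of 10)'].mean()  # NaN values are skipped
mean_test_score

7.857142857142857

In [11]:
# Fill the missing test score with the column mean
df['test_score(out of 10)'] = df['test_score(out of 10)'].fillna(mean_test_score)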

In [12]:
df

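|   | experience | test_score(out of 10) | interview_score(out of 10) | salary($) |
|---|---|---|---|---|
| 0 | 0 | 8.0 | 9 | 50000 |
| 1 | 0 | 8.0 | 6 | 45000 |
| 2 | 5 | 6.0 | 7 | 60000 |
| 3 | 2 | 10.0 | 10 | 65000 |
| 4 | 7 | 9.0 | 6 | 70000 |
| 5 | 3 | 7.0 | 10 | 62000 |
| 6 | 10 | 7.857143 | 7 | 72000 |
| 7 | 11 | 7.0 | 8 | 80000 |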
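
Mean imputation and median imputation give different fill values for this column. The sketch below uses the seven observed test scores from the table above. The mean skips the missing entry and works out to 55 / 7. The median is the middle value of the sorted scores [6, 7, 7, 8, 8, 9, 10].

```python
import pandas as pd

# Test scores from the raw data; the None is the missing value in row 6
scores = pd.Series([8.0, 8.0, 6.0, 10.0, 9.0, 7.0, None, 7.0])

mean_score = scores.mean()      # NaN is skipped: 55 / 7
median_score = scores.median()  # middle of the 7 sorted observed values

print(mean_score, median_score)  # 7.857142857142857 8.0

# Mean imputation, as in the notebook: no gaps remain afterwards
filled = scores.fillna(mean_score)
print(filled.isna().sum())  # 0
```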


In [13]:
# Show the column names
df.columns

Index(['experience', 'test_score(out of 10)', 'interview_score(out of 10)',
       'salary($)'],
      dtype='object')

#### Feature Selection

In [14]:
predictors = ['experience', 'test_score(out of 10)', 'interview_score(out of 10)']
x = df[predictors]
y = df['salary($)']

#### Split into Training and Test Sets

In [15]:
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.2,random_state=42)

In [16]:
x_train.shape,x_test.shape

((6, 3), (2, 3))

In [17]:
y_train.shape,y_test.shape

((6,), (2,))
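
The 6/2 split follows from `test_size=0.2` on 8 rows. As far as we know, scikit-learn's `train_test_split` rounds a fractional `test_size` up to a whole number of rows and gives the remainder to the training set. The hypothetical helper `split_sizes` below reproduces only that arithmetic, not the shuffling.

```python
import math

def split_sizes(n_samples: int, test_size: float) -> tuple[int, int]:
    """Hypothetical helper: test rows round up, training gets the rest."""
    n_test = math.ceil(test_size * n_samples)  # ceil(1.6) = 2 for 8 rows
    n_train = n_samples - n_test
    return n_train, n_test

print(split_sizes(8, 0.2))  # (6, 2)
```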

#### Apply Linear Regression

In [18]:
reg = LinearRegression()

In [19]:
model = reg.fit(x_train,y_train)
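
`LinearRegression` fits an ordinary least-squares model. The sketch below solves the same kind of problem directly with NumPy's `lstsq`. It uses all eight cleaned rows rather than the six-row training split, so its coefficients will not match the notebook's `model`. It then checks the defining property of the least-squares solution: the residuals are orthogonal to every column of the design matrix (the normal equations).

```python
import numpy as np

# Cleaned rows: experience, test score, interview score (row 6 uses the imputed mean 55/7)
X = np.array([
    [0, 8.0, 9], [0, 8.0, 6], [5, 6.0, 7], [2, 10.0, 10],
    [7, 9.0, 6], [3, 7.0, 10], [10, 55 / 7, 7], [11, 7.0, 8],
])
y = np.array([50000, 45000, 60000, 65000, 70000, 62000, 72000, 80000], dtype=float)

# Prepend a column of ones so the first coefficient acts as the intercept
X1 = np.column_stack([np.ones(len(X)), X])

# Ordinary least squares: minimise ||y - X1 @ beta||^2
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
intercept, coefs = beta[0], beta[1:]

# At the optimum, residuals are orthogonal to every column of X1
residuals = y - X1 @ beta
print(np.allclose(X1.T @ residuals, 0, atol=1e-3))  # True
```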

In [None]:
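# Pass a DataFrame with the column names used in fit() to avoid a feature-name warning
model.predict(pd.DataFrame([[5, 6, 7]], columns=predictors))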

array([58355.74491311])
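
A prediction is just the intercept plus the dot product of the coefficients with the feature values. The sketch below refits `LinearRegression` on all eight cleaned rows, so its prediction for `[5, 6, 7]` differs from the value above. It checks that `predict` agrees with the hand computation from `intercept_` and `coef_`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

predictors = ['experience', 'test_score(out of 10)', 'interview_score(out of 10)']
features = pd.DataFrame(
    [[0, 8.0, 9], [0, 8.0, 6], [5, 6.0, 7], [2, 10.0, 10],
     [7, 9.0, 6], [3, 7.0, 10], [10, 55 / 7, 7], [11, 7.0, 8]],
    columns=predictors,
)
salary = [50000, 45000, 60000, 65000, 70000, 62000, 72000, 80000]

model = LinearRegression().fit(features, salary)

# New candidate, passed with the same column names used in fit()
candidate = pd.DataFrame([[5, 6, 7]], columns=predictors)
by_predict = model.predict(candidate)[0]

# Same number by hand: intercept + sum(coefficient * feature value)
by_hand = model.intercept_ + np.dot(model.coef_, [5, 6, 7])
print(np.isclose(by_predict, by_hand))  # True
```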

#### R² Score (Coefficient of Determination)

In [20]:
model.score(x_train,y_train)

0.945171325224806

In [21]:
model.score(x_test,y_test)

0.9287916364000984
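
For a regressor, `score` returns R² rather than an accuracy. R² is one minus the ratio of the squared prediction errors to the squared deviations of the true values from their mean. A score of 1 means a perfect fit, and 0 means no better than always predicting the mean. The hypothetical helper `r_squared` below is a minimal NumPy sketch of that formula. Note that the test score above is computed on only two rows.

```python
import numpy as np

def r_squared(y_true, y_pred) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # variation around the mean
    return float(1.0 - ss_res / ss_tot)

print(r_squared([1, 2, 3], [1, 2, 3]))  # 1.0 (perfect fit)
print(r_squared([1, 2, 3], [2, 2, 2]))  # 0.0 (same as predicting the mean)
```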