## Multiple Linear Regression 

- is an extension of simple linear regression: it uses more than one predictor variable to predict the response variable.
- models the linear relationship between a single continuous dependent variable and two or more independent variables.
- predicts the dependent variable by fitting the linear relationship that best matches the data.

*Equation*: Y = β0 + β1X1 + β2X2 + β3X3 + … + βnXn + e

Y = dependent variable/target variable

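β0 = intercept (the expected value of Y when every X is 0)

β1,β2,β3,... = regression coefficients (slopes); each gives the change in Y for a one-unit increase in the corresponding X with the other predictors held constant, and its sign shows whether Y increases or decreases

X1,X2,X3,... = independent variables (predictors)

e = error term (the part of Y not explained by the predictors)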
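To make the equation concrete, here is a minimal sketch that generates data from known coefficients and recovers them by least squares. The data is synthetic and the names `true_beta0`, `true_betas`, and `X_design` are illustrative assumptions, not part of the advertising example.

```python
import numpy as np

# Synthetic data (an assumption for illustration): 200 rows, 3 predictors
rng = np.random.default_rng(0)
n = 200
X = rng.uniform(0, 100, size=(n, 3))

# Hypothetical "true" parameters used to generate Y
true_beta0 = 4.0
true_betas = np.array([0.05, 0.10, 0.01])
e = rng.normal(0, 1.0, size=n)  # error term

# Y = β0 + β1*X1 + β2*X2 + β3*X3 + e
Y = true_beta0 + X @ true_betas + e

# Prepend a column of ones so the intercept is estimated with the slopes
X_design = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(X_design, Y, rcond=None)

intercept_hat, coef_hat = beta_hat[0], beta_hat[1:]
print('Estimated intercept:', intercept_hat)
print('Estimated coefficients:', coef_hat)
```

The estimates should land close to the values used to generate the data, and the gap shrinks as the number of rows grows or the noise gets smaller.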

**Example**: Predicting sales from the money spent on TV, Radio, and Newspaper marketing. Here there are three independent variables (the money spent on TV, Radio, and Newspaper) and one dependent variable (sales), which is the value to be predicted.

**Problem statement**: Build a multiple linear regression model to predict sales based on the money spent on TV, Radio, and Newspaper advertising.

In [4]:
# importing libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# reading in the data
ad = pd.read_csv('advertising.csv')

# previewing the data
ad.head()

| | TV | Radio | Newspaper | Sales |
|---|---|---|---|---|
| 0 | 230.1 | 37.8 | 69.2 | 22.1 |
| 1 | 44.5 | 39.3 | 45.1 | 10.4 |
| 2 | 17.2 | 45.9 | 69.3 | 12.0 |
| 3 | 151.5 | 41.3 | 58.5 | 16.5 |
| 4 | 180.8 | 10.8 | 58.4 | 17.9 |


**Equation**: Sales(Y) = β0 + (β1 * TV) + (β2 * Radio) + (β3 * Newspaper) + e

In [16]:
# setting the value for x and y 
x = ad[['TV','Radio','Newspaper']]
y = ad[['Sales']]


# splitting data into train and test
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size = 0.3, random_state = 100)


# fitting the multiple linear regression model
from sklearn.linear_model import LinearRegression
mlr = LinearRegression()
mlr.fit(x_train,y_train)


# printing intercept and coefficient
print('Intercept: ', mlr.intercept_)
print('Coefficients:',mlr.coef_)

Intercept:  [4.33459586]
Coefficients: [[0.05382911 0.11001224 0.00628995]]


*Regression equation*: Sales = 4.3346 + (0.0538 * TV) + (0.1100 * Radio) + (0.0063 * Newspaper) + e

Interpretation: The intercept of *4.3346* means that if the money spent on TV, Radio, and Newspaper advertisements is 0, the estimated average sales is 4.3346. Holding the other two budgets constant, a one-unit increase in TV spending increases estimated sales by 0.0538, a one-unit increase in Radio spending increases them by 0.1100, and a one-unit increase in Newspaper spending increases them by 0.0063.
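As a worked example of the fitted equation, the sketch below plugs one set of budgets into the intercept and coefficients printed above. The budget values (TV = 100, Radio = 20, Newspaper = 10) are hypothetical and chosen only for illustration.

```python
# Intercept and coefficients copied from the printed model output
intercept = 4.33459586
coefficients = {'TV': 0.05382911, 'Radio': 0.11001224, 'Newspaper': 0.00628995}

# Hypothetical advertising budgets (not taken from the dataset)
budget = {'TV': 100.0, 'Radio': 20.0, 'Newspaper': 10.0}

# Sales = β0 + β1*TV + β2*Radio + β3*Newspaper
predicted_sales = intercept + sum(coefficients[k] * budget[k] for k in coefficients)

print(round(predicted_sales, 2))  # about 11.98
```

Raising the Radio budget by one unit in this sketch changes the result by exactly the Radio coefficient, which is what "holding the other predictors constant" means in practice.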

In [18]:
# Prediction of test set
y_pred_mlr = mlr.predict(x_test)

# Predicted values
print("Prediction for test set: {}".format(y_pred_mlr))


Prediction for test set: [[ 9.35221067]
 [20.96344625]
 [16.48851064]
 [20.10971005]
 [21.67148354]
 [16.16054424]
 [13.5618056 ]
 [15.39338129]
 [20.81980757]
 [21.00537077]
 [12.29451311]
 [20.70848608]
 [ 8.17367308]
 [16.82471534]
 [10.48954832]
 [ 9.99530649]
 [16.34698901]
 [14.5758119 ]
 [17.23065133]
 [12.56890735]
 [18.55715915]
 [12.12402775]
 [20.43312609]
 [17.78017811]
 [16.73623408]
 [21.60387629]
 [20.13532087]
 [10.82559967]
 [19.12782848]
 [14.84537816]
 [13.13597397]
 [ 9.07757918]
 [12.07834143]
 [16.62824427]
 [ 8.41792841]
 [14.0456697 ]
 [ 9.92050209]
 [14.26101605]
 [16.76262961]
 [17.17185467]
 [18.88797595]
 [15.50165469]
 [15.78688377]
 [16.86266686]
 [13.03405813]
 [10.47673934]
 [10.6141644 ]
 [20.85264977]
 [10.1517568 ]
 [ 6.88471443]
 [17.88702583]
 [18.16013938]
 [12.55907083]
 [16.28189561]
 [18.98024679]
 [11.33714913]
 [ 5.91026916]
 [10.06159509]
 [17.62383031]
 [13.19628335]]


In [21]:
# once the model is trained, predict() returns the predictions for x_test
# (stored in y_pred_mlr); here we place them next to the actual y_test values
# to check how close the predicted values are

y_test = np.squeeze(y_test)
y_pred_mlr = np.squeeze(y_pred_mlr)


mlr_diff = pd.DataFrame({'Actual value': y_test,
                         'Predicted value': y_pred_mlr})

mlr_diff.head()

| | Actual value | Predicted value |
|---|---|---|
| 126 | 6.6 | 9.352211 |
| 104 | 20.7 | 20.963446 |
| 99 | 17.2 | 16.488511 |
| 92 | 19.4 | 20.10971 |
| 111 | 21.8 | 21.671484 |


In [26]:
# Model evaluation

from sklearn import metrics

mean_ab_error = metrics.mean_absolute_error(y_test,y_pred_mlr)
mean_sq_error = metrics.mean_squared_error(y_test, y_pred_mlr)
rmse = np.sqrt(metrics.mean_squared_error(y_test,y_pred_mlr))

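# note: score(x, y) computes R squared on the full dataset (train and test rows),
# shown here as a percentage; use score(x_test, y_test) for a held-out estimate
print('R squared: {:.2f}'.format(mlr.score(x,y)*100))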
print('Mean Absolute Error:',mean_ab_error)
print('Mean Square Error:', mean_sq_error)
print('RMSE:',rmse)

R squared: 90.11
Mean Absolute Error: 1.227818356658941
Mean Square Error: 2.6360765623280646
RMSE: 1.6235998775338907


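*R squared*: 90.11% of the variation in sales is explained by the three predictors. Because `mlr.score(x, y)` is called on the full dataset, this figure includes the training rows; scoring on `x_test` and `y_test` would give the held-out estimate.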
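To make the definition of R squared concrete, and to show that its value depends on which rows it is computed on, here is a minimal sketch using synthetic data and a hand-rolled 70/30 split. The helper `r_squared` and all data here are assumptions for illustration; none of it uses the advertising dataset.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """R squared = 1 - (residual sum of squares / total sum of squares)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic data (hypothetical coefficients and noise level)
rng = np.random.default_rng(100)
n = 200
X = rng.uniform(0, 100, size=(n, 3))
y = 4.0 + X @ np.array([0.05, 0.10, 0.01]) + rng.normal(0, 1.5, size=n)

# Shuffle row indices and hold out 30% of them as a test set
idx = rng.permutation(n)
n_test = n * 3 // 10
test_idx, train_idx = idx[:n_test], idx[n_test:]

# Fit by least squares on the training rows only
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A[train_idx], y[train_idx], rcond=None)
pred = A @ beta

r2_train = r_squared(y[train_idx], pred[train_idx])
r2_test = r_squared(y[test_idx], pred[test_idx])
r2_all = r_squared(y, pred)

print('R squared (train):', r2_train)
print('R squared (test): ', r2_test)
print('R squared (all):  ', r2_all)
```

The test-set value is the one that reflects performance on data the model has not seen, so it is usually the figure worth reporting.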

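*MAE*: on average, the predictions on the test set are off by about 1.23 sales units. That is small relative to the sales values shown above, which range from about 7 to 22, so the model fits well.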

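*RMSE*: the square root of the mean squared error, expressed in the same units as sales (about 1.62 here). It penalizes large errors more heavily than MAE, and lower values indicate a more accurate model.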
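The error metrics above can also be computed by hand. The sketch below uses only the five actual and predicted values shown in the `mlr_diff.head()` output, so its results will differ from the full test-set figures printed earlier.

```python
import numpy as np

# The five rows shown in the mlr_diff.head() output
actual = np.array([6.6, 20.7, 17.2, 19.4, 21.8])
predicted = np.array([9.352211, 20.963446, 16.488511, 20.10971, 21.671484])

errors = actual - predicted

mae = np.mean(np.abs(errors))   # mean absolute error
mse = np.mean(errors ** 2)      # mean squared error
rmse = np.sqrt(mse)             # root mean squared error, same units as sales

print('MAE: ', mae)
print('MSE: ', mse)
print('RMSE:', rmse)
```

RMSE is never smaller than MAE on the same data. A large gap between the two suggests that a few large errors dominate, as the first row (6.6 actual against 9.35 predicted) does here.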