# Model Development

## Objectives

After completing this lab you will be able to:

*   Develop prediction models


<h4>Setup</h4>

Import libraries:


In [78]:
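import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# import skillsnetwork  # IBM Skills Network helper; not used in this section
import warnings
import sklearn

# Silence library warnings (e.g. deprecation notices) to keep the output readable
warnings.filterwarnings('ignore')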

Load the data and store it in the DataFrame `df`:


In [79]:
path = 'automobileEDA.csv'
df = pd.read_csv(path, header=0)
df.head()

| Unnamed: 0 | symboling | normalized-losses | make | aspiration | num-of-doors | body-style | drive-wheels | engine-location | wheel-base | length | ... | compression-ratio | horsepower | peak-rpm | city-mpg | highway-mpg | price | city-L/100km | horsepower-binned | diesel | gas |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 122 | alfa-romero | std | two | convertible | rwd | front | 88.6 | 0.811148 | ... | 9.0 | 111.0 | 5000.0 | 21 | 27 | 13495.0 | 11.190476 | Medium | 0 | 1 |
| 1 | 3 | 122 | alfa-romero | std | two | convertible | rwd | front | 88.6 | 0.811148 | ... | 9.0 | 111.0 | 5000.0 | 21 | 27 | 16500.0 | 11.190476 | Medium | 0 | 1 |
| 2 | 1 | 122 | alfa-romero | std | two | hatchback | rwd | front | 94.5 | 0.822681 | ... | 9.0 | 154.0 | 5000.0 | 19 | 26 | 16500.0 | 12.368421 | Medium | 0 | 1 |
| 3 | 2 | 164 | audi | std | four | sedan | fwd | front | 99.8 | 0.84863 | ... | 10.0 | 102.0 | 5500.0 | 24 | 30 | 13950.0 | 9.791667 | Medium | 0 | 1 |
| 4 | 2 | 164 | audi | std | four | sedan | 4wd | front | 99.4 | 0.84863 | ... | 8.0 | 115.0 | 5500.0 | 18 | 22 | 17450.0 | 13.055556 | Medium | 0 | 1 |


<h2>1. Linear Regression and Multiple Linear Regression</h2>

<h4>Linear Regression</h4>


<p>One example of a data model that we will be using is:</p>
<b>Simple Linear Regression</b>

<br>
<p>Simple Linear Regression is a method to help us understand the relationship between two variables:</p>
<ul>
    <li>The predictor/independent variable (X)</li>
    <li>The response/dependent variable that we want to predict (Y)</li>
</ul>

<p>The result of Linear Regression is a <b>linear function</b> that predicts the response (dependent) variable as a function of the predictor (independent) variable.</p>
$$
\begin{aligned}
Y &: \text{Response variable}\\
X &: \text{Predictor variable}
\end{aligned}
$$

<b>Linear Function</b>
$$
\hat{Y} = a + b X
$$

<ul>
    <li>a refers to the <b>intercept</b> of the regression line, in other words: the predicted value of Y when X is 0</li>
    <li>b refers to the <b>slope</b> of the regression line, in other words: the amount by which the predicted Y changes when X increases by 1 unit</li>
</ul>
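For a single predictor, the least-squares estimates of a and b have a closed form: the slope is b = cov(X, Y) / var(X), and the intercept is a = mean(Y) - b · mean(X). The sketch below computes both with NumPy on a small, made-up dataset (the arrays `x` and `y` are hypothetical) and checks them against `np.polyfit`, which fits the same least-squares line.

```python
import numpy as np

# Hypothetical data: roughly y = 2 + 3x with a little noise
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([5.1, 7.9, 11.2, 13.8, 17.1])

# Closed-form least-squares estimates for simple linear regression
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)  # slope
a = y.mean() - b * x.mean()                                                 # intercept

# np.polyfit with deg=1 returns [slope, intercept] for the same least-squares line
b_ref, a_ref = np.polyfit(x, y, deg=1)

yhat = a + b * x  # predictions from the fitted linear function
print(f"a = {a:.4f}, b = {b:.4f}")  # a = 2.0500, b = 2.9900
```

`LinearRegression` in scikit-learn solves the same least-squares problem, so on this data it would return the same a and b.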



<h4>Let's load the module for linear regression:</h4>


In [80]:
from sklearn.linear_model import LinearRegression

<h4>Create the linear regression object:</h4>


In [81]:
lm = LinearRegression()
lm

LinearRegression()


<h4>How could "highway-mpg" help us predict car price?</h4>


For this example, we want to look at how highway-mpg can help us predict car price.
Using simple linear regression, we will create a linear function with "highway-mpg" as the predictor variable and "price" as the response variable.


In [82]:
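X = df[['highway-mpg']]  # double brackets: a one-column DataFrame (2-D), as scikit-learn expects for features
Y = df['price']          # single brackets: a Series (1-D) used as the target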
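The double brackets matter: `df[['highway-mpg']]` selects a list of columns and returns a 2-D DataFrame, which is the shape scikit-learn's `fit` expects for the features, while `df['price']` selects a single column and returns a 1-D Series. So that the sketch runs on its own, it builds a small stand-in for `df` (named `sample`, an assumption for illustration) from the five rows shown in the `df.head()` output above; in the lab, use the full `df`.

```python
import pandas as pd

# Stand-in for df: only the five rows displayed by df.head() above
sample = pd.DataFrame({
    "highway-mpg": [27, 27, 26, 30, 22],
    "price": [13495.0, 16500.0, 16500.0, 13950.0, 17450.0],
})

X_sample = sample[["highway-mpg"]]  # list of column names -> DataFrame, shape (n_samples, 1)
Y_sample = sample["price"]          # single column name -> Series, shape (n_samples,)

print(X_sample.shape, Y_sample.shape)  # (5, 1) (5,)
```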

Fit the linear model using highway-mpg:


In [83]:
lm.fit(X,Y)

LinearRegression()


We can output the first five predictions:


In [84]:
Yhat = lm.predict(X)  # predicted price for every row of X
print(Yhat[0:5])      # show the first five

[16236.50464347 16236.50464347 17058.23802179 13771.3045085
 20345.17153508]


<h4>What is the value of the intercept (a)?</h4>


In [85]:
print(lm.intercept_)

38423.3058581574


<h4>What is the value of the slope (b)?</h4>


In [86]:
print(lm.coef_)

[-821.73337832]


<h3>What is the final estimated linear model we get?</h3>


As we saw above, we should get a final linear model with the structure:


$$
\hat{Y} = a + b X
$$


Plugging in the actual values we get:


<b>Price</b> = 38423.31 - 821.73 × <b>highway-mpg</b>
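As a check, plugging the `highway-mpg` values of the first five rows from the `df.head()` output (27, 27, 26, 30, 22) into this equation, using the intercept and slope as printed above, reproduces the first five values of `Yhat` up to rounding of the printed slope.

```python
import numpy as np

a = 38423.3058581574   # lm.intercept_ as printed above
b = -821.73337832      # lm.coef_[0] as printed above

highway_mpg = np.array([27, 27, 26, 30, 22])  # first five rows of df.head()
price_hat = a + b * highway_mpg               # Yhat = a + b * X

print(price_hat)
```

The negative slope means each additional mile per gallon lowers the predicted price by about 821.73.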


<div class="alert alert-danger alertdanger" style="margin-top: 20px">
<h1>Question #1 a): </h1>

<b>Create a linear regression object called "lm1".</b>

</div>


In [87]:
lm1 = LinearRegression()
lm1

LinearRegression()


<div class="alert alert-danger alertdanger" style="margin-top: 20px">
<h1> Question #1 b): </h1>

<b>Train the model using "engine-size" as the independent variable and "price" as the dependent variable.</b>

</div>


In [88]:
lm1.fit(df[['engine-size']], df[['price']])
lm1

LinearRegression()


<div class="alert alert-danger alertdanger" style="margin-top: 20px">
<h1>Question #1 c):</h1>

<b>Find the slope and intercept of the model.</b>

</div>


In [89]:
# The target was passed as a DataFrame (df[['price']]), so coef_ is 2-D and intercept_ is 1-D
print('Slope:', lm1.coef_[0][0])
print('Intercept:', lm1.intercept_[0])

Slope: 166.86001569141584
Intercept: -7963.338906281027
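Why the `[0][0]` and `[0]` indexing? Because the target was passed as `df[['price']]`, a 2-D DataFrame, scikit-learn stores `coef_` with shape `(n_targets, n_features)` and `intercept_` with shape `(n_targets,)`. Passing a 1-D target such as `df['price']` gives a 1-D `coef_` and a scalar `intercept_` instead. The data below is made up (an exact line y = -10 + 2x) purely to show the shapes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: one feature, exact line y = -10 + 2x
x = np.array([[1.0], [2.0], [3.0], [4.0]])  # 2-D features, shape (4, 1)
y_1d = np.array([-8.0, -6.0, -4.0, -2.0])   # 1-D target, shape (4,)
y_2d = y_1d.reshape(-1, 1)                   # 2-D target, shape (4, 1)

m1 = LinearRegression().fit(x, y_1d)
m2 = LinearRegression().fit(x, y_2d)

print(m1.coef_.shape, np.shape(m1.intercept_))  # (1,) ()
print(m2.coef_.shape, m2.intercept_.shape)      # (1, 1) (1,)
print(m2.coef_[0][0], m2.intercept_[0])         # slope and intercept as plain numbers
```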


<div class="alert alert-danger alertdanger" style="margin-top: 20px">
<h1>Question #1 d): </h1>

<b>What is the equation of the predicted line? You can use x and yhat or "engine-size" or "price".</b>

</div>


In [90]:
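# Using x and yhat, where x is engine-size
x = df['engine-size']
yhat = -7963.34 + 166.86 * x

# Equivalent form using the column names
Price = -7963.34 + 166.86 * df['engine-size']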

<h4>Multiple Linear Regression</h4>


<p>What if we want to predict car price using more than one variable?</p>

<p>If we want to use more variables in our model to predict car price, we can use <b>Multiple Linear Regression</b>.
Multiple Linear Regression is very similar to Simple Linear Regression, but it models the relationship between one continuous response (dependent) variable and <b>two or more</b> predictor (independent) variables.
Most real-world regression models involve multiple predictors. We will illustrate the structure using four predictor variables, but the same form generalizes to any number of predictors:</p>


$$
\begin{aligned}
Y &: \text{Response variable}\\
X_1 &: \text{Predictor variable 1}\\
X_2 &: \text{Predictor variable 2}\\
X_3 &: \text{Predictor variable 3}\\
X_4 &: \text{Predictor variable 4}
\end{aligned}
$$


$$
\begin{aligned}
a &: \text{intercept}\\
b_1 &: \text{coefficient of variable 1}\\
b_2 &: \text{coefficient of variable 2}\\
b_3 &: \text{coefficient of variable 3}\\
b_4 &: \text{coefficient of variable 4}
\end{aligned}
$$


The equation is given by:


$$
\hat{Y} = a + b_1 X_1 + b_2 X_2 + b_3 X_3 + b_4 X_4
$$
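Multiple linear regression is still ordinary least squares: stack a column of ones (for the intercept a) next to the predictor columns and solve for the coefficient vector that minimizes the squared error. The sketch below does this with `np.linalg.lstsq` on made-up data with four predictors (the true intercept and coefficients are hypothetical) and checks the result against `LinearRegression`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Hypothetical data: 50 samples, 4 predictors, known intercept and coefficients
X = rng.normal(size=(50, 4))
true_a = 5.0
true_b = np.array([1.5, -2.0, 0.5, 3.0])
y = true_a + X @ true_b + rng.normal(scale=0.1, size=50)

# Design matrix: a column of ones for the intercept, then the four predictors
A = np.column_stack([np.ones(len(X)), X])
params, *_ = np.linalg.lstsq(A, y, rcond=None)
a_hat, b_hat = params[0], params[1:]

# scikit-learn fits the same least-squares model
lm_check = LinearRegression().fit(X, y)
print(a_hat, lm_check.intercept_)
print(b_hat, lm_check.coef_)
```

Both approaches recover values close to the hypothetical intercept 5.0 and coefficients (1.5, -2.0, 0.5, 3.0), and they agree with each other to floating-point precision.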


<p>From the previous section, we know that other good predictors of price could be:</p>
<ul>
    <li>Horsepower</li>
    <li>Curb-weight</li>
    <li>Engine-size</li>
    <li>Highway-mpg</li>
</ul>
Let's develop a model using these variables as the predictor variables.


In [91]:
Z = df[['horsepower', 'curb-weight', 'engine-size', 'highway-mpg']]

Fit the linear model using the four above-mentioned variables.


In [92]:
lm.fit(Z, df['price'])

LinearRegression()


What is the value of the intercept (a)?


In [93]:
print('The estimated intercept coefficient is {}'.format(lm.intercept_))

The estimated intercept coefficient is -15806.62462632922


What are the values of the coefficients (b1, b2, b3, b4)?


In [94]:
# One coefficient per column of Z, in the same order
print('The estimated coefficients are {}'.format(lm.coef_))

The estimated coefficients are [53.49574423  4.70770099 81.53026382 36.05748882]


What is the final estimated linear model that we get?


As we saw above, we should get a final linear function with the structure:

$$
\hat{Y} = a + b_1 X_1 + b_2 X_2 + b_3 X_3 + b_4 X_4
$$

What is the linear function we get in this example?


<b>Price</b> = -15806.62462632922 + 53.49574423 × <b>horsepower</b> + 4.70770099 × <b>curb-weight</b> + 81.53026382 × <b>engine-size</b> + 36.05748882 × <b>highway-mpg</b>
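To apply this equation to one car, multiply each predictor value by its coefficient and add the intercept, which is a dot product. The sketch below pairs the printed coefficients with the column names of `Z` and evaluates the model for a hypothetical car; the input values in `car` are made up for illustration, and they must follow the column order of `Z`.

```python
import numpy as np

intercept = -15806.62462632922                                        # lm.intercept_ as printed above
coef = np.array([53.49574423, 4.70770099, 81.53026382, 36.05748882])  # lm.coef_ as printed above
features = ["horsepower", "curb-weight", "engine-size", "highway-mpg"]  # column order of Z

# Hypothetical car, in the same column order as Z
car = np.array([111.0, 2548.0, 130.0, 27.0])

for name, b in zip(features, coef):
    print(f"{name}: {b}")

price_hat = intercept + coef @ car  # a + b1*X1 + b2*X2 + b3*X3 + b4*X4
print(round(price_hat, 2))
```

With the fitted model itself, `lm.predict` on a one-row DataFrame with these four columns performs the same computation.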


<div class="alert alert-danger alertdanger" style="margin-top: 20px">
<h1> Question  #2 a): </h1>
Create and train a Multiple Linear Regression model "lm2" where the response variable is "price" and the predictor variables are "normalized-losses" and "highway-mpg".
</div>
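One possible answer, sketched under the assumption that it follows the same pattern as the four-predictor model: select the two predictor columns with double brackets and fit a new `LinearRegression`. So that the snippet runs on its own, it uses a stand-in DataFrame (`sample`, hypothetical) built from the five rows shown by `df.head()`; with so few rows the coefficients are not meaningful, so in the lab replace `sample` with `df`.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Stand-in for df: the five rows shown by df.head() above (use the full df in the lab)
sample = pd.DataFrame({
    "normalized-losses": [122, 122, 122, 164, 164],
    "highway-mpg": [27, 27, 26, 30, 22],
    "price": [13495.0, 16500.0, 16500.0, 13950.0, 17450.0],
})

lm2 = LinearRegression()
lm2.fit(sample[["normalized-losses", "highway-mpg"]], sample["price"])

print(lm2.intercept_, lm2.coef_)  # intercept a and coefficients (b1, b2)
```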



<h3 align="center"> © IBM Corporation 2020. All rights reserved. </h3>
