# Model Evaluation & Hyperparameter Tuning
<h2> Demo 2: Model Evaluation Metrics for Regression </h2>

In this demo, you will learn how to use model evaluation metrics such as MAPE, RMSE, and RMSLE to assess a regression model.
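
As a quick reference for the metrics named above, here is a minimal NumPy sketch of their usual definitions. The helper names `mape`, `rmse`, and `rmsle` and the toy values are illustrative assumptions, not part of the original notebook. The RMSLE helper uses `log(1 + y)` and assumes non-negative inputs.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (undefined if any y_true is 0)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error; inputs must be non-negative."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2))

# Illustrative toy values (hypothetical, not taken from the coffee dataset).
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 180.0, 400.0]

print("MAPE :", mape(y_true, y_pred))
print("RMSE :", rmse(y_true, y_pred))
print("RMSLE:", rmsle(y_true, y_pred))
```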

# Problem Statement

The dataset contains various features of coffee beans from different countries. The aim is to predict the mean altitude at which the beans were grown.

# Importing Libraries

In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split,GridSearchCV
from sklearn.metrics import mean_squared_error,mean_squared_log_error
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

# Data Analysis

In [2]:
data=pd.read_csv('arabica_data.csv')

In [3]:
data.head()

Unnamed: 0,Country.of.Origin,Number.of.Bags,Bag.Weight,Aroma,Flavor,Aftertaste,Acidity,Body,Balance,Uniformity,Clean.Cup,Sweetness,Cupper.Points,Total.Cup.Points,Moisture,altitude_mean_meters
0,Ethiopia,300,60,8.67,8.83,8.67,8.75,8.5,8.42,10.0,10.0,10.0,8.75,90.58,0.12,2075.0
1,Ethiopia,300,60,8.75,8.67,8.5,8.58,8.42,8.42,10.0,10.0,10.0,8.58,89.92,0.12,2075.0
2,Guatemala,5,1,8.42,8.5,8.42,8.42,8.33,8.42,10.0,10.0,10.0,9.25,89.75,0.0,1700.0
3,Ethiopia,320,60,8.17,8.58,8.42,8.42,8.5,8.25,10.0,10.0,10.0,8.67,89.0,0.11,2000.0
4,Ethiopia,300,60,8.25,8.5,8.25,8.5,8.42,8.33,10.0,10.0,10.0,8.58,88.83,0.12,2075.0


In [4]:
data.shape

(1311, 16)

<div style="background-color:#FBFFE3">
    <div style="float:left">
        <img src='https://drive.google.com/uc?export=download&id=1gpc8kbwaenTpaIvnO_MQR3vw12rKYQrJ' width=20 height=20>
    </div>
    <span style="padding:20px;">The dataset has 1311 rows and 16 columns.</span>
</div>

<div style="background-color:#EDF9FF">
  <ul><li><h3> Testing For Null Values</h3></li></ul>
</div>

In [5]:
pd.DataFrame({'Count':data.isnull().sum(),'Percentage':data.isnull().sum()/data.shape[0]})

Unnamed: 0,Count,Percentage
Country.of.Origin,1,0.000763
Number.of.Bags,0,0.0
Bag.Weight,0,0.0
Aroma,0,0.0
Flavor,0,0.0
Aftertaste,0,0.0
Acidity,0,0.0
Body,0,0.0
Balance,0,0.0
Uniformity,0,0.0


<div style="background-color:#FBFFE3">
    <div style="float:left">
        <img src='https://drive.google.com/uc?export=download&id=1gpc8kbwaenTpaIvnO_MQR3vw12rKYQrJ' width=20 height=20>
    </div>
    <span style="padding:20px;">The dataset has null values. <b>Country.of.Origin</b> has only one null value, so we can drop that row. <b>altitude_mean_meters</b> has 227 null values, so we will impute them.</span>
</div>
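
The two strategies described above (dropping a row versus imputing a value) can be sketched on a small toy frame. The column names `origin` and `altitude` and the values are hypothetical stand-ins, not the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one missing category, one missing numeric value.
toy = pd.DataFrame({
    "origin": ["Ethiopia", None, "Guatemala", "Ethiopia"],
    "altitude": [2000.0, 1500.0, np.nan, 1000.0],
})

# Strategy 1: drop rows where the categorical column is missing.
toy = toy.dropna(subset=["origin"])

# Strategy 2: fill the remaining numeric gaps with the column mean.
# Series.mean() skips NaN values by default.
altitude_mean = toy["altitude"].mean()
toy["altitude"] = toy["altitude"].fillna(altitude_mean)

print(toy)
print(toy.isnull().sum())
```

Keep in mind that mean imputation gives every filled row the same value. When the imputed column is the prediction target, as it is here, dropping those rows instead is a common alternative worth comparing.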

<div style="background-color:#EDF9FF">
  <ul><li><h3>Handling Null Values</h3></li></ul>
</div>

In [6]:
# Show the row where Country.of.Origin is null
data[data['Country.of.Origin'].isna()]

Unnamed: 0,Country.of.Origin,Number.of.Bags,Bag.Weight,Aroma,Flavor,Aftertaste,Acidity,Body,Balance,Uniformity,Clean.Cup,Sweetness,Cupper.Points,Total.Cup.Points,Moisture,altitude_mean_meters
1197,,149,70,6.75,6.75,6.42,6.83,7.58,7.5,10.0,10.0,10.0,7.25,79.08,0.1,


In [7]:
# Drop the single row with a missing Country.of.Origin
data=data.drop(1197,axis=0)

In [8]:
# Impute missing target values with the column mean
mean=data.altitude_mean_meters.mean()
data.altitude_mean_meters=data.altitude_mean_meters.fillna(mean)

In [9]:
pd.DataFrame({'Count':data.isnull().sum(),'Percentage':data.isnull().sum()/data.shape[0]})

Unnamed: 0,Count,Percentage
Country.of.Origin,0,0.0
Number.of.Bags,0,0.0
Bag.Weight,0,0.0
Aroma,0,0.0
Flavor,0,0.0
Aftertaste,0,0.0
Acidity,0,0.0
Body,0,0.0
Balance,0,0.0
Uniformity,0,0.0


<div style="background-color:#FBFFE3">
    <div style="float:left">
        <img src='https://drive.google.com/uc?export=download&id=1gpc8kbwaenTpaIvnO_MQR3vw12rKYQrJ' width=20 height=20>
    </div>
    <span style="padding:20px;">The dataset has no null values now.</span>
</div>

In [10]:
print(*data.columns,sep=',')

Country.of.Origin,Number.of.Bags,Bag.Weight,Aroma,Flavor,Aftertaste,Acidity,Body,Balance,Uniformity,Clean.Cup,Sweetness,Cupper.Points,Total.Cup.Points,Moisture,altitude_mean_meters


<div style="background-color:#EDF9FF">
  <ul><li><h3>Selecting Features</h3></li></ul>
</div>

In [11]:
X=data[['Aroma','Flavor','Aftertaste','Acidity','Body','Balance','Uniformity','Clean.Cup',
        'Sweetness','Cupper.Points','Total.Cup.Points','Moisture']]
y=data.altitude_mean_meters

<div style="background-color:#EDF9FF">
  <ul><li><h3>Training And Testing Data</h3></li></ul>
</div>

In [12]:
X_train, X_test, y_train, y_test=train_test_split(X,y,random_state=2,test_size=0.3)

# Training Model

In [13]:
lreg=LinearRegression()
lreg.fit(X_train,y_train)
lpred=lreg.predict(X_test)


In [14]:
# MAPE (mean absolute percentage error), in percent
mape = np.mean(np.abs((y_test - lpred) / y_test)) * 100
print(mape)

1839.5565935584532


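<div style="background-color:#FBFFE3">
    <div style="float:left">
        <img src='https://drive.google.com/uc?export=download&id=1gpc8kbwaenTpaIvnO_MQR3vw12rKYQrJ' width=20 height=20>
    </div>
    <span style="padding:20px;">A MAPE of roughly 1840% means the predictions are off, on average, by about 18 times the actual value, so the error margin is very high. Because each error is divided by the actual value, a few very small actual values can inflate MAPE sharply.</span>
</div>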
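
To illustrate how sensitive MAPE is to small actual values, the sketch below applies the same absolute error of 100 to two hypothetical sets of targets. The arrays and the `mape` helper are illustrative assumptions and do not come from the coffee dataset.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

# Every prediction is off by exactly 100 units.
y_true_typical = np.array([1000.0, 1500.0, 2000.0])
y_pred_typical = y_true_typical + 100.0

# Same absolute errors, but one actual value is close to zero.
y_true_small = np.array([1.0, 1500.0, 2000.0])
y_pred_small = y_true_small + 100.0

mape_typical = mape(y_true_typical, y_pred_typical)
mape_small = mape(y_true_small, y_pred_small)

print("MAPE, typical targets  :", mape_typical)
print("MAPE, one tiny target  :", mape_small)
```

The absolute errors are identical in both cases, yet the second MAPE is dominated by the single row whose actual value is 1. Inspecting the smallest values of `y_test` is therefore a reasonable first check when MAPE looks implausibly large.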

In [15]:
# MSE
mean_squared_error(y_test,lpred)

1679073.419874591

In [16]:
# RMSE
np.sqrt(mean_squared_error(y_test,lpred))

1295.7906543398865
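
RMSE is expressed in the units of the target, so it is easier to judge next to a simple reference. One common choice, sketched here on hypothetical numbers, is a baseline that always predicts the training mean. The arrays, the `rmse` helper, and the `model_pred` values are illustrative assumptions.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Hypothetical targets and model predictions.
y_train_toy = np.array([1200.0, 1500.0, 1800.0, 1500.0])
y_test_toy = np.array([1400.0, 1600.0])
model_pred = np.array([1450.0, 1580.0])

# Baseline: predict the training mean for every test row.
baseline_pred = np.full_like(y_test_toy, y_train_toy.mean())

model_rmse = rmse(y_test_toy, model_pred)
baseline_rmse = rmse(y_test_toy, baseline_pred)

print("Model RMSE   :", model_rmse)
print("Baseline RMSE:", baseline_rmse)
```

A model whose RMSE is not clearly below the baseline's is adding little beyond the average of the target.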

In [17]:
# MSLE
mean_squared_log_error(y_test,lpred)

0.9752585397205423

In [18]:
# RMSLE
np.sqrt(mean_squared_log_error(y_test,lpred))

0.9875517909054402
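
RMSLE compares values on a logarithmic scale, so it measures relative rather than absolute error and penalizes under-prediction more than over-prediction of the same size. The sketch below shows this on hypothetical values. The `rmsle` helper follows the `log(1 + y)` definition that `mean_squared_log_error` uses, which also requires non-negative inputs.

```python
import numpy as np

def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error; inputs must be non-negative."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2))

actual = [1000.0]

# Both predictions miss by 500 units, in opposite directions.
under = rmsle(actual, [500.0])
over = rmsle(actual, [1500.0])

# Halving a much smaller target gives a similar RMSLE: the metric is relative.
under_small_scale = rmsle([100.0], [50.0])

print("Under-prediction (500 vs 1000) :", under)
print("Over-prediction (1500 vs 1000) :", over)
print("Under-prediction (50 vs 100)   :", under_small_scale)
```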

*__Conclusion__*: In this demonstration, you learned how to compute regression evaluation metrics (MAPE, MSE, RMSE, MSLE, and RMSLE) for a linear regression model.