<b><font size="6">01. Model Evaluation</font></b><br><br>
We can define **Model Evaluation** as the process of measuring how well a model performs a given task. It is carried out by checking the performance of one or more predictive models on a validation or test dataset, that is, on data that was not used to train them.
<br>
# <font color='#BFD72F'>Contents</font> <a class="anchor" id="toc"></a>

* [0 - Import Libraries and Data](#import)<br>
* [1 - Data Partition Techniques](#datapart)<br>
    * [1.1. The Hold-Out Method](#1st-bullet)<br>
    * [1.2. The K-Fold Cross Validation](#2nd-bullet)<br>
    * [1.3. Leave One Out](#4th-bullet)<br>
    * [1.4. Stratified k-fold and others](#5th-bullet)<br>
* [2 - Compare Models](#6th-bullet)

<br>

### <font color='#BFD72F'> Supervised versus Unsupervised Methods</font>
<font size="1">Daniel T. Larose, Chantal D. Larose (2015) “Data Mining and Predictive Analytics,” 2nd Edition, Wiley (pp. 160-161)</font>

Data mining methods may be categorized as either supervised or unsupervised. In **unsupervised methods**, no target variable is identified as such. Instead, the data mining algorithm searches for patterns and structures among all the variables. The most common unsupervised data mining method is clustering.

Most data mining methods are **supervised methods**, however, meaning that there is a particular prespecified target variable, and that the algorithm is given many examples where the value of the target variable is provided, so that the algorithm may learn which values of the target variable are associated with which values of the predictor variables. For example, regression methods are supervised methods.

<img src="images/sup_unsup.png" width="600px">

If you wish to explore this difference further, check this <a href="https://www.ibm.com/cloud/blog/supervised-vs-unsupervised-learning"> link</a>.
<br><br><br>

<img src="images/semma.png">

# 0. Importing Libraries and Data <a class="anchor" id="import"></a>
[Back to Contents](#toc)

__`Step 1`__ Import the needed libraries.

In [1]:
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')

__`Step 2`__ Read the dataset __diabetes.csv__

In [2]:
diabetes = pd.read_csv(r'data/diabetes.csv')
diabetes.head(5)

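| Unnamed: 0 | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |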


<hr>

### `Diabetes dataset`
(Source: UCI Machine Learning Repository)

`INPUT VARIABLES`: numeric <br>
`OUTPUT VARIABLE`: categorical <br>

__GOAL__: predict whether or not a patient has diabetes, based on certain diagnostic measurements included in the dataset.

`Pregnancies` Number of times pregnant <br>
`Glucose` Plasma glucose concentration at 2 hours in an oral glucose tolerance test<br>
`BloodPressure` Diastolic blood pressure (mm Hg)<br>
`SkinThickness` Triceps skin fold thickness (mm)<br>
`Insulin` 2-hour serum insulin (mu U/ml)<br>
`BMI` Body mass index (weight in kg/(height in m)^2)<br>
`DiabetesPedigreeFunction` Diabetes pedigree function<br>
`Age` Age (years)<br>
`Outcome` Class variable (0 or 1); 268 of the 768 observations are 1, the others are 0<br>

<hr>

__`Step 3`__ Create an object named __data__ that will contain your independent variables and another object named __target__ that will contain your dependent variable / target (the last column in the dataset)

In [3]:
data = diabetes.iloc[:,:-1]
target = diabetes.iloc[:,-1]

# 1. Data partition <a class="anchor" id="datapart"></a>
[Back to Contents](#toc)
<br><br>
* [1.1. The Hold-Out Method](#1st-bullet)<br>
* [1.2. The K-Fold Cross Validation](#2nd-bullet)<br>
* [1.3. Leave One Out](#4th-bullet)<br>
* [1.4. Stratified k-fold and others](#5th-bullet)<br>

<a class="anchor" id="1st-bullet">

## 1.1. The Hold-Out Method
    
</a>

In this approach we randomly split the complete dataset into **training** and **test** sets, typically in a 70:30 or 80:20 ratio. The model is trained on the training set and the test set is used for validation. With limited data there is a risk of a biased result: the model never sees the held-out observations, so it may miss information they contain, and the performance estimate depends heavily on which observations happen to land in each set. If the dataset is large and the training and test samples follow the same distribution, this approach is acceptable. <br>

<img src="images/hold_out.jpg" width="400px" />

`sklearn` provides a function named `train_test_split` that splits a dataset into two different datasets.

__`Step 4`__ Import the function `train_test_split` from `sklearn.model_selection`

In [4]:
from sklearn.model_selection import train_test_split

__`Step 5`__ Divide the `data` into `X_train_val` and `X_test`, the `target` into `y_train_val` and `y_test`, and define the following arguments: `test_size = 0.2`, `random_state = 15`, `shuffle = True` and `stratify = target`  _(written for you)_

In [5]:
X_train_val, X_test, y_train_val, y_test = train_test_split(data, 
                                                    target, 
                                                    test_size=0.2, 
                                                    random_state=15, 
                                                    shuffle=True, 
                                                    stratify=target
                                                   )

This will allow me to create two different datasets, one for train (80% of the data) and one for test (20% of the data). <br>
The stratification will allow me to have the same proportion of each label of the dependent variable in both datasets.
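
The effect of `stratify` can be checked on a small example. The sketch below uses synthetic labels (an assumption made for illustration, not the diabetes data) and the same arguments as the step above, once with and once without stratification.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels, for illustration only: 25 positives and 75 negatives
y_demo = np.array([1] * 25 + [0] * 75)
X_demo = np.arange(100).reshape(-1, 1)  # dummy feature column

# Same arguments as in the step above, with stratification
_, _, y_tr_s, y_te_s = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=15, shuffle=True, stratify=y_demo
)
# Same split without stratification, for comparison
_, _, y_tr_r, y_te_r = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=15, shuffle=True
)

# Share of positives in each subset
print('stratified -> train:', y_tr_s.mean(), '| test:', y_te_s.mean())
print('random     -> train:', y_tr_r.mean(), '| test:', y_te_r.mean())
```

With stratification both subsets keep the 25% share of positives; without it the shares are left to chance and may drift apart, especially on small datasets.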


### How to create the three datasets: train, validation and test?

In this exercise, we are going to split our dataset into **train**, **test** and **validation**. <br> <br>

<img src="images/hold_out_3.png" width=500 />

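To create three datasets (train, validation and test) we are going to use the function `train_test_split` twice. <br><br>
The first call (Step 5) already created two sets of datasets, one for test (`X_test` and `y_test`) and another one that includes the data for training and validation (`X_train_val` and `y_train_val`). The second call splits the latter into train and validation. Since it holds 80% of the data, `test_size = 0.25` gives a validation set with 0.25 × 0.8 = 20% of the original data.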
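
The `test_size` needed in the second call follows from simple arithmetic. A minimal sketch, where the helper name `second_split_size` is hypothetical and not part of `sklearn`:

```python
def second_split_size(test_fraction, val_fraction):
    """Return the test_size to pass to the second train_test_split call.

    Both arguments are fractions of the ORIGINAL dataset. After the first
    split only (1 - test_fraction) of the data remains, so the validation
    share has to be rescaled to that remainder.
    """
    remaining = 1.0 - test_fraction
    return val_fraction / remaining

# 20% test and 20% validation of the original data
print(round(second_split_size(0.2, 0.2), 4))  # 0.25
```

This is why a 60/20/20 partition uses `test_size = 0.2` in the first call and `test_size = 0.25` in the second.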

__`Step 6`__  Divide the `X_train_val` into `X_train` and `X_val`, the `y_train_val` into `y_train` and `y_val`, and define the following arguments: `test_size = 0.25`, `random_state = 15`, `shuffle = True` and `stratify = y_train_val`.

In [6]:
X_train, X_val, y_train, y_val = train_test_split(X_train_val,
                                                  y_train_val,
                                                  test_size = 0.25,
                                                  random_state = 15,
                                                  shuffle=True,
                                                  stratify=y_train_val
                                                 )

__`Step 7`__ Check the proportion of data for each dataset. _(written for you)_

In [7]:
# Report each subset's share of the full dataset as a percentage
print('train:{:.0%} | validation:{:.0%} | test:{:.0%}'.format(len(y_train)/len(target),
                                                              len(y_val)/len(target),
                                                              len(y_test)/len(target)
                                                             ))

train:60% | validation:20% | test:20%


Now we have three different datasets, namely:
- Training dataset, with 60% of the data, that will allow me to build the model;
- Validation dataset, with 20% of the data, that will allow me to fine tune the model and check some problems like overfitting;
- Test dataset, with 20% of the data, that will allow me to evaluate the performance of the final model.

__`Step 8`__ Now we are going to train and validate a model (Logistic Regression) using the created datasets. Start by importing __LogisticRegression__ from __sklearn.linear_model__.

In [8]:
from sklearn.linear_model import LogisticRegression

__`Step 9`__ Create an instance of `LogisticRegression` named __log_model__ with the default parameters and `fit` it to your train data.

In [9]:
log_model = LogisticRegression()
log_model.fit(X_train, y_train)

LogisticRegression()

__`Step 10`__ Check the performance of your model using the method `.score()`.

In [10]:
print('Train:', log_model.score(X_train, y_train))
print('Validation:', log_model.score(X_val, y_val))
print('Test:', log_model.score(X_test, y_test))

Train: 0.7934782608695652
Validation: 0.7597402597402597
Test: 0.7597402597402597



<a class="anchor" id="2nd-bullet">

## 1.2. K-Fold Cross-Validation
    
</a>

The techniques covered from this step onwards are commonly used in applied machine learning to compare and select a model for a given predictive modeling problem.

__Definition__<br>
_Divide the total dataset into k mutually exclusive subsets of (approximately) the same size. Each subset is used once as the test set, while the remaining k-1 subsets are used for training. The process is repeated k times, alternating the test subset._

<img src="images/kfold.png" width=400 />
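
To make the definition concrete, the sketch below prints the index sets that `KFold` produces on ten toy observations (illustrative data, not the diabetes dataset).

```python
import numpy as np
from sklearn.model_selection import KFold

X_toy = np.arange(10).reshape(-1, 1)  # 10 toy observations
kf_toy = KFold(n_splits=5)            # no shuffling: folds are consecutive blocks

# Each split is a pair (training indices, test indices)
folds = list(kf_toy.split(X_toy))
for i, (train_idx, test_idx) in enumerate(folds):
    print(f'fold {i}: test={test_idx.tolist()} train={train_idx.tolist()}')
```

Every observation appears in exactly one test fold, and the training indices of a fold are all the remaining observations.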

In the following cases, we are going to check the performance of a Logistic Regression using those different techniques.

__`Step 11`__ Import __KFold__ from __sklearn.model_selection__

In [11]:
from sklearn.model_selection import KFold

__`Step 12`__ Create a function named __avg_score__ that prints the train and test score of each split, followed by the average score (and its standard deviation) for the train and the test sets, by applying a model, in this case, Logistic Regression. Its parameters are the technique you are going to use, the model, your independent variables and your dependent variable.

In [12]:
def avg_score(method, mod, X, y):
    """Print the train/test score of `mod` on every split of `method`, then the mean and std."""
    score_train = []
    score_test = []

    # y is passed as well so that stratified splitters can be used;
    # KFold and LeaveOneOut simply ignore it.
    for train_index, test_index in method.split(X, y):
        # Select the rows of the current split by position
        X_train, X_test = X.iloc[train_index], X.iloc[test_index]
        y_train, y_test = y.iloc[train_index], y.iloc[test_index]
        # Fit the model on this split's training rows only
        model = mod.fit(X_train, y_train)
        value_train = model.score(X_train, y_train)
        print('Train:', value_train)
        value_test = model.score(X_test, y_test)
        print('Test:', value_test)
        print('')
        score_train.append(value_train)
        score_test.append(value_test)

    # Summary over all splits: mean +/- standard deviation
    print('-------------------------------')
    print('Average Train:' +  str(round(np.mean(score_train),4)) + '+/-' + str(round(np.std(score_train),4)))
    print('Average Test:' +  str(round(np.mean(score_test),4)) + '+/-' + str(round(np.std(score_test),4)))

__`Step 13`__ Create a KFold Instance where the number of splits is 10 (*n_splits*) and name it as __kf__

In [13]:
kf = KFold(n_splits=10)

__`Step 14`__ Call the function __avg_score__ and check the average score for the train and the test sets using __kf__

In [14]:
avg_score(kf, LogisticRegression(), data, target)

Train: 0.7988422575976846
Test: 0.7012987012987013

Train: 0.7742402315484804
Test: 0.8441558441558441

Train: 0.7785817655571635
Test: 0.7532467532467533

Train: 0.788712011577424
Test: 0.6883116883116883

Train: 0.7829232995658466
Test: 0.7922077922077922

Train: 0.7901591895803184
Test: 0.7402597402597403

Train: 0.768451519536903
Test: 0.8571428571428571

Train: 0.7742402315484804
Test: 0.8181818181818182

Train: 0.7933526011560693
Test: 0.7368421052631579

Train: 0.7803468208092486
Test: 0.8026315789473685

-------------------------------
Average Train:0.783+/-0.0091
Average Test:0.7734+/-0.0552
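
The same per-fold loop is available in scikit-learn as `cross_validate`. The sketch below is one possible equivalent, run on synthetic data from `make_classification` (an assumption, so that it is self-contained); the numbers it prints are therefore not comparable to the ones above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_validate

# Synthetic stand-in for the notebook's data (illustrative only)
X_syn, y_syn = make_classification(n_samples=200, n_features=8, random_state=15)

cv_results = cross_validate(
    LogisticRegression(max_iter=1000),
    X_syn, y_syn,
    cv=KFold(n_splits=10),      # same splitter as in the step above
    return_train_score=True,    # train scores are not returned by default
)

# One score per fold, summarised as mean +/- standard deviation
print('Average Train: {:.4f} +/- {:.4f}'.format(
    cv_results['train_score'].mean(), cv_results['train_score'].std()))
print('Average Test: {:.4f} +/- {:.4f}'.format(
    cv_results['test_score'].mean(), cv_results['test_score'].std()))
```

A hand-written loop is useful for learning what happens in each fold; the built-in function is shorter and returns the scores as arrays instead of printing them.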


<a class="anchor" id="4th-bullet">

## 1.3. Leave One Out
    
</a>

__Definition__<br>
_We divide the dataset into two parts. One part holds a single observation, which is our test data, and the other part holds all the other observations, which form our training data.
If the dataset has n observations, the training data contains n-1 observations and the test data contains 1 observation. The procedure is repeated n times, so that each observation is used once as test data._
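
A small sketch of what this implies in practice, on synthetic data (an assumption, used to keep the example fast and self-contained): the model is fitted once per observation, and because each test set holds a single observation, every per-split accuracy is either 0 or 1.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

# 40 synthetic observations (illustrative only)
X_small, y_small = make_classification(n_samples=40, n_features=5, random_state=15)

# One fit per observation: 40 fits in total
loo_scores = cross_val_score(
    LogisticRegression(max_iter=1000), X_small, y_small, cv=LeaveOneOut()
)

print('number of fits:', len(loo_scores))
print('distinct per-split scores:', np.unique(loo_scores))
print('Leave One Out accuracy estimate:', loo_scores.mean())
```

The individual scores carry little information on their own; it is their mean that estimates the accuracy. The cost grows with the number of observations, since n models have to be trained.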

__`Step 15`__ Repeat the steps you applied for the previous technique, but this time using Leave One Out. For that, you need to import __LeaveOneOut__ from __sklearn.model_selection__

In [15]:
from sklearn.model_selection import LeaveOneOut
loo = LeaveOneOut()
avg_score(loo, LogisticRegression(), data, target)

Train: 0.7770534550195567
Test: 1.0

Train: 0.7796610169491526
Test: 1.0

Train: 0.7757496740547588
Test: 1.0

[... output truncated: one Train/Test pair is printed for each of the 768 left-out observations ...]

<a class="anchor" id="5th-bullet">

## 1.4. Stratified k-fold and others
    
</a>

`sklearn.model_selection` offers several other options to partition your data for model selection, and their application is similar to the cases we saw previously.

<img src="images/model_selection.png" alt="Drawing" style="width: 800px;"/> <br>
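
As an example of one of these options, the sketch below contrasts `StratifiedKFold` with plain `KFold` on synthetic, deliberately sorted labels (an assumption chosen to make the difference visible). Note that `StratifiedKFold.split` needs the labels as its second argument.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# 80 negatives followed by 20 positives, sorted on purpose (illustrative only)
y_imb = np.array([0] * 80 + [1] * 20)
X_imb = np.zeros((100, 1))  # dummy feature column

# Share of positives in the test part of each fold
skf = StratifiedKFold(n_splits=5)
strat_rates = [float(y_imb[test_idx].mean()) for _, test_idx in skf.split(X_imb, y_imb)]
plain_rates = [float(y_imb[test_idx].mean()) for _, test_idx in KFold(n_splits=5).split(X_imb)]

print('StratifiedKFold:', strat_rates)
print('KFold          :', plain_rates)
```

With stratification every fold keeps the overall 20% share of positives. Plain `KFold` without shuffling cuts the sorted data into consecutive blocks, so four folds contain no positives at all and the last one contains only positives.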

<a class="anchor" id="6th-bullet">

# 2. Comparing models
    
</a>

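Don't forget that the purpose of this notebook is to compare different models. In this step, you are going to fit a decision tree to the same data and evaluate it with the same 10-fold __KFold__ procedure, so that its performance can be compared with that of the Logistic Regression. (__RepeatedKFold__, which repeats the K-fold procedure with different random splits, could be used in the same way.)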

__`Step 16`__ Import __DecisionTreeClassifier__ from __sklearn.tree__

In [16]:
from sklearn.tree import DecisionTreeClassifier

__`Step 17`__ Similarly to step 14, call the function __avg_score__ to print the average score for the train and the test sets, but this time check the results for a decision tree.

In [17]:
avg_score(KFold(n_splits=10), DecisionTreeClassifier(max_depth = 5), data, target)

Train: 0.8422575976845152
Test: 0.6493506493506493

Train: 0.8364688856729378
Test: 0.7792207792207793

Train: 0.8263386396526773
Test: 0.7012987012987013

Train: 0.8480463096960926
Test: 0.5844155844155844

Train: 0.8379160636758322
Test: 0.8051948051948052

Train: 0.8364688856729378
Test: 0.8311688311688312

Train: 0.8306801736613604
Test: 0.8311688311688312

Train: 0.8335745296671491
Test: 0.8311688311688312

Train: 0.8395953757225434
Test: 0.7105263157894737

Train: 0.8236994219653179
Test: 0.7631578947368421

-------------------------------
Average Train:0.8355+/-0.0069
Average Test:0.7487+/-0.0808


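We can verify that the decision tree is more prone to overfitting: its average train score (0.8355) is well above its average test score (0.7487), a larger gap than the one observed for the Logistic Regression (0.783 versus 0.7734). In notebook 4, we are going to address some techniques to reduce this overfitting.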
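
A comparison like this one can also be run with `RepeatedKFold`, which repeats K-fold several times with different random partitions and so gives a more stable average. The sketch below is one way to do it, on synthetic data (an assumption); the `candidates` dictionary and the number of repeats are illustrative choices, not part of the exercise.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedKFold, cross_validate
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the notebook's data (illustrative only)
X_cmp, y_cmp = make_classification(n_samples=300, n_features=8, random_state=15)

# 10 folds repeated 3 times: 30 train/test splits per model
rkf = RepeatedKFold(n_splits=10, n_repeats=3, random_state=15)

candidates = {
    'LogisticRegression': LogisticRegression(max_iter=1000),
    'DecisionTree(max_depth=5)': DecisionTreeClassifier(max_depth=5, random_state=15),
}

summary = {}
for name, model in candidates.items():
    res = cross_validate(model, X_cmp, y_cmp, cv=rkf, return_train_score=True)
    summary[name] = (res['train_score'].mean(), res['test_score'].mean())
    print('{}: train={:.4f} | test={:.4f}'.format(name, *summary[name]))
```

Because both models are scored on the same splits (the splitter has a fixed `random_state`), differences in the averages reflect the models rather than the partition. A large gap between the train and test averages is the usual sign of overfitting.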