# Models' predictions evaluation

**NOTEBOOK GOAL**: Evaluate the predicted NumberOfSales values.


Compared predictions:

- **1_RFR** - Notebook 5.3 Random forest
- **2_XGB** - Notebook 6.4 XGBoost
- **3_AVG** - Notebook 7.2 AVG Month ensembles
- **4_ENS_ALL** - Ensemble obtained by averaging all the previous models

In [1]:
from import_man import *
import collections

from BIP import get_BIP_error, get_BIP_error_, apply_BIP_submission_structure, apply_BIP_submission_structure_keep_actual

**NOTE** If you cannot load the following datasets, please go to the corresponding notebook and run it to generate the related dataset file.

In [2]:
test_53_RFR_on_prep = pd.read_csv('./dataset/test_m12_53_RFR_on_prep.csv')

In [3]:
test_64_Model_XGBoost = pd.read_csv('./dataset/test_m12_64_Model_XGBoost_final.csv')

In [4]:
test_72_AVG_Month_Ensemble = pd.read_csv('./dataset/test_m12_72_AVG_Month_Ensemble.csv')

In [5]:
dfs_dict = collections.OrderedDict()

# the following dataset will be evaluated
dfs_dict['RFR'] = test_53_RFR_on_prep
dfs_dict['XGB'] = test_64_Model_XGBoost
dfs_dict['AVG'] = test_72_AVG_Month_Ensemble

In [6]:
# apply the BIP_submission_structure_keep_actual to all the dataframes
for mdl_lbl, df in dfs_dict.items():
    dfs_dict[mdl_lbl] = apply_BIP_submission_structure_keep_actual(df)

### Let's combine models' predictions by their average

Ensemble the models by taking the mean of all their predictions.

In [7]:
# let's copy the first df in order to have a base dataframe
df_ens = list(dfs_dict.values())[0].copy()

df_ens['_NumberOfSales'] = 0

# sum up all the NumberOfSales
for mdl_lbl, df in dfs_dict.items():
    df_ens['_NumberOfSales'] += df['_NumberOfSales']
    
# divide by the number of models
df_ens['_NumberOfSales'] /= len(dfs_dict)

#dfs_dict['ENS_ALL'] = df_ens
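The averaging cell above adds the `_NumberOfSales` columns row by row, so it implicitly assumes that every dataframe lists the same stores and dates in the same order. As a minimal sketch of the same idea with that assumption made explicit, the following uses plain dictionaries keyed by `(StoreID, Date)` instead of dataframes. The helper name `average_predictions` and the toy values are hypothetical and are not part of the notebook.

```python
from collections import OrderedDict

# Hypothetical toy predictions keyed by (StoreID, Date); the values are made up.
predictions = OrderedDict()
predictions['RFR'] = {(1000, '01/01/2018'): 8200.0, (1000, '02/01/2018'): 6900.0}
predictions['XGB'] = {(1000, '01/01/2018'): 8000.0, (1000, '02/01/2018'): 6600.0}
predictions['AVG'] = {(1000, '01/01/2018'): 7800.0, (1000, '02/01/2018'): 6300.0}


def average_predictions(preds_by_model):
    """Return the per-key mean of all models' predictions.

    Raises ValueError if the models do not cover exactly the same keys,
    because a mean over mismatched rows would be meaningless.
    """
    models = list(preds_by_model.values())
    keys = set(models[0])
    for model in models[1:]:
        if set(model) != keys:
            raise ValueError("models cover different (StoreID, Date) keys")
    # Sum the predictions for each key, then divide by the number of models.
    return {k: sum(m[k] for m in models) / len(models) for k in models[0]}


ensemble = average_predictions(predictions)
print(ensemble)
```

Checking that the keys match before averaging is a design choice of this sketch: it turns a silent misalignment into an explicit error.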

## Evaluation metrics

1. **[EVS] Explained Variance Score**  [(reference)](http://scikit-learn.org/stable/modules/model_evaluation.html#explained-variance-score)

    The explained_variance_score function computes the explained variance regression score.
    
    $$ \texttt{explained\_variance}(y, \hat{y}) = 1 - \frac{Var\{ y - \hat{y}\}}{Var\{y\}} $$
    
    The best possible score is 1.0, lower values are worse.
    
2. **[MAE] Mean absolute error**  [(reference)](http://scikit-learn.org/stable/modules/model_evaluation.html#mean-absolute-error)

    The mean_absolute_error function computes mean absolute error, a risk metric corresponding to the expected value of the absolute error loss or l1-norm loss.
   
    $$ \text{MAE}(y, \hat{y}) = \frac{1}{n_{\text{samples}}} \sum_{i=0}^{n_{\text{samples}}-1} \left| y_i - \hat{y}_i \right| $$
    
3. **[MSE] Mean squared error**  [(reference)](http://scikit-learn.org/stable/modules/model_evaluation.html#mean-squared-error)

    The mean_squared_error function computes mean square error, a risk metric corresponding to the expected value of the squared (quadratic) error or loss.
    
    $$ \text{MSE}(y, \hat{y}) = \frac{1}{n_\text{samples}} \sum_{i=0}^{n_\text{samples} - 1} (y_i - \hat{y}_i)^2 $$
    
4. **[RMSE] Root mean squared error**

    $$ \text{RMSE}(y, \hat{y}) = \sqrt{\frac{1}{n_\text{samples}} \sum_{i=0}^{n_\text{samples} - 1} (y_i - \hat{y}_i)^2} $$
    
5. **[MSLE] Mean squared logarithmic error**  [(reference)](http://scikit-learn.org/stable/modules/model_evaluation.html#mean-squared-logarithmic-error)
    
    The mean_squared_log_error function computes a risk metric corresponding to the expected value of the squared logarithmic (quadratic) error or loss.
    
    $$ \text{MSLE}(y, \hat{y}) = \frac{1}{n_\text{samples}} \sum_{i=0}^{n_\text{samples} - 1} (\log_e (1 + y_i) - \log_e (1 + \hat{y}_i) )^2 $$
    
    Here log_e(x) denotes the natural logarithm of x. This metric is best suited to targets with exponential growth, such as population counts or average sales of a commodity over a span of years. Note that it penalizes an under-predicted estimate more heavily than an over-predicted one.

6. **[MedAE] Median absolute error**  [(reference)](http://scikit-learn.org/stable/modules/model_evaluation.html#median-absolute-error)

    The median_absolute_error is particularly interesting because it is robust to outliers. The loss is calculated by taking the median of all absolute differences between the target and the prediction.
    
    $$ \text{MedAE}(y, \hat{y}) = \text{median}(\mid y_1 - \hat{y}_1 \mid, \ldots, \mid y_n - \hat{y}_n \mid)$$
    
7. **[R²] R² score, the coefficient of determination**  [(reference)](http://scikit-learn.org/stable/modules/model_evaluation.html#r2-score-the-coefficient-of-determination)

    The r2_score function computes R², the coefficient of determination. It provides a measure of how well future samples are likely to be predicted by the model. The best possible score is 1.0, and the score can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an R² score of 0.0.
    
    $$ R^2(y, \hat{y}) = 1 - \frac{\sum_{i=0}^{n_{\text{samples}} - 1} (y_i - \hat{y}_i)^2}{\sum_{i=0}^{n_\text{samples} - 1} (y_i - \bar{y})^2}$$
    
    where
    
    $$ \bar{y} =  \frac{1}{n_{\text{samples}}} \sum_{i=0}^{n_{\text{samples}} - 1} y_i $$

In [8]:
from sklearn import metrics

from math import sqrt

def print_regr_stats(df):
    y_true = df['NumberOfSales']
    y_pred = df['_NumberOfSales']
    
    stats = collections.OrderedDict()
    stats['EVS']   = metrics.explained_variance_score(y_true, y_pred)
    stats['MAE']   = metrics.mean_absolute_error(y_true, y_pred)
    stats['MSE']   = metrics.mean_squared_error(y_true, y_pred)
    stats['RMSE']  = sqrt(stats['MSE'])
    stats['MSLE']  = metrics.mean_squared_log_error(y_true, y_pred)
    stats['MedAE'] = metrics.median_absolute_error(y_true, y_pred)
    stats['R2']    = metrics.r2_score(y_true, y_pred)
    
    print("Explained Variance Score               [EVS] ", stats['EVS'])
    print("Mean absolute error                    [MAE] ", stats['MAE'])
    print("Mean squared error                     [MSE] ", stats['MSE'])
    print("Root mean squared error               [RMSE] ", stats['RMSE'])
    print("Mean squared logarithmic error        [MSLE] ", stats['MSLE'])
    print("Median absolute error (reference)    [MedAE] ", stats['MedAE'])
    print("R²score, coefficient of determination   [R²] ", stats['R2'])
    
    return stats


In [9]:
# explicit import, in case import_man does not re-export pprint
from pprint import pprint

models_stats = collections.OrderedDict()
models_stats['EVS']   = []
models_stats['MAE']   = []
models_stats['MSE']   = []
models_stats['RMSE']  = []
models_stats['MSLE']  = []
models_stats['MedAE'] = []
models_stats['R2']    = []


for mdl_lbl, df in dfs_dict.items():
    print('................................ ' + mdl_lbl + '................................')
    stats = print_regr_stats(df)
    
    # add the model statistics to the main index
    for name, val in stats.items():
        models_stats[name].append((mdl_lbl, val))
        
    print('\n')
    
    
print("\n\nSTATISTICS COMPARISON\n")

for name, l in models_stats.items():
    print(name)
    pprint(list(enumerate(sorted(l, key=lambda tup: tup[1]))))


................................ RFR................................
Explained Variance Score               [EVS]  0.9516157191345931
Mean absolute error                    [MAE]  8403.387817765266
Mean squared error                     [MSE]  130439386.0449889
Root mean squared error               [RMSE]  11421.006349923324
Mean squared logarithmic error        [MSLE]  0.00879432389306739
Median absolute error (reference)    [MedAE]  6596.779882267692
R²score, coefficient of determination   [R²]  0.9266278413635942


................................ XGB................................
Explained Variance Score               [EVS]  0.9554457225986508
Mean absolute error                    [MAE]  6066.938750547397
Mean squared error                     [MSE]  79212072.19523914
Root mean squared error               [RMSE]  8900.116414701502
Mean squared logarithmic error        [MSLE]  0.005160802533279368
Median absolute error (reference)    [MedAE]  4035.7363999999943
R²score, coefficie
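
The comparison loop sorts every metric in ascending order, so the best model comes first for the error metrics (MAE, MSE, RMSE, MSLE, MedAE) but last for EVS and R², where higher is better. The following is a minimal sketch of a direction-aware ranking. The helper `rank_models` and the set `HIGHER_IS_BETTER` are hypothetical names, and the values are rounded from the RFR and XGB output above.

```python
# Metrics for which a larger value means a better model.
HIGHER_IS_BETTER = {'EVS', 'R2'}


def rank_models(metric_name, scores):
    """Sort (model_label, value) pairs so that the best model comes first."""
    return sorted(scores,
                  key=lambda pair: pair[1],
                  reverse=metric_name in HIGHER_IS_BETTER)


# Values rounded from the RFR and XGB output printed above.
models_stats = {
    'EVS': [('RFR', 0.9516), ('XGB', 0.9554)],
    'MAE': [('RFR', 8403.39), ('XGB', 6066.94)],
    'RMSE': [('RFR', 11421.01), ('XGB', 8900.12)],
}

for name, scores in models_stats.items():
    print(name, rank_models(name, scores))
```

With this ordering, position 0 always holds the best model for each metric, which makes the per-metric lists easier to compare at a glance.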

## BIP error

In [11]:
#for mdl_lbl, df in dfs_dict.items():
#    print('................................ ' + mdl_lbl + '................................')
#    get_BIP_error(df)
#    get_BIP_error_(df)

In [1]:

import pandas as pd
df = pd.read_csv('./dataset/train.csv')[['StoreID', 'Region']]

In [2]:
df.head()

Unnamed: 0,StoreID,Region
0,1000,7
1,1000,7
2,1000,7
3,1000,7
4,1000,7


In [3]:
df = df[['StoreID', 'Region']]

In [4]:
df.head()

Unnamed: 0,StoreID,Region
0,1000,7
1,1000,7
2,1000,7
3,1000,7
4,1000,7


In [1]:
import pandas as pd
from BIP import get_BIP_error
test_64_Model_XGBoost = pd.read_csv('./dataset/test_m12_64_Model_XGBoost_final.csv')

df = test_64_Model_XGBoost

In [2]:
df.head()

Unnamed: 0,StoreID,Date,IsHoliday,HasPromotions,NearestCompetitor,Region,NumberOfSales,Region_AreaKM2,Region_GDP,Region_PopulationK,...,p4,p5,p6,p7,p8,p9,p10,p11,p12,_NumberOfSales
0,1000,01/01/2018,0,0,326,7,8540,9643,17130,2770,...,-0.000163,-0.002012,0.002307,0.001049,-0.001226,0.000372,9.5e-05,0.000836,-0.00017,8007.3823
1,1000,02/01/2018,0,0,326,7,10364,9643,17130,2770,...,-0.000491,-0.00207,0.002708,-0.000669,-0.001426,-0.000215,-0.00054,0.000367,-0.000163,6596.4365
2,1000,03/01/2018,0,0,326,7,4676,9643,17130,2770,...,-0.000582,-0.002125,0.003129,-0.000895,-0.001529,-0.000557,3.1e-05,-0.000107,-3.7e-05,5140.5054
3,1000,05/01/2018,0,0,326,7,6267,9643,17130,2770,...,-0.000389,0.002572,-0.000628,-0.000278,-0.001221,0.00054,0.000499,-8.7e-05,-0.000236,6469.892
4,1000,06/01/2018,0,0,326,7,5953,9643,17130,2770,...,-0.000154,0.000616,0.003896,0.0002,-0.001384,-8.7e-05,-9.1e-05,8.5e-05,0.000273,5815.529


In [3]:
get_BIP_error(df)

BIP total error: 0.04913953326583034


0.04913953326583034