# Models' predictions evaluation

**NOTEBOOK GOAL**: Evaluate the predicted NumberOfSales values.


Compared predictions:

- **1_RFR** - Notebook 5.3 Random forest
- **2_XGB** - Notebook 6.4 XGBoost
- **3_AVG** - Notebook 7.0 AVG Monthly average
- **4_ENS_AVG** - Notebook 8.2 Ensemble by averaging all the previous models

In [44]:
from import_man import *
import collections

from BIP import get_BIP_error, apply_BIP_submission_format

### Load predicted tests

**NOTE** If you cannot load the following datasets, go to the corresponding notebook and run it to generate the related dataset file.

In [45]:
dfs_dict = collections.OrderedDict()
# the following dataset will be evaluated

In [46]:
#dfs_dict['RFR'] = pd.read_csv('./dataset/test_m12_53_RFR_on_prep.csv')

In [47]:
dfs_dict['RFR'] = pd.read_csv('./dataset/testsplit_53_Model_RFR_on_prep.csv')

In [48]:
#dfs_dict['XGB'] = pd.read_csv('./dataset/test_m12_64_Model_XGBoost_final.csv')

In [49]:
dfs_dict['XGB'] = pd.read_csv('./dataset/testsplit_63_Model_XGB.csv')

In [50]:
#dfs_dict['AVG'] = pd.read_csv('./dataset/test_m12_70_Model_monthly_average.csv')

In [51]:
#dfs_dict['AVG'] = pd.read_csv('./dataset/test4_81_AVG_RFR_XGB.csv')

In [52]:
# apply the apply_BIP_submission_format to all the dataframes
for mdl_lbl, df in dfs_dict.items():
    dfs_dict[mdl_lbl] = apply_BIP_submission_format(df)

In [53]:
# The following dataset is already in the BIP submission format
#dfs_dict['ENS_AVG'] = pd.read_csv('./dataset/test_m12_82_Ensemble_average.csv')

## Evaluation metrics

1. **[EVS] Explained Variance Score**  [(reference)](http://scikit-learn.org/stable/modules/model_evaluation.html#explained-variance-score)

    The explained_variance_score function computes the explained variance regression score: the fraction of the variance of the target that is accounted for by the predictions.
    
    $$ \text{EVS}(y, \hat{y}) = 1 - \frac{\mathrm{Var}\{ y - \hat{y}\}}{\mathrm{Var}\{y\}} $$
    
    The best possible score is 1.0; lower values are worse.
    
2. **[MAE] Mean absolute error**  [(reference)](http://scikit-learn.org/stable/modules/model_evaluation.html#mean-absolute-error)

    The mean_absolute_error function computes mean absolute error, a risk metric corresponding to the expected value of the absolute error loss or l1-norm loss.
   
    $$ \text{MAE}(y, \hat{y}) = \frac{1}{n_{\text{samples}}} \sum_{i=0}^{n_{\text{samples}}-1} \left| y_i - \hat{y}_i \right| $$
    
3. **[MSE] Mean squared error**  [(reference)](http://scikit-learn.org/stable/modules/model_evaluation.html#mean-squared-error)

    The mean_squared_error function computes mean square error, a risk metric corresponding to the expected value of the squared (quadratic) error or loss.
    
    $$ \text{MSE}(y, \hat{y}) = \frac{1}{n_\text{samples}} \sum_{i=0}^{n_\text{samples} - 1} (y_i - \hat{y}_i)^2 $$
    
4. **[RMSE] Root mean squared error**

    $$ \text{RMSE}(y, \hat{y}) = \sqrt{\frac{1}{n_\text{samples}} \sum_{i=0}^{n_\text{samples} - 1} (y_i - \hat{y}_i)^2} $$
    
5. **[MSLE] Mean squared logarithmic error**  [(reference)](http://scikit-learn.org/stable/modules/model_evaluation.html#mean-squared-logarithmic-error)
    
    The mean_squared_log_error function computes a risk metric corresponding to the expected value of the squared logarithmic (quadratic) error or loss.
    
    $$ \text{MSLE}(y, \hat{y}) = \frac{1}{n_\text{samples}} \sum_{i=0}^{n_\text{samples} - 1} (\log_e (1 + y_i) - \log_e (1 + \hat{y}_i) )^2 $$
    
    Here $\log_e (x)$ is the natural logarithm of x. This metric is best suited to targets with exponential growth, such as population counts or the average sales of a commodity over a span of years. Note that it penalizes an under-predicted estimate more than an over-predicted one.

6. **[MedAE] Median absolute error**  [(reference)](http://scikit-learn.org/stable/modules/model_evaluation.html#median-absolute-error)

    The median_absolute_error is particularly interesting because it is robust to outliers. The loss is calculated by taking the median of all absolute differences between the target and the prediction.
    
    $$ \text{MedAE}(y, \hat{y}) = \text{median}(\mid y_1 - \hat{y}_1 \mid, \ldots, \mid y_n - \hat{y}_n \mid)$$
    
7. **[R²] R² score, the coefficient of determination**  [(reference)](http://scikit-learn.org/stable/modules/model_evaluation.html#r2-score-the-coefficient-of-determination)

    The r2_score function computes R², the coefficient of determination. It provides a measure of how well future samples are likely to be predicted by the model. The best possible score is 1.0, and the score can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an R² score of 0.0.
    
    $$ R^2(y, \hat{y}) = 1 - \frac{\sum_{i=0}^{n_{\text{samples}} - 1} (y_i - \hat{y}_i)^2}{\sum_{i=0}^{n_\text{samples} - 1} (y_i - \bar{y})^2}$$
    
    where
    
    $$ \bar{y} =  \frac{1}{n_{\text{samples}}} \sum_{i=0}^{n_{\text{samples}} - 1} y_i $$
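The formulas above can be checked by hand. The following is a minimal sketch in plain Python (standard library only) that mirrors the definitions term by term; it is not the scikit-learn implementation, and the helper name `regression_metrics` and the toy data are assumptions made for illustration. The toy predictions are off by a constant, which shows how EVS and R² differ: EVS only looks at the variance of the residuals, so a constant offset does not lower it, while R² does penalize it. The last lines illustrate the MSLE asymmetry on a single sample.

```python
import math
import statistics


def regression_metrics(y_true, y_pred):
    """Compute the seven metrics directly from their formulas."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    y_mean = statistics.fmean(y_true)

    mae = sum(abs(r) for r in residuals) / n
    mse = sum(r * r for r in residuals) / n
    # log1p(x) = log_e(1 + x), as in the MSLE formula.
    msle = sum((math.log1p(t) - math.log1p(p)) ** 2
               for t, p in zip(y_true, y_pred)) / n
    # Population variance (divide by n) of residuals and of the target.
    evs = 1 - statistics.pvariance(residuals) / statistics.pvariance(y_true)
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)

    return {
        'EVS': evs,
        'MAE': mae,
        'MSE': mse,
        'RMSE': math.sqrt(mse),
        'MSLE': msle,
        'MedAE': statistics.median(abs(r) for r in residuals),
        'R2': 1 - ss_res / ss_tot,
    }


# Toy data (hypothetical): every prediction is exactly 1 too high.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [4.0, 6.0, 8.0, 10.0]
m = regression_metrics(y_true, y_pred)
print(m)

# MSLE asymmetry: the same absolute miss of 50 on a target of 100.
under = (math.log1p(100.0) - math.log1p(50.0)) ** 2   # prediction too low
over = (math.log1p(100.0) - math.log1p(150.0)) ** 2   # prediction too high
print(under > over)
```

On the toy data the residuals are all equal, so EVS is 1.0 while R² is 0.8; MAE, MSE, RMSE and MedAE are all 1.0.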

In [54]:
list(dfs_dict.values())[0].head()

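| Unnamed: 0 | StoreID | Month | Target | NumberOfSales |
|------------|---------|-------|--------|---------------|
| 0 | 1000 | 1 | 15264 | 15520.579617 |
| 1 | 1000 | 2 | 8981 | 9840.734749 |
| 2 | 1000 | 3 | 21904 | 23323.335985 |
| 3 | 1000 | 4 | 24757 | 22331.130159 |
| 4 | 1000 | 6 | 51534 | 49906.908483 |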


In [55]:
from sklearn import metrics

from math import sqrt

def print_regr_stats(df):
    y_true = df['Target']
    y_pred = df['NumberOfSales']
    
    stats = collections.OrderedDict()
    stats['EVS']   = metrics.explained_variance_score(y_true, y_pred)
    stats['MAE']   = metrics.mean_absolute_error(y_true, y_pred)
    stats['MSE']   = metrics.mean_squared_error(y_true, y_pred)
    stats['RMSE']  = sqrt(stats['MSE'])
    stats['MSLE']  = metrics.mean_squared_log_error(y_true, y_pred)
    stats['MedAE'] = metrics.median_absolute_error(y_true, y_pred)
    stats['R2']    = metrics.r2_score(y_true, y_pred)
    
    print("Explained Variance Score               [EVS] ", stats['EVS'])
    print("Mean absolute error                    [MAE] ", stats['MAE'])
    print("Mean squared error                     [MSE] ", stats['MSE'])
    print("Root mean squared error               [RMSE] ", stats['RMSE'])
    print("Mean squared logarithmic error        [MSLE] ", stats['MSLE'])
    print("Median absolute error (reference)    [MedAE] ", stats['MedAE'])
    print("R²score, coefficient of determination   [R²] ", stats['R2'])
    
    return stats


In [56]:
models_stats = collections.OrderedDict([(k, []) for k in ['EVS','MAE', 'MSE', 'RMSE', 'MSLE', 'MedAE', 'R2']])


for mdl_lbl, df in dfs_dict.items():
    print('................................ ' + mdl_lbl + '................................')
    stats = print_regr_stats(df)
    
    # add the model statistics to the main index
    for name, val in stats.items():
        models_stats[name].append((mdl_lbl, val))
        
    print('\n')
    
    
print("\n\nSTATISTICS COMPARISON\n")

for name, l in models_stats.items():
    print(name)
    pprint(list(enumerate(sorted(l, key=lambda tup: tup[1]))))


................................ RFR................................
Explained Variance Score               [EVS]  0.9888127734320802
Mean absolute error                    [MAE]  887.6448282329818
Mean squared error                     [MSE]  1726837.3417475056
Root mean squared error               [RMSE]  1314.091831550408
Mean squared logarithmic error        [MSLE]  0.0052798159302647266
Median absolute error (reference)    [MedAE]  609.215261678939
R²score, coefficient of determination   [R²]  0.9887989003436097


................................ XGB................................
Explained Variance Score               [EVS]  0.9911733013681217
Mean absolute error                    [MAE]  778.7089587980824
Mean squared error                     [MSE]  1367360.1670107795
Root mean squared error               [RMSE]  1169.341766555347
Mean squared logarithmic error        [MSLE]  0.0041367042961507755
Median absolute error (reference)    [MedAE]  525.9935000000005
R²score, coeffic
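The comparison loop above sorts every metric in ascending order, so the best model comes first for the error metrics but last for EVS and R², where higher is better. A possible direction-aware ranking is sketched below; the names `HIGHER_IS_BETTER` and `rank_models` are hypothetical, and the example values are a subset of the RFR and XGB statistics printed above.

```python
# Metrics where a larger value is better; every other metric is an error.
HIGHER_IS_BETTER = {'EVS', 'R2'}


def rank_models(models_stats):
    """Return, for each metric, the model labels ordered from best to worst."""
    ranking = {}
    for name, scores in models_stats.items():
        ordered = sorted(scores,
                         key=lambda tup: tup[1],
                         reverse=name in HIGHER_IS_BETTER)
        ranking[name] = [label for label, _ in ordered]
    return ranking


# Same structure as models_stats: metric -> list of (model label, value).
example_stats = {
    'EVS': [('RFR', 0.9888127734320802), ('XGB', 0.9911733013681217)],
    'MAE': [('RFR', 887.6448282329818), ('XGB', 778.7089587980824)],
    'RMSE': [('RFR', 1314.091831550408), ('XGB', 1169.341766555347)],
    'MSLE': [('RFR', 0.0052798159302647266), ('XGB', 0.0041367042961507755)],
}

ranking = rank_models(example_stats)
for name, labels in ranking.items():
    print(name, labels)
```

With these values XGB is ranked first on all four metrics.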

## BIP error

In [57]:
for mdl_lbl, df in dfs_dict.items():
    print('................................ ' + mdl_lbl + '................................')
    get_BIP_error(df, already_BIP_format=True)
    print('\n')

................................ RFR................................
BIP total error: 0.04512461531703465


................................ XGB................................
BIP total error: 0.04017197179034863




| #  | Model name               | Test-set months | BIP error            |
|----|--------------------------|-----------------|----------------------|
| 1  | Random Forest Regression | 3,4 2016        | 0.056858865469469826 |
| 2  | Random Forest Regression | 1,2 2017        | 0.03313866910756147  |
| 3  | Random Forest Regression | 1,2 2018        | 0.03885510345652681  |
| 4  | Random Forest Regression | 60 random days  | 0.04512461531703465  |
| 5  | XGBoost                  | 3,4 2016        | 0.0632513068571206   |
| 6  | XGBoost                  | 1,2 2017        | 0.03288920541031862  |
| 7  | XGBoost                  | 1,2 2018        | 0.05569450465576987  |
| 8  | XGBoost                  | 60 random days  | 0.04017197179034863  |
| 9  | Ensemble                 | 3,4 2016        | 0.057574756858398995 |
| 10 | Ensemble                 | 1,2 2017        | 0.03133979718107204  |
| 11 | Ensemble                 | 1,2 2018        | 0.0460454830406279   |
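
To make the table easier to read, the sketch below picks the best model per test set and averages each model's error over the three monthly test sets. It is only an illustration: the short labels `RFR`, `XGB` and `Ensemble` and the helper names are assumptions, and the values are copied from the table. The Ensemble has no "60 random days" row, so that test set is left out of the averages.

```python
# BIP errors copied from the table, keyed by (model, test set).
bip_errors = {
    ('RFR', '3,4 2016'): 0.056858865469469826,
    ('RFR', '1,2 2017'): 0.03313866910756147,
    ('RFR', '1,2 2018'): 0.03885510345652681,
    ('RFR', '60 random days'): 0.04512461531703465,
    ('XGB', '3,4 2016'): 0.0632513068571206,
    ('XGB', '1,2 2017'): 0.03288920541031862,
    ('XGB', '1,2 2018'): 0.05569450465576987,
    ('XGB', '60 random days'): 0.04017197179034863,
    ('Ensemble', '3,4 2016'): 0.057574756858398995,
    ('Ensemble', '1,2 2017'): 0.03133979718107204,
    ('Ensemble', '1,2 2018'): 0.0460454830406279,
}


def best_per_test_set(errors):
    """Map each test set to the (model, error) pair with the lowest error."""
    best = {}
    for (model, test_set), err in errors.items():
        if test_set not in best or err < best[test_set][1]:
            best[test_set] = (model, err)
    return best


def mean_error(errors, model, test_sets):
    """Average BIP error of one model over the given test sets."""
    return sum(errors[(model, ts)] for ts in test_sets) / len(test_sets)


# Only the test sets on which all three models were evaluated.
monthly_sets = ['3,4 2016', '1,2 2017', '1,2 2018']

best = best_per_test_set(bip_errors)
means = {model: mean_error(bip_errors, model, monthly_sets)
         for model in ('RFR', 'XGB', 'Ensemble')}

for test_set, (model, err) in best.items():
    print(test_set, '->', model, round(err, 4))
for model, avg in means.items():
    print(model, round(avg, 4))
```

With the table's values, Random Forest has the lowest error on 3,4 2016 and 1,2 2018, the Ensemble on 1,2 2017, and XGBoost on the 60 random days. Averaged over the three monthly test sets, the errors are roughly 0.043 (Random Forest), 0.045 (Ensemble) and 0.051 (XGBoost).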


Errors of months 3,4 2016

................................ RFR................................
BIP total error: 0.056858865469469826


................................ XGB................................
BIP total error: 0.0632513068571206


................................ AVG................................
BIP total error: 0.057574756858398995

Errors of months 1,2 2018

................................ RFR................................
BIP total error: 0.03885510345652681


................................ XGB................................
BIP total error: 0.05569450465576987


................................ AVG................................
BIP total error: 0.0460454830406279

Errors of months 1,2 2017

................................ RFR................................
BIP total error: 0.03313866910756147


................................ XGB................................
BIP total error: 0.03288920541031862


................................ AVG................................
BIP total error: 0.03133979718107204

Errors of 60 random sampled days

................................ RFR................................
BIP total error: 0.04512461531703465
R²score, coefficient of determination   [R²]  0.9887989003436097


................................ XGB................................
BIP total error: 0.04017197179034863
R²score, coefficient of determination   [R²]  0.9910492782368341