![](https://www.revistacobertura.com.br/site-2017/wp-content/uploads/2018/12/Machine-Learning.jpeg)

### Table of Contents

1. [Introduction](#1.-Introduction)

2. [Base Libraries](#2.-Base-Libraries)

3. [Support Functions](#3.-Support-Functions)

4. [Data Preprocessing](#4.-Data-Preprocessing)

5. [Models and Hyperparametrization](#5.-Models-and-Hyperparametrization)

6. [Diebold-Mariano Test](#6.-Diebold-Mariano-Test)

7. [Conclusions](#7.-Conclusions)

8. [References](#8.-References)

# 1. Introduction

While looking for an implementation of the Diebold-Mariano test, I found only one publicly available on GitHub (thanks to John Tsang for that: https://github.com/johntwk/Diebold-Mariano-Test/). So I thought a kernel containing this test could help anyone in the Kaggle community who needs to use it.
To help myself along the way, I ended up writing a simple script (also available here for the community) called **easymetrics**. It contains not only the Diebold-Mariano test, but also easier ways (for me at least) to compute five metrics on the training, test and validation data: R2, Explained Variance Score, RMSE, RMSLE and MAE.

In this kernel, **XGBoost** and **LightGBM** models are tuned and then compared using the Diebold-Mariano test. The dataset is a compilation of energy generation for several countries, measured in TWh between 2000 and 2019. Its contents were extracted from Our World in Data.


# 2. Base Libraries

First, the base libraries are imported: **pandas**, **numpy**, **matplotlib** and **seaborn** for DataFrames, numeric operations and plots; **MinMaxScaler** to normalize the data between 0 and 1, and **power_transform** to make its distribution more Gaussian-like; **train_test_split** to split the dataset (here 70% training / 30% testing); **GridSearchCV** for hyperparameter tuning; **xgboost** and **lightgbm** for the machine learning models; and the newly created **easymetrics** library for performance tests (this library is available here on Kaggle, in the "Utility Scripts" section).

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os
# Transformation
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import power_transform
# Models
import xgboost as xgb
import lightgbm as lgb
from sklearn.model_selection import train_test_split
from xgboost.sklearn import XGBRegressor
from sklearn.model_selection import GridSearchCV
# Metrics
from easymetrics import diebold_mariano_test
from easymetrics import r2_all
from easymetrics import evs_all
from easymetrics import mae_all
from easymetrics import rmse_all
from easymetrics import rmsle_all

To measure the performance of the generated models, we use several metrics (maybe more than we need):

* **mean_absolute_error** (MAE): the average absolute difference between predicted and actual values;
* **mean_squared_error** (as its root, RMSE): the square root of the mean squared prediction error, which penalizes large errors more heavily than MAE;
* **mean_squared_log_error** (as its root, RMSLE): the RMSE computed on log(1 + value), so it reflects the ratio between actual and predicted values rather than their absolute difference;
* **r2_score** (R2): coefficient of determination, the proportion of the variance in the dependent variable that is predictable from the independent variable(s);
* **explained_variance_score** (EVS): the proportion of the variance of the target that the model explains; it equals R2 when the prediction errors have zero mean; and
* **diebold_mariano_test** (DM): a statistical test that compares the forecast accuracy of two forecast methods.
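
To make the first five definitions concrete, here is a small sketch that computes them directly with scikit-learn on made-up numbers. It does not use **easymetrics**, and the arrays `y_true` and `y_pred` are illustrative values, not data from this kernel.

```python
import numpy as np
from sklearn.metrics import (explained_variance_score, mean_absolute_error,
                             mean_squared_error, mean_squared_log_error,
                             r2_score)

# Made-up actual and predicted values, for illustration only
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))       # root of the MSE
rmsle = np.sqrt(mean_squared_log_error(y_true, y_pred))  # root of the MSLE
r2 = r2_score(y_true, y_pred)
evs = explained_variance_score(y_true, y_pred)

print(f"MAE={mae:.3f} RMSE={rmse:.3f} RMSLE={rmsle:.3f} R2={r2:.3f} EVS={evs:.3f}")
```

In this toy case the predictions are biased (the errors do not average to zero), so EVS comes out higher than R2: EVS ignores a constant offset in the errors, while R2 penalizes it.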

# 3. Support Functions

Helper functions created to reduce repeated code. In this case, a single function draws a grouped bar chart of our metrics.

In [None]:
def plot_r2_mae_evs_rmse_rmsle(data):
    X = np.arange(5)
    fig = plt.figure(figsize=(12, 6))
    ax = fig.add_axes([0,0,1,1])
    ax.bar(X + 0.00, data[0], color = 'b', width = 0.25)
    ax.bar(X + 0.25, data[1], color = 'r', width = 0.25)
    ax.bar(X + 0.50, data[2], color = 'g', width = 0.25)
    ax.set_ylabel("Values")
    ax.set_title("Metrics")
    ax.set_xticks(X + 0.25)  # centre each tick under its group of three bars
    ax.set_xticklabels(('R2', 'MAE', 'EVS', 'RMSE', 'RMSLE'))
    ax.legend(labels=['Train', 'Test','Valid'])
    return fig, ax

# 4. Data Preprocessing

Initially, the data is imported from the dataset and the categorical columns are dropped (in this case, only "Country"). A Yeo-Johnson power transform is then applied, and *MinMaxScaler* normalizes the result between 0 and 1. Because both steps return NumPy arrays, the result is wrapped in a new DataFrame with the year columns restored, so the DataFrame properties are kept. After this, the *describe* function is called to show summary statistics of the DataFrame, such as mean, max, min and std.

In [None]:
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

df_power = pd.read_csv('../input/hydropower-generation/Hydropower_Consumption.csv', sep = ',')
cat_columns = df_power.select_dtypes('object').columns
df_power = df_power.drop(columns = cat_columns)

scaler = MinMaxScaler()

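# Yeo-Johnson makes each column more Gaussian-like; it returns a NumPy array
df_power = power_transform(df_power, method='yeo-johnson')

# Scale once to [0, 1] (a second pass would change nothing) and rebuild the DataFrame
df_power = pd.DataFrame(scaler.fit_transform(df_power),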
                        columns=['2000','2001','2002','2003','2004','2005',
                                 '2006','2007','2008','2009','2010','2011',
                                 '2012','2013','2014','2015','2016','2017',
                                 '2018','2019'])

sns.distplot(df_power.iloc[:, 0:19])
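
As a rough illustration of this preprocessing step, the sketch below runs the same two transformations on a tiny made-up table. The `toy` DataFrame and its values are assumptions for the example, and the column names are taken from the DataFrame itself instead of being typed by hand.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, power_transform

# Made-up, strongly skewed values standing in for two year columns
toy = pd.DataFrame({"2000": [1.0, 10.0, 100.0, 1000.0],
                    "2001": [2.0, 20.0, 150.0, 900.0]})

# power_transform returns a NumPy array, so the column names are lost here
transformed = power_transform(toy, method="yeo-johnson")

# Scale to [0, 1] and wrap the array back into a DataFrame
scaled = MinMaxScaler().fit_transform(transformed)
toy_scaled = pd.DataFrame(scaled, columns=toy.columns, index=toy.index)

print(toy_scaled)
```

Both transformations are monotonic, so the ordering of the values within each column is preserved; only their spacing and range change.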

In [None]:
df_power.describe()

In [None]:
X = df_power.iloc[:,0:19]
y = df_power.iloc[:,19]

# (optional) plot the distributions of the features (X) and the target (y)
fig, ax=plt.subplots(1,2,figsize=(30, 6))
sns.distplot(X, ax=ax[0])
sns.distplot(y, ax=ax[1])

My goal was to compare two models that predict power generation for 2019 based on the previous 19 years (2000 - 2018). For this, the dataset was separated into X and y: X holds the predictor columns, and y holds what I intend to predict. The data is then divided into training (70%) and testing (30%) sets.
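
The column slicing and the 70/30 split can be checked on a small synthetic table. This is only a sketch: the `toy` DataFrame holds random numbers, and only its layout (one column per year from 2000 to 2019) mirrors the real data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic table: 10 rows, one column per year from 2000 to 2019
years = [str(year) for year in range(2000, 2020)]
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.random((10, len(years))), columns=years)

X_toy = toy.iloc[:, 0:19]   # columns 2000..2018 (19 predictors)
y_toy = toy.iloc[:, 19]     # column 2019 (target)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=0)

print(X_toy.columns[0], X_toy.columns[-1], y_toy.name)
print(len(X_tr), len(X_te))
```

Note that `iloc[:, 0:19]` excludes position 19, so the target year never leaks into the predictors.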

# 5. Models and Hyperparametrization

## 5.1. XGBoost Hyperparametrization

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size = 0.3, random_state = 0)

**XGBoost** is an open-source software library that provides a gradient boosting framework for C++, Java, Python, R, Julia, Perl, and Scala. It works on Linux, Windows, and macOS. From the project description, it aims to provide a "Scalable, Portable and Distributed Gradient Boosting (GBM, GBRT, GBDT) Library". It runs on a single machine, as well as on the distributed processing frameworks Apache Hadoop, Apache Spark, and Apache Flink. It has gained much popularity and attention as the algorithm of choice for many winning teams of machine learning competitions.

In [None]:
xgbm = XGBRegressor()
param_grid = {'nthread':[4], #when use hyperthread, xgboost may become slower
              'objective':['reg:squarederror'],
              'learning_rate': [.03, 0.05, .07], #so called `eta` value
              'max_depth': [5, 6, 7],
              'min_child_weight': [4],
              'subsample': [0.7],
              'colsample_bytree': [0.7],
              'n_estimators': [500]}

xgbm_gs = GridSearchCV(xgbm,
                        param_grid,
                        cv = 5,
                        n_jobs = -1,
                        verbose = 2)

xgbm_gs.fit(X_train, y_train)

y_pred_xgbm = xgbm_gs.predict(X_test)

xgbm_best = xgbm_gs.best_estimator_
print(xgbm_gs.best_params_)
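
The cost of this search can be worked out before running it. As a small sketch, scikit-learn's `ParameterGrid` enumerates the same dictionary that is passed to `GridSearchCV` above; the variable names `candidates` and `n_fits` are only for this example.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as the XGBoost search: only learning_rate and max_depth vary
param_grid = {'nthread': [4],
              'objective': ['reg:squarederror'],
              'learning_rate': [.03, 0.05, .07],
              'max_depth': [5, 6, 7],
              'min_child_weight': [4],
              'subsample': [0.7],
              'colsample_bytree': [0.7],
              'n_estimators': [500]}

candidates = list(ParameterGrid(param_grid))  # every parameter combination
n_fits = len(candidates) * 5                  # one fit per candidate per CV fold

print(len(candidates), n_fits)  # prints: 9 45
```

With the default `refit=True`, `GridSearchCV` adds one final fit of the best candidate on the whole training set, which is the model later returned as `best_estimator_`.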

The traditional way of performing hyperparameter optimization is grid search, or a parameter sweep: an exhaustive search through a manually specified subset of the hyperparameter space of a learning algorithm. A grid search must be guided by some performance metric, typically measured by cross-validation on the training set or by evaluation on a held-out validation set. After this, the metrics are computed through our library to assess the model's performance.

In [None]:
r2_train, r2_test, r2_valid = r2_all(xgbm_best, X_train, y_train, X_test, y_test, y_pred_xgbm)
evs_train, evs_test, evs_valid = evs_all(xgbm_best, X_train, y_train, X_test, y_test, y_pred_xgbm)
mae_train, mae_test, mae_valid = mae_all(xgbm_best, X_train, y_train, X_test, y_test, y_pred_xgbm)
rmse_train, rmse_test, rmse_valid = rmse_all(xgbm_best, X_train, y_train, X_test, y_test, y_pred_xgbm)
rmsle_train, rmsle_test, rmsle_valid = rmsle_all(xgbm_best, X_train, y_train, X_test, y_test, y_pred_xgbm)

data = [[r2_train, mae_train, evs_train, rmse_train, rmsle_train],
        [r2_test, mae_test, evs_test, rmse_test, rmsle_test],
        [r2_valid, mae_valid, evs_valid, rmse_valid, rmsle_valid]]

fig, ax = plot_r2_mae_evs_rmse_rmsle(data)

## 5.2. LightGBM Hyperparametrization

**LightGBM**, short for Light Gradient Boosting Machine, is a free and open source distributed gradient boosting framework for machine learning originally developed by Microsoft. It is based on decision tree algorithms and used for ranking, classification and other machine learning tasks. The development focus is on performance and scalability.

In [None]:
lgbm = lgb.LGBMRegressor()
param_grid = {'task': ['train'],
              'boosting_type': ['gbdt'],
              'objective': ['regression'],
              'metric': ['l2'],  # 'auc' is a classification metric, so it is dropped for this regression task
              'learning_rate': [0.005],
              'feature_fraction': [0.9],
              'bagging_fraction': [0.7],
              'bagging_freq': [10],
              'verbose': [0],
              'max_depth': [8],
              'num_leaves': [128],  
              'max_bin': [512],
              #'num_iterations': [100000],
              'n_estimators': [1000]}

lgbm_gs = GridSearchCV(lgbm,
                        param_grid,
                        cv = 5,
                        n_jobs = -1,
                        verbose = 2)

lgbm_gs.fit(X_train, y_train)

y_pred_lgbm = lgbm_gs.predict(X_test)

lgbm_best = lgbm_gs.best_estimator_
print(lgbm_gs.best_params_)

As with XGBoost, the grid search is followed by the same metrics, computed through our library for the best LightGBM estimator.

In [None]:
r2_train, r2_test, r2_valid = r2_all(lgbm_best, X_train, y_train, X_test, y_test, y_pred_lgbm)
evs_train, evs_test, evs_valid = evs_all(lgbm_best, X_train, y_train, X_test, y_test, y_pred_lgbm)
mae_train, mae_test, mae_valid = mae_all(lgbm_best, X_train, y_train, X_test, y_test, y_pred_lgbm)
rmse_train, rmse_test, rmse_valid = rmse_all(lgbm_best, X_train, y_train, X_test, y_test, y_pred_lgbm)
rmsle_train, rmsle_test, rmsle_valid = rmsle_all(lgbm_best, X_train, y_train, X_test, y_test, y_pred_lgbm)

data = [[r2_train, mae_train, evs_train, rmse_train, rmsle_train],
        [r2_test, mae_test, evs_test, rmse_test, rmsle_test],
        [r2_valid, mae_valid, evs_valid, rmse_valid, rmsle_valid]]

fig, ax = plot_r2_mae_evs_rmse_rmsle(data)

With both models tuned and their predictions made, we can compare the two forecasts and analyze the results.

# 6. Diebold-Mariano Test

Suppose that the difference between the first list of predictions and the actual values is e1, and the difference between the second list of predictions and the actual values is e2. The length of the time series is T.
The loss differential d can then be defined according to different criteria (crit):

* MSE : d = (e1)^2 - (e2)^2
* MAD : d = abs(e1) - abs(e2)
* MAPE: d = abs(e1/actual) - abs(e2/actual)
* Poly: d = (e1)^power - (e2)^power

The null hypothesis is E[d] = 0, i.e. both forecasts are equally accurate.
The test statistic follows Student's t-distribution with T - 1 degrees of freedom.

In [None]:
rt = diebold_mariano_test(y_test,y_pred_xgbm,y_pred_lgbm,h = 1, crit="MAD")
print(rt)
rt = diebold_mariano_test(y_test,y_pred_xgbm,y_pred_lgbm,h = 1, crit="MSE")
print(rt)
rt = diebold_mariano_test(y_test,y_pred_xgbm,y_pred_lgbm,h = 1, crit="poly", power=4)
print(rt)
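
For readers who want to see the arithmetic behind the test described above, here is a minimal sketch for one-step-ahead forecasts (h = 1). It is not the **easymetrics** implementation: the function name `dm_test_sketch` is hypothetical, and the small-sample correction of Harvey et al. [4] is assumed to be applied.

```python
import numpy as np
from scipy import stats


def dm_test_sketch(actual, pred1, pred2, crit="MSE", power=2):
    """Diebold-Mariano statistic and two-sided p-value for h = 1 (sketch)."""
    actual = np.asarray(actual, dtype=float)
    e1 = actual - np.asarray(pred1, dtype=float)
    e2 = actual - np.asarray(pred2, dtype=float)

    # Loss differential d for the chosen criterion
    if crit == "MSE":
        d = e1 ** 2 - e2 ** 2
    elif crit == "MAD":
        d = np.abs(e1) - np.abs(e2)
    elif crit == "MAPE":
        d = np.abs(e1 / actual) - np.abs(e2 / actual)
    elif crit == "poly":
        d = e1 ** power - e2 ** power
    else:
        raise ValueError(f"unknown criterion: {crit}")

    T = d.size
    var_d = d.var()  # for h = 1 only the lag-0 autocovariance is needed
    if var_d == 0:
        raise ValueError("loss differential is constant; statistic undefined")

    dm_stat = d.mean() / np.sqrt(var_d / T)
    # Harvey et al. correction, which reduces to sqrt((T - 1) / T) when h = 1
    dm_stat *= np.sqrt((T - 1) / T)
    p_value = 2 * stats.t.cdf(-abs(dm_stat), df=T - 1)
    return dm_stat, p_value


# Made-up example: the first forecast is clearly closer to the actual values
actual = np.arange(1.0, 21.0)
good = actual + 0.1
bad = actual + np.linspace(0.5, 1.5, 20)
print(dm_test_sketch(actual, good, bad, crit="MSE"))
```

A negative statistic means the first forecast has the smaller average loss, and swapping the two forecasts flips the sign while leaving the p-value unchanged.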

# 7. Conclusions

The results of the Diebold-Mariano test show that the difference between the two forecasts is not large, which leads us to believe that both frameworks have a similar capacity for this regression task.

# 8. References

[1] Brownlee, Jason (March 31, 2020). "Gradient Boosting with Scikit-Learn, XGBoost, LightGBM, and CatBoost".

[2] Chen, Tianqi; Guestrin, Carlos (2016). "XGBoost: A Scalable Tree Boosting System". In Krishnapuram, Balaji; Shah, Mohak; Smola, Alexander J.; Aggarwal, Charu C.; Shen, Dou; Rastogi, Rajeev (eds.). Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13-17, 2016. ACM. pp. 785–794. arXiv:1603.02754. doi:10.1145/2939672.2939785.

[3] Diebold, F. X.; Mariano, R. S. (1995). "Comparing predictive accuracy". Journal of Business & Economic Statistics. 13 (3): 253-264.

[4] Harvey, D.; Leybourne, S.; Newbold, P. (1997). "Testing the equality of prediction mean squared errors". International Journal of Forecasting. 13 (2): 281-291.

[5] Kopitar, Leon; Kocbek, Primoz; Cilar, Leona; Sheikh, Aziz; Stiglic, Gregor (July 20, 2020). "Early detection of type 2 diabetes mellitus using machine learning-based prediction models". Scientific Reports. 10 (1): 11981. Bibcode:2020NatSR..1011981K. doi:10.1038/s41598-020-68771-z. PMC 7371679. PMID 32686721.

