# Datasets Benchmark

**Summary of this Article** 
- Loading the best hyperparameters for each model
- Loading the data
- Model training
- Results discussion


## Loading best hyperparameters for each model

As explained in another notebook, the hyperparameters of each model were tuned with the Optuna library. The tuned values differ for each combination of dataset and model and are listed below.
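The tables printed below suggest that each tuning run was saved as a two-column CSV. It has one row per hyperparameter (`params`, `value`), followed by a final row named `value` that holds the objective score of the best trial. The export code lives in the tuning notebook and is not shown here, so the sketch below only reproduces that layout. The dictionary and the float stand in for Optuna's `study.best_params` and `study.best_value`, with values copied from the sparse `max_u` gradient boosting table. `params_to_csv` is a hypothetical helper:

```python
import csv
import io

# Stand-ins for study.best_params / study.best_value of an Optuna study
# (values copied from the sparse max_u gradient boosting table).
best_params = {"n_estimators": 544, "learning_rate": 0.36423151911958196, "loss": "squared_error"}
best_value = 0.0064745090355284845


def params_to_csv(params, objective_value):
    """Write hyperparameters plus the objective score in a params/value layout."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["params", "value"])
    for name, value in params.items():
        writer.writerow([name, value])
    # The objective score is stored as an extra row whose name is literally 'value'.
    writer.writerow(["value", objective_value])
    return buffer.getvalue()


csv_text = params_to_csv(best_params, best_value)
rows = list(csv.reader(io.StringIO(csv_text)))
print(rows)
```

Every value passes through text, so numbers come back as strings when the file is read. They have to be parsed again before they are passed to a model.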


In [16]:
# Libraries used to load the hyperparameter files.
import os
import pandas as pd

In [17]:
sparse_hyper_params = {}
focused_hyper_params = {}
boolean_hyper_params = {}
for file in os.listdir('hyper_params_results'):
    if file.endswith('.csv') and 'sparse' in file.split('_') and 'classifier' not in file:
        df = pd.read_csv(os.path.join('hyper_params_results', file))
        sparse_hyper_params[file] = df
    elif file.endswith('.csv') and 'focused' in file.split('_') and 'classifier' not in file:
        df = pd.read_csv(os.path.join('hyper_params_results', file))
        focused_hyper_params[file] = df
    elif file.endswith('.csv') and 'classifier' in file:
        df = pd.read_csv(os.path.join('hyper_params_results', file))
        boolean_hyper_params[file] = df
print('Sparse hyper params:\n')
for key in sparse_hyper_params.keys():
    print(key, ':\n ',sparse_hyper_params[key])
print('Focused hyper params:\n')
for key in focused_hyper_params.keys():
    print(key, ':\n',focused_hyper_params[key])
print('Boolean hyper params:\n')
for key in boolean_hyper_params.keys():
    print(key, ':\n',boolean_hyper_params[key])

Sparse hyper params:

params_gradient_boost_regression_sparse_max_u.csv :
            params                  value
0   n_estimators                    544
1  learning_rate    0.36423151911958196
2           loss          squared_error
3          value  0.0064745090355284845
params_gradient_boost_regression_sparse_min_u.csv :
            params               value
0   n_estimators                  22
1  learning_rate  0.8296843568407096
2           loss      absolute_error
3          value                 0.0
params_support_vector_regression_sparse_max_u.csv :
     params                   value
0  kernel                    poly
1       C  0.00018807624871896921
2  degree                       5
3   gamma      0.8511446423066539
4   value      0.4008346176510517
params_support_vector_regression_sparse_min_u.csv :
     params                   value
0  kernel                    poly
1       C  3.4084813417139984e-06
2  degree                       2
3   gamma  7.5582957595600045e-06
4  
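The loading loop above routes each CSV into one of three dictionaries based only on its filename. The same rule can be written as a small pure function, which makes the precedence explicit: any `.csv` name containing `classifier` goes to the boolean group, even if it also mentions `sparse` or `focused`. This restates the notebook's logic and is not a helper from `thesis_package`. `classify_params_file` is a hypothetical name:

```python
def classify_params_file(filename):
    """Return 'sparse', 'focused', 'boolean' or None, mirroring the loading loop."""
    if not filename.endswith(".csv"):
        return None
    if "classifier" in filename:
        return "boolean"
    tokens = filename.split("_")
    if "sparse" in tokens:
        return "sparse"
    if "focused" in tokens:
        return "focused"
    return None


for name in [
    "params_gradient_boost_regression_sparse_max_u.csv",
    "params_xgboost_regression_focused_min_u.csv",
    "params_support_vector_classifier_max_u.csv",
    "notes.txt",
]:
    print(name, "->", classify_params_file(name))
```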

In [18]:
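import ast

def get_hyper_params_from_df(df):
    """Convert a params/value table into a hyperparameter dict, skipping the objective row."""
    output = {}
    for _, row in df.iterrows():
        # The row named 'value' stores the objective score of the best trial, not a hyperparameter.
        if row['params'] == 'value':
            continue
        try:
            # Numeric strings such as '544' or '0.364' become int/float.
            output[row['params']] = ast.literal_eval(row['value'])
        except (ValueError, SyntaxError):
            # Non-literal strings such as 'squared_error' are kept as they are.
            output[row['params']] = row['value']
    return output

get_hyper_params_from_df(sparse_hyper_params['params_gradient_boost_regression_sparse_max_u.csv'])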

{'n_estimators': 544,
 'learning_rate': 0.36423151911958196,
 'loss': 'squared_error'}
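`ast.literal_eval` turns the strings stored in the CSV back into Python numbers. Names such as `squared_error` or `poly` are not valid literals, so they fall through to the raw string. A stand-alone sketch of that conversion is shown below. It catches only the two exceptions `literal_eval` raises for non-literal input. `parse_value` is an illustrative name, not part of the notebook:

```python
import ast


def parse_value(raw):
    """Convert a stored hyperparameter string to a Python literal when possible."""
    try:
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        # Not a Python literal (e.g. 'squared_error', 'poly'): keep the string.
        return raw


for raw in ["544", "0.36423151911958196", "squared_error", "3.4084813417139984e-06", "True"]:
    value = parse_value(raw)
    print(repr(raw), "->", repr(value), type(value).__name__)
```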

## Loading the data

In [19]:
import sys
sys.path.append('..')
from thesis_package import aimodels as my_ai, utils, metrics

import sklearn.metrics
from sklearn.model_selection import train_test_split

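# os.path.join keeps the paths portable and avoids invalid backslash escapes such as '\d'.
exogenous_data = pd.read_csv(os.path.join('..', 'data', 'processed', 'production', 'exogenous_data_extended.csv')).drop(columns=['date'])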

In [20]:

# Regression data, sparse dataset
ground_truth_dir = os.path.join('..', 'data', 'ground_truth')
y_max_u_sparse = pd.read_csv(os.path.join(ground_truth_dir, 'res_bus_vm_pu_max_constr.csv')).drop(columns=['timestamps'])
y_min_u_sparse = pd.read_csv(os.path.join(ground_truth_dir, 'res_bus_vm_pu_min_constr.csv')).drop(columns=['timestamps'])

# The scaled splits are used by the SVR models. Comparing models on equal terms assumes that
# split_and_suffle shuffles with a fixed seed, so scaled and unscaled splits share the same rows.
train_x, test_x, train_y, test_y = utils.split_and_suffle(exogenous_data, y_max_u_sparse)
data_max_u_sparse = {'X_train': train_x, 'X_test': test_x, 'y_train': train_y, 'y_test': test_y}

train_x, test_x, train_y, test_y = utils.split_and_suffle(exogenous_data, y_max_u_sparse, scaling=True)
data_max_u_scaled_sparse = {'X_train': train_x, 'X_test': test_x, 'y_train': train_y, 'y_test': test_y}

train_x, test_x, train_y, test_y = utils.split_and_suffle(exogenous_data, y_min_u_sparse)
data_min_u_sparse = {'X_train': train_x, 'X_test': test_x, 'y_train': train_y, 'y_test': test_y}

train_x, test_x, train_y, test_y = utils.split_and_suffle(exogenous_data, y_min_u_sparse, scaling=True)
data_min_u_scaled_sparse = {'X_train': train_x, 'X_test': test_x, 'y_train': train_y, 'y_test': test_y}
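`utils.split_and_suffle` comes from the thesis package, and its source is not shown here. The sketch below is only a guess at the behaviour the calls imply: a shuffled train/test split, optionally followed by feature standardisation. Two properties matter for a fair benchmark and are worth checking in the real helper. First, a fixed seed, so the scaled and unscaled splits contain the same rows. Second, a scaler fitted on the training set only. The name `split_and_shuffle_sketch`, `test_size=0.2`, `seed=42`, and scaling only `X` are all assumptions:

```python
import numpy as np
import pandas as pd


def split_and_shuffle_sketch(X, y, test_size=0.2, seed=42, scaling=False):
    """Hypothetical stand-in for utils.split_and_suffle (not the package code)."""
    rng = np.random.default_rng(seed)  # fixed seed -> same row order on every call
    order = rng.permutation(len(X))
    n_test = int(round(len(X) * test_size))
    test_idx, train_idx = order[:n_test], order[n_test:]
    X_train, X_test = X.iloc[train_idx], X.iloc[test_idx]
    y_train, y_test = y.iloc[train_idx], y.iloc[test_idx]
    if scaling:
        # Standardise features with training statistics only (no test-set leakage).
        mean = X_train.mean()
        std = X_train.std(ddof=0).replace(0, 1)  # avoid division by zero for constant columns
        X_train = (X_train - mean) / std
        X_test = (X_test - mean) / std
    return X_train, X_test, y_train, y_test


X = pd.DataFrame({"load": np.arange(10.0), "pv": np.arange(10.0) ** 2})
y = pd.DataFrame({"bus_1": np.linspace(0.0, 0.01, 10)})
plain = split_and_shuffle_sketch(X, y)
scaled = split_and_shuffle_sketch(X, y, scaling=True)
print(plain[1].index.equals(scaled[1].index))  # same test rows in both variants
```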


In [21]:

# Regression data, focused dataset
ground_truth_dir = os.path.join('..', 'data', 'ground_truth')
y_max_u_focused = pd.read_csv(os.path.join(ground_truth_dir, 'res_bus_vm_pu_max_bal_constr.csv'))
exogenous_data_focused_max_u = pd.read_csv(os.path.join(ground_truth_dir, 'exogenous_data_vm_pu_max_bal.csv')).drop(columns=['date'])
y_min_u_focused = pd.read_csv(os.path.join(ground_truth_dir, 'res_bus_vm_pu_min_bal_constr.csv'))
exogenous_data_focused_min_u = pd.read_csv(os.path.join(ground_truth_dir, 'exogenous_data_vm_pu_min_bal.csv')).drop(columns=['date'])

train_x, test_x, train_y, test_y = utils.split_and_suffle(exogenous_data_focused_max_u, y_max_u_focused)
data_max_u_focused = {'X_train': train_x, 'X_test': test_x, 'y_train': train_y, 'y_test': test_y}

train_x, test_x, train_y, test_y = utils.split_and_suffle(exogenous_data_focused_max_u, y_max_u_focused, scaling=True)
data_max_u_scaled_focused = {'X_train': train_x, 'X_test': test_x, 'y_train': train_y, 'y_test': test_y}

train_x, test_x, train_y, test_y = utils.split_and_suffle(exogenous_data_focused_min_u, y_min_u_focused)
data_min_u_focused = {'X_train': train_x, 'X_test': test_x, 'y_train': train_y, 'y_test': test_y}

train_x, test_x, train_y, test_y = utils.split_and_suffle(exogenous_data_focused_min_u, y_min_u_focused, scaling=True)
data_min_u_scaled_focused = {'X_train': train_x, 'X_test': test_x, 'y_train': train_y, 'y_test': test_y}


In [26]:
# Classification data
ground_truth_dir = os.path.join('..', 'data', 'ground_truth')
y_max_u = pd.read_csv(os.path.join(ground_truth_dir, 'res_bus_vm_pu_max_bool_constr.csv')).drop(columns=['timestamps'])
y_min_u = pd.read_csv(os.path.join(ground_truth_dir, 'res_bus_vm_pu_min_bool_constr.csv')).drop(columns=['timestamps'])
y_max_u = y_max_u[utils.cols_with_positive_values(y_max_u)]
y_min_u = y_min_u[utils.cols_with_positive_values(y_min_u)]

train_x, test_x, train_y, test_y = utils.split_and_suffle(exogenous_data, y_max_u)
data_max_u_bool = {'X_train': train_x, 'X_test': test_x, 'y_train': train_y, 'y_test': test_y}

train_x, test_x, train_y, test_y = utils.split_and_suffle(exogenous_data, y_max_u, scaling=True)
data_max_u_bool_scaled = {'X_train': train_x, 'X_test': test_x, 'y_train': train_y, 'y_test': test_y}

train_x, test_x, train_y, test_y = utils.split_and_suffle(exogenous_data, y_min_u)
data_min_u_bool = {'X_train': train_x, 'X_test': test_x, 'y_train': train_y, 'y_test': test_y}

train_x, test_x, train_y, test_y = utils.split_and_suffle(exogenous_data, y_min_u, scaling=True)
data_min_u_bool_scaled = {'X_train': train_x, 'X_test': test_x, 'y_train': train_y, 'y_test': test_y}
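The boolean targets are reduced with `utils.cols_with_positive_values`. This presumably keeps only the buses that violate the constraint at least once, since a column that is always 0 gives a classifier nothing to learn. The package source is not shown. Below is a one-line pandas equivalent under that assumption. `cols_with_positive_values_sketch` is hypothetical:

```python
import pandas as pd


def cols_with_positive_values_sketch(df):
    """Return the names of columns containing at least one value > 0 (assumed behaviour)."""
    return df.columns[(df > 0).any(axis=0).to_numpy()]


y_bool = pd.DataFrame({"bus_1": [0, 0, 0], "bus_2": [0, 1, 0], "bus_3": [1, 1, 0]})
kept = cols_with_positive_values_sketch(y_bool)
print(list(kept))
print(y_bool[kept].shape)
```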

## Training models
In this section, the models are trained with the hyperparameters loaded above. For each target (maximum or minimum voltage) and dataset, all models are stored in the same `Context` object for later evaluation. This class is defined in `aimodels.py` and stores the fitted models together with their hyperparameters. The cells below rely on `Context` keeping every fitted strategy in its `strategies` list in fitting order: `strategies[0]` is the first model fitted, `strategies[1]` the second, and so on.
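The training cells show only how `Context` is used. It is created with a strategy, the `strategy` attribute is reassigned, `fit(data=...)` is called, and the fitted models are read back later from `strategies[i]`. The sketch below is a minimal strategy-pattern class consistent with that usage. It is an assumption about `aimodels.py`, not its actual code. `ContextSketch` is a hypothetical name, and `MeanStrategy` is a toy model used only to make the example runnable:

```python
class MeanStrategy:
    """Toy strategy: predicts the per-column training mean (illustration only)."""

    def fit(self, data):
        y = data["y_train"]
        self.means = [sum(col) / len(col) for col in zip(*y)]
        return self

    def predict(self, data):
        return [list(self.means) for _ in data["X_test"]]


class ContextSketch:
    """Keeps the current strategy and every strategy fitted so far, in fit order."""

    def __init__(self, strategy):
        self.strategy = strategy
        self.strategies = []

    def fit(self, data):
        self.strategy.fit(data)
        self.strategies.append(self.strategy)  # strategies[0] is the first model fitted


data = {"X_train": [[1], [2]], "y_train": [[0.0, 1.0], [2.0, 3.0]], "X_test": [[3]], "y_test": [[1.0, 2.0]]}
context = ContextSketch(strategy=MeanStrategy())
context.fit(data=data)
context.strategy = MeanStrategy()  # switch strategy, as the training cells do
context.fit(data=data)
print(len(context.strategies), context.strategies[0].predict(data=data))
```

Appending on every `fit` is what makes the positional lookups in the training cells work. It also means that calling `fit` twice with the same strategy would store it twice.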

### Max Voltage

In [27]:
sparse_hyper_params.keys()

dict_keys(['params_gradient_boost_regression_sparse_max_u.csv', 'params_gradient_boost_regression_sparse_min_u.csv', 'params_support_vector_regression_sparse_max_u.csv', 'params_support_vector_regression_sparse_min_u.csv', 'params_xgboost_regression_sparse_max_u.csv', 'params_xgboost_regression_sparse_min_u.csv'])

In [29]:
# max_u regression, sparse dataset
pickle_dir = os.path.join('pickles', 'dataset_benchmark')
if 'max_u_regressor_sparse.pickle' not in os.listdir(pickle_dir):
    # Linear Regression
    regressor_max_u_sparse = my_ai.Context(strategy=my_ai.LinearRegressionStrategy())
    regressor_max_u_sparse.fit(data=data_max_u_sparse)
    # Gradient Boost Regression
    hyper_params = get_hyper_params_from_df(sparse_hyper_params['params_gradient_boost_regression_sparse_max_u.csv'])
    regressor_max_u_sparse.strategy = my_ai.GradientBoostRegressorStrategy(hyper_params)
    regressor_max_u_sparse.fit(data=data_max_u_sparse)
    # Extreme Gradient Boost Regression
    hyper_params = get_hyper_params_from_df(sparse_hyper_params['params_xgboost_regression_sparse_max_u.csv'])
    regressor_max_u_sparse.strategy = my_ai.XGBoostRegressorStrategy(hyper_params)
    regressor_max_u_sparse.fit(data=data_max_u_sparse)
    # Support Vector Regression (trained on the scaled split)
    hyper_params = get_hyper_params_from_df(sparse_hyper_params['params_support_vector_regression_sparse_max_u.csv'])
    regressor_max_u_sparse.strategy = my_ai.SupportVectorRegressorStrategy(hyper_params)
    regressor_max_u_sparse.fit(data=data_max_u_scaled_sparse)
    # Save under the same name that is checked above, so the next run loads the cache
    utils.serialize_object(os.path.join(pickle_dir, 'max_u_regressor_sparse'), regressor_max_u_sparse)
else:
    regressor_max_u_sparse = utils.deserialize_object(os.path.join(pickle_dir, 'max_u_regressor_sparse'))
# Strategies are stored in fitting order: 0 = LR, 1 = GB, 2 = XGB, 3 = SVR.
# The _sparse suffix keeps these predictions separate from the focused and classifier ones.
columns = data_max_u_sparse['y_test'].columns
prediction_lr_max_u_sparse = pd.DataFrame(regressor_max_u_sparse.strategies[0].predict(data=data_max_u_sparse), columns=columns)
prediction_gb_max_u_sparse = pd.DataFrame(regressor_max_u_sparse.strategies[1].predict(data=data_max_u_sparse), columns=columns)
prediction_xgb_max_u_sparse = pd.DataFrame(regressor_max_u_sparse.strategies[2].predict(data=data_max_u_sparse), columns=columns)
prediction_svr_max_u_sparse = pd.DataFrame(regressor_max_u_sparse.strategies[3].predict(data=data_max_u_scaled_sparse), columns=columns)
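Each training cell follows a train-once-then-cache pattern: fit only if the pickle is missing, otherwise load it. The pattern works only if the name checked with `os.listdir` and the name passed to `serialize_object` match exactly, which is easy to break when the two are typed separately. The minimal sketch below derives both from a single path. It uses the standard `pickle` module instead of `utils.serialize_object`, whose file handling is not shown. `load_or_fit` is a hypothetical helper:

```python
import pickle
import tempfile
from pathlib import Path


def load_or_fit(path, fit_fn):
    """Return the pickled object at `path`, or build it with fit_fn() and cache it there."""
    path = Path(path)
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    obj = fit_fn()
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(obj, f)
    return obj


calls = []


def expensive_fit():
    calls.append(1)  # count how often training actually runs
    return {"model": "fitted"}


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "pickles" / "dataset_benchmark" / "max_u_regressor_sparse.pickle"
    first = load_or_fit(cache, expensive_fit)
    second = load_or_fit(cache, expensive_fit)  # loaded from disk, no refit

print(first == second, len(calls))
```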



In [38]:
# max_u regression, focused dataset
pickle_dir = os.path.join('pickles', 'dataset_benchmark')
if 'max_u_regressor_focused.pickle' not in os.listdir(pickle_dir):
    # Linear Regression
    regressor_max_u_focused = my_ai.Context(strategy=my_ai.LinearRegressionStrategy())
    regressor_max_u_focused.fit(data=data_max_u_focused)
    # Gradient Boost Regression
    hyper_params = get_hyper_params_from_df(focused_hyper_params['params_gradient_boost_regression_focused_max_u.csv'])
    regressor_max_u_focused.strategy = my_ai.GradientBoostRegressorStrategy(hyper_params)
    regressor_max_u_focused.fit(data=data_max_u_focused)
    # Extreme Gradient Boost Regression
    hyper_params = get_hyper_params_from_df(focused_hyper_params['params_xgboost_regression_focused_max_u.csv'])
    regressor_max_u_focused.strategy = my_ai.XGBoostRegressorStrategy(hyper_params)
    regressor_max_u_focused.fit(data=data_max_u_focused)
    # Support Vector Regression (trained on the scaled split)
    hyper_params = get_hyper_params_from_df(focused_hyper_params['params_support_vector_regression_focused_max_u.csv'])
    regressor_max_u_focused.strategy = my_ai.SupportVectorRegressorStrategy(hyper_params)
    regressor_max_u_focused.fit(data=data_max_u_scaled_focused)
    utils.serialize_object(os.path.join(pickle_dir, 'max_u_regressor_focused'), regressor_max_u_focused)
else:
    regressor_max_u_focused = utils.deserialize_object(os.path.join(pickle_dir, 'max_u_regressor_focused'))
# Strategies are stored in fitting order: 0 = LR, 1 = GB, 2 = XGB, 3 = SVR.
# The _focused suffix keeps these predictions from overwriting the sparse ones.
columns = data_max_u_focused['y_test'].columns
prediction_lr_max_u_focused = pd.DataFrame(regressor_max_u_focused.strategies[0].predict(data=data_max_u_focused), columns=columns)
prediction_gb_max_u_focused = pd.DataFrame(regressor_max_u_focused.strategies[1].predict(data=data_max_u_focused), columns=columns)
prediction_xgb_max_u_focused = pd.DataFrame(regressor_max_u_focused.strategies[2].predict(data=data_max_u_focused), columns=columns)
prediction_svr_max_u_focused = pd.DataFrame(regressor_max_u_focused.strategies[3].predict(data=data_max_u_scaled_focused), columns=columns)

In [30]:
# max_u classification
pickle_dir = os.path.join('pickles', 'dataset_benchmark')
if 'max_u_classifier.pickle' not in os.listdir(pickle_dir):
    # Gradient Boost Classifier
    hyper_params = get_hyper_params_from_df(boolean_hyper_params['params_gradient_boost_classifier_max_u.csv'])
    classifier_max_u = my_ai.Context(strategy=my_ai.GradientBoostClassifierStrategy(hyper_params))
    classifier_max_u.fit(data=data_max_u_bool)
    # Extreme Gradient Boost Classifier
    hyper_params = get_hyper_params_from_df(boolean_hyper_params['params_xgboost_classifier_max_u.csv'])
    classifier_max_u.strategy = my_ai.XGBoostClassifierStrategy(hyper_params)
    classifier_max_u.fit(data=data_max_u_bool)
    # Support Vector Classifier (trained on the scaled split)
    hyper_params = get_hyper_params_from_df(boolean_hyper_params['params_support_vector_classifier_max_u.csv'])
    classifier_max_u.strategy = my_ai.SupportVectorClassifierStrategy(hyper_params)
    classifier_max_u.fit(data=data_max_u_bool_scaled)
    utils.serialize_object(os.path.join(pickle_dir, 'max_u_classifier'), classifier_max_u)
else:
    classifier_max_u = utils.deserialize_object(os.path.join(pickle_dir, 'max_u_classifier'))
# Strategies are stored in fitting order: 0 = GB, 1 = XGB, 2 = SVC.
# The _bool suffix keeps these predictions from overwriting the regression ones.
columns = data_max_u_bool['y_test'].columns
prediction_gb_max_u_bool = pd.DataFrame(classifier_max_u.strategies[0].predict(data=data_max_u_bool), columns=columns)
prediction_xgb_max_u_bool = pd.DataFrame(classifier_max_u.strategies[1].predict(data=data_max_u_bool), columns=columns)
prediction_svc_max_u_bool = pd.DataFrame(classifier_max_u.strategies[2].predict(data=data_max_u_bool_scaled), columns=columns)

Parameters: { "loss" } might not be used.

  This could be a false alarm, with some parameters getting used by language bindings but
  then being mistakenly passed down to XGBoost core, or some parameter actually being used
  but getting flagged wrongly here. Please open an issue if you find any such cases.
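XGBoost prints this warning when it receives a keyword it does not recognise. Here, the tuned classifier parameters apparently include `loss`, a scikit-learn gradient boosting argument that XGBoost does not use (XGBoost selects its loss through `objective`). The warning is harmless. One way to silence it is to drop unknown keys before building the model. A minimal sketch with an explicit allow-list follows. `drop_unknown_params`, the allow-list contents, and the tuned values are made up for illustration:

```python
def drop_unknown_params(params, allowed):
    """Split params into (accepted dict, sorted list of dropped names) using an allow-list."""
    accepted = {k: v for k, v in params.items() if k in allowed}
    dropped = sorted(k for k in params if k not in allowed)
    return accepted, dropped


# Illustrative allow-list; a real one would come from the estimator's documentation
# or, for scikit-learn estimators, from the keys of estimator.get_params().
xgb_allowed = {"n_estimators", "learning_rate", "max_depth", "subsample", "colsample_bytree", "objective"}
tuned = {"n_estimators": 200, "learning_rate": 0.1, "loss": "log_loss", "max_depth": 4}  # made-up values

accepted, dropped = drop_unknown_params(tuned, xgb_allowed)
print(accepted, dropped)
```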


### Min Voltage


In [35]:
# min_u regression, sparse dataset
pickle_dir = os.path.join('pickles', 'dataset_benchmark')
if 'min_u_regressor_sparse.pickle' not in os.listdir(pickle_dir):
    # Linear Regression
    regressor_min_u_sparse = my_ai.Context(strategy=my_ai.LinearRegressionStrategy())
    regressor_min_u_sparse.fit(data=data_min_u_sparse)
    # Gradient Boost Regression
    hyper_params = get_hyper_params_from_df(sparse_hyper_params['params_gradient_boost_regression_sparse_min_u.csv'])
    regressor_min_u_sparse.strategy = my_ai.GradientBoostRegressorStrategy(hyper_params)
    regressor_min_u_sparse.fit(data=data_min_u_sparse)
    # Extreme Gradient Boost Regression
    hyper_params = get_hyper_params_from_df(sparse_hyper_params['params_xgboost_regression_sparse_min_u.csv'])
    regressor_min_u_sparse.strategy = my_ai.XGBoostRegressorStrategy(hyper_params)
    regressor_min_u_sparse.fit(data=data_min_u_sparse)
    # Support Vector Regression (trained on the scaled split)
    hyper_params = get_hyper_params_from_df(sparse_hyper_params['params_support_vector_regression_sparse_min_u.csv'])
    regressor_min_u_sparse.strategy = my_ai.SupportVectorRegressorStrategy(hyper_params)
    regressor_min_u_sparse.fit(data=data_min_u_scaled_sparse)
    utils.serialize_object(os.path.join(pickle_dir, 'min_u_regressor_sparse'), regressor_min_u_sparse)
else:
    regressor_min_u_sparse = utils.deserialize_object(os.path.join(pickle_dir, 'min_u_regressor_sparse'))
# Strategies are stored in fitting order: 0 = LR, 1 = GB, 2 = XGB, 3 = SVR.
# The _sparse suffix keeps these predictions separate from the focused and classifier ones.
columns = data_min_u_sparse['y_test'].columns
prediction_lr_min_u_sparse = pd.DataFrame(regressor_min_u_sparse.strategies[0].predict(data=data_min_u_sparse), columns=columns)
prediction_gb_min_u_sparse = pd.DataFrame(regressor_min_u_sparse.strategies[1].predict(data=data_min_u_sparse), columns=columns)
prediction_xgb_min_u_sparse = pd.DataFrame(regressor_min_u_sparse.strategies[2].predict(data=data_min_u_sparse), columns=columns)
prediction_svr_min_u_sparse = pd.DataFrame(regressor_min_u_sparse.strategies[3].predict(data=data_min_u_scaled_sparse), columns=columns)

Parameters: { "colsample_bytree", "subsample" } might not be used.

  This could be a false alarm, with some parameters getting used by language bindings but
  then being mistakenly passed down to XGBoost core, or some parameter actually being used
  but getting flagged wrongly here. Please open an issue if you find any such cases.

In [39]:
# min_u regression, focused dataset
pickle_dir = os.path.join('pickles', 'dataset_benchmark')
if 'min_u_regressor_focused.pickle' not in os.listdir(pickle_dir):
    # Linear Regression
    regressor_min_u_focused = my_ai.Context(strategy=my_ai.LinearRegressionStrategy())
    regressor_min_u_focused.fit(data=data_min_u_focused)
    # Gradient Boost Regression
    hyper_params = get_hyper_params_from_df(focused_hyper_params['params_gradient_boost_regression_focused_min_u.csv'])
    regressor_min_u_focused.strategy = my_ai.GradientBoostRegressorStrategy(hyper_params)
    regressor_min_u_focused.fit(data=data_min_u_focused)
    # Extreme Gradient Boost Regression
    hyper_params = get_hyper_params_from_df(focused_hyper_params['params_xgboost_regression_focused_min_u.csv'])
    regressor_min_u_focused.strategy = my_ai.XGBoostRegressorStrategy(hyper_params)
    regressor_min_u_focused.fit(data=data_min_u_focused)
    # Support Vector Regression (trained on the scaled split)
    hyper_params = get_hyper_params_from_df(focused_hyper_params['params_support_vector_regression_focused_min_u.csv'])
    regressor_min_u_focused.strategy = my_ai.SupportVectorRegressorStrategy(hyper_params)
    regressor_min_u_focused.fit(data=data_min_u_scaled_focused)
    utils.serialize_object(os.path.join(pickle_dir, 'min_u_regressor_focused'), regressor_min_u_focused)
else:
    regressor_min_u_focused = utils.deserialize_object(os.path.join(pickle_dir, 'min_u_regressor_focused'))
# Strategies are stored in fitting order: 0 = LR, 1 = GB, 2 = XGB, 3 = SVR.
# The _focused suffix keeps these predictions from overwriting the sparse ones.
columns = data_min_u_focused['y_test'].columns
prediction_lr_min_u_focused = pd.DataFrame(regressor_min_u_focused.strategies[0].predict(data=data_min_u_focused), columns=columns)
prediction_gb_min_u_focused = pd.DataFrame(regressor_min_u_focused.strategies[1].predict(data=data_min_u_focused), columns=columns)
prediction_xgb_min_u_focused = pd.DataFrame(regressor_min_u_focused.strategies[2].predict(data=data_min_u_focused), columns=columns)
prediction_svr_min_u_focused = pd.DataFrame(regressor_min_u_focused.strategies[3].predict(data=data_min_u_scaled_focused), columns=columns)

In [36]:

# min_u classification
pickle_dir = os.path.join('pickles', 'dataset_benchmark')
if 'min_u_classifier.pickle' not in os.listdir(pickle_dir):
    # Gradient Boost Classifier (min_u hyperparameters)
    hyper_params = get_hyper_params_from_df(boolean_hyper_params['params_gradient_boost_classifier_min_u.csv'])
    classifier_min_u = my_ai.Context(strategy=my_ai.GradientBoostClassifierStrategy(hyper_params))
    classifier_min_u.fit(data=data_min_u_bool)
    # Extreme Gradient Boost Classifier
    hyper_params = get_hyper_params_from_df(boolean_hyper_params['params_xgboost_classifier_min_u.csv'])
    classifier_min_u.strategy = my_ai.XGBoostClassifierStrategy(hyper_params)
    classifier_min_u.fit(data=data_min_u_bool)
    # Support Vector Classifier (trained on the scaled split)
    hyper_params = get_hyper_params_from_df(boolean_hyper_params['params_support_vector_classifier_min_u.csv'])
    classifier_min_u.strategy = my_ai.SupportVectorClassifierStrategy(hyper_params)
    classifier_min_u.fit(data=data_min_u_bool_scaled)
    utils.serialize_object(os.path.join(pickle_dir, 'min_u_classifier'), classifier_min_u)
else:
    # Load from the same name the model was saved under
    classifier_min_u = utils.deserialize_object(os.path.join(pickle_dir, 'min_u_classifier'))
# Strategies are stored in fitting order: 0 = GB, 1 = XGB, 2 = SVC.
# The _bool suffix keeps these predictions from overwriting the regression ones.
columns = data_min_u_bool['y_test'].columns
prediction_gb_min_u_bool = pd.DataFrame(classifier_min_u.strategies[0].predict(data=data_min_u_bool), columns=columns)
prediction_xgb_min_u_bool = pd.DataFrame(classifier_min_u.strategies[1].predict(data=data_min_u_bool), columns=columns)
prediction_svc_min_u_bool = pd.DataFrame(classifier_min_u.strategies[2].predict(data=data_min_u_bool_scaled), columns=columns)

Parameters: { "loss" } might not be used.

  This could be a false alarm, with some parameters getting used by language bindings but
  then being mistakenly passed down to XGBoost core, or some parameter actually being used
  but getting flagged wrongly here. Please open an issue if you find any such cases.
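Every cell picks its hyperparameter file with a hand-typed filename, so a `max_u`/`min_u` mix-up is easy to introduce and hard to spot. Building the name from its parts removes that risk. The minimal sketch below assumes the naming scheme used in the cells above: `params_<model>_regression_<dataset>_<target>.csv` for regressors and `params_<model>_classifier_<target>.csv` for classifiers. `params_filename` is a hypothetical helper:

```python
def params_filename(model, target, dataset=None):
    """Build a hyperparameter CSV name; dataset=None selects the classifier variant."""
    if target not in {"max_u", "min_u"}:
        raise ValueError(f"unknown target: {target!r}")
    if dataset is None:
        return f"params_{model}_classifier_{target}.csv"
    if dataset not in {"sparse", "focused"}:
        raise ValueError(f"unknown dataset: {dataset!r}")
    return f"params_{model}_regression_{dataset}_{target}.csv"


print(params_filename("gradient_boost", "min_u"))
print(params_filename("xgboost", "max_u", dataset="focused"))
```

Because the target is passed once and validated, a cell written for `min_u` cannot silently load a `max_u` file.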

## Results Discussion
In this section, the training and test results are presented and compared. The main objective of this experiment is to compare the performance of the regression models using the hybrid-metric confusion matrix and the hybrid-metric RMSE. The comparisons are the following:
- Compare the confusion matrices of the classification models with those of the regression models, evaluated with the hybrid metrics.
- Compare the errors of the regression models trained on the focused dataset with those of the models trained on the sparse dataset.
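The `metrics.Metrics` class is not shown in this notebook, so the exact definition of the hybrid metrics is not available here. One plausible reading, used only as an illustration, works as follows. A regression output is turned into a violation/no-violation label by comparing it with a threshold, so that regression and classification models can share a confusion matrix. RMSE is then computed on the raw values. `hybrid_confusion` and `rmse` are hypothetical helpers, and the arrays are made-up numbers:

```python
import numpy as np


def hybrid_confusion(y_true, y_pred, threshold):
    """Binarise both arrays at `threshold` and count TP, FP, FN, TN over all cells."""
    true_viol = np.asarray(y_true) > threshold
    pred_viol = np.asarray(y_pred) > threshold
    return {
        "tp": int(np.sum(true_viol & pred_viol)),
        "fp": int(np.sum(~true_viol & pred_viol)),
        "fn": int(np.sum(true_viol & ~pred_viol)),
        "tn": int(np.sum(~true_viol & ~pred_viol)),
    }


def rmse(y_true, y_pred):
    """Root-mean-square error over all buses and timestamps."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


y_true = np.array([[0.00, 0.02], [0.01, 0.00]])
y_pred = np.array([[0.001, 0.015], [0.0, 0.004]])
print(hybrid_confusion(y_true, y_pred, threshold=0.003))
print(rmse(y_true, y_pred))
```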

In [None]:
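# Hybrid-metric scores for one example model: XGBoost (strategies[2]), min_u, sparse dataset.
regressor = utils.deserialize_object(os.path.join('pickles', 'dataset_benchmark', 'min_u_regressor_sparse'))
y_train, y_test = data_min_u_sparse['y_train'], data_min_u_sparse['y_test']
prediction = pd.DataFrame(regressor.strategies[2].predict(data=data_min_u_sparse), columns=y_test.columns, index=y_test.index)
# Threshold: 10% of the mean peak value over buses whose training target is not always zero
threshold = y_train.loc[:, y_train.max(axis=0) != 0].max(axis=0).mean() * 0.1
metric = metrics.Metrics()
metric.get_prediction_scores(prediction, y_test, threshold=threshold)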
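The threshold in the cell above is 10% of the average peak value, taken only over buses whose maximum training target is non-zero: $\tau = 0.1 \cdot \frac{1}{|B^{+}|}\sum_{b \in B^{+}} \max_t y_{t,b}$, where $B^{+}$ is the set of such buses. Below is a small worked example with made-up numbers (not thesis data):

```python
import pandas as pd

# Toy training target: rows are timestamps, columns are buses (illustrative values).
y_train = pd.DataFrame({
    "bus_1": [0.00, 0.02, 0.01],
    "bus_2": [0.00, 0.00, 0.00],  # never violated -> excluded from the average
    "bus_3": [0.04, 0.00, 0.01],
})
peaks = y_train.loc[:, y_train.max(axis=0) != 0].max(axis=0)  # bus_1: 0.02, bus_3: 0.04
threshold = peaks.mean() * 0.1                                # 0.1 * 0.03 ≈ 0.003
print(list(peaks.index), threshold)
```

Excluding all-zero buses keeps buses that never violate the limit from dragging the average down. The threshold is therefore set by the buses that actually matter.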