# Using a Custom Metric Function

In this notebook, we will show an example of how to calculate custom performance metrics on an H2O model. The notebook will go through the following steps:

1. Train a GBM model in H2O
2. Write a script to calculate Mean Absolute Percent Error (MAPE)
3. Train a GBM model in H2O using MAPE as a [`custom_metric_func`](https://github.com/h2oai/h2o-3/blob/master/h2o-docs/src/dev/custom_functions.md)
4. Train a Grid of GBMs and choose model based on MAPE


## 1. Train a GBM Model in H2O

In [None]:
# Load H2O library
import h2o
h2o.init()

In [None]:
# Import Data
train_path = "https://raw.githubusercontent.com/h2oai/app-consumer-loan/master/data/loan.csv"
train = h2o.import_file(train_path, destination_frame = "loan_train")

In [None]:
# Set target and predictor variables
y = "int_rate"
x = train.col_names
x.remove(y)
x.remove("bad_loan")

In [None]:
# Train GBM Model
from h2o.estimators import H2OGradientBoostingEstimator

gbm_v1 = H2OGradientBoostingEstimator(model_id = "gbm_v1.hex")

gbm_v1.train(y = y, x = x, training_frame = train)

In [None]:
print(gbm_v1)

## 2. Write Script to Calculate Mean Absolute Percent Error (MAPE)

### Function to Calculate MAPE in H2O

In [None]:
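def MAPE(actual, predict):
    # Row-wise absolute percent error; undefined where actual is 0
    abs_pct_error = abs((actual - predict) / actual)
    # Average over all rows of the single error column
    mape = abs_pct_error.mean()[0]
    return mape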

In [None]:
mape_v1 = MAPE(train[y], gbm_v1.predict(train))
print("MAPE: " + str(round(mape_v1, 4)))
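
To make the calculation concrete, here is a minimal plain-Python sketch of the same formula, independent of H2O. The function name `mape_plain` and the three sample values are made up for illustration; they are not taken from the loan data.

```python
def mape_plain(actual, predicted):
    """Mean absolute percent error over two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    # Each term is |actual - predicted| / |actual|; actual must be non-zero
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)

# Hypothetical values, for illustration only
actual = [10.0, 20.0, 40.0]
predicted = [11.0, 18.0, 40.0]

example_mape = mape_plain(actual, predicted)
print(round(example_mape, 4))  # per-row errors are 0.1, 0.1 and 0.0
```

Because each error is divided by the actual value, MAPE is undefined for rows where the target is zero, so it fits best on strictly positive targets such as an interest rate.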

### Python Script to calculate MAPE in custom_metric_func

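The MAPE metric is defined in a class stored in `utils_model_metrics.py`. This class contains three methods: `map`, `reduce`, and `metric`. The `map` method takes five arguments (`predicted`, `actual`, `weight`, `offset`, and `model`) and returns, for a single row, the weighted absolute percent error together with the row weight. The `reduce` method adds two such pairs element by element, and `metric` divides the accumulated error by the accumulated weight to give the final value.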

```
class MapeMetric:
    def map(self, predicted, actual, weight, offset, model):
        return [weight * abs((actual[0] - predicted[0]) / actual[0]), weight]

    def reduce(self, left, right):
        return [left[0] + right[0], left[1] + right[1]]

    def metric(self, last):
        return last[0] / last[1]
```
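
As a rough illustration of how the three methods fit together, the sketch below runs them locally on a few made-up rows. It only mimics the map/reduce/metric flow in plain Python; how the H2O cluster actually schedules and combines these calls may differ in detail. The sample rows, the unit weights, and passing `None` for `model` are assumptions made for this example.

```python
from functools import reduce

class MapeMetric:
    def map(self, predicted, actual, weight, offset, model):
        return [weight * abs((actual[0] - predicted[0]) / actual[0]), weight]

    def reduce(self, left, right):
        return [left[0] + right[0], left[1] + right[1]]

    def metric(self, last):
        return last[0] / last[1]

metric = MapeMetric()

# Hypothetical (predicted, actual) rows; each value is wrapped in a list
# because map() indexes into predicted[0] and actual[0]
rows = [([11.0], [10.0]), ([18.0], [20.0]), ([40.0], [40.0])]

# map: one [weighted error, weight] pair per row (weight 1.0, offset 0.0)
partials = [metric.map(pred, act, 1.0, 0.0, None) for pred, act in rows]

# reduce: fold the pairs into a single [sum of errors, sum of weights]
total = reduce(metric.reduce, partials)

# metric: weighted mean absolute percent error
local_mape = metric.metric(total)
print(total, round(local_mape, 4))
```

Carrying the weight alongside the error is what lets partial results be merged in any order: the sums stay correct no matter how the rows are grouped.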

This class definition is uploaded to the H2O cluster using [`h2o.upload_custom_metric`](http://docs.h2o.ai/h2o/latest-stable/h2o-py/docs/h2o.html?highlight=custom_metric#h2o.upload_custom_metric).

In [None]:
from utils_model_metrics import MapeMetric

mape_func = h2o.upload_custom_metric(MapeMetric, func_name = "MAPE", func_file = "mape.py")

In [None]:
type(mape_func)

In [None]:
print(mape_func)

## 3. Train a GBM Model using custom_metric_func to calculate MAPE

The [`H2OGeneralizedLinearEstimator`](http://docs.h2o.ai/h2o/latest-stable/h2o-py/docs/modeling.html?highlight=automl#h2ogeneralizedlinearestimator),
[`H2ORandomForestEstimator`](http://docs.h2o.ai/h2o/latest-stable/h2o-py/docs/modeling.html?highlight=automl#h2orandomforestestimator), and
[`H2OGradientBoostingEstimator`](http://docs.h2o.ai/h2o/latest-stable/h2o-py/docs/modeling.html?highlight=automl#h2ogradientboostingestimator) models accept a `custom_metric_func` argument.

In [None]:
# Train GBM Model with custom_metric_func
gbm_v2 = H2OGradientBoostingEstimator(model_id = "gbm_v2.hex",
                                      custom_metric_func = mape_func)

gbm_v2.train(y = y, x = x, training_frame = train)

In [None]:
perf = gbm_v2.model_performance()
perf

In [None]:
perf.custom_metric_name()

In [None]:
perf.custom_metric_value()

We can see that our custom MAPE function appears in the model performance metrics under the name `MAPE`. This value matches the MAPE calculated for our original GBM model.

In [None]:
print("MAPE V1: " + str(round(mape_v1, 4)))
print("MAPE V2: " + str(round(gbm_v2.model_performance().custom_metric_value(), 4)))

## 4. Train a Grid of GBMs and choose model based on MAPE

In [None]:
from h2o.grid.grid_search import H2OGridSearch
gbm_hyper_parameters = {'max_depth': [7, 8, 9]}
gbm_grid = H2OGridSearch(H2OGradientBoostingEstimator(custom_metric_func = mape_func,
                                                      nfolds = 5),
                           gbm_hyper_parameters)
gbm_grid.train(x = x, y = y, training_frame = train, grid_id = "gbm_grid")

In [None]:
# Rank grid models by cross-validated MAPE, lowest (best) first
sorted([[h2o.get_model(model_id).model_performance(xval = True).custom_metric_value(), model_id] for model_id in gbm_grid.model_ids])
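
The sorted list places the model with the lowest MAPE first. As a small sketch of the selection step, the snippet below picks the best entry from a list of `[value, model_id]` pairs. The metric values and model ids here are invented placeholders, not output from the grid above.

```python
# Hypothetical [custom metric value, model id] pairs, for illustration only
scored = [
    [0.0712, "gbm_grid_model_2"],
    [0.0698, "gbm_grid_model_3"],
    [0.0725, "gbm_grid_model_1"],
]

# Lists compare element by element, so min() picks the lowest metric value
best_value, best_id = min(scored)
print(best_id, best_value)
```

With a real grid, the chosen id could then be passed to `h2o.get_model` to retrieve the corresponding model.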

## Shutdown H2O Cluster

In [None]:
h2o.cluster().shutdown()