In [None]:
# Upgrade Oracle ADS to pick up latest features and maintain compatibility with Oracle Cloud Infrastructure.

!pip install -U oracle-ads

Oracle Data Science service sample notebook.

Copyright (c) 2019, 2022 Oracle, Inc. All rights reserved. Licensed under the [Universal Permissive License v 1.0](https://oss.oracle.com/licenses/upl).

---

# <font color="red">Model Evaluation with ADSEvaluator</font>
<p style="margin-left:10%; margin-right:10%;">by the <font color="teal">Oracle Cloud Infrastructure Data Science Service.</font></p>

---

# Overview:

This notebook will demonstrate the capabilities of the `ADSEvaluator`. It is a machine learning (ML) evaluation component of Oracle Cloud Infrastructure's Accelerated Data Science (ADS) package. You will learn how it can be used for the evaluation of any general class of supervised machine learning models, as well as comparison amongst models within the same class.   

Specifically, the notebook focuses on binary classification using an imbalanced dataset, multiclass classification using the three-class `wine` dataset, and regression using a synthetically generated dataset. The models are trained with a standard library and then evaluated using `ADSEvaluator`.

Compatible conda pack: [General Machine Learning](https://docs.oracle.com/en-us/iaas/data-science/using/conda-gml-fam.htm) for CPU on Python 3.8 (version 1.0)

## Contents:

- <a href='#binary'>Binary Classification</a>
    - <a href='#binary_data'>Data</a>
    - <a href='#binary_train'>Train</a>
    - <a href='#binary_adsmodel'>Convert to an `ADSModel`</a>
    - <a href='#binary_evaluation'>Model Evaluation</a>
- <a href='#multi'>Multiclass Classification</a>
    - <a href='#multi_data'>Data</a>
    - <a href='#multi_train'>Train</a>
    - <a href='#multi_adsmodel'>Convert to an `ADSModel`</a>
    - <a href='#multi_evaluation'>Model Evaluation</a>
- <a href='#reg'>Regression</a>
    - <a href='#reg_data'>Data</a>
    - <a href='#reg_train'>Train</a>
    - <a href='#reg_adsmodel'>Convert to an `ADSModel`</a>
    - <a href='#reg_evaluation'>Model Evaluation</a>
- <a href='#adsevaluator'>Working with `ADSEvaluator`</a>
    - <a href='#adsevaluator_metrics'>Raw Metrics</a>
    - <a href='#adsevaluator_admod'>Add and Delete Models</a>
    - <a href='#adsevaluator_admet'>Add and Delete Custom Metrics</a>
    - <a href='#adsevaluator_cost'>Calculate Cost</a>
- <a href='#ref'>References</a>
 
---


Datasets are provided as a convenience.  Datasets are considered third-party content and are not considered materials 
under your agreement with Oracle.
    
You can access the `oracle_fraud_dataset1` dataset license [here](https://oss.oracle.com/licenses/upl). 
    
You can access the `wine` dataset license [here](https://github.com/scikit-learn/scikit-learn/blob/master/COPYING).

---


In [None]:
import ads.environment.ml_runtime
import logging
import matplotlib.font_manager
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import warnings

from ads.common.data import ADSData
from ads.common.model import ADSModel
from ads.dataset.dataset_browser import DatasetBrowser
from ads.dataset.factory import DatasetFactory
from ads.evaluations.evaluator import ADSEvaluator
from os.path import join
from scipy import stats
from sklearn import tree
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression, LinearRegression, Lasso
from sklearn.metrics import fbeta_score

warnings.filterwarnings("ignore")
logging.basicConfig(format="%(levelname)s:%(message)s", level=logging.ERROR)

<a id='binary'></a>
# Binary Classification

The next few cells demonstrate one way to create a binary classification `ADSEvaluator` object. Each of these cells is modular, so you can swap in your favorite alternative and weave it back in.

<a id='binary_data'></a>
## Data

For this example, you want to predict whether or not a given transaction may be fraudulent based on a variety of columns.

The data you are using is stored in a CSV file. The following cell builds the path to the file and reads it directly.

You will use a `DatasetFactory` object from the Oracle Accelerated Data Science (ADS) library to load the data. `DatasetFactory.open()` creates an `ADSDataset` type object, which can be used for a variety of visualizations. Here you pass in the target variable, `anomalous`, and keep a 10% random sample of the rows.

In [None]:
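# Path to the sample fraud dataset (CSV file).
fraud_path = join(
    "/", "opt", "notebooks", "ads-examples", "oracle_data", "oracle_fraud_dataset1.csv"
)
# Load the data with `anomalous` as the target and keep a 10% random sample.
binary_fk = DatasetFactory.open(fraud_path, target="anomalous").sample(frac=0.1)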

Now that you have data with a target, you will split the dataset into two separate datasets. 85% of the data will be for training and 15% for model evaluation. This will be done using the `.train_test_split()` method.

In [None]:
train, test = binary_fk.train_test_split(test_size=0.15)
X_train = train.X.values
y_train = train.y.values
X_test = test.X.values
y_test = test.y.values

<a id='binary_train'></a>
## Train

Scikit-learn (sklearn) is a well-known Python library for training various kinds of machine learning models. You use it to train two classifiers:
 - __Logistic regression__: Logistic regression is a statistical model that, in its basic form, uses a logistic function to model a binary dependent variable, although many more complex extensions exist. Fitting the model means estimating the parameters of this logistic function.
 - __Random Forest__: Random forests are an ensemble learning method for classification, regression, and other tasks. They construct a multitude of decision trees at training time and output the class that is the mode of the individual trees' predictions (classification) or their mean prediction (regression).

In [None]:
lr_clf = LogisticRegression(
    random_state=0, solver="lbfgs", multi_class="multinomial"
).fit(X_train, y_train)

rf_clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X_train, y_train)

<a id='binary_adsmodel'></a>
## Convert to an `ADSModel`

The `ADSModel` class in the ADS package has a `.from_estimator()` method. It takes a fitted estimator as input and converts it into an `ADSModel` object. In the case of classification, you need to pass the class labels in the `classes` argument. The `ADSModel` object is used for evaluation in the `ADSEvaluator` object.

In [None]:
bin_lr_model = ADSModel.from_estimator(lr_clf, classes=[0, 1])
bin_rf_model = ADSModel.from_estimator(rf_clf, classes=[0, 1])

<a id='binary_evaluation'></a>
## Model Evaluation

To instantiate an `ADSEvaluator` object, the two main parameters are:
 - __ADSData__: The `ADSData` object for the test set prepared earlier
 - __Models__: The `ADSModel` objects for the logistic regression and random forest. 

In [None]:
bin_evaluator = ADSEvaluator(
    test, models=[bin_lr_model, bin_rf_model], training_data=train
)

 <a id='plot'></a>
The `ADSEvaluator` object has a `.show_in_notebook()` method, which can be used to visualize a variety of machine learning evaluation metrics. For binary classification you can view the following:
- __gain_chart__: A plot of gain vs % baseline positives (true positive rate or recall vs predicted positive rate). [Read more](http://mlwiki.org/index.php/Cumulative_Gain_Chart).
- __ks_statistics__: (or the Kolmogorov–Smirnov statistic) A nonparametric plot of the difference in the distributions of both labels. [Read more](https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test).
- __lift_chart__: A plot of lift vs % baseline positives (cumulative gain/total gain vs predicted positive rate). [Read more](https://en.wikipedia.org/wiki/Lift_(data_mining)).
- __pr_curve__: A plot of precision vs recall (the proportion of positive class predictions that were correct vs the proportion of positive class objects that were correctly identified). [Read more](https://en.wikipedia.org/wiki/Precision_and_recall).
- __normalized_confusion_matrix__: A matrix of the number of actual vs predicted values for each class, normalized by the number of true labels per class (rows). [Read more](https://en.wikipedia.org/wiki/Confusion_matrix).
- __roc_curve__: A plot of true positive rate vs false positive rate (recall vs the proportion of negative class objects that were identified incorrectly). [Read more](https://en.wikipedia.org/wiki/Receiver_operating_characteristic).
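
To make the gain and lift charts more concrete, the following is a minimal, dependency-free sketch of how the two curves can be computed from predicted scores. It is not the `ADSEvaluator` implementation; the function name `cumulative_gain_and_lift` and the toy data are illustrative assumptions.

```python
def cumulative_gain_and_lift(y_true, scores):
    """Return (fractions, gains, lifts) for a binary problem.

    y_true: 0/1 labels; scores: predicted score of the positive class.
    """
    # Rank examples from the highest to the lowest predicted score.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(y_true)
    n = len(ranked)

    fractions, gains, lifts = [], [], []
    found = 0
    for k, (_, label) in enumerate(ranked, start=1):
        found += label
        frac = k / n              # share of examples targeted so far
        gain = found / total_pos  # share of all positives captured (recall)
        fractions.append(frac)
        gains.append(gain)
        lifts.append(gain / frac)  # improvement over random targeting
    return fractions, gains, lifts


# Toy data, invented for illustration only.
y_true = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
fractions, gains, lifts = cumulative_gain_and_lift(y_true, scores)
```

A lift above 1 at a given fraction means the model finds positives faster than random selection would. Once every example is targeted, the lift is 1 by construction.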

In [None]:
bin_evaluator.show_in_notebook(perfect=True)

 Note on the parameters:
 - If `perfect` is set to `True`, a perfect classifier is plotted for comparison in the lift and gain charts.
 - If `baseline` is set to `True`, the baseline is included for comparison in the various plots.
 - If `use_training_data` is set to `True`, the training data is used for the evaluation.
 - If `plots` contains a list of plot types, then only those plot types will be displayed.

In [None]:
bin_evaluator.show_in_notebook(
    ["gain_chart", "lift_chart"], baseline=False, use_training_data=True
)

 <a id='met'></a>
Further, you can compare various metrics using the `metrics` property of the `ADSEvaluator` object. For binary classification, the following metrics are available:
- __accuracy__: Proportion of correctly classified examples.
- __auc__: Area under the ROC curve. 
- __f1__: Harmonic mean of precision and recall.
- __hamming_loss__: Proportion of incorrectly classified examples.
- __precision__: Proportion of positive class predictions that were correct.
- __recall__: Proportion of positive class examples that were correctly identified.
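
As a rough illustration of the definitions above, the threshold-based metrics can be derived from the four counts of a binary confusion matrix. This is a sketch under simple assumptions (labels encoded as 0 and 1, a hypothetical helper named `binary_metrics`), not the code `ADSEvaluator` runs. The `auc` metric is left out because it needs predicted scores rather than hard labels.

```python
def binary_metrics(y_true, y_pred):
    """Compute basic binary classification metrics from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = len(pairs)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / n,
        "f1": f1,
        "hamming_loss": (fp + fn) / n,
        "precision": precision,
        "recall": recall,
    }


# Toy labels, invented for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
```

In this toy example accuracy and hamming loss sum to 1, which always holds for single-label classification.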

The metrics in blue are the best for that row of models on testing data. The metrics in yellow are the best for that row of models on training data.

In [None]:
bin_evaluator.metrics

You can select specific metrics by extracting them from the Pandas DataFrames `test_evaluations` or `train_evaluations` and indexing the row with `.loc['metric_name']`. For example, you can show only the __precision__ metric.

In [None]:
bin_evaluator.test_evaluations.loc["precision"]

<a id='multi'></a>
# Multiclass Classification

This example is similar to the <a href='#binary'>binary classification</a> example. However, instead of predicting one of two classes, you will be predicting one of three classes. This adds a level of complexity to both the model and the evaluation metrics.

<a id='multi_data'></a>
## Data

Here you will use the wine dataset from sklearn. This dataset contains certain features about different wines, and your goal is to predict the type. You can load the data using the `DatasetBrowser` object as shown in the following cell.

In [None]:
multi_ds = DatasetBrowser.sklearn().open("wine").set_target("target")

Now that you have data with a target, you will split the dataset into two separate datasets. 85% of the data will be for training and 15% for model evaluation. This will be done using the `.train_test_split()` method.

In [None]:
train, test = multi_ds.train_test_split(test_size=0.15)
X_train = train.X.values
y_train = train.y.values
X_test = test.X.values
y_test = test.y.values

<a id='multi_train'></a>
## Train

Use sklearn to train two multi-class classifiers:
- __Multinomial Logistic Regression__: Multinomial logistic regression is a classification method that generalizes logistic regression to multiclass problems, i.e. with more than two possible discrete outcomes. It is also known as softmax regression, because of the use of a softmax function. The logistic function used in binary logistic regression is a special case of the softmax function for two outcomes.
- __Random Forest__: Random forests are an ensemble learning method for classification, regression and other tasks that operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or mean prediction (regression) of the individual trees.
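
The relationship between the logistic function and the softmax function mentioned above can be checked numerically. The sketch below uses only the standard library; the helper names `sigmoid` and `softmax` and the example logits are illustrative.

```python
import math


def sigmoid(z):
    """Logistic function used in binary logistic regression."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    """Softmax over a list of logits, shifted by the max for stability."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


z = 1.5
# With two outcomes and the second logit fixed at 0, the softmax
# probability of the first outcome equals sigmoid(z).
two_class = softmax([z, 0.0])

# With three outcomes, softmax returns one probability per class.
three_class = softmax([2.0, 1.0, 0.1])
```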

In [None]:
lr_clf = LogisticRegression(
    random_state=0, solver="lbfgs", multi_class="multinomial"
).fit(X_train, y_train)

rf_clf = RandomForestClassifier(n_estimators=10).fit(X_train, y_train)

<a id='multi_adsmodel'></a>
## Convert to an `ADSModel`

Similar to the case of binary classification, you need to pass in the class labels using the `classes` argument. There are three classes in the dataset, __0__, __1__ and __2__.

In [None]:
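# Pass the three class labels explicitly, as in the binary example.
lr_model = ADSModel.from_estimator(lr_clf, classes=[0, 1, 2])
rf_model = ADSModel.from_estimator(rf_clf, classes=[0, 1, 2])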

<a id='multi_evaluation'></a>
## Model Evaluation

Similar to the binary classification problem, to instantiate an `ADSEvaluator` object you need the following objects:
- __ADSData__: The `ADSData` object for the test set prepared earlier
- __Models__: The `ADSModel` objects for the multinomial logistic regression and random forest.

In [None]:
multi_evaluator = ADSEvaluator(test, models=[lr_model, rf_model])

For multi-class classification you can view the following using `show_in_notebook()`:
- __confusion_matrix__: A matrix of the number of actual vs predicted values for each class. [Read more](https://en.wikipedia.org/wiki/Confusion_matrix)
- __f1_by_label__: Harmonic mean of precision_by_label and recall_by_label. Compute f1 for each label, __3__ f1 scores in this example. [Read more](https://en.wikipedia.org/wiki/F1_score)
- __jaccard_by_label__: Computes the similarity for each label distribution. [Read more](https://en.wikipedia.org/wiki/Jaccard_index)
- __pr_curve__: A plot of precision vs recall (the proportion of positive class predictions that were correct vs the proportion of positive class objects that were correctly identified). [Read more](https://en.wikipedia.org/wiki/Precision_and_recall).
- __precision_by_label__: Consider one label as positive class and rest as negative. Compute precision for each, __3__ precision numbers in this example. [Read more](https://en.wikipedia.org/wiki/Precision_(statistics))
- __recall_by_label__: Consider one label as positive class and rest as negative. Compute recall for each, __3__ recall numbers in this example. [Read more](https://en.wikipedia.org/wiki/Precision_and_recall)
- __roc_curve__: A plot of true positive rate vs false positive rate (recall vs the proportion of negative class objects that were identified incorrectly). [Read more](https://en.wikipedia.org/wiki/Receiver_operating_characteristic).
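
The by-label quantities in this list follow a one-vs-rest pattern: each label in turn is treated as the positive class. A minimal sketch of that pattern, starting from a confusion matrix, is shown here. The helper `per_label_metrics` and the matrix values are invented for illustration and are not taken from the wine results.

```python
def per_label_metrics(cm):
    """cm[i][j] counts examples with true label i and predicted label j."""
    n_labels = len(cm)
    results = []
    for k in range(n_labels):
        tp = cm[k][k]
        # True label k, predicted as another label.
        fn = sum(cm[k]) - tp
        # Predicted label k, true label is another label.
        fp = sum(cm[i][k] for i in range(n_labels)) - tp

        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = (
            2 * precision * recall / (precision + recall)
            if (precision + recall)
            else 0.0
        )
        jaccard = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
        results.append(
            {"precision": precision, "recall": recall, "f1": f1, "jaccard": jaccard}
        )
    return results


# Illustrative 3-class confusion matrix (rows: true, columns: predicted).
cm = [
    [5, 1, 0],
    [1, 6, 1],
    [0, 2, 4],
]
by_label = per_label_metrics(cm)
```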

In [None]:
multi_evaluator.show_in_notebook()

For multi-class classification, you can have the following metrics:
- __accuracy__: Number of correctly classified examples divided by total examples
- __f1_micro__: Global f1. Can be calculated by using the harmonic mean of __precision_micro__ and __recall_micro__.
- __f1_weighted__: Weighted average of __f1_by_label__. Weights are proportional to the number of true instances for each label.
- __hamming_loss__: 1 - accuracy
- __precision_micro__: Global precision. Calculated by using global true positives and false positives.
- __precision_weighted__: Weighted average of __precision_by_label__. Weights are proportional to the number of true instances for each label.
- __recall_micro__: Global recall. Calculated by using global true positives and false negatives. 
- __recall_weighted__: Weighted average of __recall_by_label__. Weights are proportional to the number of true instances for each label.

All of these metrics can be computed directly from the confusion matrix. 
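
As a sketch of that claim, the micro and weighted averages can be derived from a confusion matrix alone. The helper `micro_and_weighted` and the matrix below are illustrative assumptions, not ADS code or wine results.

```python
def micro_and_weighted(cm):
    """cm[i][j] counts examples with true label i and predicted label j."""
    n_labels = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(n_labels))

    # Micro averaging pools the counts of all labels before dividing.
    fp = sum(
        sum(cm[i][k] for i in range(n_labels)) - cm[k][k] for k in range(n_labels)
    )
    fn = sum(sum(cm[k]) - cm[k][k] for k in range(n_labels))
    precision_micro = correct / (correct + fp)
    recall_micro = correct / (correct + fn)

    # Weighted averaging scores each label, then weights by its support
    # (the number of true instances of that label).
    precision_weighted = 0.0
    recall_weighted = 0.0
    for k in range(n_labels):
        support = sum(cm[k])
        predicted = sum(cm[i][k] for i in range(n_labels))
        p_k = cm[k][k] / predicted if predicted else 0.0
        r_k = cm[k][k] / support if support else 0.0
        precision_weighted += (support / total) * p_k
        recall_weighted += (support / total) * r_k

    accuracy = correct / total
    return {
        "accuracy": accuracy,
        "hamming_loss": 1 - accuracy,
        "precision_micro": precision_micro,
        "recall_micro": recall_micro,
        "precision_weighted": precision_weighted,
        "recall_weighted": recall_weighted,
    }


# Illustrative 3-class confusion matrix (rows: true, columns: predicted).
cm = [
    [5, 1, 0],
    [1, 6, 1],
    [0, 2, 4],
]
summary = micro_and_weighted(cm)
```

For single-label multiclass problems such as this one, micro precision, micro recall, and accuracy coincide, because every misclassified example counts once as a false positive and once as a false negative.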

In [None]:
multi_evaluator.metrics

<a id='reg'></a>
# Regression

In this section, you will see another example of building an `ADSEvaluator` object. However, this section is a regression problem instead of the previous classification problems.

<a id='reg_data'></a>
## Data

The next cell creates a synthetic regression dataset with 10,000 samples and 10 features, of which 2 are informative.

In [None]:
_X, _y = make_regression(
    n_samples=10000, n_features=10, n_informative=2, random_state=42
)
df = pd.DataFrame(_X, columns=["F{}".format(x) for x in range(10)])
df["target"] = pd.Series(_y)
reg_ds = DatasetFactory.open(df).set_target("target")

Now that you have data with a target, you will split the dataset into two separate datasets. 85% of the data will be for training and 15% for model evaluation. This will be done using the `.train_test_split()` method.

In [None]:
train, test = reg_ds.train_test_split(test_size=0.15)
X_train = train.X.values
y_train = train.y.values
X_test = test.X.values
y_test = test.y.values

<a id='reg_train'></a>
## Train

You will use sklearn to train two regression models:
- __Linear Regression__: Linear regression is a linear approach to modeling the relationship between a scalar response and one or more predictor variables. The model is fit by minimizing the sum of the squared residuals.
- __Lasso__: Lasso (least absolute shrinkage and selection operator) is a regression analysis method that performs both variable selection and regularization in order to enhance the prediction accuracy and interpretability of the statistical model it produces. Mathematically, it applies the L1-penalty to the least-squares of the residuals loss function in linear regression.
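
Written out, the two objectives differ only in the penalty term. The notation here is an assumption introduced for illustration: $n$ samples, feature matrix $X$, targets $y$, coefficients $w$, and the intercept omitted for brevity. The $\frac{1}{2n}$ scaling follows the convention used by scikit-learn's `Lasso`, where $\alpha$ is the regularization strength.

$$
\hat{w}_{\text{OLS}} = \arg\min_{w} \lVert y - Xw \rVert_2^2
\qquad
\hat{w}_{\text{lasso}} = \arg\min_{w} \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1
$$

The $L_1$ penalty tends to push the coefficients of uninformative features to exactly zero, which is what gives the lasso its variable selection behavior.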

In [None]:
lin_reg = LinearRegression().fit(X_train, y_train)

lasso_reg = Lasso(alpha=0.1).fit(X_train, y_train)

<a id='reg_adsmodel'></a>
## Convert to an `ADSModel`

In the case of regression, you only need to pass in the fitted estimators (regression models in this case).

In [None]:
lin_reg_model = ADSModel.from_estimator(lin_reg)
lasso_reg_model = ADSModel.from_estimator(lasso_reg)

<a id='reg_evaluation'></a>
## Model Evaluation

Like before, the `ADSEvaluator` object needs two main things for instantiation:
 - __ADSData__: The `ADSData` object for the test set prepared earlier
 - __Models__: The `ADSModel` objects for the linear regression and lasso regression.

In [None]:
reg_evaluator = ADSEvaluator(test, models=[lin_reg_model, lasso_reg_model])

For regression you can view the following using `show_in_notebook()`:
- __observed_v_predicted__: Plot of the observed, or actual values against your predicted values output by your models.
- __residuals_qq__: Quantile-quantile plot between residuals and quantiles of a standard normal distribution. Should be very close to a straight line for a good model. 
- __residuals_vs_observed__: Plot of residuals vs observed values. This should not carry a lot of structure in a good model.
- __residuals_vs_predicted__: Plot of residuals vs predicted values. This, too, should not carry a lot of structure in a good model.

In [None]:
reg_evaluator.show_in_notebook()

For regression, you can use the `.metrics` property to see the following:
- __explained_variance_score__: Proportion of the variance of the true values that the model's predictions account for, computed as 1 minus the ratio of the variance of the residuals to the variance of the true values. [Read more](https://en.wikipedia.org/wiki/Explained_variation)
- __mean_absolute_error__: Mean of the absolute difference between the true values and predicted values. [Read more](https://en.wikipedia.org/wiki/Mean_absolute_error)
- __mean_residuals__: Mean of the difference between the true values and predicted values. [Read more](https://en.wikipedia.org/wiki/Errors_and_residuals)
- __mean_squared_error__: Mean of the squared difference between the true values and predicted values. [Read More](https://en.wikipedia.org/wiki/Mean_squared_error)
- __r2_score__: Also known as __coefficient of determination__, is the proportion of the variance in the dependent variable that is predictable from the independent variables. [Read more](https://en.wikipedia.org/wiki/Coefficient_of_determination)
- __root_mean_squared_error__: Square root of __mean_squared_error__. [Read more](https://en.wikipedia.org/wiki/Root-mean-square_deviation)
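
The definitions in this list can be sketched in a few lines of plain Python. The helper `regression_metrics` and the four toy values are assumptions for illustration; the formulas follow the standard definitions of these metrics rather than the ADS source.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute common regression metrics from true and predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_residual = sum(residuals) / n
    mae = sum(abs(r) for r in residuals) / n
    mse = sum(r * r for r in residuals) / n

    mean_true = sum(y_true) / n
    var_true = sum((t - mean_true) ** 2 for t in y_true) / n
    var_resid = sum((r - mean_residual) ** 2 for r in residuals) / n

    return {
        # 1 - Var(residuals) / Var(y): ignores a constant bias in predictions.
        "explained_variance_score": 1 - var_resid / var_true,
        "mean_absolute_error": mae,
        "mean_residuals": mean_residual,
        "mean_squared_error": mse,
        # 1 - SS_res / SS_tot: penalizes a constant bias as well.
        "r2_score": 1 - mse / var_true,
        "root_mean_squared_error": math.sqrt(mse),
    }


# Toy values, invented for illustration only.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
reg_metrics = regression_metrics(y_true, y_pred)
```

The explained variance score and the R² score are equal only when the mean residual is zero. In this toy example the predictions are biased, so R² is slightly lower.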

In [None]:
reg_evaluator.metrics

<a id='adsevaluator'></a>
# Working with `ADSEvaluator`

In the previous sections, you generated various types of `ADSEvaluator` objects and examined the associated plots and metrics. This section covers a few more advanced features of the `ADSEvaluator` class.

<a id='adsevaluator_metrics'></a>
## Raw Metrics

Going back to the original binary classification problem, you will access the raw metrics from your `ADSEvaluator` object. The results are returned in JSON format.

In [None]:
bin_evaluator.raw_metrics

<a id='adsevaluator_admod'></a>
## Add and Delete Models

You can also add models for evaluation later on by using the `.add_models([model_list])` method. For example, assume you just read a paper suggesting that decision tree classifiers might be better at capturing a part of your data, and you wish to add one to an existing `ADSEvaluator` object. You need to create this model and then add it to the `ADSEvaluator` object. The following cells demonstrate how to do this.

In [None]:
train, _ = binary_fk.train_test_split(test_size=0.15)
X_train = train.X.values
y_train = train.y.values

tree_mod = tree.DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)

bin_tree_model = ADSModel.from_estimator(tree_mod, classes=[0, 1])

In [None]:
bin_evaluator.add_models([bin_tree_model])

In [None]:
bin_evaluator.metrics

Looking at the metrics summary above, this model doesn't seem to have improved any of the metrics you are interested in. Therefore, you may want to remove this model to de-clutter the output. To do this, use the `del_models()` method:

In [None]:
bin_evaluator.del_models(["DecisionTreeClassifier"])

In [None]:
bin_evaluator.metrics

<a id='adsevaluator_admet'></a>
## Add and Delete Custom Metrics

Just as you can add and delete models, you can add and delete metrics. This is for problems that require specialized metrics not yet supported by the `ADSEvaluator` class. For example, with your highly imbalanced dataset of fraudulent transactions, you might find that the $F_2$ score is more relevant than the $F_1$ score. The $F_2$ score is not among the metrics that `ADSEvaluator` reports by default, so you write a function that computes it, here with scikit-learn's `fbeta_score` and `beta=2`. Once you define the function, you pass it into the `ADSEvaluator` object via the `.add_metrics()` method. The function receives the true values and the predicted values from your model, and its result appears in your `evaluator.metrics` output.
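
For reference, the $F_\beta$ score combines precision $P$ and recall $R$ as $(1 + \beta^2) P R / (\beta^2 P + R)$, so $\beta = 2$ weights recall more heavily than precision. The following dependency-free sketch illustrates the formula; the helper name `fbeta` and the toy labels are assumptions for illustration only.

```python
def fbeta(y_true, y_pred, beta):
    """F-beta score for 0/1 labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Toy labels: recall (0.75) is higher than precision (0.5).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 1, 0, 0, 0]
f1 = fbeta(y_true, y_pred, beta=1)
f2 = fbeta(y_true, y_pred, beta=2)
```

Because recall exceeds precision in this toy example, the $F_2$ score comes out higher than the $F_1$ score.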

`.add_metrics()` accepts multiple functions passed in as a list. To demonstrate this, you will pass in a metric that tells you the number of correct predictions.

Due to a limitation in how Jupyter handles functions created in a notebook, if you intend to use any of these functions within `AutoML`, you need to define them in a separate file and import them.

In [None]:
def func1(y_true, y_pred):
    return sum(y_true == y_pred)


def func2(y_true, y_pred):
    return fbeta_score(y_true, y_pred, beta=2)


bin_evaluator.add_metrics([func1, func2], ["Total True", "F2 Score"])

In [None]:
bin_evaluator.metrics

These metrics have been interesting, but ultimately they didn't add much. You can delete them from your `ADSEvaluator` object.

In [None]:
bin_evaluator.del_metrics(["Total True", "F2 Score"])

In [None]:
bin_evaluator.metrics

<a id='adsevaluator_cost'></a>
## Calculate Cost

The `.calculate_cost()` method helps you evaluate your binary classification model based on your own weighting of the problem. For example, in a medical diagnosis setting where catching true positives matters much more than avoiding false positives, you can use this method to quantify that difference.
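
One plausible way to read such a cost is as a weighted sum of the four cells of the binary confusion matrix. That reading is an assumption made here for illustration and is not taken from the ADS source. The helper `weighted_cost` and the two toy prediction vectors are hypothetical.

```python
def weighted_cost(y_true, y_pred, tn_weight, fp_weight, fn_weight, tp_weight):
    """Weighted sum of confusion matrix counts for 0/1 labels (illustrative)."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            fn += 1
    return tn_weight * tn + fp_weight * fp + fn_weight * fn + tp_weight * tp


y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
cautious = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]  # 3 false positives, 0 false negatives
lenient = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]   # 0 false positives, 2 false negatives

# Equal cost for both kinds of error: the lenient predictions look cheaper.
equal_cautious = weighted_cost(y_true, cautious, 0, 1, 1, 0)
equal_lenient = weighted_cost(y_true, lenient, 0, 1, 1, 0)

# False negatives cost ten times more: the ranking flips.
skewed_cautious = weighted_cost(y_true, cautious, 0, 1, 10, 0)
skewed_lenient = weighted_cost(y_true, lenient, 0, 1, 10, 0)
```

The point of the sketch is that the preferred model depends on the weights you choose, not only on the raw error counts.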

The method requires four parameters, `tn_weight`, `fp_weight`, `fn_weight`, and `tp_weight`, which are the weights of the four cells of a binary confusion matrix. Some example weightings follow:

In [None]:
bin_evaluator.calculate_cost(0, 1, 1, 0)

Now assume you want a 100:1 ratio between the cost of a false positive and the cost of a false negative, meaning that you would much rather have 99 negative predictions that are wrong than 1 positive prediction that is wrong. Here you can see that, while logistic regression won out on pure accuracy, the random forest classifier was actually better for the cases you care about.

In [None]:
bin_evaluator.calculate_cost(0, 1, 0.01, 0)

 <a id='ref'></a>
# References

- [ADS Library Documentation](https://accelerated-data-science.readthedocs.io/en/latest/index.html)
- [Data Science YouTube Videos](https://www.youtube.com/playlist?list=PLKCk3OyNwIzv6CWMhvqSB_8MLJIZdO80L)
- [OCI Data Science Documentation](https://docs.cloud.oracle.com/en-us/iaas/data-science/using/data-science.htm)
- [Oracle Data & AI Blog](https://blogs.oracle.com/datascience/)
- Pedregosa, Fabian, et al. [Scikit-learn: Machine learning in Python.](http://jmlr.csail.mit.edu/papers/v12/pedregosa11a.html) Journal of machine learning research 12.Oct (2011): 2825-2830.
- Tibshirani, Robert. [Regression shrinkage and selection via the lasso.](https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/j.2517-6161.1996.tb02080.x) Journal of the Royal Statistical Society: Series B (Methodological) 58.1 (1996): 267-288.