## Disclaimer and license
Copyright 2022 Google LLC. This solution, including any related sample code or data, is made available on an “as is,” “as available,” and “with all faults” basis, solely for illustrative purposes, and without warranty or representation of any kind. This solution is experimental, unsupported and provided solely for your convenience. Your use of it is subject to your agreements with Google, as applicable, and may constitute a beta feature as defined under those agreements. To the extent that you make any data available to Google in connection with your use of the solution, you represent and warrant that you have all necessary and appropriate rights, consents and permissions to permit Google to use and process that data. By using any portion of this solution, you acknowledge, assume and accept all risks, known and unknown, associated with its usage, including with respect to your deployment of any portion of this solution in your systems, or usage in connection with your business, if at all.

Copyright 2022 Google LLC

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

# 5. Model Evaluation and Diagnostics

This notebook demonstrates how to evaluate an LTV regression model by using the
[Regression Diagnostics](https://github.com/google/gps_building_blocks/blob/master/py/gps_building_blocks/ml/diagnostics/regression.py)
module.

This evaluation consists of:
* Model performance with respect to a variety of metrics.
* Plots that show the model performance and help in designing media experiments.
* Model insights (the relationship between features and the prediction values) that help generate new business insights.
* Insights that help diagnose the model to make sure it is reasonable.

## Requirements
The model and the testing dataset should be available in GCP BigQuery.

## Install and import required modules

In [None]:
# Uncomment to install required python modules
#!sh ../utils/setup.sh

In [None]:
# Add custom utils module to Python environment
import os
import sys
sys.path.append(os.path.abspath(os.pardir))

from gps_building_blocks.cloud.utils import bigquery as bigquery_utils
from gps_building_blocks.ml.diagnostics import regression
from utils import helpers

## Set parameters

In [None]:
configs = helpers.get_configs('config.yaml')
dest_configs = configs.destination

# GCP project ID
PROJECT_ID = dest_configs.project_id
# Name of the BigQuery dataset
DATASET_NAME = dest_configs.dataset_name
# Identifier to distinguish separate runs of the training pipeline
RUN_ID = '01'
# BigQuery table name containing model development dataset
FEATURES_DEV_TABLE = f'features_dev_table_{RUN_ID}'
# BigQuery table name containing out of time test dataset
FEATURES_TEST_TABLE = f'features_test_table_{RUN_ID}'
# BQML model name to save in BigQuery
BQML_MODEL_NAME = f'ltv_model_bqml_{RUN_ID}'
# BigQuery table name containing the testing predictions dataset
PREDICTION_TABLE = 'predictions_table'
# Name of the column in the prediction table with the predicted label
PREDICTED_LABEL = 'predicted_label'
# Name of the column in the prediction table with the actual label
ACTUAL_LABEL = 'label'

In [None]:
# Initialize BigQuery client.
bq_utils = bigquery_utils.BigQueryUtils(project_id=PROJECT_ID)

## Create test dataset for prediction (if not predicted)
This step creates the prediction dataset by applying the model to the test dataset, for use in the model performance diagnostics. Skip this step if the prediction table for the test dataset already exists.


In [None]:
prediction_query = f"""
  CREATE OR REPLACE TABLE `{PROJECT_ID}.{DATASET_NAME}.{PREDICTION_TABLE}`
  AS (
    SELECT *
    FROM ML.PREDICT(MODEL `{PROJECT_ID}.{DATASET_NAME}.{BQML_MODEL_NAME}`,
                    TABLE `{PROJECT_ID}.{DATASET_NAME}.{FEATURES_TEST_TABLE}`)
  );
"""
print(prediction_query)
bq_utils.run_query(prediction_query)

## Read test dataset for prediction (if already predicted)

In this step, we assume the test dataset containing both prediction and actual values is available in BigQuery.

In [None]:
sql = f"""
  SELECT *
  FROM `{PROJECT_ID}.{DATASET_NAME}.{PREDICTION_TABLE}`
  ;
"""
df_pred_test = bq_utils.run_query(sql).to_dataframe()
df_pred_test.head()

In [None]:
# Clip negative predictions to 0.0 to avoid errors in the metrics calculation.
df_pred_test[PREDICTED_LABEL] = df_pred_test[PREDICTED_LABEL].clip(lower=0.0)

## Run regression diagnostics

### Calculate performance metrics

To check the regression model's performance, we calculate the following diagnostic metrics:
- Mean squared error: a risk metric corresponding to the expected value of the squared error, calculated as the mean of the squared errors.
- Root mean squared error: the square root of the mean squared error.
- Mean absolute error: a risk metric corresponding to the expected value of the absolute error, calculated as the mean of the absolute errors.
- Mean absolute percentage error: an evaluation metric for regression problems that is sensitive to relative errors, calculated as the mean of the absolute percentage errors.
- R-squared: the coefficient of determination, representing the proportion of variance in the actual label that is explained by the independent variables in the model.
- Pearson correlation: the linear correlation between actual and predicted labels.

In [None]:
perf_metrics = regression.calc_performance_metrics(
    labels=df_pred_test[ACTUAL_LABEL],
    predictions=df_pred_test[PREDICTED_LABEL].values,
    decimal_points=4)

print(perf_metrics)
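
As a cross-check of the metric definitions above, the following is a minimal NumPy sketch that computes them by hand on toy data. The function name `sketch_regression_metrics`, the dictionary keys, and the toy values are hypothetical and not part of the Regression Diagnostics module, whose implementation may differ (for example, in how MAPE treats zero labels or how values are rounded).

```python
import numpy as np


def sketch_regression_metrics(labels, predictions):
    """Hand-rolled versions of the listed metrics (illustrative only)."""
    y = np.asarray(labels, dtype=float)
    y_hat = np.asarray(predictions, dtype=float)
    errors = y - y_hat

    mse = np.mean(errors ** 2)
    rmse = np.sqrt(mse)
    mae = np.mean(np.abs(errors))

    # Assumption: rows with a zero label are skipped, because the
    # percentage error is undefined there. MAPE is returned as a fraction.
    nonzero = y != 0
    mape = np.mean(np.abs(errors[nonzero] / y[nonzero])) if nonzero.any() else np.nan

    # R-squared = 1 - (residual sum of squares / total sum of squares).
    ss_res = np.sum(errors ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r_squared = 1.0 - ss_res / ss_tot

    pearson_corr = np.corrcoef(y, y_hat)[0, 1]

    return {
        'mse': float(mse),
        'rmse': float(rmse),
        'mae': float(mae),
        'mape': float(mape),
        'r_squared': float(r_squared),
        'pearson_corr': float(pearson_corr),
    }


# Toy data, made up for illustration.
toy_metrics = sketch_regression_metrics(
    labels=[10.0, 20.0, 30.0, 40.0],
    predictions=[12.0, 18.0, 33.0, 37.0])
print(toy_metrics)
```

Comparing such hand-computed values with `perf_metrics` on a small sample can help confirm how each reported metric is scaled.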

### Scatter plots of prediction values and residuals

The following function plots:
1. the scatter plot of actual values versus prediction values
2. the scatter plot of residuals versus predicted values

The first plot shows how the prediction values differ from the actual values and whether the two are strongly correlated. The second plot, the residual plot, shows how the residuals deviate from the zero line, where a residual equals the actual value minus the prediction value. An ideal residual plot is symmetrically distributed around y=0 and clusters close to it.

In [None]:
regression.plot_prediction_residuals(
    labels=df_pred_test[ACTUAL_LABEL],
    predictions=df_pred_test[PREDICTED_LABEL].values)
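
To make the residual definition concrete, here is a small sketch that computes residuals as actual minus predicted on made-up values and summarizes how they sit around zero. The variable names and numbers are hypothetical; the summary statistics are just one possible way to quantify the symmetry that the residual plot shows visually.

```python
import numpy as np

# Toy data, made up for illustration.
toy_actual = np.array([5.0, 12.0, 0.0, 30.0])
toy_predicted = np.array([4.0, 15.0, 1.0, 26.0])

# Residual = actual value - prediction value.
toy_residuals = toy_actual - toy_predicted

residual_summary = {
    # Close to 0 when over- and under-predictions balance out.
    'mean_residual': float(toy_residuals.mean()),
    # Close to 0.5 when residuals are spread evenly around zero.
    'share_positive': float((toy_residuals > 0).mean()),
    # Largest deviation from the zero line.
    'max_abs_residual': float(np.abs(toy_residuals).max()),
}
print(toy_residuals)
print(residual_summary)
```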

### Calculate performance metrics for the bins of the prediction values

The following function calculates performance metrics for each bin of the predictions. The number of bins is a parameter; here we use 3 as an example. The function first ranks the prediction values from the highest to the lowest and then divides them into 3 bins, such that the first bin contains the highest 33.33% of the predictions, the second bin contains the next 33.33%, and so on. We suggest starting with a small number of bins, such as 3, and then increasing it, for example to 10, to check the performance at a finer granularity. Note that if the proportion of zeros in the predictions and actual labels is too high, some bins may contain only zeros, which causes problems in the performance metrics calculation. The following metrics are calculated for each bin:
- mean_label: Mean of actual values in the bin.
- mean_prediction: Mean of predictions in the bin.
- rmse: Root mean squared error.
- mape: Mean absolute percentage error.
- corr: Pearson correlation coefficient.

In [None]:
bin_metrics = regression.calc_reg_bin_metrics(
    labels=df_pred_test[ACTUAL_LABEL],
    predictions=df_pred_test[PREDICTED_LABEL].values,
    number_bins=3)

bin_metrics
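
The ranking and binning described above can be sketched with pandas as follows. This is an illustration under stated assumptions, not the module's implementation: the function name `sketch_bin_metrics`, the tie-breaking rule (`method='first'`), and the extra `share_zero_labels` column are hypothetical additions. The last column is included only to show how bins dominated by zero labels could be spotted before metrics such as MAPE are computed.

```python
import numpy as np
import pandas as pd


def sketch_bin_metrics(labels, predictions, number_bins=3):
    """Ranks predictions from highest to lowest and summarizes each bin."""
    df = pd.DataFrame({
        'label': np.asarray(labels, dtype=float),
        'prediction': np.asarray(predictions, dtype=float),
    })
    # Rank 1 is the highest prediction. Assumption: ties are broken by
    # order of appearance so that the bins have (near) equal sizes.
    ranks = df['prediction'].rank(method='first', ascending=False)
    # Bin 1 holds the top-ranked predictions, bin 2 the next group, etc.
    df['bin'] = pd.qcut(ranks, q=number_bins, labels=False) + 1

    rows = []
    for bin_id, group in df.groupby('bin'):
        errors = group['label'] - group['prediction']
        rows.append({
            'bin': int(bin_id),
            'mean_label': group['label'].mean(),
            'mean_prediction': group['prediction'].mean(),
            'rmse': float(np.sqrt((errors ** 2).mean())),
            'share_zero_labels': float((group['label'] == 0).mean()),
        })
    return pd.DataFrame(rows)


# Toy data, made up for illustration.
toy_bins = sketch_bin_metrics(
    labels=[90, 80, 55, 50, 20, 10],
    predictions=[100, 70, 60, 40, 15, 5],
    number_bins=3)
print(toy_bins)
```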

### Plot performance metrics for the bins of the prediction values

The following function plots the performance metrics for each bin, using the bin_metrics calculated in the previous section. These plots give a better understanding of the predictions on the test dataset. The first subplot shows the mean of the actual and prediction values in each bin. Ideally, the bar heights of both the prediction and actual values decrease as the bin number increases. In the remaining three plots (MAPE, RMSE, corr), each metric has the same interpretation as for the regression model overall.

In [None]:
regression.plot_reg_bin_metrics(
    bin_metrics=bin_metrics,
    fig_width=25,
    fig_height=30);

### Plot actual distribution over the bins of the prediction values

The following function plots the distribution of actual label values over the bins of the prediction values. The plot provides better insight into the actual value distribution in each bin of the predictions. The boxplots show the median, the spread, the interquartile range, and any outliers in each prediction bin. We expect a monotonically decreasing trend over the bins from the highest to the lowest predictions.

In [None]:
regression.plot_binned_preds_labels(
    labels=df_pred_test[ACTUAL_LABEL],
    predictions=df_pred_test[PREDICTED_LABEL].values,
    number_bins=3);
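
The quantities a boxplot summarizes can also be computed directly. The sketch below does so for actual labels that are assumed to be already grouped by prediction bin; the helper `sketch_box_stats`, the toy values, and the 1.5 x IQR outlier rule are assumptions for illustration, and the plotting function above may use different settings.

```python
import numpy as np


def sketch_box_stats(values):
    """Returns the quartiles, IQR and outliers that a boxplot would show."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    # Assumption: points beyond 1.5 * IQR from the quartiles are outliers.
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = values[(values < lower) | (values > upper)]
    return {
        'q1': float(q1),
        'median': float(median),
        'q3': float(q3),
        'iqr': float(iqr),
        'outliers': outliers.tolist(),
    }


# Toy actual labels per prediction bin (bin 1 = highest predictions).
toy_labels_by_bin = {
    1: [80, 90, 95, 100, 400],
    2: [30, 40, 50, 60, 70],
    3: [0, 0, 5, 10, 15],
}
toy_box_stats = {
    bin_id: sketch_box_stats(values)
    for bin_id, values in toy_labels_by_bin.items()
}
toy_medians = [toy_box_stats[bin_id]['median'] for bin_id in sorted(toy_box_stats)]
# A well-ordered model should show medians that decrease from bin 1 onwards.
medians_decrease = all(a > b for a, b in zip(toy_medians, toy_medians[1:]))
print(toy_box_stats)
print(medians_decrease)
```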

### Confusion matrix of the actual vs predicted bins

This function helps to visualize and compare the distribution of the bins of the actual values and the predicted values. It does the following:

* Sorts both the actual values and the predicted values in descending order and divides each into number_bins bins.
* Calculates the confusion matrix and normalizes it over the true labels when the parameter normalize='true'. It can also normalize the confusion matrix over the predictions or over the whole population, using the values 'pred' and 'all' respectively.
* Plots a heatmap of the actual and predicted bins from the highest to the lowest.

In [None]:
regression.plot_confusion_matrix_bin_heatmap(
    labels=df_pred_test[ACTUAL_LABEL],
    predictions=df_pred_test[PREDICTED_LABEL].values,
    number_bins=3,
    normalize='true');
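
The steps above, minus the heatmap, can be sketched with pandas. The function `sketch_bin_confusion` and its tie-breaking rule are hypothetical; the mapping of 'true', 'pred' and 'all' onto the `normalize` argument of `pandas.crosstab` is an assumption about equivalent behaviour, not a description of the module's internals.

```python
import pandas as pd


def sketch_bin_confusion(labels, predictions, number_bins=3, normalize='true'):
    """Cross-tabulates actual bins against predicted bins."""

    def to_bins(values):
        # Rank 1 is the highest value; ties are broken by order of appearance.
        ranks = pd.Series(values, dtype=float).rank(method='first', ascending=False)
        return pd.qcut(ranks, q=number_bins, labels=False) + 1

    actual_bins = to_bins(labels).rename('actual_bin')
    predicted_bins = to_bins(predictions).rename('predicted_bin')

    # Rows are actual bins, so normalizing over the true labels means
    # normalizing each row ('index' in pandas.crosstab terms).
    crosstab_normalize = {'true': 'index', 'pred': 'columns', 'all': 'all'}[normalize]
    return pd.crosstab(actual_bins, predicted_bins, normalize=crosstab_normalize)


# Toy data, made up for illustration.
toy_matrix = sketch_bin_confusion(
    labels=[90, 80, 55, 50, 20, 10],
    predictions=[100, 45, 60, 40, 15, 5],
    number_bins=3,
    normalize='true')
print(toy_matrix)
```

With `normalize='true'`, each row sums to 1, so the diagonal can be read as the share of each actual bin that the model placed in the matching predicted bin.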