# Challenger model validation

This notebook performs validation tasks on the candidate __Challenger__ model.

It goes through a few steps to validate the model before promoting it (by setting its alias) to `Champion`.

When organizations first start to put MLOps processes in place, they should consider having a "human-in-the-loop" to perform visual analyses to validate models before promoting them. As they get more familiar with the process, they can consider automating the steps in a __Workflow__. The benefit of automation is that these validation checks are systematically performed before new models are integrated into inference pipelines or deployed for real-time serving. Of course, organizations can opt to retain a "human-in-the-loop" in any part of the process and put in place the degree of automation that suits their business needs.

<img src="https://github.com/databricks-demos/dbdemos-resources/blob/main/images/product/mlops/mlops-uc-end2end-4.png?raw=true" width="1200">

*Note: in a typical MLOps setup, this would run as part of an automated job to validate a new model. For this simple demo, we'll run it as an interactive notebook.*


### A cluster has been created for this demo
To run this demo, just select the cluster `Current Cluster` from the dropdown menu ([open cluster configuration](https://enb-deloitte.cloud.databricks.com/#setting/clusters/0106-134225-g5g75oj5/configuration)). <br />
*Note: If the cluster was deleted after 30 days, you can re-create it with `dbdemos.create_cluster('mlops-end2end')` or re-install the demo: `dbdemos.install('mlops-end2end')`*


## General Validation Checks


In the context of MLOps, testing a model involves more than measuring how accurate it is. To ensure the stability of our ML system and compliance with any regulatory requirements, we will subject each model added to the registry to a series of validation checks. These include, but are not limited to:
<br>
* __Model documentation__
* __Inference on production data__
* __Champion-Challenger testing to ensure that business KPIs are acceptable__

In this notebook we explore some approaches to performing these tests, and how we can use tags to record on each model whether it has passed a given test.

This part is typically specific to your line of business and quality requirements.

For each test, we'll use tags to record what has been validated in the model. We can also add comments to a model if needed.
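
As a rough illustration of how such tags can later drive a decision, the sketch below tests whether every required check passed. It assumes tag values are read back as strings (the promotion step at the end of this notebook compares them to `"True"`). The helper `all_checks_passed` and the sample dictionary are hypothetical and not part of MLflow.

```python
# Hypothetical helper, not part of MLflow: decide whether all required checks passed.
# Assumption: tag values are stored as strings such as "True" / "False".
def all_checks_passed(tags, required_checks):
    """Return True only if every required check is tagged "True"."""
    return all(tags.get(check) == "True" for check in required_checks)


# Sample tags as they might look after the validation steps (illustrative values)
sample_tags = {"has_description": "True", "metric_f1_passed": "False"}

print(all_checks_passed(sample_tags, ["has_description"]))
print(all_checks_passed(sample_tags, ["has_description", "metric_f1_passed"]))
```

A missing tag is treated the same as a failed check, which is the conservative choice for a promotion gate.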

In [0]:
%pip install --quiet mlflow==2.19
dbutils.library.restartPython()

In [0]:
%run ../_resources/00-setup

## Fetch Model information

We will fetch the model information for the __Challenger__ model from Unity Catalog.

In [0]:
# We are interested in validating the Challenger model
model_alias = "Challenger"
model_name = f"{catalog}.{db}.mlops_churn"

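# Explicit imports for clarity; the setup notebook is assumed to have configured the registry for Unity Catalog
import mlflow
from mlflow import MlflowClient

client = MlflowClient()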
model_details = client.get_model_version_by_alias(model_name, model_alias)
model_version = int(model_details.version)

print(f"Validating {model_alias} model for {model_name} on model version {model_version}")

## Model checks

#### Description check

Has the data scientist provided a description of the model being submitted?

In [0]:
# If there's no description or an insufficient number of characters, tag accordingly
if not model_details.description:
  has_description = False
  print("Please add model description")
elif len(model_details.description) <= 20:
  has_description = False
  print("Please add detailed model description (more than 20 characters).")
else:
  has_description = True

print(f'Model {model_name} version {model_details.version} has description: {has_description}')
client.set_model_version_tag(name=model_name, version=str(model_details.version), key="has_description", value=has_description)

#### Model performance metric

We want to validate the model performance metric. Typically, we compare the metric obtained for the Challenger model against that of the Champion model. If no Champion model has been registered yet, there is nothing to compare against, so we only retrieve the metric for the Challenger model and accept it as the first model.

The registered model captures information about the MLflow experiment run, where the model metrics were logged during training. This gives you traceability from the deployed model back to the initial training runs.

Here, we will use the F1 score for the out-of-sample test data that was set aside at training time.
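
For reference, the F1 score is the harmonic mean of precision and recall. The sketch below computes it from confusion-matrix counts and applies the same "at least as good as the Champion" rule used in this notebook. The function names and the counts are made up for illustration and are not taken from the demo's training run.

```python
def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the denominator is zero."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def challenger_passes(challenger_f1, champion_f1=None):
    """Accept the Challenger if there is no Champion yet, or if it scores at least as well."""
    return champion_f1 is None or challenger_f1 >= champion_f1


# Illustrative counts only
challenger_f1 = f1_from_counts(tp=80, fp=20, fn=20)   # 160 / 200 = 0.8
champion_f1 = f1_from_counts(tp=60, fp=20, fn=20)     # 120 / 160 = 0.75

print(challenger_passes(challenger_f1, champion_f1))
print(challenger_passes(challenger_f1))
```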

In [0]:
model_run_id = model_details.run_id
f1_score = mlflow.get_run(model_run_id).data.metrics['test_f1_score']

try:
    # Compare the Challenger F1 score to the existing Champion's, if there is one
    champion_model = client.get_model_version_by_alias(model_name, "Champion")
    champion_f1 = mlflow.get_run(champion_model.run_id).data.metrics['test_f1_score']
    print(f'Champion f1 score: {champion_f1}. Challenger f1 score: {f1_score}.')
    metric_f1_passed = f1_score >= champion_f1
except Exception:
    # Most likely no model version carries the Champion alias yet
    print("No Champion found. Accept the model as it's the first one.")
    metric_f1_passed = True

print(f'Model {model_name} version {model_details.version} metric_f1_passed: {metric_f1_passed}')
# Tag that F1 metric check has passed
client.set_model_version_tag(name=model_name, version=model_details.version, key="metric_f1_passed", value=metric_f1_passed)

### Benchmark or business metrics on the eval dataset

Let's use our validation dataset to check the potential new model impact.

***Note: This is just to evaluate our models, not to be confused with A/B testing**. A/B testing is done online by splitting traffic between two models, and it requires a feedback loop to evaluate the effect of the prediction (e.g. after a prediction, did the discount we offered to the customer prevent the churn?). We will cover A/B testing in the advanced part.*
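
To make the idea concrete, here is a small worked example of turning confusion-matrix counts into a dollar figure. The counts and the per-outcome values are illustrative assumptions (with losses entered as negative numbers), and `business_value` is a hypothetical helper, not part of the demo.

```python
# Illustrative per-outcome values in dollars (assumptions for this sketch)
VALUE_PER_OUTCOME = {
    "tn": 0,       # customer stays, no discount offered
    "fp": -500,    # customer would have stayed anyway, discount given for nothing
    "fn": -2000,   # customer churns and we did not act
    "tp": 1500,    # churn avoided: 2000 retained minus the 500 discount
}


def business_value(counts, values=VALUE_PER_OUTCOME):
    """Sum count * value over the four confusion-matrix outcomes."""
    return sum(counts[outcome] * values[outcome] for outcome in values)


# Hypothetical confusion-matrix counts for two models on the same validation set
champion_counts = {"tn": 700, "fp": 100, "fn": 120, "tp": 80}
challenger_counts = {"tn": 720, "fp": 80, "fn": 90, "tp": 110}

print(business_value(champion_counts))    # 100*-500 + 120*-2000 + 80*1500 = -170000
print(business_value(challenger_counts))  # 80*-500 + 90*-2000 + 110*1500 = -55000
```

Under these assumptions the Challenger loses less money than the Champion on the same data, so it would be the preferred model.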

In [0]:
import pyspark.sql.functions as F
#get our validation dataset:
validation_df = spark.table('mlops_churn_training').filter("split='validate'")

#Call the model with the given alias and return the prediction
def predict_churn(validation_df, model_alias):
    model = mlflow.pyfunc.spark_udf(spark, model_uri=f"models:/{catalog}.{db}.mlops_churn@{model_alias}")
    return validation_df.withColumn('predictions', model(*model.metadata.get_input_schema().input_names()))

In [0]:
import pandas as pd
import plotly.express as px
from sklearn.metrics import confusion_matrix

#Note: this is over-simplified and depends on your use-case, but the idea is to evaluate our model against business metrics
cost_of_customer_churn = 2000 #in dollars
cost_of_discount = 500 #in dollars

# Dollar value of each outcome: gains are positive, losses are negative
cost_true_negative = 0 #did not churn, we did not give the customer a discount
cost_false_negative = -cost_of_customer_churn #did churn, we lost the customer
cost_true_positive = cost_of_customer_churn - cost_of_discount #we avoided churn with the discount
cost_false_positive = -cost_of_discount #did not churn, we gave the discount for nothing

def get_model_value_in_dollar(model_alias):
    # Score the validation set with the given model alias and collect the result as a pandas DataFrame
    model_predictions = predict_churn(validation_df, model_alias).toPandas()
    # Unpack the binary confusion matrix: true negatives, false positives, false negatives, true positives
    tn, fp, fn, tp = confusion_matrix(model_predictions['churn'], model_predictions['predictions']).ravel()
    return tn * cost_true_negative + fp * cost_false_positive + fn * cost_false_negative + tp * cost_true_positive

# Check whether a Champion model already exists before trying to score it
try:
    client.get_model_version_by_alias(f"{catalog}.{db}.mlops_churn", "Champion")
    is_champ_model_exist = True
    print("Model already registered as Champion")
except Exception as error:
    print("An error occurred:", type(error).__name__, "It means no Champion model exists yet")
    is_champ_model_exist = False

# With no Champion to compare against, use 0 as its baseline value
champion_potential_revenue_gain = get_model_value_in_dollar("Champion") if is_champ_model_exist else 0
challenger_potential_revenue_gain = get_model_value_in_dollar("Challenger")

data = {'Model Alias': ['Challenger', 'Champion'],
        'Potential Revenue Gain': [challenger_potential_revenue_gain, champion_potential_revenue_gain]}

# Create a bar plot using plotly express
px.bar(data, x='Model Alias', y='Potential Revenue Gain', color='Model Alias',
    labels={'Potential Revenue Gain': 'Revenue Impacted'},
    title='Business Metrics - Revenue Impacted')

## Validation results

That's it! We have demonstrated some simple checks on the model. Let's take a look at the validation results.

In [0]:
results = client.get_model_version(model_name, model_version)
results.tags

## Promoting the Challenger to Champion

When we are satisfied with the results of the __Challenger__ model, we can promote it to Champion. This is done by setting its alias to `@Champion`. Inference pipelines that load the model using the `@Champion` alias will then load this new model. The alias on the older Champion model, if there is one, will be automatically unset. The promoted model retains its `@Challenger` alias until that alias is assigned to a newer Challenger model.
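
The alias behaviour described above can be pictured with a toy in-memory registry. This is only a conceptual sketch: `ToyRegistry` is a hypothetical class that imitates the rule that an alias points to exactly one version, and it does not call MLflow or Unity Catalog.

```python
class ToyRegistry:
    """Minimal stand-in for a model registry: each alias maps to exactly one version."""

    def __init__(self):
        self._alias_to_version = {}

    def set_alias(self, alias, version):
        # Re-assigning an alias moves it: the previous holder loses it automatically
        self._alias_to_version[alias] = version

    def aliases_of(self, version):
        return sorted(a for a, v in self._alias_to_version.items() if v == version)


registry = ToyRegistry()
registry.set_alias("Champion", 1)     # existing Champion
registry.set_alias("Challenger", 2)   # candidate under validation

registry.set_alias("Champion", 2)     # promotion

print(registry.aliases_of(1))  # []
print(registry.aliases_of(2))  # ['Challenger', 'Champion']
```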

In [0]:
if results.tags["has_description"] == "True" and results.tags["metric_f1_passed"] == "True":
  print('register model as Champion!')
  client.set_registered_model_alias(
    name=model_name,
    alias="Champion",
    version=model_version
  )
else:
  raise Exception("Model not ready for promotion")

### Congratulations, our model is now validated and promoted accordingly

We can now be confident that our model is ready to be used in inference pipelines and in real-time serving endpoints, as it meets our validation standards.


Next: [Run batch inference from our newly promoted Champion model]($./05_batch_inference)