# Model Bias Monitoring with AWS SageMaker Clarify

This Jupyter notebook shows how to set up model bias observability with AWS SageMaker in 4 chapters:

1. [Training and deploying model](#1.-Training-and-deploying-model) - Trains a model on a dataset prepared in advance and deploys it for ML Observability
2. [Setting up monitoring job](#2.-Setting-up-monitoring-job) - Creates monitoring tasks by creating a baseline and scheduling regular monitoring
3. [Generating traffic](#3.-Generating-traffic) - Sends traffic (examples and ground truth) to the endpoint, based on which bias metrics will be calculated
4. [Cleaning up](#4.-Cleaning-up) - Removes all the created resources.

Prerequisites:

- Existing Roles with all needed permissions (S3, SageMaker, etc.);
- Configured SageMaker Domain;
- SageMaker Studio user.

One can use the Jupyter-like environment of SageMaker Studio to run this notebook. To do that, follow these steps:

1. Launch SageMaker Studio
2. Clone this repository (https://github.com/griddynamics/gd-ml-observability.git)

![Add the repository.png](images/add_the_repository.jpg)
![repository.png](images/repository.jpg)

3. Run this notebook cell by cell, paying attention to the comments

### Importing necessary libraries

In [None]:
!pip install --upgrade boto3

In [None]:
import json
import random
import time
import numpy as np
import pandas as pd
import sagemaker

from datetime import datetime, timedelta

from sagemaker import get_execution_role, image_uris
from sagemaker.clarify import (
    BiasConfig,
    DataConfig,
    ModelConfig,
    ModelPredictedLabelConfig,
    SHAPConfig,
)
from sagemaker.model import Model
from sagemaker.model_monitor import (
    BiasAnalysisConfig,
    CronExpressionGenerator,
    DataCaptureConfig,
    EndpointInput,
    ModelBiasMonitor,
)
from sagemaker.s3 import S3Downloader, S3Uploader

### Setting up necessary variables and constants

In [None]:
sagemaker_session = sagemaker.session.Session()
sagemaker_client = sagemaker_session.sagemaker_client
sagemaker_runtime_client = sagemaker_session.sagemaker_runtime_client
role = get_execution_role()

DATA_BUCKET = 'adp-rnd-ml-datasets'
# If you run this code outside GridDynamics, change the following bucket names to ones you have access to.
STAGE_BUCKET = 'adp-rnd-ml-stage'
MODEL_BUCKET = 'adp-rnd-ml-models'

NOW = datetime.now()

## 1. Training and deploying model

Creating an Estimator instance for the XGBoost classifier model, which will then be deployed for ML Observability. The model uses data obtained from https://www.kaggle.com/datasets/yasserh/loan-default-dataset/code, which was preprocessed and stored in the public AWS S3 bucket. Data preparation is out of scope of this notebook.

In [None]:
s3_input_train = sagemaker.inputs.TrainingInput(
    s3_data=f"s3://{DATA_BUCKET}/loan_default/train.csv", content_type="csv")
s3_input_validation = sagemaker.inputs.TrainingInput(
    s3_data=f"s3://{DATA_BUCKET}/loan_default/eval.csv", content_type="csv")

container = sagemaker.image_uris.retrieve("xgboost", sagemaker_session.boto_region_name, "1.5-1")

prefix = f'xgboost-for-loan-default-data-{NOW:%Y-%m-%d}'
region = 'us-east-1'
xgb_model_name = f'loan-default-classification-xgboost-{NOW:%Y-%m-%d}'
xgb_endpoint_name = f'{xgb_model_name}-endpoint'

### Training loan default classification model

In [None]:
# Creating an XGBoost Estimator
xgb = sagemaker.estimator.Estimator(
    container,
    role,
    base_job_name=prefix,
    instance_count=1,
    instance_type="ml.m4.xlarge",
    output_path=f"s3://{MODEL_BUCKET}/{prefix}/output/",
    sagemaker_session=sagemaker_session,
)

xgb.set_hyperparameters(
    max_depth=5,
    eta=0.2,
    gamma=4,
    min_child_weight=2,
    subsample=0.8,
    verbosity=0,
    objective="binary:logistic",
    num_round=100,
)

In [None]:
xgb.fit({"train": s3_input_train, "validation": s3_input_validation})

### Deploying model

Adding a data capture config so that the model can be monitored on the requests and responses stored by the endpoint

In [None]:
s3_key = f"s3://{STAGE_BUCKET}/{prefix}"
print(f"S3 key: {s3_key}")
s3_capture_upload_path = f"{s3_key}/datacapture"

data_capture_config = DataCaptureConfig(
    enable_capture=True,
    sampling_percentage=100,
    destination_s3_uri=s3_capture_upload_path,
)
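
Once the endpoint serves traffic, the captured requests and responses are written under `destination_s3_uri` as JSON Lines files. The sketch below parses one such line with the standard library only. The sample record is hand-written for illustration (its values are made up), `parse_capture_record` is a hypothetical helper, and the field names reflect the data capture format as we understand it, so check them against a real captured file.

```python
import json

# Illustrative capture record (hand-written, not real endpoint output).
SAMPLE_LINE = json.dumps({
    "captureData": {
        "endpointInput": {
            "observedContentType": "text/csv",
            "mode": "INPUT",
            "data": "406500,360,1740.0,758,1",
            "encoding": "CSV",
        },
        "endpointOutput": {
            "observedContentType": "text/csv; charset=utf-8",
            "mode": "OUTPUT",
            "data": "0.0731",
            "encoding": "CSV",
        },
    },
    "eventMetadata": {
        "eventId": "hypothetical-event-id",
        "inferenceId": "42",
        "inferenceTime": "2023-05-17T14:03:07Z",
    },
    "eventVersion": "0",
})


def parse_capture_record(line):
    """Return (inference_id, feature_values, predicted_probability)."""
    record = json.loads(line)
    capture = record["captureData"]
    features = capture["endpointInput"]["data"].split(",")
    probability = float(capture["endpointOutput"]["data"])
    inference_id = record["eventMetadata"].get("inferenceId")
    return inference_id, features, probability


inference_id, features, probability = parse_capture_record(SAMPLE_LINE)
print(inference_id, len(features), probability)
```

The `inferenceId` is the value passed as `InferenceId` when invoking the endpoint. It is the key that later lets the monitor match a captured prediction with its ground truth record.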

In [None]:
xgb_predictor = xgb.deploy(
    endpoint_name=xgb_endpoint_name,
    model_name=xgb_model_name,
    initial_instance_count=1, 
    instance_type="ml.m4.xlarge",
    data_capture_config=data_capture_config,
)

### Results

As a result, one will get a Loan Default model and an endpoint. You can find them in the AWS Console -> SageMaker -> Governance -> Model cards. The new model should appear with the specified endpoint (and that's all for now).


![governance.png](images/governance.jpg)

## 2. Setting up monitoring job

In [None]:
monitoring_prefix = f"{prefix}/ClarifyModelMonitor-{NOW:%Y-%m-%d}"


ground_truth_upload_path = f"{s3_key}/ground_truth_data"
s3_report_path = f"{s3_key}/reports"

print(f"Capture path: {s3_capture_upload_path}")
print(f"Ground truth path: {ground_truth_upload_path}")
print(f"Report path: {s3_report_path}")

baseline_results_uri = f"{s3_key}/baselining"
print(f"Baseline results uri: {baseline_results_uri}")
model_bias_baselining_job_result_uri = f"{baseline_results_uri}/model_bias"
validation_dataset = s3_input_validation.config['DataSource']['S3DataSource']['S3Uri']
training_dataset = s3_input_train.config['DataSource']['S3DataSource']['S3Uri']

In [None]:
label_header = 'Status'
all_headers = ['Status', 'loan_amount', 'term', 'income', 'Credit_Score', 'loan_limit',
       'approv_in_adv', 'Credit_Worthiness', 'open_credit',
       'business_or_commercial', 'Neg_ammortization', 'interest_only',
       'lump_sum_payment', 'Secured_by', 'co-applicant_credit_type',
       'submission_of_application', 'Gender_Female', 'Gender_Joint',
       'Gender_Male', 'Gender_Sex Not Available', 'loan_purpose_p1',
       'loan_purpose_p2', 'loan_purpose_p3', 'loan_purpose_p4',
       'occupancy_type_ir', 'occupancy_type_pr', 'occupancy_type_sr',
       'total_units_1U', 'total_units_2U', 'total_units_3U', 'total_units_4U',
       'age_25-34', 'age_35-44', 'age_45-54', 'age_55-64', 'age_65-74',
       'age_gt74', 'age_lt25']
dataset_type = "text/csv"

In [None]:
model_bias_monitor = ModelBiasMonitor(
    role=role,
    sagemaker_session=sagemaker_session,
    max_runtime_in_seconds=1800,
)

model_bias_data_config = DataConfig(
    s3_data_input_path=validation_dataset,
    s3_output_path=model_bias_baselining_job_result_uri,
    label=label_header,
    headers=all_headers,
    dataset_type=dataset_type,
)

Setting up facets - the sensitive features for which model bias will be calculated and observed

In [None]:
FACET_COLUMNS_VALUES = {
    facet: values 
    for facet, values
    in zip(
        ['Gender_Female', 'Gender_Joint', 'Gender_Male', 'Gender_Sex Not Available', 'income',
         'occupancy_type_ir', 'occupancy_type_pr', 'occupancy_type_sr', 'age_25-34', 'age_35-44',
         'age_45-54', 'age_55-64', 'age_65-74', 'age_gt74', 'age_lt25'],
        [
            [1], [1], [1], [1],  # Gender - binary data, assessing only positive values of one-hot-encoded features
            [25000.],  # income - threshold splitting the data into 2 groups: income below and above 25000.0
            [1], [1], [1],  # occupancy_type - binary data, assessing only positive values of one-hot-encoded features
            [1], [1], [1], [1], [1], [1], [1],  # age - binary data, assessing only positive values of one-hot-encoded features
        ]
    )
}

In [None]:
model_bias_config = BiasConfig(
    # In the target feature Status, the value 1 means loan default.
    # Thus, we choose 0 (no default) as the positive outcome so that the bias metrics make sense.
    label_values_or_threshold=[0],
    facet_name=list(FACET_COLUMNS_VALUES.keys()),
    facet_values_or_threshold=list(FACET_COLUMNS_VALUES.values()),
)
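
To make this configuration more concrete, here is a minimal pure-Python sketch of one bias metric, the difference in proportions of labels (DPL), for a binary facet and for a thresholded numeric facet. It is not Clarify's implementation. The toy rows are made up, `split_by_facet`, `positive_rate` and `dpl` are hypothetical helpers, and the conventions used (values above the threshold form the sensitive group, DPL is the positive rate of everyone else minus that of the sensitive group) are assumptions to verify against the Clarify documentation.

```python
def split_by_facet(rows, facet, value=None, threshold=None):
    """Split rows into (sensitive_group, everyone_else) for one facet."""
    def in_group(row):
        if threshold is not None:
            # Assumed convention: values above the threshold are the sensitive group.
            return row[facet] > threshold
        return row[facet] == value

    group = [r for r in rows if in_group(r)]
    rest = [r for r in rows if not in_group(r)]
    return group, rest


def positive_rate(rows, label, positive_value):
    """Share of rows whose label equals the positive outcome."""
    return sum(r[label] == positive_value for r in rows) / len(rows)


def dpl(rows, facet, label="Status", positive_value=0, value=None, threshold=None):
    """Positive rate of everyone else minus positive rate of the sensitive group."""
    group, rest = split_by_facet(rows, facet, value, threshold)
    return positive_rate(rest, label, positive_value) - positive_rate(group, label, positive_value)


# Made-up rows; Status == 0 (no default) is the positive outcome.
rows = [
    {"Status": 0, "Gender_Female": 1, "income": 30000.0},
    {"Status": 1, "Gender_Female": 1, "income": 12000.0},
    {"Status": 0, "Gender_Female": 0, "income": 40000.0},
    {"Status": 0, "Gender_Female": 0, "income": 20000.0},
    {"Status": 1, "Gender_Female": 0, "income": 18000.0},
    {"Status": 0, "Gender_Female": 0, "income": 52000.0},
]

dpl_gender = dpl(rows, "Gender_Female", value=1)
dpl_income = dpl(rows, "income", threshold=25000.0)
print(round(dpl_gender, 3), round(dpl_income, 3))  # 0.25 -0.667
```

A value close to 0 means both groups receive the positive outcome at a similar rate. The sign shows which group is favored under the chosen convention.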

In [None]:
model_predicted_label_config = ModelPredictedLabelConfig(probability_threshold=.5)
model_config = ModelConfig(
    instance_count=1,
    instance_type="ml.m4.xlarge",
    content_type=dataset_type,
    accept_type=dataset_type,
    endpoint_name=xgb_endpoint_name,
)

### Creating baseline

In [None]:
model_bias_monitor.suggest_baseline(
    model_config=model_config,
    data_config=model_bias_data_config,
    bias_config=model_bias_config,
    model_predicted_label_config=model_predicted_label_config,
)

model_bias_monitor.latest_baselining_job.wait(logs=False)
model_bias_constraints = model_bias_monitor.suggested_constraints()
print()
print(f"ModelBiasMonitor suggested constraints: {model_bias_constraints.file_s3_uri}")
print(S3Downloader.read_file(model_bias_constraints.file_s3_uri))


### Setting up monitoring job

In [None]:
model_bias_analysis_config = None
if not model_bias_monitor.latest_baselining_job:
    model_bias_analysis_config = BiasAnalysisConfig(
        model_bias_config,
        headers=all_headers,
        label=label_header,
    )

model_bias_monitor.create_monitoring_schedule(
    analysis_config=model_bias_analysis_config,
    output_s3_uri=s3_report_path,
    endpoint_input=EndpointInput(
        endpoint_name=xgb_endpoint_name,
        destination="/opt/ml/processing/input/endpoint",
        start_time_offset="-PT1H",
        end_time_offset="-PT0H",
        probability_threshold_attribute=0.5,
    ),
    ground_truth_input=ground_truth_upload_path,
    schedule_cron_expression=CronExpressionGenerator.hourly(),
)
print(f"Model bias monitoring schedule: {model_bias_monitor.monitoring_schedule_name}")
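
The `start_time_offset` and `end_time_offset` values are ISO 8601 durations relative to the scheduled run time. With `-PT1H` and `-PT0H`, each hourly run should analyze the data captured during the preceding hour. A small sketch of that arithmetic follows. `parse_hour_offset` and `analysis_window` are hypothetical helpers that handle only the `-PT<n>H` form used here, and the example timestamp is arbitrary.

```python
import re
from datetime import datetime, timedelta


def parse_hour_offset(offset):
    """Parse offsets of the form '-PT<n>H' into a non-positive timedelta."""
    match = re.fullmatch(r"-PT(\d+)H", offset)
    if match is None:
        raise ValueError(f"unsupported offset: {offset!r}")
    return -timedelta(hours=int(match.group(1)))


def analysis_window(scheduled_time, start_offset="-PT1H", end_offset="-PT0H"):
    """Return the (start, end) of the data window for one scheduled run."""
    return (
        scheduled_time + parse_hour_offset(start_offset),
        scheduled_time + parse_hour_offset(end_offset),
    )


start, end = analysis_window(datetime(2023, 5, 17, 15, 0))
print(start, end)  # 2023-05-17 14:00:00 2023-05-17 15:00:00
```

This is why captured traffic and ground truth need to land in the hour before a scheduled run for that run to produce a report.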

### Results

A model monitoring job will be created. It can be found in the model dashboard (by opening the model card) or in SageMaker Studio:
SageMaker Studio -> Home -> Deployments -> Endpoint -> _Model name_ -> Model bias


![studio_model](images/studio_model.jpg)
![monitoring_job](images/monitoring_job.jpg)

After some time, once several reports have been generated (that is, after the 3rd chapter), one will be able to see the bias metrics charts on this page.

## 3. Generating traffic

Preparing the data used to check for model bias. It can be any data supported by the model. For simplicity, we can just use our train and validation datasets to see different model monitoring results.

In [None]:
test_df = pd.read_csv(validation_dataset)
tr_df = pd.read_csv(training_dataset)
for col in FACET_COLUMNS_VALUES:
    if col == 'income':
        continue
    test_df[col] = test_df[col].astype(int)
    tr_df[col] = tr_df[col].astype(int)

In [None]:
# Serialization function which lets Clarify understand the types of the attributes
def cast_str(x):
    if x in [1.0, 1, 0, 0.0]:
        return str(int(x))
    return str(x)

def serialize_example(example):
    return ','.join(map(cast_str, example))
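
A quick check of what the serializer produces for a mixed row. The snippet is a self-contained copy of the two helpers, and the input values are made up.

```python
def cast_str(x):
    # Render 0/1 (int or float) without a decimal point; leave other values as-is.
    if x in (0, 1):
        return str(int(x))
    return str(x)


def serialize_example(example):
    # Join all values of one example into a single CSV line.
    return ",".join(map(cast_str, example))


payload = serialize_example([1.0, 25000.5, 0, 360.0])
print(payload)  # 1,25000.5,0,360.0
```

The cast matters because `DataFrame.iterrows()` yields each row as a single-dtype Series, so integer columns of a mixed frame come back as floats (`1.0` instead of `1`).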

In [None]:
import threading

class WorkerThread(threading.Thread):
    def __init__(self, do_run, *args, **kwargs):
        super(WorkerThread, self).__init__(*args, **kwargs)
        self.__do_run = do_run
        self.__terminate_event = threading.Event()

    def terminate(self):
        self.__terminate_event.set()

    def run(self):
        is_test = False
        while not self.__terminate_event.is_set():
            df = test_df if is_test else tr_df
            self.__do_run(self.__terminate_event, df)
            is_test = not is_test

In [None]:
def invoke(df, terminate_event):
    # Getting predictions from deployed model from given dataframe
    for i, row in df.drop(columns=['Status']).iterrows():
        payload = serialize_example(row)
        response = sagemaker_runtime_client.invoke_endpoint(
            EndpointName=xgb_endpoint_name,
            Body=payload,
            ContentType=dataset_type,
            InferenceId=str(i)
        )
        prediction = response["Body"].read()
        time.sleep(0.01)
        if terminate_event.is_set():
            break

def real_ground_truth_with_id(inference_id, df):
    return {
        "groundTruthData": {
            "data": str(int(df.iloc[inference_id]['Status'])),
            "encoding": "CSV",
        },
        "eventMetadata": {
            "eventId": str(inference_id),
        },
        "eventVersion": "0",
    }


def upload_ground_truth(upload_time, df):
    records = [real_ground_truth_with_id(i, df) for i in range(len(df))]
    records = [json.dumps(r) for r in records]
    data_to_upload = "\n".join(records)
    target_s3_uri = f"{ground_truth_upload_path}/{upload_time:%Y/%m/%d/%H/%M%S}.jsonl"
    S3Uploader.upload_string_as_file_body(data_to_upload, target_s3_uri)

def generate_ground_truth(terminate_event, df):
    invoke(df, terminate_event)
    upload_ground_truth(datetime.utcnow(), df)
    for _ in range(0, 60):
        time.sleep(60)
        if terminate_event.is_set():
            break
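
For reference, the sketch below builds the same kind of ground truth payload in memory, without uploading anything, and shows the relative S3 key derived from the upload time. The labels are made up, and `build_ground_truth_lines` and `ground_truth_key` are hypothetical names.

```python
import json
from datetime import datetime


def build_ground_truth_lines(labels):
    """Build a JSON Lines payload with one ground truth record per label."""
    records = (
        {
            "groundTruthData": {"data": str(label), "encoding": "CSV"},
            "eventMetadata": {"eventId": str(inference_id)},
            "eventVersion": "0",
        }
        for inference_id, label in enumerate(labels)
    )
    return "\n".join(json.dumps(record) for record in records)


def ground_truth_key(upload_time):
    """Relative key: year/month/day/hour folders, then minutes and seconds."""
    return f"{upload_time:%Y/%m/%d/%H/%M%S}.jsonl"


lines = build_ground_truth_lines([0, 1, 0])
key = ground_truth_key(datetime(2023, 5, 17, 14, 3, 7))
print(key)  # 2023/05/17/14/0307.jsonl
print(lines.splitlines()[1])
```

Each `eventId` has to equal the `InferenceId` sent with the corresponding `invoke_endpoint` call, otherwise the prediction and its label cannot be matched. The hour folder in the key should fall inside the window analyzed by the scheduled run.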

In [None]:
ground_truth_thread = WorkerThread(do_run=generate_ground_truth)
ground_truth_thread.start()

### Results

After some time, we'll see the monitoring job results in the Monitoring Job History tab:

![monitoring_result](images/monitoring_result.jpg)

Clicking on a result shows additional details: which metric constraints were violated compared to the baseline
![violations](images/violations.jpg)

## 4. Cleaning up 

Now we know how to set up model bias monitoring with AWS SageMaker tools, so we can remove all the created resources in order not to receive unexpected bills at the end of the month.

In [None]:
# Turning off traffic generation
ground_truth_thread.terminate()
# Deleting the monitoring schedule
model_bias_monitor.delete_monitoring_schedule()
# Removing the endpoint and the model
xgb_predictor.delete_endpoint()
sagemaker_session.delete_model(xgb_model_name)

Don't forget to stop the current runtime (if you work from the SageMaker Studio):

![runtime](images/runtime.jpg)