# Onboard a Credit Approval Model to Evaluate Fairness

In this notebook, we present the steps for onboarding a model to evaluate model fairness.  

Fiddler is the pioneer in enterprise Model Performance Management (MPM), offering a unified platform that enables Data Science, MLOps, Risk, Compliance, Analytics, and LOB teams to **monitor, explain, analyze, and improve ML deployments at enterprise scale**. 
Obtain contextual insights at any stage of the ML lifecycle, improve predictions, increase transparency and fairness, and optimize business revenue.

---

You can experience Fiddler's Fairness Offering ***in minutes*** by following these four quick steps:

1. Connect to Fiddler
2. Upload a baseline dataset
3. Upload a model package directory containing a **package.py** file and the **model artifact**
4. Get Fairness insights

# 0. Imports

In [1]:
!pip install -q fiddler-client==2.1.0.dev4

import fiddler as fdl
import pandas as pd
import yaml
import datetime
import time
from IPython.display import clear_output

print(f"Running Fiddler client version {fdl.__version__}")

Running Fiddler client version 2.1.0.dev4


# 1. Connect to Fiddler

Before you can add information about your model to Fiddler, you'll need to connect using our Python client.

---

**We need a few pieces of information to get started.**
1. The URL you're using to connect to Fiddler
2. Your organization ID
3. Your authorization token

The latter two of these can be found by pointing your browser to your Fiddler URL and navigating to the **Settings** page.

In [4]:
URL = 'https://preprod.fiddler.ai' # Make sure to include the full URL (including https://).
ORG_ID = 'preprod'
AUTH_TOKEN = '' # Paste your authorization token from the Settings page here.
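
If you'd rather not keep the token in the notebook itself, one option is to read the three values from environment variables. The sketch below is only an illustration: the helper `load_fiddler_settings` and the variable names `FIDDLER_URL`, `FIDDLER_ORG_ID`, and `FIDDLER_AUTH_TOKEN` are our own assumptions, not something the Fiddler client defines.

```python
import os


def load_fiddler_settings(env=None):
    """Read connection settings from environment variables.

    The variable names are hypothetical; pick whatever your team uses.
    """
    env = os.environ if env is None else env
    names = ("FIDDLER_URL", "FIDDLER_ORG_ID", "FIDDLER_AUTH_TOKEN")

    # Fail early with a clear message if anything is unset or empty.
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")

    url = env["FIDDLER_URL"]
    if not url.startswith("https://"):
        raise ValueError("FIDDLER_URL should be the full URL, including https://")

    return {
        "url": url,
        "org_id": env["FIDDLER_ORG_ID"],
        "auth_token": env["FIDDLER_AUTH_TOKEN"],
    }
```

With the variables set, the connection cell could then be written as `client = fdl.FiddlerApi(**load_fiddler_settings())`, since the returned keys match the `url`, `org_id`, and `auth_token` arguments used in this notebook.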

Now just run the following code block to connect the client to your Fiddler environment.

In [5]:
client = fdl.FiddlerApi(
    url=URL,
    org_id=ORG_ID,
    auth_token=AUTH_TOKEN
)

Once you connect, you can create a new project by specifying a unique project ID in the client's [create_project](https://docs.fiddler.ai/reference/clientcreate_project) function.

In [7]:
PROJECT_ID = 'danny_credit_approval'

if PROJECT_ID not in client.list_projects():
    print(f'Creating project: {PROJECT_ID}')
    client.create_project(PROJECT_ID)
else:
    print(f'Project: {PROJECT_ID} already exists')

Creating project: danny_credit_approval


# 2. Upload a baseline dataset

In this example, we'll be considering the case where we're a bank and we have **a model that predicts credit approval worthiness**.
  
In order to get insights into the model's performance, **Fiddler needs a small sample of data that can serve as a baseline** for making comparisons with data in production.


---


*For more information on how to design a baseline dataset, [click here](https://docs.fiddler.ai/docs/designing-a-baseline-dataset).*

In [8]:
PATH_TO_BASELINE_CSV = 'https://media.githubusercontent.com/media/fiddler-labs/fiddler-examples/main/quickstart/data/intersectionally_unfair_baseline.csv'

baseline_df = pd.read_csv(PATH_TO_BASELINE_CSV)
baseline_df

Unnamed: 0,FLAG_OWN_CAR,FLAG_OWN_REALTY,NAME_INCOME_TYPE,NAME_EDUCATION_TYPE,NAME_FAMILY_STATUS,NAME_HOUSING_TYPE,DAYS_BIRTH,DAYS_EMPLOYED,CNT_FAM_MEMBERS,gender,race,income,paid_off,#_of_pastdues,no_loan,target,Approve_probability_of_credit_request
0,1,1,2,1,0,0,32.890411,-12.443836,2.0,M,Other,49306.571969,0,3,1,0,0.024598
1,1,1,2,2,0,0,58.832877,-3.106849,2.0,F,Caucasian,139386.670908,9,10,0,0,0.200638
2,0,1,2,2,1,0,52.356164,-8.358904,1.0,M,Caucasian,144281.758418,0,0,22,1,0.826015
3,0,1,0,1,1,0,61.545205,1000.665753,1.0,F,Asian,158338.663878,0,0,15,1,0.808336
4,1,1,2,1,0,0,46.224658,-2.106849,2.0,F,Caucasian,134150.633849,0,0,60,1,0.986340
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
45980,1,1,2,2,0,0,38.334247,-2.178082,2.0,M,Asian,131966.029281,6,12,0,0,0.041951
45981,1,1,2,2,0,0,30.865753,-7.019178,4.0,M,Asian,129770.545151,0,0,18,1,0.685311
45982,0,1,2,2,1,0,36.512329,-9.936986,1.0,M,Caucasian,145865.051770,1,12,0,0,0.016415
45983,0,1,2,2,0,0,40.002740,-1.093151,4.0,F,Caucasian,120365.414155,0,2,0,0,0.180347


Fiddler uses this baseline dataset to keep track of important information about your data.
  
This includes **data types**, **data ranges**, and **unique values** for categorical variables.

---

You can construct a [DatasetInfo](https://docs.fiddler.ai/reference/fdldatasetinfo) object to be used as **a schema for keeping track of this information** by running the following code block.

In [9]:
dataset_info = fdl.DatasetInfo.from_dataframe(baseline_df, max_inferred_cardinality=100)
dataset_info

Unnamed: 0,column,dtype,count(possible_values),is_nullable,value_range
0,FLAG_OWN_CAR,INTEGER,,False,0 - 1
1,FLAG_OWN_REALTY,INTEGER,,False,0 - 1
2,NAME_INCOME_TYPE,INTEGER,,False,0 - 2
3,NAME_EDUCATION_TYPE,INTEGER,,False,0 - 2
4,NAME_FAMILY_STATUS,INTEGER,,False,0 - 1
5,NAME_HOUSING_TYPE,INTEGER,,False,0 - 1
6,DAYS_BIRTH,FLOAT,,False,20.52 - 69.04
7,DAYS_EMPLOYED,FLOAT,,False,"-48.0 - 1,001.0"
8,CNT_FAM_MEMBERS,FLOAT,,False,1.0 - 39.0
9,gender,CATEGORY,2.0,False,
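
To make the idea of a schema more concrete, here is a rough standard-library sketch of the kind of summary described above: a data type per column, a value range for numeric columns, and the unique values for categorical ones. It is not Fiddler's implementation. The function `summarize_columns` is hypothetical, and the cardinality cutoff only mimics what we assume `max_inferred_cardinality` does (treating low-cardinality string columns as categories).

```python
def summarize_columns(rows, max_inferred_cardinality=100):
    """Summarize each column of a table given as a list of dicts."""
    summary = {}
    columns = rows[0].keys() if rows else []
    for col in columns:
        values = [row[col] for row in rows]
        present = [v for v in values if v is not None]

        is_numeric = bool(present) and all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in present
        )
        if is_numeric:
            # Numeric columns: record the type and the observed range.
            dtype = "INTEGER" if all(isinstance(v, int) for v in present) else "FLOAT"
            info = {"dtype": dtype, "value_range": (min(present), max(present))}
        else:
            # Non-numeric columns: categorical only if cardinality is low.
            unique = sorted({str(v) for v in present})
            if len(unique) < max_inferred_cardinality:
                info = {"dtype": "CATEGORY", "possible_values": unique}
            else:
                info = {"dtype": "STRING"}

        info["is_nullable"] = any(v is None for v in values)
        summary[col] = info
    return summary
```

For example, two rows with `gender` values `'M'` and `'F'` would yield a `CATEGORY` column with two possible values, in line with the `gender` row in the output above.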


Then use the client's [upload_dataset](https://docs.fiddler.ai/reference/clientupload_dataset) function to send this information to Fiddler.
  
*Just include:*
1. A unique dataset ID
2. The baseline dataset as a pandas DataFrame
3. The `DatasetInfo` object you just created

In [10]:
DATASET_ID = 'intersectionally_unfair'

client.upload_dataset(
    project_id=PROJECT_ID,
    dataset_id=DATASET_ID,
    dataset={
        'baseline': baseline_df
    },
    info=dataset_info
)

{'uuid': 'd9bfc31e-781a-4ccf-b984-859395e1a2b8',
 'name': 'Ingestion dataset Upload',
 'info': {'project_name': 'danny_credit_approval',
  'resource_name': 'intersectionally_unfair',
  'resource_type': 'DATASET'},
 'status': 'SUCCESS',
 'progress': 100.0,
 'error_message': None,
 'error_reason': None}

Within your Fiddler environment's UI, you should now be able to see the newly created dataset within your project.

# 3. Upload your model package

Now it's time to upload your model package to Fiddler.  To complete this step, we need to ensure we have two assets in a directory.  It doesn't matter what this directory is called, but for this example we will call it **model** and create it next to this notebook.

In [11]:
import os
os.makedirs("model", exist_ok=True)  # exist_ok avoids an error when this cell is re-run

***Your model package directory will need to contain:***
1. A **package.py** file which explains to Fiddler how to invoke your model's prediction endpoint
2. The **model artifact** itself
3. (Optional) A **requirements.txt** file specifying which Python libraries are needed by package.py.  This example doesn't require any additional libraries, so a requirements.txt file is not needed here.
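
Once the later steps have populated the directory, a quick check along these lines can confirm that the two required files are in place before uploading. This is a minimal sketch; `missing_package_files` is a hypothetical helper of our own, not part of the Fiddler client.

```python
from pathlib import Path


def missing_package_files(model_dir, artifact_name):
    """Return the required files that are absent from the model package directory."""
    model_dir = Path(model_dir)
    # requirements.txt is optional, so only these two files are checked.
    required = ["package.py", artifact_name]
    return [name for name in required if not (model_dir / name).is_file()]
```

For this notebook the call would be `missing_package_files("model", "model_unfair.pkl")`, which should return an empty list once sections 3.2 and 3.3 have been run.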

---

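### 3.1.a Create the **model_info** object

The [model_info](https://docs.fiddler.ai/reference/fdlmodelinfo) object tells Fiddler which columns of the dataset are the target, the features, the model outputs, and metadata, and what task the model performs.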


In [14]:
metadata_cols = ['gender','race']
feature_columns = ['FLAG_OWN_CAR', 'FLAG_OWN_REALTY', 'NAME_INCOME_TYPE', 'NAME_EDUCATION_TYPE',
       'NAME_FAMILY_STATUS', 'NAME_HOUSING_TYPE', 'DAYS_BIRTH', 'DAYS_EMPLOYED',
       'CNT_FAM_MEMBERS', 'income', 'paid_off', '#_of_pastdues', 'no_loan']

model_info = fdl.ModelInfo.from_dataset_info(
    dataset_info=client.get_dataset_info(PROJECT_ID, DATASET_ID),
    target='target', 
    features=feature_columns,
    model_task = fdl.ModelTask.BINARY_CLASSIFICATION,
    metadata_cols = metadata_cols,
    outputs=['Approve_probability_of_credit_request'],
    display_name='Credit model with systemic racial and gender bias',
    description='logistic reg model'
)

model_info

Unnamed: 0,column,dtype,count(possible_values),is_nullable,value_range
0,target,INTEGER,,False,0 - 1

Unnamed: 0,column,dtype,count(possible_values),is_nullable,value_range
0,FLAG_OWN_CAR,INTEGER,,False,0 - 1
1,FLAG_OWN_REALTY,INTEGER,,False,0 - 1
2,NAME_INCOME_TYPE,INTEGER,,False,0 - 2
3,NAME_EDUCATION_TYPE,INTEGER,,False,0 - 2
4,NAME_FAMILY_STATUS,INTEGER,,False,0 - 1
5,NAME_HOUSING_TYPE,INTEGER,,False,0 - 1
6,DAYS_BIRTH,FLOAT,,False,20.52 - 69.04
7,DAYS_EMPLOYED,FLOAT,,False,"-48.0 - 1,001.0"
8,CNT_FAM_MEMBERS,FLOAT,,False,1.0 - 39.0
9,income,FLOAT,,False,"10,350.0 - 192,700.0"

Unnamed: 0,column,dtype,count(possible_values),is_nullable,value_range
0,Approve_probability_of_credit_request,FLOAT,,False,0.0 - 1.0

Unnamed: 0,column,dtype,count(possible_values),is_nullable,value_range
0,gender,CATEGORY,2,False,
1,race,CATEGORY,5,False,
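
Because every role above is given as a list of column names, a typo or a column listed twice is easy to miss. The sketch below shows one way to sanity-check the lists against the baseline columns before building `model_info`. The helper `check_column_roles` is hypothetical and only compares names; it says nothing about what Fiddler itself validates.

```python
def check_column_roles(dataset_columns, target, features, outputs, metadata_cols):
    """Return a list of problems with how dataset columns are assigned to roles."""
    known = set(dataset_columns)
    roles = {
        "target": [target],
        "features": list(features),
        "outputs": list(outputs),
        "metadata": list(metadata_cols),
    }

    problems = []
    seen = {}  # column name -> first role it was assigned to
    for role, cols in roles.items():
        for col in cols:
            if col not in known:
                problems.append(f"{col!r} ({role}) is not a dataset column")
            if col in seen:
                problems.append(f"{col!r} is assigned to both {seen[col]} and {role}")
            seen.setdefault(col, role)
    return problems
```

In this notebook, `check_column_roles(list(baseline_df.columns), 'target', feature_columns, ['Approve_probability_of_credit_request'], metadata_cols)` should come back empty, since the 13 features, 2 metadata columns, the target, and the output together account for the 17 baseline columns.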


### 3.1.b Add Model Information to Fiddler

In [15]:
MODEL_ID = 'intersectionally_unfair'

client.add_model(
    project_id=PROJECT_ID,
    dataset_id=DATASET_ID,
    model_id=MODEL_ID,
    model_info=model_info
)

Model(id=14312, name='intersectionally_unfair', project_name='danny_credit_approval', organization_name='preprod', info={'name': 'Credit model with systemic racial and gender bias', 'input-type': 'structured', 'model-task': 'binary_classification', 'datasets': ['intersectionally_unfair'], 'inputs': [{'column-name': 'FLAG_OWN_CAR', 'data-type': 'int', 'is-nullable': False, 'value-range-min': 0, 'value-range-max': 1}, {'column-name': 'FLAG_OWN_REALTY', 'data-type': 'int', 'is-nullable': False, 'value-range-min': 0, 'value-range-max': 1}, {'column-name': 'NAME_INCOME_TYPE', 'data-type': 'int', 'is-nullable': False, 'value-range-min': 0, 'value-range-max': 2}, {'column-name': 'NAME_EDUCATION_TYPE', 'data-type': 'int', 'is-nullable': False, 'value-range-min': 0, 'value-range-max': 2}, {'column-name': 'NAME_FAMILY_STATUS', 'data-type': 'int', 'is-nullable': False, 'value-range-min': 0, 'value-range-max': 1}, {'column-name': 'NAME_HOUSING_TYPE', 'data-type': 'int', 'is-nullable': False, 'valu

### 3.2 Create the **package.py** file

The contents of the cell below will be written into our ***package.py*** file.  This is the step that varies most with model type, framework, and use case.  The model's ***package.py*** file can also apply preprocessing transformations or other processing before the model's prediction endpoint is called.  For more information about how to create the ***package.py*** file for a variety of model tasks and frameworks, please reference the [Uploading a Model Artifact](https://docs.fiddler.ai/docs/uploading-a-model-artifact#packagepy-script) section of the Fiddler product documentation.

In [16]:
%%writefile model/package.py

import pickle
from pathlib import Path
import pandas as pd

PACKAGE_PATH = Path(__file__).parent

class SklearnModelPackage:

    def __init__(self):
        self.is_classifier = True
        self.is_multiclass = False
        self.output_columns = ['Approve_probability_of_credit_request']
        with open(PACKAGE_PATH / 'model_unfair.pkl', 'rb') as infile:
            self.model = pickle.load(infile)

    def predict(self, input_df):
        if self.is_classifier:
            if self.is_multiclass:
                predict_fn = self.model.predict_proba
            else:
                def predict_fn(x):
                    return self.model.predict_proba(x)[:, 1]
        else:
            predict_fn = self.model.predict
        return pd.DataFrame(predict_fn(input_df), columns=self.output_columns)

def get_model():
    return SklearnModelPackage()

Writing model/package.py
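
Before uploading, it can help to import the file locally and confirm that `get_model()` works. The sketch below uses only the standard library to load a `package.py` by path; `load_model_package` is a hypothetical helper of our own, and this is not how Fiddler itself loads the package.

```python
import importlib.util
from pathlib import Path


def load_model_package(package_path):
    """Import a package.py file by path and return the object built by get_model()."""
    package_path = Path(package_path)
    spec = importlib.util.spec_from_file_location("model_package", package_path)
    module = importlib.util.module_from_spec(spec)
    # Executing the module also sets __file__, so PACKAGE_PATH resolves correctly.
    spec.loader.exec_module(module)
    if not hasattr(module, "get_model"):
        raise AttributeError(f"{package_path} does not define get_model()")
    return module.get_model()
```

Once the artifact from the next step is in place, and assuming the libraries the pickle depends on are installed locally, something like `load_model_package("model/package.py").predict(baseline_df[feature_columns].head())` should return a one-column DataFrame of approval probabilities.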


### 3.3 Ensure your model's artifact is in the **model** directory

Make sure your model artifact (*e.g. the model_unfair.pkl file*) is also present in the model package directory.  The following cell downloads this model's pkl file into our *model* directory.

In [17]:
import urllib.request
urllib.request.urlretrieve("https://raw.githubusercontent.com/fiddler-labs/fiddler-examples/main/quickstart/models/model_unfair.pkl", "model/model_unfair.pkl")

('model/model_unfair.pkl', <http.client.HTTPMessage at 0x7f7a35033b50>)
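
One thing worth checking after a download from a GitHub raw URL is that the file is the real binary and not a Git LFS pointer, which is a small text stub beginning with `version https://git-lfs.github.com/spec/`. We don't know whether this particular artifact is stored in LFS, so treat the check below as an optional precaution; `looks_like_lfs_pointer` is a hypothetical helper.

```python
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/"


def looks_like_lfs_pointer(path):
    """Return True if the file starts with the Git LFS pointer header."""
    with open(path, "rb") as infile:
        return infile.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX
```

If `looks_like_lfs_pointer("model/model_unfair.pkl")` returns `True`, the pickle would fail to load, and the file would need to be fetched from a URL that serves the LFS content instead.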

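### 3.4 Define Deployment Parameters

We define how the model should be deployed by creating a [DeploymentParams](https://docs.fiddler.ai/reference/fdldeploymentparams) object, which here specifies the container image, CPU, memory, and number of replicas.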

In [18]:
DEPLOYMENT_PARAMETERS = fdl.DeploymentParams(
    image_uri="md-base/python/machine-learning:1.0.0",
    cpu=100,
    memory=256,
    replicas=1
)

### 3.5 Upload the model package directory

Once the model's artifact is in the *model* directory along with the **package.py** file (and a requirements.txt file, if your model needs one), the model package directory can be uploaded to Fiddler.

In [19]:
client.add_model_artifact(model_dir='model/', project_id=PROJECT_ID, model_id=MODEL_ID, deployment_params=DEPLOYMENT_PARAMETERS)

'960a4eeb-fd3b-4372-9a24-58448826e97c'

Within your Fiddler environment's UI, you should now be able to see the newly created model.

# 4. Get Fairness insights

**You're all done!**
  
Now just head to your Fiddler environment's UI and explore the model's fairness metrics.


Alternatively, you can compute fairness metrics from the Fiddler Python client:

In [24]:
protected_features = ['gender', 'race']
positive_outcome = 1

# fairness_metrics = client.run_fairness(
#     project_id=PROJECT_ID,
#     model_id=MODEL_ID,
#     dataset_id=DATASET_ID,
#     protected_features=protected_features,
#     positive_outcome=positive_outcome
# )

fairness_metrics = client.get_fairness(
    project_id=PROJECT_ID,
    model_id=MODEL_ID,
    data_source=fdl.DatasetDataSource(dataset_name=DATASET_ID, num_samples=200),
    protected_features=protected_features,
    positive_outcome=positive_outcome,
    score_threshold=0.6
)

fairness_metrics

BadRequest: DB::Exception: Syntax error: failed at position 929 (end of query) (line 6, col 9): . Expected one of: expression with optional alias, element of expression with optional alias, lambda expression, end of query. Stack trace:

0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0xe0c9a75 in /usr/bin/clickhouse
1. ? @ 0x8cce74d in /usr/bin/clickhouse
2. DB::parseQueryAndMovePosition(DB::IParser&, char const*&, char const*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, bool, unsigned long, unsigned long) @ 0x14dcb87f in /usr/bin/clickhouse
3. ? @ 0x1398075c in /usr/bin/clickhouse
4. DB::executeQuery(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::shared_ptr<DB::Context>, bool, DB::QueryProcessingStage::Enum) @ 0x1398012d in /usr/bin/clickhouse
5. DB::TCPHandler::runImpl() @ 0x14708699 in /usr/bin/clickhouse
6. DB::TCPHandler::run() @ 0x1471dbb9 in /usr/bin/clickhouse
7. Poco::Net::TCPServerConnection::start() @ 0x17621e54 in /usr/bin/clickhouse
8. Poco::Net::TCPServerDispatcher::run() @ 0x1762307b in /usr/bin/clickhouse
9. Poco::PooledThread::run() @ 0x177aa407 in /usr/bin/clickhouse
10. Poco::ThreadImpl::runnableEntry(void*) @ 0x177a7e3d in /usr/bin/clickhouse
11. ? @ 0x7f93e306c609 in ?
12. __clone @ 0x7f93e2f91133 in ?
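
To give a feel for how `protected_features`, `positive_outcome`, and `score_threshold` fit together, here is a small generic sketch that computes a selection rate per intersectional group and the ratio between the lowest and highest rate (often called disparate impact). This is an illustration under our own assumptions, not Fiddler's implementation: the function names are hypothetical, we treat a score strictly above the threshold as the positive outcome, and the metrics Fiddler actually reports may be defined differently.

```python
from collections import defaultdict


def selection_rates(rows, protected_features, score_column, score_threshold):
    """Share of rows in each protected group whose score exceeds the threshold."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in rows:
        # With several protected features, each group is an intersection,
        # e.g. ('F', 'Asian') for gender and race.
        group = tuple(row[feature] for feature in protected_features)
        totals[group] += 1
        if row[score_column] > score_threshold:  # assumed positive outcome
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


def disparate_impact(rates):
    """Ratio of the lowest group selection rate to the highest one."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest > 0 else float("nan")
```

A ratio close to 1 means the groups are approved at similar rates at that threshold, while a value well below 1 indicates that at least one group is approved much less often than another.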




---


**Questions?**  
  
Check out [our docs](https://docs.fiddler.ai/) for a more detailed explanation of what Fiddler has to offer.

If you're still looking for answers, fill out a ticket on [our support page](https://fiddlerlabs.zendesk.com/) and we'll get back to you shortly.