# Fiddler Quick Start Guide for Explainability (XAI) with Surrogate Models

Fiddler is not only a powerful observability tool for monitoring the health of your ML models in production but also an explainability tool for peeking into your black-box models. With the ability to **point explain** and **global explain** your model, Fiddler provides powerful insights that can demystify your model's behavior.


---


You can start exploring Fiddler's XAI capabilities by following these six quick steps:

1. Connect to Fiddler
2. Define your model specifications
3. Add your model
4. Generate a Surrogate Model to Enable Explainability
5. Publish Production Events
6. Get insights


## 0. Imports

In [1]:
!pip install -q fiddler-client

import numpy as np
import pandas as pd
import fiddler as fdl
import time

print(f"Running client version {fdl.__version__}")


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.0[0m[39;49m -> [0m[32;49m24.1.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Running client version 3.2.0


## 1. Connect to Fiddler

Before you can register your model with Fiddler, you'll need to connect using our API client.

---

**We need just two pieces of information to get started.**
1. The URL you're using to connect to Fiddler
2. Your authorization token

These can be found by navigating to the **Settings** page of your Fiddler environment.

In [2]:
URL = ''  # Make sure to include the full URL (including https://).
TOKEN = ''
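
If you would rather not paste credentials into the notebook, one option is to read them from environment variables and check the URL scheme before connecting. This is only a sketch: the variable names `FIDDLER_URL` and `FIDDLER_TOKEN` and the helper `validate_fiddler_url` are hypothetical and not part of the Fiddler client.

```python
import os
from urllib.parse import urlparse


def validate_fiddler_url(url: str) -> str:
    """Return the URL unchanged if it has an https:// scheme and a host, else raise."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"Expected a full URL starting with https://, got {url!r}")
    return url


# Hypothetical environment variable names; both fall back to '' when unset.
URL = os.environ.get("FIDDLER_URL", "")
TOKEN = os.environ.get("FIDDLER_TOKEN", "")

# Only validate when a value was actually provided.
if URL:
    validate_fiddler_url(URL)
```

A URL without the scheme, such as `example.fiddler.ai`, fails this check early instead of producing a less obvious connection error later.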

Now just run the following code block to connect to the Fiddler API!

In [3]:
fdl.init(
    url=URL,
    token=TOKEN
)

Once you connect, you can create a new project by specifying a project name and then calling `project.create()`.

In [4]:
PROJECT_NAME = 'quickstart_surrogate_xai'

project = fdl.Project(
    name=PROJECT_NAME
)

project.create()

<fiddler.entities.project.Project at 0x13aa2fce0>

You should now be able to see the newly created project on the UI.

<table>
    <tr>
        <td>
            <img src="https://raw.githubusercontent.com/fiddler-labs/fiddler-examples/main/quickstart/images/surrogate_xai_1.png" />
        </td>
    </tr>
</table>

## 2. Define your model specifications

In this example, we'll be considering the case where we're a bank and we have **a model that predicts churn for our customers**.
  
In order to get insights into the model's performance, **Fiddler needs a small sample of data** to learn the schema of incoming data.

In [5]:
PATH_TO_SAMPLE_CSV = 'https://raw.githubusercontent.com/fiddler-labs/fiddler-examples/main/quickstart/data/v3/churn_data_sample.csv'

sample_df = pd.read_csv(PATH_TO_SAMPLE_CSV)
sample_df

Unnamed: 0,customer_id,creditscore,geography,gender,age,tenure,balance,numofproducts,hascrcard,isactivemember,estimatedsalary,predicted_churn,churn,timestamp
0,27acd1c2,545,Texas,Male,37,9,110483.86,1,1,1,127394.67,0.897202,yes,1710428231855
1,27b36d0c,497,Texas,Female,55,7,131778.66,1,1,1,9972.64,0.997441,yes,1710428262096
2,27b5360a,509,New York,Female,29,0,107712.57,2,1,1,92898.17,0.920563,yes,1710428292338
3,27b5d650,743,Hawaii,Nonbinary,39,6,0.00,2,1,0,44265.28,0.779282,yes,1710428322579
4,27b236a8,699,Florida,Female,25,8,0.00,2,1,1,52404.47,0.825474,yes,1710428352821
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
19995,27b409ba,686,Texas,Male,39,3,129626.19,2,1,1,103220.56,0.760645,yes,1711032910888
19996,27aaff96,446,Massachusetts,Female,45,10,125191.69,1,1,1,128260.86,0.216093,no,1711032941130
19997,27ad3162,794,California,Male,35,6,0.00,2,1,1,68730.91,0.982021,yes,1711032971371
19998,27b076ce,832,California,Male,61,2,0.00,1,0,1,127804.66,0.071598,no,1711033001613


### 2.a Define Model Specification
In order to add your model to Fiddler, simply create a ModelSpec object with information about what each column of your data sample should be used for.

Fiddler supports four column types:
1. **Inputs**
2. **Outputs** (Model predictions)
3. **Targets** (Ground truth values)
4. **Metadata**

In [6]:
model_spec = fdl.ModelSpec(
    inputs=[
        'creditscore',
        'geography',
        'gender',
        'age',
        'tenure',
        'balance',
        'numofproducts',
        'hascrcard',
        'isactivemember',
        'estimatedsalary'
    ],
    outputs=['predicted_churn'],
    targets=['churn'],
    metadata=['customer_id', 'timestamp']
)
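
Before registering the model, it can help to confirm that every column named in the spec actually exists in the sample data. The sketch below does this with plain pandas on a one-row stand-in for `sample_df`; the helper `missing_spec_columns` and the `spec_columns` dictionary are hypothetical names introduced for illustration, not Fiddler API.

```python
import pandas as pd

# Column roles copied from the ModelSpec above, as plain lists.
spec_columns = {
    "inputs": [
        "creditscore", "geography", "gender", "age", "tenure", "balance",
        "numofproducts", "hascrcard", "isactivemember", "estimatedsalary",
    ],
    "outputs": ["predicted_churn"],
    "targets": ["churn"],
    "metadata": ["customer_id", "timestamp"],
}


def missing_spec_columns(df, spec):
    """Map each role to the spec columns absent from df; roles with nothing missing are omitted."""
    result = {}
    for role, cols in spec.items():
        missing = [col for col in cols if col not in df.columns]
        if missing:
            result[role] = missing
    return result


# One row taken from the sample output shown earlier, standing in for sample_df.
toy_df = pd.DataFrame([{
    "customer_id": "27acd1c2", "creditscore": 545, "geography": "Texas",
    "gender": "Male", "age": 37, "tenure": 9, "balance": 110483.86,
    "numofproducts": 1, "hascrcard": 1, "isactivemember": 1,
    "estimatedsalary": 127394.67, "predicted_churn": 0.897202,
    "churn": "yes", "timestamp": 1710428231855,
}])

print(missing_spec_columns(toy_df, spec_columns))  # prints {} when nothing is missing
```

In the notebook itself you would pass `sample_df` instead of `toy_df`.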

If you have columns in your ModelSpec that denote prediction IDs or timestamps, then Fiddler can use these to power its analytics accordingly. Let's define those as well:

In [7]:
id_column = 'customer_id'
timestamp_column = 'timestamp'
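
The `timestamp` values in the sample look like Unix epoch milliseconds; that is an assumption based on their magnitude, not something the data states. A quick way to sanity-check it is to convert a couple of values with pandas and see whether the resulting dates are plausible:

```python
import pandas as pd

# Two timestamp values copied from the sample output above.
ts = pd.Series([1710428231855, 1711033001613], name="timestamp")

# Interpret the integers as milliseconds since the Unix epoch (assumed unit).
as_datetime = pd.to_datetime(ts, unit="ms", utc=True)
print(as_datetime)
```

If the unit were seconds or microseconds instead, the converted dates would land far from the expected range, which makes a wrong guess easy to spot.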

### 2.b Define Model Task

Fiddler supports a variety of model tasks. In this case, we're adding a binary classification model.

For this, we'll create a ModelTask object and an additional ModelTaskParams object to specify the ordering of our positive and negative labels.

For a detailed breakdown of all supported model tasks, see [our docs](https://docs.fiddler.ai/).

In [8]:
model_task = fdl.ModelTask.BINARY_CLASSIFICATION

task_params = fdl.ModelTaskParams(target_class_order=['no', 'yes'])
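
Because the class order is given as literal strings, a label in the data that differs in spelling or case from the declared classes could cause confusion later. A minimal check, using a hypothetical helper `unexpected_labels` and a toy stand-in for the `churn` column:

```python
import pandas as pd

target_class_order = ["no", "yes"]

# Toy stand-in for sample_df["churn"].
churn = pd.Series(["yes", "yes", "no", "yes"], name="churn")


def unexpected_labels(series, class_order):
    """Return the sorted labels found in the series that are not in class_order."""
    return sorted(set(series.dropna().unique()) - set(class_order))


print(unexpected_labels(churn, target_class_order))  # prints [] when all labels are declared
```

In the notebook you would pass `sample_df['churn']` instead of the toy series.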

## 3. Add your model

Create a Model object and publish it to Fiddler, passing in:
1. Your data sample
2. Your ModelSpec object
3. Your ModelTask and ModelTaskParams objects
4. Your ID and timestamp columns

In [9]:
MODEL_NAME = 'bank_churn'

model = fdl.Model.from_data(
    name=MODEL_NAME,
    project_id=fdl.Project.from_name(PROJECT_NAME).id,
    source=sample_df,
    spec=model_spec,
    task=model_task,
    task_params=task_params,
    event_id_col=id_column,
    event_ts_col=timestamp_column
)

model.create()

<fiddler.entities.model.Model at 0x12a6e7cb0>

## 4. Generate a Surrogate Model to enable **Explainability**

Fiddler allows users to access model explanations either by leveraging your model artifacts or by having Fiddler generate a surrogate model based on the sample data you used earlier. This guide focuses on the surrogate approach: we upload the sample dataset to Fiddler and use it to generate a surrogate model with just one API call:

In [10]:
DATASET_NAME = 'baseline'

# Publish the pre-production dataset
job = model.publish(
    source=sample_df,  # the same sample dataset we used earlier when onboarding the model
    environment=fdl.EnvType.PRE_PRODUCTION,  # designates this dataset as pre-production data
    dataset_name=DATASET_NAME
)
job.wait()

In [11]:
dataset = fdl.Dataset.from_name(name=DATASET_NAME, model_id=model.id)

In [12]:
DEPLOYMENT_PARAMS = {'memory': 1024, 'cpu': 1000}

surrogate_job = model.add_surrogate(
  dataset_id=dataset.id,
  deployment_params=fdl.DeploymentParams(**DEPLOYMENT_PARAMS)
)
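
To build intuition for what a surrogate model is, the sketch below fits a simple, interpretable model to the *outputs* of an opaque one and measures how closely it mimics them. This is a generic illustration with NumPy and synthetic data; it is not how Fiddler builds its surrogate, and the `black_box` function is a made-up stand-in.

```python
import numpy as np

rng = np.random.default_rng(0)


def black_box(X):
    """Stand-in for an opaque model: returns a score in (0, 1) for each row."""
    return 1.0 / (1.0 + np.exp(-(2.0 * X[:, 0] - 1.0 * X[:, 1])))


# Synthetic inputs and the black box's predictions on them.
X = rng.normal(size=(2000, 2))
y_black_box = black_box(X)

# Fit a linear surrogate (with intercept) to the black box's outputs via least squares.
X_design = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(X_design, y_black_box, rcond=None)
y_surrogate = X_design @ coef

# Fidelity: R^2 of the surrogate's predictions against the black box's predictions.
ss_res = np.sum((y_black_box - y_surrogate) ** 2)
ss_tot = np.sum((y_black_box - y_black_box.mean()) ** 2)
fidelity = 1.0 - ss_res / ss_tot

print(f"surrogate coefficients: {coef[1:]}, fidelity R^2: {fidelity:.3f}")
```

The surrogate's coefficients are easy to read, and the fidelity score indicates how far explanations derived from it can be trusted as a description of the original model.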

<table>
    <tr>
        <td>
            <img src="https://raw.githubusercontent.com/fiddler-labs/fiddler-examples/main/quickstart/images/Model_Schema-Surrogate_Model.png"/>
        </td>
    </tr>
</table>

## 5. Publish Production Events

In [13]:
PATH_TO_EVENTS_CSV = 'https://raw.githubusercontent.com/fiddler-labs/fiddler-examples/main/quickstart/data/churn_events.csv'

production_df = pd.read_csv(PATH_TO_EVENTS_CSV)
# Shift the timestamps of the production events to be as recent as today 
production_df['timestamp'] = production_df['timestamp'] + (int(time.time() * 1000) - production_df['timestamp'].max())
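
The timestamp shift above adds one constant offset to every event, so the most recent event lands at the current time and the spacing between events is preserved. A toy example with made-up millisecond values shows the effect:

```python
import time

import pandas as pd

# Made-up epoch-millisecond values standing in for production_df['timestamp'].
events = pd.DataFrame({"timestamp": [1_000, 4_000, 10_000]})

now_ms = int(time.time() * 1000)
offset = now_ms - events["timestamp"].max()  # distance from the latest event to now
shifted = events["timestamp"] + offset

# The latest event now sits at now_ms; gaps between events are unchanged.
print(shifted.max() == now_ms)
print(shifted.diff().dropna().tolist())
```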

In [14]:
output = model.publish(
    source=production_df,
    environment=fdl.EnvType.PRODUCTION  # make sure to specify this environment as PRODUCTION
)

## 6. Get insights

**You're all done!**
  
Return to your Fiddler environment for enhanced monitoring and explainability through the surrogate model. With a surrogate model or an uploaded model artifact, we can unlock advanced observability features like global and point explanations, PDP plots, segment-level feature impacts, and more.

<table>
    <tr>
        <td>
            <img src="https://raw.githubusercontent.com/fiddler-labs/fiddler-examples/main/quickstart/images/surrogate_xai_5.png" />
        </td>
    </tr>
</table>

You can also run explanations and/or get feature impact now from the client...

In [15]:
# grab a row from the production events to run an explanation on
row = production_df.to_dict(orient='records')[0]
row

{'customer_id': '27c349a2',
 'creditscore': 559,
 'geography': 'California',
 'gender': 'Male',
 'age': 52,
 'tenure': 2,
 'balance': 0.0,
 'numofproducts': 1,
 'hascrcard': 1,
 'isactivemember': 0,
 'estimatedsalary': 129013.59,
 'predicted_churn': 0.0074475368963339,
 'decision': 'low_risk',
 'churn': 'no',
 'timestamp': 1721075084159}

In [16]:
if surrogate_job.status == "SUCCESS":
    explanation = model.explain(
        input_data_source=fdl.RowDataSource(row=row),
        ref_data_source=fdl.DatasetDataSource(env_type='PRODUCTION')
    )
    print(explanation)
else:
    print("Please wait for the surrogate model to finish being created. Then re-run this cell.")

Please wait for the surrogate model to finish being created. Then re-run this cell.
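
Rather than re-running the cell by hand, you could poll until the job finishes. The helper below is a generic sketch: `wait_for_status` is a hypothetical function, the `"FAILURE"` status string is an assumption, and `get_status` is any zero-argument callable you supply that returns the current status. Since `job.wait()` is used earlier for the publish job, calling `surrogate_job.wait()` before the explanation cell may be the simpler option if the returned object supports it.

```python
import time


def wait_for_status(get_status, done="SUCCESS", failed=("FAILURE",),
                    timeout_s=600.0, poll_s=5.0):
    """Call get_status() until it returns `done`; raise on a failed status or timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == done:
            return status
        if status in failed:
            raise RuntimeError(f"Job ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Still {status!r} after {timeout_s} seconds")
        time.sleep(poll_s)


# Demo with a fake status source that succeeds on the third poll.
fake_statuses = iter(["PENDING", "STARTED", "SUCCESS"])
print(wait_for_status(lambda: next(fake_statuses), poll_s=0))
```

The timeout keeps a stuck job from blocking the notebook indefinitely.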




---


**Questions?**  
  
Check out [our docs](https://docs.fiddler.ai/) for a more detailed explanation of what Fiddler has to offer.

If you're still looking for answers, fill out a ticket on [our support page](https://fiddlerlabs.zendesk.com/) and we'll get back to you shortly.