# Fiddler User-Defined Feature Impact Quick Start Guide

In this notebook, we demonstrate how to upload your own precomputed feature impact values to a Fiddler model. Previous versions of Fiddler required you to create either a surrogate model or a user model artifact, which Fiddler then used to calculate the feature impact values. Both approaches require extra steps when onboarding a model and may be unnecessary if the feature impact values already exist.


---

The documentation for the user-defined feature impact upload API can be found online [here](https://docs.fiddler.ai/python-client-3-x/api-methods-30#upload_feature_impact).

User-Defined Feature Impact is supported on Fiddler version 24.12+ using Fiddler Python client API versions 3.3 and higher.

**Please note that you may skip Steps #2 - #5 and resume at [Step #6](#section_06)** if you have already run Fiddler's [Simple Monitoring Quick Start Guide](https://docs.fiddler.ai/quickstart-notebooks/quick-start) and used the default values and sample data.

1. [Connect to Fiddler - Initialization, create a project](#section_01)
2. [Load a Data Sample](#section_02)
3. [Define Your Model Specifications](#section_03)
4. [Set a Model Task](#section_04)
5. [Add Your Model](#section_05)
6. [Upload Your Feature Impact Values](#section_06)

## 0. Imports

In [1]:
%pip install -q fiddler-client

import pandas as pd
import fiddler as fdl

print(f"Running client version {fdl.__version__}")


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.2[0m[39;49m -> [0m[32;49m24.3.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.
Running client version 3.4.0


## <a id='section_01'>1. Connect to Fiddler</a>

Before you can add information about your model to Fiddler, you'll need to connect using our Python client API.


---


**We need a couple of pieces of information to get started.**
1. The URL you're using to connect to Fiddler
2. Your authorization token

Your authorization token can be found by navigating to the **Credentials** tab on the **Settings** page of your Fiddler environment.

In [2]:
URL = ''  # Make sure to include the full URL (including https:// e.g. 'https://your_company_name.fiddler.ai').
TOKEN = ''
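If you would rather not paste the token into the notebook, one option is to read both values from environment variables. The sketch below is only an illustration: the variable names `FIDDLER_URL` and `FIDDLER_TOKEN` and the helper `load_fiddler_credentials` are hypothetical and are not part of the Fiddler client.

```python
import os


def load_fiddler_credentials(env=None):
    """Return (url, token) read from environment variables.

    FIDDLER_URL and FIDDLER_TOKEN are hypothetical names chosen for this
    sketch; use whatever names your environment defines.
    """
    env = os.environ if env is None else env
    url = env.get("FIDDLER_URL", "").strip()
    token = env.get("FIDDLER_TOKEN", "").strip()
    # The URL must be complete, including the https:// scheme.
    if not url.startswith("https://"):
        raise ValueError("FIDDLER_URL must be a full URL starting with https://")
    if not token:
        raise ValueError("FIDDLER_TOKEN is empty")
    return url, token
```

With both variables set, `URL, TOKEN = load_fiddler_credentials()` can replace the two assignments above.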

The constants below are used throughout this example notebook. Change them as needed to create your own versions.

In [3]:
PROJECT_NAME = 'quickstart_examples'  # If the project already exists, the notebook will create the model under the existing project.
MODEL_NAME = 'bank_churn_simple_monitoring'

# Sample data hosted on GitHub
PATH_TO_SAMPLE_CSV = 'https://raw.githubusercontent.com/fiddler-labs/fiddler-examples/main/quickstart/data/v3/churn_data_sample.csv'
PATH_TO_FI_VALUES = 'https://raw.githubusercontent.com/fiddler-labs/fiddler-examples/main/quickstart/data/custom_feature_impact_scores.json'
PATH_TO_FI_VALUES_UPDATED = 'https://raw.githubusercontent.com/fiddler-labs/fiddler-examples/main/quickstart/data/custom_feature_impact_scores_alt.json'

Now just run the following to connect to your Fiddler environment.

In [4]:
fdl.init(url=URL, token=TOKEN)

#### 1.a Create New or Load Existing Project

Once you connect, you can create a new project by specifying a unique project name in the `fdl.Project` constructor and calling the `create()` method. If the project already exists, the code below loads it instead.

In [5]:
try:
    # Create project
    project = fdl.Project(name=PROJECT_NAME).create()
    print(f'New project created with id = {project.id} and name = {project.name}')
except fdl.Conflict:
    # Get project by name
    project = fdl.Project.from_name(name=PROJECT_NAME)
    print(f'Loaded existing project with id = {project.id} and name = {project.name}')

Loaded existing project with id = 70b74177-c712-44b1-b431-2377c1b908ab and name = quickstart_examples


## <a id='section_02'>2. Load a Data Sample</a>

In this example, we'll be considering the case where we're a bank and we have **a model that predicts churn for our customers**.
  
In order to get insights into the model's performance, **Fiddler needs a small sample of data** to learn the schema of incoming data.

In [6]:
sample_data_df = pd.read_csv(PATH_TO_SAMPLE_CSV)
column_list = sample_data_df.columns
sample_data_df

Unnamed: 0,customer_id,creditscore,geography,gender,age,tenure,balance,numofproducts,hascrcard,isactivemember,estimatedsalary,predicted_churn,churn,timestamp
0,27acd1c2,545,Texas,Male,37,9,110483.86,1,1,1,127394.67,0.897202,yes,1710428231855
1,27b36d0c,497,Texas,Female,55,7,131778.66,1,1,1,9972.64,0.997441,yes,1710428262096
2,27b5360a,509,New York,Female,29,0,107712.57,2,1,1,92898.17,0.920563,yes,1710428292338
3,27b5d650,743,Hawaii,Nonbinary,39,6,0.00,2,1,0,44265.28,0.779282,yes,1710428322579
4,27b236a8,699,Florida,Female,25,8,0.00,2,1,1,52404.47,0.825474,yes,1710428352821
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
19995,27b409ba,686,Texas,Male,39,3,129626.19,2,1,1,103220.56,0.760645,yes,1711032910888
19996,27aaff96,446,Massachusetts,Female,45,10,125191.69,1,1,1,128260.86,0.216093,no,1711032941130
19997,27ad3162,794,California,Male,35,6,0.00,2,1,1,68730.91,0.982021,yes,1711032971371
19998,27b076ce,832,California,Male,61,2,0.00,1,0,1,127804.66,0.071598,no,1711033001613


## <a id='section_03'>3. Define Your Model Specifications</a>

To add your model to Fiddler, create a ModelSpec object that describes what each column of your data sample should be used for.

Fiddler supports four column types:
1. **Inputs**
2. **Outputs** (Model predictions)
3. **Target** (Ground truth values)
4. **Metadata**

In [None]:
input_columns = list(
    column_list.drop(['predicted_churn', 'churn', 'customer_id', 'timestamp'])
)
model_spec = fdl.ModelSpec(
    inputs=input_columns,
    outputs=['predicted_churn'],
    targets=[
        'churn'
    ],  # Note: only a single Target column is allowed, use metadata columns and custom metrics for additional targets
    metadata=['customer_id', 'timestamp'],
)
id_column = (
    'customer_id'  # Indicates which column is your unique identifier for each event
)
timestamp_column = (
    'timestamp'  # Indicates which column is your timestamp for each event
)
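Before building the model, it can be useful to confirm that the column roles are consistent with the sample data. The following is a minimal sketch in plain Python: `check_column_roles` is a hypothetical helper (not a Fiddler API), and the abridged column list is only for illustration. Whether an unassigned column is acceptable depends on your setup, so treat the report as a prompt to look closer rather than as an error.

```python
def check_column_roles(all_columns, inputs, outputs, targets, metadata):
    """Report columns that are unassigned, unknown, or given more than one role."""
    assigned = list(inputs) + list(outputs) + list(targets) + list(metadata)
    known = set(all_columns)
    return {
        # Present in the data but not given any role.
        "unassigned": sorted(known - set(assigned)),
        # Given a role but not present in the data (often a typo).
        "unknown": sorted(set(assigned) - known),
        # Listed under more than one role, or twice in the same role.
        "duplicated": sorted({c for c in assigned if assigned.count(c) > 1}),
    }


# Abridged column list for illustration only.
columns = ["customer_id", "age", "balance", "predicted_churn", "churn", "timestamp"]
report = check_column_roles(
    columns,
    inputs=["age", "balance"],
    outputs=["predicted_churn"],
    targets=["churn"],
    metadata=["customer_id", "timestamp"],
)
print(report)
```

In this notebook, the same check could be run with `list(column_list)` and the lists passed to `fdl.ModelSpec`.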

## <a id='section_04'>4. Set a Model Task</a>

Fiddler supports a variety of model tasks. In this case, we're adding a binary classification model.

For this, we'll create a ModelTask object and an additional ModelTaskParams object to specify the ordering of our positive and negative labels.

*For a detailed breakdown of all supported model tasks, see the Fiddler documentation.*

In [None]:
model_task = fdl.ModelTask.BINARY_CLASSIFICATION

task_params = fdl.ModelTaskParams(target_class_order=['no', 'yes'])
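Because `target_class_order` lists the labels by name, a label in the target column that is spelled differently (for example `'Yes'` instead of `'yes'`) will not match any entry. The sketch below shows one way to spot such labels before creating the model. `labels_outside_class_order` is a hypothetical helper, not part of the Fiddler client.

```python
def labels_outside_class_order(labels, target_class_order):
    """Return the distinct labels that do not appear in target_class_order."""
    allowed = set(target_class_order)
    return sorted({label for label in labels if label not in allowed})


# Illustrative labels; 'Yes' differs from 'yes' only by case.
unexpected = labels_outside_class_order(["yes", "no", "yes", "Yes"], ["no", "yes"])
print(unexpected)
```

In this notebook, the labels would come from `sample_data_df['churn']`.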

## <a id='section_05'>5. Add Your Model</a>

Create a Model object and publish it to Fiddler, passing in
1. Your data sample
2. Your ModelSpec object
3. Your ModelTask and ModelTaskParams objects
4. Your ID and timestamp columns

In [None]:
model = fdl.Model.from_data(
    name=MODEL_NAME,
    project_id=project.id,
    source=sample_data_df,
    spec=model_spec,
    task=model_task,
    task_params=task_params,
    event_id_col=id_column,
    event_ts_col=timestamp_column,
)

model.create()
print(f'New model created with id = {model.id} and name = {model.name}')

## <a id='section_06'>6. Upload Your Feature Impact Values</a>

**Note:** If you skipped Steps #2 - #5 because the Simple Monitoring Quick Start model already exists, you still need to instantiate the fdl.Model object. Run the next cell to load it.


In [7]:
model = fdl.Model.from_name(name=MODEL_NAME, project_id=project.id)  # Load the model
model 

<fiddler.entities.model.Model at 0x16c209fc0>

Uploading your own feature impact values requires:

1. A Python dict containing each input column defined in your Model's schema and its numeric value
2. A local reference to the fdl.Model
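The first requirement can be checked locally before calling the upload API. The sketch below is a plain-Python illustration: `inspect_feature_impact_map` is a hypothetical helper, and how Fiddler itself treats missing, extra, or non-numeric entries is not covered here, so the report only flags them for review.

```python
from numbers import Real


def inspect_feature_impact_map(feature_impact_map, input_columns):
    """Compare a feature impact dict against the model's input column names."""
    expected = set(input_columns)
    return {
        # Input columns that have no score in the dict.
        "missing": [c for c in input_columns if c not in feature_impact_map],
        # Keys in the dict that are not input columns.
        "extra": [k for k in feature_impact_map if k not in expected],
        # Values that are not real numbers (booleans are excluded on purpose).
        "non_numeric": [
            k
            for k, v in feature_impact_map.items()
            if isinstance(v, bool) or not isinstance(v, Real)
        ],
    }


# Illustrative values only.
report = inspect_feature_impact_map(
    {"age": 0.17, "balance": "0.07", "zipcode": 0.01},
    input_columns=["age", "balance", "tenure"],
)
print(report)
```

An empty list under every key suggests the dict lines up with the input columns passed to `fdl.ModelSpec`.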

In this example, the feature impact scores are stored as JSON, so they are first read from the JSON file and then converted to a dict.

In [8]:
fi_values_series = pd.read_json(PATH_TO_FI_VALUES, typ='series')
fi_values_dict = fi_values_series.to_dict()

feature_impacts = model.upload_feature_impact(
    feature_impact_map=fi_values_dict, update=False
)
feature_impacts

{'feature_names': ['creditscore',
  'geography',
  'gender',
  'age',
  'tenure',
  'balance',
  'numofproducts',
  'hascrcard',
  'isactivemember',
  'estimatedsalary'],
 'feature_impact_scores': [0.010380932471238,
  0.028789225032550003,
  0.0,
  0.165411247771343,
  0.005687963322037,
  0.067009532533907,
  0.125625402157261,
  0.001350373172425,
  0.06077078182683,
  0.009540991695917002],
 'model_task': 'BINARY_CLASSIFICATION',
 'model_input_type': 'TABULAR',
 'created_at': '2024-11-15 16:09:20.986855+00:00',
 'response_type': 'FEATURE_IMPACT_TABULAR_UPLOADED',
 'system_generated': False}
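The response pairs each entry in `feature_names` with the entry at the same position in `feature_impact_scores`. A minimal sketch for ranking features by the magnitude of their score is shown below. It assumes the response supports key lookup, as the printed output suggests. `rank_features_by_impact` is a hypothetical helper, and the example values are abridged and rounded from the output above.

```python
def rank_features_by_impact(response):
    """Return (feature, score) pairs sorted by absolute score, largest first."""
    pairs = zip(response["feature_names"], response["feature_impact_scores"])
    # Sort on the absolute value so negative scores are ranked by magnitude.
    return sorted(pairs, key=lambda pair: abs(pair[1]), reverse=True)


# Abridged, rounded values for illustration only.
example_response = {
    "feature_names": ["creditscore", "gender", "age", "numofproducts"],
    "feature_impact_scores": [0.0104, 0.0, 0.1654, 0.1256],
}
ranked = rank_features_by_impact(example_response)
for name, score in ranked:
    print(f"{name:>15}  {score:.4f}")
```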

Feature impact values can be updated at any time simply by setting the `update` parameter to True when calling [upload_feature_impact()](https://docs.fiddler.ai/python-client-3-x/api-methods-30#upload_feature_impact). The change takes effect immediately.

In [9]:
fi_values_series = pd.read_json(PATH_TO_FI_VALUES_UPDATED, typ='series')
fi_values_dict = fi_values_series.to_dict()

feature_impacts = model.upload_feature_impact(
    feature_impact_map=fi_values_dict, update=True
)
feature_impacts

{'feature_names': ['creditscore',
  'geography',
  'gender',
  'age',
  'tenure',
  'balance',
  'numofproducts',
  'hascrcard',
  'isactivemember',
  'estimatedsalary'],
 'feature_impact_scores': [-0.30000000000000004,
  0.30000000000000004,
  0.10291689360869001,
  0.30000000000000004,
  -0.30000000000000004,
  0.30000000000000004,
  -0.30000000000000004,
  0.30000000000000004,
  -0.30000000000000004,
  -0.17423663961038602],
 'model_task': 'BINARY_CLASSIFICATION',
 'model_input_type': 'TABULAR',
 'created_at': '2024-11-15 16:09:30.239229+00:00',
 'response_type': 'FEATURE_IMPACT_TABULAR_UPLOADED',
 'system_generated': False}