<a href="https://colab.research.google.com/github/priyankaiiit14/Model-Monitoring/blob/main/content_root/tutorial/quickstart/Fiddler_Quick_Start_Simple_Monitoring.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Fiddler Simple Monitoring Quick Start Guide

Fiddler is the pioneer in enterprise Model Performance Management (MPM), offering a unified platform that enables Data Science, MLOps, Risk, Compliance, Analytics, and LOB teams to **monitor, explain, analyze, and improve ML deployments at enterprise scale**. 
Obtain contextual insights at any stage of the ML lifecycle, improve predictions, increase transparency and fairness, and optimize business revenue.

---

You can start using Fiddler ***in minutes*** by following these five quick steps:

1. Connect to Fiddler
2. Upload a baseline dataset
3. Add information about your model
4. Publish production events
5. Get insights

## 0. Imports

In [1]:
!pip install -q fiddler-client;

import numpy as np
import pandas as pd
import time
import fiddler as fdl

print(f"Running client version {fdl.__version__}")

Running client version 1.4.0


## 1. Connect to Fiddler

Before you can add information about your model to Fiddler, you'll need to connect using our API client.


---


**We need a few pieces of information to get started.**
1. The URL you're using to connect to Fiddler

In [2]:
URL = 'https://mlops.fiddler.ai/' # Make sure to include the full URL (including https://).

2. Your organization ID
3. Your authorization token

Both of these can be found by opening the URL you entered in a browser and navigating to the **Settings** page.

<table>
    <tr>
        <td><img src="https://raw.githubusercontent.com/fiddler-labs/fiddler-samples/master/content_root/tutorial/quickstart/images/1.png" /></td>
        <td><img src="https://raw.githubusercontent.com/fiddler-labs/fiddler-samples/master/content_root/tutorial/quickstart/images/2.png" /></td>
    </tr>
    <tr>
        <td><img src="https://raw.githubusercontent.com/fiddler-labs/fiddler-samples/master/content_root/tutorial/quickstart/images/3.png" /></td>
        <td><img src="https://raw.githubusercontent.com/fiddler-labs/fiddler-samples/master/content_root/tutorial/quickstart/images/4.png" /></td>
    </tr>
</table>

In [3]:
ORG_ID = 'mlops'
AUTH_TOKEN = 'enPAvBKUlC8v3A_EW2kYx5y9T862EBsQCh5ICNUVC-k'

Now just run the following code block to connect to the Fiddler API!

In [4]:
client = fdl.FiddlerApi(
    url=URL,
    org_id=ORG_ID,
    auth_token=AUTH_TOKEN
)
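
Hardcoding a token in a notebook makes it easy to leak when the notebook is shared. Here is a minimal sketch of one alternative, assuming you export the values as environment variables before starting the notebook. The variable names `FIDDLER_URL`, `FIDDLER_ORG_ID`, and `FIDDLER_AUTH_TOKEN` and the helper `load_fiddler_settings` are hypothetical and not part of the Fiddler client.

```python
import os


def load_fiddler_settings(env=None):
    """Read connection settings from environment variables (names are hypothetical)."""
    env = os.environ if env is None else env
    names = ("FIDDLER_URL", "FIDDLER_ORG_ID", "FIDDLER_AUTH_TOKEN")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {
        # Strip a trailing slash so links built from the URL have no double slash.
        "url": env["FIDDLER_URL"].rstrip("/"),
        "org_id": env["FIDDLER_ORG_ID"],
        "auth_token": env["FIDDLER_AUTH_TOKEN"],
    }


# Example with an in-memory mapping instead of the real environment.
settings = load_fiddler_settings({
    "FIDDLER_URL": "https://mlops.fiddler.ai/",
    "FIDDLER_ORG_ID": "mlops",
    "FIDDLER_AUTH_TOKEN": "<your-token>",
})
print(settings["url"])
```

The keys match the keyword arguments used in the connection cell, so the dictionary could be unpacked as `fdl.FiddlerApi(**load_fiddler_settings())`.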

Once you connect, you can create a new project by specifying a unique project ID in the client's `create_project` function.

In [5]:
PROJECT_ID = 'quickstart_example_Priyanka_Kumari'

client.create_project(PROJECT_ID)

{'project_name': 'quickstart_example_Priyanka_Kumari'}

You should now be able to see the newly created project on the UI.

<table>
    <tr>
        <td>
            <img src="https://raw.githubusercontent.com/fiddler-labs/fiddler-samples/master/content_root/tutorial/quickstart/images/5.png" />
        </td>
    </tr>
</table>

## 2. Upload a baseline dataset

In this example, we'll be considering the case where we're a bank and we have **a model that predicts churn for our customers**.  
We want to know when our model's predictions start to drift—that is, **when churn starts to increase** within our customer base.
  
To get insights into the model's performance, **Fiddler needs a small sample of data that can serve as a baseline** for making comparisons with data in production.


---


*For more information on how to design a baseline dataset, [click here](https://docs.fiddler.ai/docs/designing-a-baseline-dataset).*

In [6]:
PATH_TO_BASELINE_CSV = 'https://raw.githubusercontent.com/fiddler-labs/fiddler-samples/master/content_root/tutorial/quickstart/churn_baseline.csv'

baseline_df = pd.read_csv(PATH_TO_BASELINE_CSV)
baseline_df

Unnamed: 0,customer_id,creditscore,geography,gender,age,tenure,balance,numofproducts,hascrcard,isactivemember,estimatedsalary,predicted_churn,decision,churn
0,27acd1c2,545,Texas,Male,37,9,110483.86,1,1,1,127394.67,0.897202,high_risk,yes
1,27b36d0c,497,Texas,Female,55,7,131778.66,1,1,1,9972.64,0.997441,high_risk,yes
2,27b5360a,509,New York,Female,29,0,107712.57,2,1,1,92898.17,0.920563,high_risk,yes
3,27b5d650,743,Hawaii,Nonbinary,39,6,0.00,2,1,0,44265.28,0.779282,high_risk,yes
4,27b236a8,699,Florida,Female,25,8,0.00,2,1,1,52404.47,0.825474,high_risk,yes
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
19995,27b409ba,686,Texas,Male,39,3,129626.19,2,1,1,103220.56,0.760645,high_risk,yes
19996,27aaff96,446,Massachusetts,Female,45,10,125191.69,1,1,1,128260.86,0.216093,low_risk,no
19997,27ad3162,794,California,Male,35,6,0.00,2,1,1,68730.91,0.982021,high_risk,yes
19998,27b076ce,832,California,Male,61,2,0.00,1,0,1,127804.66,0.071598,low_risk,no


Fiddler uses this baseline dataset to keep track of important information about your data.
  
This includes **data types**, **data ranges**, and **unique values** for categorical variables.
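
To make this concrete, here is a rough, dependency-free sketch of the kind of per-column summary described above, applied to a few rows copied from the baseline preview. It is only an illustration of the idea and not how `DatasetInfo` is implemented; the helper `summarize_columns` and its output format are hypothetical.

```python
# Rows copied from the baseline preview above (a subset of columns).
rows = [
    {"geography": "Texas", "age": 37, "balance": 110483.86},
    {"geography": "Texas", "age": 55, "balance": 131778.66},
    {"geography": "New York", "age": 29, "balance": 107712.57},
    {"geography": "Hawaii", "age": 39, "balance": 0.00},
    {"geography": "Florida", "age": 25, "balance": 0.00},
]


def summarize_columns(rows):
    """Collect a type name plus a value range or unique values for each column."""
    summary = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        if all(isinstance(v, (int, float)) for v in values):
            # Numeric columns: record the type and the observed range.
            kind = "INTEGER" if all(isinstance(v, int) for v in values) else "FLOAT"
            summary[name] = {"dtype": kind, "value_range": (min(values), max(values))}
        else:
            # Everything else is treated as categorical: record the unique values.
            summary[name] = {"dtype": "CATEGORY", "possible_values": sorted(set(values))}
    return summary


for name, info in summarize_columns(rows).items():
    print(name, info)
```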

---

You can construct a `DatasetInfo` object to be used as **a schema for keeping track of this information** by running the following code block.

In [7]:
dataset_info = fdl.DatasetInfo.from_dataframe(baseline_df, max_inferred_cardinality=100)
dataset_info

Unnamed: 0,column,dtype,count(possible_values),is_nullable,value_range
0,customer_id,STRING,,False,
1,creditscore,INTEGER,,False,350 - 850
2,geography,CATEGORY,6.0,False,
3,gender,CATEGORY,3.0,False,
4,age,INTEGER,,False,18 - 92
5,tenure,INTEGER,,False,0 - 10
6,balance,FLOAT,,False,"0.0 - 250,900.0"
7,numofproducts,INTEGER,,False,1 - 4
8,hascrcard,INTEGER,,False,0 - 1
9,isactivemember,INTEGER,,False,0 - 1


Then use the client's [upload_dataset](https://docs.fiddler.ai/reference/clientupload_dataset) function to send this information to Fiddler!
  
*Just include:*
1. A unique dataset ID
2. The baseline dataset as a pandas DataFrame
3. The [DatasetInfo](https://docs.fiddler.ai/reference/fdldatasetinfo) object you just created

In [9]:
DATASET_ID = 'churn_data'

client.upload_dataset(
    project_id=PROJECT_ID,
    dataset_id=DATASET_ID,
    dataset={
        'baseline': baseline_df
    },
    info=dataset_info
)

{'uuid': 'dfbcac3b-6256-4cca-b1dc-5e525e9f9915',
 'name': 'Ingestion dataset Upload',
 'info': {'project_name': 'quickstart_example_Priyanka_Kumari',
  'resource_name': 'churn_data',
  'resource_type': 'DATASET'},
 'status': 'SUCCESS',
 'progress': 100.0,
 'error_message': None}

If you click on your project, you should now be able to see the newly created dataset on the UI.

<table>
    <tr>
        <td>
            <img src="https://raw.githubusercontent.com/fiddler-labs/fiddler-samples/master/content_root/tutorial/quickstart/images/6.png" />
        </td>
    </tr>
</table>

## 3. Add information about your model

Now it's time to add your model to Fiddler. We do this by defining a [ModelInfo](https://docs.fiddler.ai/reference/fdlmodelinfo) object.


---


The [ModelInfo](https://docs.fiddler.ai/reference/fdlmodelinfo) object will contain some **information about how your model operates**.
  
*Just include:*
1. The **task** your model is performing (regression, binary classification, etc.)
2. The **target** (ground truth) column
3. The **output** (prediction) column
4. The **feature** columns
5. Any **metadata** columns, which may not be features but can be used later to slice the data for analysis
6. Any **decision** columns (these capture the direct business decisions made as a result of the model's predictions)
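
Before building the object, it can help to sanity-check the column assignment. The sketch below assumes that every named column should exist in the dataset and that each column should play only one role; that second rule is an assumption made for illustration, not a documented Fiddler requirement. The helper `check_column_roles` is hypothetical.

```python
def check_column_roles(columns, features, outputs, target, decision_cols=(), metadata_cols=()):
    """Return a list of problems found in a column-role assignment."""
    roles = {
        "feature": list(features),
        "output": list(outputs),
        "target": [target],
        "decision": list(decision_cols),
        "metadata": list(metadata_cols),
    }
    problems = []
    seen = {}  # column name -> first role it was assigned to
    for role, names in roles.items():
        for name in names:
            if name not in columns:
                problems.append(f"{role} column '{name}' is not in the dataset")
            if name in seen:
                problems.append(f"'{name}' is listed as both {seen[name]} and {role}")
            seen.setdefault(name, role)
    return problems


# Column names of the churn baseline dataset used in this guide.
columns = [
    "customer_id", "creditscore", "geography", "gender", "age", "tenure", "balance",
    "numofproducts", "hascrcard", "isactivemember", "estimatedsalary",
    "predicted_churn", "decision", "churn",
]

problems = check_column_roles(
    columns,
    features=["geography", "gender", "age", "tenure", "balance", "numofproducts",
              "hascrcard", "isactivemember", "estimatedsalary"],
    outputs=["predicted_churn"],
    target="churn",
    decision_cols=["decision"],
    metadata_cols=["customer_id"],
)
print(problems or "column roles look consistent")
```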


In [10]:
# Specify task
model_task = 'binary'

if model_task == 'regression':
    model_task = fdl.ModelTask.REGRESSION
    
elif model_task == 'binary':
    model_task = fdl.ModelTask.BINARY_CLASSIFICATION

elif model_task == 'multiclass':
    model_task = fdl.ModelTask.MULTICLASS_CLASSIFICATION
    
elif model_task == 'ranking':
    model_task = fdl.ModelTask.RANKING

    
# Specify column types
features = ['geography', 'gender', 'age', 'tenure', 'balance', 'numofproducts', 'hascrcard', 'isactivemember', 'estimatedsalary']
outputs = ['predicted_churn']
target = 'churn'
decision_cols = ['decision']
metadata_cols = ['customer_id']
    
# Generate ModelInfo
model_info = fdl.ModelInfo.from_dataset_info(
    dataset_info=dataset_info,
    dataset_id=DATASET_ID,
    model_task=model_task,
    features=features,
    outputs=outputs,
    target=target,
    decision_cols=decision_cols, # Optional
    metadata_cols=metadata_cols, # Optional
    binary_classification_threshold=0.5 # Optional
)
model_info

Unnamed: 0,column,dtype,count(possible_values),is_nullable,value_range
0,churn,CATEGORY,2,False,

Unnamed: 0,column,dtype,count(possible_values),is_nullable,value_range
0,geography,CATEGORY,6.0,False,
1,gender,CATEGORY,3.0,False,
2,age,INTEGER,,False,18 - 92
3,tenure,INTEGER,,False,0 - 10
4,balance,FLOAT,,False,"0.0 - 250,900.0"
5,numofproducts,INTEGER,,False,1 - 4
6,hascrcard,INTEGER,,False,0 - 1
7,isactivemember,INTEGER,,False,0 - 1
8,estimatedsalary,FLOAT,,False,"11.58 - 200,000.0"

Unnamed: 0,column,dtype,count(possible_values),is_nullable,value_range
0,predicted_churn,FLOAT,,False,0.0 - 1.0

Unnamed: 0,column,dtype,count(possible_values),is_nullable,value_range
0,decision,CATEGORY,2,False,

Unnamed: 0,column,dtype,count(possible_values),is_nullable,value_range
0,customer_id,STRING,,False,
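
The `binary_classification_threshold` argument sets the cutoff that turns a predicted probability into a class label. Here is a small sketch of that idea, using `predicted_churn` and `churn` values copied from rows shown in this guide. Whether the comparison at exactly 0.5 is inclusive is an assumption here, and the helper `classify` is hypothetical.

```python
def classify(score, threshold=0.5):
    """Map a predicted probability to a 'yes'/'no' label using a cutoff."""
    return "yes" if score >= threshold else "no"


# (predicted_churn, churn) pairs copied from rows shown in this guide.
rows = [
    (0.897202, "yes"),
    (0.216093, "no"),
    (0.071598, "no"),
    (0.269628, "yes"),  # predicted below the cutoff, but the customer churned
]

predicted = [classify(score) for score, _ in rows]
accuracy = sum(p == actual for p, (_, actual) in zip(predicted, rows)) / len(rows)
print(predicted, accuracy)
```

Raising or lowering the threshold trades false positives against false negatives, which is why it is worth setting deliberately rather than leaving it at the default.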


Almost done! Now just specify a unique model ID and use the client's [add_model](https://docs.fiddler.ai/reference/clientadd_model) function to send this information to Fiddler.

In [11]:
MODEL_ID = 'churn_classifier'

client.add_model(
    project_id=PROJECT_ID,
    dataset_id=DATASET_ID,
    model_id=MODEL_ID,
    model_info=model_info
)

Start initialize monitoring
Init monitoring succeeded:JOB UUID: b2e34abd-f9fd-4d64-a595-fb890dbe40fa task id: e0c1248f-c260-4380-a487-f95f7522bf57 result: {'result': 'SKETCH GENERATION RESULTS: \nNo event-weighted HISTOGRAM sketch generated, which is only applicable for data with class imbalance.\n No event-weighted FREQUENCY sketch generated, which is only applicable for data with class imbalance.\n No event-weighted NULL_COUNT sketch generated, which is only applicable for data with class imbalance.'}


On the project page, you should now be able to see the newly created model.

<table>
    <tr>
        <td>
            <img src="https://raw.githubusercontent.com/fiddler-labs/fiddler-samples/master/content_root/tutorial/quickstart/images/7.png" />
        </td>
    </tr>
</table>

## 4. Publish production events

Now that information about your model has been added to Fiddler, it's time to start publishing some production data!  
Fiddler will **monitor this data and compare it to your baseline to generate powerful insights into how your model is behaving**.
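
One simple form of such a comparison is a range check: flag production values that fall outside the range seen in the baseline. The sketch below uses the baseline range for `numofproducts` (1 to 4) and values like those in the first and last rows of the sample events used in this section. It illustrates the idea only and is not Fiddler's actual computation; the helper `out_of_range` is hypothetical.

```python
def out_of_range(values, low, high):
    """Return the values that fall outside the inclusive range [low, high]."""
    return [v for v in values if not (low <= v <= high)]


# Baseline range for `numofproducts`, as reported in the DatasetInfo summary.
baseline_low, baseline_high = 1, 4

# `numofproducts` values from the first and last rows of the sample events.
production_values = [1, 1, 1, 1, 2, 6, 9, 7, 7]

violations = out_of_range(production_values, baseline_low, baseline_high)
violation_rate = len(violations) / len(production_values)
print(f"{len(violations)} of {len(production_values)} values out of range ({violation_rate:.0%})")
```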


---


Each record sent to Fiddler is called **an event**.
  
Let's load in some sample events from a CSV file.

In [12]:
PATH_TO_EVENTS_CSV = 'https://raw.githubusercontent.com/fiddler-labs/fiddler-samples/master/content_root/tutorial/quickstart/churn_events.csv'

production_df = pd.read_csv(PATH_TO_EVENTS_CSV)

# Shift all event timestamps (in milliseconds) by a constant so the most recent event falls at the current time
production_df['timestamp'] = production_df['timestamp'] + (int(time.time() * 1000) - production_df['timestamp'].max())

production_df

Unnamed: 0,customer_id,creditscore,geography,gender,age,tenure,balance,numofproducts,hascrcard,isactivemember,estimatedsalary,predicted_churn,decision,churn,timestamp
0,27c349a2,559,California,Male,52,2,0.00,1,1,0,129013.59,0.007448,low_risk,no,1662569461878
1,27c35cee,482,California,Male,55,5,97318.25,1,0,1,78416.14,0.804852,high_risk,yes,1662571890793
2,27c364f0,651,Florida,Female,46,4,89743.05,1,1,0,156425.57,0.012754,low_risk,no,1662574319709
3,27c3627a,611,Hawaii,Male,38,7,0.00,1,1,1,63202.00,0.882252,high_risk,yes,1662576748624
4,27c34164,696,California,Female,33,4,0.00,2,1,1,73371.65,0.999736,high_risk,yes,1662579177540
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
245,27c350b4,781,Hawaii,Female,48,0,57098.96,6,1,0,85644.06,0.032330,low_risk,no,1663164546215
246,27c357e4,797,Hawaii,Female,55,10,0.00,9,1,1,49418.87,0.020316,low_risk,no,1663166975131
247,27c36216,554,Hawaii,Male,31,1,0.00,7,0,1,192660.55,0.269628,low_risk,yes,1663169404046
248,27c34d12,701,Hawaii,Nonbinary,37,1,0.00,7,1,0,163457.55,0.769625,high_risk,yes,1663171832962
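
The timestamp adjustment in the cell above adds one constant offset to every event, so the newest event lands at the current time while the spacing between events is preserved. Here is a plain-Python sketch of the same arithmetic; the input timestamps are made up for illustration and the helper `shift_to_now` is hypothetical.

```python
import time


def shift_to_now(timestamps_ms, now_ms=None):
    """Shift all timestamps by one constant so the newest equals `now_ms`."""
    now_ms = int(time.time() * 1000) if now_ms is None else now_ms
    offset = now_ms - max(timestamps_ms)
    return [ts + offset for ts in timestamps_ms]


# Hypothetical original timestamps in milliseconds, one hour apart.
original = [1_600_000_000_000, 1_600_003_600_000, 1_600_007_200_000]
shifted = shift_to_now(original, now_ms=1_663_171_832_962)

print(shifted[-1])              # newest event now sits at `now_ms`
print(shifted[1] - shifted[0])  # spacing between events is unchanged
```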


You can use the client's `publish_events_batch` function to start pumping data into Fiddler!
  
*Just include:*
1. The DataFrame containing your events
2. The name of the column containing event timestamps

In [13]:
client.publish_events_batch(
    project_id=PROJECT_ID,
    model_id=MODEL_ID,
    batch_source=production_df,
    timestamp_field='timestamp',
    id_field='customer_id' # Optional
)

{'status': 202,
 'job_uuid': 'c1953bf6-4ab3-4aae-a866-b61b760e9700',
 'files': ['tmprgqhf95o.csv'],
 'message': 'Successfully received the event data. Please allow time for the event ingestion to complete in the Fiddler platform.'}

## 5. Get insights

**You're all done!**
  
Now just head to your Fiddler URL and start getting enhanced observability into your model's performance.

Run the following code block to get your URL.

In [14]:
print('/'.join([URL.rstrip('/'), 'projects', PROJECT_ID, 'models', MODEL_ID, 'monitor']))

https://mlops.fiddler.ai/projects/quickstart_example_Priyanka_Kumari/models/churn_classifier/monitor


*Please allow 3-5 minutes for monitoring data to populate the charts.*
  
Once the data has populated, you should see the following screen.

<table>
    <tr>
        <td>
            <img src="https://raw.githubusercontent.com/fiddler-labs/fiddler-samples/master/content_root/tutorial/quickstart/images/8.png" />
        </td>
    </tr>
</table>



---


**Questions?**  
  
Check out [our docs](https://docs.fiddler.ai/) for a more detailed explanation of what Fiddler has to offer.

Join our [community Slack](http://fiddler-community.slack.com/) to ask any questions!

If you're still looking for answers, file a ticket on [our support page](https://fiddlerlabs.zendesk.com/) and we'll get back to you shortly.