# Model Monitoring with a Surrogate Model

This tutorial shows how to use Model Monitoring in Fiddler when you have only a dataset 
and no model artifacts. It has two parts. Part one uploads your dataset and uses 
Fiddler's surrogate model capability to generate a model. Part two ingests monitoring 
events through the `publish_event` API and uses the surrogate model to monitor 
your production traffic.

## Initialize Fiddler Client
We begin this section as usual by establishing a connection to our
Fiddler instance. We can establish this connection either by specifying 
our credentials directly, or by utilizing our `fiddler.ini` file. More
information can be found in the [setup](https://github.com/fiddler-labs/fiddler-samples/blob/master/content_root/tutorial/00%20Setup.ipynb) section.


In [2]:
import fiddler as fdl

# client = fdl.FiddlerApi(url=url, org_id=org_id, auth_token=auth_token)
client = fdl.FiddlerApi()
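
As a rough illustration of the configuration-file route, the sketch below parses connection settings from INI-formatted text using Python's standard `configparser`. The `[FIDDLER]` section name, the key names, and the placeholder values are assumptions made for this example, so check them against your own `fiddler.ini`. The helper `load_credentials` is hypothetical and not part of the Fiddler client.

```python
import configparser

# Hypothetical contents of a fiddler.ini file. The section and key names
# are assumptions for illustration; verify them against your own file.
SAMPLE_INI = """
[FIDDLER]
url = http://localhost:4100
org_id = my_org
auth_token = my_token
"""


def load_credentials(ini_text, section="FIDDLER"):
    """Parse connection settings from INI-formatted text into a dict."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return {key: parser[section][key] for key in ("url", "org_id", "auth_token")}


creds = load_credentials(SAMPLE_INI)
print(creds["url"])

# The dict has the same keyword names as the commented-out call above:
# client = fdl.FiddlerApi(**creds)
```

Keeping credentials in a file rather than in the notebook makes it less likely that a token ends up committed alongside the code.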

## Load Baseline

Here we load our baseline dataset from a CSV file called `p2p_loans.csv`. We also
infer a schema from this data.

In [3]:
import pandas as pd
baseline_df = pd.read_csv('/app/fiddler_samples/samples/datasets/p2p_loans/p2p_loans.csv')
baseline_schema = fdl.DatasetInfo.from_dataframe(baseline_df, max_inferred_cardinality=1000)
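
To give a feel for what a cardinality threshold such as `max_inferred_cardinality` can mean during schema inference, here is a simplified sketch. It is not Fiddler's implementation. It assumes that a non-numeric column with fewer distinct values than the threshold is treated as categorical, and otherwise as free-form string. The function `infer_column_kind` and the `grade` and `member_id` columns are hypothetical.

```python
def infer_column_kind(values, max_inferred_cardinality):
    """Classify a column as 'numeric', 'category', or 'string' (illustrative rule only)."""
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "numeric"
    distinct = len(set(values))
    # Assumed rule: few distinct values means the column behaves like a category.
    return "category" if distinct < max_inferred_cardinality else "string"


columns = {
    "loan_amnt": [25925, 10000, 7200],
    "grade": ["A", "B", "A"],         # hypothetical column, 2 distinct values
    "member_id": ["m1", "m2", "m3"],  # hypothetical column, 3 distinct values
}
kinds = {
    name: infer_column_kind(vals, max_inferred_cardinality=3)
    for name, vals in columns.items()
}
print(kinds)
```

With a threshold as high as the 1000 used above, most low-variety text columns would fall on the categorical side under this rule.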

## Setup Monitoring Using Surrogate Model
Now we set up a project and use Fiddler's surrogate model capability to generate a model. 
Projects are one of the key entities of Fiddler. They are convenient containers 
for housing the models and datasets associated with a given ML use case. Specifics about
projects can be found [here](https://docs.fiddler.ai/components/#project).

In [4]:
project_id = 'tutorial'
model_id = 'loan_status_surrogate'
target = 'loan_status'
features = ['loan_amnt', 'int_rate', 'annual_inc', 'dti', 'fico_range_low', 'total_acc', 'acc_open_past_24mths']

# Set up the project, or clean up artifacts left over from a previous run
if project_id in client.list_projects():
    client.delete_model(project_id, model_id)
    client.delete_dataset(model_id)
else:
    client.create_project(project_id)


client.create_surrogate_model(
    project_id,
    model_id,
    baseline_df,
    baseline_schema,
    target,
    features
)


Validating inputs ...
Uploading dataset ...
Generating surrogate model ...
Triggering model predictions ...


'Surrogate model successfully setup on Fiddler. \n Visit http://localhost:4100/projects/tutorial '
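
The cleanup step above assumes that, when the project exists, the model and dataset exist too. If a delete call raises for a missing artifact (an assumption about client behavior, not something this tutorial confirms), a small wrapper can keep the cell re-runnable. The sketch below uses a hypothetical `try_call` helper and a `StubClient` stand-in so that it runs without a Fiddler instance. Only method names that appear in the cell above are mirrored.

```python
def try_call(fn, *args):
    """Call fn(*args); return True on success, False if it raised."""
    try:
        fn(*args)
        return True
    except Exception as exc:  # broad on purpose: the client's exception types are not assumed
        print(f"Skipped {getattr(fn, '__name__', fn)}: {exc}")
        return False


class StubClient:
    """Offline stand-in for the Fiddler client (hypothetical, for illustration only)."""

    def list_projects(self):
        return ["tutorial"]

    def delete_model(self, project_id, model_id):
        raise RuntimeError("model not found")

    def create_project(self, project_id):
        pass


stub = StubClient()
if "tutorial" in stub.list_projects():
    deleted = try_call(stub.delete_model, "tutorial", "loan_status_surrogate")
else:
    deleted = try_call(stub.create_project, "tutorial")
print(deleted)
```

With the real client, the same wrapper could surround each delete call so that a missing model does not stop the surrogate model from being created.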

## Send Monitoring Events
In this step, we simulate production traffic and send it to Fiddler for monitoring using 
[publish_event](https://docs.fiddler.ai/api-reference/python-package/#publish-event). 
This is equivalent to running our model on data outside Fiddler and either 
sending each event to Fiddler right away or saving the events to a log and sending them later.

For this demonstration, we take the log-based approach. 
The log, `p2p_production.log`, contains rows of inputs and predictions. 
To simulate a time series, we also call a function that generates a random 
timestamp within the last two weeks. Real data should ideally carry a timestamp 
for when the event took place; otherwise, the current time is used.
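
To make the "one dictionary per event" shape concrete without needing the sample files, here is a minimal standard-library sketch. The inline CSV text reuses the column names and values from the event printed at the end of this tutorial. The real log may contain additional columns, and `rows_to_events` is a hypothetical helper rather than part of the Fiddler client.

```python
import csv
import io

# One row in the shape of the event printed at the end of this tutorial.
SAMPLE_LOG = """loan_amnt,int_rate,annual_inc,dti,fico_range_low,total_acc,acc_open_past_24mths,probability_fully_paid
25925,6.99,160000.0,12.99,675,17,4,0.893918146366147
"""


def rows_to_events(log_text):
    """Turn CSV text into a list of event dicts, converting every value to float."""
    reader = csv.DictReader(io.StringIO(log_text))
    return [{name: float(value) for name, value in row.items()} for row in reader]


events = rows_to_events(SAMPLE_LOG)
print(events[0]["probability_fully_paid"])
```

The pandas call `to_dict(orient='records')` used later in this section produces the same list-of-dictionaries structure and preserves each column's inferred type.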

**Note**: The timestamp must be in milliseconds since the Unix epoch, in UTC. See 
[here](https://docs.fiddler.ai/api-reference/python-package/#publish-event) for more details.
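
If your real events carry `datetime` values rather than epoch numbers, a conversion along these lines produces UTC milliseconds. This is a small sketch using only the standard library. The helper names `to_epoch_ms` and `from_epoch_ms` are made up for this example. Integer timedelta division is used to avoid floating-point rounding.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(dt):
    """Convert a timezone-aware datetime to integer milliseconds since the Unix epoch."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return (dt - EPOCH) // timedelta(milliseconds=1)


def from_epoch_ms(ms):
    """Convert milliseconds since the Unix epoch back to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


event_time = datetime(2020, 12, 8, 9, 39, 26, 685000, tzinfo=timezone.utc)
ms = to_epoch_ms(event_time)
print(ms)  # 1607420366685
```

Rejecting naive datetimes up front avoids silently interpreting a local time as UTC.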

In [8]:
import datetime
import time
from IPython.display import clear_output
from random import randint
NUM_EVENTS_TO_SEND = 50

def getTimestampFromPastTwoWeeks():
    """
    Generate a randomized timestamp from the past two weeks. Timestamp is in 
    milliseconds since epoch in UTC.
    """
    TWO_WEEKS_MS = 604800 * 2 * 1000
    current_time_in_ms = round(time.time() * 1000)
    
    random_time_in_past_two_weeks = current_time_in_ms - randint(0, TWO_WEEKS_MS)
    return random_time_in_past_two_weeks

result_df = pd.read_csv('/app/fiddler_samples/samples/datasets/p2p_loans/p2p_production.log')
# Convert this dataframe into a list of dictionary events, where each event is its own dictionary
event_list_dict = result_df.sample(n=NUM_EVENTS_TO_SEND, random_state=42).to_dict(orient='records') 

for ind, event_dict in enumerate(event_list_dict):
    event_ms_time_stamp = getTimestampFromPastTwoWeeks()
    result = client.publish_event(project_id, model_id, event_dict, event_time_stamp=event_ms_time_stamp)
    
    clear_output(wait = True)
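    # Convert explicitly to UTC; fromtimestamp() without tz would return local time
    readable_timestamp = datetime.datetime.fromtimestamp(
        event_ms_time_stamp / 1000.0, tz=datetime.timezone.utc
    ).strftime('%Y-%m-%d %H:%M:%S.%f')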
    
    print(f'Sending {ind+1} / {NUM_EVENTS_TO_SEND} \n{readable_timestamp} UTC: \n{event_dict}')
    time.sleep(1)

Sending 50 / 50 
2020-12-08 09:39:26.685000 UTC: 
{'loan_amnt': 25925, 'int_rate': 6.99, 'annual_inc': 160000.0, 'dti': 12.99, 'fico_range_low': 675, 'total_acc': 17, 'acc_open_past_24mths': 4, 'probability_fully_paid': 0.893918146366147, '__event_type': 'execution_event', '__occurred_at': 1607420366685}
