# Monitoring a Regression Model with Fiddler

In this notebook, we demonstrate how Fiddler can monitor the performance of regression models, detecting and alerting on data drift and on performance metrics such as Mean Absolute Error (MAE).

---

Fiddler is the pioneer in enterprise Model Performance Management (MPM), offering a unified platform that enables Data Science, MLOps, Risk, Compliance, Analytics, and other LOB teams to **monitor, analyze, and improve ML deployments at enterprise scale**.
Obtain contextual insights at any stage of the ML lifecycle, improve predictions, increase transparency and fairness, and optimize business revenue.

---

You can start using Fiddler ***in minutes*** by following these quick steps:

1. Connect to Fiddler
2. Load a Data Sample
3. Define the Model Specifications
4. Set the Model Task
5. Create a Model
6. Publish a Static Baseline **(Optional)**
7. Set Up Alerts **(Optional)**
8. Configure a Rolling Baseline **(Optional)**
9. Publish Production Events

## 0. Imports

In [None]:
%pip install -q fiddler-client

import pandas as pd
import fiddler as fdl

print(f'Running Fiddler Python client version {fdl.__version__}')

## 1. Connect to Fiddler

Before you can add information about your model to Fiddler, you'll need to connect using the Fiddler Python client.


---


**We need a couple of pieces of information to get started.**
1. The URL you're using to connect to Fiddler
2. Your authorization token

Your authorization token can be found by navigating to the **Credentials** tab on the **Settings** page of your Fiddler environment.
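Hard-coding a token in a notebook makes it easy to leak when the notebook is shared. A minimal sketch, assuming you export the values as environment variables named `FIDDLER_URL` and `FIDDLER_TOKEN` (hypothetical names chosen for this example, not something Fiddler requires):

```python
import os
from typing import Mapping, Tuple


def load_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (url, token) read from an environment-like mapping.

    FIDDLER_URL and FIDDLER_TOKEN are hypothetical variable names for this sketch.
    """
    url = env.get('FIDDLER_URL', '').strip()
    token = env.get('FIDDLER_TOKEN', '').strip()
    # The connection cell expects the full URL, including the scheme
    if not url.startswith(('https://', 'http://')):
        raise ValueError('FIDDLER_URL must include the scheme, e.g. https://')
    if not token:
        raise ValueError('FIDDLER_TOKEN is not set')
    # Drop a trailing slash so the value matches the documented form (a cosmetic choice)
    return url.rstrip('/'), token
```

You could then replace the two assignments in the next cell with `URL, TOKEN = load_credentials()`.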

In [None]:
URL = ''  # Make sure to include the full URL (including https:// e.g. 'https://your_company_name.fiddler.ai').
TOKEN = ''

Constants for this example notebook. Change them as needed to create your own versions.

In [None]:
PROJECT_NAME = 'quickstart_examples'  # If the project already exists, the notebook will create the model under the existing project.
MODEL_NAME = 'airline_delay_model'

STATIC_BASELINE_NAME = 'baseline_dataset'
ROLLING_BASELINE_NAME = 'rolling_baseline_1week'

# Sample data hosted on GitHub
PATH_TO_SAMPLE_CSV = 'https://raw.githubusercontent.com/fiddler-labs/fiddler-examples/main/quickstart/data/v3/flight_baseline.csv'
PATH_TO_EVENTS_CSV = 'https://raw.githubusercontent.com/fiddler-labs/fiddler-examples/main/quickstart/data/v3/flight_events_drift.csv'

Now just run the following to connect to your Fiddler environment.

In [None]:
fdl.init(url=URL, token=TOKEN)

#### 1.a Create New or Load Existing Project

Once you connect, you can create a new project by passing a unique project name to `fdl.Project.get_or_create()`. If a project with that name already exists, it is loaded for use instead.
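The get-or-create pattern used in the next cell can be illustrated with a toy in-memory store. This is a hypothetical sketch, not Fiddler's implementation: unlike the toy, which also returns a `created` flag, the notebook uses `fdl.Project.get_or_create()` as returning just the project, which carries an ID whether it was new or already existed.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, Tuple


@dataclass
class FakeProject:
    """Hypothetical stand-in for a project record; not the Fiddler class."""
    id: int
    name: str


class InMemoryProjectStore:
    """Toy store illustrating get-or-create semantics, keyed by project name."""

    def __init__(self) -> None:
        self._by_name: Dict[str, FakeProject] = {}
        self._ids = count(1)

    def get_or_create(self, name: str) -> Tuple[FakeProject, bool]:
        # Return the existing project if the name is already taken
        if name in self._by_name:
            return self._by_name[name], False
        # Otherwise create and remember it; either way the caller gets an id
        project = FakeProject(id=next(self._ids), name=name)
        self._by_name[name] = project
        return project, True
```

Calling `get_or_create` twice with the same name returns the same project, which is why rerunning the notebook is safe.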

In [None]:
project = fdl.Project.get_or_create(name=PROJECT_NAME)

# get_or_create returns a project with an ID whether it is new or existing
print(f'Using project with id = {project.id} and name = {project.name}')

You should now be able to see the newly created project in the Fiddler UI.

<table>
    <tr>
        <td>
            <img src="https://raw.githubusercontent.com/fiddler-labs/fiddler-examples/main/quickstart/images/regression_model_1.png" />
        </td>
    </tr>
</table>

## 2. Load a Data Sample

In this example, we'll work with airline data and **a model that predicts arrival delay**.
  
In order to get insights into the model's performance, **Fiddler needs a small sample of data** to learn the schema of incoming data.
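To see why a small sample is enough to learn a schema, here is a simplified sketch that labels each column of a CSV sample as integer, float, or categorical. It is an illustration only; Fiddler's actual schema inference (for example, how it handles value ranges or category cardinality) is not shown in this notebook, and the helper names are hypothetical.

```python
import csv
import io
from typing import Callable, Dict


def _parses(value: str, kind: Callable[[str], object]) -> bool:
    """Return True if kind(value) succeeds, e.g. int('12') or float('-3.5')."""
    try:
        kind(value)
        return True
    except ValueError:
        return False


def infer_column_types(csv_text: str) -> Dict[str, str]:
    """Label each column of a CSV sample as 'int', 'float', or 'category'.

    Empty cells are ignored; a column is numeric only if every non-empty cell parses.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    types: Dict[str, str] = {}
    for column in reader.fieldnames or []:
        values = [row[column] for row in rows if row[column] not in ('', None)]
        if all(_parses(v, int) for v in values):
            types[column] = 'int'
        elif all(_parses(v, float) for v in values):
            types[column] = 'float'
        else:
            types[column] = 'category'
    return types
```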

In [None]:
sample_data_df = pd.read_csv(PATH_TO_SAMPLE_CSV)
column_list = sample_data_df.columns

In [None]:
# Display the columns in dataset for reference
column_list

## 3. Define the Model Specifications

To create a model in Fiddler, build a ModelSpec object that specifies how each column of your data sample should be used.

Fiddler supports four column types:
1. **Inputs**
2. **Outputs** (Model predictions)
3. **Target** (Ground truth values)
4. **Metadata**
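The role assignment in the next cells treats every column that is not a prediction or target as an input. A hypothetical helper that makes this rule explicit, adds an optional metadata role, and enforces the single-target restriction noted in the ModelSpec cell might look like this (plain Python, independent of the Fiddler client):

```python
from typing import Dict, List, Sequence


def assign_column_roles(
    columns: Sequence[str],
    outputs: Sequence[str],
    targets: Sequence[str],
    metadata: Sequence[str] = (),
) -> Dict[str, List[str]]:
    """Split columns into the four roles; any column not otherwise claimed is an input."""
    if len(targets) > 1:
        raise ValueError('Only a single target column is allowed')
    claimed = set(outputs) | set(targets) | set(metadata)
    missing = claimed - set(columns)
    if missing:
        raise KeyError(f'Columns not found in the sample: {sorted(missing)}')
    return {
        'inputs': [c for c in columns if c not in claimed],
        'outputs': list(outputs),
        'targets': list(targets),
        'metadata': list(metadata),
    }
```

For example, with hypothetical input columns `DISTANCE` and `CARRIER`, passing `outputs=['predicted_ARRIVAL_DELAY']` and `targets=['ARRIVAL_DELAY']` leaves `['DISTANCE', 'CARRIER']` as the inputs.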

In [None]:
input_columns = list(
    column_list.drop(['predicted_ARRIVAL_DELAY', 'ARRIVAL_DELAY'])
)
output_column = ['predicted_ARRIVAL_DELAY']
target_column = ['ARRIVAL_DELAY']

In [None]:
model_spec = fdl.ModelSpec(
    inputs=input_columns,
    outputs=output_column,
    targets=target_column,  # Note: only a single target column is allowed; use metadata columns and custom metrics for additional targets
)

If your data includes columns that denote **prediction IDs or timestamps**, Fiddler can use them to power its analytics.

Let's call them out here and use them when configuring the Model in step 5.

In [None]:
# The baseline dataset does not have a timestamp, but we can expect a timestamp column for production events.
timestamp_column = 'timestamp'

## 4. Set the Model Task

Fiddler supports a variety of model tasks. In this case, we're adding a **regression** model.

For this, we'll use the `fdl.ModelTask.REGRESSION` task type.

*For a detailed breakdown of all supported model tasks, click [here](https://docs.fiddler.ai/product-guide/task-types).*

In [None]:
model_task = fdl.ModelTask.REGRESSION

## 5. Create a Model

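Create a Model object with `fdl.Model.from_data()` and onboard it to Fiddler by calling `create()`. Pass in:
1. Your data sample
2. The ModelSpec object
3. The ModelTask (this regression example needs no ModelTaskParams)
4. The timestamp column (plus an event ID column, if your data has one)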

In [None]:
model = fdl.Model.from_data(
    name=MODEL_NAME,
    project_id=project.id,
    source=sample_data_df,
    spec=model_spec,
    task=model_task,
    event_ts_col=timestamp_column,
)

model.create()
print(f'New model created with id = {model.id} and name = {model.name}')

On the project page, you should now be able to see the newly onboarded model with its model schema.

<table>
    <tr>
        <td>
            <img src="https://github.com/fiddler-labs/fiddler-examples/blob/main/quickstart/images/regression_model_2.png?raw=true" />
        </td>
    </tr>
</table>

## 6. Publish a Static Baseline (Optional)

Since Fiddler already knows how to process data for your model, we can now add a **baseline dataset**.

You can think of this as a static dataset which represents **"golden data,"** or the kind of data your model expects to receive.

Then, once we start sending production data to Fiddler, you'll be able to see **drift scores** that tell you whenever the production data starts to diverge from this static baseline.
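To build intuition for what a drift score measures, here is a minimal sketch that bins a feature's baseline and production values into a shared histogram and compares the two distributions with Jensen-Shannon distance. JSD is a common drift metric (and, to our understanding, one Fiddler offers), but Fiddler's binning and other details may differ; `histogram` and `js_distance` are hypothetical helpers written for this illustration.

```python
import math
from bisect import bisect_right
from typing import List, Sequence


def histogram(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Bin values into len(edges) - 1 buckets and return normalized frequencies.

    Values outside [edges[0], edges[-1]] are clamped into the first or last bucket.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = min(max(bisect_right(edges, v) - 1, 0), len(counts) - 1)
        counts[i] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError('Cannot build a histogram from no values')
    return [c / total for c in counts]


def js_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon distance with base-2 logs, so the result lies in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a: Sequence[float], b: Sequence[float]) -> float:
        # Terms with a_i == 0 contribute nothing; b_i > 0 whenever a_i > 0 here
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    # Clamp tiny negative rounding errors before taking the square root
    return math.sqrt(max(0.0, (kl(p, m) + kl(q, m)) / 2))
```

Identical distributions score 0 and distributions with no overlap score 1; a production sample that shifts toward higher values lands somewhere in between.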

***

Let's publish our **original data sample** as a pre-production dataset. This will automatically add it as a baseline for the model.


*For more information on how to design your baseline dataset, [click here](https://docs.fiddler.ai/client-guide/creating-a-baseline-dataset).*

In [None]:
baseline_publish_job = model.publish(
    source=sample_data_df,
    environment=fdl.EnvType.PRE_PRODUCTION,
    dataset_name=STATIC_BASELINE_NAME,
)
print(
    f'Initiated pre-production environment data upload with Job ID = {baseline_publish_job.id}'
)

# Uncomment the line below to wait for the job to finish, otherwise it will run in the background.
# You can check the status on the Jobs page in the Fiddler UI or use the job ID to query the job status via the API.
# baseline_publish_job.wait()
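Calling `baseline_publish_job.wait()` blocks until the job finishes. If you would rather cap how long the notebook waits, a generic polling helper can wrap whatever status check you prefer. `poll_until` is a hypothetical stdlib sketch; the status-checking callable is left to you, since this notebook does not show the job-status API.

```python
import time
from typing import Callable


def poll_until(
    is_done: Callable[[], bool], timeout_s: float = 600.0, interval_s: float = 5.0
) -> bool:
    """Call is_done every interval_s seconds until it returns True or timeout_s elapses.

    Returns True if the condition was met and False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if is_done():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

A monotonic clock is used so the deadline is unaffected by system clock adjustments while the notebook waits.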

## 7. Set Up Alerts (Optional)

Fiddler lets you create alert rules that trigger when your data or model predictions deviate from expected behavior.

The alert rules can compare metrics to **absolute** or **relative** values.
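The difference between the two comparison modes can be sketched as follows, assuming an absolute rule checks the raw metric value while a relative rule checks the percentage change against an earlier bin. Both helpers are hypothetical; consult the alerts documentation linked below for how Fiddler actually defines relative comparisons.

```python
from typing import Optional


def percent_change(current: float, previous: float) -> float:
    """Relative change of current versus previous, in percent."""
    if previous == 0:
        raise ValueError('Relative comparison is undefined when the previous value is 0')
    return (current - previous) / abs(previous) * 100.0


def value_to_check(current: float, previous: Optional[float] = None) -> float:
    """Absolute mode (previous is None) checks the raw metric; relative mode checks % change."""
    return current if previous is None else percent_change(current, previous)
```

For example, a daily MAE of 44 following a day at 40 is a 10% increase: an absolute rule would compare 44 against its thresholds, while a relative rule (under this assumption) would compare 10.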

Please refer to [our documentation](https://docs.fiddler.ai/client-guide/alerts-with-fiddler-client) for more information on Alert Rules.

---
  
Let's set up some alert rules.

The following API call sets up a Performance type rule that triggers an email notification when the daily Mean Absolute Error (MAE) of published events exceeds the warning (40) or critical (44) threshold.
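To make the rule concrete: with `bin_size=fdl.BinSize.DAY` and `condition=fdl.AlertCondition.GREATER`, the rule in the next cell effectively asks, for each day, whether MAE is above 40 (warning) or above 44 (critical). Below is a plain-Python sketch of that check; `daily_mae` and `severity` are hypothetical helpers, and treating a value exactly at a threshold as not breaching it (strict `>`) is an assumption.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


def daily_mae(events: Iterable[Tuple[datetime, float, float]]) -> Dict[str, float]:
    """Mean absolute error per calendar day from (timestamp, prediction, target) triples."""
    errors: Dict[str, List[float]] = defaultdict(list)
    for ts, predicted, actual in events:
        errors[ts.date().isoformat()].append(abs(predicted - actual))
    return {day: sum(errs) / len(errs) for day, errs in sorted(errors.items())}


def severity(value: float, warning: float = 40.0, critical: float = 44.0) -> str:
    """Classify a value under a 'greater than' condition, checking critical first."""
    if value > critical:
        return 'critical'
    if value > warning:
        return 'warning'
    return 'ok'
```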

In [None]:
alert_rule_1 = fdl.AlertRule(
    name='MAE Performance Alert',
    model_id=model.id,
    metric_id='mae',
    bin_size=fdl.BinSize.DAY,
    compare_to=fdl.CompareTo.RAW_VALUE,
    priority=fdl.Priority.HIGH,
    warning_threshold=40.0,
    critical_threshold=44.0,
    condition=fdl.AlertCondition.GREATER,
)

alert_rule_1.create()
print(
    f'New alert rule created with id = {alert_rule_1.id} and name = {alert_rule_1.name}'
)

# Set notification configuration for the alert rule, a single email address for this simple example
alert_rule_1.set_notification_config(emails=['name@example.com'])

## 8. Configure a Rolling Baseline (Optional)

Fiddler also allows you to configure a baseline based on **past production data.**

Instead of comparing against a static slice of data, a rolling baseline looks back at past production events and uses them for drift calculation.

Please refer to [our documentation](https://docs.fiddler.ai/client-guide/creating-a-baseline-dataset) for more information on Baselines.

---
  
Let's set up a rolling baseline that calculates drift relative to a one-day window of production data from one week earlier.
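Assuming the window is aligned to daily bins, the baseline configured below (`window_bin_size=fdl.WindowBinSize.DAY`, `offset_delta=7`) corresponds to a one-day window that sits seven days before the current day. `rolling_baseline_window` is a hypothetical sketch of that arithmetic; Fiddler's exact bin alignment (for example, its time zone handling) may differ.

```python
from datetime import datetime, timedelta
from typing import Tuple


def rolling_baseline_window(now: datetime, offset_delta: int = 7) -> Tuple[datetime, datetime]:
    """Return the [start, end) bounds of a one-day window offset_delta days back.

    Assumes daily bins that start at midnight in the timestamp's own time zone.
    """
    # Start of the current daily bin (keeps tzinfo if `now` is time zone aware)
    current_bin = now.replace(hour=0, minute=0, second=0, microsecond=0)
    start = current_bin - timedelta(days=offset_delta)
    return start, start + timedelta(days=1)
```

For an event at 13:45 on 15 March, this sketch places the baseline window at 8 March, from midnight up to (but not including) midnight on 9 March.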

In [None]:
rolling_baseline = fdl.Baseline(
    model_id=model.id,
    name=ROLLING_BASELINE_NAME,
    type_=fdl.BaselineType.ROLLING,
    environment=fdl.EnvType.PRODUCTION,
    window_bin_size=fdl.WindowBinSize.DAY,  # Size of the sliding window
    offset_delta=7,  # How far back to set our window (multiple of window_bin_size)
)

rolling_baseline.create()
print(
    f'New rolling baseline created with id = {rolling_baseline.id} and name = {rolling_baseline.name}'
)

## 9. Publish Production Events

Finally, let's send in some production data!


Fiddler will **monitor this data and compare it to your baseline to generate powerful insights into how your model is behaving**.


---


Each record sent to Fiddler is called **an event**.
  
Let's load some sample events from a CSV file.

In [None]:
production_data_df = pd.read_csv(PATH_TO_EVENTS_CSV)

# Parse the timestamp strings into pandas datetime values
production_data_df['timestamp'] = pd.to_datetime(production_data_df['timestamp'])

# OPTIONAL: shift the production data timestamps to be as recent as today

current_time = pd.Timestamp.now()

# Create a list of timestamps, increasing by one hour for each row
n_rows = len(production_data_df)
shifted_times = [current_time - pd.Timedelta(hours=(n_rows - 1 - i)) for i in range(n_rows)]

# Assign the shifted times to the dataframe
production_data_df['timestamp'] = shifted_times

production_data_df
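Before publishing, it can be worth checking that every event carries the columns the model was onboarded with, plus the timestamp column. A stdlib sketch over dict records, where `find_missing_columns` is a hypothetical helper: with the DataFrame above you could pass `production_data_df.to_dict('records')` and `input_columns + output_column + [timestamp_column]`. The target is deliberately left out of the required list, since ground truth often arrives after the prediction.

```python
import math
from typing import Dict, Iterable, List, Mapping, Sequence


def _is_missing(value: object) -> bool:
    """Treat None and float NaN (how pandas represents empty cells) as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def find_missing_columns(
    events: Iterable[Mapping[str, object]], required: Sequence[str]
) -> Dict[int, List[str]]:
    """Map row index to the required columns that are absent or empty in that event."""
    problems: Dict[int, List[str]] = {}
    for i, event in enumerate(events):
        missing = [c for c in required if _is_missing(event.get(c))]
        if missing:
            problems[i] = missing
    return problems
```

An empty result means every event has the required fields.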

In [None]:
production_publish_job = model.publish(production_data_df)

print(
    f'Initiated production environment data upload with Job ID = {production_publish_job.id}'
)

# Uncomment the line below to wait for the job to finish, otherwise it will run in the background.
# You can check the status on the Jobs page in the Fiddler UI or use the job ID to query the job status via the API.
# production_publish_job.wait()

## Get Insights
  
Return to your Fiddler environment to get enhanced observability into your model's performance.

<table>
    <tr>
        <td>
            <img src="https://raw.githubusercontent.com/fiddler-labs/fiddler-examples/main/quickstart/images/regression_model_3.png" />
        </td>
    </tr>
</table>

**What's Next?**

Try the [LLM Monitoring - Quick Start Notebook](https://docs.fiddler.ai/quickstart-notebooks/simple-llm-monitoring)

---


**Questions?**  
  
Check out [our docs](https://docs.fiddler.ai/) for a more detailed explanation of what Fiddler has to offer.