Merged

25 commits
dcab64f
Start monitoring chapter
GiovanniGiacometti Oct 14, 2024
2a99b41
Why we need monitoring, start target and metrics
GiovanniGiacometti Oct 14, 2024
de1256c
Monitoring targets
GiovanniGiacometti Oct 16, 2024
10ccaa2
Finished base monitoring page + detection event and explainability
GiovanniGiacometti Oct 17, 2024
54f4b5e
Add metric description
GiovanniGiacometti Oct 18, 2024
6424315
Rearrange docs
GiovanniGiacometti Oct 21, 2024
73fd313
Complete detection event
GiovanniGiacometti Oct 21, 2024
5895c65
Rename monitoring.md to index.md, drift explainability report
GiovanniGiacometti Oct 22, 2024
315858c
Modification
GiovanniGiacometti Oct 23, 2024
813b30b
Merge remote-tracking branch 'origin/dev' into dev-monitoring
GiovanniGiacometti Oct 23, 2024
abb2c78
Fix links and small impr
GiovanniGiacometti Oct 23, 2024
a70c002
Add customer id in detection event page (will be added soon)
GiovanniGiacometti Oct 23, 2024
36fc8aa
Add section in monitoring index.md to explain how to access drift status
GiovanniGiacometti Oct 23, 2024
1bd4b24
Merge remote-tracking branch 'origin/dev' into dev-monitoring
GiovanniGiacometti Oct 29, 2024
d48f1e9
Quick fixes post pr
GiovanniGiacometti Oct 31, 2024
ee83085
Second set of corrections post pr
GiovanniGiacometti Oct 31, 2024
ecd4b26
Third set of corrections post pr
GiovanniGiacometti Oct 31, 2024
a638bc3
Detection event rules page
GiovanniGiacometti Oct 31, 2024
8ee083c
Merge remote-tracking branch 'origin/dev' into dev-monitoring
GiovanniGiacometti Nov 4, 2024
b0dda3c
Adapted new style; small rewrites
GiovanniGiacometti Nov 4, 2024
ed2152f
Drift Explainability Report Images
GiovanniGiacometti Nov 4, 2024
db4c759
Monitoring overview image
GiovanniGiacometti Nov 4, 2024
31ff43c
Last minor fixes
GiovanniGiacometti Nov 4, 2024
fa3a04f
New monitoring status state diagram image
GiovanniGiacometti Nov 4, 2024
413f74f
New state diagram for monitoring status
GiovanniGiacometti Nov 4, 2024
53 changes: 53 additions & 0 deletions md-docs/imgs/monitoring/drift-explainability/fi.svg
53 changes: 53 additions & 0 deletions md-docs/imgs/monitoring/drift-explainability/score.svg
1 change: 1 addition & 0 deletions md-docs/imgs/monitoring/overview.svg
1 change: 1 addition & 0 deletions md-docs/imgs/monitoring/states.svg
10 changes: 9 additions & 1 deletion md-docs/stylesheets/extra.css
@@ -30,4 +30,12 @@
background-color: rgb(43, 155, 70);
-webkit-mask-image: var(--md-admonition-icon--code-block);
mask-image: var(--md-admonition-icon--code-block);
}
}

.nice-list ul{
list-style-type: circle;
}

.mermaid {
text-align: center;
}
4 changes: 2 additions & 2 deletions md-docs/user_guide/data.md
@@ -19,7 +19,7 @@ Available categories are:
The [Data Schema] created for the [Task] contains a list of Column objects, each of which has a _Role_.
Naturally, there is a relationship between the Column's Role and the Data Category.
In fact, each Data Category comprises a set of Column objects with certain Roles.
So that, when you upload samples belonging to a Data Category, they must contains all the Columns objects declared on the Data Schema to be considered valid.
When you upload samples belonging to a Data Category, they must contain all the Column objects declared in the Data Schema to be considered valid.

The following table shows these relationships:

@@ -130,7 +130,7 @@ For RAG Tasks, reference data can be used to indicate the type of data expected
You can set reference data as follows:

``` py
job_id = job_id = client.set_model_reference(
job_id = client.set_model_reference(
model_id=model_id,
from_timestamp=from_timestamp,
to_timestamp=to_timestamp,
52 changes: 0 additions & 52 deletions md-docs/user_guide/detection_event_rules.md

This file was deleted.

4 changes: 2 additions & 2 deletions md-docs/user_guide/index.md
@@ -53,7 +53,7 @@ A **Task** is specified by several attributes, the most important are:

- `type`: regression, classification, object detection ...
- `data structure`: tabular data, image data, ...
- `optional target`: if the target is not always available. This happen when input samples are labeled and the most part of production data do not have a label
- `optional target`: if the target is not always available. This happens when input samples are labeled but most of the production data does not have a label
- `data schema`: specifies the inputs and the target of the task, see [Data Schema](data_schema.md) section for more details
- `cost info`: information about the economic costs of the error on the target

@@ -110,7 +110,7 @@ Now that you have clear the basic concepts we invite you to explore the other ML

Discover how to setup automation rules to increase your reactivity.

[:octicons-arrow-right-24: More info](detection_event_rules.md)
[:octicons-arrow-right-24: More info](monitoring/detection_event_rules.md)

- :material-lock:{ .lg .middle } **Roles and access**

21 changes: 20 additions & 1 deletion md-docs/user_guide/model.md
@@ -1 +1,20 @@
# Model
# Model




[//]: # ()
[//]: # ()
[//]: # (What is additional probabilistic output?)

[//]: # ()
[//]: # (What is metric?)

[//]: # ()
[//]: # (What is suggestion type?)

[//]: # ()
[//]: # (What is retraining cost?)

[//]: # ()
[//]: # (What is retraining trigger?)
4 changes: 2 additions & 2 deletions md-docs/user_guide/modules/index.md
@@ -13,15 +13,15 @@ Modules can be always active or on-demand: Monitoring module and Drift Explainab

Data drift detection over data.

[:octicons-arrow-right-24: More info](user_guide/company.md)
[:octicons-arrow-right-24: More info](../monitoring/index.md)

- :material-compare:{ .lg .middle } **Drift Explainability**

---

Understand the nature of detected drift.

[:octicons-arrow-right-24: More info](user_guide/modules/index.md)
[:octicons-arrow-right-24: More info](../monitoring/drift_explainability.md)

- :material-speedometer:{ .lg .middle } **Retraining**

49 changes: 49 additions & 0 deletions md-docs/user_guide/monitoring/detection_event.md
@@ -0,0 +1,49 @@
# Detection Event

A Detection Event is raised by the ML cube Platform when a significant change is detected in one of the entities being monitored.

An event is characterized by the following attributes:

- `Event Type`: the type of the event. Its possible values are:
<div class="nice-list">
<ul>
<li> `Warning On`: the monitoring entity is experiencing slight changes that might lead to a drift.</li>
<li> `Warning Off`: the monitoring entity has returned to the reference distribution. </li>
<li> `Drift On`: the monitoring entity has drifted from the reference distribution.</li>
<li> `Drift Off`: the monitoring entity has returned to the reference distribution.</li>
</ul>
</div>
- `Severity`: the severity of the event. It is provided only for drift events and can be `Low`, `Medium`, or `High`.
- `Monitoring Target`: the [Monitoring Target](index.md#monitoring-targets) being monitored.
- `Monitoring Metric`: the [Monitoring Metric](index.md#monitoring-metrics) being monitored.
- `Model Name`: the name of the model that raised the event. It's present only if the event is related to a model.
- `Model Version`: the version of the model that raised the event. It's present only if the event is related to a model.
- `Insert datetime`: the time when the event was raised.
- `Sample timestamp`: the timestamp of the sample that triggered the event.
- `Sample customer ID`: the id of the customer that triggered the event.
- `User feedback`: the feedback provided by the user on whether the event was expected or not.
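The attributes above can be pictured as a simple record. The following is only an illustrative sketch — the class, enum, and field names are hypothetical, not the Platform's actual SDK types:

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class EventType(Enum):
    WARNING_ON = "warning_on"
    WARNING_OFF = "warning_off"
    DRIFT_ON = "drift_on"
    DRIFT_OFF = "drift_off"


class Severity(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass
class DetectionEvent:
    """Illustrative shape of a detection event (field names are hypothetical)."""

    event_type: EventType
    monitoring_target: str
    insert_datetime: datetime
    sample_timestamp: float
    monitoring_metric: Optional[str] = None
    severity: Optional[Severity] = None   # only set for drift events
    model_name: Optional[str] = None      # only set for model-related events
    model_version: Optional[str] = None
    sample_customer_id: Optional[str] = None
    user_feedback: Optional[bool] = None  # filled in later by the user


# Example: a high-severity drift on the error of a model
event = DetectionEvent(
    event_type=EventType.DRIFT_ON,
    monitoring_target="ERROR",
    insert_datetime=datetime(2024, 11, 4, 12, 0),
    sample_timestamp=1730721600.0,
    severity=Severity.HIGH,
    model_name="my-model",
    model_version="v1",
)
```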

## Retrieve Detection Events

You can access the detection events generated by the Platform in two ways:

- **SDK**: retrieve all detection events for a specific task programmatically.
- **WebApp**: navigate to the **`Detection`** section in the task page's sidebar. All detection events are displayed there in a table,
with multiple filtering options for convenient event management. Additionally, the latest detection events are shown on the task homepage,
in the section named "Latest Detection Events".

## User Feedback

When a detection event is raised, you can provide feedback on whether the event was expected or not. This feedback is then used
to tune the monitoring algorithms and improve their performance. The feedback can be provided through the WebApp, in the
**`Detection`** section of the task page, or through the SDK.


## Detection Event Rules

To automate actions upon the reception of a detection event, you can set up detection event rules.
You can learn more about how to configure them in the [Detection Event Rules] section.


[Monitoring]: index.md
[Detection Event Rules]: detection_event_rules.md
66 changes: 66 additions & 0 deletions md-docs/user_guide/monitoring/detection_event_rules.md
@@ -0,0 +1,66 @@
# Detection Event Rules

This section outlines how to configure automation to receive notifications or start retraining after a [Detection Event] occurs.

When a detection event is produced, the ML cube Platform reviews all the detection event rules you have set
and triggers those matching the event.

Rules are specific to a task and are characterized by the following attributes:

- `Name`: a descriptive label of the rule.
- `Detection Event Type`: the type of event that triggers the rule.
- `Severity`: the severity of the event that triggers the rule. It is only applicable to drift events. If not specified, the rule will be triggered by drift events of any severity.
- `Monitoring Target`: the [Monitoring Target](index.md#monitoring-targets) whose event should trigger the rule.
- `Monitoring Metric`: the [Monitoring Metric](index.md#monitoring-metrics) whose event should trigger the rule.
- `Model name`: the name of the model to which the rule applies. This is only required when the monitoring target is related to a model
(such as `ERROR` or `PREDICTION`).
- `Actions`: a list of actions to be executed sequentially when the rule is triggered.
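The matching logic described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the Platform's actual implementation: a rule fires when the event type and monitoring target match, the optional model-name filter matches, and, for drift events, the severity filter is either unset (matching any severity) or equal to the event's severity.

```python
from typing import Optional


def rule_matches(
    rule_event_type: str,
    rule_target: str,
    rule_severity: Optional[str],
    rule_model_name: Optional[str],
    event_type: str,
    event_target: str,
    event_severity: Optional[str],
    event_model_name: Optional[str],
) -> bool:
    """Return True when a detection event should trigger a rule (illustrative)."""
    if rule_event_type != event_type or rule_target != event_target:
        return False
    # A model name on the rule restricts it to events raised by that model.
    if rule_model_name is not None and rule_model_name != event_model_name:
        return False
    # No severity filter means the rule matches drift events of any severity.
    if rule_severity is not None and rule_severity != event_severity:
        return False
    return True
```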

## Detection Event Actions
Three types of actions are currently supported: notification, plot configuration and retrain.

### Notifications

These actions send notifications to external services when a detection event is triggered. The following notification actions are available:

- `Slack Notification`: sends a notification to a Slack channel via webhook.
- `Discord Notification`: sends a notification to a Discord channel via webhook.
- `Email Notification`: sends an email to the provided email address.
- `Teams Notification`: sends a notification to Microsoft Teams via webhook.
- `Mqtt Notification`: sends a notification to an MQTT broker.

### Plot Configuration

This action consists of creating two plot configurations when a detection event is triggered: the first includes
data preceding the event, while the second includes data following the event.

### Retrain

The Retrain Action enables the automatic retraining of your model, so it is only available when the target of the rule is related to a model.
It does not need any parameters, as the model to retrain is inferred from the `Model Name` attribute of the rule.
Note that the model must already have a retrain trigger associated with it before you set up this action.

!!! example
    The following code snippet demonstrates how to create a rule that matches high severity drift events on the error of a model.
    When triggered, it first sends a notification to the `ml3-platform-notifications` channel on your Slack workspace, using the
    provided webhook URL, and then starts the retraining of the model.

    ```py
    rule_id = client.create_detection_event_rule(
        name='Retrain model with notification',
        task_id='my-task-id',
        model_name='my-model',
        severity=DetectionEventSeverity.HIGH,
        detection_event_type=DetectionEventType.DRIFT_ON,
        monitoring_target=MonitoringTarget.ERROR,
        actions=[
            SlackNotificationAction(
                webhook='https://hooks.slack.com/services/...',
                channel='ml3-platform-notifications'
            ),
            RetrainAction()
        ],
    )
    ```

[Detection Event]: detection_event.md
64 changes: 64 additions & 0 deletions md-docs/user_guide/monitoring/drift_explainability.md
@@ -0,0 +1,64 @@
# Drift Explainability

[Monitoring] is a crucial aspect of the machine learning lifecycle, as it enables tracking the model's performance and its data over time,
ensuring the model continues to function as expected. However, monitoring alone is not enough when it comes to the adaptation phase.

To make the right decisions, you need to understand the main factors that led to the drift in the first place, so that
the correct actions can be taken to mitigate it.

The ML cube Platform supports this process by offering what we refer to as **Drift Explainability Reports**,
automatically generated upon the detection of a drift and containing several elements that should help you diagnose the root causes
of the change that occurred.

You can access the reports in the WebApp by navigating to the `Drift Explainability` tab in the sidebar of the Task page.

## Structure

A Drift Explainability Report compares the reference data with the portion of production data where the drift was identified, that is,
the samples belonging to the new data distribution. Note that these reports are generated only after a sufficient number of samples has been collected
after the drift, to ensure the statistical reliability of the results.
If the data distribution moves back to the reference before enough samples are collected, the report might not be generated.

Each report is composed of several entities, each providing a different perspective on the data and the drift occurred.
Most of them are specific to a certain `Data Structure`, so they might not be available for all tasks.

These entities can take the form of tables, plots, or textual explanations.
Observed and analyzed together, they should provide a comprehensive understanding of the drift and its underlying causes.
These are the entities currently available:

- `Feature Importance`: a bar plot that illustrates how the significance of each feature differs between the reference
and the production datasets. Variations in a feature's importance might suggest that its contribution to the model's predictions
has changed over time. This entity is available only for tasks with tabular data.

<figure markdown>
![Feature Importance](../../imgs/monitoring/drift-explainability/fi.svg)
<figcaption>Example of a feature importance plot.</figcaption>
</figure>

- `Variable discriminative power`: a bar plot that displays the influence of each feature, as well as the target,
in differentiating between the reference and the production datasets.
The values represent how strongly a given feature helps to distinguish the datasets, with higher values indicating stronger
separating power. This entity is available only for tasks with tabular data.

<figure markdown>
![Variable discriminative power](../../imgs/monitoring/drift-explainability/concept-fi.svg)
<figcaption>Example of a variable discriminative power plot.</figcaption>
</figure>

- `Drift Score`: a line plot that shows the evolution of the drift score over time. The drift score is a
measure of the statistical distance between a sliding window of the production data and the reference data. The plot also shows the threshold,
which is the value the drift score must exceed to raise a drift alarm, and all the [Detection Events] that were triggered in
the time frame of the report. This plot helps you understand how the drift evolved over time and identify the moments in which the difference
between the two datasets was largest. Note that some postprocessing is applied to the events to account for how the drift detection algorithms work:
drift on events are shifted back by a certain offset, aiming to point at the precise time when the drift actually started. As a result,
drift on events might be shown before the threshold is exceeded. This explainability entity is available for all tasks.


<figure markdown style="width: 100%">
![Drift score](../../imgs/monitoring/drift-explainability/score.svg)
<figcaption style="width: 100%; text-align: center;">Example of a drift score plot with detection events of increasing severity displayed.</figcaption>
</figure>
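The idea behind a drift score can be illustrated with a toy computation. This is only a sketch under stated assumptions — the Platform's actual score is not described here — using the Kolmogorov–Smirnov statistic between a sliding window of production data and the reference sample as a stand-in distance measure:

```python
def ks_statistic(reference, window):
    """Maximum distance between the empirical CDFs of two samples."""
    ref = sorted(reference)
    win = sorted(window)
    values = sorted(set(ref) | set(win))

    def cdf(sample, x):
        return sum(v <= x for v in sample) / len(sample)

    return max(abs(cdf(ref, x) - cdf(win, x)) for x in values)


def drift_scores(reference, production, window_size, threshold):
    """Score each sliding window; flag indices whose score exceeds the threshold."""
    scores, alarms = [], []
    for i in range(window_size, len(production) + 1):
        window = production[i - window_size:i]
        score = ks_statistic(reference, window)
        scores.append(score)
        if score > threshold:
            alarms.append(i - 1)  # index of the last sample in the window
    return scores, alarms


# Toy data: production drifts away from the reference distribution halfway through.
reference = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
production = [0.1, 0.2, 0.3, 2.0, 2.1, 2.2, 2.3]
scores, alarms = drift_scores(reference, production, window_size=3, threshold=0.8)
```

Once the last windows contain only drifted samples, the score reaches its maximum and the threshold is exceeded, mirroring how the drift score plot rises before an alarm.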

[Monitoring]: index.md
[Detection Events]: detection_event.md