Add monitoring section #16
Merged
25 commits
dcab64f Start monitoring chapter (GiovanniGiacometti)
2a99b41 Why we need monitoring, start target and metrics (GiovanniGiacometti)
de1256c Monitoring targets (GiovanniGiacometti)
10ccaa2 Finished base monitoring page + detection event and explainability (GiovanniGiacometti)
54f4b5e Add metric description (GiovanniGiacometti)
6424315 Rearrange docs (GiovanniGiacometti)
73fd313 Complete detection event (GiovanniGiacometti)
5895c65 Rename monitoring.md to index.md, drift explainability report (GiovanniGiacometti)
315858c Modification (GiovanniGiacometti)
813b30b Merge remote-tracking branch 'origin/dev' into dev-monitoring (GiovanniGiacometti)
abb2c78 Fix links and small impr (GiovanniGiacometti)
a70c002 Add customer id in detection event page (will be added soon) (GiovanniGiacometti)
36fc8aa Add section in monitoring index.md to explain how to access drift status (GiovanniGiacometti)
1bd4b24 Merge remote-tracking branch 'origin/dev' into dev-monitoring (GiovanniGiacometti)
d48f1e9 Quick fixes post pr (GiovanniGiacometti)
ee83085 Second set of corrections post pr (GiovanniGiacometti)
ecd4b26 Third set of corrections post pr (GiovanniGiacometti)
a638bc3 Detection event rules page (GiovanniGiacometti)
8ee083c Merge remote-tracking branch 'origin/dev' into dev-monitoring (GiovanniGiacometti)
b0dda3c Adapted new style; small rewrites (GiovanniGiacometti)
ed2152f Drift Explainability Report Images (GiovanniGiacometti)
db4c759 Monitoring overview image (GiovanniGiacometti)
31ff43c Last minor fixes (GiovanniGiacometti)
fa3a04f New monitoring status state diagram image (GiovanniGiacometti)
413f74f New state diagram for monitoring status (GiovanniGiacometti)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1 +1,20 @@ | ||
| # Model | ||
| # Model | ||
|
|
||
|
|
||
|
|
||
|
|
||
| [//]: # () | ||
| [//]: # () | ||
| [//]: # (What is additional probabilistic output?) | ||
|
|
||
| [//]: # () | ||
| [//]: # (What is metric?) | ||
|
|
||
| [//]: # () | ||
| [//]: # (What is suggestion type?) | ||
|
|
||
| [//]: # () | ||
| [//]: # (What is retraining cost?) | ||
|
|
||
| [//]: # () | ||
| [//]: # (What is retraining trigger?) |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,49 @@ | ||
| # Detection Event | ||
|
|
||
| A Detection Event is raised by the ML cube Platform when a significant change is detected in one of the entities being monitored. | ||
|
|
||
| An event is characterized by the following attributes: | ||
|
||
|
|
||
| - `Event Type`: the type of the event. Its possible values are: | ||
| <div class="nice-list"> | ||
| <ul> | ||
| <li> `Warning On`: the monitoring entity is experiencing slight changes that might lead to a drift.</li> | ||
| <li> `Warning Off`: the monitoring entity has returned to the reference distribution. </li> | ||
| <li> `Drift On`: the monitoring entity has drifted from the reference distribution.</li> | ||
| <li> `Drift Off`: the monitoring entity has returned to the reference distribution.</li> | ||
| </ul> | ||
| </div> | ||
| - `Severity`: the severity of the event. It is provided only for drift events and can be `Low`, `Medium`, or `High`. | ||
| - `Monitoring Target`: the [Monitoring Target](index.md#monitoring-targets) being monitored. | ||
| - `Monitoring Metric`: the [Monitoring Metric](index.md#monitoring-metrics) being monitored. | ||
| - `Model Name`: the name of the model that raised the event. It's present only if the event is related to a model. | ||
| - `Model Version`: the version of the model that raised the event. It's present only if the event is related to a model. | ||
| - `Insert datetime`: the time when the event was raised. | ||
| - `Sample timestamp`: the timestamp of the sample that triggered the event. | ||
| - `Sample customer ID`: the ID of the customer that triggered the event. | ||
| - `User feedback`: the feedback provided by the user on whether the event was expected or not. | ||
|
|
||
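The attribute list above can be pictured as a small record type. The following is a minimal sketch, not the Platform's actual data model: the class names, field names and value types are hypothetical, and it assumes that "drift events" covers both `Drift On` and `Drift Off`.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class EventType(Enum):
    WARNING_ON = "Warning On"
    WARNING_OFF = "Warning Off"
    DRIFT_ON = "Drift On"
    DRIFT_OFF = "Drift Off"


class Severity(Enum):
    LOW = "Low"
    MEDIUM = "Medium"
    HIGH = "High"


@dataclass(frozen=True)
class DetectionEventRecord:
    """Hypothetical container mirroring the attributes listed above."""

    event_type: EventType
    monitoring_target: str
    monitoring_metric: str
    insert_datetime: datetime
    sample_timestamp: float
    severity: Optional[Severity] = None        # only for drift events
    model_name: Optional[str] = None           # only for model-related events
    model_version: Optional[str] = None        # only for model-related events
    sample_customer_id: Optional[str] = None
    user_feedback: Optional[bool] = None       # True if the event was expected

    def __post_init__(self) -> None:
        # Assumption: both Drift On and Drift Off count as drift events.
        is_drift = self.event_type in (EventType.DRIFT_ON, EventType.DRIFT_OFF)
        if self.severity is not None and not is_drift:
            raise ValueError("severity is provided only for drift events")
```

Encoding the "severity only for drift events" constraint in the constructor means an inconsistent record fails early instead of propagating to downstream code.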
| ## Retrieve Detection Events | ||
|
|
||
| You can access the detection events generated by the Platform in two ways: | ||
|
|
||
| - **SDK**: it can be used to retrieve all detection events for a specific task programmatically. | ||
| - **WebApp**: navigate to the **`Detection`** section in the task page's sidebar. Here, all detection events are displayed in a table, | ||
| with multiple filtering options available for efficient event management. Additionally, the latest detection events are shown on the Task homepage, | ||
| in the section named "Latest Detection Events". | ||
|
|
||
| ## User Feedback | ||
|
|
||
| When a detection event is raised, you can provide feedback on whether the event was expected or not. This feedback is then used | ||
| to tune the monitoring algorithms and improve their performance. The feedback can be provided through the WebApp, in the | ||
| **`Detection`** section of the task page, or through the SDK. | ||
|
|
||
|
|
||
| ## Detection Event Rules | ||
|
|
||
| To automate actions upon the reception of a detection event, you can set up detection event rules. | ||
| You can learn more about how to configure them in the [Detection Event Rules] section. | ||
|
|
||
|
|
||
| [Monitoring]: index.md | ||
| [Detection Event Rules]: detection_event_rules.md | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,66 @@ | ||
| # Detection Event Rules | ||
|
||
|
|
||
| This section outlines how to configure automation to receive notifications or start retraining after a [Detection Event] occurs. | ||
|
|
||
| When a detection event is produced, the ML cube Platform reviews all the detection event rules you have set | ||
| and triggers those matching the event. | ||
|
|
||
| Rules are specific to a task and are characterized by the following attributes: | ||
|
|
||
| - `Name`: a descriptive label of the rule. | ||
| - `Detection Event Type`: the type of event that triggers the rule. | ||
| - `Severity`: the severity of the event that triggers the rule. It is only applicable to drift events. If not specified, the rule will be triggered by drift events of any severity. | ||
| - `Monitoring Target`: the [Monitoring Target](index.md#monitoring-targets) whose event should trigger the rule. | ||
| - `Monitoring Metric`: the [Monitoring Metric](index.md#monitoring-metrics) whose event should trigger the rule. | ||
| - `Model Name`: the name of the model to which the rule applies. This is only required when the monitoring target is related to a model | ||
| (such as `ERROR` or `PREDICTION`). | ||
| - `Actions`: a list of actions to be executed sequentially when the rule is triggered. | ||
|
|
||
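The matching step described above can be sketched as follows. This is only an illustration of the idea, not the Platform's implementation: the `Event` and `Rule` classes and the `rule_matches` function are hypothetical. It assumes that an unspecified metric or model name behaves as a wildcard in the same way an unspecified severity does.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Event:
    event_type: str
    monitoring_target: str
    monitoring_metric: Optional[str] = None
    severity: Optional[str] = None
    model_name: Optional[str] = None


@dataclass(frozen=True)
class Rule:
    name: str
    event_type: str
    monitoring_target: str
    monitoring_metric: Optional[str] = None
    severity: Optional[str] = None      # None: drift events of any severity
    model_name: Optional[str] = None


def rule_matches(rule: Rule, event: Event) -> bool:
    """Return True when every attribute set on the rule equals the event's."""
    if rule.event_type != event.event_type:
        return False
    if rule.monitoring_target != event.monitoring_target:
        return False
    # Assumption: attributes left as None on the rule act as wildcards.
    for attribute in ("monitoring_metric", "severity", "model_name"):
        wanted = getattr(rule, attribute)
        if wanted is not None and wanted != getattr(event, attribute):
            return False
    return True


def matching_rules(rules: List[Rule], event: Event) -> List[Rule]:
    """Select the rules of a task that a given event should trigger."""
    return [rule for rule in rules if rule_matches(rule, event)]
```

With one rule that leaves `severity` unset and another that requires `HIGH`, a medium-severity drift event would trigger only the first.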
| ## Detection Event Actions | ||
| Three types of actions are currently supported: notification, plot configuration and retrain. | ||
|
|
||
| ### Notifications | ||
|
|
||
| These actions send notifications to external services when a detection event is triggered. The following notification actions are available: | ||
|
|
||
| - `Slack Notification`: sends a notification to a Slack channel via webhook. | ||
| - `Discord Notification`: sends a notification to a Discord channel via webhook. | ||
| - `Email Notification`: sends an email to the provided email address. | ||
| - `Teams Notification`: sends a notification to Microsoft Teams via webhook. | ||
| - `Mqtt Notification`: sends a notification to an MQTT broker. | ||
|
|
||
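To give an idea of what a webhook notification involves, here is a minimal sketch that builds (but does not send) an HTTP request for a Slack incoming webhook, which accepts a JSON body with a `text` field. The helper names and the message wording are hypothetical, and the way the Platform actually formats its notifications is not documented here.

```python
import json
import urllib.request
from typing import Optional


def format_event_message(
    event_type: str,
    monitoring_target: str,
    severity: Optional[str] = None,
    model_name: Optional[str] = None,
) -> str:
    """Build a one-line, human-readable summary of a detection event."""
    parts = [f"Detection event {event_type} on target {monitoring_target}"]
    if severity is not None:
        parts.append(f"severity: {severity}")
    if model_name is not None:
        parts.append(f"model: {model_name}")
    return ", ".join(parts)


def build_slack_request(webhook_url: str, message: str) -> urllib.request.Request:
    """Prepare a POST request carrying the message as a JSON payload."""
    payload = json.dumps({"text": message}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` would deliver the message. Keeping construction separate from sending makes the payload easy to inspect.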
| ### Plot Configuration | ||
|
|
||
| This action creates two plot configurations when a detection event is triggered: the first one includes | ||
| data preceding the event, while the second one includes data following the event. | ||
|
|
||
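The before/after split can be illustrated with a few lines of code. This is a sketch under assumptions: the function name is hypothetical, timestamps are assumed to be sorted in ascending order, and the sample that triggered the event is placed in the "after" group, which may differ from what the Platform does.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def split_around_event(
    timestamps: Sequence[float], event_timestamp: float
) -> Tuple[List[float], List[float]]:
    """Split sorted sample timestamps into those before and from the event."""
    # bisect_left finds the first position whose timestamp is >= the event's.
    cut = bisect_left(timestamps, event_timestamp)
    return list(timestamps[:cut]), list(timestamps[cut:])
```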
| ### Retrain | ||
|
|
||
| The Retrain action enables the automatic retraining of your model. Therefore, it is available only when the target of the rule is related to a model. | ||
| The action does not need any parameters because the model to retrain is inferred from the `Model Name` attribute of the rule. | ||
| The model must already have a retrain trigger associated with it before you set up this action. | ||
|
|
||
| !!! example | ||
| The following code snippet demonstrates how to create a rule that matches high severity drift events on the error of a model. | ||
| When triggered, it first sends a notification to the `ml3-platform-notifications` channel on your Slack workspace, using the | ||
| provided webhook URL, and then starts the retraining of the model. | ||
|
|
||
| ```py | ||
| rule_id = client.create_detection_event_rule( | ||
|     name='Retrain model with notification', | ||
|     task_id='my-task-id', | ||
|     model_name='my-model', | ||
|     severity=DetectionEventSeverity.HIGH, | ||
|     detection_event_type=DetectionEventType.DRIFT_ON, | ||
|     monitoring_target=MonitoringTarget.ERROR, | ||
|     actions=[ | ||
|         SlackNotificationAction( | ||
|             webhook='https://hooks.slack.com/services/...', | ||
|             channel='ml3-platform-notifications' | ||
|         ), | ||
|         RetrainAction() | ||
|     ], | ||
| ) | ||
| ``` | ||
|
|
||
| [Detection Event]: detection_event.md | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,64 @@ | ||
| # Drift Explainability | ||
|
||
|
|
||
| [Monitoring] is a crucial aspect of the machine learning lifecycle, as it enables tracking the model's performance and its data over time, | ||
| ensuring the model continues to function as expected. However, monitoring alone is not enough when it comes to the adaptation phase. | ||
|
|
||
| In order to make the right decisions, you need to understand the main factors that led to the drift in the first place, so that | ||
| the correct actions can be taken to mitigate it. | ||
|
|
||
| The ML cube Platform supports this process by offering what we refer to as **Drift Explainability Reports**, | ||
| automatically generated upon the detection of a drift and containing several elements that should help you diagnose the root causes | ||
| of the change that occurred. | ||
|
|
||
| You can access the reports in the WebApp, by navigating to the `Drift Explainability` tab in the sidebar of the Task page. | ||
|
|
||
| ## Structure | ||
|
|
||
| A Drift Explainability Report compares the reference data with the portion of production data where the drift was identified, that is, | ||
| the samples belonging to the new data distribution. Notice that these reports are generated only after a sufficient number of samples has been collected | ||
| after the drift, in order to ensure the statistical reliability of the results. | ||
| If the data distribution moves back to the reference before enough samples are collected, the report might not be generated. | ||
|
|
||
| Each report is composed of several entities, each providing a different perspective on the data and the drift that occurred. | ||
| Most of them are specific to a certain `Data Structure`, so they might not be available for all tasks. | ||
|
|
||
| These entities can take the form of tables, plots, or textual explanations. | ||
| Observed and analyzed together, they should provide a comprehensive understanding of the drift and its underlying causes. | ||
| These are the entities currently available: | ||
|
|
||
| - `Feature Importance`: a bar plot that illustrates how the importance of each feature differs between the reference | ||
| and the production datasets. Variations in a feature's importance might suggest that its contribution to the model's predictions | ||
| has changed over time. This entity is available only for tasks with tabular data. | ||
|
|
||
| <figure markdown> | ||
|  | ||
| <figcaption>Example of a feature importance plot.</figcaption> | ||
| </figure> | ||
|
|
||
| - `Variable discriminative power`: also a bar plot, it displays the influence of each feature, as well as the target, | ||
| in differentiating between the reference and the production datasets. | ||
| The values represent how strongly a given feature helps to distinguish the datasets, with higher values representing stronger | ||
| separating power. This entity is available only for tasks with tabular data. | ||
|
|
||
| <figure markdown> | ||
|  | ||
| <figcaption>Example of a variable discriminative power plot.</figcaption> | ||
| </figure> | ||
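One simple way to quantify how well a single variable separates two datasets is a rank-based AUC. The sketch below uses that as a stand-in: it is not the method used by the ML cube Platform, the function names are hypothetical, and it handles numeric features only.

```python
from typing import Dict, List, Sequence


def separation_score(reference: Sequence[float], production: Sequence[float]) -> float:
    """Return |2 * AUC - 1|, where AUC = P(prod > ref) + 0.5 * P(prod == ref).

    0.0 means the variable does not separate the datasets, 1.0 means it
    separates them perfectly.
    """
    wins = 0.0
    for p in production:
        for r in reference:
            if p > r:
                wins += 1.0
            elif p == r:
                wins += 0.5
    auc = wins / (len(reference) * len(production))
    return abs(2.0 * auc - 1.0)


def discriminative_power(
    reference_rows: List[Dict[str, float]],
    production_rows: List[Dict[str, float]],
) -> Dict[str, float]:
    """Score every variable and return them sorted from most to least separating."""
    scores = {
        name: separation_score(
            [row[name] for row in reference_rows],
            [row[name] for row in production_rows],
        )
        for name in reference_rows[0]
    }
    return dict(sorted(scores.items(), key=lambda item: item[1], reverse=True))
```

The pairwise loop is quadratic in the number of samples, which is acceptable for an illustration but would need a rank-based formulation for large datasets.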
|
|
||
| - `Drift Score`: a line plot that shows the evolution of the drift score over time. The drift score is a | ||
| measure of the statistical distance between a sliding window of the production data and the reference data. The plot also shows the threshold, | ||
| which is the value that the drift score must exceed to raise a drift alarm, and all the [Detection Events] that were triggered in | ||
| the time frame of the report. This plot helps in understanding how the drift evolved over time and when the difference | ||
| between the two datasets was largest. Notice that some postprocessing is applied to the events to account for how the drift detection algorithms work. | ||
| Specifically, we shift back the `Drift On` events by a certain offset, aiming to point at the precise time when the drift actually started. As a result, | ||
| `Drift On` events might be shown before the threshold is exceeded. This explainability entity is available for all tasks. | ||
|
|
||
|
|
||
| <figure markdown style="width: 100%"> | ||
|  | ||
| <figcaption style="width: 100%; text-align: center;">Example of a drift score plot with detection events of increasing severity displayed.</figcaption> | ||
| </figure> | ||
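The idea of a drift score computed on a sliding window and compared against a threshold can be sketched as follows. The statistical distance used by the Platform is not specified here, so this example uses the two-sample Kolmogorov-Smirnov statistic as a stand-in. All function names are hypothetical, and the data is assumed to be one-dimensional.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence


def ks_statistic(reference: Sequence[float], window: Sequence[float]) -> float:
    """Largest gap between the empirical CDFs of the two samples."""
    ref = sorted(reference)
    win = sorted(window)
    distance = 0.0
    # Both CDFs are step functions, so checking observed values is enough.
    for x in ref + win:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_win = bisect_right(win, x) / len(win)
        distance = max(distance, abs(cdf_ref - cdf_win))
    return distance


def drift_scores(
    reference: Sequence[float], production: Sequence[float], window_size: int
) -> List[float]:
    """Score each full window of production data against the reference."""
    return [
        ks_statistic(reference, production[end - window_size:end])
        for end in range(window_size, len(production) + 1)
    ]


def first_alarm(scores: Sequence[float], threshold: float) -> Optional[int]:
    """Index of the first score that exceeds the threshold, or None."""
    for index, score in enumerate(scores):
        if score > threshold:
            return index
    return None
```

Because each score summarizes a whole window, the threshold is typically crossed some samples after the distribution actually changed. That delay is the kind of effect the offset applied to `Drift On` events is meant to compensate for.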
|
|
||
| [Monitoring]: index.md | ||
| [Detection Events]: detection_event.md | ||