From 22d543814898362709d9c920c152209660ead914 Mon Sep 17 00:00:00 2001
From: Giovanni Giacometti
Date: Mon, 4 Nov 2024 15:22:20 +0100
Subject: [PATCH 1/4] New state diagram for monitoring status

---
 md-docs/stylesheets/extra.css          |  6 +++++-
 md-docs/user_guide/model.md            | 21 ++++++++++++++++++++-
 md-docs/user_guide/monitoring/index.md | 21 +++++++++++++++++----
 3 files changed, 42 insertions(+), 6 deletions(-)

diff --git a/md-docs/stylesheets/extra.css b/md-docs/stylesheets/extra.css
index 1677134..49b3f25 100644
--- a/md-docs/stylesheets/extra.css
+++ b/md-docs/stylesheets/extra.css
@@ -34,4 +34,8 @@
 
 .nice-list ul{
     list-style-type: circle;
-}
\ No newline at end of file
+}
+
+.mermaid {
+    text-align: center;
+  }
\ No newline at end of file
diff --git a/md-docs/user_guide/model.md b/md-docs/user_guide/model.md
index ace4641..ba492e2 100644
--- a/md-docs/user_guide/model.md
+++ b/md-docs/user_guide/model.md
@@ -1 +1,20 @@
-# Model
\ No newline at end of file
+# Model
+
+
+
+
+[//]: # ()
+[//]: # ()
+[//]: # (What is additional probabilistic output?)
+
+[//]: # ()
+[//]: # (What is metric?)
+
+[//]: # ()
+[//]: # (What is suggestion type?)
+
+[//]: # ()
+[//]: # (What is retraining cost?)
+
+[//]: # ()
+[//]: # (What is retraining trigger?)
\ No newline at end of file
diff --git a/md-docs/user_guide/monitoring/index.md b/md-docs/user_guide/monitoring/index.md
index df43f04..1cfd14d 100644
--- a/md-docs/user_guide/monitoring/index.md
+++ b/md-docs/user_guide/monitoring/index.md
@@ -121,10 +121,23 @@ All the entities being monitored are associated with a status, which can be one
 The following diagram illustrates the possible transitions between the statuses. Each transition is triggered by a
 [Detection Event] and the status of the entity is updated accordingly.
 
-<figure markdown>
-  ![Drift score](../../imgs/monitoring/states.svg)
-  <figcaption>Monitoring status state diagram</figcaption>
-</figure>
+```mermaid +stateDiagram-v2 + + [*] --> OK : Initial State + + OK --> WARNING : Warning On + WARNING --> OK : Set new reference + WARNING --> OK : Warning Off + + + WARNING --> DRIFT : Drift On + DRIFT --> WARNING : Drift Off + + DRIFT --> OK : Set new reference + DRIFT --> OK : Drift Off +``` + Notice that a Drift OFF event can either bring the entity back to the `OK` status or to the `WARNING` status, depending on the velocity of the change and the monitoring algorithm's sensitivity. The same applies From 89010a4f2568ec696919bacd02f5507164019aac Mon Sep 17 00:00:00 2001 From: Giovanni Giacometti Date: Mon, 4 Nov 2024 18:06:21 +0100 Subject: [PATCH 2/4] Init Model page --- md-docs/user_guide/model.md | 63 ++++++++++++++++++++++++++ md-docs/user_guide/monitoring/index.md | 2 +- 2 files changed, 64 insertions(+), 1 deletion(-) diff --git a/md-docs/user_guide/model.md b/md-docs/user_guide/model.md index ba492e2..650639a 100644 --- a/md-docs/user_guide/model.md +++ b/md-docs/user_guide/model.md @@ -1,7 +1,70 @@ # Model +In the ML Cube Platform, a Model is a representation of the actual machine learning model used for making predictions. The data used +for its training usually represent the reference data distribution, while production data comprises the data on which the model +performs inference. +A Model is uniquely associated with a [Task] and it can be created both through the WebApp and the Python SDK. Currently, we only support one model +per Task. +A Model is defined by a name and a version. The version is updated whenever the model is retrained, allowing to +track the latest version of the model and the data used for its training. When predictions are uploaded to the platform, +the model version needs to be appropriately specified, following the guidelines in the [Data Schema] page, to ensure that the +predictions are associated to the correct model version. + +!!! note + You don't need to upload the **real** model on the Platform. We only require its training data and predictions. + The entity you create on the Platform serves more as a placeholder for the model. For this reason, + the ML cube Platform is considered *model agnostic*. + + +### RAG Model + +RAG Tasks represent an exception to the model framework presented before. In this type of Tasks, the model +is a Large Language Model (LLM), that is used to generate responses to user queries. The model is not trained on a specific dataset +but is rather a pre-trained model that is fine-tuned on the user's data, which means that the classic process of training and +retraining does not apply. + +To maintain a coherent Model definition across task types, the RAG model is also represented as a Model, +but an update of its version represents an update of the reference data distribution and not necessarily +an update of the model itself. Moreover, most of the attributes which will be described in the following sections +are not applicable, as they are related to the retraining module, which is not usable in RAG tasks. + +### Probabilistic output + +When creating a model, you can specify if you want to provide also the probabilistic output of the model along with the predictions. +The probabilistic output represents the probability or confidence score associated with the model's predictions. If provided, +the ML cube Platform will use this information to compute additional metrics and insights. + +It is optional and currently supported only for Classification and RAG tasks. 
If specified, the probabilistic output must be provided +as a new column in the predictions file, following the guidelines in the [Data Schema] page. + +### Metric + +A Model Metric represents the evaluation metric used to assess the performance of the model. +It can both represent a performance or an error. The chosen metric will be used in the various views of the platform to +provide insights on the model's performance. The available options are: + +- `Accuracy`, for classification tasks +- `RMSE`, for regression tasks +- `R2`, for regression tasks +- `Average Precision`, for Object Detection tasks + +RAG tasks have no metric, as in that case the model is an LLM for which classic definitions of metrics are not applicable. + +### Suggestion Type + +The Suggestion Type represents the type of suggestion that the ML cube Platform should provide when computing the +[retraining dataset](modules/retraining.md#retraining-dataset). The available options are: + +- `Sample Weights`: each sample uploaded in ML cube Platform is assigned a weight that can be used as sample weight in a weighted loss function. + The higher the weight, the greater the importance of the sample for the new retraining. +- `Resampled Dataset`: a list of sample ids (using data schema column object with role ID) is provided indicating which data form the retraining dataset. + This format can be used when the training procedure does not support weighted loss or when a fixed size retraining dataset is preferred. + Note that samples ids can appear more than once: this happens when a sample is particularly important for the new retraining. + +[Task]: task.md +[Data Schema]: data_schema.md [//]: # () [//]: # () diff --git a/md-docs/user_guide/monitoring/index.md b/md-docs/user_guide/monitoring/index.md index 1cfd14d..1cba281 100644 --- a/md-docs/user_guide/monitoring/index.md +++ b/md-docs/user_guide/monitoring/index.md @@ -139,7 +139,7 @@ stateDiagram-v2 ``` -Notice that a Drift OFF event can either bring the entity back to the `OK` status or to the `WARNING` status, +Notice that a Drift Off event can either bring the entity back to the `OK` status or to the `WARNING` status, depending on the velocity of the change and the monitoring algorithm's sensitivity. The same applies to the Drift ON events, which can both happen when the entity is in the `WARNING` status or in the `OK` status. From 4ac0545863cebd8f8a5df6c401c904390c125924 Mon Sep 17 00:00:00 2001 From: Giovanni Giacometti Date: Tue, 5 Nov 2024 10:41:53 +0100 Subject: [PATCH 3/4] Model page Small fixes in Retrain Trigger page --- ...retrain_triggers.md => retrain_trigger.md} | 36 ++++++---- md-docs/user_guide/model.md | 65 ++++++++++--------- .../monitoring/detection_event_rules.md | 2 +- 3 files changed, 59 insertions(+), 44 deletions(-) rename md-docs/user_guide/integrations/{retrain_triggers.md => retrain_trigger.md} (81%) diff --git a/md-docs/user_guide/integrations/retrain_triggers.md b/md-docs/user_guide/integrations/retrain_trigger.md similarity index 81% rename from md-docs/user_guide/integrations/retrain_triggers.md rename to md-docs/user_guide/integrations/retrain_trigger.md index b3c3360..31a3668 100644 --- a/md-docs/user_guide/integrations/retrain_triggers.md +++ b/md-docs/user_guide/integrations/retrain_trigger.md @@ -1,8 +1,11 @@ -This section offers an overview of setting up retrain triggers for your models. These triggers enable the automatic initiation of your retraining pipeline from the ML cube Platform. 
+# Retrain Trigger -A retrain trigger can be utilized within a Detection Event Rule. When specific criteria are met, it automatically generates the retrain report and activates the trigger. Alternatively, you can manually activate the trigger for the model on the retraining tool page. +This section offers an overview of how you can set up a retrain trigger for your model. +Retrain triggers enable the automatic initiation of your retraining pipeline from the ML cube Platform. They are designed as +integrations with external services and thus require credentials with the appropriate privileges to be executed. -A retrain trigger is designed as an integration with an external service and necessitates credentials with the appropriate privileges to execute the action. +A Retrain Trigger can be utilized within a [Detection Event Rule](../monitoring/detection_event_rules.md). Alternatively, +it can be manually activated from the WebApp, in the Retraining section. ## Supported Triggers @@ -94,11 +97,15 @@ The following retrain triggers are supported: **Retrain Trigger Setup** - To integrate Amazon Event Bridge, you will need to create a set of AWS credentials, and add a policy that allows to put events in your event bus. Please refer to [this page](index.md) to know more. + To integrate Amazon Event Bridge, you need to create a set of AWS credentials, and add a policy that allows to put events in + your event bus. Please refer to [this page](index.md) for more information. - Once the credentials and the policy have been created, you can set up the retrain trigger for your model through the SDK or the web application. + Once the credentials and the policy have been created, you can set up the retrain trigger for your model through the SDK + or the web application. - !!! example + ??? code-block "SDK Example" + + Here is an example of how to set up an AWS Event Bridge Retrain Trigger using the SDK: ```py client.set_retrain_trigger( @@ -147,11 +154,15 @@ The following retrain triggers are supported: **Retrain Trigger Setup** - To integrate GCP Pub/Sub, you will need to create a set of GCP credentials, and add a policy that allows to put events in your Pub/Sub topic. Please refer to [this page](index.md) to know more. + To integrate GCP Pub/Sub, you need to create a set of GCP credentials, and add a policy that allows to put events + in your Pub/Sub topic. Please refer to [this page](index.md) for more information. - Once the credentials and the policy have been created, you can set up the retrain trigger for your model through the SDK or the web application. + Once the credentials and the policy have been created, you can set up the retrain trigger for your model through the SDK or + the web application. - !!! example + ??? code-block "SDK Example" + + Here is an example of how to set up a GCP Pub/Sub Retrain Trigger using the SDK: ```py client.set_retrain_trigger( @@ -198,11 +209,14 @@ The following retrain triggers are supported: **Retrain Trigger Setup** - To integrate Azure Event Grid, you will need to create a set of Azure credentials, and add a role that allows to publish events in your Event Grid topic. Please refer to [this page](index.md) to know more. + To integrate Azure Event Grid, you need to create a set of Azure credentials, and add a role that allows to publish events in your Event Grid topic. + Please refer to [this page](index.md) for more information. 
Once the credentials and the policy have been created, you can set up the retrain trigger for your model through the SDK or the web application. - !!! example + ??? code-block "SDK Example" + + Here is an example of how to set up an Azure Event Grid Retrain Trigger using the SDK: ```py client.set_retrain_trigger( diff --git a/md-docs/user_guide/model.md b/md-docs/user_guide/model.md index 650639a..edafe22 100644 --- a/md-docs/user_guide/model.md +++ b/md-docs/user_guide/model.md @@ -4,8 +4,8 @@ In the ML Cube Platform, a Model is a representation of the actual machine learn for its training usually represent the reference data distribution, while production data comprises the data on which the model performs inference. -A Model is uniquely associated with a [Task] and it can be created both through the WebApp and the Python SDK. Currently, we only support one model -per Task. +A Model is uniquely associated with a [Task] and it can be created both through the WebApp and the Python SDK. +Currently, we support only one model per Task. A Model is defined by a name and a version. The version is updated whenever the model is retrained, allowing to track the latest version of the model and the data used for its training. When predictions are uploaded to the platform, @@ -14,7 +14,7 @@ predictions are associated to the correct model version. !!! note You don't need to upload the **real** model on the Platform. We only require its training data and predictions. - The entity you create on the Platform serves more as a placeholder for the model. For this reason, + The entity you create on the Platform serves more as a placeholder for the actual model. For this reason, the ML cube Platform is considered *model agnostic*. @@ -22,13 +22,13 @@ predictions are associated to the correct model version. RAG Tasks represent an exception to the model framework presented before. In this type of Tasks, the model is a Large Language Model (LLM), that is used to generate responses to user queries. The model is not trained on a specific dataset -but is rather a pre-trained model that is fine-tuned on the user's data, which means that the classic process of training and +but is rather a pre-trained model, sometimes finetuned on custom domain data, which means that the classic process of training and retraining does not apply. To maintain a coherent Model definition across task types, the RAG model is also represented as a Model, but an update of its version represents an update of the reference data distribution and not necessarily -an update of the model itself. Moreover, most of the attributes which will be described in the following sections -are not applicable, as they are related to the retraining module, which is not usable in RAG tasks. +a retraining of the model itself. Moreover, most of the attributes which will be described in the following sections +are not applicable, as they are related to the retraining module, which is not available for RAG tasks. ### Probabilistic output @@ -39,45 +39,46 @@ the ML cube Platform will use this information to compute additional metrics and It is optional and currently supported only for Classification and RAG tasks. If specified, the probabilistic output must be provided as a new column in the predictions file, following the guidelines in the [Data Schema] page. -### Metric +### Model Metric A Model Metric represents the evaluation metric used to assess the performance of the model. -It can both represent a performance or an error. 
The chosen metric will be used in the various views of the platform to -provide insights on the model's performance. The available options are: +It can both represent a performance or an error. The chosen metric will be used in the various views of the WebApp to +provide insights on the model's performance and in the [Performance View](modules/retraining.md#performance-view) section +of the Retraining Module. -- `Accuracy`, for classification tasks -- `RMSE`, for regression tasks -- `R2`, for regression tasks -- `Average Precision`, for Object Detection tasks +The available options are: + +| Metric | Task Type | +|-------------------|----------------------------| +| Accuracy | Classification tasks | +| RMSE | Regression tasks | +| R2 | Regression tasks | +| Average Precision | For Object Detection tasks | RAG tasks have no metric, as in that case the model is an LLM for which classic definitions of metrics are not applicable. +!!! warning + Model Metrics should not be confused with [Monitoring Metrics](monitoring/index.md#monitoring-metrics), which are + entities being monitoring by the ML cube Platform and not necessarily related to a Model. + ### Suggestion Type The Suggestion Type represents the type of suggestion that the ML cube Platform should provide when computing the -[retraining dataset](modules/retraining.md#retraining-dataset). The available options are: +[Retraining Dataset](modules/retraining.md#retraining-dataset). The available options are provided in the related section. -- `Sample Weights`: each sample uploaded in ML cube Platform is assigned a weight that can be used as sample weight in a weighted loss function. - The higher the weight, the greater the importance of the sample for the new retraining. -- `Resampled Dataset`: a list of sample ids (using data schema column object with role ID) is provided indicating which data form the retraining dataset. - This format can be used when the training procedure does not support weighted loss or when a fixed size retraining dataset is preferred. - Note that samples ids can appear more than once: this happens when a sample is particularly important for the new retraining. -[Task]: task.md -[Data Schema]: data_schema.md +### Retraining Cost -[//]: # () -[//]: # () -[//]: # (What is additional probabilistic output?) +The Retraining Cost represents the cost associated with retraining the model. This information is used by the Retraining Module +to provide gain-cost analysis and insights on the retraining process. The cost is expressed in the same currency as the one used +in the Task cost information. The default value is 0.0, which means that the cost is negligible. -[//]: # () -[//]: # (What is metric?) +### Retrain Trigger -[//]: # () -[//]: # (What is suggestion type?) +You can associate a [Retrain Trigger] to your Model in order to enable the automatic initiation of your retraining pipeline +from the ML cube Platform. More information on how to set up a retrain trigger can be found in the related section. -[//]: # () -[//]: # (What is retraining cost?) -[//]: # () -[//]: # (What is retraining trigger?) 
\ No newline at end of file +[Task]: task.md +[Data Schema]: data_schema.md#subrole +[Retrain Trigger]: integrations/retrain_trigger.md \ No newline at end of file diff --git a/md-docs/user_guide/monitoring/detection_event_rules.md b/md-docs/user_guide/monitoring/detection_event_rules.md index 719a62b..d284e7f 100644 --- a/md-docs/user_guide/monitoring/detection_event_rules.md +++ b/md-docs/user_guide/monitoring/detection_event_rules.md @@ -41,7 +41,7 @@ data preceding the event, while the second one includes data following the event Retrain Action enables the automatic retraining of your model. Therefore, it is only available when the target of the rule is related to a model. The retrain action does not need any parameter because it is automatically inferred from the `model name` attribute of the rule. -Of course, the model must already have a retrain trigger associated before setting up this action. +Of course, the model must already have a [Retrain Trigger](../integrations/retrain_trigger.md) associated before setting up this action. ??? code-block "SDK Example" The following code demonstrates how to create a rule that matches high severity drift events on the error of a model. From fefb114426a049c1945b78fe5fe8a671b53b92e0 Mon Sep 17 00:00:00 2001 From: Alessandro Lavelli Date: Tue, 5 Nov 2024 10:55:27 -0500 Subject: [PATCH 4/4] few notes --- md-docs/user_guide/model.md | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/md-docs/user_guide/model.md b/md-docs/user_guide/model.md index edafe22..b8f25ae 100644 --- a/md-docs/user_guide/model.md +++ b/md-docs/user_guide/model.md @@ -1,8 +1,9 @@ # Model -In the ML Cube Platform, a Model is a representation of the actual machine learning model used for making predictions. The data used +In the ML Cube Platform, a Model is a representation of the actual artificial intelligence model used for making predictions. The data used for its training usually represent the reference data distribution, while production data comprises the data on which the model performs inference. +For more information about reference and production data see the [Data] page. A Model is uniquely associated with a [Task] and it can be created both through the WebApp and the Python SDK. Currently, we support only one model per Task. @@ -39,6 +40,10 @@ the ML cube Platform will use this information to compute additional metrics and It is optional and currently supported only for Classification and RAG tasks. If specified, the probabilistic output must be provided as a new column in the predictions file, following the guidelines in the [Data Schema] page. +!!! example + For example, Logistic Regression classification model provides both the probability of belonging to the positive class and the predicted class using a threshold. + In this case, you can upload to ML cube Platform the predicted class as principal prediction and the probability as probabilistic output. + ### Model Metric A Model Metric represents the evaluation metric used to assess the performance of the model. @@ -46,6 +51,9 @@ It can both represent a performance or an error. The chosen metric will be used provide insights on the model's performance and in the [Performance View](modules/retraining.md#performance-view) section of the Retraining Module. +!!! note + Note that model metrics can only be computed when target data are available. + The available options are: | Metric | Task Type | @@ -81,4 +89,5 @@ from the ML cube Platform. 
More information on how to set up a retrain trigger c [Task]: task.md [Data Schema]: data_schema.md#subrole -[Retrain Trigger]: integrations/retrain_trigger.md \ No newline at end of file +[Retrain Trigger]: integrations/retrain_trigger.md +[Data]: data.md \ No newline at end of file
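
As an aside on the `Sample Weights` suggestion type introduced in PATCH 2 above (each sample receives a weight intended for a weighted loss function), here is a minimal sketch of how a retraining pipeline could consume such a suggestion once the weights have been exported from the platform. The `weights_by_sample_id` mapping and the column names are illustrative assumptions, not part of the documented SDK:

```py
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical export of a "Sample Weights" suggestion: sample id -> weight.
# How this mapping is retrieved from the ML cube Platform is not shown here.
weights_by_sample_id = {"s-001": 0.4, "s-002": 1.0, "s-003": 2.3}

# Training data keyed by the same sample id column (the column with role ID
# in the data schema).
train_df = pd.DataFrame(
    {
        "sample_id": ["s-001", "s-002", "s-003"],
        "feature_a": [0.1, 0.7, 0.3],
        "feature_b": [1.2, 0.4, 0.9],
        "target": [0, 1, 1],
    }
)

# Align the suggested weights with the training rows; samples without a
# suggested weight fall back to a neutral weight of 1.0.
sample_weight = train_df["sample_id"].map(weights_by_sample_id).fillna(1.0)

# Any estimator that accepts sample_weight in fit() can use the suggestion.
model = LogisticRegression()
model.fit(
    train_df[["feature_a", "feature_b"]],
    train_df["target"],
    sample_weight=sample_weight,
)
```

For the `Resampled Dataset` option, the same id column would instead be used to select the suggested rows, repeating an id whenever a sample appears more than once in the suggestion.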