<left> <img src="https://radicalbit.ai/wp-content/uploads/2024/02/radicalbit-logo-bk.png" width="400" /> </left>


## Radicalbit Quickstart: Monitor a Multiclass Classification Model

### Introduction
This guide shows how to monitor an ML solution with the Radicalbit OS Platform through the Python SDK (https://pypi.org/project/radicalbit-platform-sdk/).



In [1]:
from radicalbit_platform_sdk.client import Client
from radicalbit_platform_sdk.models import (
    AwsCredentials,
    CreateModel,
    DataType,
    ModelType,
    ColumnDefinition,
    OutputType,
    Granularity,
)

from datetime import datetime 
import pandas as pd


### Create the Client
To communicate with the platform, create the client and pass it the URL where the UI is available.
Before doing so, launch the platform by following the instructions in the README.md (https://github.com/radicalbit/radicalbit-ai-monitoring/blob/main/README.md).

In [2]:
# Create the Client
base_url = "http://localhost:9000"
client = Client(base_url)
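
If the platform has not been launched yet, later calls will fail with connection errors. The following is a minimal sketch of a pre-flight check that only tests whether something accepts TCP connections at the host and port of `base_url`. The helper name `platform_is_reachable` is hypothetical and not part of the SDK, and a successful connection does not prove that the service listening there is the Radicalbit platform.

```python
import socket
from urllib.parse import urlparse


def platform_is_reachable(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the host and port in base_url succeeds.

    Hypothetical helper for illustration; it is not part of the Radicalbit SDK.
    """
    parsed = urlparse(base_url)
    host = parsed.hostname
    if host is None:
        # The string did not contain a usable host, e.g. a missing scheme.
        return False
    # Fall back to the scheme's default port when the URL does not name one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Calling `platform_is_reachable("http://localhost:9000")` before `Client(base_url)` gives an early, readable signal that the platform still needs to be started.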


### The reference dataset
The reference dataset is the batch of data that captures the behaviour we want, or expect, to see consistently over time. It could be the training set or a chunk of production data on which the model performed well.

To use the radicalbit-ai-monitoring platform, you first need to prepare your reference data, which should include the following information:

- **Variables**: The list of features used by the model as well as other information like metadata produced by the system.
- **Outputs**: The fields returned by the model after the inference. Usually, they are probabilities, a predicted class or numbers.
- **Target**: The ground truth used to validate predictions and evaluate the model quality.
- **Timestamp**: The timestamp field used to aggregate data over selected windows.
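
The four groups above translate into columns that must exist in the reference file. As a rough sketch, the check below reports which required columns are missing. The helper `missing_columns` is hypothetical, and the column names are those of the heart disease dataset used later in this guide.

```python
from typing import Iterable, List


def missing_columns(available: Iterable[str], required: Iterable[str]) -> List[str]:
    """Return the required column names that are absent, in the order given."""
    present = set(available)
    return [name for name in required if name not in present]


# Column names of the heart disease dataset used in this guide.
available = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg", "thalach",
    "exang", "oldpeak", "slope", "ca", "thal",
    "ground_truth", "prediction", "prediction_proba", "pred_id", "timestamp",
]
# Outputs, target and timestamp; the remaining columns are the variables.
required = ["prediction", "prediction_proba", "ground_truth", "timestamp"]

print(missing_columns(available, required))  # prints []
```

With a pandas DataFrame, the same function can be called as `missing_columns(reference.columns, required)`.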

In this example, we will use a dataset built to classify three classes of heart disease.



> **_Dataset license:_**  Janosi, Andras, Steinbrunn, William, Pfisterer, Matthias, and Detrano, Robert. (1988). Heart Disease. UCI Machine Learning Repository. https://doi.org/10.24432/C52P4X. Adapted by Radicalbit.


In [7]:
reference_path = "../data/multiclass-classification/3_classes_reference.csv"
reference = pd.read_csv(reference_path)
reference.head(3)


Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,ground_truth,prediction,prediction_proba,pred_id,timestamp
0,57,0,4,120,354,0,0,163,1,0.6,1,0.0,3.0,0,0,0.855876,848dafa8-4f0d-4be2-b343-812333ebe865,2024-01-09 21:10:00
1,51,0,3,140,308,0,2,142,0,1.5,1,1.0,3.0,0,0,0.748994,f293492c-48b5-4b60-ad09-04ce7014a7b5,2024-01-09 21:30:00
2,51,1,3,125,245,1,2,166,0,2.4,2,0.0,3.0,2,2,0.86421,3ada12d6-35dd-438a-9d5b-84009e96e9f7,2024-01-09 23:10:00
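
The preview is a convenient place to check what kind of values each column holds before declaring the model schema. The sketch below, which uses only the standard library, classifies a few columns of the three preview rows as `int`, `float` or `string`. The helper names are hypothetical, and the result describes only the raw values: columns that store categorical codes as numbers (such as `sex`) may still be declared as strings, as this guide does.

```python
import csv
import io

# The three preview rows shown above, restricted to a few columns for brevity.
PREVIEW = """age,chol,oldpeak,prediction_proba,pred_id
57,354,0.6,0.855876,848dafa8-4f0d-4be2-b343-812333ebe865
51,308,1.5,0.748994,f293492c-48b5-4b60-ad09-04ce7014a7b5
51,245,2.4,0.86421,3ada12d6-35dd-438a-9d5b-84009e96e9f7
"""


def value_kind(text: str) -> str:
    """Classify one CSV cell as 'int', 'float' or 'string'."""
    for kind, parse in (("int", int), ("float", float)):
        try:
            parse(text)
            return kind
        except ValueError:
            continue
    return "string"


def column_kinds(csv_text: str) -> dict:
    """Return the widest kind seen in each column (int < float < string)."""
    order = {"int": 0, "float": 1, "string": 2}
    kinds = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for name, cell in row.items():
            kind = value_kind(cell)
            if name not in kinds or order[kind] > order[kinds[name]]:
                kinds[name] = kind
    return kinds


print(column_kinds(PREVIEW))
# {'age': 'int', 'chol': 'int', 'oldpeak': 'float',
#  'prediction_proba': 'float', 'pred_id': 'string'}
```

Three rows are a very small sample, so a column that looks like `int` here may still contain fractional values further down the file.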


### Create the Model
The next step is to create the Model.
Here, you have to specify the following information:

- **name:** The name of the model
- **model_type:** The type of the model
- **data_type:** The type of data the model works on
- **granularity:** The time window used to compute aggregated metrics on the current data
- **description:** A short description of the model
- **features:** The list of columns that make up the feature set
- **outputs:** An OutputType definition describing the output of the model
- **target:** The column used to represent the model's target
- **timestamp:** The column that stores when the prediction was made
- **frameworks:** An optional field to describe the frameworks used by the model
- **algorithm:** An optional field to describe the algorithm used by the model

In [8]:
# Create the Model
model = CreateModel(
    name=f"Model-{datetime.now().strftime('%Y-%m-%d-%H-%M-%S')}",
    modelType=ModelType.MULTI_CLASS,
    dataType=DataType.TABULAR,
    granularity=Granularity.DAY,
    description="This is a model to classify between different heart disease types.",
    features=[
        ColumnDefinition(name="age", type="int"),
        ColumnDefinition(name="sex", type="string"),
        ColumnDefinition(name="cp", type="string"),
        ColumnDefinition(name="trestbps", type="int"),
        ColumnDefinition(name="chol", type="int"),
        ColumnDefinition(name="fbs", type="string"),
        ColumnDefinition(name="restecg", type="string"),
        ColumnDefinition(name="thalach", type="int"),
        ColumnDefinition(name="exang", type="string"),
        # oldpeak holds fractional values (for example 0.6), so it is a float
        ColumnDefinition(name="oldpeak", type="float"),
        ColumnDefinition(name="slope", type="string"),
        ColumnDefinition(name="ca", type="string"),
        ColumnDefinition(name="thal", type="string")
    ],
    outputs=OutputType(
        prediction=ColumnDefinition(name="prediction", type="int"),
        output=[
            ColumnDefinition(name="prediction_proba", type="float"),
            ColumnDefinition(name="prediction", type="int"),
            ColumnDefinition(name="pred_id", type="string")
        ],
    ),
    target=ColumnDefinition(name="ground_truth", type="int"),
    timestamp=ColumnDefinition(name="timestamp", type="datetime"),
)

model = client.create_model(model)


In [9]:
print(model.name())
print(model.uuid())
print(model.data_type())
print(model.description())


Model-2024-07-09-09-52-32
2d244420-c59f-4b60-883d-5586ea709795
DataType.TABULAR
This is a model to classify between different heart disease types.


After this action, go to the platform to see:

 - In the **Overview** section, you will see the generated schema of Variables and Outputs
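
The `granularity` chosen for the model decides how rows are grouped in time before metrics are aggregated. The sketch below illustrates the idea only and is not how the platform computes its metrics: it counts rows per window using the three timestamps from the reference preview as sample values. The helper `count_per_window` is hypothetical, and the hourly window is included purely for contrast with the daily one.

```python
from collections import Counter
from datetime import datetime

# Timestamps of the three rows previewed earlier in this guide.
timestamps = [
    "2024-01-09 21:10:00",
    "2024-01-09 21:30:00",
    "2024-01-09 23:10:00",
]


def count_per_window(values, window_format):
    """Count rows per window; window_format is a strftime pattern naming the window."""
    parsed = (datetime.strptime(v, "%Y-%m-%d %H:%M:%S") for v in values)
    return Counter(ts.strftime(window_format) for ts in parsed)


daily = count_per_window(timestamps, "%Y-%m-%d")
hourly = count_per_window(timestamps, "%Y-%m-%d %H:00")

print(dict(daily))   # {'2024-01-09': 3}
print(dict(hourly))  # {'2024-01-09 21:00': 2, '2024-01-09 23:00': 1}
```

A coarser window gives smoother metrics with fewer points, while a finer one reacts faster but is noisier when each window holds few rows.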

### Load the reference dataset
Once the model has been created, you are ready to upload your reference dataset to the platform. Run the following code, specifying the path of your file and your AWS credentials as shown below.
In this case, we use MinIO, an S3-compatible object store, in place of a real AWS account.



In [10]:
# load the reference dataset
ref = model.load_reference_dataset(
    file_name=reference_path,
    bucket="test-bucket",
    aws_credentials=AwsCredentials(
        access_key_id="minio",
        secret_access_key="minio123",
        default_region="us-east-1",
        endpoint_url="http://localhost:9090"
    )
)
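
The cell above hard-codes the local MinIO credentials, which is fine for a quickstart but not for a shared or production setup. As a sketch under stated assumptions, the helper below reads the values from environment variables and falls back to the MinIO defaults used in this guide. The function name `credentials_from_env` is hypothetical, and the variable names follow the usual AWS conventions; whether the SDK reads them on its own is not covered here, so the helper reads them explicitly.

```python
import os


def credentials_from_env(env=None):
    """Build keyword arguments for AwsCredentials from environment variables.

    Hypothetical helper; defaults are the local MinIO values used in this guide.
    """
    env = os.environ if env is None else env
    return {
        "access_key_id": env.get("AWS_ACCESS_KEY_ID", "minio"),
        "secret_access_key": env.get("AWS_SECRET_ACCESS_KEY", "minio123"),
        "default_region": env.get("AWS_DEFAULT_REGION", "us-east-1"),
        "endpoint_url": env.get("AWS_ENDPOINT_URL", "http://localhost:9090"),
    }


# An empty mapping stands in for an environment with none of the variables set.
creds = credentials_from_env({})
print(creds["endpoint_url"])  # http://localhost:9090
```

The resulting dictionary uses the same keyword names as the cell above, so it can be passed as `AwsCredentials(**credentials_from_env())`.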


After this action, go to the platform to see:

 - In the **Overview/Summary** section, you will see a summary of your data (missing values, number of rows and columns, and more)
 - In the **Reference** section, you will see information about Data Quality and Model Quality
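
To give a feel for what model quality means for a multiclass model, the sketch below computes accuracy and per-class recall from pairs of `ground_truth` and `prediction` values. It is an illustration, not the platform's implementation, and it does not claim to list the metrics the platform reports. The first three pairs come from the preview rows; the last two are made up so that the result is not trivially perfect.

```python
from collections import Counter

# (ground_truth, prediction) pairs. The first three are the preview rows;
# the last two are invented for illustration and are not in the dataset.
pairs = [(0, 0), (0, 0), (2, 2), (1, 2), (1, 1)]


def accuracy(pairs):
    """Fraction of rows whose prediction equals the ground truth."""
    return sum(truth == pred for truth, pred in pairs) / len(pairs)


def recall_per_class(pairs):
    """For each true class, the fraction of its rows predicted correctly."""
    totals = Counter(truth for truth, _ in pairs)
    hits = Counter(truth for truth, pred in pairs if truth == pred)
    return {label: hits[label] / totals[label] for label in sorted(totals)}


print(accuracy(pairs))          # 0.8
print(recall_per_class(pairs))  # {0: 1.0, 1: 0.5, 2: 1.0}
```

Per-class figures matter in the multiclass case because a high overall accuracy can hide a class that the model rarely predicts correctly.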

### Load the current dataset
The last step is to upload the current data. The current dataset is the batch that contains fresh information, for example the most recent production data, predictions, or ground truths. We expect it to have the same statistical properties as the reference, which indicates that the model performs as expected and that there is no drift in the data.
As you can see, the code is very similar to the code used for the reference dataset.

In [11]:
current1_path = "../data/multiclass-classification/3_classes_current1.csv"

# load the current dataset
cur1 = model.load_current_dataset(
    file_name=current1_path,
    correlation_id_column="pred_id",
    bucket="test-bucket",
    aws_credentials=AwsCredentials(
        access_key_id="minio",
        secret_access_key="minio123",
        default_region="us-east-1",
        endpoint_url="http://localhost:9090"
    )
)


After this action, go to the platform to see:

 - In the **Current** section, you will see information about Data Quality and Model Quality compared to the Reference information
 - In the **Current/Import** section, you can browse your uploaded current data
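
The comparison between current and reference data rests on the idea that both should share the same statistical properties. One common way to quantify a difference for a numeric feature is the Population Stability Index (PSI), sketched below in plain Python. This is an illustrative technique chosen for this example, not a description of how the platform detects drift, and the sample values are synthetic rather than taken from the dataset.

```python
import math


def psi(reference, current, bins=5, eps=1e-6):
    """Population Stability Index of `current` against `reference`.

    Bins are equal-width over the reference range; values outside that
    range fall into the first or last bin.
    """
    low, high = min(reference), max(reference)
    width = (high - low) / bins or 1.0  # avoid a zero width for constant data

    def shares(values):
        counts = [0] * bins
        for v in values:
            index = int((v - low) / width)
            counts[min(max(index, 0), bins - 1)] += 1
        # eps keeps empty bins from producing log(0) or a division by zero.
        return [max(c / len(values), eps) for c in counts]

    ref, cur = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


# Synthetic values for illustration only.
reference_values = list(range(200, 300, 5))
shifted_values = [v + 40 for v in reference_values]

print(psi(reference_values, reference_values))  # 0.0 for identical samples
print(psi(reference_values, shifted_values))    # clearly above zero
```

Every term of the sum is non-negative, so the index is zero only when the binned distributions match and grows as the current data moves away from the reference.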