# Evidently 

Once your machine learning model is deployed in production, it needs to be monitored so that problems with the model are caught as quickly as possible. One aspect to monitor is the behaviour of the model itself: the input data, the model performance, and any drift in the data or the predictions. There are several tools available to help you with monitoring. One of the best open-source tools is Evidently, which is easy to set up through its Python library. It creates test suites and performance reports from presets, and you can use the results however you want. In this notebook we will go through the Evidently basics. The first thing to do is to import all libraries.

In [3]:
import pandas as pd
import torch
from scipy.stats import wasserstein_distance, kstest, chi2_contingency

from evidently import DataDefinition
from evidently import Dataset
from evidently import Report
from evidently.presets import DataSummaryPreset
from evidently.presets import DataDriftPreset
from evidently.presets import ClassificationPreset
from evidently.tests import *
from evidently.metrics import *
from evidently import BinaryClassification

from utils.data_utils import Data_utils
from utils.model import NN_model
from utils.predict import evaluate_model



## Prepare dataset
To start, we will use our water_potability dataset, which we have split into two parts: a reference dataset and a current dataset. We have trained our model on the reference dataset. The current dataset is new, possibly drifted data which would normally arrive batch by batch or sample by sample from the sensors. With Evidently you can compare the current dataset against the reference dataset. When the two datasets deviate from each other in the performed tests, those tests are reported as failed. First, let's load our datasets into Python.

In [None]:
dataframe_reference = pd.read_csv("dataset/dataset_reference.csv", delimiter=",") 
dataframe_current = pd.read_csv("dataset/dataset_current.csv", delimiter=",")
print("reference data: " + str(dataframe_reference.head()))
print("current data: " + str(dataframe_current.head()))

Next, we are going to run the model on both datasets and add the resulting predictions to these datasets. Remember that the model has been trained on the reference dataset.

In [None]:
data = Data_utils()
# load our data with a batch size of 8
data.load_train_data(8)
data.load_test_data(8)

# create a neural network Class and load the weights of a previously trained model.
model_params = {
            "n_layers": 3,
            "dimensions": [54, 95, 95],
        }
model = NN_model(model_params) 
model.load_state_dict(torch.load("classification_model.pth"))

# Evaluate our reference data on the machine learning model and add the predictions to the dataframe
predictions_reference = evaluate_model(model, data.train_dataloader)

predictions_ref = torch.cat(predictions_reference).tolist()
predictions_ref = [int(x) for x in predictions_ref]
dataframe_reference["prediction"] = predictions_ref

# Evaluate our current data on the machine learning model and add the predictions to the dataframe
predictions_current = evaluate_model(model, data.cur_dataloader)
predictions = torch.cat(predictions_current).tolist()
predictions = [int(x) for x in predictions]
dataframe_current["prediction"] = predictions


Now that we have added our predictions to our dataframes, we are going to rename the columns to match the terms that Evidently knows. Our true labels, which correspond to the potability, are renamed to "target".

In [None]:
dataframe_reference["target"] = dataframe_reference["Potability"].astype(int)
dataframe_reference = dataframe_reference.drop(["Potability"], axis=1)
print(dataframe_reference)

1186
            ph  Hardness    Solids  Chloramines   Sulfate  Conductivity  \
0     0.265850  0.591734  0.295897     0.399690  0.480143      0.834554   
1     0.675542  0.964702  0.285135     0.391213  0.370691      0.440050   
2     0.509336  0.516429  0.224086     0.511483  0.217911      0.421722   
3     0.543586  0.852255  0.524375     0.487242  0.431409      0.494766   
4     0.525234  0.788631  0.297562     0.109985  0.268350      0.337211   
...        ...       ...       ...          ...       ...           ...   
1181  0.689193  0.965843  0.261366     0.436726  0.367388      0.621164   
1182  0.509336  0.630085  0.136719     0.628364  0.480143      0.500508   
1183  0.620726  0.877993  0.229659     0.486982  0.480143      0.574168   
1184  0.858343  0.341949  0.602661     0.651018  0.182131      0.524201   
1185  0.700483  0.926171  0.535994     0.468070  0.480143      0.508279   

      Organic_carbon  Trihalomethanes  Turbidity  prediction  target  
0           0.547991   

In [None]:
dataframe_current["target"] = dataframe_current["Potability"].astype(int)
dataframe_current = dataframe_current.drop(["Potability"], axis=1)
print(dataframe_current)

             ph    Hardness        Solids  Chloramines     Sulfate  \
0           NaN  204.890455  20791.318981     7.300212  368.516441   
1      8.099124  224.236259  19909.541732     9.275884         NaN   
2      8.316766  214.373394  22018.417441     8.059332  356.886136   
3      5.584087  188.313324  28748.687739     7.544869  326.678363   
4     10.223862  248.071735  28749.716544     7.513408  393.663396   
...         ...         ...           ...          ...         ...   
2085   6.069616  186.659040  26138.780191     7.747547  345.700257   
2086   4.668102  193.681735  47580.991603     7.166639  359.948574   
2087   7.808856  193.553212  17329.802160     8.061362         NaN   
2088   5.126763  230.603758  11983.869376     6.303357         NaN   
2089   7.874671  195.102299  17404.177061     7.509306         NaN   

      Conductivity  Organic_carbon  Trihalomethanes  Turbidity  prediction  \
0       564.308654       10.379783        86.990970   2.963135           1   
1  

It is also good practice to create an Evidently dataset version of your pandas dataframes. To do so, you first create an Evidently data definition of your dataset, and then pass this definition along when creating the Evidently dataset. This can be done as follows:

    data_definition = DataDefinition(
        # use BinaryClassification or MulticlassClassification here
        classification=[BinaryClassification(
            target="target",
            prediction_labels="prediction",
        )],
        numerical_columns=["name", "of", "columns"],
        categorical_columns=["name", "of", "columns"],
        datetime_columns=["name", "of", "columns"],
        text_columns=["name", "of", "columns"],
    )
    dataset = Dataset.from_pandas(dataframe, data_definition)

Do this for both datasets.

Note that for data quality and drift detection reports you can also use normal dataframes. However, for model quality reports, where we do calculations using the labels and predictions, Evidently needs to know which columns hold that information. A second note: specifying the column types is optional, but it can help Evidently create better reports and tests.
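
For the water potability data, the template above could be filled in roughly as follows. This is a minimal sketch: the helper name `to_evidently_dataset` and the constant `NUMERICAL_COLUMNS` are made up for this example, and the column names are taken from the dataframes printed earlier. It assumes the dataframe already has the "target" and "prediction" columns.

```python
# Feature columns of the water potability data, as printed in the dataframes above.
NUMERICAL_COLUMNS = [
    "ph", "Hardness", "Solids", "Chloramines", "Sulfate",
    "Conductivity", "Organic_carbon", "Trihalomethanes", "Turbidity",
]


def to_evidently_dataset(dataframe):
    """Wrap a pandas dataframe with 'target' and 'prediction' columns (hypothetical helper)."""
    # Imported inside the function so the helper can be defined before Evidently is loaded.
    from evidently import BinaryClassification, DataDefinition, Dataset

    data_definition = DataDefinition(
        # Potability is a 0/1 label, so we describe a binary classification task.
        classification=[
            BinaryClassification(target="target", prediction_labels="prediction")
        ],
        numerical_columns=NUMERICAL_COLUMNS,
    )
    return Dataset.from_pandas(dataframe, data_definition=data_definition)
```

Calling `to_evidently_dataset(dataframe_reference)` and `to_evidently_dataset(dataframe_current)` would then give the two Evidently datasets with the same definition.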

## Data Quality Check

The first thing you should check is the quality of your current dataset. This can be done on its own, like checking the ratio of null values, or in comparison to a reference dataset, like comparing the input shapes. With Evidently you can easily create a data quality report by using the <code>DataSummaryPreset()</code>. To create a report, you first define it and then run it on the data you want to check. The reference dataset is optional. Afterwards you can save your report as an HTML file or a JSON file, or get it as a Python dictionary.
<br>
<code>
report = Report([<i>preset_function()</i>]) <br>
my_eval = report.run(current_dataset, <i>reference_dataset</i>)<br><br>
my_eval.save_html("path/to/html")<br>
my_eval.save_json("path/to/json")<br>
my_eval.dict()
</code>

Compare the reference and current dataset. What do you see? What are the main differences? Are there data cleaning steps that were missed in the current data? You can use the Data exercises to help you clean the data where needed. If you clean the data, you will have to rerun the model on the cleaned data instead of the uncleaned data. This is because the model has been trained on clean data, so its predictions on unclean data are unreliable.
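
One quick check that works on its own is the ratio of missing values per column. The sketch below is not part of Evidently: `missing_ratio` is a hypothetical helper, and the sample values are only illustrative, rounded from the first rows of the current dataset. With pandas, `dataframe_current.isna().mean()` gives the same ratios directly.

```python
import math


def missing_ratio(columns):
    """Return the fraction of missing values (None or NaN) per column.

    `columns` maps a column name to its values, for example a dict of lists.
    """
    ratios = {}
    for name, values in columns.items():
        values = list(values)
        missing = sum(
            1
            for value in values
            if value is None or (isinstance(value, float) and math.isnan(value))
        )
        # An empty column has no missing values by convention.
        ratios[name] = missing / len(values) if values else 0.0
    return ratios


# Illustrative sample only, not the full dataset.
sample = {
    "ph": [float("nan"), 8.1, 8.3, 5.6],
    "Hardness": [204.9, 224.2, 214.4, 188.3],
}
print(missing_ratio(sample))
```

A ratio above zero in the current data, while the reference data has none, hints at a cleaning step that was skipped.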

Now that you have cleaned your current dataset, which other differences do you see between the reference and current dataset? Do you suspect drift?

## Model Quality Check
Besides your data quality, it is also a good idea to check your model quality. This includes creating a confusion matrix and plotting the ROC curve.
Evidently again has presets for this: <code>ClassificationPreset()</code> for classification tasks and <code>RegressionPreset()</code> for regression tasks. To create, run and save your report, you can use the same steps as for the <code>DataSummaryPreset()</code>.
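
To make clear what a confusion matrix contains, here is a minimal sketch that counts the four cells by hand for binary labels. It does not use Evidently; the function names and the example labels are made up for illustration.

```python
from collections import Counter


def confusion_counts(targets, predictions):
    """Count the four confusion-matrix cells for binary labels 0 and 1."""
    if len(targets) != len(predictions):
        raise ValueError("targets and predictions must have the same length")
    pairs = Counter(zip(targets, predictions))
    return {
        "tn": pairs[(0, 0)],  # not potable, predicted not potable
        "fp": pairs[(0, 1)],  # not potable, predicted potable
        "fn": pairs[(1, 0)],  # potable, predicted not potable
        "tp": pairs[(1, 1)],  # potable, predicted potable
    }


def accuracy(counts):
    """Share of correct predictions among all predictions."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total


# Made-up labels, only to show the counting.
targets = [0, 0, 1, 1, 1, 0]
predictions = [0, 1, 1, 0, 1, 0]
counts = confusion_counts(targets, predictions)
print(counts, accuracy(counts))
```

The same counts can be read from the "target" and "prediction" columns that we added to the dataframes.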

With Evidently you can also combine presets. When you create your report, you just have to add both presets.
<br>
<code>
report = Report([<i>preset_1(), preset_2()</i>])
</code>

Besides combining presets, you can also add tests to your report. Evidently runs these tests for you and shows their results, for example whether the accuracy is above a certain value. Evidently has test presets which you can activate by setting the <code>include_tests</code> parameter to <code>True</code> when creating the <code>Report</code>.

## Drift detection

The previous part assumed that the true labels for your predictions are available. However, most of the time this is not the case. Therefore, we detect drift in the input data and in the predictions of the machine learning model. Here we compute metrics like the Wasserstein distance between the datasets to find out whether the distribution has changed significantly. The preset needed for this is the <code>DataDriftPreset()</code>. Again, tests can be added by setting the <code>include_tests</code> parameter to <code>True</code>. Here both the current and the reference dataset are needed.
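
To get a feeling for what the Wasserstein distance measures, the sketch below computes it by hand for one column. It is a simplified version that only handles two samples of equal size, where the distance reduces to the mean absolute difference between the sorted values. The function name and the sample values are made up; for the general case you can use <code>wasserstein_distance</code> from <code>scipy.stats</code>, which is imported at the top of this notebook.

```python
def wasserstein_1d(reference, current):
    """First-order Wasserstein distance between two equally sized 1-D samples."""
    if len(reference) != len(current):
        raise ValueError("this sketch only handles samples of equal size")
    if not reference:
        raise ValueError("samples must not be empty")
    # Sorting matches the i-th smallest reference value with the i-th smallest current value.
    ref_sorted = sorted(reference)
    cur_sorted = sorted(current)
    return sum(abs(r - c) for r, c in zip(ref_sorted, cur_sorted)) / len(ref_sorted)


# Made-up values: the current sample is the reference sample shifted by 1.0.
reference_ph = [6.5, 7.0, 7.25, 6.75]
shifted_ph = [value + 1.0 for value in reference_ph]

print(wasserstein_1d(reference_ph, reference_ph))  # identical samples
print(wasserstein_1d(reference_ph, shifted_ph))    # shifted sample
```

Identical samples give a distance of zero, and shifting every value by the same amount gives a distance equal to that shift. A drift test compares such a distance to a threshold.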

You can also customize your report. For example, you can decide which columns to detect drift on, or which tests to use. You can also change the thresholds and other parameters of the tests.

<b>Limit columns:</b>

    DataDriftPreset(columns=["list", "of", "columns"])
<b>Choose metrics:</b>

    report = Report([
        Metric_1(),
        Metric_2(column=["list", "of", "columns"], parameter="value")
    ])
A list of metrics can be found [here](https://docs.evidentlyai.com/metrics/all_metrics). As you can see in the example, some metrics have their own parameters, for example the method used to calculate the metric (wasserstein, psi, ks, ...).

<b>Exclude tests:</b>

    report = Report([
        Metric_1(column="column", tests=[]),
        Metric_2(column="column"),
    ], 
    include_tests=True)

<b>Custom test conditions</b> (use eq (equal), gt (greater than), gte (greater than or equal), lt (less than), lte (less than or equal)):

    report = Report([
        Metric_1(column="column", tests=[eq(0)]),
        Metric_2(column="column", tests=[gte(18), lt(35)]),
        Metric_3(column="column", tests=[gte(Reference(relative=0.1))]),
        Metric_4(column="column", tests=[lte(Reference(absolute=10))]),
    ])
More information about the tests can be found [here](https://docs.evidentlyai.com/docs/library/tests).<br>
Create your own custom Evidently report. Play around with the different metrics and tests. More information about Evidently can be found on their [website](https://docs.evidentlyai.com/docs/library/overview).
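
As a possible starting point, the sketch below fills in the placeholders with concrete metrics. The choice of metrics, columns and conditions is only an assumption for this exercise, and `build_custom_report` is a hypothetical helper. Check the list of metrics linked above to confirm the metric names and parameters for your Evidently version.

```python
def build_custom_report(drift_column="ph", complete_column="Sulfate"):
    """Build a small custom report (hypothetical helper for this exercise)."""
    # Imported inside the function so the helper can be defined before Evidently is loaded.
    from evidently import Report
    from evidently.metrics import MinValue, MissingValueCount, ValueDrift
    from evidently.tests import eq, gte

    return Report(
        [
            # Drift of a single column, measured with the Wasserstein distance.
            ValueDrift(column=drift_column, method="wasserstein"),
            # Test that the column has no missing values left after cleaning.
            MissingValueCount(column=complete_column, tests=[eq(0)]),
            # Test that no value of the drift column is negative.
            MinValue(column=drift_column, tests=[gte(0)]),
        ]
    )
```

You would run it like the other reports, for example `my_eval = build_custom_report().run(current_dataset, reference_dataset)`. The drift metric needs both datasets.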
