# Evidently 

Once your machine learning model is deployed in production, it needs to be monitored so that problems with the model or its data are caught as quickly as possible. Monitoring covers the input data, the model performance and any drift that has occurred in the data or the predictions. Several tools are available to help with this. One of the best-known open-source tools is Evidently, which is easy to use through its Python library. It builds test suites and performance reports from presets, and the results can then be used however you want. In this notebook we cover the Evidently basics, starting with importing all required libraries.

An overview of all available metrics can be found in the [Evidently documentation](https://docs.evidentlyai.com/metrics/all_metrics).

In [1]:
import pandas as pd
import torch
from scipy.stats import wasserstein_distance, kstest, chi2_contingency

from evidently import DataDefinition
from evidently import Dataset
from evidently import Report
from evidently.presets import DataSummaryPreset
from evidently.presets import DataDriftPreset
from evidently.presets import ClassificationPreset
from evidently.tests import *
from evidently.metrics import *
from evidently import BinaryClassification

from utils.data_utils import Data_utils
from utils.model import NN_model
from utils.predict import evaluate_model

## Prepare dataset
To start, we use our water_potability dataset, which we have split into two parts: a reference dataset and a current dataset. The model was trained on the reference dataset. The current dataset contains new, possibly drifted data, which in practice would arrive batch by batch or sample by sample from the sensors. Evidently uses the reference dataset as a baseline to compare the current dataset against; when the current data deviates from the reference in one of the tests, that test fails. First, let's load both datasets into Python.
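
To make the idea of data that arrives "batch by batch" concrete, the sketch below splits a pandas DataFrame into consecutive batches. The helper `iter_batches`, the toy values and the batch size of 4 are made up for illustration and are not part of the notebook's utilities.

```python
import pandas as pd


def iter_batches(df: pd.DataFrame, batch_size: int):
    """Yield consecutive, non-overlapping row batches of df (hypothetical helper)."""
    for start in range(0, len(df), batch_size):
        yield df.iloc[start:start + batch_size]


# Toy stand-in for the current dataset (illustrative values only).
current = pd.DataFrame({
    "ph": [7.1, 8.0, 6.5, 5.9, 7.7, 9.2, 6.8, 7.4, 8.3, 6.1],
    "Hardness": [204.9, 224.2, 214.4, 188.3, 248.1, 186.7, 193.7, 193.6, 230.6, 195.1],
})

batches = list(iter_batches(current, batch_size=4))
for i, batch in enumerate(batches):
    print(f"batch {i}: {len(batch)} rows")
```

In a monitoring setup, each batch could be wrapped with `Dataset.from_pandas(...)` and passed to `report.run(...)` together with the full reference dataset, as done for the whole current dataset later in this notebook.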

In [2]:
data_reference = pd.read_csv("dataset/dataset_reference_Evidently.csv", delimiter=",") 
data_current = pd.read_csv("dataset/dataset_current_Evidently.csv", delimiter=",")
print("reference data: " + str(data_reference.head()))
print("current data: " + str(data_current.head()))

reference data:          ph  Hardness    Solids  Chloramines   Sulfate  Conductivity  \
0  0.265850  0.591734  0.295897     0.399690  0.480143      0.834554   
1  0.675542  0.964702  0.285135     0.391213  0.370691      0.440050   
2  0.509336  0.516429  0.224086     0.511483  0.217911      0.421722   
3  0.543586  0.852255  0.524375     0.487242  0.431409      0.494766   
4  0.525234  0.788631  0.297562     0.109985  0.268350      0.337211   

   Organic_carbon  Trihalomethanes  Turbidity  Potability  
0        0.547991         0.490667   0.576793           0  
1        0.353665         0.275912   0.496327           0  
2        0.415250         0.469482   0.405562           0  
3        0.569818         0.688474   0.418282           0  
4        0.588207         0.695189   0.377341           0  
current data:           ph    Hardness        Solids  Chloramines     Sulfate  Conductivity  \
0        NaN  204.890455  20791.318981     7.300212  368.516441    564.308654   
1   8.099124  2

Next, we run the model on both datasets and add the resulting predictions to them. Remember that the model was trained on the reference dataset.

In [3]:
data = Data_utils()
data.load_train_data(8)
data.load_test_data(8)
model_params = {
    "n_layers": 3,
    "dimensions": [54, 95, 95],
}
model = NN_model(model_params) 
model.load_state_dict(torch.load("classification_model.pth"))
predictions_reference = evaluate_model(model, data.train_dataloader)
predictions_current = evaluate_model(model, data.cur_dataloader)

1186


100%|██████████| 149/149 [00:00<00:00, 2067.68it/s]
100%|██████████| 262/262 [00:00<00:00, 2628.58it/s]


In [4]:
predictions = torch.cat(predictions_reference).tolist()
predictions = [int(x) for x in predictions]
print(len(predictions))
data_reference["prediction"] = predictions
data_reference["target"] = data_reference["Potability"].astype(int)
data_reference = data_reference.drop(["Potability"], axis=1)
print(data_reference)

1186
            ph  Hardness    Solids  Chloramines   Sulfate  Conductivity  \
0     0.265850  0.591734  0.295897     0.399690  0.480143      0.834554   
1     0.675542  0.964702  0.285135     0.391213  0.370691      0.440050   
2     0.509336  0.516429  0.224086     0.511483  0.217911      0.421722   
3     0.543586  0.852255  0.524375     0.487242  0.431409      0.494766   
4     0.525234  0.788631  0.297562     0.109985  0.268350      0.337211   
...        ...       ...       ...          ...       ...           ...   
1181  0.689193  0.965843  0.261366     0.436726  0.367388      0.621164   
1182  0.509336  0.630085  0.136719     0.628364  0.480143      0.500508   
1183  0.620726  0.877993  0.229659     0.486982  0.480143      0.574168   
1184  0.858343  0.341949  0.602661     0.651018  0.182131      0.524201   
1185  0.700483  0.926171  0.535994     0.468070  0.480143      0.508279   

      Organic_carbon  Trihalomethanes  Turbidity  prediction  target  
0           0.547991   

In [5]:
predictions = torch.cat(predictions_current).tolist()
predictions = [int(x) for x in predictions]
data_current["prediction"] = predictions
data_current["target"] = data_current["Potability"].astype(int)
data_current = data_current.drop(["Potability"], axis=1)
print(data_current)

             ph    Hardness        Solids  Chloramines     Sulfate  \
0           NaN  204.890455  20791.318981     7.300212  368.516441   
1      8.099124  224.236259  19909.541732     9.275884         NaN   
2      8.316766  214.373394  22018.417441     8.059332  356.886136   
3      5.584087  188.313324  28748.687739     7.544869  326.678363   
4     10.223862  248.071735  28749.716544     7.513408  393.663396   
...         ...         ...           ...          ...         ...   
2085   6.069616  186.659040  26138.780191     7.747547  345.700257   
2086   4.668102  193.681735  47580.991603     7.166639  359.948574   
2087   7.808856  193.553212  17329.802160     8.061362         NaN   
2088   5.126763  230.603758  11983.869376     6.303357         NaN   
2089   7.874671  195.102299  17404.177061     7.509306         NaN   

      Conductivity  Organic_carbon  Trihalomethanes  Turbidity  prediction  \
0       564.308654       10.379783        86.990970   2.963135           1   
1  

Besides adding the predictions, you can also give Evidently a data definition, which tells it the role and type of each column. This can be done as follows:</br>

    data_definition = DataDefinition(
        # use BinaryClassification or MulticlassClassification, depending on the task
        classification=[BinaryClassification(
            target="target",
            prediction_labels="prediction",
        )],
        numerical_columns=["name", "of", "columns"],
        categorical_columns=["name", "of", "columns"],
        datetime_columns=["name", "of", "columns"],
        text_columns=["name", "of", "columns"],
    )
    dataset = Dataset.from_pandas(dataframe, data_definition=data_definition)
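
Writing long column lists by hand is error-prone. The sketch below builds them with pandas `select_dtypes`, assuming that every numeric feature column is numerical, every other feature column is categorical, and that `target` and `prediction` are described separately through the classification definition. The helper `split_columns` and the `source` column are hypothetical.

```python
import pandas as pd


def split_columns(df: pd.DataFrame, exclude=("target", "prediction")):
    """Return (numerical, categorical) column-name lists, skipping label columns."""
    features = df.drop(columns=[c for c in exclude if c in df.columns])
    numerical = features.select_dtypes(include="number").columns.tolist()
    categorical = features.select_dtypes(exclude="number").columns.tolist()
    return numerical, categorical


# Toy frame with the same kind of columns as the water_potability data.
df = pd.DataFrame({
    "ph": [0.27, 0.68],
    "Hardness": [0.59, 0.96],
    "source": ["well", "river"],  # hypothetical categorical feature
    "prediction": [0, 1],
    "target": [0, 1],
})

numerical, categorical = split_columns(df)
print(numerical)    # ['ph', 'Hardness']
print(categorical)  # ['source']
```

The two lists can then be passed as `numerical_columns=` and `categorical_columns=` to `DataDefinition`.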

In [6]:
definiton_data = DataDefinition(
    classification=[BinaryClassification(
        target="target",
        prediction_labels="prediction",
    )],
    numerical_columns=["ph", "Hardness", "Solids", "Chloramines", "Sulfate", "Conductivity", "Organic_carbon", "Trihalomethanes", "Turbidity", "prediction"],
    categorical_columns=["target"],
)
reference_dataset = Dataset.from_pandas(data_reference, data_definition=definiton_data)
current_dataset = Dataset.from_pandas(data_current, data_definition=definiton_data)

### Data Quality check

The first thing you should check is the quality of your current dataset. This can be done on its own, for example by checking the share of missing values, or in comparison with a reference dataset, for example by comparing the shapes of the inputs. With Evidently you can easily create a data quality report using <code>DataSummaryPreset()</code>. To create a report, you first set up the report and then run it on the data you want to check. Afterwards you can save the report as an HTML file or a JSON file, or export it as a Python dictionary.
</br>
<code>
report = Report([<i>preset_function(),</i>]) </br>
my_eval = report.run(current_dataset, reference_dataset|None)</br></br>
my_eval.save_html("path/to/html")</br>
my_eval.save_json("path/to/json")</br>
my_eval.dict()
</code>
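
To get a feel for what the data summary contains, the sketch below computes a few comparable quantities directly with pandas: the row and column counts, the number of duplicated rows and the share of missing values per column. It only approximates a handful of the metrics in <code>DataSummaryPreset()</code> on toy data and is not how Evidently computes them internally.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the current dataset, with a few missing values.
current = pd.DataFrame({
    "ph": [np.nan, 8.10, 8.32, 5.58],
    "Sulfate": [368.5, np.nan, 356.9, np.nan],
    "target": [0, 0, 0, 1],
})

summary = {
    "row_count": len(current),
    "column_count": current.shape[1],
    "duplicated_rows": int(current.duplicated().sum()),
    # Fraction of missing values per column, between 0 and 1.
    "missing_share": current.isna().mean().to_dict(),
}
print(summary)
```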

In [7]:
report = Report([DataSummaryPreset()])
my_eval = report.run(current_dataset, None)

In [8]:
my_eval.save_html("evidently_reports/DataSummaryPreset.html")
my_eval.save_json("evidently_reports/DataSummaryPreset.json")
print(my_eval.dict())

{'metrics': [{'id': '00404ffe284d7862baa4095093452630', 'metric_id': 'RowCount()', 'value': 2090.0}, {'id': '91e498263b6502661774b15eb39ea154', 'metric_id': 'ColumnCount()', 'value': 11.0}, {'id': '31b8c78bad2b7108842265cd082d5abf', 'metric_id': 'ColumnCount(column_type=ColumnType.Numerical)', 'value': 10.0}, {'id': 'bb154bb9a843cad0d72fa3dc4983394d', 'metric_id': 'ColumnCount(column_type=ColumnType.Categorical)', 'value': 1.0}, {'id': '012cbfb269361272d4773e1a68559396', 'metric_id': 'ColumnCount(column_type=ColumnType.Datetime)', 'value': 0.0}, {'id': '3f5e595e26bc8e120aadf51967bb0355', 'metric_id': 'ColumnCount(column_type=ColumnType.Text)', 'value': 0.0}, {'id': 'ff1caea5e148b287bd5348d4bb40eccf', 'metric_id': 'DuplicatedRowCount()', 'value': 0.0}, {'id': '02fbcbd017b30a0b075d9d5a72ce2a32', 'metric_id': 'DuplicatedColumnsCount()', 'value': 0.0}, {'id': '4c2efb888b5c5f8afb50895b4b4abcec', 'metric_id': 'AlmostDuplicatedColumnsCount()', 'value': 0.0}, {'id': '9e9fa5ec57528fadbf97a35ad5
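
As the output above shows, the dictionary from `my_eval.dict()` holds a `metrics` list in which every entry has a `metric_id` and a `value`. A small lookup table makes individual values easy to read or compare. The helper `metric_values` is hypothetical, and the snippet uses an abbreviated copy of that output (with shortened ids) so it runs on its own; in the notebook you would pass `my_eval.dict()` instead.

```python
def metric_values(result: dict) -> dict:
    """Map each metric_id to its value (hypothetical helper)."""
    return {m["metric_id"]: m["value"] for m in result["metrics"]}


# Abbreviated copy of the my_eval.dict() output printed above (ids shortened).
result = {
    "metrics": [
        {"id": "0040", "metric_id": "RowCount()", "value": 2090.0},
        {"id": "91e4", "metric_id": "ColumnCount()", "value": 11.0},
        {"id": "ff1c", "metric_id": "DuplicatedRowCount()", "value": 0.0},
    ]
}

values = metric_values(result)
print(values["RowCount()"])            # 2090.0
print(values["DuplicatedRowCount()"])  # 0.0
```

Some metrics may store a nested dictionary rather than a single number as their value, so check the type before doing arithmetic on it.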

In [9]:
my_eval = report.run(current_dataset, reference_dataset)
my_eval.save_html("evidently_reports/DataSummaryPresetReference.html")

Compare the reference and current datasets in the report. What do you see? What are the main differences? If you feel that something in the current dataset needs to be changed, you can use the Data exercises to help you: start by reading the current dataset into pandas, change the data, then run the report again.

In [15]:
data_current = pd.read_csv("dataset/dataset_current_Evidently.csv", delimiter=",")

In [16]:
mean = data_current.mean()
data_current.fillna(mean,inplace=True)
print(data_current)

             ph    Hardness        Solids  Chloramines     Sulfate  \
0      7.177442  204.890455  20791.318981     7.300212  368.516441   
1      8.099124  224.236259  19909.541732     9.275884  330.325623   
2      8.316766  214.373394  22018.417441     8.059332  356.886136   
3      5.584087  188.313324  28748.687739     7.544869  326.678363   
4     10.223862  248.071735  28749.716544     7.513408  393.663396   
...         ...         ...           ...          ...         ...   
2085   6.069616  186.659040  26138.780191     7.747547  345.700257   
2086   4.668102  193.681735  47580.991603     7.166639  359.948574   
2087   7.808856  193.553212  17329.802160     8.061362  330.325623   
2088   5.126763  230.603758  11983.869376     6.303357  330.325623   
2089   7.874671  195.102299  17404.177061     7.509306  330.325623   

      Conductivity  Organic_carbon  Trihalomethanes  Turbidity  Potability  
0       564.308654       10.379783        86.990970   2.963135           0  
1    

In [17]:
data_current = data_current.copy()
for column in data_current.columns:
    if column != "Potability":
        # min-max scale each feature column to [0, 1] using this dataset's own min and max
        data_current[column] = (data_current[column] - data_current[column].min()) / (data_current[column].max() - data_current[column].min())

print(data_current)

            ph  Hardness    Solids  Chloramines   Sulfate  Conductivity  \
0     0.512674  0.137408  0.364451     0.543891  0.690772      0.651886   
1     0.578509  0.278549  0.348752     0.698543  0.580628      0.383569   
2     0.594055  0.206593  0.386298     0.603314  0.657230      0.281659   
3     0.398863  0.016467  0.506122     0.563043  0.570110      0.129182   
4     0.730276  0.452444  0.506141     0.560580  0.763296      0.135045   
...        ...       ...       ...          ...       ...           ...   
2085  0.433544  0.004398  0.459656     0.578908  0.624969      0.378562   
2086  0.333436  0.055633  0.841409     0.533436  0.666062      0.582120   
2087  0.557775  0.054696  0.302823     0.603473  0.580628      0.335401   
2088  0.366197  0.325004  0.207645     0.465860  0.580628      0.354614   
2089  0.562477  0.065997  0.304147     0.560259  0.580628      0.215719   

      Organic_carbon  Trihalomethanes  Turbidity  Potability  
0           0.313402         0.68047
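
The cell above scales each column of the current data with that data's own minimum and maximum. A drawback is that a shift in the raw values can be partly hidden, because every batch is squeezed into [0, 1] again. A common alternative is to reuse the minimum and maximum of the data the model was trained on. The sketch below assumes the raw (unscaled) reference values are available, which is not the case for the already-normalized reference CSV in this notebook; the values and the `min_max` helper are hypothetical.

```python
import pandas as pd

# Hypothetical raw reference and current values for one feature.
reference_raw = pd.DataFrame({"Hardness": [150.0, 200.0, 250.0]})
current_raw = pd.DataFrame({"Hardness": [200.0, 250.0, 300.0]})  # shifted upwards


def min_max(df: pd.DataFrame, lo: pd.Series, hi: pd.Series) -> pd.DataFrame:
    """Scale every column to (x - lo) / (hi - lo) using the given per-column bounds."""
    return (df - lo) / (hi - lo)


# Scaling with the current batch's own bounds: the shift disappears.
own = min_max(current_raw, current_raw.min(), current_raw.max())
# Scaling with the reference bounds: the shift stays visible (values above 1).
ref = min_max(current_raw, reference_raw.min(), reference_raw.max())

print(own["Hardness"].tolist())  # [0.0, 0.5, 1.0]
print(ref["Hardness"].tolist())  # [0.5, 1.0, 1.5]
```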

In [18]:
predictions = torch.cat(predictions_current).tolist()
predictions = [int(x) for x in predictions]
data_current["prediction"] = predictions
data_current["target"] = data_current["Potability"].astype(int)
data_current = data_current.drop(["Potability"], axis=1)
print(data_current)

            ph  Hardness    Solids  Chloramines   Sulfate  Conductivity  \
0     0.512674  0.137408  0.364451     0.543891  0.690772      0.651886   
1     0.578509  0.278549  0.348752     0.698543  0.580628      0.383569   
2     0.594055  0.206593  0.386298     0.603314  0.657230      0.281659   
3     0.398863  0.016467  0.506122     0.563043  0.570110      0.129182   
4     0.730276  0.452444  0.506141     0.560580  0.763296      0.135045   
...        ...       ...       ...          ...       ...           ...   
2085  0.433544  0.004398  0.459656     0.578908  0.624969      0.378562   
2086  0.333436  0.055633  0.841409     0.533436  0.666062      0.582120   
2087  0.557775  0.054696  0.302823     0.603473  0.580628      0.335401   
2088  0.366197  0.325004  0.207645     0.465860  0.580628      0.354614   
2089  0.562477  0.065997  0.304147     0.560259  0.580628      0.215719   

      Organic_carbon  Trihalomethanes  Turbidity  prediction  target  
0           0.313402        

In [19]:
current_dataset = Dataset.from_pandas(data_current, data_definition=definiton_data)

In [20]:
my_eval = report.run(current_dataset, reference_dataset)
my_eval.save_html("evidently_reports/DataSummaryPresetReference.html")

Now that you have cleaned the current dataset, which other differences do you see between the reference and current datasets? Do you suspect drift?

## Model Quality Check
Besides data quality, it is a good idea to check model quality as well. This includes aspects such as the confusion matrix and the ROC curve (the ROC curve needs predicted probabilities, not just predicted labels).
Evidently again has presets for this: <code>ClassificationPreset()</code> for classification tasks and <code>RegressionPreset()</code> for regression tasks. You can create, run and save the report with the same steps as for <code>DataSummaryPreset()</code>.
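
For a binary classifier, the confusion matrix counts the four combinations of true label and predicted label, and metrics such as accuracy, precision and recall follow from those counts. The plain-Python sketch below shows that arithmetic on a few made-up labels; the helper `confusion_counts` is hypothetical, and <code>ClassificationPreset()</code> computes these metrics (and more) for you from the `target` and `prediction` columns.

```python
def confusion_counts(target, prediction, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for t, p in zip(target, prediction):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


# Illustrative labels, not taken from the notebook's data.
target = [0, 0, 0, 1, 1, 1, 1, 0]
prediction = [1, 1, 0, 1, 0, 1, 1, 0]

tp, fp, tn, fn = confusion_counts(target, prediction)
accuracy = (tp + tn) / len(target)
precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
print(tp, fp, tn, fn)               # 3 2 2 1
print(accuracy, precision, recall)  # 0.625 0.6 0.75
```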

In [21]:
report = Report([ClassificationPreset()])
my_eval = report.run(current_dataset,None)
my_eval.save_html("evidently_reports/ModelPreset.html")

In [22]:
data_current

| | ph | Hardness | Solids | Chloramines | Sulfate | Conductivity | Organic_carbon | Trihalomethanes | Turbidity | prediction | target |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.512674 | 0.137408 | 0.364451 | 0.543891 | 0.690772 | 0.651886 | 0.313402 | 0.680472 | 0.293486 | 1 | 0 |
| 1 | 0.578509 | 0.278549 | 0.348752 | 0.698543 | 0.580628 | 0.383569 | 0.562017 | 0.502868 | 0.312051 | 1 | 0 |
| 2 | 0.594055 | 0.206593 | 0.386298 | 0.603314 | 0.657230 | 0.281659 | 0.622089 | 0.795739 | 0.626703 | 1 | 0 |
| 3 | 0.398863 | 0.016467 | 0.506122 | 0.563043 | 0.570110 | 0.129182 | 0.237538 | 0.403560 | 0.212779 | 1 | 0 |
| 4 | 0.730276 | 0.452444 | 0.506141 | 0.560580 | 0.763296 | 0.135045 | 0.444050 | 0.659860 | 0.235441 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2085 | 0.433544 | 0.004398 | 0.459656 | 0.578908 | 0.624969 | 0.378562 | 0.378070 | 0.451064 | 0.434840 | 1 | 1 |
| 2086 | 0.333436 | 0.055633 | 0.841409 | 0.533436 | 0.666062 | 0.582120 | 0.448062 | 0.505178 | 0.588103 | 0 | 1 |
| 2087 | 0.557775 | 0.054696 | 0.302823 | 0.603473 | 0.580628 | 0.335401 | 0.678284 | 0.501359 | 0.260499 | 1 | 1 |
| 2088 | 0.366197 | 0.325004 | 0.207645 | 0.465860 | 0.580628 | 0.354614 | 0.343638 | 0.598427 | 0.642685 | 0 | 1 |


In [12]:
data_reference

| | ph | Hardness | Solids | Chloramines | Sulfate | Conductivity | Organic_carbon | Trihalomethanes | Turbidity | prediction | target |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.265850 | 0.591734 | 0.295897 | 0.399690 | 0.480143 | 0.834554 | 0.547991 | 0.490667 | 0.576793 | 0 | 0 |
| 1 | 0.675542 | 0.964702 | 0.285135 | 0.391213 | 0.370691 | 0.440050 | 0.353665 | 0.275912 | 0.496327 | 0 | 0 |
| 2 | 0.509336 | 0.516429 | 0.224086 | 0.511483 | 0.217911 | 0.421722 | 0.415250 | 0.469482 | 0.405562 | 0 | 0 |
| 3 | 0.543586 | 0.852255 | 0.524375 | 0.487242 | 0.431409 | 0.494766 | 0.569818 | 0.688474 | 0.418282 | 0 | 0 |
| 4 | 0.525234 | 0.788631 | 0.297562 | 0.109985 | 0.268350 | 0.337211 | 0.588207 | 0.695189 | 0.377341 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1181 | 0.689193 | 0.965843 | 0.261366 | 0.436726 | 0.367388 | 0.621164 | 0.443279 | 0.581863 | 0.545274 | 0 | 1 |
| 1182 | 0.509336 | 0.630085 | 0.136719 | 0.628364 | 0.480143 | 0.500508 | 0.198622 | 0.653463 | 0.425328 | 0 | 1 |
| 1183 | 0.620726 | 0.877993 | 0.229659 | 0.486982 | 0.480143 | 0.574168 | 0.328166 | 0.332729 | 0.653499 | 0 | 1 |
| 1184 | 0.858343 | 0.341949 | 0.602661 | 0.651018 | 0.182131 | 0.524201 | 0.601257 | 0.360297 | 0.551950 | 1 | 1 |


Evidently also lets you combine presets: when you create the report, simply list all the presets you want.
</br>
<code>
report = Report([<i>Preset_1(), Preset_2()</i>])
</code>

In [13]:
report = Report([
    DataSummaryPreset(),
    ClassificationPreset(),
])
my_eval = report.run(current_dataset, reference_dataset)
my_eval.save_html("evidently_reports/CombinedPreset.html")

Besides combining presets, you can also add tests to your report. Evidently runs these checks for you and shows whether each one passed or failed, for example whether the accuracy is above a certain value. Presets come with default tests, which you can activate by setting the <code>include_tests</code> parameter of <code>Report</code> to True.

In [14]:
report = Report([
    ClassificationPreset(),
],
include_tests=True)
my_eval = report.run(current_dataset, reference_dataset)
my_eval.save_html("evidently_reports/ClassificationPreset.html")

## Drift detection

The previous part assumed that the true labels of your predictions are available. In production this is often not the case. Instead, we can detect drift in the input data and in the predictions of the model. For this, statistics such as the Wasserstein distance are computed between the reference and current distributions to determine whether a distribution has changed significantly. The preset for this is <code>DataDriftPreset()</code>. Again, tests can be added by setting the <code>include_tests</code> parameter to True. Drift detection always needs both the current and the reference dataset.
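
The Wasserstein distance (also called earth mover's distance) measures how much "mass" has to be moved to turn one distribution into the other; for one-dimensional data it equals the area between the two empirical cumulative distribution functions. The pure-Python sketch below computes it for two small samples, and `scipy.stats.wasserstein_distance`, imported at the top of this notebook, should give the same distance. Evidently describes a normed Wasserstein variant for drift detection; dividing by the reference standard deviation, as done here, is our assumption about that normalization, not a copy of Evidently's code.

```python
from bisect import bisect_right
from statistics import pstdev


def wasserstein_1d(a, b):
    """First Wasserstein distance between two 1-D samples.

    Integrates |F_a(x) - F_b(x)| over x, where F_a and F_b are the
    empirical cumulative distribution functions of the samples.
    """
    a, b = sorted(a), sorted(b)
    points = sorted(set(a) | set(b))
    total = 0.0
    for left, right in zip(points, points[1:]):
        # Both empirical CDFs are constant on [left, right).
        fa = bisect_right(a, left) / len(a)
        fb = bisect_right(b, left) / len(b)
        total += abs(fa - fb) * (right - left)
    return total


reference = [0.0, 1.0, 2.0]
current = [1.0, 2.0, 3.0]  # same shape, shifted by +1

distance = wasserstein_1d(reference, current)
# Assumption: "normed" score = distance / population std of the reference sample.
normed = distance / pstdev(reference)
print(round(distance, 6), round(normed, 6))  # 1.0 1.224745
```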

In [None]:
report = Report([
    DataDriftPreset(),
],
include_tests=True)
my_eval = report.run(current_dataset, reference_dataset)
my_eval.save_html("evidently_reports/DataDriftPreset.html")

You can also customize your report. For example, you can choose which columns to check for drift and which metrics or tests to use, and you can change the thresholds and other parameters of the tests.</br>

<b>Limit columns:</b>

    DataDriftPreset(columns=["list", "of", "columns"])

<b>Choose metrics:</b>

    report = Report([
        Metric_1(),
        Metric_2(column=["list", "of", "columns"], parameter="value")
    ])

A list of all metrics can be found [here](https://docs.evidentlyai.com/metrics/all_metrics). As the example shows, some metrics have their own parameters, for example the method used to calculate drift (wasserstein, psi, ks, ...).
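
To show what the `psi` and `ks` methods measure, the pure-Python sketch below computes the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical CDFs) and the Population Stability Index (PSI). The PSI here uses equal-width bins over the reference range and a small `eps` for empty bins; both choices are assumptions for illustration, and Evidently may bin and smooth differently. The function names and sample values are hypothetical.

```python
import math
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in set(a) | set(b)
    )


def psi(reference, current, n_bins=4, eps=1e-4):
    """Population Stability Index with equal-width bins over the reference range."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins

    def shares(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width) if width else 0
            # Values outside the reference range fall into the first or last bin.
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # eps avoids log(0) and division by zero for empty bins.
        return [max(c / len(values), eps) for c in counts]

    ref_share, cur_share = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_share, cur_share))


reference = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
current = [0.5, 0.6, 0.6, 0.7, 0.7, 0.8, 0.8, 0.8]  # shifted towards high values

print(ks_statistic(reference, current))  # 0.5
print(psi(reference, current))           # clearly above 0: the distribution shifted
```

Identical samples give 0 for both measures; the larger the value, the stronger the shift.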

<b>Exclude tests:</b>

    report = Report([
        Metric_1(column="column", tests=[]),
        Metric_2(column="column"),
    ],
    include_tests=True)

<b>Custom test conditions</b> (use eq (equal), gt (greater than), gte (greater than or equal), lt (less than), lte (less than or equal)):

    report = Report([
        Metric_1(column="column", tests=[eq(0)]),
        Metric_2(column="column", tests=[gte(18), lt(35)]),
        Metric_3(column="column", tests=[gte(Reference(relative=0.1))]),
        Metric_4(column="column", tests=[lte(Reference(absolute=10))]),
    ])
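
Conceptually, each test condition compares one metric value against a threshold and reports pass or fail. The plain-Python sketch below mimics `eq`, `gt`, `gte`, `lt` and `lte` as predicates from the `operator` module; `CONDITIONS` and `run_tests` are hypothetical stand-ins for illustration, not Evidently's implementation, and they leave out the `Reference(...)` variants.

```python
import operator

# Hypothetical stand-ins for Evidently's condition helpers.
CONDITIONS = {
    "eq": operator.eq,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def run_tests(value, tests):
    """Return (condition, threshold, passed) for every (name, threshold) pair."""
    return [(name, threshold, CONDITIONS[name](value, threshold)) for name, threshold in tests]


# Example: a metric value of 25 checked against gte(18) and lt(35).
for name, threshold, passed in run_tests(25, [("gte", 18), ("lt", 35)]):
    print(f"{name}({threshold}): {'PASS' if passed else 'FAIL'}")
```
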
More information about the tests can be found [here](https://docs.evidentlyai.com/docs/library/tests).</br>
Create your own custom Evidently report and play around with the different metrics and tests. More information about Evidently can be found on their [website](https://docs.evidentlyai.com/docs/library/overview).
