**Disclaimer**. This example uses the Evidently API as available in version 0.6.7 or lower. Please ensure you are using the correct version when running this notebook. For updated and new examples using the latest Evidently versions, visit our documentation. 

Evidently docs: https://docs.evidentlyai.com/

Join our Discord: https://discord.com/invite/xZjKRaNp8b

Install Evidently following the instructions for your environment: https://docs.evidentlyai.com/user-guide/install-evidently

In [None]:
try:
    import evidently
except ImportError:
    !pip install evidently==0.3.3
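The disclaimer says this notebook targets Evidently 0.6.7 or lower. The cell above installs 0.3.3 only if Evidently is missing, so an existing newer install is left as it is. Here is a minimal stdlib sketch for checking the installed version against that limit. `parse_version` and `is_supported_version` are hypothetical helpers, not part of Evidently. The simplified parser ignores pre-release tags; the third-party `packaging.version` module handles those properly.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_version(v: str) -> tuple:
    """Turn '0.3.3' into (0, 3, 3), keeping only the leading digits of each part."""
    parts = []
    for piece in v.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    parts += [0] * (3 - len(parts))  # pad '0.6' to (0, 6, 0)
    return tuple(parts)


def is_supported_version(v: str, max_version: str = "0.6.7") -> bool:
    """True if version string `v` is at or below `max_version`."""
    return parse_version(v) <= parse_version(max_version)


try:
    installed = version("evidently")
    print(f"evidently {installed}, within the supported range: {is_supported_version(installed)}")
except PackageNotFoundError:
    print("evidently is not installed")
```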

In [None]:
import pandas as pd
import numpy as np

from sklearn import datasets, ensemble

from evidently.report import Report

from evidently.metrics import *

# Prepare toy data

Load a toy dataset and fit a simple model. You will get two resulting datasets: `reference` (training dataset, `bcancer_ref`) and `current` (test dataset, `bcancer_cur`).

In [None]:
# Dataset for binary probabilistic classification
bcancer_data = datasets.load_breast_cancer(as_frame=True)
bcancer = bcancer_data.frame
features = bcancer_data.feature_names.tolist()

# Reference (training) and current (test) sets: draw `current` from the rows not used
# for training so the two sets do not overlap; fixed seeds make the split reproducible
bcancer_ref = bcancer.sample(n=300, replace=False, random_state=0)
bcancer_cur = bcancer.drop(bcancer_ref.index).sample(n=200, replace=False, random_state=0)

model = ensemble.RandomForestClassifier(random_state=1, n_estimators=10)
model.fit(bcancer_ref[features], bcancer_ref.target)

# Predicted probability of class 1; by default Evidently maps the "target" and "prediction" columns by name
bcancer_ref['prediction'] = model.predict_proba(bcancer_ref[features])[:, 1]
bcancer_cur['prediction'] = model.predict_proba(bcancer_cur[features])[:, 1]
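If `reference` and `current` are each drawn with a separate `sample()` call on the full frame, the two sets can share rows. In that case the "test" data is not fully held out. The sketch below uses a synthetic frame with the same number of rows as the breast cancer data (569). It compares independent draws with drawing `current` from the rows left after `drop()`. The column names and seeds are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in with the same number of rows as the breast cancer data (569)
df = pd.DataFrame({"x": np.arange(569), "target": np.arange(569) % 2})

# Two independent draws from the full frame: the sets can share rows
ref_naive = df.sample(n=300, replace=False, random_state=0)
cur_naive = df.sample(n=200, replace=False, random_state=1)
shared_naive = len(ref_naive.index.intersection(cur_naive.index))

# Draw "current" only from the rows left after removing the reference rows
ref = df.sample(n=300, replace=False, random_state=0)
cur = df.drop(ref.index).sample(n=200, replace=False, random_state=1)
shared_disjoint = len(ref.index.intersection(cur.index))

print(f"shared rows, independent draws: {shared_naive}")
print(f"shared rows, after drop(): {shared_disjoint}")
```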

Preview the two datasets.

In [None]:
bcancer_ref.head()

In [None]:
bcancer_cur.head()

# Design the template text fields

These are the text fields that will appear in the model card.

In [None]:
model_details = """
  # Model Details

  ## Description
  * Model name: What is the model name?
  * Model ID: Include model ID.
  * Model version
  * Model author: Who created the model?
  * Model type: What is the model doing?
  * Model architecture: Include any relevant information about algorithms, parameters, etc.
  * Date
  * License
  * Contact details

  ## Intended use
  * Primary use case: What is the use case?
  * Model users: Who are the expected model users?
  * Secondary use cases: What are the secondary use cases, if any?
  * Out of scope: What applications are out of scope?
"""

In [None]:
training_dataset = """
  # Training dataset

  * Training dataset: How was the training dataset created?
  * Training period: Which time period does the training dataset come from?
  * Sub-groups: Are there relevant sub-groups, e.g., demographic categories?
  * Limitations: Are there known limitations?
  * Pre-processing: How was the data pre-processed?
"""

In [None]:
model_evaluation = """
  # Model evaluation

  * Evaluation process: How was the model evaluated?
  * Evaluation dataset: How was the evaluation dataset created?
  * Metrics: What are the key model quality metrics?
  * Decision threshold: What is the decision threshold?
"""

In [None]:
considerations = """
  # Ethical considerations
Include relevant considerations.

  # Caveats and Recommendations
Include relevant caveats and recommendations.
"""

# Run model card template

This Model Card report includes:
* the text fields defined earlier
* plots and visualizations of data and model quality

The plots and text fields are listed in the same order as they will appear on the model card.

Let's run it to see how it looks!
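The report is a flat list in which each `Comment` is followed by the Metrics for that part of the card. If you reorganize the card often, it may help to keep the sections as (text, metrics) pairs and flatten them in order. `build_card_items` is a hypothetical helper, not part of Evidently. It takes the comment factory as a parameter, so the sketch runs without Evidently; in the notebook you would pass `Comment`.

```python
from typing import Any, Callable, Iterable, List, Sequence, Tuple


def build_card_items(
    sections: Iterable[Tuple[str, Sequence[Any]]],
    make_comment: Callable[[str], Any],
) -> List[Any]:
    """Flatten (text, metrics) sections into one ordered list for Report(metrics=...)."""
    items: List[Any] = []
    for text, metrics in sections:
        items.append(make_comment(text))  # the text block comes first
        items.extend(metrics)  # then the plots for that section, in order
    return items


# Plain strings stand in for Comment(...) and metric objects to show the ordering
demo = build_card_items(
    [("details", ["class_balance"]), ("evaluation", ["quality", "confusion"])],
    make_comment=lambda text: f"comment:{text}",
)
print(demo)
```

In the notebook, this would look like `Report(metrics=build_card_items([(model_details, [ClassificationClassBalance()]), (considerations, [])], make_comment=Comment))`.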

In [None]:
model_card = Report(metrics=[
    Comment(model_details),
    ClassificationClassBalance(),
    Comment(training_dataset),
    DatasetSummaryMetric(),
    Comment(model_evaluation),
    ClassificationQualityMetric(),
    ClassificationConfusionMatrix(),
    Comment(considerations),
])

model_card.run(current_data=bcancer_cur, reference_data=bcancer_ref)
model_card

# Populate and customize the template

## 1. Populate text fields

First, let's populate the text fields. You can exclude components, modify the proposed structure, and use markdown for additional formatting.
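The text fields are plain Markdown strings. If you prefer to keep the field values in a dictionary, you can render them with a small helper in the same `# Heading` / `* **Field**: value` layout used below, and leave a field empty to exclude it. `render_section` is a hypothetical helper for illustration, not part of Evidently.

```python
def render_section(title: str, fields: dict, level: int = 1) -> str:
    """Render `fields` as '* **Name**: value' bullets under a Markdown heading.

    Fields with an empty value are skipped, which is one way to exclude
    components you do not need.
    """
    lines = ["#" * level + " " + title, ""]
    for name, value in fields.items():
        if value:
            lines.append(f"* **{name}**: {value}")
    return "\n".join(lines) + "\n"


training_dataset_md = render_section(
    "Training dataset",
    {
        "Training dataset": "Breast Cancer Wisconsin dataset, 300 randomly sampled objects.",
        "Limitations": "Demo dataset.",
        "Pre-processing": "",  # left empty, so it is not rendered
    },
)
print(training_dataset_md)
```

You could then pass the result to `Comment(training_dataset_md)` in place of a hand-written string.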

In [None]:
model_details = """
  # Model Details

  ## Description
  * **Model name**: Test model
  * **Model author**: Evidently AI team
  * **Model type**: Probabilistic classification model.
  * **Model architecture**: Random forest.
  * **Date**: June 2023.

  ## Intended use
  * **Primary use case**: Demonstration of how to create the ML model card.
  * **Out of scope**: This model is not intended for production use.
"""

In [None]:
training_dataset = """
  # Training dataset

  * **Training dataset**: Breast Cancer Wisconsin dataset, 300 randomly sampled objects.
  * **Source**: dataset from [sklearn](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_breast_cancer.html)
  * **Limitations**: Demo dataset.
"""

In [None]:
model_evaluation = """
  # Model evaluation

  * **Evaluation dataset**: Breast Cancer Wisconsin dataset, 200 randomly sampled objects.
  * **Metrics**: ROC AUC, accuracy, precision, recall.
  * **Decision threshold**: 0.5; objects with a predicted probability over 0.5 are assigned to the target class.
"""

In [None]:
considerations = """
  # Caveats and Recommendations
Not for production use.
"""

Let's see how the model card looks now!

In [None]:
model_card = Report(metrics=[
    Comment(model_details),
    ClassificationClassBalance(),
    Comment(training_dataset),
    DatasetSummaryMetric(),
    Comment(model_evaluation),
    ClassificationQualityMetric(),
    ClassificationConfusionMatrix(),
    Comment(considerations)
])

model_card.run(current_data=bcancer_cur, reference_data=bcancer_ref)
model_card

In [None]:
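# "sample_data/" exists by default in Google Colab; elsewhere, create the folder first or change the path
model_card.save_html("sample_data/file.html")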

## 2. Change the composition of plots

You can modify which plots appear on the model card by choosing from the many "Metrics" available in the Evidently library.
See the full list of Metrics here: https://docs.evidentlyai.com/reference/all-metrics, or browse the example notebooks at https://docs.evidentlyai.com/examples (if you open them in Colab, you can see the pre-rendered plots and pick the ones you like).

Here are the changes we will make now:
* Add plots on **dataset correlations**.
* Add plots to show the **stats for the features we consider important** to highlight ("mean radius" and "mean symmetry").
* Add a couple of new plots related to **Probabilistic Classification Quality**: distribution of predicted probabilities, ROC Curve.
* Add a table showing the model quality at **alternative classification decision thresholds**, with a comment noting that the threshold can be changed.
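For intuition about what the new dataset plots summarize, here is a rough pandas analog of a pairwise correlation matrix and per-column descriptive statistics. This is not Evidently's implementation; `DatasetCorrelationsMetric` may report several correlation types, and Pearson is used here only as one example. The frame is synthetic, with made-up values under the notebook's column names.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic values: "mean perimeter" is built to track "mean radius" closely
radius = rng.normal(14.0, 3.5, size=200)
df = pd.DataFrame({
    "mean radius": radius,
    "mean perimeter": radius * 6.5 + rng.normal(0, 2, size=200),
    "mean symmetry": rng.normal(0.18, 0.03, size=200),
})

corr = df.corr(method="pearson")  # pairwise correlation matrix
summary = df["mean radius"].describe()  # count, mean, std, min, quartiles, max

print(corr.round(2))
print(summary)
```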


In [None]:
threshold_comment = """
  **Note**: The model quality metrics are generated using a 0.5 decision threshold. It is important to consider whether a different decision threshold would better suit the use case. Here is how the model quality changes at alternative thresholds:
"""

In [None]:
model_card = Report(metrics=[
    Comment(model_details),
    ClassificationClassBalance(),
    Comment(training_dataset),
    DatasetSummaryMetric(),
    DatasetCorrelationsMetric(),
    ColumnSummaryMetric(column_name="mean radius"),
    ColumnSummaryMetric(column_name="mean symmetry"),
    Comment(model_evaluation),
    ClassificationQualityMetric(),
    ClassificationConfusionMatrix(),
    ClassificationProbDistribution(),
    ClassificationRocCurve(),
    Comment(threshold_comment),
    ClassificationPRTable(),
    Comment(considerations)
])

model_card.run(current_data=bcancer_cur, reference_data=bcancer_ref)
model_card

# Support Evidently

Did you find the example useful? Star Evidently on GitHub to contribute back! This helps us continue creating free open-source tools for the community. https://github.com/evidentlyai/evidently