# MLflow models

In this notebook, we will explore MLflow's model component, which provides a unified way to save, load, and deploy machine learning models. MLflow models offer a standardized format for packaging machine learning models that can be used across different platforms and frameworks. An MLflow model is a directory that contains all the necessary files and dependencies required to run our machine learning model, along with metadata.

In [1]:
import mlflow
import mlflow.sklearn
import mlflow.pyfunc
from mlflow.models.signature import infer_signature
from sklearn.linear_model import LogisticRegression
from sklearn.dummy import DummyClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import pandas as pd
import numpy as np

import warnings
warnings.filterwarnings('ignore')

import logging
logging.getLogger('mlflow').setLevel(logging.ERROR)

#### Setting up the experiment
This command sets the experiment under which all runs will be recorded. If the experiment doesn’t exist, MLflow will create it.

In [2]:
# Set the MLflow experiment
mlflow.set_experiment("Iris Classification Experiment")

<Experiment: artifact_location='file:///C:/Users/israe/Documents/Codes/Notebooks/mlruns/191308692135956385', creation_time=1724749168776, experiment_id='191308692135956385', last_update_time=1724749168776, lifecycle_stage='active', name='Iris Classification Experiment', tags={}>

#### Load and prepare the data

In [3]:
# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

### Model API
MLflow provides a simple API for saving, logging, and loading models.

#### Saving and logging a model
MLflow allows us to both save and log models in a standardized format, which is crucial for model management and deployment in production environments.

- **Saving a model**: Saving a model in MLflow means storing it on our local filesystem in a directory structure that MLflow can recognize and load later. When we save a model using the `mlflow.<framework>.save_model` function, MLflow stores not just the serialized model but also the information required to reconstruct it, such as the flavor configuration and the environment dependencies. A saved model is not linked to any MLflow run or experiment, so no run metadata (parameters, metrics) or versioning is associated with it. This approach is useful when we want to store a model for later use in the same environment or manage the model files manually.
    - Unlike traditional methods of saving models (e.g., using `joblib` or `pickle`), MLflow’s `save_model` function ensures that the model is saved in a format that includes metadata and dependencies, which makes it easier to deploy the model in different environments. With traditional methods, we would need to manually track the environment and any other dependencies required to run the model.

- **Logging a model**: Logging a model in MLflow goes a step beyond saving. When we log a model using the `mlflow.<framework>.log_model` function, MLflow packages the model in the same format and stores it as an artifact of the current MLflow run. The model is therefore associated with a specific experiment and run, alongside the parameters, metrics, and tags recorded for that run in the MLflow tracking server. Logging a model is particularly useful in production scenarios where we need to track model versions, ensure reproducibility, and potentially deploy models to different environments. The logged model can be easily retrieved later for further training, evaluation, or deployment.

Let's illustrate these concepts with a simple classification example using scikit-learn:

In [4]:
# Train a simple logistic regression model
# Initialize the model
model = LogisticRegression(max_iter=200)

# Start an MLflow run
with mlflow.start_run(run_name="Logistic Regression Model") as run:
    # Fit the model
    model.fit(X_train, y_train)

    # Predict on the test set
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)

    # Log the accuracy metric
    mlflow.log_metric("accuracy", accuracy)

    # Save the model locally (raises an error if "sklearn_model" already exists and is not empty)
    mlflow.sklearn.save_model(model, "sklearn_model")

    # Log the model to MLflow tracking server
    mlflow.sklearn.log_model(model, "logged_model")

    print(f"Model accuracy: {accuracy}")

    # Get the run_id of the current run
    run_id = run.info.run_id

Model accuracy: 1.0


- **`mlflow.sklearn.save_model`**: The model is saved to a local directory (`sklearn_model`). This operation only saves the model locally and does not log it to the MLflow server. This is an example of MLflow’s support for scikit-learn models, where the model and all necessary information are stored in a way that MLflow can manage.

- **`mlflow.sklearn.log_model`**: The model is logged to the MLflow tracking server. It is stored as an artifact of the current MLflow run under the artifact path `logged_model`, which allows it to be loaded later through a `runs:/<run_id>/logged_model` URI.


#### Loading a logged model
Once a model is logged, it can be loaded back into memory for inference or further training. Loading a logged model is ideal when we need to work with models that are tracked within an MLflow experiment. This method provides access to the model along with the run’s metadata.
- **Limitations**: We need access to the MLflow tracking server where the model was logged. Additionally, this method requires knowing the specific run ID from which we want to load the model.

In [5]:
# Load the model from the MLflow tracking server using the retrieved run_id
loaded_logged_model = mlflow.sklearn.load_model(f"runs:/{run_id}/logged_model")

# Make predictions with the loaded model
predictions = loaded_logged_model.predict(X_test)
predictions[:5]

array([1, 0, 2, 1, 1])

- **Source**: The `mlflow.sklearn.load_model` function retrieves the logged model from the MLflow tracking server using the specified `run_id` and the artifact path where it was logged. The loaded model can then be used just like any other scikit-learn model.


#### Loading a saved model
A saved model is stored locally on our filesystem in a specific directory. We can load this model back into our environment for further use, such as making predictions or continuing training. Loading a saved model is useful when we want to use the model in the same environment where it was originally saved, or when we want to manage the model files manually.
- **Limitations**: Since this model is not linked to any MLflow run, we won't have access to the run's metadata (e.g., metrics, parameters) or versioning capabilities. This approach is more isolated and does not leverage MLflow’s tracking capabilities.

In [6]:
# Load the saved model from the local directory
loaded_saved_model = mlflow.sklearn.load_model("sklearn_model")

# Make predictions with the loaded model
predictions = loaded_saved_model.predict(X_test)
predictions[:5]

array([1, 0, 2, 1, 1])

- **Source**: The model is loaded from the local directory where it was saved using `mlflow.sklearn.save_model`.


##### Summary of differences

| **Aspect**                     | **Loading from saved model**                                 | **Loading from logged model**                                  |
|--------------------------------|--------------------------------------------------------------|----------------------------------------------------------------|
| **Source**                     | Local filesystem                                             | MLflow tracking server                                         |
| **Metadata access**            | No access to run metadata                                    | Full access to run metadata (parameters, metrics, tags)        |
| **Versioning**                 | No versioning capabilities                                   | Full versioning capabilities                                  |
| **Deployment**                 | Manual deployment required                                   | Easier deployment through MLflow’s deployment tools            |
| **Environment requirements**   | Must be in the same or similar environment where saved       | Can be loaded in different environments via the MLflow server  |
| **Use case**                   | Local use, isolated model management                         | Collaborative projects, production deployment, model comparison |


#### Storage format 

When we save or log a model using MLflow, it is stored in a specific directory structure that includes all the necessary files to recreate the model. This directory will contain:

- **MLmodel**: The main descriptor file. It records metadata about the model, including the format version, the flavors the model can be loaded with, and any relevant dependencies or signatures.
- **model.pkl**: The serialized model artifact (specific to `sklearn` in this case).
- **Managing model dependencies**: MLflow automatically captures and manages the dependencies required to run the model. Files for managing dependencies:
    - **conda.yaml**: Specifies the Conda environment with the dependencies required to run the model. It is particularly useful when deploying the model in environments where Conda is the preferred package manager. For example:
         ```yaml
         channels:
         - defaults
         dependencies:
         - python=3.8.5
         - scikit-learn=0.24.1
         - pip
         - pip:
           - mlflow
         ```
         
    - **requirements.txt**: This file lists the Python packages required to run the model, which can be installed using pip. For example:
         ```
         scikit-learn==0.24.1
         mlflow==1.19.0
         ```

    - **python_env.yaml**: Similar to `conda.yaml`, this file lists the Python version and dependencies in a format that can be used to recreate the environment. For example:

         ```yaml
         python: 3.8.5
         dependencies:
           - pip==20.2.4
           - scikit-learn==0.24.1
           - mlflow==1.19.0
         ```


### Model signatures and input examples
Using model signatures and input examples is a best practice when logging models in MLflow.

#### Model signatures
A model signature is a description of the model's inputs and outputs. It helps to define the schema of the data that should be passed to the model during inference. The signature includes information about the types and shapes of input features and the types and shapes of the output predictions. During deployment, the model can validate incoming data against its signature, ensuring that the inputs are correctly formatted. The signature also acts as documentation, clearly defining what inputs the model expects, which is particularly useful when sharing models.
- **Supported signature types**:
  - **DataFrame-based signatures**: MLflow supports signatures for models that take structured data as input, such as pandas DataFrames or numpy arrays. The signature can include various data types like integers, floats, strings, etc.
  - **Tensor-based signatures**: MLflow also supports signatures for models that expect tensor inputs, which are common in deep learning models built with frameworks like TensorFlow or PyTorch. Tensor-based signatures capture the shape and type of tensor inputs and outputs, ensuring that the model receives data in the expected format.
- **Signature enforcement**: When a model has a signature, MLflow can enforce this schema during model serving to ensure that the input data matches the expected format. This helps to avoid errors and makes the model more robust when deployed. This is not exactly the same as type casting, where data types are automatically converted to match the expected format. Instead, signature enforcement ensures that the input data strictly adheres to the predefined format (e.g., the correct number of features, data types like integers or floats, and even the shape of the input data in tensor-based models).


#### Input examples
An input example is a sample input that illustrates the kind of data the model expects. This is particularly useful when sharing models, as it provides immediate insight into the model’s input structure. Input examples are stored alongside the model and can be used to generate example requests in APIs or to validate input data during deployment. They are also useful for testing the model's deployment environment, ensuring that the model can process real-world data as expected.

Let’s see how to use model signatures and input examples within an MLflow run. We’ll train a simple logistic regression model and log it with both a signature and an input example.

In [7]:
# Convert to pandas DataFrame for better readability in signatures
X_train_df = pd.DataFrame(X_train, columns=iris.feature_names)
X_test_df = pd.DataFrame(X_test, columns=iris.feature_names)

# Initialize the model
model = LogisticRegression(max_iter=200)

# Start an MLflow run
with mlflow.start_run(run_name="Logistic Regression with Signature and Example"):

    # Train a simple logistic regression model
    model.fit(X_train_df, y_train)
    
    # Predict on the test set
    y_pred = model.predict(X_test_df)
    
    # Infer the model signature
    signature = infer_signature(X_train_df, y_pred)
    
    # Define an input example using a subset of the test set
    input_example = X_test_df.iloc[:2]

    # Log the model with the signature and input example
    mlflow.sklearn.log_model(
        model,
        artifact_path="logistic_regression_model",
        signature=signature,
        input_example=input_example
    )

    # Log some metrics
    accuracy = (y_pred == y_test).mean()
    mlflow.log_metric("accuracy", accuracy)
    
    print(f"Model logged with accuracy: {accuracy}")

Model logged with accuracy: 1.0


- **Model signature**: The signature of the model is inferred using `infer_signature(X_train_df, y_pred)`. This captures the schema of the input dataframe (`X_train_df`) and the output predictions (`y_pred`). The signature will ensure that any input data provided to the model in the future adheres to the expected format and data types.
- **Input example**: The input example is defined using a small subset of the test data (`X_test_df.iloc[:2]`). This example is logged alongside the model and serves as a reference for what kind of input the model expects.
- **Logging the model**: The model is logged using `mlflow.sklearn.log_model`, where we specify the model itself, the `artifact_path` (which is where the model and related files will be stored), the `signature`, and the `input_example`.

**Storage format changes after logging signatures and input examples**

When we log a model with signatures and input examples, additional files and information are stored to capture these details. Here's what gets added:
- **`input_example.json`**: This file contains an example of the input data that the model expects. It helps users or systems interacting with the model understand the format and structure of the input data. For instance, if our model expects a pandas DataFrame, this file will store a sample DataFrame in a JSON format.
- **`serving_input_payload.json`**: This file stores the input data in a format suitable for model serving. It's typically used during the deployment phase to ensure that the inputs are correctly formatted according to the model's signature. It can serve as a reference for how the input should look when the model is deployed as a REST API.
- **Extra information in the `MLmodel` file**: The `MLmodel` file is a configuration file that contains metadata about the logged model. When we log a model with a signature and input examples, additional information is added to this file, such as:
  - **`saved_input_example_info`**: This section contains metadata about the saved input example, including:
      - **`artifact_path`**: The path to the `input_example.json` file within the model's directory.
      - **`pandas_orient`**: The format in which the pandas DataFrame is serialized (e.g., 'split', 'records').
      - **`serving_input_path`**: The path to the `serving_input_payload.json` file.
      - **`type`**: The type of input data (e.g., 'dataframe' in this case).
  - **`signature`**: This section describes the expected inputs and outputs of the model. It lists:
      - **`inputs`**: A detailed schema of the expected input, including the data types and feature names.
      - **`outputs`**: A schema of the expected output, including the data type and shape (e.g., a tensor with a specific shape).

### Model flavors in MLflow
MLflow model flavors are a key concept that makes it easier to work with machine learning models across different frameworks and tools. A flavor in MLflow refers to a specific format or interface for saving and loading models. Each flavor corresponds to a particular machine learning library or framework, and it allows models saved in that flavor to be easily loaded and used by the same or compatible libraries.
- Interoperability: Flavors enable a model to be used in different environments and frameworks without needing to modify the model itself. For example, a model trained with scikit-learn can be loaded and used in an environment that supports scikit-learn, regardless of where the model was trained.
- Deployment flexibility: When we save a model in MLflow, it can be saved with multiple flavors, making it easier to deploy the model in different contexts. For instance, a model saved with a Python flavor can be deployed in a Python environment, while the same model saved with an MLeap flavor can be deployed in a Java environment.
- Consistency: Flavors provide a standardized way of saving and loading models, ensuring that the process is consistent across different frameworks.

##### Commonly used flavors
- **Python Function (pyfunc)**: The most general flavor that supports models written in Python. This is a catch-all flavor that allows models to be deployed as a generic Python function, regardless of the original framework used. By wrapping the model in a Python function interface, it becomes possible to use the model with different libraries and environments.
- **Scikit-learn**: A specific flavor for models trained with the scikit-learn library. This flavor serializes the fitted estimator in a format that is natively supported by scikit-learn, so the logged model can be directly loaded and used with scikit-learn's API. The flavor's metadata records scikit-learn-specific details such as the library version and the serialization format; hyperparameters travel with the serialized estimator itself.
- **TensorFlow**: A flavor for models trained with TensorFlow. Models logged with this flavor are saved in a way that is compatible with TensorFlow’s APIs, allowing for seamless integration and further training within TensorFlow.
- **PyTorch**: A flavor for models trained with PyTorch.
- **Spark MLlib**: A flavor for models trained with Spark MLlib.
- **MLeap**: A flavor that enables models to be serialized into a format that can be served in a JVM environment, often used with Spark.

Each framework-specific flavor is tied to its framework. For example, a model trained in TensorFlow is logged with the TensorFlow flavor; it cannot be logged with the PyTorch flavor because these flavors are tightly coupled with their respective frameworks. In addition, the built-in flavors also record a `python_function` (`pyfunc`) flavor in the same `MLmodel` file, so the same model (regardless of whether it was trained in TensorFlow, PyTorch, or another framework) can be loaded with `mlflow.pyfunc.load_model` and called through a uniform `predict` interface. This lets serving and evaluation tools treat every model the same way without framework-specific code. Note that `pyfunc` is an interface, not a conversion: loading a TensorFlow model through `pyfunc` still requires TensorFlow to be installed in the environment where the model is used or deployed.

Let's go through an example where we save and load a model using multiple flavors in MLflow. We'll train a simple scikit-learn model, log it with different flavors, and demonstrate how to load the model using a specific flavor.

In [8]:
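# Minimal pyfunc wrapper (illustrative name) that delegates to the fitted scikit-learn model
class SklearnWrapper(mlflow.pyfunc.PythonModel):
    def __init__(self, sk_model):
        self.sk_model = sk_model

    def predict(self, context, model_input):
        return self.sk_model.predict(model_input)

# Train a simple logistic regression model
model = LogisticRegression(max_iter=200)

# Start an MLflow run
with mlflow.start_run(run_name="Logistic Regression with Flavors") as run:
    # Fit the model
    model.fit(X_train, y_train)

    # Log the model with scikit-learn flavor
    mlflow.sklearn.log_model(model, "sklearn_model")

    # Log the model with a custom pyfunc flavor; a bare PythonModel() has no
    # usable predict method, so we pass the wrapper defined above
    mlflow.pyfunc.log_model("pyfunc_model", python_model=SklearnWrapper(model))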

    print("Model logged with multiple flavors!")

# Get the last run_id
run_id = run.info.run_id

# Loading the model using scikit-learn flavor
loaded_sklearn_model = mlflow.sklearn.load_model(f"runs:/{run_id}/sklearn_model")
print(f"Loaded model with sklearn flavor: {loaded_sklearn_model}")

# Loading the model using pyfunc flavor
loaded_pyfunc_model = mlflow.pyfunc.load_model(f"runs:/{run_id}/pyfunc_model")
print(f"Loaded model with pyfunc flavor: {loaded_pyfunc_model}")

Model logged with multiple flavors!
Loaded model with sklearn flavor: LogisticRegression(max_iter=200)
Loaded model with pyfunc flavor: mlflow.pyfunc.loaded_model:
  artifact_path: pyfunc_model
  flavor: mlflow.pyfunc.model
  run_id: f92c7c36f4bd42feb3121ffc7e418eaa



- **Logging the model with different flavors**:
   - **Scikit-learn flavor** (`mlflow.sklearn.log_model`): This command logs the model using the scikit-learn flavor. This means the model will be saved in a format that can be directly loaded and used by scikit-learn.
   - **Python function (pyfunc) flavor** (`mlflow.pyfunc.log_model`): This command logs the model through the generic Python function interface. The `pyfunc` flavor allows the model to be loaded and called through a uniform `predict` method, regardless of the framework behind it.

- **Loading the model with specific flavors**:
   - **Scikit-learn flavor** (`mlflow.sklearn.load_model`): This loads the model using the scikit-learn flavor, allowing us to use it just like any other scikit-learn model.
   - **Python function (pyfunc) flavor** (`mlflow.pyfunc.load_model`): This loads the model using the pyfunc flavor. The model can now be used as a generic Python function, making it more versatile. Once loaded, the model is accessible as a Python object with a `predict` method.

### Model evaluation

Model evaluation involves assessing the performance of a trained model using various metrics and visualizations to understand how well the model is likely to perform on unseen data. MLflow provides built-in tools for evaluating models, making it easier to compare different models and select the best one. When you evaluate a model using MLflow, the following gets logged:
- **Metrics**: Quantitative evaluation metrics like accuracy, precision, recall, etc.
- **Artifacts**: Visual artifacts like confusion matrices, ROC and precision-recall curves, feature importance plots, etc., that help in understanding the model’s performance.
- **Model explainability**: If enabled, additional explainability tools such as SHAP values might be logged to help in interpreting the model’s predictions.

MLflow's `mlflow.evaluate` function automatically computes metrics and generates evaluation artifacts for a given model and dataset. MLflow supports evaluating models logged with different flavors, including `pyfunc`, `sklearn`, `tensorflow`, and others.

In [9]:
# Initialize the model
model = LogisticRegression(max_iter=200)

# Start an MLflow run; the context manager ends the run even if evaluation fails
with mlflow.start_run(run_name="Logistic Regression Model Evaluation") as run:
    # Train the model
    model.fit(X_train, y_train)

    # Log the model
    mlflow.sklearn.log_model(model, "logistic_regression_model")

    # Evaluate the model using mlflow.evaluate
    evaluation_results = mlflow.evaluate(
        model=f"runs:/{run.info.run_id}/logistic_regression_model",  # Model URI as string
        data=X_test,  # Test data
        targets=y_test,  # True labels
        model_type="classifier",  # Model type as classifier
        evaluators=["default"],  # Default evaluator
        evaluator_config={"default": {"log_model_explainability": True}},  # Additional configurations
    )

    # Print the evaluation results
    print(f"Evaluation metrics: {evaluation_results.metrics}")
    print(f"Evaluation artifacts: {evaluation_results.artifacts.keys()}")

Evaluation metrics: {'score': 1.0, 'example_count': 30, 'accuracy_score': 1.0, 'recall_score': 1.0, 'precision_score': 1.0, 'f1_score': 1.0, 'log_loss': 0.11128299003032528, 'roc_auc': 1.0}
Evaluation artifacts: dict_keys(['roc_curve_plot', 'precision_recall_curve_plot', 'per_class_metrics', 'confusion_matrix', 'shap_beeswarm_plot', 'shap_summary_plot', 'shap_feature_importance_plot'])


**Evaluating the model**: We use `mlflow.evaluate` to evaluate the model on the test data, and automatically log the results to the MLflow Tracking server.
- **`model`**: The model to be evaluated, which was logged earlier. It points to the logged model using its URI with the appropriate run ID. This is a string that MLflow can use to locate and load the model internally during the evaluation process.
- **`data`**: The input data (in this case, `X_test`) used for evaluation. This can be a numpy array, pandas DataFrame, or other supported data formats depending on the model type.
- **`targets`**: The true labels (`y_test`) corresponding to the input data.
- **`model_type`**: Specifies the type of model being evaluated, such as `"classifier"` for classification models or `"regressor"` for regression models. This ensures that appropriate metrics and evaluation techniques are applied.
- **`evaluators`**: A list of evaluators to use during evaluation. In this case, we use `"default"`, MLflow's built-in evaluator, which computes metrics and artifacts appropriate to the model type. Custom evaluators can also be registered and referenced by name.
- **`evaluator_config`**: A dictionary for configuring the evaluators. In this example, we enable `"log_model_explainability"` by setting it to `True`, which allows logging of explainability artifacts like SHAP values (this requires the `shap` library to be installed).
- **Additional parameters**:
  - **`custom_metrics`**: Custom metric functions. These allow us to compute and log additional metrics or generate specialized artifacts beyond the default ones provided by MLflow. The exact argument depends on the MLflow version: newer releases split this into `extra_metrics` and `custom_artifacts`.


**Evaluation results**: The results of the evaluation include metrics and artifacts. Metrics might include accuracy, precision, recall, etc., while artifacts could include visualizations and tables like confusion matrices or feature importance plots. The results are returned as an `EvaluationResult` object, whose `metrics` attribute is a dictionary of metric values and whose `artifacts` attribute is a dictionary of the generated artifacts.

#### Evaluating a model with a baseline model
In addition to evaluating a model's performance on its own, MLflow allows us to compare it against a baseline model. This comparison helps in understanding how much better (or worse) the candidate model is compared to a simple or well-understood baseline. Using the `mlflow.evaluate` function, we can also apply validation thresholds, which enforce certain performance criteria for the model. This approach to evaluation ensures that our model is not only accurate but also meaningfully better than a basic model.
- **Baseline model**: The baseline model is typically a simple model that represents the minimum performance we would expect. It serves as a reference point to judge the candidate model's effectiveness. For instance, in a classification problem, a `DummyClassifier` could be used as the baseline model.
- **Validation thresholds**: These thresholds ensure that the candidate model not only performs well but also significantly outperforms the baseline. If the candidate model fails to meet these criteria, it might not be considered suitable for production, even if it has a high overall accuracy.

In [10]:
# Initialize the candidate model (Logistic Regression)
candidate_model = LogisticRegression(max_iter=200)

# Initialize the baseline model (Dummy Classifier)
baseline_model = DummyClassifier(strategy="uniform")

# Start an MLflow run
with mlflow.start_run(run_name="Model Evaluation with Baseline Comparison") as run:
    # Train the candidate model
    candidate_model.fit(X_train, y_train)

    # Train the baseline model
    baseline_model.fit(X_train, y_train)

    # Log the candidate model
    candidate_model_uri = mlflow.sklearn.log_model(candidate_model, "candidate_model").model_uri

    # Log the baseline model
    baseline_model_uri = mlflow.sklearn.log_model(baseline_model, "baseline_model").model_uri

    # Define validation thresholds
    thresholds = {
        "accuracy_score": mlflow.models.MetricThreshold(
            threshold=0.8,  # Candidate model accuracy should be >= 0.8
            min_absolute_change=0.05,  # Accuracy should be at least 0.05 greater than baseline accuracy
            min_relative_change=0.05,  # Accuracy should be at least 5% greater than baseline accuracy
            greater_is_better=True,
        ),
    }

    # Evaluate the candidate model against the baseline model
    evaluation_results = mlflow.evaluate(
        model=candidate_model_uri,
        data=X_test, # Test data
        targets=y_test, # True labels
        model_type="classifier",
        validation_thresholds=thresholds,
        baseline_model=baseline_model_uri,
        evaluators=["default"], # Default evaluator
    )

    # Print the evaluation results
    print(f"Evaluation metrics: {evaluation_results.metrics}")
    print(f"Evaluation artifacts: {evaluation_results.artifacts.keys()}")

Evaluation metrics: {'score': 1.0, 'example_count': 30, 'accuracy_score': 1.0, 'recall_score': 1.0, 'precision_score': 1.0, 'f1_score': 1.0, 'log_loss': 0.11128299003032528, 'roc_auc': 1.0}
Evaluation artifacts: dict_keys(['roc_curve_plot', 'precision_recall_curve_plot', 'per_class_metrics', 'confusion_matrix'])


- **`model`**: The candidate model URI, which is the model we want to evaluate against a baseline. This URI is obtained after logging the model with MLflow.
- **`data`**: The evaluation dataset, which includes the features (`X_test`).
- **`targets`**: Specifies the true labels (`y_test`).
- **`model_type`**: Indicates the type of model, `"classifier"` in this case, since we are working with a classification problem.
- **`validation_thresholds`**: This is a dictionary that defines thresholds and validation criteria for the evaluation metrics. In this example, the candidate model's accuracy must be at least 0.8, and it must exceed the baseline model's accuracy by at least 0.05 in absolute terms and by at least 5% in relative terms.
- **`baseline_model`**: The URI of the baseline model, which is a simple model that serves as a benchmark. In this example, it's a `DummyClassifier` that predicts uniformly at random.
- **`evaluators`**: Specifies the evaluators to use. `"default"` is used to automatically apply appropriate evaluation metrics and generate artifacts.
- **`evaluator_config`**: Allows additional configurations, such as enabling model explainability tools (e.g., SHAP values). It is not passed in this example, so the evaluator's default configuration applies.
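
The three threshold criteria can be restated as plain arithmetic. The function below is an illustrative sketch for a greater-is-better metric, not MLflow's implementation: the name `passes_validation` is hypothetical, and treating a baseline of 0 as failing the relative check is a simplifying assumption.

```python
def passes_validation(candidate, baseline, threshold,
                      min_absolute_change, min_relative_change):
    """Simplified restatement of the three checks (greater is better)."""
    # 1. The candidate must reach the absolute floor
    meets_threshold = candidate >= threshold
    # 2. The candidate must beat the baseline by a fixed margin
    meets_absolute = (candidate - baseline) >= min_absolute_change
    # 3. The candidate must beat the baseline by a fraction of the baseline
    meets_relative = (
        baseline > 0
        and (candidate - baseline) / baseline >= min_relative_change
    )
    return bool(meets_threshold and meets_absolute and meets_relative)


# Same limits as in the example above
limits = dict(threshold=0.8, min_absolute_change=0.05, min_relative_change=0.05)

strong = passes_validation(1.0, 0.30, **limits)       # clears all three checks
marginal = passes_validation(0.82, 0.80, **limits)    # only 0.02 above baseline
below_floor = passes_validation(0.75, 0.30, **limits) # under the 0.8 floor

print(strong, marginal, below_floor)  # prints: True False False
```

The `marginal` case shows why the baseline comparison matters: a candidate can clear the 0.8 floor and still fail validation because it barely improves on the baseline.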