# Week 1 assignments
**Please do the assignments using the `mlops_eng` environment.**

In this week's assignments, you will train a LightGBM regression model to predict public bike sharing demand from attributes such as datetime and weather conditions. The dataset ("bike_sharing_demand.csv", located in the same directory as this notebook) is a preprocessed version of [this Kaggle dataset](https://www.kaggle.com/competitions/bike-sharing-demand/overview). You'll learn more about data preprocessing next week. Additionally, you'll use MLflow to track the model training and Deepchecks to evaluate the trained model.

**Guidelines for submitting assignments**:
- For each assignment, a code skeleton is provided. Please put your solutions between the `### START CODE HERE` and `### END CODE HERE` code comments. Please **do not change any code outside the `### START CODE HERE` and `### END CODE HERE` comments**. Otherwise, your notebook may not pass the tests used in grading.
- Some assignments also require you to capture screenshots in order to earn points. Please put all your screenshots into a single PDF file. For each screenshot, please clearly indicate which assignment it corresponds to in your PDF file.
- Please return this notebook and the PDF file containing your screenshots as your submission. 

In case type hints are new to you, some code skeletons contain annotations such as the following:
```python
def greeting(name: str) -> str:
    return 'Hello ' + name  
```
The annotation `name: str` means the parameter `name` is expected to be of type `str` and `-> str` means the type of the returned value is also `str`. These type hints help you understand the function's input requirements and expected output in the assignments.
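
The skeletons in this notebook also use `Tuple` and `Dict` from the `typing` module. The following is a minimal sketch of how such annotations read; the function names `split_name` and `describe` are hypothetical and only serve as illustration.

```python
from typing import Any, Dict, Tuple


def split_name(full_name: str) -> Tuple[str, str]:
    # Returns a tuple of two strings: (first name, rest of the name)
    first, last = full_name.split(" ", 1)
    return first, last


def describe(settings: Dict[str, Any]) -> str:
    # Accepts a dictionary with string keys and values of any type
    return ", ".join(f"{key}={value}" for key, value in settings.items())


print(split_name("Ada Lovelace"))
print(describe({"num_leaves": 63, "learning_rate": 0.05}))
```

Type hints are not enforced at runtime; they only document what a function expects and returns.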

## Assignment 0: Set up the course environment (4 points)
You can earn 4 points for successfully setting up the course environment. To do so, simply assign "yes" to the `setup_ok` variable below.

In [1]:
# TODO: setup_ok = ___
### START CODE HERE
setup_ok = "yes"
### END CODE HERE

In [2]:
assert setup_ok.lower() == "yes"

In [3]:
import pandas as pd
from lightgbm import LGBMRegressor
import mlflow
from deepchecks import SuiteResult, CheckResult
from deepchecks.tabular import Suite
from deepchecks.tabular.checks import TrainTestPerformance, ModelInferenceTime, MultiModelPerformanceReport
from deepchecks.tabular import Dataset
from pathlib import Path
import os
import pickle
import boto3
import logging
import warnings

from typing import Tuple, Dict, Any

PARENT_DIR = Path("").parent.absolute()

# Suppress boto3 logging
boto3.set_stream_logger(name='botocore.credentials', level=logging.ERROR)

In [4]:
# This is only used for grading purposes
def is_being_graded():
    """
    Returns True if the notebook is being executed by the auto-grading tool.
    """
    env = os.environ.get("NBGRADER_EXECUTION")
    return env == "autograde" or env == "validate"


# Suppress logging and warnings when grading the notebook
if is_being_graded():
    loggers = [logging.getLogger(name) for name in logging.root.manager.loggerDict]
    for logger in loggers:
        logger.setLevel(logging.ERROR)
    mlflow.utils.logging_utils.disable_logging()
    warnings.filterwarnings("ignore")

In [5]:
def delete_file_if_existing(filename: str):
    """
    Delete a file if it exists
    """
    if os.path.exists(filename):
        print(f"Delete the existing {filename}")
        os.remove(filename)


## Assignment 1: Download and split the dataset (2 points)
Please note that the dataset (bike_sharing_demand.csv) contains data collected over two years, so if you explore the dataset you'll find that most timestamps (hour-day-month combinations) appear more than once.
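
One way to see such repeated timestamps is to look for rows that share the same `hour`, `day` and `month` values. The sketch below uses a small made-up DataFrame (the values are hypothetical, not taken from the real dataset); the same calls could be applied to the loaded dataset.

```python
import pandas as pd

# Hypothetical toy data: two "years" overlap on hours 0 and 1 of the same day
toy = pd.DataFrame(
    {
        "hour": [0, 1, 0, 1, 2],
        "day": [1, 1, 1, 1, 1],
        "month": [1, 1, 1, 1, 1],
        "count": [16, 40, 22, 35, 9],
    }
)

keys = ["hour", "day", "month"]

# Mark every row whose (hour, day, month) combination occurs more than once
dup_mask = toy.duplicated(subset=keys, keep=False)
print(toy[dup_mask])

# Count how often each combination occurs
occurrences = toy.groupby(keys).size()
print(occurrences)
```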

### 1a) Load the data
First, implement the `pull_data` function that loads a CSV as a Pandas DataFrame from a given location.

In [6]:
def pull_data(dataset_path: Path) -> pd.DataFrame:
    """
    Load the dataset from a given path
    Args: 
        dataset_path (Path): Path of the CSV 
    Returns:
        A Pandas DataFrame of the dataset
    """
    ### START CODE HERE
    return pd.read_csv(dataset_path)
    ### END CODE HERE


In [7]:
# You can use this code cell to check if your pull_data function works correctly
dataset_path = PARENT_DIR / "bike_sharing_demand.csv"
df = pull_data(dataset_path)
assert df.shape == (10886, 12)


In [8]:
# Show a concise summary of the DataFrame
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10886 entries, 0 to 10885
Data columns (total 12 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   season      10886 non-null  int64  
 1   holiday     10886 non-null  int64  
 2   workingday  10886 non-null  int64  
 3   weather     10886 non-null  int64  
 4   temp        10886 non-null  float64
 5   atemp       10886 non-null  float64
 6   humidity    10886 non-null  int64  
 7   windspeed   10886 non-null  float64
 8   count       10886 non-null  int64  
 9   hour        10886 non-null  int64  
 10  day         10886 non-null  int64  
 11  month       10886 non-null  int64  
dtypes: float64(3), int64(9)
memory usage: 1020.7 KB


<details>
    <summary>Expected output</summary>
    <img src="./images/dataset-info.png"/>
</details>

Below is the explanation of each column in the dataset:

**Variables**:

| Column name |  Explanation | type |
|-------------|---------------|----|
| season      | 1 = spring, 2 = summer, 3 = fall, 4 = winter | integer
| holiday     | whether the day is considered a holiday | integer
| workingday  | 1 if day is neither weekend nor holiday, otherwise 0. | integer
| weather     | 1: Clear, Few clouds, Partly cloudy; 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist; 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds; 4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog | integer
| temp        | temperature in Celsius | float
| atemp       | "feels like" temperature in Celsius | float
| humidity    | relative humidity | integer
| windspeed   | wind speed | float
| hour        | the hours of the datetime| integer
| day         | the day of the datetime| integer
| month       | the month of the datetime| integer

**Targets**: 

| Column name | Explanation                                     | Type
|-------------|-------------------------------------------------| ----|
| count       | number of total rentals                         | integer

### 1b) Split the data into train and test DataFrames
Then implement the `splitData` function that splits the dataset into a training and a test dataset, using the last 168 rows of the dataset as the test data.
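
The number 168 equals 7 × 24, so if each row represents one hour (an assumption here), the test set covers roughly the final week of data. The following sketch illustrates a tail-based split on a hypothetical toy DataFrame; it keeps the row order so that the test rows are the most recent ones.

```python
import pandas as pd

HOURS_PER_WEEK = 7 * 24  # 168

# Hypothetical toy frame with 10 days of hourly rows
toy = pd.DataFrame({"hour": [h % 24 for h in range(240)], "count": range(240)})

# Everything except the last 168 rows is used for training
train_part = toy.iloc[:-HOURS_PER_WEEK]
# The last 168 rows are held out for testing
test_part = toy.iloc[-HOURS_PER_WEEK:]

print(train_part.shape, test_part.shape)
```

Splitting by position rather than at random avoids training on rows that come after the rows used for testing.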

In [9]:
def splitData(input_df: pd.DataFrame) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """
    Split a DataFrame into training and testing sets
    Args:
        input_df (DataFrame): The DataFrame to be split
    Return:
        A tuple of training and testing DataFrame
    """
    ### START CODE HERE
    # Use the last 168 rows as the test set
    test_df = input_df.tail(168)
    
    # Use the remaining rows as the training set
    train_df = input_df.iloc[:-168]
    
    return train_df, test_df
    ### END CODE HERE

In [10]:
df = pull_data(dataset_path)
train, test = splitData(df)

In [11]:
# Check if train and test DataFrames are split correctly
expected_train_shape = (10718, 12)
expected_test_shape = (168, 12)

assert (
    train.shape == expected_train_shape
), "The dimension of the training dataset is not correct"
assert (
    test.shape == expected_test_shape
), "The dimension of the testing dataset is not correct"

expected_columns = [
    "season",
    "holiday",
    "workingday",
    "weather",
    "temp",
    "atemp",
    "humidity",
    "windspeed",
    "count",
    "hour",
    "day",
    "month",
]
assert set(train.columns) == set(
    expected_columns
), "The columns of the training dataset are not correct"
assert set(test.columns) == set(
    expected_columns
), "The columns of the testing dataset are not correct"

In [12]:
# Split the training and testing DataFrames into features and targets
target = "count"
input_df = pull_data(PARENT_DIR / "bike_sharing_demand.csv")
train, test = splitData(input_df)
train_x = train.drop([target], axis=1)
test_x = test.drop([target], axis=1)
train_y = train[[target]]
test_y = test[[target]]

## Assignment 2: Offline model evaluation using Deepchecks (2 points)

### 2a) Construct the Dataset objects used by Deepchecks
First, let's construct the Deepchecks Dataset objects (named `train_dataset` and `test_dataset`) from the training and testing DataFrames.

**Note**: Please use the `categorical_features` variable given below to specify the categorical features when you construct the datasets handled by Deepchecks. Please check [here](https://docs.deepchecks.com/stable/tabular/usage_guides/dataset_object.html) for more details. 

In [13]:
# Categorical features, remember to specify the categorical features when you construct the datasets handled by Deepchecks
# See https://docs.deepchecks.com/stable/tabular/usage_guides/dataset_object.html
categorical_features = ["season", "holiday", "workingday", "weather", "hour", "day", "month"]

# TODO: train_dataset = ...
# test_dataset = ...
### START CODE HERE
train_dataset = Dataset(train_x, label=train_y, cat_features=categorical_features)
test_dataset = Dataset(test_x, label=test_y, cat_features=categorical_features)
### END CODE HERE

In [14]:
# Categorical features should be specified in train_dataset and test_dataset
assert sorted(train_dataset.cat_features) == sorted(categorical_features), "The categorical features of train_dataset are not specified correctly"
assert sorted(test_dataset.cat_features) == sorted(categorical_features),   "The categorical features of test_dataset are not specified correctly"

# train_dataset and test_dataset should have the correct feature and label columns
assert (train_dataset.features_columns.shape) == (10718, 11), "The features columns of train_dataset are not correct"
assert (test_dataset.features_columns.shape) == (168, 11), "The features columns of test_dataset are not correct"
assert (train_dataset.label_name) == "count", "The label name of train_dataset is not correct"
assert (test_dataset.label_name) == "count", "The label name of test_dataset is not correct"

### 2b) Deepchecks Suite with conditions
Implement the `evaluate` function that uses Deepchecks Suite to perform the following two tests:
1) Evaluate the model's MAE and RMSE on both the training and testing datasets. This test should fail if the MAE or RMSE degrades by more than 20% on the testing dataset compared to the training dataset;
2) Evaluate the model's inference time on both the training and testing datasets. This test should fail if the average inference time per sample exceeds 0.1 seconds.

Finally, this function should return a Deepchecks [SuiteResult](https://docs.deepchecks.com/stable/api/generated/deepchecks.core.SuiteResult.html) containing the evaluation result.

**Hints**:
- [How to add conditions to a test?](https://docs.deepchecks.com/stable/general/usage/customizations/auto_examples/plot_configure_check_conditions.html)
- [Train test performance](https://docs.deepchecks.com/stable/api/generated/deepchecks.tabular.checks.model_evaluation.TrainTestPerformance.html)
- [Model inference time](https://docs.deepchecks.com/stable/tabular/auto_checks/model_evaluation/plot_model_inference_time.html)
- [Condition for comparing model performance between training and testing dataset](https://docs.deepchecks.com/stable/api/generated/deepchecks.tabular.checks.model_evaluation.TrainTestPerformance.add_condition_train_test_relative_degradation_less_than.html#deepchecks.tabular.checks.model_evaluation.TrainTestPerformance.add_condition_train_test_relative_degradation_less_than)
- [Condition for validating inference time](https://docs.deepchecks.com/stable/api/generated/deepchecks.tabular.checks.model_evaluation.ModelInferenceTime.add_condition_inference_time_less_than.html)
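
To build intuition for the 20% degradation condition, here is a minimal sketch of how a relative degradation between a training score and a testing score could be computed. The formula and the scores below are assumptions for illustration; the exact computation inside Deepchecks may differ in its details.

```python
def relative_degradation(train_score: float, test_score: float) -> float:
    """Relative drop of the test score compared to the train score (illustrative definition)."""
    if train_score == 0:
        return 0.0 if test_score == 0 else 1.0
    return (train_score - test_score) / abs(train_score)


THRESHOLD = 0.2  # the 20% limit from the assignment text

# Hypothetical negative-MAE scores (larger is better)
train_neg_mae, test_neg_mae = -40.0, -54.0

degradation = relative_degradation(train_neg_mae, test_neg_mae)
passed = degradation < THRESHOLD
print(f"degradation={degradation:.2f}, passed={passed}")
```

With these made-up scores, the test error is 35% worse than the training error, so the condition would not pass.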

In [15]:
def evaluate(train_dataset: Dataset, test_dataset: Dataset, model: LGBMRegressor) -> SuiteResult:
    """
    Use Deepchecks to evaluate 1) model's MAE and RMSE on both training and testing dataset, 2) model's inference time.
    Args:
        train_dataset (Dataset): training Dataset
        test_dataset (Dataset): testing Dataset
        model (LGBMRegressor): The LightGBM regression model to be evaluated
    Return:
        a Deepchecks SuiteResult that contains the results of a Deepchecks suite run
    """
    
    ### START CODE HERE
    # Initialize the train-test performance check
    performance_check = TrainTestPerformance()
    # Add condition to ensure that MAE and RMSE on test set do not degrade by more than 20% compared to training set
    performance_check.add_condition_train_test_relative_degradation_less_than(0.2)
    
    # Initialize the model inference time check
    inference_time_check = ModelInferenceTime()
    # Add condition to ensure that average inference time is below 0.1 seconds
    inference_time_check.add_condition_inference_time_less_than(0.1)

    suite = Suite("Model Evaluation Suite")
    suite.add(performance_check)
    suite.add(inference_time_check)
    
    # Run the suite on the provided model and datasets
    suite_result = suite.run(train_dataset=train_dataset, test_dataset=test_dataset, model=model)
    
    return suite_result
    ### END CODE HERE


In [16]:
# We provide a testing model trained on the same bike demand dataset to help you check if your evaluate function works correctly
test_model = pickle.load(open("test-model.pkl", "rb"))
evaluation_result = evaluate(train_dataset, test_dataset, test_model)

# These tests should pass
failed_checks = evaluation_result.get_not_passed_checks()
assert len(failed_checks) == 1, "The number of failed checks in the evaluation result is not correct"
failed_condition_result = failed_checks[0].conditions_results[0]
assert failed_condition_result.name == "Train-Test scores relative degradation is less than 0.2", "The condition for comparing model performance between the training and test dataset is not correct"

passed_checks = evaluation_result.get_passed_checks()
assert len(passed_checks) == 2, "The number of passed checks in the evaluation result is not correct"
passed_condition_result = passed_checks[0].conditions_results[0]
assert passed_condition_result.name == "Average model inference time for one sample is less than 0.1",  "The condition for evaluating model inference time is not correct"

In [17]:
# Export the evaluation results to an HTML file
evaluation_result_file = "test-result.html"
delete_file_if_existing(evaluation_result_file)

evaluation_result_file = evaluation_result.save_as_html(evaluation_result_file)
print(f"The evaluation result is saved in an HTML file named {evaluation_result_file}")

Delete the existing test-result.html
The evaluation result is saved in an HTML file named test-result.html


After running the above code cell, you should see a file named "test-result.html" appear under the same directory as this notebook.

<details>
    <summary> Expected output when you open the file in your browser </summary>
    <br />
    The test of MAE/RMSE should fail:
    <br />
    <img src="./images/deepchecks-train-test-performance.png">
    <br />
    The test of inference time should pass:
    <br />
    <img src="./images/deepchecks-inference-time.png">
</details>

### Screenshots to be submitted for Assignment 2
Like the expected output above, please submit screenshots of the web page showing the passed and failed test(s).

## Assignment 3: Tracking model training in MLflow (2 points)
Similar to what you see in the MLflow tutorial, please complete the `log_to_mlflow` function that performs the following tasks:
1. Use LightGBM to train a regression model to predict the bike sharing demand. The model should be trained using the training DataFrame you prepared previously and with the hyperparameters given as an argument. The hyperparameters are given as a dictionary, e.g. `hyperparams = {"num_leaves": 63, "learning_rate": 0.05, "random_state": 42}`. 
1. In an MLflow Run, use MLflow to track the model training:
    1. Log the used hyperparameters to MLflow, using the keys in the `hyperparams` dictionary as the parameter names as shown below (check the "Parameters" column in the screenshot below).
    2. Use the `evaluate` function you created above to evaluate the trained model. Then export the evaluation results to an HTML file and upload the file to MLflow. The HTML file of the Deepchecks model evaluation results uploaded to MLflow should be named **"evaluation_result.html"**. Please also make sure that the file you upload to MLflow is not located inside any other directory under the MLflow Run.
    3. Register the trained model to MLflow. The registered model should be named **"Week1LgbmBikeDemand"** and linked to the MLflow Run. 
    4. Finally, return the Run ID of the MLflow Run. (You can refer to the MLflow tutorial on how to get the Run ID of an MLflow Run.)

More illustration:

<img src="./images/ass3-example.png" width=1200/>

Hints:
* [Log multiple parameters to MLflow](https://mlflow.org/docs/2.9.2/python_api/mlflow.html#mlflow.log_params)
* [Log a local file to MLflow](https://mlflow.org/docs/2.9.2/python_api/mlflow.html#mlflow.log_artifact)
* [Log a model to MLflow](https://mlflow.org/docs/2.9.2/python_api/mlflow.lightgbm.html?highlight=log_model#mlflow.lightgbm.log_model)

In [29]:
# MLflow configuration
MLFLOW_S3_ENDPOINT_URL = "http://mlflow-minio.local"
MLFLOW_TRACKING_URI = "http://mlflow-server.local"
AWS_ACCESS_KEY_ID = "minioadmin"
AWS_SECRET_ACCESS_KEY = "minioadmin"
mlflow_experiment_name = "week1-lgbm-bike-demand"

os.environ["MLFLOW_S3_ENDPOINT_URL"] = MLFLOW_S3_ENDPOINT_URL
os.environ["AWS_ACCESS_KEY_ID"] = AWS_ACCESS_KEY_ID
os.environ["AWS_SECRET_ACCESS_KEY"] = AWS_SECRET_ACCESS_KEY
mlflow.set_tracking_uri(MLFLOW_TRACKING_URI)
mlflow.set_experiment(mlflow_experiment_name)

<Experiment: artifact_location='s3://mlflow/4', creation_time=1730535660424, experiment_id='4', last_update_time=1730535660424, lifecycle_stage='active', name='week1-lgbm-bike-demand', tags={}>

In [30]:
mlflow_evaluation_result_filename = "evaluation_result.html"
registered_model_name = "Week1LgbmBikeDemand"

def log_to_mlflow(hyperparams: Dict[str, Any]) -> str:
    """
    Train a LightGBM model, log the used hyperparameters, upload the Deepchecks evaluation result HTML file and register the trained model to MLflow
    Args:
        hyperparams: The hyperparameters used to train the model
    Returns:
        The MLflow Run ID
    """
    with mlflow.start_run() as run:
        model = LGBMRegressor(**hyperparams)
        model.fit(train_x, train_y, categorical_feature=categorical_features)

        # TODO: 1) Log hyperparameters
        # 2) Use the "evaluate" function to evaluate the model, export the evaluation results to an HTML file and upload the file
        # 3) Register the model
        # 4) Return the MLflow Run ID
        ### START CODE HERE
        # Log hyperparameters to MLflow
        mlflow.log_params(hyperparams)
        
        # Evaluate the trained model and export the results to an HTML file
        suite_result = evaluate(train_dataset, test_dataset, model)
        suite_result.save_as_html(mlflow_evaluation_result_filename)

        # Log the HTML evaluation results file to MLflow
        mlflow.log_artifact(mlflow_evaluation_result_filename)
        
        # Register the model with MLflow
        mlflow.lightgbm.log_model(model, artifact_path="model", registered_model_name=registered_model_name)
        
        # Get and return the Run ID of the MLflow run
        run_id = run.info.run_id
        return run_id
        ### END CODE HERE

In [31]:
# model hyperparameters
hyperparams = {
    "num_leaves": 63,
    "learning_rate": 0.05,
    "random_state": 42
}

delete_file_if_existing(mlflow_evaluation_result_filename)
mlflow_run_id = log_to_mlflow(hyperparams=hyperparams)
print(f"MLflow Run ID: {mlflow_run_id}")

Delete the existing evaluation_result.html
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000291 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 291
[LightGBM] [Info] Number of data points in the train set: 10718, number of used features: 11
[LightGBM] [Info] Start training from score 191.275518


Registered model 'Week1LgbmBikeDemand' already exists. Creating a new version of this model...
2024/11/06 14:06:57 INFO mlflow.store.model_registry.abstract_store: Waiting up to 300 seconds for model version to finish creation. Model name: Week1LgbmBikeDemand, version 4
Created version '4' of model 'Week1LgbmBikeDemand'.
2024/11/06 14:06:57 INFO mlflow.tracking._tracking_service.client: 🏃 View run sedate-sponge-781 at: http://mlflow-server.local/#/experiments/4/runs/8a266fdee8424dd4b6da4881273162bf.
2024/11/06 14:06:57 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: http://mlflow-server.local/#/experiments/4.


MLflow Run ID: 8a266fdee8424dd4b6da4881273162bf


In [32]:
# This test simply checks that your mlflow_run_id is not empty
# Additional tests will be used during the grading so please make sure your implementation satisfies the requirements listed in the assignment instructions
assert len(mlflow_run_id) != 0


### Screenshots to be submitted for Assignment 3
To get the points from Assignment 3, please submit the following screenshots:
1. The logs of your MLflow run. Please include the parameters of the model in your screenshot. 
<details>
    <summary>Example</summary>
    <img src="./images/mlflow-run.png" width=1000>
</details>

2. The details of the MLflow run, including the uploaded Deepchecks evaluation result file;
<details>
    <summary>Example</summary>
    <img src="./images/mlflow-run-detail1.png" width=1000>
    <img src="./images/mlflow-run-detail2.png" width=1000>
    <img src="./images/mlflow-run-detail3.png" width=1000>
</details>

3. The registered model.
<details>
    <summary>Example</summary>
    <img src="./images/mlflow-model.png" width=1000>
</details>

## Assignment 4: Evaluate the trained model against another model (2 points)
Suppose your colleague had trained an ElasticNet model for the same use case of bike sharing demand prediction. In this assignment, please complete the `compare_models` function that performs the following tasks:
1. Use the Deepchecks [Multi model performance report](https://docs.deepchecks.com/stable/tabular/auto_checks/model_evaluation/plot_multi_model_performance_report.html) to compare the MAE and RMSE of your LightGBM model to those of the ElasticNet model, and save the result as an HTML file. **Use negative MAE and negative RMSE as the scorers**.
1. Upload the result file to MLflow. The file should be named **"model_comparison.html"** and placed under the MLflow Run where you trained your LightGBM model in Assignment 3. Similar to `evaluation_result.html`, `model_comparison.html` shouldn't be inside any other folder.
E.g.,

<img src="./images/ass4-example.png" width=300 />

`model_comparison.html` should look like this:

<img src="./images/deepchecks-compare-models.png" >

Finally, the function should return a Deepchecks [CheckResult](https://docs.deepchecks.com/stable/api/generated/deepchecks.core.CheckResult.html) containing the model comparison results. 
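
Error metrics such as MAE and RMSE are negated when used as scorers so that a larger score always means a better model. The sketch below computes both negated metrics by hand for two sets of made-up predictions; the helper names `neg_mae` and `neg_rmse` and all numbers are hypothetical.

```python
import math
from typing import Sequence


def neg_mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    # Negative mean absolute error: closer to zero is better
    return -sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def neg_rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    # Negative root mean squared error: closer to zero is better
    return -math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


y_true = [10, 20, 30, 40]
good_pred = [12, 18, 33, 39]  # small errors
bad_pred = [20, 5, 45, 25]    # large errors

print(neg_mae(y_true, good_pred), neg_mae(y_true, bad_pred))
print(neg_rmse(y_true, good_pred), neg_rmse(y_true, bad_pred))
```

The better predictions yield the larger (less negative) scores, which is why a comparison of negated metrics uses `>` for "better".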

Note that the idea here is to attach a file to an existing MLflow Run, not to create a new MLflow Run and then upload the file under the new MLflow Run. 

You may find the following doc helpful: 
- [mlflow.start_run](https://mlflow.org/docs/2.9.2/python_api/mlflow.html#mlflow.start_run) (Pay attention to the use of the `run_id` parameter).

In [33]:
# This is the model you just trained. In practice, the model can be downloaded from MLflow. Here, for simplicity, we just retrain the model
model = LGBMRegressor(**hyperparams)
model.fit(train_x, train_y, categorical_feature=categorical_features)

# Load the old ElasticNet model
old_model = pickle.load(open("old-model.pkl", "rb"))

[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000260 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 291
[LightGBM] [Info] Number of data points in the train set: 10718, number of used features: 11
[LightGBM] [Info] Start training from score 191.275518


In [56]:

mlflow_model_comparison_result_filename = "model_comparison.html"

def compare_models(mlflow_run_id: str) -> CheckResult:
    """
    Use Deepchecks to compare the performance of the LightGBM model and the old ElasticNet model
    Args:
        mlflow_run_id: The model comparison result file should be uploaded under the MLflow Run whose Run ID is mlflow_run_id
    Return:
        Deepchecks CheckResult that contains the model comparison results
    """

    # TODO: 1) Use Deepchecks to compare your LightGBM model to the ElasticNet model and save the result to an HTML file
    # 2) Upload the result file to MLflow. The file should be under the MLflow Run where you trained your LightGBM model
    # 3) Return the comparison CheckResult
    ### START CODE HERE
    # Create the Deepchecks Dataset objects
    train_ds = Dataset(train_x, train_y, cat_features=categorical_features)
    test_ds = Dataset(test_x, test_y, cat_features=categorical_features)
    
    # Use Deepchecks to compare the performance of the two models
    report = MultiModelPerformanceReport()
    result = report.run(train_ds, test_ds, [model, old_model])
    
    # Print the content of the result DataFrame
    res_df = result.value
    print("DataFrame content:", res_df)
    print("DataFrame columns:", res_df.columns)
    
    # Save the comparison result as an HTML file
    result.save_as_html(mlflow_model_comparison_result_filename)
    
    # Upload the result file under the given MLflow Run
    with mlflow.start_run(run_id=mlflow_run_id):
        mlflow.log_artifact(mlflow_model_comparison_result_filename)
    
    # Return the Deepchecks CheckResult
    return result
    ### END CODE HERE

In [57]:
delete_file_if_existing(mlflow_model_comparison_result_filename)

os.environ["MLFLOW_S3_ENDPOINT_URL"] = MLFLOW_S3_ENDPOINT_URL
os.environ["AWS_ACCESS_KEY_ID"] = AWS_ACCESS_KEY_ID
os.environ["AWS_SECRET_ACCESS_KEY"] = AWS_SECRET_ACCESS_KEY
mlflow.set_tracking_uri(MLFLOW_TRACKING_URI)


Delete the existing model_comparison.html


In [59]:
res = compare_models(mlflow_run_id=mlflow_run_id)

# Check that the returned CheckResult is correct
res_df = res.value
print("DataFrame content:", res_df)
print("DataFrame columns:", res_df.columns)
lgbm_neg_mae = res_df.loc[
    (res_df["Model"] == "LGBMRegressor") & (res_df["Metric"] == "Neg MAE")
]["Value"].values[0]
elasticnet_neg_mae = res_df.loc[
    (res_df["Model"] == "ElasticNet") & (res_df["Metric"] == "Neg MAE")
]["Value"].values[0]
assert (
    lgbm_neg_mae > elasticnet_neg_mae
), "The Deepchecks model comparison report is not correct. The negative MAE of the LightGBM model should be larger than the negative MAE of the ElasticNet model"

lgbm_neg_rmse = res_df.loc[
    (res_df["Model"] == "LGBMRegressor") & (res_df["Metric"] == "Neg RMSE")
]["Value"].values[0]
elasticnet_neg_rmse = res_df.loc[
    (res_df["Model"] == "ElasticNet") & (res_df["Metric"] == "Neg RMSE")
]["Value"].values[0]
assert (
    lgbm_neg_rmse > elasticnet_neg_rmse
), "The Deepchecks model comparison report is not correct. The negative RMSE of the LightGBM model should be larger than the negative RMSE of the ElasticNet model"

delete_file_if_existing(mlflow_model_comparison_result_filename)


DataFrame content:            Model       Value    Metric  Number of samples
0  LGBMRegressor  -73.494141  Neg RMSE                168
1  LGBMRegressor  -54.707325   Neg MAE                168
2  LGBMRegressor    0.812085        R2                168
3     ElasticNet -154.583432  Neg RMSE                168
4     ElasticNet -108.537966   Neg MAE                168
5     ElasticNet    0.168655        R2                168
DataFrame columns: Index(['Model', 'Value', 'Metric', 'Number of samples'], dtype='object')


2024/11/06 14:31:49 INFO mlflow.tracking._tracking_service.client: 🏃 View run sedate-sponge-781 at: http://mlflow-server.local/#/experiments/4/runs/8a266fdee8424dd4b6da4881273162bf.
2024/11/06 14:31:49 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: http://mlflow-server.local/#/experiments/4.


DataFrame content:            Model       Value    Metric  Number of samples
0  LGBMRegressor  -73.494141  Neg RMSE                168
1  LGBMRegressor  -54.707325   Neg MAE                168
2  LGBMRegressor    0.812085        R2                168
3     ElasticNet -154.583432  Neg RMSE                168
4     ElasticNet -108.537966   Neg MAE                168
5     ElasticNet    0.168655        R2                168
DataFrame columns: Index(['Model', 'Value', 'Metric', 'Number of samples'], dtype='object')
Delete the existing model_comparison.html


### Screenshots to be submitted for Assignment 4
To get the points from Assignment 4, please submit a screenshot of the details of the MLflow Run, including the uploaded Deepchecks model comparison result file.
<details>
    <summary>Example</summary>
    <img src="./images/deepchecks-compare-models.png" />
</details>

## What to submit
- This Jupyter notebook
- The PDF file containing your screenshots for Assignments 2, 3, and 4.