# Week6 Assignments (part 3)
This notebook continues Assignment 4 from part 2.

In [6]:
from typing import Any, Dict
import os

import kfp
import kfp.dsl as dsl
from kfp import kubernetes

from evidently.ui.remote import RemoteWorkspace

from utils.utils import init_evidently_project, delete_kfp_exp
from utils.config import (
    EVIDENTLY_MONITOR_URL,
    INPUTS_OUTPUTS_BUCKET_NAME,
    GROUND_TRUTH_BUCKET_NAME,
    REGISTERED_MODEL_NAME,
    COLUMN_MAPPING_DICT
)

### 4d) Create a monitoring KFP pipeline
Use the three KFP components you just created to build a KFP monitoring pipeline. The components should run in the following order:

<img src="./images/kfp_monitoring_pipeline.jpg" width=600 />

(Other inputs the KFP components need are passed as arguments to the `monitoring_pipeline` function.)

**Note**: You need to assign the required credentials to the `combine_ground_truth` and `monitor` tasks so that they can download the files they need from the MinIO service (the one used by MLflow). (Please check the week 5 tutorial for more details.)
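
The credentials are passed to a task as environment variables, one `set_env_variable(name=..., value=...)` call per variable. The following is a minimal sketch of a helper that applies a whole dictionary of variables to a task, so that the same credentials do not have to be repeated for each task. The helper name `apply_env`, the `FakeTask` stand-in, and the placeholder credential values are assumptions made for illustration; they are not part of KFP or of the course utilities.

```python
from typing import Dict, List, Tuple


def apply_env(task, env: Dict[str, str]):
    """Set every name/value pair in `env` on a task and return the task."""
    for name, value in env.items():
        task.set_env_variable(name=name, value=value)
    return task


class FakeTask:
    """Hypothetical stand-in for a KFP task so the sketch runs without a cluster."""

    def __init__(self) -> None:
        self.env: List[Tuple[str, str]] = []

    def set_env_variable(self, name: str, value: str) -> "FakeTask":
        self.env.append((name, value))
        return self


# Placeholder values (assumption): use the credentials of your MinIO deployment.
MINIO_ENV = {
    "AWS_ACCESS_KEY_ID": "<access-key>",
    "AWS_SECRET_ACCESS_KEY": "<secret-key>",
}

task = apply_env(FakeTask(), MINIO_ENV)
print(task.env)
```

Inside a pipeline function, the same helper could be called on the task objects returned by the components. Literal credentials are convenient for a local course cluster; in a shared cluster they would more commonly be read from a Kubernetes Secret.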


Let's first load the KFP components from the YAML files you created in the first part of the assignments.

In [7]:
combine_ground_truth = kfp.components.load_component_from_file(os.path.join("components", "combine_ground_truth.yaml"))
get_production_model_version = kfp.components.load_component_from_file(os.path.join("components", "get_production_model_version.yaml"))
monitor = kfp.components.load_component_from_file(os.path.join("components", "monitor.yaml"))

In [8]:
@dsl.pipeline(
    name="monitoring-pipeline",
    description="Monitoring model performance, target and data drift using Evidently",
)
def monitoring_pipeline(
    year: int,
    quarter: int,
    registered_model_name: str,
    mlflow_tracking_uri: str,
    mlflow_s3_endpoint_url: str,
    inputs_outputs_bucket_name: str,
    ground_truth_bucket_name: str,
    evidently_monitor_uri: str,
    evidently_project_id: str,
    column_mapping_dict: Dict[str, Any],
):
    """
    Args:
        year and quarter: The time range of the data to be monitored
        registered_model_name: The name of the model registered to MLflow
        mlflow_tracking_uri: URI of MLflow's tracking server
        mlflow_s3_endpoint_url: URL of MLflow's artifact store
        inputs_outputs_bucket_name: Name of the bucket where model inputs+outputs data is stored
        ground_truth_bucket_name: Name of the bucket where ground truth is stored
        evidently_monitor_uri: URI of the remote Evidently Workspace
        evidently_project_id: The ID of the Evidently Project where the monitoring results are stored
        column_mapping_dict: A dictionary containing the configuration of the column mapping
    """
    ### START CODE HERE
    ground_truth = combine_ground_truth(
        year=year,
        quarter=quarter,
        s3_endpoint_url=mlflow_s3_endpoint_url,
        inputs_outputs_bucket_name=inputs_outputs_bucket_name,
        ground_truth_bucket_name=ground_truth_bucket_name)
    
    ground_truth.set_env_variable(name="AWS_ACCESS_KEY_ID", value="minioadmin")
    ground_truth.set_env_variable(name="AWS_SECRET_ACCESS_KEY", value="minioadmin")
    
    model_version = get_production_model_version(
        registered_model_name=registered_model_name,
        mlflow_tracking_uri=mlflow_tracking_uri)

    monitoring = monitor(
        evidently_monitor_uri=evidently_monitor_uri,
        evidently_project_id=evidently_project_id,
        column_mapping_dict=column_mapping_dict,
        prod_dataset=ground_truth.outputs["prod_data"],
        prod_model_version=model_version.outputs["model_version"], 
        mlflow_tracking_uri=mlflow_tracking_uri,
        mlflow_s3_endpoint_url=mlflow_s3_endpoint_url,
        mlflow_run_id=model_version.outputs["run_id"],
        year=year,
        quarter=quarter)
    
    monitoring.set_env_variable(name="AWS_ACCESS_KEY_ID", value="minioadmin")
    monitoring.set_env_variable(name="AWS_SECRET_ACCESS_KEY", value="minioadmin")
    ### END CODE HERE

In [9]:
# Init another Evidently Project at the remote Workspace
remote_workspace = RemoteWorkspace(EVIDENTLY_MONITOR_URL)
house_price_project = init_evidently_project(remote_workspace, project_name="house-price-model-monitoring")

arguments = {
    "year": 2018,
    "quarter": 1,
    "registered_model_name": REGISTERED_MODEL_NAME,
    "mlflow_tracking_uri": "http://mlflow.mlflow.svc.cluster.local:5000",
    "mlflow_s3_endpoint_url": "http://mlflow-minio-service.mlflow.svc.cluster.local:9000",
    "inputs_outputs_bucket_name": INPUTS_OUTPUTS_BUCKET_NAME,
    "ground_truth_bucket_name": GROUND_TRUTH_BUCKET_NAME,
    "evidently_monitor_uri": "http://evidently-service.monitoring.svc.cluster.local:8000",
    "evidently_project_id": str(house_price_project.id),
    "column_mapping_dict": COLUMN_MAPPING_DICT
}

run_name = "house-price-monitoring-run"
experiment_name = "house-price-monitoring-experiment"

kfp_client = kfp.Client()

kfp_client.create_run_from_pipeline_func(
    pipeline_func=monitoring_pipeline,
    run_name=run_name,
    experiment_name=experiment_name,
    arguments=arguments, # These are the arguments passed to the pipeline function
    enable_caching=False # Disable caching for this pipeline run
)

RunPipelineResult(run_id=757c7160-1ba9-4b01-a932-4f3456e8026a)

You should see one KFP Run created at [http://ml-pipeline-ui.local](http://ml-pipeline-ui.local) (under the "house-price-monitoring-experiment" KFP Experiment). The KFP Run should look like this:

<img src="./images/monitoring_kfp_run.png" width=650/>
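
Besides watching the UI, the state of a run can be polled from the notebook. The sketch below shows one way to do this, assuming some callable that returns the current state as a string. The helper name `wait_for_terminal_state` and the set of terminal state names are assumptions for illustration, and the demo uses a fake state source so that it runs without a cluster.

```python
import time
from typing import Callable, Iterable

# Assumed terminal state names; check them against your KFP version.
TERMINAL_STATES = frozenset({"SUCCEEDED", "FAILED", "CANCELED", "SKIPPED"})


def wait_for_terminal_state(
    get_state: Callable[[], str],
    timeout_s: float = 1800.0,
    poll_interval_s: float = 10.0,
    terminal_states: Iterable[str] = TERMINAL_STATES,
) -> str:
    """Call `get_state` until it returns a terminal state or the timeout expires."""
    terminal = {s.upper() for s in terminal_states}
    deadline = time.monotonic() + timeout_s
    while True:
        state = str(get_state()).upper()
        if state in terminal:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still in state {state!r} after {timeout_s} s")
        time.sleep(poll_interval_s)


# Demo with a fake state source instead of a real KFP Run.
_fake_states = iter(["PENDING", "RUNNING", "SUCCEEDED"])
final_state = wait_for_terminal_state(lambda: next(_fake_states), poll_interval_s=0.0)
print(final_state)
```

In the notebook, `get_state` might be something like `lambda: kfp_client.get_run(run_id).state`, with `run_id` taken from the result of `create_run_from_pipeline_func`; treat the exact attribute names as an assumption to verify against your KFP version. The KFP client also provides `wait_for_run_completion`, which may be sufficient on its own.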

**Note**: If you create many KFP Runs, the logs of some failed tasks might show a "no space left on the device" error. In that case, delete the KFP Experiment using the code below and recreate the KFP Run by rerunning the code cell above.

In [10]:
# # If you want to delete the KFP experiment, uncomment the following code

# experiment_name = "house-price-monitoring-experiment"
# delete_kfp_exp(experiment_name)

If the previous KFP Run completed successfully, you can run the KFP pipeline three more times to generate the monitoring results for the remaining three quarters of 2018.

In [11]:
for quarter in range(2, 5):
    arguments["quarter"] = quarter
    kfp_client.create_run_from_pipeline_func(
        pipeline_func=monitoring_pipeline,
        run_name=run_name,
        experiment_name=experiment_name,
        arguments=arguments, # These are the arguments passed to the pipeline function
        enable_caching=False 
    )
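
The loop above reuses one run name for all three runs and updates the shared `arguments` dictionary in place. As an optional variation, the sketch below builds a separate arguments dictionary and a distinct run name for each quarter, which can make the runs easier to tell apart in the KFP UI. The helper name `quarterly_runs` and the run-name pattern are assumptions for illustration.

```python
from typing import Any, Dict, Iterable, List, Tuple


def quarterly_runs(
    base_arguments: Dict[str, Any],
    base_run_name: str,
    quarters: Iterable[int],
) -> List[Tuple[str, Dict[str, Any]]]:
    """Return one (run_name, arguments) pair per quarter without mutating the base dict."""
    runs = []
    for quarter in quarters:
        # Copy the base arguments and override only the quarter.
        args = {**base_arguments, "quarter": quarter}
        run_name = f"{base_run_name}-{args['year']}-q{quarter}"
        runs.append((run_name, args))
    return runs


# Reduced arguments dictionary for the demo; the real one has more keys.
base = {"year": 2018, "quarter": 1}
for name, args in quarterly_runs(base, "house-price-monitoring-run", range(2, 5)):
    print(name, args)
```

Each pair could then be passed to `create_run_from_pipeline_func` as `run_name` and `arguments`.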

If these three KFP Runs are completed successfully, you should see the familiar dashboard showing the quarterly MAE changes in 2018 under an Evidently Project named "house-price-model-monitoring" at [http://evidently-monitor-ui.local](http://evidently-monitor-ui.local). You should also see four Reports and four Test Suites under the project. 

The dashboard should look like this:

<img src="./images/dashboard_2018.png" width=800 />

### Screenshots for Assignment 4
Please include the dashboard showing the MAE changes in 2018 in your PDF file.

Finally, let's compile the KFP pipeline and save it to a YAML file `pipeline.yaml`. The file should be located in the same directory as this notebook.

In [13]:
compiler = kfp.compiler.Compiler()
compiler.compile(monitoring_pipeline, "pipeline.yaml")

### Wrap-up
Please include the following files in your submission:
- `prometheus-config-patch.yaml` (You should find it in the "manifests" directory)
- `week6_assignments_part1.ipynb` and `week6_assignments_part2.ipynb` (You don't need to return `week6_assignments_part3.ipynb`)
- `pipeline.yaml` (this file will be generated when you complete the third part of the assignments)
- The PDF containing your screenshots for Assignments 1, 3, and 4.
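
Before submitting, it can help to check that the generated files are where you expect them. The sketch below assumes that the paths are relative to the directory containing this notebook and checks only the two YAML files named above. The helper name `missing_files` is hypothetical, and the notebooks and the PDF are left out because the PDF file name is up to you.

```python
from pathlib import Path
from typing import Iterable, List

# Assumed layout: paths are relative to the directory that holds this notebook.
EXPECTED_FILES = (
    "manifests/prometheus-config-patch.yaml",
    "pipeline.yaml",
)


def missing_files(base_dir: str, expected: Iterable[str] = EXPECTED_FILES) -> List[str]:
    """Return the expected relative paths that do not exist under `base_dir`."""
    base = Path(base_dir)
    return [rel for rel in expected if not (base / rel).is_file()]


# An empty list means every expected file was found.
print(missing_files("."))
```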