# Using an MLFlow Model in a Keboola Transformation

This notebook demonstrates how to use a model registered and deployed in Keboola using MLFlow. Ensure you have gone through the **Getting Started** - **Working_With_MLFlow_in_Keboola** notebook to register and deploy your model first.

## Steps Included:
1. Initialize MLFlow
2. Load the dataset using the Keboola Common Interface
3. Specify the model endpoint URL
4. Invoke the deployed model
5. Log model monitoring metrics
6. End the MLFlow run

---
## Initialize MLFlow and Set Experiment

Each invocation of your deployed model can be logged as a run in an MLFlow experiment. This step is optional but recommended for tracking and monitoring. Enter the name of a new or existing experiment, then choose whether to log to MLFlow.


In [None]:
# Import necessary packages
import mlflow
import datetime
from keboola.component import CommonInterface
import pandas as pd
import requests
import ipywidgets as widgets
from IPython.display import display

# Initialize Keboola Common Interface
ci = CommonInterface()

# Widgets for experiment name
experiment_name_widget = widgets.Text(
    value='',
    placeholder='Enter the experiment name',
    description='Experiment Name:',
    disabled=False
)
display(experiment_name_widget)

# Widgets to confirm logging to MLFlow
log_to_mlflow_widget = widgets.ToggleButtons(
    options=['Yes', 'No'],
    description='Log to MLFlow?',
    disabled=False,
    button_style=''
)
display(log_to_mlflow_widget)


In [None]:
# Initialize MLFlow and set experiment
if log_to_mlflow_widget.value == 'Yes':
    experiment_name = experiment_name_widget.value
    if experiment_name:
        mlflow.set_experiment(experiment_name)
    else:
        raise ValueError("The experiment name must be provided if logging to MLFlow is enabled.")

    # Start MLFlow run
    mlflow.start_run()

    # Log initial parameters
    mlflow.log_param('experiment_name', experiment_name)
    mlflow.log_param('start_time', datetime.datetime.now().isoformat())
    print(f"MLFlow experiment '{experiment_name}' started.")
else:
    print("Skipping MLFlow logging.")


---
## Load the Dataset

We will use the Keboola Common Interface to load the input dataset. Ensure you have configured your input mapping in Keboola.
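If no tables show up, it can help to check which CSV files are actually present in the input folder. The sketch below is a hedged illustration: the helper name `list_input_csvs` is hypothetical, and it assumes the data folder contains an `in/tables` directory, mirroring the `out/tables` path used later in this notebook.

```python
from pathlib import Path


def list_input_csvs(data_dir="."):
    """Return the CSV files found under <data_dir>/in/tables, sorted by name.

    Assumption: input tables are stored as .csv files in in/tables;
    other files in that folder (for example manifests) are ignored.
    """
    tables_dir = Path(data_dir) / "in" / "tables"
    if not tables_dir.is_dir():
        return []
    return sorted(
        path for path in tables_dir.iterdir()
        if path.is_file() and path.suffix == ".csv"
    )
```

An empty list suggests that the input mapping has not been configured or loaded yet.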


In [None]:
import logging

# Load the input table definitions from the input mapping
input_tables = ci.get_input_tables_definitions()

# Create a dropdown widget for selecting a table
if input_tables:
    table_list = [table.full_path for table in input_tables]
    tables_dropdown = widgets.Dropdown(options=table_list, value=table_list[0], description='Table:', disabled=False)
    display(tables_dropdown)
else:
    logging.warning("No tables found. Please ensure you have loaded tables into the workspace using the table input mapping.")


---
## Specify the Model Endpoint URL

Enter the endpoint URL of the deployed model.
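A pasted URL can easily carry stray whitespace or miss its scheme, so an optional sanity check before sending any data may save a confusing request error. This is a minimal sketch using only the standard library; the helper name `validate_endpoint_url` is hypothetical and not part of Keboola or MLFlow.

```python
from urllib.parse import urlparse


def validate_endpoint_url(url):
    """Return the trimmed URL, or raise ValueError if it is not an HTTP(S) URL."""
    url = url.strip()
    parsed = urlparse(url)
    # Require an explicit scheme and a host; anything else is rejected.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Not a valid HTTP(S) endpoint URL: {url!r}")
    return url
```

It could be applied to the widget value, for example `validate_endpoint_url(endpoint_url_widget.value)`.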


In [None]:
# Widget to enter the endpoint URL
endpoint_url_widget = widgets.Text(
    value='',
    placeholder='Enter the endpoint URL of the deployed model',
    description='Endpoint URL:',
    disabled=False
)
display(endpoint_url_widget)


In [None]:
# Load the selected dataset
table_path = tables_dropdown.value
dataframe = pd.read_csv(table_path)

# Display the first few rows of the dataset
display(dataframe.head())

---
## Invoke the Deployed Model

We will send the loaded dataframe to the model endpoint as JSON, add the returned predictions to the dataframe as a `score` column, and save the result as an output table.
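The request and response schemas depend on the MLflow version behind the endpoint. To my understanding, MLflow 1.x scoring servers accept a bare pandas "split" object and return a plain list, while MLflow 2.x servers expect the object wrapped under a `dataframe_split` key (with a plain `application/json` content type) and return `{"predictions": [...]}`. The sketch below illustrates both shapes with the standard library only; `build_payload` and `extract_predictions` are hypothetical helper names, and you should verify the schema against your own deployment.

```python
import json


def build_payload(columns, rows, wrap_for_mlflow2=False):
    """Serialize tabular data in the pandas 'split' layout.

    Assumption: wrap_for_mlflow2=True produces the MLflow 2.x style body.
    """
    split = {"columns": list(columns), "data": [list(row) for row in rows]}
    body = {"dataframe_split": split} if wrap_for_mlflow2 else split
    return json.dumps(body)


def extract_predictions(body):
    """Return the predictions from a bare list or a {'predictions': [...]} object."""
    if isinstance(body, dict) and "predictions" in body:
        return body["predictions"]
    if isinstance(body, list):
        return body
    raise ValueError(f"Unexpected response shape: {type(body).__name__}")
```

With a helper like this, `extract_predictions(response.json())` yields a list in either case. Note that `dataframe.to_json(orient='split')` additionally includes an `index` field, which this sketch omits.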


In [None]:
# Define the endpoint URL
endpoint_url = endpoint_url_widget.value

# Ensure the endpoint URL is provided
if not endpoint_url:
    raise ValueError("The endpoint URL must be provided.")

# Send the dataframe to the model endpoint in pandas "split" JSON format.
# Note: this Content-Type matches MLflow 1.x scoring servers; newer servers
# may expect a different request schema, so adjust it to your deployment.
response = requests.post(
    url=endpoint_url,
    headers={'Content-Type': 'application/json; format=pandas-split'},
    data=dataframe.to_json(orient='split')
)

# Check for errors in the response
if response.status_code != 200:
    raise ValueError(f"Error in model invocation (HTTP {response.status_code}): {response.text}")

# Add the predicted scores to the dataframe
dataframe['score'] = response.json()

# Save the results to the output path
output_path = 'out/tables/nlp_stats_score.csv'
dataframe.to_csv(output_path, index=False)

print(f"Results saved to {output_path}")


---
## Log Model Monitoring Metrics

Log descriptive statistics of the input columns (count, mean, standard deviation, minimum, quartiles, and maximum) as MLFlow metrics for model monitoring.
**This step applies only if you enabled MLFlow logging and entered an experiment name.**
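MLflow restricts which characters may appear in metric names, and `%` is, to my understanding, not among them, which is why quartile labels such as `25%` need renaming. The sketch below shows one way to flatten a nested `{column: {statistic: value}}` mapping (the shape produced by `dataframe.describe().to_dict()`) into a flat metrics dictionary. The helper name `flatten_stats` is hypothetical, and the set of allowed characters in the regular expression is an assumption to check against your MLflow version.

```python
import math
import re


def flatten_stats(stats):
    """Turn {column: {stat: value}} into {metric_name: float}, skipping NaN/None."""
    metrics = {}
    for column, column_stats in stats.items():
        for stat, value in column_stats.items():
            if value is None or math.isnan(value):
                continue  # e.g. std is NaN for a single-row table
            name = f"{column}_{stat.replace('%', '_pct')}"
            # Assumption: replace anything outside a conservative allowed set.
            name = re.sub(r"[^0-9A-Za-z_\-. /]", "_", name)
            metrics[name] = float(value)
    return metrics
```

The resulting dictionary could then be logged in a single call with `mlflow.log_metrics(metrics)`.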

In [None]:
# Log model monitoring metrics
if log_to_mlflow_widget.value == 'Yes':
    # describe() summarizes only numeric columns, so iterate over its columns
    # rather than over all dataframe columns to avoid a KeyError
    dataframe_desc = dataframe.describe()
    vals = ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
    for col in dataframe_desc.columns:
        if col != 'score':  # Do not log metrics for the score column
            for val in vals:
                mlflow.log_metric(f'{col}_{val.replace("%", "_pct")}', dataframe_desc.loc[val, col])
    print("Model monitoring metrics logged to MLFlow.")
else:
    print("Skipping logging of model monitoring metrics.")


---
## End the MLFlow Run

End the MLFlow run and log the execution time.
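MLflow stores a run's start time (`RunInfo.start_time`) as an integer number of milliseconds since the Unix epoch, so computing the execution time involves a unit conversion rather than a plain `datetime` subtraction. A minimal sketch, where the helper name `elapsed_seconds` is hypothetical:

```python
import time


def elapsed_seconds(start_time_ms, now_s=None):
    """Seconds elapsed since a Unix-epoch start time given in milliseconds."""
    if now_s is None:
        now_s = time.time()  # current time in seconds since the epoch
    return now_s - start_time_ms / 1000.0
```

For an active run, this could be called as `elapsed_seconds(mlflow.active_run().info.start_time)`.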


In [None]:
# End MLFlow run
if log_to_mlflow_widget.value == 'Yes':
    # RunInfo.start_time is a Unix timestamp in milliseconds, not a datetime
    start_time_ms = mlflow.active_run().info.start_time
    time_delta = datetime.datetime.now().timestamp() - start_time_ms / 1000

    mlflow.log_metric('execution_time', time_delta)
    mlflow.end_run()

    print("MLFlow run ended.")
else:
    print("Skipping ending MLFlow run.")
