## Using MLflow in Keboola Python Workspaces

This notebook demonstrates how to use MLflow within Keboola Python Workspaces. MLflow is a platform for managing machine learning experiments: it tracks the parameters, metrics, and model artifacts produced by each training run.

### Prerequisites

- Ensure that ML/AI Services are enabled in your Keboola project. If not, please contact the Keboola Support team to enable these services.
- Basic understanding of Python and machine learning concepts.

### Key Features of MLflow

1. **Setting Up Experiments**: Organize and track your experiments.
2. **Logging Parameters and Metrics**: Record parameters and metrics used in your experiments.
3. **Saving and Registering Models**: Log and register models for deployment.

In this notebook, we'll demonstrate these features using sample data from the Scikit-Learn library.


In [None]:
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
import pandas as pd
import logging

# Set up logging
logging.basicConfig(level=logging.INFO)

# Load sample data
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = pd.Series(iris.target)

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


---
### Setting Up an Experiment

We'll start by setting up an experiment in MLflow, which groups related runs so they can be tracked and compared. `mlflow.set_experiment` creates the experiment if it does not already exist and makes it the active one.


In [None]:
# Set up an experiment
experiment_name = "Iris_Classification_Experiment"
mlflow.set_experiment(experiment_name)

logging.info(f"Experiment '{experiment_name}' is set.")


---
### Starting and Ending a Run

Within an experiment, you can start and end multiple runs. Each run logs its own parameters, metrics, and artifacts. Using `mlflow.start_run()` as a context manager ends the run automatically when the `with` block exits.


In [None]:
# Start a new run
with mlflow.start_run():
    # Initialize the model
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    
    # Log parameters
    mlflow.log_param("n_estimators", 100)
    mlflow.log_param("random_state", 42)
    
    # Train the model
    model.fit(X_train, y_train)
    
    # Log model
    mlflow.sklearn.log_model(model, "random_forest_model")
    
    # Evaluate the model
    accuracy = model.score(X_test, y_test)
    
    # Log metrics
    mlflow.log_metric("accuracy", accuracy)
    
    logging.info(f"Model accuracy: {accuracy}")

# The run ends automatically when the `with` block exits
logging.info("Run ended. Check the MLflow UI for details.")


---
### Viewing Results in MLflow UI

The MLflow UI shows the parameters, metrics, and model artifacts logged for each run. You can open the UI in the Keboola platform from your Workspace Configuration.


---
### Logging and Registering a Model

We'll demonstrate how to log a model and register it for deployment. This allows you to manage and version your models. Additionally, we'll log an `input_example` to illustrate how the model can be used with sample inputs.


In [None]:
# Start a new run
with mlflow.start_run():
    # Initialize the model
    model = RandomForestClassifier(n_estimators=150, random_state=42)
    
    # Log parameters
    mlflow.log_param("n_estimators", 150)
    mlflow.log_param("random_state", 42)
    
    # Train the model
    model.fit(X_train, y_train)
    
    # Create an input example
    input_example = X_train.head(1)
    
    # Log model with input example
    mlflow.sklearn.log_model(model, "random_forest_model_v2", input_example=input_example)
    
    # Evaluate the model
    accuracy = model.score(X_test, y_test)
    
    # Log metrics
    mlflow.log_metric("accuracy", accuracy)
    
    # Register the model
    model_uri = f"runs:/{mlflow.active_run().info.run_id}/random_forest_model_v2"
    mlflow.register_model(model_uri, "RandomForestClassifierIris")

    logging.info(f"Model accuracy: {accuracy}")
    logging.info("Model registered in the MLflow Model Registry.")


---
### Deploying Registered Models

Registered models can be deployed (served) in the Keboola platform. In the Keboola UI, navigate to Transformations - ML/AI Services and click Deploy Model.

The UI lists all models registered in the Keboola MLflow server. Deploying a model starts it as a service and gives you a URL to which you can send prediction requests.
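
A request to a deployed model might look like the sketch below. It assumes the service accepts the JSON format of the MLflow 2.x scoring server (a `dataframe_split` object); the exact URL, path, and any authentication headers are not covered in this notebook and must be taken from the Deploy Model page. The helpers `build_payload` and `score` are hypothetical names.

```python
import json
import urllib.request


def build_payload(columns, rows):
    """Wrap tabular input in the MLflow 2.x "dataframe_split" request format."""
    return {
        "dataframe_split": {
            "columns": list(columns),
            "data": [list(row) for row in rows],
        }
    }


def score(endpoint_url, payload, extra_headers=None, timeout=30):
    """POST the payload to the model endpoint and return the decoded JSON reply."""
    headers = {"Content-Type": "application/json"}
    headers.update(extra_headers or {})  # e.g. an auth header, if one is required
    request = urllib.request.Request(
        endpoint_url,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Column names match the Iris feature names used to train the model
iris_columns = [
    "sepal length (cm)",
    "sepal width (cm)",
    "petal length (cm)",
    "petal width (cm)",
]
payload = build_payload(iris_columns, [[5.1, 3.5, 1.4, 0.2]])
print(json.dumps(payload, indent=2))
```

Calling `score(url, payload)` with the URL shown after deployment would then return the model's predictions, assuming the endpoint follows the format described above.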

---
### Summary

In this notebook, we've demonstrated how to use MLflow within Keboola Python Workspaces. We've covered setting up experiments, starting and ending runs, logging parameters and metrics, logging and registering models, and deploying registered models. You can view all the logged information in the MLflow UI through the Keboola platform.

For more information on MLflow, visit the [MLflow Documentation](https://www.mlflow.org/docs/latest/index.html).
