This MLflow test notebook follows the [MLflow quickstart](https://mlflow.org/docs/latest/getting-started/intro-quickstart/index.html).

## Step 1. Get MLflow

```bash
pip install mlflow
```

or, with Poetry:

```bash
poetry add mlflow
```

## Step 2. Start a Tracking Server

Launch a server via:
```bash
mlflow server --host 127.0.0.1 --port 8080
```
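
Before logging anything, it can help to confirm that the server is reachable. The sketch below queries the `/health` endpoint that the MLflow tracking server exposes. The helper name `tracking_server_is_up` and the timeout value are illustrative choices, not part of the quickstart.

```python
import urllib.request


def tracking_server_is_up(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if the server answers its /health endpoint with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers refused connections, timeouts, and HTTP errors
        # (urllib's URLError is a subclass of OSError).
        return False


if __name__ == "__main__":
    # Address taken from the `mlflow server` command above.
    print(tracking_server_is_up("http://127.0.0.1:8080"))
```

If this prints `False`, check that the `mlflow server` command is still running and that the host and port match.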

## Step 3. Train a model and prepare metadata for logging

- Load and prepare the Iris dataset for modeling.
- Train a Logistic Regression model and evaluate its performance.
- Prepare the model hyperparameters and calculate metrics for logging.

In [1]:
import mlflow
from mlflow.models import infer_signature

import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score
)

# Load the Iris dataset
X, y = datasets.load_iris(return_X_y=True)

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Define the model hyperparameters
params = {
    "solver": "lbfgs",
    "max_iter": 1000,
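    # `multi_class` is deprecated since scikit-learn 1.5; remove this entry
    # on newer versions if it raises a warning or an error.
    "multi_class": "auto",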
    "random_state": 8888,
}

# Train the model
lr = LogisticRegression(**params)
lr.fit(X_train, y_train)

# Predict on the test set
y_pred = lr.predict(X_test)

# Calculate metrics
accuracy = accuracy_score(y_test, y_pred)
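
The imports above include `precision_score`, `recall_score`, and `f1_score`, but only accuracy is computed. A minimal sketch of calculating the remaining metrics follows. The macro averaging and the `metrics` dictionary name are assumptions chosen for this three-class example, not part of the quickstart, and the deprecated `multi_class` entry is left out so the sketch runs on recent scikit-learn versions.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Same data split and model settings as the cell above (minus `multi_class`).
X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
lr = LogisticRegression(solver="lbfgs", max_iter=1000, random_state=8888)
lr.fit(X_train, y_train)
y_pred = lr.predict(X_test)

# Iris has three classes, so precision/recall/F1 need an averaging strategy.
# "macro" (unweighted mean over classes) is an assumption made for this sketch.
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision_macro": precision_score(y_test, y_pred, average="macro"),
    "recall_macro": recall_score(y_test, y_pred, average="macro"),
    "f1_macro": f1_score(y_test, y_pred, average="macro"),
}
print(metrics)
```

Inside a run, a dictionary like this could be logged in one call with `mlflow.log_metrics(metrics)`.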

## Step 4. Log the model and its metadata to MLflow

In this step, we log three things to MLflow: the model that we trained, the hyperparameters that we specified for the model’s fit, and the accuracy metric that was calculated by evaluating the model on the test data.

The steps that we will take are:

- Initiate an MLflow `run` context to start a new run that we will log the model and metadata to.
- `Log` model `parameters` and performance `metrics`.
- `Tag` the run for easy retrieval.
- `Register` the model in the MLflow Model Registry while `logging` (saving) the model.

A model `Signature` documents the model's expected inputs and outputs [*](https://www.restack.io/docs/mlflow-knowledge-mlflow-signature-guide).

MLflow signatures define the input and output schema of a model, so that data passed in at inference time can be checked for consistency with the training data. This makes them useful for model validation and debugging.

In [3]:
# Set our tracking server uri for logging
mlflow.set_tracking_uri(uri="http://127.0.0.1:8080")

# Create a new MLflow Experiment
mlflow.set_experiment("MLflow Quickstart")

# Start an MLflow run
with mlflow.start_run():
    # Log the hyperparameters
    mlflow.log_params(params)
    
    # Log the accuracy metric
    mlflow.log_metric("accuracy", accuracy)
    
    # Set a tag that we can use to remind ourselves what this run was for
    mlflow.set_tag("Training Info", "Basic LR model for iris data")
    
    # Infer the model signature
    signature = infer_signature(X_train, lr.predict(X_train))
    
    # Log the model
    model_info = mlflow.sklearn.log_model(
        sk_model=lr,
        artifact_path="iris_model",
        signature=signature,
        input_example=X_train,
        registered_model_name="tracking-quickstart-new",
    )

Successfully registered model 'tracking-quickstart-new'.
2024/04/19 11:13:41 INFO mlflow.store.model_registry.abstract_store: Waiting up to 300 seconds for model version to finish creation. Model name: tracking-quickstart-new, version 1
Created version '1' of model 'tracking-quickstart-new'.


In [5]:
model_info

<mlflow.models.model.ModelInfo at 0x7f4dc9728390>

- `artifact_path` can be the same for all runs, because each run stores its own copy of the model.
- `registered_model_name`: if the name matches an existing registered model, a new version of that model is created; otherwise, a new registered model is created with version 1.
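
Each registered version can later be addressed with a `models:/<name>/<version>` URI. A minimal sketch follows; the helper names `registered_model_uri` and `load_registered_version` are hypothetical, and loading assumes that the tracking server from Step 2 is running and that `mlflow.set_tracking_uri` already points to it.

```python
def registered_model_uri(name: str, version: int) -> str:
    """Build a Model Registry URI for one specific version of a registered model."""
    return f"models:/{name}/{version}"


def load_registered_version(name: str, version: int):
    """Load a registered model version as a pyfunc model.

    Needs a reachable Model Registry; not called in this sketch.
    """
    import mlflow  # imported lazily so the URI helper works on its own

    return mlflow.pyfunc.load_model(registered_model_uri(name, version))


# Version 1 of the model registered above.
print(registered_model_uri("tracking-quickstart-new", 1))
# prints: models:/tracking-quickstart-new/1
```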

## Step 5. Load the model as a Python Function (pyfunc) and use it for inference

After logging the model, we can perform inference by:

- `Loading` the model using MLflow’s pyfunc flavor.
- Running `Predict` on new data using the loaded model.

In [6]:
# Load the model back for predictions as a generic Python Function model
loaded_model = mlflow.pyfunc.load_model(model_info.model_uri)

predictions = loaded_model.predict(X_test)

iris_feature_names = datasets.load_iris().feature_names

result = pd.DataFrame(X_test, columns=iris_feature_names)
result["actual_class"] = y_test
result["predicted_class"] = predictions

result[:4]

2024/04/16 11:12:34 INFO mlflow.store.artifact.artifact_repo: The progress bar can be disabled by setting the environment variable MLFLOW_ENABLE_ARTIFACTS_PROGRESS_BAR to false


|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | actual_class | predicted_class |
|---|---|---|---|---|---|---|
| 0 | 6.1 | 2.8 | 4.7 | 1.2 | 1 | 1 |
| 1 | 5.7 | 3.8 | 1.7 | 0.3 | 0 | 0 |
| 2 | 7.7 | 2.6 | 6.9 | 2.3 | 2 | 2 |
| 3 | 6.0 | 2.9 | 4.5 | 1.5 | 1 | 1 |


## Step 6. View the Run in the MLflow UI

To see the results of our run, we can open the MLflow UI. Because the Tracking Server is already running at http://localhost:8080, we can simply navigate to that URL in a browser.

For a detailed description, see the [MLflow quickstart](https://mlflow.org/docs/latest/getting-started/intro-quickstart/index.html).