[![](https://img.shields.io/badge/Source%20on%20GitHub-orange)](https://github.com/laminlabs/lamin-mlops/blob/main/docs/mlflow.ipynb)

# MLflow

We show how LaminDB can be integrated with [MLflow](https://mlflow.org/) to track the training process and associate datasets & parameters with models.

In [None]:
# !pip install 'lamindb[jupyter]' torchvision lightning mlflow
!lamin init --storage ./lamin-mlops

In [None]:
import lamindb as ln
import mlflow
import lightning

from typing import Any
from torch import utils
from torchvision.datasets import MNIST
from torchvision.transforms import ToTensor
from autoencoder import LitAutoEncoder

ln.track()

## Define a model

We use a basic PyTorch Lightning autoencoder as an example model.

````{dropdown} Code of LitAutoEncoder
```{eval-rst}
.. literalinclude:: autoencoder.py
   :language: python
   :caption: Simple autoencoder model
```
````

## Query & download the MNIST dataset

We saved the MNIST dataset in the [curation notebook](/mnist), and it now shows up in the Artifact registry:

In [None]:
ln.Artifact.filter(kind="dataset").df()

You can also find it on lamin.ai if you connected your instance.

<img src="https://lamin-site-assets.s3.amazonaws.com/.lamindb/LlMSvBjHuXbs36TBGoCM.png" alt="instance view" width="800px">

Let's get the dataset:

In [None]:
mnist_af = ln.Artifact.get(key="testdata/mnist")
mnist_af

And download it to a local cache:

In [None]:
path = mnist_af.cache()
path

Create a PyTorch-compatible dataset:

In [None]:
dataset = MNIST(path.as_posix(), transform=ToTensor())
dataset

## Monitor training with MLflow

Train our example model and track the training progress with `MLflow`.

In [None]:
def convert_mlflow_params(params: dict[str, str]) -> dict[str, Any]:
    """Convert MLflow string parameters to their actual types."""

    def _convert_value(value):
        if value == "None":
            return value
        if value.lower() in ("true", "false"):
            return value.lower() == "true"
        try:
            return (
                float(value) if ("." in value or "e" in value.lower()) else int(value)
            )
        except ValueError:
            return value

    return {
        param_name: _convert_value(param_value)
        for param_name, param_value in params.items()
    }
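
MLflow stores every logged parameter as a string, which is why the conversion above runs before the values are used as typed annotations. The sketch below restates the same rule in a stand-alone helper to show what each kind of string becomes. The name `parse_param` and the sample values are illustrative assumptions, not output from an actual run.

```python
from typing import Any


def parse_param(value: str) -> Any:
    """Hypothetical stand-alone restatement of the conversion rule above."""
    if value == "None":
        # the string "None" is passed through unchanged
        return value
    if value.lower() in ("true", "false"):
        return value.lower() == "true"
    try:
        # a "." or an exponent marker suggests a float, otherwise try int
        return float(value) if ("." in value or "e" in value.lower()) else int(value)
    except ValueError:
        # anything that is not numeric stays a string
        return value


# made-up parameter strings of the kind an autologged run might contain
logged = {
    "lr": "0.001",
    "eps": "1e-08",
    "max_epochs": "2",
    "amsgrad": "False",
    "optimizer_name": "Adam",
    "weight_decay": "None",
}

converted = {name: parse_param(value) for name, value in logged.items()}
for name, value in converted.items():
    print(f"{name}: {value!r} ({type(value).__name__})")
```

Note that `"None"` deliberately stays a string here, mirroring the function above, so downstream code should not expect a Python `None`.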

In [None]:
# enable MLflow PyTorch autologging
mlflow.pytorch.autolog()

MODEL_CONFIG = {
    "hidden_size": 32,
    "bottleneck_size": 16,
    "batch_size": 32,
    "lr": 0.001,
    "eps": 1e-08,
}

hyperparameter_type = ln.Feature(name="Autoencoder hyperparameter", is_type=True).save()
for param_name, param_value in MODEL_CONFIG.items():
    ln.Feature(
        name=param_name, dtype=type(param_value).__name__, type=hyperparameter_type
    ).save()


# Track MLflow run
@ln.tracked()
def execute_mlflow_run(
    hidden_size: int, bottleneck_size: int, batch_size: int
) -> tuple[ln.Artifact, mlflow.ActiveRun]:
    with mlflow.start_run() as mlflow_run:
        train_dataset = MNIST(
            root="./data", train=True, download=True, transform=ToTensor()
        )
        train_loader = utils.data.DataLoader(train_dataset, batch_size=batch_size)

        # Initialize model
        autoencoder = LitAutoEncoder(hidden_size, bottleneck_size)

        # Create checkpoint callback
        from lightning.pytorch.callbacks import ModelCheckpoint

        checkpoint_callback = ModelCheckpoint(
            dirpath="model_checkpoints",
            filename=f"{mlflow_run.info.run_id}_last_epoch",
            save_top_k=1,
            monitor="train_loss",
        )

        # Train model
        trainer = lightning.Trainer(
            accelerator="cpu",
            limit_train_batches=3,
            max_epochs=2,
            callbacks=[checkpoint_callback],
        )

        trainer.fit(model=autoencoder, train_dataloaders=train_loader)

        # Get run information (fetch the run once and reuse its data)
        run_id = mlflow_run.info.run_id
        run_data = mlflow.get_run(run_id).data
        metrics = run_data.metrics
        params = convert_mlflow_params(run_data.params)

        # Create hyperparameter and metric features
        for param_name, param_value in params.items():
            if param_name not in MODEL_CONFIG:
                ln.Feature(
                    name=param_name,
                    dtype=type(param_value).__name__,
                    type=hyperparameter_type,
                ).save()
        metric_type = ln.Feature(name="Autoencoder metric", is_type=True).save()
        ln.Feature(name="train_loss", dtype="float", type=metric_type).save()

        local_artifact_path = mlflow_run.info.artifact_uri.removeprefix("file://")

        # save model artifacts
        mlflow_run_afs = ln.Artifact.from_dir(
            local_artifact_path,
            key=f"testmodels/mlflow/{local_artifact_path}",
        )
        ln.save(mlflow_run_afs)

        # save checkpoint as a model
        mlflow_model_af = ln.Artifact(
            f"model_checkpoints/{run_id}_last_epoch.ckpt",
            key="testmodels/mlflow/litautoencoder.ckpt",
            kind="model",
        ).save()

        # annotate artifact with hyperparameters and metrics
        mlflow_model_af.features.add_values(params)
        mlflow_model_af.features.add_values(metrics)

        return mlflow_model_af, mlflow_run
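
The `removeprefix("file://")` call above assumes that the run's artifacts live on the local filesystem. As a sketch of a slightly more defensive variant, the URI can be parsed and non-local schemes rejected explicitly. The helper name `artifact_uri_to_local_path` and the example URIs are hypothetical, and the sketch assumes POSIX-style paths.

```python
from urllib.parse import unquote, urlparse


def artifact_uri_to_local_path(artifact_uri: str) -> str:
    """Hypothetical helper: map a local artifact URI to a filesystem path."""
    parsed = urlparse(artifact_uri)
    # accept plain paths (no scheme) and file:// URIs, reject everything else
    if parsed.scheme not in ("", "file"):
        raise ValueError(f"Not a local artifact location: {artifact_uri}")
    # undo percent-encoding such as %20 for spaces
    return unquote(parsed.path)


# made-up URIs in the shape of a local MLflow artifact location
print(artifact_uri_to_local_path("file:///tmp/mlruns/0/abc123/artifacts"))
print(artifact_uri_to_local_path("/tmp/mlruns/0/abc123/artifacts"))
```

With a remote location such as an `s3://` URI, this raises a `ValueError` instead of silently producing a path that does not exist locally.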

**See the training progress in the `mlflow` UI:**

<img src="https://lamin-site-assets.s3.amazonaws.com/.lamindb/C0seowxsq4Du2B4T0000.png" alt="MLflow training UI" width="800px">

## Save model in LaminDB

In [None]:
mlflow_model_af, mlflow_run = execute_mlflow_run(
    batch_size=MODEL_CONFIG["batch_size"],
    bottleneck_size=MODEL_CONFIG["bottleneck_size"],
    hidden_size=MODEL_CONFIG["hidden_size"],
)

# look at Artifact annotations
mlflow_model_af.describe()

**See the checkpoints:**

<img src="https://lamin-site-assets.s3.amazonaws.com/.lamindb/n0xxFoMRtZPiQ7VT0001.png" alt="MLflow checkpoints UI" width="800px">

If you later want to reuse the checkpoint, you can download it like so:

In [None]:
ln.Artifact.get(key="testmodels/mlflow/litautoencoder.ckpt").cache()

Or on the CLI:
```
lamin get artifact --key 'testmodels/mlflow/litautoencoder.ckpt'
```

In [None]:
ln.finish()

In [None]:
#!rm -rf ./lamin-mlops
#!lamin delete --force lamin-mlops