# MLflow tutorial - Getting started

## I. Installation
#### Install MLflow on your machines (via uv, poetry or pip)
Note: you can also use the Docker image: https://github.com/mlflow/mlflow?tab=readme-ov-file#official-mlflow-docker-image

In [None]:
import os

import mlflow
import pandas as pd

# If you have a remote MLflow server:
# os.environ['user'] = 'marc'
# os.environ['MLFLOW_TRACKING_URI'] = 'http://my-mlflow-server.com'
# mlflow.set_tracking_uri(os.environ['MLFLOW_TRACKING_URI'])

#### View the MLflow interface
Once the MLflow server is running, open the user interface in your web browser. There you can view the metrics, parameters and models saved for each run, and compare runs with each other to track the performance of your model.

Go to: http://127.0.0.1:5000

- Do not use conda or Google Cloud to run this notebook or the command.
- If the address is already in use, kill the process running on the port or change the port (replace every occurrence of 5000 with 5001, for instance).
- If running the `mlflow server` command directly does not work, prefix it with `python -m` (that is, `python -m mlflow server`).
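
The launch command itself is not shown above. The sketch below assumes the standard `mlflow ui` command with its `--port` option, started through `python -m` as the last bullet suggests. The helper names (`port_is_free`, `first_free_port`, `ui_command`) are hypothetical and only illustrate the "address already in use" workaround.

```python
import socket
import sys


def port_is_free(host: str, port: int) -> bool:
    """Return True if nothing is accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 when a server is already listening there.
        return sock.connect_ex((host, port)) != 0


def first_free_port(host: str = "127.0.0.1", start: int = 5000, attempts: int = 10) -> int:
    """Return the first port in [start, start + attempts) that looks free."""
    for port in range(start, start + attempts):
        if port_is_free(host, port):
            return port
    raise RuntimeError(f"no free port found in {start}..{start + attempts - 1}")


def ui_command(port: int) -> list:
    """Build the command line that starts the MLflow UI on the given port."""
    # Using `python -m mlflow` avoids relying on the `mlflow` executable being on PATH.
    return [sys.executable, "-m", "mlflow", "ui", "--port", str(port)]
```

For example, `subprocess.Popen(ui_command(first_free_port()))` would start the interface on 5000 if it is available, or on the next free port otherwise; remember to use that same port in the URL.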

## II. First steps
See:
- https://mlflow.org/docs/latest/tracking.html
- https://mlflow.org/docs/latest/projects.html

#### Analyze the source code of the MLflow example repo: https://github.com/martin-prillard/mlflow-example.git. Which MLflow functions can you identify?

- ```with mlflow.start_run():```: context manager (`with` statement) that opens an MLflow run and closes it when the block exits
- ```mlflow.log_param(<parameter>, alpha)```: parameter tracking
- ```mlflow.log_metric(<metric>, rmse)```: metrics tracking
- ```mlflow.sklearn.log_model(<model_sklearn>, "model")```: tracking of the model in the run's `artifacts`

#### Launch a first repo run

Let's set up our first MLflow experiment. Experiments help organize and group related runs for easier tracking and comparison.


In [None]:
mlflow.set_experiment("1st_experiment")

We will tell MLflow to run a project from a GitHub repository. The repository must contain an MLproject file that defines how to run the project. That way, we ensure that we can get the same results and potentially improve the training process according to our needs, by changing hyperparameters for example.

In [None]:
!mlflow run https://github.com/martin-prillard/mlflow-example.git \
    -P alpha=5.0 \
    --env-manager=local \
    -P tags="{'auteur': 'Jean Dupont', 'version_modèle': '1.0'}"

#### What is the git commit of this run?

Git Commit: 228123c1a36275b3deff9979e2c8766206ecf927

#### Let's imagine there have been other commits on the repo in the meantime. What would be the mlflow command to replay exactly the previous run?

In [None]:
!mlflow run \
    https://github.com/martin-prillard/mlflow-example.git \
    -v 228123c1a36275b3deff9979e2c8766206ecf927 \
    -P alpha=5.0 \
    --env-manager=local
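
The same replay command can also be assembled in Python, which makes the pinned commit and the parameters explicit. This is a minimal sketch: `mlflow_run_command` is a hypothetical helper, and it only uses the flags that already appear in the cells above (`-v`, `-P`, `--env-manager`).

```python
import shlex

REPO_URI = "https://github.com/martin-prillard/mlflow-example.git"


def mlflow_run_command(uri, params, version=None, env_manager="local"):
    """Build the argument list for `mlflow run`, optionally pinned to a commit."""
    cmd = ["mlflow", "run", uri]
    if version is not None:
        # -v pins the project to a given git commit (or branch).
        cmd += ["-v", version]
    for name, value in params.items():
        # Each project parameter is passed as -P name=value.
        cmd += ["-P", f"{name}={value}"]
    cmd.append(f"--env-manager={env_manager}")
    return cmd


replay = mlflow_run_command(
    REPO_URI,
    {"alpha": 5.0},
    version="228123c1a36275b3deff9979e2c8766206ecf927",
)
# Show the equivalent shell command line.
print(shlex.join(replay))
```

Passing the list to `subprocess.run(replay, check=True)` would launch the run. Without `version`, the command uses the latest state of the repository, so results may differ from the original run.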

#### What do you notice?

- The execution should be much faster because the repository has been fetched and the environment created

#### Perform a second run, changing the alpha parameter to 10 and the l1_ratio to 0.4

In [None]:
!mlflow run \
    https://github.com/martin-prillard/mlflow-example.git \
    -P alpha=10.0 \
    -P l1_ratio=0.4 \
    --env-manager=local

#### Compare the results on the interface. What do you notice?

- You can see the evolution of performance metrics as a function of alpha (Parallel Coordinates plot)

## III. Inference of the best model

Let's retrieve the best model. Note that we can use filters to manage the conditions that the model needs to satisfy in order to be considered.

In [None]:
# Read the runs from MLflow into a pandas DataFrame
df = mlflow.search_runs(filter_string="metrics.rmse < 0.86")
df
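
The `filter_string` argument is a plain string in MLflow's search syntax: conditions are joined with `and`, metrics are compared as numbers, and parameter values are compared as quoted strings. The sketch below composes such a string; `build_filter` is a hypothetical helper, and the assumption that `alpha` was logged as the string `'5.0'` should be checked against your own runs.

```python
def build_filter(metrics=None, params=None):
    """Compose an MLflow search filter string.

    metrics: list of (name, operator, number) tuples, e.g. [("rmse", "<", 0.86)]
    params:  dict mapping parameter names to the string value that was logged
    """
    clauses = []
    for name, operator, threshold in metrics or []:
        clauses.append(f"metrics.{name} {operator} {threshold}")
    for name, value in (params or {}).items():
        # Parameters are stored as strings, so the value is quoted.
        clauses.append(f"params.{name} = '{value}'")
    return " and ".join(clauses)


filter_string = build_filter(
    metrics=[("rmse", "<", 0.86)],
    params={"alpha": "5.0"},  # assumed logged value
)
print(filter_string)
```

The resulting string can be passed as `mlflow.search_runs(filter_string=filter_string)` to restrict the candidate runs before picking the best one.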

In [None]:
# Get the run ID of the best model
best_run = df.loc[df['metrics.rmse'].idxmin()]
print(f"run with the minimum rmse ({df['metrics.rmse'].min()}): {best_run['run_id']}")

In [None]:
run_id = best_run['run_id']
artifact_path = "model"  # path used in mlflow.sklearn.log_model()

model_uri = f"runs:/{run_id}/{artifact_path}"
model = mlflow.pyfunc.load_model(model_uri)

You can also download the model artifacts to a local directory with `download_artifacts` and load the model from there.

In [None]:
from mlflow.artifacts import download_artifacts

# Read the runs from MLflow into a pandas DataFrame
df = mlflow.search_runs(filter_string="metrics.rmse < 1")

# Get the run ID of the best model
best_run = df.loc[df['metrics.rmse'].idxmin()]
print(f"run with the minimum rmse ({df['metrics.rmse'].min()}): {best_run['run_id']}")
run_id = best_run['run_id']
artifact_path = "model"
local_path = download_artifacts(run_id=run_id, artifact_path=artifact_path)
model = mlflow.pyfunc.load_model(local_path)

#### Make an inference by reloading the model with ``mlflow.pyfunc.load_model``

In [None]:
data = [
    {
        "fixed acidity": 7,
        "volatile acidity": 0.27,
        "citric acid": 0.36,
        "residual sugar": 20.7,
        "chlorides": 0.045,
        "free sulfur dioxide": 45,
        "total sulfur dioxide": 170,
        "density": 1.001,
        "pH": 3,
        "sulphates": 0.45,
        "alcohol": 8.8
    },
    {
        "fixed acidity": 4,
        "volatile acidity": 0.53,
        "citric acid": 0.23,
        "residual sugar": 22,
        "chlorides": 0.065,
        "free sulfur dioxide": 47,
        "total sulfur dioxide": 185,
        "density": 1.014,
        "pH": 4,
        "sulphates": 0.55,
        "alcohol": 12.5
    },
]

# Predict on a pandas DataFrame
model.predict(pd.DataFrame(data))

## IV. Serving (API)
See:
- https://mlflow.org/docs/latest/models.html
- https://mlflow.org/docs/latest/models.html#deploy-mlflow-models

Prerequisite: the model must be saved in the artifacts with this command: 

``mlflow.sklearn.log_model(lr, "model")``

The function ``mlflow.sklearn.log_model`` writes the model to ``(...)/mlruns/0/{run_id}/artifacts/model``. Two of the files it produces matter here:
- The `MLmodel` file is a metadata file which tells MLflow how to load the model.
- The `model.pkl` file is a serialized version of the linear regression model we've trained.
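
To check what was actually written for a given run, you can inspect the model directory directly. This is a minimal sketch using only the standard library; both helper names are hypothetical, and the directory you pass in depends on where your tracking data is stored.

```python
from pathlib import Path


def list_model_files(model_dir):
    """Return the sorted names of the files stored directly in model_dir."""
    return sorted(p.name for p in Path(model_dir).iterdir() if p.is_file())


def looks_like_mlflow_model(model_dir):
    """Return True if the directory contains the MLmodel metadata file."""
    return (Path(model_dir) / "MLmodel").is_file()
```

Calling `list_model_files(local_path)` on a directory returned by `download_artifacts` should list `MLmodel` and `model.pkl`, possibly alongside environment files depending on the MLflow version.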

### 1. Deploy an API locally

See: https://mlflow.org/docs/latest/cli.html#mlflow-models-serve

Prerequisite: install Pyenv binary (https://github.com/pyenv/pyenv#installation)

In this example, we'll use this MLmodel format with MLflow to deploy a local REST server that can perform inference.

MLFlow makes it easy:

In [None]:
!mlflow models serve \
    -m {model_uri} \
    --host 0.0.0.0 \
    -p 1235 \
    --env-manager=local

#### Inference with the served model

Once the server has been deployed, you can send it a sample of data and see the predictions. The following example uses ``curl`` to send a JSON-serialized pandas DataFrame:

**From a local terminal** 
    
On Linux / macOS:
```bash
curl -X POST -H "Content-Type:application/json" --data '{"dataframe_split": {"columns":["fixed acidity", "volatile acidity", "citric acid", "residual sugar", "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density", "pH", "sulphates", "alcohol"], "data":[[6.2, 0.66, 0.48, 1.2, 0.029, 29, 75, 0.98, 3.33, 0.39, 12.8]]}}' http://127.0.0.1:1235/invocations
```

On Windows:
```bash
curl -X POST -H "Content-Type:application/json" --data "{\"dataframe_split\": {\"columns\":[\"fixed acidity\", \"volatile acidity\", \"citric acid\", \"residual sugar\", \"chlorides\", \"free sulfur dioxide\", \"total sulfur dioxide\", \"density\", \"pH\", \"sulphates\", \"alcohol\"],\"data\":[[6.2, 0.66, 0.48, 1.2, 0.029, 29, 75, 0.98, 3.33, 0.39, 12.8]]}}" http://127.0.0.1:1235/invocations
```
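
The same request can be sent from Python, which avoids the shell-specific quoting differences between the two `curl` commands. This sketch uses only the standard library and assumes the server started above is listening on `http://127.0.0.1:1235`; the helper names `to_dataframe_split` and `predict` are hypothetical.

```python
import json
import urllib.request


def to_dataframe_split(records):
    """Convert a list of row dicts into the `dataframe_split` JSON payload."""
    columns = list(records[0].keys())
    data = [[row[column] for column in columns] for row in records]
    return {"dataframe_split": {"columns": columns, "data": data}}


def predict(records, url="http://127.0.0.1:1235/invocations"):
    """POST the records to the served model and return the decoded JSON response."""
    body = json.dumps(to_dataframe_split(records)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# Same sample as in the curl commands above.
sample = [{
    "fixed acidity": 6.2, "volatile acidity": 0.66, "citric acid": 0.48,
    "residual sugar": 1.2, "chlorides": 0.029, "free sulfur dioxide": 29,
    "total sulfur dioxide": 75, "density": 0.98, "pH": 3.33,
    "sulphates": 0.39, "alcohol": 12.8,
}]
payload = to_dataframe_split(sample)
```

With the server running, `predict(sample)` sends the payload and returns the server's JSON answer. Building the payload separately makes it easy to inspect it before sending.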

## V. Model management
See: https://mlflow.org/docs/latest/model-registry.html

#### Take the best model and click on "Register Model", then "Create a new model", and name it "Wine"

You can also register the model from Python:
```python
model_name = "Wine"
model_uri = f"runs:/{mlflow_run.info.run_id}/model"
registered_model_version = mlflow.register_model(model_uri, model_name)
```

In [None]:
model_name = "Wine"
best_model_uri = f"runs:/{run_id}/{artifact_path}"
registered_model_version = mlflow.register_model(best_model_uri, model_name)

In the registry ("Models" tab), you can add tags, keep several versions of a model, and assign them stages (Staging, Production, etc.).

In [None]:
# Get the latest model version
client = mlflow.MlflowClient()
latest_version = client.get_latest_versions(name=model_name, stages=["None"])[-1].version
# Path to load the model
model_uri = f"models:/{model_name}/{latest_version}"
print(model_uri)
# Load the model
model = mlflow.pyfunc.load_model(model_uri)
print(model)

#### Transition on model lifecycle states

See: https://www.mlflow.org/docs/latest/model-registry.html#model-registry-workflows

During its lifecycle, a model evolves, moving from development to production. You can tag a registered model with specific tags.

In [None]:
client.set_model_version_tag(
    name='Wine',
    version=2,
    key='task',
    value='new'
)

CI/CD pipelines can automatically fetch the best model, register it in the Model Registry, run integration tests, and so on.

Congratulations! You have learned how to use a model already shared on Git to perform inference. Now it's time to build your own: you can load the second tutorial (2 - MLFlow - MNIST_EN.ipynb).