# 2. How to deploy from MLflow with Python

## 2.1 MLflow Models

An MLflow Model is a standard format for packaging machine learning models that can be used in a variety of downstream tools - for example: 
- real-time serving through a REST API, or 
- batch inference on Apache Spark

The format defines a convention that lets you save a model in different “flavors” that can be understood by different downstream tools.

All of the flavors that a particular model supports are defined in its MLmodel file in YAML format. For example, `mlflow.sklearn` outputs models as follows:

```
# Directory written by mlflow.sklearn.save_model(tree, "model")
model/
├── MLmodel
├── model.pkl
├── conda.yaml
└── requirements.txt
```

For environment recreation, MLflow automatically logs `conda.yaml` and `requirements.txt` files whenever a model is logged. These files can then be used to reinstall dependencies using either `conda` or `pip`.

The `MLmodel` file of this model describes two flavors:

```yaml
time_created: 2018-05-25T17:28:53.35

flavors:
  sklearn:
    sklearn_version: 0.19.1
    pickled_model: model.pkl
  python_function:
    loader_module: mlflow.sklearn
```

This model can then be used with any tool that supports either the `sklearn` or `python_function` model flavor. For example, the `mlflow models serve` command can serve a model with the `python_function` flavor:

```bash
(mlflow)$ mlflow models serve -m model
```

## 2.2 The MLflow Model Registry

The MLflow Model Registry component is a centralized model store, set of APIs, and UI, to collaboratively manage the full lifecycle of an MLflow Model. It provides model lineage (which MLflow experiment and run produced the model), model versioning, stage transitions (for example from staging to production), and annotations.

- **Model**: An MLflow Model is created from an experiment or run that is logged with one of the model flavor’s `mlflow.<model_flavor>.log_model()` methods. Once logged, this model can then be registered with the Model Registry.

- **Registered Model**: An MLflow Model can be registered with the Model Registry. A registered model has a unique name and contains versions, associated transitional stages, model lineage, and other metadata.

## 2.3 Register a model

In [1]:
import mlflow

server_uri = "http://localhost:5003"     # port 5003: a local tracking server 
#server_uri = "http://localhost:5007"     # port 5007: a local docker container running a tracking server

mlflow.set_tracking_uri(server_uri)       # or set the MLFLOW_TRACKING_URI in the env

First we need to create a registered model:

In [2]:
from mlflow.tracking import MlflowClient
from mlflow.exceptions import RestException

model_name = "penguins_clf"

client = MlflowClient()
try:
    registered_model = client.create_registered_model(model_name)
    print(registered_model)
except RestException:
    print(f"Model '{model_name}' already exists in registry.")

Model 'penguins_clf' already exists in registry.


<img src='../img/mlflow_ui_pinguins_created_models_list.png' alt='' width='1000'>

Now we can register experiment runs to that model. Pick a run ID from your tracking log and add it here.

In [3]:
# Use YOUR last run id:
run_id = "8e0f51052f4648f99bece2c8ae6e55cc"

result = mlflow.register_model(
    f"runs:/{run_id}/model",   # URI of the model artifact logged by the run
    model_name                 # name of the registered model created above
)

Registered model 'penguins_clf' already exists. Creating a new version of this model...
2023/03/26 21:12:25 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation.                     Model name: penguins_clf, version 1
Created version '1' of model 'penguins_clf'.


In [4]:
print(result)

<ModelVersion: creation_timestamp=1679857945836, current_stage='None', description='', last_updated_timestamp=1679857945836, name='penguins_clf', run_id='8e0f51052f4648f99bece2c8ae6e55cc', run_link='', source='./mlruns/1/8e0f51052f4648f99bece2c8ae6e55cc/artifacts/model', status='READY', status_message='', tags={}, user_id='', version='1'>


List of registered models:

<img src='../img/mlflow_ui_pinguins_created_models_list_with_version.png' alt='' width='1000'>

List of versions of the model `penguins_clf`:

<img src='../img/mlflow_ui_pinguins_created_model_list_of_versions.png' alt='' width='1000'>

Details of model `penguins_clf` Version 1:

<img src='../img/mlflow_ui_pinguins_created_model_version_1_details.png' alt='' width='1000'>

## 2.4 Serve a Model from the registry

**Open a new terminal (terminal 4) and activate the `mlflow` conda env:**

```bash
$ conda activate mlflow
(mlflow)$ 
```

**Change to your `lab2/mlflow` folder:**

```bash
(mlflow)$ cd mlflow
```

**Serve a given model version:**

```bash
# Set the tracking URI of the server that hosts the Model Registry
(mlflow)$ export MLFLOW_TRACKING_URI=http://localhost:5003

# Serve version 1 of the registered model on port 4242
(mlflow)$ mlflow models serve --no-conda -m "models:/penguins_clf/1" -p 4242

2023/03/26 21:16:29 INFO mlflow.models.flavor_backend_registry: Selected backend for flavor 'python_function'
2023/03/26 21:16:29 INFO mlflow.pyfunc.backend: === Running command 'exec gunicorn --timeout=60 -b 127.0.0.1:4242 -w 1 ${GUNICORN_CMD_ARGS} -- mlflow.pyfunc.scoring_server.wsgi:app'
[2023-03-26 21:16:29 +0200] [2191368] [INFO] Starting gunicorn 20.1.0
[2023-03-26 21:16:29 +0200] [2191368] [INFO] Listening at: http://127.0.0.1:4242 (2191368)
[2023-03-26 21:16:29 +0200] [2191368] [INFO] Using worker: sync
[2023-03-26 21:16:29 +0200] [2191415] [INFO] Booting worker with pid: 2191415
```
(This serves version 1 of the model)


**Alternatively, serve by model stage:**

```bash
(mlflow)$ export MLFLOW_TRACKING_URI=http://localhost:5003 
(mlflow)$ mlflow models serve --no-conda -m "models:/penguins_clf/Production" -p 4242
```

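(This serves the model version currently in the Production stage. It requires that a version has already been transitioned to Production, as described in section 2.7.)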
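The two URI schemes used so far follow a fixed pattern: `runs:/<run_id>/<artifact_path>` for a model logged by a run, and `models:/<name>/<version-or-stage>` for a registered model. A small sketch of helpers that build them; the function names are hypothetical and not part of MLflow.

```python
def run_model_uri(run_id, artifact_path="model"):
    """Build a `runs:/<run_id>/<artifact_path>` URI, as passed to `mlflow.register_model`."""
    return f"runs:/{run_id}/{artifact_path}"


def registry_model_uri(name, version_or_stage):
    """Build a `models:/<name>/<version-or-stage>` URI, as passed to `mlflow models serve -m`."""
    return f"models:/{name}/{version_or_stage}"


print(run_model_uri("8e0f51052f4648f99bece2c8ae6e55cc"))
print(registry_model_uri("penguins_clf", 1))
print(registry_model_uri("penguins_clf", "Production"))
```

The same registry URIs can be passed to `mlflow.pyfunc.load_model` to load a registered model inside a Python process instead of serving it.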

## 2.5 Query the REST API

The REST API defines 4 endpoints:

- `/version` used for getting the mlflow version
- `/ping` used for health check
- `/health` (same as /ping)
- `/invocations` used for scoring
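Because the server needs a moment to boot its worker, a client script may want to poll the health endpoint before sending scoring requests. A minimal sketch using only the standard library; `ping`, `wait_until_ready`, and their parameters are hypothetical helper names, and the default timeouts are arbitrary assumptions.

```python
import time
import urllib.request


def ping(base_url, timeout_s=2.0):
    """Return True if GET <base_url>/ping answers with HTTP 200."""
    url = base_url.rstrip("/") + "/ping"
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def wait_until_ready(base_url, timeout_s=30.0, interval_s=0.5, probe=ping):
    """Poll `probe(base_url)` until it succeeds or `timeout_s` has elapsed."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(base_url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Example (needs the server from section 2.4 to be running):
#   if wait_until_ready("http://127.0.0.1:4242"):
#       print("model server is up")
```

The `probe` argument is injectable so the polling loop can be exercised without a live server.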


The REST API server accepts `csv` or `json` input. The input format must be specified in the `Content-Type` header. The value of that header must be either `application/json` or `application/csv`.

- **The `csv` input** must be a valid `pandas.DataFrame` csv representation. For example, `data = pandas_df.to_csv()`.

- **The `json` input** must be a dictionary with exactly one of the following fields, which further specify the type and encoding of the input data:
    - `dataframe_split` field with a pandas DataFrame in the split orientation. For example:

      `data = {"dataframe_split": pandas_df.to_dict(orient='split')}`

    - `dataframe_records` field with a pandas DataFrame in the records orientation. For example:

       `data = {"dataframe_records": pandas_df.to_dict(orient='records')}`

       *We do not recommend using this format because it is not guaranteed to preserve column ordering.*
       
    - `instances` field with tensor input formatted as described in [TF Serving’s API docs](https://www.tensorflow.org/tfx/serving/api_rest#request_format_2) where the provided inputs will be cast to Numpy arrays.
    - `inputs` field with tensor input formatted as described in [TF Serving’s API docs](https://www.tensorflow.org/tfx/serving/api_rest#request_format_2) where the provided inputs will be cast to Numpy arrays.

> NOTE:
> Since JSON loses type information, MLflow will cast the JSON input to the input type specified in the model’s schema if available. If your model is sensitive to input types, it is recommended that a schema is provided for the model to ensure that type mismatch errors do not occur at inference time. In particular, deep learning models are typically strict about input types and need a model schema in order to score correctly. For complex data types, see the "Encoding complex data" section of the MLflow documentation.
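The two DataFrame encodings can also be built without pandas. The sketch below constructs both payload shapes from plain column names and rows; the helper names `to_dataframe_split` and `to_dataframe_records` are hypothetical, and the sample values are the ones used in this section.

```python
import json


def to_dataframe_split(columns, rows):
    """Encode rows as a `dataframe_split` payload (column order is explicit)."""
    for row in rows:
        if len(row) != len(columns):
            raise ValueError(f"row {row!r} does not match columns {columns!r}")
    return {"dataframe_split": {"columns": list(columns), "data": [list(r) for r in rows]}}


def to_dataframe_records(columns, rows):
    """Encode rows as a `dataframe_records` payload (one dict per row)."""
    return {"dataframe_records": [dict(zip(columns, row)) for row in rows]}


columns = ["Culmen Length (mm)", "Culmen Depth (mm)"]
rows = [[1, 3], [14, 120]]

split_payload = to_dataframe_split(columns, rows)
records_payload = to_dataframe_records(columns, rows)

# Either string can be sent as the request body with Content-Type: application/json.
print(json.dumps(split_payload))
print(json.dumps(records_payload))
```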

**Query the API with cURL:**

```bash
$ curl http://127.0.0.1:4242/version
2.1.1

$ curl http://127.0.0.1:4242/health
```

**Query the model with cURL:**

```bash
# split-oriented DataFrame input
$ curl http://127.0.0.1:4242/invocations -H 'Content-Type: application/json' -d '{
  "dataframe_split": {
      "columns": ["Culmen Length (mm)", "Culmen Depth (mm)"],
      "data": [[1, 3], [14, 120]]
  }
}'
{"predictions": ["Adelie", "Adelie"]}
$

# record-oriented DataFrame input (fine for vector rows, loses ordering for JSON records)
$ curl http://127.0.0.1:4242/invocations -H 'Content-Type: application/json' -d '{
  "dataframe_records": [
    {"Culmen Length (mm)": 1,"Culmen Depth (mm)": 3},
    {"Culmen Length (mm)": 14,"Culmen Depth (mm)": 120}
  ]
}'
{"predictions": ["Adelie", "Adelie"]}
```

Or we can **call the API directly from Python:**

In [5]:
import requests
import json

scoring_uri = "http://127.0.0.1:4242/invocations"

# `sample_input` is a JSON-serialized pandas DataFrame with the `split` orientation
sample_input = {  
    "dataframe_split": {
      "columns": ["Culmen Length (mm)", "Culmen Depth (mm)"],
      "data": [[1, 3], [14, 120]]
    }
}
response = requests.post(
              url=scoring_uri, data=json.dumps(sample_input),
              headers={"Content-type": "application/json"})

print(response.status_code)
#print(response.text)
response_json = json.loads(response.text)
print(response_json)

# 200
# {'predictions': ['Adelie', 'Adelie']}

200
{'predictions': ['Adelie', 'Adelie']}


In [6]:
import requests
import json

scoring_uri = "http://127.0.0.1:4242/invocations"

# `sample_input` is a record-oriented DataFrame input (fine for vector rows, loses ordering for JSON records)
sample_input = {  
    "dataframe_records": [
      {"Culmen Length (mm)": 1,"Culmen Depth (mm)": 3},
      {"Culmen Length (mm)": 14,"Culmen Depth (mm)": 120}
    ]
}
response = requests.post(
              url=scoring_uri, data=json.dumps(sample_input),
              headers={"Content-type": "application/json"})

print(response.status_code)
#print(response.text)
response_json = json.loads(response.text)
print(response_json)

# 200
# {'predictions': ['Adelie', 'Adelie']}

200
{'predictions': ['Adelie', 'Adelie']}
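The two cells above repeat the same request logic. A reusable helper with a timeout and explicit error surfacing might look like the sketch below. It uses the standard library's `urllib` instead of `requests` purely to stay dependency-free; the names `build_scoring_request` and `score` and the 10-second default timeout are assumptions, not part of MLflow.

```python
import json
import urllib.request


def build_scoring_request(base_url, payload):
    """Create a POST request for the `/invocations` endpoint with a JSON body."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/invocations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def score(base_url, payload, timeout_s=10.0):
    """Send the payload and return the decoded JSON response.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, so a
    malformed payload surfaces as an exception rather than a silent failure.
    """
    request = build_scoring_request(base_url, payload)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))


sample_input = {
    "dataframe_split": {
        "columns": ["Culmen Length (mm)", "Culmen Depth (mm)"],
        "data": [[1, 3], [14, 120]],
    }
}
request = build_scoring_request("http://127.0.0.1:4242", sample_input)
print(request.get_method(), request.full_url)

# With the server from section 2.4 running:
#   print(score("http://127.0.0.1:4242", sample_input))
```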


**Serving with MLServer**

Python models can be deployed using [Seldon’s MLServer](https://mlserver.readthedocs.io/en/latest/) as an alternative inference server.

MLServer is integrated with two leading open source model deployment tools: 

- [Seldon Core](https://docs.seldon.io/projects/seldon-core/en/latest/graph/protocols.html#v2-kfserving-protocol) and 
- [KServe (formerly known as KFServing)](https://kserve.github.io/website/modelserving/v1beta1/sklearn/v2/), 

and can be used to test and deploy models using these frameworks. **This is especially powerful when building docker images since the docker image built with MLServer can be deployed directly with both of these frameworks.**

MLServer exposes the same scoring API through the `/invocations` endpoint. In addition, it supports the standard V2 Inference Protocol.

> **Note:**
>
> To use MLServer with MLflow, please install mlflow as:
>
> ```bash
> $ pip install mlflow[extras]
> ```

To serve an MLflow model using MLServer, you can use the `--enable-mlserver` flag, such as:

```bash
(mlflow)$ mlflow models serve -m my_model --enable-mlserver
```

Similarly, to build a Docker image with MLServer, you can use the `--enable-mlserver` flag, such as:

```bash
(mlflow)$ mlflow models build-docker -m my_model --enable-mlserver -n my-model
```

To read more about the integration between MLflow and MLServer, please check the [end-to-end example in the MLServer documentation](https://mlserver.readthedocs.io/en/latest/examples/mlflow/README.html) or visit the [MLServer docs](https://mlserver.readthedocs.io/en/latest/).

## 2.6 Other deployment targets

- **AzureML**: [Deploy a python_function model on Microsoft Azure ML](https://mlflow.org/docs/latest/models.html#deploy-a-python-function-model-on-microsoft-azure-ml) 
- **SageMaker**: [Deploy a python_function model on Amazon SageMaker](https://mlflow.org/docs/latest/models.html#deploy-a-python-function-model-on-amazon-sagemaker)
- Kubernetes
- ...

## 2.7 Transition a model's stage

Over the course of its lifecycle, a model evolves from development to staging to production.

You can transition a registered model to one of the stages: **Staging, Production or Archived.**

In [None]:
client = MlflowClient()

client.transition_model_version_stage(
    name=model_name,
    version=1,
    stage="Production"
)

<img src='../img/mlflow_ui_pinguins_model_version_1_promoted_to_production.png' alt='' width='1000'>

<img src='../img/mlflow_ui_pinguins_model_version_1_promoted_to_production_2.png' alt='' width='1000'>

<img src='../img/mlflow_ui_pinguins_model_version_1_promoted_to_production_3.png' alt='' width='1000'>
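When a new version is promoted, the version that previously held the stage usually needs to be archived. `transition_model_version_stage` accepts an `archive_existing_versions` flag for this. The wrapper below is a sketch under that assumption; the name `promote`, the `VALID_STAGES` tuple, and the recording stand-in client are hypothetical and only there so the example runs without a registry.

```python
VALID_STAGES = ("None", "Staging", "Production", "Archived")


def promote(client, name, version, stage="Production", archive_existing=True):
    """Move one model version to `stage`, optionally archiving the versions already there.

    `client` is expected to behave like `mlflow.tracking.MlflowClient`.
    """
    if stage not in VALID_STAGES:
        raise ValueError(f"unknown stage {stage!r}; expected one of {VALID_STAGES}")
    return client.transition_model_version_stage(
        name=name,
        version=version,
        stage=stage,
        archive_existing_versions=archive_existing,
    )


class _RecordingClient:
    """Stand-in for MlflowClient that only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def transition_model_version_stage(self, **kwargs):
        self.calls.append(kwargs)
        return kwargs


fake = _RecordingClient()
promote(fake, "penguins_clf", 1)
print(fake.calls)

# With a real registry:
#   from mlflow.tracking import MlflowClient
#   promote(MlflowClient(), "penguins_clf", 2)
```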