In [13]:
# author: Milan Mulji 

In [8]:
# Wine Quality example

import os
import warnings
import sys
import glob
import shutil

import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet

import mlflow              as mf
import mlflow.sklearn

# MLflow Models API

## Introduction

In this section we look at the Models component of MLflow and then deploy a sample model using a number of different mechanisms.

## MLflow Models

MLflow Models provide:

- A cross-library format for packaging machine learning models
- A converter layer between many model-producing libraries and many deployment targets
- Deployment to a REST endpoint, Apache Spark, AWS SageMaker or Azure ML


## Supported Models 

MLflow Models supports the following model types (flavour names in parentheses):

- Custom models
- Python Function (python_function)
- R Function (crate)
- H2O (h2o)
- Keras (keras)
- MLeap (mleap)
- PyTorch (pytorch)
- Scikit-learn (sklearn)
- Spark MLlib (spark)
- TensorFlow (tensorflow)
- ONNX (onnx)

## What does an MLflow model look like?

Each run (remember the run IDs from the Tracking component) contains an **artifacts/model** directory. It should look something like this:

```sh
│   └── mlruns
│       └── 1
│           ├── be96e970153640ee92907f30aef92374
│           │   └── artifacts
│           │       └── model
│           │           ├── MLmodel
│           │           ├── conda.yaml
│           │           └── model.pkl
```


That directory contains 3 files:

- conda.yaml file
- MLmodel file
- model.pkl file
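
As a quick sanity check of the layout described above, the following minimal sketch reports which of the three files are absent from a model directory. The helper name `missing_model_files` is hypothetical (it is not part of MLflow), and the file names are simply the ones shown in the listing; other flavours may store different files.

```python
import os
import tempfile

# File names taken from the directory listing above (sklearn flavour).
EXPECTED_FILES = ("MLmodel", "conda.yaml", "model.pkl")


def missing_model_files(model_dir):
    """Return the expected file names that are absent from model_dir."""
    return [name for name in EXPECTED_FILES
            if not os.path.isfile(os.path.join(model_dir, name))]


# Demonstration on a throwaway directory that only contains an MLmodel file.
with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, "MLmodel"), "w").close()
    demo_missing = missing_model_files(demo_dir)
    print(demo_missing)
```

Pointing the helper at a real `artifacts/model` directory should return an empty list when the model was logged as shown in this notebook.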


### Conda.yaml File

The conda.yaml file lists the dependencies required to run this particular model.

```yaml
channels:
- defaults
dependencies:
- python=3.7.4
- scikit-learn=0.21.3
- pip:
  - mlflow
  - cloudpickle==1.2.2
name: mlflow-env
```

### MLmodel File

The MLmodel file contains all the metadata about the model in a human-readable format. Note that it also records the **run id** of the run that generated the model.

```yaml
artifact_path: model
flavors:
  python_function:
    data: model.pkl
    env: conda.yaml
    loader_module: mlflow.sklearn
    python_version: 3.7.4
  sklearn:
    pickled_model: model.pkl
    serialization_format: cloudpickle
    sklearn_version: 0.21.3
run_id: be96e970153640ee92907f30aef92374
utc_time_created: '2019-11-26 12:00:39.822403'

```

The example above saves two flavours of our model: **python_function** and **sklearn**.

### model.pkl file

The model.pkl file is a pickle-serialised Python representation of the model. In this example the MLmodel file records `cloudpickle` as the serialisation format.
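
To illustrate what "pickle serialised" means, here is a minimal sketch using only the standard-library `pickle` module. The `model_state` dictionary and the `predict` helper are hypothetical stand-ins for a fitted estimator; the real model.pkl is written by MLflow (with cloudpickle in this example) and is normally loaded back through MLflow rather than by hand.

```python
import pickle

# Hypothetical stand-in for a fitted model: just its learned parameters.
model_state = {"coef": [0.5, -1.25], "intercept": 3.0}

# Serialise to bytes (what would be written to a .pkl file) and restore.
blob = pickle.dumps(model_state)
restored = pickle.loads(blob)


def predict(state, row):
    """Linear prediction from the restored parameters."""
    return state["intercept"] + sum(c * x for c, x in zip(state["coef"], row))


print(predict(restored, [2.0, 4.0]))  # 3.0 + 0.5*2.0 - 1.25*4.0 = -1.0
```

Only unpickle files you trust: loading a pickle can execute arbitrary code.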


## Command Line API

The command line API supports using an MLflow model and deploying it to a number of different targets.  
To deploy models locally:

```sh
mlflow models build-docker  	# **EXPERIMENTAL**: Builds a Docker image whose default...
mlflow models predict       	# Generate predictions in json format using a saved MLflow...
mlflow models serve         	# Serve a model saved with MLflow by launching a webserver on...
```

To deploy models remotely:
```sh
mlflow azureml      # Serve models on Azure ML
mlflow sagemaker    # Serve models on SageMaker
```

## Hands On: Let's do some Model Training ...

In the example below we will train a model on wine quality.

In [11]:
# We start off by pointing MLflow at the tracking server that will record this experiment
mf.set_tracking_uri("http://127.0.0.1:5000")

In [12]:
# We give the experiment a descriptive name; it is created if it does not exist yet.
# The return value is not needed below, so it is not stored.
mf.set_experiment("Models - Wine Quality")

INFO: 'Models - Wine Quality' does not exist. Creating a new experiment


In [13]:
import logging

warnings.filterwarnings("ignore")
np.random.seed(40)

# The except branch below needs a logger; it was never defined in the original cell
logger = logging.getLogger(__name__)

# Read the wine-quality csv file from the URL
csv_url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv'
try:
    data = pd.read_csv(csv_url, sep=';')
    print("Sample Wine Quality data\n\n")
    print(data)
except Exception as e:
    logger.exception(
        "Unable to download training & test CSV, check your internet connection. Error: %s", e)
    raise  # the cells below cannot run without the data


Sample Wine Quality data


      fixed acidity  volatile acidity  citric acid  residual sugar  chlorides  \
0               7.4             0.700         0.00             1.9      0.076   
1               7.8             0.880         0.00             2.6      0.098   
2               7.8             0.760         0.04             2.3      0.092   
3              11.2             0.280         0.56             1.9      0.075   
4               7.4             0.700         0.00             1.9      0.076   
...             ...               ...          ...             ...        ...   
1594            6.2             0.600         0.08             2.0      0.090   
1595            5.9             0.550         0.10             2.2      0.062   
1596            6.3             0.510         0.13             2.3      0.076   
1597            5.9             0.645         0.12             2.0      0.075   
1598            6.0             0.310         0.47             3.6      0.067   



In [14]:
# Function to calculate our model quality metrics

def eval_metrics(actual, pred):
    rmse = np.sqrt(mean_squared_error(actual, pred))
    mae = mean_absolute_error(actual, pred)
    r2 = r2_score(actual, pred)
    return rmse, mae, r2
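
For readers who want to see what the three metrics compute, here is a minimal plain-Python sketch of the same formulas on a tiny hand-made example. The function name `eval_metrics_plain` and the sample numbers are illustrative assumptions; the notebook itself keeps using the scikit-learn functions above.

```python
import math


def eval_metrics_plain(actual, pred):
    """RMSE, MAE and R2 computed directly from their definitions."""
    n = len(actual)
    residuals = [a - p for a, p in zip(actual, pred)]
    ss_res = sum(r * r for r in residuals)           # residual sum of squares
    rmse = math.sqrt(ss_res / n)                     # root mean squared error
    mae = sum(abs(r) for r in residuals) / n         # mean absolute error
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot                       # coefficient of determination
    return rmse, mae, r2


# Residuals are 1, 0 and -2, so ss_res = 5 and ss_tot = 8.
rmse_demo, mae_demo, r2_demo = eval_metrics_plain([3, 5, 7], [2, 5, 9])
print(rmse_demo, mae_demo, r2_demo)
```

Here MAE is 1.0 and R2 is 1 - 5/8 = 0.375. An R2 close to 0 means the model explains little more variance than always predicting the mean.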

In [15]:
# Split the data into training and test sets. (0.75, 0.25) split.
train, test = train_test_split(data)

# The predicted column is "quality" which is a scalar from [3, 9]
train_x = train.drop(["quality"], axis=1)
test_x  = test.drop(["quality"], axis=1)
train_y = train[["quality"]]
test_y  = test[["quality"]]

alpha    =  0.5
l1_ratio =  0.5

with mlflow.start_run():
    
    lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
    lr.fit(train_x, train_y)

    predicted_qualities = lr.predict(test_x)

    (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

    print("\nElasticnet model (alpha=%f, l1_ratio=%f):" % (alpha, l1_ratio))
    print("  RMSE: %s" % rmse)
    print("  MAE: %s" % mae)
    print("  R2: %s" % r2)

    mlflow.log_param("alpha", alpha)
    mlflow.log_param("l1_ratio", l1_ratio)
    mlflow.log_metric("rmse", rmse)
    mlflow.log_metric("r2", r2)
    mlflow.log_metric("mae", mae)

    mlflow.sklearn.log_model(lr, "model")


Elasticnet model (alpha=0.500000, l1_ratio=0.500000):
  RMSE: 0.7931640229276851
  MAE: 0.6271946374319586
  R2: 0.10862644997792614


## Hands On: Deploy your MLflow model locally

In this hands-on demo you will deploy an MLflow model locally to a REST endpoint and test it by invoking it with the curl utility.

If you don't have curl, you can install a copy from here: https://curl.haxx.se/
Alternatively you can use Postman or a similar tool to send HTTP requests.


In [17]:
# First find the run-id of the model that we want to serve

# Get a list of all run-ids in the mlruns directory
runIds = glob.glob("./mlruns/*/*")

# Print out what we've found
print("All runId's:")
print(runIds)

# Extract the first, plus rest directories
head, *tail = runIds
selected_run_id = os.path.basename(head)

# Print out what we've found
print("\n")
print("First run directory is : " + head)
print("Run Id                 :  " + selected_run_id)

All runId's:
['./mlruns/1/bf376c8115904525ac9a0b92d89b28ea', './mlruns/1/be96e970153640ee92907f30aef92374', './mlruns/1/e3ad359f9d2b408da08c3ef0c770464d', './mlruns/2/8afdf33e64e24f7cb8e6e5fbaf503042']


First run directory is : ./mlruns/1/bf376c8115904525ac9a0b92d89b28ea
Run Id                 :  bf376c8115904525ac9a0b92d89b28ea
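
The cell above simply takes the first directory that `glob` returns, which is not necessarily the run you just trained. A minimal sketch of one alternative is to pick the most recently modified run directory. The helper `newest_run_dir` is hypothetical, and modification time is only a proxy for "latest run"; it assumes the local `mlruns/<experiment>/<run>` layout shown earlier.

```python
import os
import tempfile


def newest_run_dir(mlruns_root):
    """Return the most recently modified run directory, or None if there is none."""
    candidates = []
    for experiment in os.listdir(mlruns_root):
        experiment_dir = os.path.join(mlruns_root, experiment)
        # Skip plain files and hidden directories at the experiment level.
        if experiment.startswith(".") or not os.path.isdir(experiment_dir):
            continue
        for run in os.listdir(experiment_dir):
            run_dir = os.path.join(experiment_dir, run)
            if os.path.isdir(run_dir):  # ignores metadata files next to the runs
                candidates.append(run_dir)
    return max(candidates, key=os.path.getmtime, default=None)


# Demonstration on a throwaway layout with two fake runs and fixed timestamps.
with tempfile.TemporaryDirectory() as root:
    old_run = os.path.join(root, "1", "run_a")
    new_run = os.path.join(root, "1", "run_b")
    os.makedirs(old_run)
    os.makedirs(new_run)
    os.utime(old_run, (1_000_000_000, 1_000_000_000))
    os.utime(new_run, (2_000_000_000, 2_000_000_000))
    demo_run_id = os.path.basename(newest_run_dir(root))
    print(demo_run_id)
```

In the notebook you would call `newest_run_dir("./mlruns")` and take `os.path.basename` of the result as the run id.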


In [None]:
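# Substitute the run-id into the serve instruction, for example:
# !mlflow models serve -m "runs:/e3ad359f9d2b408da08c3ef0c770464d/model" -p 1234
# Without -p the server listens on port 5000, which the tracking server above already uses,
# so pass a free port and use the same port in the requests you send afterwards.
! mlflow models serve -m runs:/my-run-id/model-path &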


Remember the columns from our Dataset:

```sh
fixed acidity  volatile acidity  citric acid  residual sugar  chlorides  \
0               7.4             0.700         0.00             1.9      0.076   
 
      free sulfur dioxide  total sulfur dioxide  density    pH  sulphates  \
0                    11.0                  34.0  0.99780  3.51       0.56   

      alcohol  quality
0         9.4        5  
```

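We can now call the deployed REST API. The request body names the columns in the order the values are supplied, followed by the rows to run inference on (two rows in the generic example below). Send only the feature columns the model was trained on, and make sure the port in the URL is the one the model server is listening on.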

```sh
curl http://127.0.0.1:5000/invocations -H 'Content-Type: application/json' -d '{"columns": ["a", "b", "c"], "data": [[1, 2, 3], [4, 5, 6]]}'
```
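
The same request can be assembled from Python. The sketch below only builds the request object with the standard library and does not send it, so it runs without a server. The helper name `build_invocation_request` is hypothetical, the URL is copied from the curl example, and the payload shape (`columns` plus `data`) is the one that example uses; other MLflow versions may expect a different JSON envelope.

```python
import json
import urllib.request


def build_invocation_request(url, columns, rows):
    """Build a POST request carrying the rows in the columns/data JSON layout."""
    payload = {"columns": list(columns), "data": [list(row) for row in rows]}
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# The 11 feature columns of the wine data; "quality" is the target and is left out.
feature_columns = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol",
]
# First row of the dataset shown above, without the quality value.
sample_row = [7.4, 0.700, 0.00, 1.9, 0.076, 11.0, 34.0, 0.99780, 3.51, 0.56, 9.4]

request = build_invocation_request(
    "http://127.0.0.1:5000/invocations", feature_columns, [sample_row]
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# Once the model server is running, the request could be sent with:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```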

In [20]:

# Send only the 11 feature columns; "quality" is the target the model predicts
! curl http://127.0.0.1:5000/invocations -H 'Content-Type: application/json' -d '{"columns": ["fixed acidity", "volatile acidity", "citric acid", "residual sugar", "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density", "pH", "sulphates", "alcohol"], "data": [[7.4, 0.700, 0.00, 1.9, 0.076, 11.0, 34.0, 0.99780, 3.51, 0.56, 9.4]]}'

curl: (7) Failed to connect to 127.0.0.1 port 5000: Connection refused


## Deploying to AWS SageMaker

```sh
Usage: mlflow sagemaker [OPTIONS] COMMAND [ARGS]...

  Serve models on SageMaker.

  To serve a model associated with a run on a tracking server, set the
  MLFLOW_TRACKING_URI environment variable to the URL of the desired server.

Options:
  --help  Show this message and exit.

Commands:
  build-and-push-container  Build new MLflow Sagemaker image, assign it a...
  delete                    Delete the specified application.
  deploy                    Deploy model on Sagemaker as a REST API...
  run-local                 Serve model locally running in a...
```

In [None]:
# ! mlflow sagemaker deploy -m my_model --app-name [other options]