## MLFlow Scikit-Learn Model Deployment via Web Endpoint 
> - Tested on a MacBook Pro running macOS Monterey 12.1 (2.2 GHz Quad-Core Intel Core i7, 16 GB DDR3 memory)
> - See https://github.com/maximuslee1226/mlflow for the notebooks and artifacts

## MLFlow Models

### Introduction

In this notebook, we look at the MLFlow Models component and deploy a model in several ways.

### MLFlow Models

MLFlow Models provide:

- A cross-library format for packaging machine learning models
- A converter between many model input types (training libraries) and many output types (deployment tools)
- Deployment to a REST endpoint, Apache Spark, AWS SageMaker, or Azure ML

### MLFlow Model Types

MLFlow Models supports the following model types (flavors):

- H2O (h2o)
- Scikit-learn (sklearn)
- Spark MLlib (spark)
- TensorFlow (tensorflow)
- Keras (keras)
- PyTorch (pytorch)
- ONNX (onnx)
- Python Function (python_function)
- R Function (crate)
- MLeap (mleap)
- Custom models


### MLFlow model format

The artifact store has one directory per experiment ID (14 below). Inside it, each run ID directory stores its model artifacts under the artifacts/model directory.

```sh
   └── artifact_store
       └── 14
           ├── 000db1149e0e49039ed3f44080dbab74
           │   └── artifacts
           │       └── model
           │           ├── MLmodel
           │           ├── conda.yaml
           │           └── model.pkl
```

The final model directory contains three files:

> - conda.yaml file
> - model.pkl file
> - MLmodel file
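
The layout above can be checked programmatically. The following is a minimal sketch, not part of the original notebook: `find_model_dirs` and `missing_files` are hypothetical helper names, and the code assumes the `<experiment>/<run>/artifacts/model` layout shown in the tree. It builds a throwaway copy of that layout so it runs without a real artifact store.

```python
from pathlib import Path
import tempfile

# File names taken from the directory listing above.
EXPECTED_FILES = ("MLmodel", "conda.yaml", "model.pkl")


def find_model_dirs(artifact_store):
    """Yield (experiment_id, run_id, model_dir) for each artifacts/model directory."""
    for model_dir in sorted(Path(artifact_store).glob("*/*/artifacts/model")):
        run_dir = model_dir.parent.parent
        yield run_dir.parent.name, run_dir.name, model_dir


def missing_files(model_dir):
    """Return the expected file names that are absent from model_dir."""
    return [name for name in EXPECTED_FILES if not (Path(model_dir) / name).is_file()]


# Demo on a temporary directory that mimics the layout (model.pkl left out on purpose).
with tempfile.TemporaryDirectory() as tmp:
    model = Path(tmp) / "14" / "000db1149e0e49039ed3f44080dbab74" / "artifacts" / "model"
    model.mkdir(parents=True)
    for name in ("MLmodel", "conda.yaml"):
        (model / name).touch()
    for exp_id, run_id, model_dir in find_model_dirs(tmp):
        print(exp_id, run_id, "missing:", missing_files(model_dir))
```

Pointing `find_model_dirs` at a real artifact store directory instead of the temporary one lists every logged model and flags any run whose model directory is incomplete.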

### conda.yaml file

The conda.yaml file describes the environment to build: its name, channels, and dependencies.

```yaml
name: mlflowtest
channels: 
  - defaults
  - pytorch
  - conda-forge
dependencies:
  - python=3.7
  - numpy=1.17.3
  - pandas=0.25.3
  - jupyterlab=1.0.10 
  - scikit-learn=0.21.3
  - matplotlib=3.1.2
  - mlflow=1.4.0
  - torch=1.3.1
  - pytorch-cpu=1.1.0
  - torchvision=0.4.2
  - xgboost==0.90
```
### model.pkl file

The model.pkl file is the trained model serialized in Python's pickle format.


### MLmodel File

The MLmodel file holds the model's metadata in a human-readable YAML format: the available flavors, how to load each one, and the ID of the run that created the model.

```yaml
artifact_path: model
flavors:
  python_function:
    data: model.pkl
    env: conda.yaml
    loader_module: mlflow.sklearn
    python_version: 3.7
  sklearn:
    pickled_model: model.pkl
    serialization_format: pickle
    sklearn_version: 0.21.3
run_id: 000db1149e0e49039ed3f44080dbab74
utc_time_created: '2021-12-26 1:03:48.337733'

```
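To make the flavor structure concrete, here is a minimal sketch that lists the flavors recorded in the MLmodel text above. `read_flavors` is a hypothetical helper written for this illustration. It relies only on the two-space indentation of this particular file and is not a general YAML parser, so a YAML library would be the more robust choice in practice.

```python
# MLmodel content copied from the example above (utc_time_created omitted).
MLMODEL_TEXT = """\
artifact_path: model
flavors:
  python_function:
    data: model.pkl
    env: conda.yaml
    loader_module: mlflow.sklearn
    python_version: 3.7
  sklearn:
    pickled_model: model.pkl
    serialization_format: pickle
    sklearn_version: 0.21.3
run_id: 000db1149e0e49039ed3f44080dbab74
"""


def read_flavors(text):
    """Return {flavor_name: {key: value}} for the 'flavors' section.

    Hypothetical helper: handles only the simple nested layout shown above.
    All values are returned as strings.
    """
    flavors = {}
    in_flavors = False
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        key, _, value = line.strip().partition(":")
        value = value.strip()
        if indent == 0:
            # A top-level key starts or ends the flavors section.
            in_flavors = key == "flavors"
        elif in_flavors and indent == 2:
            current = flavors.setdefault(key, {})
        elif in_flavors and indent == 4 and current is not None:
            current[key] = value
    return flavors


flavors = read_flavors(MLMODEL_TEXT)
for name, settings in flavors.items():
    print(name, "->", settings)
```

The same pickle file appears under both flavors: `sklearn` exposes it as a native scikit-learn object, while `python_function` names the loader module that generic tooling uses.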

### Command Line API

The `mlflow models` command line interface deploys a saved MLFlow model to several targets:

> - mlflow models build-docker: builds a Docker image for serving the model
> - mlflow models predict: generates predictions in JSON format from a saved MLFlow model
> - mlflow models serve: serves an MLFlow model through a REST web endpoint
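
As an illustration of the `serve` subcommand, the sketch below assembles its argument list in Python. `build_serve_command` is a hypothetical helper, and the model path is a placeholder. Only the flags that appear later in this notebook (`--no-conda`, `-m`, `-p`) are used. Newer MLFlow releases may replace `--no-conda` with an `--env-manager` option, so check `mlflow models serve --help` for your version.

```python
import shlex


def build_serve_command(model_dir, port, use_conda=False):
    """Return the argv list for `mlflow models serve` (hypothetical helper)."""
    cmd = ["mlflow", "models", "serve", "-m", str(model_dir), "-p", str(port)]
    if not use_conda:
        # --no-conda serves from the current Python environment instead of
        # building the environment described in conda.yaml.
        cmd.insert(3, "--no-conda")
    return cmd


# Placeholder path: substitute the artifacts/model directory of a real run.
cmd = build_serve_command("path/to/artifacts/model", 1234)
print(" ".join(shlex.quote(part) for part in cmd))

# To actually start the server (blocks until interrupted):
# import subprocess
# subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when the model path contains spaces.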


### Wine Quality Model Training

> - Train an ElasticNet regression model on the red wine quality dataset

In [1]:
import os
import sys
import glob
import shutil

if not sys.warnoptions:
    import warnings
    warnings.simplefilter("ignore")

In [2]:
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

import mlflow as mf
import mlflow.sklearn

In [3]:
mf.set_tracking_uri("http://127.0.0.1:5000")

In [4]:
# Make "Wine Quality Models" the active experiment (created if it does not exist).
mf.set_experiment("Wine Quality Models")

In [5]:
import logging

logger = logging.getLogger(__name__)

np.random.seed(40)

csv_url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv'
try:
    data = pd.read_csv(csv_url, sep=';')
    print("Wine Quality data\n\n")
    print(data)
except Exception as e:
    # Log the failure, then re-raise: later cells cannot run without `data`.
    logger.exception("ERROR: Download training & test data failed %s", e)
    raise

Wine Quality data


      fixed acidity  volatile acidity  citric acid  residual sugar  chlorides  \
0               7.4             0.700         0.00             1.9      0.076   
1               7.8             0.880         0.00             2.6      0.098   
2               7.8             0.760         0.04             2.3      0.092   
3              11.2             0.280         0.56             1.9      0.075   
4               7.4             0.700         0.00             1.9      0.076   
...             ...               ...          ...             ...        ...   
1594            6.2             0.600         0.08             2.0      0.090   
1595            5.9             0.550         0.10             2.2      0.062   
1596            6.3             0.510         0.13             2.3      0.076   
1597            5.9             0.645         0.12             2.0      0.075   
1598            6.0             0.310         0.47             3.6      0.067   


In [6]:
def eval_metrics(actual, pred):
    rmse = np.sqrt(mean_squared_error(actual, pred))
    mae = mean_absolute_error(actual, pred)
    r2 = r2_score(actual, pred)
    return rmse, mae, r2
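
For readers who want to see what these three metrics compute, here is a plain-Python sketch of the same formulas. `eval_metrics_plain` is a hypothetical name, and the sample values are made up for illustration, not taken from the wine dataset. The notebook itself uses the scikit-learn functions shown above.

```python
import math


def eval_metrics_plain(actual, pred):
    """Compute RMSE, MAE and R2 from two equal-length sequences of numbers."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, pred)]
    ss_res = sum(e * e for e in errors)              # residual sum of squares
    rmse = math.sqrt(ss_res / n)                     # root mean squared error
    mae = sum(abs(e) for e in errors) / n            # mean absolute error
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                       # coefficient of determination
    return rmse, mae, r2


# Made-up quality scores for illustration only.
rmse, mae, r2 = eval_metrics_plain([5, 6, 7], [5, 5, 8])
print("RMSE:", rmse)
print("MAE:", mae)
print("R2:", r2)
```

In this toy case R2 is 0, meaning the predictions explain no more variance than always predicting the mean score. An R2 close to 1 would indicate a good fit.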

In [7]:
# Split the data into training and test sets. (0.75, 0.25) split.
train, test = train_test_split(data)

# The predicted column is "quality" which is a scalar from [3, 9]
train_x = train.drop(["quality"], axis=1)
test_x  = test.drop(["quality"], axis=1)
train_y = train[["quality"]]
test_y  = test[["quality"]]

alpha    =  0.5
l1_ratio =  0.5

with mlflow.start_run():
    
    lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
    lr.fit(train_x, train_y)

    predicted_qualities = lr.predict(test_x)

    (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

    # Message text and indentation match the recorded output below.
    print("Elasticnet model (alpha=%f, l1_ratio=%f):" % (alpha, l1_ratio))
    print("  RMSE: %s" % rmse)
    print("  MAE: %s" % mae)
    print("  R2: %s" % r2)

    mlflow.log_param("alpha", alpha)
    mlflow.log_param("l1_ratio", l1_ratio)
    mlflow.log_metric("rmse", rmse)
    mlflow.log_metric("r2", r2)
    mlflow.log_metric("mae", mae)

    mlflow.sklearn.log_model(lr, "model")


Elasticnet model (alpha=0.500000, l1_ratio=0.500000):
  RMSE: 0.7931640229276851
  MAE: 0.6271946374319587
  R2: 0.10862644997792614


## Deploy MLFlow model in local mode

> - Let's deploy an MLFlow model to a local REST endpoint and send it real data for inference, using Postman or curl

In [None]:
!mlflow models serve --no-conda -m "/Users/brandonl/projects/mlflow/fastcamp/artifact_store/14/fd845c569ffe4f90ab4baa533cbc9520/artifacts/model" -p 1234

2022/01/16 14:45:16 INFO mlflow.models.cli: Selected backend for flavor 'python_function'
2022/01/16 14:45:16 INFO mlflow.pyfunc.backend: === Running command 'gunicorn --timeout=60 -b 127.0.0.1:1234 -w 1 ${GUNICORN_CMD_ARGS} -- mlflow.pyfunc.scoring_server.wsgi:app'
[2022-01-16 14:45:17 -0800] [71076] [INFO] Starting gunicorn 20.1.0
[2022-01-16 14:45:17 -0800] [71076] [INFO] Listening at: http://127.0.0.1:1234 (71076)
[2022-01-16 14:45:17 -0800] [71076] [INFO] Using worker: sync
[2022-01-16 14:45:17 -0800] [71079] [INFO] Booting worker with pid: 71079
[2022-01-16 14:46:44 -0800] [71076] [CRITICAL] WORKER TIMEOUT (pid:71079)
[2022-01-16 14:46:44 -0800] [71079] [INFO] Worker exiting (pid: 71079)
[2022-01-16 14:46:44 -0800] [71087] [INFO] Booting worker with pid: 71087
[2022-01-16 18:59:56 -0800] [71076] [CRITICAL] WORKER TIMEOUT (pid:71087)
[2022-01-16 18:59:56 -0800] [71087] [INFO] Worker exiting (pid: 71087)
[2022-01-16 18:59:58 -0800] [74233] [INFO] Booting worker with pid: 74233
...

[2022-01-17 04:27:47 -0800] [77124] [INFO] Booting worker with pid: 77124
[2022-01-17 04:41:02 -0800] [71076] [CRITICAL] WORKER TIMEOUT (pid:77124)
[2022-01-17 04:41:02 -0800] [77124] [INFO] Worker exiting (pid: 77124)
[2022-01-17 04:41:03 -0800] [77185] [INFO] Booting worker with pid: 77185
[2022-01-17 05:50:41 -0800] [71076] [CRITICAL] WORKER TIMEOUT (pid:77185)
[2022-01-17 05:50:41 -0800] [77185] [INFO] Worker exiting (pid: 77185)
[2022-01-17 05:50:42 -0800] [77226] [INFO] Booting worker with pid: 77226
[2022-01-17 06:00:16 -0800] [71076] [CRITICAL] WORKER TIMEOUT (pid:77226)
[2022-01-17 06:00:17 -0800] [77226] [INFO] Worker exiting (pid: 77226)
[2022-01-17 06:00:17 -0800] [77304] [INFO] Booting worker with pid: 77304
[2022-01-17 07:00:01 -0800] [71076] [CRITICAL] WORKER TIMEOUT (pid:77304)
[2022-01-17 07:00:01 -0800] [77304] [INFO] Worker exiting (pid: 77304)
[2022-01-17 07:00:02 -0800] [78002] [INFO] Booting worker with pid: 78002
...

### Curl command

With the server running, send one row of wine measurements to the `/invocations` endpoint. The JSON body lists the column names and the data rows separately:

curl http://127.0.0.1:1234/invocations -H 'Content-Type: application/json' -d '{"columns": ["fixed acidity", "volatile acidity", "citric acid", "residual sugar", "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density", "pH", "sulphates", "alcohol"], "data": [[8.0, 0.164, 0.03, 2.9, 0.162, 12.0, 54.0, 0.97660, 4.36, 0.32, 8.9]]}'
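
The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl command: same URL, same header, same JSON body. `build_request` is a hypothetical helper name. The payload shape matches the MLFlow version used in this notebook; newer MLFlow releases may expect the frame wrapped under a `dataframe_split` key, so treat the body format as an assumption to verify against your version.

```python
import json
import urllib.request

COLUMNS = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol",
]


def build_request(url, columns, rows):
    """Build a POST request carrying the same JSON body as the curl command."""
    body = json.dumps({"columns": columns, "data": rows}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request(
    "http://127.0.0.1:1234/invocations",
    COLUMNS,
    [[8.0, 0.164, 0.03, 2.9, 0.162, 12.0, 54.0, 0.97660, 4.36, 0.32, 8.9]],
)
print(request.get_method(), request.full_url)

# Sending requires the server started by `mlflow models serve` to be running:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```

Each inner list in `rows` is one sample, so several wines can be scored in a single call by appending more rows.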