[![Presented by Aim2](aim2.png)](https://www.youtube.com/watch?v=p4z2FDSfZb4)

# The Machine Learning Workflow
![MLworkflow](ml_workflow.png)

# Managing the complex ML lifecycle with
![MLflow](MLFlow-logo-final-black-50.png)
## Three components: Tracking, Projects, Models

## Tracking

The MLflow Tracking component is an API and UI for logging parameters, code versions, metrics, and output files when running your machine learning code and for later visualizing the results. MLflow Tracking lets you log and query experiments using both Python and REST APIs.

By default, wherever you run your program, the tracking API writes its data to files in a local mlruns directory. You can then run MLflow’s Tracking UI:

*mlflow ui*

and view it at http://localhost:5000
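The default file store can also be inspected directly from Python. The sketch below assumes the layout `mlruns/<experiment_id>/<run_id>/artifacts`, which matches the model path used in the Models section; `list_runs` is a hypothetical helper, not part of MLflow, and the exact directory layout may differ between MLflow versions.

```python
from pathlib import Path


def list_runs(root="mlruns"):
    """Return (experiment_id, run_id) pairs found under a local mlruns directory.

    Assumes each run lives at <root>/<experiment_id>/<run_id>/ and contains an
    'artifacts' subdirectory; any other directory is skipped.
    """
    runs = []
    root_path = Path(root)
    if not root_path.is_dir():
        return runs
    for experiment_dir in sorted(p for p in root_path.iterdir() if p.is_dir()):
        for run_dir in sorted(p for p in experiment_dir.iterdir() if p.is_dir()):
            if (run_dir / "artifacts").is_dir():
                runs.append((experiment_dir.name, run_dir.name))
    return runs


# Prints nothing if no mlruns directory exists in the working directory.
for experiment_id, run_id in list_runs():
    print(experiment_id, run_id)
```

For anything beyond a quick look, the Tracking UI or the tracking API is the safer route, since the on-disk format is an implementation detail.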

Alternatively, you can configure MLflow to log runs to a remote server to manage your results centrally or share them across a team.

The MLflow Tracking API lets you log metrics and artifacts (files) from your data science code and see a history of your runs. 

In [1]:
import random

import mlflow
from mlflow import log_metric, log_param, log_artifact

In [2]:
with mlflow.start_run():
    # Log a parameter (key-value pair)
    log_param("param1", 5)

    # Log a metric; metrics can be updated throughout the run
    log_metric("foo", 1)
    log_metric("foo", 2)
    log_metric("foo", 3)


    # Log an artifact (output file)
    with open("output.txt", "w") as f:
        f.write("Hello world!")
    log_artifact("output.txt")

In [3]:
def mock_loss(min_x, max_x):
    for x in range(min_x, max_x):
        y = 1/x
        random_factor = 0.2
        yield y*random.uniform(1 - random_factor, 1 + random_factor)

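with mlflow.start_run():
    # Log the value of a function (e.g. a loss function) once per step
    num_steps = 10
    mlflow.log_param("num_steps", num_steps)
    # range() excludes its upper bound, so pass num_steps + 1 to log num_steps values
    for y in mock_loss(1, num_steps + 1):
        log_metric("loss_function", y)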

### Scikit-learn

In [4]:
import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet
import mlflow.sklearn

def eval_metrics(actual, pred):
    rmse = np.sqrt(mean_squared_error(actual, pred))
    mae = mean_absolute_error(actual, pred)
    r2 = r2_score(actual, pred)
    return rmse, mae, r2

np.random.seed(40)

# Read the wine-quality CSV file (adjust this absolute path to where the file lives on your machine)
wine_path = "/Users/luca/mlops/sklearn_elasticnet_wine/wine-quality.csv"
data = pd.read_csv(wine_path)

# Split the data into training and test sets. (0.75, 0.25) split.
train, test = train_test_split(data)

# The predicted column is "quality" which is a scalar from [3, 9]
train_x = train.drop(["quality"], axis=1)
test_x = test.drop(["quality"], axis=1)
train_y = train[["quality"]]
test_y = test[["quality"]]

def train_wine_regression(alpha, l1_ratio, exp_id):
    # exp_id is used as the experiment *name*; the experiment is created if it does not exist yet
    mlflow.set_experiment(exp_id)
    with mlflow.start_run():
        lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
        lr.fit(train_x, train_y)

        predicted_qualities = lr.predict(test_x)

        (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

        print("Elasticnet model (alpha=%f, l1_ratio=%f):" % (alpha, l1_ratio))
        print("  RMSE: %s" % rmse)
        print("  MAE: %s" % mae)
        print("  R2: %s" % r2)

        mlflow.log_param("alpha", alpha)
        mlflow.log_param("l1_ratio", l1_ratio)
        mlflow.log_metric("rmse", rmse)
        mlflow.log_metric("r2", r2)
        mlflow.log_metric("mae", mae)

        mlflow.sklearn.log_model(lr, "model")
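As a reminder of what the three logged metrics measure, here is a minimal plain-Python sketch of RMSE, MAE and R2. The helper name `regression_metrics` and the toy numbers are illustrative only; the notebook itself relies on the scikit-learn implementations.

```python
import math


def regression_metrics(actual, predicted):
    """Compute RMSE, MAE and R2 from two equal-length sequences of numbers."""
    n = len(actual)
    residuals = [a - p for a, p in zip(actual, predicted)]

    # Root mean squared error: penalizes large errors more heavily.
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)

    # Mean absolute error: average size of the error, in the target's units.
    mae = sum(abs(r) for r in residuals) / n

    # R2: share of the target's variance explained by the predictions.
    # Undefined when every actual value is identical (ss_tot == 0).
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot

    return rmse, mae, r2


# Toy data, not taken from the wine dataset.
rmse, mae, r2 = regression_metrics([3, 5, 7], [2, 5, 9])
print(rmse, mae, r2)
```

Lower RMSE and MAE are better, while R2 closer to 1 is better, which is worth keeping in mind when comparing runs in the UI.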

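Note that ElasticNet documents l1_ratio as a value between 0 and 1, so the values 1.9 and 1.77 used in the next two cells lie outside the intended range. The scikit-learn version used to produce the output shown here accepted them, but recent releases validate the parameter and raise an error, so choose values in [0, 1] when reproducing these runs.

In [5]:
train_wine_regression(alpha=0.1, l1_ratio=1.9, exp_id="wine")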

INFO: 'wine' does not exist. Creating a new experiment
Elasticnet model (alpha=0.100000, l1_ratio=1.900000):
  RMSE: 0.8008614421065388
  MAE: 0.6215339293635124
  R2: 0.17161039226431196


In [6]:
train_wine_regression(alpha=0.2, l1_ratio=1.77, exp_id="wine")

Elasticnet model (alpha=0.200000, l1_ratio=1.770000):
  RMSE: 0.8442722944179337
  MAE: 0.638440439200057
  R2: 0.07937037119958534


### Log on remote server
To log to a remote server, set the server URI in your code:

*mlflow.set_tracking_uri(URI)*

An MLflow server can be started with:

*mlflow server --file-store /mnt/persistent-disk --default-artifact-root s3://my-mlflow-bucket/ --host 0.0.0.0*
    
The file store (exposed as --file-store) is where the server stores run and experiment metadata. It defaults to the local ./mlruns directory (the same as when running mlflow run locally), but when running a server, make sure that it points to a persistent (that is, non-ephemeral) file system location. Later MLflow releases replace this flag with --backend-store-uri.

The artifact store is a location suitable for large data (such as an S3 bucket or shared NFS file system) and is where clients log their artifact output (for example, models). 

## Projects
MLflow Projects are a standard format for packaging reusable data science code. Each project is simply a directory with code or a Git repository, and uses a descriptor file to specify its dependencies and how to run the code. For example, projects can contain a conda.yaml file for specifying a Python Conda environment. When you use the MLflow Tracking API in a Project, MLflow automatically remembers the project version executed (for example, Git commit) and any parameters. 
### MLproject
name: tutorial

conda_env: conda.yaml

entry_points:
  main:
    parameters:
      alpha: float
      l1_ratio: {type: float, default: 0.1}
    command: "python train.py {alpha} {l1_ratio}"
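The `command` entry is a template: parameter values, whether supplied with `-P` or taken from the declared defaults, are substituted into the braces, and the script receives them as positional arguments. The sketch below imitates that substitution with plain string formatting and shows how a hypothetical `train.py` might parse the result. It is a simplification under stated assumptions; MLflow's own substitution also handles quoting and parameter types.

```python
import argparse
import shlex


def parse_args(argv):
    """Parse the two positional arguments a hypothetical train.py would receive."""
    parser = argparse.ArgumentParser(description="Hypothetical train.py entry point")
    parser.add_argument("alpha", type=float)
    parser.add_argument("l1_ratio", type=float)
    return parser.parse_args(argv)


# Defaults declared in the MLproject file; alpha has none and must be supplied.
defaults = {"l1_ratio": 0.1}
# Values the user passes on the command line, e.g. `-P alpha=0.42`.
supplied = {"alpha": 0.42}

# User-supplied values take precedence over defaults.
params = {**defaults, **supplied}
command = "python train.py {alpha} {l1_ratio}".format(**params)
print(command)  # python train.py 0.42 0.1

# Drop "python" and "train.py", then parse what the script itself would see.
args = parse_args(shlex.split(command)[2:])
print(args.alpha, args.l1_ratio)  # 0.42 0.1
```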

### conda.yaml
name: tutorial
channels:
  - defaults
dependencies:
  - numpy=1.14.3
  - pandas=0.22.0
  - scikit-learn=0.19.1
  - pip:
    - mlflow
    
To run the project, pass each parameter with -P:

*mlflow run sklearn_elasticnet_wine/ -P alpha=0.42 -P l1_ratio=0.2 --experiment-id xxx*

## Models
MLflow Models offer a convention for packaging machine learning models in multiple flavors, and a variety of tools to help you deploy them. Each Model is saved as a directory containing arbitrary files and a descriptor file that lists several “flavors” the model can be used in. For example, a TensorFlow model can be loaded as a TensorFlow DAG, or as a Python function to apply to input data. MLflow provides tools to deploy many common model types to diverse platforms: for example, any model supporting the “Python function” flavor can be deployed to a Docker-based REST server, to cloud platforms such as Azure ML and AWS SageMaker, and as a user-defined function in Apache Spark for batch and streaming inference. 

To serve a model logged by one of the runs above as a local REST endpoint on port 1234:

mlflow pyfunc serve -m /Users/luca/mlops/mlruns/1/a88084eb6dbd418bbe6aa55167089274/artifacts/model -p 1234

The endpoint can then be queried with a JSON payload in pandas-split format:

curl -X POST -H "Content-Type:application/json; format=pandas-split" --data '{"columns":["alcohol", "chlorides", "citric acid", "density", "fixed acidity", "free sulfur dioxide", "pH", "residual sugar", "sulphates", "total sulfur dioxide", "volatile acidity"],"data":[[12.8, 0.029, 0.48, 0.98, 6.2, 29, 3.33, 1.2, 0.39, 75, 0.66]]}' http://127.0.0.1:1234/invocations
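The same request can be sent from Python using only the standard library. This sketch mirrors the curl call above, including its content type and payload shape; `build_request` and `predict` are hypothetical helper names, and newer MLflow releases may expect a different payload format, so treat it as an illustration for the version used here.

```python
import json
import urllib.request

# Endpoint started by the `mlflow pyfunc serve ... -p 1234` command above.
URL = "http://127.0.0.1:1234/invocations"

COLUMNS = ["alcohol", "chlorides", "citric acid", "density", "fixed acidity",
           "free sulfur dioxide", "pH", "residual sugar", "sulphates",
           "total sulfur dioxide", "volatile acidity"]
ROW = [12.8, 0.029, 0.48, 0.98, 6.2, 29, 3.33, 1.2, 0.39, 75, 0.66]


def build_request(url, columns, rows):
    """Build a POST request carrying a pandas-split JSON payload."""
    payload = json.dumps({"columns": columns, "data": rows}).encode("utf-8")
    headers = {"Content-Type": "application/json; format=pandas-split"}
    return urllib.request.Request(url, data=payload, headers=headers, method="POST")


def predict(url, columns, rows):
    """Send the request and return the decoded JSON response (needs a running server)."""
    with urllib.request.urlopen(build_request(url, columns, rows)) as response:
        return json.loads(response.read().decode("utf-8"))


req = build_request(URL, COLUMNS, [ROW])
print(req.get_method(), req.full_url)
# predictions = predict(URL, COLUMNS, [ROW])  # uncomment once the server is running
```

Keeping request construction separate from sending makes the payload easy to inspect before a server is available.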