# FOSSCOMM 21 <a href="https://pretalx.2021.fosscomm.gr/fosscomm-2021/talk/X9VPTZ/"> <img src="images/logo.png" alt="Header" style="width: 100px;"/> </a>

## Workshop - MLOps in practice w/ <a href="https://mlflow.org"> <img src="images/mlflow.png" alt="Header" style="width: 100px;"/> </a>

MLOps is becoming essential for automating the ML project lifecycle.
As machine learning models become part of real-world applications, it is vital for engineers to shift from a research-oriented approach to business and product needs.

The purpose of this workshop is to demonstrate MLflow's capabilities to the machine learning and open source communities. MLflow is a powerful tool that integrates with most modern ML frameworks and has been adopted by many well-known organizations to manage the machine learning lifecycle, e.g. keeping track of ML projects, logging different models with numerous parameters, and registering and deploying models to production.

The workshop demonstrates the MLflow API, covering the following topics:
- install MLflow and walk through the MLflow server
- develop an ML pipeline
- train/evaluate models
- use MLflow to track parameters and log trained models and datasets
- deploy models and serve them with the built-in MLflow API
- consume the deployed models through MLflow's built-in endpoint

## Install project dependencies

In [None]:
!pip install --user poetry
!poetry install

In [None]:
import numpy as np

from sklearn.datasets import load_iris  # needed by the ETL cell below
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

import mlflow
import mlflow.sklearn

from tools import eval_metrics, dump_pickled_data

In [None]:
# set tracking server uri
mlflow.set_tracking_uri("http://127.0.0.1:5000")

In [None]:
# Run in terminal
"""
mlflow server \
    --backend-store-uri sqlite:///imagine.sqlite \
    --default-artifact-root ./mlruns
"""
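
The cells that follow talk to the tracking server, so it helps to confirm the server is actually listening first. The helper below is a minimal sketch using only the standard library; the name `tracking_server_is_up` is hypothetical and is not part of MLflow. It only checks that a TCP connection succeeds, not that the process on that port is MLflow.

```python
import socket
from urllib.parse import urlparse


def tracking_server_is_up(uri: str, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the host/port in `uri` succeeds."""
    parsed = urlparse(uri)
    host = parsed.hostname or "127.0.0.1"
    # Fall back to the scheme's default port when the URI has none.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Same URI as the one passed to mlflow.set_tracking_uri above.
uri = "http://127.0.0.1:5000"
print(f"{uri} reachable: {tracking_server_is_up(uri)}")
```

If this prints `False`, start `mlflow server` in a terminal before running the remaining cells.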

In [None]:
# set experiment name
mlflow.set_experiment("fosscomm_runs")

# Logistic Regression

In [None]:
model_name="lr_model"
#model_name="rfr_model"

## ETL

In [None]:
data = load_iris()

In [None]:
classes = data["target_names"]
feature_names = data["feature_names"]
x = data["data"]
y = data["target"]

In [None]:
feature_names

In [None]:
classes

## Transform

In [None]:
scaler = MinMaxScaler()
x_sc = scaler.fit_transform(x)
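
For intuition, min-max scaling maps each feature column to the range [0, 1] by subtracting the column minimum and dividing by the column range. The NumPy function below is a sketch of that arithmetic; `min_max_scale` is a hypothetical name, and it should give the same result as `MinMaxScaler().fit_transform` with default settings.

```python
import numpy as np


def min_max_scale(x: np.ndarray) -> np.ndarray:
    """Scale each column of `x` to [0, 1] using its own min and max."""
    col_min = x.min(axis=0)
    col_range = x.max(axis=0) - col_min
    # Guard against constant columns to avoid division by zero.
    col_range = np.where(col_range == 0, 1.0, col_range)
    return (x - col_min) / col_range


sample = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 40.0]])
scaled = min_max_scale(sample)
print(scaled)
```

Each column of `scaled` now has minimum 0 and maximum 1, which is why the request payloads later in the notebook contain values between 0 and 1.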

## Split dataset

In [None]:
x_train, x_test, y_train, y_test = train_test_split(x_sc, y, test_size=0.2, random_state=0)

In [None]:
train = {"x": x_train, "y": y_train}
test = {"x": x_test, "y": y_test}

In [None]:
dump_pickled_data('data/train_dataset', train)
dump_pickled_data('data/test_dataset', test)
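
The `dump_pickled_data` helper comes from the workshop's `tools` module, which is not shown in this notebook. A minimal sketch of what such a helper might look like follows; the `_sketch` names are hypothetical so they do not shadow the real import, and the actual implementation may differ.

```python
import pickle
from pathlib import Path


def dump_pickled_data_sketch(path, data):
    """Pickle `data` to `path`, creating parent directories if needed."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("wb") as fh:
        pickle.dump(data, fh)


def load_pickled_data_sketch(path):
    """Load an object previously written by dump_pickled_data_sketch."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)
```

Writing the splits to disk is what allows them to be attached to an MLflow run later as artifacts.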

## 1. Regular Train Model

In [None]:
params = {"C": 1.0, "random_state": 42}

In [None]:
regression_model = LogisticRegression(**params).fit(x_train, y_train)

In [None]:
train_accuracy = regression_model.score(x_train, y_train)
train_accuracy

In [None]:
test_accuracy = regression_model.score(x_test, y_test)
test_accuracy

In [None]:
y_preds = regression_model.predict(x_test)
y_preds

In [None]:
acc, f1 = eval_metrics(y_test, y_preds)
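
`eval_metrics` is also imported from `tools` and its body is not shown here. The sketch below assumes it returns accuracy and a macro-averaged F1 score; the averaging mode and the name `eval_metrics_sketch` are assumptions, not the workshop's actual code.

```python
import numpy as np


def eval_metrics_sketch(y_true, y_pred):
    """Return (accuracy, macro-averaged F1) for integer class labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    acc = float(np.mean(y_true == y_pred))

    f1_scores = []
    for label in np.unique(np.concatenate([y_true, y_pred])):
        tp = np.sum((y_pred == label) & (y_true == label))
        fp = np.sum((y_pred == label) & (y_true != label))
        fn = np.sum((y_pred != label) & (y_true == label))
        # Per-class F1 = 2*TP / (2*TP + FP + FN)
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return acc, float(np.mean(f1_scores))


acc_demo, f1_demo = eval_metrics_sketch([0, 1, 2, 2], [0, 2, 2, 2])
print(acc_demo, f1_demo)
```

In practice `sklearn.metrics.accuracy_score` and `sklearn.metrics.f1_score` cover the same ground; the explicit version is only meant to show what the two numbers represent.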

## 2. MLflow training

What do we track?

- Experiments: experiment names, run names
- Parameters: (hyper)parameter inputs to the code/model
- Metrics: numeric values such as accuracy, F1, loss, etc. (updated over time)
- Artifacts: files, data, logs and models
- Configuration: deployment environment YAML, dependency libraries
- Version: code version, model version, model stage
- Tags & Notes: auxiliary information and a description of a run


### MLflow Client

In [None]:
from mlflow.tracking import MlflowClient

# create an mlflow client
client = MlflowClient()

### Start MLflow run

In [None]:
with mlflow.start_run(run_name=model_name) as train_run:
       
    # train with the same params that are logged below
    regression_model = LogisticRegression(**params).fit(x_train, y_train)
    
    # predictions
    y_preds = regression_model.predict(x_test)       
    
    acc, f1 = eval_metrics(y_test, y_preds)    
    
    # mlflow logs
    mlflow.log_params(params)
    mlflow.log_artifact("data/train_dataset", artifact_path="datasets")
    mlflow.log_artifact("data/test_dataset", artifact_path="datasets")
    
    mlflow.sklearn.log_model(
        sk_model=regression_model,
        artifact_path="model",
        registered_model_name=model_name
    )
    
    mlflow.log_metrics({"acc":acc, "f1":f1})
    
    # load the latest model version
    for mv in client.get_latest_versions(model_name, ["None"]):
        model_version = mv.version
        
    # transition model to production
    client.transition_model_version_stage(
        name=model_name, version=model_version, stage="Production", archive_existing_versions=True
    )
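
The loop over `client.get_latest_versions(...)` leaves `model_version` undefined if nothing is returned, and version numbers arrive as strings. The sketch below shows one way to pick the version explicitly and fail loudly otherwise; `pick_version` is a hypothetical helper, and `SimpleNamespace` objects stand in for the model version objects returned by the client.

```python
from types import SimpleNamespace


def pick_version(model_versions):
    """Return the highest version number as a string, or raise if none exist."""
    # Compare as integers: as strings, "3" would sort above "12".
    numbers = [int(mv.version) for mv in model_versions]
    if not numbers:
        raise LookupError("no model versions returned; was the model registered?")
    return str(max(numbers))


# Stand-ins for the objects returned by client.get_latest_versions(...)
fake_versions = [SimpleNamespace(version="3"), SimpleNamespace(version="12")]
print(pick_version(fake_versions))
```

With the real client, the call would look like `pick_version(client.get_latest_versions(model_name, ["None"]))`.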

## 3. Deploy the model

### Model server

> mlflow models serve -m "models:/lr_model/Production" -p 5001 --no-conda

> curl http://127.0.0.1:5001/invocations -H 'Content-Type: application/json; format=pandas-records' -d '[[0.21666667, 0.53333333, 0.69491525, 0.0095833333]]'

In [None]:
import json
import requests

In [None]:
headers = {"Content-Type": "application/json; format=pandas-records"}
base_url = "http://127.0.0.1:5001/invocations"

In [None]:
data = [[0.21666667, 0.53333333, 0.69491525, 0.95833333], [0.21666667, 0.53333333, 0.69491525, 0.0095833333]]
data = json.dumps(data)

### Send a request to MLflow API

In [None]:
response = requests.post(base_url, data=data, headers=headers)
response.raise_for_status()  # fail early on a non-2xx response
prediction = response.json()  # list of predicted class indices
print(f"Flower:{classes[prediction]}")
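
The request above sends a bare JSON list of rows with the `format=pandas-records` content type, which matches the MLflow 1.x scoring server used in this workshop. As far as I know, MLflow 2.x servers instead expect a plain `application/json` content type with the rows wrapped under a key such as `"inputs"`, and reply with `{"predictions": [...]}`. The helpers below are a sketch that handles both shapes; `build_payload` and `decode_predictions` are hypothetical names, and the 2.x behaviour should be checked against the installed version.

```python
import json


def build_payload(rows, protocol="v1"):
    """Serialize feature rows for the /invocations endpoint.

    protocol="v1": bare list of rows, as used in this notebook.
    protocol="v2": rows wrapped under the "inputs" key (assumed MLflow 2.x shape).
    """
    if protocol == "v1":
        return json.dumps(rows)
    if protocol == "v2":
        return json.dumps({"inputs": rows})
    raise ValueError(f"unknown protocol: {protocol!r}")


def decode_predictions(body, class_names):
    """Map predicted class indices to names; accepts a list or {"predictions": [...]}."""
    indices = body["predictions"] if isinstance(body, dict) else body
    return [class_names[int(i)] for i in indices]


iris_classes = ["setosa", "versicolor", "virginica"]
rows = [[0.21666667, 0.53333333, 0.69491525, 0.95833333]]
print(build_payload(rows))
print(decode_predictions([2, 1], iris_classes))
```

Keeping payload construction and response decoding in small functions makes it easier to switch protocols without touching the request code.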

### Local Inference

In [None]:
prediction = regression_model.predict([[0.21666667, 0.53333333, 0.69491525, 0.95833333], [0.21666667, 0.53333333, 0.69491525, 0.0095833333]])
print(f"Flower:{classes[prediction]}")