<a href="https://colab.research.google.com/github/martin-fabbri/colab-notebooks/blob/master/mlflow/mlflow_scikit_learn_01.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Quickstart MLflow scikit-learn

The goal is to build an end-to-end pipeline that lets us:

* Train a linear regression model
* Package the code that trains the model in a reusable and reproducible model format
* Deploy an inference endpoint

In [1]:
!pip install mlflow -q


## Dataset

The task is to predict a white wine's quality score from quantitative features such as acidity, pH, and residual sugar.

In [0]:
CSV_URL = 'https://raw.githubusercontent.com/martin-fabbri/colab-notebooks/master/data/winequality_white.csv'

## Import dependencies

In [26]:
import os
import warnings
import sys
import pandas as pd
import numpy as np

import mlflow
import mlflow.sklearn

import sklearn
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet

import logging
logging.basicConfig(level=logging.WARN)
warnings.filterwarnings('ignore')
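SEED = 40
np.random.seed(SEED)  # seeds NumPy's global RNG; the call itself returns None
rs = SEED  # pass the integer seed (not the None returned above) as random_state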

print('mlflow:', mlflow.__version__)
print('sklearn:', sklearn.__version__)
print('random seed:', SEED)

mlflow: 1.7.2
sklearn: 0.22.2.post1
random seed: 40




In [0]:
def eval_metrics(actual, pred):
  rmse = np.sqrt(mean_squared_error(actual, pred))
  mae = mean_absolute_error(actual, pred)
  r2 = r2_score(actual, pred)
  return rmse, mae, r2
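To make the three metrics concrete, the sketch below recomputes them by hand on a tiny made-up example (the numbers are illustrative and not taken from the wine data). It assumes the standard definitions that `mean_squared_error`, `mean_absolute_error`, and `r2_score` use by default; `manual_metrics` is a hypothetical helper name.

```python
import math

def manual_metrics(actual, pred):
  """Hand-rolled RMSE, MAE and R^2 for two equal-length sequences."""
  n = len(actual)
  errors = [a - p for a, p in zip(actual, pred)]
  ss_res = sum(e * e for e in errors)            # residual sum of squares
  rmse = math.sqrt(ss_res / n)                   # root of the mean squared error
  mae = sum(abs(e) for e in errors) / n          # mean absolute error
  mean_actual = sum(actual) / n
  ss_tot = sum((a - mean_actual) ** 2 for a in actual)
  r2 = 1.0 - ss_res / ss_tot                     # share of variance explained
  return rmse, mae, r2

# Illustrative values only
actual = [5, 6, 7, 6]
pred = [5.5, 6.0, 6.0, 6.5]
rmse, mae, r2 = manual_metrics(actual, pred)
print(f'RMSE={rmse:.4f} MAE={mae:.4f} R2={r2:.4f}')
```

An R2 of 1 would mean perfect predictions, while 0 means the model does no better than always predicting the mean.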

Read the wine-quality CSV file from the URL.

In [17]:
wine = pd.read_csv(CSV_URL)
wine.head(3)

Unnamed: 0,fixed_acidity,volatile_acidity,citric_acid,residual_sugar,chlorides,free_sulfur_dioxide,total_sulfur_dioxide,density,ph,sulphates,alcohol,quality
0,7.0,0.27,0.36,20.7,0.045,45.0,170.0,1.001,3.0,0.45,8.8,6
1,6.3,0.3,0.34,1.6,0.049,14.0,132.0,0.994,3.3,0.49,9.5,6
2,8.1,0.28,0.4,6.9,0.05,30.0,97.0,0.9951,3.26,0.44,10.1,6


Split the data into training and test sets

In [24]:
train, test = train_test_split(wine)

train_x = train.drop('quality', axis=1)
test_x = test.drop('quality', axis=1)
train_y = train['quality']
test_y = test['quality']

print('train X shape:', train_x.shape)
print('train Y shape:', train_y.shape)
print('test X shape:', test_x.shape)
print('test Y shape:', test_y.shape)

train X shape: (3673, 11)
train Y shape: (3673,)
test X shape: (1225, 11)
test Y shape: (1225,)
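The shapes above are consistent with `train_test_split`'s default behaviour when neither `test_size` nor `train_size` is given: 25% of the rows go to the test set, rounded up, and the rest go to training. The sketch below reproduces only that arithmetic; `split_sizes` is a hypothetical helper, not a scikit-learn function.

```python
import math

def split_sizes(n_samples, test_size=0.25):
  """Return (n_train, n_test) for a fractional test_size, rounding the test set up."""
  n_test = math.ceil(test_size * n_samples)
  n_train = n_samples - n_test
  return n_train, n_test

# 3673 + 1225 = 4898 rows in total, as reported by the shapes above
print(split_sizes(3673 + 1225))
```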


## Hyperparameters

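* `alpha`: overall regularization strength; larger values shrink the coefficients more
* `l1_ratio`: mix between the L1 and L2 penalties, from 0 (pure L2) to 1 (pure L1)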

In [0]:
ALPHA = 0.5
L1_RATIO = 0.5
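As a rough illustration of how the two hyperparameters interact, the sketch below computes the penalty term that scikit-learn's `ElasticNet` documentation describes adding to the squared-error loss: `alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2`. The function name and the example weights are made up for illustration.

```python
def elastic_net_penalty(weights, alpha, l1_ratio):
  """Penalty added to the loss for a given coefficient vector (illustrative helper)."""
  l1 = sum(abs(w) for w in weights)          # L1 norm
  l2_squared = sum(w * w for w in weights)   # squared L2 norm
  return alpha * l1_ratio * l1 + 0.5 * alpha * (1.0 - l1_ratio) * l2_squared

weights = [1.0, -2.0, 0.0]  # made-up coefficients
penalty = elastic_net_penalty(weights, alpha=0.5, l1_ratio=0.5)
print(penalty)
```

Setting `l1_ratio=1.0` leaves only the L1 term (lasso-style), and `l1_ratio=0.0` leaves only the L2 term.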

## MLflow training

In [0]:
def mlflow_train(alpha, l1_ratio):
  with mlflow.start_run():
    lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=rs)
    lr.fit(train_x, train_y)

    predicted_qualities = lr.predict(test_x)

    rmse, mae, r2 = eval_metrics(test_y, predicted_qualities)
    # Report the values actually passed in, not the module-level constants
    print(f'Elasticnet model (alpha={alpha:.2}, l1_ratio={l1_ratio:.2}):')
    print(f' - RMSE: {rmse:.5f}')
    print(f' - MAE:  {mae:.5f}')
    print(f' - R2:   {r2:.5f}')

    mlflow.log_param('alpha', alpha)
    mlflow.log_param('l1_ratio', l1_ratio)
    mlflow.log_metric('rmse', rmse)
    mlflow.log_metric('r2', r2)
    mlflow.log_metric('mae', mae)

    mlflow.sklearn.log_model(lr, 'model')

In [44]:
mlflow_train(ALPHA, L1_RATIO)

Elasticnet model (alpha=0.5, l1_ratio=0.5):
 - RMSE: 0.82290
 - MAE:  0.62878
 - R2:   0.13003
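Because `mlflow_train` opens a fresh run on every call, one possible next step is to call it over a small grid so that each hyperparameter combination is tracked as its own run. The sketch below shows only the looping pattern: `sweep` and `fake_train` are hypothetical names, and the grid values are arbitrary. In the notebook, `mlflow_train` would be passed in place of the stand-in.

```python
from itertools import product

def sweep(train_fn, alphas, l1_ratios):
  """Call train_fn once per (alpha, l1_ratio) pair and return the pairs tried."""
  tried = []
  for alpha, l1_ratio in product(alphas, l1_ratios):
    train_fn(alpha, l1_ratio)
    tried.append((alpha, l1_ratio))
  return tried

# Stand-in so the sketch runs on its own; it does not train or log anything.
def fake_train(alpha, l1_ratio):
  print(f'would train with alpha={alpha}, l1_ratio={l1_ratio}')

combos = sweep(fake_train, alphas=[0.1, 0.5], l1_ratios=[0.2, 0.8])
```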
