# MLflow Training Tutorial

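This `train.ipynb` Jupyter notebook predicts the quality of wine using a random forest regressor. The `n_estimators`, `criterion`, `max_depth`, and `min_samples_leaf` parameters reported in the run output below correspond to [sklearn.ensemble.RandomForestRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html).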

> This is the Jupyter notebook version of the `train.py` example

Attribution
* The data set used in this example is from http://archive.ics.uci.edu/ml/datasets/Wine+Quality
* P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.


In [1]:
import os
import warnings
import sys

import pandas as pd
import numpy as np

import mlflow

In [2]:
from train import train

In [3]:
mlflow.set_tracking_uri('https://gid-mlflow.appspot.com')

In [4]:
train(10, 'mse', 5, 100)

RandomForest model (n_estimators=10, criterion=mse, max_depth=5, min_samples_leaf=100):
  RMSE: 0.7232394896332006
  MAE: 0.5718319310838061
  R2: 0.32440856599531964
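
The three metrics printed above follow standard definitions. The notebook does not show how `train` computes them, so the following is only a minimal sketch in plain Python. The `eval_metrics` name and the sample values are hypothetical and are not taken from `train.py`.

```python
import math


def eval_metrics(actual, predicted):
    """Return (rmse, mae, r2) for two equal-length sequences of numbers.

    Hypothetical helper for illustration; train.py may compute these differently.
    """
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]

    # RMSE: square root of the mean squared error.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    # MAE: mean of the absolute errors.
    mae = sum(abs(e) for e in errors) / n

    # R2: 1 minus the ratio of residual variation to total variation
    # around the mean of the actual values.
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot

    return rmse, mae, r2


# Made-up quality scores, only to exercise the function.
rmse, mae, r2 = eval_metrics([5, 6, 7, 6], [5.5, 6.0, 6.5, 6.0])
print(f"RMSE: {rmse}\nMAE: {mae}\nR2: {r2}")
```

Lower RMSE and MAE are better, while R2 is better the closer it gets to 1. An R2 of about 0.32, as in the run above, means the model explains roughly a third of the variance in quality on the evaluation data.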


In [5]:
train(100, 'mse', 5, 100)

RandomForest model (n_estimators=100, criterion=mse, max_depth=5, min_samples_leaf=100):
  RMSE: 0.7181980386392155
  MAE: 0.5688364033209303
  R2: 0.33379436494538484


In [6]:
train(100, 'mse', 10, 100)

RandomForest model (n_estimators=100, criterion=mse, max_depth=10, min_samples_leaf=100):
  RMSE: 0.7170892232150337
  MAE: 0.5670507642899372
  R2: 0.3358498672184913


In [7]:
train(100, 'mse', 5, 50)

RandomForest model (n_estimators=100, criterion=mse, max_depth=5, min_samples_leaf=50):
  RMSE: 0.7029933680179214
  MAE: 0.5592862556233429
  R2: 0.36170369820963666


In [8]:
train(100, 'mse', 5, 10)

RandomForest model (n_estimators=100, criterion=mse, max_depth=5, min_samples_leaf=10):
  RMSE: 0.6957291148440388
  MAE: 0.5519695817226459
  R2: 0.3748269783533582


In [9]:
train(100, 'mse', 5, 1)

RandomForest model (n_estimators=100, criterion=mse, max_depth=5, min_samples_leaf=1):
  RMSE: 0.6960255220067484
  MAE: 0.5534443871137903
  R2: 0.3742941697321329


In [10]:
train(100, 'mse', 10, 10)

RandomForest model (n_estimators=100, criterion=mse, max_depth=10, min_samples_leaf=10):
  RMSE: 0.6527604398337965
  MAE: 0.5035381511934496
  R2: 0.4496645041744419


In [11]:
train(100, 'mse', 20, 10)

RandomForest model (n_estimators=100, criterion=mse, max_depth=20, min_samples_leaf=10):
  RMSE: 0.6446068079000254
  MAE: 0.4941467459389938
  R2: 0.46332712152187105


In [12]:
train(100, 'mse', 15, 10)

RandomForest model (n_estimators=100, criterion=mse, max_depth=15, min_samples_leaf=10):
  RMSE: 0.644905830345854
  MAE: 0.4944867683904041
  R2: 0.4628290986771407


In [13]:
train(500, 'mse', 15, 10)

RandomForest model (n_estimators=500, criterion=mse, max_depth=15, min_samples_leaf=10):
  RMSE: 0.6435238039674076
  MAE: 0.494309263777565
  R2: 0.4651289348998978
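
The ten runs above can also be compared programmatically. The sketch below copies the printed metrics into a list by hand and picks the run with the lowest RMSE. The `Run` tuple and variable names are hypothetical; with a tracking server available, the same comparison would normally be done in the MLflow UI instead.

```python
from collections import namedtuple

# Hypothetical record type; the values are copied from the outputs above.
Run = namedtuple("Run", "n_estimators max_depth min_samples_leaf rmse mae r2")

runs = [
    Run(10, 5, 100, 0.7232394896332006, 0.5718319310838061, 0.32440856599531964),
    Run(100, 5, 100, 0.7181980386392155, 0.5688364033209303, 0.33379436494538484),
    Run(100, 10, 100, 0.7170892232150337, 0.5670507642899372, 0.3358498672184913),
    Run(100, 5, 50, 0.7029933680179214, 0.5592862556233429, 0.36170369820963666),
    Run(100, 5, 10, 0.6957291148440388, 0.5519695817226459, 0.3748269783533582),
    Run(100, 5, 1, 0.6960255220067484, 0.5534443871137903, 0.3742941697321329),
    Run(100, 10, 10, 0.6527604398337965, 0.5035381511934496, 0.4496645041744419),
    Run(100, 20, 10, 0.6446068079000254, 0.4941467459389938, 0.46332712152187105),
    Run(100, 15, 10, 0.644905830345854, 0.4944867683904041, 0.4628290986771407),
    Run(500, 15, 10, 0.6435238039674076, 0.494309263777565, 0.4651289348998978),
]

# Select the run with the smallest RMSE.
best = min(runs, key=lambda run: run.rmse)
print(best)
```

In these runs, lowering `min_samples_leaf` from 100 to 10 and raising `max_depth` from 5 to 10 produced the largest gains. Going from 100 to 500 trees at `max_depth=15` changed RMSE only in the third decimal place.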


In [14]:
loaded_model = mlflow.pyfunc.load_model('gs://gid-mlflow-artifacts/2/26a29ee08e934a4a8a2e87df96574414/artifacts/model_wine')

In [15]:
test_data = pd.read_csv('wine-quality.csv').sample(10)

In [19]:
pd.DataFrame({'real':test_data['quality'],
              'pred':loaded_model.predict(test_data.drop('quality',axis=1))})

| | real | pred |
|---|---|---|
| 4324 | 6 | 6.127025 |
| 319 | 6 | 5.540489 |
| 4489 | 6 | 6.76884 |
| 910 | 5 | 5.27169 |
| 3006 | 6 | 6.183674 |
| 818 | 5 | 5.128141 |
| 2011 | 5 | 5.279254 |
| 3998 | 6 | 5.875551 |
| 793 | 7 | 6.57535 |
| 1049 | 6 | 6.961622 |
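
As a quick sanity check on the ten sampled rows, the predictions can be compared with the real scores directly. This is a minimal sketch that hard-codes the values from the table above instead of calling `loaded_model`; the variable names are hypothetical.

```python
# Values copied from the comparison table above.
real = [6, 6, 6, 5, 6, 5, 5, 6, 7, 6]
pred = [6.127025, 5.540489, 6.76884, 5.27169, 6.183674,
        5.128141, 5.279254, 5.875551, 6.57535, 6.961622]

# Mean absolute error on this small sample.
mae = sum(abs(r - p) for r, p in zip(real, pred)) / len(real)

# Quality scores are integers, so round each prediction to the nearest score.
rounded = [round(p) for p in pred]
hits = sum(r == q for r, q in zip(real, rounded))

print(f"MAE on the sample: {mae:.4f}")
print(f"Exact matches after rounding: {hits} of {len(real)}")
```

Ten rows are too few to draw conclusions from. In addition, the sample is drawn from the full `wine-quality.csv`, so if `train` fits on rows from the same file, some of these rows may have been seen during training and the sample error may look better than the metrics reported for the training runs.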


In [17]:
test_data['quality']

4324    6
319     6
4489    6
910     5
3006    6
818     5
2011    5
3998    6
793     7
1049    6
Name: quality, dtype: int64