# AML Workshop Tutorial


### This tutorial uses a dataset to predict the quality of wine based on quantitative features like the wine’s “fixed acidity”, “pH”, “residual sugar”, and so on.
#### The dataset is from UCI’s machine learning repository.

#### This tutorial showcases how you can use MLflow end-to-end to:

* Train a linear regression model
* Package the code that trains the model in a reusable and reproducible model format
* Deploy the model into a simple HTTP server that will enable you to score predictions
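
To make the last step concrete, here is a minimal sketch of a scoring request to a locally served model. Several parts are assumptions rather than anything stated in this tutorial: the URL `http://127.0.0.1:5000/invocations`, the `dataframe_split` JSON layout (the format accepted by MLflow 2.x scoring servers; older releases expect a different layout), the helper names `build_payload` and `score`, and the sample feature values. The column names are the ones expected in the wine-quality CSV, minus the `quality` target.

```python
import json
import urllib.request

# Feature columns expected in winequality-red.csv, without the "quality" target.
FEATURE_COLUMNS = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol",
]


def build_payload(rows):
    """Serialize feature rows as JSON in the `dataframe_split` layout.

    Assumption: the server accepts the MLflow 2.x request format.
    """
    for row in rows:
        if len(row) != len(FEATURE_COLUMNS):
            raise ValueError(
                f"expected {len(FEATURE_COLUMNS)} values per row, got {len(row)}"
            )
    return json.dumps(
        {"dataframe_split": {"columns": FEATURE_COLUMNS, "data": rows}}
    )


def score(rows, url="http://127.0.0.1:5000/invocations"):
    """POST the rows to a running scoring server and return the parsed reply.

    Assumption: a model server is already listening at `url`.
    """
    request = urllib.request.Request(
        url,
        data=build_payload(rows).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


# Illustrative feature values for a single wine.
sample_row = [7.4, 0.7, 0.0, 1.9, 0.076, 11.0, 34.0, 0.9978, 3.51, 0.56, 9.4]
payload = build_payload([sample_row])
print(payload)
```

Only `build_payload` runs here; `score([sample_row])` needs a server that is actually up, so call it once the model has been deployed.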


##### This `train.ipynb` Jupyter notebook predicts the quality of wine using [sklearn.linear_model.ElasticNet](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ElasticNet.html).

> This is the Jupyter notebook version of the `train.py` example


In [1]:
import os
import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet

import mlflow
import mlflow.sklearn

np.random.seed(40)

#### Reading wine quality data to be used for training the model

In [2]:
# Read the wine-quality csv file from the URL
csv_url =\
    'http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv'
try:
    data = pd.read_csv(csv_url, sep=';')
except Exception as e:
    # print() does not apply %-style formatting on its own, so build the message explicitly
    print(f"Unable to download training & test CSV, check your internet connection. Error: {e}")

##### Splitting data into training and test set

In [3]:
# Split the data into training and test sets. (0.75, 0.25) split.
train, test = train_test_split(data)

# The predicted column is "quality" which is a scalar from [3, 9]
train_x = train.drop(["quality"], axis=1)
test_x = test.drop(["quality"], axis=1)
train_y = train[["quality"]]
test_y = test[["quality"]]

#### Training a scikit-learn linear regression model and registering it in the model registry

In [5]:
# Enable autolog to log metrics, params, artifacts, models automatically
mlflow.autolog(log_input_examples=True)

alpha = 0.5
l1_ratio = 0.5

with mlflow.start_run() as run:
    # Execute ElasticNet
    lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
    lr.fit(train_x, train_y)

    predicted_qualities = lr.predict(test_x)

    mlflow.register_model(
        "runs:/{}/model".format(run.info.run_id),
        "ElasticnetWineModel-notebook"
    )

2022/05/19 21:19:07 INFO mlflow.tracking.fluent: Autologging successfully enabled for sklearn.
2022/05/19 21:19:07 INFO mlflow.tracking.fluent: Autologging successfully enabled for pyspark.
2022/05/19 21:19:07 INFO mlflow.pyspark.ml: No SparkSession detected. Autologging will log pyspark.ml models contained in the default allowlist. To specify a custom allowlist, initialize a SparkSession prior to calling mlflow.pyspark.ml.autolog() and specify the path to your allowlist file via the spark.mlflow.pysparkml.autolog.logModelAllowlistFile conf.
2022/05/19 21:19:07 INFO mlflow.tracking.fluent: Autologging successfully enabled for pyspark.ml.
Registered model 'ElasticnetWineModel-notebook' already exists. Creating a new version of this model...
2022/05/19 21:19:15 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation.                     Model name: ElasticnetWineModel-notebook, version 1
Created version '1' of model 'ElasticnetWineModel
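
The training cell stores the test-set predictions in `predicted_qualities`. One way to score them is sketched below, with RMSE, MAE, and R2 written out in NumPy so the definitions are visible; these are the same quantities that the `mean_squared_error`, `mean_absolute_error`, and `r2_score` functions imported in the first cell compute. The helper name `eval_metrics` and the sample values are illustrative assumptions, not part of the notebook.

```python
import numpy as np


def eval_metrics(actual, predicted):
    """Return (rmse, mae, r2) for true and predicted values of equal length."""
    # Flatten both inputs: np.asarray(test_y) on a one-column DataFrame has
    # shape (n, 1), and subtracting an (n,) array from that would broadcast
    # to (n, n) instead of giving per-sample residuals.
    actual = np.asarray(actual, dtype=float).ravel()
    predicted = np.asarray(predicted, dtype=float).ravel()
    residuals = actual - predicted

    rmse = float(np.sqrt(np.mean(residuals ** 2)))  # root mean squared error
    mae = float(np.mean(np.abs(residuals)))         # mean absolute error

    ss_res = float(np.sum(residuals ** 2))                  # residual sum of squares
    ss_tot = float(np.sum((actual - actual.mean()) ** 2))   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                              # coefficient of determination
    return rmse, mae, r2


# Made-up values, for illustration only.
actual = [5, 6, 7, 5]
predicted = [5.5, 5.5, 6.0, 5.0]
rmse, mae, r2 = eval_metrics(actual, predicted)
print(f"RMSE: {rmse:.4f}  MAE: {mae:.4f}  R2: {r2:.4f}")
```

In the notebook this would be called as `eval_metrics(test_y, predicted_qualities)` inside the `with mlflow.start_run()` block, and the results could then be recorded explicitly, for example with `mlflow.log_metric("rmse", rmse)`.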