# Wine Quality Prediction

### The wine quality dataset is used to predict wine quality from the following features
1. fixed acidity
2. volatile acidity
3. citric acid
4. residual sugar
5. chlorides
6. free sulfur dioxide
7. total sulfur dioxide
8. density
9. pH
10. sulphates
11. alcohol

### We will use scikit-learn's ElasticNet model for this regression problem.
Let's import the relevant libraries.

In [None]:
import os
import warnings
import sys
import argparse

import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet

import mlflow
import mlflow.sklearn

The model is evaluated with three metrics:
1. rmse: root mean squared error
2. mae: mean absolute error
3. r2: R² score (coefficient of determination)

In [17]:
def eval_metrics(actual, pred):
    rmse = np.sqrt(mean_squared_error(actual, pred))
    mae = mean_absolute_error(actual, pred)
    r2 = r2_score(actual, pred)
    return rmse, mae, r2
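To make the three metrics concrete, here is a minimal hand computation in plain Python on four made-up quality scores (the numbers are illustrative, not taken from the dataset). It follows the standard definitions of these metrics; `manual_metrics` is a hypothetical helper name, not part of the notebook.

```python
import math

def manual_metrics(actual, pred):
    """Compute RMSE, MAE and R2 from their definitions (illustrative helper)."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, pred)]
    # RMSE: square root of the mean squared error.
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAE: mean of the absolute errors.
    mae = sum(abs(e) for e in errors) / n
    # R2: 1 - (residual sum of squares / total sum of squares).
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot
    return rmse, mae, r2

# Made-up scores, for illustration only.
actual = [6, 5, 7, 6]
pred = [5.5, 5.0, 6.0, 6.5]
rmse, mae, r2 = manual_metrics(actual, pred)
print(rmse, mae, r2)
```

For these values the MAE is 0.5 and R2 is 0.25. Lower RMSE and MAE are better, while an R2 closer to 1 means the model explains more of the variance in the target.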

In [18]:
np.random.seed(40)

In [19]:
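# Optional command-line hyperparameters, e.g. alpha=0.5.
# A notebook has no command line, so parse an empty argument list;
# the values used for training are set explicitly further down.
parser = argparse.ArgumentParser()
parser.add_argument("--alpha")
parser.add_argument("--l1-ratio")
args = parser.parse_args([])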
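The parser above accepts `--alpha` and `--l1-ratio` but returns them as strings, or `None` when they are absent. A minimal sketch of a variant that converts the values to `float` and supplies defaults is shown below. The default of 0.5 for both arguments is an assumption suggested by the `# alpha=0.5` comment, and `build_parser` is a hypothetical helper.

```python
import argparse

def build_parser():
    """Parser with float conversion and assumed defaults (illustrative)."""
    parser = argparse.ArgumentParser()
    # Defaults of 0.5 are an assumption, not values fixed by the notebook.
    parser.add_argument("--alpha", type=float, default=0.5)
    # argparse exposes "--l1-ratio" as the attribute "l1_ratio".
    parser.add_argument("--l1-ratio", type=float, default=0.5)
    return parser

# No arguments: both values fall back to their defaults.
demo_args = build_parser().parse_args([])
print(demo_args.alpha, demo_args.l1_ratio)

# Overriding one value, as a command line such as "--alpha 0.6" would.
override = build_parser().parse_args(["--alpha", "0.6"])
print(override.alpha, override.l1_ratio)
```

Passing an explicit list to `parse_args` keeps the cell usable inside a notebook, where `sys.argv` holds the kernel's own arguments.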

Read the wine-quality csv file (make sure you're running this from the root of MLflow!)

In [22]:
wine_path = "wine-quality.csv"
data = pd.read_csv(wine_path)

In [23]:
data.head()

| Unnamed: 0 | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.0 | 0.27 | 0.36 | 20.7 | 0.045 | 45.0 | 170.0 | 1.001 | 3.0 | 0.45 | 8.8 | 6 |
| 1 | 6.3 | 0.3 | 0.34 | 1.6 | 0.049 | 14.0 | 132.0 | 0.994 | 3.3 | 0.49 | 9.5 | 6 |
| 2 | 8.1 | 0.28 | 0.4 | 6.9 | 0.05 | 30.0 | 97.0 | 0.9951 | 3.26 | 0.44 | 10.1 | 6 |
| 3 | 7.2 | 0.23 | 0.32 | 8.5 | 0.058 | 47.0 | 186.0 | 0.9956 | 3.19 | 0.4 | 9.9 | 6 |
| 4 | 7.2 | 0.23 | 0.32 | 8.5 | 0.058 | 47.0 | 186.0 | 0.9956 | 3.19 | 0.4 | 9.9 | 6 |


In [6]:
# Split the data into training and test sets. (0.75, 0.25) split.
train, test = train_test_split(data)

In [7]:
# The target column is "quality", a scalar in the range [3, 9].
train_x = train.drop(["quality"], axis=1)
test_x = test.drop(["quality"], axis=1)
train_y = train[["quality"]]
test_y = test[["quality"]]

### Creating an experiment for this project
#### Run the cell below only the first time; `mlflow.create_experiment` raises an error if an experiment with this name already exists

In [27]:
EXPERIMENT_NAME = "mlflow-demo-winequality"
EXPERIMENT_ID = mlflow.create_experiment(EXPERIMENT_NAME)

In [10]:
print(EXPERIMENT_ID)

1


Change the `alpha` and `l1_ratio` values as required.

In [30]:
alpha = float(0.6)
l1_ratio = float(0.5)
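For reference, scikit-learn's documentation gives the ElasticNet objective in the form below, where $n$ is the number of training samples, $w$ is the coefficient vector, $\alpha$ corresponds to `alpha` and $\rho$ to `l1_ratio` (the intercept is omitted for brevity):

$$
\min_{w} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2 \;+\; \alpha \rho \lVert w \rVert_1 \;+\; \frac{\alpha (1 - \rho)}{2} \lVert w \rVert_2^2
$$

With `l1_ratio = 1` the penalty is purely L1 (lasso-style), with `l1_ratio = 0` it is purely L2 (ridge-style), and `alpha` scales the overall strength of the penalty.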

In [31]:
with mlflow.start_run(experiment_id=EXPERIMENT_ID, run_name="2") as run:
    lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
    lr.fit(train_x, train_y)

    predicted_qualities = lr.predict(test_x)

    (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

    print("Elasticnet model (alpha=%f, l1_ratio=%f):" % (alpha, l1_ratio))
    print("  RMSE: %s" % rmse)
    print("  MAE: %s" % mae)
    print("  R2: %s" % r2)

    mlflow.log_param("alpha", alpha)
    mlflow.log_param("l1_ratio", l1_ratio)
    mlflow.log_metric("rmse", rmse)
    mlflow.log_metric("r2", r2)
    mlflow.log_metric("mae", mae)

    mlflow.sklearn.log_model(lr, "model")

Elasticnet model (alpha=0.600000, l1_ratio=0.500000):
  RMSE: 0.8346426818559769
  MAE: 0.6333262824569528
  R2: 0.10025166430470633


In [None]:
Run the command below in a terminal to launch the MLflow UI

In [13]:
# !mlflow ui --port <port>
