# **📘 Model Registration and Versioning with MLflow**

### **Description**
This lab introduces **MLflow**, an open-source platform that helps manage the **machine learning lifecycle**.  

You will train a **Wine Quality Prediction Model** using **ElasticNet Regression** and use MLflow to:
- Track different experiments
- Log hyperparameters and evaluation metrics
- Register and version models
- Visualize experiment results in the MLflow UI

### **Scenario: Why Do We Need MLflow?**

Imagine you are a **Data Scientist** working on a project to **predict the quality of wine** based on its **chemical properties**.

To build your model, you decide to use **ElasticNet Regression** from scikit-learn. You experiment with different **hyperparameters** such as:
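- `alpha`: the overall strength of the regularization penalty
- `l1_ratio`: the mix between the L1 (lasso) and L2 (ridge) penalties, from 0 (pure L2) to 1 (pure L1)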

After multiple experiments, you realize:
- Some models perform better than others.
- Keeping track of model parameters and results manually is difficult.
- Comparing different runs is time-consuming.
- Sharing and reproducing experiments is challenging.

#### **The Challenge**
How do you:
- Keep track of different models and experiments?
- Store and retrieve model versions?
- Ensure reproducibility for future use?
- Easily deploy the best model?

#### **Why This Lab?**
Machine learning projects often involve experimenting with multiple models and hyperparameters. Tracking these experiments manually can be **challenging** and **error-prone**.

With **MLflow**, you can:
- Organize and compare multiple experiments
- Keep track of different model versions
- Ensure reproducibility for better collaboration
- Deploy models more efficiently

By the end of this lab, you will have hands-on experience using MLflow to track, log, and manage machine learning experiments effectively.

#### **📝 Step 1: Installing Required Libraries**  

**What does this code do?**  
Before running our experiment, we need to install the required Python libraries to ensure we have all the dependencies needed for model training and experiment tracking.

This step will:  
- Install core machine learning libraries (`numpy`, `scipy`, `scikit-learn`, `joblib`)  
- Install **MLflow** for experiment tracking and model logging  
- Install `pandas` for data manipulation  
- Upgrade `pip`, `setuptools`, and `wheel` for smooth package management  

**Running the code** 🔽  


In [None]:
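# Step 1: Install required libraries (run only if they are not already installed)
# %pip runs pip through the notebook kernel's own interpreter (python -m pip), which pip requires when it upgrades itself
%pip install --no-cache-dir --upgrade numpy==1.26.4 scipy==1.11.3 scikit-learn==1.4.2 joblib==1.3.2 mlflow pandas setuptools wheel pip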


Collecting scipy==1.11.3
  Downloading scipy-1.11.3-cp312-cp312-win_amd64.whl.metadata (60 kB)
Collecting pandas
  Downloading pandas-2.2.3-cp312-cp312-win_amd64.whl.metadata (19 kB)
Collecting setuptools
  Downloading setuptools-75.8.0-py3-none-any.whl.metadata (6.7 kB)
Collecting wheel
  Downloading wheel-0.45.1-py3-none-any.whl.metadata (2.3 kB)
Collecting pip
  Downloading pip-25.0.1-py3-none-any.whl.metadata (3.7 kB)
Downloading scipy-1.11.3-cp312-cp312-win_amd64.whl (43.7 MB)

ERROR: To modify pip, please run the following command:
C:\Users\Supriya\anaconda3\python.exe -m pip install --no-cache-dir numpy==1.26.4 scipy==1.11.3 scikit-learn==1.4.2 joblib==1.3.2 mlflow pandas setuptools wheel pip --upgrade
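Once the install finishes, it can help to confirm that the pinned versions are the ones the kernel actually sees. The following is a minimal sketch using only the standard library; the `EXPECTED` table and the `check_versions` helper are illustrative names, not part of the lab, and the pins are copied from the install command above.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the install command above; None means "no pin, any version".
EXPECTED = {
    "numpy": "1.26.4",
    "scipy": "1.11.3",
    "scikit-learn": "1.4.2",
    "joblib": "1.3.2",
    "mlflow": None,
    "pandas": None,
}


def check_versions(expected):
    """Return {package: (installed_version_or_None, satisfies_pin)}."""
    report = {}
    for name, pinned in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        ok = installed is not None and (pinned is None or installed == pinned)
        report[name] = (installed, ok)
    return report


report = check_versions(EXPECTED)
for name, (installed, ok) in report.items():
    status = "OK" if ok else "CHECK"
    print(f"{name:<14} {installed or 'not installed':<16} {status}")
```

Any row marked `CHECK` points to a package that is missing or differs from its pin; restarting the kernel after installation is often needed before new versions are picked up.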


#### **📝 Step 2: Importing Required Libraries**  

**What does this code do?**  
This step imports all the necessary Python libraries required for data handling, model training, evaluation, and MLflow experiment tracking.

This step includes:
- **Standard library**: `os`, `warnings`, and `logging` for system operations, warning control, and log messages; `urlparse` for inspecting the MLflow tracking URI.
- **Data Processing**: `pandas` and `numpy` for data handling and numerical computations.
- **Machine Learning**: `scikit-learn` for model training and evaluation.
- **MLflow**: Helps with logging, tracking, and managing experiments.

**Running the code** 🔽  


In [None]:
# Step 2: Import necessary libraries
import os
import warnings
import logging
import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet
from urllib.parse import urlparse
import mlflow
import mlflow.sklearn


#### **📝 Step 3: Setting Up Logging and Evaluation Functions**  

**What does this code do?**  
This step sets up a **logging system** to track errors and defines an **evaluation function** to measure the performance of the machine learning model.

This step includes:
- **Logging Setup**: Helps track errors and warnings during execution.
- **Evaluation Function**: Computes model performance metrics (RMSE, MAE, R² Score).

**Running the code** 🔽  

In [None]:
# Step 3: Set up logging and evaluation function
logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger(__name__)

def eval_metrics(actual, pred):
    """Function to calculate RMSE, MAE, and R² score."""
    rmse = np.sqrt(mean_squared_error(actual, pred))
    mae = mean_absolute_error(actual, pred)
    r2 = r2_score(actual, pred)
    return rmse, mae, r2
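
To make the three metrics concrete, here is a small worked example that computes them by hand on made-up numbers. It is only a sketch: `eval_metrics_plain` is a hypothetical pure-Python counterpart of `eval_metrics`, and the `actual` and `pred` values are invented for illustration.

```python
import math


def eval_metrics_plain(actual, pred):
    """Compute RMSE, MAE, and R² from their definitions, without NumPy or scikit-learn."""
    n = len(actual)
    residuals = [a - p for a, p in zip(actual, pred)]
    ss_res = sum(r * r for r in residuals)                 # residual sum of squares
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)   # total sum of squares
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    r2 = 1 - ss_res / ss_tot
    return rmse, mae, r2


actual = [3, 5, 7]  # made-up quality scores
pred = [2, 5, 9]    # made-up predictions
rmse, mae, r2 = eval_metrics_plain(actual, pred)
print(f"RMSE={rmse:.4f}  MAE={mae:.4f}  R2={r2:.4f}")
```

Here the residuals are 1, 0, and -2, so RMSE is sqrt(5/3), MAE is 1, and R² is 1 - 5/8 = 0.375. An R² near 0 means the model explains little more than predicting the mean would.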


#### **📝 Step 4: Loading the Dataset**  

**What does this code do?**  
This step loads the red wine file of the **Wine Quality Dataset** from the **UCI Machine Learning Repository**.  
The dataset contains **chemical properties of red wine** and their **quality ratings** (integer scores that range from **3 to 8** in this file).  

This step includes:
- **Suppressing warnings** to keep the output clean.
- **Setting a random seed** to ensure reproducibility.
- **Downloading the dataset** from an online URL.
- **Handling errors** in case the dataset cannot be loaded.

**Running the code** 🔽  

In [None]:
# Step 4: Load dataset from UCI repository
warnings.filterwarnings("ignore")
np.random.seed(40)

# Dataset URL
csv_url = "http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv"

# Download the dataset
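try:
    data = pd.read_csv(csv_url, sep=";")
except Exception:
    # logger.exception records the full traceback along with this message
    logger.exception("Unable to download the dataset. Check your internet connection.")
    # Re-raise so later cells do not fail with a confusing NameError on `data`
    raise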
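
The `sep=";"` argument matters because the file is semicolon-delimited rather than comma-delimited. The sketch below shows the difference on a tiny in-memory sample; the three columns and their values are made up for illustration and are not rows from the real dataset.

```python
import io

import pandas as pd

# Made-up, semicolon-delimited sample in the same layout style as the wine file.
sample = "fixed acidity;alcohol;quality\n7.0;9.5;5\n8.1;10.2;6\n"

# With sep=";" each field becomes its own column.
parsed = pd.read_csv(io.StringIO(sample), sep=";")

# With the default comma separator the whole line collapses into a single column.
unsplit = pd.read_csv(io.StringIO(sample))

print(parsed.shape, list(parsed.columns))
print(unsplit.shape, list(unsplit.columns))
```

Checking `data.shape` and `data.columns` right after loading is a quick way to catch a wrong delimiter before it causes errors in later steps.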


#### **📝 Step 5: Splitting Data into Training and Test Sets**  

**What does this code do?**  
This step **splits the dataset** into training and test sets.  
The **training set (75%)** is used to train the model, while the **test set (25%)** is used to evaluate model performance.  

This step includes:
- **Splitting the dataset** into training and test sets.
- **Separating the features (`X`) and target variable (`y`)**.

**Running the code** 🔽  


In [None]:
# Step 5: Split dataset into training and test sets
train, test = train_test_split(data, test_size=0.25, random_state=42)

# Features (X) and target variable (y)
train_x = train.drop(["quality"], axis=1)
test_x = test.drop(["quality"], axis=1)
train_y = train[["quality"]]
test_y = test[["quality"]]
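
As a quick sanity check on the split sizes, the arithmetic can be worked out by hand. This sketch assumes the red wine file has 1,599 rows and that `train_test_split` rounds the test count up and gives the remainder to training; both are assumptions worth confirming against `train.shape` and `test.shape`.

```python
import math

n_rows = 1599      # assumed row count of winequality-red.csv
test_size = 0.25   # same fraction as in the split above

# Assumed rounding rule: test count rounded up, remaining rows go to training.
n_test = math.ceil(test_size * n_rows)
n_train = n_rows - n_test

print(f"train rows: {n_train}, test rows: {n_test}")
```

Because 1,599 is not divisible by 4, the split is only approximately 75/25.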


#### **📝 Step 6: Training the Model and Logging with MLflow**  

**What does this code do?**  
This step **trains an ElasticNet regression model** and logs the model's **parameters and performance metrics** using **MLflow**.  

This step includes:
- **Training the ElasticNet model** using the training data.
- **Making predictions** on the test data.
- **Evaluating the model** using RMSE, MAE, and R² score.
- **Logging the experiment details** in MLflow.

**Running the code** 🔽  


In [None]:
# Step 6: Train an ElasticNet regression model and log metrics in MLflow
alpha = 0.5
l1_ratio = 0.5

with mlflow.start_run():

    # Train the ElasticNet Model
    lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
    lr.fit(train_x, train_y)

    # Make Predictions
    predicted_qualities = lr.predict(test_x)

    # Evaluate Model
    rmse, mae, r2 = eval_metrics(test_y, predicted_qualities)

    print(f"ElasticNet model (alpha={alpha}, l1_ratio={l1_ratio}):")
    print(f"  RMSE: {rmse}")
    print(f"  MAE: {mae}")
    print(f"  R2: {r2}")

    # Log Parameters & Metrics in MLflow
    mlflow.log_param("alpha", alpha)
    mlflow.log_param("l1_ratio", l1_ratio)
    mlflow.log_metric("rmse", rmse)
    mlflow.log_metric("r2", r2)
    mlflow.log_metric("mae", mae)


ElasticNet model (alpha=0.5, l1_ratio=0.5):
  RMSE: 0.7436470916334205
  MAE: 0.6042761768399746
  R2: 0.10601910075094545


#### **📝 Step 7: Registering the Trained Model in MLflow**  

**What does this code do?**  
This step **registers the trained model in MLflow** so it can be stored, versioned, and retrieved for future use.

This step includes:
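- **Checking the MLflow tracking URI scheme** to decide whether the model can be registered: the code registers the model only when the tracking store is not file-based.
- **Logging the model** and, where the store allows it, **registering it** as `ElasticNetWineModel`. Each registration under the same name creates a new model version.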

**Running the code** 🔽  


In [None]:
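# Step 7: Register the trained model in MLflow
tracking_url_type_store = urlparse(mlflow.get_tracking_uri()).scheme

# Reopen the run from Step 6 so the model is stored with its parameters and metrics;
# calling log_model with no active run would start a new, unrelated run.
with mlflow.start_run(run_id=mlflow.last_active_run().info.run_id):
    if tracking_url_type_store != "file":
        # Non-file tracking store: log the model and register it as a new version
        mlflow.sklearn.log_model(lr, "model", registered_model_name="ElasticNetWineModel")
    else:
        # File-based store: log the model artifact without registering it
        mlflow.sklearn.log_model(lr, "model")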
