# MLFlow
- it is a platform for machine learning cycle
- deploy the model
- track the model
- version the model
- register the model


In [1]:
!pip install mlflow



In [2]:
!mlflow

Usage: mlflow [OPTIONS] COMMAND [ARGS]...

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  artifacts    Upload, list, and download artifacts from an MLflow...
  db           Commands for managing an MLflow tracking database.
  deployments  Deploy MLflow models to custom targets.
  doctor       Prints out useful information for debugging issues with MLflow.
  experiments  Manage experiments.
  gc           Permanently delete runs in the `deleted` lifecycle stage.
  models       Deploy MLflow models locally.
  recipes      MLflow Recipes is deprecated and will be removed in MLflow...
  run          Run an MLflow project from the given URI.
  runs         Manage runs.
  sagemaker    Serve models on SageMaker.
  server       Run the MLflow tracking server.


In [3]:
!mlflow --version

mlflow, version 2.22.0


In [4]:
# step 2 import required packages
import pandas as pd
import numpy as np

In [5]:
data=pd.read_csv("/content/winequality-red.csv")
data.head()

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
0,7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,5
1,7.8,0.88,0.0,2.6,0.098,25.0,67.0,0.9968,3.2,0.68,9.8,5
2,7.8,0.76,0.04,2.3,0.092,15.0,54.0,0.997,3.26,0.65,9.8,5
3,11.2,0.28,0.56,1.9,0.075,17.0,60.0,0.998,3.16,0.58,9.8,6
4,7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,5


In [6]:
data.columns
# quality is our target feature

Index(['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
       'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density',
       'pH', 'sulphates', 'alcohol', 'quality'],
      dtype='object')

In [7]:
data.shape

(1599, 12)

# Algorithm name : ElasticNet
- ElasticNet is a regression model
-- mean_squared_error
-- mean_absolute_error
-- r2_score
--- these are some fixed metrics in any regression problem

In [8]:
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error,mean_absolute_error,r2_score
import mlflow
import mlflow.sklearn

In [9]:
data.isnull().sum()

Unnamed: 0,0
fixed acidity,0
volatile acidity,0
citric acid,0
residual sugar,0
chlorides,0
free sulfur dioxide,0
total sulfur dioxide,0
density,0
pH,0
sulphates,0


In [10]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1599 entries, 0 to 1598
Data columns (total 12 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   fixed acidity         1599 non-null   float64
 1   volatile acidity      1599 non-null   float64
 2   citric acid           1599 non-null   float64
 3   residual sugar        1599 non-null   float64
 4   chlorides             1599 non-null   float64
 5   free sulfur dioxide   1599 non-null   float64
 6   total sulfur dioxide  1599 non-null   float64
 7   density               1599 non-null   float64
 8   pH                    1599 non-null   float64
 9   sulphates             1599 non-null   float64
 10  alcohol               1599 non-null   float64
 11  quality               1599 non-null   int64  
dtypes: float64(11), int64(1)
memory usage: 150.0 KB


# step 4. Develop a ml model using MLflow
- Process
- We divide data into two 2 parts : train_data and test_data
- Both train and test data include : input column(X) and output column (y)
- next we divide train_data to x_train and y_train
- Next we divide test_data to x_test and y_test
- Model will be developed based on train_data
- Model predictions performed on X_test data and the result is called as y_predictions
- Finally we compare y_pred with y_test

# MLFLOW Experiments:
- MLflow experiment is access all the mlruns such as model,metric logs, model version

In [11]:
exp=mlflow.set_experiment('/mlflow/syed1')

2025/05/27 14:51:16 INFO mlflow.tracking.fluent: Experiment with name '/mlflow/syed1' does not exist. Creating a new experiment.


In [12]:
# we need to setup experiment
# otherwise mlflow will get default experiment based on uuid

In [13]:
dir(mlflow)

['ActiveRun',
 'Image',
 'LazyLoader',
 'MLFLOW_CONFIGURE_LOGGING',
 'MlflowClient',
 'MlflowException',
 'RunOperations',
 'TYPE_CHECKING',
 'VERSION',
 '__all__',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '__version__',
 '_configure_mlflow_loggers',
 'active_run',
 'add_trace',
 'anthropic',
 'artifacts',
 'autogen',
 'autolog',
 'azure',
 'bedrock',
 'catboost',
 'client',
 'config',
 'contextlib',
 'create_experiment',
 'crewai',
 'data',
 'delete_expectation',
 'delete_experiment',
 'delete_feedback',
 'delete_prompt',
 'delete_prompt_alias',
 'delete_run',
 'delete_tag',
 'disable_system_metrics_logging',
 'diviner',
 'doctor',
 'dspy',
 'enable_system_metrics_logging',
 'end_run',
 'entities',
 'environment_variables',
 'evaluate',
 'exceptions',
 'fastai',
 'flush_artifact_async_logging',
 'flush_async_logging',
 'flush_trace_async_logging',
 'gateway',
 'gemini',
 'get_artifact_uri',
 'get_cu

In [14]:
  exp.experiment_id

'519024991821697514'

In [15]:
exp.artifact_location

'file:///content/mlruns/519024991821697514'

In [16]:
exp.name

'/mlflow/syed1'

In [17]:
exp.lifecycle_stage

'active'

In [18]:
train,test=train_test_split(data,test_size=0.3)
train.shape,test.shape

((1119, 12), (480, 12))

In [19]:
# now we divide data into train_x and train_y and test_x and train_y

In [20]:
train.head()

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
425,6.6,0.84,0.03,2.3,0.059,32.0,48.0,0.9952,3.52,0.56,12.3,7
231,8.0,0.38,0.06,1.8,0.078,12.0,49.0,0.99625,3.37,0.52,9.9,6
387,8.3,0.66,0.15,1.9,0.079,17.0,42.0,0.9972,3.31,0.54,9.6,6
354,6.1,0.21,0.4,1.4,0.066,40.5,165.0,0.9912,3.25,0.59,11.9,6
320,9.8,0.66,0.39,3.2,0.083,21.0,59.0,0.9989,3.37,0.71,11.5,7


In [21]:
# quality is our output column
# train_x means without quality column
# from the train data we need to drop quality column , then remaining data will be train_x
# from train data we need to access quality column, that will be train_y
train_x=train.drop('quality',axis=1)
train_y=train['quality']


In [22]:
test_x=test.drop('quality',axis=1)
test_y=test['quality']

In [23]:
train_x.shape,train_y.shape
# both are input data

((1119, 11), (1119,))

In [24]:
train_y.shape,test_y.shape

((1119,), (480,))

In [25]:
train_x.head(2)

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol
425,6.6,0.84,0.03,2.3,0.059,32.0,48.0,0.9952,3.52,0.56,12.3
231,8.0,0.38,0.06,1.8,0.078,12.0,49.0,0.99625,3.37,0.52,9.9


# day 8

In [26]:
def train_model(alpha_v,l1_ratio_v):
  #------------------ divide data into train and test
  train,test=train_test_split(data,test_size=0.3)
  train_x=train.drop('quality',axis=1)
  train_y=train['quality']
  test_x=train.drop('quality',axis=1)
  test_y=train['quality']

  #------------------- initiate the mlflow
  with mlflow.start_run(experiment_id=exp.experiment_id,run_name="regression",description="Performing regression analysis"):
    #-----------------------model building ----------------
    lr=ElasticNet(alpha=alpha_v,l1_ratio=l1_ratio_v)
    lr.fit(train_x,train_y)
    # ----------------------predict the model
    predicted_data=lr.predict(test_x)
    #--------- Model performance------------
    #r2 is accuracy of model
    rmse=np.sqrt(mean_squared_error(test_y,predicted_data))
    mae=mean_absolute_error(test_y,predicted_data)
    r2=r2_score(test_y,predicted_data)
    print("model parameters alpha={} and l1_ratio={}".format(alpha_v,l1_ratio_v))
    print("model performance : rmse={},mae={},r2={}".format(rmse,mae,r2))

    # log the metric
    mlflow.log_param("alpha",alpha_v)
    mlflow.log_param("l1_ratio",l1_ratio_v)
    mlflow.log_metric("rmse",rmse)
    mlflow.log_metric("mae",mae)
    mlflow.log_metric("r2",r2)
    # log the model
    mlflow.sklearn.log_model(lr,"model",registered_model_name="ElasticNet")

In [27]:
train_model(0.1,0.6)
# this is called fine tuning
# alpha and l1_ratio is always 0 to 1

model parameters alpha=0.1 and l1_ratio=0.6
model performance : rmse=0.7063836972621673,mae=0.5563424916611066,r2=0.24617089388118263


Successfully registered model 'ElasticNet'.
Created version '1' of model 'ElasticNet'.


- whenever we run the code, it will create a folder called mlruns
- whenever we create an experiment, it will create an artifact artifact_location
- inside the artifact location, whenever you run the pipeline it will create a new runid
-- - artifacts
-- - metrics
-- - params
-- - tags
-- - meta.yaml

# ARTIFACTS
- inside artifact we have a folder name as model
- inside model folder we have
  -- mlmodel
  -- conda.yaml
  -- python_env.yaml
  -- model.pkl
  -- requirement.txt

# metrics
- it will log all the metrics we provided under mlflow

# params
- it will log all the params we provided under mlflow



# NGROK
- to connect this google collab notebook to mlflowUI we need to create a tunnel
- tunnel name : ngrok
- ngrok will provide the path from google collab to mlflowUI



In [28]:
!pip install pyngrok

Collecting pyngrok
  Downloading pyngrok-7.2.8-py3-none-any.whl.metadata (10 kB)
Downloading pyngrok-7.2.8-py3-none-any.whl (25 kB)
Installing collected packages: pyngrok
Successfully installed pyngrok-7.2.8


In [30]:
from pyngrok import ngrok
auth_token="2xgR91BRWTC4lsXkfvQTVI1QMuQ_53QQvZmCQiypj8nTvfE4J"
ngrok.set_auth_token(auth_token)
ngrok_tunnel=ngrok.connect(addr="5000",proto="http")
print("MLFLOW UI ",ngrok_tunnel.public_url)

MLFLOW UI  https://bc61-35-229-76-103.ngrok-free.app


In [32]:
!mlflow ui

[2025-05-27 15:00:22 +0000] [2942] [INFO] Starting gunicorn 23.0.0
[2025-05-27 15:00:22 +0000] [2942] [INFO] Listening at: http://127.0.0.1:5000 (2942)
[2025-05-27 15:00:22 +0000] [2942] [INFO] Using worker: sync
[2025-05-27 15:00:22 +0000] [2943] [INFO] Booting worker with pid: 2943
[2025-05-27 15:00:22 +0000] [2944] [INFO] Booting worker with pid: 2944
[2025-05-27 15:00:22 +0000] [2945] [INFO] Booting worker with pid: 2945
[2025-05-27 15:00:22 +0000] [2946] [INFO] Booting worker with pid: 2946
[2025-05-27 15:24:17 +0000] [2942] [INFO] Handling signal: int

Aborted!
[2025-05-27 15:24:18 +0000] [2943] [INFO] Worker exiting (pid: 2943)
[2025-05-27 15:24:18 +0000] [2946] [INFO] Worker exiting (pid: 2946)
[2025-05-27 15:24:18 +0000] [2945] [INFO] Worker exiting (pid: 2945)
[2025-05-27 15:24:19 +0000] [2942] [INFO] Shutting down: Master
