## MLOps Using MLflow

MLOps, short for Machine Learning Operations, is a set of practices for building and running machine learning models like an assembly line. It applies DevOps principles, such as automation and versioning, to the machine learning lifecycle.

MLOps aims to automate tasks, deploy models swiftly, and ensure smooth collaboration among data scientists, engineers, and IT professionals.

In [1]:
pip show mlflow

/bin/bash: /home/mjanuadi/anaconda3/envs/image-dl/lib/libtinfo.so.6: no version information available (required by /bin/bash)
Name: mlflow
Version: 2.12.1
Summary: MLflow is an open source platform for the complete machine learning lifecycle
Home-page: 
Author: 
Author-email: 
License: Copyright 2018 Databricks, Inc.  All rights reserved.
         Apache License, Version 2.0, January 2004
         http://www.apache.org/licenses/

In [2]:
!mlflow --version

/bin/bash: /home/mjanuadi/anaconda3/envs/image-dl/lib/libtinfo.so.6: no version information available (required by /bin/bash)
mlflow, version 2.12.1


### Load the Model and Evaluate It on the Test Dataset

Let's load the trained model with pickle and rebuild the test split, so that we have metrics to log in the MLOps workflow.

In [3]:
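# Load the previously trained model from disk

import pickle

# Use a context manager so the file handle is closed after loading
with open("model_selected.pkl", "rb") as f:
    model = pickle.load(f)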

In [4]:
# Create a function to return the metrics of our model

from sklearn.metrics import accuracy_score, precision_score, recall_score

def get_metrics(y_true, y_pred):
    acc = accuracy_score(y_true, y_pred)
    # Note: with average='micro' on a single-label problem, precision and
    # recall are both equal to accuracy.
    prec = precision_score(y_true, y_pred, average='micro')
    recall = recall_score(y_true, y_pred, average='micro')
    return {'accuracy': round(acc, 2), 'precision': round(prec, 2), 'recall': round(recall, 2)}
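
One thing worth knowing about `average='micro'`: for single-label problems such as this one, micro-averaged precision and recall both reduce to accuracy, which is why the three reported numbers coincide. The following is a minimal pure-Python sketch of that identity. The helper name `micro_scores` and the toy labels are illustrative and not part of the notebook.

```python
def micro_scores(y_true, y_pred):
    """Compute accuracy and micro-averaged precision/recall by hand."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = fp = fn = 0
    # Micro-averaging pools true positives, false positives and false
    # negatives over all labels before dividing.
    for label in labels:
        for t, p in zip(y_true, y_pred):
            if p == label and t == label:
                tp += 1
            elif p == label and t != label:
                fp += 1
            elif p != label and t == label:
                fn += 1
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall

# Toy labels: 6 of 8 predictions are correct
y_true_toy = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred_toy = [0, 1, 0, 0, 1, 1, 1, 1]
acc, prec, rec = micro_scores(y_true_toy, y_pred_toy)
print(acc, prec, rec)  # 0.75 0.75 0.75
```

If separate precision and recall values for the positive class are wanted, scikit-learn's `average='binary'` option is the usual choice for a two-class problem.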


In [5]:
def prepare_X_y_test(data_url):
    import pandas as pd
    from sklearn.model_selection import train_test_split

    data = pd.read_csv(data_url)
    X = data.drop(columns=['id', 'Class'])
    y = data['Class']
    # Reproduce the held-out split; this matches the training-time split only
    # if the same test_size and random_state were used there.
    _, X_test, _, y_test = train_test_split(X, y, test_size=0.1, random_state=42)

    return X_test, y_test



In [6]:
data_url = '/home/mjanuadi/MLDeploy_Flask/data/riceClassification.csv'

X_test_2, y_true = prepare_X_y_test(data_url)

In [7]:
X_test_2.shape

(1819, 10)

In [8]:
y_pred = model.predict(X_test_2)

In [9]:
import numpy as np
 
y_pred, np.array(y_true)

(array([0, 1, 0, ..., 0, 0, 0]), array([0, 1, 0, ..., 0, 0, 0]))

In [10]:
run_metrics = get_metrics(np.array(y_true), y_pred)
run_metrics

{'accuracy': 0.99, 'precision': 0.99, 'recall': 0.99}

### MLflow Configuration
Let's set up MLflow tracking for our MLOps workflow.

In [11]:
def create_experiment(experiment_name, run_name, run_metrics, model, confusion_matrix_path=None,
                      roc_auc_plot_path=None, run_params=None):
    import mlflow
    import mlflow.sklearn

    # Point the client at the tracking server (it must already be running)
    mlflow.set_tracking_uri("http://localhost:5001")
    mlflow.set_experiment(experiment_name)

    with mlflow.start_run(run_name=run_name):
        # Log hyperparameters, if any were supplied
        if run_params is not None:
            for param in run_params:
                mlflow.log_param(param, run_params[param])

        for metric in run_metrics:
            mlflow.log_metric(metric, run_metrics[metric])

        # Attach optional plot files as run artifacts
        if confusion_matrix_path is not None:
            mlflow.log_artifact(confusion_matrix_path, 'confusion_matrix')

        if roc_auc_plot_path is not None:
            mlflow.log_artifact(roc_auc_plot_path, "roc_auc_plot")

        mlflow.set_tag("tag1", "Rice Classifier")
        mlflow.set_tags({"tag2": "Random Forest", "tag3": "Binary Classification"})
        # Log the model and register it in the Model Registry in one step
        mlflow.sklearn.log_model(model, "model", registered_model_name="rice-classifier")

    print('Run - %s is logged to Experiment - %s' % (run_name, experiment_name))

### Start MLflow Server
Run this command in a terminal before executing `create_experiment`, since the function logs to `http://localhost:5001`:
`mlflow server --backend-store-uri sqlite:///mlflow.db --default-artifact-root ./artifacts --host 0.0.0.0 --port 5001`
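
Logging fails if nothing is listening on port 5001 yet, so it can help to check first. Here is a minimal sketch using only the standard library. The helper `is_port_open` is a hypothetical name, and a successful TCP connection only shows that something is listening on the port, not that it is an MLflow server.

```python
import socket

def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name-resolution failures
        return False

# Check the port used by the tracking server in this notebook
print(is_port_open("localhost", 5001))
```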

#### Execute the create_experiment function and log the experiment

In [12]:
from datetime import datetime

# Use today's date as a suffix for both the experiment name and the run name
date_suffix = datetime.now().strftime("%d-%m-%y")
experiment_name = "rice_classifier_new_" + date_suffix
run_name = "rice_classifier_new_" + date_suffix
create_experiment(experiment_name, run_name, run_metrics, model)

2024/04/23 09:25:27 INFO mlflow.tracking.fluent: Experiment with name 'rice_classifier_new_23-04-24' does not exist. Creating a new experiment.
Successfully registered model 'rice-classifier'.
2024/04/23 09:25:28 INFO mlflow.store.model_registry.abstract_store: Waiting up to 300 seconds for model version to finish creation. Model name: rice-classifier, version 1


Run - rice_classifier_new_23-04-24 is logged to Experiment - rice_classifier_new_23-04-24


Created version '1' of model 'rice-classifier'.


### Fetching an MLflow Model from the Model Registry

#### Transitioning an MLflow Model's Stage

In [13]:
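import mlflow

# Promote version 1 of the registered model to the "Production" stage.
# Note: model registry stages are deprecated since MLflow 2.9, so this call
# emits a FutureWarning on the version used here (2.12.1).
client = mlflow.tracking.MlflowClient()
client.transition_model_version_stage(
    name="rice-classifier",
    version=1,
    stage="Production"
)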


<ModelVersion: aliases=[], creation_timestamp=1713839128826, current_stage='Production', description='', last_updated_timestamp=1713839167460, name='rice-classifier', run_id='4861b4e277c7444da82baee0030d1cdc', run_link='', source='/home/mjanuadi/artifacts/1/4861b4e277c7444da82baee0030d1cdc/artifacts/model', status='READY', status_message='', tags={}, user_id='', version='1'>

#### Model Deployment

In [14]:
import mlflow
mlflow.set_tracking_uri('http://localhost:5001')


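#### Load our model from the Model Registry
Load the logged model with MLflow's `pyfunc` flavor. The cell below addresses the model by its run URI (`runs:/<run_id>/model`); the registered version promoted to Production can also be addressed as `models:/rice-classifier/Production`.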


In [15]:
import mlflow
logged_model = 'runs:/4861b4e277c7444da82baee0030d1cdc/model'

loaded_model = mlflow.pyfunc.load_model(logged_model)


In [16]:
loaded_model

mlflow.pyfunc.loaded_model:
  artifact_path: model
  flavor: mlflow.sklearn
  run_id: 4861b4e277c7444da82baee0030d1cdc

#### Let's Predict The Dataset

In [23]:
# X_test_2 is already a DataFrame, so it can be passed to predict directly
y_pred_mlflow = loaded_model.predict(X_test_2)
y_pred_mlflow

array([0, 1, 0, ..., 0, 0, 0])

### Let's Serve Our Model

In [24]:
import mlflow

mlflow.set_tracking_uri('http://localhost:5001')

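#### Then run this command in a terminal
`mlflow models serve --model-uri models:/rice-classifier/Production -p 1234 --no-conda`

The `--no-conda` flag serves the model from the current Python environment instead of creating a new Conda environment. If you run the command from a separate shell, the `models:/` URI is resolved against the tracking server, so set `MLFLOW_TRACKING_URI=http://localhost:5001` in that shell first.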

Model is served!!  

Listening at: http://127.0.0.1:1234 (3596)

#### Let's do the prediction

#### Online Prediction

An online prediction is a synchronous request. It means that when you make a prediction, you expect an immediate response.

In [25]:
import requests


inference_request = {
    "dataframe_records" : [[3050, 77.5, 53.04, 0.745, 3135, 63.1, 0.76, 211.2, 0.86, 1.45]]
}

endpoint = "http://localhost:1234/invocations"

response = requests.post(endpoint, json = inference_request)

print(response.text)

{"predictions": [1]}
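
The request above sends bare lists, so the served model sees unnamed columns. MLflow 2.x scoring servers also accept a `dataframe_split` payload that carries column names alongside the rows. The sketch below builds such a payload with the standard library only. The column names are the feature columns of `X_test_2`, and `build_split_payload` is a hypothetical helper, not an MLflow function.

```python
import json

# Feature columns in the order used by X_test_2
FEATURE_COLUMNS = [
    "Area", "MajorAxisLength", "MinorAxisLength", "Eccentricity", "ConvexArea",
    "EquivDiameter", "Extent", "Perimeter", "Roundness", "AspectRation",
]

def build_split_payload(rows, columns=FEATURE_COLUMNS):
    """Wrap rows in the {"dataframe_split": {...}} JSON structure."""
    for row in rows:
        # Catch rows with missing or extra values before sending the request
        if len(row) != len(columns):
            raise ValueError(f"expected {len(columns)} values, got {len(row)}")
    return {
        "dataframe_split": {
            "columns": list(columns),
            "data": [list(row) for row in rows],
        }
    }

payload = build_split_payload(
    [[3050, 77.5, 53.04, 0.745, 3135, 63.1, 0.76, 211.2, 0.86, 1.45]]
)
print(json.dumps(payload, indent=2))
```

The resulting dictionary can be sent in the same way as before, for example with `requests.post(endpoint, json=payload)`.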


#### Batch Prediction

Batch prediction generates predictions for a large dataset in one job, rather than one request at a time.

Asynchronous request: unlike online prediction, which is synchronous, batch prediction is typically asynchronous and doesn’t require deploying the model to an endpoint. For simplicity, the notebook cells that follow reuse the endpoint that is already running and send the whole test set in a single request.
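
For a dataset too large to score in one call, a common pattern is to split the rows into fixed-size chunks and score each chunk in turn. Here is a minimal sketch, assuming `predict_fn` is any callable that maps a list of rows to a list of predictions (for example, a thin wrapper around `loaded_model.predict`). The helper names and the stand-in predictor are illustrative only.

```python
def iter_batches(rows, batch_size):
    """Yield consecutive slices of rows with at most batch_size items each."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]

def predict_in_batches(predict_fn, rows, batch_size=500):
    """Score rows chunk by chunk and concatenate the predictions in order."""
    predictions = []
    for batch in iter_batches(rows, batch_size):
        predictions.extend(predict_fn(batch))
    return predictions

# Stand-in predictor for demonstration only: flags rows whose first value
# exceeds 7000. A real run would call the loaded model instead.
def dummy_predict(batch):
    return [int(row[0] > 7000) for row in batch]

rows = [[9429, 174.67], [5842, 145.81], [8359, 158.01]]
print(predict_in_batches(dummy_predict, rows, batch_size=2))  # [1, 0, 1]
```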

In [26]:
X_test_2

| | Area | MajorAxisLength | MinorAxisLength | Eccentricity | ConvexArea | EquivDiameter | Extent | Perimeter | Roundness | AspectRation |
|---|---|---|---|---|---|---|---|---|---|---|
| 18056 | 9429 | 174.671378 | 69.661360 | 0.917032 | 9752 | 109.569045 | 0.644938 | 406.975 | 0.715385 | 2.507436 |
| 3256 | 5842 | 145.807572 | 51.673280 | 0.935096 | 6020 | 86.245379 | 0.641978 | 330.790 | 0.670914 | 2.821721 |
| 17106 | 8359 | 158.012446 | 67.848264 | 0.903121 | 8540 | 103.164962 | 0.666268 | 370.413 | 0.765582 | 2.328909 |
| 9810 | 6493 | 167.571643 | 50.974529 | 0.952610 | 6737 | 90.923838 | 0.796687 | 366.921 | 0.606052 | 3.287360 |
| 9563 | 5594 | 153.172203 | 47.071526 | 0.951609 | 5770 | 84.394917 | 0.496847 | 337.176 | 0.618328 | 3.254031 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1983 | 6150 | 148.149483 | 54.276713 | 0.930471 | 6341 | 88.489678 | 0.533068 | 337.755 | 0.677456 | 2.729522 |
| 15769 | 8334 | 156.418697 | 69.190272 | 0.896847 | 8670 | 103.010574 | 0.576588 | 375.312 | 0.743496 | 2.260704 |
| 7816 | 8393 | 147.020754 | 73.865291 | 0.864627 | 8663 | 103.374559 | 0.612807 | 373.304 | 0.756836 | 1.990390 |
| 15626 | 9817 | 168.580801 | 74.765826 | 0.896274 | 10018 | 111.800682 | 0.634009 | 399.847 | 0.771616 | 2.254784 |


In [28]:
lst = X_test_2.values.tolist()

inference_request = {
    "dataframe_records" : lst
}

endpoint = "http://localhost:1234/invocations"

response = requests.post(endpoint, json = inference_request)

print(response.text)

{"predictions": [0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1,