## MLflow

MLflow is an open-source platform for managing machine learning (ML) workflows. It helps organize, track, and
manage ML experiments, from model development through to production.

MLflow includes components for monitoring models during training and at run time, saving models, loading them in production code, and building
pipelines.

Core components:
1. Tracking
2. Model Registry
3. MLflow Deployments for LLMs
4. Evaluate
5. Prompt Engineering UI
6. Recipes
7. Projects

Why use MLflow?
1. Traceability
2. Consistency
3. Flexibility

Use cases of MLflow:
1. Experiment tracking
2. Model selection and experimentation
3. Model performance monitoring
4. Collaborative projects

In [1]:
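import os

import mlflow
import mlflow.sklearn
import numpy as np
import pandas as pd
import ydata_profiling as pp
# Import only the PyCaret functions used below instead of a wildcard import
from pycaret.classification import compare_models, evaluate_model, setup
from pycaret.datasets import get_data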

Command Prompt
C:\Users\USER>mlflow ui
C:\Users\USER\AppData\Local\Programs\Python\Python311\Lib\site-packages\pydantic\_internal\_config.py:322: UserWarning: Valid config keys have changed in V2:
 'schema_extra' has been renamed to 'json_schema_extra'
  warnings.warn(message, UserWarning)
INFO:waitress:Serving on http://127.0.0.1:5000

In [8]:
# Set the MLflow tracking URI
mlflow.set_tracking_uri('http://localhost:5000')

# Select the experiment to log to
mlflow.set_experiment('model-heart_cleveland')

# Enable autologging for all sklearn models
mlflow.sklearn.autolog()



The `mlflow.set_tracking_uri()` call tells MLflow where experiment records should be stored. Here they are sent to http://localhost:5000, the local address on which the `mlflow ui` command above is serving.
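Because the tracking URI points at a server that has to be started separately, it can help to confirm that the address responds before logging to it. The following is a minimal sketch using only the standard library; the helper name `is_tracking_server_up` and the timeout value are illustrative assumptions and are not part of MLflow.

```python
import urllib.error
import urllib.request


def is_tracking_server_up(uri: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at `uri`, False otherwise."""
    # Skip any configured proxy: the tracking server here is a local address.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(uri, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar.
        return False


# Address used in this notebook; the result depends on whether `mlflow ui` is running.
print(is_tracking_server_up("http://localhost:5000"))
```

If the check returns `False`, starting `mlflow ui` in a terminal and re-running the cell is usually enough.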

The `mlflow.set_experiment()` call selects the active experiment. An experiment is a collection of runs that share the same goal or focus. Here the experiment is named 'model-heart_cleveland'; if no experiment with that name exists yet, MLflow creates it.

The `mlflow.sklearn.autolog()` call enables autologging for scikit-learn models. Autologging records metrics, parameters, and model artifacts during training without any manual logging calls, which speeds up tracking and documentation. With it enabled, scikit-learn estimators trained later in the code are logged to MLflow automatically.
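To make concrete what autologging saves you from writing, here is a rough sketch of the same bookkeeping done by hand. It is not MLflow's implementation. `InMemoryTracker` and `MajorityClassifier` are hypothetical stand-ins so the example runs without MLflow or scikit-learn installed; in real code you could pass the `mlflow` module itself as `tracker`, since it exposes `log_params` and `log_metric`. The metric name `training_score` is chosen to resemble what sklearn autologging reports and should be treated as an assumption.

```python
class InMemoryTracker:
    """Hypothetical stand-in for the `mlflow` module, used only in this demo."""

    def __init__(self):
        self.params = {}
        self.metrics = {}

    def log_params(self, params):
        self.params.update(params)

    def log_metric(self, key, value):
        self.metrics[key] = value


class MajorityClassifier:
    """Toy estimator exposing the scikit-learn method names used below."""

    def __init__(self, strategy="most_frequent"):
        self.strategy = strategy

    def get_params(self):
        return {"strategy": self.strategy}

    def fit(self, X, y):
        # Always predict the most common label seen during training.
        labels = list(y)
        self.prediction_ = max(set(labels), key=labels.count)
        return self

    def score(self, X, y):
        # Accuracy of the constant prediction.
        return sum(label == self.prediction_ for label in y) / len(y)


def fit_with_manual_logging(model, X, y, tracker):
    """Fit `model` and record by hand what autologging would capture."""
    model.fit(X, y)
    tracker.log_params(model.get_params())  # hyperparameters
    tracker.log_metric("training_score", model.score(X, y))  # training metric
    return model


tracker = InMemoryTracker()
fit_with_manual_logging(MajorityClassifier(), [[0], [1], [2], [3]], [1, 1, 1, 0], tracker)
print(tracker.params)   # {'strategy': 'most_frequent'}
print(tracker.metrics)  # {'training_score': 0.75}
```

With autologging enabled, none of the `tracker` calls are needed for scikit-learn estimators, which is why the notebook contains so few explicit logging statements.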

In [9]:
# Start an MLflow run
with mlflow.start_run(run_name='iterasi_2'):

    # Load the dataset
    dataset_path = 'D:/SDT Semester 4/Machine Learning Ops/heart_cleveland_upload.csv'
    dataset = pd.read_csv(dataset_path)

    # Log the dataset location (both the path and the file name are logged as parameters with mlflow.log_param())
    mlflow.log_param('dataset_path', dataset_path)

    # Log the dataset file name
    dataset_filename = os.path.basename(dataset_path)
    print("Dataset Filename:", dataset_filename)  # Print the dataset filename
    mlflow.log_param('dataset_filename', dataset_filename)

    # Profile the dataset and write the report to an HTML file
    profile = pp.ProfileReport(dataset)
    profile.to_file("output_mlopsM7.html")

    # Split the dataset into training data (80%) and unseen hold-out data (20%)
    data_train = dataset.sample(frac=0.8, random_state=110)
    data_unseen = dataset.drop(data_train.index)
    data_train.reset_index(drop=True, inplace=True)
    data_unseen.reset_index(drop=True, inplace=True)

    # Set up the PyCaret experiment
    cat_features = ['sex', 'cp', 'fbs', 'restecg', 'exang', 'slope', 'thal']
    experiment = setup(data_train, target='condition', categorical_features=cat_features)

    # Train and compare models, ranked by precision with 10-fold cross-validation
    best_model = compare_models(sort='Precision', fold=10, round=2)

    # Log the best model to MLflow under the artifact path "sklearn-model"
    model_path = "sklearn-model"
    mlflow.sklearn.log_model(best_model, model_path)

    # Analyze the model
    evaluate_model(best_model)

Dataset Filename: heart_cleveland_upload.csv



| Description | Value |
|---|---|
| Session id | 3431 |
| Target | condition |
| Target type | Binary |
| Original data shape | (238, 14) |
| Transformed data shape | (238, 23) |
| Transformed train set shape | (166, 23) |
| Transformed test set shape | (72, 23) |
| Numeric features | 6 |
| Categorical features | 7 |
| Preprocess | True |
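The row counts in the setup summary follow from two successive splits: the 80% sample taken with `dataset.sample(frac=0.8)`, then PyCaret's internal train/test split. The sketch below reproduces that arithmetic. Two things in it are assumptions that the notebook does not state: the source file is taken to have 297 rows (chosen because it is consistent with the 238 rows reported), and the train share is taken to be PyCaret's default `train_size` of 0.7, rounded down with the remainder going to the test set.

```python
import math


def holdout_sizes(n_rows: int, frac: float) -> tuple[int, int]:
    """Rows kept by a `sample(frac=...)` call and rows left over as unseen data."""
    n_sampled = round(n_rows * frac)
    return n_sampled, n_rows - n_sampled


def train_test_sizes(n_rows: int, train_size: float = 0.7) -> tuple[int, int]:
    """Train rows (rounded down) and test rows (the remainder)."""
    n_train = math.floor(n_rows * train_size)
    return n_train, n_rows - n_train


N_SOURCE_ROWS = 297  # assumption: the notebook never prints the full dataset size

modelling_rows, unseen_rows = holdout_sizes(N_SOURCE_ROWS, 0.8)
train_rows, test_rows = train_test_sizes(modelling_rows)

print(modelling_rows, unseen_rows)  # 238 59
print(train_rows, test_rows)        # 166 72
```

Under these assumptions the numbers match the summary: 238 rows enter `setup()`, of which 166 are used for training and 72 for testing.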


| ID | Model | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC | TT (Sec) |
|---|---|---|---|---|---|---|---|---|---|
| lr | Logistic Regression | 0.86 | 0.0 | 0.8 | 0.89 | 0.83 | 0.71 | 0.73 | 5.19 |
| catboost | CatBoost Classifier | 0.83 | 0.0 | 0.78 | 0.86 | 0.81 | 0.66 | 0.67 | 2.97 |
| ridge | Ridge Classifier | 0.84 | 0.0 | 0.8 | 0.85 | 0.82 | 0.68 | 0.69 | 0.11 |
| lda | Linear Discriminant Analysis | 0.84 | 0.0 | 0.79 | 0.85 | 0.81 | 0.67 | 0.68 | 0.08 |
| nb | Naive Bayes | 0.83 | 0.0 | 0.76 | 0.84 | 0.8 | 0.65 | 0.65 | 0.16 |
| et | Extra Trees Classifier | 0.82 | 0.0 | 0.78 | 0.83 | 0.79 | 0.63 | 0.64 | 0.2 |
| rf | Random Forest Classifier | 0.8 | 0.0 | 0.72 | 0.82 | 0.76 | 0.59 | 0.6 | 0.25 |
| lightgbm | Light Gradient Boosting Machine | 0.8 | 0.0 | 0.74 | 0.82 | 0.77 | 0.6 | 0.61 | 0.31 |
| ada | Ada Boost Classifier | 0.79 | 0.0 | 0.74 | 0.81 | 0.76 | 0.58 | 0.6 | 0.11 |
| gbc | Gradient Boosting Classifier | 0.79 | 0.0 | 0.72 | 0.81 | 0.75 | 0.57 | 0.59 | 0.18 |
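The `sort='Precision'` argument decides which model comes out on top of this comparison and is returned as `best_model`. As a small illustration of why that choice matters, the sketch below re-ranks the top five rows of the table by two different metrics. It mimics only the ranking step, not PyCaret's cross-validation, and the helper `rank_by` is a hypothetical name.

```python
import csv
import io

# Accuracy and precision copied from the top five rows of the comparison table.
RESULTS_CSV = """\
id,Model,Accuracy,Prec.
lr,Logistic Regression,0.86,0.89
catboost,CatBoost Classifier,0.83,0.86
ridge,Ridge Classifier,0.84,0.85
lda,Linear Discriminant Analysis,0.84,0.85
nb,Naive Bayes,0.83,0.84
"""


def rank_by(rows, metric):
    """Return rows sorted from best to worst on `metric` (ties keep input order)."""
    return sorted(rows, key=lambda row: float(row[metric]), reverse=True)


rows = list(csv.DictReader(io.StringIO(RESULTS_CSV)))

by_precision = [row["id"] for row in rank_by(rows, "Prec.")]
by_accuracy = [row["id"] for row in rank_by(rows, "Accuracy")]

print(by_precision)  # ['lr', 'catboost', 'ridge', 'lda', 'nb']
print(by_accuracy)   # ['lr', 'ridge', 'lda', 'catboost', 'nb']
```

Logistic Regression leads on both metrics here, but the runner-up changes from CatBoost to Ridge, so the sort metric should reflect the error type that matters most for the task.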



In [10]:
# End the MLflow run (the `with` block above already closes its run on exit, so this is only a safeguard)
mlflow.end_run()
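Once the run has finished, the model logged under the artifact path "sklearn-model" can be loaded back, for example in production code. The sketch below shows one way this might look, assuming the same tracking URI is configured. The helper names are hypothetical and the run ID is a placeholder that you would copy from the MLflow UI. `mlflow.sklearn` is imported inside the loader so that the URI helper works without MLflow installed.

```python
def build_run_model_uri(run_id: str, artifact_path: str = "sklearn-model") -> str:
    """Build a `runs:/` URI pointing at a model logged inside a given run."""
    return f"runs:/{run_id}/{artifact_path}"


def load_logged_model(run_id: str, artifact_path: str = "sklearn-model"):
    """Load a model previously saved with `mlflow.sklearn.log_model`."""
    import mlflow.sklearn  # requires MLflow and a reachable tracking server

    return mlflow.sklearn.load_model(build_run_model_uri(run_id, artifact_path))


# "<run_id>" is a placeholder, not a real run identifier.
print(build_run_model_uri("<run_id>"))  # runs:/<run_id>/sklearn-model
```

The loaded object can then be used like the original estimator, for instance to call `predict` on the held-out `data_unseen` rows.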