# MLflow Tutorial Overview

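- This tutorial uses MLflow end to end to:

1. Train a linear regression model
2. Package the code that trains the model in a reusable and reproducible model format
3. Deploy the model to a simple HTTP server that can score predictions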

# Prerequisites

- Install MLflow and scikit-learn using one of these two options:
  - Install MLflow with extra dependencies, including scikit-learn (via pip install mlflow[extras])
  - Install MLflow (via pip install mlflow) and install scikit-learn separately (via pip install scikit-learn)
- Install conda
- Clone (download) the MLflow repository via git clone https://github.com/mlflow/mlflow
- cd into the examples directory within your clone of MLflow - we’ll use this working directory for running the tutorial. We avoid running directly from our clone of MLflow as doing so would cause the tutorial to use MLflow from source, rather than your PyPI installation of MLflow.
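
Before starting, it can help to confirm that the packages listed above are importable from the environment that will run the tutorial. The following is a minimal sketch using only the standard library. The helper name `missing_packages` is hypothetical, and the module names checked are the ones imported by the training script shown later.

```python
import importlib.util


def missing_packages(module_names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Modules imported by the tutorial's training script.
required = ["mlflow", "sklearn", "pandas", "numpy"]
missing = missing_packages(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

If anything is reported as missing, install it with pip into the same environment before continuing.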

## Successful Result

![image.png](attachment:image.png)

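# Training the Model

- First, train a linear regression model that takes two hyperparameters: alpha and l1_ratio.
- Change into the examples directory before running the code.
- The MLflow tracking APIs log information about each training run, such as the hyperparameters alpha and l1_ratio used to train the model and metrics such as the root mean square error (RMSE) used to evaluate it.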
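
For reference, the role of the two hyperparameters can be written out. The block below follows scikit-learn's documented ElasticNet objective, with alpha written as $\alpha$ and l1_ratio as $\rho$. The remaining symbols are notation assumed here, not taken from the tutorial: $X$ and $y$ are the training features and targets, $w$ the coefficients, $n$ the number of training samples, and $m$ the number of test samples.

```latex
\min_{w}\; \frac{1}{2n}\,\lVert y - Xw \rVert_2^2
  + \alpha\,\rho\,\lVert w \rVert_1
  + \frac{\alpha\,(1-\rho)}{2}\,\lVert w \rVert_2^2,
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\bigl(y_i - \hat{y}_i\bigr)^2}
```

A larger $\alpha$ strengthens the overall penalty, while $\rho$ shifts the balance between the L1 term ($\rho = 1$) and the L2 term ($\rho = 0$).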

In [9]:
# import os
# os.getcwd()
# os.chdir('C:/Users/canmanmo/Desktop/mlflow_test/mlflow/examples')
# os.getcwd()
# os.listdir()

['catboost',
 'databricks',
 'diviner',
 'docker',
 'evaluation',
 'fastai',
 'flower_classifier',
 'gluon',
 'h2o',
 'hyperparam',
 'keras',
 'lightgbm',
 'mlflow_artifacts',
 'multistep_workflow',
 'paddle',
 'pipelines',
 'pip_requirements',
 'pmdarima',
 'prophet',
 'pyfunc',
 'pyspark_ml_autologging',
 'pytorch',
 'quickstart',
 'rapids',
 'ray_serve',
 'README.md',
 'remote_store',
 'restore_model_dependencies',
 'rest_api',
 'r_wine',
 'shap',
 'sklearn_autolog',
 'sklearn_elasticnet_diabetes',
 'sklearn_elasticnet_wine',
 'sklearn_logistic_regression',
 'spacy',
 'spark_udf',
 'statsmodels',
 'supply_chain_security',
 'tensorflow',
 'virtualenv',
 'xgboost']

In [13]:
# The data set used in this example is from http://archive.ics.uci.edu/ml/datasets/Wine+Quality
# P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis.
# Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.

import os
import warnings
import sys

import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet
from urllib.parse import urlparse
import mlflow
import mlflow.sklearn

import logging

logging.basicConfig(level=logging.WARN)
logger = logging.getLogger(__name__)


def eval_metrics(actual, pred):
    rmse = np.sqrt(mean_squared_error(actual, pred))
    mae = mean_absolute_error(actual, pred)
    r2 = r2_score(actual, pred)
    return rmse, mae, r2


if __name__ == "__main__":
    warnings.filterwarnings("ignore")
    np.random.seed(40)

    # Read the wine-quality csv file from the URL
    csv_url = (
        "http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv"
    )
    try:
        data = pd.read_csv(csv_url, sep=";")
    except Exception as e:
        logger.exception(
            "Unable to download training & test CSV, check your internet connection. Error: %s", e
        )
        # Re-raise so the script stops here instead of failing later on an undefined `data`
        raise

    # Split the data into training and test sets. (0.75, 0.25) split.
    train, test = train_test_split(data)

    # The predicted column is "quality" which is a scalar from [3, 9]
    train_x = train.drop(["quality"], axis=1)
    test_x = test.drop(["quality"], axis=1)
    train_y = train[["quality"]]
    test_y = test[["quality"]]

    alpha = float(sys.argv[1]) if len(sys.argv) > 1 else 0.5
    l1_ratio = float(sys.argv[2]) if len(sys.argv) > 2 else 0.5

    with mlflow.start_run():
        lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
        lr.fit(train_x, train_y)

        predicted_qualities = lr.predict(test_x)

        (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

        print("Elasticnet model (alpha=%f, l1_ratio=%f):" % (alpha, l1_ratio))
        print("  RMSE: %s" % rmse)
        print("  MAE: %s" % mae)
        print("  R2: %s" % r2)

        mlflow.log_param("alpha", alpha)
        mlflow.log_param("l1_ratio", l1_ratio)
        mlflow.log_metric("rmse", rmse)
        mlflow.log_metric("r2", r2)
        mlflow.log_metric("mae", mae)

        tracking_url_type_store = urlparse(mlflow.get_tracking_uri()).scheme

        # Model registry does not work with file store
        if tracking_url_type_store != "file":

            # Register the model
            # There are other ways to use the Model Registry, which depends on the use case,
            # please refer to the doc for more information:
            # https://mlflow.org/docs/latest/model-registry.html#api-workflow
            mlflow.sklearn.log_model(lr, "model", registered_model_name="ElasticnetWineModel")
        else:
            mlflow.sklearn.log_model(lr, "model")

Elasticnet model (alpha=0.500000, l1_ratio=0.500000):
  RMSE: 0.7931640229276851
  MAE: 0.6271946374319586
  R2: 0.10862644997792614


In [14]:
# Make sure the current working directory is 'examples'
!python sklearn_elasticnet_wine/train.py

Elasticnet model (alpha=0.500000, l1_ratio=0.500000):
  RMSE: 0.7931640229276851
  MAE: 0.6271946374319586
  R2: 0.10862644997792614


## Changing the Hyperparameters

- Try out some other values for alpha and l1_ratio by passing them as arguments to train.py:

In [16]:
# Make sure the current working directory is 'examples'
# python sklearn_elasticnet_wine/train.py <alpha> <l1_ratio>
!python sklearn_elasticnet_wine/train.py 0.3 0.2

Elasticnet model (alpha=0.300000, l1_ratio=0.200000):
  RMSE: 0.7397486012946923
  MAE: 0.5704931175017443
  R2: 0.2246424241189423
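
To try several combinations in one go, the same command can be generated for a small grid of values. This is a minimal sketch: the helper names `build_commands` and `run_all` and the grid values are hypothetical, and it assumes the current working directory is 'examples' when the commands are actually run.

```python
import itertools
import subprocess
import sys


def build_commands(alphas, l1_ratios, script="sklearn_elasticnet_wine/train.py"):
    """Build one train.py invocation per (alpha, l1_ratio) combination."""
    return [
        [sys.executable, script, str(alpha), str(l1_ratio)]
        for alpha, l1_ratio in itertools.product(alphas, l1_ratios)
    ]


def run_all(commands):
    """Run each command in turn; train.py starts its own MLflow run each time."""
    for command in commands:
        subprocess.run(command, check=True)


# Hypothetical grid of values; adjust as needed.
commands = build_commands(alphas=[0.3, 0.5], l1_ratios=[0.2, 0.5])
for command in commands:
    print(" ".join(command[1:]))

# Call run_all(commands) from the 'examples' directory to launch the runs.
```

The sketch only prints the commands by default so that nothing is trained until `run_all` is called explicitly.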


- Each time you run the example, MLflow logs information about your experiment runs in the mlruns directory.
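
The contents of that directory can also be inspected directly. The sketch below assumes the local file-store layout `mlruns/<experiment_id>/<run_id>/` with `params` and `metrics` subdirectories containing one file per key, and metric lines of the form `<timestamp> <value> <step>`. These layout details are assumptions that may differ between MLflow versions. The helper name `read_runs` is hypothetical, and the demo builds a stand-in directory with a made-up run ID and timestamp so that it runs anywhere.

```python
import tempfile
from pathlib import Path


def read_runs(mlruns_dir):
    """Return params and the last logged value of each metric for every run found."""
    runs = []
    for metrics_dir in sorted(Path(mlruns_dir).glob("*/*/metrics")):
        run_dir = metrics_dir.parent
        params_dir = run_dir / "params"
        params = {}
        if params_dir.is_dir():
            params = {p.name: p.read_text().strip() for p in params_dir.iterdir() if p.is_file()}
        metrics = {}
        for metric_file in metrics_dir.iterdir():
            if not metric_file.is_file():
                continue
            lines = metric_file.read_text().splitlines()
            if lines:
                # Assumed line format: "<timestamp> <value> <step>"; keep the last entry.
                metrics[metric_file.name] = float(lines[-1].split()[1])
        runs.append({"run_id": run_dir.name, "params": params, "metrics": metrics})
    return runs


# Stand-in directory so the sketch runs anywhere; the run ID and timestamp are made up.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / "0" / "example_run_id"
    (run_dir / "params").mkdir(parents=True)
    (run_dir / "metrics").mkdir()
    (run_dir / "params" / "alpha").write_text("0.5")
    (run_dir / "metrics" / "rmse").write_text("1659414171000 0.7931640229276851 0\n")
    demo_runs = read_runs(tmp)

print(demo_runs)
```

To inspect real runs, call `read_runs("mlruns")` from the directory where the training script was executed.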

![image.png](attachment:image.png)

# Comparing the Models

- Use the MLflow UI to compare the models you have produced.
- The UI lists the experiment runs together with metrics you can use to compare the models.
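
The comparison itself amounts to ranking runs by a metric. As a rough illustration outside the UI, the sketch below ranks the two training runs printed earlier by RMSE. The helper name `rank_runs` is hypothetical, and the numbers are copied from the outputs above. MLflow also exposes run data programmatically (for example through `mlflow.search_runs`), which is not used here.

```python
def rank_runs(runs, metric, lower_is_better=True):
    """Return runs sorted from best to worst on the given metric."""
    return sorted(runs, key=lambda run: run[metric], reverse=not lower_is_better)


# Values printed by the two training runs above.
runs = [
    {"alpha": 0.5, "l1_ratio": 0.5, "rmse": 0.7931640229276851, "r2": 0.10862644997792614},
    {"alpha": 0.3, "l1_ratio": 0.2, "rmse": 0.7397486012946923, "r2": 0.2246424241189423},
]

best = rank_runs(runs, "rmse")[0]
print("Lowest RMSE: alpha=%s, l1_ratio=%s" % (best["alpha"], best["l1_ratio"]))
```

For RMSE and MAE lower is better, while for R2 higher is better, which is why the direction is a parameter.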

In [18]:
!mlflow ui

^C


![image.png](attachment:image.png)

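# Packaging the Training Code

- Now that you have training code, you can package it so that other data scientists can easily reuse the model, or so that you can run the training remotely, for example on Databricks.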
- You do this by using MLflow Projects conventions to specify the dependencies and entry points to your code.

In [None]:
#  sklearn_elasticnet_wine/MLproject

name: tutorial

conda_env: conda.yaml

entry_points:
  main:
    parameters:
      alpha: {type: float, default: 0.5}
      l1_ratio: {type: float, default: 0.1}
    command: "python train.py {alpha} {l1_ratio}"

In [None]:
# sklearn_elasticnet_wine/conda.yaml

name: tutorial
channels:
  - conda-forge
dependencies:
  - python=3.7
  - pip
  - pip:
      - scikit-learn==0.23.2
      - mlflow>=1.0
      - pandas

In [19]:
!mlflow run sklearn_elasticnet_wine -P alpha=0.42

Elasticnet model (alpha=0.420000, l1_ratio=0.100000):
  RMSE: 0.7420620899060748
  MAE: 0.5722846717246247
  R2: 0.21978513651550236


2022/08/02 13:22:51 INFO mlflow.utils.conda: Conda environment mlflow-7122f0cb71f385d249fbb61cc599afd8045ab238 already exists.
2022/08/02 13:22:51 INFO mlflow.projects.utils: === Created directory C:\Users\canmanmo\AppData\Local\Temp\tmpsviycpfe for downloading remote URIs passed to arguments of type 'path' ===
2022/08/02 13:22:51 INFO mlflow.projects.backend.local: === Running command 'conda activate mlflow-7122f0cb71f385d249fbb61cc599afd8045ab238 && python train.py 0.42 0.1' in run with ID 'cb1290aa72774d5ca34a7dfbbfec76d9' === 
2022/08/02 13:22:56 INFO mlflow.projects: === Run (ID 'cb1290aa72774d5ca34a7dfbbfec76d9') succeeded ===
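
The log above shows the entry point's command template resolved to `python train.py 0.42 0.1`: alpha comes from the `-P` argument and l1_ratio from the default declared in the MLproject file. The sketch below mimics that substitution with plain string formatting. It is an illustration of the idea, not MLflow's actual implementation, and the helper name `render_command` is hypothetical.

```python
def render_command(template, defaults, overrides=None):
    """Fill a command template with default parameters, replaced by any overrides."""
    overrides = overrides or {}
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise ValueError("Unknown parameters: %s" % ", ".join(sorted(unknown)))
    values = dict(defaults)
    values.update(overrides)
    return template.format(**values)


# Template and defaults as declared in sklearn_elasticnet_wine/MLproject.
template = "python train.py {alpha} {l1_ratio}"
defaults = {"alpha": 0.5, "l1_ratio": 0.1}

command = render_command(template, defaults, {"alpha": 0.42})
print(command)  # python train.py 0.42 0.1
```

Calling `render_command(template, defaults)` with no overrides yields the command that runs when no `-P` arguments are given.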


In [21]:
# Example
# mlflow run https://github.com/mlflow/mlflow-example.git -P alpha=5.0

# Custom repository
!mlflow run https://github.com/karlbulee/MLflow_test.git -P alpha=5.0

Elasticnet model (alpha=5.000000, l1_ratio=0.100000):
  RMSE: 0.8328607459178974
  MAE: 0.668382096782403
  R2: 0.017169746603498015


2022/08/02 13:34:36 INFO mlflow.projects.utils: === Fetching project from https://github.com/karlbulee/MLflow_test.git into C:\Users\canmanmo\AppData\Local\Temp\tmputmgsywc ===
2022/08/02 13:34:39 INFO mlflow.projects.utils: Fetched 'main' branch
2022/08/02 13:34:42 INFO mlflow.utils.conda: Conda environment mlflow-7122f0cb71f385d249fbb61cc599afd8045ab238 already exists.
2022/08/02 13:34:42 INFO mlflow.projects.utils: === Created directory C:\Users\canmanmo\AppData\Local\Temp\tmpsgbzsgo7 for downloading remote URIs passed to arguments of type 'path' ===
2022/08/02 13:34:42 INFO mlflow.projects.backend.local: === Running command 'conda activate mlflow-7122f0cb71f385d249fbb61cc599afd8045ab238 && python train.py 5.0 0.1' in run with ID 'b2af12d57d2646c4bc5b1a16d24602aa' === 
2022/08/02 13:34:47 INFO mlflow.projects: === Run (ID 'b2af12d57d2646c4bc5b1a16d24602aa') succeeded ===


## Successful Result

- Create a new repository on GitHub (MLflow_test)
- Upload the sklearn_elasticnet_wine files to that repository

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

# Serving the Model

- An MLflow Model is a standard format for packaging machine learning models that can be used in a variety of downstream tools.

In [22]:
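# Python call used in train.py to save the model as an artifact within the run.
# It is not a shell command: the output below is the Windows shell error that
# appears when the line is executed as one (for example with a leading "!").
mlflow.sklearn.log_model(lr, "model")

'mlflow.sklearn.log_model' is not recognized as an internal or external command,
operable program or batch file.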


![image.png](attachment:image.png)

- You can use the MLflow Model format with MLflow to deploy a local REST server that can serve predictions.

In [None]:
mlflow models serve -m /Users/mlflow/mlflow-prototype/mlruns/0/7c1a0d5c42844dcdb8f5191146925174/artifacts/model -p 1234
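
Once the server is running, predictions are requested by POSTing JSON to its `/invocations` endpoint. The sketch below builds such a request with the standard library, using port 1234 from the command above. The helper names `build_request` and `predict` are hypothetical, and the input row is illustrative. The payload shape is an assumption that depends on the MLflow version: MLflow 1.x accepted a pandas split-oriented body, while MLflow 2.x expects it wrapped under a `dataframe_split` key.

```python
import json
import urllib.request


def build_request(columns, rows, host="127.0.0.1", port=1234, mlflow_major=1):
    """Build a POST request for the /invocations endpoint of a local model server."""
    frame = {"columns": columns, "data": rows}
    if mlflow_major >= 2:
        # Assumed MLflow 2.x payload shape.
        payload = {"dataframe_split": frame}
        content_type = "application/json"
    else:
        # Assumed MLflow 1.x payload shape (pandas split orientation).
        payload = frame
        content_type = "application/json; format=pandas-split"
    return urllib.request.Request(
        url=f"http://{host}:{port}/invocations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": content_type},
        method="POST",
    )


def predict(request):
    """Send the request to a running server and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


# Feature columns of the wine-quality data set (everything except "quality").
columns = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol",
]
rows = [[7.4, 0.7, 0.0, 1.9, 0.076, 11, 34, 0.9978, 3.51, 0.56, 9.4]]  # illustrative input row

request = build_request(columns, rows)
print(request.full_url)
print(request.data.decode("utf-8"))

# With the server running, call predict(request) to obtain the prediction.
```

The request is only built and printed here, so the sketch runs without a server. Pass `mlflow_major=2` if the installed MLflow rejects the first payload shape.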

## Successful Result

![image.png](attachment:image.png)

![image.png](attachment:image.png)

# References

- URL
1. https://mlflow.org/docs/latest/tutorials-and-examples/tutorial.html