# Deploy

**Sources**

- [Tutorial by Tobias Starbak](https://www.youtube.com/watch?v=IUF4s9SXnd4);
- [Deployment](https://mlflow.org/docs/latest/deployment/index.html) page in the official MLflow documentation.

## Setup

In [10]:
import pandas as pd
from pathlib import Path

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

import mlflow
from mlflow.models.signature import ModelSignature
from mlflow.types.schema import Schema, ColSpec

In [11]:
%%bash
docker run -p 5000:5000 -dt --name mlflow_deploy --rm \
    ghcr.io/mlflow/mlflow \
    bash -c "mlflow server --host 0.0.0.0 --port 5000"

2acf7e2afb38abedbe7fa37412f9377134b900996caec92426979381f9dd352a
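
The container starts in detached mode, so the server may not be accepting connections yet when the next cell runs. A minimal sketch of a readiness check follows. It assumes the tracking server answers on a `/health` endpoint, as recent MLflow versions do; `wait_for_server` and its `probe` parameter are hypothetical helper names, not part of MLflow.

```python
import http.client
import time
import urllib.request


def _http_ok(url):
    """Return True if a GET on `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, or a non-2xx answer: not ready yet.
        return False


def wait_for_server(base_url, timeout=30.0, interval=0.5, probe=_http_ok):
    """Poll `<base_url>/health` until it succeeds or `timeout` seconds pass."""
    health_url = base_url.rstrip("/") + "/health"  # assumed endpoint
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe(health_url):
            return True
        time.sleep(interval)
    return False
```

Calling `wait_for_server("http://localhost:5000")` before `mlflow.set_tracking_uri(...)` would block until the server responds and return `False` if it never does.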


In [12]:
mlflow.set_tracking_uri("http://localhost:5000")
exp_name = "penguin_classification"
mlflow.create_experiment(exp_name)

'569328658685107276'
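
`mlflow.create_experiment` raises an error if an experiment with that name already exists, so re-running the cell against the same server fails. One possible workaround is a get-or-create helper, sketched below. `get_or_create_experiment` is a hypothetical name; the `client` argument can be the `mlflow` module itself or an `MlflowClient`, since both provide `get_experiment_by_name` and `create_experiment`.

```python
def get_or_create_experiment(client, name):
    """Return the ID of the experiment called `name`, creating it if needed."""
    experiment = client.get_experiment_by_name(name)
    if experiment is not None:
        # The experiment already exists: reuse it instead of failing.
        return experiment.experiment_id
    return client.create_experiment(name)
```

In this notebook the call would be `get_or_create_experiment(mlflow, exp_name)`. Note that `mlflow.set_experiment(exp_name)`, used later in the run cell, also creates the experiment when it is missing.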

In [9]:
%%bash
docker stop mlflow_deploy

mlflow_deploy


## Add model to registry

### Create run

In [13]:
input_schema = Schema([
  ColSpec("double", "Culmen Length (mm)"),
  ColSpec("double", "Culmen Depth (mm)"),
])
output_schema = Schema([ColSpec("string")])

signature = ModelSignature(inputs=input_schema, outputs=output_schema)

mlflow.set_experiment(exp_name)
with mlflow.start_run() as run:
    run_id = run.info.run_id
    print(f"Started run {run_id}")
    # Load dataset
    print("Load dataset...")
    culmen_columns = ["Culmen Length (mm)", "Culmen Depth (mm)"]
    target_column = "Species"

    data = pd.read_csv(
        Path("deploy_files")/"penguins_classification.csv"
    )

    print("Prepare a train-test-split...")
    data, target = data[culmen_columns], data[target_column]
    data_train, data_test, target_train, target_test = train_test_split(
        data, target, random_state=0)

    # Initialize and fit a classifier
    max_depth = 3
    max_leaf_nodes = 4
    print(f"Initialize and fit a DecisionTreeClassifier with max_depth={max_depth}, max_leaf_nodes={max_leaf_nodes}")

    mlflow.log_params(
        {"max_depth": max_depth,
         "max_leaf_nodes": max_leaf_nodes}
    )
    tree = DecisionTreeClassifier(
        max_depth=max_depth,
        max_leaf_nodes=max_leaf_nodes
    )
    tree.fit(data_train, target_train)

    # Calculate test scores
    test_score = tree.score(data_test, target_test)
    mlflow.log_metric("test_accuracy", test_score)
    print(f"Result: Accuracy of the DecisionTreeClassifier: {test_score:.1%}")

    # Log the model
    mlflow.sklearn.log_model(tree, "model", signature=signature)

Started run f61f3c3e17d1474f9dc16e2b2a8237f2
Load dataset...
Prepare a train-test-split...
Initialize and fit a DecisionTreeClassifier with max_depth=3, max_leaf_nodes=4
Result: Accuracy of the DecisionTreeClassifier: 96.5%
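
To confirm that the parameters and the metric actually reached the tracking server, the run can be read back by its ID. The sketch below assumes a `client` that offers `get_run`, which both the `mlflow` module and `MlflowClient` do; `summarize_run` is a hypothetical helper name.

```python
def summarize_run(client, run_id):
    """Return the logged params and metrics of a run as plain dicts."""
    run = client.get_run(run_id)
    return {
        # MLflow stores parameter values as strings.
        "params": dict(run.data.params),
        # Metrics hold the latest logged value per key.
        "metrics": dict(run.data.metrics),
    }
```

Here `summarize_run(mlflow, run_id)` should list `max_depth`, `max_leaf_nodes`, and `test_accuracy` for the run created above.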




### Add run to registry

In [14]:
model_name = "penguins_clf"

result = mlflow.register_model(
    f"runs:/{run_id}/model",
    f"{model_name}"
)

Successfully registered model 'penguins_clf'.
2024/06/06 19:42:55 INFO mlflow.store.model_registry.abstract_store: Waiting up to 300 seconds for model version to finish creation. Model name: penguins_clf, version 1
Created version '1' of model 'penguins_clf'.
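
Once registered, the model can be addressed by name and version through a `models:/` URI instead of a run ID. The sketch below shows one way to load it back and predict. `registered_model_uri` and `predict_with_registered_model` are hypothetical helper names, and the code assumes the tracking URI is already set as in the Setup section.

```python
def registered_model_uri(name, version):
    """Build the registry URI for a specific model version."""
    return f"models:/{name}/{version}"


def predict_with_registered_model(name, version, rows):
    """Load a registered model as a generic pyfunc model and predict on `rows`.

    `rows` is a list of dicts keyed by the signature's column names.
    """
    # Imported lazily so the URI helper works without MLflow installed.
    import mlflow.pyfunc
    import pandas as pd

    model = mlflow.pyfunc.load_model(registered_model_uri(name, version))
    return model.predict(pd.DataFrame(rows))
```

For the model above, a call could look like `predict_with_registered_model(model_name, 1, [{"Culmen Length (mm)": 40.0, "Culmen Depth (mm)": 18.0}])`; the input values are made up for illustration. Passing floats is the safer choice, because the signature declares both columns as `double`.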


## Run MLflow side API