<a href="https://colab.research.google.com/github/siwarnasri/MlOps_CustomerSatisfaction/blob/main/2_2_Local_Deployment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 2.2: Deployment and Inference with MLflow

In the last lesson, we learned how to use MLflow and Weights & Biases to track our experiments and compare models. In the end, we figured out which hyperparameters give the best model on our validation dataset. How can we make this model available to our customers or users and let them query it?

Setting up a dynamically scalable, highly available and reliable model service is a complex problem, and many companies employ large MLOps teams to build and maintain such services. With ZenML, we can build sophisticated ML services in minutes. In this lesson, we start with a very simple model deployment, using the [MLflow Models](https://mlflow.org/docs/latest/models.html) component to deploy our model as a local application that we can interact with via a REST API.

The beauty of ZenML is that our code remains the same regardless of the tools or infrastructure we use. In a later chapter, we'll see how we can deploy the code we write here as a dynamically scalable, serverless microservice in the cloud. But more on that later.

First, let's set up ZenML and import some of the core steps we created in earlier notebooks:

In [1]:
%pip install "zenml[server]" matplotlib
!zenml integration install sklearn mlflow -y
!rm -rf .zen
!zenml init
%pip install pyparsing==2.4.2  # required for Colab

import IPython

# automatically restart kernel
IPython.Application.instance().kernel.do_shutdown(restart=True)

In [2]:
from zenml.environment import Environment

if Environment.in_google_colab():  # Colab only setup

    # clone zenbytes repo to get source code of previous lessons
    !git clone https://github.com/zenml-io/zenbytes.git  # noqa
    !mv zenbytes/steps .
    !mv zenbytes/pipelines .

In [1]:
from steps.evaluator import evaluator
from steps.importer import importer
from steps.mlflow_trainer import svc_trainer_mlflow

ZenML provides a built-in step for deploying models with MLflow, so we don't need to write custom code. To deploy our model after training, we only need to add the `mlflow_model_deployer_step` to our pipeline. In addition to the trained model, this step expects a boolean input indicating whether the model should be deployed. This is very useful in practice, as it lets you define requirements for deploying your models, such as that the new model performs better than the currently deployed one or that no data drift has occurred. Let's first define a `deployment_trigger` step that allows deployment only if the test accuracy is above 90%:

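Note: the MLflow model deployer runs the model server as a local process, so it is intended for local development rather than production use. This notebook also uses the pre-0.40 ZenML API, while the install cell above pulls a newer ZenML release. With such a version, the cells below may fail; the next cell, for example, can raise `ImportError: cannot import name 'MLFlowDeployerParameters' from 'zenml.integrations.mlflow.steps'`, because step parameter classes like this one were removed with the 0.40.0 API change (see the migration guide at https://docs.zenml.io/reference/migration-guide/migration-zero-forty). Pinning a pre-0.40 ZenML release should avoid this error.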

In [2]:
from zenml.integrations.mlflow.steps import (
    MLFlowDeployerParameters,
    mlflow_model_deployer_step
)
from zenml.pipelines import pipeline
from zenml.steps import step


@step
def deployment_trigger(test_acc: float) -> bool:
    """Only deploy if the test accuracy > 90%."""
    return test_acc > 0.9


@pipeline(enable_cache=False)
def train_evaluate_deploy_pipeline(
    importer,
    trainer,
    evaluator,
    deployment_trigger,
    model_deployer,
):
    """Train and deploy a model with MLflow."""
    X_train, X_test, y_train, y_test = importer()
    model = trainer(X_train=X_train, y_train=y_train)
    test_acc = evaluator(X_test=X_test, y_test=y_test, model=model)
    deployment_decision = deployment_trigger(test_acc)  # new
    model_deployer(deployment_decision, model)  # new


mlflow_pipeline = train_evaluate_deploy_pipeline(
    importer=importer(),
    trainer=svc_trainer_mlflow(),
    evaluator=evaluator(),
    deployment_trigger=deployment_trigger(),  # new
    model_deployer=mlflow_model_deployer_step(
        MLFlowDeployerParameters(timeout=20)
    ),  # new
)
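The trigger above checks a single accuracy threshold, but as mentioned earlier, real deployment rules often combine several conditions. Below is a minimal plain-Python sketch of such decision logic. The function `should_deploy` and its `deployed_acc` and `drift_detected` inputs are hypothetical and not part of ZenML; inside a pipeline, we assume you would call similar logic from the body of a step like `deployment_trigger`, with the extra inputs produced by other steps.

```python
from typing import Optional


def should_deploy(
    test_acc: float,
    min_acc: float = 0.9,
    deployed_acc: Optional[float] = None,
    drift_detected: bool = False,
) -> bool:
    """Hypothetical deployment decision combining several requirements.

    Deploy only if the new model clears the accuracy threshold, beats the
    currently deployed model (if there is one), and no data drift was flagged.
    """
    if test_acc <= min_acc:
        return False  # below the minimum quality bar
    if deployed_acc is not None and test_acc <= deployed_acc:
        return False  # not better than the model that is already serving
    if drift_detected:
        return False  # hold off while the input data is drifting
    return True


print(should_deploy(0.95))                       # True
print(should_deploy(0.95, deployed_acc=0.97))    # False
print(should_deploy(0.85))                       # False
print(should_deploy(0.95, drift_detected=True))  # False
```

Keeping each condition as a separate early return makes it easy to log which rule blocked a deployment.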

Since we are using a new MLOps stack component, the MLflow model deployer, we need to register it with ZenML before we can run our pipeline.
As with the experiment tracker in the last notebook, we first register the new model deployer and then add it to a ZenML stack.

In [None]:
# Register the MLflow experiment tracker from the last lesson
!zenml experiment-tracker register mlflow_tracker --flavor=mlflow

# Register the MLflow model deployer
!zenml model-deployer register mlflow --flavor=mlflow

# Create a new stack with MLflow components
!zenml stack register mlflow_stack -a default -o default -d mlflow -e mlflow_tracker

# Set the new stack as active
!zenml stack set mlflow_stack

Calling `mlflow_pipeline.run()` will now train and evaluate the model and, if the deployment trigger allows it, deploy the model with MLflow. Let's try it out:

In [None]:
mlflow_pipeline.run(unlisted=True)

Let's run the following command to get a list of all models currently deployed with our ZenML stack:

In [None]:
!zenml model-deployer models list

If you see a check mark under Status, the model has been deployed successfully. Congratulations!

To find the URL of a model that was deployed in a particular run, you can use the
metadata field `deployed_model_url` of the model deployment step of your pipeline
run, for example:

In [None]:
from zenml.post_execution import get_unlisted_runs

last_run = get_unlisted_runs()[0]
deployer_step = last_run.get_step("model_deployer")
deployed_model_url = deployer_step.metadata["deployed_model_url"].value
print(deployed_model_url)
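Since the deployed model is a REST service, we can also query it without ZenML. Here is a minimal sketch using only the Python standard library. It assumes that `deployed_model_url` points at the MLflow scoring server's prediction endpoint and that your MLflow version accepts the `{"inputs": [...]}` JSON format (MLflow 2.x does); the helper names `build_payload` and `query_model` are hypothetical.

```python
import json
import urllib.request
from typing import Any


def build_payload(samples: Any) -> bytes:
    """Serialize a batch of feature vectors as an MLflow-style JSON request body."""
    # Accept NumPy arrays (via .tolist()) as well as plain nested lists.
    rows = samples.tolist() if hasattr(samples, "tolist") else [list(r) for r in samples]
    return json.dumps({"inputs": rows}).encode("utf-8")


def query_model(url: str, samples: Any, timeout: float = 10.0) -> Any:
    """POST the samples to the model server and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=build_payload(samples),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, once `X_test` is loaded, `query_model(deployed_model_url, X_test[:1])` sends one test sample. The shape of the JSON response differs between MLflow versions, so inspect it before relying on a specific key.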

To interact with the deployed model in Python, we can use the `find_model_server()` method of the ZenML Model Deployer stack component:

In [None]:
from zenml.client import Client

client = Client()
model_deployer = client.active_stack.model_deployer
services = model_deployer.find_model_server(
    pipeline_name="train_evaluate_deploy_pipeline",
    pipeline_step_name="model_deployer",
    running=True,
)
service = services[0]
service.check_status()
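`services[0]` raises a bare `IndexError` if no matching service is running, for example when the deployment trigger rejected the model on the first run. A small hypothetical helper like `first_service` (not part of ZenML) turns that into a clearer error message:

```python
from typing import List, TypeVar

T = TypeVar("T")


def first_service(services: List[T], pipeline_name: str) -> T:
    """Return the first running model service, or explain why there is none."""
    if not services:
        # One possible cause: the deployment trigger returned False,
        # so no model was ever deployed by this pipeline.
        raise RuntimeError(
            f"No running model service found for pipeline '{pipeline_name}'. "
            "Check whether the deployment trigger allowed deployment."
        )
    return services[0]
```

In the cell above, `service = first_service(services, "train_evaluate_deploy_pipeline")` would replace `service = services[0]`.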

Let's play a little with our model service and send it a query.

First, let's query the artifact store to get a sample from the test set of our last run.

In [None]:
last_run = get_unlisted_runs()[0]
X_test = last_run.steps[0].outputs["X_test"].read()
y_test = last_run.steps[0].outputs["y_test"].read()

Let's use matplotlib to plot the sample and see what our model would predict:

In [None]:
import matplotlib.pyplot as plt

plt.axis("off")
plt.imshow(X_test[0].reshape(8, 8), cmap=plt.cm.gray_r, interpolation="nearest")
pred0 = service.predict(X_test[0:1])
print(f"Model predicted {pred0}, true label was {y_test[0]}")
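A single sample says little about model quality. The sketch below, built around a hypothetical helper `service_accuracy`, sends a batch of test samples to any prediction callable and returns the fraction it labels correctly. It assumes the callable returns one label per input row, as we expect `service.predict` to do for our classifier.

```python
from typing import Callable, Sequence


def service_accuracy(
    predict: Callable, X: Sequence, y: Sequence, n: int = 20
) -> float:
    """Send the first n samples to predict() in one batch and return the hit rate."""
    n = min(n, len(X))
    if n == 0:
        raise ValueError("need at least one sample")
    preds = list(predict(X[:n]))  # one predicted label per input row
    correct = sum(int(p == t) for p, t in zip(preds, y[:n]))
    return correct / n
```

With the service from above, `service_accuracy(service.predict, X_test, y_test, n=50)` would score the first 50 test samples in a single request.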

And that's it. We deployed our first model with a ZenML pipeline and learned how to query the deployed model service. In practice, of course, you would not query the model service manually, but send samples to it automatically as new data arrives. That's exactly what we'll do in the [next lesson](2-3_Inference_Pipelines.ipynb), where we'll set up a basic inference pipeline. See you there!