# "MLOps project - part 2: Machine Learning Workflow Orchestration using Prefect and ZenML"
> "Machine learning workflow orchestration."

- toc: True
- branch: master
- badges: true
- comments: true
- categories: [mlops]
- image: images/some_folder/your_image.png
- hide: false
- search_exclude: true

In the previous blog post, we saw how to train a model and track experiments using MLflow.
In this second post of the series, we will take the code from the previous step and convert it into a machine learning pipeline. I will show how to do it using two popular tools: Prefect and ZenML. There are many other great tools that we cannot cover here, such as Flyte, Kale, and Argo.

But why do we need a pipeline for our machine learning services? The ZenML documentation explains it clearly [[source](https://github.com/zenml-io/zenbytes)]:

> As an ML practitioner, you are probably familiar with building ML models using Scikit-learn, PyTorch, TensorFlow, or similar. An **[ML Pipeline](https://docs.zenml.io/developer-guide/steps-and-pipelines)** is simply an extension, including other steps you would typically do before or after building a model, like data acquisition, preprocessing, model deployment, or monitoring. The ML pipeline essentially defines a step-by-step procedure of your work as an ML practitioner. Defining ML pipelines explicitly in code is great because: <br>
> - We can easily rerun all of our work, not just the model, eliminating bugs and making our models easier to reproduce.
> - Data and models can be versioned and tracked, so we can see at a glance which dataset a model was trained on and how it compares to other models.
> - If the entire pipeline is coded up, we can automate many operational tasks, like retraining and redeploying models when the underlying problem or data changes or rolling out new and improved models with CI/CD workflows.



We may have extensive preprocessing that we do not want to repeat every time we train a model, such as generating the 'corpus' list in the last blog post.
We may also need to compare the performance of different models, or we may want to deploy a model and monitor both the data and the model's performance. This is where ML pipelines come into play: they let us specify our workflow as a series of modular steps that can then be combined.
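
To make the idea of modular, combinable steps more concrete, here is a minimal sketch in plain Python. It is not Prefect or ZenML code, and every name in it (`build_corpus`, `train_model`, `evaluate_model`, `run_pipeline`) as well as the toy data is hypothetical. The "model" is deliberately trivial; the point is only that each step is its own function and that the expensive preprocessing step is cached so it is not repeated for the same input.

```python
from functools import lru_cache

# Hypothetical toy data standing in for the reviews from the previous post.
RAW_REVIEWS = ("Great product!", "Terrible, would not buy.", "Great value.")
LABELS = (1, 0, 1)


@lru_cache(maxsize=None)
def build_corpus(reviews):
    """Preprocessing step; cached so a rerun with the same input skips the work."""
    return tuple(r.lower().strip("!.") for r in reviews)


def train_model(corpus, labels):
    """Toy 'model': the set of words that appear only in positive reviews."""
    positive = {w for doc, y in zip(corpus, labels) if y == 1 for w in doc.split()}
    negative = {w for doc, y in zip(corpus, labels) if y == 0 for w in doc.split()}
    return positive - negative


def evaluate_model(model, corpus, labels):
    """Accuracy of predicting 'positive' when a review contains a model word."""
    preds = [int(any(w in model for w in doc.split())) for doc in corpus]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def run_pipeline(reviews, labels):
    """Combine the independent steps into one workflow."""
    corpus = build_corpus(reviews)
    model = train_model(corpus, labels)
    return evaluate_model(model, corpus, labels)
```

Calling `run_pipeline(RAW_REVIEWS, LABELS)` a second time reuses the cached corpus instead of rebuilding it. Orchestration tools build on this same decomposition, adding tracking, scheduling, and failure handling around each step.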

Additionally, we may have a machine learning pipeline that we would like to run every week. We can put it on a schedule, and if model training fails or the incoming data is faulty, we can analyze and resolve the issue.

Let's consider a standard machine learning pipeline:

![](images/workflow-orchestration/1.png)
*[source](https://www.youtube.com/watch?v=eKzCjNXoCTc&list=PL3MmuxUbc_hIUISrluw_A7wDSmfOhErJK&index=22)*

First, we have a PostgreSQL database and perhaps a task that writes some data to a Parquet file. Next, we use pandas to ingest the Parquet file and combine it with data that we pull from an API.
After training the model, we register the artifact and the experiment with MLflow, and if certain requirements are met, we may deploy the model using Flask, for instance.
Clearly, all of these phases are interdependent, and if one fails, the entire pipeline is affected.
Failures can also occur in unexpected ways. For instance, the incoming data may be faulty, the API may randomly fail to connect, and the same can happen with MLflow. Perhaps you use a database to store MLflow data, such as experiments, and that database has a problem. All of these are regular occurrences, and the purpose of workflow orchestration is both to reduce the impact of these failures and to help resolve them.
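
As a rough illustration of what "reducing the impact of failures" can mean, the sketch below retries a flaky step a few times before giving up. It is written in plain Python under stated assumptions: `run_with_retries`, `make_flaky_api`, and `fetch_api_data` are hypothetical names, and the API failure is simulated. Orchestration tools typically offer retries as a built-in feature, so in practice you would not write this wrapper yourself.

```python
import time


def run_with_retries(step, retries=3, delay_seconds=0.0):
    """Call `step()`; if it raises, wait and try again, up to `retries` extra times."""
    last_error = None
    for attempt in range(retries + 1):
        try:
            return step()
        except Exception as err:  # broad on purpose: any failure triggers a retry
            last_error = err
            print(f"attempt {attempt + 1} failed: {err}")
            if attempt < retries:
                time.sleep(delay_seconds)
    raise RuntimeError(f"step failed after {retries + 1} attempts") from last_error


def make_flaky_api(failures_before_success):
    """Build a simulated API call that fails a fixed number of times, then succeeds."""
    calls = {"count": 0}

    def fetch_api_data():
        calls["count"] += 1
        if calls["count"] <= failures_before_success:
            raise ConnectionError("API did not respond")
        return {"rows": 42}

    return fetch_api_data, calls
```

With `fetch, calls = make_flaky_api(2)`, calling `run_with_retries(fetch)` succeeds on the third attempt; if the step keeps failing, the wrapper raises an error that still carries the original exception, which helps when locating the root cause.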

![](images/workflow-orchestration/2.png)
*[source](https://www.youtube.com/watch?v=eKzCjNXoCTc&list=PL3MmuxUbc_hIUISrluw_A7wDSmfOhErJK&index=22)*



<!-- We can identify three distinct steps in our example: data loading, model training, and model evaluation. Let us now define each of them as a ZenML **[Pipeline Step](https://docs.zenml.io/developer-guide/steps-and-pipelines#step)** simply by moving each step to its own function and decorating them with ZenML's `@step` [Python decorator](https://realpython.com/primer-on-python-decorators/). -->



All of this helps an organization and its developers complete their tasks and locate issues more quickly, freeing them to focus on more important work.

Great! Let's see how we can do this in practice, starting with how Prefect can help us.

# Prefect

We will use p

# ZenML

[ZenML](https://github.com/zenml-io/zenml/) is an excellent tool for this task: it is straightforward and intuitive to use, and it has [integrations](https://docs.zenml.io/mlops-stacks/integrations) with most of the advanced MLOps tools we will want to use later. Make sure you have ZenML installed (via `pip install zenml`). Let's run some commands to make sure you start with a fresh ML stack. You can ignore the details for now, as we will cover them in a later chapter.