<a href="https://colab.research.google.com/github/siwarnasri/MlOps_CustomerSatisfaction/blob/main/1_1_Pipelines.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Notebook 1.1: ML Pipelines

In this notebook, you will learn how to convert existing ML code into ML pipelines using ZenML.

Since we will build our models with scikit-learn, you need the ZenML sklearn integration. The following cell installs ZenML and the integration, then restarts your notebook's kernel so the newly installed packages are picked up.

In [None]:
%pip install "zenml[server]"
!zenml integration install sklearn -y
%pip install pyparsing==2.4.2  # required for Colab

import IPython

# automatically restart kernel
IPython.Application.instance().kernel.do_shutdown(restart=True)

Collecting zenml[server]
  Downloading zenml-0.44.3-py3-none-any.whl (6.0 MB)
Collecting alembic<1.9.0,>=1.8.1 (from zenml[server])
  Downloading alembic-1.8.1-py3-none-any.whl (209 kB)
Collecting azure-mgmt-resource>=21.0.0 (from zenml[server])
  Downloading azure_mgmt_resource-23.0.1-py3-none-any.whl (2.5 MB)
Collecting click<8.1.4,>=8.0.1 (from zenml[server])
  Downloading click-8.1.3-py3-none-any.whl (96 kB)
Collecting click-params<0.4.0,>=0.3.0 (from zenml[server])
  Downloading click_pa
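
Once the kernel has restarted, it can be useful to confirm that the packages were actually installed. The following is a minimal sketch using only the Python standard library; the helper name `installed_version` is hypothetical and not part of ZenML.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the packages this notebook relies on (names assumed from the install cell).
for name in ("zenml", "scikit-learn"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

If either package is reported as not installed, rerun the install cell before continuing.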

**Colab Note:** On Colab, you need an [ngrok account](https://dashboard.ngrok.com/signup) to view some of the visualizations later. Please set up an account, then set your user token below:

In [None]:
NGROK_TOKEN = ""  # TODO: set your own ngrok token if you are working on Colab (keep it private)

In [None]:
from zenml.environment import Environment

if Environment.in_google_colab():  # Colab only setup

    # install and authenticate ngrok
    !pip install pyngrok
    !ngrok authtoken {NGROK_TOKEN}

As an ML practitioner, you are probably familiar with building ML models using Scikit-learn, PyTorch, TensorFlow, or similar. An ML pipeline extends this to the other steps you would normally perform before or after creating a model, such as data collection, preprocessing, model deployment, or monitoring. In other words, the pipeline defines your work as an ML practitioner as an explicit, step-by-step process. Defining ML pipelines explicitly in code has several benefits:

- We can easily repeat all of our work, not just the model, to eliminate errors and make our models easier to reproduce.
- Data and models can be versioned and tracked, so we can see at a glance which dataset a model was trained on and how it compares to other models.
- When the entire pipeline is coded, we can automate many operational tasks, such as re-training and redeploying models when the underlying problem or data changes, or rolling out new and improved models with CI/CD workflows.

A well-defined ML pipeline is essential for ML teams looking to deploy models at scale.
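
To make the idea of "pipeline as code" concrete, here is a minimal sketch in plain Python, deliberately without ZenML or scikit-learn. Every name in it (`load_data`, `preprocess`, `train`, `evaluate`, `run_pipeline`) and the toy dataset are assumptions made for illustration; this is not ZenML's API, only the general shape of chaining steps so that the whole workflow can be rerun with one call.

```python
from typing import List, Tuple

Dataset = Tuple[List[float], List[int]]  # (feature values, binary labels)


def load_data() -> Dataset:
    """Data collection step: return a tiny hard-coded toy dataset."""
    features = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
    labels = [0, 0, 0, 1, 1, 1]
    return features, labels


def preprocess(data: Dataset) -> Dataset:
    """Preprocessing step: min-max scale the features to [0, 1]."""
    features, labels = data
    low, high = min(features), max(features)
    scaled = [(x - low) / (high - low) for x in features]
    return scaled, labels


def train(data: Dataset) -> float:
    """Training step: the 'model' is a threshold halfway between the class means."""
    features, labels = data
    mean_0 = sum(x for x, y in zip(features, labels) if y == 0) / labels.count(0)
    mean_1 = sum(x for x, y in zip(features, labels) if y == 1) / labels.count(1)
    return (mean_0 + mean_1) / 2


def evaluate(threshold: float, data: Dataset) -> float:
    """Evaluation step: accuracy of the threshold model on the given data."""
    features, labels = data
    predictions = [int(x > threshold) for x in features]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def run_pipeline() -> float:
    """Chain the steps; for brevity the model is evaluated on its training data."""
    data = preprocess(load_data())
    model = train(data)
    return evaluate(model, data)


accuracy = run_pipeline()
print(f"accuracy: {accuracy:.2f}")
```

Because each step is an ordinary function with explicit inputs and outputs, any step can be swapped or rerun independently, which is the property that pipeline frameworks build on to add versioning and automation.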

## ZenML Setup
Throughout this series, we will define our ML pipelines using [ZenML](https://github.com/zenml-io/zenml/). ZenML is an excellent tool for this task, as it is easy and intuitive to use, and has [integrations](https://zenml.io/integrations) with most of the advanced MLOps tools we will use later. Make sure you have ZenML installed (via `pip install zenml`). Next, we run some commands to make sure you start with a fresh ML stack.