<a href="https://colab.research.google.com/github/kamleshcode/intern-project/blob/main/ML_Pipeline_with_ZenML.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Key Concepts:** ML Pipelines, Steps

In this notebook, we will learn how to convert existing ML code into ML pipelines using ZenML.

In [1]:
%pip install "zenml[server]"
!zenml integration install sklearn -y
%pip install pyparsing==2.4.2  # required for Colab
import IPython
# automatically restart the kernel so the newly installed packages are picked up
IPython.Application.instance().kernel.do_shutdown(restart=True)

NumExpr defaulting to 2 threads.
All required packages for integration 'sklearn' are already installed.
Collecting pyparsing==2.4.2
  Using cached pyparsing-2.4.2-py2.py3-none-any.whl (65 kB)
Installing collected packages: pyparsing
  Attempting uninstall: pyparsing
    Found existing installation: pyparsing 2.4.7
    Uninstalling pyparsing-2.4.7:
      Successfully uninstalled pyparsing-2.4.7
Successfully installed pyparsing-2.4.2


{'status': 'ok', 'restart': True}

In [25]:
NGROK_TOKEN = "<YOUR_NGROK_AUTHTOKEN>"  # placeholder: use your own ngrok authtoken and never commit a real one
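
Hardcoding the token in a notebook cell makes it easy to leak when the notebook is shared. As a minimal sketch, the token could instead be read from an environment variable; the helper name `load_ngrok_token` and the variable name `NGROK_AUTHTOKEN` are assumptions made for this illustration, not part of ZenML or pyngrok.

```python
import os


def load_ngrok_token(env_var: str = "NGROK_AUTHTOKEN") -> str:
    """Return the ngrok token stored in an environment variable.

    Both the function name and the default variable name are hypothetical.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        # Fail early with a clear message instead of passing an empty token on.
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return token


# Usage in the notebook (after exporting the variable in the runtime):
# NGROK_TOKEN = load_ngrok_token()
```

The rest of the notebook can keep using `NGROK_TOKEN` unchanged.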

In [26]:
from zenml.environment import Environment
if Environment.in_google_colab():
  #install and authenticate ngrok
  !pip install pyngrok
  !ngrok authtoken {NGROK_TOKEN}

Collecting pyngrok
  Downloading pyngrok-7.1.6-py3-none-any.whl (22 kB)
Installing collected packages: pyngrok
Successfully installed pyngrok-7.1.6
Authtoken saved to configuration file: /root/.config/ngrok/ngrok.yml


ZenML Setup

In [27]:
!rm -rf .zen  # remove any existing ZenML repository
!zenml init   # initialize a new ZenML repository

NumExpr defaulting to 2 threads.
Initializing ZenML repository at /content.
Setting the repo active workspace to 'default'.
Setting the repo active stack to default.
ZenML repository initialized at /content.
The local active stack was initialized to 'default'. This local configuration will only take effect
when you're 

Example Experimentation ML code

In [28]:
import numpy as np
from sklearn.base import ClassifierMixin
from sklearn.svm import SVC
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split


def train_test() -> None:
    """Train and test a scikit-learn SVC classifier on digits."""
    digits = load_digits()
    # Flatten each 8x8 image into a 64-element feature vector
    data = digits.images.reshape((len(digits.images), -1))
    X_train, X_test, Y_train, Y_test = train_test_split(
        data, digits.target, test_size=0.2, shuffle=False
    )
    model = SVC(gamma=0.001)
    model.fit(X_train, Y_train)
    test_acc = model.score(X_test, Y_test)
    print(f"Test accuracy:{test_acc}")


train_test()


Test accuracy:0.9583333333333334
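
The reported accuracy can be sanity-checked with a little arithmetic. The sketch below assumes that the digits dataset has 1,797 samples and that scikit-learn rounds a fractional `test_size` up to a whole number of samples; with `shuffle=False` the test set is then the last block of rows.

```python
import math

N_SAMPLES = 1797   # assumed size of scikit-learn's digits dataset
TEST_SIZE = 0.2    # same value as in train_test_split above

# Assumption: a fractional test_size is rounded up, the rest is used for training.
n_test = math.ceil(TEST_SIZE * N_SAMPLES)
n_train = N_SAMPLES - n_test

# Translate the printed accuracy back into a count of correct predictions.
reported_accuracy = 0.9583333333333334
n_correct = round(reported_accuracy * n_test)

print(n_train, n_test, n_correct)  # 1437 360 345
```

Under these assumptions, the accuracy corresponds to 345 correct predictions out of 360 test images.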


Turning experiments into ML Pipelines with ZenML

Importer --> SVC Trainer --> Evaluator

In [29]:
from zenml import step
from typing_extensions import Annotated
import pandas as pd
from typing import Tuple

@step
def importer() -> Tuple[
    Annotated[np.ndarray, "X_train"],
    Annotated[np.ndarray, "X_test"],
    Annotated[np.ndarray, "Y_train"],
    Annotated[np.ndarray, "Y_test"],
]:
    """Load the digits dataset as NumPy arrays."""
    digits = load_digits()
    data = digits.images.reshape((len(digits.images), -1))
    X_train, X_test, Y_train, Y_test = train_test_split(
        data, digits.target, test_size=0.2, shuffle=False
    )
    return X_train, X_test, Y_train, Y_test


@step
def svc_trainer(
    X_train: np.ndarray,
    Y_train: np.ndarray,
) -> ClassifierMixin:
    """Train an sklearn SVC classifier."""
    model = SVC(gamma=0.001)
    model.fit(X_train, Y_train)
    return model

@step
def evaluator(
    X_test: np.ndarray,
    Y_test: np.ndarray,
    model: ClassifierMixin,
) -> float:
    """Calculate the test set accuracy of an sklearn model."""
    test_acc = model.score(X_test, Y_test)
    print(f"Test accuracy:{test_acc}")
    return test_acc
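
The `Annotated[..., "X_train"]` hints in `importer` attach a name to each element of the returned tuple. The following standard-library sketch shows how such names can be read back from a function signature. It is only an illustration of the mechanism, not ZenML's actual implementation, and `split` and `output_names` are hypothetical helpers.

```python
from typing import Annotated, Tuple, get_args, get_type_hints


def split() -> Tuple[Annotated[list, "train"], Annotated[list, "test"]]:
    """Toy function whose two outputs carry names in their type hints."""
    data = list(range(10))
    return data[:8], data[8:]


def output_names(func):
    """Collect the first metadata entry of each Annotated return element."""
    return_type = get_type_hints(func, include_extras=True)["return"]
    names = []
    for part in get_args(return_type):
        metadata = getattr(part, "__metadata__", ())
        names.append(metadata[0] if metadata else None)
    return names


print(output_names(split))  # ['train', 'test']
```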

Similarly, we can use ZenML's @pipeline decorator to connect all our steps into an ML pipeline.
Note that the pipeline definition contains no training or evaluation logic of its own; it merely establishes a recipe for how data
moves through the steps. This means we can replace steps as we wish, e.g. swap the trainer for one that fits a different model and run the same
recipe to compare performance.

In [30]:
from zenml import pipeline

@pipeline
def digit_pipeline():
    """Links all the steps together in a pipeline."""
    X_train, X_test, Y_train, Y_test = importer()
    model = svc_trainer(X_train=X_train, Y_train=Y_train)
    evaluator(X_test=X_test, Y_test=Y_test, model=model)
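
The idea of a pipeline as a recipe with replaceable steps can be sketched in plain Python, without ZenML. Everything below (the toy data, `run_recipe`, and the two trainers) is hypothetical and only mirrors the importer, trainer, evaluator shape of the pipeline above.

```python
from typing import Callable, List, Tuple

Model = Callable[[int], int]


def toy_importer() -> Tuple[List[int], List[int], List[int], List[int]]:
    """Return an ordered train/test split of a tiny made-up dataset."""
    xs = list(range(10))
    ys = [x % 2 for x in xs]  # label is the parity of the input
    return xs[:7], xs[7:], ys[:7], ys[7:]


def majority_trainer(x_train: List[int], y_train: List[int]) -> Model:
    """Baseline: always predict the most common training label."""
    majority = max(set(y_train), key=y_train.count)
    return lambda x: majority


def parity_trainer(x_train: List[int], y_train: List[int]) -> Model:
    """Memorize the label seen for even and for odd inputs."""
    by_parity = {x % 2: y for x, y in zip(x_train, y_train)}
    return lambda x: by_parity[x % 2]


def toy_evaluator(x_test: List[int], y_test: List[int], model: Model) -> float:
    """Fraction of test inputs the model labels correctly."""
    correct = sum(model(x) == y for x, y in zip(x_test, y_test))
    return correct / len(y_test)


def run_recipe(trainer: Callable[[List[int], List[int]], Model]) -> float:
    """Same data flow every time; only the trainer step is swapped."""
    x_train, x_test, y_train, y_test = toy_importer()
    model = trainer(x_train, y_train)
    return toy_evaluator(x_test, y_test, model)


for trainer in (majority_trainer, parity_trainer):
    print(trainer.__name__, run_recipe(trainer))
```

In ZenML itself, the equivalent change would be to call a different trainer step inside the pipeline function while leaving the importer and evaluator calls untouched.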

Running ZenML Pipelines

Finally, we call the pipeline function to run it. As the log below shows, calling the decorated function already triggers a run, so the explicit run() call is left commented out.

In [31]:
digits_svc_pipeline = digit_pipeline()
# digits_svc_pipeline.run(unlisted=True)


Initiating a new run for the pipeline: digit_pipeline.
Reusing registered pipeline version: (version: 1).
Executing a new run.
Using user: default
Using stack: default
  artifact_store: default
  orchestrator: default
You can visualize your pipeline runs in the ZenML Dashboard. In order to try it locally, please run zenml up.
Using cached version of importer.
Step importer has started.
Using cached version of svc_trainer.
Step svc_trainer has started.
Using cached version of evaluator.
Step evaluator has started.
Pipeline run has finished in 0.742s.


And that's it, we just built and ran our first ML pipeline! Great job!

You can now visualize the pipeline run in ZenML's dashboard. To do so, run zenml up to spin up a ZenML dashboard locally, log in with
username default and an empty password, and navigate to the "Runs" tab in the "Pipelines" section.

In [None]:
from zenml.environment import Environment


def start_zenml_dashboard(port=8237):
    if Environment.in_google_colab():
        from pyngrok import ngrok
        # Colab cannot serve localhost directly, so expose the port through an ngrok tunnel
        public_url = ngrok.connect(port)
        # \x1b[31m switches the terminal text to red, \x1b[0m resets it
        print(f"\x1b[31mIn Colab, use this URL instead: {public_url}!\x1b[0m")
        !zenml up --blocking --port {port}
    else:
        !zenml up --port {port}


start_zenml_dashboard()


INFO:pyngrok.ngrok:Opening tunnel named: http-8237-f5ad5396-8376-4476-87d7-2469293824af
INFO:pyngrok.process.ngrok:t=2024-04-28T04:51:45+0000 lvl=info msg="no configuration paths supplied"
INFO:pyngrok.process.ngrok:t=2024-04-28T04:51:45+0000 lvl=info msg="using configuration at default config path" path=/root/.config/ngrok/ngrok.yml
INFO:pyngrok.process.ngrok:t=2024-04-28T04:51:45+0000 lvl=info msg="open config file" path=/root/.config/ngrok/ngrok.yml err=nil
INFO:pyngrok.process.ngrok:t=2024-04-28T04:51:45+0000 lvl=info msg="starting web service" obj=web addr=127.0.0.1:4040 allow_hosts=[]
INFO:pyngrok.process.ngrok:t=2024-04-28T04:51:45+0000 lvl=info msg="client session established" obj=tunnels.session
INFO:pyngrok.process.ngrok:t=2024-04-28T04:51:45+0000 lvl=info msg="tunnel session started" obj=tunnels.session
INFO:pyngrok.process.ngrok:t=2024-04-28T04:51:45+0000 lvl=info msg=start pg=/api/tunnels id=e6276920f6f3657f
INFO:pyngrok.process.ngrok:t=2024-04-28T04:51:45+0000 lvl=info msg=end pg=/api/tunnels id=e6276920f6f3657f status=200 dur=491.329µs
INFO:pyngrok.process.ngrok:t=2024-04-28T04:51:45+0000 lvl=info msg=start pg=/api/tunnels id=bc380260a8fcd744
INFO:pyngrok.process.ngrok:t=2024-04-28T04:51:45+0000 lvl=info msg=end pg=/api/tunnels id=bc380260a8fcd744 status=200 dur=126.97µs
INFO:pyngrok.process.ngrok:t=2024-04-28T04:51:45+0000 lvl=info msg=start pg=/api/tunnels id=37c54939204a990a
INFO:pyngrok.process.ngrok:t=2024-04-28T04:51:45+0000 lvl=info msg="started tunnel" obj=tunnels name=http-8237-f5ad5396-8376-4476-87d7-2469293824af addr=http://localhost:8237 url=https://cc85-34-23-193-73.ngrok-free.app
INFO:pyngrok.process.ngrok:t=2024-04-28T04:51:45+0000 lvl=info msg=end pg=/api/tunnels id=37c54939204a990a status=201 dur=41.991574ms

In Colab, use this URL instead: NgrokTunnel: "https://cc85-34-23-193-73.ngrok-free.app" -> "http://localhost:8237"!
NumExpr defaulting to 2 threads.
The local ZenML dashboard is about to deploy in a blocking process. You can connect to it using the
'default' username and an empty password.
Deploying a local ZenML server with name 'local'.
Starting ZenML Server as blocking process... press CTRL+C once to stop it.
INFO:     Started server process [29931]
INFO:     Waiting for application startup.
Not writing the global configuration to disk in a ZenML server environment.
Not writing the global configuration to disk in a ZenML server environment.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0