# 03Tools - Pipeline Example 1

**Deploying the current best model to an endpoint**

The notebooks `03a` through `03f` all train ML models with different techniques using the same training data.  This might represent:
- multiple coworkers working on the same project,
- different approaches being tried at different times, or
- continuous development of multiple techniques in parallel

In each attempt, the final model is registered to Vertex AI Model Registry as the latest, best attempt for that approach.  All of the models are registered with the same label value for `series`.  Using this label, we can routinely review all the approaches and pick the "current" best model for the domain.
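The selection idea described above can be sketched in plain Python. The record layout (`labels`, `auprc`), the helper name `pick_best`, and the example entries are all assumptions made for illustration; in practice the candidates would come from Vertex AI Model Registry and the metric from an evaluation step.

```python
from typing import Dict, List, Optional


def pick_best(candidates: List[Dict], series: str, metric: str = "auprc") -> Optional[Dict]:
    """Return the highest-scoring candidate among models labeled with `series`."""
    in_series = [c for c in candidates if c.get("labels", {}).get("series") == series]
    if not in_series:
        return None  # no model registered for this series
    return max(in_series, key=lambda c: c[metric])


# Hypothetical registry entries; names and scores are made up for the example.
candidates = [
    {"name": "model_03a", "labels": {"series": "03"}, "auprc": 0.81},
    {"name": "model_03b", "labels": {"series": "03"}, "auprc": 0.86},
    {"name": "model_04a", "labels": {"series": "04"}, "auprc": 0.93},
]

best = pick_best(candidates, series="03")
print(best["name"])
```

Filtering on the label first keeps a stronger model from another series (here `model_04a`) out of the comparison.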

An example workflow might be:

# **PLACE THE OUTLINE HERE**

**Pipelines**
When a workflow includes multiple steps with direct dependencies, we have a pipeline.  With Vertex AI Pipelines, all of these steps can run together in a serverless manner.  Each step of the workflow outlined above becomes a component, and the components are connected through their inputs and outputs.  
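How input/output connections imply an execution order can be illustrated without any pipeline SDK. The following is a conceptual sketch using only the Python standard library (3.9+); it is not Vertex AI Pipelines or KFP code, and the step names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical step names; each key maps to the steps whose outputs it consumes.
dependencies = {
    "list_candidates": set(),
    "evaluate_candidates": {"list_candidates"},
    "pick_best": {"evaluate_candidates"},
    "get_deployed": set(),
    "evaluate_deployed": {"get_deployed"},
    "compare": {"pick_best", "evaluate_deployed"},
    "deploy": {"compare"},
}

# A valid execution order: every step appears after the steps it depends on.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Steps with no path between them (for example `list_candidates` and `get_deployed`) have no required relative order, which is what allows a pipeline service to run them in parallel.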

**Prerequisites:**
-  One or more of `03a`, `03b`, `03c`, `03d`, `03e`, `03f`
    - Each of these registers a model/version in the Vertex AI Model Registry

**Resources:**
- [Introduction to Vertex AI Pipelines](https://cloud.google.com/vertex-ai/docs/pipelines/components-introduction)
- [Pre-Built Components](https://cloud.google.com/vertex-ai/docs/pipelines/gcpc-list)
    - [SDK Documentation](https://google-cloud-pipeline-components.readthedocs.io/page/)
- [Custom Components](https://cloud.google.com/vertex-ai/docs/pipelines/build-own-components)
    - [Pipelines SDK](https://www.kubeflow.org/docs/components/pipelines/sdk-v2/v2-compatibility/)

**Conceptual Flow & Workflow**
<p align="center">
  <img alt="Conceptual Flow" src="../architectures/slides/03tools_pipe1_arch.png" width="45%">
&nbsp; &nbsp; &nbsp; &nbsp;
  <img alt="Workflow" src="../architectures/slides/03tools_pipe1_console.png" width="45%">
</p>

---
## Setup

inputs:

In [1]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [2]:
REGION = 'us-central1'
EXPERIMENT = 'pipeline_1'
SERIES = '03'

# source data
BQ_PROJECT = PROJECT_ID
BQ_DATASET = 'fraud'
BQ_TABLE = 'fraud_prepped'

# Model Training
VAR_TARGET = 'Class'
VAR_OMIT = 'transaction_id' # add more variables to the string with space delimiters

packages:

In [3]:
from google.cloud import aiplatform
from google.cloud import bigquery
from datetime import datetime

from typing import NamedTuple

from kfp import dsl
from kfp.v2 import compiler
from kfp.v2.dsl import Artifact, Input, Metrics, Output, component

clients:

In [4]:
aiplatform.init(project=PROJECT_ID, location=REGION)
bq = bigquery.Client()

parameters:

In [5]:
TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")
BUCKET = PROJECT_ID
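# NOTE: DATANAME and NOTEBOOK are used below but are not defined in this notebook;
# set them (for example in the inputs cell) before running this cell
URI = f"gs://{BUCKET}/{DATANAME}/pipelines/{EXPERIMENT}"
DIR = f"temp/{NOTEBOOK}"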

In [6]:
# Give the service account the roles/storage.objectAdmin role
# Console > IAM > Select Account <projectnumber>-compute@developer.gserviceaccount.com > Edit > add role
SERVICE_ACCOUNT = !gcloud config list --format='value(core.account)' 
SERVICE_ACCOUNT = SERVICE_ACCOUNT[0]
SERVICE_ACCOUNT

'1026793852137-compute@developer.gserviceaccount.com'

environment:

In [7]:
#!rm -rf {DIR}
#!mkdir -p {DIR}

---

## Pipeline

This section follows the process I take to build a workflow as a Vertex AI Pipeline.  
1. Create an outline of the workflow
2. Prepare components, custom or prebuilt for each step
3. Define the pipeline
4. Compile and run the pipeline

The build process can be iterative, with steps 2, 3, and 4 created and tested repeatedly as the pipeline grows.

### 1 - Outline Pipeline

- Candidate review path:
    - Get list of candidate models: Vertex AI Model Registry where labels.series={SERIES} and version_alias=default
    - Loop (async) over list of candidate models
        - BigqueryEvaluateModelJobOp
        - BigqueryMLConfusionMatrixJobOp
        - BigqueryMLRocCurveJobOp
    - Pick the best candidate model
- Current model review path:
    - Check for endpoint, create if needed
    - Get the deployed model with most traffic, if any
        - BigqueryEvaluateModelJobOp
        - BigqueryMLConfusionMatrixJobOp
        - BigqueryMLRocCurveJobOp
- Compare and update path:
    - Condition: if best > deployed then deploy
    - Test prediction on endpoint
    - Condition: if prediction passes, undeploy previous model; else move traffic back to previous model
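The two conditions at the end of the outline can be sketched as a small decision function. This is a plain-Python illustration, not pipeline code: the function name `plan_rollout`, the action labels, and the handling of the "no deployed model" case are assumptions, and a real pipeline would express these branches as conditional steps.

```python
from typing import List, Optional


def plan_rollout(best: float, deployed: Optional[float], prediction_ok: bool) -> List[str]:
    """Return the ordered actions for the compare-and-update path.

    best          -- metric of the best candidate model
    deployed      -- metric of the currently deployed model, or None if there is none
    prediction_ok -- whether the test prediction on the endpoint passed
    """
    # Condition 1: only deploy when the candidate beats the deployed model.
    if deployed is not None and best <= deployed:
        return ["keep_deployed"]

    actions = ["deploy_best", "test_prediction"]

    # Assumption: with no previous model there is nothing to undeploy or roll back to.
    if deployed is None:
        return actions

    # Condition 2: finish the swap on success, otherwise restore the previous model.
    actions.append("undeploy_previous" if prediction_ok else "restore_traffic_to_previous")
    return actions


print(plan_rollout(best=0.90, deployed=0.80, prediction_ok=True))
print(plan_rollout(best=0.90, deployed=0.80, prediction_ok=False))
print(plan_rollout(best=0.70, deployed=0.80, prediction_ok=True))
```

Keeping the previous model deployed until the test prediction passes is what makes the rollback branch possible.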

### 2 - Prepare Components

#### Component 1

#### Component 2

#### Component 3

#### Component 4

#### Component 5

### 3 - Define Pipeline

In [None]:
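@dsl.pipeline(
    name = f'series-{SERIES}-tournament',
    description = 'Update endpoint with best model.'
)
def pipeline(project: str, region: str, experiment: str, series: str, bq_test: str):
    # TODO: connect the components from section 2 here; the body is not written yet
    pass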


### 4 - Compile And Run Pipeline

#### Collect Inputs

In [None]:
query = f"""
CREATE OR REPLACE VIEW `{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}_TEST` AS
    SELECT * EXCEPT({','.join(VAR_OMIT.split())}, splits),
    FROM `{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}`
    WHERE splits = 'TEST'
"""
job = bq.query(query = query)
job.result()
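The `EXCEPT` list in the query above is built by splitting `VAR_OMIT` on whitespace and joining the pieces with commas. A small standalone sketch of that string handling follows; the second column name is hypothetical and only shows the multi-variable case.

```python
# Space-delimited list of columns to omit; "customer_id" is a made-up second column.
var_omit = "transaction_id customer_id"

# split() with no arguments splits on any run of whitespace and drops empty strings.
except_cols = ",".join(var_omit.split())

clause = f"SELECT * EXCEPT({except_cols}, splits)"
print(clause)
```

Because `split()` ignores repeated or trailing spaces, extra whitespace in `VAR_OMIT` does not produce empty column names in the query.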

In [None]:
parameter_values = {
    "project" : PROJECT_ID,
    "region" : REGION,
    "experiment" : EXPERIMENT,
    "series": SERIES,
    "bq_test": f'{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}_TEST'
}

#### Compile Pipeline

In [None]:
compiler.Compiler().compile(
    pipeline_func = pipeline,
    package_path = f"{DIR}/{NOTEBOOK}.json"
)

#### Define Pipeline Job
Using compiled pipeline:

In [None]:
pipeline_job = aiplatform.PipelineJob(
    display_name = f"{NOTEBOOK}_tournament",
    template_path = f"{DIR}/{NOTEBOOK}.json",
    parameter_values = parameter_values,
    pipeline_root = URI,  # assumed: the GCS pipelines path defined in the parameters cell
    enable_caching = False,
    labels = {}
)

#### Submit Pipeline Job

In [None]:
response = pipeline_job.submit(
    service_account = SERVICE_ACCOUNT
)

#### Wait On Pipeline Job

In [None]:
pipeline_job.wait()

#### Get Parameters and Metrics From Pipeline Job

In [None]:
aiplatform.get_pipeline_df(
    pipeline = f'series-{SERIES}-tournament'  # the name given in @dsl.pipeline
)

---
## Remove Resources
See notebook "99 - Cleanup".