# Vertex AI - Pipeline for Training and Serving

This notebook uses the Python package created in notebook 05 for a TensorFlow training project that reads the `<PROJECT_ID>.digits.digits_prepped` BigQuery table. Vertex AI clients are used to set up a Vertex AI custom training pipeline that runs the training job and uploads the resulting model. Vertex AI clients are then used to create an endpoint and deploy the model to it for online predictions.

**Prerequisites**
- `00 - Initial Setup`
- `01 - BigQuery - Data`
- `05 - Vertex AI - Training Job and Serving` 
    - this notebook uses the Python package created in 05
    
**Resources**
- Based on:
    - https://cloud.google.com/ai-platform-unified/docs/training/create-training-pipeline#custom-job-model-upload
- Using PipelineService:
    - https://googleapis.dev/python/aiplatform/latest/aiplatform_v1beta1/pipeline_service.html

**Overview**

<img src="architectures/statmike-mlops-06.png">

---
## Setup

Set up the environment:

In [1]:
from google.cloud import aiplatform
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value
from datetime import datetime

Define Parameters:

In [2]:
# Locations
REGION = 'us-central1'
PROJECT_ID='statmike-mlops'
BUCKET_NAME='gs://{}/digits/model/05_aip_train_job'.format(PROJECT_ID)
TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")
JOB_NAME='06_AIP_DIGITS_'+TIMESTAMP
MODEL_DIR = '{}/{}'.format(BUCKET_NAME, JOB_NAME)
PARENT = "projects/" + PROJECT_ID + "/locations/" + REGION

# Resources
TRAIN_IMAGE='us-docker.pkg.dev/cloud-aiplatform/training/tf-cpu.2-4:latest'
DEPLOY_IMAGE ='us-docker.pkg.dev/cloud-aiplatform/prediction/tf2-cpu.2-3:latest'
TRAIN_COMPUTE='n1-standard-4'
DEPLOY_COMPUTE='n1-standard-4'

# TF Parameters to pass
EPOCHS = 25
BATCH_SIZE = 30
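
The naming scheme above can be checked without touching any cloud resources. The following is a minimal sketch that rebuilds the same strings from a fixed timestamp. The helper `build_names` is hypothetical (it is not part of the notebook or of any Google library), and the timestamp is the one that appears in the display names printed later in this notebook.

```python
from datetime import datetime


def build_names(project_id, region, now):
    """Rebuild the notebook's naming scheme from its inputs (hypothetical helper)."""
    bucket_name = 'gs://{}/digits/model/05_aip_train_job'.format(project_id)
    timestamp = now.strftime("%Y%m%d%H%M%S")
    job_name = '06_AIP_DIGITS_' + timestamp
    return {
        "JOB_NAME": job_name,
        "MODEL_DIR": '{}/{}'.format(bucket_name, job_name),
        "PARENT": "projects/" + project_id + "/locations/" + region,
    }


# Fixed timestamp so the result is reproducible.
names = build_names('statmike-mlops', 'us-central1', datetime(2021, 8, 12, 12, 14, 45))
for key, value in names.items():
    print(key, '=', value)
```

Because `TIMESTAMP` is part of `JOB_NAME`, each run of the notebook writes its model to a new folder under the bucket path from notebook 05.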

Set up AI Platform Python Clients
- https://googleapis.dev/python/aiplatform/latest/index.html

In [3]:
API_ENDPOINT = "{}-aiplatform.googleapis.com".format(REGION)
client_options = {"api_endpoint": API_ENDPOINT}
clients = {}

---
## Custom Pipeline Job

Create a client for the AIP Pipeline Service:

In [4]:
clients['pipeline'] = aiplatform.gapic.PipelineServiceClient(client_options=client_options)

Define Training Job Parameters to match the YAML spec in:
- `gs://google-cloud-aiplatform/schema/trainingjob/definition/custom_task_1.0.0.yaml`

In [5]:
MACHINE_SPEC = {
    "machineType": TRAIN_COMPUTE,
    "acceleratorCount": 0
}


CMDARGS = [
    "--epochs=" + str(EPOCHS),
    "--batch_size=" + str(BATCH_SIZE)
]

WORKER_POOL_SPEC = [
    {
        "replicaCount": 1,
        "machineSpec": MACHINE_SPEC,
        "pythonPackageSpec": {
            "executorImageUri": TRAIN_IMAGE,
            "packageUris": [BUCKET_NAME + "/trainer_cifar.tar.gz"],
            "pythonModule": "trainer.task",
            "args": CMDARGS
        }
    }
]

JOB_SPEC = {
    "workerPoolSpecs": WORKER_POOL_SPEC,
    "baseOutputDirectory": {"outputUriPrefix": MODEL_DIR}
    
}

CUSTOM_JOB = {
    "display_name": JOB_NAME,
    "job_spec": JOB_SPEC
}
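
Each string in `CMDARGS` reaches the module named in `pythonModule` as a command-line argument. The trainer code comes from notebook 05 and is not shown here, so the following is only a sketch of how such a module could read the two flags with `argparse`. The function name and the default values are assumptions.

```python
import argparse


def parse_training_args(argv):
    """Parse the flags passed through CMDARGS (hypothetical trainer-side code)."""
    parser = argparse.ArgumentParser(description="Sketch of a trainer.task argument parser")
    # Defaults are placeholders; the real trainer may use different ones.
    parser.add_argument("--epochs", type=int, default=10, help="number of training epochs")
    parser.add_argument("--batch_size", type=int, default=32, help="examples per training batch")
    return parser.parse_args(argv)


EPOCHS = 25
BATCH_SIZE = 30
CMDARGS = [
    "--epochs=" + str(EPOCHS),
    "--batch_size=" + str(BATCH_SIZE),
]

args = parse_training_args(CMDARGS)
print(args.epochs, args.batch_size)
```

Passing the values as `--flag=value` strings keeps the job spec plain JSON, while the type conversion back to integers happens inside the training module.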

Define Training Pipeline Parameters:

In [6]:
training_task_definition = "gs://google-cloud-aiplatform/schema/trainingjob/definition/custom_task_1.0.0.yaml"
training_task_inputs = json_format.ParseDict(JOB_SPEC, Value())

training_pipeline = {
    "display_name": JOB_NAME,
    "training_task_definition": training_task_definition,
    "training_task_inputs": training_task_inputs,
    "model_to_upload": {
        "display_name": JOB_NAME,
        "container_spec": {"image_uri": DEPLOY_IMAGE},
    },
}

Submit pipeline job:
- this creates a pipeline job
    - which creates a custom training job
    - and, when complete, uploads a model

In [7]:
pipeline = clients['pipeline'].create_training_pipeline(parent=PARENT, training_pipeline=training_pipeline)

In [11]:
clients['pipeline'].get_training_pipeline(name=pipeline.name).state

<PipelineState.PIPELINE_STATE_SUCCEEDED: 4>
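
The state check above returns immediately, so it has to be re-run until the pipeline reaches a terminal state. A minimal polling sketch follows, assuming the caller supplies a function that returns the state name as a string. `wait_for_terminal_state` and the fake state sequence are hypothetical; only the state names are taken from the `PipelineState` enum.

```python
import time

# State names after which the pipeline will not change any further.
TERMINAL_STATES = {
    "PIPELINE_STATE_SUCCEEDED",
    "PIPELINE_STATE_FAILED",
    "PIPELINE_STATE_CANCELLED",
}


def wait_for_terminal_state(get_state_name, poll_seconds=60, max_polls=120):
    """Call get_state_name() until it returns a terminal state or polls run out."""
    for _ in range(max_polls):
        state_name = get_state_name()
        if state_name in TERMINAL_STATES:
            return state_name
        time.sleep(poll_seconds)
    raise TimeoutError("pipeline did not reach a terminal state in time")


# Stand-in for the real service: yields a fixed sequence of states.
fake_states = iter([
    "PIPELINE_STATE_PENDING",
    "PIPELINE_STATE_RUNNING",
    "PIPELINE_STATE_SUCCEEDED",
])
final_state = wait_for_terminal_state(lambda: next(fake_states), poll_seconds=0)
print(final_state)
```

With the notebook's client, the callable could plausibly be `lambda: clients['pipeline'].get_training_pipeline(name=pipeline.name).state.name`, since the returned state is an enum value.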

In [12]:
clients['pipeline'].get_training_pipeline(name=pipeline.name).model_to_upload.name

'projects/691911073727/locations/us-central1/models/7544963139008200704'
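
The returned model name is a full resource path, and it identifies the project by number rather than by the `PROJECT_ID` string used in `PARENT`. Below is a small sketch for splitting such a path into its parts; `parse_resource_name` is a hypothetical helper, not a library function.

```python
import re

# projects/<project>/locations/<location>/<collection>/<resource_id>
_RESOURCE_PATTERN = re.compile(
    r"projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)/"
    r"(?P<collection>[^/]+)/(?P<resource_id>[^/]+)"
)


def parse_resource_name(name):
    """Split a resource path into project, location, collection and ID."""
    match = _RESOURCE_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError("unexpected resource name: " + name)
    return match.groupdict()


model_name = 'projects/691911073727/locations/us-central1/models/7544963139008200704'
parts = parse_resource_name(model_name)
print(parts)
```

The endpoint name printed later in the notebook follows the same layout, with `endpoints` as the collection.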

---
## Endpoint Creation

Create a client to the endpoint service:

In [13]:
clients['endpoint'] = aiplatform.gapic.EndpointServiceClient(client_options=client_options)

Create the endpoint:

In [14]:
ENDPOINT_NAME = 'ENDPOINT_'+JOB_NAME
endpoint = clients['endpoint'].create_endpoint(parent=PARENT, endpoint={"display_name": ENDPOINT_NAME})

In [15]:
endpoint_info = clients['endpoint'].get_endpoint(name=endpoint.result(timeout=180).name)
endpoint_info.name

'projects/691911073727/locations/us-central1/endpoints/87943338036035584'

---
## Deploy Model to Endpoint

Set up Deployment Parameters:

In [16]:
MACHINE_SPEC = {
    "machine_type": DEPLOY_COMPUTE,
    "accelerator_count": 0,
}
DMODEL = {
    "model": clients['pipeline'].get_training_pipeline(name=pipeline.name).model_to_upload.name,
    "display_name": 'DEPLOYED_'+JOB_NAME,
    "dedicated_resources": {
        "min_replica_count": 1,
        "max_replica_count": 2,
        "machine_spec": MACHINE_SPEC
    }
}
TRAFFIC = {
    '0' : 100
}
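
In `deploy_model`, the key `'0'` in the traffic split stands for the model being deployed in that request. The service substitutes the new deployed model ID for it, and the percentages have to add up to 100. The sketch below mimics that substitution locally to make the convention concrete. Both helpers are hypothetical and do not call the service.

```python
def validate_traffic_split(traffic_split):
    """Check that the traffic percentages add up to 100."""
    total = sum(traffic_split.values())
    if total != 100:
        raise ValueError("traffic percentages sum to {}, expected 100".format(total))
    return traffic_split


def resolve_placeholder(traffic_split, new_deployed_model_id):
    """Replace the '0' placeholder key with the ID of the newly deployed model."""
    checked = validate_traffic_split(traffic_split)
    return {
        (new_deployed_model_id if key == '0' else key): percent
        for key, percent in checked.items()
    }


TRAFFIC = {'0': 100}
# The ID below is the deployed model ID printed after deployment in this notebook.
resolved = resolve_placeholder(TRAFFIC, '4826460221750640640')
print(resolved)
```

This matches the `traffic_split` in the endpoint description printed after deployment, where the key is the deployed model ID rather than `'0'`.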

Deploy the Model to the Endpoint:

In [17]:
dmodel = clients['endpoint'].deploy_model(endpoint=endpoint_info.name, deployed_model=DMODEL, traffic_split=TRAFFIC)

In [18]:
dmodel_info = dmodel.result().deployed_model
dmodel_info.id

'4826460221750640640'

In [19]:
clients['endpoint'].get_endpoint(name=endpoint_info.name)

name: "projects/691911073727/locations/us-central1/endpoints/87943338036035584"
display_name: "ENDPOINT_06_AIP_DIGITS_20210812121445"
deployed_models {
  id: "4826460221750640640"
  model: "projects/691911073727/locations/us-central1/models/7544963139008200704"
  display_name: "DEPLOYED_06_AIP_DIGITS_20210812121445"
  create_time {
    seconds: 1628774930
    nanos: 24046000
  }
  dedicated_resources {
    machine_spec {
      machine_type: "n1-standard-4"
    }
    min_replica_count: 1
    max_replica_count: 2
  }
}
traffic_split {
  key: "4826460221750640640"
  value: 100
}
etag: "AMEw9yMYvxmDeVWRjkzTD41rI32VACOyBSdY-RVg9m40qek8OIyvNVHG2Whze3A0gqKz"
create_time {
  seconds: 1628774868
  nanos: 982105000
}
update_time {
  seconds: 1628775226
  nanos: 324167000
}

---
## Prediction

Create a client to the prediction service:

In [20]:
clients['prediction'] = aiplatform.gapic.PredictionServiceClient(client_options=client_options)

Set up an observation for prediction:

In [21]:
%%bigquery pred
SELECT *
FROM `digits.digits_prepped`
WHERE splits='TEST'

Query complete after 0.00s: 100%|██████████| 1/1 [00:00<00:00, 780.34query/s] 
Downloading: 100%|██████████| 388/388 [00:02<00:00, 188.15rows/s]


In [22]:
pred.head(1)

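|  | p0 | p1 | p2 | p3 | p4 | p5 | p6 | p7 | p8 | p9 | ... | p57 | p58 | p59 | p60 | p61 | p62 | p63 | target | target_OE | SPLITS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.0 | 0.0 | 7.0 | 16.0 | 6.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 9.0 | 16.0 | 6.0 | 0.0 | 1 | Odd | TEST |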


In [23]:
# Take the first row (index label 0) and keep only the 64 pixel columns as a dict
newob = pred.loc[:0,'p0':'p63'].to_dict(orient='records')[0]
#newob

In [24]:
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value

response = clients['prediction'].predict(
    endpoint=endpoint_info.name,
    instances=[json_format.ParseDict(newob, Value())],
    parameters=json_format.ParseDict({}, Value()),
)

In [25]:
response.predictions

[[1.89540026e-06, 0.969088256, 3.56145979e-09, 1.10171754e-06, 1.53606143e-05, 1.0783635e-07, 6.60791838e-11, 9.01938847e-07, 9.88870579e-06, 0.0308826156]]

In [26]:
import numpy as np
np.argmax(response.predictions[0])

1
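
`np.argmax` returns only the winning index. To also see how confident the model is, the probability at that index can be read alongside it. This is a pure-Python sketch using the prediction values printed above; `top_class` is a hypothetical helper.

```python
# Values copied from the response.predictions output in this notebook.
predictions = [[1.89540026e-06, 0.969088256, 3.56145979e-09, 1.10171754e-06,
                1.53606143e-05, 1.0783635e-07, 6.60791838e-11, 9.01938847e-07,
                9.88870579e-06, 0.0308826156]]


def top_class(probabilities):
    """Return (index, probability) of the largest entry."""
    index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return index, probabilities[index]


digit, probability = top_class(predictions[0])
print(digit, probability)
```

The predicted digit agrees with the `target` value of the test row shown earlier, and the runner-up is class 9 at about 3%.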

---
## Remove Resources
See notebook "XX - Cleanup".