# AI Platform - Pipeline for Training and Serving

This notebook uses the Python package created in notebook 05 for a TensorFlow training project that reads the `<PROJECT_ID>.digits.digits_prepped` BigQuery table. AI Platform clients are used to set up an AI Platform custom training pipeline that runs the training job and uploads the resulting model. The AI Platform clients are then used to deploy the model to an endpoint for online predictions.

**Prerequisites**
- `00 - Initial Setup`
- `01 - BigQuery - Data`
- `05 - AI Platform - Training Job and Serving` 
    - this notebook uses the python package created in 05
    
**Resources**
- Based on:
    - https://cloud.google.com/ai-platform-unified/docs/training/create-training-pipeline#custom-job-model-upload
- Using PipelineService:
    - https://googleapis.dev/python/aiplatform/latest/aiplatform_v1beta1/pipeline_service.html
    
---

## Custom Pipeline Job

Set up the environment:

In [18]:
from google.cloud import aiplatform
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value
from datetime import datetime

Define Parameters:

In [19]:
# Locations
REGION = 'us-central1'
PROJECT_ID='statmike-mlops'
BUCKET_NAME='gs://statmike-models/digits/aip_train_job' #BUCKET_NAME
TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")
JOB_NAME='AIP_DIGITS_'+TIMESTAMP
MODEL_DIR = '{}/{}'.format(BUCKET_NAME, JOB_NAME)
PARENT = "projects/" + PROJECT_ID + "/locations/" + REGION

# Resources
TRAIN_IMAGE='us-docker.pkg.dev/cloud-aiplatform/training/tf-cpu.2-4:latest'
DEPLOY_IMAGE ='us-docker.pkg.dev/cloud-aiplatform/prediction/tf2-cpu.2-3:latest'
TRAIN_COMPUTE='n1-standard-4'
DEPLOY_COMPUTE='n1-standard-4'

# TF Parameters to pass
EPOCHS = 25
BATCH_SIZE = 30

Create a client for the AIP Pipeline Service:

In [20]:
API_ENDPOINT = "{}-aiplatform.googleapis.com".format(REGION)
client_options = {"api_endpoint": API_ENDPOINT}
clients = {}
clients['pipeline'] = aiplatform.gapic.PipelineServiceClient(client_options=client_options)
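The Pipeline Service client above is pointed at a regional host, and each request names a location in `PARENT`. The sketch below rebuilds both strings from `REGION` and `PROJECT_ID`. `regional_api_endpoint`, `location_parent`, and `check_same_region` are hypothetical helpers (not part of `google.cloud.aiplatform`), written on the assumption that the region in the host and the region in the parent path should agree.

```python
import re


def regional_api_endpoint(region: str) -> str:
    """Build the regional API host, mirroring API_ENDPOINT above."""
    return f"{region}-aiplatform.googleapis.com"


def location_parent(project_id: str, region: str) -> str:
    """Build the 'projects/{project}/locations/{region}' path, mirroring PARENT."""
    return f"projects/{project_id}/locations/{region}"


def check_same_region(api_endpoint: str, parent: str) -> bool:
    """Return True when the host and the parent path name the same region."""
    host = re.fullmatch(r"([a-z0-9-]+)-aiplatform\.googleapis\.com", api_endpoint)
    path = re.fullmatch(r"projects/[^/]+/locations/([^/]+)", parent)
    return bool(host and path and host.group(1) == path.group(1))
```

For example, `check_same_region(API_ENDPOINT, PARENT)` should be `True` with the parameters defined earlier.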

Define the training job parameters (to match the YAML spec in `gs://google-cloud-aiplatform/schema/trainingjob/definition/custom_task_1.0.0.yaml`):

In [21]:
MACHINE_SPEC = {
    "machineType": TRAIN_COMPUTE,
    "acceleratorCount": 0
}


CMDARGS = [
    "--epochs=" + str(EPOCHS),
    "--batch_size=" + str(BATCH_SIZE)
]

WORKER_POOL_SPEC = [
    {
        "replicaCount": 1,
        "machineSpec": MACHINE_SPEC,
        "pythonPackageSpec": {
            "executorImageUri": TRAIN_IMAGE,
            "packageUris": [BUCKET_NAME + "/trainer_cifar.tar.gz"],
            "pythonModule": "trainer.task",
            "args": CMDARGS
        }
    }
]

JOB_SPEC = {
    "workerPoolSpecs": WORKER_POOL_SPEC,
    "baseOutputDirectory": {"outputUriPrefix": MODEL_DIR}
    
}

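# CUSTOM_JOB is not used by the training pipeline below; only JOB_SPEC is passed (as training_task_inputs)
CUSTOM_JOB = {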
    "display_name": JOB_NAME,
    "job_spec": JOB_SPEC
}
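The `args` in `pythonPackageSpec` reach `trainer.task` as ordinary command-line flags. The trainer module is not shown in this notebook, so the following is a hypothetical sketch of how it might read `CMDARGS` with `argparse`. The defaults and the `AIP_MODEL_DIR` lookup are assumptions; that environment variable is assumed here to point at the model directory under `baseOutputDirectory`.

```python
import argparse
import os


def parse_trainer_args(argv=None):
    """Parse the flags passed through CMDARGS (hypothetical trainer/task.py)."""
    parser = argparse.ArgumentParser(description="Digits trainer (sketch)")
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--batch_size", type=int, default=32)
    # argparse accepts both "--epochs=25" and "--epochs 25" forms
    return parser.parse_args(argv)


def model_output_dir(default="./model"):
    # Assumption: the service sets AIP_MODEL_DIR when baseOutputDirectory is given;
    # fall back to a local directory when running outside the service
    return os.environ.get("AIP_MODEL_DIR", default)
```

For example, `parse_trainer_args(CMDARGS)` would yield `epochs=25` and `batch_size=30` with the parameters defined earlier.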

Define Training Pipeline Parameters:

In [23]:
training_task_definition = "gs://google-cloud-aiplatform/schema/trainingjob/definition/custom_task_1.0.0.yaml"
training_task_inputs = json_format.ParseDict(JOB_SPEC, Value())

training_pipeline = {
    "display_name": JOB_NAME,
    "training_task_definition": training_task_definition,
    "training_task_inputs": training_task_inputs,
    "model_to_upload": {
        "display_name": JOB_NAME,
        "container_spec": {"image_uri": DEPLOY_IMAGE},
    },
}

Submit the pipeline job:
- this creates a training pipeline
    - which creates a custom training job
    - and, when the job completes, uploads a model

In [24]:
pipeline = clients['pipeline'].create_training_pipeline(parent=PARENT, training_pipeline=training_pipeline)

In [42]:
clients['pipeline'].get_training_pipeline(name=pipeline.name).state

<PipelineState.PIPELINE_STATE_SUCCEEDED: 4>
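Checking `.state` by hand, as in the cell above, works in an interactive session. A minimal polling sketch follows; the helper `wait_for_pipeline` and its default intervals are illustrative assumptions. It stops on a terminal `PipelineState` name and compares by `.name`, which avoids depending on the exact import path of the enum class.

```python
import time
from typing import Callable, Iterable

# Pipeline states after which the state no longer changes
TERMINAL_STATES = {
    "PIPELINE_STATE_SUCCEEDED",
    "PIPELINE_STATE_FAILED",
    "PIPELINE_STATE_CANCELLED",
}


def wait_for_pipeline(get_state: Callable[[], object],
                      poll_seconds: float = 60.0,
                      timeout_seconds: float = 3 * 60 * 60,
                      terminal: Iterable[str] = TERMINAL_STATES) -> str:
    """Call get_state() until it returns a terminal state; return that state's name."""
    terminal = set(terminal)
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = get_state()
        # Enum members expose .name; plain strings fall back to str()
        name = getattr(state, "name", str(state))
        if name in terminal:
            return name
        if time.monotonic() >= deadline:
            raise TimeoutError(f"pipeline still {name} after {timeout_seconds}s")
        time.sleep(poll_seconds)
```

In this notebook, `wait_for_pipeline(lambda: clients['pipeline'].get_training_pipeline(name=pipeline.name).state)` would block until the run finishes and, for the run above, return `'PIPELINE_STATE_SUCCEEDED'`.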

In [44]:
clients['pipeline'].get_training_pipeline(name=pipeline.name).model_to_upload.name

'projects/691911073727/locations/us-central1/models/2238799188198424576'
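The model name returned above, like the endpoint name in the next section, follows the `projects/{project}/locations/{location}/{collection}/{id}` pattern. Note that the service reports the project number (`691911073727`) rather than the `PROJECT_ID` string used to build `PARENT`. A small parser (a hypothetical helper, not a library function) that splits such names into their parts:

```python
import re

# projects/{project}/locations/{location}/{collection}/{id}
RESOURCE_RE = re.compile(
    r"projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/(?P<collection>[^/]+)/(?P<id>[^/]+)"
)


def parse_resource_name(name: str) -> dict:
    """Split a full resource name into project, location, collection and id."""
    match = RESOURCE_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"unexpected resource name: {name!r}")
    return match.groupdict()
```

For example, `parse_resource_name(pipeline_model_name)["id"]` would give the numeric model ID, where `pipeline_model_name` stands for the string returned by the cell above.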

## Endpoint Creation

Create a client to the endpoint service:

In [37]:
clients['endpoint'] = aiplatform.gapic.EndpointServiceClient(client_options=client_options)

Create the endpoint:

In [38]:
ENDPOINT_NAME = 'ENDPOINT_'+JOB_NAME
endpoint = clients['endpoint'].create_endpoint(parent=PARENT, endpoint={"display_name": ENDPOINT_NAME})

In [39]:
endpoint_info = clients['endpoint'].get_endpoint(name=endpoint.result(timeout=180).name)
endpoint_info.name

'projects/691911073727/locations/us-central1/endpoints/1850715564058607616'

## Deploy Model to Endpoint

Set up the deployment parameters:

In [45]:
MACHINE_SPEC = {
    "machine_type": DEPLOY_COMPUTE,
    "accelerator_count": 0,
}
DMODEL = {
    "model": clients['pipeline'].get_training_pipeline(name=pipeline.name).model_to_upload.name,
    "display_name": 'DEPLOYED_'+JOB_NAME,
    "dedicated_resources": {
        "min_replica_count": 1,
        "max_replica_count": 2,
        "machine_spec": MACHINE_SPEC
    }
}
# In deploy_model, the key '0' refers to the model being deployed by this request
TRAFFIC = {
    '0' : 100
}
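For `deploy_model`, the key `'0'` in `traffic_split` stands for the model being deployed in the same request, and the service replaces it with the new deployed model's ID (compare the `traffic_split` in the endpoint output further down). The percentages must total 100. A sketch of a hypothetical helper that builds and checks such a split, for example when sending only part of the traffic to a new model alongside models already on the endpoint:

```python
from typing import Dict, Optional


def build_traffic_split(new_percent: int,
                        existing: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """Give the model being deployed ('0') new_percent; existing maps deployed-model IDs to shares."""
    split = {"0": new_percent}
    # Existing deployed-model IDs keep whatever shares the caller assigns them
    split.update(existing or {})
    if any(not isinstance(v, int) or v < 0 for v in split.values()):
        raise ValueError("traffic percentages must be non-negative integers")
    total = sum(split.values())
    if total != 100:
        raise ValueError(f"traffic split must total 100, got {total}")
    return split
```

With a single model, `build_traffic_split(100)` reproduces `TRAFFIC` above.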

Deploy the Model to the Endpoint:

In [46]:
dmodel = clients['endpoint'].deploy_model(endpoint=endpoint_info.name, deployed_model=DMODEL, traffic_split=TRAFFIC)

In [47]:
dmodel_info = dmodel.result().deployed_model
dmodel_info.id

'5497646099810222080'

In [48]:
clients['endpoint'].get_endpoint(name=endpoint_info.name)

name: "projects/691911073727/locations/us-central1/endpoints/1850715564058607616"
display_name: "ENDPOINT_AIP_DIGITS_20210409142218"
deployed_models {
  id: "5497646099810222080"
  model: "projects/691911073727/locations/us-central1/models/2238799188198424576"
  display_name: "DEPLOYED_AIP_DIGITS_20210409142218"
  create_time {
    seconds: 1617979855
    nanos: 461607000
  }
  dedicated_resources {
    machine_spec {
      machine_type: "n1-standard-4"
    }
    min_replica_count: 1
    max_replica_count: 2
  }
}
traffic_split {
  key: "5497646099810222080"
  value: 100
}
etag: "AMEw9yMF8HoITBhm7DMXeam-BWvm8_AFjkyHLt02knf6_ZIddmFfWDqg4Hya5RtoK4Lw"
create_time {
  seconds: 1617979570
  nanos: 616844000
}
update_time {
  seconds: 1617980054
  nanos: 672091000
}

## Prediction

Create a client to the prediction service:

In [49]:
clients['prediction'] = aiplatform.gapic.PredictionServiceClient(client_options=client_options)

In [55]:
endpoint_info.display_name

'ENDPOINT_AIP_DIGITS_20210409142218'

Set up an observation for prediction:

In [57]:
%%bigquery pred
SELECT *
FROM ML.PREDICT(MODEL `statmike-mlops.digits.digits_lr`,(
    SELECT *
    FROM `statmike-mlops.digits.digits_prepped`)
  )

In [65]:
newob = pred.loc[:0,'p0':'p63'].to_dict(orient='records')[0]
#newob
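`pred.loc[:0,'p0':'p63']` keeps the row labelled `0` and the columns from `p0` through `p63` inclusive, which relies on those columns being contiguous in the query result. A pure-Python sketch of the same selection by name follows. The helper `to_instance` is hypothetical; it drops any extra columns, such as the `predicted_*` columns `ML.PREDICT` adds, regardless of column order.

```python
def to_instance(row: dict, n_features: int = 64, prefix: str = "p") -> dict:
    """Keep only the p0..p63 feature columns from a query-result row, in order."""
    names = [f"{prefix}{i}" for i in range(n_features)]
    missing = [name for name in names if name not in row]
    if missing:
        raise KeyError(f"row is missing feature columns: {missing[:5]}")
    return {name: row[name] for name in names}
```

For example, `to_instance(pred.iloc[0].to_dict())` would give the same 64 keys as `newob`.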

In [63]:
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value

response = clients['prediction'].predict(
    endpoint=endpoint_info.name,
    instances=[json_format.ParseDict(newob, Value())],
    parameters=json_format.ParseDict({}, Value())
)
prediction = response.predictions
prediction

[[0.999997139, 9.27718926e-12, 2.80958579e-06, 4.09633083e-09, 7.97816078e-13, 2.13014806e-09, 8.16476131e-08, 2.0208718e-10, 5.25585853e-10, 2.79572152e-08]]

In [64]:
import numpy as np
np.argmax(prediction[0])

0
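`np.argmax` returns only the most likely class. The following sketch also reports the runner-up classes and their scores. Like the `argmax` cell, it assumes that position `i` of the prediction holds the score for digit `i`. `decode_prediction` is a hypothetical helper.

```python
def decode_prediction(scores, k=3):
    """Return (digit, score) pairs for the k highest-scoring digits."""
    scores = [float(s) for s in scores]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in ranked[:k]]
```

With the response above, `decode_prediction(list(prediction[0]))` ranks digit `0` first with a score of about 0.999997, followed by `2` and `6`.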