# 05 - Continuous Training

After the pipeline definition has been tested, compiled, and uploaded to Cloud Storage, the pipeline is executed in response to a trigger. We use [Cloud Functions](https://cloud.google.com/functions) and [Cloud Pub/Sub](https://cloud.google.com/pubsub) as the triggering mechanism, and the trigger can be scheduled with [Cloud Scheduler](https://cloud.google.com/scheduler). The trigger source publishes a message to a Cloud Pub/Sub topic that the Cloud Function listens to; the function then submits the pipeline to Vertex AI Pipelines for execution.

This notebook covers the following steps:
1. Create the Cloud Pub/Sub topic.
2. Deploy the Cloud Function.
3. Trigger the pipeline.
4. Extract pipeline run metadata.

## Setup

### Import libraries

In [7]:
%load_ext autoreload
%autoreload 2

import json
import os
import logging
import tensorflow as tf
import tfx
import IPython 

logging.getLogger().setLevel(logging.INFO)

print("TFX Version:", tfx.__version__)

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
TFX Version: 1.8.0


### Setup Google Cloud project

In [8]:
PROJECT = 'pbalm-cxb-aa'
REGION = 'europe-west4'
CF_REGION = 'europe-west1' # No Cloud Functions in europe-west4

BUCKET =  PROJECT + '-eu'
SERVICE_ACCOUNT = "188940921537-compute@developer.gserviceaccount.com"

if PROJECT == "" or PROJECT is None or PROJECT == "[your-project-id]":
    # Get your GCP project id from gcloud
    shell_output = !gcloud config list --format 'value(core.project)' 2>/dev/null
    PROJECT = shell_output[0]
    
if BUCKET == "" or BUCKET is None or BUCKET == "[your-bucket-name]":
    # Default the bucket name to the GCP project ID
    BUCKET = PROJECT
    
if SERVICE_ACCOUNT == "" or SERVICE_ACCOUNT is None or SERVICE_ACCOUNT == "[your-service-account]":
    # Fall back to the account in the active gcloud configuration
    shell_output = !gcloud config list --format 'value(core.account)' 2>/dev/null
    SERVICE_ACCOUNT = shell_output[0]

print("Project ID:", PROJECT)
print("Region:", REGION)
print("Bucket name:", BUCKET)
print("Service Account:", SERVICE_ACCOUNT)

Project ID: pbalm-cxb-aa
Region: europe-west4
Bucket name: pbalm-cxb-aa-eu
Service Account: 188940921537-compute@developer.gserviceaccount.com


### Set configurations

In [9]:
VERSION = 'v02'
DATASET_DISPLAY_NAME = 'creditcards'
MODEL_DISPLAY_NAME = f'{DATASET_DISPLAY_NAME}-classifier-{VERSION}'
PIPELINE_NAME = f'{MODEL_DISPLAY_NAME}-train-pipeline'

PIPELINES_STORE = f'gs://{BUCKET}/{DATASET_DISPLAY_NAME}/compiled_pipelines/'
GCS_PIPELINE_FILE_LOCATION = os.path.join(PIPELINES_STORE, f'{PIPELINE_NAME}.json')
PUBSUB_TOPIC = f'trigger-{PIPELINE_NAME}'
CLOUD_FUNCTION_NAME = f'trigger-{PIPELINE_NAME}-fn'

In [10]:
!gsutil ls {GCS_PIPELINE_FILE_LOCATION}

gs://pbalm-cxb-aa-eu/creditcards/compiled_pipelines/creditcards-classifier-v02-train-pipeline.json


## 1. Create a Pub/Sub topic

In [None]:
!gcloud pubsub topics create {PUBSUB_TOPIC}

## 2. Deploy the Cloud Function

The Cloud Function is going to be deployed to `CF_REGION` and the pipeline will be triggered in `REGION`.
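
The function source in `src/pipeline_triggering` is not shown in this notebook. As a rough sketch of what an entry point named `trigger_pipeline` might do, the code below decodes the Pub/Sub message and submits the compiled pipeline. It assumes the function reads the environment variables set in the deploy step and uses `PipelineJob` from the `google-cloud-aiplatform` SDK; the actual source may use a different client. `parse_parameters` is a hypothetical helper name.

```python
import base64
import json
import os


def parse_parameters(event):
    """Decode the JSON payload of a Pub/Sub-triggered function event."""
    # Pub/Sub background functions receive the message body
    # base64-encoded under the 'data' key.
    payload = base64.b64decode(event.get("data", "")).decode("utf-8")
    return json.loads(payload) if payload else {}


def trigger_pipeline(event, context):
    """Sketch of an entry point: submit the compiled pipeline (assumed logic)."""
    # Imported lazily so parse_parameters can be exercised without the SDK.
    from google.cloud import aiplatform

    parameter_values = parse_parameters(event)
    aiplatform.init(
        project=os.environ["PROJECT"],
        location=os.environ["REGION"],
    )
    job = aiplatform.PipelineJob(
        display_name=os.environ["PIPELINE_NAME"],
        template_path=os.environ["GCS_PIPELINE_FILE_LOCATION"],
        parameter_values=parameter_values,
    )
    # submit() returns once the run has been created; it does not wait
    # for the pipeline to finish.
    job.submit(service_account=os.environ["SERVICE_ACCOUNT"])
```

Keeping the decoding step in its own function makes it possible to check the message format locally before deploying.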

In [17]:
ENV_VARS=f"""\
PROJECT={PROJECT},\
REGION={REGION},\
GCS_PIPELINE_FILE_LOCATION={GCS_PIPELINE_FILE_LOCATION},\
SERVICE_ACCOUNT={SERVICE_ACCOUNT},\
PIPELINE_NAME={PIPELINE_NAME}
"""

!echo {ENV_VARS}

PROJECT=pbalm-cxb-aa,REGION=europe-west4,GCS_PIPELINE_FILE_LOCATION=gs://pbalm-cxb-aa-eu/creditcards/compiled_pipelines/creditcards-classifier-v02-train-pipeline.json,SERVICE_ACCOUNT=188940921537-compute@developer.gserviceaccount.com,PIPELINE_NAME=creditcards-classifier-v02-train-pipeline


In [18]:
!rm -rf src/pipeline_triggering/.ipynb_checkpoints

In [21]:
!gcloud functions deploy {CLOUD_FUNCTION_NAME} \
    --region={CF_REGION} \
    --trigger-topic={PUBSUB_TOPIC} \
    --runtime=python37 \
    --source=src/pipeline_triggering \
    --entry-point=trigger_pipeline \
    --stage-bucket={BUCKET} \
    --ingress-settings=internal-only \
    --service-account={SERVICE_ACCOUNT} \
    --update-env-vars={ENV_VARS}

For Cloud Build Logs, visit: https://console.cloud.google.com/cloud-build/builds;region=europe-west1/3449375a-e5ee-4e3a-8599-97bc78a17b7b?project=188940921537
Deploying function (may take a while - up to 2 minutes)...done.                
availableMemoryMb: 256
buildId: 3449375a-e5ee-4e3a-8599-97bc78a17b7b
buildName: projects/188940921537/locations/europe-west1/builds/3449375a-e5ee-4e3a-8599-97bc78a17b7b
dockerRegistry: CONTAINER_REGISTRY
entryPoint: trigger_pipeline
environmentVariables:
  GCS_PIPELINE_FILE_LOCATION: gs://pbalm-cxb-aa-eu/creditcards/compiled_pipelines/creditcards-classifier-v02-train-pipeline.json
  PIPELINE_NAME: creditcards-classifier-v02-train-pipeline
  PROJECT: pbalm-cxb-aa
  REGION: europe-west4
  SERVICE_ACCOUNT: 188940921537-compute@developer.gserviceaccount.com
eventTrigger:
  eventType: google.pubsub.topic.publish
  failurePolicy: {}
  resource: projects/pbalm-cxb-aa/topics/trigg

In [23]:
cloud_fn_url = f"https://console.cloud.google.com/functions/details/{CF_REGION}/{CLOUD_FUNCTION_NAME}"
html = f'See the Cloud Function details <a href="{cloud_fn_url}" target="_blank">here</a>.'
IPython.display.display(IPython.display.HTML(html))

## 3. Trigger the pipeline

In [39]:
from google.cloud import pubsub

publish_client = pubsub.PublisherClient()
topic = f'projects/{PROJECT}/topics/{PUBSUB_TOPIC}'
data = {
    'num_epochs': 7,
    'learning_rate': 0.0015,
    'batch_size': 512,
    'hidden_units': '256,126'
}
message = json.dumps(data)

_ = publish_client.publish(topic, message.encode())

Wait a few seconds for the pipeline run to be submitted, then view the run in the Cloud Console.
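
Instead of waiting a fixed amount of time, the lookup can be retried until the new run appears. The following is a minimal sketch of such a polling loop. `wait_for_first` is a hypothetical helper, not part of any SDK, and the default timeout and interval are arbitrary assumptions.

```python
import time


def wait_for_first(fetch, is_match=lambda item: True,
                   timeout_s=120.0, interval_s=5.0, sleep=time.sleep):
    """Call fetch() repeatedly and return the first item accepted by is_match."""
    deadline = time.monotonic() + timeout_s
    while True:
        for item in fetch():
            if is_match(item):
                return item
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no matching item after {timeout_s} seconds")
        sleep(interval_s)


# Possible usage with the Vertex AI SDK (assumes `published_at` is a
# timezone-aware datetime recorded just before publishing the message):
#
# job = wait_for_first(
#     lambda: vertex_ai.PipelineJob.list(
#         filter=f'display_name="{PIPELINE_NAME}"',
#         order_by='create_time desc'),
#     is_match=lambda j: j.create_time >= published_at,
# )
```

Filtering on the creation time matters when earlier runs with the same display name exist; otherwise the most recent older run would be returned immediately.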

In [38]:
import google.cloud.aiplatform as vertex_ai

vertex_ai.init(project=PROJECT, location=REGION)
job = vertex_ai.PipelineJob.list(
    filter=f'display_name="{PIPELINE_NAME}"',
    order_by='create_time desc'
)[0]

# The last segment of the resource name is the job ID used in the console URL.
job_id = job.resource_name.split('/')[-1]
job_url = f"https://console.cloud.google.com/vertex-ai/locations/{REGION}/pipelines/runs/{job_id}"
html = f'See the Pipeline job <a href="{job_url}" target="_blank">here</a>.'
IPython.display.display(IPython.display.HTML(html))
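
The message above is published manually. To run the pipeline on a schedule, a Cloud Scheduler job can publish the same kind of message to the topic. The sketch below only builds the `gcloud scheduler jobs create pubsub` command as an argument list so it can be inspected before it is run. The job name, cron expression, and location are illustrative assumptions, and `scheduler_command` is a hypothetical helper.

```python
import json
import shlex


def scheduler_command(job_name, topic, schedule, parameters, location):
    """Build a `gcloud scheduler jobs create pubsub` command as a list of args."""
    return [
        "gcloud", "scheduler", "jobs", "create", "pubsub", job_name,
        f"--location={location}",
        f"--schedule={schedule}",
        f"--topic={topic}",
        # The message body carries the pipeline parameters as JSON,
        # in the same shape as the manually published message.
        f"--message-body={json.dumps(parameters)}",
    ]


args = scheduler_command(
    job_name="trigger-train-pipeline-weekly",  # hypothetical job name
    topic="trigger-creditcards-classifier-v02-train-pipeline",
    schedule="0 3 * * 1",  # assumed schedule: Mondays at 03:00
    parameters={"num_epochs": 7, "learning_rate": 0.0015},
    location="europe-west1",  # assumed Cloud Scheduler location
)

# Print a shell-quoted version of the command for review.
print(shlex.join(args))
```

The list can be passed to `subprocess.run(args, check=True)` once the values have been reviewed.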


## 4. Extract pipeline run metadata

In [40]:
from google.cloud import aiplatform as vertex_ai

pipeline_df = vertex_ai.get_pipeline_df(PIPELINE_NAME)
pipeline_df = pipeline_df[pipeline_df.pipeline_name == PIPELINE_NAME]
pipeline_df.T

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| pipeline_name | creditcards-classifier-v02-train-pipeline | creditcards-classifier-v02-train-pipeline | creditcards-classifier-v02-train-pipeline | creditcards-classifier-v02-train-pipeline | creditcards-classifier-v02-train-pipeline | creditcards-classifier-v02-train-pipeline | creditcards-classifier-v02-train-pipeline | creditcards-classifier-v02-train-pipeline | creditcards-classifier-v02-train-pipeline | creditcards-classifier-v02-train-pipeline | creditcards-classifier-v02-train-pipeline | creditcards-classifier-v02-train-pipeline |
| run_name | creditcards-classifier-v02-train-pipeline-2022... | creditcards-classifier-v02-train-pipeline-2022... | creditcards-classifier-v02-train-pipeline-2022... | creditcards-classifier-v02-train-pipeline-2022... | creditcards-classifier-v02-train-pipeline-2022... | creditcards-classifier-v02-train-pipeline-2022... | creditcards-classifier-v02-train-pipeline-2022... | creditcards-classifier-v02-train-pipeline-2022... | creditcards-classifier-v02-train-pipeline-2022... | creditcards-classifier-v02-train-pipeline-2022... | creditcards-classifier-v02-train-pipeline-2022... | creditcards-classifier-v02-train-pipeline-2022... |
| param.input:batch_size | 512 | 512 | 512 | 512 | | | | 512 | 512 | 512 | 512 | 512 |
| param.input:hidden_units | 256126 | 256126 | 128128 | 128128 | | | | 128128 | 128128 | 128128 | 128128 | 128128 |
| param.input:learning_rate | 0.0015 | 0.0015 | 0.003 | 0.003 | | | | 0.003 | 0.003 | 0.003 | 0.003 | 0.003 |
| param.input:num_epochs | 7 | 7 | 30 | 30 | | | | 30 | 30 | 30 | 30 | 30 |
