## References

* [Cloud Scheduler Official Doc](https://cloud.google.com/scheduler/docs)
* [Cloud Scheduler RPC Spec](https://cloud.google.com/scheduler/docs/reference/rpc/google.cloud.scheduler.v1#google.cloud.scheduler.v1.CloudScheduler)
* [Cloud Scheduler Python API Docs](https://googleapis.dev/python/cloudscheduler/latest/scheduler_v1/cloud_scheduler.html?highlight=google%20cloud%20scheduler_v1#module-google.cloud.scheduler_v1.services.cloud_scheduler)
* [Cloud Scheduler Python Example](https://stackoverflow.com/questions/60681672/how-to-create-a-job-with-google-cloud-scheduler-python-api)

## Prerequisites
This notebook only shows how to create jobs for the Google Cloud Scheduler service. To reproduce the same output, the following prerequisites must be met.
- [Cloud Build TFX Notebook](https://github.com/sayakpaul/CI-CD-for-Model-Training/blob/main/cloud_build_tfx.ipynb)
  - Build the TFX pipeline and the Docker image in which each TFX component runs, and compile the TFX pipeline job spec.
- [Cloud Function Trigger Notebook](https://github.com/sayakpaul/CI-CD-for-Model-Training/blob/main/cloud_function_trigger.ipynb)
  - Create a Pub/Sub topic
  - Create and deploy a Cloud Function that triggers the Vertex AI pipeline by referring to the TFX pipeline job spec.

## Setting up
By installing the `google-cloud-scheduler` Python package, you can create Cloud Scheduler jobs programmatically.

In [None]:
!pip install --upgrade -q google-cloud-scheduler

### ***Restart runtime (if you are using Colab)***

In [None]:
!gcloud init # only needed if you are using Colab

In [None]:
# only needed if you are using Colab
from google.colab import auth
auth.authenticate_user()

In [8]:
GOOGLE_CLOUD_PROJECT = "gcp-ml-172005"
GOOGLE_CLOUD_REGION = "us-central1"

PIPELINE_NAME = "penguin-vertex-training"
PUBSUB_TOPIC = f"trigger-{PIPELINE_NAME}"
SCHEDULER_JOB_NAME = "MLOpsJob"

## Create a Cloud Scheduler Job via CLI

#### Set up the environment variable for GCP credentials that have permission to use Google Cloud Scheduler
- You need to create and upload the service account credentials beforehand. Please refer to this [official document](https://cloud.google.com/run/docs/triggering/using-scheduler#create-service-account).
- `gcloud scheduler jobs` command will automatically recognize the environment variable, `GOOGLE_APPLICATION_CREDENTIALS`. Otherwise, you have to specify it explicitly.

In [None]:
# `!export` runs in its own subshell and does not persist, so use the `%env` magic instead
%env GOOGLE_APPLICATION_CREDENTIALS=/home/jupyter/CI-CD-for-Model-Training/gcp-ml-172005-528977a75f85.json

- The message consumed by the Cloud Function from Pub/Sub must be a JSON string. Passing it through `json.dumps` wraps it in double quotes and escapes the inner quotes, so the shell hands it to `gcloud` as a single argument.
- The following `gcloud` command schedules the job with the cron expression `*/3 * * * *`, which means every three minutes. You would not schedule a job this often in a real-world project; the value is set for demonstration purposes only.
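To make the `*/3` minute field concrete, here is a minimal sketch that lists the minutes of each hour on which such a schedule fires. The helper `minutes_matching_step` is hypothetical and only models the `*/N` step syntax of the minute field; it is not a full cron parser.

```python
from typing import List


def minutes_matching_step(step: int) -> List[int]:
    """Return the minutes of an hour matched by a cron minute field `*/step`."""
    return [minute for minute in range(60) if minute % step == 0]


fire_minutes = minutes_matching_step(3)
print(fire_minutes[:5])   # [0, 3, 6, 9, 12]
print(len(fire_minutes))  # 20 runs per hour
```

At 20 runs per hour, a schedule like this triggers the pipeline 480 times a day, which is why it is only suitable for a demonstration.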

In [172]:
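import json

# The payload is already a JSON document, kept here as a plain string.
data = '{"num_epochs": "3", "learning_rate": "1e-2"}'
# json.dumps on a string adds outer double quotes and escapes the inner ones,
# so that `$data` survives shell parsing as a single argument.
data = json.dumps(data)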
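The effect of the `json.dumps` call above can be checked without calling `gcloud`. The sketch below uses `shlex.split` from the standard library as a stand-in for POSIX shell tokenization; the variable names `payload`, `quoted`, and `received` are illustrative only.

```python
import json
import shlex

payload = '{"num_epochs": "3", "learning_rate": "1e-2"}'
quoted = json.dumps(payload)  # outer double quotes added, inner quotes escaped

# Tokenize the interpolated command-line fragment the way a POSIX shell would.
argv = shlex.split(f"--message-body {quoted}")
received = argv[1]

# The argument that reaches the program is the original JSON text again.
print(received == payload)
print(json.loads(received))
```

This works here because the payload contains only characters that are safe inside double quotes. For arbitrary strings, `shlex.quote` is the general-purpose way to quote a shell argument.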

In [173]:
!gcloud scheduler jobs create pubsub $SCHEDULER_JOB_NAME --schedule "*/3 * * * *" --topic $PUBSUB_TOPIC --message-body $data

name: projects/gcp-ml-172005/locations/us-central1/jobs/MLOpsJob
pubsubTarget:
  data: eyJudW1fZXBvY2hzIjogIjMiLCAibGVhcm5pbmdfcmF0ZSI6ICIxZS0yIn0=
  topicName: projects/gcp-ml-172005/topics/trigger-penguin-vertex-training
retryConfig:
  maxBackoffDuration: 3600s
  maxDoublings: 16
  maxRetryDuration: 0s
  minBackoffDuration: 5s
schedule: '*/3 * * * *'
state: ENABLED
timeZone: Etc/UTC
userUpdateTime: '2021-08-26T17:41:58Z'
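The `data` field in the output above is the message body in base64. As a quick check, the following sketch decodes the value copied from that output and parses it back into a dictionary.

```python
import base64
import json

# Value of `pubsubTarget.data` copied from the command output above.
encoded = "eyJudW1fZXBvY2hzIjogIjMiLCAibGVhcm5pbmdfcmF0ZSI6ICIxZS0yIn0="

decoded = base64.b64decode(encoded).decode("utf-8")
params = json.loads(decoded)
print(params)  # {'num_epochs': '3', 'learning_rate': '1e-2'}
```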


### See Its Behaviour in Vertex AI Pipelines

![](https://i.ibb.co/GkHmwTL/Screen-Shot-2021-08-27-at-2-49-00-AM.png)

## Create a Cloud Scheduler Job Programmatically

Let's see how we can do the same thing programmatically in Python.

In [18]:
import json
from google.cloud import scheduler_v1
from google.cloud.scheduler_v1.types.target import PubsubTarget
from google.cloud.scheduler_v1.types.job import Job
from google.cloud.scheduler_v1.types.cloudscheduler import CreateJobRequest

client = scheduler_v1.CloudSchedulerClient.from_service_account_json(
    r"./gcp-ml-172005-528977a75f85.json")

There are three main differences compared to the `gcloud` command.
- The message must be encoded as UTF-8 bytes, because the `data` parameter of `PubsubTarget` requires `bytes`.
- The Pub/Sub topic name must follow the `"projects/<PROJECT-ID>/topics/<TOPIC-NAME>"` format.
- The Scheduler job name must follow the `"projects/<PROJECT-ID>/locations/<REGION-ID>/jobs/<JOB-NAME>"` format.
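The three points above can be sketched in plain Python before touching the client library. The helpers `encode_message`, `topic_path`, and `job_path` are hypothetical names introduced for illustration, and the project, region, topic, and job values are placeholders.

```python
import json


def encode_message(params: dict) -> bytes:
    """Serialize the parameters to JSON and encode them as UTF-8 bytes."""
    return json.dumps(params).encode("utf-8")


def topic_path(project_id: str, topic_name: str) -> str:
    """Build a fully qualified Pub/Sub topic name."""
    return f"projects/{project_id}/topics/{topic_name}"


def job_path(project_id: str, region: str, job_name: str) -> str:
    """Build a fully qualified Cloud Scheduler job name."""
    return f"projects/{project_id}/locations/{region}/jobs/{job_name}"


# Placeholder values, not taken from a real project.
message = encode_message({"num_epochs": "3", "learning_rate": "1e-2"})
print(type(message).__name__)
print(topic_path("my-project", "my-topic"))
print(job_path("my-project", "us-central1", "my-job"))
```

Keeping the name construction in small functions makes it easier to spot a malformed resource name before the request is sent.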

In [23]:
# Parent resource in the form "projects/<PROJECT-ID>/locations/<REGION-ID>"
parent = client.common_location_path(GOOGLE_CLOUD_PROJECT, GOOGLE_CLOUD_REGION)

# Serialize the parameters to JSON and encode them as bytes for PubsubTarget
data = {"num_epochs": "3", "learning_rate": "1e-2"}
data = json.dumps(data).encode('utf-8')
pubsub_target = PubsubTarget(
    topic_name=f"projects/{GOOGLE_CLOUD_PROJECT}/topics/{PUBSUB_TOPIC}",
    data=data)

# Job that publishes the message to the topic every three minutes
job = Job(name=f"projects/{GOOGLE_CLOUD_PROJECT}/locations/{GOOGLE_CLOUD_REGION}/jobs/traing_for_model",
          pubsub_target=pubsub_target,
          schedule="*/3 * * * *")

req = CreateJobRequest(parent=parent, job=job)

In [24]:
result_job = client.create_job(req)

In [25]:
result_job

name: "projects/gcp-ml-172005/locations/us-central1/jobs/traing_for_model"
pubsub_target {
  topic_name: "projects/gcp-ml-172005/topics/trigger-penguin-vertex-training"
  data: "{\"num_epochs\": \"3\", \"learning_rate\": \"1e-2\"}"
}
user_update_time {
  seconds: 1630031544
}
state: ENABLED
schedule: "*/3 * * * *"
time_zone: "UTC"