
# GitHub Metrics: Commit History Cloud Function For Incremental Updates

The notebooks for commit history in steps 1 and 2 created an initial setup of tables in the BigQuery datasets `github_metrics` and `reporting`.  Each dataset holds the tables `commits` and `commits_files`.  The logic for incrementally updating these tables was also developed and tested in those notebooks.

This notebook creates a Cloud Function to run the code that incrementally updates the tables in the datasets and schedules it to run daily using Pub/Sub and Cloud Scheduler.

**Notes**
- The Pub/Sub topic is shared between all step 3 notebooks
- The Cloud Scheduler is shared between all step 3 notebooks

---
## Colab Setup

To run this notebook in Colab click [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/statmike/vertex-ai-mlops/blob/main/architectures/tracking/setup/github/GitHub%20Metrics%20-%203%20-%20Commits%20-%20Incremental%20Update%20Cloud%20Function.ipynb) and run the cells in this section.  Otherwise, skip this section.

This cell will authenticate to GCP (follow prompts in the popup).

In [None]:
PROJECT_ID = 'vertex-ai-mlops-369716' # replace with project ID

In [None]:
try:
    import google.colab
    from google.colab import auth
    auth.authenticate_user()
    !gcloud config set project {PROJECT_ID}
except Exception:
    pass

Updated property [core/project].


---
## Package Installs (if needed)

This notebook uses the Python Clients for
- Google Service Usage
    - to enable APIs
- Cloud Pub/Sub
- Cloud Functions
- Cloud Scheduler

The cells below check whether the required Python libraries are installed.  If any are missing, a message is printed and the library is installed with the associated pip command.  These installs must complete before continuing with this notebook.
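
The same check can also be written as a loop over a mapping of module names to pip package names. This is a minimal sketch, not the notebook's method: the helper name `missing_packages` and the `REQUIRED` mapping are assumptions introduced here for illustration.

```python
import importlib

# Maps an importable module name to the pip package that provides it.
REQUIRED = {
    'google.cloud.service_usage_v1': 'google-cloud-service-usage',
    'google.cloud.pubsub': 'google-cloud-pubsub',
    'google.cloud.functions': 'google-cloud-functions',
    'google.cloud.scheduler': 'google-cloud-scheduler',
}

def missing_packages(required):
    """Return the pip names of packages whose module fails to import."""
    missing = []
    for module_name, pip_name in required.items():
        try:
            importlib.import_module(module_name)
        except ImportError:
            missing.append(pip_name)
    return missing

# Prints the packages that still need a `pip install`, if any.
print(missing_packages(REQUIRED))
```

An empty list means every library imported successfully.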

In [None]:
try:
    import google.cloud.service_usage_v1
except ImportError:
    print('You need to pip install google-cloud-service-usage')
    !pip install google-cloud-service-usage -q

try:
    import google.cloud.pubsub
except ImportError:
    print('You need to pip install google-cloud-pubsub')
    !pip install google-cloud-pubsub -q

try:
    import google.cloud.functions
except ImportError:
    print('You need to pip install google-cloud-functions')
    !pip install google-cloud-functions -q

try:
    import google.cloud.scheduler
except ImportError:
    print('You need to pip install google-cloud-scheduler')
    !pip install google-cloud-scheduler -q

You need to pip install google-cloud-service-usage
You need to pip install google-cloud-pubsub
You need to pip install google-cloud-functions
You need to pip install google-cloud-scheduler

---
## Setup

In [None]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'vertex-ai-mlops-369716'

In [None]:
REGION = 'us-central1'

github_user = 'statmike'
github_repo = 'vertex-ai-mlops'

BQ_PROJECT = PROJECT_ID
BQ_DATASET = 'github_metrics'

In [None]:
import requests
import json
import time
from datetime import datetime
import pandas as pd
from io import StringIO
import os, shutil

from google.cloud import bigquery
from google.cloud import storage

from google.cloud import service_usage_v1
from google.cloud import pubsub_v1
from google.cloud import functions_v1
from google.cloud import scheduler_v1

In [None]:
bq = bigquery.Client(project = PROJECT_ID)
gcs = storage.Client(project = PROJECT_ID)

su_client = service_usage_v1.ServiceUsageClient()
pubsub_pubclient = pubsub_v1.PublisherClient() 
functions_client = functions_v1.CloudFunctionsServiceClient()
scheduler_client = scheduler_v1.CloudSchedulerClient()

In [None]:
DIR = 'temp'
!rm -rf {DIR}
!mkdir -p {DIR}

---
## Enable APIs

Using Cloud Functions, Cloud Pub/Sub and Cloud Scheduler requires enabling these APIs for the Google Cloud Project.  Additionally, Cloud Functions uses Cloud Build, which also needs to be enabled.

There are two options for enabling these.  This notebook uses option 2.
 1. Use the APIs & Services page in the console: https://console.cloud.google.com/apis
     - `+ Enable APIs and Services`
     - Search for Cloud Build and Enable
     - Search for Artifact Registry and Enable
 2. Use [Google Service Usage](https://cloud.google.com/service-usage/docs) API from Python
     - [Python Client For Service Usage](https://github.com/googleapis/python-service-usage)
     - [Python Client Library Documentation](https://cloud.google.com/python/docs/reference/serviceusage/latest)
     
The following code cells use the Service Usage Client to:
- get the state of the service
- if 'DISABLED':
    - Try enabling the service and return the state after trying
- if 'ENABLED' print the state for confirmation
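
The steps above repeat for every API, so they could be folded into one helper. The sketch below is an assumption-laden illustration: `ensure_service_enabled` is a hypothetical name, it passes the request as a plain dict rather than a request object, and `FakeServiceUsageClient` is a stand-in that only exists so the sketch runs without a real project.

```python
from types import SimpleNamespace

def ensure_service_enabled(client, project_id, service):
    """Enable `service` if it is DISABLED and return the resulting state name."""
    name = f'projects/{project_id}/services/{service}'
    state = client.get_service(request = {'name': name}).state.name
    if state == 'DISABLED':
        # enable_service returns a long-running operation; result() waits for it
        operation = client.enable_service(request = {'name': name})
        state = operation.result().service.state.name
    return state

# Stand-in client (hypothetical, for a dry run only).
class FakeServiceUsageClient:
    def __init__(self, state):
        self.state = state

    def get_service(self, request):
        return SimpleNamespace(state = SimpleNamespace(name = self.state))

    def enable_service(self, request):
        self.state = 'ENABLED'
        service = SimpleNamespace(state = SimpleNamespace(name = self.state))
        return SimpleNamespace(result = lambda: SimpleNamespace(service = service))

fake = FakeServiceUsageClient('DISABLED')
for api in ['iam', 'pubsub', 'cloudfunctions', 'cloudscheduler', 'cloudbuild']:
    print(api, ensure_service_enabled(fake, 'my-project', f'{api}.googleapis.com'))
```

With the real client, the call would look like `ensure_service_enabled(su_client, PROJECT_ID, 'pubsub.googleapis.com')`.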

### IAM
The IAM API may be needed for creating a service account to run the Cloud Function.

In [None]:
iam = su_client.get_service(
    request = service_usage_v1.GetServiceRequest(
        name = f'projects/{PROJECT_ID}/services/iam.googleapis.com'
    )
).state.name


if iam == 'DISABLED':
    print(f'IAM is currently {iam} for project: {PROJECT_ID}')
    print(f'Trying to Enable...')
    operation = su_client.enable_service(
        request = service_usage_v1.EnableServiceRequest(
            name = f'projects/{PROJECT_ID}/services/iam.googleapis.com'
        )
    )
    response = operation.result()
    if response.service.state.name == 'ENABLED':
        print(f'IAM is now enabled for project: {PROJECT_ID}')
    else:
        print(response)
else:
    print(f'IAM already enabled for project: {PROJECT_ID}')

IAM already enabled for project: vertex-ai-mlops-369716


### Cloud Pub/Sub

In [None]:
pubsub = su_client.get_service(
    request = service_usage_v1.GetServiceRequest(
        name = f'projects/{PROJECT_ID}/services/pubsub.googleapis.com'
    )
).state.name


if pubsub == 'DISABLED':
    print(f'Cloud Pub/Sub is currently {pubsub} for project: {PROJECT_ID}')
    print(f'Trying to Enable...')
    operation = su_client.enable_service(
        request = service_usage_v1.EnableServiceRequest(
            name = f'projects/{PROJECT_ID}/services/pubsub.googleapis.com'
        )
    )
    response = operation.result()
    if response.service.state.name == 'ENABLED':
        print(f'Cloud Pub/Sub is now enabled for project: {PROJECT_ID}')
    else:
        print(response)
else:
    print(f'Cloud Pub/Sub already enabled for project: {PROJECT_ID}')

Cloud Pub/Sub already enabled for project: vertex-ai-mlops-369716


### Cloud Functions

In [None]:
cloudfunctions = su_client.get_service(
    request = service_usage_v1.GetServiceRequest(
        name = f'projects/{PROJECT_ID}/services/cloudfunctions.googleapis.com'
    )
).state.name


if cloudfunctions == 'DISABLED':
    print(f'Cloud Functions is currently {cloudfunctions} for project: {PROJECT_ID}')
    print(f'Trying to Enable...')
    operation = su_client.enable_service(
        request = service_usage_v1.EnableServiceRequest(
            name = f'projects/{PROJECT_ID}/services/cloudfunctions.googleapis.com'
        )
    )
    response = operation.result()
    if response.service.state.name == 'ENABLED':
        print(f'Cloud Functions is now enabled for project: {PROJECT_ID}')
    else:
        print(response)
else:
    print(f'Cloud Functions already enabled for project: {PROJECT_ID}')

Cloud Functions already enabled for project: vertex-ai-mlops-369716


### Cloud Scheduler

In [None]:
cloudscheduler = su_client.get_service(
    request = service_usage_v1.GetServiceRequest(
        name = f'projects/{PROJECT_ID}/services/cloudscheduler.googleapis.com'
    )
).state.name


if cloudscheduler == 'DISABLED':
    print(f'Cloud Scheduler is currently {cloudscheduler} for project: {PROJECT_ID}')
    print(f'Trying to Enable...')
    operation = su_client.enable_service(
        request = service_usage_v1.EnableServiceRequest(
            name = f'projects/{PROJECT_ID}/services/cloudscheduler.googleapis.com'
        )
    )
    response = operation.result()
    if response.service.state.name == 'ENABLED':
        print(f'Cloud Scheduler is now enabled for project: {PROJECT_ID}')
    else:
        print(response)
else:
    print(f'Cloud Scheduler already enabled for project: {PROJECT_ID}')

Cloud Scheduler already enabled for project: vertex-ai-mlops-369716


### Cloud Build 
Used By Cloud Functions

In [None]:
cloudbuild = su_client.get_service(
    request = service_usage_v1.GetServiceRequest(
        name = f'projects/{PROJECT_ID}/services/cloudbuild.googleapis.com'
    )
).state.name


if cloudbuild == 'DISABLED':
    print(f'Cloud Build is currently {cloudbuild} for project: {PROJECT_ID}')
    print(f'Trying to Enable...')
    operation = su_client.enable_service(
        request = service_usage_v1.EnableServiceRequest(
            name = f'projects/{PROJECT_ID}/services/cloudbuild.googleapis.com'
        )
    )
    response = operation.result()
    if response.service.state.name == 'ENABLED':
        print(f'Cloud Build is now enabled for project: {PROJECT_ID}')
    else:
        print(response)
else:
    print(f'Cloud Build already enabled for project: {PROJECT_ID}')

Cloud Build already enabled for project: vertex-ai-mlops-369716


---
## Pub/Sub
Use a Pub/Sub topic to trigger a Cloud Function to run.  The topic can receive messages manually or on a schedule from Cloud Scheduler.

The main concepts:
- Topic - a feed of messages
     - Publish - send a new message to a topic
     - Subscription - receive messages that arrive on topic
          - Push - the subscriber has new messages pushed to it
          - Pull - the subscriber requests new messages by pulling them
          
In this example, a topic will be set up for daily runs of metric functions.  Publishing a new message to this topic will trigger one or more Cloud Functions to run, like the one set up below.  The Cloud Function will have a push subscription to the topic.

In [None]:
PUBSUB_TOPIC = 'daily_metrics_triggers'

In [None]:
topic = ''
for topic in pubsub_pubclient.list_topics(project = f'projects/{PROJECT_ID}'):
    if topic.name.endswith(PUBSUB_TOPIC):
        break
    else: topic = ''
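
Note that a suffix match with `endswith` would also accept a topic such as `old_daily_metrics_triggers`. A minimal sketch of an exact match on the full resource name follows; the helper `find_topic` and the sample names are hypothetical, and the `projects/{project}/topics/{topic}` format is the one shown in the topic output of this notebook.

```python
def find_topic(topic_names, project_id, topic_id):
    """Return the full topic name if it is present in `topic_names`, else ''."""
    target = f'projects/{project_id}/topics/{topic_id}'
    return next((name for name in topic_names if name == target), '')

# Sample names (hypothetical); the first one would fool a suffix match.
names = [
    'projects/my-project/topics/old_daily_metrics_triggers',
    'projects/my-project/topics/daily_metrics_triggers',
]
print(find_topic(names, 'my-project', 'daily_metrics_triggers'))
# prints: projects/my-project/topics/daily_metrics_triggers
```

With the real client, `topic_names` could be built from the `name` of each topic returned by `pubsub_pubclient.list_topics`.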

In [None]:
if topic:
    print(topic)
else:
    topic = pubsub_pubclient.create_topic(
        name = pubsub_pubclient.topic_path(PROJECT_ID, PUBSUB_TOPIC)
    )
    print(topic)

name: "projects/vertex-ai-mlops-369716/topics/daily_metrics_triggers"



In [None]:
print(f'Review The Pub/Sub Topic In The Console:\nhttps://console.cloud.google.com/cloudpubsub/topic/list?project={PROJECT_ID}')

Review The Pub/Sub Topic In The Console:
https://console.cloud.google.com/cloudpubsub/topic/list?project=vertex-ai-mlops-369716


---
## Cloud Function

Create a Cloud Function that runs the incremental update code for the tables in the dataset `github_metrics`.  The method below creates code files and zips them for storage on Cloud Storage as a source to the Cloud Function.
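
At its core, the incremental update keeps only the commits whose `sha` is not already stored. The function does this with a pandas merge; the sketch below shows the same idea with plain Python sets, using a hypothetical helper `select_new_commits` and made-up sample data.

```python
def select_new_commits(fetched, known_shas):
    """Keep only the fetched commits whose sha is not already stored."""
    known = set(known_shas)
    return [commit for commit in fetched if commit['sha'] not in known]

# Made-up data: three commits from the API, two already in the table.
fetched = [
    {'sha': 'a1', 'message': 'first'},
    {'sha': 'b2', 'message': 'second'},
    {'sha': 'c3', 'message': 'third'},
]
known_shas = ['a1', 'b2']

print(select_new_commits(fetched, known_shas))
# prints: [{'sha': 'c3', 'message': 'third'}]
```

Because only unseen commits are appended, running the update repeatedly should not duplicate rows.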

### Create Files

In [None]:
if os.path.exists(f'{DIR}/function'): shutil.rmtree(f'{DIR}/function')
os.makedirs(f'{DIR}/function')

In [None]:
%%writefile {DIR}/function/requirements.txt
pandas
db-dtypes
google-cloud-bigquery
google-cloud-storage

Writing temp/function/requirements.txt


In [None]:
%%writefile {DIR}/function/main.py

# packages
import base64
import requests
import json
import pandas as pd
import os
import time # used by metric_get while waiting on a 202 response
from google.cloud import bigquery
#from google.cloud import storage


# clients
bq = bigquery.Client()
#gcs = storage.Client()


# parameters
github_user = 'statmike'
github_repo = 'vertex-ai-mlops'
BQ_DATASET = 'github_metrics'
github_api_url = f'https://api.github.com/repos/{github_user}/{github_repo}'
pat = os.getenv('GITHUB_PAT')


# helper function
def metric_get(metric_type):
  response = requests.get(f'{github_api_url}/{metric_type}', headers = {'Authorization': f'Bearer {pat}', 'Accept': 'application/vnd.github+json'})
  while response.status_code == 202:
      time.sleep(10)
      response = requests.get(f'{github_api_url}/{metric_type}', headers = {'Authorization': f'Bearer {pat}', 'Accept': 'application/vnd.github+json'})
  return response


# the function
def collect(event, context):

    # print inputs to Cloud Function
    function_inputs = json.loads(base64.b64decode(event['data']).decode('utf-8'))
    print(function_inputs)
    PROJECT_ID = function_inputs['PROJECT_ID']
    BQ_PROJECT = PROJECT_ID

    # START: Content from notebook: GitHub Metrics - 1 - Commits

    # Read Commits From GitHub
    page_size = 100
    page = 1
    raw_commits = []
    while page_size == 100:
      response = requests.get(f'{github_api_url}/commits?per_page=100&page={page}', headers = {'Authorization': f'Bearer {pat}', 'Accept': 'application/vnd.github+json'})
      new_page = json.loads(response.text)
      if response.status_code == 200:
        raw_commits += new_page
        page_size = len(new_page)
        page +=1
      else: break

    # parse Commits into list of dicts: one for each commit
    commits = []
    for i, c in enumerate(raw_commits):
      author = c['commit']['author']['name']
      author2 = ''
      if 'author' in c and c['author']:
        if 'login' in c['author']: author2 = c['author']['login']

      # refined author with logic:
      if author2: refined_author = author2
      else: refined_author = author 

      commits += [{
          'sha': c['sha'],
          'datetime': c['commit']['committer']['date'],
          'url': c['html_url'],
          'message': c['commit']['message'],
          'author': refined_author
      }]

    # create pandas dataframe of commits
    commits = pd.DataFrame(commits)
    #commits['datetime'] = pd.to_datetime(commits['datetime'], infer_datetime_format = True)
    commits.loc[commits['author'].str.lower() == 'mike henderson', 'author'] = 'statmike'

    # look through prior commits, make a list of which commits are new
    prior_commits = bq.query(query = f"""SELECT sha FROM `{BQ_PROJECT}.{BQ_DATASET}.commits`""").to_dataframe()
    new_commits = pd.merge(prior_commits, commits, on = 'sha', how = 'outer', indicator = True)
    new_commits = new_commits[new_commits['_merge'] == 'right_only'].drop('_merge', axis = 1)

    # if new commits then update to BigQuery
    if new_commits.shape[0] > 0:
      new_commits_job = bq.load_table_from_dataframe(
          dataframe = new_commits,
          destination = bigquery.TableReference.from_string(f"{BQ_PROJECT}.{BQ_DATASET}.commits"),
          job_config = bigquery.LoadJobConfig(
              write_disposition = 'WRITE_APPEND', # WRITE_TRUNCATE = replace if exists, WRITE_APPEND = append if exists, WRITE_EMPTY = write new but dont overwrite
              autodetect = True # detect schema
          )
      )
      new_commits_job.result()

    # if new commits, retrieve files associated with the commits, parse the data into dataframe, append to BigQuery table
    if new_commits.shape[0] > 0:
      # retrieve files for new commits:
      sha = list(new_commits['sha'])
      raw_files = []
      for s in sha:
        page = 1
        response = requests.get(f'{github_api_url}/commits/{s}?per_page=100&page={page}', headers = {'Authorization': f'Bearer {pat}', 'Accept': 'application/vnd.github+json'})
        new_files = json.loads(response.text)['files']
        files = list(new_files)
        # a full page (100 files) means more may follow; stop at the first short page
        while len(new_files) == 100:
          page += 1
          response = requests.get(f'{github_api_url}/commits/{s}?per_page=100&page={page}', headers = {'Authorization': f'Bearer {pat}', 'Accept': 'application/vnd.github+json'})
          new_files = json.loads(response.text)['files']
          files += new_files
        raw_files += [{'sha': s, 'files': files}]

      # parse data for files and combine with commit data in dataframe
      commits_files = []
      for c in raw_files:
        for f in c['files']:
          commits_files += [{
              'sha': c['sha'],
              'file_sha': f['sha'],
              'file': f"{github_user}/{github_repo}/{f['filename']}",
              'additions': f['additions'],
              'deletions': f['deletions']
          }]
      commits_files = pd.DataFrame(commits_files)
      commits_files = pd.merge(new_commits, commits_files, on = 'sha', how = 'inner')

      # append files for new commits to BigQuery table
      new_commits_files_job = bq.load_table_from_dataframe(
          dataframe = commits_files,
          destination = bigquery.TableReference.from_string(f"{BQ_PROJECT}.{BQ_DATASET}.commits_files"),
          job_config = bigquery.LoadJobConfig(
              write_disposition = 'WRITE_APPEND', # WRITE_TRUNCATE = replace if exists, WRITE_APPEND = append if exists, WRITE_EMPTY = write new but dont overwrite
              autodetect = True, # detect schema
          )
      )
      new_commits_files_job.result()

    # START: Content from notebook: GitHub Metrics - 2 - Commits

    # update reporting tables (this was a BQ Scheduled Query)
    query = f"""
      INSERT INTO `{BQ_PROJECT}.reporting.commits_files`
        WITH
          CURRENT_COMMITS AS (SELECT sha FROM `{BQ_PROJECT}.reporting.commits`),
          SOURCE_COMMITS AS (SELECT sha FROM `{BQ_PROJECT}.{BQ_DATASET}.commits`),
          NEW_COMMITS AS (SELECT SOURCE_COMMITS.sha FROM SOURCE_COMMITS WHERE NOT EXISTS (SELECT CURRENT_COMMITS.sha FROM CURRENT_COMMITS WHERE SOURCE_COMMITS.sha = CURRENT_COMMITS.sha)),
          RAW_COMMITS AS (SELECT * FROM NEW_COMMITS LEFT OUTER JOIN `{BQ_PROJECT}.{BQ_DATASET}.commits_files` USING(sha))
        SELECT
          * EXCEPT(datetime),
          DATETIME(TIMESTAMP(datetime)) AS datetime
        FROM RAW_COMMITS
        ORDER BY datetime
      ;
      INSERT INTO `{BQ_PROJECT}.reporting.commits`
        WITH
          CURRENT_COMMITS AS (SELECT sha FROM `{BQ_PROJECT}.reporting.commits`),
          SOURCE_COMMITS AS (SELECT sha FROM `{BQ_PROJECT}.{BQ_DATASET}.commits`),
          NEW_COMMITS AS (SELECT SOURCE_COMMITS.sha FROM SOURCE_COMMITS WHERE NOT EXISTS (SELECT CURRENT_COMMITS.sha FROM CURRENT_COMMITS WHERE SOURCE_COMMITS.sha = CURRENT_COMMITS.sha)),
          RAW_COMMITS AS (SELECT * FROM NEW_COMMITS LEFT OUTER JOIN `{BQ_PROJECT}.{BQ_DATASET}.commits` USING(sha))
        SELECT
          * EXCEPT(datetime),
          DATETIME(TIMESTAMP(datetime)) AS datetime
        FROM RAW_COMMITS
        ORDER BY datetime
      ;
    """
    job = bq.query(query = query)
    job.result()
    print(job.state)

Writing temp/function/main.py
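
The paging loops in `main.py` request 100 items at a time and continue while pages come back full. A minimal sketch of that pattern, with the HTTP call replaced by an injected function, is shown here. The names `fetch_all` and `fake_page` are hypothetical and the data is made up so the sketch runs offline.

```python
def fetch_all(fetch_page, per_page = 100):
    """Collect items page by page until a page comes back short."""
    items, page = [], 1
    while True:
        batch = fetch_page(page, per_page)
        items += batch
        if len(batch) < per_page:
            # a short (or empty) page means there is nothing left to request
            return items
        page += 1

# Stand-in for the GitHub API (hypothetical): 250 items served in pages.
data = list(range(250))

def fake_page(page, per_page):
    start = (page - 1) * per_page
    return data[start:start + per_page]

print(len(fetch_all(fake_page)))
# prints: 250
```

Stopping on the first short page also handles totals that are an exact multiple of the page size, since the next request returns an empty page.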


In [None]:
!ls {DIR}/function

main.py  requirements.txt


### Zip Files

In [None]:
import zipfile
with zipfile.ZipFile(f'{DIR}/function/function_commit.zip', mode = 'w') as archive:
    archive.write(f'{DIR}/function/main.py', 'main.py')
    archive.write(f'{DIR}/function/requirements.txt', 'requirements.txt')

In [None]:
!ls {DIR}/function

function_commit.zip  main.py  requirements.txt


In [None]:
with zipfile.ZipFile(f'{DIR}/function/function_commit.zip', mode = 'r') as archive:
    archive.printdir()

File Name                                             Modified             Size
main.py                                        2023-02-28 12:16:14         7082
requirements.txt                               2023-02-28 12:16:12           60


### Move Files to GCS

Expects a bucket with the same name as the project:

In [None]:
bucket = gcs.bucket(PROJECT_ID)
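
If the bucket might not exist yet, a lookup followed by a create is one way to handle it. This is a sketch under assumptions: `get_or_create_bucket` is a hypothetical helper built on the storage client's `lookup_bucket` and `create_bucket` methods, and `FakeStorageClient` is a stand-in so the example runs without a real project.

```python
def get_or_create_bucket(client, name, location):
    """Return the bucket called `name`, creating it in `location` if it is missing."""
    bucket = client.lookup_bucket(name)  # returns None when the bucket does not exist
    if bucket is None:
        bucket = client.create_bucket(name, location = location)
    return bucket

# Stand-in client (hypothetical, for a dry run only).
class FakeStorageClient:
    def __init__(self):
        self.buckets = {}

    def lookup_bucket(self, name):
        return self.buckets.get(name)

    def create_bucket(self, name, location = None):
        self.buckets[name] = {'name': name, 'location': location}
        return self.buckets[name]

fake = FakeStorageClient()
created = get_or_create_bucket(fake, 'my-project', 'us-central1')
found = get_or_create_bucket(fake, 'my-project', 'us-central1')
print(created is found)
# prints: True
```

With the real client this would be `get_or_create_bucket(gcs, PROJECT_ID, REGION)`. Bucket names are global across Google Cloud, so creation fails if another project already owns the name.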

In [None]:
SOURCEPATH = f'architectures/tracking/setup/github'
blob = bucket.blob(f'{SOURCEPATH}/function_commit.zip')
blob.upload_from_filename(f'{DIR}/function/function_commit.zip')

In [None]:
list(bucket.list_blobs(prefix = f'{SOURCEPATH}'))

[<Blob: vertex-ai-mlops-369716, architectures/tracking/setup/github/function_commit.zip, 1677586594050541>]

In [None]:
print(f"View the bucket directly here:\nhttps://console.cloud.google.com/storage/browser/{PROJECT_ID}/{SOURCEPATH};tab=objects&project={PROJECT_ID}")

View the bucket directly here:
https://console.cloud.google.com/storage/browser/vertex-ai-mlops-369716/architectures/tracking/setup/github;tab=objects&project=vertex-ai-mlops-369716


### Service Account
The Cloud Function runs as a service account.  Gen 1 Cloud Functions use the App Engine default service account unless another one is specified; this notebook uses a dedicated account instead.  The account needs to be able to read from and write to BigQuery and to read secrets from Secret Manager.

I used the Console to create a service account for these jobs:
- Console > IAM > Service Accounts
- Create New: name = `metrics-runner`
- roles = BigQuery Admin, Secret Manager Secret Accessor



In [None]:
print(f'Review Service Account Details in Console:\nhttps://console.cloud.google.com/iam-admin/serviceaccounts?project={PROJECT_ID}')

Review Service Account Details in Console:
https://console.cloud.google.com/iam-admin/serviceaccounts?project=vertex-ai-mlops-369716


### Create (or Update) Cloud Function

In [None]:
function_name = f'github_metrics_commits'

In [None]:
function = ''
for function in functions_client.list_functions(request = functions_v1.ListFunctionsRequest(parent = f'projects/{PROJECT_ID}/locations/{REGION}')):
    if function.name.endswith(function_name):
        break
    else: function = ''

In [None]:
function

name: "projects/vertex-ai-mlops-369716/locations/us-central1/functions/github_metrics_commits"
source_archive_url: "gs://vertex-ai-mlops-369716/architectures/tracking/setup/github/function_commit.zip"
event_trigger {
  event_type: "providers/cloud.pubsub/eventTypes/topic.publish"
  resource: "projects/vertex-ai-mlops-369716/topics/daily_metrics_triggers"
  service: "pubsub.googleapis.com"
  failure_policy {
  }
}
status: ACTIVE
entry_point: "collect"
timeout {
  seconds: 420
}
available_memory_mb: 256
service_account_email: "metrics-runner@vertex-ai-mlops-369716.iam.gserviceaccount.com"
update_time {
  seconds: 1676847212
  nanos: 97000000
}
version_id: 5
runtime: "python310"
max_instances: 3000
ingress_settings: ALLOW_ALL
build_id: "0354f03f-c84c-497c-8ed3-5631cce37f1e"
secret_environment_variables {
  key: "GITHUB_PAT"
  project_id: "807305962454"
  secret: "github_api"
  version: "latest"
}
build_name: "projects/807305962454/locations/us-central1/builds/0354f03f-c84c-497c-8ed3-5631c

In [None]:
from google.protobuf.duration_pb2 import Duration

functionDef = functions_v1.CloudFunction()
functionDef.name = f'projects/{PROJECT_ID}/locations/{REGION}/functions/{function_name}'
functionDef.source_archive_url = f"gs://{PROJECT_ID}/{SOURCEPATH}/function_commit.zip"
functionDef.event_trigger = functions_v1.EventTrigger()
functionDef.event_trigger.event_type = 'providers/cloud.pubsub/eventTypes/topic.publish'
functionDef.event_trigger.resource = topic.name
functionDef.runtime = 'python310'
functionDef.entry_point = 'collect'
functionDef.timeout = Duration(seconds = 420)
functionDef.service_account_email = f"metrics-runner@{PROJECT_ID}.iam.gserviceaccount.com"
functionDef.secret_environment_variables = [functions_v1.SecretEnvVar(
    key = 'GITHUB_PAT',
    secret = 'github_api'
)]

In [None]:
functionDef

name: "projects/vertex-ai-mlops-369716/locations/us-central1/functions/github_metrics_commits"
source_archive_url: "gs://vertex-ai-mlops-369716/architectures/tracking/setup/github/function_commit.zip"
event_trigger {
  event_type: "providers/cloud.pubsub/eventTypes/topic.publish"
  resource: "projects/vertex-ai-mlops-369716/topics/daily_metrics_triggers"
}
entry_point: "collect"
timeout {
  seconds: 420
}
service_account_email: "metrics-runner@vertex-ai-mlops-369716.iam.gserviceaccount.com"
runtime: "python310"
secret_environment_variables {
  key: "GITHUB_PAT"
  secret: "github_api"
}

In [None]:
if function:
    request = functions_v1.UpdateFunctionRequest(
        function = functionDef
    )
    operation = functions_client.update_function(request = request)
else:
    request = functions_v1.CreateFunctionRequest(
        location = f"projects/{PROJECT_ID}/locations/{REGION}",
        function = functionDef
    )
    operation = functions_client.create_function(request = request)

In [None]:
response = operation.result()
print(response)

name: "projects/vertex-ai-mlops-369716/locations/us-central1/functions/github_metrics_commits"
source_archive_url: "gs://vertex-ai-mlops-369716/architectures/tracking/setup/github/function_commit.zip"
event_trigger {
  event_type: "providers/cloud.pubsub/eventTypes/topic.publish"
  resource: "projects/vertex-ai-mlops-369716/topics/daily_metrics_triggers"
  service: "pubsub.googleapis.com"
  failure_policy {
  }
}
status: ACTIVE
entry_point: "collect"
timeout {
  seconds: 420
}
available_memory_mb: 256
service_account_email: "metrics-runner@vertex-ai-mlops-369716.iam.gserviceaccount.com"
update_time {
  seconds: 1677586751
  nanos: 693000000
}
version_id: 6
runtime: "python310"
max_instances: 3000
ingress_settings: ALLOW_ALL
build_id: "c24d64d5-c363-4e84-9b3a-03e016cd363e"
secret_environment_variables {
  key: "GITHUB_PAT"
  project_id: "807305962454"
  secret: "github_api"
  version: "latest"
}
build_name: "projects/807305962454/locations/us-central1/builds/c24d64d5-c363-4e84-9b3a-03e0

In [None]:
print(f'Review the Cloud Function in the console here:\nhttps://console.cloud.google.com/functions/list?env=gen1&project={PROJECT_ID}')

Review the Cloud Function in the console here:
https://console.cloud.google.com/functions/list?env=gen1&project=vertex-ai-mlops-369716


### Manual Run of Cloud Function

Publish a message to the Pub/Sub topic to trigger the Cloud Function, which runs the incremental update.  The code below could run anywhere you want to trigger an update!

The function will receive the message as `event` in the format:
```
{
    '@type': 'type.googleapis.com/google.pubsub.v1.PubsubMessage',
    'attributes': {'key' : 'value', ...},
    'data': <base64 encoded string>
}
```

To handle the `event` and retrieve the inputs of the message three things need to happen:
1. reference the 'data' value as `event['data']`
2. decode the 'data' value with `base64.b64decode(<1>).decode('utf-8')`
3. convert the decoded string into a Python dictionary with `json.loads(<2>)`

This looks like:
```
function_inputs = json.loads(base64.b64decode(event['data']).decode('utf-8'))
```
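
To see the three steps work end to end without deploying anything, the round trip can be simulated locally. This is a minimal sketch: the helper name `decode_event` and the sample `event` are assumptions, with the event built by hand to mimic the format shown above.

```python
import base64
import json

def decode_event(event):
    """Apply the three steps: take event['data'], base64-decode it, parse the JSON."""
    return json.loads(base64.b64decode(event['data']).decode('utf-8'))

# Hand-built event (hypothetical) in the shape the function receives:
# the published bytes arrive base64-encoded under 'data'.
payload = {'PROJECT_ID': 'my-project'}
event = {
    '@type': 'type.googleapis.com/google.pubsub.v1.PubsubMessage',
    'attributes': {'trigger': 'manual'},
    'data': base64.b64encode(json.dumps(payload).encode('utf-8')).decode('utf-8'),
}

print(decode_event(event))
# prints: {'PROJECT_ID': 'my-project'}
```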

In [None]:
function_input = {
    'PROJECT_ID': PROJECT_ID
}

In [None]:
message = json.dumps(function_input)
message = message.encode('utf-8')

In [None]:
future = pubsub_pubclient.publish(topic.name, message, trigger = 'manual')

In [None]:
future.result()

'7043211265006086'

---
## Scheduled Run with Cloud Scheduler

Use Cloud Scheduler to publish a message to the topic at any defined interval, which will cause the Cloud Function to run the incremental update.

Resources:
- List of Time zones - [TZ Database Names](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones)
- Job Frequency - [unix-cron format guide](https://man7.org/linux/man-pages/man5/crontab.5.html)
    - minute hour day_of_month month day_of_week
    - 0 23 * * tue = 11PM every Tuesday
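
As a quick way to read a schedule string, the five fields can be split and labeled in the order listed above. This is an illustrative sketch only: `split_cron` is a hypothetical helper that names the fields and does not validate their values.

```python
CRON_FIELDS = ['minute', 'hour', 'day_of_month', 'month', 'day_of_week']

def split_cron(expression):
    """Label the five whitespace-separated fields of a unix-cron expression."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f'expected {len(CRON_FIELDS)} fields, got {len(parts)}')
    return dict(zip(CRON_FIELDS, parts))

print(split_cron('0 3 * * *'))
# prints: {'minute': '0', 'hour': '3', 'day_of_month': '*', 'month': '*', 'day_of_week': '*'}
```

Read this way, `0 3 * * *` means minute 0 of hour 3 on every day, interpreted in the time zone set on the job.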


In [None]:
schedule_name = 'daily_3am_est'

In [None]:
schedule = ''
for schedule in scheduler_client.list_jobs(parent = f'projects/{PROJECT_ID}/locations/{REGION}'):
    if schedule.name.endswith(schedule_name):
        break
    else: schedule = ''

In [None]:
if schedule:
    print(schedule)
else:
    request = scheduler_v1.CreateJobRequest(
        parent = f'projects/{PROJECT_ID}/locations/{REGION}',
        job = scheduler_v1.Job(
            name = f'projects/{PROJECT_ID}/locations/{REGION}/jobs/{schedule_name}',
            pubsub_target = scheduler_v1.PubsubTarget(
                topic_name = topic.name,
                data = message,
                attributes = {'trigger': 'scheduled'}
            ),
            schedule = '0 3 * * *',
            time_zone = 'America/New_York'
        )
    )
    schedule = scheduler_client.create_job(request = request)
    print(schedule)

name: "projects/vertex-ai-mlops-369716/locations/us-central1/jobs/daily_3am_est"
pubsub_target {
  topic_name: "projects/vertex-ai-mlops-369716/topics/daily_metrics_triggers"
  data: "{\"PROJECT_ID\": \"vertex-ai-mlops-369716\"}"
  attributes {
    key: "trigger"
    value: "scheduled"
  }
}
user_update_time {
  seconds: 1676847544
}
state: ENABLED
schedule: "0 3 * * *"
time_zone: "America/New_York"



In [None]:
print(f'Review the schedule in the console:\nhttps://console.cloud.google.com/cloudscheduler?project={PROJECT_ID}')

Review the schedule in the console:
https://console.cloud.google.com/cloudscheduler?project=vertex-ai-mlops-369716
