# BigQuery + Cloud Functions: how to run your queries as soon as a new Google Analytics table is available

https://towardsdatascience.com/bigquery-cloud-functions-how-to-run-your-queries-as-soon-as-a-new-google-analytics-table-is-17fbb62f8aaa



In [None]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

In [None]:
project_number=!gcloud projects describe $(gcloud config get-value project) --format='value(projectNumber)'
PROJECT_NUMBER = project_number[0]
PROJECT_NUMBER

In [29]:
LOCATION = "us-central1"
APP_NAME = "bq-eventarc-queries-demo"

SVC_ACCOUNT_NAME=APP_NAME
SVC_ACCOUNT_EMAIL=f"{APP_NAME}@{PROJECT_ID}.iam.gserviceaccount.com"

DATASET_NAME = "bq_eventarc_queries_demo"
TABLE_NAME = "loan_201"

TOPIC_NAME = "bq-load-events-topic"

BUCKET=APP_NAME

BUCKET_NAME=f"{PROJECT_ID}-{APP_NAME}"

In [8]:
DEFAULT_SVC_ACCOUNT_EMAIL=f"{PROJECT_NUMBER}-compute@developer.gserviceaccount.com"

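In [None]:
# Google Cloud Storage client
from google.cloud import storage
from google.cloud.exceptions import NotFound  # raised by get_bucket() when the bucket is missing
storage_client = storage.Client(project=PROJECT_ID)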

## Setup 

### Google Cloud

In [None]:
def check_and_create_bucket(bucket_name, location):
    try:
        storage_client.get_bucket(bucket_name)
        print(f"Bucket {bucket_name} already exists.")
    except NotFound:
        bucket = storage_client.create_bucket(bucket_or_name=bucket_name, location=location)
        print(f"Bucket {bucket_name} created.")

In [None]:
check_and_create_bucket(BUCKET_NAME, LOCATION)

1. Set up GCP: run `00_setup_env.sh` to enable APIs and create the GCS bucket
2. Set up BQ: run `01_setup_bq.sh` to ingest sample data into the GCS bucket and create the target BQ dataset

#### Create Service account


refs 

* https://cloud.google.com/bigquery/docs/access-control

In [None]:
!gcloud iam service-accounts create $SVC_ACCOUNT_NAME --project $PROJECT_ID

In [None]:
## Grant the service account the BigQuery Admin role
## (use `roles/bigquery.jobUser` instead if minimal permissions are sufficient)
!gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member "serviceAccount:{SVC_ACCOUNT_EMAIL}" \
  --role "roles/bigquery.admin" \
  --condition="None"

### Create BQ scheduled query 

#TODO @justinjm - create scheduled query as part of setup BQ

refs 

* https://cloud.google.com/bigquery/docs/scheduling-queries#python_1
* https://cloud.google.com/bigquery/docs/access-control
* https://cloud.google.com/iam/docs/manage-access-service-accounts#iam-view-access-sa-gcloud

API

* https://cloud.google.com/python/docs/reference/bigquerydatatransfer/latest/google.cloud.bigquery_datatransfer_v1.types.TransferConfig

In [None]:
## TODO - add steps to manually create via ui and then update code here  to list 
# and get the transferID for later on
# from google.cloud import bigquery_datatransfer

# transfer_client = bigquery_datatransfer.DataTransferServiceClient()

# # The project where the query job runs is the same as the project
# # containing the destination dataset.
# project_id = PROJECT_ID
# dataset_id = DATASET_NAME

# # This service account will be used to execute the scheduled queries. Omit
# # this request parameter to run the query as the user with the credentials
# # associated with this client.
# service_account_name = SVC_ACCOUNT_EMAIL

# # Use standard SQL syntax for the query.
# query_string = f"""
# SELECT * FROM `{PROJECT_ID}.{DATASET_NAME}.{TABLE_NAME}` LIMIT 10
# """

# parent = transfer_client.common_project_path(project_id)

# transfer_config = bigquery_datatransfer.TransferConfig(
#     destination_dataset_id=dataset_id,
#     display_name="bq-eventarc-driven-query-demo",
#     data_source_id="scheduled_query",
#     params={
#         "query": query_string,
#         "destination_table_name_template": "processed_{run_date}",
#         "write_disposition": "WRITE_TRUNCATE",

#     },
#     # schedule=None,
# )

# transfer_config = transfer_client.create_transfer_config(
#     bigquery_datatransfer.CreateTransferConfigRequest(
#         parent=parent,
#         transfer_config=transfer_config,
#         service_account_name=service_account_name,
#     )
# )

# print("Created scheduled query '{}'".format(transfer_config.name))
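
The Cloud Function later in this notebook needs the transfer ID of the scheduled query. As a minimal sketch of how that ID could be looked up once the scheduled query exists, the snippet below lists the project's transfer configs and extracts the trailing ID from each resource name. The helper names `transfer_id_from_name` and `find_transfer_ids` are hypothetical, and the sketch assumes a recent release of `google-cloud-bigquery-datatransfer` and that the configs are visible under the project-level parent.

```python
def transfer_id_from_name(resource_name: str) -> str:
    """Return the trailing ID of a transfer config resource name."""
    # Resource names look like
    # projects/<number>/locations/<location>/transferConfigs/<transfer_id>
    return resource_name.rstrip("/").rsplit("/", 1)[-1]


def find_transfer_ids(project_id: str, display_name: str) -> list:
    """Return transfer IDs of configs with the given display name (hypothetical helper)."""
    # Imported lazily so the parsing helper above works without the library installed.
    from google.cloud import bigquery_datatransfer

    client = bigquery_datatransfer.DataTransferServiceClient()
    parent = client.common_project_path(project_id)
    return [
        transfer_id_from_name(config.name)
        for config in client.list_transfer_configs(parent=parent)
        if config.display_name == display_name
    ]
```

Calling `find_transfer_ids(PROJECT_ID, "bq-eventarc-driven-query-demo")` requires credentials with access to the BigQuery Data Transfer API; the returned ID is the value to place in `transferid` in `functions/main.py`.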

## Setup Query Workflow 

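### Configure Cloud Logging filter

The filter matches the audit log entry written when a BigQuery load job into the target table completes.

Demo version: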

```txt
protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.load.destinationTable.datasetId="bq_eventarc_queries_demo"
protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.load.destinationTable.projectId="demos-vertex-ai"
protoPayload.methodName="jobservice.jobcompleted"
protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.load.destinationTable.tableId:"loan_201"
```


Google Analytics Version: 

```txt
protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.load.destinationTable.datasetId="REPLACE_WITH_YOUR_DATASET_ID"
protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.load.destinationTable.projectId="REPLACE_WITH_YOUR_PROJECT_ID"
protoPayload.authenticationInfo.principalEmail="analytics-processing-dev@system.gserviceaccount.com"
protoPayload.methodName="jobservice.jobcompleted"
protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.load.destinationTable.tableId:"ga_sessions"
NOT protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.load.destinationTable.tableId:"ga_sessions_intraday"
```

refs

* https://cloud.google.com/logging/docs/view/building-queries
* https://cloud.google.com/logging/docs/view/logging-query-language
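
Both filter versions share the same field prefix and differ only in their values, so the filter can be assembled from parameters rather than typed by hand. The following is a minimal sketch under that assumption; `build_load_filter` and its parameter names are hypothetical, not part of any Google library.

```python
def build_load_filter(project_id, dataset_id, table_prefix,
                      exclude_prefix=None, principal_email=None):
    """Assemble a Cloud Logging filter for completed BigQuery load jobs."""
    base = ("protoPayload.serviceData.jobCompletedEvent.job."
            "jobConfiguration.load.destinationTable")
    clauses = [
        f'{base}.datasetId="{dataset_id}"',
        f'{base}.projectId="{project_id}"',
        'protoPayload.methodName="jobservice.jobcompleted"',
        # ":" is the "has" operator, i.e. a substring match on the table ID
        f'{base}.tableId:"{table_prefix}"',
    ]
    if principal_email:
        clauses.append(
            f'protoPayload.authenticationInfo.principalEmail="{principal_email}"')
    if exclude_prefix:
        clauses.append(f'NOT {base}.tableId:"{exclude_prefix}"')
    # Explicit AND between clauses lets the filter fit on a single line
    return " AND ".join(clauses)


demo_filter = build_load_filter("demos-vertex-ai", "bq_eventarc_queries_demo", "loan_201")
print(demo_filter)
```

Joining the clauses with an explicit `AND` produces a single-line string that can be passed to the `--log-filter` flag of `gcloud logging sinks create`.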

In [None]:
## Create Pub/Sub topic
!gcloud pubsub topics create $TOPIC_NAME

In [None]:
## create a log sink that routes entries matching the filter above to the Pub/Sub topic
## note: use {VAR} or $VAR for Python variables; ${VAR} is not expanded correctly by IPython
!gcloud logging sinks create bq-load-events-sink "pubsub.googleapis.com/projects/{PROJECT_ID}/topics/{TOPIC_NAME}" \
    --log-filter='protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.load.destinationTable.datasetId="bq_eventarc_queries_demo" AND protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.load.destinationTable.projectId="demos-vertex-ai" AND protoPayload.methodName="jobservice.jobcompleted" AND protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.load.destinationTable.tableId:"loan_201"'

In [None]:
# Grant `serviceAccount:service-PROJECT_NUMBER@gcp-sa-logging.iam.gserviceaccount.com` the Pub/Sub Publisher role
# so the sink can publish. This binding is project-wide; the role can instead be granted on the topic alone.
# More information about sinks can be found at https://cloud.google.com/logging/docs/export/configure_export
!gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:service-{PROJECT_NUMBER}@gcp-sa-logging.iam.gserviceaccount.com" \
  --role="roles/pubsub.publisher" \
  --project=$PROJECT_ID \
  --condition=None

### Create Cloud Function 


Create and deploy a Cloud Function from the source code in the [functions](functions/) directory.



### Create Cloud Function files

First, create the function's source files: `main.py` and `requirements.txt`.

In [None]:
# ! rm -rf functions/
# -p avoids an error when the directory already exists
!mkdir -p functions

In [None]:
%%writefile functions/main.py
import time
from google.protobuf.timestamp_pb2 import Timestamp
from google.cloud import bigquery_datatransfer_v1

def runQuery(event, context):
    """Start a manual run of the scheduled query when the trigger fires."""
    # The trigger arguments are not used; the function only starts the transfer run.
    client = bigquery_datatransfer_v1.DataTransferServiceClient()
    projectid = '746038361521' # Enter your projectID here
    transferid = '670dbb89-0000-27e1-9de1-883d24f77884'  # Enter your transferId here
    parent = client.project_transfer_config_path(projectid, transferid)
    # Request a run 10 seconds from now
    start_time = Timestamp(seconds=int(time.time() + 10))
    response = client.start_manual_transfer_runs(parent, requested_run_time=start_time)
    print(response)

# do not forget to put google-cloud-bigquery-datatransfer==1 in the requirements.txt
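
The function above ignores the incoming message, but the log entry routed by the sink identifies which table was loaded. As a hedged sketch, the snippet below decodes the base64-encoded Pub/Sub payload and reads the destination table from the same field path used in the logging filter. The helper name `destination_table` is hypothetical, and where the base64 string lives depends on the function signature (for example `event["data"]` in a background function, or `cloud_event.data["message"]["data"]` in a CloudEvent function).

```python
import base64
import json


def destination_table(pubsub_data_b64: str) -> tuple:
    """Return (project, dataset, table) from a base64-encoded log entry."""
    entry = json.loads(base64.b64decode(pubsub_data_b64).decode("utf-8"))
    # Same field path as in the Cloud Logging filter
    table = (entry["protoPayload"]["serviceData"]["jobCompletedEvent"]["job"]
             ["jobConfiguration"]["load"]["destinationTable"])
    return table["projectId"], table["datasetId"], table["tableId"]


# Local check with a synthetic payload shaped like the filtered log entry
sample_entry = {"protoPayload": {"serviceData": {"jobCompletedEvent": {"job": {
    "jobConfiguration": {"load": {"destinationTable": {
        "projectId": "demos-vertex-ai",
        "datasetId": "bq_eventarc_queries_demo",
        "tableId": "loan_201",
    }}}}}}}}
encoded = base64.b64encode(json.dumps(sample_entry).encode("utf-8")).decode("ascii")
print(destination_table(encoded))
```

Reading the table ID this way would let the function skip unrelated tables or pass the table name on to the query, without changing the sink.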

In [None]:
%%writefile functions/requirements.txt
google-cloud-bigquery-datatransfer==1

### Deploy Cloud Function

In [None]:
!gcloud functions deploy bq-eventarc-driven-queries-demo \
  --gen2 \
  --region=us-central1 \
  --runtime=python311 \
  --source=functions/ \
  --entry-point=runQuery \
  --trigger-topic=$TOPIC_NAME \
  --timeout=540 \
  --no-allow-unauthenticated

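#### Test

Test the workflow by loading a new table into BigQuery; the completed load job should produce the log entry that triggers the function.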

In [51]:
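# Python variables, so later `!` commands can interpolate them as $NAME;
# a shell assignment made with `!` does not persist between commands
DATA_FILE_CSV = "loan_201.csv"
# copied from above to match args file
BQ_DATASET = "bq_eventarc_queries_demo"
BQ_TABLE_DATA = "loan_201"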

In [None]:
### create table - target table
!bq load \
    --autodetect=TRUE \
    --skip_leading_rows=1 \
    bq_eventarc_queries_demo.loan_201 \
    gs://demos-vertex-ai-bq-eventarc-driven-queries/$DATA_FILE_CSV
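
After the load job finishes, one way to confirm that the function fired is to look at the runs recorded for the scheduled query. The sketch below is an assumption-laden illustration rather than part of the original workflow: `transfer_config_name` and `list_run_states` are hypothetical helpers, the code assumes a recent release of `google-cloud-bigquery-datatransfer`, and a config created in a specific location may need a resource name that includes `locations/<location>`.

```python
def transfer_config_name(project: str, transfer_id: str) -> str:
    """Build the resource name of a transfer config (no location segment)."""
    return f"projects/{project}/transferConfigs/{transfer_id}"


def list_run_states(project: str, transfer_id: str, limit: int = 5) -> list:
    """Return (run_time, state) pairs for the first few runs the API returns."""
    # Imported lazily so the name helper above works without the library installed.
    from google.cloud import bigquery_datatransfer

    client = bigquery_datatransfer.DataTransferServiceClient()
    runs = client.list_transfer_runs(parent=transfer_config_name(project, transfer_id))
    states = []
    for run in runs:
        states.append((run.run_time, run.state.name))
        if len(states) >= limit:
            break
    return states
```

A run whose time falls shortly after the load job completed suggests the sink, topic, and function are wired together correctly; the function's own logs remain the more direct check.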