This SDK provides a framework to process BigQuery Data Transfer Service Transfer Run requests.
You will need to do the following:
- Install the gcloud SDK
- Enable the following APIs in the Google Cloud Console:
  - BigQuery Data Transfer API
  - Cloud Pub/Sub API

  ```bash
  gcloud services enable bigquerydatatransfer.googleapis.com
  gcloud services enable pubsub.googleapis.com
  ```
- Create an operational IAM Service Account and download its credentials for running your source. Running these commands:
  - Creates a new Service Account named `bq-dts-[SOURCE]@[PROJECT_ID].iam.gserviceaccount.com`
  - Grants `roles/bigquery.admin`, `roles/pubsub.subscriber`, and `roles/storage.objectAdmin`
  - Downloads a Service Account key called `.gcp-service-account.json`

  ```bash
  SOURCE="example-source"
  PROJECT_ID=$(gcloud config get-value core/project)
  PARTNER_SA_NAME="bq-dts-${SOURCE}"
  PARTNER_SA_EMAIL="${PARTNER_SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"

  # Create the Service Account
  gcloud iam service-accounts create ${PARTNER_SA_NAME} --display-name ${PARTNER_SA_NAME}

  # Grant the Service Account the required roles
  gcloud projects add-iam-policy-binding ${PROJECT_ID} --member="serviceAccount:${PARTNER_SA_EMAIL}" --role='roles/bigquery.admin'
  gcloud projects add-iam-policy-binding ${PROJECT_ID} --member="serviceAccount:${PARTNER_SA_EMAIL}" --role='roles/pubsub.subscriber'
  gcloud projects add-iam-policy-binding ${PROJECT_ID} --member="serviceAccount:${PARTNER_SA_EMAIL}" --role='roles/storage.objectAdmin'

  # Create a Service Account key and store it locally; it is needed for starting/running data sources
  gcloud iam service-accounts keys create --iam-account "${PARTNER_SA_EMAIL}" .gcp-service-account.json
  ```
- Create an administrative IAM Service Account and store its credentials locally for creating the data source definition. Running these commands:
  - Creates a new Service Account named `bq-dts-admin@[PROJECT_ID].iam.gserviceaccount.com`
  - Grants `roles/owner`
  - Downloads a Service Account key called `.gcp-service-account-owner.json`

  ```bash
  PROJECT_ID=$(gcloud config get-value core/project)
  PARTNER_SA_NAME="bq-dts-admin"
  PARTNER_SA_EMAIL="${PARTNER_SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"

  # Create the Service Account
  gcloud iam service-accounts create ${PARTNER_SA_NAME} --display-name ${PARTNER_SA_NAME}

  # Grant the Service Account the Project Owner role
  gcloud projects add-iam-policy-binding ${PROJECT_ID} --member="serviceAccount:${PARTNER_SA_EMAIL}" --role='roles/owner'

  # Create a Service Account key and store it locally; it is needed for creating the data source definition
  gcloud iam service-accounts keys create --iam-account "${PARTNER_SA_EMAIL}" .gcp-service-account-owner.json
  ```
- Grant permissions to a GCP-managed Service Account. Running these commands:
  - Creates a custom role `bigquerydatatransfer.connector` with permission `clientauthconfig.clients.getWithSecret`
  - Grants the project-specific role `bigquerydatatransfer.connector` to the GCP-managed Service Account

  ```bash
  PROJECT_ID=$(gcloud config get-value core/project)
  GCP_SA_EMAIL="connectors@bigquery-data-connectors.iam.gserviceaccount.com"

  # Create a custom role
  gcloud iam roles create bigquerydatatransfer.connector --project ${PROJECT_ID} --title "BigQuery Data Transfer Service Connector" --description "Custom role for GCP-managed Service Account for BQ DTS" --permissions "clientauthconfig.clients.getWithSecret" --stage ALPHA

  # Grant the Service Account the custom role
  gcloud projects add-iam-policy-binding ${PROJECT_ID} --member="serviceAccount:${GCP_SA_EMAIL}" --role="projects/${PROJECT_ID}/roles/bigquerydatatransfer.connector"
  ```
- Create an OAuth Consent Screen
- Join the BigQuery Data Transfer Service partner-level whitelists. Reach out to your Google Cloud Platform contact to get whitelisted for these APIs.
```bash
# Requires Python 3.6
brew install python3
virtualenv env --python /usr/local/bin/python3.6
source env/bin/activate
pip install -r requirements.txt
```
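With the dependencies installed, a quick way to confirm that the operational Service Account key from "Before you begin" works is to list datasets with it. This is an optional sanity check, not part of the SDK, and it assumes the `google-cloud-bigquery` package is available in your virtualenv:

```python
from google.cloud import bigquery
from google.oauth2 import service_account

# Load the operational key downloaded in "Before you begin"
credentials = service_account.Credentials.from_service_account_file(
    '.gcp-service-account.json')

# roles/bigquery.admin should allow listing datasets in the project
client = bigquery.Client(project=credentials.project_id, credentials=credentials)
print([dataset.dataset_id for dataset in client.list_datasets()])
```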
- `example/calendar_connector.py` - Implementation of a BQ DTS connector
  - Handles Pub/Sub message ack-deadline extensions
  - Flushes log messages to BQ DTS every `--update-interval` seconds
  - Stages data to Google Cloud Storage before invoking BQ loads
- `example/calendar_connector.yaml` - Connector configuration file
  - `data_source_definition` - YAML representation of a DataSourceDefinition
    - Required for one-time setup of a DataSourceDefinition
    - Used in conjunction with `bin/data_source_definition.py`
  - `imported_data_info` - Partial YAML representation of `StartBigQueryJobsRequest.ImportedDataInfo`
    - Mapping of table names to table schemas
    - `destination_table_id_template` is a Python format string used by `base_connector.templatize_table_name` and `base_connector.table_stager` (see the sketch after this list)
- `example/transfer_run.yaml` - YAML representation of a TransferRun
  - [DEV ONLY] Mimics what would be received via a Pub/Sub subscription
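To make the `destination_table_id_template` behavior concrete, here is a plain-Python sketch of the kind of substitution `base_connector.templatize_table_name` performs. The template string and the `run_date` placeholder are hypothetical illustrations, not values taken from the SDK:

```python
# Hypothetical template; in practice this comes from imported_data_info
# in the connector YAML
destination_table_id_template = 'calendar_events_{run_date}'

# The SDK presumably fills the placeholders with per-TransferRun values
# via standard str.format semantics
table_id = destination_table_id_template.format(run_date='20180501')
print(table_id)  # calendar_events_20180501
```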
Prior to using the examples below, ensure you have set the following environment variables:
- GOOGLE_CLOUD_PROJECT={project-id}
- PYTHONPATH=<path_to_folder_where_virtualenv_is_set>
When working with Data Source Definitions, you must authenticate as the Administrative Service Account, which has the Project Owner role (roles/owner).

- GOOGLE_APPLICATION_CREDENTIALS={path-to/.gcp-service-account-owner.json}
- OAuth client create and list
  - clientauthconfig.clients.create
  - clientauthconfig.clients.list
- Pub/Sub Admin (roles/pubsub.admin)
```bash
# Create
python bin/data_source_definition.py --project-id {project_id} --location-id us --body-yaml example/calendar_connector.yaml create

# List
python bin/data_source_definition.py --project-id {project_id} --location-id us list

# Get
python bin/data_source_definition.py --project-id {project_id} --location-id us --data-source-id {data_source_id} get

# Patch
python bin/data_source_definition.py --project-id {project_id} --location-id us --data-source-id {data_source_id} --update-mask supportedLocationIds,dataSource.updateDeadlineSeconds --body-yaml example/calendar_connector.yaml patch

# Delete
python bin/data_source_definition.py --project-id {project_id} --location-id us --data-source-id {data_source_id} delete
```
When serving BQ DTS requests, you should pass the credentials of the Operational IAM Service Account created in "Before you begin".

- GOOGLE_APPLICATION_CREDENTIALS={path-to/.gcp-service-account.json}
```bash
# Development
python example/calendar_connector.py --gcs-tmpdir gs://{gcs_bucket}/{blob_prefix}/ --transfer-run-yaml example/transfer_run.yaml example/calendar_connector.yaml

# Production
python example/calendar_connector.py --gcs-tmpdir gs://{gcs_bucket}/{blob_prefix}/ --ps-subname bigquerydatatransfer.{datasource-id}.{location-id}.run example/calendar_connector.yaml
```
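In production mode, the connector pulls TransferRun messages from the Pub/Sub subscription named above and must keep each message leased, by extending its ack deadline, until the run finishes. The snippet below is a minimal sketch of that pattern using the `google-cloud-pubsub` client, not the connector's actual implementation; `process_transfer_run` is a hypothetical stand-in for your connector logic:

```python
from google.cloud import pubsub_v1

def serve(project_id, subscription_name, process_transfer_run):
    """Pull one TransferRun message at a time, keeping it leased while working."""
    subscriber = pubsub_v1.SubscriberClient()
    sub_path = subscriber.subscription_path(project_id, subscription_name)

    while True:
        response = subscriber.pull(request={'subscription': sub_path, 'max_messages': 1})
        for msg in response.received_messages:
            # Extend the ack deadline so Pub/Sub does not redeliver the message
            # mid-run; a real connector would re-extend periodically until done
            subscriber.modify_ack_deadline(request={
                'subscription': sub_path,
                'ack_ids': [msg.ack_id],
                'ack_deadline_seconds': 600,
            })
            process_transfer_run(msg.message.data)
            subscriber.acknowledge(request={'subscription': sub_path,
                                            'ack_ids': [msg.ack_id]})
```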
```bash
# Starts up a 3x f1-micro-backed cluster in us-central1
NAME=micro-cluster-ha
ZONE=us-central1-b
ADDITIONAL_ZONES=us-central1-c,us-central1-f
MACHINE_TYPE=f1-micro

gcloud config set compute/zone ${ZONE}
gcloud beta container clusters create ${NAME} --zone ${ZONE} --additional-zones ${ADDITIONAL_ZONES} --machine-type ${MACHINE_TYPE} --num-nodes=1 --enable-autoupgrade --enable-autorepair
```
```bash
# Switch to the GKE context and the default Docker config
kubectl config use-context $(kubectl config get-contexts -o name | grep gke_)
unset DOCKER_TLS_VERIFY
unset DOCKER_HOST
unset DOCKER_CERT_PATH
unset DOCKER_API_VERSION
```
For Docker for Mac's general settings, see https://docs.docker.com/docker-for-mac/#general
```bash
IMAGE_NAME=bq-dts-partner-connector
IMAGE_VERSION=$(date +"%Y%m%d-%H%M")
PROJECT_ID="$(gcloud config get-value project)"
REGISTRY_PREFIX="gcr.io/${PROJECT_ID/://}"
REGISTRY_IMAGE=${REGISTRY_PREFIX}/${IMAGE_NAME}

# Build locally since Container Builder doesn't respect .Dockerfile
docker build -t ${REGISTRY_IMAGE}:${IMAGE_VERSION} .
docker tag ${REGISTRY_IMAGE}:${IMAGE_VERSION} ${REGISTRY_IMAGE}:latest

# Push both the versioned and latest tags to Container Registry
gcloud docker -- push ${REGISTRY_IMAGE}:${IMAGE_VERSION}
gcloud docker -- push ${REGISTRY_IMAGE}:latest
```
- Install Docker for Mac
- Install VirtualBox
- Install kubectl: `gcloud components install kubectl`
- Install Minikube
```bash
# Build the image inside Minikube's Docker daemon
kubectl config use-context minikube
eval $(minikube docker-env)

IMAGE_NAME=bq-dts-partner-connector
docker build -t ${IMAGE_NAME}:latest .
```
```bash
# NOTE - When deploying to your K8s cluster, CHANGE THESE VARIABLES
GCP_SERVICE_ACCOUNT_CREDS=.gcp-service-account.json
BQ_DTS_PARTNER_CONNECTOR_CONFIG=example/imported_data_info.yaml
K8_DEPLOYMENT=kube-deploy.minikube.yaml

# Recreate the Secret holding the Service Account key
kubectl delete secret bq-dts-partner-connector-service-account || true
kubectl create secret generic bq-dts-partner-connector-service-account --from-file ${GCP_SERVICE_ACCOUNT_CREDS}

# Recreate the ConfigMap holding the connector configuration
kubectl delete configmap bq-dts-partner-connector-config || true
kubectl create configmap bq-dts-partner-connector-config --from-file config.yaml=${BQ_DTS_PARTNER_CONNECTOR_CONFIG}

kubectl apply -f ${K8_DEPLOYMENT}
```
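Inside the pod, the Secret and ConfigMap created above surface as mounted files. Here is a hedged sketch of how a connector process might read them at startup; the mount paths are hypothetical and must match the volume mounts in your kube-deploy YAML:

```python
import yaml
from google.oauth2 import service_account

# Hypothetical mount paths - align these with the volumeMounts in your
# kube-deploy.*.yaml
CONFIG_PATH = '/etc/bq-dts/config.yaml'                # ConfigMap: bq-dts-partner-connector-config
CREDS_PATH = '/var/secrets/.gcp-service-account.json'  # Secret: bq-dts-partner-connector-service-account

with open(CONFIG_PATH) as fp:
    config = yaml.safe_load(fp)

credentials = service_account.Credentials.from_service_account_file(CREDS_PATH)
```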
```bash
# Restart the deployment by scaling to zero and back; APPNAME is the
# deployment name, e.g. bq-dts-partner-connector
REPLICAS=$(kubectl get deployment/${APPNAME} -o jsonpath='{.spec.replicas}')
kubectl scale --replicas=0 deployment ${APPNAME}
kubectl scale --replicas=${REPLICAS} deployment ${APPNAME}
```
```bash
# Inspect the deployment and its pods, then tail the connector logs
kubectl describe deployments bq-dts-partner-connector
kubectl describe pods $(kubectl get pods -l app=bq-dts-partner-connector -o name)
kubectl logs -f $(kubectl get pods -l app=bq-dts-partner-connector -o name)
```