Skip to content

Script that clones BigQuery dataset(s) to a different project in different location

Notifications You must be signed in to change notification settings

ilamaaa/bq-dataset-clone

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

16 Commits
 
 
 
 

Repository files navigation

bq-dataset-clone

Clone BigQuery datasets to a different project in different location

Modified from @tswast's answer to https://stackoverflow.com/questions/32767245/how-to-best-handle-data-stored-in-different-locations-in-google-bigquery

Setup

  • Authentication

    https://cloud.google.com/docs/authentication/getting-started (In the sample code I set environment variable for the storage transfer service account creds, and then explicitly specified BigQuery service account creds path - it's probably uncessary to do it twice. You can use just one service account if you want, just need to grant the right access level for both BigQuery and Cloud Storage in your gcloud console - IAM. you can do it in the web interface)

  • Make sure this service account has access to both projects. To grant cross project access you just need to add this initial service account email to both projects: https://stackoverflow.com/questions/35479025/cross-project-management-using-service-account

  • Update all the TODOs in the script

Flow

  • Under FROM_PROJECT, get datasets need to be cloned
  • For each one of the above datasets, extract all the BigQuery tables into a Cloud Storage bucket in the same region as the tables. (Recommend Avro format for best fidelity in data types and fastest loading speed.)
  • Run a storage transfer job to copy the extracted files from the starting location bucket to a bucket in the destination location (TO_BUCKET).
  • Load all the files into a BigQuery dataset located in the destination location under TO_PROJECT.

this does branch

About

Script that clones BigQuery dataset(s) to a different project in different location

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Languages

  • Python 100.0%