Includes:

- BigQueryToGCSOperator: exports BigQuery tables into Google Cloud Storage (the example exports partitioned tables).
- PythonOperator: uses GoogleCloudStorageHook in a custom callable to compose the exported partition text files into a single file.
- GCSDeleteObjectsOperator: deletes the intermediate objects from a given bucket and prefix.
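Below is a minimal sketch of how these three operators could be wired together, assuming the environment variables described later in this README and hypothetical object names (the `exports/` prefix and `merged/export.csv` target). It uses the current `GCSHook` name from `apache-airflow-providers-google`; adjust imports and parameters to match your provider version and the actual DAG in this repo.

```python
import os
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.google.cloud.hooks.gcs import GCSHook
from airflow.providers.google.cloud.operators.gcs import GCSDeleteObjectsOperator
from airflow.providers.google.cloud.transfers.bigquery_to_gcs import BigQueryToGCSOperator

PROJECT = os.environ["GCP_PROJECT_ID"]
DATASET = os.environ["GCP_BIGQUERY_DATASET_NAME"]
TABLE = os.environ["BIGQUERY_TABLE_NAME"]
BUCKET = os.environ["GCP_BIGQUERY_EXPORT_BUCKET_NAME"]

EXPORT_PREFIX = "exports/"  # hypothetical prefix for the exported partition files


def compose_partitions(**_):
    """Merge the exported partition files into a single object (sketch).

    Note: GCS compose accepts at most 32 source objects per call; chunk the
    list if the export produces more shards.
    """
    hook = GCSHook(gcp_conn_id="google_cloud_default")
    parts = hook.list(BUCKET, prefix=EXPORT_PREFIX)
    hook.compose(
        bucket_name=BUCKET,
        source_objects=parts,
        destination_object="merged/export.csv",  # hypothetical target name
    )


with DAG(
    dag_id="bigquery_export_compose_cleanup",
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,  # run on demand, as described below
    catchup=False,
) as dag:
    export = BigQueryToGCSOperator(
        task_id="export_table_to_gcs",
        source_project_dataset_table=f"{PROJECT}.{DATASET}.{TABLE}",
        # The wildcard lets BigQuery shard the export into multiple files.
        destination_cloud_storage_uris=[f"gs://{BUCKET}/{EXPORT_PREFIX}part-*.csv"],
        export_format="CSV",
    )

    compose = PythonOperator(
        task_id="compose_partitions",
        python_callable=compose_partitions,
    )

    cleanup = GCSDeleteObjectsOperator(
        task_id="delete_partition_files",
        bucket_name=BUCKET,
        prefix=EXPORT_PREFIX,
    )

    export >> compose >> cleanup
```

The cleanup task deletes only the objects under the export prefix, so the composed file written under a different prefix survives the run.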
Initialize Airflow in Docker. The init script will ask about installing yq in your Linux/Mac development environment:
./init.sh
If you don't want to use an external tool to edit your docker-compose file, set the GCP_PROJECT_ID, GCP_BIGQUERY_DATASET_NAME, GCP_BIGQUERY_EXPORT_BUCKET_NAME, and BIGQUERY_TABLE_NAME environment variables manually. You can do this by adding these lines to the environment section of the x-airflow-common service in docker-compose.yaml:

    GCP_PROJECT_ID: '<YOUR-PROJECT-ID>'
    GCP_BIGQUERY_DATASET_NAME: '<YOUR-DATASET>'
    GCP_BIGQUERY_EXPORT_BUCKET_NAME: '<YOUR-BUCKET-NAME>'
    BIGQUERY_TABLE_NAME: '<YOUR-TABLE-NAME>'
docker-compose up
Then open your local Airflow web UI.
In the Airflow UI, set up your GCP connection (Admin > Connections), following the instructions in the Google provider documentation.
Trigger the DAGs on demand and watch the pipeline run.