Up and running

Initialize the environment as described in the repository root README.

List the container images built during environment initialization to confirm they are available:

# Using local setup
docker images | grep extract_load_trips_from_tlc_to_gs

# Using cloud setup
gcloud artifacts docker images list $GCP_CONTAINER_REGISTRY_URL | grep extract_load_trips_from_tlc_to_gs

Set the container image on the orchestrator instance for the "extract_load_trips_from_tlc_to_gs" DAGs:

# Using local setup
IMAGE=extract_load_trips_from_tlc_to_gs:latest
airflow variables set docker_image_extract_load_trips_from_tlc_to_gs $IMAGE

# Using cloud setup
IMAGE=us-central1-docker.pkg.dev/dtc-de-project-383119/airflow-docker-operators/extract_load_trips_from_tlc_to_gs:latest
gcloud composer environments update "$COMPOSER_ENV_NAME" \
    --location "$COMPOSER_ENV_LOCATION" \
    --update-env-variables=docker_image_extract_load_trips_from_tlc_to_gs=$IMAGE

Parameters in ad-hoc/manual DagRuns

Example DAG configuration for the lightest run (a single task):

{
    "cloud_run_jobs_parent": null,
    "data_bucket_name": null,
    "vehicle_types": ["green"],
    "years": [2023]
}

Example DAG configuration for eight dynamic tasks (2 vehicle types × 4 years):

{
    "cloud_run_jobs_parent": null,
    "data_bucket_name": null,
    "vehicle_types": ["green", "yellow"],
    "years": [2019, 2020, 2021, 2022]
}
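The task count in both examples is the cross product of vehicle_types and years: 1 × 1 = 1 task and 2 × 4 = 8 tasks. The sketch below shows this expansion. It assumes the DAG maps one task per (vehicle_type, year) pair. The helper name expand_tasks is hypothetical and is not taken from the DAG code.

```python
import itertools
import json


def expand_tasks(conf: dict) -> list:
    """Hypothetical helper: one (vehicle_type, year) pair per dynamically mapped task."""
    return list(itertools.product(conf["vehicle_types"], conf["years"]))


# The two example configurations above, as Python dicts (JSON null -> None).
single = {
    "cloud_run_jobs_parent": None,
    "data_bucket_name": None,
    "vehicle_types": ["green"],
    "years": [2023],
}
eight = {
    "cloud_run_jobs_parent": None,
    "data_bucket_name": None,
    "vehicle_types": ["green", "yellow"],
    "years": [2019, 2020, 2021, 2022],
}

print(len(expand_tasks(single)))  # 1
print(len(expand_tasks(eight)))   # 8

# JSON string suitable for passing as DagRun conf from the CLI.
print(json.dumps(eight))
```

In the local setup, the same JSON can be passed to a manual run with `airflow dags trigger --conf '<json>' <dag_id>`. Here `<dag_id>` is a placeholder for the actual DAG id.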

Notes on environment variables:

  • cloud_run_jobs_parent and data_bucket_name are parameters that default to Airflow variables set as environment variables on the orchestration instance, in both the local and cloud setups. You can check the values of AIRFLOW_VAR_CLOUD_RUN_JOBS_PARENT and AIRFLOW_VAR_DATA_BUCKET_NAME in the environment file generated by the initialization script. Read the repo root README.
  • The values of these Airflow variables can be overridden in ad-hoc/manual DagRuns as DAG parameters. They cannot be modified from the Variables view of the Airflow UI, because Airflow variables defined as environment variables are not visible in the UI. Read https://airflow.apache.org/docs/apache-airflow/stable/howto/variable.html#storing-variables-in-environment-variables
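The fallback described in these notes can be sketched as follows. An explicit DagRun value wins. Otherwise the value comes from the environment variable `AIRFLOW_VAR_<KEY>` with the key upper-cased, which is the naming Airflow uses for variables stored in environment variables. This is a stdlib-only sketch. The helper resolve_param and the bucket values are hypothetical, and the real DAGs may read these through Airflow's own Variable API instead.

```python
import os
from typing import Any, Mapping, Optional


def resolve_param(conf: Mapping[str, Any], name: str,
                  environ: Optional[Mapping[str, str]] = None) -> Any:
    """Hypothetical helper: DagRun override first, then AIRFLOW_VAR_<NAME> env var."""
    environ = os.environ if environ is None else environ
    value = conf.get(name)
    if value is not None:
        # Explicit value supplied in an ad-hoc/manual DagRun.
        return value
    # Airflow looks up variable key "data_bucket_name" as AIRFLOW_VAR_DATA_BUCKET_NAME.
    return environ.get(f"AIRFLOW_VAR_{name.upper()}")


# Hypothetical environment, standing in for the file generated at initialization.
env = {"AIRFLOW_VAR_DATA_BUCKET_NAME": "bucket-from-env"}

print(resolve_param({"data_bucket_name": None}, "data_bucket_name", env))
print(resolve_param({"data_bucket_name": "override-bucket"}, "data_bucket_name", env))
```

Passing `null` in the DagRun conf therefore keeps the environment default, while any non-null value overrides it for that run only.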

DAGs in Composer

In Composer, DAG execution isolation is implemented with Cloud Run Jobs. The jobs are launched through a virtual-environment Airflow operator that requires the google-cloud-run package.
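A minimal sketch of the kind of callable such an operator could run is shown below. It rests on several assumptions:

* cloud_run_jobs_parent has the form `projects/<project>/locations/<region>`.
* The job id and container args are hypothetical.
* The installed google-cloud-run version supports per-execution overrides (`RunJobRequest.Overrides`).

The fully qualified job name is built outside the callable. A virtualenv operator runs only the callable's own source in a fresh interpreter, so the google-cloud-run import stays inside it.

```python
def cloud_run_job_name(parent: str, job_id: str) -> str:
    """Build 'projects/<p>/locations/<r>/jobs/<job_id>' from an assumed parent format."""
    return f"{parent.rstrip('/')}/jobs/{job_id}"


def run_cloud_run_job(job_name: str, args: list) -> str:
    """Run a Cloud Run Job with overridden container args and wait for it to finish.

    Imports are local because this function is meant to execute inside a
    virtualenv where google-cloud-run is installed.
    """
    from google.cloud import run_v2

    client = run_v2.JobsClient()
    request = run_v2.RunJobRequest(
        name=job_name,
        # Assumes a google-cloud-run version with execution overrides.
        overrides=run_v2.RunJobRequest.Overrides(
            container_overrides=[
                run_v2.RunJobRequest.Overrides.ContainerOverride(args=args)
            ]
        ),
    )
    operation = client.run_job(request=request)  # long-running operation
    execution = operation.result()  # blocks until the operation completes
    return execution.name


# Hypothetical values, for illustration only.
name = cloud_run_job_name("projects/my-project/locations/us-central1",
                          "extract-load-trips")
print(name)
```

In a DAG, run_cloud_run_job could be passed to `PythonVirtualenvOperator` with `requirements=["google-cloud-run"]` and `op_kwargs={"job_name": name, "args": [...]}`. This assumes that is the virtualenv operator the project uses.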

TODO: investigate why splitting the steps into three separate tasks raises an error. A single-task DAG is used as a temporary workaround.

2023-05-18: added support for Cloud Batch, following the same approach as Cloud Run Jobs.