
Airflow-data-engineering-with-BigQuery-and-dbt

Fetch data from a simple CSV file, load it into a GCP BigQuery table, run dbt to automate the DWH, and run Soda to check data quality.

[Figure: retail_data_dag — the Airflow DAG for this project]

Get Started

  • create a Python venv
  • activate it
  • install dependencies from the requirements.txt file
  • configure Airflow in airflow.cfg
  • create a GCP service account, add a key, and download the key in JSON format (see the command sketch below)
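A minimal setup sketch for Linux/macOS. The service-account name (retail-sa) and all paths are placeholders; the key can equally be created from the GCP console.

```bash
# create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate

# install the project dependencies
pip install -r requirements.txt

# create a GCP service account and download a JSON key
# (retail-sa and the paths are placeholders; the GCP console works too)
gcloud iam service-accounts create retail-sa --project <your-gcp-project>
gcloud iam service-accounts keys create ~/keys/retail-sa.json \
    --iam-account retail-sa@<your-gcp-project>.iam.gserviceaccount.com
```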

Airflow Webserver

  • In the Variables section, add the following three variables (CLI equivalents below):
    • gcp_project
    • gcp_bigquery_retail_dataset
    • gcp_account: path to the downloaded JSON key file
  • In the Connections section, add a new GCP connection:
    • connection name: my_gcp_conn
    • value: the content of the downloaded service account JSON file
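The same configuration can be done from the Airflow CLI. The values below are placeholders, and the exact key names accepted in --conn-extra depend on your Airflow/Google provider version:

```bash
# set the three Variables (values are placeholders)
airflow variables set gcp_project <your-gcp-project>
airflow variables set gcp_bigquery_retail_dataset <your-dataset>
airflow variables set gcp_account /path/to/retail-sa.json

# add the GCP connection (extra-field key names vary by provider version)
airflow connections add my_gcp_conn \
    --conn-type google_cloud_platform \
    --conn-extra '{"extra__google_cloud_platform__key_path": "/path/to/retail-sa.json"}'
```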

Configure dbt and Soda in Airflow

  • At the bottom of dags/data_retail_project.py, modify the bash command with the dbt and Soda project directories, and the dbt and Soda environments from which dbt and Soda will run (an illustrative command is sketched below).
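The command to adapt has roughly the shape below; every path, environment name, data source name, and YAML file name is a placeholder to replace with your own:

```bash
# run dbt from its own virtualenv and project directory
source /path/to/dbt_env/bin/activate && cd /path/to/dbt_project && dbt run

# run a Soda scan from its own virtualenv and project directory
# (the data source name and YAML file names are placeholders)
source /path/to/soda_env/bin/activate && cd /path/to/soda_project && \
    soda scan -d retail -c configuration.yml checks.yml
```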

Run

  • airflow webserver
  • airflow scheduler
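For example, in two separate terminals with the venv activated (the port flag is optional and defaults to 8080):

```bash
airflow webserver --port 8080
airflow scheduler
```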
