
This repository contains code and configuration files for an Extract, Transform, Load (ETL) project using Python for data extraction, Google Cloud Data Fusion for data transformation and masking, Apache Airflow (Cloud Composer) for orchestration, and Google BigQuery for data loading.


prathmeshyelne/ETL-Pipeline-for-Employee-Data-Using-Data-Fusion-Airflow


ETL Project with Data Fusion, Airflow, and BigQuery


Note: I'm using the Faker library to generate demo data instead of actual company data.
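A minimal sketch of generating demo employee records as CSV. To stay dependency-free it uses Python's standard `random` module as a stand-in for Faker (with Faker installed, calls such as `Faker().name()` and `Faker().email()` would produce more realistic values). The column names, sample names, and the SSN-shaped field are hypothetical and not taken from the repository.

```python
import csv
import io
import random

# Hypothetical schema for demo employee records (not from the repository).
FIELDS = ["employee_id", "first_name", "last_name", "email", "ssn", "salary"]

# Small hypothetical name pools; Faker would generate these instead.
FIRST_NAMES = ["Asha", "Ben", "Chen", "Dana", "Eli"]
LAST_NAMES = ["Patel", "Smith", "Wong", "Garcia", "Novak"]


def make_employee(rng: random.Random, employee_id: int) -> dict:
    """Build one fake employee record (stdlib stand-in for Faker data)."""
    first = rng.choice(FIRST_NAMES)
    last = rng.choice(LAST_NAMES)
    return {
        "employee_id": employee_id,
        "first_name": first,
        "last_name": last,
        "email": f"{first}.{last}{employee_id}@example.com".lower(),
        # SSN-shaped fake value: a sensitive-looking field for the masking step.
        "ssn": f"{rng.randint(100, 999)}-{rng.randint(10, 99)}-{rng.randint(1000, 9999)}",
        "salary": rng.randint(40_000, 150_000),
    }


def employees_to_csv(n: int, seed: int = 42) -> str:
    """Generate n fake employees and return them as CSV text with a header."""
    rng = random.Random(seed)  # seeded so the demo data is reproducible
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    for i in range(1, n + 1):
        writer.writerow(make_employee(rng, i))
    return buf.getvalue()


if __name__ == "__main__":
    print(employees_to_csv(3))
```

Seeding a dedicated `random.Random` instance means every run produces the same rows, which keeps demo loads repeatable.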

Overview

The project aims to perform the following tasks:

  1. Data Extraction: Extract employee data using Python.
  2. Data Masking: Apply data masking and encoding techniques to sensitive information in Cloud Data Fusion before loading it into BigQuery.
  3. Data Loading: Load the transformed data into Google BigQuery tables.
  4. Orchestration: Automate the complete data pipeline using Airflow (Cloud Composer).
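In this project the masking and encoding happen inside Cloud Data Fusion, not in custom Python. The plain-Python sketch below only illustrates the kind of transformation that step applies to a record; the field names, the salt, and which technique is applied to which field are assumptions for illustration.

```python
import base64
import hashlib


def mask_ssn(ssn: str) -> str:
    """Keep only the last four characters, e.g. '123-45-6789' -> 'XXX-XX-6789'."""
    return "".join("X" if ch.isdigit() else ch for ch in ssn[:-4]) + ssn[-4:]


def encode_base64(value: str) -> str:
    """Reversible Base64 encoding: obscures the value but does NOT protect it."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def pseudonymize(value: str, salt: str) -> str:
    """One-way salted SHA-256 hash; the same input and salt give the same key."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def mask_record(record: dict, salt: str = "demo-salt") -> dict:
    """Return a copy of an employee record with (hypothetical) sensitive fields transformed.

    The default salt is a placeholder for illustration only.
    """
    masked = dict(record)  # leave the input record unchanged
    masked["ssn"] = mask_ssn(record["ssn"])
    masked["email"] = pseudonymize(record["email"], salt)
    masked["last_name"] = encode_base64(record["last_name"])
    return masked


if __name__ == "__main__":
    rec = {"employee_id": 1, "last_name": "Smith",
           "email": "smith1@example.com", "ssn": "123-45-6789"}
    print(mask_record(rec))
```

Masking and salted hashing are one-way, while Base64 is an encoding that anyone can reverse, so it should not be relied on by itself to protect sensitive values.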


Architecture

[Architecture diagram]
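The orchestration layer's main job is to run the pipeline stages in dependency order. As a hedged, dependency-free sketch, the code below models a hypothetical three-task flow with Python's standard `graphlib` module (Python 3.9+) and computes one valid execution order. The task names and the intermediate Cloud Storage staging step are assumptions, not taken from the repository. In an actual Composer DAG the same ordering would be declared between operator instances with Airflow's `>>` operator.

```python
from graphlib import TopologicalSorter

# Hypothetical task names; each maps to one stage of the pipeline.
# Keys are tasks, values are the set of tasks that must finish first.
PIPELINE = {
    "extract_employee_data": set(),                   # Python/Faker script
    "upload_to_gcs": {"extract_employee_data"},       # assumed staging bucket
    "run_datafusion_pipeline": {"upload_to_gcs"},     # masking + BigQuery load
}


def run_order(graph: dict) -> list:
    """Return one valid execution order; static_order() raises CycleError on a cycle."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(" >> ".join(run_order(PIPELINE)))
```

Because the flow is a simple chain, there is exactly one valid order; a scheduler such as Airflow additionally handles retries, scheduling, and parallel branches, which this sketch does not model.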
