This repository contains the code and documentation for an ETL (Extract, Transform, Load) project using Python.
The ETL process consists of the following steps:
- Extraction: Data is extracted from multiple sources, including data lakes and APIs.
- Data Quality Check: Quality checks are performed on the extracted data to ensure its integrity and validity.
- Transformation: Data is transformed according to predefined rules and requirements. This involves merging datasets, calculating new columns, and creating lookup tables.
- Loading: The transformed data is loaded into the target Information Mart.
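The steps above can be sketched as a minimal pandas pipeline. This is an illustrative sketch, not the project's actual code: the dataset names, columns, and the CSV file standing in for the Information Mart are all hypothetical.

```python
import pandas as pd


def extract():
    # Hypothetical extraction step: in the real project, data comes from
    # data lakes and APIs; inline records are used here for illustration.
    orders = pd.DataFrame({"order_id": [1, 2, 3],
                           "customer_id": [10, 20, 10],
                           "amount": [100.0, 250.0, 75.0]})
    customers = pd.DataFrame({"customer_id": [10, 20],
                              "region": ["EMEA", "APAC"]})
    return orders, customers


def quality_check(df, required_columns):
    # Integrity and validity checks: required columns present, no nulls.
    missing = [c for c in required_columns if c not in df.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    if df[required_columns].isna().any().any():
        raise ValueError("Null values found in required columns")
    return df


def transform(orders, customers):
    # Merge datasets and calculate a new column, as described above.
    merged = orders.merge(customers, on="customer_id", how="left")
    merged["amount_with_tax"] = merged["amount"] * 1.2
    return merged


def load(df, path):
    # Load into the target (a CSV file stands in for the Information Mart).
    df.to_csv(path, index=False)


orders, customers = extract()
quality_check(orders, ["order_id", "customer_id", "amount"])
mart = transform(orders, customers)
load(mart, "information_mart.csv")
```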
The repository is organized into the following folders:
- Extraction: Contains notebooks for extracting data from different sources.
- DQcheck: Notebooks for performing data quality checks on the extracted data.
- Transformation: Notebooks for transforming the data as per the project requirements.
- visualization: Notebooks for modeling the data and creating visualizations.
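A visualization notebook along these lines might aggregate the transformed data and plot it with the seaborn/matplotlib dependencies listed below. The DataFrame contents are hypothetical; the real notebooks would read the Transformation output instead.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical transformed data standing in for the pipeline's output.
df = pd.DataFrame({"region": ["EMEA", "APAC", "EMEA", "APAC"],
                   "amount": [100.0, 250.0, 75.0, 180.0]})

# Aggregate before plotting so each region appears as a single bar.
totals = df.groupby("region", as_index=False)["amount"].sum()

ax = sns.barplot(data=totals, x="region", y="amount")
ax.set_title("Total amount by region")
plt.savefig("amount_by_region.png")
```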
To get started with the project:
- Clone this repository to your local machine.
- Navigate to the relevant folder (Extraction, DQcheck, Transformation, visualization).
- Follow the instructions in the respective README files to run the code.
The project relies on the following dependencies:
- Python 3.x
- Jupyter Notebook
- pandas
- numpy
- matplotlib
- seaborn