This repository contains the code and documentation for an ETL (Extract, Transform, Load) project using Python.
The ETL process consists of the following steps:
- Extraction: Data is extracted from multiple sources, including data lakes and APIs.
- Data Quality Check: Quality checks are performed on the extracted data to ensure its integrity and validity.
- Transformation: Data is transformed according to predefined rules and requirements. This involves merging datasets, calculating new columns, and creating lookup tables.
- Loading: The transformed data is loaded into the target Information Mart.
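The steps above can be sketched as a minimal pandas pipeline. This is an illustrative sketch, not the project's actual code: the dataset names, columns, and the CSV file standing in for the Information Mart are all hypothetical.

```python
import pandas as pd


def extract():
    # Hypothetical extraction step: in the real project, data comes from
    # data lakes and APIs; inline records are used here for illustration.
    orders = pd.DataFrame({"order_id": [1, 2, 3],
                           "customer_id": [10, 20, 10],
                           "amount": [100.0, 250.0, 75.0]})
    customers = pd.DataFrame({"customer_id": [10, 20],
                              "region": ["EMEA", "APAC"]})
    return orders, customers


def quality_check(df, required_columns):
    # Integrity and validity checks: required columns present, no nulls.
    missing = [c for c in required_columns if c not in df.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    if df[required_columns].isna().any().any():
        raise ValueError("Null values found in required columns")
    return df


def transform(orders, customers):
    # Merge datasets and calculate a new column, as described above.
    merged = orders.merge(customers, on="customer_id", how="left")
    merged["amount_with_tax"] = merged["amount"] * 1.2
    return merged


def load(df, path):
    # Load into the target (a CSV file stands in for the Information Mart).
    df.to_csv(path, index=False)


orders, customers = extract()
quality_check(orders, ["order_id", "customer_id", "amount"])
mart = transform(orders, customers)
load(mart, "information_mart.csv")
```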
The repository is organized into the following folders:
- Extraction: Contains notebooks for extracting data from different sources.
- DQcheck: Notebooks for performing data quality checks on the extracted data.
- Transformation: Notebooks for transforming the data as per the project requirements.
- visualization: Notebooks for modeling the data and creating visualizations.
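A visualization notebook along these lines might aggregate the transformed data and plot it with the seaborn/matplotlib dependencies listed below. The DataFrame contents are hypothetical; the real notebooks would read the Transformation output instead.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical transformed data standing in for the pipeline's output.
df = pd.DataFrame({"region": ["EMEA", "APAC", "EMEA", "APAC"],
                   "amount": [100.0, 250.0, 75.0, 180.0]})

# Aggregate before plotting so each region appears as a single bar.
totals = df.groupby("region", as_index=False)["amount"].sum()

ax = sns.barplot(data=totals, x="region", y="amount")
ax.set_title("Total amount by region")
plt.savefig("amount_by_region.png")
```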
To get started with the project:
- Clone this repository to your local machine.
- Navigate to the relevant folder (Extraction, DQcheck, Transformation, visualization).
- Follow the instructions in the respective README files to run the code.
The project relies on the following dependencies:
- Python 3.x
- Jupyter Notebook
- pandas
- numpy
- matplotlib
- seaborn