This project implements an end-to-end Extract, Transform, Load (ETL) process for a bicycle store using Python. It gathers data from data lakes and APIs, performs quality checks, transforms the data as necessary, and loads it into a structured format for analytics.

mo3azf/ETL-Using-Python

Python ETL Project

This repository contains the code and documentation for an ETL (Extract, Transform, Load) project written in Python.

Overview

The ETL process consists of the following steps:

  1. Extraction: Data is extracted from multiple sources, including data lakes and APIs.
  2. Data Quality Check: Quality checks are performed on the extracted data to ensure its integrity and validity.
  3. Transformation: Data is transformed according to predefined rules and requirements. This involves merging datasets, calculating new columns, and creating lookup tables.
  4. Loading: The transformed data is loaded into the target, the Information Mart.
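
The stages described above can be sketched in a few lines of pandas. This is a minimal illustration, not the code from the notebooks: the sample tables, the column names (`order_id`, `product_id`, `list_price`, and so on), the function names, and the choice of CSV files as the load target are all assumptions made for the example.

```python
from pathlib import Path

import pandas as pd


def extract():
    """Stand-in for reading from a data lake or API (hypothetical sample data)."""
    orders = pd.DataFrame({
        "order_id": [1, 2, 3],
        "product_id": [10, 11, 10],
        "quantity": [2, 1, 3],
    })
    products = pd.DataFrame({
        "product_id": [10, 11],
        "category": ["Mountain", "Road"],
        "list_price": [500.0, 900.0],
    })
    return orders, products


def quality_check(df, key):
    """Return a list of problems found in the key column (empty if none)."""
    problems = []
    if df[key].isna().any():
        problems.append(f"null values in {key}")
    if df[key].duplicated().any():
        problems.append(f"duplicate values in {key}")
    return problems


def transform(orders, products):
    """Merge datasets, calculate a new column, and build a lookup table."""
    # Merge: attach product attributes to each order line.
    sales = orders.merge(products, on="product_id", how="left",
                         validate="many_to_one")
    # Calculated column: revenue per order line.
    sales["line_total"] = sales["quantity"] * sales["list_price"]
    # Lookup table: one row per distinct category with a surrogate id.
    categories = products[["category"]].drop_duplicates().reset_index(drop=True)
    categories["category_id"] = categories.index + 1
    # Replace the category text in the sales table with its id.
    sales = sales.merge(categories, on="category", how="left")
    sales = sales.drop(columns="category")
    return sales, categories


def load(tables, target_dir):
    """Write each table to <target_dir>/<name>.csv (assumed target format)."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    for name, df in tables.items():
        df.to_csv(target / f"{name}.csv", index=False)
```

A run would chain the functions: call `extract()`, stop if `quality_check(...)` returns any problems for a key column, then pass the output of `transform(...)` to `load(...)`. Returning the problems as a list rather than raising lets a notebook display every issue at once before deciding whether to continue.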

Project Structure

  1. Extraction: Contains notebooks for extracting data from different sources.
  2. DQcheck: Notebooks for performing data quality checks on the extracted data.
  3. Transformation: Notebooks for transforming the data as per the project requirements.
  4. visualization: Notebooks for modeling the data and creating visualizations.

Getting Started

To get started with the project:

  • Clone this repository to your local machine.
  • Navigate to the relevant folder (Extraction, DQcheck, Transformation, or visualization).
  • Follow the instructions in the respective README files to run the code.

Dependencies

The project relies on the following dependencies:

Python 3.x, Jupyter Notebook, pandas, numpy, matplotlib, seaborn
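
Before opening the notebooks, it can help to confirm that the packages are installed. The snippet below is one way to do that with the standard library. The distribution names in `REQUIRED` are assumptions (in particular, Jupyter Notebook is assumed to be installed as the `notebook` package), and `missing_packages` is a hypothetical helper, not part of this repository.

```python
from importlib import metadata

# Assumed distribution names for the dependencies listed above.
REQUIRED = ["notebook", "pandas", "numpy", "matplotlib", "seaborn"]


def missing_packages(names):
    """Return the names that are not installed in the current environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed dependencies are installed.")
```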
