This project serves as a comprehensive guide to building an end-to-end data engineering pipeline. The pipeline is built on Apache Airflow and dbt and is deployed with Astronomer; data is stored in Google Cloud Storage and BigQuery. It is designed to be modular and scalable, and can easily be adapted to other use cases.
The dataset used in this project is the Online Retail Data Set from Kaggle. It is a transnational dataset containing all transactions that occurred between 01/12/2010 and 09/12/2011 for a UK-based, registered, non-store online retailer. The company mainly sells unique all-occasion gifts, and many of its customers are wholesalers.
| Column | Description |
|---|---|
| InvoiceNo | Invoice number. Nominal, a 6-digit integral number uniquely assigned to each transaction. If this code starts with the letter 'c', it indicates a cancellation. |
| StockCode | Product (item) code. Nominal, a 5-digit integral number uniquely assigned to each distinct product. |
| Description | Product (item) name. Nominal. |
| Quantity | The quantities of each product (item) per transaction. Numeric. |
| InvoiceDate | Invoice date and time. Numeric, the day and time when each transaction was generated. |
| UnitPrice | Unit price. Numeric, product price per unit in sterling. |
| CustomerID | Customer number. Nominal, a 5-digit integral number uniquely assigned to each customer. |
| Country | Country name. Nominal, the name of the country where each customer resides. |
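As an illustration of how these columns can map onto the warehouse layer, below is a minimal sketch of loading the raw CSV from Google Cloud Storage into BigQuery with an explicit schema. The bucket path, project, dataset, and table names are placeholders, and the type mapping is an assumption rather than the project's exact DDL.

```python
from google.cloud import bigquery

# Illustrative schema mirroring the column list above
# (names and types are an assumed mapping, not the project's exact DDL).
RAW_SCHEMA = [
    bigquery.SchemaField("InvoiceNo", "STRING"),    # a leading 'c' marks a cancellation
    bigquery.SchemaField("StockCode", "STRING"),
    bigquery.SchemaField("Description", "STRING"),
    bigquery.SchemaField("Quantity", "INTEGER"),
    bigquery.SchemaField("InvoiceDate", "STRING"),  # kept as text here; cast to TIMESTAMP downstream (e.g. in a dbt staging model)
    bigquery.SchemaField("UnitPrice", "FLOAT"),
    bigquery.SchemaField("CustomerID", "STRING"),
    bigquery.SchemaField("Country", "STRING"),
]

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    schema=RAW_SCHEMA,
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,  # skip the CSV header row
)
load_job = client.load_table_from_uri(
    "gs://your-bucket/raw/online_retail.csv",  # placeholder GCS path
    "your-project.retail.raw_invoices",        # placeholder table ID
    job_config=job_config,
)
load_job.result()  # block until the load job completes
```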
- Apache Airflow
- dbt
- Google Cloud Storage, BigQuery
- Soda
- Metabase
- Python
- Docker, Docker Compose
- Clone the repo
- Create a project on Google Cloud Platform, then create a service account and download the key file
- Add the key file to the `include/gcp` folder
- Run `astro dev start` to start the Airflow server. The Airflow UI will be available at https://localhost:8080 (the full command sequence is sketched below)
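For convenience, here are the steps above condensed into shell commands. The repository URL and key filename are placeholders, so adjust them to your own setup:

```bash
# Clone the repo (URL is a placeholder) and move into it
git clone https://github.com/<your-username>/<repo>.git
cd <repo>

# Drop the downloaded GCP service account key into include/gcp
mkdir -p include/gcp
cp ~/Downloads/service_account.json include/gcp/service_account.json

# Start the local Airflow environment via the Astro CLI
astro dev start
```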