Fashion-Campus-Orders

Data Engineering Zoomcamp 2024 project. An end-to-end data pipeline for the Indonesian e-commerce website Fashion Campus.

Data is extracted from Kaggle - https://www.kaggle.com/datasets/latifahhukma/fashion-campus/data?select=transactions.csv

About the data: Fashion Campus, an e-commerce fashion company targeting "Indonesian Young Urbans" (young people aged 15-35), was established in Indonesia in early 2016. The company offers a catalog of local and international brands popular among young people. Since the data is static, the pipeline runs as a one-time process. The dataset contains four CSV files:

  1. Clickstream
  2. Transactions
  3. Product
  4. Customer

Goal

Develop a data architecture on Google Cloud Platform from the raw Fashion Campus data. The data is extracted from Kaggle; initial data ingestion and workflow orchestration are done through Mage. The final ETL pipeline is developed in dbt. Once the data is stored in the warehouse (BigQuery), business visualizations are built in Looker.

Data Architecture of the project:

[Project data architecture diagram]

Tools and Steps

  1. Cloud:

    • Google Cloud Platform (GCP)
    • All storage, warehousing and dashboarding for this project run on GCP.

  2. Data Ingestion (batch):

    • Mage
    • Batch data ingestion is done through Mage, which makes it easy to handle large volumes of data; the data is written to the data lake in batches (a minimal ingestion sketch is shown after this list).
  3. Data Lake:

    • Google Cloud Storage
    • Once the data has been ingested and processed by Mage, it is stored in Google Cloud Storage, where it is readily accessible for further processing.
  4. Data Transformations and Processing:

    • dbt

    • dbt is used to develop the ETL: staging models are built for each source file and then joined into a fact table.

    • Further dimension models are created as required, and the data is pushed into the data warehouse in batches (a sketch of running the dbt step is shown after this list).

      [dbt data architecture / model lineage diagram]

  5. Data Warehousing:

    • Google BigQuery
    • Data from both the dev and prod environments is stored in BigQuery, which makes ad hoc SQL querying straightforward and also serves the data for visualization in Looker (an example query is shown after this list).
  6. Dashboarding:

    • Looker
    • Business dashboards are built in Looker on top of the BigQuery warehouse.

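The ingestion step (items 2-3) can be sketched as a pair of Mage-style blocks. This is a minimal illustration rather than the exact project code: the bucket name, local file paths and the decorator imports are assumptions, and writing `gs://` paths with pandas requires the `gcsfs` and `pyarrow` packages.

```python
import pandas as pd

# Mage block decorators (in a real Mage project, the loader and the exporter
# would typically live in separate pipeline blocks).
if 'data_loader' not in globals():
    from mage_ai.data_preparation.decorators import data_loader
if 'data_exporter' not in globals():
    from mage_ai.data_preparation.decorators import data_exporter

FILES = ['clickstream', 'transactions', 'product', 'customer']  # the 4 Kaggle CSVs
BUCKET = 'fashion-campus-datalake'  # hypothetical GCS bucket name


@data_loader
def load_kaggle_csvs(**kwargs) -> dict:
    """Read the locally downloaded Kaggle CSVs into DataFrames."""
    return {name: pd.read_csv(f'data/{name}.csv') for name in FILES}


@data_exporter
def export_to_gcs(frames: dict, **kwargs) -> None:
    """Write each table to the data lake as Parquet, one object per file."""
    for name, df in frames.items():
        df.to_parquet(f'gs://{BUCKET}/raw/{name}.parquet', index=False)
```
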
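The dbt step (item 4) is developed and run against two environments. Below is a minimal sketch of kicking off the runs from Python; it assumes the dbt CLI is installed and that profiles.yml defines `dev` and `prod` targets pointing at BigQuery (the target names simply mirror the environments mentioned above).

```python
import subprocess


def run_dbt(target: str) -> None:
    """Build the staging, fact and dimension models and run their tests."""
    subprocess.run(['dbt', 'build', '--target', target], check=True)


if __name__ == '__main__':
    run_dbt('dev')   # develop and test against the dev dataset
    run_dbt('prod')  # then materialize the models in the prod dataset
```
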
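Once the models land in BigQuery (item 5), ad hoc analysis is a short script away. A minimal sketch, assuming the `google-cloud-bigquery` client library, application-default credentials, and a hypothetical `fact_orders` table with a `payment_method` column:

```python
from google.cloud import bigquery

client = bigquery.Client()  # uses application-default credentials

# Hypothetical project, dataset, table and column names -- adjust to the
# actual dbt output in BigQuery.
query = """
    SELECT payment_method, COUNT(*) AS orders
    FROM `my-gcp-project.fashion_campus.fact_orders`
    GROUP BY payment_method
    ORDER BY orders DESC
"""

for row in client.query(query).result():
    print(row.payment_method, row.orders)
```
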
Future Scope of the project

  1. Creating a CI/CD pipeline for dbt so that model changes can be tested and merged easily through Git.
  2. Developing further clickstream visualizations to help retain customers.
  3. Developing further dimensions in the ETL architecture to generate more specialized data.
