Data Pipeline Project

This project implements a data pipeline that moves data from the Bronze layer, through the Silver layer, to the Gold layer using Databricks and Delta Lake.

Project Overview

The pipeline processes sales and product data, transforming it from raw CSV files into a refined analytical dataset.

Important note: Data ingestion into the Bronze layer was done with an Azure Synapse workspace, not with Databricks.

Project Screenshots

Azure Resources

The following Azure resources were provisioned for this project, including the Synapse Workspace, Databricks Service, and Storage Account.

Data Ingestion (Synapse)

Synapse pipelines were used to ingest data from the landing zone to the Bronze layer.

Databricks Job Execution

The transformation pipelines were orchestrated using Databricks Jobs.
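Databricks Jobs runs a job's tasks according to their declared dependencies. The actual job graph is only shown in the screenshot, so the dependency map below is an assumption inferred from the notebook descriptions: `Gold` reads both Silver tables, so it presumably depends on both Silver tasks. This minimal sketch uses Python's standard `graphlib` module (not the Databricks Jobs API) to show how such a graph resolves into stages, where tasks in the same stage can run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on.
# Assumption: Gold waits for both Silver notebooks because it reads
# silver.products and silver.sales.
JOB_TASKS = {
    "Products2Silver": set(),
    "Sales2Silver": set(),
    "Gold": {"Products2Silver", "Sales2Silver"},
}


def parallel_stages(tasks: dict) -> list:
    """Group tasks into stages; every task in a stage has its dependencies met."""
    sorter = TopologicalSorter(tasks)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose predecessors are all done
        stages.append(ready)
        sorter.done(*ready)  # mark them finished so dependents become ready
    return stages


if __name__ == "__main__":
    for number, stage in enumerate(parallel_stages(JOB_TASKS), start=1):
        print(f"stage {number}: {', '.join(stage)}")
```

Under this assumed graph, the two Silver notebooks form the first stage and `Gold` runs alone in the second.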

Job Graph:

Job Run Details:

Data Previews

Silver Layer - Products:

Silver Layer - Sales:

Gold Layer - Sales Report:

Architecture

The data flows through the following layers:

- Bronze: Raw data ingestion (CSV format).
- Silver: Cleaned and enriched data (Delta format).
- Gold: Aggregated and business-level data (Delta format).
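The layer order and storage formats above can be captured as a small piece of configuration. In this sketch, the `layer/dataset` path convention follows the paths named in the notebook descriptions (for example `silver/products` and `gold/informeventas`); the storage root passed in is a hypothetical placeholder, not the project's actual mount point or storage URI.

```python
from typing import Optional

# Layer order and storage format, as listed in the Architecture section.
LAYERS = [("bronze", "csv"), ("silver", "delta"), ("gold", "delta")]
LAYER_NAMES = [name for name, _ in LAYERS]


def storage_format(layer: str) -> str:
    """Return the file format used by a layer (raises KeyError if unknown)."""
    return dict(LAYERS)[layer]


def next_layer(layer: str) -> Optional[str]:
    """Return the layer that consumes this layer's output, or None for gold."""
    idx = LAYER_NAMES.index(layer)  # raises ValueError for unknown layers
    return LAYER_NAMES[idx + 1] if idx + 1 < len(LAYER_NAMES) else None


def layer_path(root: str, layer: str, dataset: str) -> str:
    """Build a path such as <root>/silver/products; `root` is hypothetical."""
    if layer not in LAYER_NAMES:
        raise ValueError(f"unknown layer: {layer!r}")
    return f"{root.rstrip('/')}/{layer}/{dataset}"


if __name__ == "__main__":
    print(layer_path("/mnt/datalake", "silver", "products"))
    print(next_layer("bronze"), storage_format("gold"))
```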

Notebooks Description

The project consists of the following Databricks notebooks:

1. `Products2Silver`

Purpose: Ingests product data from the Bronze layer into the Silver layer.
Operations:
- Reads products.csv from Bronze.
- Adds a processing_date column.
- Writes the data to the Silver layer (silver/products) in Delta format.
- Creates the silver.products table.
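The notebook code itself is not shown here; in PySpark, the column step would typically be a `withColumn` call using `current_date()`. The sketch below reproduces only the row-level logic in plain Python, so it runs without a cluster: parse the CSV and stamp every row with a `processing_date`. Writing Delta files and registering `silver.products` are left out. In the sample data, the `Name` column is hypothetical; only `ProductID` is mentioned in this README.

```python
import csv
import io
from datetime import date
from typing import Optional


def read_with_processing_date(
    csv_text: str, processing_date: Optional[date] = None
) -> list:
    """Parse CSV text (header row first) and add a processing_date to each row."""
    stamp = (processing_date or date.today()).isoformat()
    reader = csv.DictReader(io.StringIO(csv_text))
    # Copy each parsed row and append the processing_date column.
    return [{**row, "processing_date": stamp} for row in reader]


if __name__ == "__main__":
    # Hypothetical sample of products.csv.
    sample = "ProductID,Name\n1,Widget\n2,Gadget\n"
    for row in read_with_processing_date(sample, date(2024, 1, 15)):
        print(row)
```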

2. `Sales2Silver`

Purpose: Ingests sales data from the Bronze layer into the Silver layer.
Operations:
- Reads sales.csv from Bronze.
- Adds a processing_date column.
- Writes the data to the Silver layer (silver/sales) in Delta format.
- Creates the silver.sales table.
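Because the Gold step later sums `OrderQty` and `LineTotal`, those columns need numeric types by the time they are aggregated. This README doesn't say whether the notebook casts them explicitly or relies on Spark's CSV schema inference (the reader's `inferSchema` option), so the `int`/`Decimal` types below are assumptions. This plain-Python sketch reads the sales CSV, casts the assumed numeric columns, and adds `processing_date`.

```python
import csv
import io
from datetime import date
from decimal import Decimal
from typing import Optional

# Assumed column types: the README names these columns (in the Gold step)
# but not their types, so int/Decimal are illustrative choices.
SALES_TYPES = {"ProductID": int, "OrderQty": int, "LineTotal": Decimal}


def read_sales(csv_text: str, processing_date: Optional[date] = None) -> list:
    """Parse sales CSV text, cast the known numeric columns, add processing_date."""
    stamp = (processing_date or date.today()).isoformat()
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        row = dict(raw)
        for column, cast in SALES_TYPES.items():
            row[column] = cast(row[column])  # ValueError/InvalidOperation on bad data
        row["processing_date"] = stamp
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Hypothetical sample of sales.csv.
    sample = "ProductID,OrderQty,LineTotal\n1,2,20.50\n2,1,15.00\n"
    for row in read_sales(sample, date(2024, 1, 15)):
        print(row)
```

`Decimal` is used for `LineTotal` rather than `float` so that summed revenue does not pick up binary rounding error.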

3. `Gold`

Purpose: Creates the final business report.
Operations:
- Reads silver.products and silver.sales tables.
- Joins the two tables on ProductID.
- Aggregates data to calculate TotalSold (sum of OrderQty) and TotalRevenue (sum of LineTotal) per product.
- Adds a processing_date column.
- Writes the aggregated data to the Gold layer (gold/informeventas) in Delta format.
- Creates the gold.informeventas table.
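The core of the Gold notebook is a join followed by a grouped aggregation. In PySpark this would typically be a `join` on `ProductID` followed by `groupBy(...).agg(...)` with `sum`, but the notebook code isn't reproduced here. The plain-Python sketch below shows the same logic under a few assumptions: the join is an inner join (Spark's default; the README doesn't state the join type), `ProductID` is unique in the products table, `OrderQty` and `LineTotal` are already numeric, and the report keeps only `ProductID` plus the two totals (the real report may carry more product columns).

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal
from typing import Optional


def build_sales_report(
    products: list, sales: list, processing_date: Optional[date] = None
) -> list:
    """Inner-join sales to products on ProductID, then sum quantity and revenue."""
    stamp = (processing_date or date.today()).isoformat()
    known_ids = {p["ProductID"] for p in products}
    totals = defaultdict(lambda: {"TotalSold": 0, "TotalRevenue": Decimal("0")})
    for sale in sales:
        pid = sale["ProductID"]
        if pid not in known_ids:  # inner join: drop sales with no matching product
            continue
        totals[pid]["TotalSold"] += sale["OrderQty"]
        totals[pid]["TotalRevenue"] += sale["LineTotal"]
    # One output row per product that has at least one sale, ordered by ProductID.
    return [
        {"ProductID": pid, **agg, "processing_date": stamp}
        for pid, agg in sorted(totals.items())
    ]


if __name__ == "__main__":
    products = [{"ProductID": 1}, {"ProductID": 2}]
    sales = [
        {"ProductID": 1, "OrderQty": 2, "LineTotal": Decimal("20.50")},
        {"ProductID": 1, "OrderQty": 3, "LineTotal": Decimal("30.75")},
        {"ProductID": 2, "OrderQty": 1, "LineTotal": Decimal("15.00")},
    ]
    for row in build_sales_report(products, sales, date(2024, 1, 15)):
        print(row)
```

As with Spark's inner join, products that have no sales do not appear in the report.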
