This project implements a data pipeline that moves data through the Bronze, Silver, and Gold layers using Databricks and Delta Lake.
The pipeline processes sales and product data, transforming it from raw CSV files into a refined analytical dataset.
Important Note: Data ingestion was handled by an Azure Synapse Workspace.
The following Azure resources were provisioned for this project: a Synapse Workspace, a Databricks Service, and a Storage Account.

Synapse pipelines were used to ingest data from the landing zone to the Bronze layer.

The transformation pipelines were orchestrated using Databricks Jobs.
The data flows through the following layers:
- Bronze: Raw data ingestion (CSV format).
- Silver: Cleaned and enriched data (Delta format).
- Gold: Aggregated and business-level data (Delta format).
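Because the Gold notebook reads both Silver tables, the Databricks Job must run the two Silver notebooks before the Gold one. Databricks Jobs express this through task dependencies. The sketch below only illustrates that ordering with Python's `graphlib`. The task names (`silver_products`, `silver_sales`, `gold_informeventas`) are hypothetical and are not taken from the actual job definition.

```python
from graphlib import TopologicalSorter

# Hypothetical task names; the real Databricks Job may name its tasks differently.
# Each key maps a task to the set of tasks that must finish before it can start.
job_dependencies = {
    "silver_products": set(),
    "silver_sales": set(),
    "gold_informeventas": {"silver_products", "silver_sales"},
}


def run_order(dependencies):
    """Return one execution order that respects every declared dependency."""
    return list(TopologicalSorter(dependencies).static_order())


order = run_order(job_dependencies)
print(order)  # both Silver tasks come before the Gold task
```

The two Silver tasks have no dependency on each other, so a scheduler is free to run them in parallel. Only the Gold task has to wait for both.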
The project consists of the following Databricks notebooks:
- Purpose: Ingests product data from the Bronze layer.
- Operations:
  - Reads `products.csv` from Bronze.
  - Adds a `processing_date` column.
  - Writes the data to the Silver layer (`silver/products`) in Delta format.
  - Creates the `silver.products` table.
- Purpose: Ingests sales data from the Bronze layer.
- Operations:
  - Reads `sales.csv` from Bronze.
  - Adds a `processing_date` column.
  - Writes the data to the Silver layer (`silver/sales`) in Delta format.
  - Creates the `silver.sales` table.
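Both Silver notebooks follow the same pattern: read a Bronze CSV and stamp every row with a `processing_date`. The minimal sketch below shows only that step, using the Python standard library rather than Spark. The sample columns are hypothetical, and it assumes `processing_date` is a calendar date, which this README does not specify. In PySpark the equivalent step would roughly be `df.withColumn("processing_date", current_date())`, followed by a Delta write. The notebooks' exact code is not shown here.

```python
import csv
import io
from datetime import date


def add_processing_date(csv_text, processing_date=None):
    """Parse CSV text and append a processing_date field to every row.

    Standard-library stand-in for the Spark step; defaults to today's date.
    """
    stamp = (processing_date or date.today()).isoformat()
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["processing_date"] = stamp
        rows.append(row)
    return rows


# Hypothetical sample; the real products.csv / sales.csv columns are not listed here.
sample = "ProductID,Name\n1,Widget\n2,Gadget\n"
rows = add_processing_date(sample, date(2024, 1, 15))
print(rows)
```

Passing the date explicitly, as in the usage line, keeps reruns reproducible. Leaving it out stamps the rows with the current date.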
- Purpose: Creates the final business report.
- Operations:
  - Reads the `silver.products` and `silver.sales` tables.
  - Joins the two tables on `ProductID`.
  - Aggregates the data to calculate `TotalSold` (sum of `OrderQty`) and `TotalRevenue` (sum of `LineTotal`) per product.
  - Adds a `processing_date` column.
  - Writes the aggregated data to the Gold layer (`gold/informeventas`) in Delta format.
  - Creates the `gold.informeventas` table.
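The Gold logic above can be sketched in plain Python to make the join and aggregation concrete. This is an illustration, not the notebook's code. It assumes an inner join and grouping by `ProductID` only; the real report may carry extra product columns. In PySpark the aggregation would roughly correspond to `groupBy("ProductID").agg(F.sum("OrderQty").alias("TotalSold"), F.sum("LineTotal").alias("TotalRevenue"))`.

```python
from collections import defaultdict
from datetime import date


def build_informeventas(products, sales, processing_date):
    """Inner-join sales to products on ProductID, then sum quantity and revenue per product."""
    known_ids = {p["ProductID"] for p in products}
    totals = defaultdict(lambda: {"TotalSold": 0, "TotalRevenue": 0.0})
    for s in sales:
        pid = s["ProductID"]
        if pid not in known_ids:  # inner join: drop sales with no matching product
            continue
        totals[pid]["TotalSold"] += s["OrderQty"]
        totals[pid]["TotalRevenue"] += s["LineTotal"]
    return [
        {"ProductID": pid, **agg, "processing_date": processing_date.isoformat()}
        for pid, agg in sorted(totals.items())
    ]


# Hypothetical sample data; ProductID 9 has no product row, so the join drops it.
products = [{"ProductID": 1}, {"ProductID": 2}]
sales = [
    {"ProductID": 1, "OrderQty": 2, "LineTotal": 20.0},
    {"ProductID": 1, "OrderQty": 3, "LineTotal": 30.0},
    {"ProductID": 2, "OrderQty": 1, "LineTotal": 5.5},
    {"ProductID": 9, "OrderQty": 4, "LineTotal": 40.0},
]
report = build_informeventas(products, sales, date(2024, 1, 15))
print(report)
```

With an inner join, products that have no sales do not appear in the report. A left join from products would be needed to list them with zero totals.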




