orr21/Pipeline

Data Pipeline Project

This project implements a data pipeline that moves data from the Bronze layer through Silver to the Gold layer using Databricks and Delta Lake.

Project Overview

The pipeline processes sales and product data, transforming it from raw CSV files into a refined analytical dataset.

Important Note: Data ingestion from the landing zone into the Bronze layer was done with Synapse pipelines in a Synapse Workspace, not with Databricks.

Project Screenshots

Azure Resources

The following Azure resources were provisioned for this project, including the Synapse Workspace, Databricks Service, and Storage Account. (Screenshot: Azure Resources)

Data Ingestion (Synapse)

Synapse pipelines were used to ingest data from the landing zone to the Bronze layer. (Screenshot: Synapse Activity Runs)

Databricks Job Execution

The transformation pipelines were orchestrated using Databricks Jobs.

Job Graph: (screenshot)

Job Run Details: (screenshot)
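The orchestration described above could be expressed as a job definition along the following lines. This is only a sketch: the task order is an assumption inferred from the Gold notebook reading both Silver tables, the job name and `notebook_dir` argument are hypothetical, and compute settings are omitted. The field names (`tasks`, `task_key`, `depends_on`, `notebook_task`, `notebook_path`) follow the Databricks Jobs API.

```python
def build_job_settings(notebook_dir):
    """Return a Jobs-API-style payload with one task per notebook."""

    def notebook_task(key, depends_on=()):
        # Each task runs the notebook that has the same name as its task key.
        task = {
            "task_key": key,
            "notebook_task": {
                "notebook_path": f"{notebook_dir.rstrip('/')}/{key}",
            },
        }
        if depends_on:
            task["depends_on"] = [{"task_key": name} for name in depends_on]
        return task

    return {
        "name": "bronze-to-gold-pipeline",  # hypothetical job name
        "tasks": [
            # The two Silver notebooks are independent of each other.
            notebook_task("Products2Silver"),
            notebook_task("Sales2Silver"),
            # Assumption: Gold runs only after both Silver tables exist.
            notebook_task("Gold", depends_on=("Products2Silver", "Sales2Silver")),
        ],
    }
```

With this layout the two Silver tasks can run in parallel, and the Gold task is blocked until both have succeeded.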

Data Previews

Silver Layer - Products: (screenshot)

Silver Layer - Sales: (screenshot)

Gold Layer - Sales Report: (screenshot)

Architecture

The data flows through the following layers:

  1. Bronze: Raw data ingestion (CSV format).
  2. Silver: Cleaned and enriched data (Delta format).
  3. Gold: Aggregated and business-level data (Delta format).
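The three layers can be captured in a small lookup that the notebooks share. The sketch below assumes a `<root>/<layer>/<dataset>` folder layout, which matches the `silver/products` and `gold/informeventas` paths used by the notebooks; the `root` argument and the helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    file_format: str
    purpose: str


# Order matters: data moves from the first layer to the last.
LAYERS = (
    Layer("bronze", "csv", "raw data ingestion"),
    Layer("silver", "delta", "cleaned and enriched data"),
    Layer("gold", "delta", "aggregated and business-level data"),
)


def dataset_path(root, layer, dataset):
    """Build the storage path of a dataset inside one of the known layers."""
    if layer not in {known.name for known in LAYERS}:
        raise ValueError(f"unknown layer: {layer!r}")
    return f"{root.rstrip('/')}/{layer}/{dataset}"
```

For example, `dataset_path(root, "silver", "products")` resolves to the `silver/products` folder under whatever storage root the workspace uses.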

Notebooks Description

The project consists of the following Databricks notebooks:

1. Products2Silver

  • Purpose: Ingests product data from the Bronze layer.
  • Operations:
    • Reads products.csv from Bronze.
    • Adds a processing_date column.
    • Writes the data to the Silver layer (silver/products) in Delta format.
    • Creates the silver.products table.
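A minimal sketch of these operations is shown below; it is not the notebook's actual code. It assumes that `spark` is the session Databricks provides in a notebook, that the CSV has a header row, that `processing_date` holds the current date, and that each run overwrites the previous output. The `bronze_root` and `silver_root` arguments are hypothetical placeholders for the real storage paths.

```python
def products_to_silver(spark, bronze_root, silver_root):
    """Load products.csv from Bronze and publish it as silver.products."""
    source = f"{bronze_root.rstrip('/')}/products.csv"
    target = f"{silver_root.rstrip('/')}/products"

    products = (
        spark.read
        .option("header", True)       # assumption: first row holds column names
        .option("inferSchema", True)  # assumption: column types are inferred
        .csv(source)
        # Keep every source column and stamp the row with the load date.
        .selectExpr("*", "current_date() AS processing_date")
    )

    # Assumption: a full overwrite on every run rather than an append or merge.
    products.write.format("delta").mode("overwrite").save(target)

    # Register the Delta folder as a table so it can be queried by name.
    spark.sql("CREATE SCHEMA IF NOT EXISTS silver")
    spark.sql(
        f"CREATE TABLE IF NOT EXISTS silver.products USING DELTA LOCATION '{target}'"
    )
    return target
```

Because the function only touches the `spark` object it receives, it can be called from a notebook cell as `products_to_silver(spark, bronze_root, silver_root)`.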

2. Sales2Silver

  • Purpose: Ingests sales data from the Bronze layer.
  • Operations:
    • Reads sales.csv from Bronze.
    • Adds a processing_date column.
    • Writes the data to the Silver layer (silver/sales) in Delta format.
    • Creates the silver.sales table.

3. Gold

  • Purpose: Creates the final business report.
  • Operations:
    • Reads silver.products and silver.sales tables.
    • Joins the two tables on ProductID.
    • Aggregates data to calculate TotalSold (Sum of OrderQty) and TotalRevenue (Sum of LineTotal) per product.
    • Adds a processing_date column.
    • Writes the aggregated data to the Gold layer (gold/informeventas) in Delta format.
    • Creates the gold.informeventas table.
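The join and aggregation above can be sketched as a single Spark SQL query. This is an illustration under stated assumptions, not the notebook's code: it assumes an inner join, grouping by `ProductID` only, a date-valued `processing_date`, and an overwrite on each run. `spark` is the notebook's session and `gold_root` is a hypothetical placeholder for the Gold storage path.

```python
REPORT_QUERY = """
SELECT
    p.ProductID,
    SUM(s.OrderQty)  AS TotalSold,
    SUM(s.LineTotal) AS TotalRevenue,
    current_date()   AS processing_date
FROM silver.sales AS s
INNER JOIN silver.products AS p
    ON s.ProductID = p.ProductID
GROUP BY p.ProductID
"""


def build_sales_report(spark, gold_root):
    """Aggregate sales per product and publish the result as gold.informeventas."""
    target = f"{gold_root.rstrip('/')}/informeventas"

    # One output row per product that has at least one matching sale.
    report = spark.sql(REPORT_QUERY)

    # Assumption: the report is rebuilt from scratch on every run.
    report.write.format("delta").mode("overwrite").save(target)

    # Register the Delta folder as a table so it can be queried by name.
    spark.sql("CREATE SCHEMA IF NOT EXISTS gold")
    spark.sql(
        f"CREATE TABLE IF NOT EXISTS gold.informeventas USING DELTA LOCATION '{target}'"
    )
    return target
```

With an inner join, products that have no sales do not appear in the report; a left join from products would keep them with null totals.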
