# Project: Circuit Box Data Lakehouse
**Domain:** E-commerce (Consumer Electronics)  
**Technology Stack:** Delta Live Tables (DLT), PySpark, Databricks SQL

## 1. Project Overview
This project implements a Data Lakehouse architecture for **Circuit Box**, a fictional e-commerce company selling electronic equipment directly to consumers. The goal is to build a robust ETL pipeline using **Delta Live Tables (DLT)** to ingest incremental operational data, transform it, and serve it for business analytics.

## 2. Architecture: The Medallion Model
The pipeline uses a multi-hop (medallion) architecture with three layers:

* **Bronze Layer (Raw):** Ingests raw data incrementally from source files. This layer handles schema inference and preserves the original data state.
* **Silver Layer (Cleansed & Enriched):** Filters, cleans, and augments the data. This layer enforces data quality expectations (e.g., valid keys, non-null values).
* **Gold Layer (Curated):** Provides business-level aggregations and specific views for reporting and analytics.
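
The three hops can be illustrated with a toy, in-memory sketch. It is plain Python rather than DLT code: lists of dicts stand in for Delta tables, and the `amount` field, the `_source_file` column, the file name, and the function names are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical raw order records, as they might arrive in one source file.
raw_orders = [
    {"order_id": 1, "customer_id": 10, "amount": "99.50"},
    {"order_id": 2, "customer_id": None, "amount": "15.00"},  # missing key
    {"order_id": 3, "customer_id": 10, "amount": "20.50"},
    {"order_id": 4, "customer_id": 11, "amount": "5.00"},
]


def to_bronze(records, source_file):
    """Bronze: keep every record unchanged, adding only ingestion metadata."""
    return [{**r, "_source_file": source_file} for r in records]


def to_silver(bronze_rows):
    """Silver: drop rows without a customer_id and cast amount to float."""
    return [
        {**r, "amount": float(r["amount"])}
        for r in bronze_rows
        if r["customer_id"] is not None
    ]


def to_gold(silver_rows):
    """Gold: total order amount per customer."""
    totals = defaultdict(float)
    for r in silver_rows:
        totals[r["customer_id"]] += r["amount"]
    return dict(totals)


bronze = to_bronze(raw_orders, "orders_001.json")
silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # {10: 120.0, 11: 5.0}
```

Note that Bronze keeps the invalid row so the original state remains available; only Silver removes it.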

## 3. Data Sources & Data Model
The pipeline consumes three primary datasets. We expect **incremental data**, meaning each new file contains only data added since the last processing run.

| Dataset | Format | Primary Key | Foreign Key | Description |
| :--- | :--- | :--- | :--- | :--- |
| **Customers** | JSON | `customer_id` | N/A | Customer profiles. `customer_id` cannot be null. |
| **Addresses** | CSV | `customer_id` | `customer_id` | Shipping addresses. Linked 1:1 with Customers. |
| **Orders** | JSON | `order_id` | `customer_id` | Transactional data. A customer can have multiple orders. |
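
Because each file carries only new records, the pipeline needs to remember which files it has already read. The sketch below mimics that bookkeeping in plain Python; it is a conceptual stand-in, not Auto Loader itself, and the file names and the `processed` set (playing the role of a streaming checkpoint) are hypothetical.

```python
def new_files(available, processed):
    """Return the files not yet processed, in a stable (sorted) order."""
    return sorted(set(available) - processed)


def run_once(available, processed):
    """Pick up only unseen files and record them as processed."""
    batch = new_files(available, processed)
    processed.update(batch)
    return batch


processed = set()  # stands in for the state a streaming reader checkpoints

# First run: both files are new.
first = run_once(["customers_001.json", "customers_002.json"], processed)

# Second run: one more file has landed; only that file is picked up.
second = run_once(
    ["customers_001.json", "customers_002.json", "customers_003.json"],
    processed,
)

print(first)   # ['customers_001.json', 'customers_002.json']
print(second)  # ['customers_003.json']
```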

### Data Relationships
* **Customers ↔ Addresses:** One-to-One relationship (joined via `customer_id`).
* **Customers ↔ Orders:** One-to-Many relationship (joined via `customer_id`).
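
These relationships can be checked before joining. The following plain-Python sketch shows one possible way to verify that Addresses is unique per `customer_id` (the 1:1 side) and to find orders that reference an unknown customer; the sample rows, the `city` column, and the helper names are assumptions for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows.
customers = [{"customer_id": 10}, {"customer_id": 11}]
addresses = [
    {"customer_id": 10, "city": "Austin"},
    {"customer_id": 11, "city": "Boston"},
]
orders = [
    {"order_id": 1, "customer_id": 10},
    {"order_id": 3, "customer_id": 10},
    {"order_id": 4, "customer_id": 11},
    {"order_id": 5, "customer_id": 12},  # no matching customer
]


def duplicate_keys(rows, key):
    """Keys that appear more than once; must be empty for a 1:1 relationship."""
    counts = Counter(r[key] for r in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def orphan_keys(child_rows, parent_rows, key):
    """Keys present in the child rows but missing from the parent rows."""
    parents = {r[key] for r in parent_rows}
    return sorted({r[key] for r in child_rows} - parents)


def orders_by_customer(order_rows):
    """Group order ids per customer (the one-to-many side)."""
    grouped = defaultdict(list)
    for o in order_rows:
        grouped[o["customer_id"]].append(o["order_id"])
    return dict(grouped)


print(duplicate_keys(addresses, "customer_id"))       # []
print(orphan_keys(orders, customers, "customer_id"))  # [12]
print(orders_by_customer(orders))  # {10: [1, 3], 11: [4], 12: [5]}
```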

## 4. Key Technical Objectives
* **Incremental Ingestion:** Efficiently process only new files using Databricks Auto Loader (`cloud_files` in DLT SQL, the `cloudFiles` stream format in PySpark).
* **Data Quality:** Apply DLT Expectations to ensure data integrity (e.g., constraint checks).
* **Pipeline Orchestration:** Use Delta Live Tables to manage dependencies and pipeline state automatically.
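
To make the data quality objective concrete, the sketch below models the three outcomes an expectation can have: record the violation and keep the row, drop the row, or fail the run. In DLT these roughly correspond to the `@dlt.expect`, `@dlt.expect_or_drop`, and `@dlt.expect_or_fail` decorators, but the code here is a plain-Python approximation, not the DLT API. The `Expectation` class, the `email` field, and the rule names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


class ExpectationFailed(Exception):
    """Raised when a rule with action 'fail' is violated."""


@dataclass
class Expectation:
    name: str
    predicate: Callable[[dict], bool]
    action: str = "warn"  # one of "warn", "drop", "fail"


def apply_expectations(rows, expectations):
    """Return (kept_rows, violation_counts) after applying every rule."""
    kept = []
    violations = {e.name: 0 for e in expectations}
    for row in rows:
        keep = True
        for e in expectations:
            if e.predicate(row):
                continue
            violations[e.name] += 1
            if e.action == "fail":
                raise ExpectationFailed(f"{e.name} violated by {row}")
            if e.action == "drop":
                keep = False  # "warn" only counts the violation
        if keep:
            kept.append(row)
    return kept, violations


rules = [
    Expectation("valid_customer_id", lambda r: r["customer_id"] is not None, "drop"),
    Expectation("has_email", lambda r: r.get("email") is not None, "warn"),
]

sample_customers = [
    {"customer_id": 1, "email": "a@example.com"},
    {"customer_id": None, "email": "b@example.com"},  # dropped
    {"customer_id": 3, "email": None},                # kept, counted as a warning
]

kept, violations = apply_expectations(sample_customers, rules)
print([r["customer_id"] for r in kept])  # [1, 3]
print(violations)  # {'valid_customer_id': 1, 'has_email': 1}
```

Keeping the per-rule violation counts separate from the filtering makes it possible to report on data quality even for rules that do not drop rows.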


![MODEL_DLT.png](./MODEL_DLT.png "MODEL_DLT.png")

## 5. Circuit Box Architecture
![Circuit_Box_Architecture.png](./Circuit_Box_Architecture.png "Circuit_Box_Architecture.png")