# 01_Bronze_Ingestion

## Raw Data Loading into Delta Lake

This notebook implements the **Bronze layer** of the Medallion Architecture:
- **Purpose**: Ingest raw data with minimal transformation
- **Input**: a CSV file in a Unity Catalog Volume
- **Output**: a Delta table with the raw data preserved

In [0]:
# ============================================
# 01_Bronze_Ingestion.ipynb
# --------------------------------------------
# Purpose:
#   Ingest raw cost-aware risk case data
#   from Unity Catalog Volumes into
#   Bronze Delta tables.
#
# Bronze Principles:
#   - Raw, immutable data
#   - No business logic
#   - Delta Lake storage
#
# Domain:
#   Finance & Banking (Cost-Aware Decisioning)
# ============================================

### Configuration

In [0]:
CATALOG = "cost_aware_capstone"
SCHEMA = "risk_decisioning"

In [0]:
# Build the volume path from CATALOG and SCHEMA so the names are defined once
RAW_VOLUME_PATH = f"/Volumes/{CATALOG}/{SCHEMA}/raw_data"

RAW_CASES_FILE = f"{RAW_VOLUME_PATH}/cost_aware_cases.csv"

In [0]:
# Three-level Unity Catalog name: <catalog>.<schema>.<table>
BRONZE_TABLE = f"{CATALOG}.{SCHEMA}.bronze_cost_aware_cases"

print("Raw data path:", RAW_CASES_FILE)
print("Bronze table:", BRONZE_TABLE)
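
One way to keep the catalog, schema, and volume names consistent is to derive every path and table name from a single set of inputs. The following is a minimal sketch, not part of the original pipeline: the helper names `uc_table_name` and `uc_volume_path` are hypothetical, and accepting only plain identifiers is a simplifying assumption rather than a Unity Catalog rule. The path layout `/Volumes/<catalog>/<schema>/<volume>/<path>` and the three-level table name follow the form already used in this notebook.

```python
import re

# Hypothetical helpers: derive Unity Catalog names from one set of inputs.
# Accepting only plain identifiers (letters, digits, underscores) is a
# simplifying assumption; names with other characters generally need
# backtick quoting in SQL, which these helpers do not attempt.
_PLAIN_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _checked(name: str) -> str:
    """Return name unchanged, or raise if it is not a plain identifier."""
    if not _PLAIN_IDENTIFIER.fullmatch(name):
        raise ValueError(f"Unsupported identifier: {name!r}")
    return name


def uc_table_name(catalog: str, schema: str, table: str) -> str:
    """Three-level table name: <catalog>.<schema>.<table>."""
    return ".".join(_checked(part) for part in (catalog, schema, table))


def uc_volume_path(catalog: str, schema: str, volume: str, *parts: str) -> str:
    """Volume path: /Volumes/<catalog>/<schema>/<volume>[/<parts>...]."""
    segments = [_checked(catalog), _checked(schema), _checked(volume), *parts]
    return "/Volumes/" + "/".join(segments)


print(uc_volume_path("cost_aware_capstone", "risk_decisioning",
                     "raw_data", "cost_aware_cases.csv"))
# /Volumes/cost_aware_capstone/risk_decisioning/raw_data/cost_aware_cases.csv
print(uc_table_name("cost_aware_capstone", "risk_decisioning",
                    "bronze_cost_aware_cases"))
# cost_aware_capstone.risk_decisioning.bronze_cost_aware_cases
```

In the notebook, the same calls would take `CATALOG` and `SCHEMA` in place of the string literals.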

### Read Raw Data from Unity Catalog Volume

In [0]:
# Read the raw CSV; inferSchema makes Spark scan the data an extra time to guess column types
try:
    cases_raw = (
        spark.read
        .option("header", True)
        .option("inferSchema", True)
        .csv(RAW_CASES_FILE)
    )
    
    row_count = cases_raw.count()
    if row_count == 0:
        raise ValueError("CSV file is empty!")
    
    print(f"Successfully loaded {row_count:,} records")
    cases_raw.printSchema()
    display(cases_raw.limit(5))
    
except Exception as e:
    print(f"Error loading file: {e}")
    print(f"   Expected path: {RAW_CASES_FILE}")
    print("   Ensure you've uploaded the CSV to the Unity Catalog Volume")
    raise
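
The empty-file guard above relies on a Spark `count()`. A hypothetical pre-flight check using Python's standard `csv` module could reject a missing, empty, or header-only file before Spark reads it. This sketch assumes the volume is reachable from the driver through its local `/Volumes/...` path and that the file is small enough to scan once. Its row count can differ from `cases_raw.count()` when quoted fields contain line breaks, because Spark's CSV reader only treats such fields as a single record when the `multiLine` option is set.

```python
import csv
from pathlib import Path


def preflight_csv(path: str) -> tuple[list[str], int]:
    """Return (header, data_row_count) for a CSV file, or raise if unusable.

    Hypothetical helper: mirrors the notebook's empty-file guard without
    starting a Spark job. Assumes UTF-8 text small enough to scan once.
    """
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"No file at {path}")
    with p.open(newline="", encoding="utf-8") as fh:
        reader = csv.reader(fh)
        header = next(reader, None)  # None for a zero-byte file
        if not header:
            raise ValueError("CSV file has no header row")
        data_rows = sum(1 for row in reader if row)  # ignore blank lines
    if data_rows == 0:
        raise ValueError("CSV file has a header but no data rows")
    return header, data_rows


# Possible usage before the Spark read (RAW_CASES_FILE comes from the config cell):
# header, n = preflight_csv(RAW_CASES_FILE)
```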

### Write Bronze Delta Table

In [0]:
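# mode("overwrite") replaces the table contents on every run; earlier versions
# remain queryable through Delta time travel, subject to the table's retention settings.
(
    cases_raw
    .write
    .format("delta")
    .mode("overwrite")
    .saveAsTable(BRONZE_TABLE)
)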
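
Because the table is written from `cases_raw`, it stores whatever column types `inferSchema` chose, which is itself a transformation of the raw text. The toy functions below (`toy_infer` and `apply_type`, both hypothetical) are a simplified stand-in, not Spark's inference algorithm; they show how type guessing of this kind can turn an identifier such as `"00123"` into the integer `123`. If exact source text matters for auditability, one option is to omit `inferSchema`, in which case Spark reads every CSV column as a string, and to cast types in the Silver layer.

```python
def toy_infer(values: list[str]) -> str:
    """Toy column-type guesser, NOT Spark's algorithm: 'int' if every value
    parses as int, else 'double' if every value parses as float, else 'string'."""
    def parses(cast, v: str) -> bool:
        try:
            cast(v)
            return True
        except ValueError:
            return False

    if not values:
        return "string"
    if all(parses(int, v) for v in values):
        return "int"
    if all(parses(float, v) for v in values):
        return "double"
    return "string"


def apply_type(values: list[str], col_type: str) -> list:
    """Cast raw strings to the guessed type, as a typed table column would store them."""
    casts = {"int": int, "double": float, "string": str}
    return [casts[col_type](v) for v in values]


raw_ids = ["00123", "04567", "89000"]
inferred = toy_infer(raw_ids)             # 'int'
stored = apply_type(raw_ids, inferred)    # [123, 4567, 89000]: leading zeros are lost
print(inferred, stored)
```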

### Verify Delta Table Data

In [0]:
spark.sql(f"""
    SELECT COUNT(*) AS bronze_row_count
    FROM {BRONZE_TABLE}
""").show()
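
The `COUNT(*)` query prints the table's row count but does not compare it with anything. A minimal reconciliation sketch, using a hypothetical `reconcile_counts` helper, checks it against the `row_count` captured when the CSV was read; with `mode("overwrite")` the table should hold only the latest load, so the two counts are expected to be equal.

```python
def reconcile_counts(source_rows: int, table_rows: int, *, table: str = "bronze table") -> None:
    """Hypothetical check: raise if the table does not hold exactly the rows read.

    Assumes an overwrite load, so the table should contain only the latest batch.
    """
    if source_rows != table_rows:
        raise ValueError(
            f"Row count mismatch: source={source_rows:,}, {table}={table_rows:,}"
        )
    print(f"Row counts match: {table_rows:,}")


# Possible usage in this notebook (row_count and BRONZE_TABLE are defined above):
# reconcile_counts(row_count, spark.table(BRONZE_TABLE).count(), table=BRONZE_TABLE)
```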


---
## Bronze Layer Complete

**What I Accomplished**:
- Loaded raw CSV data from a Unity Catalog Volume
- Wrote a Delta table using the schema Spark inferred from the CSV
- Preserved raw data for auditability

**Next**: Run `02_Silver_Feature_Engineering.ipynb` for data cleaning and feature creation

---