# 📥 Notebook: 00 ETL Bronze Layer

This notebook is the **first stage** of the AI-powered claims processing pipeline and covers the **Bronze Layer (Raw Ingestion)** of the Medallion Architecture. It creates the foundational table that downstream stages on the Databricks platform read from.

---

## 🧱 Purpose
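To catalog the raw call audio files found in a defined volume location as file references (path, name, size, modification time) in a structured Delta Lake table, so that later stages of the pipeline can locate and process each recording.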

In [0]:
%run "./resources/init" 

DataFrame[]

In [0]:
raw_audio_path = f"/Volumes/{CATALOG}/{SCHEMA}/{VOLUME}/raw_recordings/"
# mkdirs creates the directory (and any missing parents) if needed and
# succeeds if it already exists, so a single call is enough.
dbutils.fs.mkdirs(raw_audio_path)

In [0]:
import pyspark.sql.functions as F

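# List the raw recordings; fail fast if nothing has been uploaded yet.
files = dbutils.fs.ls(raw_audio_path)
if not files:
    raise ValueError(f"No files found in {raw_audio_path}")

# Build a DataFrame from the listing and add file_path: the path without
# its leading 5-character "dbfs:" scheme (SQL substring is 1-indexed).
file_reference_df = spark.createDataFrame(files)\
  .withColumn("file_path", F.expr("substring(path, 6, length(path))"))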

display(file_reference_df)

file_reference_df.write.mode("overwrite").option("overwriteSchema", "true").saveAsTable(f"{CATALOG}.{SCHEMA}.recordings_file_reference_bronze")

path,name,size,modificationTime,file_path
dbfs:/Volumes/samantha_wise/ai_claims_processing_final/audio_recordings/raw_recordings/5e7e3k53_AGT002_2025-01-15 13_35_10.m4a,5e7e3k53_AGT002_2025-01-15 13_35_10.m4a,787392,1743602105000,/Volumes/samantha_wise/ai_claims_processing_final/audio_recordings/raw_recordings/5e7e3k53_AGT002_2025-01-15 13_35_10.m4a
dbfs:/Volumes/samantha_wise/ai_claims_processing_final/audio_recordings/raw_recordings/ct4m50n5_AGT005_2025-03-01 12_36_07.m4a,ct4m50n5_AGT005_2025-03-01 12_36_07.m4a,939809,1743602105000,/Volumes/samantha_wise/ai_claims_processing_final/audio_recordings/raw_recordings/ct4m50n5_AGT005_2025-03-01 12_36_07.m4a
dbfs:/Volumes/samantha_wise/ai_claims_processing_final/audio_recordings/raw_recordings/nv7032f9_AGT001_2025-02-27 12_40_45.m4a,nv7032f9_AGT001_2025-02-27 12_40_45.m4a,993088,1743602105000,/Volumes/samantha_wise/ai_claims_processing_final/audio_recordings/raw_recordings/nv7032f9_AGT001_2025-02-27 12_40_45.m4a
dbfs:/Volumes/samantha_wise/ai_claims_processing_final/audio_recordings/raw_recordings/pxvlh18a_AGT001_2025-02-11 11_33_33.m4a,pxvlh18a_AGT001_2025-02-11 11_33_33.m4a,1028483,1743602105000,/Volumes/samantha_wise/ai_claims_processing_final/audio_recordings/raw_recordings/pxvlh18a_AGT001_2025-02-11 11_33_33.m4a
dbfs:/Volumes/samantha_wise/ai_claims_processing_final/audio_recordings/raw_recordings/ulnocrnh_AGT005_2025-02-04 05_42_51.m4a,ulnocrnh_AGT005_2025-02-04 05_42_51.m4a,1038857,1743602105000,/Volumes/samantha_wise/ai_claims_processing_final/audio_recordings/raw_recordings/ulnocrnh_AGT005_2025-02-04 05_42_51.m4a
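The `file_path` column in the output above is the `path` value with its `dbfs:` scheme removed, which is what `substring(path, 6, length(path))` does by skipping the first five characters. The following is a minimal pure-Python sketch of the same transformation, useful for checking the logic outside Spark. The helper name `strip_dbfs_scheme` is hypothetical and not part of the notebook; unlike the SQL expression, it only strips the prefix when it is actually present.

```python
DBFS_SCHEME = "dbfs:"  # the 5-character prefix that the SQL substring skips


def strip_dbfs_scheme(path: str) -> str:
    """Return the path without a leading 'dbfs:' scheme, if present.

    Hypothetical helper for illustration only; the notebook itself uses
    the Spark SQL expression substring(path, 6, length(path)).
    """
    if path.startswith(DBFS_SCHEME):
        return path[len(DBFS_SCHEME):]
    return path


# Sample path taken from the first row of the output table.
sample = (
    "dbfs:/Volumes/samantha_wise/ai_claims_processing_final/"
    "audio_recordings/raw_recordings/5e7e3k53_AGT002_2025-01-15 13_35_10.m4a"
)
print(strip_dbfs_scheme(sample))
```

Checking for the prefix before slicing is a defensive choice: a path that already starts with `/Volumes/` is returned unchanged instead of losing its first five characters.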


## ✅ Output
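- A Delta table: `{CATALOG}.{SCHEMA}.recordings_file_reference_bronze`, with one row per file and the columns `path`, `name`, `size`, `modificationTime`, and `file_path`. Because the write uses `overwrite` mode, each run replaces the table with the current directory listing.
- This table serves as the source of truth for all raw audio ingested by the pipeline.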
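
The `modificationTime` values in the sample output (for example `1743602105000`) appear to be milliseconds since the Unix epoch, as reported by the file listing. Assuming that interpretation holds, a short sketch for turning such a value into a readable UTC timestamp could look like this; the function name `modification_time_to_utc` is hypothetical and not part of the notebook.

```python
from datetime import datetime, timezone


def modification_time_to_utc(epoch_ms: int) -> datetime:
    """Convert epoch milliseconds (assumed unit) to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)


# Value taken from the modificationTime column of the sample output.
print(modification_time_to_utc(1743602105000).isoformat())
```

Keeping the result timezone-aware avoids ambiguity if the value is later compared with timestamps recorded in other time zones.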