## 1. Delta Live Tables (DLT) Bronze Ingestion

This section defines the logic for ingesting raw transaction items into a bronze Delta table.

#### Logic:

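The code uses the @dlt.table decorator to define a managed table. It reads CSV files (with a header row) from the chunk2 path in a Unity Catalog volume and adds two audit columns, load_dt and source. Because no schema is supplied and schema inference is not enabled, every data column is read as a string.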

#### Why this code:

Using Delta Live Tables simplifies ingestion: the pipeline manages table creation, schema evolution, and data lineage automatically. Because the function uses a batch read (spark.read), each pipeline update re-reads the source directory, so files that arrived since the previous run are included in the transactions_items_bronze table after the next update.

```python
import dlt
from pyspark.sql.functions import current_timestamp, lit

# 1. Define the Bronze Table using DLT
# Logic: The @dlt.table decorator registers this function as a managed table in the pipeline.
# The 'comment' property provides built-in documentation for Unity Catalog discovery.
@dlt.table(
    name="transactions_items_bronze",
    comment="Bronze table for transaction items ingested from CSV chunk2"
)
def bronze_transaction_items():
    return (
        spark.read
        .option("header", "true")
        # Path Logic: Points to the chunked CSV data in the Unity Catalog Volume
        .csv("/Volumes/vstone-catalog/vstone_schema/chunked_data/chunk2/transaction_items/")
        # Audit Logic: 'load_dt' tracks ingestion time; 'source' identifies the data origin.
        .withColumn("load_dt", current_timestamp())
        .withColumn("source", lit("chunk2_csv"))
    )
```
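The definition above uses a batch read (spark.read), so the whole directory is read again on every pipeline update. If only newly arrived files should be processed, one possible alternative is a streaming read with Auto Loader (the cloudFiles source). The following is a minimal sketch under assumptions, not the notebook's actual method: the table name transactions_items_bronze_stream, the SOURCE_PATH constant, and the autoloader_options helper are hypothetical names introduced here for illustration. The import guard only exists so the file can also be loaded outside a Databricks pipeline, where the dlt module and the spark session are not available.

```python
# Sketch only: an incremental (streaming) variant of the bronze table.
# The table name, SOURCE_PATH and autoloader_options are hypothetical
# names for illustration; they do not appear in the original notebook.

SOURCE_PATH = "/Volumes/vstone-catalog/vstone_schema/chunked_data/chunk2/transaction_items/"


def autoloader_options(file_format="csv", header=True):
    """Build the reader options for Auto Loader (the cloudFiles source)."""
    return {
        "cloudFiles.format": file_format,
        "header": "true" if header else "false",
    }


try:
    import dlt  # provided by the Databricks pipeline runtime
    HAVE_DLT = hasattr(dlt, "table")
except ImportError:
    HAVE_DLT = False  # running outside a pipeline: skip the table definition

if HAVE_DLT:
    from pyspark.sql.functions import current_timestamp, lit

    @dlt.table(
        name="transactions_items_bronze_stream",
        comment="Streaming bronze table for transaction items (Auto Loader sketch)",
    )
    def bronze_transaction_items_stream():
        return (
            # 'spark' is the session that Databricks injects into the notebook.
            spark.readStream.format("cloudFiles")
            .options(**autoloader_options())
            .load(SOURCE_PATH)
            # Same audit columns as the batch version.
            .withColumn("load_dt", current_timestamp())
            .withColumn("source", lit("chunk2_csv"))
        )
```

With a streaming read, load_dt is evaluated per micro-batch, so it can distinguish files that arrived in different runs. Whether this is preferable to the batch definition depends on data volume and on whether full reprocessing on every update is acceptable.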

## 2. Key Ingestion Features


* Managed Table Definition: With dlt.table, Databricks handles the underlying Spark infrastructure and table maintenance automatically.

* Auditability: Adding load_dt (load timestamp) and source (origin tag) lets data engineers trace each row back to its ingestion batch.

* Discovery: The comment attribute makes the table's purpose visible to other users in Catalog Explorer (formerly Data Explorer).
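The auditability point above can be illustrated with a small helper that summarises the bronze table by its audit columns. This is a sketch under assumptions: the function name rows_per_ingestion_batch is hypothetical, and it expects a Spark DataFrame that carries the load_dt and source columns added during ingestion. It relies only on DataFrame methods, so it needs no imports of its own.

```python
def rows_per_ingestion_batch(df):
    """Count rows for each (source, load_dt) pair, oldest load first.

    df is expected to be a Spark DataFrame that carries the two audit
    columns added during ingestion: 'source' and 'load_dt'.
    """
    return (
        df.groupBy("source", "load_dt")
        .count()
        .orderBy("load_dt")
    )


# Usage inside a Databricks notebook. The table name must be qualified
# for whichever catalog and schema the pipeline publishes to:
# rows_per_ingestion_batch(spark.read.table("transactions_items_bronze")).show()
```

How many distinct load_dt values appear depends on how the table is refreshed. With the batch definition, an update recomputes the table, so it will typically show a single load timestamp.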