This notebook sets up a validation and testing suite for the incremental ingestion of transaction items (Chunk 2).

It uses Python's unittest framework to verify that the transformation logic used in the Delta Live Tables (DLT) pipeline is correct and that the source data Volume is accessible from the cluster.

##Why Incremental Loading for Transaction Items?

We chose incremental loading for the Transaction Items dataset (Chunk 2) for the following reasons:

- **High Data Volume:** Transaction items represent the largest dataset in the pipeline, containing millions of rows that describe individual product lines for every purchase.
 
- **Efficiency:** Unlike a full refresh, incremental loading only processes new or changed data since the last ingestion, significantly reducing cluster compute costs and processing time.

- **Data Freshness:** This method allows for frequent, low-latency updates, ensuring that downstream analytics reflect the latest sales data without the overhead of re-reading the entire historical dataset.
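The efficiency argument depends on remembering what has already been ingested. The sketch below is a framework-free illustration of that idea, not how DLT or Auto Loader is implemented internally. It keeps a set of processed file names as a stand-in for a streaming checkpoint and, on each run, picks up only files it has not seen before. The function names and file names are hypothetical, and this simplified version detects only new file names, not modified files.

```python
import os
import tempfile


def discover_new_files(source_dir, processed):
    """Return files in source_dir whose names are not yet in `processed`.

    `processed` plays the role of a checkpoint: it remembers which files
    were already ingested, so each run only sees files that arrived since.
    """
    current = sorted(
        name for name in os.listdir(source_dir)
        if os.path.isfile(os.path.join(source_dir, name))
    )
    return [name for name in current if name not in processed]


def run_incremental_load(source_dir, processed):
    """Process only the new files, then record them as processed."""
    new_files = discover_new_files(source_dir, processed)
    for name in new_files:
        # A real pipeline would read the file here and append its rows.
        processed.add(name)
    return new_files


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        checkpoint = set()
        for name in ("part-001.csv", "part-002.csv"):
            open(os.path.join(d, name), "w").close()
        print(run_incremental_load(d, checkpoint))  # ['part-001.csv', 'part-002.csv']
        open(os.path.join(d, "part-003.csv"), "w").close()
        print(run_incremental_load(d, checkpoint))  # ['part-003.csv']
```

The second run touches only the file that arrived after the first run, which is where the compute savings over a full refresh come from.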

##1. Core Transformation Logic

The business logic for preparing incremental data is isolated into a single function to ensure it can be validated independently of the DLT pipeline.

####Logic: 

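The transform_transaction_items function appends two audit metadata columns to each record: load_dt, the ingestion timestamp produced by current_timestamp(), and source, a constant tag set to "chunk2_csv".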

####Why this code: 

Isolating this logic lets us verify that the audit columns (load_dt and source) are added correctly before any data is written to the Bronze layer.

In [0]:
from pyspark.sql.functions import current_timestamp, lit

def transform_transaction_items(df):
    """
    Core transformation logic for Chunk 2.
    Adds load_dt and source metadata columns.
    """
    return (
        df.withColumn("load_dt", current_timestamp())
          .withColumn("source", lit("chunk2_csv"))
    )

##2. Unit and Integration Test Suite

This section defines the test cases required to ensure the stability of the incremental ingestion process.

####Logic:

- **Unit Test (test_transformation_logic):** Uses a one-row mock DataFrame to verify that the transformation function adds the load_dt and source columns and sets source to "chunk2_csv".

- **Integration Test (test_integration_path_check):** Lists the Unity Catalog Volume path with dbutils.fs.ls to confirm that the source location exists and can be read from the compute cluster.

####Why this code: 

These tests act as a "gatekeeper" for the DLT pipeline. By checking the logic and path accessibility up front, we catch problems before they surface as pipeline failures, which are often harder to debug during live streaming runs.

In [0]:
import unittest
import io
from unittest import TextTestRunner

class DLTIncrementalTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Access the existing SparkSession
        cls.spark = spark

    def test_transformation_logic(self):
        """Unit Test: Verify metadata columns are added correctly"""
        # Create mock input data
        input_data = [("TXN123", "ITEM456")]
        input_df = self.spark.createDataFrame(input_data, ["transaction_id", "item_id"])
        
        # Apply transformation
        output_df = transform_transaction_items(input_df)
        
        # Validate columns and content
        self.assertIn("load_dt", output_df.columns)
        self.assertIn("source", output_df.columns)
        self.assertEqual(output_df.collect()[0]["source"], "chunk2_csv")

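    def test_integration_path_check(self):
        """Integration Test: Verify the source Volume path can be listed"""
        source_path = "/Volumes/vstone-catalog/vstone_schema/chunked_data/chunk2/transaction_items/"
        try:
            # dbutils.fs.ls raises if the path does not exist or is not readable
            dbutils.fs.ls(source_path)
            path_exists = True
        except Exception as e:
            # Catch Exception (not a bare except) so KeyboardInterrupt/SystemExit still propagate
            print(f"Integration Check - Could not list {source_path}: {e}")
            path_exists = False
        
        print(f"Integration Check - Path {source_path} accessible: {path_exists}")
        self.assertTrue(path_exists, f"Source path is not accessible: {source_path}")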

# Initialize the suite
suite = unittest.TestLoader().loadTestsFromTestCase(DLTIncrementalTest)
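The integration check calls dbutils directly, so its own pass/fail logic can only be exercised on a cluster. One possible refinement, sketched here under that assumption, is to inject the listing function: on Databricks you would pass dbutils.fs.ls, and in a plain unit test you pass a unittest.mock.Mock. The helper path_is_accessible and the PathCheckTest class are hypothetical, and the Volume paths in the tests are placeholders.

```python
import unittest
from unittest import mock


def path_is_accessible(list_fn, path):
    """Return True if list_fn(path) succeeds, False if it raises.

    list_fn is injected (on Databricks, dbutils.fs.ls) so this check
    can be unit-tested without a cluster or a real Volume.
    """
    try:
        list_fn(path)
        return True
    except Exception:
        # Catch Exception rather than using a bare except, so that
        # KeyboardInterrupt and SystemExit still propagate.
        return False


class PathCheckTest(unittest.TestCase):
    def test_listing_succeeds(self):
        fake_ls = mock.Mock(return_value=["part-001.csv"])
        self.assertTrue(path_is_accessible(fake_ls, "/Volumes/cat/schema/vol/"))
        fake_ls.assert_called_once_with("/Volumes/cat/schema/vol/")

    def test_listing_raises(self):
        fake_ls = mock.Mock(side_effect=FileNotFoundError("path does not exist"))
        self.assertFalse(path_is_accessible(fake_ls, "/Volumes/cat/schema/missing/"))


if __name__ == "__main__":
    # Use a separately named suite so the notebook's `suite` variable is not overwritten
    path_suite = unittest.TestLoader().loadTestsFromTestCase(PathCheckTest)
    unittest.TextTestRunner(verbosity=2).run(path_suite)
```

With this split, the notebook's integration test would reduce to self.assertTrue(path_is_accessible(dbutils.fs.ls, source_path)), while the fallback behaviour is covered by tests that run anywhere.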

##3. Reporting and Results

The final block executes the tests and generates a formatted DLT Ingestion Pipeline Test Report.

####Logic:

The runner writes its verbose output to an in-memory io.StringIO stream. The script then prints that output, followed by a summary of the number of tests run, successes, failures, and errors.

####Why this code:

This report provides a clear "Go/No-Go" signal for deploying the incremental ingestion job: a run with no failures or errors is a "Go"; anything else is a "No-Go".

In [0]:
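# Rebuild the suite so this cell can be re-run on its own:
# a TestSuite releases its tests after running them.
suite = unittest.TestLoader().loadTestsFromTestCase(DLTIncrementalTest)

# Capture the verbose runner output in an in-memory stream
stream = io.StringIO()
runner = TextTestRunner(stream=stream, verbosity=2)
result = runner.run(suite)

# Generate Final Report
print("●●● DLT INGESTION PIPELINE TEST REPORT ●●●")
print("-" * 45)
print(stream.getvalue())
print("-" * 45)
print(f"TOTAL TESTS RUN: {result.testsRun}")
print(f"SUCCESSES: {result.testsRun - len(result.failures) - len(result.errors) - len(result.skipped)}")
print(f"FAILURES: {len(result.failures)}")
print(f"ERRORS: {len(result.errors)}")
print(f"SKIPPED: {len(result.skipped)}")
print("-" * 45)

if not result.wasSuccessful():
    # The runner catches test exceptions, so %debug has no traceback to inspect here.
    # Full tracebacks are already in the report above; to debug interactively,
    # run a fresh suite with .debug() so the exception propagates, then use %debug.
    print("\n[DEBUG] Failure detected. See the tracebacks above, or run "
          "unittest.TestLoader().loadTestsFromTestCase(DLTIncrementalTest).debug() "
          "and then %debug to inspect the failing test.")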

**Report Summary**
- Total Tests Run: 2

- Successes: 2

- Failures: 0

- Path Accessibility: Verified (True)