# Data Quality Check — Sales Table

This notebook runs the Data Quality framework against the `sales` table as part of the Lakehouse Expansion engineering pillar. It loads the table, evaluates the expectations defined in `sales_expectations.json` with the shared `dq_runner.py` module, and produces a structured report that downstream monitoring, alerting, or pipeline enforcement can consume.
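
`dq_runner.py` itself is not shown here. The notebook relies only on `run_dq(table_df=..., expectations_path=...)` returning one dict per check, each with a boolean `passed` key. The following is a hypothetical stand-in for that contract, not the shared module. The JSON layout (a list of objects with `name`, `column`, `check`, and optional `min`/`max`) and the three check types are assumptions made for illustration.

```python
import json
from typing import Any

import pandas as pd


def run_dq(table_df: pd.DataFrame, expectations_path: str) -> list[dict[str, Any]]:
    """Hypothetical runner: evaluate each expectation and return one result dict per check."""
    with open(expectations_path, encoding="utf-8") as f:
        expectations = json.load(f)  # assumed layout: a JSON list of expectation objects

    results = []
    for exp in expectations:
        column, check = exp["column"], exp["check"]
        name = exp.get("name", f"{check}:{column}")

        if column not in table_df.columns:
            # A missing column always fails, even on an empty table
            results.append({"name": name, "column": column, "check": check,
                            "failing_rows": len(table_df), "passed": False})
            continue

        if check == "not_null":
            failing = int(table_df[column].isna().sum())
        elif check == "unique":
            # keep=False flags every member of a duplicate group, not just the repeats
            failing = int(table_df[column].duplicated(keep=False).sum())
        elif check == "between":
            values = table_df[column]
            low = exp.get("min", float("-inf"))
            high = exp.get("max", float("inf"))
            # Nulls are left to the not_null check, so they do not fail a range check
            in_range = values.between(low, high) | values.isna()
            failing = int((~in_range).sum())
        else:
            raise ValueError(f"Unknown check type {check!r} in expectation {name!r}")

        results.append({"name": name, "column": column, "check": check,
                        "failing_rows": failing, "passed": failing == 0})
    return results
```

Returning plain dicts keeps `pd.DataFrame(results)` in the notebook working unchanged. It also lets the failure step filter on `r["passed"]`.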

The workflow is intentionally consolidated into a single code cell. Every step (loading data, importing the runner, executing checks, formatting results, saving the report, and enforcing failure rules) therefore runs in the same fixed order on every execution, which keeps the notebook short and easy to follow.

This notebook is designed to be pipeline‑ready, environment‑agnostic, and easily extended to additional tables as the Lakehouse Expansion continues.
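
The notebook's paths follow a per-table pattern (`expectations/sales_expectations.json`, `dq_report_sales.csv`). Extending the check to another table could therefore be as small as parameterizing the table name. This is a minimal sketch that assumes every new table follows the same naming pattern under the same data-quality folder. `dq_paths` and `run_table_dq` are hypothetical helpers. `run_dq` is passed in as an argument so the wrapper can be tested without the shared module.

```python
import os
from typing import Any, Callable

import pandas as pd

# Assumed base folder, copied from the paths used in this notebook
DQ_DIR = "/lakehouse/default/Files/15-lakehouse-expansion/features/data-quality"


def dq_paths(table: str, base_dir: str = DQ_DIR) -> tuple[str, str]:
    """Derive expectation and report paths from the naming pattern used for `sales`."""
    expectations_path = os.path.join(base_dir, "expectations", f"{table}_expectations.json")
    report_path = os.path.join(base_dir, f"dq_report_{table}.csv")
    return expectations_path, report_path


def run_table_dq(
    table_df: pd.DataFrame,
    table: str,
    run_dq: Callable[..., list[dict[str, Any]]],
    base_dir: str = DQ_DIR,
    fail_on_error: bool = True,
) -> pd.DataFrame:
    """Hypothetical wrapper: run checks for one table, save the report, optionally fail."""
    expectations_path, report_path = dq_paths(table, base_dir)
    results = run_dq(table_df=table_df, expectations_path=expectations_path)
    results_df = pd.DataFrame(results)
    results_df.to_csv(report_path, index=False)

    failed = [r for r in results if not r["passed"]]
    if failed and fail_on_error:
        raise RuntimeError(
            f"{len(failed)} of {len(results)} Data Quality checks failed for '{table}'."
        )
    return results_df
```

For a hypothetical `orders` table, the call would look like `run_table_dq(spark.read.table("lakehouse.orders").toPandas(), "orders", run_dq)`. Here `run_dq` is the function the notebook imports from `dq_runner`.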

In [None]:
# Imports and shared paths (defined once so every step uses the same locations)
import sys

import pandas as pd

DQ_DIR = "/lakehouse/default/Files/15-lakehouse-expansion/features/data-quality"
EXPECTATIONS_PATH = f"{DQ_DIR}/expectations/sales_expectations.json"
REPORT_PATH = f"{DQ_DIR}/dq_report_sales.csv"

# Step 1: Load the sales table (toPandas() collects the whole table onto the driver)
df = spark.read.table("lakehouse.sales").toPandas()

# Step 2: Make dq_runner importable; the guard avoids duplicate sys.path entries on re-runs
if DQ_DIR not in sys.path:
    sys.path.append(DQ_DIR)
from dq_runner import run_dq

# Step 3: Run data quality checks against the expectations file
results = run_dq(
    table_df=df,
    expectations_path=EXPECTATIONS_PATH,
)

# Step 4: Convert results (one dict per check) to a DataFrame
results_df = pd.DataFrame(results)

# Step 5: Display results; a bare expression mid-cell is not rendered, so call display()
display(results_df)

# Step 6: Optional: save the report
results_df.to_csv(REPORT_PATH, index=False)

# Step 7: Optional: fail the notebook run if any check failed
failed = [r for r in results if not r["passed"]]
if failed:
    raise RuntimeError(
        f"{len(failed)} of {len(results)} Data Quality checks failed."
    )
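
The CSV written by the notebook is overwritten on each run, and it does not record which table or run it came from. If the report is meant to feed downstream monitoring, one option is to tag each result row with the table name and a UTC run timestamp, then append the rows to a history file. This is a sketch, not part of the framework shown here. `build_report` and `append_report` are hypothetical helpers, and the history-file layout is an assumption.

```python
import os
from datetime import datetime, timezone
from typing import Any, Optional

import pandas as pd


def build_report(
    results: list[dict[str, Any]], table: str, run_ts: Optional[datetime] = None
) -> pd.DataFrame:
    """Hypothetical helper: prefix each result row with the table name and run timestamp."""
    run_ts = run_ts or datetime.now(timezone.utc)
    report = pd.DataFrame(results)
    report.insert(0, "table", table)
    report.insert(1, "run_ts", run_ts.isoformat())
    return report


def append_report(report: pd.DataFrame, path: str) -> None:
    """Append to a CSV history file, writing the header only when the file is new."""
    report.to_csv(path, mode="a", header=not os.path.exists(path), index=False)
```

In the notebook cell, this could sit alongside the plain `to_csv` call. For example: `append_report(build_report(results, "sales"), "/lakehouse/default/Files/15-lakehouse-expansion/features/data-quality/dq_history_sales.csv")`, where `dq_history_sales.csv` is a hypothetical file name.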