
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning">
</div>


# Data Quality Enforcement

This notebook allows you to programmatically generate and trigger an update of a DLT pipeline that consists of the following notebooks:

|DLT Pipeline|
|---|
|Auto Load to Bronze|
|Stream from Multiplex Bronze|
|[Quality Enforcement]($./Pipeline/SDLT 2.3.1 - Data Quality Enforcement)|

As we continue through the course, you can return to this notebook and use the provided methods to:
- Land a new batch of data
- Trigger a pipeline update
- Process all remaining data

**NOTE:** Re-running the entire notebook will delete the underlying data files for both the source data and your DLT Pipeline.

## Run Setup
Run the following cell to reset and configure your working environment for this course.

In [0]:
%run ./Includes/Classroom-Setup-03


## Generate DLT Pipeline
Run the cell below to auto-generate your DLT pipeline using the provided configuration values.
Once the cell completes, navigate to **Delta Live Tables** under the **Workflows** tab to view the generated pipeline.


In [0]:
DA.generate_pipeline(
    pipeline_name=DA.generate_pipeline_name(),
    use_schema=DA.schema_name,
    notebooks_folder='Pipeline',
    pipeline_notebooks=[
        'SDLT 2.1.1 - Auto Load to Bronze',
        'SDLT 2.2.1 - Stream from Multiplex Bronze',
        'SDLT 2.3.1 - Data Quality Enforcement',
    ],
    use_configuration={"source": DA.paths.stream_source, "lookup_db": DA.lookup_db},
)
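
The internals of **`DA.generate_pipeline`** are not shown in this notebook. As a rough sketch, a helper like it might assemble a settings dictionary shaped like a Pipelines API request body. The function name `build_pipeline_settings`, the example paths, and the choice of triggered mode are assumptions made for illustration only.

```python
from posixpath import join


def build_pipeline_settings(name, schema, notebooks_root, notebooks_folder,
                            pipeline_notebooks, configuration):
    """Assemble a settings dict shaped like a Pipelines API request body.

    Hypothetical helper: the course's DA.generate_pipeline may work differently.
    """
    libraries = [
        {"notebook": {"path": join(notebooks_root, notebooks_folder, nb)}}
        for nb in pipeline_notebooks
    ]
    return {
        "name": name,
        "target": schema,                      # schema the pipeline publishes to
        "libraries": libraries,                # one entry per source notebook
        "configuration": dict(configuration),  # key/value pairs for the notebooks
        "continuous": False,                   # assumption: triggered mode
    }


# All values below are placeholders, not the course's real paths or names.
settings = build_pipeline_settings(
    name="example-pipeline",
    schema="example_schema",
    notebooks_root="/Repos/example/course",
    notebooks_folder="Pipeline",
    pipeline_notebooks=[
        "SDLT 2.1.1 - Auto Load to Bronze",
        "SDLT 2.2.1 - Stream from Multiplex Bronze",
        "SDLT 2.3.1 - Data Quality Enforcement",
    ],
    configuration={"source": "/tmp/example/stream_source",
                   "lookup_db": "example_lookup"},
)

for library in settings["libraries"]:
    print(library["notebook"]["path"])
```

Inside a pipeline notebook, configuration values passed this way are typically read with `spark.conf.get("source")`.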

## Trigger Pipeline Run

With the pipeline created, you can now run it. The initial run takes several minutes while a cluster is provisioned; subsequent runs are appreciably quicker.

Explore the DAG: as the pipeline completes, the execution flow is graphed. Each triggered update processes all newly arrived data through your pipeline. Metrics are always reported for the current run.

In [0]:
DA.start_pipeline()

##### 📌 NOTE: Please navigate to the Pipelines tab and ensure the pipeline has completed successfully before running further cells.
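
If you would rather wait in code than watch the UI, the check above can be sketched as a polling loop. This is a minimal sketch, not part of the course helpers: `wait_for_update` and the `get_state` callable are hypothetical, and the terminal state names are assumed to match what the pipeline updates API reports.

```python
import time

# Assumed terminal states of a pipeline update.
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELED"}


def wait_for_update(get_state, poll_seconds=15.0, timeout_seconds=1800.0,
                    sleep=time.sleep):
    """Poll get_state() until it returns a terminal state or the timeout elapses.

    get_state is a hypothetical zero-argument callable that returns the
    current update state as a string.
    """
    waited = 0.0
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if waited >= timeout_seconds:
            raise TimeoutError(f"Update still {state!r} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds


# Demonstration with canned states instead of a real workspace call.
fake_states = iter(["INITIALIZING", "RUNNING", "COMPLETED"])
final_state = wait_for_update(lambda: next(fake_states),
                              poll_seconds=1, sleep=lambda s: None)
print(final_state)
```

In a real workspace, `get_state` would wrap whatever call you use to look up the update status. Continue with the remaining cells only when the returned state is `COMPLETED`.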

## Land New Data

Run the cell below to land more data in the source directory, then manually trigger another pipeline update using the UI or the cell above.

In [0]:
DA.daily_stream.load()

## Process All Remaining Data
To load all remaining batches of data to the source directory in one call, use the same **`load`** method as above with the **`continuous`** parameter set to **`True`**.

Trigger another update to process the remaining data.

In [0]:
DA.daily_stream.load(continuous=True)  # Load all remaining batches of data
DA.start_pipeline()  # Trigger another pipeline update
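
To make the difference between the two calls concrete, here is a toy model of the behavior described above: a plain `load()` lands one batch, while `load(continuous=True)` lands everything that is left. `BatchLoader` is a hypothetical stand-in that only tracks batch numbers. It is not the implementation of `DA.daily_stream`, and the batch count is arbitrary.

```python
class BatchLoader:
    """Hypothetical stand-in for DA.daily_stream; it only tracks batch numbers."""

    def __init__(self, total_batches):
        self.total_batches = total_batches
        self.next_batch = 1
        self.landed = []

    def load(self, continuous=False):
        """Land one batch, or every remaining batch when continuous=True.

        Returns the number of batches landed by this call.
        """
        if self.next_batch > self.total_batches:
            return 0  # nothing left to land
        last = self.total_batches if continuous else self.next_batch
        new_batches = list(range(self.next_batch, last + 1))
        self.landed.extend(new_batches)
        self.next_batch = last + 1
        return len(new_batches)


loader = BatchLoader(total_batches=5)
loader.load()                 # lands batch 1
loader.load()                 # lands batch 2
loader.load(continuous=True)  # lands batches 3, 4 and 5
print(loader.landed)
```

In this model, once every batch has landed, further calls land nothing.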


&copy; 2025 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the 
<a href="https://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/><a href="https://databricks.com/privacy-policy">Privacy Policy</a> | 
<a href="https://databricks.com/terms-of-use">Terms of Use</a> | 
<a href="https://help.databricks.com/">Support</a>