# Exploring Delta Live Tables (DLT)
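- DLT is a Databricks framework for building production-grade data pipelines declaratively.
- Provides a UI to run and monitor pipelines, including a graph of dataset dependencies and data quality metrics.
- Manages the underlying infrastructure (compute, orchestration, error recovery), so users can focus on transformation logic rather than operations.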
- Built on Apache Spark and designed to be declarative, reliable, and scalable.

### Benefits of Delta Live Tables
- **Simplified pipeline construction:**

  You declare what each dataset should contain; DLT handles orchestration, checkpointing, and retries, which reduces boilerplate and development time.

- **Automatic table dependency management:**

  Infers the dependencies between datasets from their queries, builds a directed acyclic graph (DAG), and updates datasets in dependency order.

- **Built-in data quality control:**

  Expectations validate records against conditions and define what happens to rows that fail them.
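
The dependency handling described above can be pictured as a topological sort. The sketch below is a conceptual model in plain Python, not DLT's implementation. The dataset names are hypothetical, and the edges are written out by hand, whereas DLT infers them from each query's inputs. It relies only on the standard library's `graphlib.TopologicalSorter`.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each dataset maps to the set of datasets its query reads.
dependencies: dict[str, set[str]] = {
    "bronze_orders": set(),                          # ingested from a raw source
    "customers": set(),                              # ingested from a raw source
    "silver_orders": {"bronze_orders"},              # cleaned orders
    "gold_revenue": {"silver_orders", "customers"},  # aggregate over both
}


def update_order(deps: dict[str, set[str]]) -> list[str]:
    """Return an order in which every dataset comes after all of its inputs."""
    # static_order() raises CycleError if the graph contains a cycle,
    # which is why a pipeline's dependency graph must be acyclic.
    return list(TopologicalSorter(deps).static_order())


print(update_order(dependencies))

try:
    update_order({"a": {"b"}, "b": {"a"}})
except CycleError as err:
    print("cycle detected:", err.args[1])
```

Independent datasets such as `bronze_orders` and `customers` have no ordering constraint between them, which is what lets a scheduler run them in parallel.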

### DLT vs Spark Structured Streaming

| Feature                       | **Delta Live Tables (DLT)**                               | **Spark Structured Streaming**                                         |
| ----------------------------- | --------------------------------------------------------- | ---------------------------------------------------------------------- |
| **Framework Type**            | Declarative ETL framework built on Apache Spark           | Imperative stream processing API                                       |
| **Pipeline Definition**       | SQL or Python with `@dlt.table` decorators                | PySpark DataFrame API using `readStream` / `writeStream`               |
| **Checkpointing**             | Handled automatically by DLT                              | Must be configured manually via `.option("checkpointLocation", "...")` |
| **Data Quality Control**      | Expectations via `CONSTRAINT ... EXPECT` (SQL) or `@dlt.expect` (Python) | Basic constraints via Delta Lake (e.g., NOT NULL, CHECK)               |
| **SQL Support for Streaming** | Streaming tables defined in SQL (`CREATE OR REFRESH STREAMING TABLE`) | Cannot create streaming tables with SQL alone                          |
| **Table Management**          | Manages table dependencies using DAGs                     | User must manage dependencies manually                                 |
| **Error Handling**            | Declarative `ON VIOLATION` actions: DROP ROW, FAIL UPDATE | Manual handling via custom logic or exception capture                  |
| **Observability & Logging**   | Built-in metrics and monitoring via DLT UI                | Spark UI and `StreamingQueryListener` for query progress; data quality metrics require custom code |
| **Development Complexity**    | Simplified with less boilerplate code                     | More complex and code-heavy                                            |


### DLT Object Types
#### Streaming Tables
- **Purpose:** Process data incrementally (new records only).
- **Use Case:** Suitable for real-time or near-real-time data ingestion and processing.
- **Behavior:**
  - Each pipeline run processes only the new data added since the last run.
  - Requires streaming sources, such as Auto Loader or append-only Delta tables.
- **Processing Pattern:** Optimized for continuous updates from live data streams.

  **Example syntax:**

  `CREATE OR REFRESH STREAMING TABLE table_name AS`<br>
  `SELECT * FROM STREAM(<streaming source>)`
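
The incremental behavior can be illustrated with a toy model. In the sketch below, a checkpoint records how far into an append-only source the table has already read. It is plain Python with hypothetical names (`StreamingTableModel`, `update`), not the DLT API. Real pipelines track richer offsets in their checkpoints, but the principle is the same.

```python
from dataclasses import dataclass, field


@dataclass
class StreamingTableModel:
    """Toy model of a streaming table reading an append-only source.

    `checkpoint` counts how many source records have already been processed.
    """

    checkpoint: int = 0
    rows: list[dict] = field(default_factory=list)

    def update(self, source: list[dict]) -> int:
        """Process records appended since the last update; return how many."""
        new_records = source[self.checkpoint:]
        self.rows.extend(new_records)
        self.checkpoint = len(source)
        return len(new_records)


source = [{"id": 1}, {"id": 2}]
table = StreamingTableModel()
print(table.update(source))  # 2: the first update reads everything
source.append({"id": 3})
print(table.update(source))  # 1: only the newly appended record
```

Because the model only reads past its checkpoint, editing `source[0]` in place would never reach the table. This is why streaming tables expect append-only sources.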


---

#### Materialized Views (formerly known as Live Tables)
- **Purpose:** Process data in batch, recomputing results on each refresh.
- **Use Case:** Ideal when input data includes updates, deletes, or overwrites.
- **Behavior:**
  - On each update, the query result is recomputed so the view reflects the current state of its sources.
  - Can work with both streaming and batch data sources.
- **Processing Pattern:** Ensures the target always reflects the latest complete view of source data.

  **Example syntax:**

  `CREATE OR REFRESH MATERIALIZED VIEW mview AS`<br>
  `SELECT * FROM <batch source>`
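
In contrast to the checkpoint-based streaming model, a materialized view can be thought of as a function that is recomputed from its source on every refresh. As a result, updates and deletes in the source are reflected automatically. The sketch below is a plain-Python illustration with a hypothetical `orders` source and filter. It does not show how Databricks actually executes materialized views.

```python
def refresh_materialized_view(orders: dict[int, dict]) -> list[dict]:
    """Toy model of a materialized view: recompute the whole result from the source.

    The query (hypothetical) selects shipped orders, sorted by order id.
    """
    return [orders[k] for k in sorted(orders) if orders[k]["status"] == "shipped"]


orders = {
    1: {"id": 1, "status": "shipped"},
    2: {"id": 2, "status": "pending"},
}
print([r["id"] for r in refresh_materialized_view(orders)])  # [1]

orders[2]["status"] = "shipped"  # an update in the source
del orders[1]                    # a delete in the source
print([r["id"] for r in refresh_materialized_view(orders)])  # [2]
```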

---

#### Live Views
- **Purpose:** Used for temporary, intermediate transformations.
- **Use Case:** Ideal for data quality checks, filtering, or shaping data within a pipeline, where the output doesn’t need to be stored long-term.
- **Behavior:**
  - Exists only during the pipeline run.
  - Not persisted to the catalog.
  - Used as logical layers to break down complex transformations.

  **Example syntax:**

  `CREATE TEMPORARY LIVE VIEW my_temp_view AS <query>`
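
The scoping rule can be modeled in plain Python: the view is an intermediate value computed during a run, and only tables are published. Dataset names and logic here are hypothetical, and the returned dictionary merely stands in for the catalog.

```python
from collections import Counter


def run_pipeline(raw_events: list[dict]) -> dict[str, list[dict]]:
    """Toy pipeline: a temporary view feeds one persisted table.

    The returned dict stands in for the catalog. The view is only a local
    value, so it disappears when the run ends and is never published.
    """
    # Temporary view (hypothetical): keep events that have a user_id.
    valid_events = [e for e in raw_events if e.get("user_id") is not None]

    # Persisted table built on top of the view: events per user.
    counts = Counter(e["user_id"] for e in valid_events)
    return {
        "user_event_counts": [
            {"user_id": user, "events": n} for user, n in sorted(counts.items())
        ]
    }


catalog = run_pipeline([{"user_id": "a"}, {"user_id": None}, {"user_id": "a"}, {"user_id": "b"}])
print(catalog)
```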


### DLT Expectations for Data Quality
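Delta Live Tables lets you define data quality rules called expectations. In SQL, you write them as a `CONSTRAINT ... EXPECT` clause; in Python, you use the `@dlt.expect`, `@dlt.expect_or_drop`, and `@dlt.expect_or_fail` decorators. Each expectation checks every record against a condition before the record is written to the target dataset, and the pipeline records how many records passed or failed.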

**Defining Constraints**

Use `CONSTRAINT <name> EXPECT (<condition>)` to define a rule, where `<condition>` is a Boolean SQL expression evaluated for each record.

⚠️ **Handling Violations with the `ON VIOLATION` Clause**

- **DROP ROW:** Drops records that violate the condition before they are written; the number of dropped records is reported in the data quality metrics.

  `CONSTRAINT <constraint_name> EXPECT (<condition>) ON VIOLATION DROP ROW`

- **FAIL UPDATE:** Fails the pipeline run if any row violates the condition.

  `CONSTRAINT <constraint_name> EXPECT (<condition>) ON VIOLATION FAIL UPDATE`

- **Default:** If `ON VIOLATION` is omitted, violating records are still written to the target and the violations are reported in the data quality metrics; the pipeline keeps running.
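
The three behaviors can be summarized in a small plain-Python model. All names here are hypothetical, including `OnViolation`, `apply_expectation`, `UpdateFailed`, and the `valid_amount` rule. This is not the DLT API. In a real pipeline, the violation counts appear as data quality metrics in the pipeline UI and event log rather than as return values.

```python
from enum import Enum


class OnViolation(Enum):
    """Toy stand-ins for the three behaviors described above."""

    WARN = "warn"                # default: keep the row, count the violation
    DROP_ROW = "drop_row"        # ON VIOLATION DROP ROW
    FAIL_UPDATE = "fail_update"  # ON VIOLATION FAIL UPDATE


class UpdateFailed(Exception):
    """Raised when a FAIL UPDATE expectation is violated."""


def apply_expectation(rows, name, condition, action=OnViolation.WARN):
    """Check every row against `condition`; return (rows_to_write, violations)."""
    kept, violations = [], 0
    for row in rows:
        if condition(row):
            kept.append(row)
            continue
        violations += 1
        if action is OnViolation.FAIL_UPDATE:
            raise UpdateFailed(f"expectation {name!r} violated by {row}")
        if action is OnViolation.WARN:
            kept.append(row)  # invalid row is still written
        # DROP_ROW: the invalid row is not kept
    return kept, violations


def valid_amount(row):
    """Hypothetical rule, like EXPECT (amount >= 0)."""
    return row["amount"] >= 0


rows = [{"id": 1, "amount": 10}, {"id": 2, "amount": -5}]
print(apply_expectation(rows, "valid_amount", valid_amount))
print(apply_expectation(rows, "valid_amount", valid_amount, OnViolation.DROP_ROW))
```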