
# Deep Dive into Delta Live Tables (DLT)

Thursday, March 13, 2025

[Meetup Announcement](https://www.meetup.com/warsaw-data-engineering/events/306608088/) ([Lu.ma](https://lu.ma/0ub5v16m))


# Agenda

Total meetup duration: **1h 15min**


ChatGPT answers:

> If you were to teach the internals of Delta Live Tables, what would it be? What topics do you consider important while presenting DLT?


> If I were teaching the internals of Delta Live Tables (DLT), I would break it down into key technical areas to help learners understand how DLT works under the hood. Here’s how I’d structure the session:


1. Hands-On & Advanced Use Cases
    * Live demo:
        * Building a real-time streaming pipeline in DLT.
        * Implementing incremental batch processing with APPLY CHANGES INTO.
    * Advanced topics:
        * Multi-hop architecture (Bronze → Silver → Gold).
        * Integrating DLT with external sources (Kafka, Event Hubs, CDC sources).
        * 🌟🌟 Using DLT with ML & Feature Engineering. 🌟🌟
1. Core concepts of Delta Live Tables (DLT)
    * Pipelines, DLT definitions (LIVE, STREAM, APPLY CHANGES INTO)
    * How does DLT compare to standard Delta Lake and Databricks Jobs?
1. How DLT executes workflows
    * Declarative vs. procedural data transformations.
    * How DLT automatically manages dependencies between tables.
    * The role of event-driven execution & orchestration.
1. DLT Engine & DAG (Directed Acyclic Graph) Execution
    * How Databricks runtime creates and optimizes DAGs.
    * Understanding pipeline DAG visualization.
    * How DLT optimizes transformations (batch & streaming).
1. Data Processing in DLT
    * Pipeline Modes: Triggered (Batch) vs. Continuous Processing (Streaming).
    * Stateful Processing & Change Data Capture (CDC):
        * APPLY CHANGES INTO (DLT-native CDC).
        * How schema evolution & enforcement work.
    * Data Quality & Expectations:
        * How expectations (CONSTRAINT and EXPECT statements) affect execution.
        * Auto-handling of bad records (quarantining & error policies).
1. Storage & Metadata Management
    * DLT and Delta Lake:
        * How DLT writes & manages Delta tables.
        * LIVE vs. TARGET table internals.
        * How DLT tracks transaction logs & versioning.
    * Auto-Scaling & Performance Optimization:
        * How DLT auto-scales clusters dynamically.
        * Optimized storage using Z-Ordering & Bloom Filters.
        * Caching mechanisms to boost performance.
1. DLT Operational Aspects
    * Monitoring & Debugging DLT Pipelines:
        * Understanding logs (event_log table).
        * Built-in observability dashboards.
        * How to debug pipeline failures effectively.
    * Security & Governance in DLT:
        * Unity Catalog integration (RBAC, lineage tracking).
        * Managing access controls & data protection.
    * Cost Optimization & Best Practices:
        * Managing DBU costs efficiently.
        * Using phased deployment for cost-effective runs.


# LinkedIn Poll

[Poll](https://www.linkedin.com/feed/update/urn:li:activity:7302063410647638016?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAAutA8BdJM8iCUgt2VrqrjL8sihttmw9FE)


# Event Question

What would you like to hear about during the meetup? Throw in an interesting idea for future editions 🙏

1. [Apache Paimon](https://paimon.apache.org/)
1. Kafka ingestion
1. [Liquid clustering](https://docs.databricks.com/aws/en/delta/clustering)
1. [Apache Iceberg](https://iceberg.apache.org/)

# 📢 News

Things worth watching out for...


## New members in Warsaw Data Engineering!

[You now have 578 members!](https://www.meetup.com/warsaw-data-engineering/)

1. I'm considering growing towards a data engineer role
1. Presentations about various tech
1. I know Jacek and I work with Spark, Scala, and Delta Lake


## New Versions

What has changed in the tooling space we keep an eye on since we last met?

* [Databricks CLI 0.243.0](https://github.com/databricks/cli/releases/tag/v0.243.0)
* [uv 0.6.6](https://github.com/astral-sh/uv/releases/tag/0.6.6)
* [dqx 0.2.0](https://github.com/databrickslabs/dqx/releases/tag/v0.2.0)


## DLT: Set the target catalog and schema

[Databricks CLI 0.243.0](https://github.com/databricks/cli/releases/tag/v0.243.0) added support for [Set the target catalog and schema](https://docs.databricks.com/aws/en/dlt/target-schema) (as [Use schema field for pipeline in builtin template #2347](https://github.com/databricks/cli/pull/2347)).


## DAB: Add support for schemas in deployment

[Databricks CLI 0.243.0](https://github.com/databricks/cli/releases/tag/v0.243.0) brought [Add support for schemas in deployment bind/unbind commands #2406](https://github.com/databricks/cli/pull/2406).


# Live Coding Session


## Step 1. Intro

[What is Delta Live Tables?](https://docs.databricks.com/aws/en/delta-live-tables/)

## Step 2. Create DLT Pipeline Project

Use Databricks Asset Bundles to scaffold the project from the `default-python` template.

<br>

```bash
databricks bundle init default-python --config-file dlt-config-file.json
```
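The `--config-file` option above expects a JSON file with answers to the template prompts, so the file has to exist before `bundle init` runs. The following is a minimal sketch of creating it, not the file used during the session. The parameter names (`project_name`, `include_notebook`, `include_dlt`, `include_python`) are assumptions based on the `default-python` template and may differ between CLI versions. The project name `dlt_demo` is also an assumption, inferred from the `dlt_demo_pipeline` key used in the run step.

```bash
# Write the answers for the default-python template prompts.
# NOTE: parameter names and values are assumptions; compare them with the
# prompts shown by an interactive `databricks bundle init default-python`.
cat > dlt-config-file.json <<'EOF'
{
  "project_name": "dlt_demo",
  "include_notebook": "no",
  "include_dlt": "yes",
  "include_python": "no"
}
EOF
```

Any parameter missing from the file should be asked for interactively, so a partial file is still usable.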


## Step 3. Deploy

<br>

```bash
databricks bundle deploy
```
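It can help to validate the bundle before deploying it, so that configuration errors surface before anything is uploaded to the workspace. A minimal sketch follows, wrapped in a hypothetical helper function `deploy_bundle`. The `dev` target name is an assumption based on the targets the `default-python` template usually generates.

```bash
# deploy_bundle is a hypothetical helper, not part of the Databricks CLI.
# It validates the bundle for the given target and deploys only if
# validation succeeds.
deploy_bundle() {
  local target="${1:-dev}"  # assumption: the bundle defines a "dev" target
  databricks bundle validate --target "$target" \
    && databricks bundle deploy --target "$target"
}
```

Call it as `deploy_bundle` for the default target or `deploy_bundle prod` for another one.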


## Step 4. Run

<br>

```bash
databricks bundle run dlt_demo_pipeline
```
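Besides a regular update, `databricks bundle run` accepts pipeline-specific flags that are handy during a demo. The sketch below wraps two of them in hypothetical helper functions: `--validate-only` checks the pipeline graph without processing data, and `--full-refresh-all` resets and recomputes all tables. The default key `dlt_demo_pipeline` is taken from the command above; the function names are illustrative only.

```bash
# Hypothetical helpers around `databricks bundle run`; the function names
# are illustrative and not part of the Databricks CLI.

# Check that the pipeline graph is correct without processing any data.
validate_pipeline() {
  databricks bundle run --validate-only "${1:-dlt_demo_pipeline}"
}

# Reset all tables in the pipeline and recompute them from scratch.
full_refresh_pipeline() {
  databricks bundle run --full-refresh-all "${1:-dlt_demo_pipeline}"
}
```

A full refresh drops existing table state, so it is better suited to development targets than to production ones.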


# 💡 Ideas for Future Events


1. Expectations
1. `SHOW TABLES DROPPED`
1. Full scan on PostgreSQL with DLT?
1. [Delta Live Tables](https://docs.databricks.com/en/delta-live-tables/index.html) with uv and pydantic
1. Explore more [Pydantic](https://docs.pydantic.dev/latest/) features
1. Create a new DAB template with `uv` as the project management tool (based on `default-python` template). Start from `databricks bundle init --help`.


## 🚀 MLflow and uv

‼️ Highly unlikely that we will cover this topic at this meetup ‼️

1. [Faster Model Validation with uv Package Manager](https://mlflow.org/releases/2.20.0) in MLflow 2.20.0 🥳
    * [Support uv in `mlflow.models.predict`](https://github.com/mlflow/mlflow/pull/i3824) for faster model environment validation
    * [mlflow.models.predict](https://mlflow.org/docs/latest/api_reference/python_api/mlflow.models.html#mlflow.models.predict)
    * [MLflow Models](https://mlflow.org/docs/latest/model)
1. [[FR] Support uv package installer when inferring requirements (i.e. when logging model) #12478](https://github.com/mlflow/mlflow/issues/12478) - a conversation about how uv could be used with and outside MLflow
