# 3. Querying and Transforming Your Data (ETL)

Once your data is in the Lakehouse, you need to query it and build transformation pipelines. Databricks offers two primary methods for orchestrating this work:

1. **Databricks Jobs:** An imperative approach, ideal for simple, scheduled tasks.
2. **Delta Live Tables:** A declarative approach, ideal for building robust, observable data pipelines.

## Writing Queries in the Databricks SQL Editor

The **SQL Editor** is your home for data exploration. You can write standard SQL to query any table you have access to, and the underlying Serverless SQL Warehouse executes it without you having to manage compute. You can also browse your query history and open a query's profile to troubleshoot performance bottlenecks.

### To Try It Yourself:
1. In the left navigation bar, select **SQL Editor**.
2. Ensure a SQL Warehouse is selected in the top-right.
3. Run the query below to explore the clean, aggregated data produced by one of the demo pipelines.

In [0]:
%sql
-- This is the final 'gold' table from the bike pipeline demo, ready for BI.
SELECT * FROM main.dbdemos_pipeline_bike.rides;

## Orchestration Method 1: Databricks Jobs (The Imperative Approach)

A **Job** is the simplest way to run a notebook, script, or SQL query on a schedule. This is an *imperative* approach, where you define the steps to be executed in order. Think of it as a powerful "cron job" for Databricks, perfect for straightforward, routine tasks.

### To See How it Works:

You can schedule any notebook to run as a Job by clicking the **Schedule** button in the top-right corner of the notebook UI. This will take you to the Jobs UI where you can define the schedule and compute.
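A schedule set up in the UI can also be described as data. The sketch below builds, in plain Python, a request body in the shape accepted by the Jobs REST API (`POST /api/2.1/jobs/create`). The job name, notebook path, and cron expression are hypothetical placeholders, compute settings are left out, and nothing is sent to a workspace. Treat it as an illustration of what a scheduled job definition contains, not as a complete configuration.

```python
import json


def build_scheduled_notebook_job(name, notebook_path, cron, timezone="UTC"):
    """Return a Jobs API-style payload that runs one notebook on a schedule."""
    return {
        "name": name,
        # An imperative job is an explicit list of tasks; here there is just one.
        "tasks": [
            {
                "task_key": "run_notebook",
                "notebook_task": {"notebook_path": notebook_path},
            }
        ],
        "schedule": {
            "quartz_cron_expression": cron,
            "timezone_id": timezone,
            "pause_status": "UNPAUSED",
        },
    }


# Hypothetical values, for illustration only.
job_payload = build_scheduled_notebook_job(
    name="daily-bike-report",
    notebook_path="/Workspace/Users/someone@example.com/bike_report",
    cron="0 0 6 * * ?",  # Quartz syntax: every day at 06:00
)

print(json.dumps(job_payload, indent=2))
```

Keeping the definition as a dictionary makes it easy to review or version before it is submitted with whichever client you prefer.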

### Databricks Workflows: Build, Run, and Manage ETL, ML, and Analytics (Video)

[![Video Thumbnail](https://img.youtube.com/vi/MWGmsnnaGLY/0.jpg)](https://www.youtube.com/watch?v=MWGmsnnaGLY "Databricks Workflows: Build, Run, and Manage ETL, ML, and Analytics")

📖 **Resource:** [Databricks Workflows Quickstart](https://docs.databricks.com/en/workflows/jobs/jobs-quickstart.html)

## Orchestration Method 2: Delta Live Tables (The Declarative Approach)

For building robust data pipelines, **Delta Live Tables (DLT)** is the modern, recommended approach. It's a *declarative* framework: instead of defining the *steps* of your pipeline, you simply define the *end state* of your tables using standard SQL or Python.

DLT automatically manages the underlying infrastructure, orchestration, data quality monitoring, and error handling.
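To make the imperative/declarative distinction concrete, here is a toy illustration in plain Python (3.9 or later). It is not the DLT API: the table names and the `pipeline` dictionary are hypothetical, and the standard-library `graphlib` module stands in for the dependency resolution that a declarative framework performs. Each table only declares what it reads from, and the run order is derived rather than written by hand.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each table maps to the set of upstream tables it reads.
# Note that the entries are deliberately listed "out of order".
pipeline = {
    "rides_gold": {"rides_silver"},
    "rides_silver": {"rides_bronze"},
    "rides_bronze": set(),
}


def resolve_run_order(table_dependencies):
    """Derive an execution order in which every table follows its upstreams."""
    return list(TopologicalSorter(table_dependencies).static_order())


run_order = resolve_run_order(pipeline)
print(run_order)  # ['rides_bronze', 'rides_silver', 'rides_gold']
```

In an imperative job you would write that order yourself as a sequence of tasks; in a declarative pipeline it falls out of the table definitions.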

Your setup script already deployed a DLT pipeline from the `pipeline-bike` demo!

### To Explore the DLT Pipeline:

1. **See the Definition:** In your workspace, navigate to the `pipeline-bike` demo folder and open the **`01-DLT-Pipeline-SQL`** notebook. This is the simple SQL code that defines the entire pipeline.
2. **See it Running:** Use the link generated by the setup script (or go to `Workflows > Delta Live Tables`) to see the live pipeline graph. You can monitor data flowing through the bronze, silver, and gold tables and see data quality scores.
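The data quality scores in the pipeline graph come from rules attached to tables. As a rough mental model only, the plain-Python sketch below applies one such rule to a handful of made-up rows and reports how many passed. The row data, the `has_valid_duration` rule, and the `apply_expectation` helper are all hypothetical and are not part of DLT.

```python
def apply_expectation(rows, predicate):
    """Keep rows that satisfy the rule and report how many were dropped."""
    kept = [row for row in rows if predicate(row)]
    dropped = len(rows) - len(kept)
    pass_rate = len(kept) / len(rows) if rows else 1.0
    return kept, dropped, pass_rate


# Made-up sample rows, for illustration only.
rides = [
    {"ride_id": "r1", "duration_min": 12},
    {"ride_id": "r2", "duration_min": -3},
    {"ride_id": "r3", "duration_min": 45},
    {"ride_id": "r4", "duration_min": None},
]


def has_valid_duration(row):
    """Hypothetical rule: a ride must have a positive, non-null duration."""
    return row["duration_min"] is not None and row["duration_min"] > 0


kept, dropped, pass_rate = apply_expectation(rides, has_valid_duration)
print(f"kept={len(kept)} dropped={dropped} pass_rate={pass_rate:.0%}")
# kept=2 dropped=2 pass_rate=50%
```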

### What is Delta Live Tables? (Video)

[![Video Thumbnail](https://img.youtube.com/vi/rr0I6AMwqS0/0.jpg)](https://www.youtube.com/watch?v=rr0I6AMwqS0 "What is Delta Live Tables?")

## Advanced SQL Techniques and Performance Optimization

### Query Optimization Tips

In [0]:
%sql
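-- Example: Partition a large table by a column that queries commonly filter on.
-- IF NOT EXISTS keeps this cell safe to re-run.
CREATE TABLE IF NOT EXISTS main.default.sales_data (
    transaction_id STRING,
    amount DECIMAL(10,2),
    customer_id STRING,
    sale_date DATE
)
USING DELTA
PARTITIONED BY (sale_date);

-- Example: Inspect the query plan without running the query.
-- The filter on the partition column (sale_date) lets the engine skip
-- partitions that fall outside the date range.
EXPLAIN EXTENDED
SELECT customer_id, SUM(amount) AS total_sales
FROM main.default.sales_data
WHERE sale_date >= DATE'2024-01-01'
GROUP BY customer_id;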

## Comprehensive Resource Library

### 📚 **Official Documentation**
* [SQL Analytics and Warehousing Complete Guide](https://docs.databricks.com/en/sql/index.html)
* [Delta Live Tables Documentation](https://docs.databricks.com/en/delta-live-tables/index.html)
* [Databricks Workflows (Jobs) Guide](https://docs.databricks.com/en/workflows/index.html)
* [SQL Reference Guide](https://docs.databricks.com/en/sql/language-manual/index.html)
* [Query Optimization Best Practices](https://docs.databricks.com/en/optimizations/index.html)
* [Data Quality Monitoring](https://docs.databricks.com/en/delta-live-tables/expectations.html)

### 🔧 **Tools and Integrations**
* [dbt Integration with Databricks](https://docs.databricks.com/en/partners/prep/dbt.html)