
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning">
</div>


# 2B - Create Streaming Tables with SQL using Auto Loader

In this demonstration we will create a streaming table to incrementally ingest files from a volume using Auto Loader with SQL. 

When you create a streaming table using the `CREATE OR REFRESH STREAMING TABLE` statement, the initial data refresh and population begin immediately. These operations do not consume DBSQL warehouse compute. Instead, streaming tables rely on serverless DLT for both creation and refresh. A dedicated serverless DLT pipeline is automatically created and managed by the system for each streaming table.

### Learning Objectives

By the end of this lesson, you should be able to:
- Create streaming tables in Databricks SQL for incremental data ingestion.
- Refresh streaming tables using the REFRESH statement.

### RECOMMENDATION

The CREATE STREAMING TABLE SQL command is the recommended alternative to the legacy COPY INTO SQL command for incremental ingestion from cloud object storage. Databricks recommends using streaming tables to ingest data using Databricks SQL. 

A streaming table is a table registered to Unity Catalog with extra support for streaming or incremental data processing. A DLT pipeline is automatically created for each streaming table. You can use streaming tables for incremental data loading from Kafka and cloud object storage.
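
For contrast, the legacy approach mentioned above might look like the sketch below. It is not part of this lab and is shown only to illustrate what the streaming table replaces. The table name `sales_copy_into_demo` is hypothetical, and the path uses the same `your-lab-user-schema` placeholder as the rest of this notebook.

```sql
-- Legacy pattern, for comparison only. The table name is a hypothetical example.
-- Create an empty, schemaless Delta table that COPY INTO can populate.
CREATE TABLE IF NOT EXISTS sales_copy_into_demo;

-- Load files from the volume. COPY INTO skips files it has already loaded,
-- but you must rerun the statement yourself (or schedule it) to pick up new files.
COPY INTO sales_copy_into_demo
FROM '/Volumes/dbacademy/your-lab-user-schema/csv_files_autoloader_source'
FILEFORMAT = CSV
FORMAT_OPTIONS ('header' = 'true', 'sep' = '|', 'inferSchema' = 'true')
COPY_OPTIONS ('mergeSchema' = 'true');
```

With a streaming table, the target table, the ingestion query, and an optional refresh schedule are declared together in a single statement, as shown later in this lesson.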

## REQUIRED - SELECT CLASSIC COMPUTE

**NOTE: We'll use a classic compute cluster to set up the demonstration files, as Python is required. After that, you'll need to switch to a SQL warehouse to create the streaming tables using SQL.**

Before executing cells in this notebook, please select your classic compute cluster in the lab. Be aware that **Serverless** is enabled by default and you have a Shared SQL warehouse.

<!-- ![Select Cluster](./Includes/images/selecting_cluster_info.png) -->

Follow these steps to select the classic compute cluster:


1. Navigate to the top-right of this notebook and click the drop-down menu to select your cluster. By default, the notebook will use **Serverless**.

2. If your cluster is available, select it and continue to the next cell. If the cluster is not shown:

   - Click **More** in the drop-down.

   - In the **Attach to an existing compute resource** window, use the first drop-down to select your unique cluster.

**NOTE:** If your cluster has terminated, you might need to restart it in order to select it. To do this:

1. Right-click on **Compute** in the left navigation pane and select *Open in new tab*.

2. Find the triangle icon to the right of your compute cluster name and click it.

3. Wait a few minutes for the cluster to start.

4. Once the cluster is running, complete the steps above to select your cluster.


## A. Classroom Setup

Run the following cell to configure your working environment for this notebook.

**NOTE:** The `DA` object is only used in Databricks Academy courses and is not available outside of these courses. It will dynamically reference the information needed to run the course in the lab environment.

**TROUBLESHOOTING:** If you select a SQL Warehouse, an error will be returned since Python is used for the setup.

In [0]:
%run ./Includes/Classroom-Setup-Auto-Loader

Note: you may need to restart the kernel using %restart_python or dbutils.library.restartPython() to use updated packages.


----------------------------------------------------------------------------------------
Directory /Volumes/dbacademy/ops/labuser10983516_1758894578@vocareum_com/csv_demo_files already exists. No action taken.
Directory /Volumes/dbacademy/ops/labuser10983516_1758894578@vocareum_com/json_demo_files already exists. No action taken.
Directory /Volumes/dbacademy/ops/labuser10983516_1758894578@vocareum_com/xml_demo_files already exists. No action taken.
----------------------------------------------------------------------------------------



Creating volume: dbacademy.labuser10983516_1758894578.auto_loader_staging_files if not exists.

Creating volume: dbacademy.labuser10983516_1758894578.csv_files_autoloader_source if not exists.




----------------Loading files to user's volume: '/Volumes/dbacademy/labuser10983516_1758894578/csv_files_autoloader_source/'----------------
File number 1 - Copying file /Volumes/dbacademy_ecommerce/v01/raw/sales-csv/000.csv --> /Volumes/dbacademy/labuser10983516_1758894578/csv_files_autoloader_source/000.csv.

----------------Loading files to user's volume: '/Volumes/dbacademy/labuser10983516_1758894578/auto_loader_staging_files/'----------------
File number 1 - Copying file /Volumes/dbacademy_ecommerce/v01/raw/sales-csv/000.csv --> /Volumes/dbacademy/labuser10983516_1758894578/auto_loader_staging_files/000.csv.
File number 2 - Copying file /Volumes/dbacademy_ecommerce/v01/raw/sales-csv/001.csv --> /Volumes/dbacademy/labuser10983516_1758894578/auto_loader_staging_files/001.csv.
File number 3 - Copying file /Volumes/dbacademy_ecommerce/v01/raw/sales-csv/002.csv --> /Volumes/dbacademy/labuser10983516_1758894578/auto_loader_staging_files/002.csv.


## REQUIRED - SELECT YOUR SERVERLESS SQL WAREHOUSE

**NOTE: Creating streaming tables with Databricks SQL requires a SQL warehouse.**

<!-- ![Select Cluster](./Includes/images/selecting_cluster_info.png) -->

Before executing cells in this notebook, please select the **SHARED SQL WAREHOUSE** in the lab. Follow these steps:

1. Navigate to the top-right of this notebook and click the drop-down to select compute (it might say **Connect**). Complete one of the following below:

   a. Under **Recent resources**, check to see if you have a **shared_warehouse SQL**. If you do, select it.

   b. If you do not have a **shared_warehouse** under **Recent resources**, complete the following:

    - In the same drop-down, select **More**.

    - Then select the **SQL Warehouse** button.

    - In the drop-down, make sure **shared_warehouse** is selected.

    - Then, at the bottom of the pop-up, select **Start and attach**.

<br/>
   <img src="./Includes/images/sql_warehouse.png" alt="SQL Warehouse" width="600">

2. Run the following cell to configure your working environment for this notebook.

    **NOTE:** The `DA` object is only used in Databricks Academy courses and is not available outside of these courses. It will dynamically reference the information needed to run the course in the lab environment.

In [0]:
%run ./Includes/Classroom-Setup-SQL-Auto-Loader

3. View the default catalog and schema. Confirm the default catalog is **dbacademy** and the default schema is your **labuser** schema.

In [0]:
SELECT current_catalog(), current_schema()

current_catalog(),current_schema()
dbacademy,labuser10983516_1758894578


## B. Create Streaming Tables for Incremental Processing

1. Complete the following to explore the volume `/Volumes/dbacademy/your-lab-user-schema/csv_files_autoloader_source` and confirm it contains a single CSV file.

   a. Select the catalog icon on the left ![Catalog Icon](./Includes/images/catalog_icon.png).

   b. Expand the **dbacademy** catalog.

   c. Expand your **labuser** schema.

   d. Expand **Volumes**.

   e. Expand the **csv_files_autoloader_source** volume.

   f. Confirm it contains a single CSV file named **000.csv**.

2. Run the query below to view the data in the CSV file(s) in your cloud storage location. Notice that the results are returned in tabular format and contain 3,149 rows.

In [0]:
SELECT *
FROM read_files(
  '/Volumes/dbacademy/' || DA.schema_name || '/csv_files_autoloader_source',
  format => 'CSV',
  sep => '|',
  header => true
);

order_id,email,transactions_timestamp,total_item_quantity,purchase_revenue_in_usd,unique_items,items,_rescued_data
298592,sandovalaustin@holder.com,1592629288475307,1,850.5,1,"[{'coupon': 'NEWBED10', 'item_id': 'M_STAN_F', 'item_name': 'Standard Full Mattress', 'item_revenue_in_usd': 850.5, 'price_in_usd': 945.0, 'quantity': 1}]",
299024,msmith@monroe.com,1592636869915092,2,1092.6,2,"[{'coupon': 'NEWBED10', 'item_id': 'M_PREM_T', 'item_name': 'Premium Twin Mattress', 'item_revenue_in_usd': 985.5, 'price_in_usd': 1095.0, 'quantity': 1}, {'coupon': 'NEWBED10', 'item_id': 'P_DOWN_S', 'item_name': 'Standard Down Pillow', 'item_revenue_in_usd': 107.10000000000001, 'price_in_usd': 119.0, 'quantity': 1}]",
300048,robertstimothy@hotmail.com,1592649862529478,1,1075.5,1,"[{'coupon': 'NEWBED10', 'item_id': 'M_STAN_K', 'item_name': 'Standard King Mattress', 'item_revenue_in_usd': 1075.5, 'price_in_usd': 1195.0, 'quantity': 1}]",
298711,lovejamie@yahoo.com,1592631406799948,1,850.5,1,"[{'coupon': 'NEWBED10', 'item_id': 'M_STAN_F', 'item_name': 'Standard Full Mattress', 'item_revenue_in_usd': 850.5, 'price_in_usd': 945.0, 'quantity': 1}]",
301760,jennifer7054@gmail.com,1592661071882666,1,940.5,1,"[{'coupon': 'NEWBED10', 'item_id': 'M_STAN_Q', 'item_name': 'Standard Queen Mattress', 'item_revenue_in_usd': 940.5, 'price_in_usd': 1045.0, 'quantity': 1}]",
302809,ywhite@kane.org,1592665563660982,1,1075.5,1,"[{'coupon': 'NEWBED10', 'item_id': 'M_STAN_K', 'item_name': 'Standard King Mattress', 'item_revenue_in_usd': 1075.5, 'price_in_usd': 1195.0, 'quantity': 1}]",
309136,karen61@hotmail.com,1592689638083947,1,1795.5,1,"[{'coupon': 'NEWBED10', 'item_id': 'M_PREM_K', 'item_name': 'Premium King Mattress', 'item_revenue_in_usd': 1795.5, 'price_in_usd': 1995.0, 'quantity': 1}]",
303941,deborah18@conrad-gallagher.com,1592669885794924,1,850.5,1,"[{'coupon': 'NEWBED10', 'item_id': 'M_STAN_F', 'item_name': 'Standard Full Mattress', 'item_revenue_in_usd': 850.5, 'price_in_usd': 945.0, 'quantity': 1}]",
305920,khanedwin@gmail.com,1592676863608194,1,1075.5,1,"[{'coupon': 'NEWBED10', 'item_id': 'M_STAN_K', 'item_name': 'Standard King Mattress', 'item_revenue_in_usd': 1075.5, 'price_in_usd': 1195.0, 'quantity': 1}]",
298795,samantha4354@hotmail.com,1592632916516773,1,985.5,1,"[{'coupon': 'NEWBED10', 'item_id': 'M_PREM_T', 'item_name': 'Premium Twin Mattress', 'item_revenue_in_usd': 985.5, 'price_in_usd': 1095.0, 'quantity': 1}]",



#### Create a STREAMING TABLE using Databricks SQL
3. Your goal is to create an incremental pipeline that ingests only new files (instead of using traditional batch ingestion). You can achieve this with [streaming tables in Databricks SQL](https://docs.databricks.com/aws/en/dlt/dbsql/streaming), which use Auto Loader to read from cloud object storage.

   - The SQL code below creates a streaming table that will be scheduled to incrementally ingest only new data every week. 
   
   - A pipeline is automatically created for each streaming table. You can use streaming tables for incremental data loading from Kafka and cloud object storage.

   **NOTE:** Incremental batch ingestion automatically detects new records in the data source and ignores records that have already been ingested. This reduces the amount of data processed, making ingestion jobs faster and more efficient in their use of compute resources.

   **REQUIRED: Please insert the path of your csv_files_autoloader_source volume in the `read_files` function. This process will take about a minute to run and set up the incremental ingestion pipeline.**

In [0]:
-- YOU WILL HAVE TO REPLACE THE EXAMPLE PATH BELOW WITH THE PATH TO YOUR csv_files_autoloader_source VOLUME.
-- You can find the volume in the catalog navigation pane on the left and insert its path,
-- OR you can replace the labuser name in the example path with your specific labuser name (the name of your schema).

CREATE OR REFRESH STREAMING TABLE sql_csv_autoloader
SCHEDULE EVERY 1 WEEK     -- Scheduling the refresh is optional
AS
SELECT *
FROM STREAM read_files(
  '/Volumes/dbacademy/labuser10983516_1758894578/csv_files_autoloader_source',  -- Insert the path to your csv_files_autoloader_source volume (example shown)
  format => 'CSV',
  sep => '|',
  header => true
);

Name,Type
order_id,int
email,string
transactions_timestamp,bigint
total_item_quantity,int
purchase_revenue_in_usd,double
unique_items,int
items,string
_rescued_data,string
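
To make the incremental behavior easier to inspect, one option is to record which file each row came from. The sketch below is a hypothetical variant of the statement above, not part of the lab: the table name `sql_csv_autoloader_with_metadata` and the two column aliases are assumptions chosen for illustration, and the path again uses the `your-lab-user-schema` placeholder. It relies on the `_metadata` column that file-based sources expose.

```sql
-- Hypothetical variant for illustration; the table name and aliases are assumptions.
CREATE OR REFRESH STREAMING TABLE sql_csv_autoloader_with_metadata
AS
SELECT
  *,
  _metadata.file_name              AS source_file,             -- file each row was read from
  _metadata.file_modification_time AS source_file_modified_at  -- last modified time of that file
FROM STREAM read_files(
  '/Volumes/dbacademy/your-lab-user-schema/csv_files_autoloader_source',  -- replace with your volume path
  format => 'CSV',
  sep => '|',
  header => true
);
```

If you try this variant, drop the extra table afterward with `DROP TABLE IF EXISTS` so it does not clutter your schema.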


4. Complete the following to view the streaming table in your catalog.

   a. Select the catalog icon on the left ![Catalog Icon](./Includes/images/catalog_icon.png).

   b. Expand the **dbacademy** catalog.

   c. Expand your **labuser** schema.

   d. Expand **Tables**.

   e. Find the **sql_csv_autoloader** table. Notice that the Delta streaming table icon is slightly different from a traditional Delta table:
    
    ![Streaming table icon](./Includes/images/streaming_table_icon.png)

5. Run the cell below to view the streaming table. Confirm that the results contain **3,149 rows**.

In [0]:
SELECT *
FROM sql_csv_autoloader;

org.apache.spark.sql.catalyst.ExtendedAnalysisException: [TABLE_OR_VIEW_NOT_FOUND] The table or view `sql_csv_autoloader` cannot be found. Verify the spelling and correctness of the schema and catalog.
If you did not qualify the name with a schema, verify the current_schema() output, or qualify the name with the correct schema and catalog.
To tolerate the error on drop use DROP VIEW IF EXISTS or DROP TABLE IF EXISTS. SQLSTATE: 42P01; line 2 pos 5;
'Project [*]
+- 'UnresolvedRelation [sql_csv_autoloader], [], false

	at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.tableNotFound(package.scala:94)
	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$2(CheckAnalysis.scala:345)
	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$2$adapted(CheckAnalysis.scala:307)
	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:303)
	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNod
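
If the query fails with `TABLE_OR_VIEW_NOT_FOUND` as in the output above, the error message itself suggests verifying `current_schema()` or qualifying the table name. A minimal sketch of both checks follows. It assumes `DA.schema_name` (used earlier with `read_files`) resolves to your schema name and that your warehouse accepts it inside the `IDENTIFIER` clause.

```sql
-- Check which catalog and schema unqualified table names resolve against.
SELECT current_catalog(), current_schema();

-- Count the rows using a fully qualified name built from the lab's schema variable.
SELECT count(*) AS row_count
FROM IDENTIFIER('dbacademy.' || DA.schema_name || '.sql_csv_autoloader');
```

If the table exists in your schema, the count should match the 3,149 rows expected in step 5. If it still cannot be found, confirm that the `CREATE OR REFRESH STREAMING TABLE` cell in step 3 completed successfully.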

6. Describe the STREAMING TABLE and view the results. Notice the following:

- Under **Detailed Table Information**, notice the following rows:
  - **View Text**: The query that created the table.
  - **Type**: Specifies that it is a STREAMING TABLE.
  - **Provider**: Indicates that it is a Delta table.

- Under **Refresh Information**, you can see specific refresh details. Example shown below:

##### Refresh Information

| Field                   | Value                                                                                                                                         |
|-------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------|
| Last Refreshed          | 2025-06-17T16:12:49.168Z                                                                                                                      |
| Last Refresh Type       | INCREMENTAL                                                                                                                                   |
| Latest Refresh Status   | Succeeded                                                                                                                                     |
| Latest Refresh          | https://example.url.databricks.com/#joblist/pipelines/bed6c715-a7c1-4d45-b57c-4fdac9f956a7/updates/9455a2ef-648c-4339-b61e-d282fa76a92c (this is the path to the Declarative Pipeline that was created for you)|
| Refresh Schedule        | EVERY 1 WEEK                                                                                                                                 |

In [0]:
DESCRIBE TABLE EXTENDED sql_csv_autoloader;

7. The `DESCRIBE HISTORY` statement displays a detailed list of all changes, versions, and metadata associated with a Delta streaming table, including information on updates, deletions, and schema changes.

    Run the cell below and view the results. Notice the following:

    - In the **operation** column, you can see that a streaming table performs three operations: **CREATE TABLE**, **DLT SETUP** and **STREAMING UPDATE**.
    
    - Scroll to the right and find the **operationMetrics** column. In row 1 (Version 2 of the table), the value shows that the **numOutputRows** is 3149, indicating that 3149 rows were added to the **sql_csv_autoloader** table.

In [0]:
DESCRIBE HISTORY sql_csv_autoloader;
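
Scrolling to **operationMetrics** can be tedious. As an alternative, the sketch below projects only the columns discussed above. It assumes your SQL warehouse allows `DESCRIBE HISTORY` to be used as a subquery; if it does not, use the plain `DESCRIBE HISTORY` cell instead. The alias `num_output_rows` is an illustrative name.

```sql
-- Show one row per table version with the number of rows each operation wrote.
-- operationMetrics is a map, so individual metrics are read by key.
SELECT
  version,
  timestamp,
  operation,
  operationMetrics['numOutputRows'] AS num_output_rows
FROM (DESCRIBE HISTORY sql_csv_autoloader)
ORDER BY version;
```

Operations that write no data, such as **CREATE TABLE**, are expected to show a null value for this metric.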

8. Complete the following steps to manually add another file to your cloud storage location:  
   `/Volumes/dbacademy/your-lab-user-schema/csv_files_autoloader_source`.

   a. Click the catalog icon on the left ![Catalog Icon](./Includes/images/catalog_icon.png).

   b. Expand the **dbacademy** catalog.

   c. Expand your **labuser** schema.

   d. Expand **Volumes**.

   e. Open the **auto_loader_staging_files** volume.

   f. Right-click on the **001.csv** file and select **Download volume file** to download the file locally.

   g. Upload the downloaded **001.csv** file to the **csv_files_autoloader_source** volume:

      - Right-click on the **csv_files_autoloader_source** volume. 

      - Select **Upload to volume**.  

      - Choose and upload the **001.csv** file from your local machine.

   h. Confirm your volume **csv_files_autoloader_source** contains two CSV files (**000.csv** and **001.csv**).


    **NOTE:** Depending on your laptop’s security settings, you may not be able to download files locally.


9. Next, manually refresh the STREAMING TABLE using `REFRESH STREAMING TABLE table-name`. 

- [Refresh a streaming table](https://docs.databricks.com/aws/en/dlt/dbsql/streaming#refresh-a-streaming-table) documentation

    **NOTE:** You can also go back to **Create a STREAMING TABLE using Databricks SQL** (step 3) and rerun that cell to incrementally ingest only new files. Once complete, return here and continue with step 10.

In [0]:
REFRESH STREAMING TABLE sql_csv_autoloader;
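
The `REFRESH` statement also accepts optional keywords, described in the documentation linked above. The sketch below lists the variants for reference. The last two are commented out on purpose: running a full refresh during this lab would reprocess all files and change the history you inspect in step 11.

```sql
-- Default behavior: process only files that have not been ingested yet.
REFRESH STREAMING TABLE sql_csv_autoloader;

-- Start the refresh in the background and return immediately.
-- REFRESH STREAMING TABLE sql_csv_autoloader ASYNC;

-- Reprocess all available source data from scratch (not recommended during this lab).
-- REFRESH STREAMING TABLE sql_csv_autoloader FULL;
```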

10. Run the cell below to view the data in the **sql_csv_autoloader** table. Notice that the table now contains **6,081 rows**.


In [0]:
SELECT *
FROM sql_csv_autoloader;

11. Describe the history of the **sql_csv_autoloader** table. Observe the following:

  - Version 3 of the streaming table includes another **STREAMING UPDATE**.

  - Expand the **operationMetrics** column and note that only **2,932 rows** were incrementally ingested into the table from the new **001.csv** file.


In [0]:
DESCRIBE HISTORY sql_csv_autoloader;
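
To cross-check the numbers in the table history, you can count the rows in each source file with a batch `read_files` query. This is a sketch that reuses the `DA.schema_name` expression from the earlier batch query and groups by the `_metadata.file_name` field. The aliases `source_file` and `row_count` are illustrative names.

```sql
-- Count rows per source file in the volume (batch read, independent of the streaming table).
SELECT
  _metadata.file_name AS source_file,
  count(*)            AS row_count
FROM read_files(
  '/Volumes/dbacademy/' || DA.schema_name || '/csv_files_autoloader_source',
  format => 'CSV',
  sep => '|',
  header => true
)
GROUP BY _metadata.file_name
ORDER BY source_file;
```

If the figures in this lesson hold for your environment, **000.csv** should account for 3,149 rows and **001.csv** for 2,932 rows, which together give the 6,081 rows seen in step 10.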

12. Drop the streaming table.

In [0]:
DROP TABLE IF EXISTS sql_csv_autoloader;

## Additional Resources

- [Streaming Tables Documentation](https://docs.databricks.com/gcp/en/dlt/streaming-tables)

- [CREATE STREAMING TABLE Syntax](https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-ddl-create-streaming-table)

- [Using Streaming Tables in Databricks SQL](https://docs.databricks.com/aws/en/dlt/dbsql/streaming)

- [REFRESH (MATERIALIZED VIEW or STREAMING TABLE)](https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-ddl-refresh-full)

- [COPY INTO (legacy)](https://docs.databricks.com/aws/en/ingestion/#copy-into-legacy)

- [Lakeflow Declarative Pipelines](https://docs.databricks.com/aws/en/dlt/)
---

#### BONUS Material: Course Appendix

In the course **Appendix** folder, you'll find a demonstration using Python Auto Loader in the **A2 - Python Auto Loader** notebook. This is extra material that you can explore outside of class.


&copy; 2025 Databricks, Inc. All rights reserved. Apache, Apache Spark, Spark, the Spark Logo, Apache Iceberg, Iceberg, and the Apache Iceberg logo are trademarks of the <a href="https://www.apache.org/" target="blank">Apache Software Foundation</a>.<br/>
<br/><a href="https://databricks.com/privacy-policy" target="blank">Privacy Policy</a> | 
<a href="https://databricks.com/terms-of-use" target="blank">Terms of Use</a> | 
<a href="https://help.databricks.com/" target="blank">Support</a>