### Lesson
- In this lesson, we will create a streaming table to incrementally ingest files from a volume using Auto Loader with SQL

### Learning Objectives
- Objective 1: Create streaming tables in Databricks SQL for incremental data ingestion
- Objective 2: Refresh streaming tables using the REFRESH statement

### Recommendation
- The `CREATE STREAMING TABLE` SQL command is the recommended alternative to the legacy `COPY INTO` SQL command for incremental ingestion from cloud object storage.
- A streaming table is a table registered in Unity Catalog with extra support for streaming and incremental data processing.

### 01 Run the command below to set up the data for the lab
- Two folders containing CSV files will be created in the volume workspace.data_engineering_labs_00.v01:
  - csv_files_autoloader_source
    - 000.csv (3150 rows)
  - csv_files_autoloader_staging
    - 001.csv (1000 rows)
    - 002.csv (2000 rows)

In [0]:
%run ../01_Data_Engineer_Learning_Plan/Lab-Setup/lab-setup-02

### 02 Run the query below to view the data in csv_files_autoloader_source.
- Note that it has 3150 rows

In [0]:
%sql
SELECT * 
FROM read_files(
  '/Volumes/workspace/data_engineering_labs_00/v01/csv_files_autoloader_source/',
  format => 'CSV',
  sep => ',',
  header => 'true'
)
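The 3150-row figure can also be cross-checked outside SQL. The following is a minimal sketch, assuming the volume path is readable through ordinary Python file APIs and that each file has a single header line. `count_csv_rows` is a hypothetical helper written for this note, not part of the lab setup.

```python
import csv
from pathlib import Path


def count_csv_rows(folder: str) -> dict[str, int]:
    """Return {file name: number of data rows} for every .csv file in folder.

    Assumption: the first line of each file is a header and is not counted.
    """
    counts = {}
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="") as handle:
            reader = csv.reader(handle)
            next(reader, None)  # skip the header row (no error on an empty file)
            counts[path.name] = sum(1 for _ in reader)
    return counts


# Example call (path assumed to match the query above):
# counts = count_csv_rows(
#     "/Volumes/workspace/data_engineering_labs_00/v01/csv_files_autoloader_source/"
# )
# print(counts, sum(counts.values()))
```

Counting per file rather than in total makes it easy to see later which uploaded file contributed which rows.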

### 03 Objective 1: Create a Streaming Table in Databricks SQL
- The code creates a streaming table that incrementally ingests new data on a weekly schedule.
- Each incremental batch automatically detects new records in the data source and ignores records that have already been ingested.
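To make the "ingest only what is new" idea concrete, here is a toy model in plain Python. It is a sketch under stated assumptions, not how Auto Loader is implemented. The class `IncrementalFileIngester`, the in-memory `seen_files` set (standing in for the stream's checkpoint), and tracking by file name are all illustrative choices.

```python
from pathlib import Path


class IncrementalFileIngester:
    """Toy model of incremental file ingestion (NOT Auto Loader's implementation)."""

    def __init__(self) -> None:
        self.seen_files: set[str] = set()  # stands in for the stream checkpoint
        self.table: list[str] = []         # stands in for the streaming table

    def refresh(self, source_dir: str) -> int:
        """Ingest only files not seen before; return the number of new rows."""
        new_rows = 0
        for path in sorted(Path(source_dir).glob("*.csv")):
            if path.name in self.seen_files:
                continue  # already ingested on an earlier refresh
            rows = path.read_text().splitlines()[1:]  # drop the header line
            self.table.extend(rows)
            self.seen_files.add(path.name)
            new_rows += len(rows)
        return new_rows
```

In this model, calling `refresh` twice on an unchanged folder ingests nothing the second time, while dropping a new file into the folder causes only that file's rows to be appended.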

- Note: the behavior seems to have changed compared with the Databricks tutorial.
  - The tutorial says that a pipeline will be created automatically.
  - However, after running the code, it now seems that we must create and schedule a pipeline ourselves.
    - This is in line with what we have learned about LSDP (which is pretty new).
  - The solution is to create a pipeline and paste the cell below into it to run.

In [0]:
%sql
-- Create (or refresh) the streaming table, scheduled to refresh every week.
-- Wrapping read_files in STREAM enables Auto Loader for incremental ingestion.

CREATE OR REFRESH STREAMING TABLE workspace.data_engineering_labs_00.sql_csv_autoloader
SCHEDULE EVERY 1 WEEK AS

SELECT *
FROM STREAM read_files(
  '/Volumes/workspace/data_engineering_labs_00/v01/csv_files_autoloader_source/',
  format => 'CSV',
  sep => ',',  -- fixed: named arguments use =>, not =
  header => true
);

- View the table after running the cell above in the pipeline.

In [0]:
%sql
SELECT * 
FROM workspace.data_engineering_labs_00.sql_csv_autoloader

- The output of the commands below shows that the table is a streaming table.
- My refresh schedule appears to be manual rather than weekly (perhaps because I did not configure a schedule in the pipeline, despite the `SCHEDULE` clause in the code).
- Again, we should use LSDP.

In [0]:
%sql
DESCRIBE TABLE EXTENDED workspace.data_engineering_labs_00.sql_csv_autoloader

In [0]:
%sql
DESCRIBE HISTORY workspace.data_engineering_labs_00.sql_csv_autoloader

### 04 Objective 2: Run a manual refresh
- Download a CSV file from the staging folder.
- Upload it to the source folder.
- Manually refresh the streaming table.
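As an alternative to downloading and uploading through the UI, the file can be copied between the two folders in code. The following is a minimal sketch, assuming both volume folders are reachable through ordinary Python file APIs. `promote_staged_file` is a hypothetical helper, and the example paths are inferred from the folder names used earlier in this notebook.

```python
import shutil
from pathlib import Path


def promote_staged_file(staging_dir: str, source_dir: str, file_name: str) -> Path:
    """Copy one file from the staging folder into the source folder.

    Raises FileExistsError instead of overwriting a file that is already
    present in the source folder.
    """
    src = Path(staging_dir) / file_name
    dst = Path(source_dir) / file_name
    if dst.exists():
        raise FileExistsError(f"{dst} already exists in the source folder")
    shutil.copyfile(src, dst)  # copies file contents only, no metadata
    return dst


# Example call (paths are assumptions based on the setup description):
# promote_staged_file(
#     "/Volumes/workspace/data_engineering_labs_00/v01/csv_files_autoloader_staging",
#     "/Volumes/workspace/data_engineering_labs_00/v01/csv_files_autoloader_source",
#     "001.csv",
# )
```

The staged file is copied rather than moved so that the step can be repeated from a clean source folder. After the copy, the refresh statement in the next cell should pick up only the new file.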

In [0]:
%sql
REFRESH STREAMING TABLE workspace.data_engineering_labs_00.sql_csv_autoloader