With Dynamic Tables, you provide a query and Snowflake automatically materializes the results of that query.

This means that instead of creating a separate target table and writing code to transform the source data and keep that table up to date, you can define the target table as a Dynamic Table, specify the query that performs the transformation, and leave the scheduling and orchestration to Snowflake.

You specify how far behind the source data the results are allowed to fall (the target lag), and Snowflake automatically tries to keep the table within that target. This gives data engineers control over the freshness of their pipelines without the complexity normally associated with scheduling them.


For our first dynamic table, we will extract the sales information from the salesdata table and join it with the customer information in cust_info to build customer_sales_data_history. Note that we are reading raw JSON data (schema on read) and transforming it into meaningful columns with proper data types.
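To make schema on read concrete, here is a minimal sketch that applies the same path-and-cast pattern to a single JSON literal. The record and its values are invented for illustration; only the key names (prodid, purchase_amount, quantity, purchase_date) are taken from the dynamic table query in this notebook.

```sql
-- Hypothetical sample record: key names mirror the PURCHASE column used below,
-- but the values are made up for illustration.
select
    v:"prodid"::number(5)           as product_id,   -- extract the key, then cast to a typed column
    v:"purchase_amount"::number(10) as saleprice,
    v:"quantity"::number(5)         as quantity,
    v:"purchase_date"::date         as salesdate     -- ISO date string cast to DATE
from (
    select parse_json('{"prodid": 42, "purchase_amount": 300, "quantity": 3, "purchase_date": "2023-06-01"}') as v
);
```

The JSON is stored untyped, and the column names and types are only decided when the query reads it.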

```sql
use database SNOW_DYNAMIC_TABLES_DE;
use schema data;
```

For our dynamic tables, we need to specify a warehouse.

Dynamic tables use virtual warehouses to refresh, that is, to run their queries against the base objects when they are initialized and whenever they are refreshed, including both scheduled and manual refreshes. These operations use compute resources, which consume credits.

We're going to create a warehouse specifically for our dynamic tables.

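```sql
-- Dedicated warehouse for dynamic table refreshes.
-- IF NOT EXISTS lets this cell be rerun without an error.
CREATE WAREHOUSE IF NOT EXISTS DYNAMIC_TABLE_WH
  WAREHOUSE_TYPE = STANDARD
  WAREHOUSE_SIZE = XSMALL
  AUTO_SUSPEND = 1;  -- idle time in seconds before the warehouse auto-suspends
```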
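Because refreshes consume credits, it can be useful to check what this warehouse actually uses. The query below is a sketch using Snowflake's INFORMATION_SCHEMA.WAREHOUSE_METERING_HISTORY table function; the 7-day window is an arbitrary choice, and it assumes a current database is set (as it is after the use statements above).

```sql
-- Hourly credit usage of DYNAMIC_TABLE_WH over (roughly) the last 7 days.
select start_time, end_time, credits_used
from table(information_schema.warehouse_metering_history(
    date_range_start => dateadd('days', -7, current_date()),
    warehouse_name   => 'DYNAMIC_TABLE_WH'))
order by start_time;
```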

```sql
-- Parse the raw JSON purchase records and join them with customer data.
-- TARGET_LAG = DOWNSTREAM: refresh only when a downstream dynamic table needs fresh data.
CREATE OR REPLACE DYNAMIC TABLE customer_sales_data_history
    TARGET_LAG = DOWNSTREAM
    WAREHOUSE = DYNAMIC_TABLE_WH
AS
select
    s.custid as customer_id,
    c.cname as customer_name,
    s.purchase:"prodid"::number(5) as product_id,
    s.purchase:"purchase_amount"::number(10) as saleprice,
    s.purchase:"quantity"::number(5) as quantity,
    s.purchase:"purchase_date"::date as salesdate
from
    cust_info c inner join salesdata s on c.custid = s.custid
;
```
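To confirm that the table was created with the intended settings, you can list it with SHOW DYNAMIC TABLES. This is a sketch; among other columns, the output includes target_lag (which should read DOWNSTREAM here) and the warehouse used for refreshes.

```sql
-- LIKE is case-insensitive, so the unquoted table name matches.
show dynamic tables like 'customer_sales_data_history';
```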

Dynamic table refreshes are scheduled according to how out of date the data is allowed to become, which is called the target lag. The target lag for a dynamic table is measured relative to the base tables at the root of the graph, not relative to the dynamic tables directly upstream. Snowflake schedules refreshes to keep the actual lag of your dynamic tables below their target lag. The duration of each refresh depends on the query, the data pattern, and the warehouse size. When choosing a target lag, consider the time needed to refresh each dynamic table in the chain back to the root; otherwise, some refreshes might be skipped, leading to a higher actual lag.
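To see how long refreshes actually take, and therefore what target lag is realistic, you can query refresh history with the INFORMATION_SCHEMA.DYNAMIC_TABLE_REFRESH_HISTORY table function. This is a sketch, assuming the SNOW_DYNAMIC_TABLES_DE database and DATA schema used in this notebook; the NAME_PREFIX filter and the 20-row limit are illustrative choices.

```sql
-- Recent refreshes of every dynamic table in this schema, with their durations.
select
    name,
    state,
    refresh_action,
    refresh_trigger,
    refresh_start_time,
    refresh_end_time,
    datediff('second', refresh_start_time, refresh_end_time) as refresh_seconds  -- NULL while still running
from table(information_schema.dynamic_table_refresh_history(
    name_prefix => 'SNOW_DYNAMIC_TABLES_DE.DATA.'))
order by refresh_start_time desc
limit 20;
```

Summing refresh_seconds along a chain gives a rough lower bound for a sensible target lag on the table at the end of it.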

To see the graph of tables connected to your dynamic table, see the Snowflake documentation topic "Use Snowsight to examine the graph of dynamic tables".
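If you prefer SQL to Snowsight, the INFORMATION_SCHEMA.DYNAMIC_TABLE_GRAPH_HISTORY table function describes the same graph. This is a sketch; the column selection is one reasonable choice, and the INPUTS column should list the objects each dynamic table reads from.

```sql
-- One row per dynamic table (per version of its definition); INPUTS shows its upstream objects.
select name, inputs, target_lag_type, target_lag_sec, valid_from, valid_to
from table(information_schema.dynamic_table_graph_history());
```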

Target lag is specified in one of the following ways:

1) Measure of freshness: Defines the maximum amount of time that the dynamic table’s content should lag behind updates to the base tables.

> The following example sets the target lag of the product dynamic table to one hour, so that its content is no more than an hour behind its base tables:

> ```ALTER DYNAMIC TABLE product SET TARGET_LAG = '1 hour';```

2) Downstream: Specifies that the dynamic table should refresh on demand when other dependent dynamic tables refresh. This refresh can be triggered by a manual or scheduled refresh of a downstream dynamic table.

> In the following example, product is based on other dynamic tables and is set to refresh based on the target lag of its downstream dynamic tables:

> ```ALTER DYNAMIC TABLE product SET TARGET_LAG = DOWNSTREAM;```

Target lag and refresh frequency are inversely related: a lower target lag requires more frequent refreshes.

```sql
-- quick check
select * from customer_sales_data_history limit 10;
select count(*) from customer_sales_data_history;
```

Now, let's combine these results with the product table (prod_stock_inv) to build a sales report.

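```sql
-- Enrich the sales history with product names and derive per-row metrics.
-- TARGET_LAG = '1 minute': keep this table at most about one minute behind its base tables.
CREATE OR REPLACE DYNAMIC TABLE salesreport
    TARGET_LAG = '1 minute'
    WAREHOUSE = DYNAMIC_TABLE_WH
AS
    Select
        t1.customer_id,
        t1.customer_name,
        t1.product_id,
        p.pname as product_name,
        t1.saleprice,
        t1.quantity,
        (t1.saleprice / nullif(t1.quantity, 0)) as unitsalesprice,  -- NULL instead of a divide-by-zero error
        t1.salesdate as CreationTime,
        t1.customer_id || '-' || t1.product_id || '-' || t1.salesdate AS CUSTOMER_SK,
        -- date of the same customer's next purchase (NULL for the most recent one)
        LEAD(t1.salesdate) OVER (PARTITION BY t1.customer_id ORDER BY t1.salesdate ASC) AS END_TIME
    from
        customer_sales_data_history t1 inner join prod_stock_inv p
        on t1.product_id = p.pid
;
```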
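The END_TIME column uses the LEAD window function: for each row, it returns the salesdate of the same customer's next purchase, or NULL if there is none. The toy query below runs LEAD on made-up inline data so the behaviour can be seen in isolation; the column names only mimic those above.

```sql
-- Made-up data: customer 1 buys twice, customer 2 buys once.
select
    custid,
    salesdate,
    lead(salesdate) over (partition by custid order by salesdate) as end_time
from (values
        (1, '2023-01-01'::date),
        (1, '2023-01-05'::date),
        (2, '2023-01-03'::date)) as t(custid, salesdate)
order by custid, salesdate;
-- Returns:
--   1 | 2023-01-01 | 2023-01-05
--   1 | 2023-01-05 | NULL
--   2 | 2023-01-03 | NULL
```

If a customer has two purchases on the same date, their relative order (and so END_TIME) is not deterministic.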

```sql
-- Another quick check
select * from salesreport limit 10;
select count(*) from salesreport;
```

Let's test this DAG by adding some raw data to the base table.

```sql
-- Add new records
insert into salesdata select * from table(gen_cust_purchase(10000,2));

-- Check raw base table
select count(*) from salesdata;
```

```sql
-- Check the dynamic tables after about a minute
select count(*) from customer_sales_data_history;
select count(*) from salesreport;
```
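Instead of waiting for the next scheduled refresh, you can trigger one manually. As described for the DOWNSTREAM target lag, a manual refresh of a downstream dynamic table also refreshes the DOWNSTREAM-lag tables it depends on, so this sketch should bring both tables up to date.

```sql
-- Refresh SALESREPORT now; CUSTOMER_SALES_DATA_HISTORY (TARGET_LAG = DOWNSTREAM) is refreshed as needed.
alter dynamic table salesreport refresh;

-- Re-run the counts after the manual refresh.
select count(*) from customer_sales_data_history;
select count(*) from salesreport;
```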

While we're waiting, let's open a new window and check the dynamic table graph and refresh history in Snowsight.

That's it: we created a DAG using Dynamic Tables. Snowflake refreshes it as new data lands in the raw base table, and because CUSTOMER_SALES_DATA_HISTORY uses TARGET_LAG = DOWNSTREAM, its lag is inferred from the dynamic tables downstream of it. In this example, CUSTOMER_SALES_DATA_HISTORY refreshes according to the lag of its downstream table SALESREPORT (1 minute) and the arrival of new data in the raw table (SALESDATA).
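Because SALESREPORT has a one-minute target lag, Snowflake keeps scheduling refreshes of this pipeline for as long as it is active. When you are done experimenting, a sketch like the following suspends both dynamic tables; RESUME re-enables scheduled refreshes.

```sql
-- Stop scheduled refreshes of the pipeline.
alter dynamic table salesreport suspend;
alter dynamic table customer_sales_data_history suspend;

-- To re-enable scheduled refreshes later:
-- alter dynamic table customer_sales_data_history resume;
-- alter dynamic table salesreport resume;
```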