# Dynamic Tables Demo
**Understood for All - Hands On Lab**

---

## What are Dynamic Tables?

Dynamic Tables let you define **WHAT** you want (a SELECT statement), and Snowflake handles **HOW** (scheduling, refreshes, dependencies).

**Benefits:**
- No tasks to schedule
- No streams to manage
- No failure handling to write
- Automatic dependency tracking
- Visual DAG in Snowsight

---

**Demo Flow (~15 min):**
1. Setup database/schema
2. Create fake data generators
3. Generate base tables
4. Create Dynamic Tables pipeline
5. View DAG in Snowsight
6. Insert data and watch auto-refresh

## Setup: Connect to Snowflake

In [None]:
from snowflake.snowpark.context import get_active_session

session = get_active_session()
print(f"Connected as: {session.get_current_user()}")
print(f"Role: {session.get_current_role()}")

## Section 1: Create Database and Schema

We'll create a dedicated schema for this demo.

In [None]:
DB_NAME = 'UNDERSTOOD_DEMO'
SCHEMA_NAME = 'DYNAMIC_TABLES'
WAREHOUSE_NAME = 'UNDERSTOOD_DEMO_WH'  # Created by 0_setup.ipynb

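# Create the database and schema if they don't already exist, then switch to them
session.sql(f"CREATE DATABASE IF NOT EXISTS {DB_NAME}").collect()
session.sql(f"CREATE SCHEMA IF NOT EXISTS {DB_NAME}.{SCHEMA_NAME}").collect()
session.sql(f"USE DATABASE {DB_NAME}").collect()
session.sql(f"USE SCHEMA {SCHEMA_NAME}").collect()
session.sql(f"USE WAREHOUSE {WAREHOUSE_NAME}").collect()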

print(f"Database: {DB_NAME}")
print(f"Schema: {SCHEMA_NAME}")
print(f"Warehouse: {WAREHOUSE_NAME}")
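The cell above interpolates `DB_NAME`, `SCHEMA_NAME`, and `WAREHOUSE_NAME` into SQL with f-strings, so it can help to validate those names first. The `safe_ident` helper below is a hypothetical addition (not part of Snowpark). It is a minimal sketch that accepts only unquoted Snowflake identifiers (a letter or underscore followed by letters, digits, underscores, or dollar signs) and rejects anything else, such as names containing spaces or semicolons.

```python
import re

# Unquoted Snowflake identifiers: start with a letter or underscore, then
# letters, digits, underscores, or dollar signs (at most 255 characters).
_UNQUOTED_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_$]{0,254}")


def safe_ident(name: str) -> str:
    """Return `name` unchanged if it is a valid unquoted identifier, else raise.

    Hypothetical helper for this lab: it guards f-string SQL such as
    f"USE SCHEMA {SCHEMA_NAME}" against accidental or injected SQL text.
    """
    if not _UNQUOTED_IDENT.fullmatch(name):
        raise ValueError(f"Unsafe or invalid identifier: {name!r}")
    return name


# Example: validate the names used in the cell above
for ident in ("UNDERSTOOD_DEMO", "DYNAMIC_TABLES", "UNDERSTOOD_DEMO_WH"):
    print(safe_ident(ident))
```

In the notebook you would wrap each interpolated name, e.g. `session.sql(f"USE SCHEMA {safe_ident(SCHEMA_NAME)}")`. Quoted identifiers (which allow spaces and mixed case) are deliberately out of scope for this sketch.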

---
## Section 2: Create Fake Data Generators

We'll create Python user-defined table functions (UDTFs) that generate realistic fake data using the `Faker` library. Each one returns a set of rows, so we query it with `TABLE(...)`.

**Three generators:**
- `gen_cust_info` - Customer names and spend limits
- `gen_prod_inv` - Product names and stock levels
- `gen_cust_purchase` - Sales transactions

### Customer Generator
Creates fake customer names with random spending limits using the `Faker` library.

In [None]:
# Customer data generator
session.sql("""
CREATE OR REPLACE FUNCTION gen_cust_info(num_records NUMBER)
RETURNS TABLE (custid NUMBER(10), cname VARCHAR(100), spendlimit NUMBER(10,2))
LANGUAGE PYTHON
RUNTIME_VERSION='3.10'
HANDLER='CustTab'
PACKAGES = ('Faker')
AS $$
from faker import Faker
import random

fake = Faker()

class CustTab:
    def process(self, num_records):
        # Customer IDs are sequential: 1001, 1002, ...
        for i in range(num_records):
            custid = 1001 + i
            cname = fake.name()
            spendlimit = round(random.uniform(1000, 10000), 2)
            yield (custid, cname, spendlimit)
$$
""").collect()
print("Created: gen_cust_info")
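Snowflake calls a Python UDTF handler's `process` method once per input row, and every tuple it yields becomes one output row. The sketch below mimics that contract locally so you can check the generator logic without deploying it. It rests on stated assumptions: the hypothetical `local_udtf_rows` helper stands in for Snowflake's runtime, and the small name lists stand in for `Faker`, so it runs with only the standard library.

```python
import random

# Stand-in for Faker's fake.name(); hypothetical, for local testing only
FIRST = ["Ava", "Ben", "Cara", "Dev"]
LAST = ["Lee", "Ortiz", "Patel", "Smith"]


class CustTab:
    """Same shape as the gen_cust_info handler: IDs start at 1001."""

    def process(self, num_records):
        for i in range(num_records):
            custid = 1001 + i
            cname = f"{random.choice(FIRST)} {random.choice(LAST)}"
            spendlimit = round(random.uniform(1000, 10000), 2)
            yield (custid, cname, spendlimit)


def local_udtf_rows(handler_cls, *args):
    """Hypothetical helper: collect every row the handler yields for one input row."""
    return list(handler_cls().process(*args))


rows = local_udtf_rows(CustTab, 5)
for row in rows:
    print(row)
```

Because `process` is a generator, Snowflake can stream rows out without building the whole result in memory, which is why the handlers in this lab `yield` instead of returning a list.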

### Product Generator
Creates fake product names with stock levels and a stock date within the last 90 days.

In [None]:
# Product data generator
session.sql("""
CREATE OR REPLACE FUNCTION gen_prod_inv(num_records NUMBER)
RETURNS TABLE (pid NUMBER(10), pname VARCHAR(100), stock NUMBER(10,2), stockdate DATE)
LANGUAGE PYTHON
RUNTIME_VERSION='3.10'
HANDLER='ProdTab'
PACKAGES = ('Faker')
AS $$
from faker import Faker
import random
from datetime import datetime, timedelta

fake = Faker()

class ProdTab:
    def process(self, num_records):
        # Stock dates fall within the last 90 days
        current_date = datetime.now()
        min_date = current_date - timedelta(days=90)
        # Product IDs are sequential: 101, 102, ...
        for i in range(num_records):
            pid = 101 + i
            pname = fake.catch_phrase()
            stock = round(random.uniform(500, 1000), 0)
            stockdate = fake.date_between_dates(min_date, current_date)
            yield (pid, pname, stock, stockdate)
$$
""").collect()
print("Created: gen_prod_inv")

### Sales Generator
Creates fake sales transactions for random customers. Each purchase's product ID, quantity, amount, and date (within the last `ndays` days) is stored as a JSON object in a VARIANT column.

In [None]:
# Sales data generator
session.sql("""
CREATE OR REPLACE FUNCTION gen_cust_purchase(num_records NUMBER, ndays NUMBER)
RETURNS TABLE (custid NUMBER(10), purchase VARIANT)
LANGUAGE PYTHON
RUNTIME_VERSION='3.10'
HANDLER='genCustPurchase'
PACKAGES = ('Faker')
AS $$
from faker import Faker
import random
from datetime import datetime, timedelta

fake = Faker()

class genCustPurchase:
    def process(self, num_records, ndays):
        for _ in range(num_records):
            c_id = fake.random_int(min=1001, max=1999)
            current_date = datetime.now()
            min_date = current_date - timedelta(days=ndays)
            pdate = fake.date_between_dates(min_date, current_date)
            purchase = {
                'prodid': fake.random_int(min=101, max=199),
                'quantity': fake.random_int(min=1, max=5),
                'purchase_amount': round(random.uniform(10, 1000), 2),
                'purchase_date': pdate
            }
            yield (c_id, purchase)
$$
""").collect()
print("Created: gen_cust_purchase")
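Both the product and sales generators pick a random date inside a trailing window (`ndays` back from today). Here is a standard-library sketch of that idea, with a hypothetical `random_recent_date` helper in place of Faker's `date_between_dates`, plus a purchase record shaped like the one `gen_cust_purchase` yields. The date is stored as an ISO-8601 string so the record round-trips through `json.dumps`; that conversion is an assumption for local testing, not something the UDTF above does.

```python
import json
import random
from datetime import date, timedelta


def random_recent_date(ndays, today=None):
    """Hypothetical helper: a uniformly random date in [today - ndays, today]."""
    today = today or date.today()
    return today - timedelta(days=random.randint(0, ndays))


def make_purchase(ndays, today=None):
    """Build one (custid, purchase) pair shaped like gen_cust_purchase output."""
    custid = random.randint(1001, 1999)  # inclusive, like Faker's random_int
    purchase = {
        "prodid": random.randint(101, 199),
        "quantity": random.randint(1, 5),
        "purchase_amount": round(random.uniform(10, 1000), 2),
        # ISO string so the dict is JSON-serializable in this local sketch
        "purchase_date": random_recent_date(ndays, today).isoformat(),
    }
    return custid, purchase


custid, purchase = make_purchase(ndays=10)
print(custid, json.dumps(purchase))
```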

---
## Section 3: Create Base Tables

Now we'll generate fake data:
- **1,000 customers**
- **100 products**
- **10,000 sales transactions**
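The generators' ID ranges are chosen so the inner joins in Section 4 keep every sale: `gen_cust_info(1000)` yields customer IDs 1001 to 2000 and `gen_prod_inv(100)` yields product IDs 101 to 200, while each sale picks a customer in 1001 to 1999 and a product in 101 to 199. A quick check of that arithmetic:

```python
# ID ranges produced by the generators (Python ranges exclude the stop value)
customer_ids = range(1001, 1001 + 1000)   # gen_cust_info(1000): 1001..2000
product_ids = range(101, 101 + 100)       # gen_prod_inv(100): 101..200

# Ranges that gen_cust_purchase samples from (Faker's random_int is inclusive)
sale_customer_ids = range(1001, 1999 + 1)
sale_product_ids = range(101, 199 + 1)

# Every sampled key exists in its dimension table, so INNER JOINs drop no sales
customers_covered = set(sale_customer_ids) <= set(customer_ids)
products_covered = set(sale_product_ids) <= set(product_ids)
print(customers_covered, products_covered)  # True True
```

Since customer and product IDs are also unique, each sale matches exactly one customer and one product, so `salesreport` should end up with the same row count as `salesdata`.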

In [None]:
# Generate 1,000 customers
session.sql("CREATE OR REPLACE TABLE cust_info AS SELECT * FROM TABLE(gen_cust_info(1000)) ORDER BY 1").collect()
print("Created: cust_info (1,000 customers)")

# Generate 100 products
session.sql("CREATE OR REPLACE TABLE prod_stock_inv AS SELECT * FROM TABLE(gen_prod_inv(100)) ORDER BY 1").collect()
print("Created: prod_stock_inv (100 products)")

# Generate 10,000 sales transactions
session.sql("CREATE OR REPLACE TABLE salesdata AS SELECT * FROM TABLE(gen_cust_purchase(10000, 10))").collect()
print("Created: salesdata (10,000 transactions)")

In [None]:
# Show row counts
session.sql("""
SELECT 'cust_info' as table_name, COUNT(*) as row_count FROM cust_info
UNION ALL SELECT 'prod_stock_inv', COUNT(*) FROM prod_stock_inv
UNION ALL SELECT 'salesdata', COUNT(*) FROM salesdata
""").to_pandas()

In [None]:
# Preview customers
session.sql("SELECT * FROM cust_info LIMIT 5").to_pandas()

In [None]:
# Preview products
session.sql("SELECT * FROM prod_stock_inv LIMIT 5").to_pandas()

In [None]:
# Preview sales (note the VARIANT column holding a JSON object)
session.sql("SELECT * FROM salesdata LIMIT 5").to_pandas()
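The first Dynamic Table flattens this VARIANT with path expressions such as `s.purchase:"prodid"::NUMBER(5)`: look up a key in the JSON object, then cast the value. Below is a rough Python analogue of those projections for one row, using a hypothetical `flatten_purchase` helper. It only approximates SQL semantics: a missing key becomes `None` (like SQL NULL), and the date is assumed to be stored as an ISO-8601 string.

```python
import json
from datetime import date
from decimal import Decimal, ROUND_HALF_UP


def _to_int(value):
    # Hypothetical stand-in for ::NUMBER(5); a missing key stays None (SQL NULL)
    return None if value is None else int(value)


def flatten_purchase(custid, purchase_json):
    """Rough analogue of the VARIANT projections in customer_sales_data_history."""
    p = json.loads(purchase_json)  # the VARIANT as a Python dict
    amount, sold_on = p.get("purchase_amount"), p.get("purchase_date")
    return {
        "customer_id": custid,
        "product_id": _to_int(p.get("prodid")),
        # ::NUMBER(10,2) keeps two decimal places
        "saleprice": None if amount is None
        else Decimal(str(amount)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP),
        "quantity": _to_int(p.get("quantity")),
        # ::DATE, assuming the date is stored as an ISO-8601 string
        "salesdate": None if sold_on is None else date.fromisoformat(sold_on),
    }


row = flatten_purchase(
    1234,
    '{"prodid": 150, "quantity": 3, "purchase_amount": 512.456, "purchase_date": "2024-01-15"}',
)
print(row)
```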

---
## Section 4: Create Dynamic Tables

Now the magic! We'll create two Dynamic Tables that form a pipeline:

```
cust_info ─────┐
               ├──► customer_sales_data_history ──► salesreport
salesdata ─────┘                                        │
                                                        │
prod_stock_inv ─────────────────────────────────────────┘
```

**Key `TARGET_LAG` settings:**
- `TARGET_LAG = DOWNSTREAM` → "Refresh me only when a Dynamic Table that reads from me needs fresh data"
- `TARGET_LAG = '1 minute'` → "Aim to keep my data no more than 1 minute behind my sources"

### Dynamic Table 1: Customer Sales History
This joins customers with their purchases. `TARGET_LAG = DOWNSTREAM` means it refreshes only when a Dynamic Table downstream of it (here, `salesreport`) needs updated data, which saves compute.

In [None]:
# Dynamic Table 1: Join customers with their purchases
# TARGET_LAG = DOWNSTREAM means it refreshes only when salesreport needs fresh data

session.sql(f"""
CREATE OR REPLACE DYNAMIC TABLE customer_sales_data_history
    TARGET_LAG = DOWNSTREAM
    WAREHOUSE = {WAREHOUSE_NAME}
AS
SELECT 
    s.custid as customer_id,
    c.cname as customer_name,
    s.purchase:"prodid"::NUMBER(5) as product_id,
    s.purchase:"purchase_amount"::NUMBER(10,2) as saleprice,
    s.purchase:"quantity"::NUMBER(5) as quantity,
    s.purchase:"purchase_date"::DATE as salesdate
FROM cust_info c 
INNER JOIN salesdata s ON c.custid = s.custid
""").collect()

print("Created: customer_sales_data_history (TARGET_LAG=DOWNSTREAM)")

In [None]:
# Preview the first Dynamic Table
session.sql("SELECT * FROM customer_sales_data_history LIMIT 10").to_pandas()

### Dynamic Table 2: Sales Report
This is the final report with product names. `TARGET_LAG = '1 minute'` tells Snowflake to keep this table no more than about 1 minute behind its sources. Target lag is a goal rather than a hard guarantee, but Snowflake schedules the refreshes automatically and refreshes `customer_sales_data_history` along the way!

In [None]:
# Dynamic Table 2: Sales report with product details
# TARGET_LAG = '1 minute' asks Snowflake to keep this no more than ~1 minute behind its sources

session.sql(f"""
CREATE OR REPLACE DYNAMIC TABLE salesreport
    TARGET_LAG = '1 minute'
    WAREHOUSE = {WAREHOUSE_NAME}
AS
SELECT
    t1.customer_id,
    t1.customer_name,
    t1.product_id,
    p.pname as product_name,
    t1.saleprice,
    t1.quantity,
    ROUND(t1.saleprice / NULLIF(t1.quantity, 0), 2) as unit_price,
    t1.salesdate
FROM customer_sales_data_history t1 
INNER JOIN prod_stock_inv p ON t1.product_id = p.pid
""").collect()

print("Created: salesreport (TARGET_LAG=1 minute)")
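With both tables in place, one way to picture how a `DOWNSTREAM` lag resolves: the table is refreshed as often as the most demanding Dynamic Table that reads from it needs, and it has no schedule of its own if nothing reads from it. The sketch below models that simplified rule over this lab's pipeline. The dictionary layout and the `effective_lag` function are illustrative assumptions, not Snowflake's scheduler.

```python
from datetime import timedelta

DOWNSTREAM = "DOWNSTREAM"

# name -> (target lag, upstream Dynamic Tables it reads from)
pipeline = {
    "customer_sales_data_history": (DOWNSTREAM, []),
    "salesreport": (timedelta(minutes=1), ["customer_sales_data_history"]),
}


def effective_lag(name, dts):
    """Simplified model: a fixed lag is used as-is; DOWNSTREAM takes the
    smallest effective lag among Dynamic Tables that read from it, or None
    (no scheduled refresh) when nothing reads from it."""
    lag, _ = dts[name]
    if lag != DOWNSTREAM:
        return lag
    consumer_lags = [
        effective_lag(other, dts)
        for other, (_, upstream) in dts.items()
        if name in upstream
    ]
    consumer_lags = [c for c in consumer_lags if c is not None]
    return min(consumer_lags) if consumer_lags else None


for dt in pipeline:
    print(dt, effective_lag(dt, pipeline))
```

Under this model, dropping `salesreport` would leave `customer_sales_data_history` with no scheduled refreshes at all, which is exactly the compute saving `DOWNSTREAM` is meant to provide.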

In [None]:
# Preview the final sales report
session.sql("SELECT * FROM salesreport LIMIT 10").to_pandas()
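Target lag is measured as how far a table's contents trail its sources. As a rough sketch of that check, the hypothetical helpers below parse a lag string like `'1 minute'` and report whether data as of a given timestamp is within it. The accepted format (a whole number followed by seconds, minutes, hours, or days, singular or plural) is an assumption of this sketch.

```python
import re
from datetime import datetime, timedelta, timezone

_UNITS = {"second": "seconds", "minute": "minutes", "hour": "hours", "day": "days"}


def parse_target_lag(text):
    """Hypothetical parser: '1 minute' -> timedelta(minutes=1)."""
    m = re.fullmatch(r"\s*(\d+)\s+(second|minute|hour|day)s?\s*", text, re.IGNORECASE)
    if not m:
        raise ValueError(f"Unrecognized target lag: {text!r}")
    amount, unit = int(m.group(1)), m.group(2).lower()
    return timedelta(**{_UNITS[unit]: amount})


def within_target_lag(data_timestamp, target_lag, now=None):
    """True if data as of `data_timestamp` is no staler than `target_lag`."""
    now = now or datetime.now(timezone.utc)
    return now - data_timestamp <= parse_target_lag(target_lag)


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(within_target_lag(now - timedelta(seconds=45), "1 minute", now))  # True
print(within_target_lag(now - timedelta(seconds=90), "1 minute", now))  # False
```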

---
## Section 5: View the DAG in Snowsight

**Now go to Snowsight UI:**

1. Navigate to **Data** → **Databases** → **UNDERSTOOD_DEMO** → **DYNAMIC_TABLES**
2. Click on **salesreport**
3. Click the **Graph** tab

You'll see the visual dependency chain showing how data flows through the pipeline!
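The Graph tab is a picture of a dependency graph. Ordering that graph so every table comes after everything it reads from is a topological sort, and Python's standard-library `graphlib` (Python 3.9+) can reproduce such an order for this lab's pipeline. This only illustrates dependency ordering; it is not how Snowflake schedules refreshes internally.

```python
from graphlib import TopologicalSorter

# node -> set of upstream objects it reads from (mirrors the SQL in Section 4)
dependencies = {
    "customer_sales_data_history": {"cust_info", "salesdata"},
    "salesreport": {"customer_sales_data_history", "prod_stock_inv"},
}

# Every object appears after all of its upstream objects
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Several orders are valid (the three base tables can appear in any relative order), but `salesreport` always comes last.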

---
## Section 6: Demonstrate Auto-Refresh

Now let's prove the Dynamic Table automatically updates when source data changes!

In [None]:
# Check current counts BEFORE inserting new data
session.sql("""
SELECT 'salesdata (source)' as table_name, COUNT(*) as row_count FROM salesdata
UNION ALL 
SELECT 'salesreport (DT)', COUNT(*) FROM salesreport
""").to_pandas()

### Let's prove it works!
We'll insert new sales data and watch the Dynamic Table automatically pick up the changes.

In [None]:
# Insert 500 new sales transactions
session.sql("INSERT INTO salesdata SELECT * FROM TABLE(gen_cust_purchase(500, 2))").collect()
print("Inserted 500 new sales records!")

In [None]:
# Check source immediately - it's updated
session.sql("SELECT 'salesdata AFTER insert' as status, COUNT(*) as row_count FROM salesdata").to_pandas()

### Wait ~1 minute for auto-refresh

The Dynamic Table has a **1-minute target lag**, so it should auto-refresh within about a minute (occasionally a little longer, since target lag is a goal rather than a guarantee).

Run the next cell after waiting to see the updated count!

In [None]:
# Check if salesreport has been refreshed
session.sql("SELECT 'salesreport AFTER refresh' as status, COUNT(*) as row_count FROM salesreport").to_pandas()
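Instead of waiting and re-running the check by hand, you could poll until the count reaches what you expect. The `wait_for_count` helper below is a hypothetical sketch: it takes any zero-argument function that returns a row count, so it can be tested without a Snowflake connection. If you'd rather not wait at all, `ALTER DYNAMIC TABLE salesreport REFRESH` triggers a manual refresh.

```python
import time


def wait_for_count(get_count, expected, timeout_s=180, interval_s=10, sleep=time.sleep):
    """Poll get_count() until it returns at least `expected` or attempts run out.

    Returns the last observed count. `sleep` is injectable so tests don't wait.
    """
    attempts = max(1, int(timeout_s // interval_s))
    count = get_count()
    for _ in range(attempts):
        if count >= expected:
            break
        sleep(interval_s)
        count = get_count()
    return count


# Usage in the notebook (assumes salesreport had 10,000 rows and the insert ran once):
# final = wait_for_count(
#     lambda: session.sql("SELECT COUNT(*) FROM salesreport").collect()[0][0],
#     expected=10_500,
# )
```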

### Check the refresh history
Snowflake records every refresh: when it ran, how long it took, and (in `REFRESH_ACTION`) what kind of refresh it was, such as incremental, full, or no new data to process.

In [None]:
# View refresh history - shows when each refresh happened and what kind it was
session.sql("""
SELECT 
    NAME,
    STATE,
    REFRESH_ACTION,
    REFRESH_START_TIME,
    REFRESH_END_TIME,
    DATEDIFF('second', REFRESH_START_TIME, REFRESH_END_TIME) as duration_seconds
FROM TABLE(INFORMATION_SCHEMA.DYNAMIC_TABLE_REFRESH_HISTORY())
WHERE NAME = 'SALESREPORT'
ORDER BY REFRESH_START_TIME DESC
LIMIT 5
""").to_pandas()

---
## Summary

**What we demonstrated:**

1. **Declarative pipelines** - We wrote SELECT statements, Snowflake built the pipeline
2. **Automatic dependencies** - The DAG is tracked automatically
3. **Incremental refresh** - When the query supports it, Snowflake processes only the changed rows instead of recomputing the whole table
4. **No orchestration code** - No tasks, no streams, no failure handling
5. **Built-in monitoring** - Refresh history available via INFORMATION_SCHEMA
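Why incremental refresh can avoid a full recompute here: for an inner join where only `salesdata` receives inserts and the customer table is unchanged, joining just the new rows and appending them gives the same result as re-running the whole join. The sketch below checks that equivalence on toy data with a hypothetical `inner_join` helper. Snowflake's actual change-tracking machinery is more general (it also handles updates and deletes) and is not modeled here.

```python
from collections import Counter

customers = {1001: "Ava Lee", 1002: "Ben Ortiz"}  # custid -> name (unchanged)


def inner_join(sales):
    """Join (custid, amount) sales to customers, dropping unmatched rows."""
    return [(c, customers[c], amt) for c, amt in sales if c in customers]


old_sales = [(1001, 20.0), (1002, 35.5)]
new_sales = [(1002, 12.0), (9999, 5.0)]  # 9999 has no matching customer

full = inner_join(old_sales + new_sales)                     # full recompute
incremental = inner_join(old_sales) + inner_join(new_sales)  # apply only the delta

# Same rows either way (compared as multisets, since row order is irrelevant)
print(Counter(full) == Counter(incremental))  # True
```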

---

**How Understood could use this:**
- User engagement metrics pipeline
- Content performance aggregations
- Donation/grant reporting rollups
- Any multi-step data transformation that needs to stay fresh

---
## Cleanup (Optional)

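Uncomment the lines below and run this cell to remove all demo objects.

In [None]:
# Uncomment to clean up
# session.sql("DROP DYNAMIC TABLE IF EXISTS salesreport").collect()
# session.sql("DROP DYNAMIC TABLE IF EXISTS customer_sales_data_history").collect()
# session.sql("DROP TABLE IF EXISTS salesdata").collect()
# session.sql("DROP TABLE IF EXISTS prod_stock_inv").collect()
# session.sql("DROP TABLE IF EXISTS cust_info").collect()
# session.sql("DROP FUNCTION IF EXISTS gen_cust_info(NUMBER)").collect()
# session.sql("DROP FUNCTION IF EXISTS gen_prod_inv(NUMBER)").collect()
# session.sql("DROP FUNCTION IF EXISTS gen_cust_purchase(NUMBER, NUMBER)").collect()
# session.sql("DROP SCHEMA IF EXISTS DYNAMIC_TABLES").collect()
# print("Cleanup complete!")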