In [None]:
# Snowflake Dynamic Tables Demo

## Overview
This notebook demonstrates Snowflake Dynamic Tables using the TPC-H sample dataset. We'll explore:
- **Incremental refresh patterns** (process only changed rows; suited to a low target lag)
- **Full refresh strategies** (recompute the whole query; suited to batch workloads)
- **Cascading pipeline dependencies**

## What are Dynamic Tables?
A Dynamic Table materializes the result of a query, and Snowflake refreshes it automatically so that it stays within a target lag of the underlying tables. This provides:
- Automated data pipeline management
- Flexible refresh strategies (incremental vs full)
- Dependency-aware refresh ordering
- Cost optimization through intelligent refresh scheduling
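
Dependency-aware refresh ordering can be pictured as a topological sort over the table graph. The sketch below is a toy model built on Python's standard `graphlib` module, not Snowflake's scheduler; the table names anticipate the pipeline created later in this notebook.

```python
from graphlib import TopologicalSorter

# Map each table to the set of tables it reads from (illustrative only).
dependencies = {
    "order_line_summary": {"ORDERS", "LINEITEM"},
    "monthly_sales_summary": {"order_line_summary"},
    "sales_kpi_dashboard": {"monthly_sales_summary"},
}

# static_order() yields each table only after everything it depends on.
refresh_order = list(TopologicalSorter(dependencies).static_order())
print(refresh_order)
```

Any order that places a table after all of its inputs is valid, so the relative position of `ORDERS` and `LINEITEM` may vary.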


In [None]:
## 1. Setup and Connection

First, let's establish a connection to Snowflake and set up our environment.
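
Placeholder defaults such as `your_username` only surface as a login error at connect time. As a minimal sketch, a hypothetical helper (`load_credentials`, not part of the Snowflake connector) can check the environment variables up front and fail with a clearer message:

```python
import os

REQUIRED_VARS = ("SNOWFLAKE_USER", "SNOWFLAKE_PASSWORD", "SNOWFLAKE_ACCOUNT")


def load_credentials(env=os.environ):
    """Return connector keyword arguments, or raise if a variable is unset."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "user": env["SNOWFLAKE_USER"],
        "password": env["SNOWFLAKE_PASSWORD"],
        "account": env["SNOWFLAKE_ACCOUNT"],
    }
```

The returned dictionary could be merged with the warehouse, database, and schema settings before calling `snowflake.connector.connect`.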


In [None]:
import os

import snowflake.connector

# Connection parameters (update with your credentials)
conn_params = {
    'user': os.getenv('SNOWFLAKE_USER', 'your_username'),
    'password': os.getenv('SNOWFLAKE_PASSWORD', 'your_password'),
    'account': os.getenv('SNOWFLAKE_ACCOUNT', 'your_account'),
    'warehouse': 'COMPUTE_WH',
    'database': 'DEMO_DB',
    'schema': 'DYNAMIC_TABLES'
}

# Establish connection
try:
    conn = snowflake.connector.connect(**conn_params)
    cursor = conn.cursor()
    print("✅ Successfully connected to Snowflake!")
except Exception as e:
    print(f"❌ Connection failed: {e}")
    print("Please update your connection parameters above.")
    raise  # later cells need `conn` and `cursor`, so stop here


In [None]:
## 2. Explore TPC-H Sample Data

Let's examine the existing TPC-H dataset in Snowflake's sample data.
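
The exploration cells below interpolate table names into SQL with f-strings. That is fine for a fixed list, but if names ever come from user input, a guard like the following sketch may help. `qualified_name` is a hypothetical helper, and the pattern covers only unquoted Snowflake identifiers.

```python
import re

# Unquoted identifiers: a letter or underscore, then letters, digits, "_" or "$".
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def qualified_name(*parts):
    """Join identifier parts with dots after checking each one."""
    for part in parts:
        if not _IDENTIFIER.fullmatch(part):
            raise ValueError(f"Unsafe identifier: {part!r}")
    return ".".join(parts)


print(qualified_name("SFC_SAMPLE_DATA", "TPCH_SF1", "ORDERS"))
```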


In [None]:
# Explore available tables in TPC-H dataset
cursor.execute("SHOW TABLES IN SCHEMA SFC_SAMPLE_DATA.TPCH_SF1")
tables = cursor.fetchall()

print("📊 Available TPC-H Tables:")
for table in tables:
    print(f"  - {table[1]}")  # table name


In [None]:
# Examine key tables structure and sample data
key_tables = ['ORDERS', 'CUSTOMER', 'LINEITEM', 'PART']

for table in key_tables:
    print(f"\n🔍 Table: {table}")
    print("=" * 50)
    
    # Show structure
    cursor.execute(f"DESCRIBE TABLE SFC_SAMPLE_DATA.TPCH_SF1.{table}")
    columns = cursor.fetchall()
    print("Columns:")
    for col in columns[:5]:  # Show first 5 columns
        print(f"  - {col[0]}: {col[1]}")
    if len(columns) > 5:
        print(f"  ... and {len(columns) - 5} more columns")
    
    # Show row count
    cursor.execute(f"SELECT COUNT(*) FROM SFC_SAMPLE_DATA.TPCH_SF1.{table}")
    count = cursor.fetchone()[0]
    print(f"Row count: {count:,}")
    
    # Show sample data
    cursor.execute(f"SELECT * FROM SFC_SAMPLE_DATA.TPCH_SF1.{table} LIMIT 3")
    sample_data = cursor.fetchall()
    print("Sample data:")
    for row in sample_data:
        print(f"  {row[:3]}...")  # Show first 3 columns


In [None]:
## 3. Create Demo Database and Schema

Set up our demo environment for Dynamic Tables.


In [None]:
# Create demo database and schema
setup_commands = [
    "CREATE DATABASE IF NOT EXISTS DEMO_DB",
    "USE DATABASE DEMO_DB",
    "CREATE SCHEMA IF NOT EXISTS DYNAMIC_TABLES",
    "USE SCHEMA DYNAMIC_TABLES"
]

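setup_ok = True
for cmd in setup_commands:
    try:
        cursor.execute(cmd)
        print(f"✅ {cmd}")
    except Exception as e:
        setup_ok = False
        print(f"❌ Error executing '{cmd}': {e}")

if setup_ok:
    print("\n🏗️ Demo environment setup complete!")
else:
    print("\n⚠️ Demo environment setup finished with errors; see messages above.")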


In [None]:
## 4. Dynamic Table Example 1: Incremental Refresh Pattern

Create a Dynamic Table that aggregates order data with a short target lag, so that Snowflake can use **incremental refresh** for cost-efficient, near-real-time updates.
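
When `REFRESH_MODE` is omitted, Snowflake uses `AUTO` and decides between incremental and full refresh itself. To make the choice explicit, the DDL can be assembled with a small builder. This is a sketch: `build_dynamic_table_ddl` is a hypothetical helper, the query is a placeholder, and whether Snowflake accepts `INCREMENTAL` depends on the query and its source tables.

```python
def build_dynamic_table_ddl(name, query, target_lag, warehouse, refresh_mode="AUTO"):
    """Assemble a CREATE DYNAMIC TABLE statement with an explicit refresh mode."""
    if refresh_mode not in {"AUTO", "INCREMENTAL", "FULL"}:
        raise ValueError(f"Unknown refresh mode: {refresh_mode}")
    # DOWNSTREAM is a keyword; any other lag is a quoted duration string.
    lag = "DOWNSTREAM" if target_lag.upper() == "DOWNSTREAM" else f"'{target_lag}'"
    return (
        f"CREATE OR REPLACE DYNAMIC TABLE {name}\n"
        f"TARGET_LAG = {lag}\n"
        f"WAREHOUSE = {warehouse}\n"
        f"REFRESH_MODE = {refresh_mode}\n"
        f"AS\n{query.strip()}"
    )


ddl = build_dynamic_table_ddl(
    name="customer_order_summary_incremental",
    query="SELECT 1 AS placeholder",  # stand-in for the real aggregation
    target_lag="5 minutes",
    warehouse="COMPUTE_WH",
    refresh_mode="INCREMENTAL",
)
print(ddl)
```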


In [None]:
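# Create a Dynamic Table with TARGET_LAG = '5 minutes'.
# REFRESH_MODE is omitted, so Snowflake uses AUTO and chooses
# incremental refresh only if the query supports it.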
incremental_dt_sql = """
CREATE OR REPLACE DYNAMIC TABLE customer_order_summary_incremental
TARGET_LAG = '5 minutes'
WAREHOUSE = COMPUTE_WH
AS
SELECT 
    c.C_CUSTKEY,
    c.C_NAME,
    c.C_NATIONKEY,
    COUNT(o.O_ORDERKEY) AS total_orders,
    SUM(o.O_TOTALPRICE) AS total_spent,
    AVG(o.O_TOTALPRICE) AS avg_order_value,
    MAX(o.O_ORDERDATE) AS last_order_date,
    CURRENT_TIMESTAMP() AS last_updated
FROM SFC_SAMPLE_DATA.TPCH_SF1.CUSTOMER c
LEFT JOIN SFC_SAMPLE_DATA.TPCH_SF1.ORDERS o ON c.C_CUSTKEY = o.O_CUSTKEY
GROUP BY c.C_CUSTKEY, c.C_NAME, c.C_NATIONKEY
"""

try:
    cursor.execute(incremental_dt_sql)
    print("✅ Created Dynamic Table: customer_order_summary_incremental")
    print("📈 Refresh Mode: AUTO, incremental when supported (TARGET_LAG = '5 minutes')")
    print("💡 Use case: Near-real-time customer analytics dashboard")
except Exception as e:
    print(f"❌ Error creating incremental Dynamic Table: {e}")


In [None]:
# View the incremental Dynamic Table data
cursor.execute("SELECT * FROM customer_order_summary_incremental LIMIT 10")
incremental_data = cursor.fetchall()

print("📊 Sample data from incremental Dynamic Table:")
print("CUSTKEY | NAME | TOTAL_ORDERS | TOTAL_SPENT | AVG_ORDER_VALUE")
print("-" * 70)
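for row in incremental_data:
    # Customers without orders get NULL aggregates from the LEFT JOIN
    total_spent = row[4] if row[4] is not None else 0
    avg_value = row[5] if row[5] is not None else 0
    print(f"{row[0]} | {row[1][:20]:<20} | {row[3]:>12} | {total_spent:>11.2f} | {avg_value:>15.2f}")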


In [None]:
## 5. Dynamic Table Example 2: Full Refresh Strategy

Create a Dynamic Table with **full refresh**, which recomputes the entire query on every refresh and suits batch-style workloads.
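
The difference between the two modes can be illustrated with a toy model in plain Python. It is not how Snowflake implements refreshes, and the order tuples are made-up values. It only shows that folding in new rows and recomputing from scratch reach the same result, while the incremental path touches less data.

```python
from collections import defaultdict


def full_refresh(orders):
    """Recompute every customer's total from the complete order list."""
    totals = defaultdict(float)
    for custkey, price in orders:
        totals[custkey] += price
    return dict(totals)


def incremental_refresh(totals, new_orders):
    """Fold only the newly inserted orders into an existing result."""
    updated = defaultdict(float, totals)
    for custkey, price in new_orders:
        updated[custkey] += price
    return dict(updated)


existing = [(1, 100.0), (2, 50.0)]   # (customer key, order price)
inserted = [(1, 25.0), (3, 10.0)]    # rows that arrived since the last refresh

after_full = full_refresh(existing + inserted)
after_incremental = incremental_refresh(full_refresh(existing), inserted)
print(after_full == after_incremental)
```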


In [None]:
# Create a Dynamic Table with full refresh (TARGET_LAG = '1 hour')
full_refresh_dt_sql = """
CREATE OR REPLACE DYNAMIC TABLE product_sales_analysis_full
TARGET_LAG = '1 hour'
WAREHOUSE = COMPUTE_WH
REFRESH_MODE = FULL
AS
SELECT 
    p.P_PARTKEY,
    p.P_NAME,
    p.P_BRAND,
    p.P_TYPE,
    p.P_SIZE,
    COUNT(l.L_ORDERKEY) AS total_orders,
    SUM(l.L_QUANTITY) AS total_quantity_sold,
    SUM(l.L_EXTENDEDPRICE) AS total_revenue,
    AVG(l.L_EXTENDEDPRICE / l.L_QUANTITY) AS avg_unit_price,
    SUM(l.L_EXTENDEDPRICE * (1 - l.L_DISCOUNT)) AS net_revenue,
    CURRENT_TIMESTAMP() AS analysis_timestamp
FROM SFC_SAMPLE_DATA.TPCH_SF1.PART p
LEFT JOIN SFC_SAMPLE_DATA.TPCH_SF1.LINEITEM l ON p.P_PARTKEY = l.L_PARTKEY
GROUP BY p.P_PARTKEY, p.P_NAME, p.P_BRAND, p.P_TYPE, p.P_SIZE
HAVING COUNT(l.L_ORDERKEY) > 0
"""

try:
    cursor.execute(full_refresh_dt_sql)
    print("✅ Created Dynamic Table: product_sales_analysis_full")
    print("🔄 Refresh Mode: Full (TARGET_LAG = '1 hour')")
    print("💡 Use case: Comprehensive product performance analysis")
except Exception as e:
    print(f"❌ Error creating full refresh Dynamic Table: {e}")


In [None]:
# View the full refresh Dynamic Table data
cursor.execute("SELECT * FROM product_sales_analysis_full ORDER BY total_revenue DESC LIMIT 10")
full_refresh_data = cursor.fetchall()

print("📊 Top 10 products by revenue (Full Refresh Dynamic Table):")
print("PARTKEY | PRODUCT_NAME | BRAND | TOTAL_ORDERS | TOTAL_REVENUE")
print("-" * 80)
for row in full_refresh_data:
    print(f"{row[0]} | {row[1][:25]:<25} | {row[2]:<10} | {row[5]:>12} | {row[7]:>13.2f}")


In [None]:
## 6. Cascading Pipeline Dependencies

Create a multi-layer pipeline where Dynamic Tables depend on each other, demonstrating cascading refresh behavior.
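
In a chain like this, often only the final table has a freshness requirement. Snowflake supports `TARGET_LAG = DOWNSTREAM`, which refreshes a table only when the tables that depend on it need fresh data. The sketch below generates the corresponding `ALTER DYNAMIC TABLE` statements without executing them; `downstream_lag_statements` is a hypothetical helper.

```python
def downstream_lag_statements(pipeline):
    """Return ALTER statements for every table except the last in the chain."""
    *intermediate, _final = pipeline  # the final table keeps its own target lag
    return [
        f"ALTER DYNAMIC TABLE {name} SET TARGET_LAG = DOWNSTREAM"
        for name in intermediate
    ]


pipeline = ["order_line_summary", "monthly_sales_summary", "sales_kpi_dashboard"]
for statement in downstream_lag_statements(pipeline):
    print(statement)
```

With this setting, the lag of `sales_kpi_dashboard` alone would drive how often the two upstream layers refresh.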


In [None]:
# Layer 1: Base aggregation (depends on source tables)
layer1_sql = """
CREATE OR REPLACE DYNAMIC TABLE order_line_summary
TARGET_LAG = '10 minutes'
WAREHOUSE = COMPUTE_WH
AS
SELECT 
    o.O_ORDERKEY,
    o.O_CUSTKEY,
    o.O_ORDERDATE,
    o.O_ORDERSTATUS,
    COUNT(l.L_LINENUMBER) AS line_count,
    SUM(l.L_QUANTITY) AS total_quantity,
    SUM(l.L_EXTENDEDPRICE) AS order_total,
    SUM(l.L_EXTENDEDPRICE * (1 - l.L_DISCOUNT)) AS net_total
FROM SFC_SAMPLE_DATA.TPCH_SF1.ORDERS o
JOIN SFC_SAMPLE_DATA.TPCH_SF1.LINEITEM l ON o.O_ORDERKEY = l.L_ORDERKEY
GROUP BY o.O_ORDERKEY, o.O_CUSTKEY, o.O_ORDERDATE, o.O_ORDERSTATUS
"""

# Layer 2: Monthly aggregation (depends on Layer 1)
layer2_sql = """
CREATE OR REPLACE DYNAMIC TABLE monthly_sales_summary
TARGET_LAG = '30 minutes'
WAREHOUSE = COMPUTE_WH
AS
SELECT 
    DATE_TRUNC('MONTH', O_ORDERDATE) AS sales_month,
    O_ORDERSTATUS,
    COUNT(O_ORDERKEY) AS orders_count,
    SUM(line_count) AS total_lines,
    SUM(total_quantity) AS total_quantity,
    SUM(order_total) AS gross_revenue,
    SUM(net_total) AS net_revenue,
    AVG(order_total) AS avg_order_value
FROM order_line_summary
GROUP BY DATE_TRUNC('MONTH', O_ORDERDATE), O_ORDERSTATUS
"""

# Layer 3: KPI dashboard (depends on Layer 2)
layer3_sql = """
CREATE OR REPLACE DYNAMIC TABLE sales_kpi_dashboard
TARGET_LAG = '1 hour'
WAREHOUSE = COMPUTE_WH
AS
SELECT 
    sales_month,
    SUM(CASE WHEN O_ORDERSTATUS = 'F' THEN net_revenue ELSE 0 END) AS completed_revenue,
    SUM(CASE WHEN O_ORDERSTATUS = 'O' THEN net_revenue ELSE 0 END) AS pending_revenue,
    SUM(net_revenue) AS total_monthly_revenue,
    COUNT(DISTINCT CASE WHEN O_ORDERSTATUS = 'F' THEN sales_month END) AS months_with_completed_orders,
    AVG(avg_order_value) AS avg_monthly_order_value,
    CURRENT_TIMESTAMP() AS dashboard_updated
FROM monthly_sales_summary
GROUP BY sales_month
"""

# Execute cascading pipeline creation
pipeline_steps = [
    ("Layer 1 - Order Line Summary", layer1_sql),
    ("Layer 2 - Monthly Sales Summary", layer2_sql), 
    ("Layer 3 - Sales KPI Dashboard", layer3_sql)
]

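print("🔗 Creating Cascading Pipeline:")
failed_steps = []
for step_name, sql in pipeline_steps:
    try:
        cursor.execute(sql)
        print(f"✅ {step_name}")
    except Exception as e:
        failed_steps.append(step_name)
        print(f"❌ Error creating {step_name}: {e}")

if failed_steps:
    print(f"\n⚠️ Pipeline incomplete; failed steps: {', '.join(failed_steps)}")
else:
    print("\n🎯 Cascading pipeline created successfully!")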


In [None]:
# View the final KPI dashboard
cursor.execute("SELECT * FROM sales_kpi_dashboard ORDER BY sales_month DESC LIMIT 10")
kpi_data = cursor.fetchall()

print("📈 Sales KPI Dashboard (Final Layer):")
print("MONTH | COMPLETED_REV | PENDING_REV | TOTAL_REV | AVG_ORDER_VALUE")
print("-" * 75)
for row in kpi_data:
    month = row[0].strftime("%Y-%m") if row[0] else "N/A"
    print(f"{month} | {row[1]:>13.2f} | {row[2]:>11.2f} | {row[3]:>9.2f} | {row[5]:>15.2f}")


In [None]:
## 7. Monitoring and Performance

Monitor Dynamic Table refresh status, performance, and dependencies.
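
Once refresh history rows have been fetched, they can be summarized per table in plain Python. The sketch below assumes rows shaped as `(name, start, end, state)`; the sample rows are made-up values for illustration, and `summarize_refreshes` is a hypothetical helper.

```python
from datetime import datetime
from statistics import mean


def summarize_refreshes(history):
    """Report run count and duration statistics per table for successful refreshes."""
    durations = {}
    for name, start, end, state in history:
        if state != "SUCCEEDED" or end is None:
            continue  # skip failed or still-running refreshes
        durations.setdefault(name, []).append((end - start).total_seconds())
    return {
        name: {"runs": len(secs), "avg_seconds": mean(secs), "max_seconds": max(secs)}
        for name, secs in durations.items()
    }


# Made-up rows, not real query output.
sample_history = [
    ("ORDER_LINE_SUMMARY", datetime(2024, 1, 1, 12, 0, 0), datetime(2024, 1, 1, 12, 0, 30), "SUCCEEDED"),
    ("ORDER_LINE_SUMMARY", datetime(2024, 1, 1, 12, 10, 0), datetime(2024, 1, 1, 12, 10, 10), "SUCCEEDED"),
    ("SALES_KPI_DASHBOARD", datetime(2024, 1, 1, 13, 0, 0), None, "EXECUTING"),
]
summary = summarize_refreshes(sample_history)
print(summary)
```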


In [None]:
# Monitor Dynamic Table status and metadata
monitoring_queries = {
    "Dynamic Tables Overview": """
    SHOW DYNAMIC TABLES IN SCHEMA DEMO_DB.DYNAMIC_TABLES
    """,
    
    "Refresh History": """
    SELECT 
        name,
        refresh_start_time,
        refresh_end_time,
        DATEDIFF('seconds', refresh_start_time, refresh_end_time) AS duration_seconds,
        refresh_trigger,
        state
    FROM TABLE(INFORMATION_SCHEMA.DYNAMIC_TABLE_REFRESH_HISTORY())
    WHERE schema_name = 'DYNAMIC_TABLES'
    ORDER BY refresh_start_time DESC
    LIMIT 20
    """,
    
    "Dependencies Graph": """
    SELECT 
        name AS dependent_table,
        inputs AS source_tables,
        target_lag_sec
    FROM TABLE(INFORMATION_SCHEMA.DYNAMIC_TABLE_GRAPH_HISTORY())
    WHERE schema_name = 'DYNAMIC_TABLES'
    AND valid_to IS NULL
    ORDER BY dependent_table
    """
}

for query_name, query in monitoring_queries.items():
    print(f"\n🔍 {query_name}")
    print("=" * 60)
    try:
        cursor.execute(query)
        results = cursor.fetchall()
        if results:
            for row in results[:5]:  # Limit to first 5 rows for readability
                print(f"  {row}")
            if len(results) > 5:
                print(f"  ... and {len(results) - 5} more rows")
        else:
            print("  No data found")
    except Exception as e:
        print(f"  ❌ Error: {e}")


In [None]:
## 8. Testing Different LAG Settings and Refresh Modes

Compare performance and behavior of different configuration options.
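
To compare configurations numerically, target lag strings can be converted to seconds. This is a minimal sketch: `target_lag_seconds` is a hypothetical helper that handles only the `'<number> <unit>'` form used in this notebook, not the `DOWNSTREAM` keyword.

```python
_UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


def target_lag_seconds(lag):
    """Convert a string such as '5 minutes' into a number of seconds."""
    amount, unit = lag.strip().split()
    unit = unit.lower().rstrip("s")  # accept both 'minute' and 'minutes'
    return int(amount) * _UNIT_SECONDS[unit]


lags = ["1 day", "2 minutes", "1 hour"]
print(sorted(lags, key=target_lag_seconds))
```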


In [None]:
# Test different LAG configurations
test_configs = [
    {
        "name": "high_frequency_orders",
        "lag": "2 minutes",
        "mode": "AUTO",
        "sql": """
        SELECT 
            O_ORDERSTATUS,
            COUNT(*) AS status_count,
            SUM(O_TOTALPRICE) AS total_value
        FROM SFC_SAMPLE_DATA.TPCH_SF1.ORDERS 
        GROUP BY O_ORDERSTATUS
        """
    },
    {
        "name": "daily_batch_summary", 
        "lag": "1 day",
        "mode": "FULL",
        "sql": """
        SELECT 
            DATE_TRUNC('DAY', O_ORDERDATE) AS order_day,
            COUNT(*) AS daily_orders,
            AVG(O_TOTALPRICE) AS avg_order_value,
            MAX(O_TOTALPRICE) AS max_order_value
        FROM SFC_SAMPLE_DATA.TPCH_SF1.ORDERS
        GROUP BY DATE_TRUNC('DAY', O_ORDERDATE)
        """
    }
]

print("🧪 Testing Different LAG Settings:")
print("=" * 50)

for config in test_configs:
    table_name = f"test_{config['name']}"
    refresh_mode = f"REFRESH_MODE = {config['mode']}" if config['mode'] != 'AUTO' else ""
    
    create_sql = f"""
    CREATE OR REPLACE DYNAMIC TABLE {table_name}
    TARGET_LAG = '{config['lag']}'
    WAREHOUSE = COMPUTE_WH
    {refresh_mode}
    AS
    {config['sql']}
    """
    
    try:
        cursor.execute(create_sql)
        print(f"✅ Created: {table_name}")
        print(f"   LAG: {config['lag']}, Mode: {config['mode']}")
        
        # Show sample data
        cursor.execute(f"SELECT * FROM {table_name} LIMIT 3")
        sample = cursor.fetchall()
        print(f"   Sample: {sample[0] if sample else 'No data'}")
        
    except Exception as e:
        print(f"❌ Error creating {table_name}: {e}")
    
    print("-" * 40)


In [None]:
## 9. Summary and Best Practices

Key takeaways and recommendations for using Dynamic Tables effectively.


In [None]:
### ✨ Demo Achievements

✅ **Incremental Refresh Pattern**: `customer_order_summary_incremental` (5-minute target lag)
- Use case: Near-real-time customer analytics
- Benefits: Cost-efficient refreshes that process only changed data

✅ **Full Refresh Strategy**: `product_sales_analysis_full` (1-hour target lag)
- Use case: Comprehensive product analysis
- Benefits: Works for queries that incremental refresh does not support; predictable batch recomputation

✅ **Cascading Pipeline Dependencies**: 3-layer architecture
- Layer 1: `order_line_summary` (10 min)
- Layer 2: `monthly_sales_summary` (30 min)  
- Layer 3: `sales_kpi_dashboard` (1 hour)

✅ **Different LAG Settings**: From 2 minutes to 1 day

✅ **Monitoring Capabilities**: Status, history, dependencies

### 🎯 Best Practices Demonstrated

1. **Choose Appropriate LAG Settings**
   - Near-real-time dashboards: 1-15 minutes
   - Operational reports: 30 minutes - 2 hours
   - Analytical workloads: 4-24 hours

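2. **Select Refresh Mode Wisely**
   - **AUTO**: Snowflake chooses incremental or full refresh when the table is created, based on the query
   - **INCREMENTAL**: For queries that can be refreshed from changes alone, where only a small share of rows changes between refreshes
   - **FULL**: For queries that incremental refresh does not support, or when most of the data changes on each refresh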

3. **Design Efficient Pipelines**
   - Start with base aggregations (frequent refresh)
   - Build summary layers (moderate refresh)
   - Create final KPIs (less frequent refresh)

4. **Monitor and Optimize**
   - Track refresh duration and frequency
   - Monitor warehouse usage and costs
   - Review dependency chains for bottlenecks
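
Dynamic Tables keep refreshing, and consuming warehouse credits, until they are suspended or dropped. A cleanup sketch for this demo might generate `DROP DYNAMIC TABLE` statements, listing downstream tables first. `cleanup_statements` is a hypothetical helper, and the statements are printed rather than executed.

```python
# Tables created in this notebook, downstream layers before their inputs.
DEMO_TABLES = [
    "customer_order_summary_incremental",
    "product_sales_analysis_full",
    "sales_kpi_dashboard",
    "monthly_sales_summary",
    "order_line_summary",
    "test_high_frequency_orders",
    "test_daily_batch_summary",
]


def cleanup_statements(tables):
    """Return one DROP statement per dynamic table."""
    return [f"DROP DYNAMIC TABLE IF EXISTS {table}" for table in tables]


for statement in cleanup_statements(DEMO_TABLES):
    print(statement)
```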

### 🔗 Resources
- [Dynamic Tables Quickstart](https://quickstarts.snowflake.com/guide/getting_started_with_dynamic_tables/)
- [Dynamic Tables Documentation](https://docs.snowflake.com/en/user-guide/dynamic-tables-about)


In [None]:
# Clean up connection
cursor.close()
conn.close()
print("🔚 Demo completed! Connection closed.")
