# Getting Started with Snowflake Dynamic Tables

This notebook is based on the [Snowflake Dynamic Tables Quickstart Guide](https://quickstarts.snowflake.com/guide/getting_started_with_dynamic_tables/index.html).

## Overview

Dynamic Tables in Snowflake materialize the results of a query and refresh them automatically to stay within a target lag that you specify, so you can build transformation pipelines without scheduling the refreshes yourself. This tutorial will walk you through:

- Setting up your Snowflake environment
- Creating sample data using Python User-Defined Table Functions (UDTFs)
- Building dynamic tables for data transformations
- Implementing data validation and monitoring
- Managing and optimizing dynamic table pipelines

## Prerequisites

Before starting this tutorial, ensure you have:
- Access to a Snowflake account ([30-day free trial available](https://trial.snowflake.com))
- Basic knowledge of SQL and database concepts
- Basic understanding of Python programming
- Snowflake Python connector and pandas installed: `pip install snowflake-connector-python pandas`


## Step 1: Snowflake Connection Setup

First, let's establish a connection to Snowflake. Make sure to update the connection parameters with your Snowflake account details.


In [None]:
import snowflake.connector
import pandas as pd
from IPython.display import display

# Snowflake connection parameters - UPDATE THESE WITH YOUR DETAILS
connection_params = {
    'account': 'your-account-identifier',  # e.g., 'abc12345.us-east-1'
    'user': 'your-username',
    'password': 'your-password',
    'warehouse': 'COMPUTE_WH',  # We'll create our own later
    'database': 'SNOWFLAKE_SAMPLE_DATA',
    'schema': 'PUBLIC'
}

# Create connection
conn = snowflake.connector.connect(**connection_params)
cursor = conn.cursor()

print("✅ Connected to Snowflake successfully!")
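
Hardcoding a password in a notebook makes it easy to leak through version control. Here is a minimal sketch of one alternative, assuming you export the credentials as environment variables first. The `SNOWFLAKE_*` variable names and the `load_connection_params` helper are illustrative choices, not something the connector requires:

```python
import os

# Hypothetical environment variable names; pick whatever fits your setup.
REQUIRED_VARS = ("SNOWFLAKE_ACCOUNT", "SNOWFLAKE_USER", "SNOWFLAKE_PASSWORD")

def load_connection_params(env=os.environ):
    """Build the connection dictionary from environment variables."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "account": env["SNOWFLAKE_ACCOUNT"],
        "user": env["SNOWFLAKE_USER"],
        "password": env["SNOWFLAKE_PASSWORD"],
        # Optional settings fall back to the defaults used in this notebook.
        "warehouse": env.get("SNOWFLAKE_WAREHOUSE", "COMPUTE_WH"),
        "database": env.get("SNOWFLAKE_DATABASE", "SNOWFLAKE_SAMPLE_DATA"),
        "schema": env.get("SNOWFLAKE_SCHEMA", "PUBLIC"),
    }

# Usage (requires the variables to be set):
# conn = snowflake.connector.connect(**load_connection_params())
```

Failing early with the list of missing variables tends to be easier to debug than a connection error from the driver.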


## Step 2: Environment Setup

Now let's create the necessary database, schema, and warehouse for our Dynamic Tables demo.


In [None]:
# Helper function to execute SQL commands
def execute_sql(sql, description=""):
    """Execute SQL command and display results."""
    try:
        cursor.execute(sql)
        if description:
            print(f"✅ {description}")
        
        # Fetch results if it's a SELECT statement
        if sql.strip().upper().startswith('SELECT') or sql.strip().upper().startswith('SHOW'):
            results = cursor.fetchall()
            columns = [desc[0] for desc in cursor.description]
            df = pd.DataFrame(results, columns=columns)
            display(df)
            return df
        else:
            results = cursor.fetchall()
            if results:
                print(f"Result: {results}")
    except Exception as e:
        print(f"❌ Error: {e}")
        
# Create database and schema
execute_sql("CREATE DATABASE IF NOT EXISTS DEMO;", "Created DEMO database")
execute_sql("CREATE SCHEMA IF NOT EXISTS DEMO.DT_DEMO;", "Created DT_DEMO schema")
execute_sql("USE SCHEMA DEMO.DT_DEMO;", "Using DEMO.DT_DEMO schema")
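
The helper above tabulates results only for statements that begin with `SELECT` or `SHOW`, so a query that starts with a comment, a `WITH` clause, or `DESCRIBE` falls through to the plain `fetchall()` branch. A possible refinement, sketched here with a hypothetical `returns_rows` function and an assumed keyword list, strips leading comments before checking the first keyword:

```python
import re

# Keywords assumed to start statements whose results are worth tabulating.
ROW_RETURNING_KEYWORDS = ("SELECT", "SHOW", "WITH", "DESCRIBE", "DESC")

def returns_rows(sql):
    """Return True if the statement starts with a row-returning keyword."""
    # Drop leading whitespace, `-- line` comments, and `/* block */` comments.
    stripped = re.sub(r"^(\s+|--[^\n]*(\n|$)|/\*.*?\*/)+", "", sql, flags=re.DOTALL)
    first_word = re.match(r"[A-Za-z]+", stripped)
    return bool(first_word) and first_word.group(0).upper() in ROW_RETURNING_KEYWORDS

print(returns_rows("-- preview\nWITH t AS (SELECT 1) SELECT * FROM t"))  # True
print(returns_rows("CREATE TABLE t AS SELECT 1"))                        # False
```

You could then replace the two `startswith` checks in `execute_sql` with `if returns_rows(sql):`.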


In [None]:
# Create a virtual warehouse
warehouse_sql = """
CREATE WAREHOUSE IF NOT EXISTS XSMALL_WH 
WAREHOUSE_TYPE = STANDARD
WAREHOUSE_SIZE = XSMALL
AUTO_SUSPEND = 300
AUTO_RESUME = TRUE;
"""

execute_sql(warehouse_sql, "Created XSMALL_WH warehouse")

# Use the warehouse
execute_sql("USE WAREHOUSE XSMALL_WH;", "Using XSMALL_WH warehouse")


## Step 3: Create Sample Data with Python UDTFs

A dynamic table pipeline needs source data to transform. Let's generate sample datasets using Python User-Defined Table Functions (UDTFs). We'll create three base tables:
1. **Customer Information** - Basic customer details
2. **Product Stock Inventory** - Product catalog and stock levels
3. **Sales Data** - Transaction records


### 3.1 Customer Information Table

First, let's create a Python UDTF to generate customer data:


In [None]:
# Create Customer Info UDTF
customer_udtf_sql = """
CREATE OR REPLACE FUNCTION gen_cust_info(num_records NUMBER)
RETURNS TABLE (custid NUMBER(10), cname VARCHAR(100), spendlimit NUMBER(10,2))
LANGUAGE PYTHON
RUNTIME_VERSION=3.10
HANDLER='CustTab'
PACKAGES = ('Faker')
AS $$
from faker import Faker
import random

fake = Faker()

class CustTab:
    def process(self, num_records):
        # Customer IDs run from 1001 to 1000 + num_records
        for custid in range(1001, 1001 + num_records):
            cname = fake.name()
            spendlimit = round(random.uniform(1000, 10000), 2)
            yield (custid, cname, spendlimit)
$$;
"""

execute_sql(customer_udtf_sql, "Created gen_cust_info UDTF")

# Create the customer info table with 1000 records
create_cust_table_sql = """
CREATE OR REPLACE TABLE cust_info AS 
SELECT * FROM TABLE(gen_cust_info(1000)) ORDER BY 1;
"""

execute_sql(create_cust_table_sql, "Created cust_info table with 1000 customers")
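
Because the handler is plain Python, its logic can be checked locally before it is deployed. The sketch below mirrors the handler's ID and spend-limit logic in a local class and swaps Faker for a placeholder name. That swap is an assumption made only so the snippet runs without extra packages:

```python
import random

class CustTab:
    """Local stand-in for the UDTF handler; names are placeholders, not Faker output."""

    def process(self, num_records):
        customer_id = 1000
        for _ in range(num_records):
            customer_id += 1
            cname = f"Customer {customer_id}"  # stand-in for fake.name()
            spendlimit = round(random.uniform(1000, 10000), 2)
            yield (customer_id, cname, spendlimit)

rows = list(CustTab().process(5))
ids = [row[0] for row in rows]
print(ids)  # [1001, 1002, 1003, 1004, 1005]
```

The IDs start at 1001, which is why the sales generator later draws customer IDs from 1001 to 2000.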


In [None]:
# Let's preview the customer data
execute_sql("SELECT * FROM cust_info LIMIT 10;", "Preview of customer data")


### 3.2 Product Stock Inventory Table

Now let's create a product inventory table with stock levels:


In [None]:
# Create Product Stock Inventory UDTF
product_udtf_sql = """
CREATE OR REPLACE FUNCTION gen_prod_stock_inv(num_records NUMBER)
RETURNS TABLE (pid NUMBER(5), product_name VARCHAR(100), stock NUMBER(5))
LANGUAGE PYTHON
RUNTIME_VERSION=3.10
HANDLER='ProdTab'
PACKAGES = ('Faker')
AS $$
from faker import Faker
import random

fake = Faker()

class ProdTab:
    def process(self, num_records):
        product_types = ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Headphones', 
                        'Tablet', 'Phone', 'Printer', 'Camera', 'Speaker',
                        'Router', 'Cable', 'Charger', 'Webcam', 'Microphone']
        
        for product_id in range(1, num_records + 1):
            product_base = random.choice(product_types)
            product_name = f"{fake.company()} {product_base} {fake.bothify('###')}"
            stock = random.randint(50, 500)
            yield (product_id, product_name, stock)
$$;
"""

execute_sql(product_udtf_sql, "Created gen_prod_stock_inv UDTF")

# Create the product stock inventory table
create_product_table_sql = """
CREATE OR REPLACE TABLE prod_stock_inv AS 
SELECT * FROM TABLE(gen_prod_stock_inv(100)) ORDER BY 1;
"""

execute_sql(create_product_table_sql, "Created prod_stock_inv table with 100 products")


### 3.3 Sales Data Table

Finally, let's create the sales transactions. Each row stores its purchase details as semi-structured JSON in a `VARIANT` column:


In [None]:
# Create Sales Data UDTF
sales_udtf_sql = """
CREATE OR REPLACE FUNCTION gen_sales_data(num_records NUMBER)
RETURNS TABLE (custid NUMBER(10), purchase VARIANT, creationtime TIMESTAMP)
LANGUAGE PYTHON
RUNTIME_VERSION=3.10
HANDLER='SalesTab'
PACKAGES = ('Faker')
AS $$
from faker import Faker
import random
import json
from datetime import datetime, timedelta

fake = Faker()

class SalesTab:
    def process(self, num_records):
        # Generate sales data over the past 30 days
        end_date = datetime.now()
        start_date = end_date - timedelta(days=30)
        
        for _ in range(num_records):
            custid = random.randint(1001, 2000)  # Customer IDs from our range
            prodid = random.randint(1, 100)     # Product IDs from our range
            quantity = random.randint(1, 10)
            unit_price = round(random.uniform(10, 500), 2)
            purchase_amount = round(quantity * unit_price, 2)
            
            # Random date within the last 30 days
            purchase_date = fake.date_between(start_date=start_date, end_date=end_date)
            
            # Create JSON purchase data
            purchase_json = {
                "prodid": prodid,
                "quantity": quantity,
                "unit_price": unit_price,
                "purchase_amount": purchase_amount,
                "purchase_date": purchase_date.strftime('%Y-%m-%d')
            }
            
            # Random timestamp for creation
            creationtime = fake.date_time_between(start_date=purchase_date, end_date=end_date)
            
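            # Yield the dict itself so the VARIANT column holds an object.
            # A json.dumps() string would be stored as a VARIANT string, and
            # path lookups such as purchase:"prodid" would return NULL.
            yield (custid, purchase_json, creationtime)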
$$;
"""

execute_sql(sales_udtf_sql, "Created gen_sales_data UDTF")

# Create the sales data table
create_sales_table_sql = """
CREATE OR REPLACE TABLE salesdata AS 
SELECT * FROM TABLE(gen_sales_data(5000)) ORDER BY creationtime;
"""

execute_sql(create_sales_table_sql, "Created salesdata table with 5000 transactions")
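
The generated records should satisfy a few invariants: the purchase amount equals quantity times unit price, and the creation time falls between the purchase date and now. The following sketch rebuilds one record with the standard library only so those invariants can be checked locally. Replacing Faker's date helpers with `random`, the fixed `now` value, and the `make_sale` name are all assumptions for illustration:

```python
import random
from datetime import datetime, timedelta

def make_sale(now, rng):
    """Build one (custid, purchase, creationtime) tuple with the stdlib only."""
    quantity = rng.randint(1, 10)
    unit_price = round(rng.uniform(10, 500), 2)
    # Purchase date: a random day within the 30 days before `now`.
    purchase_day = (now - timedelta(days=rng.randint(0, 30))).date()
    purchase = {
        "prodid": rng.randint(1, 100),
        "quantity": quantity,
        "unit_price": unit_price,
        "purchase_amount": round(quantity * unit_price, 2),
        "purchase_date": purchase_day.strftime("%Y-%m-%d"),
    }
    # Creation time: a random second between midnight of purchase_day and now.
    start = datetime.combine(purchase_day, datetime.min.time())
    offset = rng.randint(0, int((now - start).total_seconds()))
    return (rng.randint(1001, 2000), purchase, start + timedelta(seconds=offset))

rng = random.Random(42)                 # seeded for repeatable checks
now = datetime(2024, 1, 31, 12, 0, 0)   # fixed reference time
sales = [make_sale(now, rng) for _ in range(1000)]
print(sales[0][1].keys())
```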


In [None]:
# Preview the sales data structure
execute_sql("SELECT * FROM salesdata LIMIT 5;", "Preview of sales data")

# Let's also check our product inventory
execute_sql("SELECT * FROM prod_stock_inv LIMIT 10;", "Preview of product inventory")


## Step 4: Creating Dynamic Tables

Dynamic Tables automatically maintain the results of a query as underlying data changes. Let's create several dynamic tables to build a data pipeline.

### Key Concepts:
- **Target lag** (the `TARGET_LAG` parameter; older examples spell it `LAG`): The maximum delay you are willing to accept between a change in the source data and its appearance in the dynamic table
- **WAREHOUSE**: The compute warehouse used for refreshes
- **Dependencies**: Dynamic tables can depend on other tables and dynamic tables
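
To make the lag concept concrete, here is a simplified model of the freshness check: a table is within its lag when its data timestamp trails the current time by no more than the configured duration. This is only an illustration of the idea, not Snowflake's scheduler, and the `parse_lag` and `is_within_lag` helpers are hypothetical:

```python
from datetime import datetime, timedelta

UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}

def parse_lag(text):
    """Convert a lag string such as '1 MINUTE' or '10 minutes' to a timedelta."""
    number, unit = text.strip().lower().split()
    return timedelta(seconds=int(number) * UNIT_SECONDS[unit.rstrip("s")])

def is_within_lag(data_timestamp, now, lag_text):
    """True if the table's data is no older than the requested lag."""
    return now - data_timestamp <= parse_lag(lag_text)

now = datetime(2024, 1, 1, 12, 0, 0)
print(is_within_lag(datetime(2024, 1, 1, 11, 59, 30), now, "1 MINUTE"))   # True
print(is_within_lag(datetime(2024, 1, 1, 11, 45, 0), now, "10 minutes"))  # False
```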


### 4.1 Customer Sales Data History

Our first dynamic table will combine customer and sales data, extracting JSON fields:


In [None]:
# Create customer sales data history dynamic table
customer_sales_dt_sql = """
CREATE OR REPLACE DYNAMIC TABLE customer_sales_data_history
TARGET_LAG='DOWNSTREAM'
WAREHOUSE=XSMALL_WH
AS
SELECT 
    s.custid AS customer_id,
    c.cname AS customer_name,
    s.purchase:"prodid"::NUMBER(5) AS product_id,
    s.purchase:"purchase_amount"::NUMBER(10,2) AS saleprice,
    s.purchase:"quantity"::NUMBER(5) AS quantity,
    s.purchase:"purchase_date"::DATE AS salesdate,
    s.creationtime
FROM
    cust_info c 
    INNER JOIN salesdata s ON c.custid = s.custid;
"""

execute_sql(customer_sales_dt_sql, "Created customer_sales_data_history dynamic table")


In [None]:
# Preview the dynamic table
execute_sql("SELECT * FROM customer_sales_data_history LIMIT 10;", "Preview of customer sales data history")
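
To see what the join and the path expressions such as `purchase:"prodid"` do, here is a rough pure-Python equivalent on two hand-written rows. The sample rows are made up for illustration, and the nested dictionaries stand in for the `VARIANT` column:

```python
# Hand-written sample rows (illustrative only).
cust_info = [
    {"custid": 1001, "cname": "Ada Example"},
    {"custid": 1002, "cname": "Bob Example"},
]
salesdata = [
    {"custid": 1001, "creationtime": "2024-01-15 10:30:00",
     "purchase": {"prodid": 7, "quantity": 2, "purchase_amount": 39.98,
                  "purchase_date": "2024-01-15"}},
    {"custid": 9999, "creationtime": "2024-01-16 08:00:00",  # no such customer
     "purchase": {"prodid": 3, "quantity": 1, "purchase_amount": 12.5,
                  "purchase_date": "2024-01-16"}},
]

# Index customers by ID so each sale can look up its match.
names = {c["custid"]: c["cname"] for c in cust_info}

# INNER JOIN: keep a sale only when its custid exists in cust_info.
history = [
    {
        "customer_id": s["custid"],
        "customer_name": names[s["custid"]],
        "product_id": s["purchase"]["prodid"],          # purchase:"prodid"
        "saleprice": s["purchase"]["purchase_amount"],  # purchase:"purchase_amount"
        "quantity": s["purchase"]["quantity"],          # purchase:"quantity"
        "salesdate": s["purchase"]["purchase_date"],    # purchase:"purchase_date"
        "creationtime": s["creationtime"],
    }
    for s in salesdata
    if s["custid"] in names
]
print(len(history))  # 1
```

The sale for customer 9999 disappears because it has no match, and customer 1002 never appears because they have no sales. Both are consequences of the inner join.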


### 4.2 Sales Report Dynamic Table

Let's create a sales report that enriches our sales data with product information:


In [None]:
# Create sales report dynamic table
sales_report_dt_sql = """
CREATE OR REPLACE DYNAMIC TABLE salesreport
TARGET_LAG='DOWNSTREAM'
WAREHOUSE=XSMALL_WH
AS
SELECT 
    csdh.customer_id,
    csdh.customer_name,
    csdh.product_id,
    psi.product_name,
    psi.stock,
    csdh.saleprice,
    csdh.quantity,
    csdh.salesdate,
    csdh.creationtime
FROM
    customer_sales_data_history csdh
    INNER JOIN prod_stock_inv psi ON csdh.product_id = psi.pid;
"""

execute_sql(sales_report_dt_sql, "Created salesreport dynamic table")


### 4.3 Cumulative Purchase Dynamic Table

This dynamic table will track cumulative purchases by customer:


In [None]:
# Create cumulative purchase dynamic table
cumulative_purchase_dt_sql = """
CREATE OR REPLACE DYNAMIC TABLE cumulative_purchase
-- No dynamic table reads from this one, so DOWNSTREAM would never schedule a refresh.
TARGET_LAG='1 MINUTE'
WAREHOUSE=XSMALL_WH
AS
SELECT 
    customer_id,
    customer_name,
    SUM(saleprice) AS total_spent,
    COUNT(*) AS total_purchases,
    AVG(saleprice) AS avg_purchase_amount,
    MIN(salesdate) AS first_purchase_date,
    MAX(salesdate) AS latest_purchase_date,
    MAX(creationtime) AS last_updated
FROM
    customer_sales_data_history
GROUP BY 
    customer_id, customer_name;
"""

execute_sql(cumulative_purchase_dt_sql, "Created cumulative_purchase dynamic table")


In [None]:
# Preview cumulative purchase data
execute_sql("SELECT * FROM cumulative_purchase ORDER BY total_spent DESC LIMIT 10;", "Top 10 customers by total spending")


## Step 5: Data Validation and Alerts

One powerful use case for Dynamic Tables is monitoring data quality and creating alerts. Let's create a dynamic table that monitors product inventory levels.


In [None]:
# Create product inventory alert dynamic table
prod_inv_alert_dt_sql = """
CREATE OR REPLACE DYNAMIC TABLE PROD_INV_ALERT
TARGET_LAG = '1 MINUTE'
WAREHOUSE=XSMALL_WH
AS
SELECT 
    S.PRODUCT_ID, 
    S.PRODUCT_NAME,
    S.CREATIONTIME AS LATEST_SALES_DATE,
    S.STOCK AS BEGINNING_STOCK,
    SUM(S.QUANTITY) OVER (PARTITION BY S.PRODUCT_ID ORDER BY S.CREATIONTIME) AS TOTALUNITSOLD, 
    (S.STOCK - SUM(S.QUANTITY) OVER (PARTITION BY S.PRODUCT_ID ORDER BY S.CREATIONTIME)) AS UNITSLEFT,
    ROUND(((S.STOCK - SUM(S.QUANTITY) OVER (PARTITION BY S.PRODUCT_ID ORDER BY S.CREATIONTIME)) / S.STOCK) * 100, 2) AS PERCENT_UNITLEFT,
    CURRENT_TIMESTAMP() AS ROWCREATIONTIME
FROM SALESREPORT S 
QUALIFY ROW_NUMBER() OVER (PARTITION BY PRODUCT_ID ORDER BY CREATIONTIME DESC) = 1;
"""

execute_sql(prod_inv_alert_dt_sql, "Created PROD_INV_ALERT dynamic table")
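
The window functions in this query can be hard to read at first. As a rough illustration, the sketch below reproduces the same logic in plain Python on a few invented rows: a running total of units sold per product, keeping only the most recent row. The `inventory_alert` function and the sample data are hypothetical:

```python
from collections import defaultdict

# Hypothetical salesreport rows: (product_id, stock, quantity, creationtime).
rows = [
    (1, 100, 30, "2024-01-01 09:00"),
    (1, 100, 65, "2024-01-02 10:00"),
    (2, 50, 10, "2024-01-01 11:00"),
]

def inventory_alert(rows):
    """Latest row per product with cumulative units sold and percent left."""
    sold = defaultdict(int)
    latest = {}
    # Walking rows in creationtime order makes `sold` a running total,
    # like SUM(quantity) OVER (PARTITION BY product_id ORDER BY creationtime).
    for product_id, stock, quantity, created in sorted(rows, key=lambda r: r[3]):
        sold[product_id] += quantity
        units_left = stock - sold[product_id]
        # Overwriting keeps only the newest row, like QUALIFY ROW_NUMBER() ... = 1.
        latest[product_id] = {
            "latest_sales_date": created,
            "total_units_sold": sold[product_id],
            "units_left": units_left,
            "percent_unit_left": round(units_left / stock * 100, 2),
        }
    return latest

alerts = inventory_alert(rows)
low = {pid: a for pid, a in alerts.items() if a["percent_unit_left"] < 10}
print(sorted(low))  # [1]
```

Product 1 has sold 95 of 100 units, so only 5% remains and it shows up in the low-inventory filter.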


In [None]:
# Check for low inventory products (less than 10% remaining)
execute_sql("SELECT * FROM PROD_INV_ALERT WHERE PERCENT_UNITLEFT < 10 ORDER BY PERCENT_UNITLEFT;", "Products with low inventory (< 10%)")


## Step 6: Monitoring and Managing Dynamic Tables

Snowflake provides several ways to monitor your dynamic tables and understand their refresh patterns.


### 6.1 Check Dynamic Table Refresh History


In [None]:
# Check dynamic table refresh history
refresh_history_sql = """
SELECT * 
FROM TABLE(INFORMATION_SCHEMA.DYNAMIC_TABLE_REFRESH_HISTORY())
WHERE NAME IN ('SALESREPORT', 'CUSTOMER_SALES_DATA_HISTORY', 'PROD_INV_ALERT', 'CUMULATIVE_PURCHASE')
ORDER BY DATA_TIMESTAMP DESC, REFRESH_END_TIME DESC
LIMIT 10;
"""

execute_sql(refresh_history_sql, "Dynamic table refresh history")
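
Raw refresh history is easier to act on once it is summarized per table. The sketch below assumes each row is a dictionary containing `NAME`, `STATE`, `REFRESH_START_TIME`, and `REFRESH_END_TIME`; check these column names against the output you actually get back. The `summarize_refreshes` helper and the sample rows are made up for illustration:

```python
from datetime import datetime

def summarize_refreshes(rows):
    """Per table: refresh count, non-succeeded count, slowest refresh in seconds."""
    summary = {}
    for row in rows:
        entry = summary.setdefault(
            row["NAME"], {"refreshes": 0, "not_succeeded": 0, "max_seconds": 0.0}
        )
        entry["refreshes"] += 1
        if row["STATE"] != "SUCCEEDED":
            entry["not_succeeded"] += 1
        start, end = row["REFRESH_START_TIME"], row["REFRESH_END_TIME"]
        if start is not None and end is not None:  # unfinished refreshes have no end yet
            entry["max_seconds"] = max(entry["max_seconds"], (end - start).total_seconds())
    return summary

# Hand-written sample rows standing in for the query result.
sample = [
    {"NAME": "SALESREPORT", "STATE": "SUCCEEDED",
     "REFRESH_START_TIME": datetime(2024, 1, 1, 12, 0, 0),
     "REFRESH_END_TIME": datetime(2024, 1, 1, 12, 0, 4)},
    {"NAME": "SALESREPORT", "STATE": "FAILED",
     "REFRESH_START_TIME": datetime(2024, 1, 1, 12, 1, 0),
     "REFRESH_END_TIME": datetime(2024, 1, 1, 12, 1, 9)},
    {"NAME": "PROD_INV_ALERT", "STATE": "SUCCEEDED",
     "REFRESH_START_TIME": datetime(2024, 1, 1, 12, 1, 10),
     "REFRESH_END_TIME": datetime(2024, 1, 1, 12, 1, 12)},
]
print(summarize_refreshes(sample))
```

If you keep the DataFrame returned by `execute_sql`, something like `df.to_dict("records")` should produce rows in roughly this shape.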


### 6.2 View Dynamic Table Properties


In [None]:
# View dynamic table properties
dt_properties_sql = """
SHOW DYNAMIC TABLES IN SCHEMA DEMO.DT_DEMO;
"""

execute_sql(dt_properties_sql, "Dynamic tables in our schema")


### 6.3 Force Refresh a Dynamic Table

You can manually refresh a dynamic table if needed:


In [None]:
# Force refresh a dynamic table (uncomment to run)
# execute_sql("ALTER DYNAMIC TABLE PROD_INV_ALERT REFRESH;", "Manually refreshed PROD_INV_ALERT")

print("💡 Tip: You can manually refresh any dynamic table using ALTER DYNAMIC TABLE <name> REFRESH;")


## Step 7: Exploring Your Data Pipeline

Now that we have our dynamic tables set up, let's run some analytical queries to see the power of our automated data pipeline.


In [None]:
# Sales summary by customer
execute_sql("""
SELECT 
    customer_name,
    total_purchases,
    total_spent,
    avg_purchase_amount,
    first_purchase_date,
    latest_purchase_date
FROM cumulative_purchase 
ORDER BY total_spent DESC 
LIMIT 15;
""", "Top 15 customers by total spending")


In [None]:
# Product performance analysis
execute_sql("""
SELECT 
    product_name,
    COUNT(*) as total_sales,
    SUM(quantity) as units_sold,
    SUM(saleprice) as total_revenue,
    AVG(saleprice) as avg_sale_price
FROM salesreport 
GROUP BY product_name 
ORDER BY total_revenue DESC 
LIMIT 10;
""", "Top 10 products by revenue")


In [None]:
# Daily sales trends
execute_sql("""
SELECT 
    salesdate,
    COUNT(*) as transactions,
    SUM(saleprice) as daily_revenue,
    SUM(quantity) as units_sold
FROM customer_sales_data_history 
GROUP BY salesdate 
ORDER BY salesdate DESC 
LIMIT 15;
""", "Daily sales trends (last 15 days)")


## Step 8: Dynamic Table Pipeline Visualization

In Snowsight (Snowflake's web interface), you can visualize your dynamic table pipeline:

1. Navigate to **Data > Databases > DEMO > DT_DEMO > Dynamic Tables**
2. Click on any dynamic table to view its directed acyclic graph (DAG)
3. This shows dependencies between tables and refresh status

The pipeline we've created looks like this:

```
[Base Tables]         [Dynamic Tables]                                   [Alert Tables]
cust_info ───────┐
                 ├──> customer_sales_data_history ──┬──> cumulative_purchase
salesdata ───────┘                                  │
                                                    ├──> salesreport ──> PROD_INV_ALERT
prod_stock_inv ─────────────────────────────────────┘
```
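
The same dependency graph can be written down as data and sorted so that every table appears after the tables it reads from. This sketch uses the standard library's `graphlib` (Python 3.9+) and only illustrates the ordering idea; Snowflake works out the actual refresh order itself:

```python
from graphlib import TopologicalSorter

# Each table maps to the set of tables it reads from.
dependencies = {
    "customer_sales_data_history": {"cust_info", "salesdata"},
    "salesreport": {"customer_sales_data_history", "prod_stock_inv"},
    "cumulative_purchase": {"customer_sales_data_history"},
    "PROD_INV_ALERT": {"salesreport"},
}

# static_order() yields every node after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
for name in order:
    print(name)
```

Reversing `order` gives a sequence in which dependent tables come before their sources, which is a convenient order for dropping them during cleanup.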


## Step 9: Best Practices and Optimization

### LAG Settings
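- **DOWNSTREAM**: The table has no refresh schedule of its own; it refreshes only when a dynamic table that reads from it needs to refresh. A table that nothing reads from should use a time-based lag instead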
- **Specific time**: e.g., '10 minutes', '1 hour' for regular refresh intervals

### Performance Tips
1. Use appropriate LAG settings based on your business requirements
2. Right-size your warehouse for dynamic table refreshes
3. Consider the dependency chain when designing your pipeline
4. Monitor refresh history to optimize performance


## Step 10: Cleanup (Optional)

If you want to clean up the resources created in this demo, uncomment and run the following cell:


In [None]:
# Cleanup commands (uncomment to execute)
cleanup_commands = [
    # "DROP DYNAMIC TABLE IF EXISTS PROD_INV_ALERT;",
    # "DROP DYNAMIC TABLE IF EXISTS cumulative_purchase;",
    # "DROP DYNAMIC TABLE IF EXISTS salesreport;", 
    # "DROP DYNAMIC TABLE IF EXISTS customer_sales_data_history;",
    # "DROP TABLE IF EXISTS salesdata;",
    # "DROP TABLE IF EXISTS prod_stock_inv;",
    # "DROP TABLE IF EXISTS cust_info;",
    # "DROP FUNCTION IF EXISTS gen_sales_data(NUMBER);",
    # "DROP FUNCTION IF EXISTS gen_prod_stock_inv(NUMBER);",
    # "DROP FUNCTION IF EXISTS gen_cust_info(NUMBER);",
    # "DROP WAREHOUSE IF EXISTS XSMALL_WH;",
    # "DROP SCHEMA IF EXISTS DEMO.DT_DEMO;",
    # "DROP DATABASE IF EXISTS DEMO;"
]

# for cmd in cleanup_commands:
#     execute_sql(cmd, f"Executed: {cmd}")

print("💡 Uncomment and run the above commands if you want to clean up all demo resources.")
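
As an alternative to commenting lines in and out, a dry-run switch lets you review the statements before anything is dropped. This is a minimal sketch: `run_cleanup` is a hypothetical helper, and it takes the executing function as a parameter so it can be tried without a Snowflake connection:

```python
def run_cleanup(commands, execute, dry_run=True):
    """Run each command through `execute`, or only list them when dry_run is True."""
    executed = []
    for cmd in commands:
        if dry_run:
            print(f"[dry run] {cmd}")
        else:
            execute(cmd)
            executed.append(cmd)
    return executed

demo_commands = [
    "DROP DYNAMIC TABLE IF EXISTS PROD_INV_ALERT;",
    "DROP TABLE IF EXISTS salesdata;",
]
run_cleanup(demo_commands, execute=print)  # only lists the statements

# To really drop the demo objects, pass the notebook's helper and disable dry_run:
# run_cleanup(demo_commands, execute=execute_sql, dry_run=False)
```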


## Conclusion

Congratulations! 🎉 You've successfully completed the Snowflake Dynamic Tables quickstart. In this tutorial, you've learned how to:

✅ **Set up your Snowflake environment** with databases, schemas, and warehouses  
✅ **Generate realistic sample data** using Python UDTFs with the Faker library  
✅ **Create dynamic tables** that automatically maintain transformed data  
✅ **Implement monitoring and alerts** for data quality and inventory management  
✅ **Monitor and manage** your dynamic table pipeline  
✅ **Run analytical queries** on your automated data pipeline  

### Key Takeaways

1. **Dynamic Tables simplify data pipeline management** by refreshing automatically to keep results within the target lag as source data changes
2. **Target lag settings** control how stale a table may become and can be tuned to balance freshness against cost  
3. **Dependency chains** allow you to build complex data transformations with automatic propagation
4. **Monitoring capabilities** help you track performance and ensure data quality
5. **Integration with Snowsight** provides visual pipeline management

### Next Steps

- Explore more complex transformations using window functions and advanced SQL
- Experiment with different LAG settings to balance freshness and cost
- Integrate dynamic tables with Snowflake streams and tasks for even more automation
- Set up alerts and notifications for critical business metrics

### Resources

- [Snowflake Dynamic Tables Documentation](https://docs.snowflake.com/en/user-guide/dynamic-tables-about)
- [Dynamic Tables Best Practices](https://docs.snowflake.com/en/user-guide/dynamic-tables-best-practices)
- [Snowflake Python Connector Documentation](https://docs.snowflake.com/en/developer-guide/python-connector/python-connector)

---

*Happy coding with Snowflake Dynamic Tables!* ❄️
