### Use Case 2
Description: Demonstrate how to represent data dependencies in my pipelines in an intuitive, easy-to-understand way. We often have cases where multiple ETL jobs run in sequence across a set of tables, and tracking that lineage is difficult. If I modify something earlier in the chain, it can have knock-on effects. Demonstrate how to run tests against a development branch that modifies an early object in a data dependency chain, and verify that the downstream jobs that depend on it still run successfully.



Dynamic Tables in Snowflake solve pipeline dependency challenges by:

1. **Automatic Dependency Management**
   - Self-maintaining tables that automatically update when source data changes
   - Clearly defined lineage through SQL definition
   - No need for complex scheduling or orchestration

2. **Testing Changes Safely**
   - Clone source tables to test modifications
   - Dynamic tables automatically propagate changes through the chain
   - Test entire pipeline without affecting production


In [None]:
-- Snowflake Data Engineering Demo SQL Worksheet

-- =========================================
-- 1. Setting Up the Environment
-- =========================================
-- Using the appropriate role and warehouse for this demo.
USE ROLE tasty_data_engineer;
USE WAREHOUSE tasty_de_wh;


1. **Creation and Configuration**
   - Dynamic tables are created using a CREATE DYNAMIC TABLE statement that specifies the target lag (the maximum amount of time the table's contents may lag behind its sources), the warehouse (the compute resources used for refreshes), and the refresh mode (AUTO, FULL, or INCREMENTAL)
   - The table is defined by a SELECT query that pulls from one or more source tables, establishing the data transformation logic

2. **Automated Maintenance**
   - Once created, dynamic tables are completely self-maintaining, automatically detecting and processing changes from source tables without manual intervention or scheduling
   - The system continuously monitors source data and ensures the dynamic table stays updated within the specified target lag time

3. **Intelligent Processing**
   - When running in incremental mode, only changed data is processed, making updates highly efficient
   - The system tracks dependencies and automatically propagates changes through chains of dynamic tables
   - Processing occurs on the assigned warehouse, optimizing resource usage

4. **Simplified Pipeline Management**
   - Eliminates need for complex scheduling and orchestration tools
   - Provides clear visibility into data lineage through SQL definitions
   - Reduces maintenance overhead and risk of pipeline failures
   - Enables easy testing of changes through cloning and automatic propagation

In [None]:
-- =========================================
-- 2. Creating the Silver Layer with Dynamic Tables
-- =========================================
-- Dynamic Tables allow for continuous transformation without complex scheduling.
CREATE OR REPLACE DYNAMIC TABLE FROSTBYTE_TASTY_BYTES.HARMONIZED.ORDERS_DT
TARGET_LAG = 'DOWNSTREAM'
WAREHOUSE = 'TASTY_DE_WH'
REFRESH_MODE=INCREMENTAL
AS 
SELECT 
    oh.order_id,
    oh.truck_id,
    oh.order_ts,
    od.order_detail_id,
    od.line_number,
    od.menu_item_id,
    od.quantity,
    od.unit_price,
    od.price,
    oh.order_amount,
    oh.order_tax_amount,
    oh.order_discount_amount,
    oh.order_total,
    oh.location_id,
    oh.customer_id
FROM frostbyte_tasty_bytes.raw_pos.order_detail od
JOIN frostbyte_tasty_bytes.raw_pos.order_header oh
    ON od.order_id = oh.order_id;

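We can chain dynamic tables to create a medallion architecture within Snowflake. The next table reads from ORDERS_DT; because ORDERS_DT was created with a target lag of DOWNSTREAM, it refreshes only when a dynamic table that depends on it needs to refresh.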

In [None]:
-- =========================================
-- 3. Enriching the Data – Gold Layer
-- =========================================
-- Bringing in additional dimensional data without manual pipeline management.
CREATE OR REPLACE DYNAMIC TABLE FROSTBYTE_TASTY_BYTES.HARMONIZED.ORDERS_ENRICHED_DT
TARGET_LAG = '5 minutes'
WAREHOUSE = 'TASTY_DE_WH'
REFRESH_MODE=INCREMENTAL
AS 
SELECT 
    s.*,
    m.truck_brand_name,
    m.menu_type,
    m.menu_item_name,
    t.primary_city,
    t.region,
    t.country,
    t.franchise_flag,
    t.franchise_id,
    f.first_name AS franchisee_first_name,
    f.last_name AS franchisee_last_name,
    cl.first_name,
    cl.last_name,
    cl.e_mail,
    cl.phone_number,
    cl.children_count,
    cl.gender,
    cl.marital_status
FROM FROSTBYTE_TASTY_BYTES.HARMONIZED.ORDERS_DT s
JOIN frostbyte_tasty_bytes.raw_pos.truck t 
    ON s.truck_id = t.truck_id
JOIN frostbyte_tasty_bytes.raw_pos.menu m 
    ON s.menu_item_id = m.menu_item_id
JOIN frostbyte_tasty_bytes.raw_pos.franchise f 
    ON t.franchise_id = f.franchise_id
LEFT JOIN frostbyte_tasty_bytes.raw_customer.customer_loyalty cl
    ON s.customer_id = cl.customer_id;
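
The lineage declared by the two CREATE DYNAMIC TABLE statements above can also be written down as a small graph, which makes knock-on effects easy to list. The sketch below is plain Python (standard library only) and is not part of the demo. The dictionary is transcribed by hand from the FROM and JOIN clauses, with the database prefix dropped for brevity, and the helper names `refresh_order` and `downstream_of` are hypothetical.

```python
from graphlib import TopologicalSorter

# Each key reads from the objects in its value set, as declared in the
# FROM/JOIN clauses of the two dynamic tables (database prefix omitted).
READS_FROM = {
    "HARMONIZED.ORDERS_DT": {"RAW_POS.ORDER_DETAIL", "RAW_POS.ORDER_HEADER"},
    "HARMONIZED.ORDERS_ENRICHED_DT": {
        "HARMONIZED.ORDERS_DT",
        "RAW_POS.TRUCK",
        "RAW_POS.MENU",
        "RAW_POS.FRANCHISE",
        "RAW_CUSTOMER.CUSTOMER_LOYALTY",
    },
}


def refresh_order(reads_from):
    """Return all objects ordered so that sources come before their readers."""
    return list(TopologicalSorter(reads_from).static_order())


def downstream_of(changed, reads_from):
    """Return every table that directly or transitively reads from `changed`."""
    affected = set()
    frontier = {changed}
    while frontier:
        # Tables that read from anything in the current frontier.
        frontier = {
            table
            for table, sources in reads_from.items()
            if sources & frontier and table not in affected
        }
        affected |= frontier
    return affected


print(sorted(downstream_of("RAW_POS.ORDER_HEADER", READS_FROM)))
print(sorted(downstream_of("RAW_CUSTOMER.CUSTOMER_LOYALTY", READS_FROM)))
```

A change to `RAW_POS.ORDER_HEADER` reaches both dynamic tables, while a change to `RAW_CUSTOMER.CUSTOMER_LOYALTY` reaches only `ORDERS_ENRICHED_DT`, so that is the table a test of the loyalty data needs to exercise.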


### Snowflake Data Management Features

1. **Zero-Copy Cloning**
   - Creates instant copy of table without duplicating storage
   - Only new/modified data consumes additional space
   - Perfect for testing, development, and backup scenarios
   - Syntax: `CREATE TABLE new_table CLONE source_table`

2. **Time Travel**
   - Allows querying historical data states
   - Access data from a specific point in time
   - No need to restore backups
   - Syntax: `SELECT * FROM table AT (TIMESTAMP => time_expression)`
   - Default retention: 24 hours (Enterprise: up to 90 days)
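
The two features combine: a clone can be taken as of an earlier point in time by adding a Time Travel clause to the CLONE statement. The following is a minimal sketch that only builds the SQL text. The helper name `clone_as_of_sql` and the target table name are hypothetical, and the statement works only if the requested offset falls within the source table's retention period.

```python
def clone_as_of_sql(source, target, seconds_ago):
    """Build a CREATE TABLE ... CLONE statement that uses Time Travel.

    OFFSET is a number of seconds relative to the current time, so the
    generated value is negative.
    """
    if seconds_ago <= 0:
        raise ValueError("seconds_ago must be positive")
    return (
        f"CREATE OR REPLACE TABLE {target}\n"
        f"CLONE {source} AT (OFFSET => -{int(seconds_ago)});"
    )


# Hypothetical target name: the loyalty table as it looked one hour ago.
print(
    clone_as_of_sql(
        source="FROSTBYTE_TASTY_BYTES.RAW_CUSTOMER.CUSTOMER_LOYALTY",
        target="FROSTBYTE_TASTY_BYTES.RAW_CUSTOMER.CUSTOMER_LOYALTY_1H_AGO",
        seconds_ago=3600,
    )
)
```

In a notebook with an active Snowpark session, the returned string could be run with `session.sql(...).collect()`.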

In [None]:
-- =========================================
-- 4. Zero-Copy Cloning
-- =========================================
-- Instantly create a copy of a table without consuming extra storage.
USE ROLE ACCOUNTADMIN;
CREATE OR REPLACE TABLE FROSTBYTE_TASTY_BYTES.RAW_CUSTOMER.CUSTOMER_LOYALTY_DEV
CLONE FROSTBYTE_TASTY_BYTES.RAW_CUSTOMER.CUSTOMER_LOYALTY;
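
With the clone in place, one way to test a change without touching production is to build a development copy of the downstream dynamic table that reads from the clone instead of the original. The sketch below only generates the SQL. The helper `dev_dynamic_table_sql` and the `ORDERS_ENRICHED_DT_DEV` name are hypothetical, and the SELECT is shortened from the full definition to keep the example readable.

```python
import re


def dev_dynamic_table_sql(select_sql, prod_source, dev_source, dev_table,
                          warehouse, target_lag="5 minutes"):
    """Build SQL for a dev dynamic table that reads from a cloned source.

    Whole-name occurrences of `prod_source` in `select_sql` are replaced
    with `dev_source` (case-insensitive); the rest of the query is unchanged.
    Returns the CREATE statement followed by a manual REFRESH statement.
    """
    # The lookarounds stop a match inside a longer identifier.
    pattern = re.compile(
        r"(?<![\w.])" + re.escape(prod_source) + r"(?!\w)", re.IGNORECASE
    )
    dev_select, replaced = pattern.subn(lambda _m: dev_source, select_sql)
    if replaced == 0:
        raise ValueError(f"{prod_source} does not appear in the query")
    create = (
        f"CREATE OR REPLACE DYNAMIC TABLE {dev_table}\n"
        f"TARGET_LAG = '{target_lag}'\n"
        f"WAREHOUSE = {warehouse}\n"
        f"AS\n{dev_select.strip().rstrip(';')};"
    )
    refresh = f"ALTER DYNAMIC TABLE {dev_table} REFRESH;"
    return [create, refresh]


# Shortened version of the ORDERS_ENRICHED_DT query, for illustration only.
select_sql = """
SELECT s.order_id, cl.first_name, cl.last_name
FROM FROSTBYTE_TASTY_BYTES.HARMONIZED.ORDERS_DT s
LEFT JOIN frostbyte_tasty_bytes.raw_customer.customer_loyalty cl
    ON s.customer_id = cl.customer_id
"""

statements = dev_dynamic_table_sql(
    select_sql,
    prod_source="FROSTBYTE_TASTY_BYTES.RAW_CUSTOMER.CUSTOMER_LOYALTY",
    dev_source="FROSTBYTE_TASTY_BYTES.RAW_CUSTOMER.CUSTOMER_LOYALTY_DEV",
    dev_table="FROSTBYTE_TASTY_BYTES.HARMONIZED.ORDERS_ENRICHED_DT_DEV",
    warehouse="TASTY_DE_WH",
)
print("\n\n".join(statements))
```

Each statement could then be run with `session.sql(stmt).collect()`. If the refresh of the dev table succeeds after the clone has been modified, the change is safe for that step of the chain; the production tables are never touched.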



### Use Case 4

Description: Demonstrate how to identify the root cause from notifications in a pipeline failure.

Let's go into the UI to look at our dynamic tables and pipeline failures. 
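
The same information can also be queried. The sketch below builds a query against the `DYNAMIC_TABLE_REFRESH_HISTORY` table function in `INFORMATION_SCHEMA`, restricted to failed refreshes. Treat the selected column names and the `NAME` and `ERROR_ONLY` arguments as assumptions to verify against the Snowflake documentation for your account; the helper name `failed_refreshes_sql` is hypothetical.

```python
def failed_refreshes_sql(database, dynamic_table=None, limit=20):
    """Build a query listing failed dynamic table refreshes, newest first.

    `dynamic_table` optionally narrows the result to one table name.
    """
    args = ["ERROR_ONLY => TRUE"]
    if dynamic_table is not None:
        args.insert(0, f"NAME => '{dynamic_table}'")
    return (
        "SELECT name, state, state_message, query_id, refresh_start_time\n"
        f"FROM TABLE({database}.INFORMATION_SCHEMA.DYNAMIC_TABLE_REFRESH_HISTORY(\n"
        f"    {', '.join(args)}))\n"
        "ORDER BY refresh_start_time DESC\n"
        f"LIMIT {int(limit)};"
    )


print(failed_refreshes_sql("FROSTBYTE_TASTY_BYTES"))
```

With an active Snowpark session, `session.sql(failed_refreshes_sql("FROSTBYTE_TASTY_BYTES")).show()` would display the result; the error message of the earliest failing table in the chain is usually the place to start looking for the root cause.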

### Use Case 8
Description: Demonstrate if and how we can set policies to enable data to be cached. Demonstrate if the platform provides ways to automatically optimize memory or storage.

Snowflake has three built-in mechanisms for caching:
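- Result Cache: Stores query results for 24 hours and reuses them when the same query is run again and the underlying data has not changed
- Metadata Cache: Stores table metadata and statistics, such as row counts
- Data Cache: Caches table data read by queries in the warehouse's local storage while the warehouse is running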

All caches are automatically managed by Snowflake.

In the example below, we use Snowpark (Python for Snowflake). If we run the same query again and the underlying data has not changed, Snowflake serves it from the result cache. This does not use the warehouse, so it does not incur a compute cost.

In [None]:
from snowflake.snowpark.functions import col
from snowflake.snowpark.context import get_active_session

session = get_active_session()
df = session.table('FROSTBYTE_TASTY_BYTES.HARMONIZED.ORDERS_ENRICHED_DT') \
    .select(col('menu_type'), col('region')) \
    .groupBy('menu_type', 'region') \
    .count()
df.show()

In [None]:
# Running it again

df.show()

We can also go into the query history to see whether the query used a warehouse or used a cache.
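
For a quick check inside the notebook, the two runs can also be timed. This is a minimal sketch with a hypothetical helper, `time_runs`; it measures wall-clock time on the client, which includes network overhead, so the query history remains the authoritative source for whether a cache was used.

```python
import time


def time_runs(action, runs=2):
    """Call `action` `runs` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        action()
        durations.append(time.perf_counter() - start)
    return durations
```

In the notebook this would be called as `first, second = time_runs(df.show)`. A second run that is much faster than the first is consistent with a cache hit. To compare against uncached runs, the result cache can be switched off for the session with `ALTER SESSION SET USE_CACHED_RESULT = FALSE`.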