# Streaming Data Pipeline with Snowpark Python and Dynamic Tables

## Objective
This notebook demonstrates an enhanced approach to building a real-time analytics pipeline using Snowflake Dynamic Tables, Snowpark Python procedures, and Triggered Tasks. It focuses on transforming raw streaming ski resort data into actionable insights, with improved daily visit tracking and a structured aggregation hierarchy.

## 1. Setup and Initialization

Import Python packages and initialize the Snowpark environment.

In [None]:
# Import python packages
import streamlit as st
import pandas as pd
from snowflake.core import Root

# Grab active Snowpark session
from snowflake.snowpark.context import get_active_session
session = get_active_session()

# Initialize Snowflake Python API for object management
root = Root(session)

## 2. Initial Data Exploration

Before building transformations, let's examine the structure of our raw streaming data to understand the source tables we'll be working with.

In [None]:
-- Lift usage events (core activity data)
SELECT * FROM LIFT_RIDE LIMIT 20;

In [None]:
-- Day ticket purchases
SELECT * FROM RESORT_TICKET LIMIT 20;

In [None]:
-- Season pass purchases
SELECT * FROM SEASON_PASS LIMIT 20;

## 3. Initial Data Pipeline Setup

This section covers the additional setup required for this use case, including creating streams and reference tables.

### 3.1. Create Stream on Raw Lift Ride Data

A stream is created on the `LIFT_RIDE` table to capture new lift ride events. This stream will be the source for the Snowpark procedure that populates daily visit information.

In [None]:
CREATE OR REPLACE STREAM LIFT_RIDE_STREAM ON TABLE LIFT_RIDE APPEND_ONLY = TRUE SHOW_INITIAL_ROWS = TRUE;
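A stream tracks an offset into its source table. The toy in-memory model below illustrates the behavior this pipeline relies on. It is a simplified sketch, not Snowflake's implementation, and `AppendOnlyStreamModel` and its methods are hypothetical. It assumes that reading the stream with a plain query leaves the offset unchanged, while consuming it in a committed DML statement advances the offset.

```python
class AppendOnlyStreamModel:
    """Hypothetical toy model of an append-only stream's offset (not Snowflake's implementation)."""

    def __init__(self, table: list, show_initial_rows: bool = True):
        self.table = table  # shared reference, so later appends to the table are visible
        # SHOW_INITIAL_ROWS = TRUE: the first consumption returns rows that already existed.
        # Otherwise the offset starts at the current end of the table.
        self.offset = 0 if show_initial_rows else len(table)

    def has_data(self) -> bool:
        # Rough analogue of SYSTEM$STREAM_HAS_DATA: are there unconsumed rows?
        return len(self.table) > self.offset

    def read(self) -> list:
        # A plain query returns pending rows without advancing the offset.
        return self.table[self.offset:]

    def consume(self) -> list:
        # Consuming in a committed DML statement (e.g., INSERT ... SELECT) advances the offset.
        rows = self.read()
        self.offset = len(self.table)
        return rows


lift_rides = [{"RFID": "R1"}, {"RFID": "R2"}]
stream = AppendOnlyStreamModel(lift_rides)
first_batch = stream.consume()   # both pre-existing rows, because show_initial_rows=True
lift_rides.append({"RFID": "R3"})
second_batch = stream.consume()  # only the newly appended row
```

In this model, a stream created with `show_initial_rows=False` would start empty and only report rows appended after its creation.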

### 3.2. Resort Capacity Reference Table

Create and populate a reference table for resort capacities, which will be used in downstream calculations.

In [None]:
-- Reference table for resort capacity
CREATE OR REPLACE TABLE RESORT_CAPACITY (
    RESORT VARCHAR(100) PRIMARY KEY,
    MAX_CAPACITY INTEGER,
    HOURLY_CAPACITY INTEGER,
    BASE_LIFT_COUNT INTEGER,
    IANA_TIMEZONE VARCHAR(50) 
);

INSERT INTO RESORT_CAPACITY (RESORT, MAX_CAPACITY, HOURLY_CAPACITY, BASE_LIFT_COUNT, IANA_TIMEZONE) VALUES
('Vail', 7000, 1100, 34, 'America/Denver'),
('Beaver Creek', 5500, 900, 25, 'America/Denver'),
('Breckenridge', 6500, 1000, 35, 'America/Denver'),
('Keystone', 4500, 700, 21, 'America/Denver'),
('Heavenly', 5000, 800, 27, 'America/Los_Angeles');

## 4. Automated Daily Visit Processing with Snowpark

This section details the setup for accurately tracking daily visits using a Snowpark Stored Procedure and a Task to automate its execution.

### 4.1. `DAILY_VISITS` Table

This table will store unique daily visits per RFID at each resort, along with their first ride details and season pass status. It is populated by a Snowpark procedure.

In [None]:
CREATE OR REPLACE TABLE DAILY_VISITS (
    VISIT_DATE DATE,
    RESORT STRING,
    RFID STRING,
    NAME STRING,
    FIRST_RIDE_TIME DATETIME,
    FIRST_LIFT STRING,
    HAS_SEASON_PASS BOOLEAN,
    PURCHASE_PRICE_USD DECIMAL(7,2),    
    ACTIVATION_USAGE_COUNT INTEGER,
    TICKET_ORIGINAL_DURATION INTEGER
);

### 4.2. Stage for Deployed Snowpark Code

Create a stage to store Snowpark Python code for stored procedures.

In [None]:
create stage if not exists snowpark_apps;

### 4.3. Snowpark Python Function: `populate_daily_visits`

This Python function will ultimately be deployed as a Python Stored Procedure. It processes new records from `LIFT_RIDE_STREAM`, identifies the first ride for each visitor per day at each resort, enriches the data with customer details and pass status, and inserts new, unique daily visits into the `DAILY_VISITS` table.
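The core of this procedure (keep each visitor's earliest ride per resort per day, then skip visits that are already recorded) can be sketched in plain Python. This is an illustrative restatement, not the Snowpark code itself: rides are plain dicts, the function names are hypothetical, and the customer-enrichment joins are omitted.

```python
from datetime import date, datetime


def first_rides_per_day(rides):
    """Keep the earliest ride per (resort, rfid, visit_date), like row_number() == 1."""
    first = {}
    for ride in rides:
        # VISIT_DATE is RIDE_TIME cast to a date
        key = (ride["RESORT"], ride["RFID"], ride["RIDE_TIME"].date())
        if key not in first or ride["RIDE_TIME"] < first[key]["RIDE_TIME"]:
            first[key] = ride
    return first


def new_daily_visits(rides, existing_visit_keys):
    """Anti-join: drop first rides whose (resort, rfid, visit_date) is already in DAILY_VISITS."""
    return {
        key: ride
        for key, ride in first_rides_per_day(rides).items()
        if key not in existing_visit_keys
    }


rides = [
    {"RESORT": "Vail", "RFID": "R1", "LIFT": "LIFT_A", "RIDE_TIME": datetime(2025, 1, 5, 9, 30)},
    {"RESORT": "Vail", "RFID": "R1", "LIFT": "LIFT_B", "RIDE_TIME": datetime(2025, 1, 5, 8, 45)},
    {"RESORT": "Vail", "RFID": "R2", "LIFT": "LIFT_A", "RIDE_TIME": datetime(2025, 1, 5, 10, 0)},
]
already_recorded = {("Vail", "R2", date(2025, 1, 5))}  # R2 was inserted by an earlier run
visits = new_daily_visits(rides, already_recorded)
```

The anti-join is what makes frequent task runs safe: a visitor whose rides arrive across several stream batches still produces only one `DAILY_VISITS` row per resort per day.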

In [None]:
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, row_number, coalesce, when
from snowflake.snowpark.window import Window

def populate_daily_visits(session: Session) -> str:
    """
    Populate DAILY_VISITS table using Snowpark Python
    Handles data from any date in the stream, deduplicates by RFID per resort per day
    This process is designed to be run frequently from a triggered task
    """
    
    # Step 1: Get new rides from stream
    lift_ride_stream = session.table("LIFT_RIDE_STREAM")
    
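    # Deduplicate by RFID per resort per day - get earliest ride time.
    # Partition on the date expression itself rather than the VISIT_DATE alias defined in the same select.
    window_spec = Window.partition_by(
        col("RESORT"),
        col("RFID"),
        col("RIDE_TIME").cast("DATE")
    ).order_by(col("RIDE_TIME").asc())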
    
    first_rides_df = lift_ride_stream.select(
        col("RESORT"),
        col("RFID"),
        col("LIFT").alias("FIRST_LIFT"),
        col("RIDE_TIME").alias("FIRST_RIDE_TIME"),        
        col("RIDE_TIME").cast('DATE').alias("VISIT_DATE"),
        col("ACTIVATION_DAY_COUNT").alias("ACTIVATION_USAGE_COUNT"), # Ride data includes total number of days ticket or pass has been activated
        row_number().over(window_spec).alias("rn")
    )
    # Filter to only first ride of each day for each RFID at each resort
    first_rides_df = first_rides_df.filter(col("rn") == 1) #.drop(col("rn"))
    
    # Step 2: Join with customer data to get customer details and determine visit type
    season_pass_df = session.table("SEASON_PASS")
    resort_ticket_df = session.table("RESORT_TICKET")
    
    # Left join with season pass
    first_rides_df = first_rides_df.join(season_pass_df, col("RFID") == col("RFID_PASS"), "left", rsuffix="_PASS")
    
    # Left join with resort ticket
    first_rides_df = first_rides_df.join(resort_ticket_df, col("RFID") == col("RFID_TICKET"), "left", rsuffix="_TICKET")
     
    first_rides_df = first_rides_df.select(
        first_rides_df.col("RESORT"),
        first_rides_df.col("RFID"),
        first_rides_df.col("FIRST_LIFT"),
        first_rides_df.col("FIRST_RIDE_TIME"),
        first_rides_df.col("VISIT_DATE"), 
        coalesce(season_pass_df.col("NAME"), resort_ticket_df.col("NAME")).alias("NAME"), # Name on ticket or pass
        when(season_pass_df.col("RFID").is_not_null(), True).otherwise(False).alias("HAS_SEASON_PASS"),
        coalesce(season_pass_df.col("PRICE_USD"), resort_ticket_df.col("PRICE_USD")).alias("PURCHASE_PRICE_USD"), # Price of ticket or pass        
        first_rides_df.col("ACTIVATION_USAGE_COUNT"),
        resort_ticket_df.col("DAYS").alias("TICKET_ORIGINAL_DURATION") #Will be null for passes
    )
    
    # Step 3: Anti-join with existing DAILY_VISITS
    daily_visits_df = session.table("DAILY_VISITS").select(
            col("VISIT_DATE"),
            col("RESORT"),
            col("RFID")
    )
    # Create the anti-join condition - check for any existing record for this RFID/resort/date combination
    new_visits_df = first_rides_df.join(daily_visits_df, 
        ((col("VISIT_DATE") == col("VISIT_DATE_DV")) &
        (col("RESORT") == col("RESORT_DV")) &
        (col("RFID") == col("RFID_DV"))), "left", rsuffix="_DV").filter(col("RESORT_DV").is_null())  # Anti-join condition        
    new_visits_df = new_visits_df.select(
        first_rides_df.col("VISIT_DATE"),
        first_rides_df.col("RESORT"),
        first_rides_df.col("RFID"),
        first_rides_df.col("NAME"),
        first_rides_df.col("FIRST_RIDE_TIME"),
        first_rides_df.col("FIRST_LIFT"),
        first_rides_df.col("HAS_SEASON_PASS"),
        first_rides_df.col("PURCHASE_PRICE_USD"),
        first_rides_df.col("ACTIVATION_USAGE_COUNT"),
        first_rides_df.col("TICKET_ORIGINAL_DURATION") 
    )
    
    # Step 4: Append new visits into DAILY_VISITS table
    try:
        # Write the data to the table
        new_visits_df.write.mode("append").save_as_table("DAILY_VISITS", column_order="name")        
        return "OK"
    except Exception as e:
        return f"ERROR: {str(e)}"

### 4.4. Manually invoke Python function (for testing/setup)

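Before deploying the function as a Snowflake task, run it once to make sure it works properly. If the stream was created with `SHOW_INITIAL_ROWS = TRUE` and this is the first run, this step also backfills `DAILY_VISITS` from the existing lift rides. Because the insert reads from the stream inside a DML statement, it advances the stream offset, so the task will not reprocess these rows.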

In [None]:
populate_daily_visits(session)

### 4.5. Create Triggered Task to Automate `populate_daily_visits`

Define and create a Snowflake Triggered Task to automatically run `populate_daily_visits` as a Python stored procedure when new data arrives in the `LIFT_RIDE_STREAM`.

In [None]:
from snowflake.core.task import StoredProcedureCall, Task

populate_dv_task = Task(
    "populate_daily_visits",
    StoredProcedureCall(populate_daily_visits, stage_location="@snowpark_apps"),
    warehouse="STREAMING_INGEST", 
    condition="SYSTEM$STREAM_HAS_DATA('lift_ride_stream')",
    allow_overlapping_execution=False
)
populate_dv_task_res = root.databases['streaming_ingest'].schemas['streaming_ingest'].tasks["populate_daily_visits"]
populate_dv_task_res.create_or_alter(populate_dv_task)

## 5. Task Management

Commands to manage the `populate_daily_visits` task: suspend it, check its trigger-interval parameter, alter that parameter, and resume it.

In [None]:
populate_dv_task_res = root.databases['streaming_ingest'].schemas['streaming_ingest'].tasks["populate_daily_visits"]

In [None]:
populate_dv_task_res.suspend()

In [None]:
SHOW PARAMETERS LIKE 'USER_TASK_MINIMUM_TRIGGER_INTERVAL_IN_SECONDS' IN TASK populate_daily_visits;

In [None]:
-- Note: USER_TASK_MINIMUM_TRIGGER_INTERVAL_IN_SECONDS sets the minimum time between runs of a triggered task.
-- Setting it to 10 seconds (the lowest allowed value) lets the task run as often as possible.
ALTER TASK populate_daily_visits SET USER_TASK_MINIMUM_TRIGGER_INTERVAL_IN_SECONDS = 10;
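To build intuition for what this parameter changes, here is a deliberately simplified model of a triggered task's start times. The model is our assumption, not documented Snowflake scheduling behavior: a run starts as soon as the stream has unconsumed data and at least the minimum interval has passed since the previous start, and each run consumes everything that arrived by its start time. It ignores run duration (the task sets `allow_overlapping_execution=False`) and change-detection latency. The function name `simulate_trigger_runs` is hypothetical.

```python
def simulate_trigger_runs(arrival_seconds, min_interval=10):
    """Return simulated task start times (seconds) for data arriving at the given times."""
    arrivals = sorted(arrival_seconds)
    runs = []
    i = 0
    while i < len(arrivals):
        # Earliest start: when unconsumed data exists AND the minimum interval has elapsed.
        start = arrivals[i]
        if runs:
            start = max(start, runs[-1] + min_interval)
        runs.append(start)
        # This run consumes everything that has arrived by its start time.
        while i < len(arrivals) and arrivals[i] <= start:
            i += 1
    return runs


arrivals = [0, 3, 5, 12, 40]
fast = simulate_trigger_runs(arrivals, min_interval=10)
slow = simulate_trigger_runs(arrivals, min_interval=30)
```

Under these assumptions, a shorter interval trades more frequent (and more numerous) runs for fresher `DAILY_VISITS` data, while a longer interval batches more rows into each run.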

In [None]:
# Resume the task so it runs whenever the stream condition is met
# Assumes populate_dv_task_res was defined in the task-management cell above
populate_dv_task_res.resume()

In [None]:
describe task populate_daily_visits;

## 6. Dynamic Table Aggregation Pipeline

Define a series of Dynamic Tables that perform hierarchical aggregations (hourly, daily, weekly) on the ski resort data. Snowflake refreshes each table automatically to keep it within its `TARGET_LAG` of its source data.

### 6.1. Hourly Aggregations

These Dynamic Tables provide the first level of aggregation, summarizing data on an hourly basis.

In [None]:
CREATE OR REPLACE DYNAMIC TABLE HOURLY_LIFT_ACTIVITY
TARGET_LAG='1 minute'
WAREHOUSE = STREAMING_INGEST
REFRESH_MODE = incremental
AS
SELECT
    DATE(lr.RIDE_TIME) as RIDE_DATE,
    HOUR(lr.RIDE_TIME) as RIDE_HOUR,
    DATE_TRUNC('hour', lr.RIDE_TIME) as RIDE_HOUR_TIMESTAMP,
    lr.RESORT,
    COUNT(*) as TOTAL_RIDES,
    COUNT(DISTINCT lr.RFID) as VISITOR_COUNT,
    -- Use DAILY_VISITS to determine pass usage
    COUNT(DISTINCT CASE WHEN dv.HAS_SEASON_PASS = TRUE THEN lr.RFID END) as ACTIVE_PASSES,
    COUNT(CASE WHEN dv.HAS_SEASON_PASS = TRUE THEN 1 END) as PASS_RIDES
    -- COUNT(DISTINCT CASE
    --     WHEN dv.HAS_SEASON_PASS = TRUE
    --     AND HOUR(dv.FIRST_RIDE_TIME) = HOUR(lr.RIDE_TIME)
    --     THEN lr.RFID
    -- END) as PASSES_ACTIVATED,
    -- COUNT(DISTINCT CASE
    --     WHEN dv.HAS_SEASON_PASS = FALSE 
    --     AND HOUR(dv.FIRST_RIDE_TIME) = HOUR(lr.RIDE_TIME)
    --     THEN dv.RFID
    -- END) AS TICKETS_ACTIVATED 
FROM LIFT_RIDE lr
LEFT JOIN DAILY_VISITS dv ON lr.RFID = dv.RFID
    AND DATE(lr.RIDE_TIME) = dv.VISIT_DATE
    AND lr.RESORT = dv.RESORT
GROUP BY RIDE_DATE, RIDE_HOUR, RIDE_HOUR_TIMESTAMP, lr.RESORT;

In [None]:
select * from HOURLY_LIFT_ACTIVITY 
order by RIDE_HOUR_TIMESTAMP desc 
limit 100;
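The per-hour metrics in `HOURLY_LIFT_ACTIVITY` can be restated in plain Python to check what each column counts. This is a sketch with a simplification: pass status is given as a set of RFIDs instead of the `DAILY_VISITS` join on RFID, resort, and date, and the function name and sample rides are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def hourly_lift_activity(rides, season_pass_rfids):
    """rides: iterable of (resort, rfid, ride_time) tuples."""
    rfids_by_hour = defaultdict(list)
    for resort, rfid, ride_time in rides:
        hour = ride_time.replace(minute=0, second=0, microsecond=0)  # like DATE_TRUNC('hour', RIDE_TIME)
        rfids_by_hour[(resort, hour)].append(rfid)

    summary = {}
    for key, rfids in rfids_by_hour.items():
        pass_rfids = [r for r in rfids if r in season_pass_rfids]
        summary[key] = {
            "TOTAL_RIDES": len(rfids),              # COUNT(*)
            "VISITOR_COUNT": len(set(rfids)),       # COUNT(DISTINCT RFID)
            "ACTIVE_PASSES": len(set(pass_rfids)),  # distinct pass holders riding this hour
            "PASS_RIDES": len(pass_rfids),          # rides taken on passes
        }
    return summary


example = hourly_lift_activity(
    [
        ("Vail", "R1", datetime(2025, 1, 5, 9, 5)),
        ("Vail", "R1", datetime(2025, 1, 5, 9, 40)),
        ("Vail", "R2", datetime(2025, 1, 5, 9, 50)),
        ("Vail", "R2", datetime(2025, 1, 5, 10, 10)),
    ],
    season_pass_rfids={"R2"},
)
```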

In [None]:
CREATE OR REPLACE DYNAMIC TABLE HOURLY_AMORTIZED_REVENUE
TARGET_LAG = '1 minute'
WAREHOUSE = STREAMING_INGEST
REFRESH_MODE = INCREMENTAL
AS
SELECT
    dv.VISIT_DATE AS RIDE_DATE,
    HOUR(dv.FIRST_RIDE_TIME) AS RIDE_HOUR,
    DATE_TRUNC('hour', dv.FIRST_RIDE_TIME) AS RIDE_HOUR_TIMESTAMP,
    dv.RESORT,
    SUM(CASE
        WHEN NOT dv.HAS_SEASON_PASS -- It's a ticket
        THEN (dv.PURCHASE_PRICE_USD / GREATEST(dv.TICKET_ORIGINAL_DURATION, 1)) -- Use actual ticket duration
        ELSE 0
    END) AS RECOGNIZED_TICKET_REVENUE,

    COUNT(DISTINCT CASE WHEN NOT dv.HAS_SEASON_PASS THEN dv.RFID END) AS TICKET_ACTIVATIONS,

    SUM(CASE
        WHEN dv.HAS_SEASON_PASS AND dv.ACTIVATION_USAGE_COUNT <= 20 -- Only the first 20 activation days recognize pass revenue
        THEN (dv.PURCHASE_PRICE_USD / 20) -- Amortize the pass price evenly over a fixed 20 days
        ELSE 0
    END) AS RECOGNIZED_PASS_REVENUE,

    COUNT(DISTINCT CASE WHEN dv.HAS_SEASON_PASS THEN dv.RFID END) AS PASS_ACTIVATIONS
FROM DAILY_VISITS dv
GROUP BY RIDE_HOUR_TIMESTAMP, dv.RESORT, dv.VISIT_DATE, RIDE_HOUR;

In [None]:
select * from HOURLY_AMORTIZED_REVENUE 
order by RIDE_HOUR_TIMESTAMP desc 
limit 100;
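The recognition rules in `HOURLY_AMORTIZED_REVENUE` are easiest to check with a worked example. The hypothetical helper below restates the two CASE expressions for a single `DAILY_VISITS` row. The prices are made up for illustration, and the helper assumes `ticket_days` is non-null for tickets (in the SQL, a NULL duration makes `GREATEST` return NULL, and `SUM` then ignores that row).

```python
from decimal import Decimal

PASS_AMORTIZATION_DAYS = 20  # fixed divisor and usage cap used in the SQL above


def recognized_revenue(has_season_pass, price_usd, ticket_days=None, activation_count=None):
    """Revenue recognized for one daily visit under the HOURLY_AMORTIZED_REVENUE rules."""
    price = Decimal(price_usd)
    if not has_season_pass:
        # Ticket: spread the price evenly across its original number of days.
        return price / max(ticket_days, 1)
    # Pass: recognize 1/20 of the price on each of the first 20 activation days, then nothing.
    # A NULL activation count also falls through to 0, like the SQL's ELSE branch.
    if activation_count is not None and activation_count <= PASS_AMORTIZATION_DAYS:
        return price / PASS_AMORTIZATION_DAYS
    return Decimal("0")


three_day_ticket = recognized_revenue(False, "300.00", ticket_days=3)
pass_day_5 = recognized_revenue(True, "1000.00", activation_count=5)
pass_day_21 = recognized_revenue(True, "1000.00", activation_count=21)
```

A consequence of this design is that a pass recognizes its full price only if it is used on at least 20 days; any remainder on a less-used pass is never recognized by this table.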

In [None]:
CREATE OR REPLACE DYNAMIC TABLE HOURLY_LIFT_TICKET_SALES
TARGET_LAG='1 minute' 
WAREHOUSE = STREAMING_INGEST 
REFRESH_MODE = incremental
AS
SELECT 
    DATE(PURCHASE_TIME) as PURCHASE_DATE,
    HOUR(PURCHASE_TIME) as PURCHASE_HOUR,
    RESORT,
    SUM(PRICE_USD) as TICKET_REVENUE,
    COUNT(*) as TICKETS_SOLD
FROM RESORT_TICKET 
GROUP BY DATE(PURCHASE_TIME), HOUR(PURCHASE_TIME), RESORT;

In [None]:
select * from HOURLY_LIFT_TICKET_SALES 
order by PURCHASE_DATE desc, PURCHASE_HOUR desc
limit 100;

In [None]:
CREATE OR REPLACE DYNAMIC TABLE HOURLY_RESORT_SUMMARY
TARGET_LAG = '1 minute'
WAREHOUSE = STREAMING_INGEST
REFRESH_MODE = INCREMENTAL
AS
SELECT
    hla.RIDE_DATE,
    hla.RIDE_HOUR,
    hla.RIDE_HOUR_TIMESTAMP,
    hla.RESORT,
    hla.VISITOR_COUNT,
    hla.TOTAL_RIDES,

    -- Ticket and pass activation counts
    har.TICKET_ACTIVATIONS, 
    har.PASS_ACTIVATIONS,  

    -- General activity metrics 
    hla.ACTIVE_PASSES, 
    hla.PASS_RIDES,        
    (hla.VISITOR_COUNT - hla.ACTIVE_PASSES) AS ACTIVE_TICKETS,
    (hla.TOTAL_RIDES - hla.PASS_RIDES) AS TICKET_RIDES,
    
    -- Recognized Revenue from HOURLY_AMORTIZED_REVENUE
    COALESCE(har.RECOGNIZED_TICKET_REVENUE, 0) AS RECOGNIZED_TICKET_REVENUE,
    COALESCE(har.RECOGNIZED_PASS_REVENUE, 0) AS RECOGNIZED_PASS_REVENUE,

    -- New Total Recognized Revenue
    (COALESCE(har.RECOGNIZED_TICKET_REVENUE, 0) + COALESCE(har.RECOGNIZED_PASS_REVENUE, 0)) AS TOTAL_RECOGNIZED_REVENUE,

    -- Calculate capacity percentage
    ROUND((hla.VISITOR_COUNT / rc.MAX_CAPACITY * 100), 1) AS CAPACITY_PCT,

    -- Capacity status
    CASE
        WHEN (hla.VISITOR_COUNT / rc.MAX_CAPACITY * 100) > 90 THEN 'HIGH'
        WHEN (hla.VISITOR_COUNT / rc.MAX_CAPACITY * 100) > 70 THEN 'MODERATE'
        ELSE 'NORMAL'
    END AS CAPACITY_STATUS

FROM HOURLY_LIFT_ACTIVITY hla
LEFT JOIN HOURLY_AMORTIZED_REVENUE har
    ON hla.RIDE_DATE = har.RIDE_DATE
    AND hla.RIDE_HOUR = har.RIDE_HOUR
    AND hla.RESORT = har.RESORT
JOIN RESORT_CAPACITY rc ON hla.RESORT = rc.RESORT;

In [None]:
select * from HOURLY_RESORT_SUMMARY 
where resort = 'Vail' and ride_date = '2026-08-10'
order by RIDE_DATE desc, RIDE_HOUR desc
limit 100;
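The capacity columns compare the hour's distinct riders with the resort's `MAX_CAPACITY`. The hypothetical helper below mirrors `CAPACITY_PCT` and `CAPACITY_STATUS` using Vail's capacity from `RESORT_CAPACITY`; the visitor counts are illustrative.

```python
def capacity_metrics(visitor_count, max_capacity):
    """Mirror CAPACITY_PCT and CAPACITY_STATUS from HOURLY_RESORT_SUMMARY."""
    pct = visitor_count / max_capacity * 100
    # The status uses the unrounded percentage; only the reported value is rounded.
    if pct > 90:
        status = "HIGH"
    elif pct > 70:
        status = "MODERATE"
    else:
        status = "NORMAL"
    return round(pct, 1), status


VAIL_MAX_CAPACITY = 7000  # from RESORT_CAPACITY
busy = capacity_metrics(6500, VAIL_MAX_CAPACITY)
moderate = capacity_metrics(5000, VAIL_MAX_CAPACITY)
quiet = capacity_metrics(3500, VAIL_MAX_CAPACITY)
```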

### 6.2. Daily Aggregations

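This Dynamic Table rolls up the hourly summary into daily metrics. Its `TOTAL_VISITORS` count is exact because it sums ticket and pass activations, which originate in `DAILY_VISITS` (one row per RFID per resort per day); summing the hourly `VISITOR_COUNT` instead would count a visitor once for every hour they ride.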

In [None]:
-- ========================================
-- DAILY RESORT SUMMARY DYNAMIC TABLE
-- Daily aggregation from hourly DT + accurate visitor counts from DAILY_VISITS
-- ========================================
CREATE OR REPLACE DYNAMIC TABLE DAILY_RESORT_SUMMARY
TARGET_LAG='1 minute' -- Or your desired lag, e.g., '60 minutes' if daily is less frequent
WAREHOUSE = STREAMING_INGEST
REFRESH_MODE = INCREMENTAL
AS
SELECT
    RIDE_DATE,
    RESORT,
    MAX(VISITOR_COUNT) AS PEAK_HOURLY_VISITORS,
    SUM(VISITOR_COUNT) AS TOTAL_VISITOR_HOURS,
    SUM(TOTAL_RIDES) AS TOTAL_RIDES,
    SUM(RECOGNIZED_TICKET_REVENUE) AS TOTAL_TICKET_REVENUE,
    SUM(RECOGNIZED_PASS_REVENUE) AS TOTAL_PASS_REVENUE,
    SUM(TOTAL_RECOGNIZED_REVENUE) AS TOTAL_REVENUE,
    SUM(TICKET_ACTIVATIONS) AS TOTAL_TICKET_ACTIVATIONS,
    SUM(PASS_ACTIVATIONS) AS TOTAL_PASS_ACTIVATIONS,    
    (TOTAL_TICKET_ACTIVATIONS + TOTAL_PASS_ACTIVATIONS) AS TOTAL_VISITORS,
    SUM(PASS_RIDES) AS TOTAL_PASS_RIDES,
    SUM(TICKET_RIDES) AS TOTAL_TICKET_RIDES,
    ROUND(AVG(CAPACITY_PCT), 1) AS AVG_CAPACITY_PCT,
    MAX(CAPACITY_PCT) AS PEAK_CAPACITY_PCT,
    COUNT(*) AS OPERATION_HOURS -- Counts number of hourly records
FROM HOURLY_RESORT_SUMMARY
GROUP BY RIDE_DATE, RESORT;

In [None]:
select * from DAILY_RESORT_SUMMARY 
order by RIDE_DATE desc
limit 100;
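The difference between `TOTAL_VISITOR_HOURS` and `TOTAL_VISITORS` is easiest to see with a toy day. In the hypothetical example below, three riders spread across three hours produce six visitor-hours but only three activations, and therefore three unique visitors. The function name and data are illustrative.

```python
from collections import Counter


def daily_visitor_totals(riders_by_hour, first_ride_hour):
    """riders_by_hour: {hour: set of RFIDs}; first_ride_hour: {rfid: hour of that RFID's first ride}."""
    activations_by_hour = Counter(first_ride_hour.values())  # one activation per RFID per day
    return {
        "PEAK_HOURLY_VISITORS": max(len(r) for r in riders_by_hour.values()),
        "TOTAL_VISITOR_HOURS": sum(len(r) for r in riders_by_hour.values()),
        "TOTAL_VISITORS": sum(activations_by_hour.values()),
    }


totals = daily_visitor_totals(
    riders_by_hour={9: {"R1", "R2"}, 10: {"R1", "R2", "R3"}, 11: {"R3"}},
    first_ride_hour={"R1": 9, "R2": 9, "R3": 10},
)
```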

### 6.3. Weekly Aggregations

This Dynamic Table aggregates daily summaries to provide weekly insights.

In [None]:
-- ========================================
-- WEEKLY RESORT SUMMARY DYNAMIC TABLE
-- Weekly aggregation from daily DT
-- ========================================
CREATE OR REPLACE DYNAMIC TABLE WEEKLY_RESORT_SUMMARY
TARGET_LAG = '1 minute'
WAREHOUSE = STREAMING_INGEST
REFRESH_MODE = INCREMENTAL
AS
SELECT
    DATE_TRUNC('week', RIDE_DATE) AS WEEK_START_DATE,
    RESORT,
    MAX(TOTAL_VISITORS) AS MAX_DAILY_UNIQUE_VISITORS, -- Peak unique visitors on any single day in the week
    ROUND(AVG(TOTAL_VISITORS), 0) AS AVG_DAILY_UNIQUE_VISITORS, -- Average daily unique visitors
    SUM(TOTAL_VISITORS) AS WEEK_TOTAL_VISITORS, -- Sum of daily unique visitors (visitor-days)
    SUM(TOTAL_RIDES) AS WEEK_TOTAL_RIDES,
    SUM(TOTAL_PASS_RIDES) AS WEEK_TOTAL_PASS_RIDES,
    SUM(TOTAL_TICKET_RIDES) AS WEEK_TOTAL_TICKET_RIDES,     
    SUM(TOTAL_TICKET_REVENUE) AS WEEK_TOTAL_TICKET_REVENUE,
    SUM(TOTAL_PASS_REVENUE) AS WEEK_TOTAL_PASS_REVENUE,
    SUM(TOTAL_REVENUE) AS WEEK_TOTAL_REVENUE,
    ROUND(AVG(TOTAL_REVENUE), 0) AS AVG_DAILY_REVENUE,    
    SUM(TOTAL_TICKET_ACTIVATIONS) AS WEEK_TOTAL_TICKET_ACTIVATIONS,
    SUM(TOTAL_PASS_ACTIVATIONS) AS WEEK_TOTAL_PASS_ACTIVATIONS,    
    ROUND(AVG(AVG_CAPACITY_PCT), 1) AS AVG_WEEK_CAPACITY_PCT, -- Average of the daily average capacities
    MAX(PEAK_CAPACITY_PCT) AS WEEK_PEAK_CAPACITY_PCT,       -- Peak hourly capacity reached during the week
    COUNT(DISTINCT RIDE_DATE) AS OPERATION_DAYS -- Count of distinct days with operations in the week
FROM DAILY_RESORT_SUMMARY
GROUP BY DATE_TRUNC('week', RIDE_DATE), RESORT;

In [None]:
select * from WEEKLY_RESORT_SUMMARY 
order by WEEK_START_DATE desc
limit 100;
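`DATE_TRUNC('week', ...)` depends on the `WEEK_START` session parameter. Assuming the default (`WEEK_START = 0`), weeks begin on Monday. The hypothetical helpers below reproduce that with Python's `date.weekday()` (Monday is 0), then roll up daily unique-visitor totals the same way the weekly table does. The visitor numbers are illustrative.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_start(day: date) -> date:
    # Monday-based week, assuming WEEK_START = 0 (the default)
    return day - timedelta(days=day.weekday())


def weekly_visitors(daily_totals):
    """daily_totals: {ride_date: TOTAL_VISITORS} for one resort."""
    weeks = defaultdict(list)
    for day, visitors in daily_totals.items():
        weeks[week_start(day)].append(visitors)
    return {
        start: {
            "MAX_DAILY_UNIQUE_VISITORS": max(v),
            # Python rounds exact halves to even; Snowflake's ROUND rounds them away from zero.
            "AVG_DAILY_UNIQUE_VISITORS": round(sum(v) / len(v)),
            "WEEK_TOTAL_VISITORS": sum(v),  # visitor-days, not distinct people
            "OPERATION_DAYS": len(v),
        }
        for start, v in weeks.items()
    }


summary = weekly_visitors({date(2025, 1, 5): 1200, date(2025, 1, 6): 900, date(2025, 1, 8): 1500})
```

Sunday, 5 January 2025, falls into the week starting Monday, 30 December 2024, so it is not grouped with the 6 and 8 January totals.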

## 7. Analytical Views for Reporting

Create views on top of base tables and/or dynamic tables for easier querying and dashboarding.

In [None]:
-- ========================================
-- VIEW: V_RT_LIFT_PERFORMANCE
-- Real-time lift performance based on last 30 minutes of activity from LIFT_RIDE table
-- ========================================
CREATE OR REPLACE VIEW V_RT_LIFT_PERFORMANCE AS
WITH simulation_clock AS (
    -- Determine the latest ride time to simulate a 'current time' for simulated data
    SELECT
        MAX(RIDE_TIME) as MAX_RIDE_TIME
    FROM LIFT_RIDE
),
recent_activity AS (
SELECT
    lr.RESORT,
    lr.LIFT,
    COUNT(*) as RIDES_30MIN,
    COUNT(DISTINCT lr.RFID) as VISITORS_30MIN,
    MAX(lr.RIDE_TIME) as LAST_ACTIVITY_TIME
FROM LIFT_RIDE lr
    CROSS JOIN simulation_clock clock
WHERE lr.RIDE_TIME >= DATEADD(MINUTE, -30, clock.MAX_RIDE_TIME)
GROUP BY lr.RESORT, lr.LIFT
)
SELECT
    ra.RESORT,
    ra.LIFT,
    ra.RIDES_30MIN,
    ra.VISITORS_30MIN,
    ra.LAST_ACTIVITY_TIME,
    ROUND(ra.RIDES_30MIN * 2.0, 1) as ESTIMATED_RIDES_PER_HOUR,
    ROW_NUMBER() OVER (PARTITION BY ra.RESORT ORDER BY ra.RIDES_30MIN DESC) as USAGE_RANK_IN_RESORT
FROM recent_activity ra
ORDER BY ra.RESORT, ra.RIDES_30MIN DESC;

In [None]:
select * from V_RT_LIFT_PERFORMANCE
order by RESORT, LIFT;
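The view's logic can be restated in Python to make the "simulation clock" explicit. The window is anchored at the latest ride time rather than the wall clock, so the view keeps producing results over replayed or simulated data. This is a sketch with hypothetical names and sample rides; unlike `ROW_NUMBER`, whose tie order is unspecified, Python's stable sort breaks ties deterministically.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=30)


def recent_lift_performance(rides):
    """rides: list of (resort, lift, rfid, ride_time) tuples."""
    # Simulated "now": the latest ride time, not the wall clock
    clock = max(ride_time for _, _, _, ride_time in rides)
    cutoff = clock - WINDOW

    stats = {}
    for resort, lift, rfid, ride_time in rides:
        if ride_time < cutoff:
            continue  # outside the 30-minute window
        s = stats.setdefault((resort, lift), {"rides": 0, "rfids": set(), "last": ride_time})
        s["rides"] += 1
        s["rfids"].add(rfid)
        s["last"] = max(s["last"], ride_time)

    rows = [
        {
            "RESORT": resort,
            "LIFT": lift,
            "RIDES_30MIN": s["rides"],
            "VISITORS_30MIN": len(s["rfids"]),
            "LAST_ACTIVITY_TIME": s["last"],
            "ESTIMATED_RIDES_PER_HOUR": s["rides"] * 2.0,  # 30-minute count scaled to an hour
        }
        for (resort, lift), s in stats.items()
    ]
    # Rank lifts within each resort by ride count, busiest first
    rows.sort(key=lambda r: (r["RESORT"], -r["RIDES_30MIN"]))
    rank = {}
    for row in rows:
        rank[row["RESORT"]] = rank.get(row["RESORT"], 0) + 1
        row["USAGE_RANK_IN_RESORT"] = rank[row["RESORT"]]
    return rows


noon = datetime(2025, 1, 5, 12, 0)
perf = recent_lift_performance([
    ("Vail", "LIFT_A", "R1", noon),
    ("Vail", "LIFT_A", "R2", noon + timedelta(minutes=5)),
    ("Vail", "LIFT_B", "R1", noon + timedelta(minutes=10)),
    ("Vail", "LIFT_B", "R3", noon - timedelta(minutes=45)),  # too old: excluded
    ("Keystone", "LIFT_C", "R4", noon + timedelta(minutes=1)),
])
```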

In [None]:
-- ========================================
-- VIEW: V_DAILY_REVENUE_PERFORMANCE
-- Daily revenue vs targets, derived from DAILY_RESORT_SUMMARY and RESORT_CAPACITY
-- ========================================
CREATE OR REPLACE VIEW V_DAILY_REVENUE_PERFORMANCE AS
WITH daily_targets AS (
    SELECT
        RESORT,
        (MAX_CAPACITY * 0.7 * 100) as REVENUE_TARGET_USD -- Example target: 70% of max capacity value, assuming $100 per visitor
    FROM RESORT_CAPACITY
)
SELECT
    d.RIDE_DATE,
    d.RESORT,
    d.TOTAL_REVENUE,
    t.REVENUE_TARGET_USD,
    CASE
        WHEN t.REVENUE_TARGET_USD > 0 THEN ROUND((d.TOTAL_REVENUE / t.REVENUE_TARGET_USD * 100), 1)
        ELSE NULL
        END as REVENUE_TARGET_PCT,
    CASE
        WHEN d.TOTAL_REVENUE >= t.REVENUE_TARGET_USD THEN 'ABOVE_TARGET'
        WHEN d.TOTAL_REVENUE >= t.REVENUE_TARGET_USD * 0.9 THEN 'NEAR_TARGET'
        ELSE 'BELOW_TARGET'
        END as PERFORMANCE_STATUS
FROM DAILY_RESORT_SUMMARY d
         JOIN daily_targets t ON d.RESORT = t.RESORT;

In [None]:
select * from V_DAILY_REVENUE_PERFORMANCE order by RIDE_DATE DESC LIMIT 100;
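As a worked example of the target arithmetic, Vail's daily target is 7000 × 0.7 × $100 = $490,000, so "near target" starts at 90% of that, $441,000. The hypothetical helper below mirrors the view's CASE expressions. It uses `Decimal` with half-up rounding to match SQL `ROUND`, and the revenue figures are illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP

TARGET_UTILIZATION = Decimal("0.7")  # 70% of max capacity, as in the view
REVENUE_PER_VISITOR_USD = 100        # the view's $100-per-visitor assumption


def revenue_performance(total_revenue, max_capacity):
    """Return (REVENUE_TARGET_USD, REVENUE_TARGET_PCT, PERFORMANCE_STATUS)."""
    target = max_capacity * TARGET_UTILIZATION * REVENUE_PER_VISITOR_USD
    revenue = Decimal(total_revenue)
    pct = (
        (revenue / target * 100).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)
        if target > 0
        else None  # the view returns NULL when the target is not positive
    )
    if revenue >= target:
        status = "ABOVE_TARGET"
    elif revenue >= target * Decimal("0.9"):
        status = "NEAR_TARGET"
    else:
        status = "BELOW_TARGET"
    return target, pct, status


vail_near = revenue_performance(450000, 7000)
```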

In [None]:
-- ========================================
-- VIEW: V_DAILY_NETWORK_METRICS
-- Simplified network-wide metrics for dashboard, derived from DAILY_RESORT_SUMMARY
-- ========================================
CREATE OR REPLACE VIEW V_DAILY_NETWORK_METRICS AS
SELECT 
    RIDE_DATE,
    SUM(TOTAL_VISITORS) as TOTAL_NETWORK_VISITORS,
    SUM(TOTAL_REVENUE) as TOTAL_NETWORK_REVENUE,
    ROUND(AVG(AVG_CAPACITY_PCT), 1) as AVG_NETWORK_CAPACITY_PCT, -- Average of average daily capacities
    SUM(TOTAL_RIDES) as TOTAL_NETWORK_RIDES,
    COUNT(DISTINCT RESORT) as ACTIVE_RESORTS
FROM DAILY_RESORT_SUMMARY
GROUP BY RIDE_DATE;

In [None]:
select * from V_DAILY_NETWORK_METRICS order by RIDE_DATE DESC LIMIT 100;

## 8. Schema Verification

Show tables and views to verify the created objects.

In [None]:
-- List base tables
SHOW TABLES;
SELECT * FROM TABLE(RESULT_SCAN(LAST_QUERY_ID())) WHERE "is_dynamic" = 'N';

In [None]:
-- List dynamic tables
SHOW DYNAMIC TABLES;

In [None]:
-- List views
SHOW VIEWS;

## 9. Dynamic Table Observability

Monitor the health, refresh history, and status of your Dynamic Tables.

In [None]:
-- Check refresh history for performance monitoring
SELECT *
FROM TABLE(INFORMATION_SCHEMA.DYNAMIC_TABLE_REFRESH_HISTORY(NAME_PREFIX => 'STREAMING_INGEST.STREAMING_INGEST.'))
ORDER BY refresh_start_time DESC;

## 10. Conclusion and Next Steps

This notebook has established an end-to-end streaming data pipeline incorporating Snowpark for complex transformations (`DAILY_VISITS`) and a hierarchy of Dynamic Tables for efficient, incremental aggregations.

**Key Features Implemented:**
- Automated daily unique visitor tracking using a Snowpark procedure and Task.
- Multi-level aggregation pipeline (Hourly → Daily → Weekly) using Dynamic Tables.
- Analytical views for simplified reporting and dashboarding.
- Observability queries for monitoring Dynamic Table performance and health.

**Potential Next Steps:**
- Build Streamlit applications or connect BI tools to these views and Dynamic Tables for visualization.
- Extend the pipeline with more advanced analytics, such as anomaly detection or predictive modeling.
- Implement alerting based on DT status or data quality checks.