Skip to content

monroesolisdata/snowflake-analytics

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Snowflake Analytics

Advanced Analytics on the Cloud Data Warehouse — QUALIFY, Time Travel, VARIANT, Tasks

Snowflake-native SQL features: semi-structured data, materialized views, CDC streams, and zero-copy cloning


Snowflake Python SQL License


Note: A Snowflake account is needed for full execution (free 30-day trial at snowflake.com). All Python logic is unit-testable locally via MockSnowflakeClient — tests pass without credentials.


Snowflake-Native Features Demonstrated

Feature Description File
Virtual Warehouse Auto-suspend/resume compute — no idle cost 01_setup.sql
VARIANT column Schema-flexible JSON storage (no schema migrations needed) 01_setup.sql
QUALIFY clause Window function filter without subquery 02_analytics_queries.sql
PARSE_JSON + FLATTEN Query nested JSON arrays with LATERAL FLATTEN 02_analytics_queries.sql
Time Travel Query historical data: AT(OFFSET => -3600) 02_analytics_queries.sql
Zero-copy clone CLONE TABLE — instant dev copy, no storage cost 01_setup.sql
Materialized View Pre-computed daily revenue — auto-refreshed 02_analytics_queries.sql
Task Scheduled SQL — cron-like pipeline without Airflow 02_analytics_queries.sql
Stream (CDC) Capture every INSERT/UPDATE/DELETE incrementally 01_setup.sql

Snowflake vs Traditional Databases

Feature Traditional DB Snowflake
Scaling compute Manual (DBA, downtime) Instant (ALTER WAREHOUSE SET SIZE = 'LARGE')
Semi-structured data Separate JSON column with text Native VARIANT type with indexing
Historical queries Manual backups Built-in Time Travel (up to 90 days)
Dev environments Full data copy Zero-copy clone (instant, no storage cost)
Scheduled pipelines Cron + scripts Native Tasks (no external scheduler needed)
Data sharing ETL to partner Native Data Sharing (no copying)

Quick Start

# 1. Run tests (no Snowflake account needed)
pip install pandas pytest
pytest tests/ -v

# 2. Connect to real Snowflake
cp .env.example .env
# Fill in your credentials
pip install snowflake-connector-python python-dotenv
python -c "
from dotenv import load_dotenv; load_dotenv()
from src.snowflake_client import SnowflakeClient
client = SnowflakeClient()
df = client.get_revenue_by_region()
print(df)
"

# 3. Run SQL files in Snowflake UI (Snowsight)
# Upload sql/01_setup.sql → run to create warehouse + tables
# Upload sql/02_analytics_queries.sql → run analytical queries

What I Learned

  • Virtual Warehouses: Snowflake separates storage from compute — you can have 10 warehouses of different sizes all querying the same data simultaneously
  • QUALIFY clause: cleaner than wrapping a window function in a subquery — unique to Snowflake/BigQuery/Databricks
  • VARIANT type: store JSON without a schema — use : operator to navigate nested fields (METADATA:channel::STRING)
  • Time Travel: query data from any point in the past — invaluable for debugging pipeline failures ("what did the table look like before that ETL ran?")
  • Zero-copy cloning: how Snowflake enables instant dev environments without duplicating storage

License

MIT

Part of a 10-project Data Analyst portfolio

About

Snowflake SQL analytics showcase: QUALIFY window functions, PARSE_JSON/FLATTEN for VARIANT data, Time Travel, Materialized Views, Streams CDC, and Tasks.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages