Production-grade SQL queries demonstrating analytical patterns used in real data warehouses
All queries run locally on SQLite — no database server required. The syntax is ANSI-standard SQL, compatible with Snowflake, BigQuery, PostgreSQL, and Redshift.
| File | Topic | Concepts Covered |
|---|---|---|
01_window_functions.sql |
Window Functions | ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, NTILE, running totals, moving averages |
02_cte_and_subqueries.sql |
CTEs & Subqueries | Basic CTE, multi-CTE chains, recursive CTE, correlated subqueries |
03_kpi_dashboard_queries.sql |
KPI Dashboards | MoM growth, salesperson scorecard, cohort analysis, Pareto analysis |
04_data_quality_sql.sql |
Data Quality | Null audits, duplicate detection, outlier detection (Z-score), referential integrity |
05_generate_sample_data.sql |
Sample Data | 30 sales records across 5 products, 5 regions, 6 salespeople |
-- Running total (cumulative revenue)
SUM(revenue) OVER (ORDER BY sale_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
-- Month-over-month growth
(revenue - LAG(revenue, 1) OVER (ORDER BY month)) / LAG(revenue, 1) OVER (ORDER BY month) * 100
-- Top N per group (salesperson's top 3 orders)
ROW_NUMBER() OVER (PARTITION BY salesperson ORDER BY revenue DESC)
-- Percent of total without GROUP BY collapse
revenue / SUM(revenue) OVER () * 100git clone https://github.com/YOUR_USERNAME/sql-analytics-portfolio.git
cd sql-analytics-portfolio
python src/run_queries.py # Run all queries and display results
python src/run_queries.py --query 01 # Window functions only
python src/run_queries.py --query 03 # KPI queries onlyOr open data/portfolio.db in DB Browser for SQLite (free GUI).
- Window functions vs GROUP BY: GROUP BY collapses rows; window functions add a computed column while keeping all rows — this is the key insight
- CTE readability: complex 5-table subquery chains become maintainable when broken into named CTEs
- Recursive CTEs: the only SQL approach for hierarchical data (org charts, category trees) without application-level loops
- Cohort analysis: how to calculate retention by anchoring on the first activity date per user/group
- Pareto principle in SQL: cumulative percentage queries using
SUM() OVER (ORDER BY ... ROWS UNBOUNDED PRECEDING)
MIT
Part of a 10-project Data Analyst portfolio