An experimental DuckDB extension implementing Julian Hyde's "Measures in SQL" paper (arXiv:2406.00251).
Yardstick adds measure-aware SQL to DuckDB. Measures are aggregations that know how to re-aggregate themselves when the query context changes. This enables:
- Percent of total calculations without CTEs or window functions
- Year-over-year comparisons with simple syntax
- Drill-down analytics that automatically adjust aggregation context
-- Load the extension
LOAD 'yardstick';
-- Create a view with measures
CREATE VIEW sales_v AS
SELECT
year,
region,
SUM(amount) AS MEASURE revenue,
COUNT(*) AS MEASURE order_count
FROM sales
GROUP BY year, region;
-- Query with AGGREGATE() and AT modifiers (SEMANTIC prefix required)
SEMANTIC SELECT
year,
region,
AGGREGATE(revenue) AS revenue,
AGGREGATE(revenue) AT (ALL region) AS year_total,
AGGREGATE(revenue) / AGGREGATE(revenue) AT (ALL region) AS pct_of_year
FROM sales_v
GROUP BY year, region;CREATE VIEW view_name AS
SELECT
dimension1,
dimension2,
AGG(expr) AS MEASURE measure_name
FROM table
GROUP BY dimension1, dimension2;Supported aggregations: SUM, COUNT, AVG, MIN, MAX
Queries using AGGREGATE() must use the SEMANTIC prefix:
SEMANTIC SELECT
dimensions,
AGGREGATE(measure_name) [AT modifier]
FROM view_name
GROUP BY dimensions;| Modifier | Description | Example |
|---|---|---|
AT (ALL) |
Grand total across all dimensions | AGGREGATE(revenue) AT (ALL) |
AT (ALL dim) |
Total excluding specific dimension | AGGREGATE(revenue) AT (ALL region) |
AT (SET dim = val) |
Fix dimension to specific value | AGGREGATE(revenue) AT (SET year = 2022) |
AT (SET dim = expr) |
Fix dimension to expression | AGGREGATE(revenue) AT (SET year = year - 1) |
AT (WHERE cond) |
Pre-aggregation filter | AGGREGATE(revenue) AT (WHERE region = 'US') |
AT (VISIBLE) |
Use query's WHERE clause | AGGREGATE(revenue) AT (VISIBLE) |
SEMANTIC SELECT
region,
AGGREGATE(revenue) AS revenue,
100.0 * AGGREGATE(revenue) / AGGREGATE(revenue) AT (ALL) AS pct_total
FROM sales_v
GROUP BY region;SEMANTIC SELECT
year,
AGGREGATE(revenue) AS revenue,
AGGREGATE(revenue) AT (SET year = year - 1) AS prior_year,
100.0 * (AGGREGATE(revenue) - AGGREGATE(revenue) AT (SET year = year - 1))
/ AGGREGATE(revenue) AT (SET year = year - 1) AS yoy_growth
FROM sales_v
GROUP BY year;SEMANTIC SELECT
year,
region,
AGGREGATE(revenue) AS revenue,
AGGREGATE(revenue) AT (ALL region) AS year_total,
100.0 * AGGREGATE(revenue) / AGGREGATE(revenue) AT (ALL region) AS contribution
FROM sales_v
GROUP BY year, region;Prerequisites:
- CMake 3.5+
- C++17 compiler
- Cargo
make # builds Rust library and DuckDB extension
make test # runs testsThe extension will be at build/release/extension/yardstick/yardstick.duckdb_extension
See LIMITATIONS.md for known issues and workarounds.
Key limitations:
- Chained AT modifiers collapse to grand total instead of removing dimensions sequentially
- Derived measures (measures referencing other measures) not yet supported
- Window function measures not supported
- Julian Hyde, "Measures in SQL" (2024). arXiv:2406.00251
- DuckDB Extension Template
MIT