Fix aggregation parsing for complex expressions#31
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 11a079a1d9
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if isinstance(parsed.this, exp.Distinct): | ||
| agg_func = "count_distinct" | ||
| inner_expr = distinct_match.group(1).strip() | ||
|
|
||
| data["agg"] = agg_func | ||
| data["sql"] = inner_expr | ||
| # Extract the actual expression from inside Distinct | ||
| if parsed.this.expressions: | ||
| inner_expr = parsed.this.expressions[0].sql(dialect="duckdb") |
There was a problem hiding this comment.
Preserve all DISTINCT expressions in COUNT
When COUNT(DISTINCT ...) has multiple expressions (e.g., COUNT(DISTINCT a, b)), sqlglot exposes them in Distinct.expressions. This code now only takes expressions[0], which drops the remaining expressions and changes the aggregation semantics (distinct on a instead of (a,b)), so counts will be too high for multi-column DISTINCTs. The previous regex path preserved the full a, b list, so this is a regression for any multi-column distinct usage.
Useful? React with 👍 / 👎.
76070d1 to
1aee662
Compare
- Replace greedy regex with sqlglot in Metric class to properly handle expressions like SUM(x) / SUM(y) without mangling them - Fix COUNT DISTINCT detection using isinstance(parsed.this, exp.Distinct) - Add expression metric support in SQL generator for metrics with inline aggregations (agg=None, type=None, sql=<expression>) - Fix dependency analyzer to skip resolution for expression metrics with inline aggregations - Fix cumulative metrics to properly resolve references to other measures and generate valid aliases Add kitchen sink tests using patterns from rill-examples to catch edge cases.
1aee662 to
3c2ecef
Compare
Summary
SUM(x) / SUM(y)without mangling themisinstance(parsed.this, exp.Distinct)Bugs Fixed
SUM(deletions) / SUM(changes)was being parsed asagg=sum, sql=deletions) / SUM(changesCOUNT(DISTINCT user_id)was parsed asagg=count, sql=DISTINCT user_idinstead ofagg=count_distinct, sql=user_idTest Results