
Add cohort metric type for two-level aggregation#124

Merged
nicosuave merged 13 commits into main from cohort-metric
Mar 30, 2026

Conversation

@nicosuave (Member)

Summary

  • New cohort metric type that supports two-level aggregation with HAVING
  • Aggregate per entity, filter with HAVING, then re-aggregate at the query's dimension grain
  • Supports entity, inner_metrics (list of {name, agg, sql}), entity_dimensions (carried through to outer GROUP BY), having, and outer agg
  • Same dispatch pattern as retention/conversion: generates its own complete query, can't combine with other metric types
  • Useful for "users active N+ days", "users on multiple platforms", "accounts with revenue > $X", etc.

Changes:

  • metric.py: Added cohort to type literal, new fields, validation
  • adapters/sidemantic.py: Parse cohort fields in both metric construction paths
  • sql/generator.py: Route cohort metrics through _generate_cohort_metric_query(), handles segments, filters, entity dimensions, and order/limit/offset
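The two-level pattern can be demonstrated end to end with any SQL engine; here is a minimal, self-contained sketch using Python's stdlib sqlite3 (table, column, and data values are invented for the demo):

```python
import sqlite3

# Toy events table; names and rows are invented for the demo.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (person_id TEXT, event_date TEXT, platform TEXT)")
con.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("a", "2026-03-01", "ios"),
        ("a", "2026-03-02", "ios"),  # a: 2 active days -> retained
        ("b", "2026-03-01", "ios"),  # b: 1 active day  -> filtered out
        ("c", "2026-03-01", "web"),
        ("c", "2026-03-05", "web"),  # c: 2 active days -> retained
    ],
)

# Level 1: aggregate per entity, filter entities with HAVING.
# Level 2: re-aggregate the surviving cohort at the platform grain.
rows = con.execute("""
    SELECT platform, COUNT(*) AS retained_users
    FROM (
        SELECT person_id, platform, COUNT(DISTINCT event_date) AS active_days
        FROM events
        GROUP BY person_id, platform
        HAVING active_days >= 2
    ) cohort_sub
    GROUP BY platform
    ORDER BY platform
""").fetchall()
print(rows)  # [('ios', 1), ('web', 1)]
```

The inner query aggregates per person_id, HAVING drops entities below the threshold, and the outer query re-aggregates the surviving cohort at the platform grain.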

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 05a1f28a4e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +2508 to +2510
for dim_ref in parsed_dims:
    dim_model_name = dim_ref.get("model", model_name)
    dim_name = dim_ref.get("dimension")

P1: Unpack parsed dimension tuples before accessing fields

_parse_dimension_refs returns (dimension_ref, granularity) tuples, but this loop treats each parsed entry like a dict and calls .get(...). When a cohort query includes any dimension, SQL generation fails immediately with AttributeError: 'tuple' object has no attribute 'get', so cohort metrics cannot be queried at a dimension grain.

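The fix is mechanical: unpack the (dimension_ref, granularity) tuples instead of calling .get(). A hedged sketch, with invented dimension strings rather than the project's real helper:

```python
# _parse_dimension_refs yields (dimension_ref, granularity) tuples,
# so the loop must unpack rather than treat entries as dicts.
parsed_dims = [("events.ts", "month"), ("events.platform", None)]

resolved = []
for dim_ref, granularity in parsed_dims:
    model_name, sep, dim_name = dim_ref.partition(".")
    if not sep:  # unqualified reference: no model prefix
        model_name, dim_name = None, dim_ref
    resolved.append((model_name, dim_name, granularity))

print(resolved)
# [('events', 'ts', 'month'), ('events', 'platform', None)]
```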

Comment on lines +2374 to +2378
if not model:
    for m_name, m in self.graph.models.items():
        candidate = m.get_metric(local_name if not model_name else metric_name.split(".", 1)[-1])
        if candidate and candidate.type == "cohort":
            model = m

P1: Resolve graph-level cohort metrics before model scan

This resolver only scans model.get_metric(...) across models and never checks self.graph.get_metric(...), so cohort metrics added via SemanticGraph.add_metric are not resolvable here. Those metrics are still classified as cohort earlier in query planning, but this method then raises No model found for cohort metric ..., breaking unqualified graph-level cohort metrics.

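A sketch of the resolution order the note asks for, using simplified stand-in objects rather than the project's real SemanticGraph classes:

```python
from types import SimpleNamespace

def resolve_cohort_metric(graph, name):
    # Graph-level metrics first (SemanticGraph.add_metric), then a
    # per-model scan, mirroring the retention resolver pattern.
    metric = graph.metrics.get(name)
    if metric is not None and getattr(metric, "type", None) == "cohort":
        return metric, None
    for model_name, model in graph.models.items():
        candidate = model.metrics.get(name)
        if candidate is not None and candidate.type == "cohort":
            return candidate, model_name
    raise ValueError(f"No model found for cohort metric {name!r}")

# Tiny demo: one graph-level cohort metric, no models.
demo_metric = SimpleNamespace(type="cohort")
demo_graph = SimpleNamespace(metrics={"retained_users": demo_metric}, models={})
found, owner = resolve_cohort_metric(demo_graph, "retained_users")
print(found is demo_metric, owner)  # True None
```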

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Reviewed commit: 9b66b2db5c

entity_sql = _replace_model_placeholder(entity_sql)

# Process segments from filters
all_filters = list(filters or [])

P1: Include cohort metric filters when building WHERE clause

The cohort branch initializes all_filters from request-time filters only, so Metric.filters on a type="cohort" metric are never applied. This yields silently incorrect counts whenever a cohort metric defines built-in scoping (for example, region = 'US'): the generated SQL omits that predicate and computes the cohort over the full dataset instead.

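The remedy is to seed the filter list from both sources before building WHERE. A minimal sketch, assuming Metric.filters is a list of SQL predicate strings (the helper name is invented):

```python
def build_where(metric_filters, request_filters):
    # Metric-level scoping (e.g. "region = 'US'") must always apply;
    # request-time filters are appended on top of it.
    all_filters = list(metric_filters or []) + list(request_filters or [])
    if not all_filters:
        return ""
    return "WHERE " + " AND ".join(f"({f})" for f in all_filters)

print(build_where(["region = 'US'"], ["ts >= '2026-01-01'"]))
# WHERE (region = 'US') AND (ts >= '2026-01-01')
```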

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Reviewed commit: 6e54114d53

Comment on lines +2512 to +2513
stripped = self._strip_model_prefixes([f], model_name or "")[0] if model_name else f
stripped = _replace_model_placeholder(stripped)

P1: Resolve semantic filter dims before building cohort WHERE

The cohort path strips model prefixes but never runs _resolve_filter_dimensions, so filters that use semantic dimensions backed by expressions (for example city mapped to json_extract_string(...)) are emitted as bare column names in WHERE. In _generate_cohort_metric_query, this causes valid cohort queries with such filters to fail at execution with missing-column errors (city not found) instead of applying the dimension SQL expression.


else:
    dim_model_name = model_name
    dim_name = dim_ref_str
    alias = dim_name

P1: Keep granularity suffix in cohort dimension aliases

When a cohort query dimension includes a granularity suffix (e.g. events.ts__month), the alias is reduced to the base name (ts) instead of preserving ts__month. As a result, ORDER BY events.ts__month is rewritten to ORDER BY ts__month but no such output column exists, producing a runtime binder error. This also makes cohort output column naming inconsistent with the non-cohort paths.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Reviewed commit: 075f3ffa38

Comment on lines +3597 to +3601
# Handle cohort metrics separately - two-level aggregation pattern
if cohort_metrics:
    if len(cohort_metrics) > 1:
        raise ValueError("Only one cohort metric can be queried at a time")
    non_cohort = (

P1: Validate cohort+conversion mixes before early conversion return

This cohort compatibility guard is unreachable when a query includes both a conversion metric and a cohort metric, because _generate_with_window_functions returns from the conversion branch first. In that case the cohort metric is silently ignored and the generated SQL only reflects the conversion metric, which produces incorrect results instead of raising the intended "cannot be combined" error for mixed special metric types.


Comment on lines +2603 to +2604
if granularity:
    dim_sql = f"DATE_TRUNC('{granularity}', {dim_sql})"

P1: Use dialect-aware truncation for cohort time dimensions

Cohort dimension granularity is emitted with a hard-coded DATE_TRUNC('{granularity}', ...) expression, which is not valid for dialects like BigQuery (DATE_TRUNC(expr, MONTH)). This means cohort queries with time-grain dimensions can compile into invalid SQL on non-Postgres/DuckDB dialects, even though the generator already has a dialect-aware _date_trunc helper used elsewhere.

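Dialect-aware truncation can be sketched as follows; the BigQuery and DuckDB/Postgres argument orders shown are the standard ones, while the function itself is an invented stand-in for the generator's existing _date_trunc helper:

```python
def date_trunc(dialect, granularity, expr):
    # BigQuery puts the expression first and the grain as a keyword;
    # DuckDB/Postgres take the grain as a quoted string first.
    if dialect == "bigquery":
        return f"DATE_TRUNC({expr}, {granularity.upper()})"
    return f"DATE_TRUNC('{granularity}', {expr})"

print(date_trunc("duckdb", "month", "events.ts"))
# DATE_TRUNC('month', events.ts)
print(date_trunc("bigquery", "month", "events.ts"))
# DATE_TRUNC(events.ts, MONTH)
```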

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Reviewed commit: db336e858a

Comment on lines +328 to +329
query = _normalize_sql_query(payload.query)
result = current_layer.adapter.execute(query)

P1: Restrict /raw endpoint to SELECT statements

run_raw_sql executes payload.query directly after normalization, which bypasses the QueryRewriter guard used by /sql that enforces read-only SELECT semantics. In deployments where auth is disabled (or when a token is shared broadly), this enables mutating statements like DELETE, UPDATE, or DROP against the backing database through the API, which is a significant safety regression from the existing SQL endpoint behavior.

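A deliberately naive illustration of a read-only guard; a real fix should reuse the parsing-based QueryRewriter mentioned above rather than keyword checks like these:

```python
def assert_read_only(query):
    # Reject multi-statement payloads and anything that is not a
    # SELECT or CTE. Keyword inspection is a sketch only; parsing the
    # SQL (e.g. with sqlglot) is the robust approach.
    stripped = query.strip().rstrip(";").strip()
    if ";" in stripped:
        raise ValueError("multiple statements are not allowed")
    first = stripped.split(None, 1)[0].upper() if stripped else ""
    if first not in ("SELECT", "WITH"):
        raise ValueError(f"only SELECT queries are allowed, got {first!r}")
    return stripped

print(assert_read_only("SELECT 1;"))  # SELECT 1
```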

Comment on lines +2599 to +2602
if dim_model_name == model_name:
    dim = model.get_dimension(dim_name)
    if dim:
        dim_sql = _replace_model_placeholder(dim.sql_expr)

P1: Raise on unresolved cohort dimensions

This loop only adds a dimension when it belongs to the cohort metric's model and model.get_dimension(...) returns a match, but it has no error path for misses. As a result, typoed dimensions or dimensions from joined models are silently dropped from the generated SQL, so the query returns a different grain/shape than requested without surfacing an error, which can corrupt downstream analyses.


Comment on lines +2567 to +2568
if outer_sql:
    outer_sql = _replace_model_placeholder(outer_sql)

P2: Build outer cohort SQL in outer scope

For cohort metrics with an explicit sql, the code rewrites {model} placeholders using _replace_model_placeholder, which maps to t when model.sql is present. That alias only exists inside the inner subquery, but outer_expr is emitted in the outer SELECT ... FROM (...) cohort_sub, so expressions like "{model}.user_id" become out-of-scope references and fail at execution time.


nicosuave and others added 5 commits March 30, 2026 06:14
Cohort metrics aggregate per entity with a HAVING filter, then
re-aggregate the filtered results. Common analytics pattern for
questions like "users active 2+ days" or "users on multiple platforms."

Example:
  - name: retained_users
    type: cohort
    entity: person_id
    entity_dimensions: [platform]
    inner_metrics:
      - name: active_days
        agg: count_distinct
        sql: "cast(timestamp as date)"
    having: "active_days >= 2"
    agg: count

Generates:
  SELECT platform, COUNT(*) AS retained_users
  FROM (
    SELECT person_id, platform, COUNT(DISTINCT CAST(timestamp AS DATE)) AS active_days
    FROM events WHERE ... GROUP BY person_id, platform HAVING active_days >= 2
  ) cohort_sub
  GROUP BY platform
Three fixes in _generate_cohort_metric_query:

1. Graph-level cohort metrics (added via graph.add_metric) now resolve
   correctly. Previously only scanned model.get_metric(), missing
   metrics registered at graph level. Now checks graph.get_metric()
   first, matching the retention resolver pattern.

2. _parse_dimension_refs returns (dim_ref, granularity) tuples, not
   dicts. The dimension loop was calling .get() on tuples, causing
   AttributeError when any dimension was included in a cohort query.

3. Query-level dimensions were computed but never added to the inner
   SELECT/GROUP BY (dead variable inner_select_cols_extra). Moved the
   inner_select/inner_group join after the dimension loop so dimensions
   are properly included in both inner and outer queries.

Also formats cli.py (pre-existing).
Two fixes:

1. The conversion branch returned early without checking if cohort
   metrics were also present, silently dropping them. Now validates
   that conversion metrics aren't mixed with other special metric
   types, matching the retention and cohort patterns.

2. Cohort dimension granularity used hardcoded DATE_TRUNC('gran', col)
   which is invalid on BigQuery. Now uses self._date_trunc() for
   dialect-aware truncation.
Outer SQL was using _replace_model_placeholder which maps {model} to the
inner table alias (t), but the outer query operates on cohort_sub.
Also, unresolved or cross-model dimensions were silently dropped instead
of raising an error.
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Reviewed commit: 144fb3bf26

elif outer_agg == "COUNT":
    outer_expr = f"COUNT({outer_sql or '*'})"
else:
    outer_expr = f"{outer_agg}({outer_sql or '*'})"

P2: Require sql for non-count outer cohort aggregations

If a cohort metric sets agg to something other than count/count_distinct and leaves sql unset (currently allowed by Metric validation), this branch emits invalid SQL such as AVG(*) or SUM(*). That fails at execution time on DuckDB and other dialects, so valid-looking cohort metric definitions break unless users discover they must provide sql manually.


elif im_agg == "COUNT":
    expr = f"COUNT({im_sql or '*'})"
else:
    expr = f"{im_agg}({im_sql or '*'})"

P2: Require sql for non-count inner cohort metrics

Inner cohort metrics have the same issue: for inner_metrics entries with agg like sum/avg and no sql, the generator builds expressions such as SUM(*), which are invalid in SQL engines. Because inner_metrics entries are not validated for this case, these definitions parse successfully but then fail at query execution in the inner aggregation step.

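Both notes reduce to one rule: only bare COUNT may default to *. A sketch of a generation-time check (function name and shape are illustrative, not the project's real code):

```python
def agg_expr(agg, sql):
    # Only bare COUNT may fall back to *; AVG(*), SUM(*), and
    # COUNT(DISTINCT *) are all invalid SQL.
    agg = agg.upper()
    if agg == "COUNT":
        return f"COUNT({sql or '*'})"
    if sql is None:
        raise ValueError(f"cohort aggregation {agg!r} requires a 'sql' expression")
    if agg == "COUNT_DISTINCT":
        return f"COUNT(DISTINCT {sql})"
    return f"{agg}({sql})"

print(agg_expr("count", None))                  # COUNT(*)
print(agg_expr("count_distinct", "person_id"))  # COUNT(DISTINCT person_id)
```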

AVG(*), SUM(*), etc. are invalid SQL. Now raises ValueError at generation
time when a cohort metric (outer or inner) uses a non-count aggregation
without specifying a sql expression.
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Reviewed commit: 0a988d4c01

Comment on lines +221 to +222
if not self.inner_metrics:
    raise ValueError("cohort metric requires 'inner_metrics' field")

P2: Validate inner_metrics entries for cohort metrics

This validation only enforces that inner_metrics is present and non-empty, but does not check each item’s shape (for example requiring name and a valid agg). A malformed definition like inner_metrics: [{}] will pass model parsing and then fail later during SQL generation when _generate_cohort_metric_query accesses im["name"], causing a runtime KeyError instead of a clear config-time error. Adding per-entry validation here would prevent broken cohort metrics from reaching query execution.

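Per-entry validation can be sketched directly from the inner_metrics shape given in the PR summary (the allowed-aggregation set here is an assumption):

```python
VALID_AGGS = {"count", "count_distinct", "sum", "avg", "min", "max"}  # assumed set

def validate_inner_metrics(inner_metrics):
    # Fail at config time, not with a KeyError during SQL generation.
    if not inner_metrics:
        raise ValueError("cohort metric requires 'inner_metrics' field")
    for i, im in enumerate(inner_metrics):
        if "name" not in im:
            raise ValueError(f"inner_metrics[{i}] is missing required 'name'")
        agg = im.get("agg", "count")
        if agg not in VALID_AGGS:
            raise ValueError(f"inner_metrics[{i}] has invalid agg {agg!r}")
        if agg != "count" and not im.get("sql"):
            raise ValueError(f"inner_metrics[{i}]: agg {agg!r} requires 'sql'")

validate_inner_metrics([{"name": "active_days", "agg": "count_distinct",
                         "sql": "cast(timestamp as date)"}])  # passes
```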

Each inner_metrics entry must have a 'name' key, and non-count
aggregations must include a 'sql' field. Catches malformed definitions
like inner_metrics: [{}] early instead of failing with KeyError during
SQL generation.
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Reviewed commit: a02f2e91e9

Comment on lines +2559 to +2561
if im_agg == "COUNT_DISTINCT":
    expr = f"COUNT(DISTINCT {im_sql or '*'})"
elif im_agg == "COUNT":

P1: Require SQL expression for inner COUNT_DISTINCT cohorts

When an inner cohort metric is defined with agg: count_distinct and no sql, this branch emits COUNT(DISTINCT *), which is invalid SQL in common engines (including DuckDB/Postgres) and fails at execution time. Because cohort validation currently allows this shape, users can define a seemingly valid metric that only breaks when queried. Please either require sql for inner count_distinct metrics or fall back to a valid default column (for example, the cohort entity).


COUNT(DISTINCT *) is invalid SQL. Only bare COUNT can default to *.
Updated both metric validation and generator runtime check.
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Reviewed commit: bae2937cea

Comment on lines +239 to +240
type: Literal["ratio", "derived", "cumulative", "time_comparison", "conversion", "retention", "cohort"] | None = (
    Field(None, description="Metric type for complex calculations")

P1: Allow cohort metrics in semantic-layer validation

Adding "cohort" to Metric.type here makes cohort metrics constructible, but validate_metric() still rejects that type (sidemantic/validation.py only allows ratio/derived/cumulative/time_comparison/conversion/retention). Because SemanticLayer.add_metric() and file loaders route graph-level metrics through that validator, cohort metrics fail with MetricValidationError during normal API/CLI loading even though SQL generation now supports them.


Comment on lines +316 to +318
inner_metrics=measure_def.get("inner_metrics"),
entity_dimensions=measure_def.get("entity_dimensions"),
having=measure_def.get("having"),

P2: Preserve cohort fields during sidemantic export

These new cohort fields are parsed, but the adapter’s export paths (_export_model and _export_metric) never serialize inner_metrics, entity_dimensions, or having. As a result, parse→export round trips silently drop required cohort configuration and emit incomplete metric definitions, which can break downstream validation/reloads in CLI-first workflows.


- Add 'cohort' to allowed metric types in validate_metric() and skip
  agg validation for cohort metrics in validate_model()
- Serialize inner_metrics, entity_dimensions, and having in both
  model-level and graph-level metric export paths
- Export agg for graph-level metrics (needed for cohort outer agg)
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Reviewed commit: 6b9150f0c3

Comment on lines +422 to +424
# Cohort metrics need special handling
if metric.type == "cohort":
    return True

P2: Resolve unqualified cohort metrics before window-path check

The new cohort dispatch in metric_needs_window only works after a metric is found, but unqualified names are resolved via self.graph.get_metric(...) only. Model-scoped cohort metrics are not auto-registered in SemanticGraph.add_model, so a query like metrics=["multi_platform_users"] (defined on a model, not added at graph level) is not detected as cohort, falls through the standard planner, and can fail with No models found for query instead of generating cohort SQL. Please mirror the model-scan fallback used by _generate_cohort_metric_query when resolving unqualified metrics here.


Both metric_needs_window and resolve_metric_ref (inside
_generate_with_window_functions) only checked graph-level metrics for
unqualified names. Model-scoped cohort metrics like
metrics=["multi_platform_users"] were not found, causing either
"No models found" errors or infinite recursion. Added model-scan
fallback to both resolution paths.
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Reviewed commit: cc42856b7a

Comment on lines +2443 to +2447
candidate = m.get_metric(local_name)
if candidate and candidate.type == "cohort":
    model = m
    metric = candidate
    model_name = m_name

P2: Detect ambiguous unqualified cohort metrics across models

When resolving unqualified cohort metric names, this branch picks the first model containing local_name and stops. If two models both define a cohort metric with the same name, generate(metrics=["that_name"]) silently uses insertion order instead of raising an ambiguity error, so queries can run against the wrong model and return incorrect results. The cohort resolver already has an ambiguity error path for other cases, so this scan should also disambiguate rather than break on first match.

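A sketch of the disambiguating scan: collect every match first, then fail loudly on more than one (stand-in objects, not the project's real model classes):

```python
from types import SimpleNamespace

def scan_models_for_cohort(models, local_name):
    # Collect all models whose metric matches; exactly one resolves,
    # more than one raises instead of letting insertion order win.
    matches = []
    for m_name, m in models.items():
        candidate = m.metrics.get(local_name)
        if candidate is not None and candidate.type == "cohort":
            matches.append((m_name, candidate))
    if len(matches) > 1:
        names = ", ".join(name for name, _ in matches)
        raise ValueError(f"Metric {local_name!r} is ambiguous across models: {names}")
    return matches[0] if matches else None

cohort = SimpleNamespace(type="cohort")
one = {"events": SimpleNamespace(metrics={"m": cohort})}
two = {"events": SimpleNamespace(metrics={"m": cohort}),
       "sessions": SimpleNamespace(metrics={"m": cohort})}
print(scan_models_for_cohort(one, "m")[0])  # events
```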

Comment on lines +2628 to +2631
inner_select_cols.append(f"{dim_sql} AS {alias}")
inner_group_cols.append(dim_sql)
outer_select_cols.append(alias)
outer_group_cols.append(alias)

P2: Quote cohort dimension aliases in outer SELECT/GROUP BY

Cohort dimensions are carried into outer_select_cols/outer_group_cols as raw identifiers, so reserved words or non-identifier names generate invalid SQL (for example, a dimension named order produces SELECT order ... GROUP BY order, which fails to parse). Other query paths use alias quoting helpers, but this cohort path does not, causing cohort-only regressions for otherwise valid model dimension names.


- Cohort metric name-scan now collects all matches across models and
  raises ValueError when >1 match found, instead of silently picking
  the first by insertion order.
- _quote_identifier now delegates to sqlglot for simple identifiers,
  which correctly quotes SQL reserved words like 'order', 'select',
  etc. This fixes invalid SQL in cohort (and all other) query paths
  when dimension names collide with reserved words.
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Reviewed commit: dc44a79041

Comment on lines +3495 to +3498
for m_name, m in self.graph.models.items():
    found = m.get_metric(metric_ref)
    if found:
        return found, m_name

P2: Detect ambiguous unqualified window metrics across models

The new model-scan fallback in resolve_metric_ref returns the first matching metric name without checking for collisions, so unqualified window-path metrics (for example cumulative/offset-ratio names duplicated across two models) can now execute against whichever model was added first instead of raising an ambiguity error. That produces silently incorrect results for valid-but-ambiguous queries, because this branch is used to classify and plan window/special metrics in _generate_with_window_functions.


…CI flake

- resolve_metric_ref model-scan fallback now collects all matches and
  raises ValueError on ambiguity, preventing silent wrong-model queries
  for any metric type (not just cohort).
- Bump complex_rewrite_performance threshold from 15ms to 20ms to fix
  flaky CI on slower Python 3.11 runners (hit 15.625ms).
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Reviewed commit: 6c44bc4621

else:
    expr = f"{im_agg}({im_sql})"

inner_metric_selects.append(f"{expr} AS {im_name}")

P2: Quote cohort inner metric aliases in generated SQL

The inner cohort query emits inner_metrics[].name directly in AS {im_name} without quoting or validation, so a legal config like name: "order" or name: "active.days" generates invalid SQL in the inner SELECT/HAVING (AS order, HAVING order >= 2) and fails at parse time. This is a runtime break for cohort metrics that use non-simple aliases; quoting (or enforcing identifier-safe names at validation time) is needed here.


Comment on lines +2673 to +2676
    order_fields.append(f"{parts[0]} {parts[1].upper()}")
else:
    order_fields.append(field_name)
order_clause = f"\nORDER BY {', '.join(order_fields)}"

P2: Quote ORDER BY fields in cohort query output

Cohort order_by expressions are appended as raw field names after prefix stripping, so sorting by a dimension alias that requires quoting (for example events.order) produces ORDER BY order even when the same alias is correctly quoted in SELECT/GROUP BY. This causes parser errors on engines like DuckDB/Postgres for valid cohort queries that group by reserved-word dimensions and then sort by them.

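A stdlib sketch of the alias quoting these notes call for; the merged fix delegates to sqlglot instead, and the reserved-word set here is a tiny illustrative subset:

```python
import re

RESERVED = {"order", "select", "group", "from", "where"}  # illustrative subset

def quote_alias(name):
    # Simple lowercase identifiers that are not reserved pass through;
    # everything else is double-quoted, with embedded quotes doubled.
    if re.fullmatch(r"[a-z_][a-z0-9_]*", name) and name not in RESERVED:
        return name
    return '"' + name.replace('"', '""') + '"'

print(quote_alias("platform"))     # platform
print(quote_alias("order"))        # "order"
print(quote_alias("active.days"))  # "active.days"
```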

Inner metric names, outer metric name, and ORDER BY field names are now
passed through _quote_alias so reserved words and special characters
produce valid SQL.
@nicosuave nicosuave merged commit b627ab4 into main Mar 30, 2026
15 checks passed
@nicosuave nicosuave deleted the cohort-metric branch March 30, 2026 16:34