Implement symmetric aggregation for multi-model queries #28

Merged
nicosuave merged 2 commits into main from nicosuave/cube-kitchen-sink-test on Jan 3, 2026


@nicosuave (Member)

Summary

Fixes a fan-out issue that occurs when metrics from different join levels are queried together: each metric is now pre-aggregated separately to the dimension grain before the results are joined. Also adds comprehensive kitchen sink tests, which uncovered 6 bugs in the Cube adapter and SQL generator; those bugs are fixed here.

Changes

  • Symmetric aggregation: New _generate_with_preaggregation() method handles multi-model metric queries correctly
  • Cube adapter fixes: FK inference, derived metrics parsing, filtered measures, complex SQL metrics, one_to_many joins
  • Kitchen sink tests: 916 lines of comprehensive integration tests covering 44 test cases
  • Test results: 889 passed, 12 failed; the 12 failures are pre-existing and caused by missing optional dependencies
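A minimal sketch of the fan-out problem and the pre-aggregation fix described above, using Python's built-in sqlite3 module. The orders/order_items schema, the data, and the CTE names are hypothetical and not taken from the PR; the SQL that _generate_with_preaggregation() actually emits may differ. The final join here is an inner join using SQLite's NULL-safe IS operator so the sketch runs on older SQLite builds, whereas the generator joins its pre-aggregated CTEs with a FULL OUTER JOIN.

```python
import sqlite3

# Hypothetical two-model schema: orders (one row per order) and
# order_items (many rows per order). Names and data are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, amount REAL);
CREATE TABLE order_items (id INTEGER PRIMARY KEY, order_id INTEGER, quantity INTEGER);
INSERT INTO orders VALUES (1, 'paid', 100.0), (2, 'paid', 50.0), (3, 'open', 20.0);
INSERT INTO order_items VALUES (1, 1, 2), (2, 1, 3), (3, 2, 1), (4, 3, 4), (5, 3, 1);
""")

# Naive approach: join first, then aggregate. Each order's amount is repeated
# once per matching item row, so SUM(amount) is inflated (fan-out).
naive_sql = """
SELECT o.status, SUM(o.amount) AS revenue, SUM(i.quantity) AS units
FROM orders o JOIN order_items i ON i.order_id = o.id
GROUP BY o.status
ORDER BY o.status
"""

# Pre-aggregation: aggregate each metric to the dimension grain (status) in
# its own CTE, then join the one-row-per-status CTEs on that dimension.
preagg_sql = """
WITH revenue_cte AS (
  SELECT status, SUM(amount) AS revenue FROM orders GROUP BY status
),
units_cte AS (
  SELECT o.status, SUM(i.quantity) AS units
  FROM order_items i JOIN orders o ON i.order_id = o.id
  GROUP BY o.status
)
SELECT r.status, r.revenue, u.units
FROM revenue_cte r JOIN units_cte u ON r.status IS u.status
ORDER BY r.status
"""

naive = conn.execute(naive_sql).fetchall()
preagg = conn.execute(preagg_sql).fetchall()
print("naive: ", naive)
print("preagg:", preagg)
```

In this data, the naive query reports 250.0 of revenue for 'paid' because order 1's amount is counted once per item row, while the pre-aggregated query reports the correct 150.0; the item-level units metric is the same in both.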


@chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 41fd438d32

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread: sidemantic/sql/generator.py (outdated)
Comment on lines +1138 to +1142
col_name = f"{dim_name}__{gran}" if gran else dim_name
# Use COALESCE to handle NULLs in join condition
# Actually for FULL OUTER JOIN, we need to compare the actual columns
# and handle NULLs with IS NOT DISTINCT FROM or COALESCE-based comparison
join_conditions.append(f"COALESCE({cte_names[0]}.{col_name}, '') = COALESCE({cte_name}.{col_name}, '')")

P1: Use NULL-safe equality in preagg joins

The pre-aggregation join condition coalesces every dimension to '' regardless of type. This will either throw (e.g., numeric/date columns can’t coalesce with an empty string in many dialects) or silently coerce values, and it also conflates NULL with empty string for text dimensions, causing incorrect row matching. This breaks multi-model queries whenever a dimension isn’t a string or when NULLs should remain distinct; a NULL-safe equality like IS NOT DISTINCT FROM (or a type-appropriate sentinel) is needed to preserve correctness.
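To illustrate the P1 point, the sketch below joins two hypothetical pre-aggregated tables on a text dimension, once with the COALESCE-to-'' condition quoted above and once with a NULL-safe comparison. It uses SQLite's IS operator, which is SQLite's NULL-safe equality; PostgreSQL and DuckDB spell it IS NOT DISTINCT FROM, and MySQL spells it <=>. The table names and the null_safe_join_condition helper are hypothetical, not sidemantic's API.

```python
import sqlite3

# Two hypothetical pre-aggregated outputs keyed by a text dimension. One has a
# NULL key and the other an empty-string key: these are different groups.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE left_cte (region TEXT, revenue INTEGER);
CREATE TABLE right_cte (region TEXT, orders INTEGER);
INSERT INTO left_cte VALUES ('east', 10), (NULL, 5);
INSERT INTO right_cte VALUES ('east', 2), ('', 7);
""")

# COALESCE-to-'' join: NULL and '' collapse to the same key, so the NULL group
# on the left is wrongly matched with the '' group on the right.
coalesce_rows = conn.execute("""
SELECT l.region, r.region, l.revenue, r.orders
FROM left_cte l JOIN right_cte r
  ON COALESCE(l.region, '') = COALESCE(r.region, '')
ORDER BY l.revenue
""").fetchall()

# NULL-safe equality: `a IS b` is true when both sides are NULL or both are
# equal non-NULL values, so NULL and '' stay distinct.
null_safe_rows = conn.execute("""
SELECT l.region, r.region, l.revenue, r.orders
FROM left_cte l JOIN right_cte r ON l.region IS r.region
ORDER BY l.revenue
""").fetchall()


def null_safe_join_condition(left: str, right: str, column: str) -> str:
    """Hypothetical helper: emit a type-agnostic NULL-safe join predicate."""
    return f"{left}.{column} IS NOT DISTINCT FROM {right}.{column}"


print(coalesce_rows)
print(null_safe_rows)
print(null_safe_join_condition("cte_0", "cte_1", "region"))
```

Because the NULL-safe predicate never introduces a sentinel value, it also avoids the type problem the review mentions: no numeric or date column is ever compared with an empty string.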


Comment thread: sidemantic/sql/generator.py (outdated)
Comment on lines +863 to +867
# Wrap expression in CASE WHEN for filtering
# For count measures, return 1 if condition met, else NULL (not 0!)
# COUNT counts non-NULL values, so we need NULL to exclude non-matching rows
if measure.agg == "count":
base_expr = f"CASE WHEN {combined_filter} THEN 1 ELSE NULL END"

P2: Preserve COUNT(column) semantics under filters

For filtered COUNT measures that specify a SQL column (e.g., count with sql: user_id to count non‑NULLs), the new CASE expression returns 1 when the filter matches. This changes semantics from “count non‑NULL column values” to “count all rows that match the filter,” even when the column is NULL. This regression appears only for COUNT metrics with a SQL expression and filters, but it will overcount in datasets where the counted column can be NULL; use the column expression inside the CASE (THEN {base_expr}) to retain COUNT(column) behavior.
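The P2 regression can be reproduced in a few lines with sqlite3. The events table and the filtered_count_expr helper are hypothetical illustrations, not sidemantic code; the helper applies the suggested fix of putting the measure's column (rather than 1) inside the CASE whenever a column is defined.

```python
import sqlite3
from typing import Optional

# Hypothetical data: three 'click' rows, one of which has a NULL user_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, kind TEXT);
INSERT INTO events VALUES
  (1, 10, 'click'), (2, NULL, 'click'), (3, 11, 'click'), (4, 12, 'view');
""")


def filtered_count_expr(filter_sql: str, column_sql: Optional[str]) -> str:
    """Hypothetical helper: the argument to COUNT() for a filtered count measure.

    Without a column (COUNT(*) style), count every row matching the filter.
    With a column, count only its non-NULL values among matching rows.
    """
    value = column_sql if column_sql else "1"
    return f"CASE WHEN {filter_sql} THEN {value} ELSE NULL END"


flt = "kind = 'click'"

# Regressed form: THEN 1 counts matching rows even when user_id is NULL.
then_one = conn.execute(
    f"SELECT COUNT({filtered_count_expr(flt, None)}) FROM events"
).fetchone()[0]

# Suggested form: THEN user_id keeps COUNT(user_id) semantics.
then_column = conn.execute(
    f"SELECT COUNT({filtered_count_expr(flt, 'user_id')}) FROM events"
).fetchone()[0]

print(then_one, then_column)
```

Here the THEN 1 form returns 3 while the THEN user_id form returns 2, which matches COUNT(user_id) over the filtered rows.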


- Resolved merge conflicts in generator.py, keeping both upstream improvements
  (visited tracking for recursion, replace_model_placeholder) and local fixes
  (CASE WHEN for filtered measures, pre-aggregation for symmetric aggregation)
- Added support for ratio-type metrics in dependency collection
- Fixed pre-aggregation to handle no dimensions (use CROSS JOIN)
- Fixed metric name collision handling in pre-aggregation output
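The CROSS JOIN fallback for dimensionless queries can be sketched as a small SQL-building function. join_preagg_ctes and its parameters are hypothetical and are not the generator's actual signature. With no dimensions, each pre-aggregated CTE returns exactly one row, so a CROSS JOIN combines them without a join key. With dimensions, the sketch uses NULL-safe FULL OUTER JOINs; as an assumption beyond what the PR states, each later CTE is matched against the COALESCE of the dimension across the CTEs already joined, because after a FULL OUTER JOIN a row's key may be present only in a CTE other than the first.

```python
from typing import List


def _key_expr(joined: List[str], col: str) -> str:
    # After earlier FULL OUTER JOINs, a row's dimension value may live in any
    # already-joined CTE, so take the first non-NULL one. A single column is
    # used as-is because some dialects (e.g. SQLite) need 2+ COALESCE args.
    cols = [f"{name}.{col}" for name in joined]
    return cols[0] if len(cols) == 1 else f"COALESCE({', '.join(cols)})"


def join_preagg_ctes(cte_names: List[str], dim_columns: List[str]) -> str:
    """Hypothetical helper: build the FROM clause joining pre-aggregated CTEs."""
    lines = [cte_names[0]]
    for i, name in enumerate(cte_names[1:], start=1):
        if not dim_columns:
            # No dimensions: every CTE is a single aggregate row.
            lines.append(f"CROSS JOIN {name}")
        else:
            joined = cte_names[:i]
            conds = " AND ".join(
                f"{_key_expr(joined, col)} IS NOT DISTINCT FROM {name}.{col}"
                for col in dim_columns
            )
            lines.append(f"FULL OUTER JOIN {name} ON {conds}")
    return "\n".join(lines)


print(join_preagg_ctes(["m0", "m1"], []))
print(join_preagg_ctes(["m0", "m1", "m2"], ["status"]))
```

The first call yields "m0" followed by "CROSS JOIN m1"; the second joins m1 on m0.status and m2 on COALESCE(m0.status, m1.status).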
@nicosuave merged commit 03ca6e1 into main on Jan 3, 2026
10 checks passed
@nicosuave deleted the nicosuave/cube-kitchen-sink-test branch on January 3, 2026 at 20:01