Implement symmetric aggregation for multi-model queries by nicosuave · Pull Request #28 · sidequery/sidemantic

nicosuave · 2026-01-03T19:33:20Z

Summary

Fixes a fan-out issue when metrics from different join levels are queried together, by pre-aggregating each metric separately to the dimension grain before joining. Also adds comprehensive kitchen-sink tests, which surfaced 6 bugs in the Cube adapter and SQL generator that are fixed here.
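To make the fan-out problem and the pre-aggregation fix concrete, here is a minimal, self-contained sketch using Python's built-in sqlite3 module. The `orders`/`order_items` schema, the data, and the CTE names are hypothetical and not taken from sidemantic. The sketch joins the pre-aggregated CTEs with a LEFT JOIN because older SQLite builds lack FULL OUTER JOIN; a general generator would need a full outer join so that dimension values present in only one CTE are kept.

```python
import sqlite3

# Hypothetical schema: one row per order, one row per order line item.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount INTEGER);
    CREATE TABLE order_items (item_id INTEGER PRIMARY KEY, order_id INTEGER, qty INTEGER);
    INSERT INTO orders VALUES (1, 'east', 100), (2, 'east', 50), (3, 'west', 70);
    INSERT INTO order_items VALUES (1, 1, 2), (2, 1, 3), (3, 2, 1), (4, 3, 4), (5, 3, 1);
    """
)

# Naive approach: join first, aggregate second. Each order's amount is
# repeated once per line item, so SUM(o.amount) is inflated (fan-out).
NAIVE_SQL = """
SELECT o.region, SUM(o.amount) AS revenue, SUM(i.qty) AS units
FROM orders o
LEFT JOIN order_items i ON i.order_id = o.order_id
GROUP BY o.region
ORDER BY o.region
"""

# Pre-aggregation: aggregate each metric at its own grain down to the
# dimension grain (region), then join the already-aggregated rows.
PREAGG_SQL = """
WITH order_agg AS (
    SELECT region, SUM(amount) AS revenue
    FROM orders
    GROUP BY region
),
item_agg AS (
    SELECT o.region, SUM(i.qty) AS units
    FROM order_items i
    JOIN orders o ON o.order_id = i.order_id
    GROUP BY o.region
)
SELECT a.region, a.revenue, b.units
FROM order_agg a
LEFT JOIN item_agg b ON a.region IS b.region  -- IS is SQLite's NULL-safe equality
ORDER BY a.region
"""

naive = conn.execute(NAIVE_SQL).fetchall()
preagg = conn.execute(PREAGG_SQL).fetchall()
print("naive: ", naive)  # revenue double-counted for multi-item orders
print("preagg:", preagg)
```

Only the revenue column differs between the two queries: `units` lives at the item grain, so joining items to orders does not inflate it, while `revenue` lives at the order grain and gets repeated per item in the naive query.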

Changes

Symmetric aggregation: New _generate_with_preaggregation() method handles multi-model metric queries correctly
Cube adapter fixes: FK inference, derived metrics parsing, filtered measures, complex SQL metrics, one_to_many joins
Kitchen sink tests: 916 lines of comprehensive integration tests covering 44 test cases
Test results: 889 passed, 12 pre-existing failures (missing optional dependencies)


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 41fd438d32


chatgpt-codex-connector · 2026-01-03T19:35:33Z

+                col_name = f"{dim_name}__{gran}" if gran else dim_name
+                # Use COALESCE to handle NULLs in join condition
+                # Actually for FULL OUTER JOIN, we need to compare the actual columns
+                # and handle NULLs with IS NOT DISTINCT FROM or COALESCE-based comparison
+                join_conditions.append(f"COALESCE({cte_names[0]}.{col_name}, '') = COALESCE({cte_name}.{col_name}, '')")


Use NULL-safe equality in preagg joins

The pre-aggregation join condition coalesces every dimension to '' regardless of type. This will either throw (e.g., numeric/date columns can’t coalesce with an empty string in many dialects) or silently coerce values, and it also conflates NULL with empty string for text dimensions, causing incorrect row matching. This breaks multi-model queries whenever a dimension isn’t a string or when NULLs should remain distinct; a NULL-safe equality like IS NOT DISTINCT FROM (or a type-appropriate sentinel) is needed to preserve correctness.
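The difference the reviewer describes can be checked directly. The sketch below compares the COALESCE-to-`''` condition with a NULL-safe predicate using Python's sqlite3 module. The helper names `null_safe_eq` and `coalesce_eq` are hypothetical, and the dialect mapping is only illustrative: MySQL spells NULL-safe equality `<=>`, SQLite accepts `IS` on all versions, and the SQL-standard `IS NOT DISTINCT FROM` is used by, for example, PostgreSQL and DuckDB.

```python
import sqlite3


def null_safe_eq(left: str, right: str, dialect: str = "ansi") -> str:
    """Return a NULL-safe equality predicate (hypothetical helper).

    NULL matches NULL, NULL never matches a non-NULL value, and no sentinel
    value is involved, so the column type does not matter.
    """
    if dialect == "mysql":
        return f"{left} <=> {right}"  # MySQL's NULL-safe equal operator
    if dialect == "sqlite":
        return f"{left} IS {right}"  # supported by every SQLite version
    return f"{left} IS NOT DISTINCT FROM {right}"  # SQL-standard form


def coalesce_eq(left: str, right: str) -> str:
    """The sentinel-based condition flagged in the review."""
    return f"COALESCE({left}, '') = COALESCE({right}, '')"


conn = sqlite3.connect(":memory:")
pairs = [(None, ""), (None, None), ("a", "a"), ("a", "b")]


def matches(predicate: str) -> list[bool]:
    # Evaluate the predicate once per (l, r) pair, bound as query parameters.
    sql = f"SELECT {predicate} FROM (SELECT ? AS l, ? AS r)"
    return [bool(conn.execute(sql, p).fetchone()[0]) for p in pairs]


sentinel_result = matches(coalesce_eq("l", "r"))
null_safe_result = matches(null_safe_eq("l", "r", dialect="sqlite"))
print(sentinel_result)  # NULL and '' are wrongly treated as equal
print(null_safe_result)
```

SQLite is dynamically typed, so this sketch only shows the NULL/empty-string conflation; the type errors the review mentions for numeric or date dimensions would appear in strictly typed engines.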


chatgpt-codex-connector · 2026-01-03T19:35:33Z

+                    # Wrap expression in CASE WHEN for filtering
+                    # For count measures, return 1 if condition met, else NULL (not 0!)
+                    # COUNT counts non-NULL values, so we need NULL to exclude non-matching rows
+                    if measure.agg == "count":
+                        base_expr = f"CASE WHEN {combined_filter} THEN 1 ELSE NULL END"


Preserve COUNT(column) semantics under filters

For filtered COUNT measures that specify a SQL column (e.g., count with sql: user_id to count non‑NULLs), the new CASE expression returns 1 when the filter matches. This changes semantics from “count non‑NULL column values” to “count all rows that match the filter,” even when the column is NULL. This regression appears only for COUNT metrics with a SQL expression and filters, but it will overcount in datasets where the counted column can be NULL; use the column expression inside the CASE (THEN {base_expr}) to retain COUNT(column) behavior.
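The reviewer's suggestion (put the column inside the CASE instead of a constant `1`) can be sketched as below and checked against SQLite. The helper `filtered_agg`, the `events` table, and its data are hypothetical; the sketch only assumes, as the quoted diff does, that filtered measures are rendered as `CASE WHEN <filter> THEN ... ELSE NULL END`.

```python
import sqlite3
from typing import Optional


def filtered_agg(agg: str, sql_expr: Optional[str], filter_sql: str) -> str:
    """Wrap a measure in CASE WHEN to apply its filter (hypothetical helper).

    Rows that fail the filter become NULL, which COUNT/SUM/AVG/MIN/MAX ignore.
    """
    if sql_expr is None:
        if agg != "count":
            raise ValueError(f"{agg} measure needs a SQL expression")
        inner = "1"  # row count: each matching row contributes one non-NULL value
    else:
        inner = sql_expr  # keep the column so its own NULLs are still skipped
    return f"{agg.upper()}(CASE WHEN {filter_sql} THEN {inner} ELSE NULL END)"


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE events (user_id INTEGER, status TEXT);
    INSERT INTO events VALUES (1, 'active'), (NULL, 'active'), (2, 'inactive'), (3, 'active');
    """
)

flt = "status = 'active'"
rows_expr = filtered_agg("count", None, flt)      # COUNT(CASE ... THEN 1 ...)
col_expr = filtered_agg("count", "user_id", flt)  # COUNT(CASE ... THEN user_id ...)
count_rows, count_col = conn.execute(f"SELECT {rows_expr}, {col_expr} FROM events").fetchone()
print(count_rows, count_col)  # THEN 1 also counts the active row whose user_id is NULL
```

Using `THEN 1` for a count measure that has a SQL column is exactly the overcount the review describes: it reports every active row, including the one with a NULL `user_id`.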


- Resolved merge conflicts in generator.py, keeping both the upstream improvements (visited tracking for recursion, replace_model_placeholder) and the local fixes (CASE WHEN for filtered measures, pre-aggregation for symmetric aggregation)
- Added support for ratio-type metrics in dependency collection
- Fixed pre-aggregation to handle queries with no dimensions (uses CROSS JOIN)
- Fixed metric name collision handling in pre-aggregation output
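The last two fixes concern how the per-model CTEs are combined. The sketch below shows one way to do it, assuming each CTE is already grouped by the query's dimensions: with no dimensions every CTE is a single aggregate row, so a CROSS JOIN is enough; otherwise CTEs are full-outer-joined on NULL-safe dimension equality. The function names and the `<cte>_<metric>` aliasing scheme for colliding metric names are hypothetical and not necessarily what the PR implements.

```python
from collections import Counter


def _first_non_null(cols: list[str]) -> str:
    # SQLite's coalesce() needs at least two arguments, so skip it for one column.
    return cols[0] if len(cols) == 1 else f"COALESCE({', '.join(cols)})"


def join_preaggregated(ctes: list[tuple[str, list[str]]], dimensions: list[str]) -> str:
    """Combine per-model pre-aggregated CTEs into one SELECT (hypothetical helper).

    `ctes` holds (cte_name, metric_columns) pairs; each CTE is assumed to be
    already grouped by every name in `dimensions`.
    """
    names = [name for name, _ in ctes]
    # Metric names exposed by more than one CTE would collide in the output.
    counts = Counter(m for _, metrics in ctes for m in metrics)

    select = [
        # After FULL OUTER JOINs a dimension value may exist in only some CTEs.
        f"{_first_non_null([f'{n}.{dim}' for n in names])} AS {dim}"
        for dim in dimensions
    ]
    for name, metrics in ctes:
        for m in metrics:
            alias = f"{name}_{m}" if counts[m] > 1 else m
            select.append(f"{name}.{m} AS {alias}")

    from_sql = names[0]
    for i, name in enumerate(names[1:], start=1):
        if not dimensions:
            # No dimensions: each CTE is a single aggregate row, so a
            # CROSS JOIN combines them without a join key.
            from_sql += f" CROSS JOIN {name}"
            continue
        conds = [
            # Compare against the first non-NULL copy from the CTEs joined so far.
            f"{_first_non_null([f'{n}.{dim}' for n in names[:i]])} "
            f"IS NOT DISTINCT FROM {name}.{dim}"
            for dim in dimensions
        ]
        from_sql += f" FULL OUTER JOIN {name} ON {' AND '.join(conds)}"

    return f"SELECT {', '.join(select)} FROM {from_sql}"


models = [("orders_agg", ["revenue", "row_count"]), ("items_agg", ["units", "row_count"])]
no_dims = join_preaggregated(models, [])
by_region = join_preaggregated(models, ["region"])
print(no_dims)
print(by_region)
```

Comparing each newly joined CTE against the COALESCE of all earlier copies (rather than only the first CTE) keeps a dimension value that is missing from the first CTE from producing duplicate rows when three or more models are involved.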

chatgpt-codex-connector Bot reviewed Jan 3, 2026


nicosuave merged commit 03ca6e1 into main Jan 3, 2026
10 checks passed

nicosuave deleted the nicosuave/cube-kitchen-sink-test branch January 3, 2026 20:01

nicosuave merged 2 commits into main from nicosuave/cube-kitchen-sink-test

nicosuave commented Jan 3, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jan 3, 2026

Uh oh!

chatgpt-codex-connector Bot Jan 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

nicosuave commented Jan 3, 2026

Summary

Changes

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jan 3, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jan 3, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant