Skip to content

[BugFix] Fix SQL aggregate window with ORDER BY defaulting to whole partition#5526

Merged
Swiddis merged 1 commit into
opensearch-project:mainfrom
dai-chen:fix/sql-window-frame-order-by
Jun 8, 2026
Merged

[BugFix] Fix SQL aggregate window with ORDER BY defaulting to whole partition#5526
Swiddis merged 1 commit into
opensearch-project:mainfrom
dai-chen:fix/sql-window-frame-order-by

Conversation

@dai-chen

@dai-chen dai-chen commented Jun 8, 2026

Copy link
Copy Markdown
Collaborator

Description

In the unified SQL path, an aggregate window function with an ORDER BY but no explicit frame (e.g. COUNT(DISTINCT x) OVER(ORDER BY y)) defaulted to the whole partition, returning the partition-wide total on every row. This PR defaults such aggregate windows to ROWS UNBOUNDED PRECEDING .. CURRENT ROW so each row reflects a running aggregate — PPL window functions carry no ORDER BY and are unaffected.

Related Issues

Part of #5248

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@dai-chen dai-chen self-assigned this Jun 8, 2026
@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

PR Reviewer Guide 🔍

(Review updated until commit 9999d08)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Possible Issue

The default frame is applied to all aggregate window functions with ORDER BY, including those with an explicit frame already set. If ctx.windowFrameClause() is present, it should be parsed and used instead of defaulting to rangeToCurrentRow(). The current logic unconditionally overwrites any user-specified frame.

UnresolvedExpression function = visit(ctx.function);
WindowFunction windowFunction = new WindowFunction(function, partitionByList, sortList);

// Aggregate window with ORDER BY defaults to a running RANGE frame (ranking ignores it).
if (function instanceof AggregateFunction && !sortList.isEmpty()) {
  windowFunction.setWindowFrame(WindowFrame.rangeToCurrentRow());
}
return windowFunction;

Aggregate window functions with ORDER BY but no explicit frame defaulted
to the whole partition, so COUNT(DISTINCT x) OVER(ORDER BY y) returned the
total for every row instead of a running value. Default SQL aggregates to
RANGE UNBOUNDED PRECEDING .. CURRENT ROW, matching the SQL standard and the
non-Calcite engine's peer semantics (ranking functions ignore frames). PPL
windows carry no ORDER BY, so are unaffected.

Signed-off-by: Chen Dai <daichen@amazon.com>
@dai-chen dai-chen force-pushed the fix/sql-window-frame-order-by branch from 820da3e to 9999d08 Compare June 8, 2026 22:31
@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 9999d08

@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
General
Verify aggregate function frame semantics

The logic assumes all AggregateFunction instances should get a RANGE frame when
ORDER BY is present. However, this may not be correct for all aggregate functions.
Consider verifying if specific aggregate functions (like COUNT DISTINCT) require
different frame semantics or if certain aggregates should be excluded from this
default behavior.

sql/src/main/java/org/opensearch/sql/sql/parser/AstExpressionBuilder.java [225-232]

 UnresolvedExpression function = visit(ctx.function);
 WindowFunction windowFunction = new WindowFunction(function, partitionByList, sortList);
 
 // Aggregate window with ORDER BY defaults to a running RANGE frame (ranking ignores it).
+// Note: Verify if all aggregate functions should use RANGE frame or if exceptions exist
 if (function instanceof AggregateFunction && !sortList.isEmpty()) {
   windowFunction.setWindowFrame(WindowFrame.rangeToCurrentRow());
 }
 return windowFunction;
Suggestion importance[1-10]: 5

__

Why: The suggestion raises a valid concern about whether all AggregateFunction instances should receive the same RANGE frame treatment. However, it only asks for verification without providing concrete evidence of issues, and the 'improved_code' merely adds a comment rather than implementing a substantive change.

Low

@dai-chen

dai-chen commented Jun 8, 2026

Copy link
Copy Markdown
Collaborator Author

Flaky test:

2026-06-08T22:59:24.4535580Z 6836 tests completed, 4 failed, 235 skipped
2026-06-08T22:59:24.4537140Z Tests with failures:
2026-06-08T22:59:24.4640020Z  - org.opensearch.sql.calcite.remote.CalciteExplainIT.testNoMvWithEval
2026-06-08T22:59:24.4871830Z  - org.opensearch.sql.calcite.remote.CalciteExplainIT.testNoMvBasic

@Swiddis Swiddis merged commit f2acb89 into opensearch-project:main Jun 8, 2026
39 of 40 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants