Close gaps from top/rare analytics-engine wiring by ahkcs · Pull Request #5433 · opensearch-project/sql

ahkcs · 2026-05-11T23:31:03Z

Description

Two SQL-plugin–side fixes that close the three out-of-scope failures called out in opensearch-project/OpenSearch#21593 (window-function wiring for top/rare on the analytics-engine route). With that PR plus these fixes, CalciteTopCommandIT and CalciteRareCommandIT go from 8/11 → 11/11 against the force-routed analytics-engine path.

Fix 1 — Forward `PPL_SYNTAX_LEGACY_PREFERRED` through the unified context

`RestUnifiedQueryAction.applyClusterOverrides` previously forwarded only `PPL_REX_MAX_MATCH_LIMIT` into the per-request `UnifiedQueryContext`. As a result, cluster-side updates to `plugins.ppl.syntax.legacy.preferred` were ignored on the analytics-engine route: `PPLQueryParser` → `AstBuilder` → `ArgumentFactory` reads the legacy-preferred flag from the unified context's settings map, which never received the override. Queries like `top age` with the cluster setting flipped to `false` behaved as if `usenull=true` (the legacy default), on the analytics route only.

Refactored the override builder into a small `forwardClusterSetting` helper and forwarded both `PPL_REX_MAX_MATCH_LIMIT` and `PPL_SYNTAX_LEGACY_PREFERRED`. Adding a future key is now a one-liner.
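A minimal Java sketch of the forwarding pattern, under a simplified model: the `Key` enum, the `Function`-based cluster-settings lookup, and `defaultUseNull` are hypothetical stand-ins, not the plugin's actual types or method signatures. The mapping in `defaultUseNull` (legacy-preferred `true` implies `usenull=true`, and an unset flag falls back to the legacy default) is an assumption inferred from the description above.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.Function;

public class ClusterOverrideSketch {

  // Hypothetical stand-in for the plugin's settings keys; names mirror the PR text.
  enum Key { PPL_REX_MAX_MATCH_LIMIT, PPL_SYNTAX_LEGACY_PREFERRED }

  // Copies one cluster-side value into the per-request overrides, skipping unset keys.
  static void forwardClusterSetting(
      Map<Key, Object> overrides, Function<Key, Object> clusterSettings, Key key) {
    Object value = clusterSettings.apply(key);
    if (value != null) {
      overrides.put(key, value);
    }
  }

  // Builds the per-request override map; each forwarded key is one more call.
  static Map<Key, Object> applyClusterOverrides(Function<Key, Object> clusterSettings) {
    Map<Key, Object> overrides = new EnumMap<>(Key.class);
    forwardClusterSetting(overrides, clusterSettings, Key.PPL_REX_MAX_MATCH_LIMIT);
    forwardClusterSetting(overrides, clusterSettings, Key.PPL_SYNTAX_LEGACY_PREFERRED);
    return overrides;
  }

  // Assumed reader: the usenull default follows the legacy-preferred flag (unset => legacy true).
  static boolean defaultUseNull(Map<Key, Object> overrides) {
    Object flag = overrides.get(Key.PPL_SYNTAX_LEGACY_PREFERRED);
    return flag == null || (Boolean) flag;
  }

  public static void main(String[] args) {
    // Illustrative cluster values; 10 is an arbitrary rex limit.
    Map<Key, Object> cluster = Map.of(
        Key.PPL_REX_MAX_MATCH_LIMIT, 10,
        Key.PPL_SYNTAX_LEGACY_PREFERRED, false);
    Map<Key, Object> overrides = applyClusterOverrides(cluster::get);
    System.out.println(defaultUseNull(overrides)); // false
  }
}
```

Without the second `forwardClusterSetting` call, the flag would never reach the overrides map and `defaultUseNull` would fall back to `true`, which is the analytics-route symptom described above.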

Unblocks `testTopCommandLegacyFalse` and `testRareCommandLegacyFalse`.

Fix 2 — Stable tie-break for `RareTopN` ROW_NUMBER

`CalciteRelNodeVisitor.visitRareTopN` lowers `rare`/`top` to `ROW_NUMBER() OVER (PARTITION BY ... ORDER BY count [DESC])`. With only the count column in the `ORDER BY` clause, ties at the same count were resolved by the upstream operator's insertion order, which differs between backends (in-process Calcite vs. analytics-engine vs. Lucene pushdown). `testRareWithGroup` failed on the analytics route because `ROW_NUMBER` picked NV at count=8 while the test expected AR.

Appended the rare/top field columns as secondary ASC keys so that ties resolve alphabetically and deterministically across backends. This matches the existing OpenSearch terms-aggregation pushdown, which already tie-breaks on `_key:asc`.
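The effect of the extra ASC key can be sketched in plain Java by emulating `ROW_NUMBER() OVER (PARTITION BY ... ORDER BY count [DESC], field ASC)` over already-aggregated rows. This is not the Calcite lowering itself: the `Row` record, the `rareTop` method, and the sample values are hypothetical, chosen only to mirror the NV/AR tie at count=8 described above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RareTopTieBreakSketch {

  // One aggregated row: the by-clause (partition) value, the rare/top field value, its count.
  record Row(String partition, String field, long count) {}

  // Emulates ROW_NUMBER() OVER (PARTITION BY partition ORDER BY count [DESC], field ASC)
  // and keeps rows with row number <= limit. Null field values sort last (NULLS LAST).
  static List<Row> rareTop(List<Row> rows, boolean top, int limit) {
    Comparator<Row> byCount = Comparator.comparingLong(Row::count);
    if (top) {
      byCount = byCount.reversed(); // top: highest count first; rare: lowest count first
    }
    Comparator<Row> order =
        Comparator.comparing(Row::partition)
            .thenComparing(byCount)
            // The secondary ASC key: without it, equal counts keep input (insertion) order.
            .thenComparing(Row::field, Comparator.nullsLast(Comparator.naturalOrder()));
    List<Row> sorted = new ArrayList<>(rows);
    sorted.sort(order);

    // Assign ROW_NUMBER per partition and filter.
    Map<String, Integer> rowNumber = new HashMap<>();
    List<Row> result = new ArrayList<>();
    for (Row r : sorted) {
      int n = rowNumber.merge(r.partition(), 1, Integer::sum);
      if (n <= limit) {
        result.add(r);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Illustrative data: NV and AR tie at count 8 within one partition.
    List<Row> rows =
        List.of(new Row("M", "NV", 8), new Row("M", "AR", 8), new Row("M", "TX", 20));
    System.out.println(rareTop(rows, false, 1)); // [Row[partition=M, field=AR, count=8]]
  }
}
```

Because the field key makes the ordering total within a partition, the result no longer depends on the order in which the upstream operator emits rows.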

`RareTopPushdownRule` now accepts the new shape: one or two order keys, where the optional second key must be the rare/top target field in ASC direction. The pushdown's wire payload is unchanged: it sends the same OpenSearch terms-aggregation request as before.
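The shape check can be sketched as a standalone predicate. `OrderKey`, `Direction`, and `acceptsOrderKeys` are simplified, hypothetical stand-ins for the `RexFieldCollation` handling in `RareTopPushdownRule`; the sketch checks only the number of keys and the optional tie-break key, and omits the rule's existing validation of the first (count) key.

```java
import java.util.List;

public class RareTopShapeSketch {

  enum Direction { ASCENDING, DESCENDING }

  // Hypothetical view of one window ORDER BY key: referenced field names plus direction.
  record OrderKey(List<String> fields, Direction direction) {}

  // Accepts 1 or 2 keys; a second key must reference exactly the target field, ascending.
  static boolean acceptsOrderKeys(List<OrderKey> orderKeys, String targetName) {
    if (orderKeys.isEmpty() || orderKeys.size() > 2) {
      return false;
    }
    if (orderKeys.size() == 2) {
      OrderKey tieBreak = orderKeys.get(1);
      if (tieBreak.direction() != Direction.ASCENDING) {
        return false;
      }
      List<String> names = tieBreak.fields();
      // size() != 1 short-circuits, so get(0) is never evaluated on an empty list.
      if (names.size() != 1 || !names.get(0).equals(targetName)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    OrderKey count = new OrderKey(List.of("count"), Direction.DESCENDING);
    OrderKey tieBreak = new OrderKey(List.of("state"), Direction.ASCENDING);
    System.out.println(acceptsOrderKeys(List.of(count, tieBreak), "state")); // true
  }
}
```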

Unblocks `testRareWithGroup`.

Pass-rate impact

Measured with the SQL plugin's `:integTestRemote` task against an externally managed cluster running opensearch-project/OpenSearch#21593 (`feature/toprare-analytics-verify` @ `114e8bf8e3a`) with `-Dtests.analytics.force_routing=true -Dtests.analytics.parquet_indices=true`:

| IT | Before (PR #21593 alone) | After (PR #21593 + this PR) |
|---|---|---|
| `CalciteTopCommandIT` | 5/6 | 6/6 |
| `CalciteRareCommandIT` | 3/5 | 5/5 |
| Combined | 8/11 | 11/11 |

Regression sweep (all green)

- In-process Calcite (`integTestRemote` without `force_routing`): Top + Rare → 11/11 (no regression)
- Legacy v2 (`RareCommandIT`, `TopCommandIT`): 5/5
- `:ppl:test :core:test :opensearch:test :api:test`: clean
- `CalciteExplainIT` (consumes the 5 updated explain YAML fixtures): clean

Files

- `plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java`: Fix 1
- `core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`: Fix 2 (tie-break in lowering)
- `opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/RareTopPushdownRule.java`: Fix 2 (accept the new two-key shape)
- `ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLRareTopNTest.java`: updated 11 expected RelNode / result / SparkSQL strings to reflect the deterministic tie-break order
- `integ-test/src/test/resources/expectedOutput/calcite/explain_{rare,top}_usenull_{true,false}.yaml`, `explain_nested_agg_top_push.yaml`: updated 5 explain fixtures

Issues resolved

N/A — closes the out-of-scope follow-ups documented in opensearch-project/OpenSearch#21593.

Check List

- Functionality includes testing.
- API changes companion pull request created, if applicable.
- Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

`RestUnifiedQueryAction.applyClusterOverrides` previously forwarded only `PPL_REX_MAX_MATCH_LIMIT` into the per-request `UnifiedQueryContext`. As a result, cluster-side updates to `plugins.ppl.syntax.legacy.preferred` were ignored on the analytics-engine route: `PPLQueryParser` -> `AstBuilder` -> `ArgumentFactory` read the legacy-preferred flag from the unified context's settings map, which never received the override. This caused queries like `top age` / `rare state` with `usenull` defaulting off to behave as if `usenull=true` on the analytics route.

Refactor the override builder into a small helper and forward both `PPL_REX_MAX_MATCH_LIMIT` and `PPL_SYNTAX_LEGACY_PREFERRED`. Future keys can be added with a one-liner.

Fixes `CalciteTopCommandIT.testTopCommandLegacyFalse` and `CalciteRareCommandIT.testRareCommandLegacyFalse` against the analytics route (`-Dtests.analytics.force_routing=true`).

Signed-off-by: Kai Huang <huangkaics@gmail.com>

… order

`CalciteRelNodeVisitor.visitRareTopN` lowers `rare`/`top` to a `ROW_NUMBER() OVER (PARTITION BY ... ORDER BY count [DESC])` window. With only the count column in the ORDER BY clause, ties at the same count were resolved by the upstream operator's insertion order, which differed between backends (in-process Calcite vs. analytics-engine vs. Lucene pushdown). On the analytics-engine route, `testRareWithGroup` failed because ROW_NUMBER picked NV at count=8 while the test expected AR.

Append the rare/top field columns as secondary ASC keys so ties resolve alphabetically and deterministically across backends. This matches the behavior of the existing OpenSearch terms-aggregation pushdown, which tie-breaks on `_key:asc`.

Update `RareTopPushdownRule` to accept the new shape: 1 or 2 order keys, where the (optional) second key must be the rare/top target field in ASC direction. The pushdown's wire payload is unchanged.

Update the matching unit-test expectations in `CalcitePPLRareTopNTest` (11 RelNode/result/SparkSQL strings) and 5 explain YAML fixtures.

Fixes `CalciteRareCommandIT.testRareWithGroup` against the analytics route (and removes the same class of tie-break flakiness across other rare/top tests).

Signed-off-by: Kai Huang <huangkaics@gmail.com>

github-actions · 2026-05-11T23:32:07Z

PR Reviewer Guide 🔍

(Review updated until commit `59c9f7d`)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ No major issues detected

github-actions · 2026-05-11T23:32:32Z

PR Code Suggestions ✨

Latest suggestions up to 59c9f7d
Explore these optional code suggestions:

Category: Possible issue
Suggestion: Prevent potential NoSuchElementException
Impact: Medium

The code assumes `tieBreakIndices` will always produce exactly one element when mapped to `tieBreakList`, but doesn't validate this before calling `getFirst()`. If `tieBreakIndices` is empty, `tieBreakList.getFirst()` will throw `NoSuchElementException`. Add an explicit size check before accessing the first element.

opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/RareTopPushdownRule.java [74-84]

```diff
 if (orderKeys.size() == 2) {
   RexFieldCollation tieBreakKey = orderKeys.get(1);
   if (tieBreakKey.getDirection() != RelFieldCollation.Direction.ASCENDING) {
     return;
   }
   List<Integer> tieBreakIndices = PlanUtils.getSelectColumns(List.of(tieBreakKey.getKey()));
+  if (tieBreakIndices.isEmpty()) {
+    return;
+  }
   List<String> tieBreakList = tieBreakIndices.stream().map(fieldNameList::get).toList();
   if (tieBreakList.size() != 1 || !tieBreakList.getFirst().equals(targetName)) {
     return;
   }
 }
```

Suggestion importance[1-10]: 7

Why: The suggestion correctly identifies that `tieBreakIndices` could theoretically be empty, which would cause `tieBreakList.getFirst()` to throw `NoSuchElementException`. Adding an explicit check for `tieBreakIndices.isEmpty()` before proceeding improves defensive programming and prevents potential runtime errors, though the likelihood of this occurring in practice may be low given the surrounding validation logic.

Category: General
Suggestion: Clarify list capacity calculation
Impact: Low

The `orderKeys` list is initialized with capacity `tieBreakKeys.size() + 1`, but then `countField` is added first followed by all `tieBreakKeys`. This means the list will contain `tieBreakKeys.size() + 1` elements, matching the initial capacity. However, if `tieBreakKeys` is empty, the capacity is 1 which is correct. Consider verifying that `fieldList` is never null to prevent potential NPE in `rexVisitor.analyze()`.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [3044-3047]

```diff
 List<RexNode> tieBreakKeys = rexVisitor.analyze(fieldList, context);
-List<RexNode> orderKeys = new ArrayList<>(tieBreakKeys.size() + 1);
+List<RexNode> orderKeys = new ArrayList<>(1 + tieBreakKeys.size());
 orderKeys.add(countField);
 orderKeys.addAll(tieBreakKeys);
```

Suggestion importance[1-10]: 3

Why: The suggestion proposes reordering the capacity calculation from `tieBreakKeys.size() + 1` to `1 + tieBreakKeys.size()`, which is mathematically equivalent and doesn't change behavior. While the suggestion mentions verifying `fieldList` is not null, this is not reflected in the `improved_code`. The change offers minimal value.

Previous suggestions

Suggestions up to commit f6b8b98

Category: General
Suggestion: Optimize ArrayList initial capacity
Impact: Low

Pre-allocate the `ArrayList` with the correct initial capacity to avoid potential resizing. The size should be `tieBreakKeys.size() + 1` to accommodate both the count field and all tie-break keys.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [3045-3047]

```diff
 List<RexNode> tieBreakKeys = rexVisitor.analyze(fieldList, context);
-List<RexNode> orderKeys = new ArrayList<>(tieBreakKeys.size() + 1);
+List<RexNode> orderKeys = new ArrayList<>(1 + tieBreakKeys.size());
 orderKeys.add(countField);
 orderKeys.addAll(tieBreakKeys);
```

Suggestion importance[1-10]: 3

Why: The suggestion correctly identifies a minor optimization opportunity by pre-allocating the `ArrayList` with the exact capacity needed. However, the existing code already does this (`new ArrayList<>(tieBreakKeys.size() + 1)`), making the suggested change (`1 + tieBreakKeys.size()`) functionally identical and offering no actual improvement.

Category: General
Suggestion: Add empty list check before access
Impact: Low

The validation logic for the two-key case should verify that `tieBreakList` is not empty before calling `getFirst()`. Although `tieBreakList.size() != 1` would catch an empty list, explicitly checking prevents potential `NoSuchElementException` if the size check is modified in the future.

opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/RareTopPushdownRule.java [74-84]

```diff
 if (orderKeys.size() == 2) {
   RexFieldCollation tieBreakKey = orderKeys.get(1);
   if (tieBreakKey.getDirection() != RelFieldCollation.Direction.ASCENDING) {
     return;
   }
   List<Integer> tieBreakIndices = PlanUtils.getSelectColumns(List.of(tieBreakKey.getKey()));
   List<String> tieBreakList = tieBreakIndices.stream().map(fieldNameList::get).toList();
-  if (tieBreakList.size() != 1 || !tieBreakList.getFirst().equals(targetName)) {
+  if (tieBreakList.isEmpty() || tieBreakList.size() != 1 || !tieBreakList.getFirst().equals(targetName)) {
     return;
   }
 }
```

Suggestion importance[1-10]: 2

Why: The suggestion proposes adding an `isEmpty()` check before accessing `tieBreakList.getFirst()`. However, the existing condition `tieBreakList.size() != 1` already handles the empty list case (size 0), making the additional check redundant and offering minimal defensive value.

The stable tie-break added in the previous commit appends the rare/top field columns as secondary ASC keys to the `ROW_NUMBER` `ORDER BY`. ASC ordering uses NULLS LAST by default, so the existing `usenull=true email` examples in `docs/user/ppl/cmd/rare.md` and `docs/user/ppl/cmd/top.md` now emit `null` last instead of first. Update the doctest expected output blocks accordingly. No behavior change for the non-null rows.

Signed-off-by: Kai Huang <huangkaics@gmail.com>
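The null placement change can be reproduced with `java.util.Comparator.nullsLast`, which mirrors the ASC NULLS LAST default described in this commit message. This is a minimal sketch: `NullsLastSketch` and `tieBreakOrder` are hypothetical names, and the sample emails are illustrative, not the doctest data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class NullsLastSketch {

  // Orders field values that share the same count the way the tie-break key does:
  // ascending, with null placed after every non-null value (NULLS LAST).
  static List<String> tieBreakOrder(List<String> values) {
    List<String> sorted = new ArrayList<>(values);
    sorted.sort(Comparator.nullsLast(Comparator.<String>naturalOrder()));
    return sorted;
  }

  public static void main(String[] args) {
    // Illustrative values; Arrays.asList is used because List.of rejects null.
    List<String> emails = Arrays.asList(null, "b@example.com", "a@example.com");
    System.out.println(tieBreakOrder(emails)); // [a@example.com, b@example.com, null]
  }
}
```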

github-actions · 2026-05-11T23:45:04Z

Persistent review updated to latest commit 59c9f7d

ahkcs added 2 commits May 11, 2026 16:25

ahkcs requested review from LantaoJin, RyanL1997, Swiddis, acarbonetto, anirudha, dai-chen, joshuali925, mengweieric, noCharger, penghuo, ps48, qianheng-aws, songkant-aws, vamsimanohar, ykmr1224 and yuancu as code owners May 11, 2026 23:31

ahkcs changed the title ~~Close out-of-scope gaps from top/rare analytics-engine wiring~~ Close gaps from top/rare analytics-engine wiring May 11, 2026

ahkcs mentioned this pull request May 11, 2026

[Analytics Engine] Wire window-function support for PPL top / rare opensearch-project/OpenSearch#21593 (Open, 3 tasks)

ahkcs wants to merge 3 commits into opensearch-project:main from ahkcs:fix/toprare-out-of-scope
