[Calcite] Make containsNestedAggregator tolerate flat-leaf schemas by mch2 · Pull Request #5423 · opensearch-project/sql

mch2 · 2026-05-08T16:59:10Z

Description

The check splits each aggregator argument on '.' and looks up the
leading token as a top-level ARRAY column (the OpenSearch nested
marker). It used relBuilder.field(root), which throws when the column
doesn't exist.

That works for the classic path, which always exposes a top-level
column for object/nested parents. It breaks the analytics-engine
path, which emits only flat leaves ("city.name", "city.location.latitude")
— parent placeholders can't round-trip through Substrait. Any
aggregation against an object-field leaf (e.g. stats max(city.location.latitude))
crashed with "Field [city] not found".

Use RelDataType.getField, which returns null on miss. A null lookup
is the correct "not nested" answer. Behavior unchanged for relations
that do have a top-level ARRAY column.

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

New functionality includes testing.
New functionality has been documented.
New functionality has javadoc added.
New functionality has a user manual doc added.
New PPL command checklist all confirmed.
API changes companion pull request created.
Commits are signed per the DCO using --signoff or -s.
Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

The check splits each aggregator argument on '.' and looks up the leading token as a top-level ARRAY column (the OpenSearch `nested` marker). It used relBuilder.field(root), which throws when the column doesn't exist. That works for the classic path, which always exposes a top-level column for object/nested parents. It breaks the analytics-engine path, which emits only flat leaves ("city.name", "city.location.latitude") — parent placeholders can't round-trip through Substrait. Any aggregation against an object-field leaf (e.g. stats max(city.location.latitude)) crashed with "Field [city] not found". Use RelDataType.getField, which returns null on miss. A null lookup is the correct "not nested" answer. Behavior unchanged for relations that do have a top-level ARRAY column. Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

github-actions · 2026-05-08T17:00:41Z

PR Reviewer Guide 🔍

(Review updated until commit `ba80518`)

Here are some key observations to aid the review process:

🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ No major issues detected

github-actions · 2026-05-08T17:00:54Z

PR Code Suggestions ✨

Latest suggestions up to ba80518
Explore these optional code suggestions:

Category	Suggestion	Impact
Possible issue	Add index bounds validation Add bounds checking before accessing the field name by index. If `r.getIndex()` exceeds the size of `getFieldNames()`, this will throw an `IndexOutOfBoundsException`. Validate the index is within bounds before accessing. core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [1551-1559] return aggCallRefs.stream() + .filter(r -> r.getIndex() < rowType.getFieldNames().size()) .map(r -> rowType.getFieldNames().get(r.getIndex())) .map(name -> org.apache.commons.lang3.StringUtils.substringBefore(name, ".")) .anyMatch( root -> { RelDataTypeField field = rowType.getField(root, /* caseSensitive= / true, / elideRecord= */ false); return field != null && field.getType().getSqlTypeName() == SqlTypeName.ARRAY; }); Suggestion importance[1-10]: 7 __ Why: The suggestion correctly identifies a potential `IndexOutOfBoundsException` risk when accessing `rowType.getFieldNames().get(r.getIndex())`. Adding bounds checking is a valid defensive programming practice, though the likelihood of this occurring depends on the upstream data integrity guarantees.	Medium

Previous suggestions

Suggestions up to commit 772be11

Category	Suggestion	Impact
Possible issue	Add index bounds validation Add bounds checking before accessing the field name by index. If `r.getIndex()` exceeds the size of `getFieldNames()`, this will throw an `IndexOutOfBoundsException`. Consider validating the index or using a safer access pattern. core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [1551-1557] return aggCallRefs.stream() + .filter(r -> r.getIndex() < rowType.getFieldCount()) .map(r -> rowType.getFieldNames().get(r.getIndex())) .map(name -> org.apache.commons.lang3.StringUtils.substringBefore(name, ".")) .anyMatch(root -> { RelDataTypeField field = rowType.getField(root, /caseSensitive=/ true, /elideRecord=/ false); return field != null && field.getType().getSqlTypeName() == SqlTypeName.ARRAY; }); Suggestion importance[1-10]: 7 __ Why: The suggestion correctly identifies a potential `IndexOutOfBoundsException` risk when accessing `rowType.getFieldNames().get(r.getIndex())`. Adding bounds checking with `filter(r -> r.getIndex() < rowType.getFieldCount())` is a valid defensive programming practice that improves robustness, though the likelihood of this occurring in practice depends on the upstream data integrity.	Medium

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

github-actions · 2026-05-08T17:55:05Z

Persistent review updated to latest commit ba80518

@ignore

Single squashed delivery of the long-running feature/mustang-ppl-integration branch into main, consolidating 22 feature-branch PRs plus the conflict-resolved merge of current main. Squashed because the feature branch's history includes commits with missing or mismatched Signed-off-by trailers that block DCO at this scope — the equivalent issue documented for the catch-up squashes (opensearch-project#5397). The feature branch f006e29 is retained for individual-commit lineage. ### What this delivers Analytics-engine PPL integration — a new execution path that routes Parquet-backed (non-Lucene) indices through an analytics engine while keeping Lucene-backed indices on the existing v2 / Calcite paths. Headline pieces: - Query routing (opensearch-project#5267) — PPL queries against Parquet-backed indices hand off to the analytics-engine execution path; Lucene-backed indices continue through the legacy path - Explain support (opensearch-project#5275) — EXPLAIN covers the analytics-engine path - Profiling + UnifiedQueryParser (opensearch-project#5285) — migrates PPL parsing to the unified parser and wires profiling metrics through the analytics path - extendedPlugins wiring (opensearch-project#5302) — analytics-engine attaches as an OpenSearch extension via SPI - SQL REST endpoint integration (opensearch-project#5317) — same analytics-route fork applied to the SQL transport, plus delegateToV2Engine extraction in RestSqlAction - Async QueryPlanExecutor (opensearch-project#5396) — async execution for analytics-engine plans + version bump to OpenSearch 3.7 - Optional dependency (opensearch-project#5403) — analytics-engine becomes an optional runtime dep so the SQL bundle is shippable without it - Index-setting-based routing (opensearch-project#5429) — replaces the earlier table-name-prefix heuristic with an authoritative index-setting check Supporting infrastructure: - Gradle wrapper bump to 9.4.1 (opensearch-project#5406) - Jar-hell exclusions for arrow-flight-rpc / httpcore5-h2 / httpcore5-reactive / httpclient5 (opensearch-project#5400, opensearch-project#5409) - IT plumbing: CalciteEvalCommandIT / CalciteFieldFormatCommandIT carried through the helper-managed index path (opensearch-project#5407, opensearch-project#5417); CalciteReplaceCommandIT column-order-agnostic (opensearch-project#5415); @ignore'd Calcite ITs dropped from CalciteNoPushdownIT (opensearch-project#5416) - plugins.calcite.enabled=true defaulted on the unified query path (opensearch-project#5413) - PPL_REX_MAX_MATCH_LIMIT bridged into UnifiedQueryContext (opensearch-project#5418) - Calcite tolerance fixes: array() default type (opensearch-project#5421), containsNestedAggregator flat-leaf schemas (opensearch-project#5423) - Sandbox deps switched to analytics-api JDK 21 surface (opensearch-project#5426) ### Feature-branch commits squashed (22) opensearch-project#5429, opensearch-project#5426, opensearch-project#5423, opensearch-project#5421, opensearch-project#5418, opensearch-project#5403, opensearch-project#5417, opensearch-project#5415, opensearch-project#5416, opensearch-project#5413, opensearch-project#5407, opensearch-project#5409, opensearch-project#5406, opensearch-project#5400, opensearch-project#5396, opensearch-project#5317, opensearch-project#5302, opensearch-project#5285, opensearch-project#5275, opensearch-project#5267, opensearch-project#5397, opensearch-project#5286 ### Main commits absorbed via the merge (54) Brings the branch up to current upstream/main (54 commits since the last catch-up at opensearch-project#5397, divergence point 513e1b2). Highlights: opensearch-project#5419, opensearch-project#5408, opensearch-project#5414, opensearch-project#5399, opensearch-project#5394, opensearch-project#5361, opensearch-project#5360, opensearch-project#5240, opensearch-project#5266, opensearch-project#5278, plus 44 others (bugfixes, doc updates, infra). ### Conflict resolutions (7) Resolved during the merge of main into the feature branch. Resolution kept the feature branch's analytics-engine-path semantics where main's changes would have regressed them. - api/.../UnifiedQueryContext.java Blank-line-only conflict; took main's tighter formatting. - core/.../executor/QueryService.java Kept feature's CalciteClassLoaderHelper.withCalciteClassLoader(...) wrapping (required for analytics-engine classloader isolation) and the matching import. - integ-test/build.gradle Kept feature's detailed root-cause comment on the Gradle 9.4.1 TestEventReporterAsListener workaround; kept ASCII ordering of JSONRequestIT / JoinIT and SQLFunctionsIT / ShowIT / SourceFieldIT entries. - integ-test/.../CalciteEvalCommandIT.java Kept feature's if (!TestUtils.isIndexExist(...)) idempotency guards on test_eval and test_eval_agent setup (needed for the helper-managed index analytics-engine compatibility run). - legacy/.../RestSqlAction.java Kept feature's delegateToV2Engine(...) (extracted from the analytics-engine routing path). Both sides added handleException / getRestStatus / getRawErrorCode; removed the duplicate set git produced. - plugin/.../SQLPlugin.java Took the union of imports: ExecutionEngine + ExecutionEngine.ExplainResponse + QueryType. - plugin/.../transport/TransportPPLQueryAction.java Combined main's OpenSearchPluginModule(extensionsHolder.engines()) and feature's local pluginSettings / pluginSettingsRef wiring. EngineExtensionsHolder.java is a new file from main (opensearch-project#5298) preserved as-is. ### Compatibility / opt-in The analytics-engine path is gated by the extendedPlugins extension being installed (opensearch-project#5403 makes the dep optional). Clusters without analytics-engine installed see no behavior change. Clusters with analytics-engine installed route only Parquet-backed indices through the new path (opensearch-project#5429 — by index setting). ### Verification - ./gradlew :api:compileJava :core:compileJava :legacy:compileJava :opensearch-sql-plugin:compileJava :integ-test:compileTestJava passes locally Signed-off-by: Kai Huang <ahkcs@amazon.com>

@ignore

Single squashed delivery of the long-running feature/mustang-ppl-integration branch into main, consolidating 22 feature-branch PRs plus the conflict-resolved merge of current main. Squashed because the feature branch's history includes commits with missing or mismatched Signed-off-by trailers that block DCO at this scope — the equivalent issue documented for the catch-up squashes (opensearch-project#5397). The feature branch f006e29 is retained for individual-commit lineage. ### What this delivers Analytics-engine PPL integration — a new execution path that routes Parquet-backed (non-Lucene) indices through an analytics engine while keeping Lucene-backed indices on the existing v2 / Calcite paths. Headline pieces: - Query routing (opensearch-project#5267) — PPL queries against Parquet-backed indices hand off to the analytics-engine execution path; Lucene-backed indices continue through the legacy path - Explain support (opensearch-project#5275) — EXPLAIN covers the analytics-engine path - Profiling + UnifiedQueryParser (opensearch-project#5285) — migrates PPL parsing to the unified parser and wires profiling metrics through the analytics path - extendedPlugins wiring (opensearch-project#5302) — analytics-engine attaches as an OpenSearch extension via SPI - SQL REST endpoint integration (opensearch-project#5317) — same analytics-route fork applied to the SQL transport, plus delegateToV2Engine extraction in RestSqlAction - Async QueryPlanExecutor (opensearch-project#5396) — async execution for analytics-engine plans + version bump to OpenSearch 3.7 - Optional dependency (opensearch-project#5403) — analytics-engine becomes an optional runtime dep so the SQL bundle is shippable without it - Index-setting-based routing (opensearch-project#5429, opensearch-project#5432) — replaces the earlier table-name-prefix heuristic with an authoritative index-setting check Supporting infrastructure: - Gradle wrapper bump to 9.4.1 (opensearch-project#5406) - Jar-hell exclusions for arrow-flight-rpc / httpcore5-h2 / httpcore5-reactive / httpclient5 (opensearch-project#5400, opensearch-project#5409) - IT plumbing: CalciteEvalCommandIT / CalciteFieldFormatCommandIT carried through the helper-managed index path (opensearch-project#5407, opensearch-project#5417); CalciteReplaceCommandIT column-order-agnostic (opensearch-project#5415); @ignore'd Calcite ITs dropped from CalciteNoPushdownIT (opensearch-project#5416) - plugins.calcite.enabled=true defaulted on the unified query path (opensearch-project#5413) - PPL_REX_MAX_MATCH_LIMIT bridged into UnifiedQueryContext (opensearch-project#5418) - Calcite tolerance fixes: array() default type (opensearch-project#5421), containsNestedAggregator flat-leaf schemas (opensearch-project#5423) - Sandbox deps switched to analytics-api JDK 21 surface (opensearch-project#5426) ### Feature-branch commits squashed (22) opensearch-project#5432, opensearch-project#5429, opensearch-project#5426, opensearch-project#5423, opensearch-project#5421, opensearch-project#5418, opensearch-project#5403, opensearch-project#5417, opensearch-project#5415, opensearch-project#5416, opensearch-project#5413, opensearch-project#5407, opensearch-project#5409, opensearch-project#5406, opensearch-project#5400, opensearch-project#5396, opensearch-project#5317, opensearch-project#5302, opensearch-project#5285, opensearch-project#5275, opensearch-project#5267, opensearch-project#5397, opensearch-project#5286 ### Main commits absorbed via the merge (54) Brings the branch up to current upstream/main (54 commits since the last catch-up at opensearch-project#5397, divergence point 513e1b2). Highlights: opensearch-project#5419, opensearch-project#5408, opensearch-project#5414, opensearch-project#5399, opensearch-project#5394, opensearch-project#5361, opensearch-project#5360, opensearch-project#5240, opensearch-project#5266, opensearch-project#5278, plus 44 others (bugfixes, doc updates, infra). ### Conflict resolutions (7) Resolved during the merge of main into the feature branch. Resolution kept the feature branch's analytics-engine-path semantics where main's changes would have regressed them. - api/.../UnifiedQueryContext.java Blank-line-only conflict; took main's tighter formatting. - core/.../executor/QueryService.java Kept feature's CalciteClassLoaderHelper.withCalciteClassLoader(...) wrapping (required for analytics-engine classloader isolation) and the matching import. - integ-test/build.gradle Kept feature's detailed root-cause comment on the Gradle 9.4.1 TestEventReporterAsListener workaround; kept ASCII ordering of JSONRequestIT / JoinIT and SQLFunctionsIT / ShowIT / SourceFieldIT entries. - integ-test/.../CalciteEvalCommandIT.java Kept feature's if (!TestUtils.isIndexExist(...)) idempotency guards on test_eval and test_eval_agent setup (needed for the helper-managed index analytics-engine compatibility run). - legacy/.../RestSqlAction.java Kept feature's delegateToV2Engine(...) (extracted from the analytics-engine routing path). Both sides added handleException / getRestStatus / getRawErrorCode; removed the duplicate set git produced. - plugin/.../SQLPlugin.java Took the union of imports: ExecutionEngine + ExecutionEngine.ExplainResponse + QueryType. - plugin/.../transport/TransportPPLQueryAction.java Combined main's OpenSearchPluginModule(extensionsHolder.engines()) and feature's local pluginSettings / pluginSettingsRef wiring. EngineExtensionsHolder.java is a new file from main (opensearch-project#5298) preserved as-is. ### Compatibility / opt-in The analytics-engine path is gated by the extendedPlugins extension being installed (opensearch-project#5403 makes the dep optional). Clusters without analytics-engine installed see no behavior change. Clusters with analytics-engine installed route only Parquet-backed indices through the new path (opensearch-project#5429 — by index setting). ### Verification - ./gradlew :api:compileJava :core:compileJava :legacy:compileJava :opensearch-sql-plugin:compileJava :integ-test:compileTestJava passes locally Signed-off-by: Kai Huang <ahkcs@amazon.com> Co-authored-by: bowenlan-amzn <bowenlan23@gmail.com>

mch2 requested review from LantaoJin, RyanL1997, Swiddis, acarbonetto, ahkcs, anirudha, dai-chen, joshuali925, mengweieric, noCharger, penghuo, ps48, qianheng-aws, songkant-aws, vamsimanohar, ykmr1224 and yuancu as code owners May 8, 2026 16:59

fix spotless

ba80518

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

ahkcs approved these changes May 8, 2026

View reviewed changes

penghuo approved these changes May 8, 2026

View reviewed changes

penghuo added the PPL Piped processing language label May 8, 2026

ahkcs merged commit 109375d into opensearch-project:feature/mustang-ppl-integration May 8, 2026
33 of 38 checks passed

This was referenced May 11, 2026

Land analytics-engine PPL integration into main #5431

Closed

Land analytics-engine PPL integration into main #5430

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Calcite] Make containsNestedAggregator tolerate flat-leaf schemas#5423

[Calcite] Make containsNestedAggregator tolerate flat-leaf schemas#5423
ahkcs merged 2 commits into
opensearch-project:feature/mustang-ppl-integrationfrom
mch2:feature/mustang-ppl-integration

mch2 commented May 8, 2026

Uh oh!

github-actions Bot commented May 8, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 8, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 8, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

mch2 commented May 8, 2026

Description

Related Issues

Check List

Uh oh!

github-actions Bot commented May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Reviewer Guide 🔍

(Review updated until commit ba80518)

Uh oh!

github-actions Bot commented May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Previous suggestions

Uh oh!

github-actions Bot commented May 8, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

github-actions Bot commented May 8, 2026 •

edited

Loading

(Review updated until commit `ba80518`)

github-actions Bot commented May 8, 2026 •

edited

Loading