Fix CalcitePPLAggregationIT on the analytics-engine route (parquet testSimpleCount0 + APPROX_COUNT_DISTINCT name) #5525
Merged
ahkcs merged 2 commits on Jun 8, 2026
Persistent review updated to latest commit ce2a02a
sandeshkr419
approved these changes
Jun 8, 2026
A bare auto-created index isn't composite/parquet-backed, so it is not routed to the analytics engine even when the analytics-engine route is enabled. Switch to TEST_INDEX_BANK (7 docs; loaded via loadIndex, which injects parquet settings when the flag is set) so the test is meaningful on both routes. Diagnosis by Sandesh Kumar.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
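The routing behaviour this commit message describes can be pictured with a minimal sketch. Everything below is hypothetical: `RoutingSketch`, the `sketch.parquet_backed` key, and `loadIndexSettings` are invented for illustration and are not the real `RestUnifiedQueryAction.isAnalyticsIndex` or `loadIndex` implementations. It only shows why an index created without the injected settings would stay on the default route.

```java
import java.util.Map;

// Hypothetical stand-in for the routing decision; not the plugin's actual code.
final class RoutingSketch {
    // Assumed setting key, invented for this sketch only.
    static final String PARQUET_FLAG = "sketch.parquet_backed";

    // True only when the index settings carry the (assumed) parquet marker.
    static boolean isAnalyticsIndex(Map<String, String> indexSettings) {
        return Boolean.parseBoolean(indexSettings.getOrDefault(PARQUET_FLAG, "false"));
    }

    // Mimics a loader that injects the marker only when the test flag is set.
    static Map<String, String> loadIndexSettings(boolean parquetFlagSet) {
        return parquetFlagSet ? Map.of(PARQUET_FLAG, "true") : Map.of();
    }

    // Picks the route from the index settings.
    static String route(Map<String, String> indexSettings) {
        return isAnalyticsIndex(indexSettings) ? "analytics-engine" : "default";
    }
}
```

Under these assumptions, an index with empty settings (the bare auto-created case) goes to the default route regardless of the flag, while an index loaded through the flag-aware loader reaches the analytics engine.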
distinct_count_approx() failed to bind on the analytics-engine (DataFusion) route because the SqlAggFunction was named DISTINCT_COUNT_APPROX; the backend resolves aggregates by the Calcite/Substrait-standard name APPROX_COUNT_DISTINCT. The Java field name and PPL function name are unchanged. The OpenSearch V3 path is unaffected (it overrides this via the external HLL registration). Analytics-route binding is completed by opensearch-project/OpenSearch#22013. Per Sandesh Kumar.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
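As a rough illustration of why the operator name matters, here is a minimal sketch of name-based aggregate binding. `AggregateNameSketch` and its lookup table are hypothetical and do not reflect the backend's actual code; only the `APPROX_COUNT_DISTINCT` to `approx_distinct` pairing comes from the PR description.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical model of a backend that binds aggregates purely by operator name.
final class AggregateNameSketch {
    // Assumed lookup table: standard operator name -> backend function name.
    static final Map<String, String> BACKEND_AGGREGATES =
        Map.of("APPROX_COUNT_DISTINCT", "approx_distinct");

    // Returns the backend function for an operator name, or empty if it cannot bind.
    static Optional<String> bind(String operatorName) {
        return Optional.ofNullable(BACKEND_AGGREGATES.get(operatorName));
    }
}
```

In this model, `bind("DISTINCT_COUNT_APPROX")` returns an empty result because the table has no entry under that name, while `bind("APPROX_COUNT_DISTINCT")` resolves to `approx_distinct`.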
ce2a02a to 2a8ca85
Persistent review updated to latest commit 2a8ca85
Swiddis
approved these changes
Jun 8, 2026
Description
Two small fixes that make `CalcitePPLAggregationIT` pass on the analytics-engine route (`-Dtests.analytics.parquet_indices=true`). Both are no-ops on the v2 / Calcite path.

1. `testSimpleCount0`: use a parquet-backed index. The test built an ad-hoc `test` index with raw `_doc` PUTs; a bare auto-created index isn't composite/parquet-backed, so `RestUnifiedQueryAction.isAnalyticsIndex` doesn't route it to the analytics engine. Switched to `TEST_INDEX_BANK` (loaded in `init()` via `loadIndex`, which injects the parquet settings when the flag is set; 7 docs), and dropped the now-unused `Request` import. Verified passing on both routes.

2. `distinct_count_approx`: emit the Substrait-standard operator name. `PPLBuiltinOperators.DISTINCT_COUNT_APPROX` created its `SqlAggFunction` with the runtime-resolution name `"DISTINCT_COUNT_APPROX"`. The analytics-engine (DataFusion) backend resolves aggregates by the Calcite/Substrait-standard name `APPROX_COUNT_DISTINCT`, so `distinct_count_approx()` failed to bind on the analytics route. Emit `APPROX_COUNT_DISTINCT` instead (the Java field name and PPL function name are unchanged).

The OpenSearch V3 / Lucene path is unaffected: it overrides this operator via the external HyperLogLog registration in `OpenSearchExecutionEngine` (whose name is unchanged), so explain output and execution on that path are byte-identical. This is verified by the unchanged `explain_*distinct_count*` / `explain_*dc*` expected-output files and the `CalciteExplainIT` distinct-count tests still passing. The analytics-route binding (`APPROX_COUNT_DISTINCT` → DataFusion `approx_distinct`) is completed by opensearch-project/OpenSearch#22013.

Testing

- `testSimpleCount0`
- `CalciteExplainIT` distinct-count/dc explain (5)
- `testCountDistinctApprox` / `…WithAlias` ¹

¹ The operator rename is verified to be a no-op on v2 (external HLL registration takes precedence). The analytics-route pass for `distinct_count_approx` is completed by OpenSearch#22013 (already merged on `main`); local verification of that leg requires an analytics cluster built on a base that includes #22013.

Diagnosis / analytics-engine side by Sandesh Kumar.
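The footnote's claim that the rename is a no-op when an external registration takes precedence can be sketched as a two-level lookup. This is a simplified model under stated assumptions: `OperatorTableSketch` and the name `EXTERNAL_HLL_IMPL` are hypothetical placeholders, not the actual registration in `OpenSearchExecutionEngine`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical operator table in which external registrations shadow built-ins.
final class OperatorTableSketch {
    private final Map<String, String> builtin = new HashMap<>();
    private final Map<String, String> external = new HashMap<>();

    // Registers the built-in operator name for a PPL function.
    void registerBuiltin(String pplFunction, String operatorName) {
        builtin.put(pplFunction, operatorName);
    }

    // Registers an external implementation that takes precedence over the built-in.
    void registerExternal(String pplFunction, String operatorName) {
        external.put(pplFunction, operatorName);
    }

    // External entries win; the built-in name is only a fallback.
    String resolve(String pplFunction) {
        String op = external.get(pplFunction);
        return op != null ? op : builtin.get(pplFunction);
    }
}
```

With an external entry registered for `distinct_count_approx`, changing the built-in name from `DISTINCT_COUNT_APPROX` to `APPROX_COUNT_DISTINCT` leaves the result of `resolve` unchanged. Only a path without the external entry would observe the new name.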
Check List
--signoff.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.