Carry CalciteFieldFormatCommandIT through the helper-managed index path #5417
Same shape as opensearch-project#5407 for CalciteEvalCommandIT. The IT's init() previously created the in-test test_eval index via direct `PUT /test_eval/_doc/N` requests, relying on dynamic mapping. Two problems:

1. The doc PUTs auto-create the index with whatever settings the cluster defaults to. The analytics-engine compatibility path (force-routing on; tests.analytics.parquet_indices=true) needs parquet-backed indices, which TestUtils.createIndexByRestClient applies via TestUtils.makeParquetBacked when the system property is set. Direct PUTs sidestep that helper, so test_eval lands as Lucene-backed and the analytics planner rejects it with "No backend can scan all requested fields on index [test_eval]". All four working tests fail at execution.
2. init() runs before every @Test method. The doc PUTs are doc-level idempotent, so re-running was wasteful but not failing. Once we switch to createIndexByRestClient, the index-level PUT is no longer idempotent and re-running throws "resource_already_exists_exception".

Both addressed in one change:

- test_eval is created via TestUtils.createIndexByRestClient with an explicit mapping (name/title=keyword, age=long). The helper honours tests.analytics.parquet_indices=true and produces a parquet-backed index for the analytics-engine sweep; on the v2 path the helper is a no-op around the index PUT, so behaviour is unchanged.
- The whole init body is guarded by TestUtils.isIndexExist, the same idempotency idiom that loadIndex uses for predefined fixtures. First @Test method provisions; subsequent methods skip.

Also pins the projection order on testFieldFormatStringConcatenation. The original query (`source=test_eval | fieldformat greeting = 'Hello ' + name`) had no `| fields` clause and relied on the implicit projection's column order: v2 returns Lucene-source insertion order, analytics returns parquet-storage order (alphabetical), so the assertion only matched on v2 by coincidence. Adding `| fields name, title, age, greeting` makes the assertion deterministic across paths; the existing expected rows (`rows("Alice", "Engineer", 25, "Hello Alice")`) already match this order, so v2 behaviour is preserved. The other four tests already had explicit `| fields ...` clauses, so no change there.

No semantic change for the v2 path: the explicit mapping types (keyword, long, keyword) resolve to the same PPL types ("string", "bigint", "string") that dynamic mapping inferred, and fieldformat reads from _source either way.

Analytics-route compatibility goes from 1/5 to 4/5 (verified locally against a runTask cluster with analytics-engine + opensearch-sql-plugin). The remaining `testFieldFormatStringConcatenationWithNullFieldToString` needs a `tostring()` UDF on the analytics path, a multi-mode UDF (binary / hex / commas / duration) tracked separately as out of scope.

Test plan:

- ./gradlew :integ-test:integTest --tests 'org.opensearch.sql.calcite.remote.CalciteFieldFormatCommandIT' -> 5/5 green (v2 path, no regression).
- ./gradlew :integ-test:analyticsCompatibilityTest --tests 'org.opensearch.sql.calcite.remote.CalciteFieldFormatCommandIT' -> 4/5 pass; the 5th fails on `tostring`'s missing capability registration, which is the documented out-of-scope category.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
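The idempotency point in the commit message above (an index-level create fails on the second call, so a per-test `init()` needs an existence guard) can be sketched in isolation. This is a minimal in-memory illustration, not the repository's code: `IdempotentInitSketch`, `INDICES`, `indexExists`, and `createIndex` are hypothetical stand-ins for the cluster and for helpers such as TestUtils.isIndexExist and TestUtils.createIndexByRestClient, whose real signatures are not shown in this PR. The mapping string only mirrors the field types named above (name/title=keyword, age=long).

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory sketch of the "guard index creation" idiom; not the real TestUtils. */
final class IdempotentInitSketch {

  /** Hypothetical stand-in for a cluster: index name -> mapping JSON. */
  static final Map<String, String> INDICES = new HashMap<>();

  /** Counts how many times provisioning actually ran. */
  static int provisionCount = 0;

  /** Hypothetical stand-in for an existence check on the cluster. */
  static boolean indexExists(String name) {
    return INDICES.containsKey(name);
  }

  /** Hypothetical stand-in for an index-level create: it fails if the index already exists. */
  static void createIndex(String name, String mapping) {
    if (INDICES.containsKey(name)) {
      throw new IllegalStateException("resource_already_exists_exception: " + name);
    }
    INDICES.put(name, mapping);
  }

  /** Mirrors an init() that runs before every test method. */
  static void init() {
    if (indexExists("test_eval")) {
      return; // later test methods skip provisioning
    }
    String mapping =
        "{\"mappings\":{\"properties\":{"
            + "\"name\":{\"type\":\"keyword\"},"
            + "\"title\":{\"type\":\"keyword\"},"
            + "\"age\":{\"type\":\"long\"}}}}";
    createIndex("test_eval", mapping);
    provisionCount++;
  }

  public static void main(String[] args) {
    init(); // first test method provisions
    init(); // second call returns early instead of throwing
    System.out.println("provisioned " + provisionCount + " time(s)");
  }
}
```

Without the early return, the second `init()` call would reach `createIndex` again and throw, which is the failure mode the guard is meant to avoid.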
Merged commit 7da02d5 into opensearch-project:feature/mustang-ppl-integration
…ation

Brings the catch-up branch up to current upstream/main (4 commits since this PR was opened) and current feature/mustang-ppl-integration (9 commits since this PR was opened), so the PR is mergeable into feature/mustang-ppl-integration without conflicts.

Squashed (rather than two real merge commits) for the same DCO reason the original commit was squashed: upstream commits authored by many contributors with inconsistent or missing Signed-off-by trailers would otherwise be brought into this PR's history.

Newer main commits absorbed (4):

- opensearch-project#5419 (LENGTH/REGEXP_REPLACE/DATE_TRUNC unified function spec)
- opensearch-project#5408 (datetime type normalization)
- opensearch-project#5414 (Gradle wrapper bump + @Ignore exclusion)
- opensearch-project#5399 (FGAC-scoped SQL cursor continuation)

Newer feature commits absorbed (9):

- opensearch-project#5403 (analytics-engine optional dependency, major rewiring)
- opensearch-project#5407 (Carry CalciteEvalCommandIT through helper-managed index path)
- opensearch-project#5413 (Default plugins.calcite.enabled=true on unified path)
- opensearch-project#5415, opensearch-project#5416, opensearch-project#5417, opensearch-project#5409, opensearch-project#5400, opensearch-project#5406 (smaller carryovers + bumps)

Conflict resolutions (10 from main side, 3 from feature side):

api/spec/* (LanguageSpec, UnifiedFunctionSpec, UnifiedPplSpec, UnifiedSqlSpec): took main. Main is a strict superset: it adds postAnalysisRules and preCompilationRules extension points, the new FunctionSpecBuilder DSL, SCALAR category for length/regexp_replace/date_trunc, the DatetimeExtension on PPL spec, and the CoreExtension wiring on SQL spec. PR's RELEVANCE category is preserved unchanged.

api/UnifiedQueryPlanner.java, api/compiler/UnifiedQueryCompiler.java: took main. Both adopt the new postAnalysisRules / preCompilationRules hooks introduced in opensearch-project#5408 / opensearch-project#5419.

core/executor/QueryService.java: composed both sides. Kept HEAD's CalciteClassLoaderHelper.withCalciteClassLoader wrapper around main's StageErrorHandler stage tracking. Same pattern as the original PR resolution; both improvements are orthogonal.

legacy/plugin/RestSqlAction.java: took HEAD. The 3-way merge produced a duplicated handleException/getRawErrorCode block; HEAD already contained both the delegateToV2Engine refactor and the ErrorReport unwrap from main, so HEAD is the correct superset.

integ-test/build.gradle: took feature. Both sides added the same @Ignore exclusion block; feature has alphabetical ordering and a more detailed comment explaining the Gradle 9.4.1 TestEventReporterAsListener cast bug.

integ-test/.../CalciteEvalCommandIT.java: composed both sides. Took feature's helper-managed test_eval provisioning (createIndexByRestClient + isIndexExist guard, from opensearch-project#5407) so analytics-engine compatibility runs get a parquet-backed index. Added back PR HEAD's test_eval_agent setup (needed by the dotted-path eval tests for opensearch-project#5351) wrapped in its own isIndexExist guard for the same parquet-aware idempotency.

plugin/.../TransportPPLQueryAction.java: took feature. PR opensearch-project#5403 made analytics-engine an optional dependency by moving QueryPlanExecutor from a required constructor parameter to an @Inject(optional=true) setter. Feature's design supersedes our prior wiring.

plugin/.../SQLPlugin.java: took feature. The same opensearch-project#5403 simplification removed loadExtensions/EngineExtensionsHolder/executionEngineExtensions plumbing (no longer needed once analytics-engine is optionally bound). Feature retains the createSqlAnalyticsRouter method this PR introduced.

plugin/.../config/EngineExtensionsHolder.java: deleted. Unreferenced after taking feature's SQLPlugin/TransportPPLQueryAction; not present on feature branch.

Build: :api, :core, :opensearch-sql-plugin, :legacy compileJava + :integ-test compileTestJava all pass; unit tests pass; spotlessCheck clean.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
Summary
Same shape as #5407 for `CalciteEvalCommandIT`. The IT's `init()` previously created the in-test `test_eval` index via direct `PUT /test_eval/_doc/N` requests, relying on dynamic mapping. Two problems:

1. The doc PUTs auto-create the index with whatever settings the cluster defaults to. The analytics-engine compatibility path (force-routing on; `tests.analytics.parquet_indices=true`) needs parquet-backed indices, which `TestUtils.createIndexByRestClient` applies via `TestUtils.makeParquetBacked` when the system property is set. Direct PUTs sidestep that helper, so `test_eval` lands as Lucene-backed and the analytics planner rejects it with `No backend can scan all requested fields on index [test_eval]`. All four otherwise-working tests fail at execution.
2. `init()` runs before every `@Test` method. The doc PUTs are doc-level idempotent, so re-running was wasteful but not failing. Once we switch to `createIndexByRestClient`, the index-level PUT is no longer idempotent and re-running throws `resource_already_exists_exception`.

Both addressed in one change:

- `test_eval` is created via `TestUtils.createIndexByRestClient` with an explicit mapping (`name`/`title`=`keyword`, `age`=`long`). The helper honours `tests.analytics.parquet_indices=true` and produces a parquet-backed index for the analytics-engine sweep; on the v2 path the helper is a no-op around the index PUT, so behaviour is unchanged.
- The whole `init` body is guarded by `TestUtils.isIndexExist`, the same idempotency idiom that `loadIndex` uses for predefined fixtures. First `@Test` method provisions; subsequent methods skip.

Also pins the projection order on `testFieldFormatStringConcatenation`. The original query had no `| fields` clause and relied on the implicit projection's column order: v2 returns Lucene-source insertion order, analytics returns parquet-storage order (alphabetical), so the assertion only matched on v2 by coincidence. Adding `| fields name, title, age, greeting` makes the assertion deterministic across paths; the existing expected rows already match this order, so v2 behaviour is preserved. The other four tests already have explicit `| fields ...` clauses.

No semantic change for the v2 path: the explicit mapping types (keyword, long, keyword) resolve to the same PPL types (`"string"`, `"bigint"`, `"string"`) that dynamic mapping inferred, and `fieldformat` reads from `_source` either way.

Pass rate
`CalciteFieldFormatCommandIT` against a runTask cluster with `analytics-engine` + `opensearch-sql-plugin` installed, invoked via `:integ-test:analyticsCompatibilityTest --tests 'org.opensearch.sql.calcite.remote.CalciteFieldFormatCommandIT'`:

- `testFieldFormatStringConcatenation`: `No backend can scan all requested fields on index [test_eval]`
- `testFieldFormatStringConcatenationWithNullField`
- `testFieldFormatStringConcatWithSuffix`
- `testFieldFormatStringConcatWithPrefixSuffix`
- `testFieldFormatStringConcatenationWithNullFieldToString`: `No backend supports scalar function [TOSTRING] among [datafusion]`

The remaining `tostring()`-using test is blocked on a multi-mode UDF (`binary`/`hex`/`commas`/`duration`/`duration_millis`) that the analytics path doesn't yet wire. It is tracked as out of scope here; a plausible follow-up is estimated at ~1 day for a native Rust UDF + Substrait extension + Java adapter.

Test plan
- `./gradlew :integ-test:integTest --tests 'org.opensearch.sql.calcite.remote.CalciteFieldFormatCommandIT'`: 5 / 5 green (v2 path, no regression).
- `./gradlew :integ-test:analyticsCompatibilityTest --tests 'org.opensearch.sql.calcite.remote.CalciteFieldFormatCommandIT'` against a runTask cluster: 4 / 5 pass (was 0 / 5 before this PR).

Related
- … `CalciteEvalCommandIT`).
- `fieldformat` itself (no code change needed: `fieldformat` lowers to `Eval` with `CONCAT` + `CAST`, both already wired) is QA-pinned in "[QA] Add FieldFormatCommandIT for the analytics-engine REST path" (OpenSearch#21544).

Note on base
This PR targets `feature/mustang-ppl-integration` rather than `main` so it lands alongside the rest of the analytics-engine compatibility scaffolding (`analyticsCompatibilityTest` task, `tests.analytics.parquet_indices` propagation, `loadIndex` parquet-aware variant) that the helper-managed pattern relies on. The change is purely additive over the mustang feature branch.
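
To illustrate the projection-order point from the Summary: the sketch below models one result row as a map and contrasts insertion order with alphabetical order, then shows that an explicit field list yields the same column order from both. It is an analogy under stated assumptions, not the engines' actual behaviour: `ProjectionOrderSketch` and its methods are hypothetical, `LinkedHashMap` stands in for the insertion-ordered path and `TreeMap` for the alphabetical one, and the row values are the expected row quoted in this PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of implicit vs. explicit column order; not engine code. */
final class ProjectionOrderSketch {

  /** Applies an explicit field list, like a trailing `| fields a, b, c` clause. */
  static List<Object> project(Map<String, Object> row, List<String> fields) {
    List<Object> out = new ArrayList<>();
    for (String field : fields) {
      out.add(row.get(field));
    }
    return out;
  }

  /** Row whose implicit column order is the order in which fields were inserted. */
  static Map<String, Object> insertionOrderedRow() {
    Map<String, Object> row = new LinkedHashMap<>();
    row.put("name", "Alice");
    row.put("title", "Engineer");
    row.put("age", 25);
    row.put("greeting", "Hello Alice");
    return row;
  }

  /** Same row, but with columns sorted alphabetically by field name. */
  static Map<String, Object> alphabeticalRow() {
    return new TreeMap<>(insertionOrderedRow());
  }

  public static void main(String[] args) {
    List<String> fields = Arrays.asList("name", "title", "age", "greeting");
    // Implicit order differs between the two representations.
    System.out.println(new ArrayList<>(insertionOrderedRow().values()));
    System.out.println(new ArrayList<>(alphabeticalRow().values()));
    // Explicit projection gives the same order from both.
    System.out.println(project(insertionOrderedRow(), fields));
    System.out.println(project(alphabeticalRow(), fields));
  }
}
```

An assertion written against the implicit order passes only for whichever representation happens to match it, while one written against the projected order does not depend on how the row is stored.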