Carry CalciteFieldFormatCommandIT through the helper-managed index path by ahkcs · Pull Request #5417 · opensearch-project/sql

ahkcs · 2026-05-07T18:27:39Z

Summary

Same shape as #5407 for CalciteEvalCommandIT. The IT's init() previously created the in-test test_eval index via direct PUT /test_eval/_doc/N requests, relying on dynamic mapping. Two problems:

The doc PUTs auto-create the index with whatever settings the cluster defaults to. The analytics-engine compatibility path (force-routing on; tests.analytics.parquet_indices=true) needs parquet-backed indices, which TestUtils.createIndexByRestClient applies via TestUtils.makeParquetBacked when the system property is set. Direct PUTs sidestep that helper, so test_eval lands as Lucene-backed and the analytics planner rejects it with No backend can scan all requested fields on index [test_eval]. All four otherwise-working tests fail at execution.
init() runs before every @Test method. The doc PUTs are doc-level idempotent, so re-running was wasteful but not failing. Once we switch to createIndexByRestClient, the index-level PUT is no longer idempotent and re-running throws resource_already_exists_exception.

Both addressed in one change:

test_eval is created via TestUtils.createIndexByRestClient with an explicit mapping (name/title=keyword, age=long). The helper honours tests.analytics.parquet_indices=true and produces a parquet-backed index for the analytics-engine sweep; on the v2 path the helper is a no-op around the index PUT, so behaviour is unchanged.
The whole init body is guarded by TestUtils.isIndexExist — same idempotency idiom that loadIndex uses for predefined fixtures. First @Test method provisions; subsequent methods skip.

Also pins the projection order on testFieldFormatStringConcatenation. The original query had no | fields clause and relied on the implicit projection's column order — v2 returns Lucene-source insertion order, analytics returns parquet-storage order (alphabetical), so the assertion only matched on v2 by coincidence. Adding | fields name, title, age, greeting makes the assertion deterministic across paths; the existing expected rows already match this order, so v2 behaviour is preserved. The other four tests already have explicit | fields ... clauses.

No semantic change for the v2 path: the explicit mapping types (keyword, long, keyword) resolve to the same PPL types ("string", "bigint", "string") that dynamic mapping inferred, and fieldformat reads from _source either way.

Pass rate

CalciteFieldFormatCommandIT against a runTask cluster with analytics-engine + opensearch-sql-plugin installed, invoked via :integ-test:analyticsCompatibilityTest --tests 'org.opensearch.sql.calcite.remote.CalciteFieldFormatCommandIT':

Test	Before	After
`testFieldFormatStringConcatenation`	❌ `No backend can scan all requested fields on index [test_eval]`	✅
`testFieldFormatStringConcatenationWithNullField`	❌ same	✅
`testFieldFormatStringConcatWithSuffix`	❌ same	✅
`testFieldFormatStringConcatWithPrefixSuffix`	❌ same	✅
`testFieldFormatStringConcatenationWithNullFieldToString`	❌ same	❌ `No backend supports scalar function [TOSTRING] among [datafusion]`
Total (analytics path)	0 / 5	4 / 5
Total (v2 path)	5 / 5	5 / 5 (no regression)

The remaining tostring()-using test is blocked on a multi-mode UDF (binary / hex / commas / duration / duration_millis) that the analytics path doesn't yet wire — tracked as out of scope here, plausible follow-up estimated at ~1 day for a native Rust UDF + Substrait extension + Java adapter.

Test plan

./gradlew :integ-test:integTest --tests 'org.opensearch.sql.calcite.remote.CalciteFieldFormatCommandIT' — 5 / 5 green (v2 path, no regression).
./gradlew :integ-test:analyticsCompatibilityTest --tests 'org.opensearch.sql.calcite.remote.CalciteFieldFormatCommandIT' against a runTask cluster — 4 / 5 pass (was 0 / 5 before this PR).

Carry CalciteEvalCommandIT through the helper-managed index path #5407 (the equivalent fix for CalciteEvalCommandIT).
The analytics-route wiring for fieldformat itself (no code change needed — fieldformat lowers to Eval with CONCAT + CAST, both already wired) is QA-pinned in [QA] Add FieldFormatCommandIT for the analytics-engine REST path OpenSearch#21544.

Note on base

This PR targets feature/mustang-ppl-integration rather than main so it lands alongside the rest of the analytics-engine compatibility scaffolding (analyticsCompatibilityTest task, tests.analytics.parquet_indices propagation, loadIndex parquet-aware variant) that the helper-managed pattern relies on. The change is purely additive over the mustang feature branch.

github-actions · 2026-05-07T18:29:20Z

PR Reviewer Guide 🔍

(Review updated until commit `96fec3e`)

Here are some key observations to aid the review process:

🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review Null Age Handling Document 3 sets `"age": null` explicitly. With the explicit mapping `age` as `long`, inserting a null value may behave differently across Lucene and parquet-backed indices. Verify that the analytics-engine path handles null long values correctly and that test assertions account for this null value in result rows. request3.setJsonEntity("{\"name\": \"Charlie\", \"age\": null, \"title\": \"Analyst\"}"); client().performRequest(request3); Missing Index Cleanup The `test_eval` index is created once and guarded by `isIndexExist`, but there is no corresponding teardown (e.g., `@AfterAll` or `deleteIndex`) to remove it after the test class finishes. This could leave stale state that affects other test classes or test suite reruns if the index persists between runs. if (!TestUtils.isIndexExist(client(), "test_eval")) { String testEvalMapping = "{\"mappings\":{\"properties\":{" + "\"name\":{\"type\":\"keyword\"}," + "\"age\":{\"type\":\"long\"}," + "\"title\":{\"type\":\"keyword\"}}}}"; TestUtils.createIndexByRestClient(client(), "test_eval", testEvalMapping); // Create test data for string concatenation Request request1 = new Request("PUT", "/test_eval/_doc/1?refresh=true"); request1.setJsonEntity("{\"name\": \"Alice\", \"age\": 25, \"title\": \"Engineer\"}"); client().performRequest(request1); Request request2 = new Request("PUT", "/test_eval/_doc/2?refresh=true"); request2.setJsonEntity("{\"name\": \"Bob\", \"age\": 30, \"title\": \"Manager\"}"); client().performRequest(request2); Request request3 = new Request("PUT", "/test_eval/_doc/3?refresh=true"); request3.setJsonEntity("{\"name\": \"Charlie\", \"age\": null, \"title\": \"Analyst\"}"); client().performRequest(request3); }

github-actions · 2026-05-07T18:29:38Z

PR Code Suggestions ✨

Latest suggestions up to 96fec3e
Explore these optional code suggestions:

Category	Suggestion	Impact
General	Ensure all documents are visible before tests The documents are inserted without a `?refresh=true` wait after the last document, but more critically, there is no refresh call after all three documents are indexed to ensure they are all visible before tests run. Since each request uses `refresh=true` individually, the last document's refresh is fine, but if the index creation itself is asynchronous, tests may run before all documents are searchable. Consider adding a dedicated refresh request (e.g., `POST /test_eval/_refresh`) after all three documents are inserted to guarantee consistency. integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteFieldFormatCommandIT.java [34-54] if (!TestUtils.isIndexExist(client(), "test_eval")) { String testEvalMapping = "{\"mappings\":{\"properties\":{" + "\"name\":{\"type\":\"keyword\"}," + "\"age\":{\"type\":\"long\"}," + "\"title\":{\"type\":\"keyword\"}}}}"; TestUtils.createIndexByRestClient(client(), "test_eval", testEvalMapping); // Create test data for string concatenation Request request1 = new Request("PUT", "/test_eval/_doc/1?refresh=true"); request1.setJsonEntity("{\"name\": \"Alice\", \"age\": 25, \"title\": \"Engineer\"}"); client().performRequest(request1); Request request2 = new Request("PUT", "/test_eval/_doc/2?refresh=true"); request2.setJsonEntity("{\"name\": \"Bob\", \"age\": 30, \"title\": \"Manager\"}"); client().performRequest(request2); Request request3 = new Request("PUT", "/test_eval/_doc/3?refresh=true"); request3.setJsonEntity("{\"name\": \"Charlie\", \"age\": null, \"title\": \"Analyst\"}"); client().performRequest(request3); + + // Ensure all documents are visible before tests run + client().performRequest(new Request("POST", "/test_eval/_refresh")); } Suggestion importance[1-10]: 2 __ Why: Each individual `PUT` request already uses `?refresh=true`, which forces a refresh after each document is indexed, making them immediately searchable. Adding an extra `_refresh` call after all three documents are already individually refreshed provides no additional benefit and is redundant.	Low

Previous suggestions

Suggestions up to commit 2f052e0

Category	Suggestion	Impact
Possible issue	Ensure index refresh after all documents indexed The document PUT requests do not use `?refresh=true` in a way that guarantees all three documents are visible before the test queries run, since each refresh only covers that single document. Consider adding a final explicit refresh call (e.g., `client().performRequest(new Request("POST", "/test_eval/_refresh"))`) after all three documents are indexed to ensure full index visibility before tests execute. integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteFieldFormatCommandIT.java [34-54] if (!TestUtils.isIndexExist(client(), "test_eval")) { String testEvalMapping = "{\"mappings\":{\"properties\":{" + "\"name\":{\"type\":\"keyword\"}," + "\"age\":{\"type\":\"long\"}," + "\"title\":{\"type\":\"keyword\"}}}}"; TestUtils.createIndexByRestClient(client(), "test_eval", testEvalMapping); // Create test data for string concatenation - Request request1 = new Request("PUT", "/test_eval/_doc/1?refresh=true"); + Request request1 = new Request("PUT", "/test_eval/_doc/1"); request1.setJsonEntity("{\"name\": \"Alice\", \"age\": 25, \"title\": \"Engineer\"}"); client().performRequest(request1); - Request request2 = new Request("PUT", "/test_eval/_doc/2?refresh=true"); + Request request2 = new Request("PUT", "/test_eval/_doc/2"); request2.setJsonEntity("{\"name\": \"Bob\", \"age\": 30, \"title\": \"Manager\"}"); client().performRequest(request2); - Request request3 = new Request("PUT", "/test_eval/_doc/3?refresh=true"); + Request request3 = new Request("PUT", "/test_eval/_doc/3"); request3.setJsonEntity("{\"name\": \"Charlie\", \"age\": null, \"title\": \"Analyst\"}"); client().performRequest(request3); + + client().performRequest(new Request("POST", "/test_eval/_refresh")); } Suggestion importance[1-10]: 3 __ Why: The suggestion to add a final `_refresh` call after all documents are indexed is technically valid, but each individual PUT request already uses `?refresh=true`, which forces a refresh after each document is indexed. This means all documents are already visible before the next request is made, making the additional refresh call redundant. The `improved_code` removes `?refresh=true` from individual requests and adds a single final refresh, which is a valid alternative approach but not strictly necessary given the existing per-document refresh.	Low

@test

Same shape as opensearch-project#5407 for CalciteEvalCommandIT. The IT's init() previously created the in-test test_eval index via direct `PUT /test_eval/_doc/N` requests, relying on dynamic mapping. Two problems: 1. The doc PUTs auto-create the index with whatever settings the cluster defaults to. The analytics-engine compatibility path (force-routing on; tests.analytics.parquet_indices=true) needs parquet-backed indices, which TestUtils.createIndexByRestClient applies via TestUtils.makeParquetBacked when the system property is set. Direct PUTs sidestep that helper, so test_eval lands as Lucene-backed and the analytics planner rejects it with "No backend can scan all requested fields on index [test_eval]". All four working tests fail at execution. 2. init() runs before every @test method. The doc PUTs are doc-level idempotent, so re-running was wasteful but not failing. Once we switch to createIndexByRestClient, the index-level PUT is no longer idempotent and re-running throws "resource_already_exists_exception". Both addressed in one change: - test_eval is created via TestUtils.createIndexByRestClient with an explicit mapping (name/title=keyword, age=long). The helper honours tests.analytics.parquet_indices=true and produces a parquet-backed index for the analytics-engine sweep; on the v2 path the helper is a no-op around the index PUT, so behaviour is unchanged. - The whole init body is guarded by TestUtils.isIndexExist — same idempotency idiom that loadIndex uses for predefined fixtures. First @test method provisions; subsequent methods skip. Also pins the projection order on testFieldFormatStringConcatenation. The original query (`source=test_eval | fieldformat greeting = 'Hello ' + name`) had no `| fields` clause and relied on the implicit projection's column order — v2 returns Lucene-source insertion order, analytics returns parquet-storage order (alphabetical), so the assertion only matched on v2 by coincidence. Adding `| fields name, title, age, greeting` makes the assertion deterministic across paths; the existing expected rows (`rows("Alice", "Engineer", 25, "Hello Alice")`) already match this order, so v2 behaviour is preserved. The other four tests already had explicit `| fields ...` clauses, so no change there. No semantic change for the v2 path: the explicit mapping types (keyword, long, keyword) resolve to the same PPL types ("string", "bigint", "string") that dynamic mapping inferred, and fieldformat reads from _source either way. Analytics-route compatibility goes from 1/5 to 4/5 (verified locally against a runTask cluster with analytics-engine + opensearch-sql-plugin). The remaining `testFieldFormatStringConcatenationWithNullFieldToString` needs a `tostring()` UDF on the analytics path — a multi-mode UDF (binary / hex / commas / duration) tracked separately as out of scope. Test plan: - ./gradlew :integ-test:integTest --tests 'org.opensearch.sql.calcite.remote.CalciteFieldFormatCommandIT' -> 5/5 green (v2 path, no regression). - ./gradlew :integ-test:analyticsCompatibilityTest --tests 'org.opensearch.sql.calcite.remote.CalciteFieldFormatCommandIT' -> 4/5 pass; the 5th fails on `tostring`'s missing capability registration, which is the documented out-of-scope category. Signed-off-by: Kai Huang <ahkcs@amazon.com>

github-actions · 2026-05-07T19:18:00Z

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 96fec3e.

'Diff too large, requires skip by maintainers after manual review'

Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.

⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

github-actions · 2026-05-07T19:18:57Z

Persistent review updated to latest commit 96fec3e

@ignore

…ation Brings the catch-up branch up to current upstream/main (4 commits since this PR was opened) and current feature/mustang-ppl-integration (9 commits since this PR was opened), so the PR is mergeable into feature/mustang-ppl-integration without conflicts. Squashed (rather than two real merge commits) for the same DCO reason the original commit was squashed: upstream commits authored by many contributors with inconsistent or missing Signed-off-by trailers would otherwise be brought into this PR's history. Newer main commits absorbed (4): - opensearch-project#5419 (LENGTH/REGEXP_REPLACE/DATE_TRUNC unified function spec) - opensearch-project#5408 (datetime type normalization) - opensearch-project#5414 (Gradle wrapper bump + @ignore exclusion) - opensearch-project#5399 (FGAC-scoped SQL cursor continuation) Newer feature commits absorbed (9): - opensearch-project#5403 (analytics-engine optional dependency — major rewiring) - opensearch-project#5407 (Carry CalciteEvalCommandIT through helper-managed index path) - opensearch-project#5413 (Default plugins.calcite.enabled=true on unified path) - opensearch-project#5415, opensearch-project#5416, opensearch-project#5417, opensearch-project#5409, opensearch-project#5400, opensearch-project#5406 (smaller carryovers + bumps) Conflict resolutions (10 from main side, 3 from feature side): api/spec/* (LanguageSpec, UnifiedFunctionSpec, UnifiedPplSpec, UnifiedSqlSpec): took main. Main is a strict superset — adds postAnalysisRules and preCompilationRules extension points, the new FunctionSpecBuilder DSL, SCALAR category for length/regexp_replace/ date_trunc, the DatetimeExtension on PPL spec, and the CoreExtension wiring on SQL spec. PR's RELEVANCE category is preserved unchanged. api/UnifiedQueryPlanner.java, api/compiler/UnifiedQueryCompiler.java: took main. Both adopt the new postAnalysisRules / preCompilationRules hooks introduced in opensearch-project#5408 / opensearch-project#5419. core/executor/QueryService.java: composed both sides — kept HEAD's CalciteClassLoaderHelper.withCalciteClassLoader wrapper around main's StageErrorHandler stage tracking. Same pattern as the original PR resolution; both improvements are orthogonal. legacy/plugin/RestSqlAction.java: took HEAD. The 3-way merge produced a duplicated handleException/getRawErrorCode block; HEAD already contained both the delegateToV2Engine refactor and the ErrorReport unwrap from main, so HEAD is the correct superset. integ-test/build.gradle: took feature. Both sides added the same @ignore exclusion block; feature has alphabetical ordering and a more detailed comment explaining the Gradle 9.4.1 TestEventReporterAsListener cast bug. integ-test/.../CalciteEvalCommandIT.java: composed both sides. Took feature's helper-managed test_eval provisioning (createIndexByRestClient + isIndexExist guard, from opensearch-project#5407) so analytics-engine compatibility runs get a parquet-backed index. Added back PR HEAD's test_eval_agent setup (needed by the dotted-path eval tests for opensearch-project#5351) wrapped in its own isIndexExist guard for the same parquet-aware idempotency. plugin/.../TransportPPLQueryAction.java: took feature. PR opensearch-project#5403 made analytics-engine an optional dependency by moving QueryPlanExecutor from a required constructor parameter to an @Inject(optional=true) setter. Feature's design supersedes our prior wiring. plugin/.../SQLPlugin.java: took feature. The same opensearch-project#5403 simplification removed loadExtensions/EngineExtensionsHolder/executionEngineExtensions plumbing (no longer needed once analytics-engine is optionally bound). Feature retains the createSqlAnalyticsRouter method this PR introduced. plugin/.../config/EngineExtensionsHolder.java: deleted. Unreferenced after taking feature's SQLPlugin/TransportPPLQueryAction; not present on feature branch. Build: :api, :core, :opensearch-sql-plugin, :legacy compileJava + :integ-test compileTestJava all pass; unit tests pass; spotlessCheck clean. Signed-off-by: Kai Huang <ahkcs@amazon.com>

ahkcs requested review from LantaoJin, RyanL1997, Swiddis, acarbonetto, anirudha, dai-chen, joshuali925, mengweieric, noCharger, penghuo, ps48, qianheng-aws, songkant-aws, vamsimanohar, ykmr1224 and yuancu as code owners May 7, 2026 18:27

ahkcs added the enhancement New feature or request label May 7, 2026

ahkcs mentioned this pull request May 7, 2026

[QA] Add FieldFormatCommandIT for the analytics-engine REST path opensearch-project/OpenSearch#21544

Merged

2 tasks

ahkcs enabled auto-merge (squash) May 7, 2026 19:12

ahkcs disabled auto-merge May 7, 2026 19:13

ahkcs force-pushed the mustang/fieldformat-it-helper-managed-index branch from 2f052e0 to 96fec3e Compare May 7, 2026 19:17

ahkcs changed the base branch from main to feature/mustang-ppl-integration May 7, 2026 19:17

ahkcs merged commit 7da02d5 into opensearch-project:feature/mustang-ppl-integration May 7, 2026
68 of 74 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Carry CalciteFieldFormatCommandIT through the helper-managed index path#5417

Carry CalciteFieldFormatCommandIT through the helper-managed index path#5417
ahkcs merged 1 commit intoopensearch-project:feature/mustang-ppl-integrationfrom
ahkcs:mustang/fieldformat-it-helper-managed-index

ahkcs commented May 7, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 7, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 7, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 7, 2026

Uh oh!

github-actions Bot commented May 7, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ahkcs commented May 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Pass rate

Test plan

Related

Note on base

Uh oh!

github-actions Bot commented May 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Reviewer Guide 🔍

(Review updated until commit 96fec3e)

Uh oh!

github-actions Bot commented May 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Previous suggestions

Uh oh!

github-actions Bot commented May 7, 2026

PR Code Analyzer ❗

Uh oh!

github-actions Bot commented May 7, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

ahkcs commented May 7, 2026 •

edited

Loading

github-actions Bot commented May 7, 2026 •

edited

Loading

(Review updated until commit `96fec3e`)

github-actions Bot commented May 7, 2026 •

edited

Loading