Register LENGTH, REGEXP_REPLACE, DATE_TRUNC in unified function spec#5419
Conversation
Add FunctionSpecBuilder DSL with three construction paths: delegateTo() for existing Calcite operators, vararg() for pushdown-only UDFs, and operands() for typed functions with optional late-binding impl. Register LENGTH, REGEXP_REPLACE, and DATE_TRUNC in UnifiedFunctionSpec LIBRARY category. Contribute via CoreExtension registered in UnifiedSqlSpec.extended(). This unblocks ClickBench q28 (LENGTH), q29 (REGEXP_REPLACE), and q43 (DATE_TRUNC) at the SQL Plugin parsing/validation layer. Signed-off-by: Chen Dai <daichen@amazon.com>
Add preCompilationRules() extension point to LanguageSpec that allows extensions to transform the logical plan before in-memory execution only. The plan remains clean for external consumers (Analytics Engine). CoreExtension registers FunctionImplBindingRule which fetches impl bindings from UnifiedFunctionSpec and rewrites custom function calls into executable Calcite expressions at compilation time. DATE_TRUNC now has an impl that rewrites to FLOOR(ts TO unit), making it executable in-memory while preserving DATE_TRUNC in the logical plan for the Analytics Engine path. Signed-off-by: Chen Dai <daichen@amazon.com>
PR Reviewer Guide 🔍Here are some key observations to aid the review process:
|
PR Code Suggestions ✨Explore these optional code suggestions:
|
Bring the analytics-engine catch-up PR up to current upstream/main by resolving conflicts introduced by 4 main commits since 2026-04-30: - opensearch-project#5419 (LENGTH/REGEXP_REPLACE/DATE_TRUNC unified function spec) - opensearch-project#5408 (datetime type normalization) - opensearch-project#5414 (Gradle wrapper bump + @ignore exclusion) - opensearch-project#5399 (FGAC-scoped SQL cursor continuation) Resolutions: api/spec/* (LanguageSpec, UnifiedFunctionSpec, UnifiedPplSpec, UnifiedSqlSpec): took main. Main is a strict superset — adds postAnalysisRules and preCompilationRules extension points, the new FunctionSpecBuilder DSL, SCALAR category for length/regexp_replace/ date_trunc, the DatetimeExtension on PPL spec, and the CoreExtension wiring on SQL spec. PR's RELEVANCE category is preserved unchanged. api/UnifiedQueryPlanner.java, api/compiler/UnifiedQueryCompiler.java: took main. Both adopt the new postAnalysisRules / preCompilationRules hooks introduced in opensearch-project#5408 / opensearch-project#5419. core/executor/QueryService.java: composed both sides — kept HEAD's CalciteClassLoaderHelper.withCalciteClassLoader wrapper around main's StageErrorHandler stage tracking. Same pattern as the original PR resolution; both improvements are orthogonal. plugin/SQLPlugin.java: kept HEAD imports for ExplainResponse and QueryType (referenced by createSqlAnalyticsRouter, which only exists on the feature branch). plugin/transport/TransportPPLQueryAction.java: kept HEAD's queryPlanExecutor parameter alongside main's extensionsHolder parameter, since both are referenced in the constructor body. legacy/plugin/RestSqlAction.java: took HEAD. The 3-way merge produced a duplicated handleException/getRawErrorCode block; HEAD already contained both the delegateToV2Engine refactor and the ErrorReport unwrap from main, so HEAD is the correct superset. Build: :api, :core, :opensearch-sql-plugin, :legacy compileJava and unit tests pass; spotlessCheck passes. Signed-off-by: Kai Huang <ahkcs@amazon.com>
…ation Brings the catch-up branch up to current upstream/main (4 commits since this PR was opened) and current feature/mustang-ppl-integration (9 commits since this PR was opened), so the PR is mergeable into feature/mustang-ppl-integration without conflicts. Squashed (rather than two real merge commits) for the same DCO reason the original commit was squashed: upstream commits authored by many contributors with inconsistent or missing Signed-off-by trailers would otherwise be brought into this PR's history. Newer main commits absorbed (4): - opensearch-project#5419 (LENGTH/REGEXP_REPLACE/DATE_TRUNC unified function spec) - opensearch-project#5408 (datetime type normalization) - opensearch-project#5414 (Gradle wrapper bump + @ignore exclusion) - opensearch-project#5399 (FGAC-scoped SQL cursor continuation) Newer feature commits absorbed (9): - opensearch-project#5403 (analytics-engine optional dependency — major rewiring) - opensearch-project#5407 (Carry CalciteEvalCommandIT through helper-managed index path) - opensearch-project#5413 (Default plugins.calcite.enabled=true on unified path) - opensearch-project#5415, opensearch-project#5416, opensearch-project#5417, opensearch-project#5409, opensearch-project#5400, opensearch-project#5406 (smaller carryovers + bumps) Conflict resolutions (10 from main side, 3 from feature side): api/spec/* (LanguageSpec, UnifiedFunctionSpec, UnifiedPplSpec, UnifiedSqlSpec): took main. Main is a strict superset — adds postAnalysisRules and preCompilationRules extension points, the new FunctionSpecBuilder DSL, SCALAR category for length/regexp_replace/ date_trunc, the DatetimeExtension on PPL spec, and the CoreExtension wiring on SQL spec. PR's RELEVANCE category is preserved unchanged. api/UnifiedQueryPlanner.java, api/compiler/UnifiedQueryCompiler.java: took main. Both adopt the new postAnalysisRules / preCompilationRules hooks introduced in opensearch-project#5408 / opensearch-project#5419. core/executor/QueryService.java: composed both sides — kept HEAD's CalciteClassLoaderHelper.withCalciteClassLoader wrapper around main's StageErrorHandler stage tracking. Same pattern as the original PR resolution; both improvements are orthogonal. legacy/plugin/RestSqlAction.java: took HEAD. The 3-way merge produced a duplicated handleException/getRawErrorCode block; HEAD already contained both the delegateToV2Engine refactor and the ErrorReport unwrap from main, so HEAD is the correct superset. integ-test/build.gradle: took feature. Both sides added the same @ignore exclusion block; feature has alphabetical ordering and a more detailed comment explaining the Gradle 9.4.1 TestEventReporterAsListener cast bug. integ-test/.../CalciteEvalCommandIT.java: composed both sides. Took feature's helper-managed test_eval provisioning (createIndexByRestClient + isIndexExist guard, from opensearch-project#5407) so analytics-engine compatibility runs get a parquet-backed index. Added back PR HEAD's test_eval_agent setup (needed by the dotted-path eval tests for opensearch-project#5351) wrapped in its own isIndexExist guard for the same parquet-aware idempotency. plugin/.../TransportPPLQueryAction.java: took feature. PR opensearch-project#5403 made analytics-engine an optional dependency by moving QueryPlanExecutor from a required constructor parameter to an @Inject(optional=true) setter. Feature's design supersedes our prior wiring. plugin/.../SQLPlugin.java: took feature. The same opensearch-project#5403 simplification removed loadExtensions/EngineExtensionsHolder/executionEngineExtensions plumbing (no longer needed once analytics-engine is optionally bound). Feature retains the createSqlAnalyticsRouter method this PR introduced. plugin/.../config/EngineExtensionsHolder.java: deleted. Unreferenced after taking feature's SQLPlugin/TransportPPLQueryAction; not present on feature branch. Build: :api, :core, :opensearch-sql-plugin, :legacy compileJava + :integ-test compileTestJava all pass; unit tests pass; spotlessCheck clean. Signed-off-by: Kai Huang <ahkcs@amazon.com>
Single squashed commit on top of feature/mustang-ppl-integration that absorbs upstream/main's commits not yet on the feature branch. Replaces the prior catch-up squash (opensearch-project#5396 base + the original af831d3 rebase commit) so this PR is a fast-forward into feature/mustang-ppl-integration. Squashed (rather than a merge commit) because upstream main commits were authored by many contributors with inconsistent or missing Signed-off-by trailers; DCO would otherwise reject those commits. Main commits absorbed (54 since divergence; 4 since the original catch-up squash was made on 2026-04-30): - opensearch-project#5419 (LENGTH/REGEXP_REPLACE/DATE_TRUNC unified function spec) - opensearch-project#5408 (datetime type normalization) - opensearch-project#5414 (Gradle wrapper bump + @ignore exclusion) - opensearch-project#5399 (FGAC-scoped SQL cursor continuation) - opensearch-project#5394 (SQL Vector Search), opensearch-project#5361 (OpenSearch 3.7), opensearch-project#5360 (unified SQL language spec), opensearch-project#5240 (PPL Union), and 46 others. Conflict resolutions: api/spec/* (LanguageSpec, UnifiedFunctionSpec, UnifiedPplSpec, UnifiedSqlSpec): took main. Main is a strict superset — adds postAnalysisRules and preCompilationRules extension points, the new FunctionSpecBuilder DSL, SCALAR category for length/regexp_replace/ date_trunc, the DatetimeExtension on PPL spec, and the CoreExtension wiring on SQL spec. RELEVANCE category is preserved unchanged. api/UnifiedQueryPlanner.java, api/compiler/UnifiedQueryCompiler.java: took main. Both adopt the new postAnalysisRules / preCompilationRules hooks introduced in opensearch-project#5408 / opensearch-project#5419. core/executor/QueryService.java: composed both sides — kept feature's CalciteClassLoaderHelper.withCalciteClassLoader wrapper around main's StageErrorHandler stage tracking; both improvements are orthogonal. legacy/plugin/RestSqlAction.java: took feature. The 3-way merge produced a duplicated handleException/getRawErrorCode block; feature already contained both the delegateToV2Engine refactor and the ErrorReport unwrap from main, so feature is the correct superset. CLAUDE.md, docs/user/ppl/functions/condition.md: took main. explain_streamstats_global{,_null_bucket}.yaml: took main (post-opensearch-project#5359 shape). core/CalciteRelNodeVisitor + utils/PlanUtils: took main (collation utility hoisted from CalciteRelNodeVisitor.backtrackForCollation into PlanUtils.findInputCollation). integ-test/CalciteNoPushdownIT.java: added CalciteMixedFieldTypeIT. ppl/antlr/OpenSearchPPLParser.g4: added unionCommand. ppl/calcite/CalcitePPLStreamstatsTest.java: added testMultipleStreamstatsWithWindow. integ-test/build.gradle: took feature. Both sides added the same @ignore exclusion block; feature has alphabetical ordering and a more detailed comment explaining the Gradle 9.4.1 cast bug. integ-test/.../CalciteEvalCommandIT.java: composed both sides. Took feature's helper-managed test_eval provisioning (createIndexByRestClient + isIndexExist guard, from opensearch-project#5407) so analytics-engine compatibility runs get a parquet-backed index. Added the test_eval_agent setup (needed by the dotted-path eval tests for opensearch-project#5351) wrapped in its own isIndexExist guard for the same parquet-aware idempotency. plugin/.../TransportPPLQueryAction.java, plugin/.../SQLPlugin.java: took feature. PR opensearch-project#5403 made analytics-engine an optional dependency by moving QueryPlanExecutor from a required constructor parameter to an @Inject(optional=true) setter, and removed the loadExtensions / EngineExtensionsHolder / executionEngineExtensions plumbing. Feature retains the createSqlAnalyticsRouter method this catch-up introduced. plugin/.../config/EngineExtensionsHolder.java: deleted (unreferenced post-opensearch-project#5403; not present on feature). Build: :api, :core, :opensearch-sql-plugin, :legacy compileJava + :integ-test compileTestJava all pass; unit tests pass; spotlessCheck clean. Signed-off-by: Kai Huang <ahkcs@amazon.com>
…5397) Single squashed commit on top of feature/mustang-ppl-integration that absorbs upstream/main's commits not yet on the feature branch. Replaces the prior catch-up squash (#5396 base + the original af831d3 rebase commit) so this PR is a fast-forward into feature/mustang-ppl-integration. Squashed (rather than a merge commit) because upstream main commits were authored by many contributors with inconsistent or missing Signed-off-by trailers; DCO would otherwise reject those commits. Main commits absorbed (54 since divergence; 4 since the original catch-up squash was made on 2026-04-30): - #5419 (LENGTH/REGEXP_REPLACE/DATE_TRUNC unified function spec) - #5408 (datetime type normalization) - #5414 (Gradle wrapper bump + @ignore exclusion) - #5399 (FGAC-scoped SQL cursor continuation) - #5394 (SQL Vector Search), #5361 (OpenSearch 3.7), #5360 (unified SQL language spec), #5240 (PPL Union), and 46 others. Conflict resolutions: api/spec/* (LanguageSpec, UnifiedFunctionSpec, UnifiedPplSpec, UnifiedSqlSpec): took main. Main is a strict superset — adds postAnalysisRules and preCompilationRules extension points, the new FunctionSpecBuilder DSL, SCALAR category for length/regexp_replace/ date_trunc, the DatetimeExtension on PPL spec, and the CoreExtension wiring on SQL spec. RELEVANCE category is preserved unchanged. api/UnifiedQueryPlanner.java, api/compiler/UnifiedQueryCompiler.java: took main. Both adopt the new postAnalysisRules / preCompilationRules hooks introduced in #5408 / #5419. core/executor/QueryService.java: composed both sides — kept feature's CalciteClassLoaderHelper.withCalciteClassLoader wrapper around main's StageErrorHandler stage tracking; both improvements are orthogonal. legacy/plugin/RestSqlAction.java: took feature. The 3-way merge produced a duplicated handleException/getRawErrorCode block; feature already contained both the delegateToV2Engine refactor and the ErrorReport unwrap from main, so feature is the correct superset. CLAUDE.md, docs/user/ppl/functions/condition.md: took main. explain_streamstats_global{,_null_bucket}.yaml: took main (post-#5359 shape). core/CalciteRelNodeVisitor + utils/PlanUtils: took main (collation utility hoisted from CalciteRelNodeVisitor.backtrackForCollation into PlanUtils.findInputCollation). integ-test/CalciteNoPushdownIT.java: added CalciteMixedFieldTypeIT. ppl/antlr/OpenSearchPPLParser.g4: added unionCommand. ppl/calcite/CalcitePPLStreamstatsTest.java: added testMultipleStreamstatsWithWindow. integ-test/build.gradle: took feature. Both sides added the same @ignore exclusion block; feature has alphabetical ordering and a more detailed comment explaining the Gradle 9.4.1 cast bug. integ-test/.../CalciteEvalCommandIT.java: composed both sides. Took feature's helper-managed test_eval provisioning (createIndexByRestClient + isIndexExist guard, from #5407) so analytics-engine compatibility runs get a parquet-backed index. Added the test_eval_agent setup (needed by the dotted-path eval tests for #5351) wrapped in its own isIndexExist guard for the same parquet-aware idempotency. plugin/.../TransportPPLQueryAction.java, plugin/.../SQLPlugin.java: took feature. PR #5403 made analytics-engine an optional dependency by moving QueryPlanExecutor from a required constructor parameter to an @Inject(optional=true) setter, and removed the loadExtensions / EngineExtensionsHolder / executionEngineExtensions plumbing. Feature retains the createSqlAnalyticsRouter method this catch-up introduced. plugin/.../config/EngineExtensionsHolder.java: deleted (unreferenced post-#5403; not present on feature). Build: :api, :core, :opensearch-sql-plugin, :legacy compileJava + :integ-test compileTestJava all pass; unit tests pass; spotlessCheck clean. Signed-off-by: Kai Huang <ahkcs@amazon.com>
Description
Register
LENGTH,REGEXP_REPLACE, andDATE_TRUNCin the unified function spec, unblocking ClickBench q28, q29, and q43 at the unified SQL parsing/validation layer. Additionally, apreCompilationRules()extension point enables late-binding function implementations (e.g.,DATE_TRUNC → FLOOR) applied only at compilation time, keeping the logical plan clean for external consumers.Related Issues
Part of #5248
Check List
--signoffor-s.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.