Skip to content

Register LENGTH, REGEXP_REPLACE, DATE_TRUNC in unified function spec#5419

Merged
dai-chen merged 2 commits intoopensearch-project:mainfrom
dai-chen:feature/register-scalar-functions
May 7, 2026
Merged

Register LENGTH, REGEXP_REPLACE, DATE_TRUNC in unified function spec#5419
dai-chen merged 2 commits intoopensearch-project:mainfrom
dai-chen:feature/register-scalar-functions

Conversation

@dai-chen
Copy link
Copy Markdown
Collaborator

@dai-chen dai-chen commented May 7, 2026

Description

Register LENGTH, REGEXP_REPLACE, and DATE_TRUNC in the unified function spec, unblocking ClickBench q28, q29, and q43 at the unified SQL parsing/validation layer. Additionally, a preCompilationRules() extension point enables late-binding function implementations (e.g., DATE_TRUNC → FLOOR) applied only at compilation time, keeping the logical plan clean for external consumers.

Note: These functions are safe to register: LENGTH and REGEXP_REPLACE have consistent signatures across all execution paths (SparkSQL, DataFusion, SQL V2, PPL), and DATE_TRUNC aligns with SparkSQL and DataFusion semantics with no conflicting definition in existing engines. See the criteria in #5346 (comment).

Related Issues

Part of #5248

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

dai-chen added 2 commits May 7, 2026 12:42
Add FunctionSpecBuilder DSL with three construction paths: delegateTo()
for existing Calcite operators, vararg() for pushdown-only UDFs, and
operands() for typed functions with optional late-binding impl.

Register LENGTH, REGEXP_REPLACE, and DATE_TRUNC in UnifiedFunctionSpec
LIBRARY category. Contribute via CoreExtension registered in
UnifiedSqlSpec.extended().

This unblocks ClickBench q28 (LENGTH), q29 (REGEXP_REPLACE), and q43
(DATE_TRUNC) at the SQL Plugin parsing/validation layer.

Signed-off-by: Chen Dai <daichen@amazon.com>
Add preCompilationRules() extension point to LanguageSpec that allows
extensions to transform the logical plan before in-memory execution
only. The plan remains clean for external consumers (Analytics Engine).

CoreExtension registers FunctionImplBindingRule which fetches impl
bindings from UnifiedFunctionSpec and rewrites custom function calls
into executable Calcite expressions at compilation time.

DATE_TRUNC now has an impl that rewrites to FLOOR(ts TO unit), making
it executable in-memory while preserving DATE_TRUNC in the logical plan
for the Analytics Engine path.

Signed-off-by: Chen Dai <daichen@amazon.com>
@dai-chen dai-chen self-assigned this May 7, 2026
@dai-chen dai-chen added enhancement New feature or request SQL labels May 7, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 7, 2026

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Possible Issue

The visit method creates a new RexBuilder from node.getCluster().getRexBuilder() but then applies the shuttle to visited instead of node. If the shuttle modifies the node structure, the RexBuilder may reference stale cluster metadata. This can cause inconsistencies when the rewritten expressions reference types or operators from a different cluster state.

public RelNode visit(RelNode node) {
  RelNode visited = super.visit(node);
  RexBuilder rexBuilder = node.getCluster().getRexBuilder();
  return visited.accept(
      new RexShuttle() {
        @Override
        public RexNode visitCall(RexCall call) {
          RexCall visited = (RexCall) super.visitCall(call);
          return Optional.ofNullable(bindings.get(visited.getOperator()))
              .map(impl -> impl.apply(rexBuilder, visited))
              .orElse(visited);
        }
      });
Possible Issue

The impl lambda for date_trunc casts call.operands.get(0) to RexLiteral without null or type checking. If the first operand is not a literal (e.g., a column reference or expression), this throws ClassCastException at compilation time. The function should either validate the operand type or handle non-literal cases gracefully.

RexLiteral unitLiteral = (RexLiteral) call.operands.get(0);
String unit = unitLiteral.getValueAs(String.class);

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 7, 2026

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Handle potential duplicate operator keys

The Collectors.toMap will throw IllegalStateException if duplicate keys (operators)
exist. Since multiple function specs could theoretically share the same operator,
use toMap with a merge function to handle collisions explicitly, or validate
uniqueness upfront.

api/src/main/java/org/opensearch/sql/api/spec/core/LateBindingFunctionRule.java [28-32]

 private final Map<SqlOperator, BiFunction<RexBuilder, RexCall, RexNode>> bindings =
     UnifiedFunctionSpec.ALL_SPECS.values().stream()
         .filter(spec -> spec.getImpl() != null)
         .collect(
-            Collectors.toMap(UnifiedFunctionSpec::getOperator, UnifiedFunctionSpec::getImpl));
+            Collectors.toMap(
+                UnifiedFunctionSpec::getOperator,
+                UnifiedFunctionSpec::getImpl,
+                (existing, replacement) -> existing));
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that Collectors.toMap throws IllegalStateException on duplicate keys. Adding a merge function prevents runtime exceptions if multiple UnifiedFunctionSpec instances share the same SqlOperator. This is a valid defensive programming practice that improves robustness.

Medium

@dai-chen dai-chen merged commit 0ff1eec into opensearch-project:main May 7, 2026
46 of 48 checks passed
ahkcs added a commit to ahkcs/sql that referenced this pull request May 8, 2026
Bring the analytics-engine catch-up PR up to current upstream/main by
resolving conflicts introduced by 4 main commits since 2026-04-30:

- opensearch-project#5419 (LENGTH/REGEXP_REPLACE/DATE_TRUNC unified function spec)
- opensearch-project#5408 (datetime type normalization)
- opensearch-project#5414 (Gradle wrapper bump + @ignore exclusion)
- opensearch-project#5399 (FGAC-scoped SQL cursor continuation)

Resolutions:

api/spec/* (LanguageSpec, UnifiedFunctionSpec, UnifiedPplSpec,
UnifiedSqlSpec): took main. Main is a strict superset — adds
postAnalysisRules and preCompilationRules extension points, the new
FunctionSpecBuilder DSL, SCALAR category for length/regexp_replace/
date_trunc, the DatetimeExtension on PPL spec, and the CoreExtension
wiring on SQL spec. PR's RELEVANCE category is preserved unchanged.

api/UnifiedQueryPlanner.java, api/compiler/UnifiedQueryCompiler.java:
took main. Both adopt the new postAnalysisRules / preCompilationRules
hooks introduced in opensearch-project#5408 / opensearch-project#5419.

core/executor/QueryService.java: composed both sides — kept HEAD's
CalciteClassLoaderHelper.withCalciteClassLoader wrapper around main's
StageErrorHandler stage tracking. Same pattern as the original PR
resolution; both improvements are orthogonal.

plugin/SQLPlugin.java: kept HEAD imports for ExplainResponse and
QueryType (referenced by createSqlAnalyticsRouter, which only exists
on the feature branch).

plugin/transport/TransportPPLQueryAction.java: kept HEAD's
queryPlanExecutor parameter alongside main's extensionsHolder
parameter, since both are referenced in the constructor body.

legacy/plugin/RestSqlAction.java: took HEAD. The 3-way merge produced
a duplicated handleException/getRawErrorCode block; HEAD already
contained both the delegateToV2Engine refactor and the ErrorReport
unwrap from main, so HEAD is the correct superset.

Build: :api, :core, :opensearch-sql-plugin, :legacy compileJava and
unit tests pass; spotlessCheck passes.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
ahkcs added a commit to ahkcs/sql that referenced this pull request May 8, 2026
…ation

Brings the catch-up branch up to current upstream/main (4 commits since
this PR was opened) and current feature/mustang-ppl-integration (9
commits since this PR was opened), so the PR is mergeable into
feature/mustang-ppl-integration without conflicts.

Squashed (rather than two real merge commits) for the same DCO reason
the original commit was squashed: upstream commits authored by many
contributors with inconsistent or missing Signed-off-by trailers would
otherwise be brought into this PR's history.

Newer main commits absorbed (4):
- opensearch-project#5419 (LENGTH/REGEXP_REPLACE/DATE_TRUNC unified function spec)
- opensearch-project#5408 (datetime type normalization)
- opensearch-project#5414 (Gradle wrapper bump + @ignore exclusion)
- opensearch-project#5399 (FGAC-scoped SQL cursor continuation)

Newer feature commits absorbed (9):
- opensearch-project#5403 (analytics-engine optional dependency — major rewiring)
- opensearch-project#5407 (Carry CalciteEvalCommandIT through helper-managed index path)
- opensearch-project#5413 (Default plugins.calcite.enabled=true on unified path)
- opensearch-project#5415, opensearch-project#5416, opensearch-project#5417, opensearch-project#5409, opensearch-project#5400, opensearch-project#5406 (smaller carryovers + bumps)

Conflict resolutions (10 from main side, 3 from feature side):

api/spec/* (LanguageSpec, UnifiedFunctionSpec, UnifiedPplSpec,
UnifiedSqlSpec): took main. Main is a strict superset — adds
postAnalysisRules and preCompilationRules extension points, the new
FunctionSpecBuilder DSL, SCALAR category for length/regexp_replace/
date_trunc, the DatetimeExtension on PPL spec, and the CoreExtension
wiring on SQL spec. PR's RELEVANCE category is preserved unchanged.

api/UnifiedQueryPlanner.java, api/compiler/UnifiedQueryCompiler.java:
took main. Both adopt the new postAnalysisRules / preCompilationRules
hooks introduced in opensearch-project#5408 / opensearch-project#5419.

core/executor/QueryService.java: composed both sides — kept HEAD's
CalciteClassLoaderHelper.withCalciteClassLoader wrapper around main's
StageErrorHandler stage tracking. Same pattern as the original PR
resolution; both improvements are orthogonal.

legacy/plugin/RestSqlAction.java: took HEAD. The 3-way merge produced
a duplicated handleException/getRawErrorCode block; HEAD already
contained both the delegateToV2Engine refactor and the ErrorReport
unwrap from main, so HEAD is the correct superset.

integ-test/build.gradle: took feature. Both sides added the same
@ignore exclusion block; feature has alphabetical ordering and a more
detailed comment explaining the Gradle 9.4.1 TestEventReporterAsListener
cast bug.

integ-test/.../CalciteEvalCommandIT.java: composed both sides. Took
feature's helper-managed test_eval provisioning (createIndexByRestClient
+ isIndexExist guard, from opensearch-project#5407) so analytics-engine compatibility runs
get a parquet-backed index. Added back PR HEAD's test_eval_agent setup
(needed by the dotted-path eval tests for opensearch-project#5351) wrapped in its own
isIndexExist guard for the same parquet-aware idempotency.

plugin/.../TransportPPLQueryAction.java: took feature. PR opensearch-project#5403 made
analytics-engine an optional dependency by moving QueryPlanExecutor
from a required constructor parameter to an @Inject(optional=true)
setter. Feature's design supersedes our prior wiring.

plugin/.../SQLPlugin.java: took feature. The same opensearch-project#5403 simplification
removed loadExtensions/EngineExtensionsHolder/executionEngineExtensions
plumbing (no longer needed once analytics-engine is optionally bound).
Feature retains the createSqlAnalyticsRouter method this PR introduced.

plugin/.../config/EngineExtensionsHolder.java: deleted. Unreferenced
after taking feature's SQLPlugin/TransportPPLQueryAction; not present
on feature branch.

Build: :api, :core, :opensearch-sql-plugin, :legacy compileJava +
:integ-test compileTestJava all pass; unit tests pass; spotlessCheck
clean.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
ahkcs added a commit to ahkcs/sql that referenced this pull request May 8, 2026
Single squashed commit on top of feature/mustang-ppl-integration that
absorbs upstream/main's commits not yet on the feature branch. Replaces
the prior catch-up squash (opensearch-project#5396 base + the original af831d3 rebase
commit) so this PR is a fast-forward into feature/mustang-ppl-integration.

Squashed (rather than a merge commit) because upstream main commits
were authored by many contributors with inconsistent or missing
Signed-off-by trailers; DCO would otherwise reject those commits.

Main commits absorbed (54 since divergence; 4 since the original
catch-up squash was made on 2026-04-30):
- opensearch-project#5419 (LENGTH/REGEXP_REPLACE/DATE_TRUNC unified function spec)
- opensearch-project#5408 (datetime type normalization)
- opensearch-project#5414 (Gradle wrapper bump + @ignore exclusion)
- opensearch-project#5399 (FGAC-scoped SQL cursor continuation)
- opensearch-project#5394 (SQL Vector Search), opensearch-project#5361 (OpenSearch 3.7), opensearch-project#5360 (unified SQL
  language spec), opensearch-project#5240 (PPL Union), and 46 others.

Conflict resolutions:

api/spec/* (LanguageSpec, UnifiedFunctionSpec, UnifiedPplSpec,
UnifiedSqlSpec): took main. Main is a strict superset — adds
postAnalysisRules and preCompilationRules extension points, the new
FunctionSpecBuilder DSL, SCALAR category for length/regexp_replace/
date_trunc, the DatetimeExtension on PPL spec, and the CoreExtension
wiring on SQL spec. RELEVANCE category is preserved unchanged.

api/UnifiedQueryPlanner.java, api/compiler/UnifiedQueryCompiler.java:
took main. Both adopt the new postAnalysisRules / preCompilationRules
hooks introduced in opensearch-project#5408 / opensearch-project#5419.

core/executor/QueryService.java: composed both sides — kept feature's
CalciteClassLoaderHelper.withCalciteClassLoader wrapper around main's
StageErrorHandler stage tracking; both improvements are orthogonal.

legacy/plugin/RestSqlAction.java: took feature. The 3-way merge
produced a duplicated handleException/getRawErrorCode block; feature
already contained both the delegateToV2Engine refactor and the
ErrorReport unwrap from main, so feature is the correct superset.

CLAUDE.md, docs/user/ppl/functions/condition.md: took main.

explain_streamstats_global{,_null_bucket}.yaml: took main (post-opensearch-project#5359
shape).

core/CalciteRelNodeVisitor + utils/PlanUtils: took main (collation
utility hoisted from CalciteRelNodeVisitor.backtrackForCollation into
PlanUtils.findInputCollation).

integ-test/CalciteNoPushdownIT.java: added CalciteMixedFieldTypeIT.

ppl/antlr/OpenSearchPPLParser.g4: added unionCommand.

ppl/calcite/CalcitePPLStreamstatsTest.java: added
testMultipleStreamstatsWithWindow.

integ-test/build.gradle: took feature. Both sides added the same
@ignore exclusion block; feature has alphabetical ordering and a more
detailed comment explaining the Gradle 9.4.1 cast bug.

integ-test/.../CalciteEvalCommandIT.java: composed both sides. Took
feature's helper-managed test_eval provisioning (createIndexByRestClient
+ isIndexExist guard, from opensearch-project#5407) so analytics-engine compatibility runs
get a parquet-backed index. Added the test_eval_agent setup (needed by
the dotted-path eval tests for opensearch-project#5351) wrapped in its own isIndexExist
guard for the same parquet-aware idempotency.

plugin/.../TransportPPLQueryAction.java, plugin/.../SQLPlugin.java:
took feature. PR opensearch-project#5403 made analytics-engine an optional dependency by
moving QueryPlanExecutor from a required constructor parameter to an
@Inject(optional=true) setter, and removed the loadExtensions /
EngineExtensionsHolder / executionEngineExtensions plumbing. Feature
retains the createSqlAnalyticsRouter method this catch-up introduced.

plugin/.../config/EngineExtensionsHolder.java: deleted (unreferenced
post-opensearch-project#5403; not present on feature).

Build: :api, :core, :opensearch-sql-plugin, :legacy compileJava +
:integ-test compileTestJava all pass; unit tests pass; spotlessCheck
clean.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
ahkcs added a commit that referenced this pull request May 8, 2026
…5397)

Single squashed commit on top of feature/mustang-ppl-integration that
absorbs upstream/main's commits not yet on the feature branch. Replaces
the prior catch-up squash (#5396 base + the original af831d3 rebase
commit) so this PR is a fast-forward into feature/mustang-ppl-integration.

Squashed (rather than a merge commit) because upstream main commits
were authored by many contributors with inconsistent or missing
Signed-off-by trailers; DCO would otherwise reject those commits.

Main commits absorbed (54 since divergence; 4 since the original
catch-up squash was made on 2026-04-30):
- #5419 (LENGTH/REGEXP_REPLACE/DATE_TRUNC unified function spec)
- #5408 (datetime type normalization)
- #5414 (Gradle wrapper bump + @ignore exclusion)
- #5399 (FGAC-scoped SQL cursor continuation)
- #5394 (SQL Vector Search), #5361 (OpenSearch 3.7), #5360 (unified SQL
  language spec), #5240 (PPL Union), and 46 others.

Conflict resolutions:

api/spec/* (LanguageSpec, UnifiedFunctionSpec, UnifiedPplSpec,
UnifiedSqlSpec): took main. Main is a strict superset — adds
postAnalysisRules and preCompilationRules extension points, the new
FunctionSpecBuilder DSL, SCALAR category for length/regexp_replace/
date_trunc, the DatetimeExtension on PPL spec, and the CoreExtension
wiring on SQL spec. RELEVANCE category is preserved unchanged.

api/UnifiedQueryPlanner.java, api/compiler/UnifiedQueryCompiler.java:
took main. Both adopt the new postAnalysisRules / preCompilationRules
hooks introduced in #5408 / #5419.

core/executor/QueryService.java: composed both sides — kept feature's
CalciteClassLoaderHelper.withCalciteClassLoader wrapper around main's
StageErrorHandler stage tracking; both improvements are orthogonal.

legacy/plugin/RestSqlAction.java: took feature. The 3-way merge
produced a duplicated handleException/getRawErrorCode block; feature
already contained both the delegateToV2Engine refactor and the
ErrorReport unwrap from main, so feature is the correct superset.

CLAUDE.md, docs/user/ppl/functions/condition.md: took main.

explain_streamstats_global{,_null_bucket}.yaml: took main (post-#5359
shape).

core/CalciteRelNodeVisitor + utils/PlanUtils: took main (collation
utility hoisted from CalciteRelNodeVisitor.backtrackForCollation into
PlanUtils.findInputCollation).

integ-test/CalciteNoPushdownIT.java: added CalciteMixedFieldTypeIT.

ppl/antlr/OpenSearchPPLParser.g4: added unionCommand.

ppl/calcite/CalcitePPLStreamstatsTest.java: added
testMultipleStreamstatsWithWindow.

integ-test/build.gradle: took feature. Both sides added the same
@ignore exclusion block; feature has alphabetical ordering and a more
detailed comment explaining the Gradle 9.4.1 cast bug.

integ-test/.../CalciteEvalCommandIT.java: composed both sides. Took
feature's helper-managed test_eval provisioning (createIndexByRestClient
+ isIndexExist guard, from #5407) so analytics-engine compatibility runs
get a parquet-backed index. Added the test_eval_agent setup (needed by
the dotted-path eval tests for #5351) wrapped in its own isIndexExist
guard for the same parquet-aware idempotency.

plugin/.../TransportPPLQueryAction.java, plugin/.../SQLPlugin.java:
took feature. PR #5403 made analytics-engine an optional dependency by
moving QueryPlanExecutor from a required constructor parameter to an
@Inject(optional=true) setter, and removed the loadExtensions /
EngineExtensionsHolder / executionEngineExtensions plumbing. Feature
retains the createSqlAnalyticsRouter method this catch-up introduced.

plugin/.../config/EngineExtensionsHolder.java: deleted (unreferenced
post-#5403; not present on feature).

Build: :api, :core, :opensearch-sql-plugin, :legacy compileJava +
:integ-test compileTestJava all pass; unit tests pass; spotlessCheck
clean.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request SQL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants