Skip to content

Extend V2 SQL parser with JOIN for unified query path#5446

Merged
dai-chen merged 1 commit into
opensearch-project:mainfrom
dai-chen:feature/sql-v2-join
May 15, 2026
Merged

Extend V2 SQL parser with JOIN for unified query path#5446
dai-chen merged 1 commit into
opensearch-project:mainfrom
dai-chen:feature/sql-v2-join

Conversation

@dai-chen
Copy link
Copy Markdown
Collaborator

@dai-chen dai-chen commented May 15, 2026

Description

Extend the V2 SQL ANTLR grammar and add ExtendedAstBuilder to support JOIN (INNER/LEFT/RIGHT/CROSS supported by OpenSearch SQL today) in the unified Calcite-based query path. The base AstBuilder continues to throw SyntaxCheckException for these statements, preserving the legacy engine fallback in production.

Notes

  • Subsearch limits defaulted to 0 in unified query context to avoid LogicalSystemLimit noise in logical plans
  • A semantic predicate prevents LEFT/RIGHT from being consumed as implicit table aliases when followed by JOIN.

Planned PRs

Related Issues

Part of #5248

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 15, 2026

PR Reviewer Guide 🔍

(Review updated until commit 92004cf)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Possible Issue

The ExtendedAstBuilder overrides visitJoinClause but does not call attach() on the returned Join node. In the base AstBuilder, visitFromClause iterates over joinClause contexts and calls attach(result) on each visited join. Without attach(), the join node is not linked to the left relation, breaking the plan tree structure. This will cause incorrect query plans or runtime failures when the planner attempts to traverse the join.

public UnresolvedPlan visitJoinClause(JoinClauseContext ctx) {
  JoinType joinType = toJoinType(ctx);
  UnresolvedPlan right = visit(ctx.relation());
  Optional<UnresolvedExpression> condition =
      Optional.ofNullable(ctx.expression()).map(this::visitAstExpression);
  return join(right, joinType, condition);
}
Semantic Predicate Issue

The semantic predicate isJoinKeyword() checks if the next token is a join keyword to prevent LEFT/RIGHT from being consumed as table aliases. However, the predicate is placed in the implicit alias alternative of the tableAsRelation rule. If the predicate fails (returns false), ANTLR will not backtrack to try other alternatives in the relation rule because semantic predicates are treated as hard constraints. This means queries like 'FROM t1 LEFT JOIN t2' may fail to parse if the predicate logic or token lookahead is incorrect, rather than gracefully trying the joinClause rule.

: tableName (AS alias | {!isJoinKeyword()}? alias)?                 # tableAsRelation

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 15, 2026

PR Code Suggestions ✨

Latest suggestions up to 92004cf

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix join type detection logic

The toJoinType method incorrectly uses ctx.getStart().getType() to determine join
type, which only checks the first token. For RIGHT OUTER JOIN, this returns RIGHT,
but for INNER JOIN or standalone JOIN, it returns JOIN or other tokens, defaulting
to INNER. This fails to distinguish CROSS JOIN (which starts with CROSS) from
implicit INNER JOIN (which starts with JOIN). Use explicit null checks on
ctx.LEFT(), ctx.RIGHT(), ctx.CROSS() parser rule methods instead.

api/src/main/java/org/opensearch/sql/api/parser/SqlV2QueryParser.java [64-70]

 private JoinType toJoinType(JoinClauseContext ctx) {
-  return switch (ctx.getStart().getType()) {
-    case OpenSearchSQLParser.LEFT -> JoinType.LEFT;
-    case OpenSearchSQLParser.RIGHT -> JoinType.RIGHT;
-    case OpenSearchSQLParser.CROSS -> JoinType.CROSS;
-    default -> JoinType.INNER;
-  };
+  if (ctx.LEFT() != null) return JoinType.LEFT;
+  if (ctx.RIGHT() != null) return JoinType.RIGHT;
+  if (ctx.CROSS() != null) return JoinType.CROSS;
+  return JoinType.INNER;
 }
Suggestion importance[1-10]: 9

__

Why: The current implementation using ctx.getStart().getType() is fundamentally flawed for join type detection. For CROSS JOIN, the first token is CROSS, but for INNER JOIN or standalone JOIN, the first token could be INNER or JOIN, making the switch statement unreliable. The suggested approach using explicit null checks on parser rule methods (ctx.LEFT(), ctx.RIGHT(), ctx.CROSS()) is the correct ANTLR4 pattern and will properly distinguish all join types including CROSS JOIN from implicit INNER JOIN.

High
General
Remove unreliable semantic predicate

The semantic predicate {!isJoinKeyword()}? in the implicit alias alternative creates
ambiguity because ANTLR evaluates predicates during prediction, not parsing. If the
next token is LEFT (a valid identifier), the predicate may fail even when LEFT is
intended as an alias (e.g., FROM t1 LEFT where LEFT is a column/function reference).
Consider restructuring the grammar to handle aliases in a post-parse validation step
or use a more robust disambiguation strategy.

sql/src/main/antlr/OpenSearchSQLParser.g4 [131-135]

 relation
-   // The predicate guarantees only match implicit alias if next token is NOT a join keyword
-   : tableName (AS alias | {!isJoinKeyword()}? alias)?                 # tableAsRelation
+   : tableName (AS alias | alias)?                 # tableAsRelation
    | LR_BRACKET subquery = querySpecification RR_BRACKET AS? alias     # subqueryAsRelation
    | qualifiedName LR_BRACKET tableFunctionArgs RR_BRACKET (AS? alias)? # tableFunctionRelation
    ;
Suggestion importance[1-10]: 3

__

Why: While the concern about semantic predicates is valid in theory, the suggestion to remove {!isJoinKeyword()}? would reintroduce the exact ambiguity problem the predicate was designed to solve (consuming LEFT/RIGHT as aliases instead of join keywords). The current implementation with the predicate is a reasonable solution for this grammar conflict, and removing it without an alternative disambiguation strategy would break join parsing. The suggestion lacks a concrete alternative approach.

Low

Previous suggestions

Suggestions up to commit f1084b4
CategorySuggestion                                                                                                                                    Impact
Possible issue
Validate CROSS JOIN has no condition

For CROSS JOIN, the SQL standard prohibits an ON condition, but this code allows it
when ctx.expression() is present. Validate that CROSS JOIN has no condition to
prevent generating invalid logical plans that may cause downstream errors.

api/src/main/java/org/opensearch/sql/api/parser/SqlV2QueryParser.java [56-62]

 @Override
 public UnresolvedPlan visitJoinClause(JoinClauseContext ctx) {
   JoinType joinType = toJoinType(ctx);
   UnresolvedPlan right = visit(ctx.relation());
   Optional<UnresolvedExpression> condition =
       Optional.ofNullable(ctx.expression()).map(this::visitAstExpression);
+  if (joinType == JoinType.CROSS && condition.isPresent()) {
+    throw new IllegalArgumentException("CROSS JOIN cannot have an ON condition");
+  }
   return join(right, joinType, condition);
 }
Suggestion importance[1-10]: 7

__

Why: This is a valid semantic validation. The grammar allows CROSS JOIN relation (ON expression)?, but SQL standard prohibits ON conditions for CROSS JOIN. Adding this validation prevents invalid logical plans and provides early error detection, improving correctness.

Medium
Add type safety for join plan

The code performs an unsafe cast to Join without verifying the type returned by
visit(joinCtx). If visit returns a different UnresolvedPlan subtype, this will throw
a ClassCastException at runtime. Add type checking before casting.

sql/src/main/java/org/opensearch/sql/sql/parser/AstBuilder.java [143-147]

 for (var joinCtx : ctx.joinClause()) {
   UnresolvedPlan joinPlan = visit(joinCtx);
+  if (!(joinPlan instanceof Join)) {
+    throw new IllegalStateException("Expected Join plan but got " + joinPlan.getClass().getSimpleName());
+  }
   ((Join) joinPlan).attach(result);
   result = joinPlan;
 }
Suggestion importance[1-10]: 5

__

Why: The cast is safe in the current implementation since visitJoinClause always returns a Join object. However, adding explicit type checking improves defensive programming and provides clearer error messages if the implementation changes. This is a minor robustness improvement.

Low
Improve join type detection logic

The toJoinType method only checks the start token type, which may not correctly
distinguish between INNER JOIN and CROSS JOIN when INNER is explicitly specified.
Consider checking for the presence of CROSS token explicitly in the context to
ensure accurate join type detection.

api/src/main/java/org/opensearch/sql/api/parser/SqlV2QueryParser.java [64-70]

 private JoinType toJoinType(JoinClauseContext ctx) {
-  return switch (ctx.getStart().getType()) {
-    case OpenSearchSQLParser.LEFT -> JoinType.LEFT;
-    case OpenSearchSQLParser.RIGHT -> JoinType.RIGHT;
-    case OpenSearchSQLParser.CROSS -> JoinType.CROSS;
-    default -> JoinType.INNER;
-  };
+  for (int i = 0; i < ctx.getChildCount(); i++) {
+    int tokenType = ctx.getChild(i) instanceof org.antlr.v4.runtime.tree.TerminalNode
+        ? ((org.antlr.v4.runtime.tree.TerminalNode) ctx.getChild(i)).getSymbol().getType()
+        : -1;
+    if (tokenType == OpenSearchSQLParser.CROSS) return JoinType.CROSS;
+    if (tokenType == OpenSearchSQLParser.LEFT) return JoinType.LEFT;
+    if (tokenType == OpenSearchSQLParser.RIGHT) return JoinType.RIGHT;
+  }
+  return JoinType.INNER;
 }
Suggestion importance[1-10]: 3

__

Why: The current implementation using ctx.getStart().getType() is correct for the grammar rules defined in OpenSearchSQLParser.g4. The grammar explicitly defines (INNER | CROSS)? JOIN and (LEFT | RIGHT) OUTER? JOIN, making the start token sufficient to distinguish join types. The suggested iteration through all children is unnecessary complexity.

Low
Suggestions up to commit 42138ba
CategorySuggestion                                                                                                                                    Impact
Possible issue
Validate CROSS JOIN has no condition

The toJoinType method assumes the join type is determined solely by the first token,
but CROSS JOIN has a condition check (ON expression?) in the grammar. A CROSS JOIN
with an ON clause should be rejected or handled differently, as CROSS JOINs are
unconditional by definition.

api/src/main/java/org/opensearch/sql/api/parser/SqlV2QueryParser.java [64-71]

 private JoinType toJoinType(JoinClauseContext ctx) {
-  return switch (ctx.getStart().getType()) {
+  int type = ctx.getStart().getType();
+  if (type == OpenSearchSQLParser.CROSS && ctx.expression() != null) {
+    throw new IllegalArgumentException("CROSS JOIN cannot have an ON condition");
+  }
+  return switch (type) {
     case OpenSearchSQLParser.LEFT -> JoinType.LEFT;
     case OpenSearchSQLParser.RIGHT -> JoinType.RIGHT;
     case OpenSearchSQLParser.CROSS -> JoinType.CROSS;
     default -> JoinType.INNER;
   };
 }
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that CROSS JOINs should not have ON conditions per SQL semantics. However, the grammar already allows ON expression? for CROSS JOIN, so this validation should be added to prevent semantic errors. The score is moderate because while this is a valid concern, it's a semantic validation issue rather than a critical bug.

Medium
Validate join plan type before casting

The code casts visit(joinCtx) to Join without validation. If visitJoinClause throws
SyntaxCheckException (as it does in the base implementation), or returns a different
plan type, this cast will fail with ClassCastException. Add a type check or handle
the exception before casting.

sql/src/main/java/org/opensearch/sql/sql/parser/AstBuilder.java [143-147]

 for (var joinCtx : ctx.joinClause()) {
   UnresolvedPlan joinPlan = visit(joinCtx);
+  if (!(joinPlan instanceof Join)) {
+    throw new IllegalStateException("Expected Join plan but got " + joinPlan.getClass());
+  }
   ((Join) joinPlan).attach(result);
   result = joinPlan;
 }
Suggestion importance[1-10]: 3

__

Why: The suggestion raises a valid concern about unsafe casting, but in this context the cast is safe because visitJoinClause in the base AstBuilder throws SyntaxCheckException, and only the ExtendedAstBuilder override returns a Join plan. The code is designed to work with the extended builder where the cast is guaranteed to succeed. Adding the check would be defensive but not strictly necessary given the design.

Low
Suggestions up to commit 37072eb
CategorySuggestion                                                                                                                                    Impact
General
Add type safety for join cast

The code performs an unchecked cast (Join) joinPlan without verifying the actual
type returned by visit(joinCtx). If visit returns a different UnresolvedPlan
subtype, this will throw a ClassCastException at runtime. Add type checking before
casting to ensure robustness.

sql/src/main/java/org/opensearch/sql/sql/parser/AstBuilder.java [143-147]

 for (var joinCtx : ctx.joinClause()) {
   UnresolvedPlan joinPlan = visit(joinCtx);
-  ((Join) joinPlan).attach(result);
-  result = joinPlan;
+  if (joinPlan instanceof Join join) {
+    join.attach(result);
+    result = join;
+  }
 }
Suggestion importance[1-10]: 7

__

Why: Valid suggestion to add type checking before casting. The visit(joinCtx) method calls visitJoinClause which throws SyntaxCheckException in the base AstBuilder, but in ExtendedAstBuilder it returns a Join. Adding type safety prevents potential ClassCastException if the visitor behavior changes.

Medium
Require left relation in join construction

The join method creates a Join node without attaching it to a left relation, leaving
it in an incomplete state. The caller must manually call attach() afterward.
Consider requiring the left relation as a parameter to ensure the join is properly
constructed at creation time and prevent incomplete join nodes.

core/src/main/java/org/opensearch/sql/ast/dsl/AstDSL.java [762-773]

 public static UnresolvedPlan join(
-    UnresolvedPlan right, Join.JoinType joinType, Optional<UnresolvedExpression> condition) {
-  return new Join(
+    UnresolvedPlan left,
+    UnresolvedPlan right,
+    Join.JoinType joinType,
+    Optional<UnresolvedExpression> condition) {
+  Join join = new Join(
       right,
       Optional.empty(),
       Optional.empty(),
       joinType,
       condition,
       new Join.JoinHint(),
       Optional.empty(),
       Argument.ArgumentMap.empty());
+  join.attach(left);
+  return join;
 }
Suggestion importance[1-10]: 5

__

Why: The suggestion correctly identifies that the join method creates an incomplete Join node requiring manual attach() call. However, the current design appears intentional to match the pattern used in visitJoinClause where joins are built incrementally. While the improved API would be cleaner, changing it would require updates to all callers and may not align with the existing AST construction pattern.

Low
Handle OUTER keyword in join detection

The toJoinType method only checks the first token type but doesn't handle OUTER
keyword in LEFT OUTER JOIN or RIGHT OUTER JOIN syntax. This could lead to incorrect
join type detection when OUTER appears as the start token. Verify the token sequence
to correctly identify join types with optional OUTER keyword.

api/src/main/java/org/opensearch/sql/api/parser/SqlV2QueryParser.java [64-71]

 private JoinType toJoinType(JoinClauseContext ctx) {
-  return switch (ctx.getStart().getType()) {
+  int firstToken = ctx.getStart().getType();
+  return switch (firstToken) {
     case OpenSearchSQLParser.LEFT -> JoinType.LEFT;
     case OpenSearchSQLParser.RIGHT -> JoinType.RIGHT;
     case OpenSearchSQLParser.CROSS -> JoinType.CROSS;
+    case OpenSearchSQLParser.INNER, OpenSearchSQLParser.JOIN -> JoinType.INNER;
     default -> JoinType.INNER;
   };
 }
Suggestion importance[1-10]: 3

__

Why: The suggestion raises a valid concern about handling the OUTER keyword, but the grammar in OpenSearchSQLParser.g4 shows that OUTER is optional and appears between LEFT/RIGHT and JOIN. The getStart() method will return the first token (LEFT or RIGHT), not OUTER, so the current implementation is correct. The improved code adds redundant cases without fixing an actual issue.

Low

@dai-chen dai-chen force-pushed the feature/sql-v2-join branch from 37072eb to 42138ba Compare May 15, 2026 17:40
@github-actions
Copy link
Copy Markdown
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 42138ba.

PathLineSeverityDescription
api/src/main/java/org/opensearch/sql/api/UnifiedQueryContext.java147mediumSubsearch limits are switched from SysLimit.DEFAULT to SysLimit.UNLIMITED_SUBSEARCH (0 = unlimited) for all unified SQL path executions. Removing resource limits on subsearch and join-subsearch operations could enable unbounded memory/CPU consumption via crafted queries, potentially enabling denial-of-service. The comment justifies this as an architectural decision to exclude LogicalSystemLimit from logical plans, but the scope (all unified SQL queries) and absence of an alternative enforcement point warrants review to confirm no external rate-limiting or circuit-breaker compensates.

The table above displays the top 10 most important findings.

Total: 1 | Critical: 0 | High: 0 | Medium: 1 | Low: 0


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 42138ba

@dai-chen dai-chen force-pushed the feature/sql-v2-join branch from 42138ba to f1084b4 Compare May 15, 2026 18:24
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit f1084b4

Extend the V2 ANTLR grammar with JOIN clause rules (INNER/LEFT/RIGHT/
CROSS) and add ExtendedAstBuilder in SqlV2QueryParser to produce Join
AST nodes for the Calcite-based unified query path. The base AstBuilder
throws SyntaxCheckException to preserve legacy engine fallback.

A semantic predicate prevents LEFT/RIGHT from being consumed as implicit
table aliases when followed by JOIN.

Signed-off-by: Chen Dai <daichen@amazon.com>
@dai-chen dai-chen force-pushed the feature/sql-v2-join branch from f1084b4 to 92004cf Compare May 15, 2026 18:55
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 92004cf

@dai-chen dai-chen requested a review from songkant-aws as a code owner May 15, 2026 19:25
@dai-chen dai-chen merged commit 6421658 into opensearch-project:main May 15, 2026
62 of 66 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request SQL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants