Skip to content

Preserve original expression text as column name in unified SQL#5392

Draft
dai-chen wants to merge 1 commit intoopensearch-project:mainfrom
dai-chen:fix-5332-preserve-expr-names
Draft

Preserve original expression text as column name in unified SQL#5392
dai-chen wants to merge 1 commit intoopensearch-project:mainfrom
dai-chen:fix-5332-preserve-expr-names

Conversation

@dai-chen
Copy link
Copy Markdown
Collaborator

Description [WIP]

Calcite's SqlToRelConverter names unnamed SELECT items EXPR$0, EXPR$1, etc. - surprising versus PostgreSQL/MySQL/Spark. Fix by adding SelectItemAliasRewriter, a pre-validation SqlShuttle that wraps unnamed, non-identifier items with AS via the existing LanguageSpec.postParseRules hook.

Related Issues

Resolves #5332

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Calcite's SqlToRelConverter names unnamed SELECT items EXPR$0, EXPR$1,
etc. - surprising versus PostgreSQL/MySQL/Spark. Fix by adding
SelectItemAliasRewriter, a pre-validation SqlShuttle that wraps
unnamed, non-identifier items with AS <text> via the existing
LanguageSpec.postParseRules hook.

  SELECT COUNT(*) FROM t   -> `COUNT(*)`  (was EXPR$0)
  SELECT UPPER(name)       -> `UPPER(name)`
  SELECT x AS y, name, *   -> unchanged

Resolves opensearch-project#5332

Signed-off-by: Chen Dai <daichen@amazon.com>
@dai-chen dai-chen added enhancement New feature or request SQL labels Apr 29, 2026
@github-actions
Copy link
Copy Markdown
Contributor

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Add SelectItemAliasRewriter implementation and unit tests

Relevant files:

  • api/src/main/java/org/opensearch/sql/api/spec/search/SelectItemAliasRewriter.java
  • api/src/test/java/org/opensearch/sql/api/spec/search/SelectItemAliasRewriterTest.java

Sub-PR theme: Wire SelectItemAliasRewriter into SearchExtension and update integration tests

Relevant files:

  • api/src/main/java/org/opensearch/sql/api/spec/search/SearchExtension.java
  • api/src/test/java/org/opensearch/sql/api/UnifiedQueryPlannerSqlTest.java
  • api/src/test/java/org/opensearch/sql/api/UnifiedRelevanceSearchSqlTest.java

⚡ Recommended focus areas for review

Alias Collision

When two unnamed expressions produce the same SQL text (e.g., SELECT COUNT(*), COUNT(*) FROM t), both will receive the same alias string. This could cause ambiguous column name issues downstream in Calcite's validator or planner.

private static SqlNode aliasIfNeeded(SqlNode item) {
  if (item.getKind() == SqlKind.AS || item instanceof SqlIdentifier) {
    return item;
  }
  return SqlValidatorUtil.addAlias(item, item.toSqlString(UNPARSE_CONFIG).getSql());
}
Star Handling

The aliasIfNeeded method does not explicitly handle SqlKind.STAR (i.e., SELECT * or SELECT t.*). While SqlIdentifier covers simple *, a qualified t.* may not be an SqlIdentifier in all Calcite versions and could inadvertently receive an alias like AS "t.*".

private static SqlNode aliasIfNeeded(SqlNode item) {
  if (item.getKind() == SqlKind.AS || item instanceof SqlIdentifier) {
    return item;
  }
  return SqlValidatorUtil.addAlias(item, item.toSqlString(UNPARSE_CONFIG).getSql());
}
Alias Validity

The alias is derived directly from item.toSqlString(UNPARSE_CONFIG).getSql(), which may produce strings containing special characters (parentheses, operators, spaces). While Calcite may accept these as quoted identifiers, some downstream consumers or serialization paths may not handle such alias names correctly.

return SqlValidatorUtil.addAlias(item, item.toSqlString(UNPARSE_CONFIG).getSql());
Incomplete Assertion

testUnnamedArithmeticExpression and testUnnamedFunctionCall only assert that AS is present somewhere in the output, but do not verify the actual alias text. This makes the tests weak and could miss regressions in the alias content.

public void testUnnamedArithmeticExpression() throws Exception {
  SqlNode result = rewrite("SELECT 1 + 1 FROM t");
  assertContains(result, " AS ");
}

@Test
public void testUnnamedFunctionCall() throws Exception {
  SqlNode result = rewrite("SELECT UPPER(name) FROM t");
  assertContains(result, " AS ");
}

@github-actions
Copy link
Copy Markdown
Contributor

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Guard against null select list

*select.getSelectList() can return null for SELECT queries in some Calcite
versions, which would cause a NullPointerException when iterating. A null check
should be added before processing the select list.

api/src/main/java/org/opensearch/sql/api/spec/search/SelectItemAliasRewriter.java [52-59]

 private static void rewriteSelectList(SqlSelect select) {
   SqlNodeList old = select.getSelectList();
+  if (old == null) {
+    return;
+  }
   SqlNodeList rewritten = new SqlNodeList(old.getParserPosition());
   for (SqlNode item : old) {
     rewritten.add(aliasIfNeeded(item));
   }
   select.setSelectList(rewritten);
 }
Suggestion importance[1-10]: 6

__

Why: This is a valid defensive programming suggestion - getSelectList() can return null for SELECT * in some Calcite versions, which would cause a NullPointerException. The test testStarUntouched passes a SELECT * query, suggesting this edge case exists, though it may be handled differently in the tested Calcite version.

Low
General
Normalize whitespace in derived alias names

The alias string derived from toSqlString may contain characters (e.g., newlines,
extra whitespace from complex expressions) that make it an invalid or unexpected
column name. Consider trimming and normalizing whitespace in the alias string to
avoid surprising column names.

api/src/main/java/org/opensearch/sql/api/spec/search/SelectItemAliasRewriter.java [66]

-return SqlValidatorUtil.addAlias(item, item.toSqlString(UNPARSE_CONFIG).getSql());
+String alias = item.toSqlString(UNPARSE_CONFIG).getSql().trim().replaceAll("\\s+", " ");
+return SqlValidatorUtil.addAlias(item, alias);
Suggestion importance[1-10]: 4

__

Why: The UNPARSE_CONFIG already sets withIndentation(0) and withSelectListItemsOnSeparateLines(false), which reduces whitespace issues. However, trimming and normalizing whitespace could still be a minor improvement for edge cases with complex expressions that might produce unexpected whitespace in alias names.

Low
Document thread-safety assumption for singleton rewriter

SelectItemAliasRewriter is stateless and used as a singleton via INSTANCE, but
SqlShuttle is not documented as thread-safe. If visit is ever called concurrently
(e.g., in a multi-threaded query planner), shared mutable state in the parent class
could cause issues. Consider verifying thread-safety of SqlShuttle, or document the
assumption that this rewriter is only used single-threaded.

api/src/main/java/org/opensearch/sql/api/spec/search/SelectItemAliasRewriter.java [31]

+// If SqlShuttle is confirmed stateless/thread-safe, keep as-is and add a comment:
+/** Singleton instance. Safe for concurrent use only if SqlShuttle is stateless. */
 public static final SelectItemAliasRewriter INSTANCE = new SelectItemAliasRewriter();
Suggestion importance[1-10]: 2

__

Why: This suggestion asks to add a comment about thread-safety, which is a documentation concern. The improved_code only adds a comment rather than making a functional change, and the suggestion score should be low per guidelines about docstring/comment additions.

Low

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request SQL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

UnifiedQueryPlanner should preserve original SQL expression names instead of EXPR$N

1 participant