
Support pushdown sort by simple expressions #4071

Merged
yuancu merged 12 commits into opensearch-project:main from songkant-aws:optimize-monotonic-rex-sort-expand on Sep 4, 2025

@songkant-aws (Contributor) commented Aug 19, 2025

Description

This PR resolves the problem of pushing down a sort on a simple expression, without requiring project pushdown as a prerequisite.

A PPL query may sort on a trivial projected expression, for example: source = test | eval b = a + 1 | sort b

This PR rewrites that query into source = test | sort a | eval b = a + 1, so the problem reduces to pushing down a field sort on column a. Sorting by a field is also expected to be faster than sorting by a script, given OpenSearch's internal optimizations for field sorts.
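A minimal sketch of the idea, under simplifying assumptions: if the sort key is an expression that is monotonic in a single input field (adding a constant, multiplying by a nonzero constant, negation), the sort can be moved onto that field, flipping the direction when the expression is decreasing. All names below (SortByExprRewriteSketch, Expr, OrderInfo, rewriteSort) are hypothetical; this is not the PR's OpenSearchRelOptUtil code or Calcite's RexNode API, and it ignores integer overflow, null ordering, and casts.

```java
import java.util.Optional;

/**
 * Hypothetical, simplified model of the rewrite
 *   eval b = a + 1 | sort b   ==>   sort a | eval b = a + 1
 * None of these types are Calcite or OpenSearch classes.
 */
public class SortByExprRewriteSketch {

  /** A tiny expression tree: field references, numeric literals, and a few operators. */
  interface Expr {}
  record Field(String name) implements Expr {}
  record Literal(double value) implements Expr {}
  record Plus(Expr left, Expr right) implements Expr {}
  record Times(Expr left, Expr right) implements Expr {}
  record Negate(Expr operand) implements Expr {}

  /** The single field an expression is order-equivalent to, and whether the order flips. */
  record OrderInfo(String field, boolean reversed) {}

  /** A sort key on a source field. */
  record SortKey(String field, boolean descending) {}

  /** Returns the field whose order determines the order of e, if there is exactly one. */
  static Optional<OrderInfo> orderEquivalentField(Expr e) {
    if (e instanceof Field f) {
      return Optional.of(new OrderInfo(f.name(), false));
    }
    if (e instanceof Negate n) {
      // -x reverses the order of x
      return orderEquivalentField(n.operand())
          .map(info -> new OrderInfo(info.field(), !info.reversed()));
    }
    if (e instanceof Plus p) {
      // x + c and c + x preserve the order of x
      if (p.right() instanceof Literal) return orderEquivalentField(p.left());
      if (p.left() instanceof Literal) return orderEquivalentField(p.right());
      return Optional.empty();
    }
    if (e instanceof Times t) {
      // k * x preserves the order of x for k > 0 and reverses it for k < 0
      Expr operand;
      double k;
      if (t.right() instanceof Literal r) {
        operand = t.left();
        k = r.value();
      } else if (t.left() instanceof Literal l) {
        operand = t.right();
        k = l.value();
      } else {
        return Optional.empty();
      }
      if (k == 0) return Optional.empty(); // a constant result carries no ordering
      boolean flips = k < 0;
      return orderEquivalentField(operand)
          .map(info -> new OrderInfo(info.field(), info.reversed() ^ flips));
    }
    return Optional.empty();
  }

  /** Rewrites a sort on an expression into a sort on its source field, when possible. */
  static Optional<SortKey> rewriteSort(Expr key, boolean descending) {
    return orderEquivalentField(key)
        .map(info -> new SortKey(info.field(), descending ^ info.reversed()));
  }

  public static void main(String[] args) {
    // eval b = a + 1 | sort b  ==>  sort a (ascending)
    System.out.println(rewriteSort(new Plus(new Field("a"), new Literal(1)), false));
    // eval c = -2 * a | sort c  ==>  sort a (descending)
    System.out.println(rewriteSort(new Times(new Literal(-2), new Field("a")), false));
    // eval d = a + x | sort d  ==>  no single source field, so no rewrite
    System.out.println(rewriteSort(new Plus(new Field("a"), new Field("x")), false));
  }
}
```

The design choice here is conservative: anything the sketch cannot prove monotonic in exactly one field returns Optional.empty(), so the original (script) sort would be kept.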

Related Issues

Resolves #3990

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Songkan Tang <songkant@amazon.com>
songkant-aws requested a review from yuancu as a code owner on August 25, 2025 05:39
yuancu added the enhancement label (New feature or request) on Aug 25, 2025
@yuancu (Collaborator) left a comment

It seems that when the sorted field is not projected in the final output, an extra sort is added at the top of the plan.

E.g.

  • source=opensearch-sql_test_index_account | eval b = balance + 1 | sort b | fields balance, b:

    EnumerableCalc(expr#0=[{inputs}], expr#1=[1], expr#2=[+($t0, $t1)], balance=[$t0], $f1=[$t2])
      CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]], PushDownContext=[[SORT->[{
      "balance" : {
        "order" : "asc",
        "missing" : "_first"
      }
    }], LIMIT->10000, PROJECT->[balance]], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":10000,"timeout":"1m","_source":{"includes":["balance"],"excludes":[]},"sort":[{"balance":{"order":"asc","missing":"_first"}}]}, requestedTotalSize=10000, pageSize=null, startFrom=0)])
    
  • but for source=opensearch-sql_test_index_account | eval b = balance + 1 | sort b | fields b, an EnumerableSort remains on top even though the scan already sorts by balance:

    EnumerableSort(sort0=[$0], dir0=[ASC-nulls-first])
      EnumerableCalc(expr#0=[{inputs}], expr#1=[1], expr#2=[+($t0, $t1)], $f0=[$t2])
        CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]], PushDownContext=[[PROJECT->[balance], SORT->[{
      "balance" : {
        "order" : "asc",
        "missing" : "_first"
      }
    }], LIMIT->10000], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":10000,"timeout":"1m","_source":{"includes":["balance"],"excludes":[]},"sort":[{"balance":{"order":"asc","missing":"_first"}}]}, requestedTotalSize=10000, pageSize=null, startFrom=0)])
    

@songkant-aws (Contributor, Author) commented Sep 2, 2025

Quoting @yuancu: "It seems that when the sorted field is not projected in the final output, there will be an added sort at the top."

Nice catch! This happens when the Project does not carry the order-equivalent input field among its output expressions. In ExpandCollationOnProjectExprRule, I use the Project's own collation to check whether it satisfies the target collation (i.e., the collation of the top Sort). But since this Project outputs only a simple derived expression, its collation is empty ([]), so the rule skips it because fromCollation is empty.

I need to check whether it is safe to use the collation of Project.getInput() instead of the Project's own collation. That fix should solve the problem.
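A rough sketch of why checking the input collation helps, using hypothetical types (Key, Equivalence, prefixSatisfies, inputSatisfies) rather than Calcite's RelCollation or Project API, and ignoring null direction: the Project that outputs only $f0 = balance + 1 has an empty collation, so a check against it fails, while mapping the top sort key back through the order-equivalent expression and comparing it with the input's collation succeeds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical model of the collation check discussed above. The names are
 * illustrative, not Calcite's API; null direction is ignored.
 */
public class InputCollationCheckSketch {

  /** One sort key: a column index and a direction. */
  record Key(int column, boolean descending) {}

  /** A Project output column that is order-equivalent to one input column. */
  record Equivalence(int inputColumn, boolean reversed) {}

  /** True if rows sorted by `available` are also sorted by `target` (prefix match). */
  static boolean prefixSatisfies(List<Key> available, List<Key> target) {
    return available.size() >= target.size()
        && available.subList(0, target.size()).equals(target);
  }

  /**
   * Maps the target collation (on Project output columns) back to input columns
   * and checks it against the input's collation instead of the Project's own.
   */
  static boolean inputSatisfies(
      List<Key> inputCollation, List<Optional<Equivalence>> projectExprs, List<Key> target) {
    List<Key> mapped = new ArrayList<>();
    for (Key key : target) {
      Optional<Equivalence> eq = projectExprs.get(key.column());
      if (eq.isEmpty()) {
        return false; // output column is not order-equivalent to a single input column
      }
      mapped.add(new Key(eq.get().inputColumn(), key.descending() ^ eq.get().reversed()));
    }
    return prefixSatisfies(inputCollation, mapped);
  }

  public static void main(String[] args) {
    // Scan output: [balance], already sorted by balance ASC (pushed-down SORT)
    List<Key> scanCollation = List.of(new Key(0, false));
    // Project output: [$f0 = balance + 1]; balance itself is not projected
    List<Optional<Equivalence>> projectExprs = List.of(Optional.of(new Equivalence(0, false)));
    // Top sort: sort b ASC, where b is Project output column 0
    List<Key> topSort = List.of(new Key(0, false));

    // The Project's own collation is empty because it only outputs $f0
    List<Key> projectCollation = List.of();
    System.out.println("via Project collation: " + prefixSatisfies(projectCollation, topSort));
    System.out.println("via input collation:   " + inputSatisfies(scanCollation, projectExprs, topSort));
  }
}
```

Running main prints false for the Project-collation check and true for the input-collation check, which mirrors the plan above where the top EnumerableSort was left in place.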

@yuancu (Collaborator) left a comment

LGTM

Comment on lines +221 to +229
// Cast from low precision to high precision
srcType = typeFactory.createSqlType(SqlTypeName.DECIMAL);
srcInput = rexBuilder.makeInputRef(srcType, 1);
dstType =
    typeFactory.createSqlType(
        SqlTypeName.DECIMAL, srcType.getPrecision() + 4, srcType.getScale() + 4);
cast = rexBuilder.makeCast(dstType, srcInput);
result = OpenSearchRelOptUtil.getOrderEquivalentInputInfo(cast);
assertExpectedInputInfo(result, 1, false);
Collaborator

A converse test like the following may help:

// Cast from high precision to low precision
srcType = typeFactory.createSqlType(SqlTypeName.DECIMAL);
srcInput = rexBuilder.makeInputRef(srcType, 1);
dstType =
        typeFactory.createSqlType(
                SqlTypeName.DECIMAL, srcType.getPrecision() - 4, srcType.getScale());
cast = rexBuilder.makeCast(dstType, srcInput);
result = OpenSearchRelOptUtil.getOrderEquivalentInputInfo(cast);
assertFalse(result.isPresent());

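One conservative way to reason about the two DECIMAL cases is to accept a cast only when it is lossless: the target keeps at least as many integer digits (precision minus scale) and at least as many fractional digits as the source. The class and method names below are hypothetical and the precision/scale values are illustrative; this is not the rule implemented in OpenSearchRelOptUtil.getOrderEquivalentInputInfo.

```java
/**
 * Hypothetical, conservative rule for when CAST(x AS DECIMAL(p2, s2)) from
 * DECIMAL(p1, s1) can be treated as order-equivalent to x. Illustration only.
 */
public class DecimalCastOrderSketch {

  /**
   * A cast is lossless (and therefore order-preserving) when the target keeps at least
   * as many integer digits (precision - scale) and at least as many fractional digits.
   */
  static boolean isLosslessWidening(int srcPrecision, int srcScale, int dstPrecision, int dstScale) {
    int srcIntegerDigits = srcPrecision - srcScale;
    int dstIntegerDigits = dstPrecision - dstScale;
    return dstIntegerDigits >= srcIntegerDigits && dstScale >= srcScale;
  }

  public static void main(String[] args) {
    // Same shape as the existing test: DECIMAL(p, s) -> DECIMAL(p + 4, s + 4)
    System.out.println(isLosslessWidening(10, 2, 14, 6));
    // Same shape as the suggested converse test: DECIMAL(p, s) -> DECIMAL(p - 4, s)
    System.out.println(isLosslessWidening(10, 2, 6, 2));
  }
}
```

Under this rule the widening cast is accepted and the narrowing cast is rejected, matching the assertFalse in the suggested test. Narrowing is not necessarily order-breaking (rounding is still non-decreasing), but source values may not fit in the narrower type, so rejecting it is the safer choice.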
yuancu merged commit 9f1ee08 into opensearch-project:main on Sep 4, 2025 (30 of 32 checks passed)
songkant-aws added a commit to songkant-aws/sql that referenced this pull request Sep 8, 2025
* Support pushdown sort by simple expressions

Signed-off-by: Songkan Tang <songkant@amazon.com>

* Fix IT for no pushdown case

Signed-off-by: Songkan Tang <songkant@amazon.com>

* Add minor case to allow sort pushdown for casted floating number

Signed-off-by: Songkan Tang <songkant@amazon.com>

* Fix the issue of using wrong fromCollation

Signed-off-by: Songkan Tang <songkant@amazon.com>

* Add some unit tests for OpenSearchRelOptUtil

Signed-off-by: Songkan Tang <songkant@amazon.com>

* Fix checkstyle

Signed-off-by: Songkan Tang <songkant@amazon.com>

---------

Signed-off-by: Songkan Tang <songkant@amazon.com>
LantaoJin pushed a commit that referenced this pull request Sep 11, 2025
… (#4243)

* Support pushdown sort by simple expressions (#4071)

* Support pushdown sort by simple expressions

Signed-off-by: Songkan Tang <songkant@amazon.com>

* Fix IT for no pushdown case

Signed-off-by: Songkan Tang <songkant@amazon.com>

* Add minor case to allow sort pushdown for casted floating number

Signed-off-by: Songkan Tang <songkant@amazon.com>

* Fix the issue of using wrong fromCollation

Signed-off-by: Songkan Tang <songkant@amazon.com>

* Add some unit tests for OpenSearchRelOptUtil

Signed-off-by: Songkan Tang <songkant@amazon.com>

* Fix checkstyle

Signed-off-by: Songkan Tang <songkant@amazon.com>

---------

Signed-off-by: Songkan Tang <songkant@amazon.com>

* Fix compile issue

Signed-off-by: Songkan Tang <songkant@amazon.com>

---------

Signed-off-by: Songkan Tang <songkant@amazon.com>
songkant-aws deleted the optimize-monotonic-rex-sort-expand branch on October 9, 2025 02:44

Labels

enhancement New feature or request

Development

Successfully merging this pull request may close these issues.

[FEATURE] Push down sort on RexCall of monotonic

3 participants