Enable sort optimization for all NumericTypes #6326

Closed
gashutos opened this issue Feb 15, 2023 · 2 comments
Assignees
Labels
enhancement (Enhancement or improvement to existing feature or request), Performance (This is for any performance related enhancements or bugs)

Comments

@gashutos
Contributor

gashutos commented Feb 15, 2023

Is your feature request related to a problem? Please describe.
Lucene provides a built-in, point-based sort optimization for all numeric sort types. As per this PR, we were able to enable the sort optimization for the following four types:

  1. DATE
  2. DATE_NANOSECONDS
  3. LONG
  4. DOUBLE

For the other types, we widen the sort type (e.g. from SHORT to LONG). Code. As a result, those types are still not optimized and take longer to return hits.

Describe the solution you'd like
Enable Lucene numeric sort optimization for all remaining types mentioned here.
Code

Describe alternatives you've considered
There is no alternative as of now.

Additional context
This will break existing indexed data, so we need to do a POC and determine the challenges of enabling the optimization where SortField.Type does not match Point.Type.

        // LUCENE-9280 added the ability for collectors to skip non-competitive
        // documents when top docs are sorted by other fields different from the _score.
        // However, from Lucene 9 onwards, numeric sort optimisation requires the byte size
        // for points (BKD index) and doc values (columnar) and SortField.Type to be matched.
        // NumericType violates this requirement
        // (see: https://github.com/opensearch-project/OpenSearch/issues/2063#issuecomment-1069358826 test failure)
        // because it uses the largest byte size (LONG) for the SortField of most types. The section below disables
        // the BKD based sort optimization for numeric types whose encoded BYTE size does not match the comparator (LONG).
        // So as of now, we can only enable for DATE, DATE_NANOSECONDS, LONG, DOUBLE.
        // todo : Enable other SortField.Type as well, that will require wider change
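
The byte-size rule described in the comment above can be sketched as a small standalone model. This is only an illustration, not OpenSearch or Lucene code: the `NumericKind` enum, its fields, and the `SortOptimizationSketch` class are hypothetical names, and the byte widths for SHORT and INTEGER (a 4-byte point widened to an 8-byte LONG comparator) are assumptions made for the example.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical model of the rule: the point-based sort optimization is only
// safe when the indexed point width equals the width the comparator expects.
enum NumericKind {
    // (bytes per indexed point value, bytes per value expected by the sort comparator)
    SHORT(4, 8),            // assumed: indexed as a 4-byte point, sorted as LONG
    INTEGER(4, 8),          // assumed: indexed as a 4-byte point, sorted as LONG
    LONG(8, 8),
    DATE(8, 8),
    DATE_NANOSECONDS(8, 8),
    DOUBLE(8, 8);

    final int pointBytes;
    final int sortBytes;

    NumericKind(int pointBytes, int sortBytes) {
        this.pointBytes = pointBytes;
        this.sortBytes = sortBytes;
    }

    // True when the point encoding and the comparator agree on byte size.
    boolean canOptimizeSortWithPoints() {
        return pointBytes == sortBytes;
    }
}

class SortOptimizationSketch {
    // Collects the kinds for which the optimization could stay enabled.
    static Set<NumericKind> optimizable() {
        EnumSet<NumericKind> result = EnumSet.noneOf(NumericKind.class);
        for (NumericKind kind : NumericKind.values()) {
            if (kind.canOptimizeSortWithPoints()) {
                result.add(kind);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("Optimizable: " + optimizable());
    }
}
```

Under these assumed widths, only the four types listed in this issue pass the check, while the widened types do not.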
@gashutos added the enhancement and untriaged labels Feb 15, 2023
@andrross
Member

However, from Lucene 9 onwards, numeric sort optimisation requires the byte size for points (BKD index) and doc values (columnar) and SortField.Type to be matched. NumericType violates this requirement (see: #2063 (comment) test failure) because it uses the largest byte size (LONG) for the SortField of most types.

@nknize Do you happen to know why OpenSearch/ES chose to use a LONG for the SortField for the smaller types? I suspect I'm not the only one who would like to know so it might be helpful to capture that here. (Or link to anywhere else that might explain it). Thanks!

@gashutos
Contributor Author

@nknize @reta I raised the above PR to enable sort optimization for all remaining types. Let me know if the approach looks good.

@gashutos self-assigned this Jun 28, 2023
@gashutos added the Performance label Jun 28, 2023

3 participants