Optimize sort on numeric long and date fields. #49732

Merged
merged 1 commit into elastic:7.x from backport-long-sort-opt
Nov 29, 2019

Conversation

mayya-sharipova
Contributor

This rewrites long sort as a DistanceFeatureQuery, which can
efficiently skip non-competitive blocks and segments of documents.
Depending on the dataset, the speedup can range from 2x to 10x.

The optimization can be disabled by setting the system property
es.search.rewrite_sort to false.
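
A minimal sketch of how such a kill switch could be read on the JVM side. The class and method names are hypothetical and are not taken from this PR. The default of `true` is an assumption based on the description saying the optimization "can be disabled". JVM system properties are normally passed on the command line as `-Des.search.rewrite_sort=false`.

```java
// Hypothetical helper, not the code from this PR: reads the flag named in the description.
final class SortRewriteFlag {
    static final String PROPERTY = "es.search.rewrite_sort";

    // Assumption: the optimization is on unless the property is explicitly set.
    // Boolean.parseBoolean returns true only for "true" (case-insensitive),
    // so any other explicit value disables the rewrite in this sketch.
    static boolean isEnabled() {
        return Boolean.parseBoolean(System.getProperty(PROPERTY, "true"));
    }
}
```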

The optimization is skipped when 50% or more of the data in an index
has the same value.
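
To make the 50% rule concrete, here is a rough sketch of the condition over a plain array of values. It is an exact count on in-memory data with hypothetical names and is only meant to show the threshold. How the PR actually detects duplicate data inside the index is not described here, so this should not be read as its implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the "50% or more data with the same value" condition.
final class DuplicateDataCheck {

    // Returns true if any single value accounts for at least half of the input.
    static boolean hasDominantValue(long[] values) {
        if (values.length == 0) {
            return false;
        }
        Map<Long, Integer> counts = new HashMap<>();
        for (long value : values) {
            int count = counts.merge(value, 1, Integer::sum);
            // Counts only grow, so we can stop as soon as one value reaches 50%.
            if (2L * count >= values.length) {
                return true;
            }
        }
        return false;
    }
}
```

With a check like this, `{1, 1, 2, 3}` would skip the optimization (the value 1 covers exactly half of the data), while `{1, 2, 3, 4}` would not.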

The optimization is done through:

  1. Rewriting the sort as a DistanceFeatureQuery, which can
    efficiently skip non-competitive blocks and segments of documents.

  2. Sorting segments according to the primary numeric sort field (Sort leaves on search according to the primary numeric sort field #44021).
    This allows non-competitive segments to be skipped.

  3. Using a collector manager.
    When we optimize the sort, we sort segments by their min/max value.
    As a collector expects to have segments in order,
    we cannot use a single collector for sorted segments.
    We use a collectorManager, which creates a dedicated collector
    for every segment.

  4. Using Lucene's shared TopFieldCollector manager.
    This collector manager can exchange the minimum competitive
    score between collectors, which allows us to efficiently skip
    whole segments that don't contain competitive scores.

  5. When an index is force merged to a single segment, interleaving
    old and new segments (Add a new merge policy that interleaves old and new segments on force merge #48533)
    allows for this optimization as well,
    as blocks with non-competitive docs can be skipped.
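
The interplay of steps 2 to 4 can be illustrated with a small, self-contained sketch. It uses plain Java collections only: `Segment`, `Result`, and `topKAscending` are hypothetical stand-ins, not Lucene or Elasticsearch classes, and the single shared heap stands in for the minimum competitive value that the collectors exchange. The sketch assumes an ascending sort and shows why visiting segments in order of their minimum value lets later segments be skipped entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative only: plain Java stand-ins, not Lucene or Elasticsearch classes.
final class SegmentSkippingSketch {

    /** Hypothetical stand-in for a segment: its long values plus their minimum. */
    static final class Segment {
        final long[] values;
        final long min;

        Segment(long... values) {
            this.values = values;
            this.min = Arrays.stream(values).min().orElse(Long.MAX_VALUE);
        }
    }

    /** Top-k values in ascending order, plus how many segments were never scanned. */
    static final class Result {
        final long[] top;
        final int skippedSegments;

        Result(long[] top, int skippedSegments) {
            this.top = top;
            this.skippedSegments = skippedSegments;
        }
    }

    static Result topKAscending(List<Segment> segments, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        // Step 2: visit segments in order of their minimum sort value.
        List<Segment> ordered = new ArrayList<>(segments);
        ordered.sort(Comparator.comparingLong((Segment s) -> s.min));

        // Max-heap of the current top-k; its head is the worst competitive value,
        // shared across all segments (the role of the shared collector manager).
        PriorityQueue<Long> heap = new PriorityQueue<>(Comparator.reverseOrder());
        int skipped = 0;
        for (Segment segment : ordered) {
            // A segment whose best value cannot beat the current worst hit is skipped whole.
            if (heap.size() == k && segment.min >= heap.peek()) {
                skipped++;
                continue;
            }
            for (long value : segment.values) {
                if (heap.size() < k) {
                    heap.add(value);
                } else if (value < heap.peek()) {
                    heap.poll();
                    heap.add(value);
                }
            }
        }
        long[] top = heap.stream().mapToLong(Long::longValue).sorted().toArray();
        return new Result(top, skipped);
    }
}
```

For example, with segments `{50, 60, 70}`, `{1, 2, 3}`, `{10, 20}`, `{100, 200}` and `k = 3`, the segment holding `{1, 2, 3}` is visited first, fills the queue, and the remaining three segments are skipped without being scanned.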

Backport for #48804

Co-authored-by: Jim Ferenczi <jim.ferenczi@elastic.co>

@mayya-sharipova added the backport, v7.6.0, and :Search/Search (Search-related issues that do not fall into other categories) labels on Nov 29, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-search (:Search/Search)

@mayya-sharipova merged commit 7cf1708 into elastic:7.x on Nov 29, 2019
@mayya-sharipova deleted the backport-long-sort-opt branch on November 29, 2019 20:37