Slice query for Point in time readers (PIT) #65740

Closed
jimczi opened this issue Dec 2, 2020 · 1 comment · Fixed by #74457
Labels
>enhancement :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team


jimczi commented Dec 2, 2020

Slice queries that run under a PIT (point-in-time) reader should filter documents by their internal Lucene document ID. When all slices share the same PIT, filtering on Lucene document IDs should be much more efficient than the current TermsSliceQuery, which relies on the _id field.
If slices prove more efficient inside a PIT, we could also deprecate their use in scrolls, but that can be done in a follow-up.
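
To make the contrast concrete, here is a toy sketch of the two partitioning strategies. It is not Elasticsearch's or Lucene's actual code: the function names, the contiguous-range scheme, and the MD5 hash are all assumptions chosen for illustration. The point it tries to show is that a doc-ID slice is plain arithmetic on an integer, while an `_id`-based slice has to hash every term value. Doc-ID slicing only makes sense when every slice sees the same reader, because internal document IDs are not stable across readers.

```python
import hashlib


def slice_by_doc_id(doc_id, max_doc, num_slices):
    """Toy scheme (assumption): map an internal doc ID in [0, max_doc)
    to a slice using contiguous ranges. No field data is read."""
    return doc_id * num_slices // max_doc


def slice_by_id_field(id_value, num_slices):
    """Toy scheme (assumption): hash the `_id` value and take it modulo
    the slice count. Every `_id` term has to be visited and hashed."""
    digest = hashlib.md5(id_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_slices


def partition_by_doc_id(max_doc, num_slices):
    """Group all doc IDs of a hypothetical reader into their slices."""
    slices = [[] for _ in range(num_slices)]
    for doc_id in range(max_doc):
        slices[slice_by_doc_id(doc_id, max_doc, num_slices)].append(doc_id)
    return slices


if __name__ == "__main__":
    for slice_id, docs in enumerate(partition_by_doc_id(10, 3)):
        print(slice_id, docs)
```

In this sketch the slices are disjoint and together cover every doc ID, which is the property a real slice query needs as well.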

@jimczi jimczi added >enhancement :Search/Search Search-related issues that do not fall into other categories labels Dec 2, 2020
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Dec 2, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@matriv matriv self-assigned this Dec 7, 2020
@jtibshirani jtibshirani self-assigned this Jun 10, 2021
jtibshirani added a commit that referenced this issue Jul 8, 2021
This PR adds support for using the `slice` option in point-in-time searches. By
default, the slice query splits documents based on their Lucene ID. This
strategy is more efficient than the one used for scrolls, which is based on the
`_id` field and must iterate through the whole terms dictionary. When slicing a
search, the same point-in-time ID must be used across slices to guarantee the
partitions don't overlap or miss documents.

Closes #65740.
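
As a rough illustration of the usage described above, the sketch below builds one search request body per slice, all sharing a single point-in-time ID. The helper name `build_sliced_pit_requests`, the placeholder PIT ID, and the `match_all` query are assumptions for the example; only the `slice` (`id`, `max`) and `pit` (`id`) body fields come from the search API. No request is sent, so the bodies can be inspected or passed to whatever client is in use.

```python
import json


def build_sliced_pit_requests(pit_id, max_slices, query):
    """Hypothetical helper: return one search body per slice.

    Every body carries the same PIT ID, so all slices partition the
    same view of the data and should neither overlap nor miss documents.
    """
    if max_slices < 2:
        # A single slice would not partition anything, so this sketch rejects it.
        raise ValueError("max_slices must be at least 2")
    return [
        {
            "slice": {"id": slice_id, "max": max_slices},
            "query": query,
            "pit": {"id": pit_id},
        }
        for slice_id in range(max_slices)
    ]


if __name__ == "__main__":
    # "example-pit-id" is a placeholder, not a real point-in-time ID.
    bodies = build_sliced_pit_requests("example-pit-id", 2, {"match_all": {}})
    for body in bodies:
        print(json.dumps(body))
```

Each body would typically be sent as its own search request, possibly in parallel, with the results combined by the caller.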
jtibshirani added a commit that referenced this issue Jul 9, 2021