Correctly handle duplicates in unordered interval matching #49775

Closed
wants to merge 7 commits into from

romseygeek
Contributor

Currently, unordered interval matching does not check for duplicates,
which means that a query for "to be or not to be" can match a document
that contains only the phrase "to be or not": the second "to be" matches
at the same positions as the first, and the AND interval algorithm does not
check for overlaps. This is counter-intuitive.
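
To make the failure mode concrete, here is a toy model of the two behaviours. It is only a sketch under simplifying assumptions: it ignores interval widths, gaps, and everything else Lucene's real interval iterators do, and the class and method names (`NaiveUnorderedMatchSketch`, `naiveMatch`, `distinctMatch`) are hypothetical, not part of Elasticsearch or Lucene.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: not the Lucene interval algorithm.
class NaiveUnorderedMatchSketch {

    // Models matching without a duplicate check: every query term only needs
    // to occur somewhere, so two identical terms can share one position.
    static boolean naiveMatch(List<String> doc, List<String> query) {
        for (String term : query) {
            if (!doc.contains(term)) {
                return false;
            }
        }
        return true;
    }

    // Models the intuitive behaviour: each occurrence of a query term must
    // claim its own occurrence in the document.
    static boolean distinctMatch(List<String> doc, List<String> query) {
        Map<String, Integer> available = new HashMap<>();
        for (String token : doc) {
            available.merge(token, 1, Integer::sum);
        }
        for (String term : query) {
            int left = available.getOrDefault(term, 0);
            if (left == 0) {
                return false; // no unclaimed occurrence remains
            }
            available.put(term, left - 1);
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("to", "be", "or", "not", "to", "be");
        List<String> shortDoc = Arrays.asList("to", "be", "or", "not");
        System.out.println(naiveMatch(shortDoc, query));    // true
        System.out.println(distinctMatch(shortDoc, query)); // false
    }
}
```

In this model the short document passes the naive check because the second "to" and "be" reuse the positions already claimed by the first pair, which is the surprising match described above.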

This commit adds a check to the interval builder: if it finds duplicates
when combining sources into an unordered AND, it first combines those
duplicates into an ORDERED interval, so "to be or not to be" becomes
UNORDERED(ORDERED(to, to), ORDERED(be, be), or, not)
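
The rewrite can be sketched on plain strings. This is an illustration of the grouping step only, not the code in this PR: it does not touch the real interval builder or any Lucene classes, and `DuplicateIntervalSketch` and `combineUnordered` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch: builds a textual description of the combined interval.
class DuplicateIntervalSketch {

    static String combineUnordered(List<String> terms) {
        // Count occurrences, keeping first-seen order of the distinct terms.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String term : terms) {
            counts.merge(term, 1, Integer::sum);
        }

        List<String> sources = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() == 1) {
                // Unique term: used directly as a source.
                sources.add(entry.getKey());
            } else {
                // Duplicated term: wrap its repeats in an ORDERED group first.
                List<String> repeats = Collections.nCopies(entry.getValue(), entry.getKey());
                sources.add("ORDERED(" + String.join(", ", repeats) + ")");
            }
        }

        // A single remaining source needs no UNORDERED wrapper.
        if (sources.size() == 1) {
            return sources.get(0);
        }
        return "UNORDERED(" + String.join(", ", sources) + ")";
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("to", "be", "or", "not", "to", "be");
        // Prints: UNORDERED(ORDERED(to, to), ORDERED(be, be), or, not)
        System.out.println(combineUnordered(terms));
    }
}
```

Because each ORDERED group requires its repeats to appear at successive, distinct positions, a document with a single "to" can no longer satisfy both occurrences of the term.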

romseygeek added the >bug, :Search/Search, v8.0.0, and v7.6.0 labels on Dec 2, 2019
romseygeek self-assigned this on Dec 2, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-search (:Search/Search)

Contributor

@jimczi jimczi left a comment

LGTM

@romseygeek
Contributor Author

This really needs to be handled in Lucene, as this solution doesn't correctly handle internal gaps in intervals with repeats. I've opened https://github.com/apache/lucene-solr/pull/1097/files

3 participants