Fuzziness support in intervals query #49595

Closed
consulthys opened this issue Nov 26, 2019 · 9 comments · Fixed by #49762
Labels
>feature :Search/Search Search-related issues that do not fall into other categories

Comments

@consulthys
Contributor

consulthys commented Nov 26, 2019

Feature request:

Following up on this discussion:

ES 7.0 introduced support for the Lucene intervals query, which is more powerful and easier to work with than span queries (analyzer support, etc.). However, one thing that span queries support and the intervals query doesn't is fuzziness. Since the intervals query is supposed to help with legal and patent search, I have a hard time understanding how that is possible without fuzziness support.

I could not find much information about this when browsing the GitHub issues. Is there any reason why the intervals query doesn't support fuzziness (perhaps because Lucene doesn't)? Is it on the roadmap?
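For context, here is a minimal sketch of the span-query approach referred to above, in which `span_multi` wraps a fuzzy query so that it can take part in a positional `span_near` clause. The field name `claims`, the sample terms, and the helper name `fuzzy_span_near` are assumptions made for illustration; the script only builds and prints the request body.

```python
import json


def fuzzy_span_near(field, fuzzy_term, exact_term, slop=2):
    """Build a span_near request body whose first clause matches fuzzily."""
    return {
        "query": {
            "span_near": {
                "clauses": [
                    # span_multi turns a multi-term query (here: fuzzy)
                    # into a span clause that can be checked by position.
                    {
                        "span_multi": {
                            "match": {
                                "fuzzy": {
                                    field: {
                                        "value": fuzzy_term,
                                        "fuzziness": "AUTO",
                                    }
                                }
                            }
                        }
                    },
                    {"span_term": {field: exact_term}},
                ],
                "slop": slop,
                "in_order": True,
            }
        }
    }


# Hypothetical field and terms; "patnet" is a deliberate misspelling.
body = fuzzy_span_near("claims", "patnet", "infringement")
print(json.dumps(body, indent=2))
```

The request asks for an equivalent of the fuzzy clause inside the intervals query, so that the analyzer support of intervals does not have to be traded for fuzziness.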

Quoting @jimczi, who suggested opening a new issue so that this feature request could be discussed:

> I think it's worth opening an issue in Elasticsearch and we'll discuss where the support should land (Elasticsearch or Lucene). As @Mikhail_Khludnev suggested, it should be easy to make fuzzy intervals in Elasticsearch using the MultiTermIntervalsSource, except that it is not exposed 😉. Queries that need to check positions cannot handle a large number of multi-terms, so we have some logic to restrict them to those that expand to fewer terms than a provided threshold (bounded to 1024). With this protection in place I don't see why we should not expose them more simply in Lucene.

The floor is yours, guys!

@mattweber
Contributor

This is one of the reasons I created #49519. Even if this is not something you (Elastic) want to maintain and/or put restrictions on, please allow us to do it ourselves via plugins.

jimczi added the :Search/Search and >feature labels Nov 26, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-search (:Search/Search)

@jimczi
Contributor

jimczi commented Nov 26, 2019

If we have the same restriction as for prefix and wildcard on intervals (we throw an error if the maximum number of expansions is reached), I don't see why we could not have the support in core. @romseygeek what do you think?

@romseygeek
Contributor

Fuzziness is slightly trickier than prefix or wildcard because we don't just select the first n matching terms from the index; we also rank them by how close they are to the original term, so I think this may require some work in Lucene as well. But +1 to adding it to core; it will be a generally useful thing to support.
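The difference described above can be sketched roughly as follows. This is an illustration only, not Lucene's implementation: it uses a plain Levenshtein distance over a small in-memory term list, and the names `edit_distance`, `prefix_expansions`, `fuzzy_expansions`, and `TERMS` are all hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance using a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def prefix_expansions(terms, prefix, limit):
    """Prefix-style expansion: first `limit` matches in sorted term order."""
    return [t for t in sorted(terms) if t.startswith(prefix)][:limit]


def fuzzy_expansions(terms, target, max_edits, limit):
    """Fuzzy-style expansion: rank by closeness, then keep the top `limit`."""
    scored = sorted(
        (edit_distance(t, target), t)
        for t in terms
    )
    return [t for d, t in scored if d <= max_edits][:limit]


# Hypothetical term dictionary for a single field.
TERMS = ["patent", "patents", "patient", "parent", "potent", "pattern"]

print(prefix_expansions(TERMS, "pat", 3))
print(fuzzy_expansions(TERMS, "patent", 2, 3))
```

In the prefix case the cut-off depends only on term order, whereas in the fuzzy case every candidate has to be scored before the closest ones can be kept, which is why truncating the expansion is less straightforward.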

@jimczi
Contributor

jimczi commented Nov 27, 2019

> Fuzziness is slightly trickier than prefix or wildcard because we don't just select the first n matching terms from the index

The current strategy for multi-term interval queries is to throw an error if the number of expanded terms is greater than the provided threshold. I think we should apply the same to fuzzy queries, to ensure that we don't miss results due to pruning?
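A minimal sketch of that "fail rather than prune" strategy, assuming the matching terms arrive as a plain iterable. The exception class and function name are hypothetical and do not correspond to Elasticsearch or Lucene classes.

```python
class TooManyExpansionsError(ValueError):
    """Raised when a multi-term rule expands past the allowed threshold."""


def expand_or_fail(matching_terms, max_expansions):
    """Collect every matching term, or fail loudly instead of pruning."""
    expanded = []
    for term in matching_terms:
        if len(expanded) >= max_expansions:
            # Silently dropping terms here could hide valid matches,
            # so the whole query is rejected instead.
            raise TooManyExpansionsError(
                f"rule expands to more than {max_expansions} terms"
            )
        expanded.append(term)
    return expanded


print(expand_or_fail(["patent", "patents"], max_expansions=2))
```

With this approach the caller either gets the complete expansion or an error, never a silently truncated result set.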

@prasad2kin

> This is one of the reasons I created #49519. Even if this is not something you (Elastic) want to maintain and/or put restrictions on, please allow us to do it ourselves via plugins.

Good to see you @mattweber. Have you developed any such plugin that is publicly available?

@mattweber
Contributor

@prasad2kin Hi! No, I have not, because intervals are not currently pluggable. I opened #49519 to make that possible, but I am not sure whether it will get merged.

@consulthys
Contributor Author

Thanks @romseygeek !!
Any idea when this will get merged and in which release?

@jimczi
Contributor

jimczi commented Dec 13, 2019

@consulthys you can check the version in the PR; it is currently targeted for 7.6.

romseygeek added a commit that referenced this issue Jan 3, 2020
This intervals source will return terms that are similar to an input term, up to
an edit distance defined by fuzziness, similar to FuzzyQuery.

Closes #49595
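As a rough illustration of how the resulting `fuzzy` intervals rule might be used, the sketch below builds a request body that combines a fuzzy term with an exact match in an ordered `all_of` rule. The field name `claims`, the sample terms, and the helper name are assumptions; the parameter names follow the intervals query documentation as best understood here and should be checked against the release actually in use.

```python
import json


def fuzzy_intervals_query(field, fuzzy_term, exact_text, max_gaps=2):
    """Build an intervals request body mixing a fuzzy rule and a match rule."""
    return {
        "query": {
            "intervals": {
                field: {
                    "all_of": {
                        "ordered": True,
                        "max_gaps": max_gaps,
                        "intervals": [
                            # Terms within the edit distance implied by
                            # `fuzziness` of the given term.
                            {"fuzzy": {"term": fuzzy_term, "fuzziness": "auto"}},
                            {"match": {"query": exact_text}},
                        ],
                    }
                }
            }
        }
    }


# Hypothetical field and terms; "patnet" is a deliberate misspelling.
body = fuzzy_intervals_query("claims", "patnet", "infringement")
print(json.dumps(body, indent=2))
```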
SivagurunathanV pushed a commit to SivagurunathanV/elasticsearch that referenced this issue Jan 23, 2020
This intervals source will return terms that are similar to an input term, up to
an edit distance defined by fuzziness, similar to FuzzyQuery.

Closes elastic#49595