Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explicit minimum_should_match number support for Terms Set Query #94095

Closed
kulinsj opened this issue Feb 23, 2023 · 13 comments
Closed

Explicit minimum_should_match number support for Terms Set Query #94095

kulinsj opened this issue Feb 23, 2023 · 13 comments
Labels
>enhancement good first issue low hanging fruit help wanted adoptme :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team

Comments

@kulinsj
Copy link

kulinsj commented Feb 23, 2023

Description

The Terms Set Query lets you match documents that have some minimum number of matches to a given array of input search terms. The minimum number of matches however can only be specified by referencing another field on the document or by using a script.

I have a use case where I need to be able to specify the minimum number of terms to match at query time, agnostic of documents so the minimum_should_match_field is not useful. Further, since I know I need to match a minimum of exactly, say, 3 terms, it seems like overkill to have to enable and using scripting (as in the use of minimum_should_match_script) to just return a static number.

Today as an alternative, I'm instead using a Bool Query with a minimum_should_match accepting my explicit number and all of my input search terms mapping to an array of individual { term: { field: value } } entries in the should array of the bool. However, this is very json-inefficient, especially when I am making queries with many (possibly thousands) of terms in my query.

I was so happy to find the Terms Set query as a leaner, more specialized alternative to my massive should array but was then very surprised that there was no query-time input for the minimum match count (outside scripting).

TL;DR: Terms Set query should support a third possible top-level parameter for specifying the minimum number of terms to match which simply accepts a number, like minimum_should_match

@kulinsj kulinsj added >enhancement needs:triage Requires assignment of a team area label labels Feb 23, 2023
@inqueue inqueue added the :Search/Search Search-related issues that do not fall into other categories label Feb 24, 2023
@elasticsearchmachine elasticsearchmachine added Team:Search Meta label for search team and removed needs:triage Requires assignment of a team area label labels Feb 24, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@ritwikbd
Copy link

Is the accepted answer on this link useful for you?

@kulinsj
Copy link
Author

kulinsj commented Feb 27, 2023

Is the accepted answer on this link useful for you?

That's pretty much exactly what I describe as the alternative I'm using now in my original post: using a bool query with should and n term queries in the should array - but my issue is that while that works, it's very JSON inefficient, especially for queries with many [many] terms leading to a lot of extra overhead in sending and receiving the query (not to mention being more cumbersome to read if you every need to analyse the entire formatted query).

@benwtrent
Copy link
Member

The main concern I have is around optimizations. We attempt to optimize the terms queries of certain field types.

Any minimum_should_match supplied in the terms query would almost certainly remove any potential optimizations and would have to be naive translation to a pure boolean disjunction.

@kulinsj
Copy link
Author

kulinsj commented Feb 28, 2023

To be clear, I'm talking about the Terms Set query which is distinct from the Terms Query. The Terms Set query already supports a minimum match specification in two other formats - I don't see how this would be any less optimized than those? If anything it should be more performant than the existingly supported minimum_should_match_field and minimum_should_match_script since it wouldn't require an additional field lookup or script execution on every document.

@javanna javanna added good first issue low hanging fruit help wanted adoptme labels May 3, 2023
@Kiriakos1998
Copy link
Contributor

Hello @javanna @benwtrent, can I pick it up? My understanding is that a new parameter is needed in the terms set query to specify what're the minimum matches in order to return a document.
GET /job-candidates/_search { "query": { "terms_set": { "programming_languages": { "terms": [ "c++", "java", "php" ], "minimum_should_match": "2" } } } }
For example, this query will return all documents that have at least two of the provided languages in their programming_languages field.

@javanna
Copy link
Member

javanna commented May 10, 2023

hi @Kiriakos1998 sounds good to me.

@Kiriakos1998
Copy link
Contributor

Hello @javanna @benwtrent , i have opened a pull request.

@Utsavk
Copy link

Utsavk commented Jul 21, 2023

Hey @javanna @benwtrent ,

Is the currently opened pull request still in consideration or can I pick up this task? I am a first time contributor.

@benwtrent
Copy link
Member

@Utsavk we should allow @Kiriakos1998 to finish their work. I added a review there.

Thank for your interest in the issue!

benwtrent added a commit that referenced this issue Jul 28, 2023
Support minimum_should_match for terms_set query.

Closes(#94095)

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
felixbarny pushed a commit to felixbarny/elasticsearch that referenced this issue Aug 3, 2023
Support minimum_should_match for terms_set query.

Closes(elastic#94095)

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
@shin96
Copy link

shin96 commented Sep 25, 2023

If the issues is fixed can we mark this as closed?

@cbuescher
Copy link
Member

Good point, auto close didn't work here.

@PemLer
Copy link

PemLer commented Sep 27, 2023

Hi @benwtrent, may I ask which version we can use feature?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement good first issue low hanging fruit help wanted adoptme :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team
Projects
None yet
Development

No branches or pull requests