Partial phrase matching suggestion #34960
Comments
To further clarify, the following searches would also partially phrase match that document:
Pinging @elastic/es-search-aggs
@mohmad-null sorry, but I don't quite follow your ask here. Phrase queries are intended to match exact phrases, and they even have the "slop" factor to allow a certain degree of fuzzy matching. What you describe should already be covered by general "match" queries, possibly in conjunction with "span" queries if you need the terms to be in a certain order. Could you elaborate a bit more on why those don't satisfy your needs?
My suggestion is based around the fact that in real-world use, a search query can have extraneous words beyond just a phrase itself. For example, in
As far as I can tell from reading the docs, there's no way to do this with ES, but I'm happy to be proven wrong.
Match query - As far as I can see from reading around it, there is no proximity/distance component that gets factored into the score; match is simply a "bag of words" search. That's what the docs explicitly say anyway - https://www.elastic.co/guide/en/elasticsearch/guide/current/proximity-matching.html
Span query - I did look into it, but the docs suggest it's for a very niche use where you have a lot of knowledge of the search terms, the documents, and the structure therein. "These are typically used to implement very specific queries on legal documents or patents." - I'm seeking to use this on a general search engine.
As such, a modification of phrase query seemed like the logical way to go.
@cbuescher - And while it may indeed be possible to do this with My suggestion is to simply add this to the high-level query so folks don't need to figure out the low-level span queries (which don't seem to be well documented beyond the basic API - there doesn't seem to be any overarching guidance or explanation of how/why to use them or what problems they solve, beyond repeated references to "specialized fields like patent searches").
We do plan to add a proximity boost option to the match query, although I don't think there's an open issue for it. The basic idea is to do an interval query over all terms in the match, which will score higher the closer the terms are together as a whole, but I can see an argument for adding interval queries over each consecutive pair of terms as well. We'd want to be careful to avoid expanding the query too much though.
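For reference, a commonly used way to get some proximity influence today is to wrap a match query in a bool query and add a match_phrase clause with a generous slop as an optional should clause. The sketch below only builds the request body as a Python dict; the field name `body`, the helper name `match_with_proximity_boost`, and the slop value of 50 are assumptions for illustration, and this is not the planned proximity boost option itself. Note that the should clause only contributes when every query term occurs within the slop window, so it would not reward "south europe" alone when "trees" is missing.

```python
import json


def match_with_proximity_boost(field, text, slop=50):
    """Build a bool query body (hypothetical helper, not an ES API).

    The `must` match clause decides which documents are returned;
    the `should` match_phrase clause only adds to the score when all
    terms appear close together (within `slop` positions).
    """
    return {
        "query": {
            "bool": {
                "must": {"match": {field: {"query": text}}},
                "should": {
                    "match_phrase": {field: {"query": text, "slop": slop}}
                },
            }
        }
    }


# `body` is an assumed field name; replace it with the real text field.
query = match_with_proximity_boost("body", "south europe trees")
print(json.dumps(query, indent=2))
```

The printed JSON can be sent as the body of a search request against the index in question.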
Cases like this can probably also be solved with n-gram shingles (e.g. 2- and 3-grams cover a lot of phrase-like expressions in English). For starters, https://www.elastic.co/blog/searching-with-shingles gives a rough idea of what to do.
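To illustrate why shingles help here, this is a rough sketch in plain Python of what word 2- and 3-gram shingles look like for the query and document from this issue. The `shingles` function is a hypothetical stand-in that lowercases and splits on whitespace; in Elasticsearch the shingles would be produced by the analyzer at index and search time, not by client code like this.

```python
def shingles(text, sizes=(2, 3)):
    """Return the set of word n-grams of the given sizes (simplified tokenizer)."""
    tokens = text.lower().split()
    out = set()
    for n in sizes:
        for i in range(len(tokens) - n + 1):
            out.add(" ".join(tokens[i:i + n]))
    return out


doc = "South Europe Lorem ipsum dolor sit amet"
query = "south europe trees"

# Shingles present in both the query and the document.
shared = shingles(query) & shingles(doc)
print(shared)  # {'south europe'}
```

Only the "south europe" shingle is shared, so a match query on a shingled field would still find the document even though the full three-word phrase is absent.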
It would be nice if you could do partial phrase matching. I don't mean like the phrase_prefix.
Consider the below query:
south europe trees
You run this as a match_phrase against a text field with this value:
South Europe Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua
This will find nothing, as the entire sequence of terms does not appear in the text field.
What I would like is for there to be a "partial" flag which when enabled would allow this to return the partial result as "South Europe" does appear in the text field. The score would be commensurately lower of course.
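To make the requested behaviour concrete, here is a small plain-Python sketch of one way a "partial" phrase match could be scored: find the longest run of consecutive query terms that also appears consecutively in the text, and score it as a fraction of the full query length. The function names and the scoring formula are assumptions made up for illustration; this is not how Elasticsearch scores anything, and the whitespace tokenizer is a simplification.

```python
def longest_partial_phrase(query, text):
    """Longest run of consecutive query terms found consecutively in text."""
    q = query.lower().split()
    d = text.lower().split()
    best = []
    for i in range(len(q)):
        for j in range(len(d)):
            k = 0
            # Extend the run while query and document terms keep agreeing.
            while i + k < len(q) and j + k < len(d) and q[i + k] == d[j + k]:
                k += 1
            if k > len(best):
                best = q[i:i + k]
    return best


def partial_phrase_score(query, text):
    """Fraction of the query covered by the longest partial phrase (0.0 to 1.0)."""
    terms = query.split()
    if not terms:
        return 0.0
    return len(longest_partial_phrase(query, text)) / len(terms)


text = "South Europe Lorem ipsum dolor sit amet"
print(longest_partial_phrase("south europe trees", text))  # ['south', 'europe']
print(partial_phrase_score("south europe trees", text))
```

With the example above, two of the three query terms form a matching run, so the score is lower than the 1.0 a full phrase match would get.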