Feature request: return only matching terms from Search/Query API #17045
Comments
Hi @basiclaser. What you are suggesting looks easy, but it soon becomes complex once you work with the full query DSL. A fuzzy query might match thousands of terms. Which ones do you choose to return to the user? How do you apply the logic of the bool query to what is returned? Why do you say that the suggesters are limited here? The phrase suggester seems like just the thing you need. It handles fuzziness but, more importantly, it finds the best related terms. Perhaps what you're looking for instead is a form of query expansion, e.g. run a query, get the top 5 matching docs, retrieve the most interesting terms from those docs, then rerun the query with those terms?
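The query-expansion loop suggested above could be sketched roughly as follows. This is a toy, in-memory illustration only: `interesting_terms` and `expand_query` are hypothetical helper names, the documents are plain strings standing in for the top hits of a first search, and "most interesting" is approximated by how many of the top documents contain a term, which is an assumption rather than anything Elasticsearch does.

```python
from collections import Counter


def interesting_terms(top_docs, query_terms, limit=3):
    """Rank non-query terms by how many of the top documents contain them.

    Document frequency within the top hits is a naive stand-in for
    "interesting"; it is an assumption made for this sketch.
    """
    counts = Counter(
        term
        for doc in top_docs
        for term in set(doc.lower().split())  # count each term once per doc
        if term not in query_terms
    )
    # Sort by count (descending), then alphabetically so ties are stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:limit]]


def expand_query(query_terms, top_docs, limit=3):
    """Return the original terms plus the extra terms to rerun the query with."""
    known = {term.lower() for term in query_terms}
    return list(query_terms) + interesting_terms(top_docs, known, limit)
```

In a real setup the strings passed as `top_docs` would come from the `_source` of the first search's top hits, and the expanded term list would feed a second search.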
Hi Clinton, thanks for your advice and ponderings. In my particular use case there is no fuzziness, though I feel this feature should be available for any type of query. Elasticsearch is after all an analytics engine too, right? If somebody wants to use ES as a means of analysing and returning terms, should that not simply be available? Not providing it, I think, limits the way people think about Elasticsearch. Thanks again!
I'd find this very useful as well. It opens the door to all kinds of analysis to help with data quality.
@elastic/es-search-aggs
@basiclaser unfortunately, there is no simple way to do it. Because of its distributed nature and the potentially very large number of results in a single search, Elasticsearch performs searches in two phases. First, in the QUERY phase, we find the top relevant results on each shard. During this phase we have information about why each individual record was found, but we discard it immediately after determining whether the record matches the query, since it would be too cumbersome and expensive to keep around. After each shard sends its top records to the coordinating node, the coordinating node determines the global top records and the shards they came from, and asks those shards to retrieve the records in the FETCH phase. Unfortunately, during the FETCH phase we no longer have the information that we discovered, used and discarded during the QUERY phase. So, to figure out why a record was found we have to either 1) preserve this information from the QUERY phase or 2) rediscover it later on. The first option is currently implemented by the named queries mechanism, which might work for you if you can represent each term you care about as an individual named query. The second option is implemented by highlighting, which can work with the original query as well as with arbitrary queries. The current highlighting functionality is pretty limited for this purpose at the moment, but as you can see in the discussion in #29631, we have some plans to expand it in the future. Since the most likely approach for this issue will be to follow the highlighting plugin path rather than a general search feature, I am going to close this issue.
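For readers who want to try the named-queries route, a minimal sketch might look like the following. It assumes each term of interest is wrapped in its own `term` query carrying a `_name`, and that the search response lists those names under `matched_queries` on each hit. The helper names `build_named_terms_query` and `matched_terms`, and the field name used with them, are hypothetical; the code only builds and reads plain dictionaries and does not talk to a cluster.

```python
def build_named_terms_query(field, terms):
    """Build a search body with one named term query per term of interest."""
    return {
        "query": {
            "bool": {
                "should": [
                    # `_name` tags the clause so hits can report that it matched.
                    {"term": {field: {"value": term, "_name": term}}}
                    for term in terms
                ]
            }
        }
    }


def matched_terms(response):
    """Collect the names of all named queries reported across the hits."""
    found = set()
    for hit in response.get("hits", {}).get("hits", []):
        found.update(hit.get("matched_queries", []))
    return sorted(found)
```

The body returned by `build_named_terms_query` would be sent to the search endpoint as usual, and `matched_terms` applied to the parsed JSON response. This only scales to a modest, known list of terms, which is the limitation described above.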
Describe the feature:
Queries should have the option to return just a collection of matching terms (not the documents containing the matching terms). This would combine the power and flexibility of queries with the output of suggesters.
Example
Let's assume the following data is indexed:
If I query the following:
The response object contains "krylov", not the entire field or document containing "krylov".
This allows users to construct more powerful queries to find terms, in lieu of using comparatively limited suggesters.
EDIT: to clarify, I'm aware that something similar is achievable with highlights, but that seems wasteful, less predictable and a bit hacky.
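For context, the highlight-based workaround amounts to something like the sketch below. It assumes the search was run with highlighting enabled and the default `<em>`/`</em>` tags, so that each hit carries a `highlight` object mapping field names to lists of fragments. The helper name `highlighted_terms` is hypothetical, and the code works on an already parsed response dictionary rather than calling a cluster.

```python
import re

# Default highlighter tags are assumed; adjust if pre_tags/post_tags are customised.
EM_TAG = re.compile(r"<em>(.*?)</em>")


def highlighted_terms(response):
    """Extract the distinct highlighted terms from a search response."""
    terms = set()
    for hit in response.get("hits", {}).get("hits", []):
        # `highlight` maps each field name to a list of text fragments.
        for fragments in hit.get("highlight", {}).values():
            for fragment in fragments:
                terms.update(match.lower() for match in EM_TAG.findall(fragment))
    return sorted(terms)
```

The extra work is visible here: the documents still have to be fetched and highlighted, and the terms are then parsed back out of marked-up text.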