Feature request: return only matching terms from Search/Query API #17045

Closed
made-by-chris opened this issue Mar 10, 2016 · 5 comments
Labels
discuss >feature :Search/Search Search-related issues that do not fall into other categories

Comments

@made-by-chris

Describe the feature:

Queries should have the option to return just a collection of matching terms (not the documents containing the matching terms). This would combine the power and flexibility of queries with the output of suggesters.

Example

Let's assume the following data is indexed:

{title: 'Krylov subspace methods'},
{title: 'SUPG methods for finite differences'},

If I run the following query:

{
    "fuzzy" : {
        "title" : "krylo"
    }
}

The response object should then contain only "krylov", not the entire field or document containing "krylov".

This allows users to construct more powerful queries to find terms, in lieu of using comparatively limited suggesters.

EDIT: To clarify, I'm aware that something similar is achievable with highlights, but that seems wasteful, less predictable, and a bit hacky.
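
For readers unfamiliar with the highlight workaround mentioned above, here is a minimal Python sketch of what it tends to look like on the client side. It assumes the search request asked for highlighting on `title` and kept the default `<em>`/`</em>` tags; the helper name `matched_terms` and the sample response are hypothetical and hand-written, not real Elasticsearch output.

```python
import re

# Default Elasticsearch highlighter tags (assumed unchanged here;
# they can be overridden with pre_tags/post_tags in the request).
EM_PATTERN = re.compile(r"<em>(.*?)</em>")


def matched_terms(response, field):
    """Collect distinct highlighted fragments for `field`, lowercased,
    in first-seen order."""
    terms = []
    for hit in response.get("hits", {}).get("hits", []):
        for fragment in hit.get("highlight", {}).get(field, []):
            for term in EM_PATTERN.findall(fragment):
                term = term.lower()
                if term not in terms:
                    terms.append(term)
    return terms


# Hand-written response in the general shape of a highlighted search result.
sample_response = {
    "hits": {
        "hits": [
            {"_id": "1", "highlight": {"title": ["<em>Krylov</em> subspace methods"]}},
        ]
    }
}

print(matched_terms(sample_response, "title"))
```

Note that the highlighter marks up the original field text, not the indexed term, so the sketch lowercases the fragments as a rough stand-in for analysis. That mismatch is part of why this route feels less predictable.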

@clintongormley

Hi @basiclaser

What you are suggesting looks easy, but when you are working with the full query DSL it soon becomes complex. A fuzzy query might return thousands of terms. Which ones do you choose to return to the user? How do you apply the logic of the bool query to what is returned?

Why do you say that the suggesters are limited here? The phrase suggester seems like just the thing you need. It handles fuzziness, but more importantly it finds the best related terms.

Perhaps what you're looking for instead is a form of query expansion, e.g. run a query, get the top 5 matching docs, retrieve the most interesting terms from those docs, then rerun the query with those terms?
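
The query expansion idea above can be illustrated with a toy, in-memory Python sketch. Nothing here talks to Elasticsearch: the corpus, the `search` and `expand` helpers, and the use of raw term frequency as a stand-in for "most interesting terms" are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy corpus; the first two titles come from the issue.
DOCS = [
    "Krylov subspace methods",
    "SUPG methods for finite differences",
    "Krylov solvers for sparse linear systems",
]


def tokenize(text):
    return text.lower().split()


def search(terms, docs, size=5):
    """Rank docs by how many query terms they contain; drop non-matches."""
    scored = [(sum(t in tokenize(d) for t in terms), d) for d in docs]
    scored = [(s, d) for s, d in scored if s > 0]
    return [d for _, d in sorted(scored, key=lambda p: -p[0])[:size]]


def expand(terms, docs, extra=2):
    """Add the most frequent new terms from the top hits to the query.

    Plain frequency is a crude proxy for "interesting"; a real setup
    would use something better than a raw count.
    """
    top = search(terms, docs)
    counts = Counter(t for d in top for t in tokenize(d) if t not in terms)
    return list(terms) + [t for t, _ in counts.most_common(extra)]


expanded = expand(["krylov"], DOCS)
print(expanded)
print(search(expanded, DOCS))
```

The second `search` call is the "rerun" step: it can only match at least as many documents as the first, because every added term was taken from a document that already matched.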

@made-by-chris
Author

Hi Clinton, thanks for your advice and ponderings.
I can clarify that I definitely do want whichever terms match my query. I appreciate that this could be achieved by extracting the relevant terms from the document, but that feels a bit like climbing through a window to open a door. I'd rather just open the door.

In my particular use case there is no fuzziness, though I feel that this should be available for any type of query.

Elasticsearch is after all an analytics engine too, right? If somebody is looking to use ES as a means of analysing and returning terms, should that not simply be available? To not provide it, I think, limits the way people think about Elasticsearch.
Would you happen to know if this functionality is available in Lucene? If there is some technical limitation preventing this feature I'd love to hear about it too.

Thanks again!

@bkazez

bkazez commented Jan 1, 2018

I'd find this very useful as well. It opens the door to all kinds of analysis to help with data quality.

@clintongormley clintongormley added :Search/Search Search-related issues that do not fall into other categories and removed :Query DSL labels Feb 14, 2018
@javanna
Member

javanna commented Mar 16, 2018

@elastic/es-search-aggs

@imotov
Contributor

imotov commented Oct 1, 2018

> I appreciate that this could be achieved by extracting the relevant terms from the document but that feels a bit like climbing through a window to open a door. I'd rather just open the door.

@basiclaser unfortunately, there is no simple way to do it. Because of its distributed nature and the potentially very large number of results in a single search, Elasticsearch performs searches in two phases. First, in the QUERY phase, we find the top relevant results on each shard. During this phase we have information about why we found each individual record, but we discard it immediately after determining whether the result matches the query, since it would be too cumbersome and expensive to keep this information around. After each shard sends its top records to the coordinating node, the coordinating node determines the global top records and the shards these records came from, and asks those shards to retrieve the records in the FETCH phase. Unfortunately, during the FETCH phase we no longer have the information that we discovered, used, and discarded during the QUERY phase.
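
As a rough illustration of the two phases described above, here is a toy Python simulation. It is not how Elasticsearch is implemented: the `SHARDS` layout, the overlap-count scoring, and the function names are all hypothetical. The point is only to show where the "which terms matched" information exists and where it is lost.

```python
import heapq

# Hypothetical toy shards: doc id -> title.
SHARDS = {
    "shard0": {"a": "Krylov subspace methods", "b": "SUPG methods for finite differences"},
    "shard1": {"c": "Krylov solvers", "d": "Finite element methods"},
}


def query_phase(docs, terms, size):
    """Score each doc on one shard and return only (score, doc_id) pairs.

    The set of matching terms is known inside this loop but is not
    returned, mirroring how that information is dropped once a doc is
    known to match.
    """
    scored = []
    for doc_id, title in docs.items():
        matched = set(terms) & set(title.lower().split())
        if matched:
            scored.append((len(matched), doc_id))
    return heapq.nlargest(size, scored)


def search(terms, size=2):
    # QUERY phase: every shard reports its local top hits.
    candidates = [
        (score, shard, doc_id)
        for shard, docs in SHARDS.items()
        for score, doc_id in query_phase(docs, terms, size)
    ]
    # The coordinating node merges them into the global top hits.
    top = heapq.nlargest(size, candidates)
    # FETCH phase: only the owning shards are asked for the documents.
    return [{"_id": doc_id, "_source": SHARDS[shard][doc_id]} for _, shard, doc_id in top]


results = search(["krylov", "methods"])
print(results)
```

By the time `search` builds the final hits, all it has is a score, a shard, and a doc id, so returning the matching terms would mean either carrying `matched` through the merge or recomputing it per fetched document.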

So, to figure out why a record was found we have to either 1) preserve this information from the QUERY phase or 2) rediscover it later on. The first option is currently implemented using the named queries mechanism, which might work for you if you can represent each term you care about as an individual named query. The second option is implemented using highlighting, which can work with the original query as well as arbitrary queries. The current highlighting functionality is pretty limited for this purpose at the moment, but as you can see in the discussion in #29631, we have some plans to expand it in the future.
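
A minimal Python sketch of the named queries option follows, assuming the candidate terms are known up front. It builds one `term` clause per candidate with a `_name`, then reads the `matched_queries` array that Elasticsearch attaches to each hit. The helper names and the sample response are hypothetical and hand-written; sending the request to a cluster is left out.

```python
def named_terms_query(field, terms):
    """Build a bool query with one named term query per candidate term."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"term": {field: {"value": t, "_name": t}}} for t in terms
                ],
                "minimum_should_match": 1,
            }
        }
    }


def terms_that_matched(response):
    """Union of the `matched_queries` names across all hits, sorted."""
    names = set()
    for hit in response.get("hits", {}).get("hits", []):
        names.update(hit.get("matched_queries", []))
    return sorted(names)


body = named_terms_query("title", ["krylov", "supg"])

# Hand-written response in the general shape of a search result (not real output).
sample_response = {
    "hits": {
        "hits": [
            {"_id": "1", "matched_queries": ["krylov"]},
        ]
    }
}

print(terms_that_matched(sample_response))
```

The obvious limitation is that every term of interest has to be enumerated in the request, so this does not cover the fuzzy case from the original example, where the matching terms are not known in advance.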

Since the most likely approach for this issue will be to follow the highlighting plugin path rather than a general search feature, I am going to close this issue.

@imotov imotov closed this as completed Oct 1, 2018
6 participants