Feature request: return only matching terms from Search/Query API #17045

made-by-chris · 2016-03-10T10:58:02Z

Describe the feature:

Queries should have the option to return just a collection of matching terms (not the documents containing those terms). This would combine the power and flexibility of queries with the output of suggesters.

Example

Let's assume the following data is indexed:

{title: 'Krylov subspace methods'},
{title: 'SUPG methods for finite differences'},

If I query the following:

{
    "fuzzy" : {
        "title" : "krylo"
    }
}

The response object would contain just "krylov", not the entire field or document containing "krylov".

This allows users to construct more powerful queries to find terms, in lieu of using comparatively limited suggesters.

EDIT: to clarify I'm aware that something similar is achievable with the use of highlights, but that seems wasteful, less predictable and a bit hacky.
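For reference, the highlight workaround mentioned above could look roughly like the following Python sketch. It assumes the default `<em>` highlight tags and the usual shape of a search response; the helper name `terms_from_highlights` is made up for illustration, and `sample_response` is hand-written, not real Elasticsearch output.

```python
import re

# Request body: the fuzzy query from the example, plus highlighting on "title".
request_body = {
    "query": {"fuzzy": {"title": "krylo"}},
    "highlight": {"fields": {"title": {}}},
}

# Assumption: highlights use the default <em>...</em> tags.
EM_TAG = re.compile(r"<em>(.*?)</em>")


def terms_from_highlights(response, field):
    """Collect the distinct highlighted terms for one field (hypothetical helper)."""
    terms = set()
    for hit in response.get("hits", {}).get("hits", []):
        for fragment in hit.get("highlight", {}).get(field, []):
            terms.update(match.lower() for match in EM_TAG.findall(fragment))
    return sorted(terms)


# Hand-written stand-in for a search response, trimmed to the relevant keys.
sample_response = {
    "hits": {
        "hits": [
            {"_id": "1", "highlight": {"title": ["<em>Krylov</em> subspace methods"]}},
            {"_id": "2", "highlight": {"title": ["On <em>Krylov</em> solvers"]}},
        ]
    }
}

print(terms_from_highlights(sample_response, "title"))
```

This still fetches and re-analyzes the matching documents just to recover the terms, which is the overhead the request is trying to avoid.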

clintongormley · 2016-03-14T09:57:06Z

Hi @basiclaser

What you are suggesting looks easy, but when you are working with the full query DSL it soon becomes complex. A fuzzy query might return thousands of terms. Which ones do you choose to return to the user? How do you apply the logic of the bool query to what is returned?

Why do you say that the suggesters are limited here? The phrase suggester seems like just the thing you need. It handles fuzziness, but more importantly it finds the best related terms.

Perhaps what you're looking for instead is a form of query expansion, e.g. run a query, get the top 5 matching docs, retrieve the most interesting terms from those docs, then rerun the query with those terms?

made-by-chris · 2016-03-15T19:40:27Z

Hi clinton, thanks for your advice and ponderings.
I can clarify that I definitely do want whichever terms match my query. I appreciate that this could be achieved by extracting the relevant terms from the document but that feels a bit like climbing through a window to open a door. I'd rather just open the door.

In my particular use-case there is no fuzziness, though I feel that this feature should be an available feature for whatever type of query.

Elasticsearch is after all an analytics engine too, right? If somebody is looking to use ES as a means of analysing and returning terms, should that not simply be available? To not provide it, I think, limits the way people think about Elasticsearch.
Would you happen to know if this functionality is available in Lucene? If there is some technical limitation preventing this feature I'd love to hear about it too.

Thanks again!

bkazez · 2018-01-01T12:42:46Z

I'd find this very useful as well. It opens the door to all kinds of analysis to help with data quality.

javanna · 2018-03-16T11:12:39Z

@elastic/es-search-aggs

imotov · 2018-10-01T15:34:34Z

> I appreciate that this could be achieved by extracting the relevant terms from the document but that feels a bit like climbing through a window to open a door. I'd rather just open the door.

@basiclaser unfortunately, there is no simple way to do it. Due to its distributed nature and the potentially very large number of results in a single search, Elasticsearch performs searches in two phases. First, in the QUERY phase, we find the top relevant results on each shard. During this phase we have information about why we found each individual record, but we discard it immediately after determining whether the record matches the query, since it would be too cumbersome and expensive to keep this information around. After each shard sends its top records to the coordinating node, the coordinating node determines the global top records and the shards they came from, and asks those shards to retrieve the records in the FETCH phase. Unfortunately, during the FETCH phase we no longer have the information that we discovered, used and discarded during the QUERY phase.
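The two phases described above can be illustrated with a toy Python model. This is only a sketch of the idea, not how Elasticsearch is implemented: the in-memory `shards` list, the `score` function and the other names are all invented for the example.

```python
import heapq

# Toy "shards": each maps a doc id to its title (made-up data).
shards = [
    {"a1": "Krylov subspace methods", "a2": "Finite element basics"},
    {"b1": "SUPG methods for finite differences", "b2": "Krylov methods in practice"},
]


def score(title, term):
    """Naive score: how many times the term occurs in the title."""
    return title.lower().split().count(term)


def query_phase(shard, term, size):
    """Per-shard top hits as (score, doc_id); the reason for the match is not kept."""
    scored = [(score(title, term), doc_id) for doc_id, title in shard.items()]
    return heapq.nlargest(size, [pair for pair in scored if pair[0] > 0])


def search(term, size):
    # QUERY phase: every shard reports only scores and doc ids.
    candidates = []
    for shard_index, shard in enumerate(shards):
        for hit_score, doc_id in query_phase(shard, term, size):
            candidates.append((hit_score, shard_index, doc_id))
    # The coordinating step picks the global top hits.
    top = heapq.nlargest(size, candidates)
    # FETCH phase: documents are loaded by id; nothing says which term matched.
    return [
        {"_id": doc_id, "_score": hit_score, "title": shards[shard_index][doc_id]}
        for hit_score, shard_index, doc_id in top
    ]


results = search("krylov", size=2)
for hit in results:
    print(hit)
```

In this model the fetched hits carry an id, a score and the document, but no record of the matching term, which mirrors the limitation described above.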

So, to figure out why a record was found we have to either 1) preserve this information from the QUERY phase or 2) rediscover it later. The first option is currently implemented by the named queries mechanism, which might work for you if you can represent each term you care about as an individual named query. The second option is implemented by highlighting, which can work with the original query as well as arbitrary queries. The current highlighting functionality is pretty limited for this purpose at the moment, but as you can see in the discussion in #29631, we have some plans to expand it in the future.
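As a rough illustration of the named queries option, the Python sketch below builds one named `term` query per candidate term and reads the `matched_queries` array from each hit. It assumes you already know the candidate terms up front; the helper names and the sample response are invented for the example, and the response is hand-written rather than real output.

```python
def build_named_terms_query(field, terms):
    """One named term query per candidate term (hypothetical helper)."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"term": {field: {"value": term, "_name": term}}}
                    for term in terms
                ],
                "minimum_should_match": 1,
            }
        }
    }


def matched_terms(response):
    """Map each hit id to the names of the queries that matched it."""
    return {
        hit["_id"]: sorted(hit.get("matched_queries", []))
        for hit in response["hits"]["hits"]
    }


body = build_named_terms_query("title", ["krylov", "supg"])

# Hand-written stand-in for a search response, trimmed to the relevant keys.
sample_response = {
    "hits": {
        "hits": [
            {"_id": "1", "matched_queries": ["krylov"]},
            {"_id": "2", "matched_queries": ["supg"]},
        ]
    }
}

print(matched_terms(sample_response))
```

This only covers terms you can enumerate in advance, so it does not help with a fuzzy query whose expansions are unknown until it runs.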

Since the most likely approach for this issue will be to follow the highlighting plugin path rather than a general search feature, I am going to close this issue.

clintongormley added discuss :Query DSL labels Mar 14, 2016

clintongormley added :Search/Search Search-related issues that do not fall into other categories and removed :Query DSL labels Feb 14, 2018

colings86 added the >feature label Apr 24, 2018

imotov closed this as completed Oct 1, 2018
