Query: Provide an option to analyze wildcard/prefix in query_string / field queries #787

Closed
emedina opened this Issue Mar 17, 2011 · 1 comment

@emedina

Add a flag called analyze_wildcard to both query_string and field queries. Once set, a best effort will be made to analyze wildcard and prefix queries as well.
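As a rough illustration of how the proposed flag might be passed, here is a minimal sketch that only builds a request body. The helper name `query_string_body` and the field name `text` are made up for this example; `analyze_wildcard` is the flag named in this issue.

```python
import json


def query_string_body(query, field, analyze_wildcard=True):
    """Build a search request body for a query_string query.

    `field` is a placeholder; use whatever field your mapping defines.
    """
    return {
        "query": {
            "query_string": {
                "default_field": field,
                "query": query,
                # The flag proposed in this issue: analyze wildcard/prefix terms too.
                "analyze_wildcard": analyze_wildcard,
            }
        }
    }


body = query_string_body("*phone*", "text")
print(json.dumps(body, indent=2))
```

The printed JSON would be sent as the body of a search request; leaving the flag off keeps the existing behaviour.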

More details:

When we use an analyzer that stems terms into tokens and later search those analyzed terms with a wildcard, the search terms are not analyzed by default, because that analysis could produce several tokens and the search engine could not be sure which one to use:

http://www.jguru.com/faq/view.jsp?EID=538312

However, in certain circumstances, where that risk can be constrained in exchange for better expected results, it would be nice to tell the search engine to analyze the wildcard terms before executing the search, allowing for a more precise search (or at least a more expected one).

Here is an example using the Spanish analyzer (which uses the Snowball stemmer):

  • We index the phrase "I have an iPhone"
  • We index the phrase "I love the triad iPad/iPhone/iPod"
  • We index the phrase "I found the perfect combination: iPhone/MP3"

With the current default 'query_string' behaviour, searching for "*phone*" returns only the last phrase, because of the way the terms have been analyzed:

"I have an iPhone":

{"tokens":[{"token":"i","start_offset":0,"end_offset":1,"type":"","position":1},{"token":"hav","start_offset":2,"end_offset":6,"type":"","position":2},{"token":"an","start_offset":7,"end_offset":9,"type":"","position":3},{"token":"iphon","start_offset":10,"end_offset":16,"type":"","position":4}]}

"I love the triad iPad/iPhone/iPod":

{"tokens":[{"token":"i","start_offset":0,"end_offset":1,"type":"","position":1},{"token":"lov","start_offset":2,"end_offset":6,"type":"","position":2},{"token":"the","start_offset":7,"end_offset":10,"type":"","position":3},{"token":"tri","start_offset":11,"end_offset":16,"type":"","position":4},{"token":"ipad","start_offset":17,"end_offset":21,"type":"","position":5},{"token":"iphon","start_offset":22,"end_offset":28,"type":"","position":6},{"token":"ipod","start_offset":29,"end_offset":33,"type":"","position":7}]}

"I found the perfect combination: iPhone/MP3":

{"tokens":[{"token":"i","start_offset":0,"end_offset":1,"type":"","position":1},{"token":"found","start_offset":2,"end_offset":7,"type":"","position":2},{"token":"the","start_offset":8,"end_offset":11,"type":"","position":3},{"token":"perfect","start_offset":12,"end_offset":19,"type":"","position":4},{"token":"combination","start_offset":20,"end_offset":31,"type":"","position":5},{"token":"iphone/mp3","start_offset":33,"end_offset":43,"type":"","position":6}]}

See how the last one keeps "iPhone/MP3" as the single token "iphone/mp3", with the trailing 'e' of "iphone" intact? Hence this is the only phrase matching a 'query_string' equal to "*phone*" (and similar 'unexpected' results occur when using just one leading or trailing wildcard as well).
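The mismatch can be reproduced offline with a small sketch. It copies the tokens from the listings above and uses Python's `fnmatch` as a stand-in for wildcard matching against indexed terms; this is only an approximation for illustration, not how the search engine evaluates wildcard queries internally.

```python
from fnmatch import fnmatchcase

# Tokens copied from the analysis output above, keyed by the original phrase.
indexed = {
    "I have an iPhone": ["i", "hav", "an", "iphon"],
    "I love the triad iPad/iPhone/iPod": [
        "i", "lov", "the", "tri", "ipad", "iphon", "ipod",
    ],
    "I found the perfect combination: iPhone/MP3": [
        "i", "found", "the", "perfect", "combination", "iphone/mp3",
    ],
}


def matching_phrases(pattern, index):
    """Return the phrases that have at least one token matching the wildcard pattern."""
    return [
        phrase
        for phrase, tokens in index.items()
        if any(fnmatchcase(token, pattern) for token in tokens)
    ]


# The unanalyzed wildcard term is compared against the stemmed tokens.
hits = matching_phrases("*phone*", indexed)
print(hits)  # only the "iPhone/MP3" phrase
```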

This result would be disappointing for the user, as she'd expect at least something like "iPhone" or even "telephone" to be returned as a result, but due to the fact that the Spanish analyzer removes the trailing 'e' from most words, it won't find them.

So the enhancement would be to provide a mechanism, for instance in the form of a parameter on the 'query_string', that would tell the ES query parser to analyze search terms that carry wildcards (i.e. either enclosed completely, or with just a leading or trailing wildcard).

Following our previous example, a 'query_string' for "*phone*" would actually be analyzed by the Spanish analyzer as "*phon*", therefore returning all the phrases previously indexed, which would be the expected and reasonable behaviour from a user's perspective. Of course, it could have some side effects on other searches, but as a parameter, it would be up to the search designer to use it or not.
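One way to picture the requested behaviour is the sketch below. It splits a term on its wildcard characters, runs each literal piece through an analyzer, and reassembles the pattern. The function `toy_stem` is a hypothetical stand-in that only lowercases and drops one trailing 'e'; it is not the Snowball Spanish stemmer, and the whole block is an assumption about how a best-effort analysis could work, not the implementation that closed this issue.

```python
import re
from fnmatch import fnmatchcase


def toy_stem(term):
    """Hypothetical analyzer stand-in: lowercase and drop a single trailing 'e'."""
    term = term.lower()
    return term[:-1] if len(term) > 1 and term.endswith("e") else term


def analyze_wildcard_term(term, analyze=toy_stem):
    """Analyze the literal pieces of a wildcard term, leaving '*' and '?' untouched."""
    # The capturing group keeps the wildcard characters in the split result.
    parts = re.split(r"([*?])", term)
    return "".join(p if p in ("*", "?", "") else analyze(p) for p in parts)


# Tokens copied from the three analyzed phrases in the example.
token_lists = [
    ["i", "hav", "an", "iphon"],
    ["i", "lov", "the", "tri", "ipad", "iphon", "ipod"],
    ["i", "found", "the", "perfect", "combination", "iphone/mp3"],
]

pattern = analyze_wildcard_term("*phone*")
matched = [
    tokens for tokens in token_lists
    if any(fnmatchcase(token, pattern) for token in tokens)
]
print(pattern, len(matched))
```

With the analyzed pattern, every phrase in the example has a matching token, which is the outcome the request describes.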

@kimchy
elastic member

Query: Provide an option to analyze wildcard/prefix in query_string / field queries, closed by 25124b0.

@kimchy kimchy closed this Mar 17, 2011
@deinspanjer deinspanjer pushed a commit that referenced this issue Apr 28, 2011
@kimchy kimchy Query: Provide an option to analyze wildcard/prefix in query_string / field queries, closes #787.
25124b0
@jprante jprante added a commit to jprante/elasticsearch that referenced this issue Nov 10, 2014
@jprante jprante Query: add option for analyze wildcard/prefix also to simple_query_string query

The query_string query has an option for analyzing wildcard/prefix (#787) by a best effort approach.

This adds `analyze_wildcard` option also to simple_query_string.

The default is set to `false` so the existing behavior of simple_query_string is unchanged.
49f2aab
@dakrone dakrone added a commit that referenced this issue Nov 11, 2014
@jprante jprante Query: add option for analyze wildcard/prefix also to simple_query_string query
8aa64c6
@dakrone dakrone added a commit that referenced this issue Nov 11, 2014
@jprante jprante Query: add option for analyze wildcard/prefix also to simple_query_string query
a8f2de8
@jprante jprante added a commit to jprante/elasticsearch that referenced this issue Mar 13, 2015
@jprante jprante Query: add option for analyze wildcard/prefix also to simple_query_string query

Signed-off-by: Jörg Prante <joergprante@gmail.com>
d1d203e