Root search analyzer doesn't act as default for fields #3102

Closed
gakhov opened this issue May 28, 2013 · 7 comments
Labels
:Search/Mapping Index mappings, including merging and defining field types

Comments

gakhov commented May 28, 2013

Recently we moved from version 0.19.11 to the latest stable release, 0.90.0, and found a strange behavior that looks like a bug.

If we specify a custom index_analyzer and search_analyzer (or just analyzer) at the root level of the item type mapping, the index_analyzer works as expected but the search_analyzer does not. Also, if we update the existing mapping and explicitly specify search_analyzer at the field level, it still doesn't seem to take effect and ES uses the standard analyzer.

To reproduce:

Create a new index and define our custom analyzer de_stem:

curl -XPUT 'http://localhost:9200/issue/' -d '{"index": {"number_of_shards": 1,"analysis": {"filter": {"de_snowball": {"type": "snowball","language": "German"}},"analyzer": {"de_stem": {"type": "custom","tokenizer": "standard","filter": ["lowercase", "de_snowball"]}}},"number_of_replicas": 0}}'

Put a mapping that specifies index_analyzer and search_analyzer:

curl -XPUT 'http://localhost:9200/issue/item/_mapping' -d '{"item": {"index_analyzer" : "de_stem","search_analyzer" : "de_stem","properties": {"content": {"dynamic": false,"properties": {"body": {"type": "string"}}}}}}'

Check the search_analyzer for the field content.body with the Analyze API:

curl -XGET 'localhost:9200/issue/_analyze?pretty=true&field=content.body' -d 'Apple'

Actual result:

{
  "tokens" : [ {
    "token" : "apple",
    "start_offset" : 0,
    "end_offset" : 5,
    "type" : "<ALPHANUM>",
    "position" : 1
  } ]
}

Expected result:

{
  "tokens" : [ {
    "token" : "appl",
    "start_offset" : 0,
    "end_offset" : 5,
    "type" : "<ALPHANUM>",
    "position" : 1
  } ]
}

The expected result can still be obtained, but only by specifying the analyzer explicitly:

curl -XGET 'localhost:9200/issue/_analyze?pretty=true&field=content.body&analyzer=de_stem' -d 'Apple'

Index analyzer is set correctly

As shown above, search_analyzer does not seem to have been set, but index_analyzer works as expected.

Let's index a document:

curl -XPUT 'http://localhost:9200/issue/item/1' -d '{"content" : {"body": "10 Things We Hate About Apple"}}'

If index_analyzer was correctly set to de_stem, the word Apple should be indexed as appl, not as apple (which is what the standard analyzer would produce).

Let's search for appl first:

curl -XGET 'http://localhost:9200/issue/_search?search_type=count&pretty=true' -d '{"query":{"query_string":{"fields":["content.body"],"query":"appl"}}}'

It works! We get back 1 result:

  "hits" : {
    "total" : 1,
    "max_score" : 0.0,
    "hits" : [ ]
  }

For the word apple, as expected, the search finds nothing: the search analyzer is standard while the index analyzer is de_stem, so the search term stays apple but the indexed term is appl:

curl -XGET 'http://localhost:9200/issue/_search?search_type=count&pretty=true' -d '{"query":{"query_string":{"fields":["content.body"],"query":"apple"}}}'

  "hits" : {
    "total" : 0,
    "max_score" : 0.0,
    "hits" : [ ]
  }

Specifying search analyzer with Put Mapping API doesn't help

OK, now I try to update the mapping on the index created above and explicitly specify the search analyzer for the content.body field:

curl -XPUT 'http://localhost:9200/issue/item/_mapping' -d '{"item": {"properties": {"content": {"dynamic": false,"properties": {"body": {"type": "string", "search_analyzer": "de_stem"}}}}}}'

The response is OK, but all the problems described above remain. So it seems the search_analyzer for the field content.body is still standard.

gakhov commented May 31, 2013

The general issue is still present in the new 0.90.1 release. The only change is that updating search_analyzer on an existing mapping at runtime now seems to work.

s1monw commented May 31, 2013

hey @gakhov we will look into this soon hopefully. Thanks for reporting it!

ghost assigned spinscale May 31, 2013

kimchy commented Jun 1, 2013

I have quickly looked into it, and I can see why it happens. The reason is that the analyzers (index/search) are associated with types, so when using APIs that do not directly use a type, the analyzer can't be derived automatically (just based on the field name).

In the above test case, the document gets indexed properly (because it has the type). When using the analyze API, the type information is not there, so it can't derive the analyzer (and we don't expose the ability to provide a type in the analyze API).

In the search case, the search is executed on the index as a whole, so again the analyzer can't be derived based on the type. On the other hand, if it is executed explicitly on the type (item), things will work well:

curl -XGET 'http://localhost:9200/issue/item/_search?search_type=count&pretty=true' -d '{"query":{"query_string":{"fields":["content.body"],"query":"apple"}}}'

This behavior mainly comes from the fact that it gets tricky to support multiple types with different definitions (like analysis) when executing across all types (such as executing against the index without specifying the type explicitly). But I admit it's confusing. It requires some thinking about how to improve the behavior, if that is even possible, but I just wanted to post this here to explain the logic of how things work now.
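The difference between the two request paths described in this comment can be sketched as follows. This is only an illustration, assuming the `issue` index and `item` type from the report and a node on localhost:9200; `search_url` is a hypothetical helper, not part of Elasticsearch. The script prints the two curl commands rather than running them.

```shell
#!/bin/sh
# Hypothetical helper: build the _search URL for an index,
# optionally scoped to a single type.
search_url() {
  index_name="$1"
  doc_type="$2"   # may be empty for an index-wide search
  base="http://localhost:9200"
  if [ -n "$doc_type" ]; then
    printf '%s/%s/%s/_search' "$base" "$index_name" "$doc_type"
  else
    printf '%s/%s/_search' "$base" "$index_name"
  fi
}

# The query body is identical in both cases; only the URL differs.
body='{"query":{"query_string":{"fields":["content.body"],"query":"apple"}}}'

# Index-wide search: per the explanation above, the type-level
# search_analyzer cannot be derived here.
echo "curl -XGET '$(search_url issue)?search_type=count&pretty=true' -d '$body'"

# Type-scoped search: the analyzers defined on the item type can be resolved.
echo "curl -XGET '$(search_url issue item)?search_type=count&pretty=true' -d '$body'"
```

According to the comment, only the second command should match the document indexed earlier in the report.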

gakhov commented Jun 2, 2013

Ah, thank you @kimchy! I hadn't figured that out. This behaviour was a big problem for our application, and we temporarily solved it by specifying the search analyzer at query time.
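The exact query used for this workaround is not shown in the thread. One way it might look, assuming the `analyzer` parameter of the `query_string` query is what was meant, is sketched below; `query_body` is a hypothetical helper, and the script only prints the resulting curl command.

```shell
#!/bin/sh
# Hypothetical helper: build a query_string body that names the
# analyzer explicitly instead of relying on the mapping.
# Arguments: field, query text, analyzer name.
query_body() {
  printf '{"query":{"query_string":{"fields":["%s"],"query":"%s","analyzer":"%s"}}}' "$1" "$2" "$3"
}

# Index-wide search for "apple", analyzed with de_stem at query time.
echo "curl -XGET 'http://localhost:9200/issue/_search?search_type=count&pretty=true' -d '$(query_body content.body apple de_stem)'"
```

The trade-off is that every query has to carry the analyzer name, which is what the root-level search_analyzer was meant to avoid.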

It makes perfect sense to me to specify the item type when indexing, but I expected ElasticSearch to guess the type at search time: the type isn't actually required for searching, and I have only one type, so it should be easy to guess.

Actually, I thought that when I update the mapping, ElasticSearch goes through all fields and sets the default analyzers on any field that doesn't set them explicitly. Then at search time every field would have an explicit analyzer (exactly as if I had set the analyzers manually in the mapping), and ElasticSearch wouldn't need to guess the type at all, since every field has its own analyzer.

P.S. In this situation I would expect either a clear exception from ElasticSearch or correct resolution of the item type based on the field, since failing silently makes debugging very hard.

One more comment: it seems that in the analyze API I can't specify the type name, so there is no way to check how a query is analyzed (unless I specify the analyzer explicitly), and this feels like an issue too.

spinscale removed their assignment Jul 18, 2014
@clintongormley

This should be fixable once we have #4081 done.

clintongormley added the :Search/Mapping label and removed the help wanted and adoptme labels Nov 29, 2014
@clintongormley

As part of #4081, we should remove the type-level analysis settings.

@clintongormley

Closing in favour of #8874
