Highlighting does not work when all fields are type keyword #21636

Closed
nostrebor opened this issue Nov 17, 2016 · 4 comments

@nostrebor

commented Nov 17, 2016

Elasticsearch version: 5.0

Plugins installed: N/A

JVM version: 1.8u112

OS version: Windows Server 2012

Description of the problem including expected versus actual behavior: Highlighting does not work as expected on dynamic fields that are mapped as keyword. When searching over _all with require_field_match: false, I would expect highlighting to occur on every field of the result. An example query can be found here:

http://pastebin.com/sAGczFhU

In my use case, nearly every search is either for an exact value or over _all. I could get large performance gains by mapping dynamically created fields as keyword and, where full-text search is needed, defining those fields explicitly in the mapping. However, search highlighting is still an important part of our workflow.

Am I misunderstanding how highlighting works? My interpretation is that:

  1. We search over _all
  2. Hits are selected for highlighting as a postprocessing step
  3. Each field is matched against the original search query if require_field_match is set to false and the highlight_query option is unused.
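The three steps above can be sketched in Java under a strongly simplified model (an assumption, not how Elasticsearch is implemented): the query is reduced to a list of terms, each tagged with the field it targets, and `require_field_match` only decides which of those terms are candidates for a given highlighted field. `HighlightTermSelection`, `QueryTerm` and `termsFor` are hypothetical names, not Elasticsearch classes.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of how a highlighter could choose which
 * query terms to look for in a given field. Not Elasticsearch's actual code.
 */
final class HighlightTermSelection {

    /** A query term together with the field it targets, e.g. ("_all", "kostya"). */
    static final class QueryTerm {
        final String field;
        final String text;

        QueryTerm(String field, String text) {
            this.field = field;
            this.text = text;
        }
    }

    /**
     * Returns the term texts to try to match in highlightField.
     * requireFieldMatch = true keeps only terms that target that field;
     * requireFieldMatch = false keeps terms targeting any field (such as _all).
     */
    static List<String> termsFor(String highlightField, List<QueryTerm> queryTerms,
                                 boolean requireFieldMatch) {
        List<String> selected = new ArrayList<>();
        for (QueryTerm t : queryTerms) {
            if (!requireFieldMatch || t.field.equals(highlightField)) {
                selected.add(t.text);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // A match query on _all for "Kostya Test", assumed to be analyzed into two lowercase terms.
        List<QueryTerm> query = List.of(new QueryTerm("_all", "kostya"), new QueryTerm("_all", "test"));
        System.out.println(termsFor("message", query, true));  // []
        System.out.println(termsFor("message", query, false)); // [kostya, test]
    }
}
```

Selecting the terms is only half of step 3: the selected terms still have to match the tokens produced by the field's own analysis, which is the point raised in the first reply.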

Steps to reproduce:

  1. Create a new index with a mapping that sets dynamic fields to type keyword
  2. Add three new documents with "message": "Kostya Test"
  3. Run the previous query (fixing the date range)

Expected: highlighted search text, extracted from the _source fields that are loaded at highlight time
Actual: No text is highlighted

The same happens when the search is an exact match for the field: no highlighting is done then either.

@jimczi

Member

commented Nov 17, 2016

Hi @nostrebor,
Please ask questions like these on the discussion forum: https://discuss.elastic.co/
We reserve Github for issues and feature requests.
Regarding your issue, each field is highlighted using its own analyzer. Since you're using a keyword analyzer, the only query that can be highlighted is the entire value of the field, so it would only work with the query "Kostya Test".
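To make the analyzer point concrete, here is a minimal Java sketch assuming two hypothetical, heavily simplified analyzers: a keyword-style one that emits the whole value as a single token, and a word-style one that splits on whitespace and lowercases (a rough stand-in for standard analysis, not Lucene's). `KeywordHighlightDemo` and its methods are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical stand-ins for two analyzers. Real Lucene analyzers are far more
 * involved; these only illustrate the token shapes they produce.
 */
final class KeywordHighlightDemo {

    /** Keyword-style analysis: the whole value becomes a single, unchanged token. */
    static List<String> keywordTokens(String value) {
        return List.of(value);
    }

    /** Rough approximation of standard analysis: split on whitespace and lowercase. */
    static List<String> wordTokens(String value) {
        List<String> tokens = new ArrayList<>();
        for (String part : value.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    /** In this model, a field can be highlighted only if some query term equals one of its tokens. */
    static boolean highlights(List<String> fieldTokens, List<String> queryTerms) {
        for (String term : queryTerms) {
            if (fieldTokens.contains(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> field = keywordTokens("Kostya Test");                     // one token: "Kostya Test"
        System.out.println(highlights(field, wordTokens("Kostya")));           // false
        System.out.println(highlights(field, wordTokens("Kostya Test")));      // false: terms were lowercased and split
        System.out.println(highlights(field, keywordTokens("Kostya Test")));   // true: whole value matches
    }
}
```

The second line of output also shows why an _all-style query, whose terms are split and lowercased, cannot match a keyword value even when the user typed the exact string.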

@jimczi jimczi closed this Nov 17, 2016

@nostrebor

Author

commented Nov 17, 2016

I thought that might be the case, so I tried it. As I mentioned in my original post, the same happens when the search is an exact match for the field, and no highlighting is done then either.

@jimczi

Member

commented Nov 18, 2016

OK, now I understand the problem, @nostrebor.
This simple recreation exhibits it:

PUT t
{
    "mappings": {
        "t": {
            "properties": {
                "message": {
                    "type": "keyword",
                    "store": true
                }
            }
        }
    }
}

PUT t/t/1
{
    "message": "foo"
}

GET _search
{
   "size": 1,
   "stored_fields": "message",
   "query": {
      "match": {
         "message": "foo"
      }
   },
   "highlight": {
      "fields": {
         "message": {
             "type": "plain",
             "no_match_size": 10
         }
      }
   }
}

... returns:

{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 0.2876821,
      "hits": [
         {
            "_index": "t",
            "_type": "t",
            "_id": "1",
            "_score": 0.2876821,
            "fields": {
               "message": [
                  "foo"
               ]
            },
            "highlight": {
               "message": [
                  "[66 6f 6f]"
               ]
            }
         }
      ]
   }
}

This happens because the keyword field is stored as a binary field, and the highlighter does not convert the binary value into a valid string.
I'll work on a fix.
As a workaround, you can use a text field with a keyword analyzer:

"message": {
  "type": "text",
  "analyzer": "keyword",
  "store": true
}

... or you can force highlighting on source:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-highlighting.html#_force_highlighting_on_source
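The binary-to-string problem can be reproduced in plain Java. The sketch below assumes the stored keyword value is just its UTF-8 bytes; `hexDisplay` is a hypothetical helper that imitates the `[66 6f 6f]` output shown above, and `utf8Display` shows the decoding the highlighter should have applied. Neither is Lucene or Elasticsearch code.

```java
import java.nio.charset.StandardCharsets;
import java.util.StringJoiner;

/**
 * Illustration of the bug: a stored keyword value is raw UTF-8 bytes, and
 * formatting those bytes generically yields hex rather than text.
 */
final class StoredKeywordDisplay {

    /** Formats bytes as space-separated lowercase hex inside brackets (imitates the broken output). */
    static String hexDisplay(byte[] bytes) {
        StringJoiner joiner = new StringJoiner(" ", "[", "]");
        for (byte b : bytes) {
            joiner.add(String.format("%02x", b & 0xff));
        }
        return joiner.toString();
    }

    /** Decodes the bytes as UTF-8, which is what a display conversion should do. */
    static String utf8Display(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] stored = "foo".getBytes(StandardCharsets.UTF_8);
        System.out.println(hexDisplay(stored));  // [66 6f 6f]
        System.out.println(utf8Display(stored)); // foo
    }
}
```

Decoding explicitly as UTF-8 (rather than with the platform default charset) matters for non-ASCII values, since the bytes were written as UTF-8.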

@nostrebor

Author

commented Nov 18, 2016

For posterity: iirc there was also a separate issue, where searching against _all converted the string to lowercase so it would not match the keyword data; an existing flag already covers that. Thanks for looking into the other issue, though!

jimczi added a commit that referenced this issue Nov 21, 2016

Fix highlighting on a stored keyword field (#21645)
* Fix highlighting on a stored keyword field

The highlighter converts stored keyword fields using toString().
Since the keyword fields are stored as utf8 bytes the conversion is broken.
This change uses BytesRef.utf8ToString() to convert the field value into a valid string.

Fixes #21636

* Replace BytesRef#utf8ToString with MappedFieldType#valueForDisplay
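The second bullet moves the conversion out of the highlighter and into the field type. A minimal sketch of that design, assuming plain `byte[]` in place of Lucene's `BytesRef` and using hypothetical classes that only mimic the idea behind `MappedFieldType#valueForDisplay` (they are not Elasticsearch's classes):

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: each field type decides how a stored value is displayed. */
final class DisplayDemo {

    /** Base field type: by default a stored value is displayed as-is. */
    static class FieldType {
        Object valueForDisplay(Object value) {
            return value;
        }
    }

    /** Keyword-like field type: stored values are UTF-8 bytes, so decode them. */
    static final class KeywordLikeFieldType extends FieldType {
        @Override
        Object valueForDisplay(Object value) {
            if (value instanceof byte[]) {
                return new String((byte[]) value, StandardCharsets.UTF_8);
            }
            return value;
        }
    }

    /** What a highlighter could do: ask the field type first, then convert to text. */
    static String textForHighlighting(FieldType type, Object storedValue) {
        return String.valueOf(type.valueForDisplay(storedValue));
    }

    public static void main(String[] args) {
        byte[] stored = "foo".getBytes(StandardCharsets.UTF_8);
        System.out.println(textForHighlighting(new KeywordLikeFieldType(), stored)); // foo
        System.out.println(textForHighlighting(new FieldType(), 42));              // 42
    }
}
```

Keeping the decoding in the field type means the highlighter does not need to know how each field encodes its stored values.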
