Highlighting sometimes returns too much text (orders of magnitude larger than fragment_size)  #9442

@scharron

Index mapping:

{
    "mappings" : {
        "test": {
            "properties": {
                "content" : {
                    "type" : "string",
                    "analyzer" : "french"
                }
            }
        }
    }
}

Document inserted:
https://gist.github.com/scharron/684a4fbab85135c203ee

Query:

{
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "query": "\"Vlaams Brabant\"",
          "fields": [
            "content"
          ]
        }
      }
    }
  },
  "highlight": {
    "fragment_size": 100,
    "fields": {
      "content": {}
    }
  }
}

With fragment_size set to 100, I would expect about 200 characters across 2 fragments, but instead I get 12 KB of data in a single fragment.
I also tried the postings and the fast vector highlighters, with the same result.
The behaviour is the same on both ES 1.4.2 and ES 1.3.4.
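
To make the discrepancy easy to measure, a check along the following lines could be run over the parsed search response. This is only a minimal sketch: the helper name `oversized_fragments` and the `tolerance` factor are hypothetical, it assumes the usual `hits.hits[].highlight` response layout and the default `<em>` highlight tags, and it does not contact Elasticsearch itself.

```python
import re

# Default highlight tags are assumed; adjust if pre_tags/post_tags are customised.
TAG_RE = re.compile(r"</?em>")


def oversized_fragments(response, field, fragment_size, tolerance=2.0):
    """Return (doc_id, length) for each highlight fragment of `field` whose
    text, with highlight tags stripped, exceeds tolerance * fragment_size."""
    limit = fragment_size * tolerance
    found = []
    for hit in response.get("hits", {}).get("hits", []):
        for fragment in hit.get("highlight", {}).get(field, []):
            length = len(TAG_RE.sub("", fragment))
            if length > limit:
                found.append((hit.get("_id"), length))
    return found
```

Called as `oversized_fragments(response, "content", 100)` on the decoded JSON response of the query above, it would list any fragment longer than 200 characters. The tolerance is there because fragment_size is not a hard limit and fragments can run somewhat over it.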

It may be related to the other curious behaviours described here: http://stackoverflow.com/questions/28167990/curious-behaviour-of-fragment-size-in-elasticsearch-highlighting?noredirect=1
