ElasticSearch 0.90 fails when "highlight" contains a field of type "long" #3211

ariasdelrio · 2013-06-20T08:13:29Z

If the "highlight" section of a search query contains a field of type "long", an error occurs. That didn't happen with the old version.
This is a sample script I used to test this behaviour:

curl -s -o /dev/null -X DELETE "http://${hostname}:9200/test?pretty"

curl -s -o /dev/null -X PUT "http://${hostname}:9200/test/?pretty"  -d '
{
   "mappings" : {
      "test1" : {
        "properties" : {
            "text" : {
                "store": "yes",
                "type": "string"
            }
         }
      },
      "test2" : {
        "properties" : {
            "text" : {
                "store": "yes",
                "type": "string"
            },
            "number" : {
                "store": "yes",
                "type": "long"
            }
         }
      }
   }
}
'

curl -s -o /dev/null -X POST "http://${hostname}:9200/test/test1?pretty"  -d '
{
   "text" : "test one"
}
'

curl -s -o /dev/null -X POST "http://${hostname}:9200/test/test2?pretty"  -d '
{
   "text" : "test two",
   "number" : 100
}
'

# Refresh the "test" index (test1 and test2 are types, not indices)
curl -s -o /dev/null -X POST "http://${hostname}:9200/test/_refresh"

sleep 3

curl -s -X GET "http://${hostname}:9200/test/_search?pretty" -d '
{
  "query": {
        "prefix": {
          "text": "test"
        }
  },
  "highlight": {
    "number_of_fragments": 0,
    "fields": {
        "text": {},
        "number": {}
    }
  }

}'

If you run it against an elasticsearch 0.20.6 server, it returns the two hits, correctly highlighted.
However, if you run it against an elasticsearch 0.90 server, this happens:

{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 4,
    "failed" : 1,
    "failures" : [ {
      "index" : "test",
      "shard" : 0,
      "status" : 500,
      "reason" : "FetchPhaseExecutionException[[test][0]: query[text:test*],from[0],size[10]: Fetch Failed [Failed to highlight field [number]]]; nested: StringIndexOutOfBoundsException[String index out of range: -1]; "
    } ]
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "test",
      "_type" : "test1",
      "_id" : "y51XzuuaQ6uYnigzhq5EtA",
      "_score" : 1.0, "_source" :
{
   "text" : "test one"
}
,
      "highlight" : {
        "text" : [ "<em>test</em> one" ]
      }
    } ]
  }
}

s1monw · 2013-06-20T08:36:04Z

thanks for opening this issue! We will look into it soon

jpountz · 2013-06-20T12:41:45Z

@Quecksilber Thanks for the detailed steps, I could reproduce the problem. Can you confirm that with 0.20 you didn't expect the numbers to be highlighted even if they were part of the query? (0.20 doesn't look able to do so)

ariasdelrio · 2013-06-20T13:32:41Z

@jpountz I didn't expect the numbers to be highlighted in the query. To tell the truth, we only realized we were automatically including numeric fields in the "highlight" section after this error came up (it is an ID in the real scenario). Maybe this should be an error, but with a different message, like "numeric values cannot be highlighted" or something like that.
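
For readers hitting the same error, one possible way to sidestep it is to list only string fields under "highlight". The following is a minimal sketch, not something proposed in this thread: the two function names are hypothetical helpers, and it assumes the same ${hostname} variable and "test" index as the script above. It has not been verified against a 0.90 server.

```shell
# Hypothetical helper (not part of the original report): prints a search body
# that highlights only the string field "text" and leaves out the long field "number".
build_highlight_query() {
  cat <<'EOF'
{
  "query": { "prefix": { "text": "test" } },
  "highlight": {
    "number_of_fragments": 0,
    "fields": { "text": {} }
  }
}
EOF
}

# Hypothetical wrapper: sends that body to the "test" index used in the report.
# Assumes ${hostname} is set as in the original script.
search_without_numeric_highlight() {
  curl -s -X GET "http://${hostname}:9200/test/_search?pretty" \
       -d "$(build_highlight_query)"
}
```

Calling search_without_numeric_highlight runs the same prefix query as before; since "number" is no longer listed, the highlighter should not be asked to process it.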

NumericTokenizer is a simple wrapper around a NumericTokenStream. However, its implementation had a few issues: its reset() method was not idempotent, causing exceptions if reset() was called twice (causing elastic#3211), and it had no attributes, meaning that the only thing it allowed was counting the number of generated tokens. The reason why indexing numeric data worked is that the mapper's parseCreateField directly generates a NumericTokenStream and bypasses the analyzer. This commit makes NumericTokenizer.reset idempotent and makes consuming a NumericTokenizer behave the same way as consuming the underlying NumericTokenStream.

jpountz · 2013-06-21T14:36:08Z

I discussed this issue with Simon and we decided to make the behavior match 0.20: highlighting numeric terms is not supported but doesn't raise errors.

ariasdelrio · 2013-06-21T14:40:28Z

Thanks :)

smtlaissezfaire · 2013-06-23T00:21:09Z

+1

smtlaissezfaire · 2013-06-24T19:49:22Z

Awesome! Thanks!

ghost assigned jpountz Jun 20, 2013

jpountz mentioned this issue Jun 21, 2013

Fix NumericTokenizer. #3214

Closed

jpountz closed this as completed Jun 24, 2013

jpountz mentioned this issue Jul 3, 2013

Error on MoreLikeThis API with Non Stored Numeric Fields #3252

Closed
