ElasticSearch 0.90 fails when "highlight" contains a field of type "long" #3211

Closed
ariasdelrio opened this issue Jun 20, 2013 · 7 comments

Comments

@ariasdelrio

If the "highlight" section of a search query contains a field of type "long", the search fails with an error on the affected shard. This did not happen with the previous version (0.20.6).
This is a sample script I used to reproduce the behaviour:

curl -s -o /dev/null -X DELETE "http://${hostname}:9200/test?pretty"

curl -s -o /dev/null -X PUT "http://${hostname}:9200/test/?pretty"  -d '
{
   "mappings" : {
      "test1" : {
        "properties" : {
            "text" : {
                "store": "yes",
                "type": "string"
            }
         }
      },
      "test2" : {
        "properties" : {
            "text" : {
                "store": "yes",
                "type": "string"
            },
            "number" : {
                "store": "yes",
                "type": "long"
            }
         }
      }
   }
}
'

curl -s -o /dev/null -X POST "http://${hostname}:9200/test/test1?pretty"  -d '
{
   "text" : "test one"
}
'

curl -s -o /dev/null -X POST "http://${hostname}:9200/test/test2?pretty"  -d '
{
   "text" : "test two",
   "number" : 100
}
'

curl -s -o /dev/null -X POST "http://${hostname}:9200/test/_refresh"

sleep 3

curl -s -X GET "http://${hostname}:9200/test/_search?pretty" -d '
{
  "query": {
        "prefix": {
          "text": "test"
        }
  },
  "highlight": {
    "number_of_fragments": 0,
    "fields": {
        "text": {},
        "number": {}
    }
  }

}'

If you run it against an Elasticsearch 0.20.6 server, it returns both hits, correctly highlighted.
However, if you run it against an Elasticsearch 0.90 server, one shard fails and only one hit comes back:

{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 4,
    "failed" : 1,
    "failures" : [ {
      "index" : "test",
      "shard" : 0,
      "status" : 500,
      "reason" : "FetchPhaseExecutionException[[test][0]: query[text:test*],from[0],size[10]: Fetch Failed [Failed to highlight field [number]]]; nested: StringIndexOutOfBoundsException[String index out of range: -1]; "
    } ]
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "test",
      "_type" : "test1",
      "_id" : "y51XzuuaQ6uYnigzhq5EtA",
      "_score" : 1.0, "_source" :
{
   "text" : "test one"
}
,
      "highlight" : {
        "text" : [ "<em>test</em> one" ]
      }
    } ]
  }
}
@ghost ghost assigned jpountz Jun 20, 2013
@s1monw
Contributor

s1monw commented Jun 20, 2013

Thanks for opening this issue! We will look into it soon.

@jpountz
Contributor

jpountz commented Jun 20, 2013

@Quecksilber Thanks for the detailed steps, I could reproduce the problem. Can you confirm that with 0.20 you didn't expect the numbers to be highlighted, even if they were part of the query? (0.20 doesn't appear to be able to do so.)

@ariasdelrio
Author

@jpountz I didn't expect the numbers to be highlighted in the query. To
tell the truth, we only realized we were automatically including numeric
fields in the "highlight" section after this error came up (the field is an
ID in the real scenario). Maybe this should be an error, but with a clearer
message, such as "numeric values cannot be highlighted".
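
For anyone building the "highlight" section automatically, as described above, one client-side workaround is to leave numeric fields out before sending the query. The following is only a sketch under assumptions: the helper names `highlightable_fields` and `build_highlight` are hypothetical, the `NUMERIC_TYPES` set is an illustrative list of numeric mapping types, and the mapping is the one from the sample script. It does not call Elasticsearch; it only builds the request fragment.

```python
# Illustrative (assumed) set of numeric mapping types to skip when highlighting.
NUMERIC_TYPES = {"byte", "short", "integer", "long", "float", "double"}


def highlightable_fields(mappings, candidates):
    """Return the candidate field names that are not mapped as numeric
    in any of the given document types (hypothetical helper)."""
    numeric = set()
    for type_mapping in mappings.values():
        for name, spec in type_mapping.get("properties", {}).items():
            if spec.get("type") in NUMERIC_TYPES:
                numeric.add(name)
    return [name for name in candidates if name not in numeric]


def build_highlight(mappings, candidates):
    """Build a "highlight" section that only lists non-numeric fields."""
    fields = {name: {} for name in highlightable_fields(mappings, candidates)}
    return {"number_of_fragments": 0, "fields": fields}


# Mapping copied from the sample script in this issue.
mappings = {
    "test1": {"properties": {"text": {"store": "yes", "type": "string"}}},
    "test2": {
        "properties": {
            "text": {"store": "yes", "type": "string"},
            "number": {"store": "yes", "type": "long"},
        }
    },
}

highlight = build_highlight(mappings, ["text", "number"])
```

With the mapping above, `highlight` lists only the "text" field, so the query from the sample script would no longer ask for "number" to be highlighted.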

jpountz added a commit to jpountz/elasticsearch that referenced this issue Jun 21, 2013
NumericTokenizer is a simple wrapper around a NumericTokenStream. However, its
implementation had a few issues: its reset() method was not idempotent,
causing exceptions if reset() was called twice (causing elastic#3211), and it had no
attributes, meaning that the only thing it allowed was counting the number
of generated tokens. The reason why indexing numeric data worked is that
the mapper's parseCreateField directly generates a NumericTokenStream and
bypasses the analyzer.

This commit makes NumericTokenizer.reset idempotent and makes consuming a
NumericTokenizer behave the same way as consuming the underlying
NumericTokenStream.
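
To illustrate what "idempotent reset()" means in the commit message above, here is a toy model in Python. It is not the Elasticsearch or Lucene code: the class names are hypothetical, and the failure mode shown (re-reading an already exhausted reader on the second reset) is an assumption about how such a bug can arise, not a confirmed description of the original implementation.

```python
import io


class NaiveNumericWrapper:
    """Toy model: reset() re-reads the input every time it is called."""

    def __init__(self, reader):
        self.reader = reader
        self.value = None

    def reset(self):
        text = self.reader.read()  # a second call gets "" (reader exhausted)
        self.value = int(text)     # int("") raises ValueError


class IdempotentNumericWrapper:
    """Toy model: reset() reads the input once and is safe to call again."""

    def __init__(self, reader):
        self.reader = reader
        self.value = None
        self._loaded = False

    def reset(self):
        if not self._loaded:
            # Consume the reader only on the first reset and cache the value.
            self.value = int(self.reader.read())
            self._loaded = True
        # Later calls leave the cached value untouched.


def fails_on_second_reset(wrapper):
    """Call reset() twice and report whether the second call raised."""
    wrapper.reset()
    try:
        wrapper.reset()
    except ValueError:
        return True
    return False
```

In this model, `fails_on_second_reset(NaiveNumericWrapper(io.StringIO("100")))` is true, while the idempotent variant keeps its value after any number of reset() calls.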
@jpountz
Contributor

jpountz commented Jun 21, 2013

I discussed this issue with Simon and we decided to make the behavior match 0.20: highlighting numeric terms is not supported but doesn't raise errors.

@ariasdelrio
Author

Thanks :)

On Fri, Jun 21, 2013 at 4:36 PM, Adrien Grand notifications@github.com wrote:

> I discussed this issue with Simon and we decided to make the behavior
> match 0.20: highlighting numeric terms is not supported but doesn't raise
> errors.

@smtlaissezfaire

+1

jpountz added a commit that referenced this issue Jun 24, 2013
jpountz added a commit that referenced this issue Jun 24, 2013
@jpountz jpountz closed this as completed Jun 24, 2013
@smtlaissezfaire

Awesome! Thanks!

mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015