ElasticSearch 0.90 fails when "highlight" contains a field of type "long" #3211

Closed
ariasdelrio opened this Issue Jun 20, 2013 · 7 comments

@ariasdelrio

commented Jun 20, 2013

If the "highlight" section of a search query contains a field of type "long", an error occurs. This did not happen in 0.20.6.
This is a sample script I used to test this behaviour:

curl -s -o /dev/null -X DELETE "http://${hostname}:9200/test?pretty"

curl -s -o /dev/null -X PUT "http://${hostname}:9200/test/?pretty"  -d '
{
   "mappings" : {
      "test1" : {
        "properties" : {
            "text" : {
                "store": "yes",
                "type": "string"
            }
         }
      },
      "test2" : {
        "properties" : {
            "text" : {
                "store": "yes",
                "type": "string"
            },
            "number" : {
                "store": "yes",
                "type": "long"
            }
         }
      }
   }
}
'

curl -s -o /dev/null -X POST "http://${hostname}:9200/test/test1?pretty"  -d '
{
   "text" : "test one"
}
'

curl -s -o /dev/null -X POST "http://${hostname}:9200/test/test2?pretty"  -d '
{
   "text" : "test two",
   "number" : 100
}
'

curl -s -o /dev/null -X POST "http://${hostname}:9200/test/_refresh"

sleep 3

curl -s -X GET "http://${hostname}:9200/test/_search?pretty" -d '
{
  "query": {
        "prefix": {
          "text": "test"
        }
  },
  "highlight": {
    "number_of_fragments": 0,
    "fields": {
        "text": {},
        "number": {}
    }
  }

}'

If you run it against an Elasticsearch 0.20.6 server, it returns the two hits, correctly highlighted.
However, if you run it against an Elasticsearch 0.90 server, one shard fails and only one hit comes back:

{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 4,
    "failed" : 1,
    "failures" : [ {
      "index" : "test",
      "shard" : 0,
      "status" : 500,
      "reason" : "FetchPhaseExecutionException[[test][0]: query[text:test*],from[0],size[10]: Fetch Failed [Failed to highlight field [number]]]; nested: StringIndexOutOfBoundsException[String index out of range: -1]; "
    } ]
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "test",
      "_type" : "test1",
      "_id" : "y51XzuuaQ6uYnigzhq5EtA",
      "_score" : 1.0, "_source" :
{
   "text" : "test one"
}
,
      "highlight" : {
        "text" : [ "<em>test</em> one" ]
      }
    } ]
  }
}
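
Note that the search does not fail outright: a response is still returned, with "failed" : 1 under "_shards" and only one of the two hits, so a client can miss the problem unless it checks the shard counts. A rough sketch of such a check, assuming pretty-printed output like the one above. `count_failed_shards` is a hypothetical helper, not part of Elasticsearch, and matching JSON with grep is brittle compared to a real JSON parser:

```shell
#!/bin/sh
# Hypothetical helper: reads a search response on stdin and prints the value
# of the first "failed" key (the one under "_shards" in the response above).
count_failed_shards() {
  grep -o '"failed"[[:space:]]*:[[:space:]]*[0-9][0-9]*' | head -n 1 | grep -o '[0-9][0-9]*$'
}

# Trimmed copy of the response shown above.
response='{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 4,
    "failed" : 1
  }
}'

failed=$(printf '%s\n' "$response" | count_failed_shards)
if [ "${failed:-0}" -gt 0 ]; then
  echo "warning: $failed shard(s) failed, results may be incomplete"
fi
```

In a real script the response would come from the curl search call instead of the inline sample.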

@ghost ghost assigned jpountz Jun 20, 2013

@s1monw

Contributor

commented Jun 20, 2013

Thanks for opening this issue! We will look into it soon.

@jpountz

Contributor

commented Jun 20, 2013

@Quecksilber Thanks for the detailed steps, I could reproduce the problem. Can you confirm that with 0.20 you didn't expect the numbers to be highlighted, even if they were part of the query? (0.20 doesn't appear to be able to do so.)

@ariasdelrio

Author

commented Jun 20, 2013

@jpountz I didn't expect the numbers to be highlighted in the query. To tell the truth, we only realized we were automatically including numeric fields in the "highlight" section after this error came up (it is an ID in the real scenario). Maybe this should be an error, but with a different message, like "numeric values cannot be highlighted" or something like that.
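
On an affected 0.90 server, one way to sidestep the error is to leave the numeric field out of the "highlight" section. A minimal sketch, assuming the caller already knows which fields are strings. `build_highlight_query` is a hypothetical helper, not part of Elasticsearch, and `hostname` is the same variable as in the script above:

```shell
#!/bin/sh
# Hypothetical helper: builds a search body that highlights only the
# fields passed as arguments (pass string fields only, e.g. "text").
build_highlight_query() {
  fields=""
  for f in "$@"; do
    # Join entries with commas, e.g. "text": {}, "title": {}
    fields="${fields:+$fields, }\"$f\": {}"
  done
  printf '{"query": {"prefix": {"text": "test"}}, "highlight": {"number_of_fragments": 0, "fields": {%s}}}\n' "$fields"
}

# Highlight "text" only; the long field "number" is left out.
body=$(build_highlight_query text)
echo "$body"

# Assumes a reachable server; uncomment to run the search.
# curl -s -X GET "http://${hostname}:9200/test/_search?pretty" -d "$body"
```

The query part is hard-coded to the prefix query from the report; only the list of highlighted fields varies.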

jpountz added a commit to jpountz/elasticsearch that referenced this issue Jun 21, 2013

Fix NumericTokenizer.
NumericTokenizer is a simple wrapper around a NumericTokenStream. However, its
implementation had a few issues: its reset() method was not idempotent,
causing exceptions if reset() was called twice (causing elastic#3211), and it had no
attributes, meaning that the only thing it allowed was counting the number
of generated tokens. The reason why indexing numeric data worked is that
the mapper's parseCreateField directly generates a NumericTokenStream and
bypasses the analyzer.

This commit makes NumericTokenizer.reset idempotent and makes consuming a
NumericTokenizer behave the same way as consuming the underlying
NumericTokenStream.

@jpountz

Contributor

commented Jun 21, 2013

I discussed this issue with Simon and we decided to make the behavior match 0.20: highlighting numeric terms is not supported but doesn't raise errors.

@ariasdelrio

Author

commented Jun 21, 2013

Thanks :)


@smtlaissezfaire


commented Jun 23, 2013

+1

jpountz added a commit that referenced this issue Jun 24, 2013


@jpountz jpountz closed this Jun 24, 2013

@smtlaissezfaire


commented Jun 24, 2013

Awesome! Thanks!

mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015
