getting sort field via script_fields results in array_index_out_of_bounds_exception for some values #99620

Open
mad-pf opened this issue Sep 18, 2023 · 2 comments
Labels
>bug, :Search/Search (Search-related issues that do not fall into other categories), Team:Search (Meta label for search team)

mad-pf commented Sep 18, 2023

Elasticsearch Version

8.9.2

Installed Plugins

analysis-icu

Java Version

bundled

OS Version

as shipped in docker.elastic.co/elasticsearch/elasticsearch:8.9.2; Linux kernel 6.4

Problem Description

Reading the sort value of some simple text values (such as "Q" or "W") in a Painless script results in an array_index_out_of_bounds_exception, while the same request works for other values such as "A" or "QQ".

This seems to work fine on ES 7.17 but is also broken on ES 8.5. The analysis-icu plugin is required to reproduce the problem.

Steps to Reproduce

The following script creates a simple index, adds two objects, and searches for each of them with script_fields. The search works for the first object but returns an error for the second:

#!/bin/sh

ES=http://localhost:9202
ESIDX=debug_69726

curl -s -XDELETE $ES/$ESIDX >/dev/null

curl -s -XPUT $ES/$ESIDX?pretty=true -H 'Content-Type: application/json' -d '{
    "settings": {}
}'

curl -s -XPOST $ES/$ESIDX/_mapping?pretty=true -H 'Content-Type: application/json' -d '{
    "properties": {
        "name": {
            "type": "keyword",
            "fields": {
                "sort": {
                    "type": "icu_collation_keyword"
                },
                "text": {
                    "type": "text"
                }
            }
        }
    }
}'

curl -s -XPOST $ES/$ESIDX/_bulk?pretty=true\&refresh=true -H 'Content-Type: application/json' -d \
'{"index": {"_id": "foo:1"}}
{"name": "A"}
{"index": {"_id": "foo:2"}}
{"name": "Q"}
'

echo "foo:1, no problem"
curl -s -XPOST $ES/$ESIDX/_search?pretty=true -H 'Content-Type: application/json' -d '{
    "query": {"term": {"_id": "foo:1"}},
    "script_fields": {
        "_sort": {
            "script": {
                "lang": "painless",
                "source": "doc['\''name.sort'\''].value"
            }
        }
    }
}'

echo "foo:2, explodes"
curl -s -XPOST $ES/$ESIDX/_search?pretty=true -H 'Content-Type: application/json' -d '{
    "query": {"term": {"_id": "foo:2"}},
    "script_fields": {
        "_sort": {
            "script": {
                "lang": "painless",
                "source": "doc['\''name.sort'\''].value"
            }
        }
    }
}'
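
To check the other values mentioned in the description ("W" and "QQ"), the same steps can be parameterized. The following is a minimal sketch, not part of the original script: the function names bulk_body, search_body, and run_repro are hypothetical, it assumes the index and mapping above already exist, and the values passed in must not need JSON escaping.

```shell
#!/bin/sh
# Hypothetical helpers; ES and ESIDX default to the values used in the report.
ES=${ES:-http://localhost:9202}
ESIDX=${ESIDX:-debug_69726}

# Print an NDJSON _bulk body that indexes each argument as "name",
# with IDs foo:1, foo:2, ... in argument order.
bulk_body() {
    i=1
    for name in "$@"; do
        printf '{"index": {"_id": "foo:%d"}}\n' "$i"
        printf '{"name": "%s"}\n' "$name"
        i=$((i + 1))
    done
}

# Print the search body from the report for a given document ID.
search_body() {
    printf '{"query": {"term": {"_id": "%s"}}, "script_fields": {"_sort": {"script": {"lang": "painless", "source": "doc['\''name.sort'\''].value"}}}}\n' "$1"
}

# Index the given values and run the script_fields search for each one.
# Needs a running cluster with the mapping from the report already applied.
run_repro() {
    bulk_body "$@" | curl -s -XPOST "$ES/$ESIDX/_bulk?refresh=true" \
        -H 'Content-Type: application/x-ndjson' --data-binary @- >/dev/null
    i=1
    for name in "$@"; do
        echo "foo:$i ($name)"
        search_body "foo:$i" | curl -s -XPOST "$ES/$ESIDX/_search?pretty=true" \
            -H 'Content-Type: application/json' --data-binary @-
        i=$((i + 1))
    done
}

# Example: run_repro A Q W QQ
```

The bulk body is sent with --data-binary because curl's -d strips newlines when it reads from a file or stdin, which would break the NDJSON format.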

Result for the first object, no problem:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "debug_69726",
        "_id" : "foo:1",
        "_score" : 1.0,
        "fields" : {
          "_sort" : [
            "*\u0001\u0005\u0001܀"
          ]
        }
      }
    ]
  }
}
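
The _sort value returned for "A" is not the original text. One way to look at it is to re-encode the displayed string as UTF-8 and dump the bytes. This is only a sketch of what the response shows: sort_value_hex is a hypothetical helper, and the bytes actually stored in doc values are not part of this report and may differ from this re-encoding.

```shell
#!/bin/sh
# Re-encode the string shown for foo:1 ("*", U+0001, U+0005, U+0001, U+0700)
# as UTF-8 and print it as hex. The octal escapes \334\200 are the UTF-8
# encoding of U+0700.
sort_value_hex() {
    printf '*\001\005\001\334\200' | od -An -tx1 | tr -d ' \n'
}

sort_value_hex
echo
```

This prints 2a010501dc80: one printable character, three control characters, and a two-byte sequence, which looks more like a binary key rendered as text than a readable string.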

Result for the second object, the error:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "script_exception",
        "reason" : "runtime error",
        "script_stack" : [
          "org.apache.lucene.core@9.7.0/org.apache.lucene.util.UnicodeUtil.UTF8toUTF16(UnicodeUtil.java:649)",
          "org.apache.lucene.core@9.7.0/org.apache.lucene.util.BytesRef.utf8ToString(BytesRef.java:136)",
          "org.elasticsearch.server@8.9.2/org.elasticsearch.index.fielddata.ScriptDocValues$StringsSupplier.bytesToString(ScriptDocValues.java:461)",
          "org.elasticsearch.server@8.9.2/org.elasticsearch.index.fielddata.ScriptDocValues$StringsSupplier.getInternal(ScriptDocValues.java:466)",
          "org.elasticsearch.server@8.9.2/org.elasticsearch.index.fielddata.ScriptDocValues$StringsSupplier.getInternal(ScriptDocValues.java:420)",
          "org.elasticsearch.server@8.9.2/org.elasticsearch.index.fielddata.ScriptDocValues$Strings.get(ScriptDocValues.java:493)",
          "org.elasticsearch.server@8.9.2/org.elasticsearch.index.fielddata.ScriptDocValues$Strings.getValue(ScriptDocValues.java:482)",
          "doc['name.sort'].value",
          "                ^---- HERE"
        ],
        "script" : "doc['name.sort'].value",
        "lang" : "painless",
        "position" : {
          "offset" : 16,
          "start" : 0,
          "end" : 22
        }
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : 0,
        "index" : "debug_69726",
        "node" : "PZ2dTLbxTWayGvJjxds3cg",
        "reason" : {
          "type" : "script_exception",
          "reason" : "runtime error",
          "script_stack" : [
            "org.apache.lucene.core@9.7.0/org.apache.lucene.util.UnicodeUtil.UTF8toUTF16(UnicodeUtil.java:649)",
            "org.apache.lucene.core@9.7.0/org.apache.lucene.util.BytesRef.utf8ToString(BytesRef.java:136)",
            "org.elasticsearch.server@8.9.2/org.elasticsearch.index.fielddata.ScriptDocValues$StringsSupplier.bytesToString(ScriptDocValues.java:461)",
            "org.elasticsearch.server@8.9.2/org.elasticsearch.index.fielddata.ScriptDocValues$StringsSupplier.getInternal(ScriptDocValues.java:466)",
            "org.elasticsearch.server@8.9.2/org.elasticsearch.index.fielddata.ScriptDocValues$StringsSupplier.getInternal(ScriptDocValues.java:420)",
            "org.elasticsearch.server@8.9.2/org.elasticsearch.index.fielddata.ScriptDocValues$Strings.get(ScriptDocValues.java:493)",
            "org.elasticsearch.server@8.9.2/org.elasticsearch.index.fielddata.ScriptDocValues$Strings.getValue(ScriptDocValues.java:482)",
            "doc['name.sort'].value",
            "                ^---- HERE"
          ],
          "script" : "doc['name.sort'].value",
          "lang" : "painless",
          "position" : {
            "offset" : 16,
            "start" : 0,
            "end" : 22
          },
          "caused_by" : {
            "type" : "array_index_out_of_bounds_exception",
            "reason" : "Index 6 out of bounds for length 6"
          }
        }
      }
    ]
  },
  "status" : 400
}

Logs (if relevant)

No response

mad-pf added the >bug and needs:triage labels Sep 18, 2023
romseygeek added the :Search/Search label and removed the needs:triage label Sep 19, 2023
elasticsearchmachine added the Team:Search label Sep 19, 2023

elasticsearchmachine (Collaborator) commented

Pinging @elastic/es-search (Team:Search)

juntezhang commented Oct 3, 2023

This also affects sorting when the field is multi-valued (an array): since by default the sort mode is max, this exception is triggered when the field type is set to icu_collation_keyword.
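
For anyone who wants to try the multi-valued case, here is a rough sketch of the request bodies involved. None of this is taken from the report: the document ID foo:3, the helper names multi_value_doc and sort_body, and the explicit "mode" setting are assumptions, and whether this exact request triggers the exception has not been verified here.

```shell
#!/bin/sh
# Hypothetical request bodies for the multi-valued case; not taken from the report.

# A _bulk body with one document whose "name" field holds several values.
multi_value_doc() {
    printf '{"index": {"_id": "foo:3"}}\n{"name": ["A", "Q"]}\n'
}

# A search that sorts on the collation sub-field with an explicit sort mode.
sort_body() {
    printf '{"query": {"match_all": {}}, "sort": [{"name.sort": {"order": "desc", "mode": "%s"}}]}\n' "$1"
}

# Example against the index from the report (needs a running cluster,
# with ES and ESIDX set as in the original script):
# multi_value_doc | curl -s -XPOST "$ES/$ESIDX/_bulk?refresh=true" \
#     -H 'Content-Type: application/x-ndjson' --data-binary @-
# sort_body max | curl -s -XPOST "$ES/$ESIDX/_search?pretty=true" \
#     -H 'Content-Type: application/json' --data-binary @-
```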

juntezhang referenced this issue Nov 16, 2023
…pers (#99361)

We mostly have a handful of `FieldType` values here across all mappers and none of them contain
attributes. There's only so many combinations here, let's deduplicate these to save some heap and set up
subsequent mapper heap savings.