array_index_out_of_bounds_exception when using two synonym_graph filters #74118

Open
dtrieschnigg opened this issue Jun 15, 2021 · 7 comments
Labels: >bug, :Search/Analysis (How text is split into tokens), Team:Search (Meta label for search team)

Comments

@dtrieschnigg

dtrieschnigg commented Jun 15, 2021

Elasticsearch version (bin/elasticsearch --version):
7.9.3 (lucene: 8.6.2)

Description of the problem including expected versus actual behavior:

When an analyzer chains two synonym_graph filters, searching for a compound fails with an array_index_out_of_bounds_exception. My use case: the first filter does decompounding and the second filter expands synonyms. For instance, "cellphone" is decompounded into "cell" and "phone", and "phone" is expanded with "telephone".

Steps to reproduce:

DELETE testindex
PUT testindex
{
    "mappings" : {
      "properties" : {
        "body" : {
          "type" : "text",
          "analyzer" : "my_analyzer"
        }
      }
    },
    "settings" : {
      "index" : {
        "analysis" : {
          "filter" : {
            "synonym1" : {
              "type" : "synonym_graph",
              "synonyms": [ "cell phone, cellphone" ]
            },
            "synonym2" : {
              "type" : "synonym_graph",
              "synonyms": [ 
                "cell, cells"
              ]
            }
          },
          "analyzer" : {
            "my_analyzer" : {
              "filter" : [
                "synonym1",
                "synonym2"
              ],
              "type" : "custom",
              "tokenizer" : "standard"
            }
          }
        }
      }
    }
}
POST testindex/_search
{
  "query": {
    "match": {
      "body": "cell phone"
    }
  }
}

results in:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "query_shard_exception",
        "reason" : "failed to create query: Index 0 out of bounds for length 0",
        "index_uuid" : "Wjc4TvHQQfmOgy2c66PHJg",
        "index" : "testindex"
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : 0,
        "index" : "testindex",
        "node" : "JdOTt9JWR0WeBkMOiC4nIQ",
        "reason" : {
          "type" : "query_shard_exception",
          "reason" : "failed to create query: Index 0 out of bounds for length 0",
          "index_uuid" : "Wjc4TvHQQfmOgy2c66PHJg",
          "index" : "testindex",
          "caused_by" : {
            "type" : "array_index_out_of_bounds_exception",
            "reason" : "Index 0 out of bounds for length 0"
          }
        }
      }
    ]
  },
  "status" : 400
}
dtrieschnigg added the >bug and needs:triage labels Jun 15, 2021
@dtrieschnigg
Author

A workaround is to use a synonym rather than a synonym_graph filter.
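
As a rough illustration of that workaround, the Python sketch below rewrites the filter section of the settings from the reproduction so that each synonym_graph filter becomes a plain synonym filter. The helper name `use_flat_synonym_filters` is made up for this example, and switching both filters (rather than only one of them) is an assumption; the comment above does not say which filter was changed.

```python
import copy
import json


def use_flat_synonym_filters(settings):
    """Return a copy of the index settings with synonym_graph filters switched to synonym.

    Hypothetical helper for illustration; it is not part of any Elasticsearch client.
    """
    patched = copy.deepcopy(settings)
    for definition in patched["index"]["analysis"]["filter"].values():
        if definition.get("type") == "synonym_graph":
            definition["type"] = "synonym"
    return patched


# Settings taken from the reproduction steps in this issue.
settings = {
    "index": {
        "analysis": {
            "filter": {
                "synonym1": {"type": "synonym_graph", "synonyms": ["cell phone, cellphone"]},
                "synonym2": {"type": "synonym_graph", "synonyms": ["cell, cells"]},
            },
            "analyzer": {
                "my_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["synonym1", "synonym2"],
                }
            },
        }
    }
}

patched_settings = use_flat_synonym_filters(settings)
# Print the body that would go under "settings" in the PUT testindex request.
print(json.dumps(patched_settings, indent=2))
```

The synonym filter does not emit a token graph, so multi-word synonyms such as "cell phone" may match less precisely in phrase queries than they would with synonym_graph.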

DaveCTurner added the :Search/Analysis label and removed the needs:triage label Jun 15, 2021
elasticmachine added the Team:Search label Jun 15, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@dtrieschnigg
Author

Any update on this issue?

@amitmbm
Contributor

amitmbm commented Feb 17, 2022

I am also facing this issue. I was able to reproduce it with the master branch code, where the impact is even worse: it kills the ES process on my local machine.

@romseygeek
Contributor

This is a long-standing problem in Lucene itself: the synonym graph filter can't accept graphs as input (https://issues.apache.org/jira/browse/LUCENE-9966). But we shouldn't be throwing errors to the user here, and in particular we shouldn't be allowing processes to die.

There are a few things we can do to fix this, I think. Firstly we should be able to detect in ES when you have an analysis chain that pipes a graph into a filter that doesn't accept one, and throw an error (or at least emit a warning). Secondly I think we can improve our synonym filter definitions to allow grouping of inputs to make it easier to categorise synonyms without having to specify multiple filters.
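
The first idea, detecting a chain that pipes a graph into a filter that doesn't accept one, could look roughly like the Python sketch below. It is only an illustration of the check, not Elasticsearch code: the function name `check_analyzer_chain` and the two sets of filter types are assumptions made for this example, and `ValueError` stands in for whatever exception type the real check would use.

```python
# Filter types assumed, for this sketch only, to emit a token graph.
GRAPH_PRODUCING = {"synonym_graph", "word_delimiter_graph"}
# Filter types assumed, for this sketch only, to reject graph input.
GRAPH_INTOLERANT = {"synonym_graph"}


def check_analyzer_chain(analysis, analyzer_name):
    """Raise ValueError if a graph-producing filter precedes a graph-intolerant one."""
    analyzer = analysis["analyzer"][analyzer_name]
    filter_defs = analysis.get("filter", {})
    graph_source = None  # name of the most recent filter that may emit a graph
    for name in analyzer.get("filter", []):
        # Custom filters carry a "type"; built-in filters are referenced by type name.
        filter_type = filter_defs.get(name, {}).get("type", name)
        if graph_source is not None and filter_type in GRAPH_INTOLERANT:
            raise ValueError(
                f"analyzer [{analyzer_name}]: filter [{name}] of type [{filter_type}] "
                f"cannot consume the token graph produced by [{graph_source}]"
            )
        if filter_type in GRAPH_PRODUCING:
            graph_source = name


# Analysis section from the reproduction steps in this issue.
analysis = {
    "filter": {
        "synonym1": {"type": "synonym_graph", "synonyms": ["cell phone, cellphone"]},
        "synonym2": {"type": "synonym_graph", "synonyms": ["cell, cells"]},
    },
    "analyzer": {
        "my_analyzer": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["synonym1", "synonym2"],
        }
    },
}

try:
    check_analyzer_chain(analysis, "my_analyzer")
except ValueError as err:
    print(err)
```

A check of this kind would run when the index is created, so the misconfiguration is reported up front instead of surfacing as a failure at query time.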

@amitmbm
Contributor

amitmbm commented Feb 21, 2022

@romseygeek, thanks for your input. Let me work on the first part. What would you suggest: throwing an exception? If so, which exception should be thrown here, IllegalArgumentException?

@amitmbm
Contributor

amitmbm commented Feb 23, 2022

@romseygeek, please let me know your suggestions so that I can fix this issue.
