Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove the query builder serialization from QueryShardException message #51885

Merged
merged 2 commits into from
Feb 6, 2020

Conversation

jimczi
Copy link
Contributor

@jimczi jimczi commented Feb 4, 2020

QueryBuilders that throw exceptions on shards when building the Lucene query
returns the full serialization of the query builder in the exception message.
For large queries that fails to execute due to the max boolean clause, this means
that we keep a reference of these big messages for every shard that participate
in the request. In order to limit the memory needed to hold these query shard
exceptions in the coordinating node, this change removes the query builder
serialization from the shard exception. The query is known by the user so
there should be no need to repeat it on every shard exception. We could also
omit the entire stack trace for known bad request exception but it would deserve
a separate issue/pr.

Closes #51843
Closes #48910

QueryBuilders that throw exceptions on shards when building the Lucene query
returns the full serialization of the query builder in the exception message.
For large queries that fails to execute due to the max boolean clause, this means
that we keep a reference of these big messages for every shard that participate
in the request. In order to limit the memory needed to hold these query shard
exceptions in the coordinating node, this change removes the query builder
serialization from the shard exception. The query is known by the user so
there should be no need to repeat it on every shard exception. We could also
omit the entire stack trace for known bad request exception but it would deserve
a separate issue/pr.

Closes elastic#51843
Closes elastic#48910
@jimczi jimczi added >enhancement :Search/Search Search-related issues that do not fall into other categories :Core/Infra/Logging Log management and logging utilities v8.0.0 v7.7.0 labels Feb 4, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (:Search/Search)

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (:Core/Infra/Logging)

Copy link
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dnhatn
Copy link
Member

dnhatn commented Feb 5, 2020

@elasticmachine update branch

@altinp
Copy link

altinp commented Feb 5, 2020

Would this also deal with cases when the long query is quoted in the Lucene ParseException mesg, e.g. in case of too many boolean clauses in a query_string query? Lucene does:

    } catch (TooManyClauses tmc) {
      ParseException e = new ParseException("Cannot parse '" +query+ "': too many boolean clauses");

In such a case, the shard failure reason currently looks like the below and the specific reason of TooManyClauses, boolean, is (only) shown in reason.caused_by.reason:

{
  "reason": {
    "type": "query_shard_exception",
    "reason": "Failed to parse query [(\"phrase 1\") OR (\"phrase 2\") OR ...  (\"phrase 1030\")]",
    "index_uuid": "ejeEEpE0QeCnyWgXOsXQrw",
    "index": "foo",
    "caused_by": {
      "type": "parse_exception",
      "reason": "parse_exception: Cannot parse '(\"phrase 1\") OR (\"phrase 2\") OR ... (\"phrase 1030\")': too many boolean clauses",
      "caused_by": {
        "type": "too_many_clauses",
        "reason": "too_many_clauses: maxClauseCount is set to 1024"
      }
    }
  }
}

It would be nice to retain Lucene's specific reason ("too many boolean clauses), but perhaps the proper fix there is for Lucene to put the reason before the query. In the interim, when preparing the ES response for display to our users, we have been trimming the reason to retain its first and last few hundred chars and gut the middle of the string. So users see the Lucene messages as well as the begin and end of their query.

@altinp
Copy link

altinp commented Feb 5, 2020

also, would this be backported to ES 6?

@jimczi
Copy link
Contributor Author

jimczi commented Feb 6, 2020

Would this also deal with cases when the long query is quoted in the Lucene ParseException mesg, e.g. in case of too many boolean clauses in a query_string query? Lucene does:

No, this doesn't cover this case. We have another issue opened for the query_string specifically.

also, would this be backported to ES 6?

I don't think so, as you noticed there are other cases that we don't cover so this is more a bandage than a real fix.

@jimczi jimczi merged commit 1dcf1df into elastic:master Feb 6, 2020
@jimczi jimczi deleted the trim_query_shard_exception branch February 6, 2020 07:25
jimczi added a commit that referenced this pull request Feb 6, 2020
…ge (#51885)

QueryBuilders that throw exceptions on shards when building the Lucene query
returns the full serialization of the query builder in the exception message.
For large queries that fails to execute due to the max boolean clause, this means
that we keep a reference of these big messages for every shard that participate
in the request. In order to limit the memory needed to hold these query shard
exceptions in the coordinating node, this change removes the query builder
serialization from the shard exception. The query is known by the user so
there should be no need to repeat it on every shard exception. We could also
omit the entire stack trace for known bad request exception but it would deserve
a separate issue/pr.

Closes #51843
Closes #48910
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Logging Log management and logging utilities >enhancement :Search/Search Search-related issues that do not fall into other categories v7.7.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Don't serialize the entire query when query parsing fails SearchShardFailure handling results in OOM
6 participants