[Feature Request] Dynamically update maxClauseCount based on resources on a node #12549
Comments
Thanks @harshavamsi
AFAIK this is not a cluster-level setting but a node-level one.
I think it would make sense to explore this path, but it is also not clear to me how to align this maximum with CPU / heap (which would vary per node). @msfroh, are you aware of any heuristics? Thank you.
[Triage - attendees 1 2 3]
I have a few thoughts about this:
@reta, do you know if there would be any backward-compatibility issue with turning it into a dynamic cluster-level setting? I suppose it would break if different nodes have different values in their
Thanks @msfroh. At a high level I don't see why this setting couldn't be dynamic and cluster-level, but I think the devil is in the details [1].
[1] elastic/elasticsearch#18341
Is your feature request related to a problem? Please describe
Today Lucene has a concept of `max_clause_count`, which limits the number of clauses a query can expand into, for example when expanding wildcard and prefix queries. OpenSearch inherited from ES 7.10 the static `indices.query.bool.max_clause_count` setting, which defaults to `1024`.

In https://issues.apache.org/jira/browse/LUCENE-8811, Lucene changed the clause limit from applying only to Boolean queries to applying to all queries, by moving the check to the `IndexSearcher` level. This means that expansions across all fields now count toward the same limit, so it is very easy to exceed it.

A workaround was to use a `rewrite` to avoid hitting the limit, but this no longer works, because the number of clauses after the rewrite is now what is counted.

Many users migrating from OpenSearch 1.x to 2.x have reported this problem: queries that previously stayed under the clause limit now hit it after the upgrade. Elasticsearch has since changed how the limit is computed, moving away from a static limit of 1024 to a dynamic limit based on node resources (thread pool and heap size).
Graylog2/graylog2-server#14272 is an example of people running into this issue.
Describe the solution you'd like
Rather than relying on `INDICES_MAX_CLAUSE_COUNT_SETTING` from cluster settings, compute the thread pool and heap sizes from JVM stats and use them to determine the clause limit. Is there a better metric to base this on? The limit was introduced to prevent rogue queries from consuming too many threads, too much CPU, and too much memory, so it would make sense to derive the limit from these resources.
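A minimal sketch of the proposal, assuming a heuristic of the shape described above: scale the limit with the node's max heap, divide it among the search threads, and never go below the current default of 1024. The constant of 64 clauses per MB of heap is an assumption chosen for illustration, and `DynamicClauseLimit` and its methods are hypothetical, not OpenSearch code. The search thread pool formula `(processors * 3) / 2 + 1` follows OpenSearch's default sizing; heap and processor counts come from `Runtime`. Because the result depends on each node's resources, it is inherently a per-node value.

```java
/** Hypothetical resource-based clause limit; not OpenSearch code. */
public class DynamicClauseLimit {

    // Current static default, kept as a lower bound.
    static final int DEFAULT_MAX_CLAUSE_COUNT = 1024;
    // Assumption: budget of 64 clauses per MB of heap, shared across search threads.
    static final long CLAUSES_PER_HEAP_MB = 64L;

    // OpenSearch's default search thread pool size for a given processor count.
    static int searchThreadPoolSize(int allocatedProcessors) {
        return (allocatedProcessors * 3) / 2 + 1;
    }

    // Heap budget divided by search threads, clamped to [1024, Integer.MAX_VALUE].
    static int maxClauseCount(long heapMaxBytes, int searchThreads) {
        long heapMb = heapMaxBytes / (1024L * 1024L);
        long perThread = heapMb * CLAUSES_PER_HEAP_MB / searchThreads;
        return (int) Math.max(DEFAULT_MAX_CLAUSE_COUNT, Math.min(Integer.MAX_VALUE, perThread));
    }

    public static void main(String[] args) {
        long heapMax = Runtime.getRuntime().maxMemory();
        int processors = Runtime.getRuntime().availableProcessors();
        int threads = searchThreadPoolSize(processors);
        System.out.printf("heap=%d bytes, processors=%d, search threads=%d, maxClauseCount=%d%n",
                heapMax, processors, threads, maxClauseCount(heapMax, threads));
    }
}
```

For example, a node with a 30 GB heap and 48 processors gets 73 search threads and a limit of 30720 × 64 / 73 = 26932 clauses, while a node with a 256 MB heap and 16 processors (25 threads) computes 655 and falls back to the floor of 1024.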
Related component
Search:Resiliency
Describe alternatives you've considered
No response
Additional context
No response