Adds search backpressure documentation #1790
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
_opensearch/search-backpressure.md
Outdated
{
  "error" : {
    "root_cause" : [
      {
        "type" : "task_cancelled_exception",
        "reason" : "Task is cancelled due to high resource consumption"
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : 0,
        "index" : "nyc_taxis",
        "node" : "MGkMkg9wREW3IVewZ7U_jw",
        "reason" : {
          "type" : "task_cancelled_exception",
          "reason" : "Task is cancelled due to high resource consumption"
        }
      }
    ]
  },
  "status" : 500
}
Sharing an up-to-date sample cancellation response. Can you please update?
{
"error": {
"root_cause": [
{
"type": "task_cancelled_exception",
"reason": "cancelled task with reason: cpu usage exceeded [17.9ms >= 15ms], elapsed time exceeded [1.1s >= 300ms]"
},
{
"type": "task_cancelled_exception",
"reason": "cancelled task with reason: elapsed time exceeded [1.1s >= 300ms]"
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "foobar",
"node": "7yIqOeMfRyWW1rHs2S4byw",
"reason": {
"type": "task_cancelled_exception",
"reason": "cancelled task with reason: cpu usage exceeded [17.9ms >= 15ms], elapsed time exceeded [1.1s >= 300ms]"
}
},
{
"shard": 1,
"index": "foobar",
"node": "7yIqOeMfRyWW1rHs2S4byw",
"reason": {
"type": "task_cancelled_exception",
"reason": "cancelled task with reason: elapsed time exceeded [1.1s >= 300ms]"
}
}
]
},
"status": 500
}
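As a rough illustration of how a client might read a response like the one above, the following sketch pulls the per-shard cancellation reasons out of the error body. The helper name `cancellation_reasons` and the trimmed `SAMPLE_RESPONSE` are hypothetical and exist only for this example; the field names come from the sample response.

```python
import json
from typing import List, Tuple

# Trimmed copy of the sample cancellation response (illustrative only).
SAMPLE_RESPONSE = """
{
  "error": {
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "failed_shards": [
      {"shard": 0, "index": "foobar",
       "reason": {"type": "task_cancelled_exception",
                  "reason": "cancelled task with reason: cpu usage exceeded [17.9ms >= 15ms], elapsed time exceeded [1.1s >= 300ms]"}},
      {"shard": 1, "index": "foobar",
       "reason": {"type": "task_cancelled_exception",
                  "reason": "cancelled task with reason: elapsed time exceeded [1.1s >= 300ms]"}}
    ]
  },
  "status": 500
}
"""


def cancellation_reasons(response_body: str) -> List[Tuple[int, str]]:
    """Return (shard, reason) pairs for shards that failed with a task cancellation."""
    error = json.loads(response_body).get("error", {})
    reasons = []
    for failure in error.get("failed_shards", []):
        cause = failure.get("reason", {})
        # Keep only failures caused by a cancelled task.
        if cause.get("type") == "task_cancelled_exception":
            reasons.append((failure["shard"], cause["reason"]))
    return reasons


for shard, reason in cancellation_reasons(SAMPLE_RESPONSE):
    print(f"shard {shard}: {reason}")
```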
_opensearch/search-backpressure.md
Outdated
An observer thread tracks the resource consumption of each task thread. It measures the resource consumption at several checkpoints during the query phase of a shard search request. If the node is determined to be under duress based on the JVM memory pressure and CPU utilization, the server examines the resource consumption for each search task. It determines if the CPU usage and elapsed time are within their fixed thresholds, and it compares the heap usage against the rolling average of the heap usage of the 100 most recent tasks. If the task is among the most resource-intensive based on these criteria, the task is canceled.

Every minute OpenSearch can cancel at most 1% of the number of currently running search shard tasks. Once a task is canceled, OpenSearch monitors the node for the next two seconds to determine if it is still under duress.
I suggest re-wording this slightly.
OpenSearch limits the number of cancellations as a fraction of successful task completions and cancellations per unit time. It continues to monitor and cancel tasks until the node is out of duress.
_opensearch/search-backpressure.md
Outdated
- Heap usage
- Elapsed time

An observer thread tracks the resource consumption of each task thread. It measures the resource consumption at several checkpoints during the query phase of a shard search request. If the node is determined to be under duress based on the JVM memory pressure and CPU utilization, the server examines the resource consumption for each search task. It determines if the CPU usage and elapsed time are within their fixed thresholds, and it compares the heap usage against the rolling average of the heap usage of the 100 most recent tasks. If the task is among the most resource-intensive based on these criteria, the task is canceled.
I suggest re-wording this slightly.
An observer thread periodically measures the resource usage of the node. If the node is determined to be under duress, then the resource usage of each search shard task is examined and compared against some tunable thresholds. CPU usage, heap usage and elapsed time are considered to give each task a cancellation score which is then used to cancel the most resource-intensive tasks.
_opensearch/search-backpressure.md
Outdated
## Canceled queries

If a query is canceled, instead of receiving search results you receive an error from the server similar to the error below:
Depending on which shards failed, OpenSearch may return partial results. Can we reword this slightly?
To retrieve the stats, use the following request:

```json
GET _nodes/stats/search_backpressure
```
nit: The response below is with human-readable fields enabled. Can you update this to:
GET _nodes/stats/search_backpressure?human=true
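For reference, a minimal sketch of issuing that request from a script, assuming an unsecured node reachable at a base URL such as `http://localhost:9200` (the base URL and the helper names `stats_url` and `fetch_stats` are assumptions made for this example, not part of the PR):

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def stats_url(base_url: str, human: bool = True) -> str:
    """Build the node stats URL for search backpressure."""
    url = base_url.rstrip("/") + "/_nodes/stats/search_backpressure"
    if human:
        # human=true asks for human-readable fields in the response.
        url += "?" + urlencode({"human": "true"})
    return url


def fetch_stats(base_url: str) -> dict:
    """Fetch and decode the stats. Assumes no authentication or TLS setup is needed."""
    with urlopen(stats_url(base_url)) as response:
        return json.load(response)


print(stats_url("http://localhost:9200"))
```

A secured cluster would also need credentials and certificate handling, which this sketch leaves out.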
_opensearch/search-backpressure.md
Outdated
cancellation_count | Integer | The number of tasks canceled because of excessive heap usage since the node last restarted.
current_max_bytes | Integer | The maximum heap usage for all tasks currently running on the node, in bytes.
current_avg_bytes | Integer | The average heap usage for all tasks currently running on the node, in bytes.
rolling_avg_bytes | Integer | The rolling average heap usage for the 100 most recent tasks, in bytes.
nit: Rolling average is not hard-coded to work with just the 100 most recent tasks. This is a configurable setting defined by `search_backpressure.search_shard_task.heap_moving_average_window_size`. 100 is just the default value for it.
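To make the window idea concrete, here is a small sketch of a rolling average over a configurable number of recent samples, plus the eligibility check described in the settings table (`taskHeapUsage` greater than or equal to `heapUsageMovingAverage` times `variance`). This is an illustration only, not the OpenSearch implementation; the class and function names are hypothetical, and the guard against an empty average is an assumption.

```python
from collections import deque


class HeapMovingAverage:
    """Rolling average of heap usage over the most recent `window_size` tasks."""

    def __init__(self, window_size: int = 100):
        # deque with maxlen drops the oldest sample once the window is full.
        self._samples = deque(maxlen=window_size)

    def record(self, heap_bytes: int) -> None:
        self._samples.append(heap_bytes)

    def average(self) -> float:
        if not self._samples:
            return 0.0
        return sum(self._samples) / len(self._samples)


def exceeds_heap_variance(task_heap_bytes: int, moving_average: float,
                          variance: float = 2.0) -> bool:
    """True when task heap usage >= moving average * variance.

    Assumption: with no completed tasks yet (average of 0), nothing is eligible.
    """
    return moving_average > 0 and task_heap_bytes >= moving_average * variance


tracker = HeapMovingAverage(window_size=3)
for completed_task_heap in (10, 20, 30, 40):
    tracker.record(completed_task_heap)

print(tracker.average())
print(exceeds_heap_variance(60, tracker.average()))
```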
_opensearch/search-backpressure.md
Outdated
## Search backpressure settings

Search backpressure adds several settings to the standard OpenSearch cluster settings. These settings are dynamic, so you can change the default behavior of this feature without restarting your cluster.

Setting | Default | Description
:--- | :--- | :---
search_backpressure.<br> mode | `monitor_only` | The [mode](#search-backpressure-modes) for search backpressure. Valid values are `monitor_only`, `enforced`, or `disabled`.
search_backpressure.<br> interval | 1 second | The interval at which the observer thread measures the resource consumption and cancels tasks.
search_backpressure.<br> cancellation_ratio | 10% | The maximum percentage of tasks to cancel out of the number of successful task completions.
search_backpressure.<br> cancellation_rate | 0.003 | The maximum number of tasks to cancel per millisecond.
search_backpressure.<br> cancellation_burst | 10 | The maximum number of tasks that can be canceled before no further cancellations are made.
search_backpressure.<br> node_duress.<br> num_consecutive_breaches | 3 | The number of consecutive limit breaches after which the node is marked in duress.
search_backpressure.<br> node_duress.<br> cpu_threshold | 90% | The CPU usage threshold (in percentage) for a node to be considered in duress.
search_backpressure.<br> node_duress.<br> heap_threshold | 70% | The heap usage threshold (in percentage) for a node to be considered in duress.
search_backpressure.<br> search_heap_threshold | 5% | The heap usage threshold (in percentage) for the sum of heap usages across all search tasks before server-side cancellation is applied.
search_backpressure.<br> search_task_heap_threshold | 0.5% | The heap usage threshold (in percentage) for one task before it is considered for cancellation.
search_backpressure.<br> search_task_heap_variance | 2 | The heap usage variance for one task before it is considered for cancellation. A task is considered for cancellation when `taskHeapUsage` is greater than or equal to `heapUsageMovingAverage` · `variance`.
search_backpressure.<br> search_task_cpu_time_threshold | 15 seconds | The CPU usage threshold (in milliseconds) for one task before it is considered for cancellation.
search_backpressure.<br> search_task_elapsed_time_threshold | 30 seconds | The elapsed time threshold (in milliseconds) for one task before it is considered for cancellation.
Some of these settings have been renamed. Sharing the latest ones along with simplified descriptions.
- `search_backpressure.mode` (default `monitor_only`): The search backpressure mode. Valid values are `monitor_only`, `enforced`, or `disabled`.
- `search_backpressure.interval_millis` (default 1 second): The interval at which the observer thread measures the resource usage and cancels tasks.
- `search_backpressure.cancellation_ratio` (default 10%): The maximum number of tasks to cancel as a fraction of successful task completions.
- `search_backpressure.cancellation_rate` (default 0.003): The maximum number of tasks to cancel per millisecond of elapsed time.
- `search_backpressure.cancellation_burst` (default 10): The maximum number of tasks to cancel in a single iteration of the observer thread.
- `search_backpressure.node_duress.num_successive_breaches` (default 3): The number of successive limit breaches after which the node is considered under duress.
- `search_backpressure.node_duress.cpu_threshold` (default 90%): The CPU usage threshold (in percentage) for a node to be considered in duress.
- `search_backpressure.node_duress.heap_threshold` (default 70%): The heap usage threshold (in percentage) for a node to be considered in duress.
- `search_backpressure.search_shard_task.total_heap_percent_threshold` (default 5%): The heap usage threshold (in percentage) for the sum of all search shard tasks before cancellation is applied.
- `search_backpressure.search_shard_task.heap_percent_threshold` (default 0.5%): The heap usage threshold (in percentage) for a single search shard task before it is considered for cancellation.
- `search_backpressure.search_shard_task.heap_variance` (default 2.0): The minimum variance of a single search shard task's heap usage compared to the rolling average of previously completed tasks before it is considered for cancellation.
- `search_backpressure.search_shard_task.heap_moving_average_window_size` (default 100): The number of previously completed search shard tasks to consider when calculating the moving average of heap usage.
- `search_backpressure.search_shard_task.cpu_time_millis_threshold` (default 15 seconds): The CPU usage threshold (in milliseconds) for a single search shard task before it is considered for cancellation.
- `search_backpressure.search_shard_task.elapsed_time_millis_threshold` (default 30 seconds): The elapsed time threshold (in milliseconds) for a single search shard task before it is considered for cancellation.
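Since these settings are described as dynamic cluster settings, a sketch of building an update body may help. The example below only constructs the JSON that would be sent with `PUT _cluster/settings`; the helper name `build_settings_update` is hypothetical, and the use of the `persistent` section is an assumption about how one would choose to apply the change.

```python
import json
from typing import Optional

# Valid modes as listed in the settings above.
VALID_MODES = {"monitor_only", "enforced", "disabled"}


def build_settings_update(mode: str, extra: Optional[dict] = None) -> str:
    """Return a JSON body for PUT _cluster/settings that sets the backpressure mode.

    `extra` can carry other search_backpressure.* settings as flat keys.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown search backpressure mode: {mode!r}")
    settings = {"search_backpressure.mode": mode}
    settings.update(extra or {})
    return json.dumps({"persistent": settings}, indent=2)


print(build_settings_update(
    "enforced",
    {"search_backpressure.search_shard_task.heap_moving_average_window_size": 200},
))
```

Validating the mode on the client side is a convenience here; the cluster remains the authority on which values it accepts.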
Thanks for these changes. LGTM!
Looks good to me!
Minimal changes. LGTM.
_opensearch/search-backpressure.md
Outdated
- Heap usage
- Elapsed time

An observer thread periodically measures the resource usage of the node. If the node is determined to be under duress, OpenSearch examines the resource usage of each search shard task and compares it against configurable thresholds. OpenSearch considers CPU usage, heap usage and elapsed time and assigns each task a cancellation score that is then used to cancel the most resource-intensive tasks.
Third sentence: insert comma after "heap usage."
_opensearch/search-backpressure.md
Outdated
## Canceled queries

If a query is canceled, OpenSearch may return partial results in case some shards failed. If all shards failed, OpenSearch returns an error from the server similar to the error below:
To clarify the second part of this sentence, I bolded my suggestion: "If a query is canceled, OpenSearch may return partial results **if** some shards failed."
_opensearch/search-backpressure.md
Outdated
search_backpressure.<br> node_duress.<br> heap_threshold | 70% | The heap usage threshold (in percentage) for a node to be considered in duress.
search_backpressure.<br> search_shard_task.<br> total_heap_percent_threshold | 5% | The heap usage threshold (in percentage) for the sum of heap usages of all search shard tasks before cancellation is applied.
search_backpressure.<br> search_shard_task.<br> heap_percent_threshold | 0.5% | The heap usage threshold (in percentage) for a single search shard task before it is considered for cancellation.
search_backpressure.<br> search_shard_task.<br> heap_variance | 2.0 | The minimum variance of a single search shard task's heap usage usage compared to the rolling average of previously completed tasks before it is considered for cancellation.
Delete second "usage" after "heap usage."
_opensearch/search-backpressure.md
Outdated
Field Name | Data Type | Description
:--- | :--- | :---
search_backpressure | Object | Contains statistics about search backpressure.
For consistency, either delete the verb "contains" in lines 173 and 174 or add a verb to each description in lines 175-177.
LGTM
@kolchfa-aws Please see my comments and changes and let me know if you have any questions. Thanks!
_opensearch/search-backpressure.md
Outdated
- Heap usage
- Elapsed time

An observer thread periodically measures the resource usage of the node. If the node is determined to be under duress, OpenSearch examines the resource usage of each search shard task and compares it against configurable thresholds. OpenSearch considers CPU usage, heap usage, and elapsed time and assigns each task a cancellation score that is then used to cancel the most resource-intensive tasks.
Is the node being "determined to be under duress" standard language in this context? It reads a bit strangely to me, as though we're anthropomorphizing the node. Do we mean something like "If the node continues to be under duress"?
It's not "continues"; it's whether the node comes under duress. We check the node's health periodically. Normally, the node is not under duress. If we determine that it is under duress, then we remediate.
_opensearch/search-backpressure.md
Outdated
An observer thread periodically measures the resource usage of the node. If the node is determined to be under duress, OpenSearch examines the resource usage of each search shard task and compares it against configurable thresholds. OpenSearch considers CPU usage, heap usage, and elapsed time and assigns each task a cancellation score that is then used to cancel the most resource-intensive tasks.

OpenSearch limits the number of cancellations as a fraction of successful task completions and cancellations per unit time. It continues to monitor and cancel tasks until the node is no longer under duress.
OpenSearch limits the number of cancellations as a fraction of successful task completions and cancellations per unit time. It continues to monitor and cancel tasks until the node is no longer under duress.
OpenSearch limits the number of cancellations to a fraction of successful task completions and cancellations per unit time. It continues to monitor and cancel tasks until the node is no longer under duress.
Reworded for clarity.
Co-authored-by: Nate Bower <nbower@amazon.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Fixes #795