[Backport 2.x] In-flight cancellation of SearchShardTask based on resource consumption #5039

Merged

ketanv3 commented Nov 2, 2022

Description

This feature aims to identify and cancel resource-intensive SearchShardTasks that have breached certain thresholds. This will help terminate problematic queries that can put nodes under duress and degrade cluster performance.

This backport PR combines changes from:

Issues Resolved

#1181

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Ketan Verma <ketan9495@gmail.com>

…on resource consumption (opensearch-project#4575)

This feature aims to identify and cancel resource intensive SearchShardTasks if they have breached certain
thresholds. This will help in terminating problematic queries which can put nodes in duress and degrade the
cluster performance.

Signed-off-by: Ketan Verma <ketan9495@gmail.com>
…on of SearchShardTask (opensearch-project#4805)

1. CpuUsageTracker: cancels tasks if they consume too much CPU
2. ElapsedTimeTracker: cancels tasks if they consume too much time
3. HeapUsageTracker: cancels tasks if they consume too much heap

Signed-off-by: Ketan Verma <ketan9495@gmail.com>
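The three trackers listed above can be pictured with a minimal Java sketch. It assumes each tracker compares a single resource against a fixed threshold and that every breached reason is collected; the class names (`Usage`, `Tracker`, `Trackers`), the threshold values, and the fixed-threshold check itself are illustrative assumptions, not the OpenSearch implementation or its default settings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToLongFunction;

// Hypothetical snapshot of one task's resource usage; NOT the OpenSearch API.
final class Usage {
    final long cpuNanos;     // CPU time consumed by the task
    final long elapsedNanos; // wall-clock time since the task started
    final long heapBytes;    // heap allocated by the task

    Usage(long cpuNanos, long elapsedNanos, long heapBytes) {
        this.cpuNanos = cpuNanos;
        this.elapsedNanos = elapsedNanos;
        this.heapBytes = heapBytes;
    }
}

// A tracker watches one resource and flags the task once usage exceeds a fixed threshold.
final class Tracker {
    final String name;
    final ToLongFunction<Usage> metric;
    final long threshold;

    Tracker(String name, ToLongFunction<Usage> metric, long threshold) {
        this.name = name;
        this.metric = metric;
        this.threshold = threshold;
    }

    // Returns a human-readable reason if the threshold is breached, or null otherwise.
    String check(Usage usage) {
        long value = metric.applyAsLong(usage);
        return value > threshold ? name + " exceeded: " + value + " > " + threshold : null;
    }
}

final class Trackers {
    // Illustrative thresholds only; the real setting names and defaults are not stated in this PR.
    static List<Tracker> defaults() {
        return List.of(
            new Tracker("cpu", u -> u.cpuNanos, 15_000_000_000L),              // 15 s of CPU time
            new Tracker("elapsed_time", u -> u.elapsedNanos, 30_000_000_000L), // 30 s wall clock
            new Tracker("heap", u -> u.heapBytes, 50L * 1024 * 1024)           // 50 MiB of heap
        );
    }

    // Collects every breached threshold; an empty list means the task may keep running.
    static List<String> cancellationReasons(List<Tracker> trackers, Usage usage) {
        List<String> reasons = new ArrayList<>();
        for (Tracker t : trackers) {
            String reason = t.check(usage);
            if (reason != null) {
                reasons.add(reason);
            }
        }
        return reasons;
    }
}
```

With these example thresholds, a task that has used 20 s of CPU and 100 MiB of heap but has only just started would receive two reasons (`cpu` and `heap`), while a lightweight task would receive none. Returning all reasons rather than stopping at the first makes it easy to attribute a cancellation to each tracker involved.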
ketanv3 mentioned this pull request Nov 2, 2022
ketanv3 marked this pull request as ready for review November 2, 2022 12:20
ketanv3 requested review from a team and reta as code owners November 2, 2022 12:20
github-actions bot commented Nov 2, 2022

Gradle Check (Jenkins) Run Completed with:

Added search backpressure stats to the existing node/stats API to describe:
1. the number of cancellations (currently for SearchShardTask only)
2. the current state of TaskResourceUsageTracker

Signed-off-by: Ketan Verma <ketan9495@gmail.com>
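The stats commit above only describes what the node stats API reports. As a rough Java sketch of the cancellation-count side, assuming a simple per-node counter keyed by tracker name (the class `SearchBackpressureStatsSketch` and its methods are hypothetical and do not reflect OpenSearch's actual stats classes or response format):

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical per-node cancellation counters; NOT OpenSearch's stats classes.
final class SearchBackpressureStatsSketch {
    // Total SearchShardTask cancellations performed on this node.
    private final AtomicLong cancellations = new AtomicLong();
    // Cancellations attributed to each tracker, keyed by tracker name.
    private final Map<String, AtomicLong> byTracker = new ConcurrentHashMap<>();

    // Records one cancelled task, crediting every tracker whose threshold it breached.
    void recordCancellation(Iterable<String> breachedTrackers) {
        cancellations.incrementAndGet();
        for (String tracker : breachedTrackers) {
            byTracker.computeIfAbsent(tracker, k -> new AtomicLong()).incrementAndGet();
        }
    }

    long cancellationCount() {
        return cancellations.get();
    }

    // Point-in-time copy sorted by tracker name, suitable for serializing into a stats response.
    Map<String, Long> trackerSnapshot() {
        Map<String, Long> snapshot = new TreeMap<>();
        byTracker.forEach((name, count) -> snapshot.put(name, count.get()));
        return snapshot;
    }
}
```

Thread-safe counters matter here because cancellations may be recorded while a stats request reads them concurrently; the snapshot copies values so the response is not mutated while it is being serialized. The "current state" of each tracker mentioned in the commit message is not modelled in this sketch.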
ketanv3 force-pushed the backport/inflight-cancellation-2.x branch from 94feab9 to da4b84e November 3, 2022 08:36
github-actions bot commented Nov 3, 2022

Gradle Check (Jenkins) Run Completed with:

ketanv3 commented Nov 3, 2022

Jenkins' checks are broken at the moment (opensearch-project/opensearch-ci#222). Locally verified that the build succeeds.

Ran BWC checks using the following command:

./gradlew ':qa:mixed-cluster:v2.4.0#mixedClusterTest' -Dbwc.remote=ketanv3 -Dbwc.refspec.2.4=temp_2.x
...
> Task :distribution:bwc:minor:checkoutBwcBranch
Performing checkout of temp_2.x...
Checkout hash for :distribution:bwc:minor is da4b84ee369aef21b557235da1f60047c1aafce8
...
BUILD SUCCESSFUL in 5m 34s
186 actionable tasks: 8 executed, 178 up-to-date

Verified that the cluster created matches the commit ID of this PR (da4b84e).

Bukhtawar merged commit 7c521b9 into opensearch-project:2.x Nov 3, 2022
ketanv3 added a commit to ketanv3/OpenSearch that referenced this pull request Nov 3, 2022
…ource consumption (opensearch-project#5039)

* [Backport 2.x] Added in-flight cancellation of SearchShardTask based on resource consumption (opensearch-project#4575)

This feature aims to identify and cancel resource intensive SearchShardTasks if they have breached certain
thresholds. This will help in terminating problematic queries which can put nodes in duress and degrade the
cluster performance.

* [Backport 2.x] Added resource usage trackers for in-flight cancellation of SearchShardTask (opensearch-project#4805)

1. CpuUsageTracker: cancels tasks if they consume too much CPU
2. ElapsedTimeTracker: cancels tasks if they consume too much time
3. HeapUsageTracker: cancels tasks if they consume too much heap

* [Backport 2.x] Added search backpressure stats API

Added search backpressure stats to the existing node/stats API to describe:
1. the number of cancellations (currently for SearchShardTask only)
2. the current state of TaskResourceUsageTracker

Signed-off-by: Ketan Verma <ketan9495@gmail.com>
Bukhtawar added the backport 2.4 label (Backport to 2.4 branch) Nov 3, 2022
opensearch-trigger-bot bot pushed a commit that referenced this pull request Nov 3, 2022
…ource consumption (#5039)

(cherry picked from commit 7c521b9)
Bukhtawar pushed a commit that referenced this pull request Nov 3, 2022
…ource consumption (#5039) (#5058)

(cherry picked from commit 7c521b9)

Co-authored-by: Ketan Verma <ketanv3@users.noreply.github.com>
tlfeng pushed a commit to tlfeng/OpenSearch that referenced this pull request Nov 3, 2022