Task cancellation monitoring service #7642

Merged
merged 29 commits into opensearch-project:main on Jun 10, 2023

@sgup432 sgup432 commented May 19, 2023

Description

  • Adds a task cancellation monitoring service. It detects tasks that are still running more than X seconds (defined by a setting) after being cancelled. For now, only SearchShardTasks are tracked.
  • Runs a separate monitoring thread at an interval of 5 seconds. Monitoring stays off until a task cancellation is seen.
  • Adds stats as well.
    This depends on the Task API changes raised in "Adding task cancellation timestamp in task API" #7445. Once that is merged, I will rebase.
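
The detection logic described above can be sketched roughly as follows. This is a minimal illustration in Python, not the Java code in this PR: the class name `CancellationMonitor`, its methods, and the `threshold_secs` parameter are hypothetical, and the meaning given to the two counters (tasks currently past the threshold versus tasks that have ever gone past it) is an assumption based on the stat names.

```python
from dataclasses import dataclass, field


@dataclass
class CancellationMonitor:
    """Hypothetical sketch: tracks cancelled tasks that are still running."""

    threshold_secs: float  # the "X seconds" setting from the description
    _cancelled_at: dict = field(default_factory=dict)   # task_id -> cancel time
    _already_counted: set = field(default_factory=set)  # ids included in the total
    total_count_post_cancel: int = 0

    def on_cancelled(self, task_id, now):
        # Record the first cancellation time only; repeated cancels keep it.
        self._cancelled_at.setdefault(task_id, now)

    def on_completed(self, task_id):
        # A finished task no longer counts as running after cancellation.
        self._cancelled_at.pop(task_id, None)
        self._already_counted.discard(task_id)

    def should_run(self):
        # Assumption: the periodic check is skipped while nothing is cancelled.
        return bool(self._cancelled_at)

    def check(self, now):
        """One monitoring pass; returns the current post-cancel count."""
        current = 0
        for task_id, cancelled_at in self._cancelled_at.items():
            if now - cancelled_at > self.threshold_secs:
                current += 1
                if task_id not in self._already_counted:
                    # Count each task once in the cumulative total.
                    self._already_counted.add(task_id)
                    self.total_count_post_cancel += 1
        return current


monitor = CancellationMonitor(threshold_secs=10)
monitor.on_cancelled("task-1", now=0)
print(monitor.check(now=5))   # still within the threshold
print(monitor.check(now=15))  # past the threshold
```

In a real service, `check` would be driven by a scheduler every 5 seconds and guarded by something like `should_run`, so no work is done while no task has been cancelled.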

Related Issues

#6953

Testing

Spun up an OpenSearch cluster, introduced a deliberate delay in a search request, and then cancelled it. Verified that the node stats and the task API are populated correctly.

curl -XGET localhost:9200/_nodes/stats?pretty
<rest of stats>....
      "task_cancellation" : {
        "search_shard_task" : {
          "current_count_post_cancel" : 1,
          "total_count_post_cancel" : 1
        }
      }

curl -XGET localhost:9200/_nodes/stats/task_cancellation?pretty
{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "cluster_name" : "runTask",
  "nodes" : {
    "DyBaU6hwTVOPfRfLAF6uGw" : {
      "timestamp" : 1684523706582,
      "name" : "runTask-0",
      "transport_address" : "127.0.0.1:9300",
      "host" : "127.0.0.1",
      "ip" : "127.0.0.1:9300",
      "roles" : [
        "cluster_manager",
        "data",
        "ingest",
        "remote_cluster_client"
      ],
      "attributes" : {
        "testattr" : "test",
        "shard_indexing_pressure_enabled" : "true"
      },
      "task_cancellation" : {
        "search_shard_task" : {
          "current_count_post_cancel" : 1,
          "total_count_post_cancel" : 1
        }
      }
    }
  }
}
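
A check like the one above could also be scripted. The following is a small sketch that pulls the two counters out of a node stats response. The helper name `post_cancel_counts` is hypothetical, and the sample is a trimmed copy of the response shown above rather than a live call to the cluster.

```python
import json

# Trimmed copy of the node stats response shown above.
SAMPLE = """
{
  "nodes": {
    "DyBaU6hwTVOPfRfLAF6uGw": {
      "name": "runTask-0",
      "task_cancellation": {
        "search_shard_task": {
          "current_count_post_cancel": 1,
          "total_count_post_cancel": 1
        }
      }
    }
  }
}
"""


def post_cancel_counts(stats_json):
    """Map node name -> (current, total) post-cancel counts for search shard tasks."""
    stats = json.loads(stats_json)
    counts = {}
    for node in stats.get("nodes", {}).values():
        shard_stats = node.get("task_cancellation", {}).get("search_shard_task")
        if shard_stats is None:
            continue  # this node did not report the section
        counts[node["name"]] = (
            shard_stats["current_count_post_cancel"],
            shard_stats["total_count_post_cancel"],
        )
    return counts


print(post_cancel_counts(SAMPLE))  # {'runTask-0': (1, 1)}
```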

Task API testing is covered in #7445.

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

sgup432 and others added 7 commits May 4, 2023 23:43
Signed-off-by: Sagar Upadhyaya <sagar.upadhyaya.121@gmail.com>
Signed-off-by: Sagar Upadhyaya <sagar.upadhyaya.121@gmail.com>
Signed-off-by: Sagar Upadhyaya <sagar.upadhyaya.121@gmail.com>
Signed-off-by: Sagar Upadhyaya <sagar.upadhyaya.121@gmail.com>
Signed-off-by: Sagar <99425694+sgup432@users.noreply.github.com>
Signed-off-by: Sagar Upadhyaya <sagar.upadhyaya.121@gmail.com>
Signed-off-by: Sagar Upadhyaya <sagar.upadhyaya.121@gmail.com>
Signed-off-by: Sagar Upadhyaya <sagar.upadhyaya.121@gmail.com>
Signed-off-by: Sagar <99425694+sgup432@users.noreply.github.com>
sgup432 and others added 2 commits June 6, 2023 23:14
Signed-off-by: Sagar <99425694+sgup432@users.noreply.github.com>
Signed-off-by: Sagar Upadhyaya <sagar.upadhyaya.121@gmail.com>
github-actions bot commented Jun 7, 2023

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.search.backpressure.SearchBackpressureIT.testSearchTaskCancellationWithHighCpu

codecov bot commented Jun 7, 2023

Codecov Report

Merging #7642 (8077a04) into main (c888fce) will increase coverage by 0.23%.
The diff coverage is 84.24%.

@@             Coverage Diff              @@
##               main    #7642      +/-   ##
============================================
+ Coverage     70.84%   71.07%   +0.23%     
- Complexity    56464    56651     +187     
============================================
  Files          4702     4706       +4     
  Lines        266973   267135     +162     
  Branches      39157    39167      +10     
============================================
+ Hits         189131   189862     +731     
+ Misses        61897    61283     -614     
- Partials      15945    15990      +45     
Impacted Files                                        Coverage Δ
...min/cluster/stats/TransportClusterStatsAction.java 69.56% <ø> (ø)
...rg/opensearch/common/settings/ClusterSettings.java 92.50% <ø> (ø)
...rch/action/admin/cluster/node/stats/NodeStats.java 51.44% <33.33%> (-1.00%) ⬇️
...src/main/java/org/opensearch/node/NodeService.java 74.39% <50.00%> (-0.30%) ⬇️
...search/tasks/SearchShardTaskCancellationStats.java 77.27% <77.27%> (ø)
...rc/main/java/org/opensearch/tasks/TaskManager.java 66.66% <80.95%> (+0.91%) ⬆️
...va/org/opensearch/tasks/TaskCancellationStats.java 82.35% <82.35%> (ø)
...earch/tasks/TaskCancellationMonitoringService.java 90.32% <90.32%> (ø)
...on/admin/cluster/node/stats/NodesStatsRequest.java 93.24% <100.00%> (+0.09%) ⬆️
.../cluster/node/stats/TransportNodesStatsAction.java 100.00% <100.00%> (ø)
... and 2 more

... and 458 files with indirect coverage changes

@Bukhtawar Bukhtawar left a comment

Overall LGTM

Signed-off-by: Sagar Upadhyaya <sagar.upadhyaya.121@gmail.com>
github-actions bot commented Jun 8, 2023

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.search.backpressure.SearchBackpressureIT.testSearchTaskCancellationWithHighCpu
      1 org.opensearch.action.admin.cluster.node.tasks.ResourceAwareTasksTests.testBasicTaskResourceTracking

github-actions bot commented Jun 8, 2023

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.search.backpressure.SearchBackpressureIT.testSearchTaskCancellationWithHighCpu

@Bukhtawar
Collaborator

Let's ensure we open an issue for all things needing a follow-up.

sgup432 commented Jun 8, 2023

> Let's ensure we open an issue for all things needing a follow-up.

@Bukhtawar Sure. I have created the required issues for the things we need to track or follow up on.

@Bukhtawar Bukhtawar merged commit 584617e into opensearch-project:main Jun 10, 2023
10 checks passed
sgup432 added a commit to sgup432/OpenSearch that referenced this pull request Jun 13, 2023
* Task cancellation monitoring service 

Signed-off-by: Sagar Upadhyaya <sagar.upadhyaya.121@gmail.com>
Bukhtawar pushed a commit that referenced this pull request Jun 14, 2023
* Task cancellation monitoring service (#7642)

Signed-off-by: Sagar Upadhyaya <sagar.upadhyaya.121@gmail.com>
gaiksaya pushed a commit to gaiksaya/OpenSearch that referenced this pull request Jun 26, 2023
…ct#8046)

* Task cancellation monitoring service (opensearch-project#7642)

Signed-off-by: Sagar Upadhyaya <sagar.upadhyaya.121@gmail.com>
imRishN pushed a commit to imRishN/OpenSearch that referenced this pull request Jun 27, 2023
* Task cancellation monitoring service

Signed-off-by: Sagar Upadhyaya <sagar.upadhyaya.121@gmail.com>
Signed-off-by: Rishab Nahata <rnnahata@amazon.com>
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
* Task cancellation monitoring service

Signed-off-by: Sagar Upadhyaya <sagar.upadhyaya.121@gmail.com>
Signed-off-by: Shivansh Arora <hishiv@amazon.com>
3 participants