Add `ThreadWatchdog` to `ClusterApplierService` #134361
Adds another layer of detection of slow activity on the cluster applier thread. In particular, this can detect activity that isn't included in an `UpdateTask`, such as completing an expensive listener attached to a `ClusterStateObserver`. It also captures a thread dump if slow activity is detected.
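To make the detection idea concrete, here is a minimal sketch of a watchdog-style activity tracker. It is not the code from this PR: the class and method names (`WatchdogSketch`, `ActivityTracker`, `checkStuck`) are hypothetical, and the odd/even counter scheme is an assumption chosen for illustration. The watched thread bumps a counter when it starts and stops work, and a periodic check reports slow activity when it sees the same in-progress activity on two consecutive checks.

```java
import java.util.concurrent.atomic.AtomicLong;

public class WatchdogSketch {

    /** Hypothetical tracker: the counter is odd while the thread is busy, even while idle. */
    public static final class ActivityTracker {
        private final AtomicLong counter = new AtomicLong();
        private long lastObserved; // only read and written by the checking thread

        /** Called by the watched thread just before it starts some work (even -> odd). */
        public void startActivity() {
            counter.incrementAndGet();
        }

        /** Called by the watched thread once the work is done (odd -> even). */
        public void stopActivity() {
            counter.incrementAndGet();
        }

        /**
         * Called periodically by the watchdog. Returns true if the thread is busy and the
         * counter has not moved since the previous check, meaning a single activity has
         * spanned at least one full check interval.
         */
        public boolean checkStuck() {
            long current = counter.get();
            boolean stuck = (current & 1) == 1 && current == lastObserved;
            lastObserved = current;
            return stuck;
        }
    }

    public static void main(String[] args) {
        ActivityTracker tracker = new ActivityTracker();
        tracker.startActivity();
        System.out.println("first check:  " + tracker.checkStuck());
        System.out.println("second check: " + tracker.checkStuck());
        tracker.stopActivity();
        System.out.println("after stop:   " + tracker.checkStuck());
    }
}
```

Because the tracker only cares whether the thread is busy, it covers any work done on the thread, not just work wrapped in a named task. A real watchdog would react to a `true` result by logging a warning and capturing a thread dump.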
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)
Hi @DaveCTurner, I've created a changelog YAML for you.
This has required surprisingly many changes to the test suite, although they're all very simple and fall into two categories:
LGTM
Only had minor comments.
this.interval = interval;
this.quietTime = quietTime.compareTo(interval) <= 0 ? interval : quietTime;
this.lifecycle = lifecycle;
this.logger = logger;
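The `quietTime` assignment in the hunk above clamps the quiet time so that it is never shorter than the check interval. A small standalone sketch of that rule follows, assuming `java.time.Duration` in place of whatever time type the real constructor takes; the class and method names are hypothetical.

```java
import java.time.Duration;

public class QuietTimeClamp {

    /**
     * Mirrors the ternary in the quoted hunk: if the requested quiet time is
     * less than or equal to the interval, fall back to the interval.
     */
    public static Duration effectiveQuietTime(Duration interval, Duration quietTime) {
        return quietTime.compareTo(interval) <= 0 ? interval : quietTime;
    }

    public static void main(String[] args) {
        Duration interval = Duration.ofSeconds(5);
        System.out.println(effectiveQuietTime(interval, Duration.ofSeconds(1)));
        System.out.println(effectiveQuietTime(interval, Duration.ofMinutes(10)));
    }
}
```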
The hot-threads dumper still links the doc for `ReferenceDocs.NETWORK_THREADING_MODEL`, which is not applicable in the new case. Do we care?
Ah yes, you're right. I can't think of a great place to document this. It's kind of the same principle: this thread should be frequently idle, just like the `transport_worker` threads. I think I'm going to say we don't care.
final AtomicBoolean completedTask = new AtomicBoolean();

clusterApplierService.runOnApplierThread("blocking task", randomFrom(Priority.values()), ignored -> {
Should we have a test for immediate listener firing in `ClusterApplierService#addTimeoutListener` as well?
Yes good point, see ea0fca3.