[watchdog] Provide additional watchdog actions and/or extension points #11388
Comments
/assign @KBaichoo
For a general mechanism I'm proposing the following changes: Bootstrap.proto
In GuardDogImpl:
The callbacks will have the following signature:
You may want the events to be MultiKill, Kill, Miss and Megamiss, to match the current actions. Getting the full list of threads that are in a miss/megamiss state seems very useful. Similarly, a list of the threads involved in a Kill or MultiKill event seems useful and would be consistent with the API used for miss/megamiss, even if the kind of operations I expect we would run on Kill/MultiKill may not require the thread ID information.
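To make the suggestion above concrete, here is a minimal sketch of per-event callbacks that always receive the list of affected threads. All names (`WatchdogEvent`, `ThreadId`, `WatchdogCallbackRegistry`) are hypothetical and chosen for illustration; this is not the signature proposed in this thread, nor Envoy's actual API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical event kinds mirroring the watchdog's existing actions.
enum class WatchdogEvent { Miss, Megamiss, Kill, MultiKill };

// Hypothetical thread identifier; a real implementation would use its own type.
using ThreadId = std::uint64_t;

// Every callback gets the event and the threads involved, so Kill/MultiKill
// handlers share the same shape as Miss/Megamiss handlers.
using WatchdogCallback =
    std::function<void(WatchdogEvent, const std::vector<ThreadId>&)>;

class WatchdogCallbackRegistry {
public:
  // Registers a callback to run whenever `event` fires.
  void add(WatchdogEvent event, WatchdogCallback cb) {
    assert(cb != nullptr);
    callbacks_[event].push_back(std::move(cb));
  }

  // Runs every callback registered for `event` and returns how many ran.
  std::size_t fire(WatchdogEvent event,
                   const std::vector<ThreadId>& threads) const {
    const auto it = callbacks_.find(event);
    if (it == callbacks_.end()) {
      return 0;
    }
    for (const auto& cb : it->second) {
      cb(event, threads);
    }
    return it->second.size();
  }

private:
  std::map<WatchdogEvent, std::vector<WatchdogCallback>> callbacks_;
};
```

A handler that does not need the thread list (for example, one registered for Kill) can simply ignore the second argument, which keeps the interface uniform across all four events.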
Good idea; that would add more granularity and be more consistent than Abort.
IIUC, the best way to provide the 'extensions' (for particular watchdog events) is via Factories and using utilities such as I don't see a class that derives from
@envoyproxy/api-shepherds for input
+1 to using an extension/typed_config interface if we think that we will want this to be extensible in the future with different actions/events.
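As a rough illustration of the factory-based extension approach discussed above, the sketch below looks up an action factory by name and builds an action from an opaque config string. The class names, the `"profile"` key, and the string-based config are assumptions made for this example; a real typed_config interface would carry a typed proto message rather than a string, and none of this is taken from Envoy's code.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical action interface; a real action would run on a watchdog event.
class WatchdogAction {
public:
  virtual ~WatchdogAction() = default;
  virtual std::string describe() const = 0;
};

// Hypothetical factory interface; one factory per extension.
class WatchdogActionFactory {
public:
  virtual ~WatchdogActionFactory() = default;
  virtual std::unique_ptr<WatchdogAction>
  create(const std::string& config) const = 0;
};

// Placeholder action that only remembers an output path; it does no profiling.
class ProfileAction : public WatchdogAction {
public:
  explicit ProfileAction(std::string path) : path_(std::move(path)) {}
  std::string describe() const override { return "profile:" + path_; }

private:
  std::string path_;
};

class ProfileActionFactory : public WatchdogActionFactory {
public:
  std::unique_ptr<WatchdogAction>
  create(const std::string& config) const override {
    return std::make_unique<ProfileAction>(config);
  }
};

// Name-keyed registry: configuration names an extension, the registry builds it.
class ActionFactoryRegistry {
public:
  void registerFactory(std::string name,
                       std::unique_ptr<WatchdogActionFactory> factory) {
    assert(factory != nullptr);
    factories_[std::move(name)] = std::move(factory);
  }

  // Returns nullptr when no factory is registered under `name`.
  std::unique_ptr<WatchdogAction> create(const std::string& name,
                                         const std::string& config) const {
    const auto it = factories_.find(name);
    if (it == factories_.end()) {
      return nullptr;
    }
    return it->second->create(config);
  }

private:
  std::map<std::string, std::unique_ptr<WatchdogActionFactory>> factories_;
};
```

The point of the indirection is that new actions can be added by registering another factory, without changing the code that reads the configuration.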
I've turned the Extension PR from a draft into an actual PR. Since it was getting quite big, I've decided to implement one of the extensions in another PR. For implementing CPU profiling based on WatchDogEvents:
Added a watchdog extension that triggers profiling.
Risk Level: Medium (new extension that is optional)
Testing: Unit tests
Docs Changes: Included (added a reference to the generated extension proto.rst)
Release Notes: Included
Fixes #11388
Signed-off-by: Kevin Baichoo <kbaichoo@google.com>
Added a watchdog extension that triggers profiling.
Risk Level: Medium (new extension that is optional)
Testing: Unit tests
Docs Changes: Included (added a reference to the generated extension proto.rst)
Release Notes: Included
Fixes envoyproxy#11388
Signed-off-by: Kevin Baichoo <kbaichoo@google.com>
Signed-off-by: Clara Andrew-Wani <candrewwani@gmail.com>
The thread watchdog is already an important mechanism to detect and recover from coding errors that result in infinite loops, blocking API calls, and very long computations in worker threads. There are a few simple improvements that would make the watchdog even more awesome: