Kubelet: add a metric to observe time since PLEG last seen #86251
What this PR does / why we need it:
Expose the measurement that kubelet uses to judge that "PLEG is unhealthy".
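Note that the existing metrics PLEGRelistInterval and PLEGRelistDuration are poorly suited to this, because they are never updated when relist() gets stuck.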
Which issue(s) this PR fixes:
Does this PR introduce a user-facing change?:
mattjmcnaughton left a comment
Adding this metric certainly seems like it would provide monitoring benefit w.r.t. the issue you mentioned. Additionally, it seems like a low-cost metric (i.e. no risk of high cardinality, etc.). So LGTM!
Pretty sure the test failures are flakes - will trigger retries.
Expose the measurement that kubelet uses to judge that "PLEG is unhealthy". If we can observe the measurement growing, then we can alert before the node goes unhealthy.

Note that the existing metrics PLEGRelistInterval and PLEGRelistDuration are poor for this, because when relist() gets stuck they are never updated.

Signed-off-by: Bryan Boreham <firstname.lastname@example.org>
[APPROVALNOTIFIER] This PR is APPROVED