Prevent uptimeFunc from being called every time CheckHealth is called #609
Hi @mcshooter. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with the appropriate command. Once the patch is verified, the new status will be reflected by a label on this PR. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/hold testing in progress
/cc @smileusd
@mcshooter: GitHub didn't allow me to request PR reviews from the following users: smileusd. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.
/cc @pjh
/cc @ibabou
cc/ @Random-Liu
/sig node
85d4245 to f9db151
I reviewed but don't have much to offer. At a quick glance, the code here (both existing and modified) is challenging to read, since I'm not sure what loopBackTime means.
But +1 to the goal of calling uptimeFunc less often to reduce CPU utilization. We'll still need further changes to avoid starting numerous PowerShell sessions on Windows.
/unhold
f9db151 to 26f070b
In this case, do we still intend to use log patterns or repair on Windows? If not, can we declare them unsupported and return an error when users try to use log patterns or repair on Windows? That way, those functions can be no-ops on Windows. The current code makes me feel like we will never use it, because it has a serious performance issue. In that case, I would prefer that we disable the feature and clean up the corresponding code, instead of moving code around so that it never runs just for us, because others may still hit the issue if they try to use it.
We do still use the repair functionality, so it doesn't make sense to remove it on Windows. The code will still run to repair and call uptime, but only in the cases where that is expected.
Discussed offline: there is no good way to get the last start time when the service is down, so the current fix makes sense to me.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: ibabou, mcshooter, Random-Liu. The full list of commands accepted by this bot can be found here. The pull request process is described here.
There is currently an issue on Windows nodes where NPD drives the CPU to 100% and keeps it there. After investigating, it looks like an earlier change moved uptimeFunc so that it is called every time health is checked through the health checker. uptimeFunc should only be called once a service has been determined to be unhealthy, or when there are log patterns to check; it should not run when there are no patterns to check. On Windows, the query that gets and calculates the uptime is fairly heavy, so calling it every 10s or so for each of the separate services unnecessarily increases CPU usage. This change reverts part of how CheckHealth works and moves the loopBackTime logic inside the logPatternHealthCheck function, so that it only runs if there are log patterns that need to be checked.