Skip homekit_controller polls when system is overloaded and still trying to process the previous one #25968
            # Temporary connection failure. Device is still available but our
            # connection was dropped.
        if self._polling_lock.locked():
            _LOGGER.warning("HomeKit controller update skipped as previous poll still in flight")
Do you think affected users will complain about the warning message?
Is it critical in the long term for the user to fix this problem, so that the user must be made aware?
Good question. This log message is only supposed to fire when the system is under so much load that an operation that normally takes less than 1s is taking over a minute. The most likely causes are bad devices, bad wifi or thread pool contention. So I'm hopeful most users won't see this, and if they do it's a sign that their installation is deeply borked, so they will appreciate a clue. In that case I could see a different error message that said something like
HomeKit controller update skipped as previous poll still in flight - your accessory might not have a good connection to your network, your system may be unable to handle the number of accessories and integrations, or your configuration may be experiencing thread pool contention issues.
But that's verbose.
A counterargument is that although HomeKit controller can exacerbate the situation (especially in the bad wifi case), it's not necessarily the root cause. So even if it's taking action to relieve pressure on the user's system, maybe it shouldn't implicate itself with a warning-level message.
I'm happy to change as you see fit.
It sounds like we can keep it as warning. The alternative is to make it debug I think.
We could add a section to the docs, explaining the warning?
I've been meaning to improve the homekit_controller docs, so I've included this in there: home-assistant/home-assistant.io#10153
I would say that we should warn once per "occurrence". So if we skip 5 polls in a row, we only warn on the first. We would also add an info log when we are back to normal polling.
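For readers following the thread, the "warn once per occurrence" idea could be sketched roughly as below. This is only an illustration under assumptions: `SkipLogger`, `poll_skipped` and `poll_succeeded` are hypothetical names, not the code that was committed to the PR, and the recovery message text is made up.

```python
import logging

_LOGGER = logging.getLogger(__name__)


class SkipLogger:
    """Hypothetical helper: warn on the first skipped poll of a run only."""

    def __init__(self):
        # True while we are inside a run of skipped polls.
        self._warned = False

    def poll_skipped(self):
        """Record a skipped poll; return True if a warning was logged."""
        if self._warned:
            # Already warned for this occurrence, stay quiet.
            return False
        _LOGGER.warning(
            "HomeKit controller update skipped as previous poll still in flight"
        )
        self._warned = True
        return True

    def poll_succeeded(self):
        """Record a completed poll; return True if a recovery message was logged."""
        if not self._warned:
            return False
        # Assumed wording for the "back to normal" info log.
        _LOGGER.info("HomeKit controller polling is back to normal")
        self._warned = False
        return True
```

With this shape, five skips in a row produce one warning, and the next successful poll produces one info message and re-arms the warning.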
Have added the logging as suggested
Just formatting left to attend to.
Oops - done in latest commit
Description:
In #25178 we have a system with 75 HomeKit entities. A couple of pull requests ago I landed a change to massively reduce the thread pool usage for such a setup: a 15x5 setup like this now requires a poll for each pairing (15 pairings) rather than for each entity (75 entities). But we can still do more.
While we have connection timeouts in HomeKit, with this many pairings a poor wifi connection to a device, or a heavily loaded system with back pressure, could find itself in a situation where polls are being queued faster than they are satisfied. In the worst cases a HA instance might not be able to recover.
This change places a lock around the update code, but rather than waiting for the lock we instead skip the poll (and log a warning) if the lock is already held. On a normal system this will be a no-op. On a HA instance that is struggling it should avoid placing any more pressure on the thread pool.
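The skip-if-locked behaviour described above could look something like the following minimal sketch. It assumes an `asyncio.Lock`; only `_polling_lock` and the warning text come from the diff, while `PollingConnection`, `fetch`, `skipped` and the return values are hypothetical and are not the integration's real API.

```python
import asyncio
import logging

_LOGGER = logging.getLogger(__name__)


class PollingConnection:
    """Hypothetical stand-in for a pairing connection, for illustration only."""

    def __init__(self, fetch):
        # fetch: a coroutine function that performs the actual poll (assumed).
        self._fetch = fetch
        self._polling_lock = asyncio.Lock()
        self.skipped = 0

    async def async_update(self):
        """Poll once; return False if the poll was skipped."""
        if self._polling_lock.locked():
            # A previous poll is still running: skip instead of queueing.
            _LOGGER.warning(
                "HomeKit controller update skipped as previous poll still in flight"
            )
            self.skipped += 1
            return False

        async with self._polling_lock:
            await self._fetch()
        return True
```

Checking `locked()` and then acquiring is safe here because there is no `await` between the two steps, so no other task on the same event loop can take the lock in between. On an idle system the lock is never contended and the check costs nothing.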
Checklist:
tox. Your PR cannot be merged unless tests pass