Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip homekit_controller polls when system is overloaded and still trying to process the previous one #25968

Merged

Conversation

Jc2k
Copy link
Member

@Jc2k Jc2k commented Aug 15, 2019

Description:

In #25178 we have a system with 75 HomeKit entities. A couple of pull requests ago I landed a change to massively reduce the thread pool usage for such a setup (a 15x5 setup like this now requires a poll for each pairing (15 pairings) rather than for each entity (75 entities). But we can still do more.

While we have connection timeouts in HomeKit with this many pairings a poor wifi connection to a device or a heavily loaded system with back pressure could find itself in a situation where more pollings are being queued than satisfied. In the worst cases a HA instance might not be able to recover.

This change places a lock around the update code, but rather than waiting for the lock we instead skip (and warn) polls if the lock is already held. On a normal system this will be a no-op. On a HA instance that is struggling it should avoid placing any more pressure on the thread pool.

Checklist:

  • The code change is tested and works locally.
  • Local tests pass with tox. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • I have followed the [development checklist][dev-checklist]

@Jc2k Jc2k changed the title Skip async_update if there are signs of backpressure Skip homekit_controller polls when system is overloaded and still trying to process the previous one Aug 15, 2019
# Temporary connection failure. Device is still available but our
# connection was dropped.
if self._polling_lock.locked():
_LOGGER.warning("HomeKit controller update skipped as previous poll still in flight")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think affected users will complain about the warning message?

Is it critical in the long term for the user to fix this problem, so that the user must be made aware?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. This logging message is supposed to only fire when the system is under so much load an operation that normally takes less than 1s is taking over a minute. The most likely causes for that are going to be bad devices, bad wifi or thread pool contention. So I'm hopeful most users won't see this and its a sign that their installation is deeply borked, so they will appreciate a clue. In that case I could see a different error message that said something like

HomeKit controller update skipped as previous poll still in flight - your accessory might not have a good connection to your network, your system may be unable to handle the number of accessories and integrations, or your configuration maybe experiencing threadpool contention issues.

But thats verbose.

A counter argument is that although HomeKit controller can exacerabate the situation (especially in the bad wifi case) its not neccesarily the root cause, so even if it's taking action to relieve pressure on the users system, maybe it shouldn't implicate itself with a warning level message.

I'm happy to change as you see fit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds like we can keep it as warning. The alternative is to make it debug I think.

We could add a section to the docs, explaining the warning?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been meaning to improve the homekit_controller docs so i've included this in there: home-assistant/home-assistant.io#10153

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say that we should warn once per "occurrence". So if we hit 5 warnings in a row, we only warn the first. We would also add an info log when we are back to normal polling.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have added the logging as suggested

Copy link
Member

@MartinHjelmare MartinHjelmare left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just formatting left to attend.

@Jc2k
Copy link
Member Author

Jc2k commented Aug 17, 2019

Oops - done in latest commit

@balloob balloob merged commit 2bd832c into home-assistant:dev Aug 18, 2019
@lock lock bot locked and limited conversation to collaborators Aug 19, 2019
@Jc2k Jc2k deleted the homekit_controller_avoid_backpressure branch December 7, 2019 09:26
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants