Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip homekit_controller polls when system is overloaded and still trying to process the previous one #25968

Merged

Conversation

@Jc2k
Copy link
Contributor

commented Aug 15, 2019

Description:

In #25178 we have a system with 75 HomeKit entities. A couple of pull requests ago I landed a change to massively reduce the thread pool usage for such a setup (a 15x5 setup like this now requires a poll for each pairing (15 pairings) rather than for each entity (75 entities). But we can still do more.

While we have connection timeouts in HomeKit with this many pairings a poor wifi connection to a device or a heavily loaded system with back pressure could find itself in a situation where more pollings are being queued than satisfied. In the worst cases a HA instance might not be able to recover.

This change places a lock around the update code, but rather than waiting for the lock we instead skip (and warn) polls if the lock is already held. On a normal system this will be a no-op. On a HA instance that is struggling it should avoid placing any more pressure on the thread pool.

Checklist:

  • The code change is tested and works locally.
  • Local tests pass with tox. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • I have followed the [development checklist][dev-checklist]

@project-bot project-bot bot added this to Needs review in Dev Aug 15, 2019

@Jc2k Jc2k changed the title Skip async_update if there are signs of backpressure Skip homekit_controller polls when system is overloaded and still trying to process the previous one Aug 15, 2019

Dev automation moved this from Needs review to Reviewer approved Aug 15, 2019

# Temporary connection failure. Device is still available but our
# connection was dropped.
if self._polling_lock.locked():
_LOGGER.warning("HomeKit controller update skipped as previous poll still in flight")

This comment has been minimized.

Copy link
@MartinHjelmare

MartinHjelmare Aug 16, 2019

Member

Do you think affected users will complain about the warning message?

Is it critical in the long term for the user to fix this problem, so that the user must be made aware?

This comment has been minimized.

Copy link
@Jc2k

Jc2k Aug 16, 2019

Author Contributor

Good question. This logging message is supposed to only fire when the system is under so much load an operation that normally takes less than 1s is taking over a minute. The most likely causes for that are going to be bad devices, bad wifi or thread pool contention. So I'm hopeful most users won't see this and its a sign that their installation is deeply borked, so they will appreciate a clue. In that case I could see a different error message that said something like

HomeKit controller update skipped as previous poll still in flight - your accessory might not have a good connection to your network, your system may be unable to handle the number of accessories and integrations, or your configuration maybe experiencing threadpool contention issues.

But thats verbose.

A counter argument is that although HomeKit controller can exacerabate the situation (especially in the bad wifi case) its not neccesarily the root cause, so even if it's taking action to relieve pressure on the users system, maybe it shouldn't implicate itself with a warning level message.

I'm happy to change as you see fit.

This comment has been minimized.

Copy link
@MartinHjelmare

MartinHjelmare Aug 16, 2019

Member

It sounds like we can keep it as warning. The alternative is to make it debug I think.

We could add a section to the docs, explaining the warning?

This comment has been minimized.

Copy link
@Jc2k

Jc2k Aug 16, 2019

Author Contributor

I've been meaning to improve the homekit_controller docs so i've included this in there: home-assistant/home-assistant.io#10153

This comment has been minimized.

Copy link
@balloob

balloob Aug 16, 2019

Member

I would say that we should warn once per "occurrence". So if we hit 5 warnings in a row, we only warn the first. We would also add an info log when we are back to normal polling.

This comment has been minimized.

Copy link
@Jc2k

Jc2k Aug 17, 2019

Author Contributor

Have added the logging as suggested

Jc2k added 2 commits Aug 17, 2019
@MartinHjelmare
Copy link
Member

left a comment

Just formatting left to attend.

@Jc2k

This comment has been minimized.

Copy link
Contributor Author

commented Aug 17, 2019

Oops - done in latest commit

@balloob balloob merged commit 2bd832c into home-assistant:dev Aug 18, 2019

10 of 11 checks passed

codecov/patch 71.42% of diff hit (target 94.04%)
Details
CI Build #20190817.10 succeeded
Details
CI (FullCheck Mypy) FullCheck Mypy succeeded
Details
CI (FullCheck Pylint) FullCheck Pylint succeeded
Details
CI (Overview CheckFormat) Overview CheckFormat succeeded
Details
CI (Overview Lint) Overview Lint succeeded
Details
CI (Overview Validate) Overview Validate succeeded
Details
CI (Tests PyTest Python36) Tests PyTest Python36 succeeded
Details
CI (Tests PyTest Python37) Tests PyTest Python37 succeeded
Details
cla-bot Everyone involved has signed the CLA
codecov/project 94.02% (target 90%)
Details

Dev automation moved this from Reviewer approved to Done Aug 18, 2019

@lock lock bot locked and limited conversation to collaborators Aug 19, 2019

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
4 participants
You can’t perform that action at this time.