Improve websocket message coalescing to handle thundering herds better #118268

bdraco · 2024-05-27T23:13:44Z

Proposed change

When entities are added, we may add them in back-to-back tasks, which may return control to the event loop due to I/O or other suspension. This means that the WebSocket sender would run between each task, and the messages would not be coalesced, resulting in a thundering herd of messages to the browser, which would, in some cases, make the UI unresponsive.

To solve this, each time we are about to release the sender's future, we now check whether the queue has grown. If it has, we reschedule the release for the next iteration of the event loop. A safety limit, PENDING_MSG_MAX_FORCE_READY, ensures we don't coalesce too many messages together, which keeps the payload from growing too large and the sending delay from exceeding a few microseconds.
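The release-or-reschedule idea above can be sketched with plain asyncio. This is a minimal illustration, not the code in this PR: the `CoalescingSender` class, its attributes, the `demo` helper, and the default cap of 256 are all assumptions made for the example, with `max_force_ready` standing in for PENDING_MSG_MAX_FORCE_READY.

```python
import asyncio


class CoalescingSender:
    """Toy sender that batches queued messages into as few writes as possible."""

    def __init__(self, loop, max_force_ready=256):
        self._loop = loop
        # Hypothetical stand-in for PENDING_MSG_MAX_FORCE_READY (value assumed).
        self._max_force_ready = max_force_ready
        self._queue = []
        self._last_size = 0  # queue size seen at the previous release attempt
        self._scheduled = False
        self.writes = []  # each entry is one coalesced write

    def send(self, message):
        """Queue a message and make sure a release attempt is pending."""
        self._queue.append(message)
        if not self._scheduled:
            self._scheduled = True
            self._loop.call_soon(self._release_or_reschedule)

    def _release_or_reschedule(self):
        size = len(self._queue)
        # The queue grew since the last attempt and is still under the cap:
        # wait one more loop iteration so more messages can pile up.
        if self._last_size < size < self._max_force_ready:
            self._last_size = size
            self._loop.call_soon(self._release_or_reschedule)
            return
        # The queue stopped growing or reached the cap: write everything at once.
        self._last_size = 0
        self._scheduled = False
        self.writes.append(self._queue)
        self._queue = []


async def demo(count, max_force_ready=256):
    """Send `count` messages, yielding to the event loop after each one."""
    sender = CoalescingSender(asyncio.get_running_loop(), max_force_ready)
    for i in range(count):
        sender.send(i)
        await asyncio.sleep(0)  # simulates a task suspending between entity adds
    await asyncio.sleep(0.01)  # give the final release attempt time to run
    return sender.writes
```

Calling `asyncio.run(demo(5))` should yield a single batch even though the producer suspends after every message, while a small cap such as `max_force_ready=3` splits the same messages into several smaller writes.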

The auth code that deletes the current refresh token (which closes the user's connection) assumed that sending the response to the request to delete all refresh tokens would always take exactly one event loop iteration. Since it may now take a few, that code needed a longer delay to ensure the response can be sent before the user's own token is deleted.
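The ordering problem can be sketched as follows. This is a hedged illustration only: `delete_all_refresh_tokens`, the `events` list, the `demo` helper, and the 0.05 second `delete_delay` are hypothetical names and values, and the real change uses Home Assistant's own helpers rather than a bare `loop.call_later`.

```python
import asyncio


async def delete_all_refresh_tokens(tokens, current, events, delete_delay=0.05):
    """Delete every token, but remove the caller's own token only after a delay."""
    loop = asyncio.get_running_loop()

    # Other sessions can be dropped right away; keep only the caller's token.
    tokens.intersection_update({current})

    def _delete_current_token_soon():
        # Deleting this token closes the caller's connection, so it must run
        # after the response has had time to leave the send queue.
        tokens.discard(current)
        events.append("current token deleted")

    loop.call_later(delete_delay, _delete_current_token_soon)

    # Stand-in for the coalescing sender: the response needs a few
    # event loop iterations before it is actually written.
    for _ in range(3):
        await asyncio.sleep(0)
    events.append("response sent")


async def demo():
    tokens = {"current", "other-1", "other-2"}
    events = []
    await delete_all_refresh_tokens(tokens, "current", events)
    await asyncio.sleep(0.1)  # wait past the deletion delay
    return tokens, events
```

A fixed delay is a simple choice here: it does not need to know how many loop iterations the sender will take, only that the delay comfortably exceeds them.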

Type of change

Dependency upgrade
Bugfix (non-breaking change which fixes an issue)
New integration (thank you!)
New feature (which adds functionality to an existing integration)
Deprecation (breaking change to happen in the future)
Breaking change (fix/feature causing existing functionality to break)
Code quality improvements to existing code or addition of tests

Additional information

This PR fixes or closes issue: fixes #
This PR is related to issue:
Link to documentation pull request:

Checklist

The code change is tested and works locally.
Local tests pass. Your PR cannot be merged unless tests pass
There is no commented out code in this PR.
I have followed the development checklist
I have followed the perfect PR recommendations
The code has been formatted using Ruff (ruff format homeassistant tests)
Tests have been added to verify that the new code works.

If user exposed functionality or configuration variables are added/changed:

Documentation added/updated for www.home-assistant.io

If the code communicates with devices, web services, or third-party tools:

The manifest file has all fields filled out correctly.
Updated and included derived files by running: python3 -m script.hassfest.
New or updated dependencies have been added to requirements_all.txt.
Updated by running python3 -m script.gen_requirements_all.
For the updated dependencies - a link to the changelog, or at minimum a diff between library versions is added to the PR description.
Untested files have been added to .coveragerc.

To help with the load of incoming pull requests:

I have reviewed two other open pull requests in this repository.

home-assistant · 2024-05-27T23:13:50Z

Hey there @home-assistant/core, mind taking a look at this pull request as it has been labeled with an integration (websocket_api) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of websocket_api can trigger bot actions by commenting:

`@home-assistant close` Closes the pull request.
`@home-assistant rename Awesome new title` Renames the pull request.
`@home-assistant reopen` Reopens the pull request.
`@home-assistant unassign websocket_api` Removes the current integration label and assignees on the pull request; add the integration domain after the command.
`@home-assistant add-label needs-more-information` Adds a label (needs-more-information, problem in dependency, problem in custom component) to the pull request.
`@home-assistant remove-label needs-more-information` Removes a label (needs-more-information, problem in dependency, problem in custom component) from the pull request.

home-assistant · 2024-05-27T23:13:50Z

Hey there @home-assistant/core, mind taking a look at this pull request as it has been labeled with an integration (auth) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of auth can trigger bot actions by commenting:

`@home-assistant close` Closes the pull request.
`@home-assistant rename Awesome new title` Renames the pull request.
`@home-assistant reopen` Reopens the pull request.
`@home-assistant unassign auth` Removes the current integration label and assignees on the pull request; add the integration domain after the command.
`@home-assistant add-label needs-more-information` Adds a label (needs-more-information, problem in dependency, problem in custom component) to the pull request.
`@home-assistant remove-label needs-more-information` Removes a label (needs-more-information, problem in dependency, problem in custom component) from the pull request.

bdraco · 2024-05-27T23:15:29Z

@coderabbitai review

coderabbitai · 2024-05-27T23:15:36Z

Walkthrough

The changes introduce a delayed token deletion mechanism in the authentication component to avoid immediate connection closure and enhance the WebSocket API's message handling by managing queue sizes and future releases. Additionally, the tests are updated to reflect these changes, ensuring the new behaviors are correctly validated.

Changes

| File | Change Summary |
| --- | --- |
| `homeassistant/components/auth/__init__.py` | Introduced `_delete_current_token_soon()` function to delete tokens after a delay. |
| `homeassistant/components/websocket_api/const.py` | Added `PENDING_MSG_MAX_FORCE_READY` constant and comments on browser limitations. |
| `homeassistant/components/websocket_api/http.py` | Enhanced `WebSocketHandler` with queue size management and future release handling. |
| `tests/components/auth/test_init.py` | Adjusted token deletion delay and added async block till done to ensure task completion. |
| `tests/components/websocket_api/test_http.py` | Removed assertion on message type before receiving a message from the websocket client. |

Sequence Diagram(s) (Beta)

sequenceDiagram
    participant User
    participant HomeAssistant
    participant AuthComponent
    participant WebSocketAPI

    User->>HomeAssistant: Request Authentication
    HomeAssistant->>AuthComponent: Validate Token
    AuthComponent->>User: Token Validated

    User->>WebSocketAPI: Open WebSocket Connection
    WebSocketAPI->>WebSocketAPI: Manage Queue Size
    WebSocketAPI->>WebSocketAPI: Release Ready Future if Conditions Met

    User->>AuthComponent: Request Token Deletion
    AuthComponent->>AuthComponent: _delete_current_token_soon()
    AuthComponent->>HomeAssistant: Schedule Token Deletion Task (async)
    Note right of AuthComponent: Token deleted after delay

homeassistant/components/auth/__init__.py

bdraco · 2024-05-28T00:13:05Z

That's a lot better. I guess we need a fast path for bytes.

bdraco · 2024-05-28T01:45:35Z

I tried using a 5 ms timer, but that increased the latency too much. `call_soon` seems to be the best way, since it balances the desire for low latency with not overloading the system.

homeassistant/components/websocket_api/http.py

bdraco · 2024-05-28T01:59:04Z

The number of writes before

bdraco · 2024-05-28T02:19:14Z

after

bdraco · 2024-05-28T02:56:56Z

@coderabbitai review

bdraco · 2024-05-28T04:17:20Z

I've had the bootstrap test fail a few times today. I fixed it in #118285

bdraco · 2024-05-29T03:53:39Z

thanks

bdraco added 14 commits May 25, 2024 00:21

Increase websocket peak messages to match max expected entities

0a3891b

During startup the websocket would frequently disconnect if more than 4096 entities were added back to back. Some MQTT setups will have more than 10000 entities. Match the websocket peak value to the max expected entities

Merge branch 'dev' into websocket_match_max_expected_entities

3b94e74

coalesce more

1abb1ec

Merge remote-tracking branch 'upstream/websocket_match_max_expected_e…

dc4291b

…ntities' into websocket_match_max_expected_entities

delay more if the backlog gets large

5b31109

wait to send if the queue is building rapidly

a414099

tweak

f3d2003

tweak for chrome since it works great in firefox but chrome cannot ha…

439e2d7

…ndle it

Revert "tweak for chrome since it works great in firefox but chrome c…

f845ecc

…annot handle it" This reverts commit 439e2d7.

adjust for chrome

b4e4429

lower number

3a26917

Merge branch 'dev' into websocket_match_max_expected_entities

bed4652

remove code

de06c59

fixes

4e83e49

home-assistant bot added cla-signed code-quality core has-tests integration: auth integration: websocket_api Quality Scale: internal labels May 27, 2024

bdraco changed the title ~~Increase change of websocket message coalesce to better handle thundering herds~~ Increase chance of websocket message coalesce to better handle thundering herds May 27, 2024

bdraco changed the title ~~Increase chance of websocket message coalesce to better handle thundering herds~~ Improve websocket message coalescing to better handle thundering herds May 27, 2024

bdraco changed the title ~~Improve websocket message coalescing to better handle thundering herds~~ Improve websocket message coalescing to handle thundering herds better May 27, 2024

bdraco mentioned this pull request May 27, 2024

Align max expected entities constant between modules #118102

Merged

20 tasks

bdraco commented May 27, 2024

homeassistant/components/auth/__init__.py (resolved)

bdraco added 4 commits May 27, 2024 14:13

fast path for bytes

02520a6

compact

cb75108

adjust test since we see the close right away now on overload

64faeab

simplify check

5c5367e

bdraco commented May 28, 2024

homeassistant/components/websocket_api/http.py (resolved)

bdraco added 2 commits May 27, 2024 16:03

reduce loop

433fa88

tweak

bc48db3

handle ready right away

ac2d494

bdraco marked this pull request as ready for review May 28, 2024 04:16

bdraco requested a review from a team as a code owner May 28, 2024 04:17

bdraco added the smash Indicator this PR is close to finish for merging or closing label May 28, 2024

balloob approved these changes May 29, 2024

balloob merged commit 79bc179 into dev May 29, 2024
38 checks passed

balloob deleted the websocket_co branch May 29, 2024 03:14

github-actions bot locked and limited conversation to collaborators May 30, 2024
