Improve websocket message coalescing to handle thundering herds better #118268

Merged
merged 21 commits into from
May 29, 2024

Conversation

bdraco
Member

@bdraco bdraco commented May 27, 2024

Proposed change

When entities are added, we may add them in back-to-back tasks, which may return control to the event loop due to I/O or other suspension. This means that the WebSocket sender would run between each task, and the messages would not be coalesced, resulting in a thundering herd of messages to the browser, which would, in some cases, make the UI unresponsive.

To solve this, each time we are about to release the sender's future, we now check whether the queue has grown. If it has, we reschedule the release for the next iteration of the event loop. A safety limit, PENDING_MSG_MAX_FORCE_READY, makes sure we don't coalesce too many messages together, so the payload does not grow too large and the delay in sending does not exceed a few microseconds.
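The release-or-reschedule idea described above can be sketched roughly as follows. This is a minimal standalone illustration and not the code from this PR: the class `CoalescingSender`, the `demo` helper, and the cap value `MAX_FORCE_READY = 8` are all assumptions made for the example (the real value of PENDING_MSG_MAX_FORCE_READY is not stated here).

```python
import asyncio

# Hypothetical cap for this sketch; the actual PENDING_MSG_MAX_FORCE_READY
# value is not given in the PR description.
MAX_FORCE_READY = 8


class CoalescingSender:
    """Toy sender that batches messages queued across loop iterations."""

    def __init__(self, loop):
        self._loop = loop
        self._queue = []
        self._ready = loop.create_future()
        self._seen_size = 0  # 0 means no release is currently scheduled
        self.batches = []

    def send(self, message):
        self._queue.append(message)
        if self._seen_size == 0:
            # First message since the last release: remember the queue size
            # and try to release on the next loop iteration.
            self._seen_size = len(self._queue)
            self._loop.call_soon(self._release_or_reschedule)

    def _release_or_reschedule(self):
        size = len(self._queue)
        if size == 0:
            # The writer already drained the queue; nothing to release.
            self._seen_size = 0
            return
        if size > self._seen_size and self._seen_size < MAX_FORCE_READY:
            # The queue grew since the last check and we are below the cap:
            # wait one more iteration so more messages can be coalesced.
            self._seen_size = size
            self._loop.call_soon(self._release_or_reschedule)
            return
        # No growth, or the cap stopped further rescheduling: wake the writer.
        self._seen_size = 0
        if not self._ready.done():
            self._ready.set_result(None)

    async def writer(self):
        while True:
            await self._ready
            self._ready = self._loop.create_future()
            batch, self._queue = self._queue, []
            if batch:
                self.batches.append(batch)  # stands in for one socket write


async def demo(producers, per_producer):
    loop = asyncio.get_running_loop()
    sender = CoalescingSender(loop)
    writer = loop.create_task(sender.writer())

    async def produce(name):
        for i in range(per_producer):
            sender.send((name, i))
            await asyncio.sleep(0)  # return control to the event loop

    await asyncio.gather(*(produce(n) for n in range(producers)))
    for _ in range(10):  # give pending releases a few iterations to drain
        await asyncio.sleep(0)
    writer.cancel()
    return sender.batches
```

Calling `asyncio.run(demo(3, 4))` queues twelve messages from three interleaved tasks and returns the batches the writer saw. Note that in this sketch the cap only stops further rescheduling; it is not a strict upper bound on the batch size, since messages can still arrive before the writer wakes up.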

The auth code that deletes the current refresh token (which closes the user's connection) assumed that sending the response to the request to delete all refresh tokens would always take exactly one event loop iteration. Since it may now take a few, that code was adjusted to use a longer delay so the response can be sent before the user's own token is deleted.
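The ordering concern above can be illustrated with a small standalone sketch. The names `delete_current_token_soon` and `handle_delete_all`, the 0.05 s delay, and the three `sleep(0)` calls that stand in for the response being sent over several loop iterations are assumptions for this example only; they are not taken from the auth component.

```python
import asyncio

# Hypothetical delay in seconds, chosen only for this demo.
DELETE_DELAY = 0.05


async def delete_current_token_soon(events, delay=DELETE_DELAY):
    """Delete the current token after a delay (simulated by logging)."""
    await asyncio.sleep(delay)
    events.append("token deleted")


async def handle_delete_all(events):
    # Schedule the deletion first, as the handler would.
    task = asyncio.create_task(delete_current_token_soon(events))
    # Simulate the response taking a few loop iterations to go out.
    for _ in range(3):
        await asyncio.sleep(0)
    events.append("response sent")
    await task
    return events
```

With a time-based delay instead of a single loop iteration, `asyncio.run(handle_delete_all([]))` records the response before the deletion even though the deletion task was scheduled first.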

Type of change

  • Dependency upgrade
  • Bugfix (non-breaking change which fixes an issue)
  • New integration (thank you!)
  • New feature (which adds functionality to an existing integration)
  • Deprecation (breaking change to happen in the future)
  • Breaking change (fix/feature causing existing functionality to break)
  • Code quality improvements to existing code or addition of tests

Additional information

  • This PR fixes or closes issue: fixes #
  • This PR is related to issue:
  • Link to documentation pull request:

Checklist

  • The code change is tested and works locally.
  • Local tests pass. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • I have followed the development checklist
  • I have followed the perfect PR recommendations
  • The code has been formatted using Ruff (ruff format homeassistant tests)
  • Tests have been added to verify that the new code works.

If user exposed functionality or configuration variables are added/changed:

If the code communicates with devices, web services, or third-party tools:

  • The manifest file has all fields filled out correctly.
    Updated and included derived files by running: python3 -m script.hassfest.
  • New or updated dependencies have been added to requirements_all.txt.
    Updated by running python3 -m script.gen_requirements_all.
  • For the updated dependencies - a link to the changelog, or at minimum a diff between library versions is added to the PR description.
  • Untested files have been added to .coveragerc.

To help with the load of incoming pull requests:

@home-assistant

Hey there @home-assistant/core, mind taking a look at this pull request as it has been labeled with an integration (websocket_api) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of websocket_api can trigger bot actions by commenting:

  • @home-assistant close Closes the pull request.
  • @home-assistant rename Awesome new title Renames the pull request.
  • @home-assistant reopen Reopen the pull request.
  • @home-assistant unassign websocket_api Removes the current integration label and assignees on the pull request, add the integration domain after the command.
  • @home-assistant add-label needs-more-information Add a label (needs-more-information, problem in dependency, problem in custom component) to the pull request.
  • @home-assistant remove-label needs-more-information Remove a label (needs-more-information, problem in dependency, problem in custom component) on the pull request.

@home-assistant

Hey there @home-assistant/core, mind taking a look at this pull request as it has been labeled with an integration (auth) you are listed as a code owner for? Thanks!


@bdraco
Member Author

bdraco commented May 27, 2024

@coderabbitai review


coderabbitai bot commented May 27, 2024

Walkthrough

The changes introduce a delayed token deletion mechanism in the authentication component to avoid immediate connection closure and enhance the WebSocket API's message handling by managing queue sizes and future releases. Additionally, the tests are updated to reflect these changes, ensuring the new behaviors are correctly validated.

Changes

File | Change Summary
homeassistant/components/auth/__init__.py | Introduced _delete_current_token_soon() function to delete tokens after a delay.
homeassistant/components/websocket_api/const.py | Added PENDING_MSG_MAX_FORCE_READY constant and comments on browser limitations.
homeassistant/components/websocket_api/http.py | Enhanced WebSocketHandler with queue size management and future release handling.
tests/components/auth/test_init.py | Adjusted token deletion delay and added async block till done to ensure task completion.
tests/components/websocket_api/test_http.py | Removed assertion on message type before receiving a message from the websocket client.

Sequence Diagram(s) (Beta)

sequenceDiagram
    participant User
    participant HomeAssistant
    participant AuthComponent
    participant WebSocketAPI

    User->>HomeAssistant: Request Authentication
    HomeAssistant->>AuthComponent: Validate Token
    AuthComponent->>User: Token Validated

    User->>WebSocketAPI: Open WebSocket Connection
    WebSocketAPI->>WebSocketAPI: Manage Queue Size
    WebSocketAPI->>WebSocketAPI: Release Ready Future if Conditions Met

    User->>AuthComponent: Request Token Deletion
    AuthComponent->>AuthComponent: _delete_current_token_soon()
    AuthComponent->>HomeAssistant: Schedule Token Deletion Task (async)
    Note right of AuthComponent: Token deleted after delay

@bdraco bdraco changed the title Increase change of websocket message coalesce to better handle thundering herds Increase chance of websocket message coalesce to better handle thundering herds May 27, 2024
@bdraco bdraco changed the title Increase chance of websocket message coalesce to better handle thundering herds Improve websocket message coalescing to better handle thundering herds May 27, 2024
@bdraco bdraco changed the title Improve websocket message coalescing to better handle thundering herds Improve websocket message coalescing to handle thundering herds better May 27, 2024
@bdraco
Member Author

bdraco commented May 28, 2024

send_message

That's a lot better. I guess we need a fast path for bytes.

@bdraco
Member Author

bdraco commented May 28, 2024

I tried a 5 ms timer, but that increased the latency too much. call_soon seems to be the best way, since it balances the desire for low latency with not overloading the system.
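For readers unfamiliar with the trade-off, here is a rough sketch of how the two scheduling options differ in latency. The helpers `measure`, `via_call_soon`, and `via_timer` are made up for this illustration; only `loop.call_soon` and `loop.call_later` are real asyncio APIs, and the measured numbers will vary by machine.

```python
import asyncio
import time


async def measure(schedule):
    """Return the seconds elapsed between scheduling a callback and it running."""
    loop = asyncio.get_running_loop()
    done = loop.create_future()
    start = time.perf_counter()
    schedule(loop, lambda: done.set_result(time.perf_counter() - start))
    return await done


def via_call_soon(loop, callback):
    # Runs on the next iteration of the event loop.
    loop.call_soon(callback)


def via_timer(loop, callback):
    # Runs no earlier than roughly 5 ms from now.
    loop.call_later(0.005, callback)
```

Comparing `asyncio.run(measure(via_call_soon))` with `asyncio.run(measure(via_timer))` shows the timer adding a fixed wait to every release, while call_soon only defers to the next loop iteration.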

@bdraco
Member Author

bdraco commented May 28, 2024

Screenshot 2024-05-27 at 3 58 07 PM

The number of writes before

@bdraco
Member Author

bdraco commented May 28, 2024

after

Screenshot 2024-05-27 at 4 18 29 PM Screenshot 2024-05-27 at 4 18 58 PM

@bdraco
Member Author

bdraco commented May 28, 2024

@coderabbitai review

@bdraco bdraco marked this pull request as ready for review May 28, 2024 04:16
@bdraco bdraco requested a review from a team as a code owner May 28, 2024 04:17
@bdraco
Member Author

bdraco commented May 28, 2024

I've had the bootstrap test fail a few times today. I fixed it in #118285

@bdraco bdraco added the smash label (indicates this PR is close to finished for merging or closing) May 28, 2024
@balloob balloob merged commit 79bc179 into dev May 29, 2024
38 checks passed
@balloob balloob deleted the websocket_co branch May 29, 2024 03:14
@bdraco
Member Author

bdraco commented May 29, 2024

thanks

raman325 added a commit to raman325/home-assistant that referenced this pull request May 29, 2024
* dev: (751 commits)
  Use runtime_data in ping (home-assistant#118332)
  Fix last_reported_timestamp not being updated when last_reported is changed (home-assistant#118341)
  Replace pop calls with del where the result is discarded in restore_state (home-assistant#118339)
  Improve websocket message coalescing to handle thundering herds better (home-assistant#118268)
  Add cache to more complex entity filters (home-assistant#118344)
  Reduce the intent response data sent to LLMs (home-assistant#118346)
  Small speed up to connecting dispatchers (home-assistant#118342)
  Tweak Assist LLM API prompt (home-assistant#118343)
  Add Conversation command to timers (home-assistant#118325)
  LLM Assist API to ignore intents if not needed for exposed entities or calling device (home-assistant#118283)
  Replace pop calls with del where the result is discarded in entity (home-assistant#118340)
  Replace pop calls with del where the result is discarded in mqtt (home-assistant#118338)
  Use del instead of pop in the entity platform remove (home-assistant#118337)
  Update the recommended model for Google Gen AI (home-assistant#118323)
  Fix source_change not triggering an update (home-assistant#118312)
  Several fixes for the Matter climate platform (home-assistant#118322)
  Use None default for traccar server battery level sensor (home-assistant#118324)
  [esphome] 100% voice assistant test coverage (home-assistant#118334)
  Mark sonos group update a background task (home-assistant#118333)
  Filter timers more when pausing/unpausing (home-assistant#118331)
  ...
@github-actions github-actions bot locked and limited conversation to collaborators May 30, 2024