
Refactor dispatcher to reduce run time and memory overhead #99676

Merged
merged 3 commits from dispatcher_cleanup into dev
Sep 6, 2023

Conversation

bdraco
Member

@bdraco bdraco commented Sep 5, 2023

Proposed change

  • When we removed the last job/callable from the dict for a signal, we did not remove the dict for the signal itself, which leaked memory.
  • Since we tend to create 10,000+ of these, avoid closures as well to reduce memory usage.
  • Avoid calling .append() over and over when sending; make a single list() call instead, since the Python overhead of repeated .append() calls is much more expensive than one list() call.
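The first two bullets can be sketched together. This is a minimal illustration, not the actual Home Assistant code: `connect` and the flat string "jobs" stand in for the real dispatcher machinery, while `_async_remove_dispatcher` and `dispatchers` mirror names that appear in the diff below.

```python
from functools import partial


def _async_remove_dispatcher(dispatchers, signal, target):
    """Remove a single target, and drop the signal's dict once it is empty."""
    signal_dispatchers = dispatchers.get(signal)
    if signal_dispatchers is None:
        return
    signal_dispatchers.pop(target, None)
    # The leak fix: when the last job/callable is removed, delete the
    # dict for the signal itself instead of leaving an empty dict behind.
    if not signal_dispatchers:
        del dispatchers[signal]


def connect(dispatchers, signal, target, job):
    """Register a job for a signal; return an unsubscribe callable."""
    dispatchers.setdefault(signal, {})[target] = job
    # A partial is cheaper than a closure here: every unsubscribe shares
    # the single _async_remove_dispatcher function object rather than
    # carrying its own closure cells.
    return partial(_async_remove_dispatcher, dispatchers, signal, target)


dispatchers: dict = {}
unsub = connect(dispatchers, "my_signal", "my_target", "my_job")
unsub()
assert "my_signal" not in dispatchers  # the signal's dict is gone, no leak
```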

Type of change

  • Dependency upgrade
  • Bugfix (non-breaking change which fixes an issue)
  • New integration (thank you!)
  • New feature (which adds functionality to an existing integration)
  • Deprecation (breaking change to happen in the future)
  • Breaking change (fix/feature causing existing functionality to break)
  • Code quality improvements to existing code or addition of tests

Additional information

  • This PR fixes or closes issue: fixes #
  • This PR is related to issue:
  • Link to documentation pull request:

Checklist

  • The code change is tested and works locally.
  • Local tests pass. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • I have followed the development checklist
  • I have followed the perfect PR recommendations
  • The code has been formatted using Black (black --fast homeassistant tests)
  • Tests have been added to verify that the new code works.

If user exposed functionality or configuration variables are added/changed:

If the code communicates with devices, web services, or third-party tools:

  • The manifest file has all fields filled out correctly.
    Updated and included derived files by running: python3 -m script.hassfest.
  • New or updated dependencies have been added to requirements_all.txt.
    Updated by running python3 -m script.gen_requirements_all.
  • For the updated dependencies - a link to the changelog, or at minimum a diff between library versions is added to the PR description.
  • Untested files have been added to .coveragerc.


When we removed the last job/callable from the dict for the signal we did not remove the dict for the signal, which meant it leaked memory
# less memory than a full closure since a partial copies
# the body of the function and we don't have to store
# many different copies of the same function
return partial(_async_remove_dispatcher, dispatchers, signal, target)
Member Author

@bdraco bdraco Sep 5, 2023


partial creates a new function wrapper with different arguments, but the underlying function body reference stays the same (https://github.com/python/cpython/blob/cf19e8ea3a232086ff1e0d7d5e2a092d3d96fc7c/Modules/_functoolsmodule.c#L84), so we don't end up with a new function body per dispatcher connect in memory.

Memory used per closure: 408.7808
Memory used per partial: 250.01984
from functools import partial

import psutil

process = psutil.Process()
before = process.memory_info().rss
closures = []


def gen_closure(i, y, q):
    def this_is_a_closure():
        z = i + 1
        print(z)
        z = y + 1
        print(z)
        z += q
        new_dict = {}
        new_dict["key"] = "value"
        return i

    return this_is_a_closure


for i in range(100000):
    closures.append(gen_closure(i, 6, 7))

after = process.memory_info().rss

print(f"Memory used per closure: {(after - before) / len(closures):,}")


before = process.memory_info().rss


def use_for_partial(i, y, q):
    z = i + 1
    print(z)
    z = y + 1
    print(z)
    z += q
    new_dict = {}
    new_dict["key"] = "value"
    return i


partials = []
for i in range(100000):
    partials.append(partial(use_for_partial, i, 6, 7))

after = process.memory_info().rss
print(f"Memory used per partial: {(after - before) / len(partials):,}")

@bdraco bdraco marked this pull request as ready for review September 5, 2023 16:19
@bdraco bdraco requested a review from a team as a code owner September 5, 2023 16:19

-run: list[HassJob[..., None | Coroutine[Any, Any, None]]] = []
-for target, job in target_list.items():
+for target, job in list(target_list.items()):
Member Author


Better to just make a copy of the items instead of appending them to a list one at a time, since the Python function-call overhead of invoking .append() over and over is much worse.
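A small sketch of the point above. The dict contents are illustrative placeholders (the real target_list maps dispatcher targets to HassJob objects), but the before/after pattern matches the diff.

```python
# Illustrative stand-in for the dispatcher's per-signal target dict.
target_list = {"target_a": "job_a", "target_b": "job_b"}

# Before: one Python-level .append() call per entry, each paying
# attribute-lookup and function-call overhead.
run = []
for target, job in target_list.items():
    run.append((target, job))

# After: a single list() call copies all items in one C-level operation.
# The snapshot also means callbacks can safely disconnect (mutating the
# dict) while dispatch iterates.
snapshot = list(target_list.items())

assert run == snapshot
```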

@bdraco bdraco changed the title Fix memory leak in dispatcher removal Refactor dispatcher to reduce overhead Sep 5, 2023
@bdraco bdraco changed the title Refactor dispatcher to reduce overhead Refactor dispatcher to reduce run time and memory overhead Sep 5, 2023
@balloob balloob merged commit a2dae60 into dev Sep 6, 2023
34 checks passed
@balloob balloob deleted the dispatcher_cleanup branch September 6, 2023 01:18
@bdraco
Member Author

bdraco commented Sep 6, 2023

thanks

@github-actions github-actions bot locked and limited conversation to collaborators Sep 7, 2023