Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce frequency of Reaper and avoid copying the cache when possible #297

Merged
merged 1 commit into from
Sep 6, 2020

Conversation

bdraco
Copy link
Member

@bdraco bdraco commented Sep 3, 2020

Avoid copying the entires cache and reduce frequency of Reaper

The cache reaper was running at least every 10 seconds, making
a copy of the cache, and iterated all the entries to
check if they were expired so they could be removed.

In practice the reaper was actually running much more frequently
because it used self.zc.wait which would unblock any time
a record was updated, a listener was added, or when a
listener was removed.

This change ensures the reaper frequency is only every 10s, and
will first attempt to iterate the cache before falling back to
making a copy.

Previously it made sense to expire the cache more frequently
because we had places were we frequently had to enumerate
all the cache entries. With #247 and #232 we no longer
have to account for this concern.

On a mostly idle RPi running HomeAssistant and a busy
network the total time spent reaping the cache was
more than the total time spent processing the mDNS traffic.

Top 10 functions, idle RPi (before)

  %Own   %Total  OwnTime  TotalTime  Function (filename:line)
  0.00%   0.00%    2.69s     2.69s   handle_read (zeroconf/__init__.py:1367)   <== Incoming mDNS
  0.00%   0.00%    1.51s     2.98s   run (zeroconf/__init__.py:1431)           <== Reaper
  0.00%   0.00%    1.42s     1.42s   is_expired (zeroconf/__init__.py:502)     <== Reaper
  0.00%   0.00%    1.12s     1.12s   entries (zeroconf/__init__.py:1274)       <== Reaper
  0.00%   0.00%   0.620s    0.620s   do_execute (sqlalchemy/engine/default.py:593)
  0.00%   0.00%   0.620s    0.620s   read_utf (zeroconf/__init__.py:837)
  0.00%   0.00%   0.610s    0.610s   do_commit (sqlalchemy/engine/default.py:546)
  0.00%   0.00%   0.540s     1.16s   read_name (zeroconf/__init__.py:853)
  0.00%   0.00%   0.380s    0.380s   do_close (sqlalchemy/engine/default.py:549)
  0.00%   0.00%   0.340s    0.340s   write (asyncio/selector_events.py:908)

After this change, the Reaper code paths do not show up in the top
10 function sample.

  %Own   %Total  OwnTime  TotalTime  Function (filename:line)
  4.00%   4.00%    2.72s     2.72s   handle_read (zeroconf/__init__.py:1378)     <== Incoming mDNS
  4.00%   4.00%    1.81s     1.81s   read_utf (zeroconf/__init__.py:837)
  1.00%   5.00%    1.68s     3.51s   read_name (zeroconf/__init__.py:853)
  0.00%   0.00%    1.32s     1.32s   do_execute (sqlalchemy/engine/default.py:593)
  0.00%   0.00%   0.960s    0.960s   readinto (socket.py:669)
  0.00%   0.00%   0.950s    0.950s   create_connection (urllib3/util/connection.py:74)
  0.00%   0.00%   0.910s    0.910s   do_commit (sqlalchemy/engine/default.py:546)
  1.00%   1.00%   0.880s    0.880s   write (asyncio/selector_events.py:908)
  0.00%   0.00%   0.700s    0.810s   __eq__ (zeroconf/__init__.py:606)
  2.00%   2.00%   0.670s    0.670s   unpack (zeroconf/__init__.py:737)

@coveralls
Copy link

coveralls commented Sep 3, 2020

Coverage Status

Coverage increased (+0.1%) to 93.266% when pulling 9043cba on bdraco:reaper_busy_networks into 9e27d12 on jstasiak:master.

zeroconf/__init__.py Outdated Show resolved Hide resolved
@bdraco bdraco force-pushed the reaper_busy_networks branch 2 times, most recently from 29756c9 to 2c87110 Compare September 3, 2020 16:33
@bdraco bdraco changed the title Avoid copying the entires cache and reduce frequency of Reaper Reduce frequency of Reaper Sep 3, 2020
@bdraco
Copy link
Member Author

bdraco commented Sep 3, 2020

Decreasing the reaper didn't help that much on its own. It is the cache copy that kills the performance. Since its a rare exception, it might make sense to trap it and only do the copy if we hit the exception.

@bdraco bdraco changed the title Reduce frequency of Reaper Reduce frequency of Reaper and avoid copying the cache when possible Sep 3, 2020
@bdraco bdraco force-pushed the reaper_busy_networks branch 9 times, most recently from 2da92cb to 31d30df Compare September 3, 2020 19:19
@bdraco
Copy link
Member Author

bdraco commented Sep 3, 2020

At this point I'm happy with this as it equates to a significant decrease in cpu usage. I'm going to update everything I have in production with it and run it for 24 hours to make sure it doesn't have any unexpected side effects.

@jstasiak
Copy link
Collaborator

jstasiak commented Sep 4, 2020

For completeness sake:

  • can you provide the top 10 functions after the change?
  • reducing reaper frequency seems orthogonal to the core of this change, is it still needed even with your optimizations in place?

LGTM otherwise.

@bdraco
Copy link
Member Author

bdraco commented Sep 4, 2020

For completeness sake:

  • can you provide the top 10 functions after the change?
  • reducing reaper frequency seems orthogonal to the core of this change, is it still needed even with your optimizations in place?

LGTM otherwise.

I'll try setting it back to 10s and re-run all the profiles. If it still looks good, I'll adjust here and add the top 10 after

The cache reaper was running at least every 10 seconds, making
a copy of the cache, and iterated all the entries to
check if they were expired so they could be removed.

In practice the reaper was actually running much more frequently
because it used self.zc.wait which would unblock any time
a record was updated, a listener was added, or when a
listener was removed.

This change ensures the reaper frequency is only every 10s, and
will first attempt to iterate the cache before falling back to
making a copy.

Previously it made sense to expire the cache more frequently
because we had places were we frequently had to enumerate
all the cache entries. With python-zeroconf#247 and python-zeroconf#232 we no longer
have to account for this concern.

On a mostly idle RPi running HomeAssistant and a busy
network the total time spent reaping the cache was
more than the total time spent processing the mDNS traffic.

Top 10 functions, idle RPi (before)

  %Own   %Total  OwnTime  TotalTime  Function (filename:line)
  0.00%   0.00%    2.69s     2.69s   handle_read (zeroconf/__init__.py:1367)   <== Incoming mDNS
  0.00%   0.00%    1.51s     2.98s   run (zeroconf/__init__.py:1431)           <== Reaper
  0.00%   0.00%    1.42s     1.42s   is_expired (zeroconf/__init__.py:502)     <== Reaper
  0.00%   0.00%    1.12s     1.12s   entries (zeroconf/__init__.py:1274)       <== Reaper
  0.00%   0.00%   0.620s    0.620s   do_execute (sqlalchemy/engine/default.py:593)
  0.00%   0.00%   0.620s    0.620s   read_utf (zeroconf/__init__.py:837)
  0.00%   0.00%   0.610s    0.610s   do_commit (sqlalchemy/engine/default.py:546)
  0.00%   0.00%   0.540s     1.16s   read_name (zeroconf/__init__.py:853)
  0.00%   0.00%   0.380s    0.380s   do_close (sqlalchemy/engine/default.py:549)
  0.00%   0.00%   0.340s    0.340s   write (asyncio/selector_events.py:908)

After this change, the Reaper code paths do not show up in the top
10 function sample.

  %Own   %Total  OwnTime  TotalTime  Function (filename:line)
  4.00%   4.00%    2.72s     2.72s   handle_read (zeroconf/__init__.py:1378)     <== Incoming mDNS
  4.00%   4.00%    1.81s     1.81s   read_utf (zeroconf/__init__.py:837)
  1.00%   5.00%    1.68s     3.51s   read_name (zeroconf/__init__.py:853)
  0.00%   0.00%    1.32s     1.32s   do_execute (sqlalchemy/engine/default.py:593)
  0.00%   0.00%   0.960s    0.960s   readinto (socket.py:669)
  0.00%   0.00%   0.950s    0.950s   create_connection (urllib3/util/connection.py:74)
  0.00%   0.00%   0.910s    0.910s   do_commit (sqlalchemy/engine/default.py:546)
  1.00%   1.00%   0.880s    0.880s   write (asyncio/selector_events.py:908)
  0.00%   0.00%   0.700s    0.810s   __eq__ (zeroconf/__init__.py:606)
  2.00%   2.00%   0.670s    0.670s   unpack (zeroconf/__init__.py:737)
@bdraco bdraco marked this pull request as ready for review September 4, 2020 16:16
@bdraco
Copy link
Member Author

bdraco commented Sep 4, 2020

The reaper continues to be absent from the profile with reverting to a 10s interval 👍 The cpu graph on one of my servers with 3 Home Assistant instances is down ~6% and remains so with the switch.

@jstasiak jstasiak merged commit 1e4aaea into python-zeroconf:master Sep 6, 2020
@jstasiak
Copy link
Collaborator

jstasiak commented Sep 6, 2020

Nice!

@bdraco
Copy link
Member Author

bdraco commented Sep 6, 2020

Thanks again @jstasiak

Do you have a moment to put out a release today? I want to get this in today to go out in Home Assistant 0.115 beta tomorrow as I'm still working with a few users to track performance issues. Any unrelated data we can get out of the profile will go along way to getting better reports.

@jstasiak
Copy link
Collaborator

jstasiak commented Sep 6, 2020

Sure thing, 0.28.4 is out with your change. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants