Use inotify/equivalent for detecting journal modifications (#17). #19

Merged
merged 3 commits into from
Jun 10, 2019

witten
Contributor

@witten witten commented Jun 6, 2019

Background

See #17. The short version is that this change reduces Beancount-import's CPU usage when idle.

Changes

  • Take a new dependency on Watchdog, a Python library that wraps inotify filesystem change notifications on Linux, and comparable APIs on other OSes.
  • Create a Watchdog observer and handler that watch for any filesystem event in the directory containing Beancount-import's journal input file. When an event arrives, the handler pokes the existing Beancount-import journal modification check to do its thing.
  • Rip out the periodic callback now that we're using Watchdog/inotify to find out about filesystem changes.
  • Disable Tornado's debug mode unless the existing --debug parameter is passed to Beancount-import. More about that below!
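For illustration, the observer/handler idea can be sketched roughly as follows. This is a minimal sketch, not the code in this PR: `watch_journal_directory` and the `on_change` callback are hypothetical names, with `on_change` standing in for the existing modification check.

```python
import os


def watch_journal_directory(journal_path, on_change):
    """Start a Watchdog observer on the directory holding journal_path.

    on_change is a hypothetical zero-argument callback standing in for
    Beancount-import's existing journal modification check.
    """
    # Imported here (an assumption of this sketch) so the module can be
    # imported even where Watchdog is not installed.
    from watchdog.events import FileSystemEventHandler
    from watchdog.observers import Observer

    class AnyEventHandler(FileSystemEventHandler):
        def on_any_event(self, event):
            # Ignore the event details on purpose: any activity in the
            # directory is only a hint to re-check the journal files.
            on_change()

    directory = os.path.dirname(os.path.abspath(journal_path))
    observer = Observer()
    # recursive=False: only the journal's own directory is watched.
    observer.schedule(AnyEventHandler(), directory, recursive=False)
    observer.start()
    return observer
```

Watchdog invokes the handler on its own thread, so in a Tornado application the callback would most likely need to hand off to the IOLoop (for example via `IOLoop.add_callback`, which is safe to call from other threads) rather than touch application state directly. The caller is responsible for calling `observer.stop()` and `observer.join()` on shutdown.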

Performance impact

  • As a baseline, I tested without these changes. After initialization, an idle Beancount-import took 3-4% CPU on a particular machine.
  • After making all of the changes above, CPU dropped down to near-0%.
  • In trying to isolate which changes had the most impact, I tried enabling only the journal change detection improvements, while leaving Tornado debug mode on. CPU consumption was around 3%!
  • Then I tried reverting the journal change detection improvements, and just turning off Tornado debug mode. CPU consumption was around 0.7%!

Take-away: Most of the CPU consumption was actually Tornado's debug mode. Why? Well, enabling debug mode also enables autoreload, which polls all source files for changes repeatedly so as to better support the development use case of autoreloading the web server. (I confirmed it was indeed autoreload by looking at Python profiling output.)

Even though the journal change detection improvements don't have a huge impact compared to disabling the debug mode, I personally still think both are worth it. Happy to hear other opinions though!
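To make the debug-mode point concrete, here is a minimal sketch of tying Tornado's `debug` setting to a `--debug` flag. The helper names `parse_args` and `tornado_settings` are hypothetical and not taken from Beancount-import's code.

```python
import argparse


def parse_args(argv=None):
    """Parse a hypothetical command line that only knows about --debug."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--debug",
        action="store_true",
        help="Enable debug output and Tornado's debug mode.",
    )
    return parser.parse_args(argv)


def tornado_settings(args):
    """Build the settings to pass to tornado.web.Application.

    In Tornado, debug=True also switches on autoreload, which is the
    source-file polling described above, so it stays off unless
    --debug was requested.
    """
    return {"debug": args.debug}
```

The resulting dict would be passed as keyword arguments, e.g. `tornado.web.Application(handlers, **tornado_settings(args))`. If someone wanted the other debug conveniences without the polling, Tornado also accepts `autoreload=False` alongside `debug=True`, though I haven't measured that combination here.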

Testing

  • Touched a journal file and ensured that it triggered a refresh. Observed this in debug output.
  • Modified a journal file with vim and ensured that it triggered a refresh. Observed this both in debug output and in the front-end. I wanted to make sure the Watchdog approach worked with vim, because vim does some tricky shit with file replacement instead of editing the file in place.
  • Touched an unrelated file in the same directory and ensured that it triggered a refresh. Observed this in debug output.

Caveats

  • Since the Watchdog-based change detection only operates on the directory containing the journal input file, changes to journal files in other directories (e.g. pulled in via an include) won't trigger a refresh. Let me know if you think that's a problem.
  • The change detection triggers when anything changes in the monitored directory. Random file deletes/adds, directory changes, etc. That doesn't seem to be a problem, because all it does is poke the existing Beancount-import code to stat journal files. So if it's a little trigger happy, that's probably fine.
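To illustrate why a trigger-happy watcher is harmless, here is a minimal sketch of a stat-based check. It is an assumption about the general shape of such a check, not Beancount-import's actual implementation, and the class and method names are hypothetical.

```python
import os


class JournalModificationChecker:
    """Remember each journal file's (mtime, size) and report changes."""

    def __init__(self, paths):
        self.paths = list(paths)
        self.signatures = self._stat_all()

    def _stat_all(self):
        signatures = {}
        for path in self.paths:
            try:
                st = os.stat(path)
                signatures[path] = (st.st_mtime_ns, st.st_size)
            except FileNotFoundError:
                # A missing file is recorded too, so that deletion and
                # re-creation both count as changes.
                signatures[path] = None
        return signatures

    def check_modification(self):
        """Return True only if a tracked journal file actually changed."""
        current = self._stat_all()
        changed = current != self.signatures
        self.signatures = current
        return changed
```

Calling `check_modification()` after an event for an unrelated file just costs a few `stat` calls and returns `False`, so spurious events never cause a reload on their own.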

@coveralls

coveralls commented Jun 6, 2019

Pull Request Test Coverage Report for Build 80

  • 62 of 75 (82.67%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+2.05%) to 65.738%

Changes Missing Coverage             Covered Lines   Changed/Added Lines   %
beancount_import/webserver.py        23              25                    92.0%
beancount_import/webserver_test.py   39              50                    78.0%

Totals
Change from base Build 72: 2.05%
Covered Lines: 4333
Relevant Lines: 6394

💛 - Coveralls

@coveralls

Pull Request Test Coverage Report for Build 78

  • 0 of 12 (0.0%) changed or added relevant lines in 1 file are covered.
  • 2 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-0.06%) to 63.628%

Changes Missing Coverage          Covered Lines   Changed/Added Lines   %
beancount_import/webserver.py     0               12                    0.0%

Files with Coverage Reduction     New Missed Lines   %
beancount_import/webserver.py     2                  0.0%

Totals
Change from base Build 72: -0.06%
Covered Lines: 4111
Relevant Lines: 6334

💛 - Coveralls

@jbms
Owner

jbms commented Jun 6, 2019

This is great, thanks!

Potentially it would be nice to make it handle the case of journal files not all being in the same directory by adding all directories containing journal files (and then after the journal reloads, updating the list of directories again).

If you feel like writing a unit test for the reload detection, that would also be particularly awesome, though not necessary considering that a lot of existing functionality also lacks test coverage.

@witten
Contributor Author

witten commented Jun 6, 2019

Good call. I was able to change up the file monitoring to reflect all journal file paths, even via includes, and even after updates as you suggest.

Manual testing:

  • Added a debugging print of the current journal paths right before registering monitoring for each path.
  • Started Beancount-import with all journal files in the same directory: /data. Observed via debugging print that it was monitoring just /data.
  • Added an empty /data/tmp/test.beancount file. Observed no changes.
  • Added an include in /data/journal.beancount to pull in /data/tmp/test.beancount. Observed via debugging print that now /data and /data/tmp were being monitored.
  • Added a bogus balance entry to /data/tmp/test.beancount. Observed a refresh and saw a corresponding balance error in the front-end.
  • Removed the bogus balance entry. Observed a refresh and saw the error go away.
  • Removed the include. Observed via debugging print that now just /data was being monitored again.
  • Removed /data/tmp/test.beancount. Observed no change.
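The add/remove behavior exercised by the steps above boils down to a set difference over directories. A minimal sketch of that bookkeeping, with hypothetical helper names that are not the ones used in this PR:

```python
import os


def directories_for(journal_paths):
    """Return the set of directories that contain the given journal files."""
    return {os.path.dirname(os.path.abspath(path)) for path in journal_paths}


def plan_watch_update(watched_dirs, journal_paths):
    """Compare the watched directories with what the journal now needs.

    Returns (to_add, to_remove): directories to start watching and
    directories that no longer contain any journal file.
    """
    wanted = directories_for(journal_paths)
    watched = set(watched_dirs)
    return wanted - watched, watched - wanted
```

With Watchdog, each directory in `to_add` would get an `observer.schedule(...)` call, whose return value is a watch object that can later be passed to `observer.unschedule(...)` for directories in `to_remove`. Re-running this after every journal reload keeps the watched set in line with the current includes.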

I'll look into adding some automated testing to this PR when I get a chance!

@witten
Contributor Author

witten commented Jun 6, 2019

Well, it's not pretty, but there's now an automated test similar to the above manual steps for the journal reload detection. Ready for re-review!

@jbms
Owner

jbms commented Jun 10, 2019

Thanks!

@jbms jbms merged commit a9fd9c5 into jbms:master Jun 10, 2019