Use inotify/equivalent for detecting journal modifications (#17). #19

witten · 2019-06-06T00:07:14Z

Background

See #17. The short version is that this change reduces Beancount-import's CPU usage when idle.

Changes

Take a new dependency on Watchdog, a Python library that wraps inotify filesystem change notifications on Linux, and comparable APIs on other OSes.
Create a Watchdog observer and handler that watches for any filesystem event in the directory containing Beancount-import's journal input file. When any such event is received, poke the existing Beancount-import journal modification check to do its thing.
Rip out the periodic callback now that we're using Watchdog/inotify to find out about filesystem changes.
Disable Tornado's debug mode unless the existing --debug parameter is passed to Beancount-import. More about that below!
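
For illustration, the observer/handler item above could be sketched roughly as follows. `Observer`, `FileSystemEventHandler`, `on_any_event`, and `schedule` are Watchdog's own names; `journal_directory`, `start_observer`, and the `on_change` callback are hypothetical stand-ins and are not taken from this PR's code.

```python
import os


def journal_directory(journal_path):
    """Return the directory to watch for a given journal file (hypothetical helper)."""
    return os.path.dirname(os.path.abspath(journal_path))


def start_observer(journal_path, on_change):
    """Watch the journal's directory and call on_change() on any filesystem event.

    on_change stands in for Beancount-import's existing modification check.
    """
    # Imported lazily so journal_directory() stays usable without Watchdog installed.
    from watchdog.events import FileSystemEventHandler
    from watchdog.observers import Observer

    class AnyEventHandler(FileSystemEventHandler):
        def on_any_event(self, event):
            # No filtering by path or event type: the callback is assumed to be cheap.
            on_change()

    observer = Observer()
    observer.schedule(AnyEventHandler(), journal_directory(journal_path), recursive=False)
    observer.start()  # Runs in a background thread; call stop() and join() to shut down.
    return observer
```

Watchdog invokes handlers on its own observer thread, so in a Tornado application the callback would probably need to hand off to the IOLoop, for example via `IOLoop.add_callback`, which Tornado documents as safe to call from other threads.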

Performance impact

As a baseline, I tested without these changes. After initialization, an idle Beancount-import took 3-4% CPU on a particular machine.
After making all of the changes above, CPU dropped down to near-0%.
In trying to isolate which changes had the most impact, I tried enabling only the journal change detection improvements, while leaving Tornado debug mode on. CPU consumption was around 3%!
Then I tried reverting the journal change detection improvements, and just turning off Tornado debug mode. CPU consumption was around 0.7%!

Take-away: Most of the CPU consumption was actually Tornado's debug mode. Why? Well, enabling debug mode also enables autoreload, which polls all source files for changes repeatedly so as to better support the development use case of autoreloading the web server. (I confirmed it was indeed autoreload by looking at Python profiling output.)
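
As a minimal sketch of the debug-mode gating, assuming the server is built on `tornado.web.Application` and that `--debug` is a plain boolean flag: `parse_args`, `build_settings`, and `make_app` are hypothetical names used only for this example.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Off by default, so a normal run never starts Tornado's autoreload polling.
    parser.add_argument('--debug', action='store_true')
    return parser.parse_args(argv)


def build_settings(debug):
    """Keyword settings for tornado.web.Application (hypothetical helper)."""
    return {'debug': debug}


def make_app(handlers, debug):
    # Imported lazily so the helpers above work without Tornado installed.
    import tornado.web
    return tornado.web.Application(handlers, **build_settings(debug))
```

Tornado treats `debug=True` as shorthand for several settings, `autoreload=True` among them. If the other debug features were wanted without the polling, passing `debug=True, autoreload=False` should also avoid it.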

Even though the journal change detection improvements don't have a huge impact compared to disabling the debug mode, I personally still think both are worth it. Happy to hear other opinions though!

Testing

Touched a journal file and ensured that it triggered a refresh. Observed this in debug output.
Modified a journal file with vim and ensured that it triggered a refresh. Observed this both in debug output and in the front-end. I wanted to make sure the Watchdog approach worked with vim, because vim does some tricky shit with file replacement instead of editing the file in place.
Touched an unrelated file in the same directory and ensured that it triggered a refresh. Observed this in debug output.
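
To illustrate why the vim case is worth testing, here is a small sketch that assumes an editor which saves by writing a new file and renaming it over the original (whether vim does this depends on its configuration). `save_by_rename` is a hypothetical helper, not part of Beancount-import.

```python
import os
import tempfile


def save_by_rename(path, text):
    """Save the way some editors do: write a sibling temp file, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, 'w') as f:
        f.write(text)
    os.replace(tmp_path, path)  # After this, path refers to the new file's inode.


with tempfile.TemporaryDirectory() as d:
    journal = os.path.join(d, 'journal.beancount')
    with open(journal, 'w') as f:
        f.write('; original\n')
    inode_before = os.stat(journal).st_ino
    save_by_rename(journal, '; edited\n')
    inode_after = os.stat(journal).st_ino
    inode_changed = inode_before != inode_after
```

Because the path ends up pointing at a different inode, a watch attached to the original file could be left watching a file that no longer has that name. A watch on the containing directory sees the create and rename events either way.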

Caveats

Since the Watchdog-based change detection only operates on the directory containing the journal input file, changes to files in unrelated directories (e.g. via an include) won't trigger a refresh. Let me know if you think that's a problem.
The change detection triggers when anything changes in the monitored directory. Random file deletes/adds, directory changes, etc. That doesn't seem to be a problem, because all it does is poke the existing Beancount-import code to stat journal files. So if it's a little trigger happy, that's probably fine.
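
A sketch of why a trigger-happy watcher is harmless, assuming the existing check boils down to comparing each journal file's modification time with the last one seen. `JournalStatChecker` is a hypothetical stand-in for that check, not the actual Beancount-import code.

```python
import os
import tempfile


class JournalStatChecker:
    """Hypothetical stand-in: compare each file's mtime to the value last seen."""

    def __init__(self, paths):
        self._mtimes = {path: self._mtime(path) for path in paths}

    @staticmethod
    def _mtime(path):
        try:
            return os.stat(path).st_mtime_ns
        except FileNotFoundError:
            return None  # A missing file is just another state to compare against.

    def changed_paths(self):
        """Return paths whose mtime differs from the previous call and remember the new values."""
        changed = set()
        for path, old in self._mtimes.items():
            new = self._mtime(path)
            if new != old:
                self._mtimes[path] = new
                changed.add(path)
        return changed


with tempfile.TemporaryDirectory() as d:
    journal = os.path.join(d, 'journal.beancount')
    open(journal, 'w').close()
    checker = JournalStatChecker([journal])
    spurious = checker.changed_paths()  # Poked by an unrelated event: nothing to report.
    st = os.stat(journal)
    os.utime(journal, ns=(st.st_atime_ns, st.st_mtime_ns + 1_000_000_000))
    real = checker.changed_paths() == {journal}  # A genuine change is reported once.
    repeat = checker.changed_paths()
```

Under that assumption, each spurious event costs one `stat` call per journal file and changes no state.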

coveralls · 2019-06-06T00:20:53Z

Pull Request Test Coverage Report for Build 80

62 of 75 (82.67%) changed or added relevant lines in 2 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage increased (+2.05%) to 65.738%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
beancount_import/webserver.py	23	25	92.0%
beancount_import/webserver_test.py	39	50	78.0%

Totals
Change from base Build 72:	2.05%
Covered Lines:	4333
Relevant Lines:	6394

💛 - Coveralls

coveralls · 2019-06-06T00:20:54Z

Pull Request Test Coverage Report for Build 78

0 of 12 (0.0%) changed or added relevant lines in 1 file are covered.
2 unchanged lines in 1 file lost coverage.
Overall coverage decreased (-0.06%) to 63.628%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
beancount_import/webserver.py	0	12	0.0%

Files with Coverage Reduction	New Missed Lines	%
beancount_import/webserver.py	2	0.0%

Totals
Change from base Build 72:	-0.06%
Covered Lines:	4111
Relevant Lines:	6334

💛 - Coveralls

jbms · 2019-06-06T02:49:42Z

This is great, thanks!

Potentially it would be nice to make it handle the case of journal files not all being in the same directory by adding all directories containing journal files (and then after the journal reloads, updating the list of directories again).

If you feel like writing a unit test for the reload detection, that would also be particularly awesome, though not necessary considering that a lot of existing functionality also lacks test coverage.


witten · 2019-06-06T05:47:01Z

Good call. I was able to change up the file monitoring to reflect all journal file paths, even via includes, and even after updates as you suggest.

Manual testing:

Added a debugging print of the current journal paths right before registering monitoring for each path.
Started Beancount-import with all journal files in the same directory: /data. Observed via debugging print that it was monitoring just /data.
Added an empty /data/tmp/test.beancount file. Observed no changes.
Added an include in /data/journal.beancount to pull in /data/tmp/test.beancount. Observed via debugging print that now /data and /data/tmp were being monitored.
Added a bogus balance entry to /data/tmp/test.beancount. Observed a refresh and saw a corresponding balance error in the front-end.
Removed the bogus balance entry. Observed a refresh and saw the error go away.
Removed the include. Observed via debugging print that now just /data was being monitored again.
Removed /data/tmp/test.beancount. Observed no change.
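
The behaviour exercised by these steps can be sketched as keeping a set of watched directories in sync with the current journal file paths. This is only an illustration under assumed names: `WatchedDirectories` and its `update` method are hypothetical, and the `schedule`/`unschedule` callables are injected so the example runs without a real observer.

```python
import os


class WatchedDirectories:
    """Keep the watched directories equal to the directories holding journal files.

    schedule(directory) must return a handle; unschedule(handle) must cancel it.
    With Watchdog these could be
        lambda d: observer.schedule(handler, d, recursive=False)
    and observer.unschedule.
    """

    def __init__(self, schedule, unschedule):
        self._schedule = schedule
        self._unschedule = unschedule
        self._watches = {}  # directory -> handle returned by schedule()

    def update(self, journal_paths):
        """Add and drop watches so they match journal_paths; return the watched directories."""
        wanted = {os.path.dirname(os.path.abspath(p)) for p in journal_paths}
        for directory in set(self._watches) - wanted:
            self._unschedule(self._watches.pop(directory))
        for directory in wanted - set(self._watches):
            self._watches[directory] = self._schedule(directory)
        return sorted(self._watches)


# Replaying the manual steps with stand-in callables instead of a real observer.
watched = WatchedDirectories(schedule=lambda d: d, unschedule=lambda handle: None)
step1 = watched.update(['/data/journal.beancount'])
step2 = watched.update(['/data/journal.beancount', '/data/tmp/test.beancount'])
step3 = watched.update(['/data/journal.beancount'])
```

Calling `update` after every journal reload would pick up directories added or dropped through includes, which is the pattern the debugging prints above were checking.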

I'll look into adding some automated testing to this PR when I get a chance!

witten · 2019-06-06T22:41:24Z

Well, it's not pretty, but there's now an automated test similar to the above manual steps for the journal reload detection. Ready for re-review!

jbms · 2019-06-10T05:03:20Z

Thanks!

Use inotify/equivalent for detecting journal modifications (#17).

d1feaf7

PR feedback: Monitor all journal file paths for changes (e.g. as pull…

2032e30

…ed in via includes).

PR feedback: Add automated tests for inotify-based reload detection.

b52a04d

jbms merged commit a9fd9c5 into jbms:master Jun 10, 2019
