irmin-pack: remove append-only external flush callback #2235

art-w · 2023-04-07T16:17:43Z

I was trying to simplify the logic that updates the control file in the file manager. The Append_only_file used an external callback to signal that it needed a flush when its buffer was full, which in turn updated the control file (to make that portion of the file visible)... but then we could end up with a visible, partially written commit. That isn't an issue in itself, but it isn't how we think about the control file payload. As the flush threshold was fairly high, I don't think this "partial flush" was really stress tested either (?)

Anyway, it's gone: it should be fine to just write to the append-only files when their memory buffer is full, since we only need to update the control file at the end to publish the writes.
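To make the idea concrete, here is a minimal in-memory sketch of the behaviour described above. It is not the irmin-pack implementation: the `Ao` module, the `control` record, `visible_end`, `publish` and the tiny threshold are all hypothetical names chosen for illustration. Only the principle comes from the description: the buffer flushes itself to the file when full, without any external callback, and the control file is updated once at the end to publish the writes.

```ocaml
(* Hypothetical in-memory model; none of these names are irmin-pack's API. *)
module Ao = struct
  type t = {
    file : Buffer.t;  (* stands in for the on-disk append-only file *)
    buf : Buffer.t;   (* in-memory write buffer *)
    threshold : int;  (* auto-flush once the buffer reaches this size *)
  }

  let create ~threshold =
    { file = Buffer.create 64; buf = Buffer.create 64; threshold }

  (* Move buffered bytes to the file. The control file is not involved. *)
  let flush t =
    Buffer.add_buffer t.file t.buf;
    Buffer.clear t.buf

  (* Append a string; flush on our own when the buffer is full,
     with no external callback. *)
  let append t s =
    Buffer.add_string t.buf s;
    if Buffer.length t.buf >= t.threshold then flush t

  let persisted_length t = Buffer.length t.file
end

(* Stand-in for the control file: readers only trust data below
   [visible_end]. *)
type control = { mutable visible_end : int }

(* Publish all writes at once: flush the data first, then update the
   control. *)
let publish ao control =
  Ao.flush ao;
  control.visible_end <- Ao.persisted_length ao

let () =
  let ao = Ao.create ~threshold:8 in
  let control = { visible_end = 0 } in
  Ao.append ao "commit-part-1";  (* 13 bytes >= 8: auto-flushed to the file *)
  Ao.append ao "abc";            (* 3 bytes: stays in the buffer *)
  (* Bytes reached the file, but nothing is visible to readers yet. *)
  assert (Ao.persisted_length ao = 13);
  assert (control.visible_end = 0);
  publish ao control;
  assert (control.visible_end = 16)
```

In this model a partially written commit can sit in the file after an auto-flush, but it never becomes visible until `publish` runs, which is the property the change relies on.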

While testing, I discovered that I had broken the Dict flushing its end_poff to the control file, so I refactored that a bit to match the chunks/sparse file abstractions (inverting the File_manager dependency). A few breaking changes here:

- Previously the dict updates would become visible "in the middle" of the store flushing; now they become visible at the same time as the rest (... this should be fine since dict updates are only useful to interpret that rest)
- Removed the dict consumers callback, as it looked unused (?)

metanivek

This drastically simplifies/clarifies the code. Thanks!

Depending on the benchmark results, we may need to come up with some ideas for the flush threshold.

CHANGES.md

src/irmin-pack/unix/file_manager.ml

metanivek · 2023-07-19T15:49:33Z

src/irmin-pack/unix/file_manager.ml

+    let* () = flush_suffix t in
+    let+ () = flush_control t in
+    List.iter (fun { after_flush } -> after_flush ()) t.suffix_consumers


The previous code skipped writing the control file and running after_flush if the suffix was empty. I like the straightforward nature of your refactor, so I don't think we should necessarily maintain previous behavior (and I don't think it matters in practice). I mostly wanted to point it out in case you have thoughts about it.

Thanks :) I hope the control file rewrite is avoided by line 149 above! (if new_pl = pl then Ok ())
I believe the only suffix consumer is used to clear the staging table in Pack_store, so it should have little impact... and I don't think this staging reset should be set up like that anyway, it's a bit obfuscated :P
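The exchange above can be sketched as follows. This is a simplified model under stated assumptions, not the real File_manager code: the `payload` fields, `pending_*` fields and the `control_writes` counter are hypothetical, and the suffix and dict flushes are reduced to integers. It only illustrates the two points discussed: the control file rewrite is skipped when the payload is unchanged (the `if new_pl = pl then Ok ()` check), while the `after_flush` consumers now run on every flush, even an empty one.

```ocaml
(* Hypothetical model of the flush sequence; names are illustrative only. *)
type payload = { suffix_end : int; dict_end : int }

type consumer = { after_flush : unit -> unit }

type t = {
  mutable pending_suffix_end : int;  (* offset reached by suffix writes *)
  mutable pending_dict_end : int;    (* offset reached by dict writes *)
  mutable control : payload;         (* last payload written to the control file *)
  mutable control_writes : int;      (* counts simulated control file rewrites *)
  suffix_consumers : consumer list;
}

(* Rewrite the control file only when the payload actually changed. *)
let flush_control t =
  let new_pl =
    { suffix_end = t.pending_suffix_end; dict_end = t.pending_dict_end }
  in
  if new_pl = t.control then Ok ()
  else begin
    t.control <- new_pl;
    t.control_writes <- t.control_writes + 1;
    Ok ()
  end

(* Publish through the control file, then notify the consumers. *)
let flush t =
  match flush_control t with
  | Error _ as e -> e
  | Ok () ->
      List.iter (fun { after_flush } -> after_flush ()) t.suffix_consumers;
      Ok ()

let () =
  let cleared = ref 0 in
  let t =
    {
      pending_suffix_end = 0;
      pending_dict_end = 0;
      control = { suffix_end = 0; dict_end = 0 };
      control_writes = 0;
      suffix_consumers = [ { after_flush = (fun () -> incr cleared) } ];
    }
  in
  (* Empty flush: the control file is left alone, the consumer still runs. *)
  assert (flush t = Ok ());
  assert (t.control_writes = 0 && !cleared = 1);
  (* New data: the control file is rewritten exactly once. *)
  t.pending_suffix_end <- 42;
  assert (flush t = Ok ());
  assert (t.control_writes = 1 && !cleared = 2)
```

Because the dict and suffix offsets live in the same payload here, they become visible together in a single control update, which mirrors the "same time as the rest" behaviour mentioned in the PR description.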

metanivek · 2023-07-19T15:51:01Z

src/irmin-pack/unix/append_only_file.ml

@@ -21,23 +21,19 @@ module Make (Io : Io.S) (Errs : Io_errors.S with module Io = Io) = struct
  module Io = Io
  module Errs = Errs

+  let auto_flush_threshold = 16_384


I'm running a benchmark to see what impact this has.

Ran a benchmark that was slower but then I remembered that 3.8 changed LRU semantics. I need to re-run with an adjusted LRU size.

I ran the benchmark with adjusted LRU entry size. https://github.com/metanivek/irmin-tezos-benchmarking/tree/rm_auto_flush/benchmarks/2023-07-pr-2235

I'm having issues with generating stats.txt for the benchmark (separate issue) but here are the results.

The first comparison is against 3.7.1 baseline that we've been using.

This makes this PR look like it is a huge performance improvement. Well, then I remembered that the new LRU performs better when compared to 3.7.

So here is this PR compared to the equivalent run for the new LRU (15k entries).

I don't know the exact numbers since I can't get the summary data yet, but I can't imagine it is much different, given how close these lines are. I think this PR is good to go from a performance perspective!

Thanks a ton for running the benchmarks! It's great to have confirmation that this PR doesn't have any performance impact, as expected :)

codecov-commenter · 2023-08-01T09:11:42Z

Codecov Report

Merging #2235 (04e1790) into main (e6cb805) will decrease coverage by 0.01%.
The diff coverage is 93.24%.

❗ Current head 04e1790 differs from pull request most recent head 02fd19f. Consider uploading reports for the commit 02fd19f to get more accurate results


@@            Coverage Diff             @@
##             main    #2235      +/-   ##
==========================================
- Coverage   68.17%   68.17%   -0.01%     
==========================================
  Files         138      138              
  Lines       16739    16733       -6     
==========================================
- Hits        11412    11407       -5     
+ Misses       5327     5326       -1

| Files Changed | Coverage Δ | |
|---|---|---|
| src/irmin-pack/unix/gc_worker.ml | `3.75% <0.00%> (ø)` | |
| src/irmin-pack/unix/pack_store.ml | `78.84% <ø> (ø)` | |
| src/irmin-pack/unix/store.ml | `65.55% <ø> (-0.23%)` | ⬇️ |
| src/irmin-pack/unix/file_manager.ml | `86.93% <91.17%> (+1.15%)` | ⬆️ |
| src/irmin-pack/unix/append_only_file.ml | `87.30% <93.75%> (+2.30%)` | ⬆️ |
| src/irmin-pack/conf.ml | `86.79% <100.00%> (-1.74%)` | ⬇️ |
| src/irmin-pack/unix/chunked_suffix.ml | `85.96% <100.00%> (-0.13%)` | ⬇️ |
| src/irmin-pack/unix/dict.ml | `95.65% <100.00%> (+1.20%)` | ⬆️ |
| src/irmin-pack/unix/sparse_file.ml | `84.49% <100.00%> (ø)` | |

... and 3 files with indirect coverage changes


art-w force-pushed the ao-dict branch from 069f6c2 to 73cc593 Compare April 7, 2023 16:29

art-w force-pushed the ao-dict branch 2 times, most recently from 22cf7f8 to 587af95 Compare July 17, 2023 16:23

metanivek reviewed Jul 19, 2023


art-w added 7 commits August 1, 2023 10:31

irmin-pack: remove append-only buffered writes settings

df461cd

irmin-pack: refactor Dict

0c972aa

irmin-pack: remove unused dict consumers notification

5c60631

irmin-pack: fix Dict flush

454af2b

irmin-pack: fix Dict usages

d95221a

irmin-pack: fsync Append_only_file only if updated

5b27e9b

irmin-pack: update changelog

02fd19f

art-w force-pushed the ao-dict branch from 587af95 to 02fd19f Compare August 1, 2023 08:59

metanivek merged commit 0ca20cc into mirage:main Aug 1, 2023
4 checks passed

metanivek mentioned this pull request Aug 1, 2023

irmin-pack: provide guidance for auto flush thresholds #2018

Closed
