combine small files into composite blocks #151

Merged
merged 28 commits into from Nov 1, 2020

sourcefrog
Owner

Fixes #66

Give backup its own loop for walking the tree, so that it can combine
small files.
To iterate a local tree's subtree, start directly there and descend,
rather than walking the whole thing and throwing data away.

MergeTrees works on entry iterators, not trees.
Backup no longer uses the WriteTree trait.
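
As a rough illustration of the idea of combining small files, here is a minimal sketch, not the code from this PR. The `Address` type, the `combine_small_files` function, and the `target_block_size` parameter are all hypothetical names: the sketch appends each small file to a shared buffer, records where its bytes landed, and starts a new block when the next file would push the buffer past the target size.

```rust
/// Where one file's bytes live inside a combined block.
/// Hypothetical type for illustration; not taken from the Conserve source.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Address {
    /// Index of the combined block that holds the file.
    block: usize,
    /// Byte offset of the file within that block.
    start: usize,
    /// Length of the file in bytes.
    len: usize,
}

/// Pack small files into combined blocks of roughly `target_block_size` bytes.
/// Returns the blocks and, for each file, the address of its content.
/// Assumes every input is already known to be "small".
fn combine_small_files(
    files: &[(&str, &[u8])],
    target_block_size: usize,
) -> (Vec<Vec<u8>>, Vec<(String, Address)>) {
    let mut blocks: Vec<Vec<u8>> = Vec::new();
    let mut addresses = Vec::new();
    let mut buf: Vec<u8> = Vec::new();
    for (name, content) in files {
        // Flush the current buffer first if this file would overfill it.
        if !buf.is_empty() && buf.len() + content.len() > target_block_size {
            blocks.push(std::mem::take(&mut buf));
        }
        addresses.push((
            name.to_string(),
            Address {
                block: blocks.len(),
                start: buf.len(),
                len: content.len(),
            },
        ));
        buf.extend_from_slice(content);
    }
    // Whatever is left over becomes the final block.
    if !buf.is_empty() {
        blocks.push(buf);
    }
    (blocks, addresses)
}
```

A real implementation would also hash and compress each combined block and handle large files separately; the sketch leaves those steps out.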

codecov bot commented Oct 16, 2020

Codecov Report

Merging #151 into main will increase coverage by 0.72%.
The diff coverage is 94.35%.


@@            Coverage Diff             @@
##             main     #151      +/-   ##
==========================================
+ Coverage   83.65%   84.37%   +0.72%     
==========================================
  Files          35       35              
  Lines        3402     3482      +80     
==========================================
+ Hits         2846     2938      +92     
+ Misses        556      544      -12     
| Impacted Files | Coverage Δ |
|---|---|
| src/lib.rs | 50.00% <ø> (ø) |
| src/copy_tree.rs | 65.78% <33.33%> (-13.70%) ⬇️ |
| src/restore.rs | 70.96% <50.00%> (ø) |
| src/index.rs | 87.72% <91.42%> (+1.20%) ⬆️ |
| src/bin/conserve.rs | 90.12% <92.68%> (-0.15%) ⬇️ |
| src/backup.rs | 94.52% <94.15%> (-0.79%) ⬇️ |
| src/live_tree.rs | 77.54% <94.73%> (-0.15%) ⬇️ |
| src/stats.rs | 83.48% <97.14%> (+12.65%) ⬆️ |
| src/archive.rs | 87.97% <100.00%> (-0.34%) ⬇️ |
| src/band.rs | 87.42% <100.00%> (ø) |
| ... and 21 more | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 44af4e7...9627d37.

This changes the backup file-count stats to include files that later
failed to be stored.
All we need here is a reusable buffer.
The backup code needs to know about index hunk boundaries so that it can
combine and parallelize storage of data within that hunk. So the
higher-level backup code now tells the index when a hunk is complete.
Previously the API required entries to be added to the IndexBuilder in
order, but that gets in the way of grouping together small files and
parallelizing compression.

Instead, allow entries to be added in arbitrary order within the hunk;
the IndexWriter will sort them before serializing. Backup code is still
responsible for keeping entries in order across hunks, and the
IndexWriter checks this.

Also, rename IndexBuilder to IndexWriter.
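
The per-hunk ordering contract described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual `IndexWriter`: `Entry`, `HunkWriter`, `push_entry`, and `finish_hunk` are hypothetical names, entries carry only a path, paths are compared as plain strings (the real index may order paths differently), and "serializing" is stood in for by pushing the sorted hunk onto a vector.

```rust
/// Minimal stand-in for an index entry (hypothetical; only a path).
#[derive(Debug, Clone, PartialEq, Eq)]
struct Entry {
    path: String,
}

/// Sketch of a writer that accepts entries in any order within a hunk,
/// sorts them when the hunk is finished, and checks order across hunks.
struct HunkWriter {
    /// Entries accumulated for the current, unfinished hunk.
    pending: Vec<Entry>,
    /// Path of the last entry in the most recently finished hunk.
    last_written: Option<String>,
    /// Stand-in for serialized output: one sorted Vec per hunk.
    hunks: Vec<Vec<Entry>>,
}

impl HunkWriter {
    fn new() -> Self {
        HunkWriter { pending: Vec::new(), last_written: None, hunks: Vec::new() }
    }

    /// Add an entry to the current hunk; order within the hunk is free.
    fn push_entry(&mut self, entry: Entry) {
        self.pending.push(entry);
    }

    /// Called by the higher-level code when the hunk is complete.
    fn finish_hunk(&mut self) -> Result<(), String> {
        if self.pending.is_empty() {
            return Ok(());
        }
        // Sort within the hunk (plain string order is an assumption here).
        self.pending.sort_by(|a, b| a.path.cmp(&b.path));
        // Across hunks, the caller must have kept entries in order.
        if let Some(last) = &self.last_written {
            if self.pending[0].path <= *last {
                return Err(format!(
                    "hunk starts at {:?}, not after previous hunk end {:?}",
                    self.pending[0].path, last
                ));
            }
        }
        self.last_written = self.pending.last().map(|e| e.path.clone());
        self.hunks.push(std::mem::take(&mut self.pending));
        Ok(())
    }
}
```

Deferring the sort to `finish_hunk` is what lets the caller process the files of one hunk in whatever order grouping or parallel compression happens to complete them.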
Factor out code to read with retries
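
The commit message does not say what is retried, so the following is only a guess at the general shape of such a helper. It assumes "retries" means calling `read` again after a short read or an `ErrorKind::Interrupted` error, until the buffer is full or end of file is reached; the name `read_with_retries` is hypothetical.

```rust
use std::io::{self, ErrorKind, Read};

/// Fill `buf` from `reader`, retrying on short reads and on
/// `ErrorKind::Interrupted`. Returns the number of bytes read, which is
/// less than `buf.len()` only if end of file was reached first.
/// Hypothetical helper; the retry policy is an assumption.
fn read_with_retries<R: Read>(reader: &mut R, buf: &mut [u8]) -> io::Result<usize> {
    let mut filled = 0;
    while filled < buf.len() {
        match reader.read(&mut buf[filled..]) {
            // Zero bytes with a non-empty buffer means end of file.
            Ok(0) => break,
            // Short read: keep what we got and ask again for the rest.
            Ok(n) => filled += n,
            // Interrupted reads are safe to retry.
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            // Any other error is passed to the caller.
            Err(e) => return Err(e),
        }
    }
    Ok(filled)
}
```

Taking the buffer as a parameter lets the caller reuse one allocation across many files.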
sourcefrog force-pushed the combine branch 2 times, most recently from 05e020f to f8d8d96, on November 1, 2020 00:39
Development

Successfully merging this pull request may close these issues.

Combine multiple small files into single blocks