
Customize block size #177

Open
road2react opened this issue Aug 12, 2022 · 15 comments
Labels
topic:performance, type:format-change (issues requiring an archive format change)

Comments

@road2react

Right now, blocks are limited to 1 MB each. When backing up to cloud storage, read and write latency can be significant, reaching up to several seconds per operation.

A potential way to reduce overhead would be to increase the block size.

Would this be possible?

@sourcefrog
Owner

So yes, we could add an option, at least as a way to experiment.

I would like to set a default that is good for the widest range of situations rather than making people work it out.

The block size also determines the granularity with which identical content can be found across different files, or across different versions of the same file. So increasing it is likely to decrease block reuse to some extent.

One goal for Conserve is to issue lots of parallel IO very aggressively, making use of Rust's fearless concurrency. It already does this to some extent, and there is room to do much more. Many factors in contemporary systems align with this approach: many cores, deep SSD device command queues, and networks with high bandwidth-delay products. If we have, say, 10-100 requests in flight, then per-request latency still matters, but not as much. So I think this is the main thing to lean on, but it is more complicated to implement than just increasing the target block size.

Another thing to consider is that Conserve tries to write blocks of a certain size, but for various reasons some objects can be smaller. They could be made larger, but that would have other tradeoffs, e.g. around recovering from an interrupted backup. So again: parallelism for throughput.
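To make the parallelism idea concrete, here is a minimal sketch of bounded parallel uploads using the futures crate. This is not Conserve's actual code; RemoteTransport and put_block are hypothetical stand-ins for whatever transport API ends up doing the remote writes.

// Sketch only: keep many block uploads in flight so per-request latency
// overlaps instead of being paid serially for each 1 MB object.
use futures::stream::{self, StreamExt};

// Hypothetical stand-in for a remote transport.
struct RemoteTransport;

impl RemoteTransport {
    async fn put_block(&self, _name: String, _data: Vec<u8>) -> std::io::Result<()> {
        // e.g. an HTTP PUT of the compressed block to object storage
        Ok(())
    }
}

async fn upload_blocks(
    transport: &RemoteTransport,
    blocks: Vec<(String, Vec<u8>)>,
) -> Vec<std::io::Result<()>> {
    const IN_FLIGHT: usize = 32; // keep ~10-100 requests outstanding
    stream::iter(blocks)
        .map(|(name, data)| transport.put_block(name, data))
        .buffer_unordered(IN_FLIGHT)
        .collect()
        .await
}

With something like this, the effective cost per block approaches the latency divided by the number of requests in flight, which matters more than the block size itself.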

Finally: are you really seeing seconds per 1MB object to cloud storage? Which service, over what network? I wouldn't be surprised by 300-1000ms but seconds seems high?

@sourcefrog
Owner

If someone is interested in doing this, here is a guide:

There are actually a few variables to consider, including:

conserve/src/lib.rs

Lines 91 to 98 in 1c8a847

/// Break blocks at this many uncompressed bytes.
pub(crate) const MAX_BLOCK_SIZE: usize = 1 << 20;
/// Maximum file size that will be combined with others rather than being stored alone.
const SMALL_FILE_CAP: u64 = 100_000;
/// Target maximum uncompressed size for combined blocks.
const TARGET_COMBINED_BLOCK_SIZE: usize = MAX_BLOCK_SIZE;

pub const MAX_ENTRIES_PER_HUNK: usize = 1000;

  • Add variables to BackupOptions
  • Migrate users of the constants to look in BackupOptions instead
  • Add command line options to set them: rather than adding a bunch of specific, rarely-used options, perhaps we should have -o entries_per_hunk=10000
  • Add CLI tests that use them and then inspect the stats and the contents of the repo

I think that's it.
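Roughly, the first two steps could look like the sketch below. The field names and defaults here are only illustrative, not the current BackupOptions definition; callers of the constants would then read the option fields instead.

// Sketch only: turn the compile-time constants into per-backup options,
// keeping the current values as defaults. Field names are illustrative.
pub struct BackupOptions {
    // ... existing options ...
    /// Break blocks at this many uncompressed bytes (today's MAX_BLOCK_SIZE).
    pub max_block_size: usize,
    /// Maximum number of index entries per hunk (today's MAX_ENTRIES_PER_HUNK).
    pub max_entries_per_hunk: usize,
}

impl Default for BackupOptions {
    fn default() -> Self {
        BackupOptions {
            max_block_size: 1 << 20,    // 1 MiB
            max_entries_per_hunk: 1000,
        }
    }
}

A -o style flag would then parse key=value pairs into these fields, and the CLI tests could assert on the resulting stats and repository contents.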

@WolverinDEV
Contributor

Hey, just a quick thought:
What happens if we mix backups with different block sizes in the same archive?

@road2react
Author

Finally: are you really seeing seconds per 1MB object to cloud storage? Which service, over what network? I wouldn't be surprised by 300-1000ms but seconds seems high?

I'm using Box mounted using rclone. Using the terminal, cd-ing into the mount takes ~1s for a normal directory, but several seconds if there are many files in the directory.

@sourcefrog
Owner

Hey, just a quick thought: What happens if we mix backups with different block sizes in the same archive?

Nothing too bad: nothing should be making strong assumptions that the blocks are of any particular size.

Unchanged files (same mtime) will continue to reference the blocks they used last time.

Files not recognized as unchanged but which in fact have content in common will no longer match that content if the block size changes, so all their content will be written again to new-sized blocks. That would include cases like: the file was touched (mtime updated with no content change); the file was renamed or copied; more data was appended to the file.

We should still test it of course. And if this is relied upon it should be (more?) explicit in the docs.

If we want larger files, the index hunks would probably be the place to start.

There is also an assumption that a number of blocks can be fairly freely held in memory. So we shouldn't make them 2GB or anything extreme like that, where holding 20 simultaneously could cause problems.

@sourcefrog
Owner

Finally: are you really seeing seconds per 1MB object to cloud storage? Which service, over what network? I wouldn't be surprised by 300-1000ms but seconds seems high?

I'm using Box mounted using rclone. Using the terminal, cd-ing into the mount takes ~1s for a normal directory, but several seconds if there are many files in the directory.

Interesting... I wonder how many API calls are generated from a single file read or write.

Running conserve with -D may give you an idea which file IOs are slow.

There might be a big win from a transport that talks to the Box API directly, which would be some more work, but perhaps not an enormous amount.

@road2react
Author

I ran with -D and only received the following output:

2022-08-12T17:38:55.829222Z TRACE conserve: tracing enabled
2022-08-12T17:38:55.829283Z DEBUG globset: built glob set; 1 literals, 2 basenames, 0 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes    

Nothing else has been printed and the progress indicator does not show.

Running without -D allows the progress bar to show and indicates that the backup is working.

Without -D, the progress line "10 new entries, 0 changed, 0 deleted, 0 unchanged" shows a new entry about 2-3 times per second. (I assume the several seconds taken by cd is due to traversing every file in the directory.)

@road2react
Author

There might be a big win from a transport that talks to the Box API directly, which would be some more work, but perhaps not an enormous amount.

I'm using rclone's encryption function, which may not work if talking to the Box API directly.

@sourcefrog
Owner

Oh, the logging might be on my SFTP branch.

@sourcefrog
Owner

I don't think Conserve reads any archive directory during a backup. (It does during validate, delete, and gc.)

If rclone reads the remote directory repeatedly even when the app does not request it, that may be a performance drag regardless of block size.

Perhaps you can get a request log out of rclone?

And let's split Box/rclone performance into a separate bug.

@road2react
Author

road2react commented Aug 13, 2022

rclone reports a lot of repeated reads:

rclone[376785]: DEBUG : conserve/b0000/BANDHEAD: Open: flags=OpenReadOnly
rclone[376785]: DEBUG : conserve/b0000/BANDHEAD: Open: flags=O_RDONLY
rclone[376785]: DEBUG : conserve/b0000/BANDHEAD: >Open: fd=conserve/b0000/BANDHEAD (r), err=<nil>
rclone[376785]: DEBUG : conserve/b0000/BANDHEAD: >Open: fh=&{conserve/b0000/BANDHEAD (r)}, err=<nil>
rclone[376785]: DEBUG : conserve/b0000/BANDHEAD: Attr:
rclone[376785]: DEBUG : conserve/b0000/BANDHEAD: >Attr: a=valid=1m0s ino=0 size=56 mode=-rw-r--r--, err=<nil>
rclone[376785]: DEBUG : &{conserve/b0000/BANDHEAD (r)}: Read: len=4096, offset=0
rclone[376785]: DEBUG : conserve/b0000/BANDHEAD: ChunkedReader.openRange at 0 length 1048576
rclone[376785]: DEBUG : conserve/b0000/BANDHEAD: ChunkedReader.Read at 0 length 4096 chunkOffset 0 chunkSize 1048576
rclone[376785]: DEBUG : &{conserve/b0000/BANDHEAD (r)}: >Read: read=56, err=<nil>
rclone[376785]: DEBUG : &{conserve/b0000/BANDHEAD (r)}: Flush:
rclone[376785]: DEBUG : &{conserve/b0000/BANDHEAD (r)}: >Flush: err=<nil>
rclone[376785]: DEBUG : &{conserve/b0000/BANDHEAD (r)}: Release:
rclone[376785]: DEBUG : conserve/b0000/BANDHEAD: ReadFileHandle.Release closing

These reads repeat several times before reading conserve/d/<id>.

Each instance of the BANDHEAD read operation aligns with an increment of the amount of new entries in the progress display.

Stopping and restarting the backup operation continues the reads on b0000, even after newer bands are created.

Occasionally, rclone reports

DEBUG : conserve/d/1d6: Re-reading directory (25h16m9.701252182s old)
DEBUG : conserve/d/1d6/: >Lookup: node=conserve/d/1d6/1d61f19d98dbc87b0a6b3b1941b7133aaaebed5a19db44f85b3f42a1d739a36cab1d0e7179ba1297a0338d21873e253662152156fd4f6b551f2b36e8a71a9674, err=<nil>

in succession, for different ids. I believe that this corresponds to writing a block file.

@sourcefrog
Owner

This might be connected to #175, just fixed by @WolverinDEV, which is one cause of repeatedly re-reading files.

However, there are some other cases where it reads a small file repeatedly in a way that is cheap on a local filesystem (where it will be in cache) but might be very slow remotely. It's definitely worth fixing, and I think I have fixed some in the sftp branch, but there are probably more.

@road2react
Author

With the latest changes:

It looks like there are no more repeated reads. Now, for each block, it first checks whether the block is already written. If not, it creates a temp file (tmp08SFF2), writes to that file, and renames it to the block id. This appears to be 3 round trips per block, which aligns with the network pattern:

[image: network throughput graph]

A ~1 MB spike around every second, where each operation has a ~300 ms round trip.

@sourcefrog
Owner

Yep, it currently does the following:

  1. See if the block is already present, in which case we don't need to do the work to compress it.
  2. Write to a temporary file, so that if the process is interrupted we won't be left with an incomplete file under the final name. (This is not 100% guaranteed by the filesystem, but it's the usual pattern.)
  3. Rename into place. (Steps 2 and 3 are sketched below.)
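Here is a rough sketch of that temp-file-then-rename pattern, not Conserve's actual code (which uses randomized temp names like the tmp08SFF2 seen above): write the data to a temporary file and only rename it into place once it is complete.

use std::fs;
use std::io::Write;
use std::path::Path;

// Sketch only: a crash mid-write leaves at most a stray temp file,
// never a truncated block under its final name.
fn write_block_atomically(dir: &Path, block_hash: &str, data: &[u8]) -> std::io::Result<()> {
    let tmp_path = dir.join(format!("{}.tmp", block_hash));
    let final_path = dir.join(block_hash);
    let mut f = fs::File::create(&tmp_path)?;
    f.write_all(data)?;
    f.sync_all()?; // flush the data before the rename makes it visible
    fs::rename(&tmp_path, &final_path)?; // atomic on POSIX filesystems
    Ok(())
}

On a high-latency remote filesystem each of those steps is a separate round trip, which is exactly the cost being discussed here.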

This is pretty reasonable (although perhaps not optimal) locally, but not good if the filesystem has very high latency.

A few options:

  1. Just issue more parallel IO.
  2. Remember which blocks are referenced by the basis index: we can already assume they're present and should not need to check. (The most common case, of an unchanged file, does not check, but there might be other edge cases. This should be pretty rare.)
  3. Similarly, remember blocks that we've already seen are present (Cache in RAM for presence of blocks #106); see the sketch after this list.
  4. If we have a Transport API for the remote filesystem, then in some cases that may already support a reliable atomic write that cannot leave the file half-written. For example this should be possible on S3. Then we don't need the rename.
  5. Even on Unix or Windows maybe a faster atomic write is possible?
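For option 3, a minimal sketch of the in-RAM presence cache might look like this (the type and field names are illustrative, not necessarily what #106 will use):

use std::collections::HashSet;

// Sketch only: remember block hashes already confirmed to exist in the
// archive, so later occurrences skip the remote existence check entirely.
struct BlockPresenceCache {
    present: HashSet<String>,
}

impl BlockPresenceCache {
    fn new() -> Self {
        BlockPresenceCache { present: HashSet::new() }
    }

    /// True if we already know the block exists; avoids a round trip.
    fn contains(&self, hash: &str) -> bool {
        self.present.contains(hash)
    }

    /// Record that a block was found or written, so later checks are free.
    fn insert(&mut self, hash: String) {
        self.present.insert(hash);
    }
}

The cache only grows within one backup run, so a block seen as present can reasonably be assumed to stay present for the rest of the run.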

@WolverinDEV
Contributor

This might be connected to #175, just fixed by @WolverinDEV, which is one cause of repeatedly re-reading files.

What @road2react describes seems to be pretty much what I experienced as well.
The odds are high that this has been fixed by #175.
