Support variable sized blocks #4807

Closed
calmh opened this issue Mar 12, 2018 · 5 comments

@calmh
Member

commented Mar 12, 2018

(I can't believe we didn't already have this in the tracker, but if we did I can't find it.)

We should support block sizes other than 128 KiB at the protocol level. The gain is mostly for large files, where the block list becomes much smaller and the corresponding memory and database juggling becomes much lighter. Currently Syncthing will OOM on very large files simply because the block list becomes very large and gets copied and marshalled multiple times.

The current 128 KiB block size should become the minimum, with block sizes increasing by factors of two up to a maximum - I suggest 16 MiB. When scanning files we select a block size that keeps the number of blocks reasonable - I suggest between 1000 and 2000. (Once we've reached the maximum block size, files can have more than 2000 blocks.) This method of selecting the block size is exemplified in the three-year-old wiki post on the subject.
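A minimal sketch of the selection rule described above, using the suggested limits. The names `chooseBlockSize`, `numBlocks` and the constants are hypothetical and chosen for illustration; this is not necessarily how Syncthing implements it.

```go
package main

import "fmt"

const (
	minBlockSize  = 128 << 10 // 128 KiB, the current fixed block size
	maxBlockSize  = 16 << 20  // 16 MiB, the suggested maximum
	desiredBlocks = 2000      // suggested upper bound on blocks per file
)

// chooseBlockSize returns the smallest power-of-two block size in
// [minBlockSize, maxBlockSize] for which the file needs at most
// desiredBlocks blocks. If even maxBlockSize is too small, maxBlockSize
// is returned and the file simply has more blocks than desiredBlocks.
func chooseBlockSize(fileSize int64) int64 {
	size := int64(minBlockSize)
	for size < maxBlockSize && fileSize > size*desiredBlocks {
		size *= 2
	}
	return size
}

// numBlocks is the block count, rounding up for a partial last block.
func numBlocks(fileSize, blockSize int64) int64 {
	return (fileSize + blockSize - 1) / blockSize
}

func main() {
	for _, fileSize := range []int64{1 << 20, 1 << 30, 1 << 40} {
		bs := chooseBlockSize(fileSize)
		fmt.Printf("file of %d bytes: block size %d KiB, %d blocks\n",
			fileSize, bs>>10, numBlocks(fileSize, bs))
	}
}
```

Because the size only doubles when the file would otherwise exceed 2000 blocks, any file that ends up above the minimum and below the maximum block size has between roughly 1000 and 2000 blocks.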

There should be a hysteresis mechanism so that block sizes are not changed too often for any given file. Devices must be prepared for a file to have a block size different from what would have been selected without bias, whether because of hysteresis or for legacy reasons.
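The issue does not say how the hysteresis should work. One possible rule, shown here purely as an assumption, is to keep a file's existing block size as long as it is within one factor of two of the size that would be chosen from scratch. All names are hypothetical.

```go
package main

import "fmt"

const (
	minBlockSize  = 128 << 10 // 128 KiB
	maxBlockSize  = 16 << 20  // 16 MiB
	desiredBlocks = 2000
)

// idealBlockSize picks the unbiased block size for a file of the given size.
func idealBlockSize(fileSize int64) int64 {
	size := int64(minBlockSize)
	for size < maxBlockSize && fileSize > size*desiredBlocks {
		size *= 2
	}
	return size
}

// blockSizeWithHysteresis keeps the current block size when it is valid and
// at most one doubling/halving step away from the ideal size. Otherwise it
// switches to the ideal size. A current value of zero means "no previous
// block size known". This rule is an illustrative assumption.
func blockSizeWithHysteresis(fileSize, current int64) int64 {
	ideal := idealBlockSize(fileSize)
	if current < minBlockSize || current > maxBlockSize {
		return ideal
	}
	if current == ideal || current == ideal*2 || current*2 == ideal {
		return current
	}
	return ideal
}

func main() {
	const fileSize = 1 << 30 // 1 GiB, ideal block size is 1 MiB
	for _, current := range []int64{0, 128 << 10, 512 << 10, 2 << 20} {
		fmt.Printf("current %d KiB -> use %d KiB\n",
			current>>10, blockSizeWithHysteresis(fileSize, current)>>10)
	}
}
```

With a rule like this, a file hovering around a size threshold does not flip between two block sizes on every rescan.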

We will still issue requests for data in units of blocks. This makes requests larger for larger files, which is also a gain in throughput, readaheads, etc.

The block size will be added as an attribute on the FileInfo message. This saves having to have the actual block list at hand to know the block size used, which is useful when loading just the FileInfo and not the block list from the database. When this attribute is zero or absent we assume the old default block size of 128 KiB.
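The zero-or-absent fallback could look roughly like this. The `fileInfo` struct is a hypothetical, trimmed-down stand-in for the real FileInfo message, and the field and method names are assumptions.

```go
package main

import "fmt"

const defaultBlockSize = 128 << 10 // 128 KiB, the legacy fixed block size

// fileInfo is a hypothetical stand-in for the FileInfo message, reduced to
// the fields needed for this example.
type fileInfo struct {
	Name         string
	Size         int64
	RawBlockSize int32 // as received; zero when the sender did not set it
}

// BlockSize returns the effective block size, falling back to the legacy
// default when the attribute is zero or absent.
func (f fileInfo) BlockSize() int {
	if f.RawBlockSize <= 0 {
		return defaultBlockSize
	}
	return int(f.RawBlockSize)
}

func main() {
	legacy := fileInfo{Name: "old.bin", Size: 1 << 20}
	large := fileInfo{Name: "big.img", Size: 1 << 30, RawBlockSize: 1 << 20}
	fmt.Println(legacy.Name, legacy.BlockSize())
	fmt.Println(large.Name, large.BlockSize())
}
```

Routing every read through an accessor like `BlockSize()` means files announced by older devices keep working without special cases at each call site.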

As large block support will be at best pseudo-backwards-compatible, it will need to be enabled manually to begin with. At some point in the future it can become the default and only way.

@DrSchnagels

commented Mar 13, 2018

So 2000 x 16 MB = 32 GB?
People with such huge files, like me, don't expect a fast scan time. Why not increase it to 128 MB?

@calmh

Member Author

commented Mar 13, 2018

Then we’re entering territory where the block itself becomes unwieldy and uses up a lot of memory to juggle. A 1 TB file having 62k blocks isn’t a problem; that’s still just a handful and requires less than a megabyte to represent. (Almost 8 million blocks, like today, is a problem.)

Since one of the purposes is to allow devices with limited memory to handle large files, we must consider that a number of block-sized buffers will be required in RAM to sync a file.
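For a rough sense of the block counts being compared, the sketch below assumes a file of exactly 1 TiB (2^40 bytes). The figures quoted in the comment are round approximations, so the exact values differ slightly depending on whether "1 TB" is read as decimal or binary. The helper name `numBlocks` is hypothetical.

```go
package main

import "fmt"

// numBlocks is the block count, rounding up for a partial last block.
func numBlocks(fileSize, blockSize int64) int64 {
	return (fileSize + blockSize - 1) / blockSize
}

func main() {
	const fileSize = int64(1) << 40 // 1 TiB, assumed for this example

	fmt.Println("128 KiB blocks:", numBlocks(fileSize, 128<<10)) // 8388608
	fmt.Println("16 MiB blocks: ", numBlocks(fileSize, 16<<20))  // 65536
	fmt.Println("128 MiB blocks:", numBlocks(fileSize, 128<<20)) // 8192
}
```

Going from 16 MiB to 128 MiB blocks would shrink an already small list by a further factor of eight, while making every in-memory block buffer eight times larger.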

@bugith

commented Mar 13, 2018

@calmh

Member Author

commented Mar 14, 2018

I wrote that. And linked to it in the issue above.

@bugith

commented Mar 14, 2018

Ooops, sorry, I read "three years old forum post", then thought you forgot it

calmh added a commit to calmh/syncthing that referenced this issue Mar 23, 2018

calmh added a commit to calmh/syncthing that referenced this issue Mar 23, 2018

calmh added a commit to calmh/syncthing that referenced this issue Apr 11, 2018

calmh added a commit to calmh/syncthing that referenced this issue Apr 11, 2018

@calmh calmh added this to the v0.14.48 milestone May 5, 2018

@syncthing syncthing locked and limited conversation to collaborators Apr 17, 2019
