Support variable sized blocks #4807
(I can't believe we didn't already have this in the tracker, but if we did I can't find it.)
We should support block sizes other than 128 KiB at the protocol level. The gain is mostly for large files, where the block list becomes much smaller and the corresponding memory and database juggling becomes much lighter. Currently Syncthing will OOM on very large files simply because the block list becomes very large and gets copied and marshalled multiple times.
The current 128 KiB block size should become the minimum, with block sizes increasing by factors of two up to a maximum - I suggest 16 MiB. When scanning files we select a block size to keep the number of blocks reasonable - I suggest between 1000 and 2000. (Once we've reached the maximum block size, files can have more than 2000 blocks.) This method of selecting the block size is exemplified in the three-year-old wiki post on the subject.
There should be a hysteresis mechanism so that the block size is not changed too often for any given file. Devices must be prepared to handle files whose block size differs from the one that would be selected from scratch, both for hysteresis and legacy reasons.
We will still issue requests for data in units of blocks. This makes requests larger for larger files, which is also a gain in throughput, readaheads, etc.
The block size will be added as an attribute on the FileInfo message. This avoids needing the actual block list at hand to know the block size used, which is useful when loading just the FileInfo and not the block list from the database. When this attribute is zero or absent we assume the old default block size of 128 KiB.
As large block support will be at best pseudo-backwards-compatible, it will need to be enabled manually to begin with. At some point in the future it can become the default and only mode.
Then we’re entering territory where the block itself becomes unwieldy and uses up a lot of memory to juggle. A 1TB file having 62k blocks isn’t a problem, that’s still just a handful and requires less than a megabyte to represent. (Almost 8 million blocks, like today, is a problem.)
Since one of the purposes is to allow devices with limited memory to handle large files, we must consider that there will be a number of block sized buffers required in RAM to sync a file.
Hi Jakob. Did you see this: https://github.com/syncthing/syncthing/wiki/Variable-Block-Size ?