Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add public rs_sig_args() function for recommend signature args from filesize. #109

Merged
merged 33 commits into from Apr 6, 2020

Conversation

dbaarda
Copy link
Member

@dbaarda dbaarda commented Jul 20, 2017

This adds an rs_sig_args() function to librsync.h that can be used to get recommended block_len and strong_len arguments to use for a signature from the old filesize. This is then used by the whole file operations and rdiff for the default --block-size and --sum-size arguments.

This uses the same algorithm used by rsync for the default block_len and and strong_len but is more conservative to reduce the risk of hash collisions (rsync has a whole-file checksum and multiple retries with different hash seeds to mitigate that, librsync doesn't).

This makes rdiff produce significantly smaller signature files and calculate much faster deltas by default while ensuring they are still safe against random hash collisions. However, it also makes it more vulnerable against hostile files crafted to produce hash collisions that would result in those files being corrupted.

There are a few minor tidy-ups I could make to this pull request before submitting it, but I'm posting it now in the hope of prompting discussion about "should rdiff default to safe+fast signature settings, or to paranoid safe settings", and if the proposed API is OK.

An alternative API would be to provide an alternative rs_sig_begin() method (rs_sig_begin2? rs_sig_begin_fs?) that also takes the old filesize to calculate default block_len and/or strong_len arguments if they are zero. Note this API approach could also be used to provide an alternative rs_sig_load() method that takes the signature file size for pre-allocating the blocksum storage instead of the currently non-public approach of setting the filesize in the job struct.

dbaarda added 5 commits Jun 22, 2017
These will be needed for computing the recommended block_len and
strong_len.
…size.

In librsync.h add RS_DEFAULT_STRONG_LEN for default strong_len when no
filesize is provided. Add rs_sig_args() for getting recommended args
for rs_sig_begin() from the old filesize. Update comments to explain
these.

In sumset.[ch] add rs_sig_args_check() macro for checking
rs_sig_begin() arguments are valid, and use it inside the
rs_signature_check() macro. Add implementation of rs_sig_args()
function and use it inside rs_signature_init(). Update
rs_signature_init() arguments to the same types used by rs_sig_begin()
and thus rs_sig_args().

In whole.c make rs_sig_file() get the old filesize and use it with
rs_sig_args() to get the recommended sig_magic, block_len and
strong_len.

In rdiff.c make the default block_len=0 so that rs_sig_file() uses the
recommended block_len based on the filesize from rs_sig_args(). Remove
the default of block_len=8 for hash=md4 which is not really required
for backwards compatibility at all, so it uses the rs_sig_args()
recommendation instead.

Note this change means strong_len=0 no-longer means "max size for
given hash", but instead means "recommended size based on filesize".

Update tests/signature.input expected output files to reflect new
default strong_len and block_len values.
Note block_len is what is used in librsync.h for this function.
@dbaarda
Copy link
Member Author

@dbaarda dbaarda commented Aug 31, 2017

Because this is potentially contentious I'm not going to merge this before the next release unless people explicitly request it.

@dbaarda dbaarda mentioned this pull request Aug 31, 2017
dbaarda added 14 commits Oct 1, 2017
Assume files will grow by worst case 1TB, so even small files could
have a very large delta and thus a large number of collision chances.

This makes the recommended strong_sum size 8 to 12 bytes (32TB file
with 1K blocksize), which together with another 2 effective bytes
worth of weaksum is 10~14 hash bytes, which is probably not enough to
protect against a birthday attack.

People concerned about birthday attacks should set their strongsum to
16 (md4sum) or higher (up to 32 for blake2).
With the new more conservative rs_sig_args() the blocksum_size changes
from 5 to 8 for these signature tests.
The more conservative rs_sig_args() change incorrectly used a pointer
instead of the value for the blocksize, and did 1<<40 which overflows
unless you cast using (rs_long_t)1 first.
Make sure rs_sig_args() is exported. Change change rs_get_filesize() to
rs_file_size() in rs_sig_file().
@dbaarda
Copy link
Member Author

@dbaarda dbaarda commented Sep 13, 2019

See #129 for thoughts.

I don't think we should do this until we have whole-file hashes implemented to provide at least detection of hash collisions.

I also think rdiff should default to a 12byte strongsum as a compromise between protection and signature size, but users should be able to pass --sum-size=0 to automatically get the smallest sum size recommended for the given file and block size.

Make rs_sig_args() so that the strong_len argument will be increased, not just
log a warning, if the recommended strong_len based on the old_fsize is larger.
This means strong_len can be used to specify a minimum level of
hash-collision-attack protection, and it will be increased if necessary to
avoid accidental collisions based on the file size. Also change the assumed
new size from +1TB to +16MB to reduce the min strong_len from 8bytes to
6bytes.

Update rdiff.c help text to make it clear `-S` sets the minium strong_len.

Update the signature_test expected signature files to reflect the new min
strong_len size of 6 bytes.
@dbaarda dbaarda self-assigned this Sep 15, 2019
dbaarda added 4 commits Sep 18, 2019
Change rs_file_size() to return -1 for unknown file sizes. in librsync.h
update docstrings for rs_sig_args() and rs_file_size() for this. In sumset.c
update rs_sig_args() and rs_signature_init() to use this. This means you can
use old_fsize=0 in rs_sig_args() to mean "don't increase my strong_len"
provided it is >5, which is a minium for even a zero length file.

Update sumset_test.c to use -1 for unknown filesizes and add some tests for
checking invalid blake2 strong_len arguments.
In sumset.c make rs_sig_args() output max_strong_len for input strong_len=0.

In librsync.h update docstrings for rs_sig_args() and rs_sig_begin() to
correctly document the "recommended" and "maximum" behaviour for zero
arguments. Make the argument names for rs_sig_begin() more consistent.

In rdiff.c update the help string for the -S argument to say "(default: max)".

Expand and update tests/signature.test to test different -S argument values,
adding extra expected output files. Rename the expected output files to
include the arguments in the filename, not put them in different directories.
This means provided strong_len will not be adjusted up to recommended sizes.
In particular you can use strong_len<5 if you really want, and using
block_len=0 with old_fsize=0 will give you the default block_len of 2048.

In librsync.h update rs_sig_args() docstring to describe new behaviour. in
sumset.h update docstring on rs_signature_init() to describe "zero gives
defaults" behaviour. In sumset.c update rs_sig_args().

In tests/sumset_test.c add tests for rs_sig_args() and change
rs_signature_init() tests to check behaviour for zero args.
@dbaarda
Copy link
Member Author

@dbaarda dbaarda commented Sep 23, 2019

The current version of this pull request has the following rdiff behaviour;

  • -S 0 means "use the maximum full length of the hash used".
  • -S <n> argument specifies a minimum strongsum size, and a longer strongsum will be used if the filesize recommends it.
  • A minimum recommended strongsum size of 12 bytes will be used if the filesize is unknown (eg a pipe to stdin).
  • The minimum recommended strongsum size for a small file (<1 block) is 5 bytes.

In practice this means it's not possible to set a strongsum size <5 for any file, or a strongsum size <12 for piped input. This might be a problem.

People might really want to explicitly set the strongsum size to a small value, either because they know the piped input is small, or they really do want to risk hash collisions.

Do we really want rdiff to treat the -S <n> argument as a minimum hash size? Or should we always use it directly and do something like -S -<n> to mean "recommended, at least <n>"?

Also the minimum recommended strongsum of 5 bytes comes from assuming new_filesize=old_filesize+16MB and using +2bytes safety margin. The +16MB behaves like a miniumum +3bytes. If we dropped that the minimum would be 2bytes (for a 1byte filesize). With RabinKarp the number of useful bits in the weaksum is much improved... do we need the min strongsum length to be 5bytes?

@dbaarda
Copy link
Member Author

@dbaarda dbaarda commented Oct 25, 2019

I'm not happy with this yet. I don't particularly like the changed meaning of rdiff's -S strongsum length argument. I'm also not 100% with the librsync API for this.

I'm happy with the behaviour of the -B blocksize argument though, and I think that's were most of the wins with no risks lie.

I guess I need to think on this a bit more... feedback/advise are welcome.

dbaarda added 4 commits Feb 4, 2020
These are parsed by popt with `POPT_ARG_INT` so they should be int, and was
probably a bug for non-little-endian platforms. Note they are automatically
converted (without any compiler warnings) to size_t when they are passed to
functions that take size_t arguments.

Another alternative would have been to change the popt argument type, but it
doesn't have an `POPT_ARG_*` definition for size_t, only `LONG` and
`LONG_LONG` which would risk miss-matching the size_t type. In practice the
int size for a platform are practical limits for these values anyway.
In librsync.h update docstrings and rename RS_DEFAULT_STRONG_LEN to
RS_DEFAULT_MIN_STRONG_LEN.

In rdiff.c update `--help` output.

In sumset.c update rs_sig_args() to understand `strong_len=-1` and rs_log() a
warning if strong_len is too small. Make old_fsize=0 not be special, and make
rs_signature_init() use old_fsize=-1 (unknown) instead.

In sumset.h update docstrings.

Update signature.test and rename expected output files to use `-S=-1` for
minimum strong sum size.

Update sumset_test.c to reflect changes and better cover rs_signature_init()
for different possible arguments.
@dbaarda
Copy link
Member Author

@dbaarda dbaarda commented Apr 6, 2020

I've changed it to use -S=-1 to mean use minimum strong_len for the given file size and block_len. This reverts to the master branch behavior of not treating -S as a minimum, but using it directly. This means it is still possible to set strong_len's small enough to be unsafe, but I've added a rs_log() warning if you do this.

I think this is ready to merge. The rdiff behavior is unchanged from the master branch except that -1 can be used as the strong_len to mean "use minimum", and you get warnings logged if you set the strong_len small enough to be unsafe. It possibly also includes a bugfix for rdiff around getopt parsing of int's into size_t that would bite people on non-little-endian platforms.

I'm going to merge this after I update NEWS.md

dbaarda added 2 commits Apr 6, 2020
In NEWS.md update with all changes applied so far.

In CMakeLists.txt and librsync.spec bump the minor version to 2.3.0 to reflect
addional rs_sig_args() and strong_len=-1 support.
@dbaarda
Copy link
Member Author

@dbaarda dbaarda commented Apr 6, 2020

For some reason the travis test results are not currently showing on github, but I can see them on travis-ci.org and they are passing clean.

I'm going to merge this now.

@ericzolf
Copy link

@ericzolf ericzolf commented May 22, 2020

Hi, we use rdiff to check that our usage of the librsync is correct but in the recent past, rdiff has started to change its behaviour with each version jump, see rdiff-backup/rdiff-backup#304. Despite a typo, fixing the rollsum-algorithm was easy enough, but I'm at loss to guess the sum-size value which would give us the old behaviour. Can you help me out?

@dbaarda
Copy link
Member Author

@dbaarda dbaarda commented May 22, 2020

@ericzolf I'm really sorry this caused you guys problems... replying on that bug for you.

Always feel free to file a bug asking for assistance...I will generally respond pretty quickly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants