-
Notifications
You must be signed in to change notification settings - Fork 143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add public rs_sig_args() function for recommend signature args from filesize. #109
Conversation
These will be needed for computing the recommended block_len and strong_len.
…size. In librsync.h add RS_DEFAULT_STRONG_LEN for default strong_len when no filesize is provided. Add rs_sig_args() for getting recommended args for rs_sig_begin() from the old filesize. Update comments to explain these. In sumset.[ch] add rs_sig_args_check() macro for checking rs_sig_begin() arguments are valid, and use it inside the rs_signature_check() macro. Add implementation of rs_sig_args() function and use it inside rs_signature_init(). Update rs_signature_init() arguments to the same types used by rs_sig_begin() and thus rs_sig_args(). In whole.c make rs_sig_file() get the old filesize and use it with rs_sig_args() to get the recommended sig_magic, block_len and strong_len. In rdiff.c make the default block_len=0 so that rs_sig_file() uses the recommended block_len based on the filesize from rs_sig_args(). Remove the default of block_len=8 for hash=md4 which is not really required for backwards compatibility at all, so it uses the rs_sig_args() recommendation instead. Note this change means strong_len=0 no-longer means "max size for given hash", but instead means "recommended size based on filesize". Update tests/signature.input expected output files to reflect new default strong_len and block_len values.
Note block_len is what is used in librsync.h for this function.
Because this is potentially contentious I'm not going to merge this before the next release unless people explicitly request it. |
Assume files will grow by worst case 1TB, so even small files could have a very large delta and thus a large number of collision chances. This makes the recommended strong_sum size 8 to 12 bytes (32TB file with 1K blocksize), which together with another 2 effective bytes worth of weaksum is 10~14 hash bytes, which is probably not enough to protect against a birthday attack. People concerned about birthday attacks should set their strongsum to 16 (md4sum) or higher (up to 32 for blake2).
With the new more conservative rs_sig_args() the blocksum_size changes from 5 to 8 for these signature tests.
The more conservative rs_sig_args() change incorrectly used a pointer instead of the value for the blocksize, and did 1<<40 which overflows unless you cast using (rs_long_t)1 first.
Make sure rs_sig_args() is exported. Change change rs_get_filesize() to rs_file_size() in rs_sig_file().
See #129 for thoughts. I don't think we should do this until we have whole-file hashes implemented to provide at least detection of hash collisions. I also think rdiff should default to a 12byte strongsum as a compromise between protection and signature size, but users should be able to pass |
Make rs_sig_args() so that the strong_len argument will be increased, not just log a warning, if the recommended strong_len based on the old_fsize is larger. This means strong_len can be used to specify a minimum level of hash-collision-attack protection, and it will be increased if necessary to avoid accidental collisions based on the file size. Also change the assumed new size from +1TB to +16MB to reduce the min strong_len from 8bytes to 6bytes. Update rdiff.c help text to make it clear `-S` sets the minium strong_len. Update the signature_test expected signature files to reflect the new min strong_len size of 6 bytes.
Change rs_file_size() to return -1 for unknown file sizes. in librsync.h update docstrings for rs_sig_args() and rs_file_size() for this. In sumset.c update rs_sig_args() and rs_signature_init() to use this. This means you can use old_fsize=0 in rs_sig_args() to mean "don't increase my strong_len" provided it is >5, which is a minium for even a zero length file. Update sumset_test.c to use -1 for unknown filesizes and add some tests for checking invalid blake2 strong_len arguments.
In sumset.c make rs_sig_args() output max_strong_len for input strong_len=0. In librsync.h update docstrings for rs_sig_args() and rs_sig_begin() to correctly document the "recommended" and "maximum" behaviour for zero arguments. Make the argument names for rs_sig_begin() more consistent. In rdiff.c update the help string for the -S argument to say "(default: max)". Expand and update tests/signature.test to test different -S argument values, adding extra expected output files. Rename the expected output files to include the arguments in the filename, not put them in different directories.
This means provided strong_len will not be adjusted up to recommended sizes. In particular you can use strong_len<5 if you really want, and using block_len=0 with old_fsize=0 will give you the default block_len of 2048. In librsync.h update rs_sig_args() docstring to describe new behaviour. in sumset.h update docstring on rs_signature_init() to describe "zero gives defaults" behaviour. In sumset.c update rs_sig_args(). In tests/sumset_test.c add tests for rs_sig_args() and change rs_signature_init() tests to check behaviour for zero args.
The current version of this pull request has the following rdiff behaviour;
In practice this means it's not possible to set a strongsum size <5 for any file, or a strongsum size <12 for piped input. This might be a problem. People might really want to explicitly set the strongsum size to a small value, either because they know the piped input is small, or they really do want to risk hash collisions. Do we really want rdiff to treat the Also the minimum recommended strongsum of 5 bytes comes from assuming new_filesize=old_filesize+16MB and using +2bytes safety margin. The +16MB behaves like a miniumum +3bytes. If we dropped that the minimum would be 2bytes (for a 1byte filesize). With RabinKarp the number of useful bits in the weaksum is much improved... do we need the min strongsum length to be 5bytes? |
I'm not happy with this yet. I don't particularly like the changed meaning of rdiff's I'm happy with the behaviour of the I guess I need to think on this a bit more... feedback/advise are welcome. |
These are parsed by popt with `POPT_ARG_INT` so they should be int, and was probably a bug for non-little-endian platforms. Note they are automatically converted (without any compiler warnings) to size_t when they are passed to functions that take size_t arguments. Another alternative would have been to change the popt argument type, but it doesn't have an `POPT_ARG_*` definition for size_t, only `LONG` and `LONG_LONG` which would risk miss-matching the size_t type. In practice the int size for a platform are practical limits for these values anyway.
In librsync.h update docstrings and rename RS_DEFAULT_STRONG_LEN to RS_DEFAULT_MIN_STRONG_LEN. In rdiff.c update `--help` output. In sumset.c update rs_sig_args() to understand `strong_len=-1` and rs_log() a warning if strong_len is too small. Make old_fsize=0 not be special, and make rs_signature_init() use old_fsize=-1 (unknown) instead. In sumset.h update docstrings. Update signature.test and rename expected output files to use `-S=-1` for minimum strong sum size. Update sumset_test.c to reflect changes and better cover rs_signature_init() for different possible arguments.
I've changed it to use I think this is ready to merge. The rdiff behavior is unchanged from the master branch except that I'm going to merge this after I update NEWS.md |
In NEWS.md update with all changes applied so far. In CMakeLists.txt and librsync.spec bump the minor version to 2.3.0 to reflect addional rs_sig_args() and strong_len=-1 support.
For some reason the travis test results are not currently showing on github, but I can see them on travis-ci.org and they are passing clean. I'm going to merge this now. |
Hi, we use rdiff to check that our usage of the librsync is correct but in the recent past, rdiff has started to change its behaviour with each version jump, see rdiff-backup/rdiff-backup#304. Despite a typo, fixing the rollsum-algorithm was easy enough, but I'm at loss to guess the sum-size value which would give us the old behaviour. Can you help me out? |
@ericzolf I'm really sorry this caused you guys problems... replying on that bug for you. Always feel free to file a bug asking for assistance...I will generally respond pretty quickly. |
This adds an rs_sig_args() function to librsync.h that can be used to get recommended block_len and strong_len arguments to use for a signature from the old filesize. This is then used by the whole file operations and rdiff for the default --block-size and --sum-size arguments.
This uses the same algorithm used by rsync for the default block_len and and strong_len but is more conservative to reduce the risk of hash collisions (rsync has a whole-file checksum and multiple retries with different hash seeds to mitigate that, librsync doesn't).
This makes rdiff produce significantly smaller signature files and calculate much faster deltas by default while ensuring they are still safe against random hash collisions. However, it also makes it more vulnerable against hostile files crafted to produce hash collisions that would result in those files being corrupted.
There are a few minor tidy-ups I could make to this pull request before submitting it, but I'm posting it now in the hope of prompting discussion about "should rdiff default to safe+fast signature settings, or to paranoid safe settings", and if the proposed API is OK.
An alternative API would be to provide an alternative rs_sig_begin() method (rs_sig_begin2? rs_sig_begin_fs?) that also takes the old filesize to calculate default block_len and/or strong_len arguments if they are zero. Note this API approach could also be used to provide an alternative rs_sig_load() method that takes the signature file size for pre-allocating the blocksum storage instead of the currently non-public approach of setting the filesize in the job struct.