Multi-stream rsync wrapper
Switch branches/tags
Nothing to show
Clone or download
jbd Merge pull request #13 from madmax2012/master
Always use python2 interpreter
Latest commit cb65f45 Nov 11, 2018
Failed to load latest commit information.
.gitignore Initial release May 17, 2015
LICENSE Initial release May 17, 2015
Makefile Initial release May 17, 2015 Markdow typo Oct 12, 2018
msrsync Always use python2 interpreter Nov 11, 2018

msrsync: maximize rsync bandwidth usage

msrsync (multi-stream rsync) is a python wrapper around rsync. It only depends on python >= 2.6 and rsync.

It will split the transfer in multiple buckets while the source is scanned and will hopefully help maximizing the usage of the available bandwidth by running a configurable number of rsync processes in parallel. The main limitation is it does not handle remote source or target directory, they must be locally accessible (local disk, nfs/cifs/other mountpoint). I hope to address this in a near future.

Quick example

$ msrsync -p 4 /source /destination # you can also use -P/--progress and --stats options

This will copy /source directory in the /destination directory (same behaviour as rsync regarding the slash handling) using 4 rsync processes (using "-aS --numeric-ids" as default option. Could be override with --rsync option). msrsync will split the files and directory list into bucket of 1G or 1000 files maximum (see --size and --files options) before feeding them to each rsync process in parallel using the --files-from option. As long as the source and the destination can cope with the parallel I/O (think big boring "enterprise grade" NAS), it should be faster than a single rsync.

msrsync shares the same spirit as fpart (and its fpsync associated tool) by Ganaël Laplanche or parsync by Harry Mangalam. Those are two fantastic much more complete tools used in the field to do real work. Please check them out, they might be what you're looking for.

You can also check fcp from the pcircle project. It looks very powerful. See the associated publication.


Why write msrsync if tools like fpart, parsync or pftool exist ? While reasonable, their dependencies can be a point of friction given the constraints we can have on a given system. When you're lucky, you can use your package manager (fpart seems to be well supported among various GNU/Linux and FreeBSD distribution: FreeBSD, Debian, Ubuntu, Archlinux, OBS) to deal with the requirements but more often than not, I found myself struggling with the sad state of the machine I'm working with.

That's why the only dependencies of msrsync are python >=2.6 and rsync. What python 2.6 ? I'm aiming RHEL6 like distribution as a minimum requirement here, so I'm stuck with python 2.6. I miss some cool features, but that's part of the project.

The devil is in the details. If you need a starting point to think about data migration, this overview by Jeff Layton is very informative: Moving Your Data – It’s Not Always Pleasant.

The "How to transfer large amounts of data via network" article by parsync author is updated regularly and its worth a read also.

If you can read french, I co-wrote an article with Ganaël Laplanche about fpart : Parallélisez vos transferts de fichiers.

You might be also interested by this Intel whitepaper on data migration : Data Migration with Intel® Enterprise Edition for Lustre* Software which mentions all of those tools (but not msrsync).


python >= 2.6 and rsync


msrsync is a single python file, you just have to download it. Or if you prefer, you can clone the repository and use the provided Makefile:

$ wget && chmod +x msrsync


$ git clone && cd msrsync && sudo make install


$ msrsync --help
usage: msrsync [options] [--rsync "rsync-options-string"] SRCDIR [SRCDIR2...] DESTDIR
   or: msrsync --selftest

msrsync options:
    -p, --processes ...   number of rsync processes to use [1]
    -f, --files ...       limit buckets to <files> files number [1000]
    -s, --size ...        limit partitions to BYTES size (1024 suffixes: K, M, G, T, P, E, Z, Y) [1G]
    -b, --buckets ...     where to put the buckets files (default: auto temporary directory)
    -k, --keep            do not remove buckets directory at the end
    -j, --show            show bucket directory
    -P, --progress        show progress
    --stats               show additional stats
    -d, --dry-run         do not run rsync processes
    -v, --version         print version

rsync options:
    -r, --rsync ...       MUST be last option. rsync options as a quoted string ["-aS --numeric-ids"]. The "--from0 --files-from=... --quiet --verbose --stats --log-file=..." options will ALWAYS be added, no
                            matter what. Be aware that this will affect all rsync *from/filter files if you want to use them. See rsync(1) manpage for details.

self-test options:
    -t, --selftest        run the integrated unit and functional tests
    -e, --bench           run benchmarks
    -g, --benchshm        run benchmarks in /dev/shm or the directory in $SHM environment variable

If you want to use specific options for the rsync processes, use the --rsync option.

$ msrsync -p4 --rsync "-a --numeric-ids --inplace" source destination

Some examples:

$ msrsync -p 8 /usr/share/doc/ /tmp/doc/
$ msrsync -P -p 8 /usr/share/doc/ /tmp/doc/
[33491/33491 entries] [602.1 M/602.1 M transferred] [3378 entries/s] [60.7 M/s bw] [monq 1] [jq 1]
$ msrsync -P -p 8 --stats /usr/share/doc/ /tmp/doc/
[33491/33491 entries] [602.1 M/602.1 M transferred] [3533 entries/s] [63.5 M/s bw] [monq 1] [jq 1]
Working directory: /home/jbdenis/Code/msrsync
Command line: ./msrsync -P -p 8 --stats /usr/share/doc/ /tmp/doc/
Total size: 602.1 M
Total entries: 33491
Buckets number: 34
Mean entries per bucket: 985
Mean size per bucket: 17.7 M
Entries per second: 3533
Speed: 63.5 M/s
Rsync workers: 8
Total rsync's processes (34) cumulative runtime: 73.0s
Crawl time: 0.4s (4.3% of total runtime)
Total time: 9.5s


You can launch a benchmark using the --bench option or make bench. It is only for testing purpose. They are comparing the performance between vanilla rsync and msrsync using multiple options. Since I'm just creating a huge fake file tree with empty files, you won't see any msrsync benefits here, unless you're trying with many many files. They need to be run as root since I'm dropping disk cache between run.

$ sudo make bench # or sudo msrsync --bench
Benchmarks with 100000 entries (95% of files):
rsync -a --numeric-ids took 14.05 seconds (speedup x1)
msrsync --processes 1 --files 1000 --size 1G took 18.58 seconds (speedup x0.76)
msrsync --processes 2 --files 1000 --size 1G took 10.61 seconds (speedup x1.32)
msrsync --processes 4 --files 1000 --size 1G took 6.60 seconds (speedup x2.13)
msrsync --processes 8 --files 1000 --size 1G took 6.58 seconds (speedup x2.14)
msrsync --processes 16 --files 1000 --size 1G took 6.66 seconds (speedup x2.11)

Please test on real data instead =). There is also a --benchshm option that will perform the benchmark in /dev/shm.

Here is a real test on a big nas box (not known for handling small files well) on a 1G network (you'll see that is more than useless due to the I/O overhead) with the linux 4.0.4 kernel decompressed source 21 times in different folders:

$ ls /mnt/nfs/linux-src/
0  1  10  11  12  13  14  15  16  17  18  19  2  20  3  4  5  6  7  8  9
$ du -s --apparent-size --bytes /mnt/nfs/linux-src
11688149821     /mnt/nfs/linux-src
$ du -s --apparent-size --human /mnt/nfs/linux-src
11G     /mnt/nfs/linux-src
$ find /mnt/nfs/linux-src -type f | wc -l
$ find /mnt/nfs/linux-src -type d | wc -l

The source and the destination are on an nfs mount.

Let's run rsync and msrsync with a various number of process:

$ rm -rf /mnt/nfs/dest
$ echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null
$ time rsync -a --numeric-ids /mnt/nfs/linux-src /mnt/nfs/dest

real    136m10.406s
user    1m54.939s
sys     7m31.188s

$ rm -rf /mnt/nfs/dest
$ echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null
$ msrsync -p 1 /mnt/nfs/linux-src /mnt/nfs/dest

real    144m8.954s
user    2m20.426s
sys     8m4.127s

$ rm -rf /mnt/nfs/dest
$ echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null
$ msrsync -p 2 /mnt/nfs/linux-src /mnt/nfs/dest

real    73m57.312s
user    2m27.543s
sys     7m56.484s

$ rm -rf /mnt/nfs/dest
$ echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null
$ msrsync -p 4 /mnt/nfs/linux-src /mnt/nfs/dest

real    42m31.105s
user    2m24.196s
sys     7m46.568s

$ rm -rf /mnt/nfs/dest
$ echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null
$ msrsync -p 8 /mnt/nfs/linux-src /mnt/nfs/dest

real    36m55.141s
user    2m27.149s
sys     7m40.392s

$ rm -rf /mnt/nfs/dest
$ echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null
$ msrsync -p 16 /mnt/nfs/linux-src /mnt/nfs/dest

real    33m0.976s
user    2m35.848s
sys     7m40.623s

Ridiculous rates due to the size of each file and the I/O overhead (nfs + network), but that's a real use case and we've got nice speedup without too much thinking : just use msrync and you're good to go. That's exactly what I wanted. Here is a summary of the previous results:

 Command Time Entries per second Bandwidth (MBytes/s)  Speedup
rsync 136m10s 133 1.36 x1
msrsync -p 1 144m9s 126 1.28 x0.94
msrsync -p 2 73m57s 246 2.51 x1.84
msrsync -p 4 42m31s 428 4.36 x3.20
msrsync -p 8 36m55s 494 5.03 x3.68
msrsync -p 16 33m0s 552 5.62 x4.12

Astute readers will notify the slight overhead of msrync over the equivalent rsync in the single process case. This overhead vanishes (but still exists) when you increase processes number.


  • The rsync processes are always run with the --from0 --files-from=... --quiet --verbose --stats --log-file=... options, no matter what. --from0 option affects --exclude-from, --include-from, --files-from, and any merged files specified in a --filter rule.

  • This may seem obvious but if the source or the destination of the copy cannot handle parallel I/O well, you won't see any benefits (quite the opposite in fact) using msrsync.


I'm targeting python 2.6 without external dependencies besides rsync. The provided Makefile is just an helper around the embedded testing and

$ make help
Please use `make <target>' where <target> is one of
  clean         => clean all generated files
  cov           => coverage report using /usr/bin/python-coverage (use COVERAGE env to change that)
  covhtml       => coverage html report
  man           => build manpage
  test          => run embedded tests
  install       => install msrsync in /usr/bin (use DESTDIR env to change that)
  lint          => run pylint
  bench         => run benchmarks (linux only. Need root to drop buffer cache between run)
  benchshm      => run benchmarks using /dev/shm (linux only. Need root to drop buffer cache between run)

There is an integrated test suite (--selftest option, or make test). Since I'm using unittest from python 2.6 library, I cannot capture the output of the tests (buffer parameter from TestResult object appeared in 2.7).

$ make test # or msrsync --selftest
test_get_human_size (__main__.TestHelpers)
convert bytes to human readable string ... ok
test_get_human_size2 (__main__.TestHelpers)
convert bytes to human readable string ... ok
test_human_size (__main__.TestHelpers)
convert human readable size to bytes ... ok
test simple msrsync synchronisation ... ok
test_msrsync_cli_2_processes (__main__.TestSyncCLI)
test simple msrsync synchronisation ... ok
test_msrsync_cli_4_processes (__main__.TestSyncCLI)
test simple msrsync synchronisation ... ok
test_msrsync_cli_8_processes (__main__.TestSyncCLI)
test simple msrsync synchronisation ... ok
test_simple_msrsync_cli (__main__.TestSyncCLI)
test simple msrsync synchronisation ... ok
test_simple_rsync (__main__.TestSyncCLI)
test simple rsync synchronisation ... ok

Ran 29 tests in 3.320s