Performance Numbers ‐ v0.11
Here are some sample performance numbers for Duperemove v0.11. See this page for the Duperemove v0.10 numbers.
The tests are run on a Dell Precision T3610 workstation. You can find more information about the hardware and software setup here.
The kernel version is 4.7.0.
The version of duperemove used here is v0.11beta2 from the master branch.
My testing is limited by two technical factors:
- the total amount of fast disk space I have available (this is not a cost issue but one of technology and convenience)
- the data needs to be sourced from somewhere (I don't run a data center)
So I run two primary tests to get an idea of how Duperemove performs relative to previous versions.
The first test gives us a decent idea of file scan performance with a non-trivial number of files. There is a low level of duplication in this data set; however, the files come from a real-world source.
There are about 1.1 million files in the data set, totaling ~721 gigabytes of data. Of those files, duperemove finds ~129 thousand to be candidates for deduplication. The data itself is a very mixed set of documents (source code, papers, etc.) and media files (ISO images, music, movies, books).
The first two tests measure performance of the file hash and extent finding steps independent of each other. Finally we do a full combined run with dedupe to get a more realistic test.
weyoun2:~ # time duperemove -hr --io-threads=16 --write-hashes=/root/slash-home-pre-dedupe.dup /btrfs/ &> duperemove.log
real 24m21.103s
user 2m41.310s
sys 10m0.896s
weyoun2:~ # time duperemove -dvh --io-threads=16 --read-hashes=/root/slash-home-pre-dedupe.dup &> duperemove.log
real 18m22.071s
user 1m42.245s
sys 27m16.076s
The numbers up to this point just break down the first two steps for informational purposes. For the full run we reboot so there is no disk cache present; this is representative of what a user would actually experience running duperemove against this data set. I saved the output to a file to check for errors.
weyoun2:~ # time duperemove -drh --io-threads=16 --hashfile=test.dup /btrfs/ &> full_run.txt
real 42m13.699s
user 3m58.341s
sys 32m4.163s
So, for this run, duperemove took about 42 minutes to hash and dedupe 721 gigabytes of data. We deduped a total of 79G.
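As a quick sanity check on those numbers, the end-to-end throughput of the full run works out to roughly 290 MiB/s (treating the 721G figure as GiB):

```python
# Full mixed-data run: 721 GiB hashed and deduped in 42m13.699s.
total_bytes = 721 * 1024**3
elapsed_s = 42 * 60 + 13.699
throughput_mib_s = total_bytes / elapsed_s / 1024**2
print(f"{throughput_mib_s:.0f} MiB/s")  # ~291 MiB/s
```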
Unlike our previous test, the data in this test is entirely synthetic. In many ways, however, this test is far more realistic in terms of what I have learned by talking to users of Duperemove, often via the Duperemove issue tracker.
The data itself is about 40% zeros, with the remaining blocks filled from a random rotating pattern. There are 112 8G files, making 898G of data in all. We wind up deduping around 385G.
Scan performance is just about where we expect it:
weyoun2:~ # time duperemove -rh --io-threads=16 --write-hashes=test.dup /btrfs/ &> duperemove.log
real 28m1.884s
user 3m12.104s
sys 4m14.704s
This time we'll proceed directly to the full run. We have scan performance and can work back from there to isolate the dedupe stage. We will see that this takes longer than the mixed data test. The reason for this is the much higher count of duplicate blocks that need to be processed.
Our full run:
weyoun2:~ # time duperemove --io-threads=16 -rdh --dedupe-options=nofiemap --hashfile=test.dup /btrfs &> duperemove.log
real 164m28.000s
user 359m12.216s
sys 25m46.192s
Note: I had to turn fiemap off for this test due to a Btrfs bug.
So, for this run, duperemove took about 2 hours and 44 minutes to hash and dedupe 898 gigabytes of data. We deduped a total of 385G.
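The same back-of-the-envelope throughput calculation as before (again treating G as GiB) shows how much harder the high-duplication data set is to process than the mixed one:

```python
# Synthetic run: 898 GiB in 164m28s; mixed run: 721 GiB in 42m13.699s.
synthetic_mib_s = 898 * 1024 / (164 * 60 + 28.0)
mixed_mib_s = 721 * 1024 / (42 * 60 + 13.699)
print(f"synthetic: {synthetic_mib_s:.0f} MiB/s, mixed: {mixed_mib_s:.0f} MiB/s")
```

Scan throughput is similar in both tests, so nearly all of the difference comes from the post-scan stages.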
Most of the time spent here was in finding extents from our duplicate blocks. The actual dedupe stage takes roughly 50 minutes. Here's the extent finding stage, isolated:
weyoun2:~ # time duperemove --cpu-threads=8 -h --read-hashes=test.dup &> duperemove.log
real 88m18.711s
user 352m39.888s
sys 0m11.220s
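Subtracting the separately measured scan and extent-finding times from the full run isolates the dedupe stage, and the result lands close to the ~50 minutes quoted above:

```python
def seconds(minutes, secs):
    return minutes * 60 + secs

full_run = seconds(164, 28.000)    # hash + extent find + dedupe
scan = seconds(28, 1.884)          # hash stage, measured on its own
extent_find = seconds(88, 18.711)  # extent-finding stage, isolated above
dedupe_min = (full_run - scan - extent_find) / 60
print(f"dedupe stage: ~{dedupe_min:.0f} minutes")  # ~48 minutes
```

This is only approximate, since the isolated runs were timed separately and cache state differs between them, but it agrees with the rough 50-minute figure.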
Block dedupe mode skips the extent finding stage and proceeds directly to deduping from our block hashes. This gives us even more performance, at the expense of no longer combining some of our dedupes into larger extents. There are more specifics on block dedupe in the man page.
weyoun2:~ # time duperemove --io-threads=16 -rdh --dedupe-options=nofiemap,block --hashfile=block-dedupe.dup /btrfs/ &> duperemove.log
real 66m42.962s
user 3m47.076s
sys 68m38.044s
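Comparing wall-clock times, block dedupe mode finishes this data set about 2.5x faster than the extent-finding run above:

```python
extent_mode_s = 164 * 60 + 28.0  # full run with extent finding
block_mode_s = 66 * 60 + 42.962  # block dedupe mode
print(f"speedup: {extent_mode_s / block_mode_s:.1f}x")  # ~2.5x
```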
Duperemove v0.10 does not complete this test in a reasonable amount of time. (Note: check without fiemap.)