Performance Numbers ‐ v0.11
Here are some sample performance numbers for Duperemove v0.11. See this page for the Duperemove v0.10 numbers.
The tests are run on a Dell Precision T3610 workstation. You can find more information about the hardware and software setup here.
The kernel version is 4.7.0.
The version of duperemove used here is v0.11beta2 from the master branch.
My testing is limited by two technical factors:
- the total amount of fast disk space I have available (this is not a cost issue but one of technology and convenience)
- the data needs to be sourced from somewhere (I don't run a data center)
So I run two primary tests to get an idea of how Duperemove performs relative to previous versions.
The first test gives us a decent idea of file scan performance with a non-trivial number of files. There is a low level of duplication in this data set; however, the files come from a real-world source.
There are about 1.1 million files in the data set, totaling ~721 gigabytes of data. Of those files, duperemove finds ~129 thousand to be candidates for deduplication. The data itself is a very mixed set of documents (source code, papers, etc.) and media files (ISO images, music, movies, books).
The first two tests measure performance of the file hash and extent finding steps independent of each other. Finally we do a full combined run with dedupe to get a more realistic test.
weyoun2:~ # time duperemove -hr --io-threads=16 --write-hashes=/root/slash-home-pre-dedupe.dup /btrfs/ &> duperemove.log
real 24m21.103s
user 2m41.310s
sys 10m0.896s
weyoun2:~ # time duperemove -dvh --io-threads=16 --read-hashes=/root/slash-home-pre-dedupe.dup &> duperemove.log
real 18m22.071s
user 1m42.245s
sys 27m16.076s
The numbers up to this point just break down the first two steps for informational purposes. For the full run we reboot so there is no disk cache present; this is representative of what a user would actually experience running duperemove against this data set. I saved the output to a file to check for errors.
weyoun2:~ # time duperemove -drh --io-threads=16 --hashfile=test.dup /btrfs/ &> full_run.txt
real 42m13.699s
user 3m58.341s
sys 32m4.163s
So, for this run, duperemove took about 42 minutes to hash and dedupe 721 gigabytes of data. We deduped a total of 79G.
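As a quick sanity check on those numbers, the end-to-end throughput of the full run works out to roughly 290 MiB/s (treating the 721G figure as GiB):

```python
# Full mixed-data run: 721 GiB hashed and deduped in 42m13.699s.
total_bytes = 721 * 1024**3
elapsed_s = 42 * 60 + 13.699
throughput_mib_s = total_bytes / elapsed_s / 1024**2
print(f"{throughput_mib_s:.0f} MiB/s")  # ~291 MiB/s
```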
Unlike our previous test, the data in this test is entirely synthetic. In many ways, however, this test is far more realistic in terms of what I have learned by talking to users of Duperemove, often via the Duperemove issue tracker.
The data itself is about 40% zeros, with the remaining blocks filled from a random rotating pattern. There are 112 8G files, making 898G of data in all. We wind up deduping around 385G.
Scan performance is just about where we expect it:
weyoun2:~ # time duperemove -rh --io-threads=16 --write-hashes=test.dup /btrfs/ &> duperemove.log
real 28m1.884s
user 3m12.104s
sys 4m14.704s
This time we'll proceed directly to the full run. We have scan performance and can work back from there to isolate the dedupe stage. We will see that this takes longer than the mixed data test. The reason for this is the much higher count of duplicate blocks that need to be processed.
Our full run:
weyoun2:~ # time duperemove --io-threads=16 -rdh --dedupe-options=nofiemap --hashfile=test.dup /btrfs &> duperemove.log
real 164m28.000s
user 359m12.216s
sys 25m46.192s
Note: I had to turn fiemap off for this test due to a Btrfs bug.
So, for this run, duperemove took about 2 hours and 44 minutes to hash and dedupe 898 gigabytes of data. We deduped a total of 385G.
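The same back-of-the-envelope throughput calculation as before (again treating G as GiB) shows how much harder the high-duplication data set is to process than the mixed one:

```python
# Synthetic run: 898 GiB in 164m28s; mixed run: 721 GiB in 42m13.699s.
synthetic_mib_s = 898 * 1024 / (164 * 60 + 28.0)
mixed_mib_s = 721 * 1024 / (42 * 60 + 13.699)
print(f"synthetic: {synthetic_mib_s:.0f} MiB/s, mixed: {mixed_mib_s:.0f} MiB/s")
```

Scan throughput is similar in both tests, so nearly all of the difference comes from the post-scan stages.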
Most of the time spent here was in finding extents from our duplicate blocks. The actual dedupe stage takes roughly 50 minutes. Here's the extent finding stage, isolated:
weyoun2:~ # time duperemove --cpu-threads=8 -h --read-hashes=test.dup &> duperemove.log
real 88m18.711s
user 352m39.888s
sys 0m11.220s
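Subtracting the separately measured scan and extent-finding times from the full run isolates the dedupe stage, and the result lands close to the ~50 minutes quoted above:

```python
def seconds(minutes, secs):
    return minutes * 60 + secs

full_run = seconds(164, 28.000)    # hash + extent find + dedupe
scan = seconds(28, 1.884)          # hash stage, measured on its own
extent_find = seconds(88, 18.711)  # extent-finding stage, isolated above
dedupe_min = (full_run - scan - extent_find) / 60
print(f"dedupe stage: ~{dedupe_min:.0f} minutes")  # ~48 minutes
```

This is only approximate, since the isolated runs were timed separately and cache state differs between them, but it agrees with the rough 50-minute figure.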
Block dedupe mode skips the extent finding stage and proceeds directly to deduping from our block hashes. This gives us even more performance, at the expense of no longer combining some of our dedupes into larger extents. There are more specifics on block dedupe in the man page.
weyoun2:~ # time duperemove --io-threads=16 -rdh --dedupe-options=nofiemap,block --hashfile=block-dedupe.dup /btrfs/ &> duperemove.log
real 66m42.962s
user 3m47.076s
sys 68m38.044s
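Comparing wall-clock times, block dedupe mode finishes this data set about 2.5x faster than the extent-finding run above:

```python
extent_mode_s = 164 * 60 + 28.0  # full run with extent finding
block_mode_s = 66 * 60 + 42.962  # block dedupe mode
print(f"speedup: {extent_mode_s / block_mode_s:.1f}x")  # ~2.5x
```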
Duperemove v0.10 does not complete this test in a reasonable amount of time. (Note: check without fiemap.)