Comparison with wimlib #16
Hi @chungy, thanks for letting me know about wimlib! I'll be sure to include it in the benchmarks when I get some time. I'm looking at your timings: that's 4 solid hours. Whoa! Also, the actual CPU time spent deserves a closer look. So here we go:
By solid archive you presumably mean that in order to access a single file, you'd have to extract the whole archive, at least up to the point where the file you're looking for has been decompressed? I think DwarFS is somewhere in the middle between a "classic" archive (each file compressed individually) and a solid archive. What it does is split the input into fixed-size blocks, deduplicate and similarity-order the data, and then compress each block individually. So accessing a single file only requires decompressing the block(s) that hold that file's data, not the whole archive.
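As a toy sketch of that middle ground (everything below is made up for illustration; the real DwarFS format is far more sophisticated): chop files into fixed-size blocks, store each unique block once, and compress blocks individually, so reading one file only ever touches its own blocks.

```shell
# Toy sketch of block-based dedup with per-block compression.
# All names and the 4 KiB block size are made up for illustration.
store=$(mktemp -d)
data=$(mktemp -d)
printf 'A%.0s' $(seq 4096) > "$data/f1"      # 4 KiB of 'A'
cat "$data/f1" "$data/f1" > "$data/f2"       # 8 KiB sharing f1's content
for f in "$data"/*; do
  split -b 4096 "$f" "$store/tmp."           # cut into fixed-size blocks
  for blk in "$store"/tmp.*; do
    h=$(sha256sum "$blk" | cut -d' ' -f1)
    # store each unique block exactly once, compressed on its own
    [ -e "$store/$h.gz" ] || gzip -c "$blk" > "$store/$h.gz"
    rm "$blk"
  done
done
echo "unique compressed blocks: $(ls "$store" | wc -l)"
```

To read `f2`, a reader decompresses only the one block it references (twice), never anything else; in a solid archive, everything before `f2` in the stream would have to be decompressed first.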
If you enable metadata compression, in this particular case you'll get a much smaller image, as the metadata compresses surprisingly well. Typically it shrinks to around 40%-50% of its original size, but in this instance it shrinks down to 20% and saves around 50 MiB. The only drawback of metadata compression is a potentially slower mount time for the file system, and in this instance it seems to be well worth it.
I did some more experiments with regards to scanning speed. I copied all the data to an SD card first. Now, I wanted to ensure that I'm not doing something ridiculously stupid in the scanning code, so I timed a plain sequential read of all the files as a "gold standard" for how fast it is possible to consume all the data from the SD card. This is significantly longer than the 3 minutes it took from fast local storage. After unmounting/mounting the SD card, I re-ran the scan.

I stopped it right there, as that's when all the files have been scanned. That's the step that took 4 hours in your case. Here, it takes a bit more than 50 minutes (with ~20% less data), which is pretty much the same as the "gold standard" read. That all being said, I'm now quite certain that the slowness you saw wasn't inherent to DwarFS.
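The "gold standard" idea can be reproduced with stock tools: stream every byte of the tree once, sequentially, and discard it; no scanner can go faster than that. The paths below are placeholders (a small throwaway tree stands in for the SD card mount):

```shell
# Build a throwaway tree standing in for the SD card contents,
# then time a sequential read of every byte, discarding the output.
src=$(mktemp -d)                       # stand-in for e.g. /mnt/sdcard
seq 1 200000 > "$src/sample.dat"
mkdir "$src/sub" && seq 1 50000 > "$src/sub/more.dat"
time tar cf - -C "$src" . > /dev/null  # sequential read of everything
```

On real hardware, whatever wall-clock time this prints is the floor any scanning tool has to be measured against.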
Hi @chungy, I just did some more reading through the wimlib documentation. I'll do some more comparisons with matching block sizes and will add them to the documentation.
I added some wimlib comparisons & benchmarks to the documentation for the next release: https://github.com/mhx/dwarfs/tree/next-release#with-wimlib
Now in the main branch: https://github.com/mhx/dwarfs#with-wimlib |
FWIW, the latest release is about 5 times faster and also creates a significantly smaller image. With more aggressive similarity ordering, the file size shrinks even more; the numbers are in the linked benchmarks.
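For reference, "aggressive similarity ordering" refers to mkdwarfs' `--order` option. The sketch below is from memory and for illustration only; treat the exact flag spellings and values as assumptions and check `mkdwarfs --help` for the version at hand:

```shell
# Hypothetical invocation: nilsimsa-based similarity ordering at the
# highest compression level. Flag names/values are assumptions.
mkdwarfs -i /mnt/wine -o wine.dwarfs -l 9 --order nilsimsa
```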
DwarFS seems pretty nice. File access within the archive is quick. The main feature seems to be deduplication. Judging by the resulting file sizes, I'm guessing this is based on whole-file deduplication rather than being block-based?

Downside: DwarFS images seem slow to create, compared to both wimlib's wimcapture and squashfs.
Testing with a copy of every released Wine version, extracted by doing:

```shell
for tag in $(git tag); do git archive --prefix=$tag/ $tag | tar -xC /mnt/wine; done
```

(requires, naturally, the Wine git repository.)

wimlib is significantly faster to create this massive archive than DwarFS, and the resulting file size is marginally smaller. Git itself stores the Wine history in about 310 MB, though that's not the fairest of comparisons, given git's delta-based storage and the fact that it also includes every interim commit between the releases.
DwarFS still beats out this particular WIM archive for performance as a mounted file system, because I used solid compression, and random access in wimlib is not fast in this circumstance. I also think (correct me if I'm wrong!) that a solid archive was the better comparison, since DwarFS seems to group similar files together and compress them as one unit (311 blocks in this particular file system). wimcapture's non-solid mode compresses each stream individually; random access becomes much quicker, but the archive size balloons up to 2.4 GB.
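The tradeoff described above is easy to reproduce with any stream compressor. Here gzip stands in for wimlib's codecs, and three generated files stand in for near-identical Wine releases (all names and sizes are made up for illustration):

```shell
# Three near-identical "releases": mostly shared content, tiny diffs.
demo=$(mktemp -d)
for i in 1 2 3; do
  { seq 1 2000; echo "only in release $i"; } > "$demo/wine-$i.txt"
done
# Non-solid: each file is its own stream; cross-file redundancy is lost.
per_file=0
for f in "$demo"/wine-*.txt; do
  sz=$(gzip -c "$f" | wc -c)
  per_file=$((per_file + sz))
done
# Solid: one stream over all files, so shared content compresses away,
# but reading release 3 means decompressing releases 1 and 2 first.
solid=$(cat "$demo"/wine-*.txt | gzip -c | wc -c)
echo "non-solid total: $per_file bytes, solid: $solid bytes"
```

DwarFS's 311 blocks sit between these extremes: each block is solid internally, but a read only ever decompresses the blocks it actually needs.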