I benchmarked desync and alternatives #243

Closed
safinaskar opened this issue Jul 29, 2023 · 7 comments

Comments

@safinaskar

So here is my own simplistic parallel alternative to casync/desync, written in Rust, which uses fixed-size chunking (which works well for VM images): borgbackup/borg#7674 (comment). That comment also contains a benchmark comparing my tool to casync, desync and other alternatives. My tool is much faster than all of them (but I cheat by using fixed-size chunking). See the whole issue for context, and especially this comment, borgbackup/borg#7674 (comment), for a comparison between casync, desync and other CDC-based tools.

As you can see from the benchmark, desync performs well compared to other CDC-based tools. Still, borg is significantly faster when compressing (despite borg not being parallel!)
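
For readers unfamiliar with the term, fixed-size chunking can be sketched roughly as follows. This is an illustrative sketch only, not the code of the Rust tool mentioned above or of desync. The function names, the in-memory `store` dictionary and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib


def fixed_size_chunks(data: bytes, chunk_size: int):
    """Yield consecutive chunk_size-byte slices; the last one may be shorter."""
    for offset in range(0, len(data), chunk_size):
        yield data[offset:offset + chunk_size]


def dedup_store(data: bytes, chunk_size: int, store: dict) -> list:
    """Save each unique chunk under its SHA-256 digest and return the index.

    `store` is a hypothetical in-memory chunk store (digest -> chunk bytes);
    a real tool would write compressed chunks to disk instead.
    """
    index = []
    for chunk in fixed_size_chunks(data, chunk_size):
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # identical chunks are kept only once
        index.append(digest)
    return index


def restore(index: list, store: dict) -> bytes:
    """Rebuild the original data by concatenating chunks in index order."""
    return b"".join(store[digest] for digest in index)
```

Because chunk boundaries sit at fixed offsets, no rolling hash has to be computed over the input, which is one reason this approach can be cheaper than CDC. The trade-off is that inserting a single byte near the start of a file shifts every later boundary, so later chunks no longer match previously stored ones.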

@safinaskar
Author

Also, as you can see, the (CDC-based) rdedup clearly beats desync at compression with a chunk size of 4096 KiB.

@safinaskar
Author

Okay, so here is the list of GitHub issues I wrote in the last few days on this topic (i.e. fast fixed-size and CDC-based deduplication). I hope they provide great insight to everyone interested in fast deduplicated storage.
borgbackup/borg#7674
systemd/casync#259
#243
ipfs/specs#227
dpc/rdedup#222
opencontainers/umoci#256

@charles-dyfis-net
Collaborator

charles-dyfis-net commented Jul 30, 2023

Are you sure this is an appropriate topic for GitHub issues at all? A blog or forum post might make more sense; I don't see anything to fix here, and things that need to be fixed are the beginning and end of what issues are for.

@safinaskar
Author

@charles-dyfis-net, borg is faster at compressing than desync, even though borg is single-threaded and desync is parallel, and both use the same settings (same chunking method and same compression). This means that desync has some bug that needs to be fixed.

@folbricht
Owner

I'm having a hard time understanding the methodology to see what's even being compared and what the output is supposed to show, and I'm somewhat skeptical that single-thread decompression (is that what you tried to test?) would be faster than multithreaded decompression for reasonably-sized data. If you believe there's an issue, please provide a bit more evidence, perhaps code snippets or a way to repro, ideally in a new issue.

@charles-dyfis-net
Collaborator

charles-dyfis-net commented Jul 30, 2023

While it's not my final say, I'm not convinced that the existence of more performant competitors constitutes a bug. There's more than one axis of competition; runtime is only one, and not necessarily the most important one.

The impetus for desync's initial creation arguably had more to do with robustness, implementation (and error handling) clarity, safety/security (moving to a memory-safe language) and compatibility with casync as an existing tool than with performance -- better performance than casync was a happy side effect, not a primary goal, and we would have gone ahead with desync even had it been slower.

If there's a specific known respect in which the performance of desync's implementation can be improved without compromising maintainability, error recovery, robustness, etc -- that's great! But without much more work into root-causing the difference and analyzing the tradeoffs, this ticket strikes me as much more theoretical than actionable.

@safinaskar
Author

@folbricht:

and I'm somewhat skeptical that single-thread decompression

I'm talking about compression (i.e. desync make), not decompression.

Okay, I created an issue with exact reproduction steps, as requested: #244. I'm closing this issue.
