Use FastCDC to split chunk #31

Closed
wants to merge 1 commit into from


yuanjingsong

I tried using the FastCDC algorithm as the splitting algorithm and found that, on large files, it is 20%-30% faster than the original method.
In short, FastCDC is a fast chunking algorithm that performs better than traditional CDC (content-defined chunking) algorithms, so I tried using it to speed up the chunking phase.
The paper about FastCDC is here

@yuanjingsong
Author

It's expected that this can't pass the CI tests, because a different splitting algorithm changes the chunking result.

@fd0
Member

fd0 commented Oct 2, 2020

Thanks for your contribution! I'm sorry, this is not something we can merge. There are several programs (including restic, for which I built this package in 2014, two years before the FastCDC paper) which depend on deterministic output from this library. Changing the chunking algorithm is not possible.

I suggest you publish a chunker package yourself. Please don't use a fork; create a new repo for that. Feel free to use this package as a base, but then please remove every mention of the original chunking method ("Rabin CDC") from all source files, so users can clearly distinguish your new package from this one.

I'm closing this issue for now, please feel free to add further comments!

@fd0 fd0 closed this Oct 2, 2020