better streaming implementation of fastcdc #15

Closed
tych0 opened this issue Feb 16, 2021 · 1 comment
tych0 (Contributor) commented Feb 16, 2021

Right now the fastcdc implementation leaves a little to be desired. We really want a callback each time a chunk is created, so we can write the chunk out and purge its buffer. Since we don't have those, we end up storing all of a file's chunks in memory until we hit a file boundary. This means that if e.g. a file is 5GB, we'll allocate 5GB of memory to chunk it when we really don't need to.

However, with the current design the most we'll ever allocate is the size of the largest file, so maybe it's OK for now. For "normal" sized files, the max allocation is the size of the largest possible chunk.
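To make the callback idea concrete, here is a minimal sketch using only the standard library. It is not FastCDC and not what puzzlefs does: `chunk_stream`, `gear`, and the `min`/`mask`/`max` parameters are hypothetical names, and the rolling hash is a simplified gear-style stand-in. The point is only the shape: each chunk is handed to a callback as soon as its boundary is found, so the buffer never grows past the maximum chunk size regardless of file size.

```rust
use std::io::{self, Read};

/// Hypothetical helper (not part of puzzlefs or the fastcdc crate).
/// Reads `source` incrementally, cuts content-defined chunks with a
/// simplified gear-style rolling hash, and passes each finished chunk to
/// `on_chunk`. Memory use is bounded by `max` plus one small read buffer.
/// Assumes 1 <= min <= max.
pub fn chunk_stream<R, F>(
    mut source: R,
    min: usize,
    mask: u64,
    max: usize,
    mut on_chunk: F,
) -> io::Result<()>
where
    R: Read,
    F: FnMut(&[u8]) -> io::Result<()>,
{
    let mut pending: Vec<u8> = Vec::with_capacity(max); // chunk in progress
    let mut hash: u64 = 0;
    let mut buf = [0u8; 8192];
    loop {
        let n = source.read(&mut buf)?;
        if n == 0 {
            break; // end of input
        }
        for &byte in &buf[..n] {
            pending.push(byte);
            hash = (hash << 1).wrapping_add(gear(byte));
            // Cut when the hash hits the mask (after `min` bytes),
            // or unconditionally once the chunk reaches `max` bytes.
            let at_boundary = pending.len() >= min && (hash & mask) == 0;
            if at_boundary || pending.len() >= max {
                on_chunk(&pending)?; // caller writes the chunk out
                pending.clear();     // buffer is reused for the next chunk
                hash = 0;
            }
        }
    }
    if !pending.is_empty() {
        on_chunk(&pending)?; // trailing partial chunk
    }
    Ok(())
}

/// Stand-in for a gear table: spreads one byte across 64 bits.
fn gear(byte: u8) -> u64 {
    (byte as u64 + 1).wrapping_mul(0x9E37_79B9_7F4A_7C15)
}
```

A caller would pass a closure that writes each chunk to the blob store, e.g. `chunk_stream(file, min, mask, max, |chunk| store.write_all(chunk))`, where `store` is whatever writer is in use.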

ariel-miculas (Collaborator) commented

Version 3.0 of fastcdc supports streaming: https://docs.rs/fastcdc/latest/fastcdc/v2020/struct.StreamCDC.html

@ariel-miculas ariel-miculas self-assigned this Feb 12, 2023
ariel-miculas added a commit to ariel-miculas/puzzlefs that referenced this issue Feb 12, 2023
Changed FastCDC parameters to
minimum chunk size: 1MB
average chunk size: 4MB
maximum chunk size: 16MB
due to v2020 library constraints

Fixes project-machine#15

Signed-off-by: Ariel Miculas <amiculas@cisco.com>