file_scan.c: don't use calloc() in csum_whole_file() #318
Merged
JackSlateur merged 1 commit into markfasheh:master from trofi:less-memset-overhead on Nov 2, 2023
The setup: create 100K files of 1024 bytes each, about 100MB of input:
echo "Creating directory structure, will take a minute"
mkdir dd
for d in `seq 1 100`; do
    mkdir dd/$d
    for f in `seq 1 1000`; do
        printf "%*s" 1024 "$f" > dd/$d/$f
    done
done
sync
Before the change this input took 40 seconds to process:
$ time ./duperemove -q -rd dd/
...
real 0m39,835s
user 1m54,903s
sys 0m8,922s
After the change the same run finishes in under 15 seconds, roughly a 2.7x wall-clock speedup (user CPU time drops from about 115 to 12 seconds):
$ time ./duperemove -q -rd dd/
...
real 0m14,616s
user 0m11,942s
sys 0m2,580s
The main overhead was a single `calloc(8MB)` call made for each file,
however small. The change should decrease this per-file setup overhead
when running against many small files.
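To illustrate the idea, here is a minimal sketch and not the actual patch. It assumes the replacement is a plain `malloc()` of the same buffer, which the PR title suggests but this page does not show. `READ_BUF_LEN`, `toy_checksum()`, and both `checksum_with_*()` helpers are hypothetical names, and `memcpy()` stands in for reading the file. The point is that the buffer is filled before it is read, so zeroing all 8MB up front for a 1KB file is wasted work:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer size, mirroring the 8MB figure quoted above. */
#define READ_BUF_LEN ((size_t)8 * 1024 * 1024)

/* Toy checksum standing in for the real hashing; not duperemove's algorithm. */
static uint32_t toy_checksum(const unsigned char *buf, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = sum * 31u + buf[i];
    return sum;
}

/* "Before" shape: calloc() may have to zero the whole 8MB buffer,
 * even though only the first `len` bytes are ever used. */
static int checksum_with_calloc(const unsigned char *data, size_t len,
                                uint32_t *out)
{
    assert(out != NULL);
    if (len > READ_BUF_LEN)
        return -1;
    unsigned char *buf = calloc(1, READ_BUF_LEN);
    if (buf == NULL)
        return -1;
    memcpy(buf, data, len);          /* stands in for reading the file */
    *out = toy_checksum(buf, len);
    free(buf);
    return 0;
}

/* "After" shape (assumed): malloc() leaves the buffer uninitialized.
 * That is safe here because only bytes written by the "read" are hashed. */
static int checksum_with_malloc(const unsigned char *data, size_t len,
                                uint32_t *out)
{
    assert(out != NULL);
    if (len > READ_BUF_LEN)
        return -1;
    unsigned char *buf = malloc(READ_BUF_LEN);
    if (buf == NULL)
        return -1;
    memcpy(buf, data, len);          /* stands in for reading the file */
    *out = toy_checksum(buf, len);
    free(buf);
    return 0;
}
```

Both helpers return the same checksum for the same input; the difference is only in how much memory gets touched per call. Whether `calloc()` really zeroes the full buffer on every call depends on the allocator, so treat the sketch as an explanation of the pattern rather than a benchmark.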
Collaborator
Thank you!