file_scan.c: don't use calloc() in csum_whole_file() #318

Merged: JackSlateur merged 1 commit into markfasheh:master from trofi:less-memset-overhead on Nov 2, 2023

trofi (Contributor) commented Nov 1, 2023

The setup: create 100K files of 1024 bytes each, about 100MB of input:

    echo "Creating directory structure, will take a minute"
    mkdir dd
    for d in `seq 1 100`; do
        mkdir dd/$d
        for f in `seq 1 1000`; do
            printf "%*s" 1024 "$f" > dd/$d/$f
        done
    done
    sync

Before the change, this input took about 40 seconds to process:

    $ time ./duperemove -q -rd dd/
    ...
    real    0m39,835s
    user    1m54,903s
    sys     0m8,922s

After the change it takes about 15 seconds, roughly a 2.7x wall-clock
speedup (user CPU time drops almost 10x):

    $ time ./duperemove -q -rd dd/
    ...
    real    0m14,616s
    user    0m11,942s
    sys     0m2,580s

The main overhead was a single `calloc(8MB)` call made for every file,
no matter how small. Dropping it should reduce this per-file setup cost
when running against many small files.
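
To illustrate the kind of overhead described above, here is a minimal sketch, not duperemove's actual code. It simulates handling many 1024-byte files with a fresh 8 MiB buffer per file, obtained either through `calloc()` or through `malloc()`. The names `READ_BUF_LEN`, `SMALL_FILE_LEN`, `process_one()` and `time_files()` are hypothetical, and `malloc()` is only assumed here as one possible alternative to `calloc()`. Whether `calloc()` really has to zero the whole buffer depends on the allocator, so the timings may or may not differ on a given system.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Assumption: 8 MiB stands in for the "8MB" buffer mentioned in the PR. */
#define READ_BUF_LEN (8u * 1024u * 1024u)
/* Matches the 1024-byte files from the test setup. */
#define SMALL_FILE_LEN 1024u

/*
 * Hypothetical stand-in for per-file work: allocate a full-size buffer,
 * fill only the first `used` bytes (as reading a small file would),
 * then free it. Returns the last byte written, or -1 on bad input or
 * allocation failure.
 */
int process_one(int zeroed, size_t used)
{
    if (used == 0 || used > READ_BUF_LEN)
        return -1;

    unsigned char *buf = zeroed ? calloc(1, READ_BUF_LEN)
                                : malloc(READ_BUF_LEN);
    if (buf == NULL)
        return -1;

    memset(buf, 'x', used);      /* pretend read() filled `used` bytes */
    int last = buf[used - 1];
    free(buf);
    return last;
}

/* Simulate `count` small files; returns CPU seconds used, or -1.0 on failure. */
double time_files(int zeroed, int count)
{
    clock_t start = clock();
    for (int i = 0; i < count; i++) {
        if (process_one(zeroed, SMALL_FILE_LEN) < 0)
            return -1.0;
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

#ifdef DEMO_MAIN
int main(void)
{
    int count = 1000;            /* arbitrary sample size for the demo */
    printf("calloc: %.3fs\n", time_files(1, count));
    printf("malloc: %.3fs\n", time_files(0, count));
    return 0;
}
#endif
```

Building with something like `cc -O0 -DDEMO_MAIN sketch.c` enables the small timing loop. Optimization is best left off for this experiment, since an optimizing compiler may remove the allocate-and-free pair entirely.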

JackSlateur merged commit ea1c9e4 into markfasheh:master Nov 2, 2023

JackSlateur (Collaborator) commented:

Thank you!

trofi deleted the less-memset-overhead branch November 2, 2023 17:41