duperemove-master hangup with a reproducer #316

@trofi

Description

I think I have a reproducer script for a hanging duperemove. I initially wrote it to measure duperemove's scalability bottlenecks, but it looks like I managed to get it stuck instead:

#!/usr/bin/env bash

rm -fv /tmp/h1K.db /tmp/h1M.db

# create a directory suitable for deduping:
# it contains 1M files of size 1024 bytes.
if [[ ! -d dd ]]; then
    echo "Creating directory structure, will take a minute"
    mkdir dd
    for d in $(seq 1 1000); do
        mkdir -v dd/$d
        for f in $(seq 1 1000); do
            printf "%*s" 1024 "$f" > dd/$d/$f
        done
    done
    sync
fi

echo "duperemove defaults, batch of size 1M"
time { ./duperemove -q --batchsize=1000000 -rd --hashfile=/tmp/h1M.db dd/ >/dev/null 2>&1; }

echo "duperemove defaults, batch of size 1024"
time { ./duperemove -q                     -rd --hashfile=/tmp/h1K.db dd/ >/dev/null 2>&1; }
$ time ./bench.bash
duperemove defaults, batch of size 1M
^C^X
real    164m12,365s
user    154m14,324s
sys     1m42,050s

Note: there has been no progress for over two hours. I think it should finish in minutes (or tens of minutes at worst). I ran it on compressed btrfs.
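For what it's worth, the generated tree is maximally dedupe-friendly: for a fixed $f, every dd/$d/$f holds the same space-padded content, so the script produces 1000 groups of 1000 identical 1024-byte files. A scaled-down sketch (N=10 in a temporary directory, md5sum assumed available) that checks this invariant:

```shell
#!/usr/bin/env bash
# Miniature version of the layout the reproducer builds: N dirs x N files,
# where file $f gets the same space-padded content in every directory.
# The tree should therefore contain exactly N duplicate groups of N
# identical files each.
set -euo pipefail
N=10
tmp=$(mktemp -d)
for d in $(seq 1 "$N"); do
    mkdir "$tmp/$d"
    for f in $(seq 1 "$N"); do
        printf "%*s" 1024 "$f" > "$tmp/$d/$f"
    done
done
# The number of distinct content hashes equals the number of duplicate groups.
groups=$(md5sum "$tmp"/*/* | awk '{print $1}' | sort -u | wc -l)
echo "distinct contents: $groups (expected $N)"
rm -rf "$tmp"
```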
