
Add delete option? #2

Open
DarrienG opened this issue Jan 29, 2021 · 5 comments

@DarrienG

Having an all-in-one binary would be amazing. If this supported deleting all dupes after finding them, it would be great.

@jRimbault
Owner

jRimbault commented Jan 29, 2021

Most [all] of the issues related to that feature would be around UI/UX.

Do I just delete all duplicates? Obviously not, so I have to somehow defer control to the user over what gets deleted, and there are different ways to go about that. Show each group of duplicates to the user and let them choose which gets deleted? Then how do I present each group? A group can grow quite large and become cumbersome for a human to handle. Just expose a set of options, flags, and switches to act as criteria for deletion? But those would surely be different for each set of duplicates.
And then there are the easier technical aspects: do I build an interactive mode into the main tool, or do I output a dedicated script like rmlint does? rmlint's script is my preferred way; I find it quite clever, in fact, though I have style issues with the script it outputs.

I haven't thought of a good way to solve all that. I'm open to bouncing ideas.
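For illustration only, the kind of reviewable script I mean could be generated along these lines. This is a rough sketch, not anything yadf does today: it assumes the line-delimited JSON output mentioned further down (one JSON array of duplicate paths per line) and emits every rm commented out, so nothing is deleted until the user reviews and opts in. The "keep the first path alphabetically" criterion is purely illustrative.

#!/usr/bin/env python3
"""Turn line-delimited JSON duplicate groups into a reviewable deletion script."""

import json
import shlex
import sys


def main():
    print("#!/bin/sh")
    for line in sys.stdin:
        files = json.loads(line)
        files.sort()  # illustrative criterion: keep the first path alphabetically
        print(f"# group: keeping {shlex.quote(files[0])}")
        for filename in files[1:]:
            # emitted commented out; the user uncomments what they want gone
            print(f"# rm {shlex.quote(filename)}")


if __name__ == "__main__":
    main()

Something like yadf path/to/files -f ldjson | python3 make_cleanup.py > cleanup.sh (the script name is hypothetical) would then produce a shell script to review before running.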

@DarrienG
Author

Honestly, if there were just a --delete-all-dupes option without input, I would be OK with that. Nice and simple: just delete them all.

@jRimbault jRimbault added enhancement New feature or request investigate This needs to be researched labels Feb 4, 2021
@maluramichael

For my case it would be nice to keep just the oldest one and remove everything else. I'm trying to clean up a huge drive full of family photos. They are heavily cluttered and duplicated, so I would look up the exif creation date.

But that is just one case. I would be fine with some kind of interface inside the code, so we can extend the behaviour on our own.

A function which gets the list of duplicates and returns a new list of filenames that need to be deleted.

@jRimbault
Owner

jRimbault commented Feb 9, 2021

Thank you for your feedback; it adds to the list of items I'll keep in mind for the future.

I'm still not sure how (or whether) to proceed with this feature. I have been thinking about it, in the back of my mind, for quite some time now.

In the meantime, would you be able to make this kind of solution work?

Running yadf path/to/your/files -f ldjson | python_script.py, i.e. piping the line-delimited JSON output to a Python script doing the deletion with your own criteria?

Tested a bit:

#!/usr/bin/env python3

import fileinput
import json
import os


def main():
    # each line of input is one JSON array of paths to identical files
    for line in fileinput.input():
        files = json.loads(line)
        files.sort(key=exifdate)
        # keep the first file of the sorted group, delete the rest
        for filename in files[1:]:
            os.remove(filename)


def exifdate(filename):
    # get the exif date for each file
    # I don't know how to extract that information with the python stdlib
    # I'd expect PIL/Pillow has something for that, but it's a third party package
    return filename


if __name__ == "__main__":
    main()

Or a more elaborate script in this example.
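If pulling in Pillow is acceptable, the exifdate placeholder could plausibly be filled in along these lines. This is my assumption, untested: tag 306 is the basic EXIF DateTime, and files without usable EXIF data fall back to the filesystem modification time so the sort key stays a float for every file.

import os
from datetime import datetime

from PIL import Image  # third-party: pip install Pillow

EXIF_DATETIME = 306  # 0x0132, the plain "DateTime" EXIF tag


def exifdate(filename):
    # assumption: Pillow's getexif() exposes the base IFD, where tag 306
    # holds a "YYYY:MM:DD HH:MM:SS" string
    try:
        with Image.open(filename) as image:
            date = image.getexif().get(EXIF_DATETIME)
        if date:
            return datetime.strptime(date, "%Y:%m:%d %H:%M:%S").timestamp()
    except (OSError, ValueError, TypeError):
        pass  # not an image, or missing/unparseable EXIF data
    # fall back to the filesystem modification time
    return os.path.getmtime(filename)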

jRimbault added a commit that referenced this issue Feb 19, 2021
hyperfine -w 10 "./target/release/yadf ~" "./target/release/yadf -H ~" "./target/release/yadf ~"

Benchmark #1: ./target/release/yadf ~
  Time (mean ± σ):      2.977 s ±  0.031 s    [User: 9.598 s, System: 13.738 s]
  Range (min … max):    2.935 s …  3.021 s    10 runs

Benchmark #2: ./target/release/yadf -H ~
  Time (mean ± σ):      3.785 s ±  0.040 s    [User: 9.698 s, System: 13.917 s]
  Range (min … max):    3.730 s …  3.886 s    10 runs

Benchmark #3: ./target/release/yadf ~
  Time (mean ± σ):      2.954 s ±  0.025 s    [User: 9.555 s, System: 13.737 s]
  Range (min … max):    2.919 s …  2.991 s    10 runs

Summary
  './target/release/yadf ~' ran
    1.01 ± 0.01 times faster than './target/release/yadf ~'
    1.28 ± 0.02 times faster than './target/release/yadf -H ~'
@GGG-KILLER

GGG-KILLER commented Nov 23, 2021

In my case I'd prefer hardlinking the duplicate files so only one copy remains on disk.

EDIT: Maybe have a --merge-mode flag?
Then have a few options like:

  • delete-older
  • delete-newer
  • hardlink-older
  • hardlink-newer
  • softlink-older
  • softlink-newer

Though this would lead to the issue of "what's 'older' and what's 'newer'?" Do we check creation time, modification time, or access time?
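Until something like that exists in yadf itself, the ldjson-piping approach above could presumably cover the hardlink case too. A rough sketch, untested: it keeps the file with the oldest modification time and hardlinks the rest to it (note that os.link only works within a single filesystem).

#!/usr/bin/env python3

import fileinput
import json
import os


def main():
    # each input line is one JSON array of paths to identical files
    for line in fileinput.input():
        files = json.loads(line)
        files.sort(key=os.path.getmtime)  # oldest modification time first
        original = files[0]
        for duplicate in files[1:]:
            os.remove(duplicate)
            # replace the duplicate with a hardlink to the kept file;
            # os.link fails across filesystem boundaries
            os.link(original, duplicate)


if __name__ == "__main__":
    main()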
