Add delete option? #2
Most (if not all) of the issues with this feature would be around UI/UX. Do I just delete all duplicates? Obviously not, so I have to somehow give the user control over what gets deleted, and there are different ways to go about that. Show each group of duplicates to the user and let them choose which files get deleted? Then how do I present each group? A group can grow quite large and become cumbersome for a human to handle. Just expose a set of options, flags, and switches to act as deletion criteria? But those would surely differ for each set of duplicates. I haven't thought of a good way to solve all that, so I'm open to bouncing ideas around.
Honestly, if there were just a
For my case it would be nice to keep just the oldest copy and remove everything else. I'm trying to clean up a huge drive full of family photos; they are heavily cluttered and duplicated, so I would look up the EXIF creation date. But that is just one case. I would be fine with some kind of interface inside the code, so we can extend the behaviour on our own: a function that gets the list of duplicates and returns a new list of filenames that should be deleted.
Thank you for your feedback; it adds to the list of items I'll keep in mind for the future. I'm still not sure how to proceed with this feature (or whether to at all). I have been thinking about it (in the back of my mind) for quite some time now. In the meantime, would you be able to make this kind of solution work? Running:
```python
#!/usr/bin/env python3
import fileinput
import json
import os


def main():
    for line in fileinput.input():
        files = json.loads(line)
        files.sort(key=exifdate)
        for filename in files[1:]:
            os.remove(filename)


def exifdate(filename):
    # get the exif date for each file
    # I don't know how to extract that information with the python stdlib
    # I'd expect PIL/Pillow has something for that, but it's a third party package
    return filename


if __name__ == "__main__":
    main()
```

Or a more elaborate script along these lines.
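As a stdlib-only stand-in for the `exifdate` key in the sketch above, the file's modification time can be used as the sort key (a real EXIF lookup would need a third-party package such as Pillow; this is just a minimal placeholder, not part of yadf):

```python
import os


def exifdate(filename):
    # Stdlib-only approximation: sort by the file's modification time
    # instead of the EXIF creation date. A real EXIF-aware version would
    # need a third-party package such as Pillow.
    return os.path.getmtime(filename)
```

With this key, `files.sort(key=exifdate)` puts the oldest file first, so `files[1:]` are the newer duplicates to remove.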
```
hyperfine -w 10 "./target/release/yadf ~" "./target/release/yadf -H ~" "./target/release/yadf ~"

Benchmark #1: ./target/release/yadf ~
  Time (mean ± σ):      2.977 s ±  0.031 s    [User: 9.598 s, System: 13.738 s]
  Range (min … max):    2.935 s …  3.021 s    10 runs

Benchmark #2: ./target/release/yadf -H ~
  Time (mean ± σ):      3.785 s ±  0.040 s    [User: 9.698 s, System: 13.917 s]
  Range (min … max):    3.730 s …  3.886 s    10 runs

Benchmark #3: ./target/release/yadf ~
  Time (mean ± σ):      2.954 s ±  0.025 s    [User: 9.555 s, System: 13.737 s]
  Range (min … max):    2.919 s …  2.991 s    10 runs

Summary
  './target/release/yadf ~' ran
    1.01 ± 0.01 times faster than './target/release/yadf ~'
    1.28 ± 0.02 times faster than './target/release/yadf -H ~'
```
In my case I'd prefer hardlinking the duplicate files so only one copy remains on disk. EDIT: Maybe have a
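The hardlinking idea could be sketched like this (a minimal illustration, not yadf's actual behaviour; the function name `hardlink_group` is made up here):

```python
import os


def hardlink_group(files):
    # Keep the first file and replace every other duplicate with a
    # hard link to it, so only one copy of the data remains on disk.
    # Assumes all paths live on the same filesystem, since hard links
    # cannot cross filesystem boundaries.
    original, *duplicates = files
    for dup in duplicates:
        os.remove(dup)
        os.link(original, dup)
```

After running this on a group, every path still exists and has the same content, but they all share a single inode.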
Though this would lead into the question of what counts as "older" and what counts as "newer": do we check creation time, modification time, or access time?
Having an all-in-one binary would be amazing. If this supported deleting all dupes after finding them, it would be great.