[Feature request] Load/Import saved duplicate files list into Czkawka #1295

Open
AndroYD84 opened this issue Jun 11, 2024 · 1 comment
Labels: enhancement (New feature or request)

Comments

@AndroYD84

Feature Description
To avoid redundant scans, I suggest a feature for loading/importing a saved duplicates list as-is, without re-comparing hashes; at most it would check whether the listed files still exist on disk, either physically or as symlinks.

I spent a week scanning all my drives; after identifying all the duplicate files, I saved them in a list, but I cannot import that list back into Czkawka.
So when a crash wiped out my progress, I was forced to repeat the entire scan. The cached .bin hashes do not help, because Czkawka still re-compares every hash from scratch, which takes days.
It is genuinely downright depressing to lose a week's worth of progress on a whim, and to live with the constant fear of it happening again.
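
To make the request concrete, here is a rough sketch of what such an import could do. This is illustrative only; it assumes the saved JSON layout that the conversion script further down also consumes (size keys mapping to lists of duplicate groups, each entry carrying a "path"), and the function name is made up.

import json
import os

def load_saved_duplicates(json_file):
    # Reload a previously saved duplicate list without re-computing any hashes.
    with open(json_file, 'r', encoding='utf-8') as f:
        data = json.load(f)

    surviving = {}
    for size, groups in data.items():
        kept_groups = []
        for group in groups:
            # os.path.lexists() also reports broken symlinks, so symlinked
            # entries are kept even if their target has moved.
            kept = [entry for entry in group if os.path.lexists(entry["path"])]
            if len(kept) >= 2:  # a group needs at least two surviving files to remain a duplicate group
                kept_groups.append(kept)
        if kept_groups:
            surviving[size] = kept_groups
    return surviving

duplicates = load_saved_duplicates('czkawka_duplicates.json')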

@AndroYD84
Author

I made this Python script to convert a saved duplicate finder result from Czkawka into a Dupeguru file, so you can keep working in Dupeguru if Czkawka crashes, without wasting time rescanning the drives from scratch.
Dupeguru also lets you choose which file to keep as the original (right-click > "Mark Selected into Reference") when you symlink groups (Actions > "Send Marked to Recycle Bin" > "Link deleted files" > "Symlink"), as requested in #903 and #149.

import json
import xml.etree.ElementTree as ET

def convert_json_to_xml(json_file, xml_file):
    # Read JSON data from the input file
    with open(json_file, 'r', encoding='utf-8') as f:
        data = json.load(f)
    
    # Create the root element of the XML document
    results = ET.Element("results")
    
    # Iterate over the data and create XML structure
    for size_group in data.values():
        for group in size_group:
            group_element = ET.SubElement(results, "group")
            for file in group:
                file_element = ET.SubElement(group_element, "file")
                file_element.set("path", file["path"])
                file_element.set("words", "")
                file_element.set("is_ref", "n")
                file_element.set("marked", "n")
    
    # Create an ElementTree object and write it to the XML file
    tree = ET.ElementTree(results)
    tree.write(xml_file, encoding='utf-8', xml_declaration=True)

# Convert JSON to XML
convert_json_to_xml('czkawka_duplicates.json', 'dupeguru_duplicates.dupeguru')
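
To use it: save your duplicate results from Czkawka as JSON, adjust the two file names in the last line to match your own paths, run the script with Python 3, and open the generated .dupeguru results file in Dupeguru.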
