[Feature request] Load/Import saved duplicate files list into Czkawka #1295

Open
AndroYD84 opened this issue Jun 11, 2024 · 1 comment
Labels: enhancement (New feature or request)

Comments

@AndroYD84

Feature Description
To avoid redundant scans, I suggest a feature for loading/importing a saved duplicates list as-is, without re-comparing hashes; at most it would check whether the listed files still exist on disk, either physically or as symlinks.

I spent a week scanning all my drives; after identifying all the duplicate files, I saved them in a list, but I cannot import that list back into Czkawka.
So when a crash wiped out my progress, I was forced to repeat the entire scan. The cached .bin hashes do not help, because Czkawka still re-compares every hash from scratch, which takes days.
It is genuinely downright depressing to lose a week's worth of progress on a whim, and to live with the constant fear of it happening again.
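
To make the request concrete, here is a rough sketch of what such an import could do. This is illustrative only; it assumes the saved JSON layout that the conversion script further down also consumes (size keys mapping to lists of duplicate groups, each entry carrying a "path"), and the function name is made up.

import json
import os

def load_saved_duplicates(json_file):
    # Reload a previously saved duplicate list without re-computing any hashes.
    with open(json_file, 'r', encoding='utf-8') as f:
        data = json.load(f)

    surviving = {}
    for size, groups in data.items():
        kept_groups = []
        for group in groups:
            # os.path.lexists() also reports broken symlinks, so symlinked
            # entries are kept even if their target has moved.
            kept = [entry for entry in group if os.path.lexists(entry["path"])]
            if len(kept) >= 2:  # a group needs at least two surviving files to remain a duplicate group
                kept_groups.append(kept)
        if kept_groups:
            surviving[size] = kept_groups
    return surviving

duplicates = load_saved_duplicates('czkawka_duplicates.json')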

@AndroYD84
Author

I made this Python script to convert a saved duplicate finder result from Czkawka into a Dupeguru file, so you can keep working in Dupeguru if Czkawka crashes, without wasting time rescanning the drives from scratch.
Dupeguru also lets you choose which file to keep as the original (right-click > "Mark Selected into Reference") when you symlink groups (Actions > "Send Marked to Recycle Bin" > "Link deleted files" > "Symlink"), as requested in #903 and #149.

import json
import xml.etree.ElementTree as ET

def convert_json_to_xml(json_file, xml_file):
    # Read JSON data from the input file
    with open(json_file, 'r', encoding='utf-8') as f:
        data = json.load(f)
    
    # Create the root element of the XML document
    results = ET.Element("results")
    
    # Iterate over the data and create XML structure
    for size_group in data.values():
        for group in size_group:
            group_element = ET.SubElement(results, "group")
            for file in group:
                file_element = ET.SubElement(group_element, "file")
                file_element.set("path", file["path"])
                file_element.set("words", "")
                file_element.set("is_ref", "n")
                file_element.set("marked", "n")
    
    # Create an ElementTree object and write it to the XML file
    tree = ET.ElementTree(results)
    tree.write(xml_file, encoding='utf-8', xml_declaration=True)

# Convert JSON to XML
convert_json_to_xml('czkawka_duplicates.json', 'dupeguru_duplicates.dupeguru')
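
To use it: save your duplicate results from Czkawka as JSON, adjust the two file names in the last line to match your own paths, run the script with Python 3, and open the generated .dupeguru results file in Dupeguru.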
