Releases: markyip/CloneWiper

v1.1

09 Apr 04:33

CloneWiper v1.1 — Release notes

Release date: April 9, 2026

Highlights

  • Faster scans and comparisons in the core engine (partial hashing, similarity grouping, and parallel hash-pair processing).
  • Persistent thumbnail cache in SQLite for quicker UI scrolling on repeat runs.
  • Windows build fix for PyInstaller 6 on Python 3.12+ (distutils exclude removed).
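The notes do not spell out how partial hashing is staged. A common approach is to group files by size first, then by a hash of a fixed-size prefix, and only then by a full-file hash, so most non-duplicates are ruled out after reading very little data. The sketch below assumes MD5 (the hash v1.0 uses for exact matching) and a hypothetical 64 KiB prefix. The names `md5_of`, `refine` and `find_exact_duplicates` are illustrative and are not CloneWiper's API.

```python
import hashlib
import os
from collections import defaultdict

# Hypothetical prefix length; the real partial-hash size is not stated in the notes.
PARTIAL_BYTES = 64 * 1024


def md5_of(path, limit=None, chunk=1 << 20):
    """MD5 of the whole file, or of only its first `limit` bytes."""
    h = hashlib.md5()
    remaining = limit
    with open(path, "rb") as f:
        while remaining is None or remaining > 0:
            size = chunk if remaining is None else min(chunk, remaining)
            data = f.read(size)
            if not data:
                break
            h.update(data)
            if remaining is not None:
                remaining -= len(data)
    return h.hexdigest()


def refine(groups, key):
    """Split each group by `key`, dropping buckets that hold a single file."""
    out = []
    for group in groups:
        buckets = defaultdict(list)
        for path in group:
            buckets[key(path)].append(path)
        out.extend(b for b in buckets.values() if len(b) > 1)
    return out


def find_exact_duplicates(paths):
    """Size -> partial MD5 -> full MD5; each stage only sees surviving candidates."""
    groups = refine([list(paths)], os.path.getsize)
    groups = refine(groups, lambda p: md5_of(p, PARTIAL_BYTES))
    return refine(groups, md5_of)
```

Only files that survive the cheap size and prefix checks are read in full, which is where the savings come from on large collections.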

Performance

  • Partial-hash phase uses concurrent.futures.wait(..., FIRST_COMPLETED) instead of a tight as_completed timeout loop, reducing wasted CPU while waiting for workers.
  • Multi-algorithm similarity: pre-parse perceptual hashes once per file hash string; reuse for LSH buckets, parallel comparisons, and the small-dataset path (fewer repeated hex_to_hash calls).
  • Candidate hash-pair comparison uses as_completed instead of executor.map, avoiding head-of-line blocking when pair cost varies (e.g. ORB).
  • Full-hash aggregation no longer takes a global lock for every file; results are collected by a single consumer thread iterating over as_completed.
  • Background file prefetch for hashing is now disabled by default (_file_prefetch_enabled = False) to avoid extra threads and redundant I/O. If prefetch is re-enabled in code, the prefetch pool now shuts down correctly.
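The two standard-library patterns named above can be sketched as follows. `partial_hash_phase` keeps a bounded window of tasks in flight and blocks in `wait(..., return_when=FIRST_COMPLETED)` until a worker finishes, rather than polling with short timeouts. `compare_pairs` consumes results with `as_completed`, which yields futures as they finish, whereas `executor.map` yields in submission order, so one slow pair (for example an ORB comparison) would hold back every result behind it. Because only one thread appends to the result list, no lock is needed. The function names and the `max_in_flight` window are hypothetical; this is not the engine's code.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, as_completed, wait


def partial_hash_phase(paths, hash_fn, max_workers=4, max_in_flight=16):
    """Hash paths with a bounded number of in-flight tasks, sleeping in wait()."""
    results = {}
    path_iter = iter(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        in_flight = {}  # future -> path
        for path in path_iter:
            in_flight[pool.submit(hash_fn, path)] = path
            if len(in_flight) >= max_in_flight:
                break
        while in_flight:
            # Blocks until at least one worker finishes; no busy timeout loop.
            done, _ = wait(list(in_flight), return_when=FIRST_COMPLETED)
            for fut in done:
                results[in_flight.pop(fut)] = fut.result()
                nxt = next(path_iter, None)
                if nxt is not None:
                    in_flight[pool.submit(hash_fn, nxt)] = nxt
    return results


def compare_pairs(pairs, compare_fn, max_workers=4):
    """Collect matching pairs in completion order from a single consumer thread."""
    matches = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(compare_fn, a, b): (a, b) for a, b in pairs}
        for fut in as_completed(futures):
            if fut.result():
                matches.append(futures[fut])  # only this thread mutates `matches`
    return matches
```

Note that `compare_pairs` returns matches in completion order, so callers that need a stable order should sort the result.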

Thumbnail cache

  • New module core/thumbnail_cache.py: SQLite-backed storage for generated thumbnails (platform-appropriate path under %LOCALAPPDATA%\CloneWiper on Windows).
  • Optional script verify_thumbnail_cache.py to inspect or validate cache behavior.
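A minimal sketch of an SQLite-backed thumbnail cache, using only the standard `sqlite3` module. The table layout, the `thumbnails.db` filename, the non-Windows fallback directory, and the rule of invalidating entries when a file's modification time or size changes are all assumptions for illustration; the actual schema of core/thumbnail_cache.py is not described in these notes.

```python
import os
import sqlite3
import sys
from pathlib import Path


def default_cache_path():
    """Per-user DB path (filename and non-Windows fallback are hypothetical)."""
    if sys.platform == "win32":
        base = Path(os.environ.get("LOCALAPPDATA", str(Path.home())))
    else:
        base = Path.home() / ".cache"
    return base / "CloneWiper" / "thumbnails.db"


class ThumbnailCache:
    def __init__(self, db_path):
        self.conn = sqlite3.connect(str(db_path))
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS thumbs ("
            " path TEXT NOT NULL, mtime REAL NOT NULL, size INTEGER NOT NULL,"
            " thumb_size INTEGER NOT NULL, data BLOB NOT NULL,"
            " PRIMARY KEY (path, thumb_size))"
        )
        self.conn.commit()

    def get(self, path, mtime, size, thumb_size):
        """Return cached thumbnail bytes, or None if missing or stale."""
        row = self.conn.execute(
            "SELECT mtime, size, data FROM thumbs WHERE path = ? AND thumb_size = ?",
            (path, thumb_size),
        ).fetchone()
        if row is None or row[0] != mtime or row[1] != size:
            return None
        return row[2]

    def put(self, path, mtime, size, thumb_size, data):
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO thumbs VALUES (?, ?, ?, ?, ?)",
                (path, mtime, size, thumb_size, sqlite3.Binary(data)),
            )

    def close(self):
        self.conn.close()
```

The caller is responsible for creating the parent directory of `default_cache_path()` before opening the database there.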

Windows executable build

  • build_windows.bat: removed --exclude-module=distutils, which triggered
    ValueError: Target module "distutils" already imported as "ExcludedModule('distutils',)"
    under PyInstaller 6 and Python 3.12+ during dependency analysis.

Full list of touched areas (summary)

  • core/engine.py: Scan/compare performance, logging, prefetch off by default
  • qt_app.py: Logging, CLONEWIPER_DEBUG, traceback via logger
  • core/thumbnail_cache.py: New persistent thumbnail cache
  • verify_thumbnail_cache.py: New optional utility
  • build_windows.bat: PyInstaller / distutils fix
  • README.md: v1.1, docs, structure

Thank you for using CloneWiper. Issues and PRs welcome on GitHub.

v1.0

19 Dec 00:51

CloneWiper Release Notes

Version 1.0

🎯 Core Features

Smart Duplicate Detection

  • Multi-Algorithm Perceptual Hashing Technology
    • Combines four hash algorithms: average_hash, phash (perceptual), dhash (difference), and whash (wavelet)
    • More accurate than any single algorithm; detects similar images and videos even after minor edits, resizing, or recompression
    • Supported image formats: JPEG, PNG, GIF, BMP, TIFF, WebP
    • RAW File Support: Full support for RAW formats (CR2, NEF, ARW, DNG, etc.)
    • Video Support: Perceptual hashing for video files through keyframe extraction
  • MD5 Hashing: Fast, exact matching of byte-identical files
  • Timestamp Correlation: Automatically identifies RAW/JPEG pairs (even when perceptual hashes differ)
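One way to combine several perceptual hashes while tolerating algorithm failures is to compare each algorithm's hash separately by Hamming distance and let the algorithms vote, skipping any that failed on either file. The sketch below works on 64-bit hashes stored as hex strings (the form `str()` produces for hashes from ImageHash's `average_hash`, `phash`, `dhash` and `whash` at the default size). The distance threshold and the two-vote rule are hypothetical; the notes do not say how CloneWiper combines the four results.

```python
ALGORITHMS = ("average_hash", "phash", "dhash", "whash")


def hamming_hex(a, b):
    """Number of differing bits between two hex-encoded hashes."""
    return bin(int(a, 16) ^ int(b, 16)).count("1")


def is_similar(hashes_a, hashes_b, max_distance=10, min_votes=2):
    """hashes_* map algorithm name -> hex string, or None if it failed.

    Hypothetical rule: each usable algorithm votes "similar" when its
    distance is within max_distance; require min_votes votes, lowered to
    the number of usable algorithms when some have failed.
    """
    votes = 0
    usable = 0
    for name in ALGORITHMS:
        ha, hb = hashes_a.get(name), hashes_b.get(name)
        if ha is None or hb is None:
            continue  # skip algorithms that failed on either file
        usable += 1
        if hamming_hex(ha, hb) <= max_distance:
            votes += 1
    return usable > 0 and votes >= min(min_votes, usable)
```

Lowering the vote requirement when some algorithms fail keeps a file comparable instead of dropping it from similarity grouping entirely.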

High-Performance Scanning Engine

  • Asynchronous Multi-threaded Processing: Fully utilizes multi-core CPUs for fast scanning of large file collections
  • SQLite Cache System: Persistent hash caching for significantly faster re-scans
  • Cancellable Scanning: A scan can be canceled at any time while it is running
  • Real-time Progress Display: Shows scanning progress and thumbnail loading progress
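Persistent hash caching usually keys each entry on the file path together with its size and modification time, so an unchanged file is never re-hashed and an edited file is. The sketch below pairs such a cache with cooperative cancellation through a `threading.Event` checked between files. `HashCache`, `hash_files` and the table layout are hypothetical names for illustration, not CloneWiper's implementation; a multi-threaded scanner would also need one SQLite connection per thread or a single writer thread.

```python
import hashlib
import os
import sqlite3
import threading


class HashCache:
    """Persistent path -> MD5 cache, invalidated when size or mtime changes."""

    def __init__(self, db_path):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS hashes ("
            " path TEXT PRIMARY KEY, size INTEGER, mtime REAL, md5 TEXT)"
        )
        self.conn.commit()

    def lookup(self, path, size, mtime):
        row = self.conn.execute(
            "SELECT md5 FROM hashes WHERE path = ? AND size = ? AND mtime = ?",
            (path, size, mtime),
        ).fetchone()
        return row[0] if row else None

    def store(self, path, size, mtime, md5):
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO hashes VALUES (?, ?, ?, ?)",
                (path, size, mtime, md5),
            )


def file_md5(path, chunk=1 << 20):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def hash_files(paths, cache, cancel_event: threading.Event):
    """Hash files, reusing cached digests; stop early if cancel_event is set."""
    results = {}
    for path in paths:
        if cancel_event.is_set():
            break  # cooperative cancellation, checked between files
        st = os.stat(path)
        md5 = cache.lookup(path, st.st_size, st.st_mtime)
        if md5 is None:
            md5 = file_md5(path)
            cache.store(path, st.st_size, st.st_mtime, md5)
        results[path] = md5
    return results
```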

🎨 User Interface

Material Design 3

  • Modern Dark Theme: Follows Material Design 3 design guidelines
  • Frameless Window: Custom title bar and control buttons
  • Rounded Buttons and Cards: Unified visual style
  • Responsive Layout: Adapts to different screen sizes

Smart Thumbnail Previews

  • Image Thumbnails: Fast previews, including RAW file support
  • Video Thumbnails: Extracts keyframes from videos as previews (requires opencv-python)
  • Document Thumbnails: High-quality PDF, EPUB, MOBI, and AZW3 thumbnails
    • Uses PyMuPDF for EPUB/MOBI/AZW3 and as PDF fallback
    • Optional pypdfium2 for higher quality PDF rendering
  • Music Album Art: Automatically extracts album covers and metadata from music files
    • Uses the mutagen library to support various audio formats (MP3, FLAC, M4A, OGG, Opus, APE, etc.)

Interactive File Cards

  • Hover Effects: Shows detailed information on mouse hover
  • Long Filename Scrolling: Automatically scrolls to display complete filenames
  • Selection Management: Click cards to select/deselect files
  • Visual Feedback: Selected files have clear visual indicators

Pagination System

  • Efficient Pagination: Displays 50 duplicate groups per page so large result sets stay manageable
  • Page Navigation: Previous/Next page buttons with clear current page display
  • Auto Reset: Automatically returns to the first page when a new scan starts
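The paging arithmetic is simple but easy to get wrong at the edges (an empty result set, or a page index left over from a larger previous scan). A minimal sketch using the 50-groups-per-page figure from the notes; `page_count` and `get_page` are hypothetical helpers, and pages are zero-based here.

```python
GROUPS_PER_PAGE = 50  # page size stated in the release notes


def page_count(total_groups, per_page=GROUPS_PER_PAGE):
    """Ceiling division, with at least one (possibly empty) page."""
    return max(1, -(-total_groups // per_page))


def get_page(groups, page, per_page=GROUPS_PER_PAGE):
    """Return (clamped_page, groups_on_that_page) for a zero-based page index."""
    last = page_count(len(groups), per_page) - 1
    page = min(max(page, 0), last)
    start = page * per_page
    return page, groups[start:start + per_page]
```

Clamping the index means a stale page number can never produce an empty view when results exist; resetting to page 0 on a new scan is then just `get_page(groups, 0)`.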

⚡ Quick Actions

Smart Selection Strategies

  • Keep Newest: Keeps the most recently modified file and selects the other duplicates for deletion
  • Keep Oldest: Keeps the oldest file and selects the other duplicates for deletion
  • Keep Best: Keeps the highest-resolution image and selects the other duplicates for deletion
    • Ranking order: Area → Max dimension → Min dimension → File size → Modification time → Path length
  • Keep RAW: In mixed RAW/JPEG groups, keeps the RAW files and selects the JPEG files for deletion
    • The button appears only when a group mixes RAW and JPEG files
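The "Keep Best" ranking maps naturally onto a tuple sort key compared left to right. In the sketch below, larger area, dimensions and file size win as the notes imply; the direction of the last two tie-breakers (newer modification time wins, shorter path wins) is an assumption, as is the RAW extension set, which covers only the formats the notes name. `FileInfo`, `select_for_deletion`, `has_raw_jpeg_mix` and `keep_raw` are hypothetical names.

```python
import os
from dataclasses import dataclass

RAW_EXTS = {".cr2", ".nef", ".arw", ".dng"}  # only the formats named in the notes
JPEG_EXTS = {".jpg", ".jpeg"}


@dataclass
class FileInfo:
    path: str
    width: int
    height: int
    size: int
    mtime: float


def best_key(f):
    """Larger is better; the last two tie-breaker directions are assumed."""
    return (
        f.width * f.height,      # area
        max(f.width, f.height),  # max dimension
        min(f.width, f.height),  # min dimension
        f.size,                  # file size
        f.mtime,                 # newer wins (assumption)
        -len(f.path),            # shorter path wins (assumption)
    )


def select_for_deletion(group, strategy):
    """Keep exactly one file per group and return the rest."""
    if strategy == "newest":
        keep = max(group, key=lambda f: f.mtime)
    elif strategy == "oldest":
        keep = min(group, key=lambda f: f.mtime)
    elif strategy == "best":
        keep = max(group, key=best_key)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [f for f in group if f is not keep]  # identity, not equality


def ext(f):
    return os.path.splitext(f.path)[1].lower()


def has_raw_jpeg_mix(group):
    exts = {ext(f) for f in group}
    return bool(exts & RAW_EXTS) and bool(exts & JPEG_EXTS)


def keep_raw(group):
    """In a mixed group, select the JPEG files for deletion; otherwise none."""
    if not has_raw_jpeg_mix(group):
        return []
    return [f for f in group if ext(f) in JPEG_EXTS]
```

Gating `keep_raw` on `has_raw_jpeg_mix` mirrors the button only appearing for mixed groups.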

Batch Operations

  • Scope Selection: Choose to apply quick actions to "Current page" or "All pages"
  • Batch Deletion: Safely delete selected files (moves to recycle bin/trash using send2trash)
  • Delete Counter: Shows in real time how many files are selected for deletion

🔧 Advanced Features

File Management

  • Multi-folder Scanning: Scan multiple folders simultaneously
  • Folder Memory: Remembers previously scanned folder paths
  • System Directory Skipping: Automatically skips system directories (Windows/macOS)
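Multi-folder scanning with system-directory skipping can be sketched with `os.walk`: assigning to `dirnames[:]` inside the loop prunes those directories before the walk descends into them. The skip list below is hypothetical and deliberately short; CloneWiper's actual list is not documented in these notes. Resolving paths with `os.path.realpath` and tracking them in a set means overlapping folders (for example a folder and one of its subfolders) do not yield the same file twice.

```python
import os

# Hypothetical skip list for illustration; the real list is not documented here.
SKIP_DIR_NAMES = {
    "$recycle.bin",               # Windows
    "system volume information",  # Windows
    ".trashes",                   # macOS
    ".spotlight-v100",            # macOS
    ".fseventsd",                 # macOS
}


def iter_files(roots):
    """Yield each file once across several roots, pruning system directories."""
    seen = set()
    for root in roots:
        for dirpath, dirnames, filenames in os.walk(root):
            # Assigning to dirnames[:] stops os.walk from descending into them.
            dirnames[:] = [d for d in dirnames if d.lower() not in SKIP_DIR_NAMES]
            for name in filenames:
                path = os.path.realpath(os.path.join(dirpath, name))
                if path not in seen:
                    seen.add(path)
                    yield path
```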

Sorting and Grouping

  • Multiple Sorting Options:
    • Count (High to Low / Low to High)
    • File Size (Large to Small / Small to Large)
    • Filename (A-Z / Z-A)
    • Date (Newest First / Oldest First)
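Each sorting option reduces to one per-group key plus a direction. What "File Size" and "Date" measure for a whole group is not specified, so the sketch below assumes the group's total size and its newest modification time, and sorts by filename using the first file in the group. `DuplicateGroup` and `sort_groups` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import List


@dataclass
class DuplicateGroup:
    files: List[str]
    total_size: int      # bytes across the group (assumed measure for "File Size")
    newest_mtime: float  # assumed measure for "Date"


# One key per sort option; the `descending` flag picks the direction.
SORT_KEYS = {
    "count": lambda g: len(g.files),
    "size": lambda g: g.total_size,
    "name": lambda g: os.path.basename(g.files[0]).lower(),
    "date": lambda g: g.newest_mtime,
}


def sort_groups(groups, by, descending):
    """by: 'count' | 'size' | 'name' | 'date'."""
    return sorted(groups, key=SORT_KEYS[by], reverse=descending)
```

For example, "Count (High to Low)" would be `sort_groups(groups, "count", True)`, "Filename (A-Z)" would be `sort_groups(groups, "name", False)`, and "Date (Newest First)" would be `sort_groups(groups, "date", True)`.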

Safe Deletion

  • Recycle Bin Support: Uses send2trash to move files to recycle bin/trash
  • Confirmation Dialog: Asks for confirmation before any files are removed, preventing accidental deletion
  • Auto Refresh: Automatically refreshes display and updates results after deletion

🌐 Cross-Platform Support

Supported Operating Systems

  • Windows 10/11: Full support, including building a standalone EXE
  • macOS: Runs from source (python3 main.py); building an executable is not currently supported

Platform-Specific Optimizations

  • Windows: Uses Windows-specific window flags for optimization

📦 Technical Specifications

Core Dependencies

  • PySide6 (≥6.5.0): Qt for Python, modern UI framework
  • Pillow (≥10.0.0): Image processing library
  • ImageHash (≥4.3.0): Perceptual hashing algorithms
  • rawpy (≥0.19.0): RAW file processing
  • send2trash (≥1.8.0): Safe file deletion
  • PyMuPDF (≥1.23.0): PDF/EPUB/MOBI/AZW3 document processing (required for EPUB/MOBI/AZW3, fallback for PDF)
  • mutagen (≥1.47.0): Music album art and metadata extraction

Optional Dependencies (Recommended for Enhanced Features)

  • opencv-python (≥4.8.0): Video thumbnail extraction
  • pypdfium2 (≥0.20.0): High-quality PDF rendering (preferred over PyMuPDF for PDF files)
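Optional dependencies like these are typically detected at runtime so that features degrade gracefully when a package is missing. A minimal sketch using `importlib.util.find_spec`, which checks whether a module can be found without importing it. The backend preference (pypdfium2, then PyMuPDF) follows the notes; the helper names are hypothetical. PyMuPDF is importable as `pymupdf` in recent releases and as `fitz` traditionally, and opencv-python provides the `cv2` module.

```python
import importlib.util


def has_module(name):
    """True if a top-level module can be imported, without importing it."""
    return importlib.util.find_spec(name) is not None


def pick_pdf_backend():
    """Prefer pypdfium2 for PDFs, fall back to PyMuPDF, else None."""
    if has_module("pypdfium2"):
        return "pypdfium2"
    if has_module("pymupdf") or has_module("fitz"):
        return "pymupdf"
    return None


def video_thumbnails_available():
    return has_module("cv2")  # module installed by opencv-python
```

When `pick_pdf_backend()` returns None, the UI would fall back to a generic file icon instead of a rendered PDF thumbnail.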

Performance Optimizations

  • Multi-threaded Scanning: Fully utilizes multi-core CPUs
  • Asynchronous Thumbnail Loading: Doesn't block UI responsiveness
  • Smart Caching: Avoids redundant hash calculations
  • Memory Optimization: Releases image resources promptly

🐛 Bug Fixes & Improvements

This Version Fixes

  • ✅ Fixed truncated image file handling
  • ✅ Improved multi-algorithm hash error handling (still usable when some algorithms fail)
  • ✅ Fixed "Keep Best" logic to be consistent with "Keep Newest/Oldest"
  • ✅ Fixed Windows-specific window flags for cross-platform compatibility
  • ✅ Improved RAW/JPEG timestamp correlation logic
  • ✅ Optimized UI element alignment and font size consistency

User Experience Improvements

  • ✅ Selection state and pagination now reset automatically when a new scan starts
  • ✅ Improved delete button counter display
  • ✅ Centered the text in the sort dropdown
  • ✅ Improved thumbnail loading progress display
  • ✅ Unified button and control styles

📝 Usage Instructions

Basic Usage Workflow

  1. Add Scan Folders: Click the "+ Add Folder" button to select folders to scan
  2. Choose Scan Mode:
    • Check "Multi-Algorithm Perceptual Hash" to detect visually similar files
    • Uncheck it to match only byte-identical files using MD5
  3. Start Scanning: Click the "Start Scanning" button
  4. View Results: After scanning completes, duplicate files are displayed in groups
  5. Select Files to Delete:
    • Manually click file cards to select
    • Or use quick action buttons (Keep Newest/Oldest/Best/RAW)
  6. Delete Files: Click the "Delete" button and confirm the deletion

Quick Action Descriptions

  • Keep Newest: Keeps the most recently modified file in each duplicate group
  • Keep Oldest: Keeps the oldest file in each duplicate group
  • Keep Best: Keeps the highest resolution image in each duplicate group (only shown when applicable)
  • Keep RAW: Keeps RAW files in RAW/JPEG mixed groups (only shown when applicable)

🔮 Future Plans

  • Support cloud storage scanning

📄 License

This project is licensed under the MIT License; see the LICENSE file for details.


🙏 Acknowledgments

Thanks to the following open-source projects:

  • PySide6 - Qt for Python
  • Pillow - Image processing
  • ImageHash - Perceptual hashing
  • PyMuPDF - PDF/EPUB rendering
  • pypdfium2 - High-quality PDF rendering
  • Material Design 3 - Design guidelines
