v1.0
CloneWiper Release Notes
Version 1.0
🎯 Core Features
Smart Duplicate Detection
- Multi-Algorithm Perceptual Hashing Technology
- Combines four hash algorithms: average_hash, phash (perceptual), dhash (difference), and whash (wavelet)
- More accurate than any single algorithm: detects similar images and videos even after slight modifications, resizing, or recompression
- Supported image formats: JPEG, PNG, GIF, BMP, TIFF, WebP
- RAW File Support: Full support for RAW formats (CR2, NEF, ARW, DNG, etc.)
- Video Support: Perceptual hashing for video files through keyframe extraction
- MD5 Hashing: Fast and precise matching for identical files
- Timestamp Correlation: Automatically identifies RAW/JPEG pairs (even when perceptual hashes differ)
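As an illustration of how several perceptual hashes can be combined, here is a minimal pure-Python sketch. It is not CloneWiper's code: the application uses the ImageHash library, while this sketch computes a simplified average hash and difference hash on tiny grayscale grids and calls two images similar only when both algorithms agree. The function names, the 2-bit distance threshold, and the vote count are assumptions made for the example.

```python
def average_hash(pixels):
    """One bit per pixel: 1 when the pixel is brighter than the grid mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]


def difference_hash(pixels):
    """One bit per horizontal neighbour pair: 1 when left is brighter than right."""
    return [1 if row[i] > row[i + 1] else 0
            for row in pixels for i in range(len(row) - 1)]


def hamming(a, b):
    """Number of bit positions in which two equal-length hashes differ."""
    return sum(x != y for x, y in zip(a, b))


def looks_similar(img_a, img_b, max_distance=2, min_votes=2):
    """Hypothetical voting rule: enough algorithms must report a small distance."""
    algorithms = (average_hash, difference_hash)
    votes = sum(hamming(algo(img_a), algo(img_b)) <= max_distance
                for algo in algorithms)
    return votes >= min_votes


# A 4x4 grayscale "image", a uniformly brightened copy, and an inverted copy.
original = [[10, 20, 200, 210],
            [12, 22, 205, 215],
            [11, 21, 198, 208],
            [13, 23, 202, 212]]
brighter = [[p + 5 for p in row] for row in original]
inverted = [[255 - p for p in row] for row in original]

print(looks_similar(original, brighter))  # brightness shift keeps both hashes intact
print(looks_similar(original, inverted))  # inversion flips the bits of both hashes
```

Requiring agreement between algorithms trades a little recall for fewer false matches; a real implementation would first downscale and convert each image to grayscale.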
High-Performance Scanning Engine
- Asynchronous Multi-threaded Processing: Fully utilizes multi-core CPUs for fast scanning of large file collections
- SQLite Cache System: Persistent hash caching for significantly faster re-scans
- Cancellable Scanning: Support for canceling operations at any time during scanning
- Real-time Progress Display: Shows scanning progress and thumbnail loading progress
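The persistent hash cache could work along the following lines. This is a sketch under assumptions, not the actual CloneWiper schema: the table layout, the function names, and the choice to invalidate an entry when file size or modification time changes are all hypothetical.

```python
import hashlib
import os
import sqlite3
import tempfile


def open_cache(db_path=":memory:"):
    """Open (or create) the cache database; ':memory:' keeps the demo self-contained."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hashes ("
        "path TEXT PRIMARY KEY, size INTEGER, mtime_ns INTEGER, md5 TEXT)"
    )
    return conn


def cached_md5(conn, path):
    """Return (md5_hex, from_cache). A cached row is reused only if size and mtime match."""
    st = os.stat(path)
    row = conn.execute(
        "SELECT md5 FROM hashes WHERE path = ? AND size = ? AND mtime_ns = ?",
        (path, st.st_size, st.st_mtime_ns),
    ).fetchone()
    if row:
        return row[0], True

    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in 1 MiB chunks so large files are never loaded into memory at once.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    conn.execute(
        "INSERT OR REPLACE INTO hashes VALUES (?, ?, ?, ?)",
        (path, st.st_size, st.st_mtime_ns, digest.hexdigest()),
    )
    conn.commit()
    return digest.hexdigest(), False


# Demo: the first lookup computes the hash, the second is served from the cache.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "photo.jpg")
    with open(sample, "wb") as fh:
        fh.write(b"example bytes")
    conn = open_cache()
    first = cached_md5(conn, sample)
    second = cached_md5(conn, sample)
    conn.close()
```

Keying on size and modification time is a common heuristic; it would miss a file rewritten with identical size and timestamp, which is usually an acceptable trade-off for re-scan speed.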
🎨 User Interface
Material Design 3
- Modern Dark Theme: Follows Material Design 3 design guidelines
- Frameless Window: Custom title bar and control buttons
- Rounded Buttons and Cards: Unified visual style
- Responsive Layout: Adapts to different screen sizes
Smart Thumbnail Previews
- Image Thumbnails: Fast previews, including RAW file support
- Video Thumbnails: Extracts keyframes from videos as previews (requires opencv-python)
- Document Thumbnails: High-quality PDF, EPUB, MOBI, and AZW3 thumbnails
- Uses PyMuPDF for EPUB/MOBI/AZW3 and as PDF fallback
- Optional pypdfium2 for higher quality PDF rendering
- Music Album Art: Automatically extracts album covers and metadata from music files
- Uses the mutagen library to support various audio formats (MP3, FLAC, M4A, OGG, Opus, APE, etc.)
Interactive File Cards
- Hover Effects: Shows detailed information on mouse hover
- Long Filename Scrolling: Automatically scrolls to display complete filenames
- Selection Management: Click cards to select/deselect files
- Visual Feedback: Selected files have clear visual indicators
Pagination System
- Efficient Pagination: Displays 50 duplicate groups per page, handling large result sets
- Page Navigation: Previous/Next page buttons with clear current page display
- Auto Reset: Automatically resets to first page when starting a new scan
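The paging arithmetic behind this is simple; a minimal sketch follows. Only the page size of 50 comes from these notes; the function names and the clamping of out-of-range page numbers are assumptions.

```python
import math

GROUPS_PER_PAGE = 50  # page size stated in these release notes


def page_count(total_groups, per_page=GROUPS_PER_PAGE):
    """Number of pages needed; an empty result set still shows one (empty) page."""
    return max(1, math.ceil(total_groups / per_page))


def page_slice(groups, page, per_page=GROUPS_PER_PAGE):
    """Return the groups on a 1-based page, clamping out-of-range page numbers."""
    page = min(max(page, 1), page_count(len(groups), per_page))
    start = (page - 1) * per_page
    return groups[start:start + per_page]


groups = [f"group-{i}" for i in range(120)]
print(page_count(len(groups)))      # 3
print(len(page_slice(groups, 3)))   # 20 (the last page is partial)
print(page_slice(groups, 99)[0])    # group-100 (page 99 is clamped to page 3)
```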
⚡ Quick Actions
Smart Selection Strategies
- Keep Newest: Keeps the most recently modified file, deletes other duplicates
- Keep Oldest: Keeps the oldest file, deletes other duplicates
- Keep Best: Keeps the highest resolution image, deletes other duplicates
- Smart sorting: Area → Max dimension → Min dimension → File size → Modification time → Path length
- Keep RAW: In RAW/JPEG mixed groups, keeps RAW files and deletes JPEG files
- Button only appears when RAW and JPEG mixing is detected
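The "Keep Best" ordering can be expressed as a single tuple sort key. The sketch below follows the documented order of criteria, but the direction of each tie-breaker (larger area, larger file, newer modification time, shorter path all counting as "better") is an assumption, as are the `Candidate` class and function names.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one file in a duplicate group."""
    path: str
    width: int
    height: int
    size_bytes: int
    mtime: float


def best_key(c):
    """Area -> max dimension -> min dimension -> file size -> mtime -> path length."""
    return (
        c.width * c.height,
        max(c.width, c.height),
        min(c.width, c.height),
        c.size_bytes,
        c.mtime,
        -len(c.path),  # assumed: a shorter path wins the final tie-break
    )


def keep_best(group):
    """Return (file to keep, files to mark for deletion)."""
    keeper = max(group, key=best_key)
    return keeper, [c for c in group if c is not keeper]


group = [
    Candidate("a/small.jpg", 800, 600, 200_000, 300.0),
    Candidate("a/large.jpg", 4000, 3000, 5_000_000, 100.0),
    Candidate("a/copy/large.jpg", 4000, 3000, 5_000_000, 100.0),
]
keeper, to_delete = keep_best(group)
print(keeper.path)                    # a/large.jpg
print([c.path for c in to_delete])    # ['a/small.jpg', 'a/copy/large.jpg']
```

Python compares tuples element by element, so later criteria are consulted only when all earlier ones are equal, which matches the arrow-separated order above.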
Batch Operations
- Scope Selection: Choose to apply quick actions to "Current page" or "All pages"
- Batch Deletion: Safely delete selected files (moves to recycle bin/trash using send2trash)
- Delete Counter: Real-time count of files selected for deletion
🔧 Advanced Features
File Management
- Multi-folder Scanning: Scan multiple folders simultaneously
- Folder Memory: Remembers previously scanned folder paths
- System Directory Skipping: Automatically skips system directories (Windows/macOS)
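One way to skip system directories during a multi-folder scan is to prune them while walking the tree. This is a sketch only: the set of directory names below is an illustrative assumption and not the list CloneWiper actually uses.

```python
import os
import tempfile

# Hypothetical skip list: a few well-known Windows and macOS system folders.
SKIP_DIR_NAMES = {
    "$recycle.bin",
    "system volume information",
    ".trashes",
    ".spotlight-v100",
    ".fseventsd",
}


def iter_files(roots):
    """Yield file paths under every root, never descending into skipped directories."""
    for root in roots:
        for dirpath, dirnames, filenames in os.walk(root):
            # Editing dirnames in place tells os.walk not to enter those folders.
            dirnames[:] = [d for d in dirnames if d.lower() not in SKIP_DIR_NAMES]
            for name in filenames:
                yield os.path.join(dirpath, name)


# Demo: one normal folder and one system folder, each containing a file.
with tempfile.TemporaryDirectory() as tmp:
    for folder, name in (("photos", "a.jpg"), ("$RECYCLE.BIN", "b.jpg")):
        os.makedirs(os.path.join(tmp, folder))
        open(os.path.join(tmp, folder, name), "wb").close()
    found = sorted(os.path.relpath(p, tmp) for p in iter_files([tmp]))

print(found)  # only the file under "photos" is reported
```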
Sorting and Grouping
- Multiple Sorting Options:
- Count (High to Low / Low to High)
- File Size (Large to Small / Small to Large)
- Filename (A-Z / Z-A)
- Date (Newest First / Oldest First)
Safe Deletion
- Recycle Bin Support: Uses send2trash to move files to recycle bin/trash
- Confirmation Dialog: Shows confirmation dialog before deletion to prevent accidental deletion
- Auto Refresh: Automatically refreshes display and updates results after deletion
🌐 Cross-Platform Support
Supported Operating Systems
- Windows 10/11: Full support, can build EXE executable
- macOS: Can run from source code (python3 main.py), executable build not currently supported
Platform-Specific Optimizations
- Windows: Uses Windows-specific window flags for optimization
📦 Technical Specifications
Core Dependencies
- PySide6 (≥6.5.0): Qt for Python, modern UI framework
- Pillow (≥10.0.0): Image processing library
- ImageHash (≥4.3.0): Perceptual hashing algorithms
- rawpy (≥0.19.0): RAW file processing
- send2trash (≥1.8.0): Safe file deletion
- PyMuPDF (≥1.23.0): PDF/EPUB/MOBI/AZW3 document processing (required for EPUB/MOBI/AZW3, fallback for PDF)
- mutagen (≥1.47.0): Music album art and metadata extraction
Optional Dependencies (Recommended for Enhanced Features)
- opencv-python (≥4.8.0): Video thumbnail extraction
- pypdfium2 (≥0.20.0): High-quality PDF rendering (preferred over PyMuPDF for PDF files)
Performance Optimizations
- Multi-threaded Scanning: Fully utilizes multi-core CPUs
- Asynchronous Thumbnail Loading: Doesn't block UI responsiveness
- Smart Caching: Avoids redundant hash calculations
- Memory Optimization: Timely release of image resources
🐛 Bug Fixes & Improvements
This Version Fixes
- ✅ Fixed truncated image file handling
- ✅ Improved multi-algorithm hash error handling (still usable when some algorithms fail)
- ✅ Fixed "Keep Best" logic to be consistent with "Keep Newest/Oldest"
- ✅ Fixed Windows-specific window flags for cross-platform compatibility
- ✅ Improved RAW/JPEG timestamp correlation logic
- ✅ Optimized UI element alignment and font size consistency
User Experience Improvements
- ✅ Auto-reset selection state and pagination when starting new scan
- ✅ Improved delete button counter display
- ✅ Optimized sort dropdown text center alignment
- ✅ Improved thumbnail loading progress display
- ✅ Unified button and control styles
📝 Usage Instructions
Basic Usage Workflow
- Add Scan Folders: Click the "+ Add Folder" button to select folders to scan
- Choose Scan Mode:
- Check "Multi-Algorithm Perceptual Hash" for intelligent similar file detection
- Uncheck for precise MD5 matching only
- Start Scanning: Click the "Start Scanning" button
- View Results: After scanning completes, duplicate files are displayed in groups
- Select Files to Delete:
- Manually click file cards to select
- Or use quick action buttons (Keep Newest/Oldest/Best/RAW)
- Delete Files: Click the "Delete" button to confirm deletion
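To make the MD5-only scan mode concrete: exact matching groups files only when their bytes are identical, so a re-saved or resized copy would not be caught. A minimal sketch, using in-memory bytes instead of real files and a hypothetical function name:

```python
import hashlib
from collections import defaultdict


def group_exact_duplicates(files):
    """files maps name -> content bytes; return groups of names with identical content."""
    by_digest = defaultdict(list)
    for name, data in files.items():
        by_digest[hashlib.md5(data).hexdigest()].append(name)
    # Only digests shared by two or more files form a duplicate group.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


files = {
    "a.jpg": b"\xff\xd8same",
    "copy of a.jpg": b"\xff\xd8same",
    "a_resaved.jpg": b"\xff\xd8different",
}
groups = group_exact_duplicates(files)
print(groups)  # [['a.jpg', 'copy of a.jpg']]
```

The re-saved file is left out of the group, which is the case the perceptual-hash mode is meant to cover.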
Quick Action Descriptions
- Keep Newest: Keeps the most recently modified file in each duplicate group
- Keep Oldest: Keeps the oldest file in each duplicate group
- Keep Best: Keeps the highest resolution image in each duplicate group (only shown when applicable)
- Keep RAW: Keeps RAW files in RAW/JPEG mixed groups (only shown when applicable)
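The RAW/JPEG handling can be pictured as two steps: pair files whose timestamps nearly coincide, then mark the JPEG side of each pair for deletion. The sketch below is an assumption-laden illustration: the 2-second tolerance, the extension sets, and the function names are hypothetical, and the application's real correlation logic may differ.

```python
import os

RAW_EXTS = {".cr2", ".nef", ".arw", ".dng"}   # subset of the supported RAW formats
JPEG_EXTS = {".jpg", ".jpeg"}


def pair_raw_jpeg(files, tolerance_s=2.0):
    """files is a list of (path, timestamp); return (raw_path, jpeg_path) pairs."""
    def ext(path):
        return os.path.splitext(path)[1].lower()

    raws = [(p, t) for p, t in files if ext(p) in RAW_EXTS]
    jpegs = [(p, t) for p, t in files if ext(p) in JPEG_EXTS]
    return [
        (raw_path, jpeg_path)
        for raw_path, raw_time in raws
        for jpeg_path, jpeg_time in jpegs
        if abs(raw_time - jpeg_time) <= tolerance_s
    ]


def keep_raw(pairs):
    """Return the JPEG paths to mark for deletion, keeping every RAW file."""
    return sorted({jpeg for _raw, jpeg in pairs})


files = [
    ("IMG_0001.CR2", 1000.0),
    ("IMG_0001.JPG", 1000.5),   # shot together with the RAW file
    ("IMG_0002.JPG", 1600.0),   # unrelated JPEG, no RAW partner
]
pairs = pair_raw_jpeg(files)
print(pairs)            # [('IMG_0001.CR2', 'IMG_0001.JPG')]
print(keep_raw(pairs))  # ['IMG_0001.JPG']
```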
🔮 Future Plans
- Support cloud storage scanning
📄 License
This project is licensed under the MIT License - see the LICENSE file for details
🙏 Acknowledgments
Thanks to the following open-source projects:
- PySide6 - Qt for Python
- Pillow - Image processing
- ImageHash - Perceptual hashing
- PyMuPDF - PDF/EPUB rendering
- pypdfium2 - High-quality PDF rendering
- Material Design 3 - Design guidelines