lukasgrabowicz edited this page Apr 2, 2022 · 1 revision

Welcome to the filegrunt wiki!

Project Overview

The project aims to identify duplicate image files in large image repositories. The primary use case emerged from the need to organise and merge multiple offline and online archives, created over a period of twenty years, that contained duplicate image files. With more than 30,000 images, identifying duplicates by sight would be time-consuming and prone to human error. Copying and merging the files would also be risky, because image file names cannot be relied on to be unique.

The approach taken was to build an application that could run on Linux, macOS and Windows and let the user specify the folders in which duplicate image files are to be identified. The initial test compares file names and sizes. If they match, the images themselves are compared, and if the images also match, the file being compared is marked as a duplicate image file. The user can then review the images by sight and move the duplicates to a separate location for disposal or archival.
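The two-stage check described above can be sketched in C++17 as follows. This is a minimal illustration, not the filegrunt implementation: the function names `sameBytes` and `isLikelyDuplicate` are hypothetical, and a plain byte-for-byte comparison stands in for the image comparison step, which in the project is handled with OpenCV.

```cpp
#include <algorithm>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical stand-in for the image comparison step:
// returns true when both files contain exactly the same bytes.
bool sameBytes(const fs::path& a, const fs::path& b) {
    std::ifstream fa(a, std::ios::binary);
    std::ifstream fb(b, std::ios::binary);
    if (!fa || !fb) return false;  // an unreadable file is never a match
    return std::equal(std::istreambuf_iterator<char>(fa),
                      std::istreambuf_iterator<char>(),
                      std::istreambuf_iterator<char>(fb),
                      std::istreambuf_iterator<char>());
}

// Hypothetical helper: cheap checks first (file name, then size),
// and only when both match is the expensive content comparison run.
bool isLikelyDuplicate(const fs::path& candidate, const fs::path& reference) {
    if (candidate.filename() != reference.filename()) return false;

    std::error_code ec;
    const auto candidateSize = fs::file_size(candidate, ec);
    if (ec) return false;
    const auto referenceSize = fs::file_size(reference, ec);
    if (ec) return false;
    if (candidateSize != referenceSize) return false;

    return sameBytes(candidate, reference);
}
```

Ordering the checks this way means most non-duplicates are rejected without opening either file, which matters when tens of thousands of images are scanned.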

The project has progressed to the point where the application runs on Linux and successfully identifies duplicate image files in both a large test data set and a small repository designed for unit testing. The ability to view duplicate image files and move the duplicates has yet to be developed. Cross-platform compilation also remains to be developed; Linux is the only OS currently catered for. At this stage the code was shared on GitHub at https://github.com/dcreedon/filegrunt as the starting point for Group 4’s Open Source project. Following a suggestion from a team member, the repo ownership was transferred to a dedicated filegrunt repository, https://github.com/filegrunt/filegrunt .

Technology Stack

Debian Bullseye as the Linux Development environment https://www.debian.org/

JetBrains CLion - IDE for C/C++ development https://www.jetbrains.com/clion/

Qt 6 for the application user interface https://www.qt.io/product/qt6

OpenCV for image processing https://opencv.org/

CMake for managing the software build process https://cmake.org/

GCC (GNU Compiler Collection) for compiling application code https://gcc.gnu.org/

DB Browser for SQLite to view SQLite 3 tables http://sqlitebrowser.org

GitHub for repository management and collaboration https://github.com/filegrunt/filegrunt

Under Investigation:

Unit Testing/QA - https://lp.jetbrains.com/qa
