When creating an image dataset for a deep learning project, there is a good chance that the dataset contains duplicate images.
This standalone tool helps you find and remove those duplicates through a simple interface. If you are a developer,
you can easily customize the code to suit your needs.
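For developers who want a feel for what duplicate detection involves, here is a minimal sketch that flags byte-identical image files by hashing their contents. It is an illustration only and not necessarily how DIF detects duplicates; the function name `find_exact_duplicates` and the extension list are made up for this example, and the approach will miss resized or re-encoded copies.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# Hypothetical list of extensions to treat as images; adjust as needed.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}


def find_exact_duplicates(folder):
    """Group image files under `folder` whose bytes are identical."""
    groups = defaultdict(list)
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
            # Files with the same SHA-256 digest are treated as identical.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(path)
    # Keep only digests shared by more than one file.
    return [paths for paths in groups.values() if len(paths) > 1]
```

Calling `find_exact_duplicates("my_dataset")` returns a list of groups, where each group is a list of paths that share the same content.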
git clone https://github.com/rajat-1994/DIF.git
cd DIF
pip install -r requirements.txt
After installation, run the command below and you are good to go.
python app.py
NOTE: As you delete images in the interface, the tool saves a file named files.csv
in the background.
Once you have finished cleaning your dataset, you can read this CSV and filter out the deleted images.
import pandas as pd
df = pd.read_csv('files.csv')
df = df[df.is_deleted == 0]  # keep only the rows for images that were not deleted
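If you want the surviving images as a plain list, the same filter can be wrapped in a small helper. This is a sketch under assumptions: only the `is_deleted` column is confirmed above, so the `filename` column name and the `kept_images` function are hypothetical. Check the header of your own files.csv and pass the real column name.

```python
import pandas as pd


def kept_images(csv_path="files.csv", path_column="filename"):
    """Return the entries of `path_column` for images not marked as deleted.

    `path_column` is an assumed column name; change it to match files.csv.
    """
    df = pd.read_csv(csv_path)
    kept = df[df["is_deleted"] == 0]
    return kept[path_column].tolist()
```

The returned list can then be fed to your data loader, or used to copy the remaining images into a clean folder.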