Image Deduplicator (imagededup)

imagededup is a Python package that simplifies the task of finding exact and near duplicates in an image collection.

This package provides hashing algorithms, which are particularly good at finding exact duplicates, as well as convolutional neural networks (CNNs), which are also adept at finding near duplicates. An evaluation framework is also provided to judge the quality of deduplication for a given dataset.
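As a rough illustration of how hash-based duplicate detection can work, the sketch below compares image hashes by Hamming distance (the number of differing bits) and pairs up images whose distance is at most a threshold. It assumes each image has already been reduced to a 64-bit hash stored as a 16-character hex string; `hamming_distance`, `find_near_duplicates` and the threshold value are hypothetical names and an illustrative setting, not the package's own implementation.

```python
from itertools import combinations
from typing import Dict, List


def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Count the bits that differ between two hex-encoded hashes."""
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def find_near_duplicates(encodings: Dict[str, str],
                         max_distance: int = 10) -> Dict[str, List[str]]:
    """Map each filename to the filenames whose hash is within max_distance bits.

    Hypothetical helper: it compares every pair of images (O(n^2)), which is
    fine for a sketch but slow for large collections.
    """
    duplicates: Dict[str, List[str]] = {name: [] for name in encodings}
    for (name_a, hash_a), (name_b, hash_b) in combinations(encodings.items(), 2):
        if hamming_distance(hash_a, hash_b) <= max_distance:
            duplicates[name_a].append(name_b)
            duplicates[name_b].append(name_a)
    return duplicates


if __name__ == "__main__":
    # Made-up hashes: the first two differ in a single bit, the third in all 64.
    example = {
        "a.jpg": "0000000000000000",
        "b.jpg": "0000000000000001",
        "c.jpg": "ffffffffffffffff",
    }
    print(find_near_duplicates(example))
    # {'a.jpg': ['b.jpg'], 'b.jpg': ['a.jpg'], 'c.jpg': []}
```

A threshold of 0 only matches identical hashes (exact duplicates); raising it lets visually similar images match as well, at the cost of more false positives.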

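The package provides the following functionality:

  • Finding duplicates in a directory using one of the following algorithms: Convolutional Neural Network (CNN), Perceptual hashing (PHash), Difference hashing (DHash), Wavelet hashing (WHash) and Average hashing (AHash)
  • Generating encodings for images using one of the above algorithms
  • Evaluating the effectiveness of deduplication against a ground truth mapping
  • Plotting the duplicates found for a given image file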

Detailed documentation for the package can be found at: https://idealo.github.io/imagededup/

imagededup is compatible with Python 3.6 and is distributed under the Apache 2.0 license.

βš™οΈ Installation

There are two ways to install imagededup:

  • Install imagededup from PyPI (recommended):

```bash
pip install imagededup
```

⚠️ Note: imagededup comes with CPU-only TensorFlow support by default. If you have GPUs, install the TensorFlow version with GPU support instead, especially if you use the CNN method to find duplicates, since it is considerably faster. See the TensorFlow installation guide for details.
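After installing a GPU-enabled TensorFlow build, you may want to confirm that TensorFlow actually detects the GPU. The helper below is a hypothetical sketch, not part of imagededup; it assumes TensorFlow 2.x, where `tf.config.list_physical_devices` is available (TensorFlow 2.0 only exposes it under `tf.config.experimental`), and it simply returns `False` if TensorFlow is not installed.

```python
def tensorflow_sees_gpu() -> bool:
    """Return True if the installed TensorFlow reports at least one GPU."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed, so no GPU can be used through it.
        return False
    # TensorFlow >= 2.1 exposes list_physical_devices directly;
    # 2.0 only provides it under tf.config.experimental.
    list_devices = getattr(tf.config, "list_physical_devices", None)
    if list_devices is None:
        list_devices = tf.config.experimental.list_physical_devices
    return len(list_devices("GPU")) > 0


if __name__ == "__main__":
    print("GPU available to TensorFlow:", tensorflow_sees_gpu())
```

If this prints `False` on a machine with a GPU, the CPU-only TensorFlow package is most likely still the one being imported.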

  • Install imagededup from the GitHub source:

```bash
git clone https://github.com/idealo/imagededup.git
cd imagededup
python setup.py install
```

🚀 Quick Start

To find duplicates in an image directory using perceptual hashing, the following workflow can be used:

  • Import the perceptual hashing method:

```python
from imagededup.methods import PHash
phasher = PHash()
```

  • Generate encodings for all images in an image directory:

```python
encodings = phasher.encode_images(image_dir='path/to/image/directory')
```

  • Find duplicates using the generated encodings:

```python
duplicates = phasher.find_duplicates(encoding_map=encodings)
```

  • Plot the duplicates obtained for a given file (e.g., 'ukbench00120.jpg') using the duplicates dictionary:

```python
from imagededup.utils import plot_duplicates
plot_duplicates(image_dir='path/to/image/directory',
                duplicate_map=duplicates,
                filename='ukbench00120.jpg')
```

The output is a plot showing 'ukbench00120.jpg' alongside the duplicates found for it.

The complete code for the workflow is:

```python
from imagededup.methods import PHash
from imagededup.utils import plot_duplicates

phasher = PHash()

# Generate encodings for all images in an image directory
encodings = phasher.encode_images(image_dir='path/to/image/directory')

# Find duplicates using the generated encodings
duplicates = phasher.find_duplicates(encoding_map=encodings)

# Plot duplicates obtained for a given file using the duplicates dictionary
plot_duplicates(image_dir='path/to/image/directory',
                duplicate_map=duplicates,
                filename='ukbench00120.jpg')
```
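The duplicates dictionary can then be post-processed, for example to decide which files to delete. The sketch below assumes the dictionary maps each filename to a list of its duplicate filenames (e.g. `{'a.jpg': ['b.jpg'], 'b.jpg': ['a.jpg']}`) and greedily keeps the alphabetically first file of each group. `files_to_remove` is a hypothetical helper, not part of the imagededup API, and keep-first is only one possible policy.

```python
from typing import Dict, List, Set


def files_to_remove(duplicate_map: Dict[str, List[str]]) -> Set[str]:
    """Pick files to delete so that one file per duplicate group is kept.

    Files are visited in sorted order; a file not already marked for removal
    is kept, and all of its listed duplicates are marked for removal.
    """
    keep: Set[str] = set()
    remove: Set[str] = set()
    for filename in sorted(duplicate_map):
        if filename in remove:
            continue
        keep.add(filename)
        for duplicate in duplicate_map[filename]:
            if duplicate not in keep:
                remove.add(duplicate)
    return remove


if __name__ == "__main__":
    example = {
        "a.jpg": ["b.jpg", "c.jpg"],
        "b.jpg": ["a.jpg"],
        "c.jpg": ["a.jpg"],
        "d.jpg": [],
    }
    print(sorted(files_to_remove(example)))  # ['b.jpg', 'c.jpg']
```

Because near-duplicate relations are not always transitive (A may match B and B match C without A matching C), a different visiting order can keep a different set of files.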

For more examples, refer to the examples directory of the repository.

For more detailed usage of the package functionality, refer to: https://idealo.github.io/imagededup/

🤝 Contribute

We welcome all kinds of contributions. See the Contribution guide (CONTRIBUTING.md) for more details.

πŸ“ Citation

Please cite imagededup in your publications if it is useful for your research. Here is an example BibTeX entry:

```bibtex
@misc{idealods2019imagededup,
  title={Imagededup},
  author={Tanuj Jain and Christopher Lennan and Zubin John and Dat Tran},
  year={2019},
  howpublished={\url{https://github.com/idealo/imagededup}},
}
```

πŸ— Maintainers

© Copyright

See LICENSE for details.
