Visually similar image search using pre-trained neural networks and approximate nearest neighbour lookup

Visually Similar Image Search

Finding similar images is useful in many situations; for example, it lets you retrieve the mountain photos from a large personal gallery. In this project, we implement such a search using a nearest neighbour approach over image features produced by pretrained neural networks.

Motivation

Neural networks learn to extract features from data without being told explicitly which features to look for. These features are not only relevant to the problems the networks were trained on; they may also be useful for other tasks. Here we use three pretrained networks, namely VGG16, ResNet152, and DenseNet, and take their features as a representation of each image. We hope that visually similar images have similar representations; in other words, that they lie close together in the feature space.


Fig. 1: Project Overview

This representation enables us to perform nearest neighbour search. We use Annoy, which performs approximate nearest neighbour search.
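
To make the idea of closeness in feature space concrete, here is a minimal sketch of an exact, brute-force nearest neighbour lookup in pure Python. It is not the project's implementation: the function names, the toy three-dimensional feature vectors, and the choice of Euclidean distance are all assumptions made for illustration. Annoy approximates this kind of lookup so that it stays fast on large collections.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbours(features, query_id, k):
    """Return the ids of the k items closest to query_id, excluding itself.

    `features` maps an image id to its feature vector (hypothetical layout).
    """
    query = features[query_id]
    scored = [
        (euclidean(query, vec), item_id)
        for item_id, vec in features.items()
        if item_id != query_id
    ]
    scored.sort()  # smallest distance first
    return [item_id for _, item_id in scored[:k]]


# Toy features; real network features have hundreds or thousands of dimensions.
features = {
    "mountain_1": [0.9, 0.1, 0.0],
    "mountain_2": [0.8, 0.2, 0.1],
    "portrait_1": [0.1, 0.9, 0.3],
    "portrait_2": [0.0, 0.8, 0.4],
}

print(nearest_neighbours(features, "mountain_1", 1))
```

The brute-force version compares the query against every item, which is the cost an approximate index is meant to avoid.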

Analysis Tool


Fig. 1: Project Overview

We have developed a website that provides an interface for exploring the results of our experiments in an informative and reproducible way. If you are interested in digging into the results yourself, please give it a try 😎.

Experiment 1: Visually Similar Artworks

In this first part, we randomly sample 5000 artworks from MoMA's collection. The goal is to explore how the images of these artworks are projected onto the feature space.


Fig. 2: Visually Similar Artworks

From Fig. 2, we can see that the nearest neighbours in these feature spaces are related in some way to the given images. For example, for the artwork Pettibon with Strings, the similar artworks returned by ResNet152 and DenseNet contain faces. Please explore our analysis tool for more examples.

Experiment 2: Recovering Perturbed Artworks

The goal of this experiment is to verify whether the neural network features of close images, that is, images that are semantically the same to us, are also more or less the same. As shown in Fig. 3, we use five profiles to perturb 1000 original images from MoMA's collection, producing close images for the experiment.


Fig. 3: Perturbation Profiles

If the representations of an image and its perturbed versions are similar, we should therefore be able to recover those perturbed versions when performing nearest neighbour search.
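
The recovery check can be sketched as counting how many of an image's perturbed versions appear among its k nearest neighbours. The id scheme (`"img42:blur"` and so on), the profile names, and the function name below are hypothetical and are not taken from the project's code.

```python
def count_recovered(neighbour_ids, perturbed_ids):
    """Count how many returned neighbours are perturbed versions of the query."""
    perturbed = set(perturbed_ids)
    return sum(1 for item_id in neighbour_ids if item_id in perturbed)


# Hypothetical ids: five perturbed versions of the original image "img42".
perturbed_ids = [
    "img42:profile_1",
    "img42:profile_2",
    "img42:profile_3",
    "img42:profile_4",
    "img42:profile_5",
]

# Hypothetical result of a k=5 nearest neighbour query for "img42".
neighbour_ids = [
    "img42:profile_2",
    "img42:profile_1",
    "img07",
    "img42:profile_4",
    "img42:profile_5",
]

print(count_recovered(neighbour_ids, perturbed_ids))  # 4 of 5 recovered
```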


Fig. 4: Recovering Perturbed Artworks

As shown in Fig. 4, the two original images and their perturbed versions are close in the feature spaces, particularly in the feature space of VGG16. For these two examples, VGG16's feature space allows us to recover 4/5 of the perturbed versions, while the feature spaces of the other networks yield a ratio of 3/5.

With this setting, we can also quantitatively measure the performance of the results by looking at precision, recall, and f1-score.

  • Precision: no. correctly returned samples / no. returned samples
  • Recall: no. correctly returned samples / no. all relevant samples in data
  • f1-score: 2*(precision*recall)/(precision + recall)

In this case, the no. correctly returned samples is simply the number of an image's perturbed versions that are returned; the no. all relevant samples in data is 5, because we have five perturbation profiles; and the no. returned samples is k, which takes the values 1, 3, and 5.
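
As a worked example of the definitions above, the following sketch computes the three scores for a single query. The function name is hypothetical, and returning an f1-score of 0 when nothing is recovered is an assumed convention, not something stated in this README.

```python
def precision_recall_f1(n_correct, k, n_relevant=5):
    """Scores for one query.

    n_correct:  perturbed versions found among the returned neighbours
    k:          number of returned neighbours (1, 3, or 5 in the experiment)
    n_relevant: perturbed versions per image (five perturbation profiles)
    """
    precision = n_correct / k
    recall = n_correct / n_relevant
    if precision + recall == 0:
        f1 = 0.0  # assumed convention when nothing is recovered
    else:
        f1 = 2 * (precision * recall) / (precision + recall)
    return precision, recall, f1


# Recovering 4 of the 5 perturbed versions with k = 5:
print(precision_recall_f1(4, 5))

# With k = 3, even a perfect result cannot reach full recall:
print(precision_recall_f1(3, 3))
```

Because only k neighbours are returned, recall is capped at k/5, so for k = 1 and k = 3 it stays below 1 even when every returned neighbour is correct.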


Fig. 5: Averaged Precision, Recall, and F1-score

From Fig. 5, we can see that VGG16 performs quite well on average, and better than the other architectures for the purpose of this study. The large variation for ResNet152 and DenseNet seems to suggest that there are some cases in which they can embed close images at nearby locations in the feature space. This would be a good subject for further analysis.

Future Work

  • Try with more samples. Maybe 10,000 artworks?
  • Use features from autoencoders (vanilla, VAE)
  • Train autoencoders with the following scheme: perturbed image -> autoencoder -> original image.

Development

Please refer to DEVELOPMENT.md.

Acknowledgements
