# Identifying Similar Images with TensorFlow

Douglas Duhaime wrote an [excellent tutorial](http://douglasduhaime.com/posts/identifying-similar-images-with-tensorflow.html). This notebook is my experiment in following his steps, using a GitHub repo and a Jupyter Binder.

The repo is on [GitHub](https://github.com/o-date/Identifying-Similar-Images-with-TensorFlow). The Python dependencies are listed in the `requirements.txt` file, which Binder installs when it builds the environment, so we don't have to `!pip install` anything.

I have a bunch of images in `images/`. In Duhaime's tutorial, he had a folder with 2000 images. I'm just going with 25 here because a) I'm impatient and b) I don't know how many I can push to GitHub.

Run the modified classify script (which pulls out the second-to-last layer of the network) and write its output to the new `image_vectors` directory. Then cluster, then project. Then run the affinity propagation notebook, which is written in R.

## Acknowledgements

My thanks to Douglas Duhaime for his clear explanation of how to use TensorFlow to explore images this way. Katherine Davidson, Eric Hobson, and Ian Davidson tried these notebooks out under a variety of circumstances, troubleshooting all the bugs that emerged as we went.


## Now Let's Identify Similar Images

In [1]:
!python classify_images.py "images/*"

  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
  np_resource = np.dtype([("resource", np.ubyte, 1)])

>> Downloading inception-2015-12-05.tgz 100.0%
Succesfully downloaded inception-2015-12-05.tgz 88931400 bytes.
Instructions for updating:
Use tf.gfile.GFile.
W0205 15:45:31.917876 140354482947904 deprecation.py:323] From classify_images.py:154: FastGFile.__init__ (from tensorflow.python.platform.gfile) is deprecated and will be removed in a future version.
Instructions for updat

Now cluster with his nearest neighbours script. As Duhaime puts it: "Each of those outfiles will identify the 30 images most similar to the given image. To search for more or fewer nearest neighbors, one just needs to update the n_nearest_neighbors variable in the nearest neighbors script."

In [2]:
!python cluster_vectors.py
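
To make the idea of "most similar images" concrete, here is a minimal brute-force sketch in plain NumPy. It is a stand-in for illustration, not the method `cluster_vectors.py` uses: the function name `nearest_neighbors` and the toy vectors are made up, and cosine similarity is an assumed choice of measure. The `n_nearest_neighbors` argument simply mirrors the variable named in the quote above.

```python
import numpy as np

def nearest_neighbors(vectors, query_index, n_nearest_neighbors=3):
    """Return indices of the rows most similar to vectors[query_index]."""
    vectors = np.asarray(vectors, dtype=float)
    # normalise each row so a dot product equals cosine similarity
    # (assumes no vector is all zeros)
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / norms
    similarity = unit @ unit[query_index]
    # sort from most to least similar, dropping the query image itself
    order = np.argsort(-similarity)
    return [int(i) for i in order if i != query_index][:n_nearest_neighbors]

# toy 2-D "image vectors"; the real ones from the network are much longer
toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
neighbors = nearest_neighbors(toy_vectors, 0, n_nearest_neighbors=2)
print(neighbors)
```

Comparing every vector against every other is fine for 25 images; for thousands, an approximate nearest neighbour index is the more usual choice.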

And now we run the t-SNE script, which projects the image vectors down to two dimensions.

In [3]:
# ok let's project this stuff!
from sklearn.manifold import TSNE
import numpy as np
import glob, json, os

# create datastores
vector_files = []
image_vectors = []
chart_data = []
# note the maximum here - change up for total number you've got!
maximum_imgs = 25

# build a list of image vectors
vector_files = glob.glob('image_vectors/*.npz')[:maximum_imgs]
for c, i in enumerate(vector_files):
  image_vectors.append(np.loadtxt(i))
  print(' * loaded', c, 'of', len(vector_files), 'image vectors')

# build the tsne model on the image vectors
print('building tsne model')
model = TSNE(n_components=2, random_state=0)
np.set_printoptions(suppress=True)
fit_model = model.fit_transform( np.array(image_vectors) )
 
# store the coordinates of each image in the chart data
for c, i in enumerate(fit_model):
  image_name = os.path.basename(vector_files[c]).replace('.npz', '') 
  chart_data.append({
    'image': os.path.join('images', image_name),
    'x': i[0],
    'y': i[1]
  })

 * loaded 0 of 25 image vectors
 * loaded 1 of 25 image vectors
 * loaded 2 of 25 image vectors
 * loaded 3 of 25 image vectors
 * loaded 4 of 25 image vectors
 * loaded 5 of 25 image vectors
 * loaded 6 of 25 image vectors
 * loaded 7 of 25 image vectors
 * loaded 8 of 25 image vectors
 * loaded 9 of 25 image vectors
 * loaded 10 of 25 image vectors
 * loaded 11 of 25 image vectors
 * loaded 12 of 25 image vectors
 * loaded 13 of 25 image vectors
 * loaded 14 of 25 image vectors
 * loaded 15 of 25 image vectors
 * loaded 16 of 25 image vectors
 * loaded 17 of 25 image vectors
 * loaded 18 of 25 image vectors
 * loaded 19 of 25 image vectors
 * loaded 20 of 25 image vectors
 * loaded 21 of 25 image vectors
 * loaded 22 of 25 image vectors
 * loaded 23 of 25 image vectors
 * loaded 24 of 25 image vectors
building tsne model
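
One thing to watch with a small collection: t-SNE's `perplexity` parameter defaults to 30, and newer scikit-learn releases appear to reject a perplexity that is not smaller than the number of samples. With only 25 images that could raise an error. The sketch below picks a smaller value; the helper names `safe_perplexity` and `project` are hypothetical, and the "one third of the other points" rule is just an assumed heuristic, not something from the tutorial.

```python
import numpy as np

def safe_perplexity(n_samples, default=30.0):
    """Pick a t-SNE perplexity that stays below the number of samples."""
    # heuristic (an assumption, not a scikit-learn rule): about a third of
    # the other points, capped at the library default of 30
    return max(1.0, min(default, (n_samples - 1) / 3.0))

def project(image_vectors):
    """Project image vectors to 2-D with a perplexity suited to small sets."""
    # imported here so safe_perplexity can be used without scikit-learn
    from sklearn.manifold import TSNE
    data = np.array(image_vectors)
    model = TSNE(n_components=2, random_state=0,
                 perplexity=safe_perplexity(len(data)))
    return model.fit_transform(data)

print(safe_perplexity(25))    # a small collection like the one here
print(safe_perplexity(2000))  # a collection the size of the tutorial's
```

Calling `project(image_vectors)` would then stand in for the `model.fit_transform(...)` line in the cell above.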


Then we write the `chart_data` to file.

In [4]:
# write one chart_data entry per line; the file is closed automatically
with open('image_tsne_projections.json', 'w') as thefile:
  for item in chart_data:
    thefile.write("%s\n" % item)

Now go view your work by clicking on the 'jupyter' button above! Outputs, folders, etc. are all visible there.

## Moving on to plotting affinity groups 

Before you can work with that JSON file, though, you need to add a `[` at the start of the file, a `]` at the end, and a `,` at the end of each line (except the last one).

If you don't do this, the JSON import in the affinity propagation notebook will throw an error.
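
An alternative to editing the file by hand is to let Python's `json` module write a proper array in the first place. This is a sketch under assumptions: `write_chart_data` is a hypothetical helper, the example entries and output filename are invented, and I'm assuming the affinity propagation notebook only needs a JSON array of objects with `image`, `x` and `y` keys.

```python
import json

def write_chart_data(chart_data, path):
    """Write chart_data as a single JSON array that JSON parsers accept."""
    # float() turns NumPy scalars into plain Python floats, which json can encode
    records = [
        {'image': item['image'], 'x': float(item['x']), 'y': float(item['y'])}
        for item in chart_data
    ]
    with open(path, 'w') as outfile:
        json.dump(records, outfile, indent=2)

# stand-in data with the same keys as the chart_data built earlier
example = [
    {'image': 'images/a.jpg', 'x': 1.5, 'y': -2.0},
    {'image': 'images/b.jpg', 'x': 0.25, 'y': 3.0},
]
write_chart_data(example, 'example_tsne_projections.json')
```

In the notebook itself this would be called as `write_chart_data(chart_data, 'image_tsne_projections.json')`, after which the brackets and commas are already in place.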

## Moving on to generating a network visualization

Try the [Visualizing Nearest Neighbors as a Network notebook](Visualizing-Nearest-Neighbors-as-a-Network.ipynb).