A stroll through the neighborhood manifold
====

When we created the Snake-Eyes dataset, one of the ideas was to offer so many points that data augmentation might not be necessary to achieve great results. This is also one of the reasons the resolution is so small compared to other popular datasets. In this kernel we run a few tests to check whether our sampling volume is really large enough to let us investigate the structure of the underlying 3- or 6-dimensional manifolds on which these 400-dimensional datapoints theoretically lie.

We are basically going to retrieve some nearest neighbors and look at the covariance matrix of each neighborhood, which should have rank three for single-dice images. The eigenvectors in a neighborhood should also look like the Jacobian of the image relative to the parameters of the image synthesis, namely the position and orientation of the dice, leaving the face aside. It is interesting to compare what we are going to see here, the PCA of very similar images that undergo restricted forms of transformation, to the PCA of the whole dataset, which [we covered in another kernel](https://www.kaggle.com/nicw102168/trying-out-some-pca-nmf-and-knn).

In [None]:
import numpy as np
import numpy.linalg
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['image.cmap'] = 'gray'

This is a generator to read our data outputting a single image each time. This is useful if you can't fit all the data in your computer's memory.

In [None]:
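def image_generator(*filenames):
    """Yield one 400-pixel image at a time from files of 401-byte records."""
    for filename in filenames:
        with open(filename, 'rb') as fp:
            for _ in range(100000):
                # np.frombuffer replaces the deprecated binary mode of np.fromstring;
                # the leading byte of each record is dropped, the rest are pixels.
                yield np.frombuffer(fp.read(401), dtype=np.uint8)[1:].astype(float)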

def img_gen():
    return image_generator(*["../input/snake-eyes/snakeeyes_{:02d}.dat".format(nn) for nn in range(10)])
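The record layout used by the generator can be tried out without the real files. The sketch below is only an illustration: `read_records` and the two fabricated records are hypothetical, and treating the leading byte as a label is an assumption (the generator above simply discards it).

```python
import io
import numpy as np

RECORD_SIZE = 401  # 1 leading byte + 20*20 pixel bytes, as read by image_generator

def read_records(fp, record_size=RECORD_SIZE):
    """Yield (leading_byte, pixels) pairs until the stream runs out."""
    while True:
        chunk = fp.read(record_size)
        if len(chunk) < record_size:
            return  # end of stream (or a truncated trailing record)
        raw = np.frombuffer(chunk, dtype=np.uint8)
        # Assumption: the leading byte is a label; the pixels become floats.
        yield int(raw[0]), raw[1:].astype(float)

# Two fabricated records standing in for a real .dat file.
fake_file = io.BytesIO(bytes([3] + [0] * 400) + bytes([5] + [255] * 400))
records = list(read_records(fake_file))
print(len(records), records[0][0], records[1][1].shape)
```

Unlike the fixed `range(100000)` loop, this variant stops when the stream ends, which may be convenient if a file holds a different number of records.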

Just a couple of functions to display our nice dice images.

In [None]:
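def plotdice(x):
    """Show a flattened 400-pixel vector as a 20x20 grayscale image."""
    plt.imshow(x.reshape(20,20))
    plt.axis('off')

def plotdicez(x):
    """Show a signed difference image on a diverging red/blue map centered at zero."""
    plt.imshow(x.reshape(20,20), cmap=plt.cm.RdBu, vmin=-128, vmax=128)
    plt.axis('off')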

Let's pick the first 6 images in the dataset to use as references.

In [None]:
refs = np.array([x for _, x in zip(range(6), img_gen())])

for n in range(6):
    plt.subplot(1,6,n+1)
    plotdice(refs[n])

Neat. Now we'll do the big-data step and go through our whole dataset looking for matching images. Notice that our matching criterion is pretty tight: we are not looking for something below a certain Euclidean distance, for instance. We want images whose maximum absolute difference over all pixels is below 128. That means if a pixel is 255 in our reference image, it can drop to 128 at most. A white pixel cannot turn actual black, or vice versa.

We are really looking for images that look very much alike, which would mean they sit close together in the manifold that contains the images for that given class/dice face. It would also mean the original parameters used to synthesize the images (position and rotation) are very close.

In [None]:
query = [ww for ww in img_gen() if (np.max(np.abs(refs - ww), axis=1) < 128).any()]
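To see how the max-difference criterion differs from a Euclidean one, here is a small sketch on made-up 400-pixel vectors. The helper name `max_abs_diff` and the two test images are hypothetical and not part of the dataset.

```python
import numpy as np

def max_abs_diff(a, b):
    """Largest per-pixel absolute difference (Chebyshev / L-infinity distance)."""
    return np.max(np.abs(a - b))

ref = np.zeros(400)

# One pixel flips from black to white: small Euclidean distance, rejected anyway.
one_pixel = ref.copy()
one_pixel[0] = 255.0

# Every pixel drifts by 100 gray levels: large Euclidean distance, still accepted.
uniform_shift = ref + 100.0

for name, img in [("one_pixel", one_pixel), ("uniform_shift", uniform_shift)]:
    print(name, np.linalg.norm(img - ref), max_abs_diff(img, ref) < 128)
```

The criterion therefore cares about the worst single pixel, not about the total amount of change, which is what makes it strict about white pixels turning black.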

Let's pick from our cache just the images relative to our first reference one.

In [None]:
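qq = query[0]  # the first cached image is the first reference itself

# The first match is qq itself (distance 0), so [1:] drops it.
sim = [ww for ww in query if np.max(np.abs((qq - ww))) < 128][1:]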

plt.figure(figsize=(8,6))
for k, ww in enumerate(sim):
    if k >= 20:
        break
    plt.subplot(4, 5, k + 1)
    plotdice(ww)

It seems we got what we wanted: all images look pretty similar. And we got at least 20 of them (the plot stops at 20), which hopefully is enough for our purpose of studying the manifold where this data lives.

Because this is a single-dice image, apart from the possibility of changing the face, there are only three parameters controlling the image's appearance: two related to translation and one to rotation. That would mean this neighborhood of images sits in a 3-dimensional manifold. That would mean in turn that the covariance matrix of these images should have rank 3 (approximately, since the manifold is only locally linear), and its eigenvectors should look like the derivatives of the image with respect to the three degrees of freedom. Let's first look at the differences from each image to our reference image (which should be close to the "mean" of all of them).

In [None]:
plt.figure(figsize=(8,6))
for k, ww in enumerate(sim):
    if k >= 20:
        break
    plt.subplot(4, 5, k + 1)
    plotdicez(ww - qq)

We can see the differences are mostly over the edges of the dice, and they have some kind of structure. The dice is always kind of moving in a certain direction, making the opposite edges red and blue, for instance. Let's look now at the singular values of our covariance matrix.

In [None]:
u, s, v = np.linalg.svd(np.cov(np.array(sim).T))

plt.figure()
plt.plot(s[:40], '-+')
plt.grid()

Indeed, we got quite a drop after the third singular value, so apparently our theory was right! Let's look now at the eigenvectors.
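The rank argument can also be checked on a synthetic stand-in, where we control the three parameters ourselves. This is only a sketch under stated assumptions: `render_blob` is a hypothetical renderer (an elongated Gaussian blob, not the actual dice synthesis), and the jitter scales are arbitrary small values.

```python
import numpy as np

def render_blob(cx, cy, theta, size=20):
    """Hypothetical stand-in for the dice renderer: a rotated, elongated Gaussian blob."""
    ys, xs = np.mgrid[0:size, 0:size].astype(float)
    dx, dy = xs - cx, ys - cy
    # Coordinates in the blob's own rotated frame.
    u = np.cos(theta) * dx + np.sin(theta) * dy
    v = -np.sin(theta) * dx + np.cos(theta) * dy
    return (255.0 * np.exp(-(u / 4.0) ** 2 - (v / 2.0) ** 2)).ravel()

rng = np.random.default_rng(0)
base = np.array([9.5, 9.5, 0.3])  # x position, y position, angle (arbitrary choice)

# Small random perturbations of the three parameters give a tight neighborhood.
jitter = rng.normal(scale=[0.02, 0.02, 0.01], size=(200, 3))
neighborhood = np.array([render_blob(*p) for p in base + jitter])

# Same analysis as in the notebook: singular values of the 400x400 covariance.
s = np.linalg.svd(np.cov(neighborhood.T), compute_uv=False)
print(s[:5] / s[0])
```

With perturbations this small the images move almost linearly with the parameters, so three singular values should dominate and the rest should come only from curvature of the manifold. On real neighbors selected by a pixel threshold the drop can be expected to be less clean.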

In [None]:
plt.figure()
for n in range(3):
    plt.subplot(1, 3, n + 1)
    plotdicez(v[n] * 400)

I would say the first one kind of models a rotation mostly, and the other two model the translations... Let's do the same analysis on other reference images and see what comes out.

In [None]:
qq = query[1]

sim = [ww for ww in query if np.max(np.abs((qq - ww))) < 128][1:]

plt.figure(figsize=(20,8))
for k, ww in enumerate(sim):
    if k >= 20:
        break
    plt.subplot(4, 10, k//5*10 + k%5 + 1)
    plotdice(ww)
    
for k, ww in enumerate(sim):
    if k >= 20:
        break
    plt.subplot(4, 10, k//5*10 + k%5 + 6)
    plotdicez(ww - qq)

u, s, v = np.linalg.svd(np.cov(np.array(sim).T))

plt.figure()
plt.plot(s[:40], '-+')
plt.grid()

plt.figure()
for n in range(3):
    plt.subplot(1, 3, n + 1)
    plotdicez(v[n] * 400)

Similar result. I would say it is easier to see in this case that the second image is mostly modeling the rotation. You can see that the top left edge changes from blue to red, and the colors around the two dots are kind of inverted, which would only be possible by rotating the dice. In the other images we quite clearly have either blue or red edges, and there are no opposite edges with the same color, which is what you would expect for components modeling translation.
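The color argument about opposite edges can be illustrated with finite differences on a synthetic shape. This is a sketch, not the dataset's renderer: `render_square` is a hypothetical smooth, point-symmetric rounded square, and the claim checked here relies on that symmetry.

```python
import numpy as np

def render_square(cx, cy, theta, size=20):
    """Hypothetical stand-in for a dice: a smooth, point-symmetric rounded square."""
    ys, xs = np.mgrid[0:size, 0:size].astype(float)
    dx, dy = xs - cx, ys - cy
    u = np.cos(theta) * dx + np.sin(theta) * dy
    v = -np.sin(theta) * dx + np.cos(theta) * dy
    return 255.0 * np.exp(-((u / 5.0) ** 4 + (v / 5.0) ** 4))

c, angle, h = 9.5, 0.3, 1e-3  # grid center, arbitrary tilt, finite-difference step

# Central finite differences approximate two columns of the Jacobian.
d_shift = (render_square(c + h, c, angle) - render_square(c - h, c, angle)) / (2 * h)
d_rot = (render_square(c, c, angle + h) - render_square(c, c, angle - h)) / (2 * h)

def flip(img):
    """Reflect the image through its center, swapping opposite edges."""
    return img[::-1, ::-1]

# Translation: opposite edges carry opposite signs.
print(np.allclose(flip(d_shift), -d_shift, atol=1e-6))
# Rotation: opposite edges carry the same sign pattern.
print(np.allclose(flip(d_rot), d_rot, atol=1e-6))
```

For a shape that looks the same after a half turn, a translation derivative is antisymmetric under reflection through the center while a rotation derivative is symmetric, which matches the reading of the red and blue edges given above.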

Let's look at other reference images.

In [None]:
qq = query[2]
sim = [ww for ww in query if np.max(np.abs((qq - ww))) < 128][1:]
len(sim)

In [None]:
qq = query[3]
sim = [ww for ww in query if np.max(np.abs((qq - ww))) < 128][1:]
len(sim)

For these two, which were images with two dice, the query did not return a single neighboring image! Even though we got more than 20 hits for a single dice, with two dice the criterion seems too restrictive. I must confess I didn't expect such a huge difference between the two cases. Let's see the other single-dice references.
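One plausible reading of this gap, offered here as an assumption rather than something measured on the dataset, is plain volume scaling: the share of samples falling inside a small box around a point shrinks exponentially with the number of free parameters. The sketch below uses uniform points in a unit cube as a toy parameter space; `count_box_neighbors` and all the numbers are hypothetical.

```python
import numpy as np

def count_box_neighbors(n_points, dim, radius, seed=0):
    """Count uniform samples in [0, 1]^dim within `radius` (max-norm) of the cube center."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, dim))
    return int(np.sum(np.max(np.abs(points - 0.5), axis=1) < radius))

n, r = 100_000, 0.1
for dim in (3, 6):
    # Expected count is n times the box volume (2r)^dim.
    expected = n * (2 * r) ** dim
    print(dim, count_box_neighbors(n, dim, r), expected)
```

In this toy setting, going from 3 to 6 dimensions divides the expected number of neighbors by a factor of 125. The real parameter space is not a uniform cube and a pixel threshold does not map to a fixed box in it, so this only suggests the order of magnitude of the effect.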

In [None]:
qq = query[4]

sim = [ww for ww in query if np.max(np.abs((qq - ww))) < 128][1:]

plt.figure(figsize=(20,8))
for k, ww in enumerate(sim):
    if k >= 20:
        break
    plt.subplot(4, 10, k//5*10 + k%5 + 1)
    plotdice(ww)
    
for k, ww in enumerate(sim):
    if k >= 20:
        break
    plt.subplot(4, 10, k//5*10 + k%5 + 6)
    plotdicez(ww - qq)

u, s, v = np.linalg.svd(np.cov(np.array(sim).T))

plt.figure()
plt.plot(s[:40], '-+')
plt.grid()

plt.figure()
for n in range(3):
    plt.subplot(1, 3, n + 1)
    plotdicez(v[n] * 400)

In [None]:
qq = query[5]

sim = [ww for ww in query if np.max(np.abs((qq - ww))) < 128][1:]

plt.figure(figsize=(20,8))
for k, ww in enumerate(sim):
    if k >= 20:
        break
    plt.subplot(4, 10, k//5*10 + k%5 + 1)
    plotdice(ww)
    
for k, ww in enumerate(sim):
    if k >= 20:
        break
    plt.subplot(4, 10, k//5*10 + k%5 + 6)
    plotdicez(ww - qq)

u, s, v = np.linalg.svd(np.cov(np.array(sim).T))

plt.figure()
plt.plot(s[:40], '-+')
plt.grid()

plt.figure()
for n in range(3):
    plt.subplot(1, 3, n + 1)
    plotdicez(v[n] * 400)
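The singular-value plots above were judged by eye. As a rough numeric complement, the drop after the third value could be summarized with two small helpers. Both `spectral_gap` and `explained_fraction` are hypothetical names, and the toy spectrum is made up for illustration, not taken from the dataset.

```python
import numpy as np

def spectral_gap(s, k=3):
    """Ratio between the k-th and (k+1)-th singular values (both must exist)."""
    s = np.asarray(s, dtype=float)
    return s[k - 1] / s[k]

def explained_fraction(s, k=3):
    """Share of the total variance captured by the first k components."""
    s = np.asarray(s, dtype=float)
    return s[:k].sum() / s.sum()

# Made-up spectrum for illustration only, not a result from the dataset.
toy = [50.0, 30.0, 20.0, 1.0, 0.5, 0.25]
print(spectral_gap(toy), explained_fraction(toy))
```

Since `s` comes from the SVD of a covariance matrix, its entries are the variances along the principal directions, so `explained_fraction(s)` is the usual explained-variance ratio for three components.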

# Conclusion
One of the principles we had in mind when creating the Snake-Eyes dataset was to make sure there were so many sampled instances compared to the image resolution that we would really be able to analyze the manifolds on which these datapoints lie. The manifolds are a consequence of the fact that the images are created from a parametric model with three degrees of freedom per dice. Our experiments showed that, given a certain distance threshold, we are able to retrieve quite a large number of neighboring images, and this neighborhood does seem to be consistent with the idea that these 400-dimensional data points live in a 3-dimensional manifold.

The results concern only the single-dice images, though. For the 2-dice case, the sample volume wasn't enough to achieve the same result. Two-dice images should result in a 6-dimensional manifold. We still must check whether a less restrictive nearest-neighbor query allows us to retrieve a sample where this property can be tested.
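One way such a less restrictive query might look is to keep the k closest images instead of applying a fixed threshold. The sketch below is an assumption about how that could be done, not something tested on the dataset: `k_nearest` is a hypothetical helper and the data are fabricated random vectors.

```python
import heapq
import numpy as np

def k_nearest(reference, images, k=20):
    """Return the k images with the smallest max-abs pixel difference to `reference`.

    `images` can be any iterable (for example a generator), so the whole
    dataset never has to sit in memory at once.
    """
    scored = ((float(np.max(np.abs(img - reference))), idx, img)
              for idx, img in enumerate(images))
    # Rank by (distance, index) only, so the arrays themselves are never compared.
    return heapq.nsmallest(k, scored, key=lambda t: t[:2])

# Fabricated stand-in data: 1000 random "images" around a flat gray reference.
rng = np.random.default_rng(0)
reference = np.full(400, 128.0)
fake_images = (reference + rng.normal(scale=30.0, size=400) for _ in range(1000))
nearest = k_nearest(reference, fake_images, k=5)
print([round(dist, 1) for dist, _, _ in nearest])
```

Each returned tuple holds the distance, the position in the stream, and the image. If the reference itself is part of the stream it comes back first with distance 0 and would have to be dropped, as the `[1:]` slices do in the cells above.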

Another future analysis we have in mind is to try to estimate these manifold dimensionalities using techniques such as box-counting. We also plan to investigate what happens when you throw this data in random projection trees, or utilize other dimensionality reduction techniques.