# Session 06: Image Similarity with HSV

We use our new HSV values to show how images are similar based on their
hue, saturation, and value.

## Setup

Each notebook needs to load its own modules. Here, we load the
same set as in the previous session.

In [None]:
%pylab inline

import numpy as np
import scipy as sp
import pandas as pd
import urllib

import os
from os.path import join

In [None]:
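import matplotlib.colors                # rgb_to_hsv / hsv_to_rgb used below
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from matplotlib.pyplot import imread    # used below to load images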

plt.rcParams["figure.figsize"] = (8,8)

## WikiArt corpus

We are going to look now at a larger corpus of images from the WikiArt collection.
It consists of work from three artists. Here's the metadata:

In [None]:
df = pd.read_csv(join("..", "data", "wikiart.csv"))
df

Let's look at a few of these images. Adjust the following code by choosing a
different number from 0 to 643:

In [None]:
img = imread(join("..", "images", "wikiart", df.filename[10]))

img_hsv = matplotlib.colors.rgb_to_hsv(img)          # convert to hsv space
img_new = img_hsv.copy()                             # make a copy of the image
img_new[:, :, 1] = 1                                 # set saturation to 1
img_new[:, :, 2] = 1                                 # set value to 1

img_new[img_hsv[:, :, 1] < 0.2, 1] = 0               # set saturation to 0 (white) if unsaturated 

img_new_rgb = matplotlib.colors.hsv_to_rgb(img_new)  # convert back to rgb to print

In [None]:
plt.imshow(img)         # show the original image
plt.figure(2)           # start a new figure
plt.imshow(img_new_rgb) # show the hue version of image
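
To see what the cell above does to a single pixel, here is a minimal sketch that uses Python's standard `colorsys` module rather than `matplotlib.colors`. The helper name `hue_only_pixel` and its `threshold` parameter are made up for this illustration; the 0.2 cutoff mirrors the one used above, and the inputs are assumed to be floats between 0 and 1.

```python
import colorsys


def hue_only_pixel(r, g, b, threshold=0.2):
    """Return the 'hue only' RGB color for one pixel (hypothetical helper)."""
    h, s, v = colorsys.rgb_to_hsv(r, g, b)      # convert to hsv space
    if s < threshold:
        s_new = 0.0                             # unsaturated: drop the color
    else:
        s_new = 1.0                             # saturated: push saturation to 1
    return colorsys.hsv_to_rgb(h, s_new, 1.0)   # value is always set to 1


print(hue_only_pixel(0.6, 0.3, 0.3))    # a muted red becomes pure red
print(hue_only_pixel(0.5, 0.5, 0.55))   # a nearly gray pixel becomes white
```

Every saturated pixel is mapped to the fully saturated, fully bright version of its hue, and every unsaturated pixel is mapped to white.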

## Aggregating the corpus

Okay, now let's start using the entire corpus. We will compute histogram
counts of the hues for every single image in the corpus.

In [None]:
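X = np.zeros((len(df), 30))  # one row per image, one column per histogram bin

for i in range(len(df)):
    img = imread(join("..", "images", "wikiart", df.filename[i]))
    img_hsv = matplotlib.colors.rgb_to_hsv(img)
    # unsaturated pixels have no meaningful hue: replace it with value + 1
    unsat = img_hsv[:, :, 1] < 0.2
    img_hsv[unsat, 0] = img_hsv[unsat, 2] + 1
    # count the combined channel in 30 equal-width bins over the range (0, 2)
    X[i, :] = np.histogram(img_hsv[:, :, 0].flatten(), bins=30, range=(0, 2))[0]
    if i % 25 == 0:
        print("Done with {0:d} of {1:d}".format(i, len(df)))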

We can see the first few rows of these numbers here:

In [None]:
X[:10, :]
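
The hue replacement inside the loop can be hard to picture, so here is a small sketch on a hand-made 2×2 HSV array instead of a real image. The function name `hue_gray_histogram` and its parameters are hypothetical and the pixel values are invented. It also assumes that all three channels lie between 0 and 1, so that `value + 1` lands between 1 and 2.

```python
import numpy as np


def hue_gray_histogram(img_hsv, threshold=0.2, bins=30):
    """Histogram of hue, with unsaturated pixels counted by value + 1."""
    hue = img_hsv[:, :, 0].copy()             # work on a copy of the hue channel
    unsat = img_hsv[:, :, 1] < threshold      # mask of unsaturated pixels
    hue[unsat] = img_hsv[unsat, 2] + 1        # replace their hue with value + 1
    counts, _ = np.histogram(hue.flatten(), bins=bins, range=(0, 2))
    return counts


# invented HSV pixels, each given as [hue, saturation, value]
toy_hsv = np.array([
    [[0.02, 0.90, 0.5], [0.50, 0.80, 0.7]],   # two saturated pixels
    [[0.30, 0.10, 0.5], [0.70, 0.05, 0.9]],   # two unsaturated pixels
])

counts = hue_gray_histogram(toy_hsv)
print(np.nonzero(counts)[0])                  # indices of the non-empty bins
```

With inputs in that range, the first 15 bins count the hues of saturated pixels and the last 15 bins count the brightness of unsaturated ones.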

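Now, let's look at the images whose hue histograms are closest to those of a
chosen reference image, measured by the sum of absolute differences between
bin counts.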

In [None]:
plt.figure(figsize=(14, 14))

ref_img_num = 0  # change this number!

print(df.iloc[ref_img_num])
idx = np.argsort(np.sum(np.abs(X - X[ref_img_num, :]), axis=1))[:9]

for ind, i in enumerate(idx):
    plt.subplots_adjust(left=0, right=1, bottom=0, top=1)
    plt.subplot(3, 3, ind + 1)

    img = imread(join('..', 'images', 'wikiart', df.filename[i]))
    plt.imshow(img)
    plt.axis("off")

## Next steps

We are now, finally, ready to move into the second part of the course. We have
seen how features extracted from images let us explore a collection, but we need
better features to capture more interesting things about the images. For that,
we need to digress a bit and talk about machine learning and deep learning.
