## Unsupervised Clustering Using Self-Organising Feature Maps

In [3]:
#Importing Dependencies
import numpy as np
import math
from PIL import Image

In [4]:
#Defining the network
class SOM:
    def __init__(self, x_size, y_size, trait_num, t_iter, t_step):
        self.weights = np.random.randint(256, size=(x_size, y_size, trait_num)).astype('float64')
        self.t_iter = t_iter
        self.map_radius = max(self.weights.shape)/2
        self.t_const = self.t_iter/math.log(self.map_radius)
        self.t_step = t_step

    def show(self):
        # Rendering as an image is only meaningful when each node holds 3 (RGB) values
        if self.weights.shape[2] != 3:
            return
        im = Image.fromarray(self.weights.astype('uint8'), mode='RGB')
        im.show()

    def distance_matrix(self, vector):
        # Squared Euclidean distance from `vector` to every node's weight vector
        return np.sum((self.weights - vector) ** 2, 2)

    def bmu(self, vector):
        # Grid coordinates of the best matching unit (the node closest to `vector`)
        distance = self.distance_matrix(vector)
        return np.unravel_index(distance.argmin(), distance.shape)

    def bmu_distance(self, vector):
        # Squared grid distance of every node from the BMU
        x, y, _ = self.weights.shape
        xi = np.arange(x).reshape(x, 1).repeat(y, 1)
        yi = np.arange(y).reshape(1, y).repeat(x, 0)
        return np.sum((np.dstack((xi, yi)) - np.array(self.bmu(vector))) ** 2, 2)

    def hood_radius(self, iteration):
        return self.map_radius * math.exp(-iteration/self.t_const)

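    def teach_row(self, vector, i, dis_cut, dist):
        # Squared neighbourhood radius for this iteration
        hood_radius_2 = self.hood_radius(i) ** 2
        # Squared grid distance of every node from the BMU
        bmu_distance = self.bmu_distance(vector).astype('float64')
        # Positive inside the cutoff radius (explicit `dist` if given, else the current radius)
        if dist is None:
            temp = hood_radius_2 - bmu_distance
        else:
            temp = dist ** 2 - bmu_distance
        # Gaussian neighbourhood weighting centred on the BMU
        influence = np.exp(-bmu_distance / (2 * hood_radius_2))
        if dis_cut:
            # Sign trick: factor is 1 inside the cutoff, 0.5 on the boundary, 0 outside
            influence *= ((np.sign(temp) + 1) / 2)
        return np.expand_dims(influence, 2) * (vector - self.weights)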

    def teach(self, t_set, distance_cutoff=False, distance=None):
        for i in range(self.t_iter):
            for x in t_set:
                self.weights += self.teach_row(x, i, distance_cutoff, distance)
        self.show()
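
As a quick check of the decay schedule used by `hood_radius`, the sketch below reproduces the same formula outside the class. Because `t_const = t_iter / ln(map_radius)`, the radius equals `map_radius ** (1 - i / t_iter)`: it starts at `map_radius` and approaches 1 as training ends. The standalone functions and the sample squared distance of 25 are illustrative assumptions, not part of the class above.

```python
import math

def hood_radius(iteration, map_radius, t_iter):
    """Neighbourhood radius after `iteration` steps (same formula as SOM.hood_radius)."""
    t_const = t_iter / math.log(map_radius)
    return map_radius * math.exp(-iteration / t_const)

def influence(sq_grid_distance, radius):
    """Gaussian weighting of a node at squared grid distance `sq_grid_distance` from the BMU."""
    return math.exp(-sq_grid_distance / (2 * radius ** 2))

# A 200 x 200 map (map_radius = 100) trained for 100 iterations,
# as in the RGB example in this notebook.
map_radius, t_iter = 100.0, 100
for i in (0, 50, 99):
    r = hood_radius(i, map_radius, t_iter)
    # Influence on a node 5 grid cells away from the BMU (squared distance 25)
    print(i, round(r, 3), round(influence(25.0, r), 3))
```

Early in training almost the whole map moves towards each sample; by the last iterations only the BMU's immediate neighbours are updated noticeably.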

In [13]:
#Mapping 3-dimensional RGB values onto a 2-dimensional grid
x_size = 200
y_size = 200
dimension = 3
no_of_iterations = 100
t_step = 1

#Getting instance of SOM
nn = SOM(x_size, y_size, dimension, no_of_iterations, t_step)

#Generating training data
training_data = np.random.randint(256, size=(15, 3))

#Training
nn.teach(training_data)

#Display the image again (teach() already calls show() once after training)
nn.show()

### Using SOM on image dataset

#### Reference: Image Classification using Inception model and keras notebook.
The referenced notebook classifies images of cats and dogs.
The feature vectors used here were generated by extracting, for each image, the activations of the penultimate layer of the Inception model.
These feature vectors are used to train the SOM.

In [6]:
import pickle
# Loading the stored bottleneck feature vectors
with open('feature_dogs', 'rb') as f:
    feature_dogs = pickle.load(f)
with open('feature_cats', 'rb') as f:
    feature_cats = pickle.load(f)

In [11]:
#Preparing dataset for training SOM by selecting first 10 feature vectors of cats and dogs
training_features = np.r_[feature_dogs[:10,:],feature_cats[:10,:]]

In [21]:
#Mapping 2048-dimensional feature vectors onto a 2-dimensional grid
x_size = 15
y_size = 15
dimension = 2048
no_of_iterations = 50
t_step = 1

#Getting instance of SOM
nn_som = SOM(x_size, y_size, dimension, no_of_iterations, t_step)

#Training
nn_som.teach(training_features)

## Testing the clustering

Now that the SOM is trained, we need to verify that cats and dogs have been clustered separately.
To do this, we classify the weight vectors of nodes at different grid positions.
The neural network trained in the other notebook is used to classify these vectors as cat or dog.
Its trained weights are loaded into a network with the same architecture, and the node vectors are then classified.
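
A complementary check, which is not what this notebook does, is to look up the best matching unit (BMU) of each training vector and see where dogs and cats land on the grid. The sketch below is a minimal version of that idea; `bmu_positions` and the toy arrays are hypothetical names and values introduced only for illustration.

```python
import numpy as np

def bmu_positions(weights, vectors):
    """Return the (row, col) grid position of the best matching unit for each vector."""
    rows, cols, dim = weights.shape
    flat = weights.reshape(rows * cols, dim)
    # Squared Euclidean distance from every vector to every node:
    # shape (n_vectors, rows * cols)
    sq_dist = ((vectors[:, None, :] - flat[None, :, :]) ** 2).sum(axis=2)
    winners = sq_dist.argmin(axis=1)
    return np.column_stack(np.unravel_index(winners, (rows, cols)))

# Toy stand-in for a trained map: a 2 x 2 grid of 3-dimensional weights.
toy_weights = np.array([[[0.0, 0.0, 0.0], [0.0, 0.0, 9.0]],
                        [[9.0, 0.0, 0.0], [9.0, 9.0, 9.0]]])
toy_vectors = np.array([[0.5, 0.1, 0.2], [8.0, 9.0, 8.5]])
print(bmu_positions(toy_weights, toy_vectors))
```

With the notebook's variables this would be called as `bmu_positions(nn_som.weights, training_features)`; the first 10 rows of the result belong to dogs and the last 10 to cats, following the order in which `training_features` was assembled.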

In [36]:
from keras.models import Sequential
from keras.layers import Activation, Dropout, Flatten, Dense
from keras import optimizers

In [37]:
#Defining network structure to classify feature vectors
nn = Sequential()
nn.add(Dense(256,input_dim=2048,activation = 'relu'))
nn.add(Dropout(0.5))
nn.add(Dense(1, activation='sigmoid'))

In [38]:
#Load weights into the neural network
nn.load_weights('bottleneck_train.h5')

In [39]:
#Predict labels: 0 corresponds to cat and 1 to dog
#Predicting for the feature vector at position 0,0
weight = np.reshape(nn_som.weights[0][0],(1,2048)) #Need to reshape the vector for feeding into neural network
nn.predict(weight)

array([[ 1.]], dtype=float32)

In [40]:
#Predicting for the feature vectors at positions 0,1 and 1,0
weight = np.reshape(nn_som.weights[0][1],(1,2048))
nn.predict(weight)

array([[ 1.]], dtype=float32)

In [32]:
weight = np.reshape(nn_som.weights[1][0],(1,2048))
nn.predict(weight)

array([[ 1.]], dtype=float32)

A prediction of 1 corresponds to dog, so dogs are clustered towards the top-left corner.

In [33]:
#Predicting for the feature vectors at positions [14,14], [13,14] and [14,13].
#The values are almost zero, which corresponds to cats
weight = np.reshape(nn_som.weights[14][14],(1,2048))
nn.predict(weight)

array([[  3.33482862e-24]], dtype=float32)

In [34]:
weight = np.reshape(nn_som.weights[13][14],(1,2048))
nn.predict(weight)

array([[  3.74070505e-24]], dtype=float32)

In [35]:
weight = np.reshape(nn_som.weights[14][13],(1,2048))
nn.predict(weight)

array([[  2.81645707e-24]], dtype=float32)

Predictions for the weight vectors of nodes in the bottom-right corner correspond to cats.
Cats are clustered towards the bottom right and dogs towards the top left.
This broadly verifies the clustering. To measure the clustering accuracy exactly, every node should be classified.
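
One way to classify every node at once is sketched below, assuming the classifier is exposed as a function that takes a batch of vectors and returns one score per row. `label_map`, `dummy_predict`, the 0.5 threshold and the toy map are hypothetical and exist only to make the example runnable without the trained model.

```python
import numpy as np

def label_map(weights, predict_fn, threshold=0.5):
    """Classify every SOM node and return a (rows, cols) array of 0/1 labels."""
    rows, cols, dim = weights.shape
    # One row per node, so the classifier sees the whole map in a single batch.
    scores = np.asarray(predict_fn(weights.reshape(rows * cols, dim)))
    return (scores.reshape(rows, cols) > threshold).astype(int)

# Hypothetical stand-in classifier: scores 1.0 when the first feature is large.
def dummy_predict(batch):
    return (batch[:, 0] > 5.0).astype(float).reshape(-1, 1)

toy_weights = np.zeros((3, 3, 4))
toy_weights[:2, :2, 0] = 10.0  # top-left block scores 1.0 under the dummy classifier
labels = label_map(toy_weights, dummy_predict)
print(labels)
print("fraction labelled 1:", labels.mean())
```

In this notebook the call would presumably be `label_map(nn_som.weights, nn.predict)`, with 1 meaning dog and 0 meaning cat as above. Printing the resulting grid shows directly whether the two labels form contiguous regions.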