# Self-Organizing Feature Maps
The goal of this notebook is to build and understand the properties of Kohonen Self-Organizing Feature Maps (SOMs), an unsupervised learning technique for representing high-dimensional data in a much lower-dimensional space while preserving the topological relationships within the training set.

We begin by building a model that maps RGB colors to a 2-dimensional representation.

![](http://www.ai-junkie.com/ann/som/images/Figure1.jpg)

In [4]:
# Imports
import numpy as np
import math
import pandas as pd
from PIL import Image

An SOM is a lattice of n by n nodes, each holding a weight vector with the same dimension as the input data. Starting from a random distribution, the weights are adjusted during training to represent the input data more closely, until the map converges to a steady arrangement of zones (loosely equivalent to classes). Given a new input vector, the SOM classifies it by the zone it stimulates: the nodes whose weight vectors are most similar to the input respond most strongly.

## Learning Algorithm
  

1. Initialise the weights of each node in the lattice randomly. This is done once, before training starts.
2. Select a vector from the training data to compare with the lattice.
3. Choose the Best Matching Unit (BMU): the node in the lattice whose weight vector is most similar to the input vector.
4. Compute the radius of the BMU's neighbourhood. This value decreases at each time-step.
5. Adjust each node within the BMU's neighbourhood to make it more similar to the input vector. Nodes closer to the BMU are adjusted more.
6. Repeat steps 2 to 5 for N iterations.
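
The steps above can be sketched as a small standalone loop. This is a minimal illustration, not the notebook's `SOM` class: the `train_step` helper, the 10 by 10 lattice, the seed, and the initial learning rate of 0.1 are all assumptions chosen for brevity.

```python
import numpy as np

def train_step(weights, vector, radius, lr):
    """One SOM update: find the BMU, then pull nearby nodes toward `vector`."""
    # Step 3: the BMU is the node with the smallest squared distance to the input.
    dist_to_input = np.sum((weights - vector) ** 2, axis=2)
    bmu = np.unravel_index(dist_to_input.argmin(), dist_to_input.shape)

    # Squared lattice distance of every node from the BMU.
    rows, cols = np.indices(dist_to_input.shape)
    lattice_dist_sq = (rows - bmu[0]) ** 2 + (cols - bmu[1]) ** 2

    # Step 5: Gaussian influence, strongest at the BMU, scaled by the learning rate.
    theta = np.exp(-lattice_dist_sq / (2 * radius ** 2))
    weights += lr * theta[:, :, np.newaxis] * (vector - weights)
    return bmu

rng = np.random.default_rng(0)                                    # assumed seed
weights = rng.integers(256, size=(10, 10, 3)).astype('float64')   # Step 1
training_set = rng.integers(256, size=(5, 3)).astype('float64')

num_iter = 20
sigma_0 = max(weights.shape[:2]) / 2        # initial neighbourhood radius
t_const = num_iter / np.log(sigma_0)        # decay time constant

for t in range(num_iter):                   # Step 6
    radius = sigma_0 * np.exp(-t / t_const) # Step 4
    lr = 0.1 * np.exp(-t / num_iter)        # assumed initial learning rate of 0.1
    for vector in training_set:             # Step 2
        train_step(weights, vector, radius, lr)
```

Because each update moves a weight only part of the way toward an input vector, the weights stay inside the range spanned by the initial weights and the training data.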

## Defining Network
The following equations are implemented in the SOM code. For a detailed explanation, see http://www.ai-junkie.com/ann/som/som3.html.

### Neighbourhood Radius 
![](http://www.ai-junkie.com/ann/som/images/equation2.gif)
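
To get a feel for how quickly the radius shrinks, the decay can be tabulated directly. This is a small sketch that assumes the same choices as the class further down: an initial radius of half the lattice size (100 for a 200 by 200 map) and a time constant of `num_iter / log(initial radius)`. The function name `neighbourhood_radius` is hypothetical.

```python
import math

def neighbourhood_radius(t, sigma_0, t_const):
    """Radius at iteration t: sigma_0 decaying exponentially with time constant t_const."""
    return sigma_0 * math.exp(-t / t_const)

sigma_0 = 100.0                           # half of a 200-node-wide lattice
num_iter = 100
t_const = num_iter / math.log(sigma_0)    # time constant (lambda)

for t in (0, 25, 50, 75, 100):
    print(t, round(neighbourhood_radius(t, sigma_0, t_const), 2))
```

With this time constant the radius falls from `sigma_0` at the first iteration to exactly 1 at `t = num_iter`, so the neighbourhood ends up covering little more than the BMU itself.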

### Weight Adjustment for Nodes in BMU's Neighbourhood
![](http://www.ai-junkie.com/ann/som/images/equation5.gif)
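
In case the image does not load, the update rule can be written out as below. This is a transcription that assumes the usual notation: $W$ is a node's weight vector, $V$ the input vector, $\Theta$ the distance-based influence, and $L$ the learning rate.

```latex
W(t+1) = W(t) + \Theta(t)\, L(t)\, \bigl(V(t) - W(t)\bigr)
```

Since $\Theta(t)\,L(t)$ lies between 0 and 1, each update moves the node's weights part of the way toward the input vector.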

### Theta: how much a node's distance from the BMU influences its learning
![](http://www.ai-junkie.com/ann/som/images/equation6.gif)
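
The influence can be visualised on a tiny lattice. The sketch below assumes a Gaussian falloff of the form `exp(-dist^2 / (2 * radius^2))`, which is what the class further down computes; the `influence` helper and the 5 by 5 lattice are hypothetical.

```python
import numpy as np

def influence(bmu, shape, radius):
    """Gaussian influence of the BMU on every node of a 2D lattice."""
    rows, cols = np.indices(shape)
    dist_sq = (rows - bmu[0]) ** 2 + (cols - bmu[1]) ** 2   # squared lattice distance
    return np.exp(-dist_sq / (2 * radius ** 2))

theta = influence(bmu=(2, 2), shape=(5, 5), radius=1.5)
print(np.round(theta, 2))
```

The BMU itself gets an influence of 1, and the value falls off symmetrically with distance, so corner nodes are adjusted far less than the BMU's immediate neighbours.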

### Decaying Learning Rate
![](http://www.ai-junkie.com/ann/som/images/equation3.gif)
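
Written out, the decay has the same exponential form as the neighbourhood radius. This is a transcription that assumes $L_0$ is the initial learning rate and $\lambda$ the time constant.

```latex
L(t) = L_0 \exp\!\left(-\frac{t}{\lambda}\right)
```

The class below uses $L_0 = 0.1$ and divides by the total number of iterations rather than by $\lambda$, which makes the learning rate decay more slowly than the radius.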

In [2]:
class SOM:
    def __init__(self, x_size, y_size, dimen, num_iter, t_step):
        # init weights to random integers with 0 <= w < 256
        self.weights = np.random.randint(256, size=(x_size, y_size, dimen))\
                            .astype('float64')
        self.num_iter = num_iter
        self.map_radius = max(self.weights.shape)/2 # sigma_0
        self.t_const = self.num_iter/math.log(self.map_radius) # lambda
        self.t_step = t_step
        
    def get_bmu(self, vector):
        # squared Euclidean distance between each node's weights and the vector
        # (the square root is skipped because it does not change the argmin)
        distance = np.sum((self.weights - vector) ** 2, 2)
        min_idx = distance.argmin()
        return np.unravel_index(min_idx, distance.shape)
        
    def get_bmu_dist(self, vector):
        # initialize array where values are its index
        x, y, rgb = self.weights.shape
        xi = np.arange(x).reshape(x, 1).repeat(y, 1)
        yi = np.arange(y).reshape(1, y).repeat(x, 0)
        # returns matrix of squared lattice distance of each node from the BMU
        return np.sum((np.dstack((xi, yi)) - np.array(self.get_bmu(vector))) ** 2, 2)

    def get_nbhood_radius(self, iter_count):
        return self.map_radius * np.exp(-iter_count/self.t_const)
        
    def teach_row(self, vector, i):
        nbhood_radius = self.get_nbhood_radius(i)
        bmu_dist = self.get_bmu_dist(vector).astype('float64')
        
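        # exponentially decaying learning rate
        lr = 0.1 * np.exp(-i/self.num_iter)
        
        # influence (bmu_dist is already a squared distance)
        theta = np.exp(-(bmu_dist)/ (2 * nbhood_radius ** 2))
        # weight change: influence * learning rate * (input - weights)
        return np.expand_dims(theta, 2) * lr * (vector - self.weights)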
        
    def teach(self, t_set):
        for i in range(self.num_iter):
            for x in t_set:
                self.weights += self.teach_row(x, i)
        self.show()
        
    def show(self):
        im = Image.fromarray(self.weights.astype('uint8'), mode='RGB')
        im.format = 'JPG'
        im.show()

## Training Network

In [3]:
s = SOM(200, 200, 3, 100, 0.1)
training_set = np.random.randint(256, size=(15, 3))
s.teach(training_set)

In [6]:
pd.read_csv('Iris.csv')

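|   | Id | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|---|----|---------------|--------------|---------------|--------------|---------|
| 0 | 1 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 2 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 3 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
| 5 | 6 | 5.4 | 3.9 | 1.7 | 0.4 | Iris-setosa |
| 6 | 7 | 4.6 | 3.4 | 1.4 | 0.3 | Iris-setosa |
| 7 | 8 | 5.0 | 3.4 | 1.5 | 0.2 | Iris-setosa |
| 8 | 9 | 4.4 | 2.9 | 1.4 | 0.2 | Iris-setosa |
| 9 | 10 | 4.9 | 3.1 | 1.5 | 0.1 | Iris-setosa |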


In [None]:
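s1 = SOM(200, 200, 4, 100, 0.1)
# keep only the four numeric measurement columns, matching dimen=4
iris = pd.read_csv('Iris.csv')
training_set = iris[['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm']].to_numpy()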
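
One thing to watch before training on Iris: the `SOM` class initialises its weights in the range 0 to 255, while the Iris measurements are all below 10. A possible approach, sketched here as an assumption rather than as part of the original notebook, is to min-max scale each feature column to the weight range first. The helper `scale_to_weight_range` is hypothetical, and the first five rows of the table above are inlined so the example runs without `Iris.csv`.

```python
import numpy as np
import pandas as pd

# First five rows of the Iris table shown above, inlined for a self-contained example.
iris_head = pd.DataFrame({
    'SepalLengthCm': [5.1, 4.9, 4.7, 4.6, 5.0],
    'SepalWidthCm': [3.5, 3.0, 3.2, 3.1, 3.6],
    'PetalLengthCm': [1.4, 1.4, 1.3, 1.5, 1.4],
    'PetalWidthCm': [0.2, 0.2, 0.2, 0.2, 0.2],
})

def scale_to_weight_range(features, high=255.0):
    """Min-max scale each column to [0, high]; constant columns map to 0."""
    features = np.asarray(features, dtype='float64')
    col_min = features.min(axis=0)
    col_range = features.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0   # avoid division by zero for constant columns
    return (features - col_min) / col_range * high

scaled = scale_to_weight_range(iris_head.to_numpy())
print(np.round(scaled, 1))
```

Note that a map with four-dimensional weights cannot be displayed directly as an RGB image, so a different visualisation would be needed for the trained Iris map.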