# Self-Organising Systems
## Exercise 3
* **Firbas Alexander**, `11775819`
* **Sowula Robert**, `11708475`
* **Zehetner Clemens**, `01645005`


## Table of Contents
+ Python Code
    - How to operate the code
    - Example run with Iris dataset
+ Comparison between Python and Java implementations (Evaluation)
    - Chainlink
        * Evaluation on small grid size
            * Visualization of our Python version
            * Visualizations of the Java SOM Toolbox
        * Evaluation on large grid size
            * Visualization of our Python version
            * Visualizations of the Java SOM Toolbox
        * Comparison
    - 10 Clusters
        * Evaluation on small grid size
            * Visualization of our Python version
            * Visualizations of the Java SOM Toolbox
        * Evaluation on large grid size
            * Visualization of our Python version
            * Visualizations of the Java SOM Toolbox
        * Comparison
+ Further Findings


## Python Code

In [1]:
import tabulate
import numpy as np
import json
from collections import defaultdict

from SOMToolBox_Parse import SOMToolBox_Parse



# Parse a template vector (.tv) file; this is a generator yielding the attribute names.
# For example, for the iris dataset, this will yield 'sep_length', 'sep_width', 'pet_length', 'pet_width'
# This is needed because the provided template does not offer this functionality
def parse_template_file(filename):
    with open(filename) as fp:
        for line in fp:
            line = line.strip()
            if not line or line.startswith("#") or line.startswith("$"):
                continue
            yield line.split(" ")[1]

# Use a trained SOM to create a dictionary that, given the index of a unit,
# returns the list of indices of the input vectors that are mapped to that unit
def index_to_mapped_indices_map(_m, _n, _weights, _idata):
    mapping = defaultdict(list)
    
    for i, vector in enumerate(_idata): 
        position = np.argmin(np.sqrt(np.sum(np.power(_weights - vector, 2), axis=1)))
        mapping[position].append(i)

    return mapping

# a helper method that takes the dimensions of a SOM and creates a 2-D array that maps grid coordinates to unit indices
def coords_to_index_map(_m, _n):
    indices_linear = np.arange( _m * _n)
    indices_grid = indices_linear.reshape(_m, _n)
    return indices_grid

# a helper method yielding the inverse of coords_to_index_map
def index_to_coords_map(_m, _n):
    indices_to_coords = []
    for i in range(_m * _n):
        # row = i // _n, column = i % _n (matches the (_m, _n) reshape in coords_to_index_map)
        indices_to_coords.append( (i // _n, i % _n) )
    return indices_to_coords
    
# The main labeling procedure.
# This implements http://www.ifs.tuwien.ac.at/~andi/somlib/labelsom.html
# and follows closely LabelSOM.java of the Java SOMToolbox.
#
# All functionality of the Java equivalent is provided, except for subordinate maps.
# 
# i ... index of the unit to label
# _w ... weight vectors of the units
# _x ... input vectors
# _index_to_mapped_indices_map ... see index_to_mapped_indices_map
# _attributes ... the attribute names of the different dimensions of the input vectors
# num ... the number of labels to generate
# ignore_labels_with_zero .. ignore labels of value equal to zero and q.e. close to zero, useful for textual data where the input vectors will be sparse (i.e. many zero values, as described in the web resource linked above)
def get_labels_for_index(i, _w, _x, _index_to_mapped_indices_map, _attributes, num, ignore_labels_with_zero):
    # obtain input vectors that were mapped to this unit
    mapped_inputs = _x.take(_index_to_mapped_indices_map[i], axis=0)
    
    # if no vectors were mapped, we cannot give labels
    if len(mapped_inputs) == 0:
        return []
    
    # the dimensionality of the input vectors (equals len(_attributes) for well-formed data)
    dim = _x.shape[1]
    
    # a vectorized implementation of the main calculation, the original used a for-loop
    # calculate mean of each attribute of vectors mapped to this unit
    mean_vals = np.mean(mapped_inputs, axis=0)
    # calculate, per attribute, the mean absolute difference between the unit's weight vector and the vectors mapped to this unit,
    # i.e. the "quantization error"
    qe_vals = np.mean(np.absolute(mapped_inputs - _w[i]), axis=0)
    
    # create a label (represented by a 3-tuple) for each attribute with the scores calculated above
    all_labels = []
    for ve in range(dim):
        if ignore_labels_with_zero and mean_vals[ve] == 0 and qe_vals[ve] * 100 < 0.1:
            all_labels.append( ("", mean_vals[ve], qe_vals[ve]) )
        else:
            all_labels.append( (_attributes[ve], mean_vals[ve], qe_vals[ve]) )
        
    # create sorted copies of the labels just created (sort by mean, i.e. second slot and by q.e, third slot (ascending))
    labels_sorted_by_mean = sorted(all_labels, key = lambda label: label[1])
    labels_sorted_by_qe = sorted(all_labels, key = lambda label: label[2])
    
    # select the num best labels (least q.e.), according to the original algorithm
    labels = [None] * num
    found = 0
    lab = 0
    while found < num and lab < dim:
        found2 = False
        lab2 = dim - 1
        while not found2 and lab2 >= dim - num:
            if labels_sorted_by_mean[lab2] == labels_sorted_by_qe[lab]:
                found2 = True
                labels[found] = labels_sorted_by_qe[lab]
                found += 1
            lab2 -= 1
        lab += 1
                
    # sort the found labels by mean first (descending), and q.e. (ascending) second
    # the sorting of attributes may be slightly different to this
    # due to an error in the quicksort implementation of the Java original
    # => see the evaluation section for details
    final_labels = sorted(labels, key = lambda label: (-label[1], label[2]) )
    
    return final_labels

# label a whole som rather than a single unit only.
# returns a dictionary (int, int) -> [(str, float, float)] mapping coordinates to a list of label triplets
def get_labels_for_som(_m, _n, _w, _x, _index_to_mapped_indices_map, _attributes, num, ignore_labels_with_zero):
    dim = _x.shape[1]
    
    # ensure that we do not try to generate more labels than attributes
    if num > dim:
        print(f"Specified number of labels ({num}) exceeds number of features in template vector ({dim}) - defaulting to number of features as maximum possible value.")
        num = dim
        
    c2i = coords_to_index_map(_m, _n)
    labels = {}
    for i in range(_m):
        for j in range(_n):
            labels[i, j] = get_labels_for_index( c2i[i,j], _w, _x, _index_to_mapped_indices_map, _attributes, num, ignore_labels_with_zero)
    return labels

# code to format a label_map generated by get_labels_for_som as an html-table
def labels_to_str(labels, num):
    return "\n".join(label[0] for label in labels) if labels else "\n........\n"

def label_map_to_row(_m, _n, num, label_map, row_index):
    return [labels_to_str(label_map[row_index, j], num) for j in range(_n)]

def label_map_to_table(_m, _n, num, label_map):
    return tabulate.tabulate([label_map_to_row(_m, _n, num, label_map, i) for i in range(_m)], tablefmt='html')

In [2]:
def visualize_lable_som(idata, weights, attributes, json_file = None, num=3, ignore_labels_with_zero=False):
    m, n, w, x = weights['ydim'], weights['xdim'], weights['arr'], idata['arr']
    _index_to_mapped_indices_map = index_to_mapped_indices_map(m, n, w, x)
    
    label_map = get_labels_for_som(m, n, w, x, _index_to_mapped_indices_map, attributes, num, ignore_labels_with_zero)
    
    if json_file:
        with open(json_file, "w") as f:
            # convert tuple keys to string so it can be serialized as json
            json_label_map = {str(k):label_map[k] for k in label_map}
            json.dump(json_label_map, f, indent=4, sort_keys=True)
            
    return label_map_to_table(m, n, num, label_map)
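
To make the labelling step concrete, here is a minimal sketch in plain Python (deliberately without NumPy, so it runs without dependencies) of the two per-attribute scores computed in `get_labels_for_index`: the mean of the mapped inputs and the mean absolute difference to the unit's weight vector. The data, the attribute names, and the helper name `score_attributes` are made up for illustration and are not part of the notebook.

```python
# Toy illustration of the per-attribute scores used for labelling one unit.
# All values and names below are hypothetical, not taken from a trained SOM.

def score_attributes(mapped_inputs, weight, attributes):
    """Return (name, mean, qe) triples for one unit.

    mean: average value of the attribute over the mapped input vectors
    qe:   mean absolute difference between the mapped inputs and the
          unit's weight vector for that attribute
    """
    n = len(mapped_inputs)
    triples = []
    for d, name in enumerate(attributes):
        column = [vec[d] for vec in mapped_inputs]
        mean = sum(column) / n
        qe = sum(abs(v - weight[d]) for v in column) / n
        triples.append((name, mean, qe))
    return triples


attributes = ["a", "b", "c"]
mapped_inputs = [[1.0, 4.0, 0.0], [3.0, 4.0, 2.0]]  # two inputs mapped to the unit
weight = [2.0, 4.0, 0.0]                            # the unit's weight vector

triples = score_attributes(mapped_inputs, weight, attributes)
for name, mean, qe in triples:
    print(f"{name}: mean={mean:.2f} qe={qe:.2f}")
```

In this toy case, attribute `b` has both the highest mean (4.0) and the lowest quantization error (0.0), so it would be the first label selected for the unit.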


## How to operate the code

In addition to the dependencies already present, only the [tabulate](https://pypi.org/project/tabulate/) library is required to run this code.

The label visualisation function `visualize_lable_som` takes 6 parameters, 3 of which are required:  
- idata: the parsed vector file (\*.vec)
- weights: the parsed weights file (\*.wgt.gz)
- attributes: the attribute names read from the template vector file (\*.tv)
- \[optional\] json_file: path to a file to which the computed labelling map is additionally written
- \[optional\] num: number of labels per unit, defaults to 3
- \[optional\] ignore_labels_with_zero: ignore labels whose value is zero and whose q.e. is close to zero; useful for textual data where the input vectors will be sparse; defaults to `False`
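
Because JSON object keys must be strings, the file written via `json_file` uses keys such as `"(0, 1)"`, i.e. the result of `str((0, 1))`. The following is a minimal sketch of reading such a file back into a dictionary keyed by coordinate tuples, assuming the format produced by `visualize_lable_som`; the helper name `load_label_map`, the file name, and the sample label values are illustrative only.

```python
import ast
import json
import os
import tempfile


def load_label_map(path):
    """Load a label map whose keys are stringified (row, col) tuples."""
    with open(path) as f:
        raw = json.load(f)
    # ast.literal_eval safely turns "(0, 1)" back into the tuple (0, 1);
    # JSON stores each label triple as a list, so convert it back to a tuple.
    return {
        ast.literal_eval(key): [tuple(label) for label in labels]
        for key, labels in raw.items()
    }


# Write a tiny hypothetical label map the same way the notebook does.
label_map = {(0, 0): [("sep_length", 5.0, 0.1)], (0, 1): []}
path = os.path.join(tempfile.mkdtemp(), "labels.json")
with open(path, "w") as f:
    json.dump({str(k): v for k, v in label_map.items()}, f, indent=4, sort_keys=True)

restored = load_label_map(path)
print(restored[(0, 0)])
```

Using `ast.literal_eval` rather than `eval` keeps the key parsing restricted to Python literals.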

The next cell is an example on how to use our labelling function. 
First the vector, weight and attribute files are read and passed as the first 3 arguments. The last argument, json_file, tells the function to also write the computed labelling map into a file.

In [3]:
idata = SOMToolBox_Parse("datasets/iris/iris.vec").read_weight_file()
weights = SOMToolBox_Parse("datasets/iris/iris.wgt.gz").read_weight_file()
attributes = list(parse_template_file("datasets/iris/iris.tv"))

visualize_lable_som(idata, weights, attributes, json_file = "iris_labels.json")

| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| sep_length sep_width pet_length | ........ | ........ | sep_length pet_length sep_width | sep_length pet_length sep_width | sep_length pet_length sep_width | ........ | sep_length pet_length sep_width | sep_length pet_length sep_width | sep_length pet_length sep_width |
| sep_length sep_width pet_length | ........ | ........ | sep_length pet_length sep_width | sep_length pet_length sep_width | sep_length pet_length sep_width | sep_length pet_length sep_width | sep_length pet_length sep_width | sep_length pet_length sep_width | sep_length pet_length sep_width |
| sep_length sep_width pet_length | sep_length sep_width pet_length | ........ | ........ | sep_length pet_length sep_width | ........ | sep_length pet_length sep_width | sep_length pet_length sep_width | sep_length pet_length sep_width | sep_length pet_length sep_width |
| sep_length sep_width pet_length | sep_length sep_width pet_length | ........ | ........ | sep_length pet_length sep_width | sep_length pet_length sep_width | ........ | sep_length pet_length sep_width | sep_length pet_length sep_width | sep_length pet_length sep_width |
| sep_length sep_width pet_length | sep_length sep_width pet_length | ........ | ........ | sep_length pet_length sep_width | sep_length pet_length sep_width | sep_length pet_length sep_width | sep_length pet_length sep_width | sep_length pet_length sep_width | sep_length pet_length sep_width |
| sep_length sep_width pet_length | sep_length sep_width pet_length | ........ | sep_length pet_length sep_width | sep_length pet_length sep_width | ........ | ........ | sep_length pet_length sep_width | sep_length pet_length sep_width | ........ |
| sep_length sep_width pet_length | sep_length sep_width pet_length | sep_length sep_width pet_length | ........ | sep_length pet_length sep_width | sep_length pet_length sep_width | sep_length pet_length sep_width | ........ | ........ | sep_length pet_length sep_width |
| sep_length sep_width pet_length | sep_length sep_width pet_length | sep_length sep_width pet_length | ........ | sep_length pet_length sep_width | sep_length pet_length sep_width | sep_length pet_length sep_width | sep_length pet_length sep_width | ........ | sep_length pet_length sep_width |
| sep_length sep_width pet_length | sep_length sep_width pet_length | sep_length sep_width pet_length | ........ | ........ | sep_length pet_length sep_width | sep_length pet_length sep_width | sep_length pet_length sep_width | ........ | sep_length pet_length sep_width |
| sep_length sep_width pet_length | sep_length sep_width pet_length | ........ | sep_length pet_length sep_width | sep_length pet_length sep_width | ........ | sep_length pet_length sep_width | ........ | ........ | sep_length pet_length sep_width |


## Evaluation

The evaluation was relatively straightforward, since our Python implementation and the implementation from the Java SOM Toolbox produced similar results.

Setting extreme values in our case does not yield remarkable results:
For our first parameter `num`, setting a small value, or even zero, results in a predictable reduction in the number of labels.
Increasing `num` beyond the number of attributes is not possible by construction, since the value is capped at the number of attributes; therefore no extreme behaviour can be obtained this way.

The second option, `ignore_labels_with_zero`, replaces the name of every label whose value is zero (and whose q.e. is close to zero) with the empty string.
Because of the sorting procedure employed, such labels are displayed last (irrespective of the option) and, with the option set, as empty strings ("").
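
The ordering described above can be reproduced with a small sketch. It uses the same sort key as `get_labels_for_index` (mean descending, then quantization error ascending). The label triples are invented for illustration, and non-negative attribute values are assumed, since a blanked label with mean zero only sorts last if no other mean is negative.

```python
# Hypothetical (name, mean, qe) triples for one unit; "" marks a label that
# was blanked because ignore_labels_with_zero was set.
labels = [
    ("", 0.0, 0.0),
    ("comp_1", 0.4, 0.05),
    ("comp_2", 0.9, 0.20),
    ("comp_3", 0.9, 0.10),
]

# Same key as in get_labels_for_index: mean descending, then q.e. ascending.
ordered = sorted(labels, key=lambda label: (-label[1], label[2]))

names = [label[0] for label in ordered]
print(names)  # ['comp_3', 'comp_2', 'comp_1', '']
```

The tie between `comp_2` and `comp_3` (equal means) is broken by the smaller quantization error, and the blanked label ends up last.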


### Chainlink

### Small (40x20)
#### Python Port

In [4]:
idata = SOMToolBox_Parse("trained_soms/chainlink/chainlink.vec").read_weight_file()
weights = SOMToolBox_Parse("trained_soms/chainlink/small/chainlink_small.wgt.gz").read_weight_file()
attributes = list(parse_template_file("trained_soms/chainlink/chainlink.tv"))

visualize_lable_som(idata, weights, attributes)

0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39
comp_3 comp_1 comp_2,comp_3 comp_1 comp_2,comp_3 comp_1 comp_2,comp_3 comp_1 comp_2,comp_3 comp_1 comp_2,comp_3 comp_1 comp_2,comp_3 comp_2 comp_1,comp_3 comp_2 comp_1,comp_3 comp_2 comp_1,comp_3 comp_2 comp_1,comp_3 comp_2 comp_1,comp_3 comp_2 comp_1,comp_3 comp_2 comp_1,comp_2 comp_3 comp_1,comp_2 comp_3 comp_1,comp_2 comp_3 comp_1,comp_2 comp_3 comp_1,........,........,........,........,........,comp_2 comp_3 comp_1,comp_2 comp_3 comp_1,comp_2 comp_3 comp_1,comp_2 comp_3 comp_1,comp_2 comp_3 comp_1,comp_2 comp_3 comp_1,comp_2 comp_3 comp_1,comp_2 comp_3 comp_1,comp_2 comp_3 comp_1,comp_2 comp_3 comp_1,comp_2 comp_3 comp_1,comp_2 comp_3 comp_1,comp_2 comp_3 comp_1,comp_2 comp_3 comp_1,comp_2 comp_3 comp_1,comp_2 comp_3 comp_1,comp_2 comp_3 comp_1,comp_2 comp_3 comp_1
comp_3 comp_1 comp_2,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,comp_2 comp_3 comp_1,comp_2 comp_3 comp_1,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,comp_2 comp_3 comp_1
comp_3 comp_1 comp_2,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,comp_2 comp_3 comp_1,comp_2 comp_3 comp_1,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,comp_2 comp_3 comp_1
comp_3 comp_1 comp_2,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,comp_2 comp_3 comp_1
comp_3 comp_1 comp_2,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,comp_2 comp_3 comp_1
comp_3 comp_1 comp_2,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,comp_2 comp_3 comp_1,comp_2 comp_3 comp_1,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,comp_2 comp_3 comp_1
comp_3 comp_1 comp_2,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,comp_2 comp_3 comp_1,comp_2 comp_3 comp_1,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,comp_2 comp_3 comp_1
comp_3 comp_1 comp_2,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,comp_2 comp_3 comp_1,comp_2 comp_3 comp_1,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,comp_2 comp_3 comp_1
comp_3 comp_1 comp_2,........,........,........,........,........,........,........,........,........,comp_3 comp_2 comp_1,........,comp_2 comp_3 comp_1,comp_3 comp_2 comp_1,comp_2 comp_3 comp_1,comp_2 comp_3 comp_1,comp_2 comp_3 comp_1,........,........,........,........,........,........,........,........,........,........,........,........,comp_2 comp_1 comp_3,comp_2 comp_3 comp_1,........,........,........,........,........,........,........,........,comp_2 comp_1 comp_3
comp_3 comp_1 comp_2,........,........,........,........,........,........,........,comp_3 comp_1 comp_2,comp_3 comp_1 comp_2,comp_3 comp_2 comp_1,comp_3 comp_2 comp_1,comp_2 comp_3 comp_1,comp_2 comp_3 comp_1,comp_2 comp_3 comp_1,comp_2 comp_3 comp_1,........,........,........,........,........,........,........,........,........,comp_2 comp_1 comp_3,comp_2 comp_1 comp_3,comp_2 comp_1 comp_3,........,comp_2 comp_3 comp_1,comp_2 comp_1 comp_3,comp_2 comp_3 comp_1,........,........,........,........,........,........,........,comp_2 comp_1 comp_3


#### Java SOM Toolbox

##### Java SOM Toolbox: Labelling
![title](trained_soms/chainlink/chainlink_small_pain_plain.png)
##### Java SOM Toolbox: Hit Histogram - Colour Coding
![title](trained_soms/chainlink/chainlink_small_pain_dots.png)
##### Java SOM Toolbox: Hit Histogram - Colour Coding (Zoomed in)
![title](trained_soms/chainlink/chainlink_small_pain_dots_zoom.png)

### Large (100x60)

In [5]:
idata = SOMToolBox_Parse("trained_soms/chainlink/chainlink.vec").read_weight_file()
weights = SOMToolBox_Parse("trained_soms/chainlink/large/chainlink_large.wgt.gz").read_weight_file()
attributes = list(parse_template_file("trained_soms/chainlink/chainlink.tv"))

visualize_lable_som(idata, weights, attributes)

0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99
........,........,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,........,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,........,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,comp_2 comp_1 comp_3,comp_2 comp_1 comp_3,comp_2 comp_1 comp_3,comp_2 comp_1 comp_3,comp_2 comp_1 comp_3,comp_2 comp_1 comp_3,comp_2 comp_1 comp_3,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,comp_1 comp_2 comp_3,comp_1 comp_3 comp_2,comp_1 comp_3 comp_2,comp_1 comp_3 comp_2,comp_1 comp_3 comp_2,........,comp_1 comp_3 comp_2,comp_1 comp_3 comp_2,comp_1 comp_3 comp_2,comp_1 comp_3 comp_2,........,........
........,comp_2 comp_1 comp_3,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,comp_2 comp_1 comp_3,comp_2 comp_1 comp_3,comp_2 comp_1 comp_3,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,comp_1 comp_3 comp_2
comp_1 comp_2 comp_3,comp_2 comp_1 comp_3,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,comp_2 comp_1 comp_3,comp_2 comp_1 comp_3,comp_2 comp_1 comp_3,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,comp_1 comp_3 comp_2
comp_2 comp_1 comp_3,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,comp_2 comp_1 comp_3,comp_2 comp_1 comp_3,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,comp_1 comp_3 comp_2
comp_2 comp_1 comp_3,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,comp_2 comp_1 comp_3,comp_2 comp_1 comp_3,comp_2 comp_1 comp_3,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........
comp_2 comp_1 comp_3,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,comp_2 comp_1 comp_3,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,comp_1 comp_3 comp_2
comp_2 comp_1 comp_3,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,comp_2 comp_1 comp_3,comp_2 comp_1 comp_3,comp_2 comp_1 comp_3,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,comp_1 comp_3 comp_2
comp_2 comp_1 comp_3,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,comp_2 comp_1 comp_3,comp_2 comp_1 comp_3,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,comp_1 comp_3 comp_2
comp_2 comp_1 comp_3,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,comp_2 comp_1 comp_3,comp_2 comp_1 comp_3,comp_2 comp_1 comp_3,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,comp_1 comp_3 comp_2
comp_2 comp_1 comp_3,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,comp_2 comp_1 comp_3,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,comp_1 comp_3 comp_2


##### Java SOM Toolbox: Labelling
![title](trained_soms/chainlink/chainlink_large_pain_plain.png)
##### Java SOM Toolbox: Hit Histogram - Colour Coding
![title](trained_soms/chainlink/chainlink_large_pain_dots.png)

### Comparison

When comparing the labelling visualisations of our implementation and the Java SOM Toolbox, we found no differences: the clustering and the ordering of the labels were the same in both implementations.

In general, however, the label order within individual units may differ between the Java SOM Toolbox and our port. Such differences can be explained by a bug in the original Java toolbox: as described in the Further Findings section, its quicksort implementation is slightly flawed, so that it sometimes does not sort all values correctly.
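
When checking two labellings against each other, pure order differences can be separated from genuine differences by comparing the sorted label names of each unit. The following is a minimal sketch, assuming both label maps are dictionaries from coordinates to lists of `(name, mean, qe)` triples as returned by `get_labels_for_som`; the function name `compare_label_maps` and the sample data are hypothetical and do not come from our evaluation.

```python
def compare_label_maps(map_a, map_b):
    """Classify each unit as 'same', 'order' (same names, different order)
    or 'different' (different label names)."""
    result = {}
    for coords in sorted(set(map_a) | set(map_b)):
        names_a = [label[0] for label in map_a.get(coords, [])]
        names_b = [label[0] for label in map_b.get(coords, [])]
        if names_a == names_b:
            result[coords] = "same"
        elif sorted(names_a) == sorted(names_b):
            result[coords] = "order"
        else:
            result[coords] = "different"
    return result


# Invented example: unit (0, 1) differs only in label order.
python_map = {
    (0, 0): [("comp_3", 0.9, 0.1), ("comp_1", 0.5, 0.2)],
    (0, 1): [("comp_2", 0.8, 0.1), ("comp_3", 0.8, 0.3)],
}
java_map = {
    (0, 0): [("comp_3", 0.9, 0.1), ("comp_1", 0.5, 0.2)],
    (0, 1): [("comp_3", 0.8, 0.3), ("comp_2", 0.8, 0.1)],
}

report = compare_label_maps(python_map, java_map)
print(report)
```

Units reported as `order` contain the same labels and would only be affected by a sorting difference, whereas `different` would point to a real discrepancy in the label selection.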

As the figures for the 100 by 60 grid show, the standard export feature of the Java SOM Toolbox does not add labels to the units when they are too small. We therefore supplemented the figures with the Hit Histogram - Colour Coding visualisation. For the small grid (40 by 20), we also provide a zoomed-in figure for comparison with our results.

### 10 Clusters

### Small (40x20)
#### Python Port

In [6]:
idata = SOMToolBox_Parse("trained_soms/10_clusters/10clusters.vec").read_weight_file()
weights = SOMToolBox_Parse("trained_soms/10_clusters/small/10clusters_small.wgt.gz").read_weight_file()
attributes = list(parse_template_file("trained_soms/10_clusters/10clusters.tv"))

visualize_lable_som(idata, weights, attributes)

0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39
........,........,........,........,comp_10 comp_3 comp_4,comp_10 comp_3 comp_4,comp_10 comp_3 comp_4,comp_10 comp_3 comp_4,comp_10 comp_3 comp_4,comp_10 comp_3 comp_4,comp_10 comp_3 comp_4,........,........,........,........,........,........,........,........,........,........,........,........,........,........,comp_1 comp_5 comp_4,........,........,........,........,........,........,........,........,........,comp_8 comp_6 comp_9,comp_8 comp_6 comp_9,comp_8 comp_6 comp_9,comp_8 comp_6 comp_9,comp_8 comp_6 comp_9
........,........,........,........,........,........,comp_10 comp_3 comp_4,comp_10 comp_3 comp_4,comp_10 comp_3 comp_4,comp_10 comp_3 comp_4,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,comp_8 comp_6 comp_9,comp_8 comp_6 comp_9,comp_8 comp_6 comp_9
........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,comp_8 comp_6 comp_9,comp_8 comp_6 comp_9,comp_8 comp_6 comp_9
........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,comp_8 comp_6 comp_9,comp_8 comp_6 comp_9,comp_8 comp_6 comp_9
........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,comp_8 comp_6 comp_9,comp_8 comp_6 comp_9,comp_8 comp_6 comp_9
........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........
........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........
........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........
........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........
comp_10 comp_2 comp_3,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........


### Java SOM Toolbox
#### Java SOM Toolbox: Labelling
![title](trained_soms/10_clusters/10clusters_small_pain_plain.png)
#### Java SOM Toolbox: Hit Histogram - Colour Coding
![title](trained_soms/10_clusters/10clusters_small_pain_dots.png)
#### Java SOM Toolbox: Hit Histogram - Colour Coding (Zoomed in)
![title](trained_soms/10_clusters/10clusters_small_pain_dots_zoom.png)

### Large (100x60)
#### Python Port

In [7]:
idata = SOMToolBox_Parse("trained_soms/10_clusters/10clusters.vec").read_weight_file()
weights = SOMToolBox_Parse("trained_soms/10_clusters/large/10clusters_large.wgt.gz").read_weight_file()
attributes = list(parse_template_file("trained_soms/10_clusters/10clusters.tv"))

visualize_lable_som(idata, weights, attributes)

0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99
........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,comp_10 comp_3 comp_4,comp_10 comp_3 comp_4,comp_10 comp_3 comp_4,comp_10 comp_3 comp_4,comp_10 comp_3 comp_4,comp_10 comp_3 comp_4,comp_10 comp_3 comp_4,comp_10 comp_3 comp_4,comp_10 comp_3 comp_4,comp_10 comp_3 comp_4,........,comp_10 comp_3 comp_4,........,comp_10 comp_3 comp_4,........,comp_10 comp_3 comp_4,comp_10 comp_3 comp_4,comp_10 comp_3 comp_4,........,comp_10 comp_3 comp_4,comp_10 comp_3 comp_4,comp_10 comp_3 comp_4,comp_10 comp_3 comp_4,comp_10 comp_3 comp_4,comp_10 comp_3 comp_4,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,comp_8 comp_9 comp_3,comp_8 comp_9 comp_3,........,........,........,........,........,........,........,........,........,........,........,........,........,comp_8 comp_9 comp_3
........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,comp_10 comp_3 comp_4,comp_10 comp_3 comp_4,........,........,........,........,........,........,comp_10 comp_3 comp_4,........,........,........,comp_10 comp_3 comp_4,comp_10 comp_3 comp_4,comp_10 comp_3 comp_4,........,comp_10 comp_3 comp_4,........,........,........,........,comp_10 comp_3 comp_4,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........
........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,comp_10 comp_3 comp_4,........,........,........,........,........,comp_10 comp_3 comp_4,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........
........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,comp_10 comp_3 comp_4,........,comp_10 comp_3 comp_4,comp_10 comp_3 comp_4,comp_10 comp_3 comp_4,comp_10 comp_3 comp_4,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........
........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,comp_10 comp_3 comp_4,........,comp_10 comp_3 comp_4,comp_10 comp_3 comp_4,........,comp_10 comp_3 comp_4,comp_10 comp_3 comp_4,comp_10 comp_3 comp_4,........,........,........,........,........,........,........,comp_10 comp_3 comp_4,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........
........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,comp_10 comp_3 comp_4,comp_10 comp_3 comp_4,comp_10 comp_3 comp_4,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........
........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........
........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........
........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........
........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........,........


#### Java SOM Toolbox
##### Figure 1: Java SOM Toolbox: Labelling
![title](trained_soms/10_clusters/10clusters_large_pain_plain.png)
##### Figure 2: Java SOM Toolbox: Hit Histogram - Colour Coding
![title](trained_soms/10_clusters/10clusters_large_pain_dots.png)

### Comparison

As can be seen, the results of our Python implementation and the Java SOM Toolbox are virtually identical: we found no differences in clustering or labelling.

The figures for the 100 by 60 grid show that the standard export feature of the Java SOM Toolbox does not add labels to the units, since the units are too small. Therefore, we supplemented the exported images with the Hit Histogram - Colour Coding visualisation, which makes the labelled units easier to recognise. We also provide a zoomed-in picture of the 40 by 20 grid so that it can be compared with our results.
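The hit histogram used in the Java figures counts how many input vectors are mapped to each unit. A minimal sketch of that count in Python is given below. It assumes plain NumPy arrays rather than the dictionaries returned by the parser, and a row-major unit order with `m` rows and `n` columns; the function name `hit_histogram` is hypothetical and not part of our notebook or of the toolbox.

```python
import numpy as np


def hit_histogram(m, n, weights, idata):
    """Count, for every unit of an m x n SOM, how many input vectors it wins.

    Assumption: `weights` has shape (m * n, dim) in row-major unit order and
    `idata` has shape (num_vectors, dim).
    """
    weights = np.asarray(weights, dtype=float)
    idata = np.asarray(idata, dtype=float)

    # Squared Euclidean distance from every input vector to every unit.
    # The square root is omitted because it does not change the argmin.
    dists = ((idata[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)

    # Index of the best-matching unit for each input vector.
    bmu = dists.argmin(axis=1)

    # Number of hits per unit, arranged on the grid.
    return np.bincount(bmu, minlength=m * n).reshape(m, n)


# Toy 2 x 2 map with one unit in each corner of the unit square.
weights = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
idata = np.array([[0.1, 0.0], [0.9, 0.1], [1.0, 0.9], [0.8, 1.0]])
hist = hit_histogram(2, 2, weights, idata)
print(hist)
```

Units with a count of zero correspond to the empty cells in the tables above. The broadcasted distance matrix holds one entry per vector, unit and dimension, so for large maps and data sets a loop over the input vectors may be preferable.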

## Further Findings

During development, we stumbled upon a few issues with the Java SOM Toolbox:

### Incorrect quicksort implementation

In `Label.java`, labels are sorted with a custom sorting procedure, which we found to produce incorrect results.

For example, if `Label.java` is augmented with the following main procedure:

```java
public static void main(String args[]) {
    Label[] a = {new Label("a", 1, 0.1), new Label("b", 1, 0.2)};

    System.out.println("Labels to sort:");
    for (Label l : a) {
        System.out.println(l.name + ", " + l.value + ", " + l.qe);
    }
    sortByValueQe(a, SORT_DESC, SORT_DESC);

    System.out.println("Sorted labels: (Expected: Descending order in both columns)");

    for (Label l : a) {
        System.out.println(l.name + ", " + l.value + ", " + l.qe);
    }
}
```

we obtain the output

```
Labels to sort:
a, 1.0, 0.1
b, 1.0, 0.2
Sorted labels: (Expected: Descending order in both columns)
a, 1.0, 0.1
b, 1.0, 0.2
```

Observe that the second numeric column is not sorted in descending order: both labels have the same value in the first column, so `b` (0.2) should precede `a` (0.1).
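For reference, the ordering we expected can be reproduced with a short Python sketch. It is not the toolbox's implementation: labels are modelled as plain `(name, value, qe)` tuples, and the function name `sort_by_value_qe_desc` is hypothetical.

```python
def sort_by_value_qe_desc(labels):
    """Sort (name, value, qe) tuples by value descending, then qe descending."""
    # Negating both numeric keys turns Python's ascending sort into a
    # descending one, with qe acting as the tie-breaker for equal values.
    return sorted(labels, key=lambda label: (-label[1], -label[2]))


labels = [("a", 1.0, 0.1), ("b", 1.0, 0.2)]
expected = sort_by_value_qe_desc(labels)

for name, value, qe in expected:
    print(f"{name}, {value}, {qe}")
```

With the two labels from the Java example, this places `b` before `a`, which is the opposite of the order printed by the toolbox.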

### Recalculation of Labels not possible in SOMViewer

`Labelling > Rerun labeling` only computes a labelling if none exists yet.
Otherwise, the old labelling persists.

### Calculated Labels do not update automatically

After a labelling is computed, the `Show labels` option has to be toggled off and on again manually; otherwise, stale labels are displayed.
