# 03 Developing the SOM algorithm

In this notebook, the functions used to train the self-organising map (SOM) are tested in order to tune
the following hyperparameters:

- Choice of alpha function (exponential decay or percentage of datapoints remaining)
- Choice of theta function (nearest neighbours only or Gaussian function)
- Alpha parameters - rate of decay, initial value
- Size of matrix
- Size of training data (equivalent to number of training iterations)

The size of the matrix will initially be approximated as:  

$$M = 5 \sqrt{N} \quad (1)$$ 
 <div style="text-align: right"><i>(Tian et al., 2014)</i></div>

where:

- $M$ = number of nodes
- $N$ = number of observations used to train the matrix.

All functions used to train the matrix are stored in `featureextractionsom/functions/som_adjustable_parameters.py`.

## 3.1 Load the data

In [2]:
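from pickle import load
from featureextractionsom.config import data_path  # default data location from the package config

# Override the configured location with the notebook-relative 'data' folder
data_path = 'data'

# Context managers close the files once the pickles have been loaded
with open(data_path + '/training_data.pkl', 'rb') as f:
    training_data = load(f)
with open(data_path + '/test_vectors.pkl', 'rb') as f:
    test_vectors = load(f)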

num_features = training_data.shape[1] 
num_features

18

## 3.2 Hyperparameter tuning

Initialise a dictionary of parameters that will be tuned and tested incrementally.

Store equation (1) from Tian et al. (2014) as a function.

In [3]:
params = {'gaussian': False, 'alpha_type': 'exp', 'half_life': 0.5, 'initial_val': 1, 'max_iterations': 100}

def get_size(i: int) -> int:
    """
    Apply equation (1) to return the side length of a square node matrix
    for the supplied number of training iterations i (N in equation 1).
    """
    # number of nodes M, as in equation (1)
    num_nodes = 5 * i**0.5
    # the square root of the number of nodes gives the number of rows/columns in a square array
    return int(num_nodes**0.5)
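
As a worked example of equation (1), the sketch below repeats the arithmetic of `get_size` (under the hypothetical name `side_length`) for the training sizes used in this notebook. Because `int()` truncates rather than rounds, the number of nodes actually used is somewhat below $5\sqrt{N}$.

```python
import math

def side_length(num_iterations: int) -> int:
    """Side length of a square SOM from equation (1), M = 5 * sqrt(N)."""
    num_nodes = 5 * math.sqrt(num_iterations)   # M in equation (1)
    return int(math.sqrt(num_nodes))            # truncate, as get_size does

for n in [100, 500, 1000, 1200, 2000, 5000]:
    s = side_length(n)
    print(f"N = {n:>4}: M = {5 * math.sqrt(n):6.1f} nodes -> {s} x {s} = {s * s} nodes")
```

For example, $N = 1000$ gives $M \approx 158$ nodes, which truncates to a $12 \times 12$ matrix of 144 nodes.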

### 3.2.1 Initial testing

The alpha function determines the learning rate of the matrix. It should decrease
with each iteration and be initialised at a high enough value that the matrix
adjusts to the data.
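
The alpha functions themselves live in `som_adjustable_parameters.py` and are not reproduced here. As a hedged illustration only, the sketch below assumes the exponential ('exp') option halves alpha every `half_life * max_iterations` iterations, which matches how the half-life is quoted at the end of this notebook ($0.2 \times 1200 = 240$ iterations). The function name `alpha_exp` is hypothetical.

```python
def alpha_exp(t: int, initial_val: float, half_life: float, max_iterations: int) -> float:
    """Hypothetical exponential-decay learning rate.

    Assumes `half_life` is a fraction of `max_iterations`, so alpha halves
    every half_life * max_iterations iterations.
    """
    half_life_iterations = half_life * max_iterations
    return initial_val * 0.5 ** (t / half_life_iterations)

# Learning rate during a 500-iteration run with initial value 1 and h = 0.4
for t in [0, 100, 200, 300, 400, 500]:
    print(f"t = {t:3d}: alpha = {alpha_exp(t, 1.0, 0.4, 500):.3f}")
```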

In [4]:
from featureextractionsom.functions.utils import try_make_folder

# set output path
outputs = 'output'
alpha_per_path = outputs + '/3_2_1/alpha_per'
alpha_exp_path = outputs + '/3_2_1/alpha_exp'

try_make_folder(alpha_exp_path)
try_make_folder(alpha_per_path)

For each alpha function in turn, starting with exponential decay, test several initial values
and several values of the parameter that controls how quickly alpha decreases with each iteration.

In [4]:
import numpy as np
from featureextractionsom.functions.som_adjustable_parameters import train_matrix
from featureextractionsom.functions.matrix_operations import distance, build_node_matrix
from featureextractionsom.functions.evaluation import record_response, get_test_vectors

# Randomly choose 3 vectors from the validation set
validation_set = [(i, vec) for (i, vec) in get_test_vectors(test_vectors, 3)]
print(f'Vectors tested: {[i for i,_ in validation_set]}')

# Alpha as an exponential decay function

# Start with two values of max_iteration (will tune later)
for max_iterations in [100,1000]:
    size = get_size(max_iterations)
    params['max_iterations'] = max_iterations
    # build matrix
    node_matrix = build_node_matrix(size,num_features)
    
    # Try two initial values
    for initial_value in [0.5, 1]:
        params['initial_val'] = initial_value
        
        # Try two values of *h*
        for decay_hlife in [0.2, 0.4]:
            params['half_life'] = decay_hlife
            
            # train the matrix with the hyper parameters
            trained_matrix = train_matrix(params, node_matrix, training_data)
            
            # Generate matrix of distances between weights and test vectors
            for i, test_vector in validation_set:
                # Evaluate the matrix by finding the distance between a test vector and all of the nodes
                distance_matrix = np.array([[distance(test_vector, node) for node in row] for row in trained_matrix])
            
                response_image_path = alpha_exp_path
                response_image_path += f'/vec{i}_max_iter_{max_iterations}_initval_{initial_value}_decay_{decay_hlife}.png'
                record_response(distance_matrix, response_image_path)

Vectors tested: [7, 22, 16]


On inspection, only limited evidence of clustering was found; for example:

Configuration | Vector 7 | Vector 16 | Vector 22
---|---|---|---
100 iterations, initial value 0.5, decay 0.2 |![response](output/3_2_1/alpha_exp/vec7_max_iter_100_initval_0.5_decay_0.2.png)|  ![response](output/3_2_1/alpha_exp/vec16_max_iter_100_initval_0.5_decay_0.2.png) | ![response](output/3_2_1/alpha_exp/vec22_max_iter_100_initval_0.5_decay_0.2.png)
1000 iterations, initial value 1, decay 0.2 | ![response](output/3_2_1/alpha_exp/vec7_max_iter_1000_initval_1_decay_0.2.png)| ![response](output/3_2_1/alpha_exp/vec16_max_iter_1000_initval_1_decay_0.2.png) | ![response](output/3_2_1/alpha_exp/vec22_max_iter_1000_initval_1_decay_0.2.png)


A second alpha function ('per') was then tried. It decreases the learning rate in proportion
to the number of training vectors still to be used.
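
Read literally, "in proportion to the number of training vectors still to be used" describes a linear decay. A minimal sketch under that assumption follows; the name `alpha_per` is hypothetical, and the notebook's own 'per' option may differ in detail.

```python
def alpha_per(t: int, initial_val: float, max_iterations: int) -> float:
    """Hypothetical 'per' learning rate: proportional to the fraction of
    training vectors still to be presented after t iterations."""
    remaining_fraction = (max_iterations - t) / max_iterations
    return initial_val * remaining_fraction

# Learning rate during a 100-iteration run with initial value 0.5
for t in [0, 25, 50, 75, 99]:
    print(f"t = {t:2d}: alpha = {alpha_per(t, 0.5, 100):.3f}")
```

Unlike the exponential option, this schedule has no half-life parameter, which is consistent with the 'per' option ignoring `half_life` in section 3.2.6.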

In [5]:
# Alpha as fractional decay
params['alpha_type'] = 'per'

# Try two values of max_iterations: 100 as before, and a larger value of 5000
for max_iterations in [100,5000]:
    size = get_size(max_iterations)
    params['max_iterations'] = max_iterations
    
    # reset matrix
    node_matrix = build_node_matrix(size,num_features)
    
    # Try the same initial values again
    for initial_value in [0.5, 1]:
        params['initial_val'] = initial_value
            
        # train the matrix with the hyper parameters
        trained_matrix = train_matrix(params, node_matrix, training_data)
        # Evaluate against the same validation vectors as the exponential runs
        for i, test_vector in validation_set:
            # Evaluate the matrix by finding the distance between a test vector and all of the nodes
            distance_matrix = np.array([[distance(test_vector, node) for node in row] for row in trained_matrix])
            
            response_image_path = alpha_per_path
            response_image_path += f'/vec{i}_max_iter_{max_iterations}_initval_{initial_value}.png'
            record_response(distance_matrix, response_image_path)

Again, only limited evidence of clustering was found. 

Vector 7:  

Initial value | 100 iterations | 5000 iterations
---|---|---
0.5 |![response](output/3_2_1/alpha_per/vec7_max_iter_100_initval_0.5.png)|![response](output/3_2_1/alpha_per/vec7_max_iter_5000_initval_0.5.png)
1 | ![response](output/3_2_1/alpha_per/vec7_max_iter_100_initval_1.png)|![response](output/3_2_1/alpha_per/vec7_max_iter_5000_initval_1.png)


The exercise was repeated using a Gaussian theta function, this time testing all three values of max_iterations used above (100, 1000 and 5000) for both alpha functions.
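
For reference, the two theta (neighbourhood) options can be sketched together with the standard Kohonen weight update $w \leftarrow w + \alpha\,\theta\,(x - w)$. This is a minimal NumPy sketch, not the code in `som_adjustable_parameters.py`: the function names, the fixed neighbourhood width `sigma`, and the use of Euclidean distance to find the best-matching unit (BMU) are all assumptions.

```python
import numpy as np

def theta_nearest(grid_dist: np.ndarray) -> np.ndarray:
    """Nearest-neighbour option: the BMU and its four adjacent nodes get theta = 1."""
    return (grid_dist <= 1).astype(float)

def theta_gaussian(grid_dist: np.ndarray, sigma: float) -> np.ndarray:
    """Gaussian option: influence decays smoothly with grid distance from the BMU."""
    return np.exp(-grid_dist**2 / (2 * sigma**2))

def som_step(weights, x, alpha, sigma, gaussian=True):
    """One SOM update of a (rows, cols, features) weight array towards input x."""
    rows, cols, _ = weights.shape
    # Best-matching unit: the node whose weight vector is closest to x
    dists = np.linalg.norm(weights - x, axis=2)
    bmu = np.unravel_index(np.argmin(dists), dists.shape)
    # Distance of every node from the BMU on the 2-D grid
    r, c = np.indices((rows, cols))
    grid_dist = np.sqrt((r - bmu[0])**2 + (c - bmu[1])**2)
    theta = theta_gaussian(grid_dist, sigma) if gaussian else theta_nearest(grid_dist)
    # Move each node towards x in proportion to alpha * theta
    return weights + alpha * theta[:, :, None] * (x - weights)

rng = np.random.default_rng(0)
weights = rng.random((15, 15, 18))
x = rng.random(18)
updated = som_step(weights, x, alpha=0.5, sigma=2.0)
```

With the Gaussian option every node moves a little towards the input, whereas the nearest-neighbour option leaves all but at most five nodes untouched; the smoother updates may help explain the clearer regions seen with the Gaussian function.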

### 3.2.2 Choosing the alpha function with a Gaussian theta function

In [23]:
alpha_per_path = outputs + '/3_2_2/alpha_per'
alpha_exp_path = outputs + '/3_2_2/alpha_exp'

try_make_folder(alpha_per_path)
try_make_folder(alpha_exp_path)

# Reuse the three validation vectors from 3.2.1 so the responses can be compared directly
# (new validation sets are chosen periodically later to avoid overfitting)
validation_set = [(i, test_vectors[i]) for i in [7, 22, 16]]

# Set the theta function to 'gaussian'
params['gaussian'] = True

# Alpha as an exponential decay function
params['alpha_type'] = 'exp'

# Three values of max_iterations
for max_iterations in [100,1000,5000]:
    size = get_size(max_iterations)
    params['max_iterations'] = max_iterations
    
    # Reset the matrix
    node_matrix = build_node_matrix(size, num_features)
    
    # try the same initial values
    for initial_value in [0.5, 1]:
        params['initial_val'] = initial_value
        
        for decay_hlife in [0.2, 0.4]:
            params['half_life'] = decay_hlife
            
            # train the matrix with the hyper parameters
            trained_matrix = train_matrix(params, node_matrix, training_data)
            
            # Generate matrix of distances between weights and test vectors
            
            for i, test_vector in validation_set:
                # Evaluate the matrix by finding the distance between a test vector and all of the nodes
                distance_matrix = np.array([[distance(test_vector, node) for node in row] for row in trained_matrix])
                
                # save the response
                response_image_path = alpha_exp_path
                response_image_path += f'/vec{i}_max_iter_{max_iterations}_initval_{initial_value}_decay_{decay_hlife}.png'
                record_response(distance_matrix, response_image_path)

In [24]:
# Alpha as a percentage decay

params['alpha_type'] = 'per'

# Three values of max_iterations
for max_iterations in [100,1000,5000]:
    size = get_size(max_iterations)
    params['max_iterations'] = max_iterations
    
    # reset matrix
    node_matrix = build_node_matrix(size,num_features)
    
    # Same initial values
    for initial_value in [0.5, 1]:
        params['initial_val'] = initial_value
            
        # train the matrix with the hyper parameters
        trained_matrix = train_matrix(params, node_matrix, training_data)
            
        # Iterate through validation set
        for i, test_vector in validation_set:
            # Evaluate the matrix by finding the distance between a test vector and all of the nodes
            distance_matrix = np.array([[distance(test_vector, node) for node in row] for row in trained_matrix])
            
            # save the response
            response_image_path = alpha_per_path
            response_image_path += f'/vec{i}_max_iter_{max_iterations}_initval_{initial_value}.png'
            record_response(distance_matrix, response_image_path)   

Here we start to see some evidence of clustering and of clearly delineated areas of the SOM, particularly for the exponential alpha function.

Here are the responses to vector 22 with and without a Gaussian theta function:

Configuration | Non-Gaussian theta function | Gaussian theta function
---|---|---
Alpha 'exp', 100 iterations, initial value 0.5, decay 0.4 | ![example_matrix](output/3_2_1/alpha_exp/vec22_max_iter_100_initval_0.5_decay_0.4.png)  | ![example_matrix](output/3_2_2/alpha_exp/vec22_max_iter_100_initval_0.5_decay_0.4.png)  
Alpha 'per', 100 iterations, initial value 1 | ![example_matrix](output/3_2_1/alpha_per/vec22_max_iter_100_initval_1.png)  | ![example_matrix](output/3_2_2/alpha_per/vec22_max_iter_100_initval_1.png)  


From now on, the Gaussian theta function will be used.

### 3.2.3 Choosing how to visualise responses

In tuning the hyperparameters by inspection, it is important to be confident in the
method used to evaluate the response of the matrix to different input vectors.

Two different sets of hyperparameters will be used to train the node matrix.
Each resulting matrix will be tested with two types of response: distance and dot product.

The responses will be measured between the trained matrix and five different test vectors.
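
The two response measures can be written compactly with NumPy broadcasting. The sketch below assumes that `distance` is the Euclidean distance and that the trained matrix converts to an array of shape `(rows, cols, num_features)`; `response_maps` is a hypothetical helper, not part of `featureextractionsom`.

```python
import numpy as np

def response_maps(test_vector, trained_matrix):
    """Return (distance_map, dot_map) for one test vector against every node."""
    weights = np.asarray(trained_matrix, dtype=float)   # (rows, cols, features)
    x = np.asarray(test_vector, dtype=float)            # (features,)
    # Euclidean distance: 0 means the node's weights equal the test vector
    distance_map = np.linalg.norm(weights - x, axis=-1)
    # Dot product: larger values mean a stronger (more aligned and/or larger) match
    dot_map = weights @ x
    return distance_map, dot_map

rng = np.random.default_rng(1)
demo_matrix = rng.random((15, 15, 18))
dist_map, dot_map = response_maps(rng.random(18), demo_matrix)
```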

In [8]:
# Set up directories
distance_matrix_path = outputs + '/3_2_3/distance_matrix'
dot_product_matrix_path = outputs + '/3_2_3/dot_product_matrix'
try_make_folder(distance_matrix_path)
try_make_folder(dot_product_matrix_path)

In [9]:
# Set up two different possible sets of hyper-parameters

# Pick shared parameters for both tests
shared_params = {'gaussian': True, 'max_iterations': 100, 'initial_val': 1}
size = get_size(500)

# Create two different configurations to test
unique_params_one = {'alpha_type': 'exp', 'half_life': 0.4}
unique_params_two = {'alpha_type': 'per'}

# Hold both sets of parameters in a list
paramsGrid = [{**shared_params, **unique_params_one}, {**shared_params, **unique_params_two}]

In [10]:
from featureextractionsom.functions.evaluation import generate_dot_matrix

# Randomly choose 5 vectors from the validation set
validation_set = [(i, vec) for (i, vec) in get_test_vectors(test_vectors, 5)]
print(f'Vectors tested: {[i for i,_ in validation_set]}')

# for each set of parameters, create and train a matrix of nodes
for p, params in enumerate(paramsGrid):
    
    # reset and train node matrix
    node_matrix = build_node_matrix(size, num_features)
    trained_matrix = train_matrix(params, node_matrix, training_data)
    
    # Iterate through validation set
    for i, test_vector in validation_set:
        # Evaluate the matrix by finding the distance between the test vector and all of the nodes
        distance_matrix = np.array([[distance(test_vector, node) for node in row] for row in trained_matrix])
        
        # set image name
        image_name = f'/vec{i}_matrix_{p}.png'
        
        # set file path
        distance_image_path = distance_matrix_path + image_name
        
        # Evaluate the matrix by finding the dot product between the test vector and each weight vector in the matrix
        dot_matrix = generate_dot_matrix(test_vector, trained_matrix, size)
        
        # set file path
        dot_image_path = dot_product_matrix_path + image_name
        
        # record response
        record_response(distance_matrix, distance_image_path)
        record_response(dot_matrix, dot_image_path, reverse_colourscale=False)

Vectors tested: [5, 21, 12, 16, 13]


On inspection, the distance matrix appears to display the response more distinctly.

Examples:

Matrix | Response measure | Vector 5 | Vector 12 | Vector 21
---|---|---|---|---
Matrix 0 | dot product | ![example](output/3_2_3/dot_product_matrix/vec5_matrix_0.png) | ![example](output/3_2_3/dot_product_matrix/vec12_matrix_0.png) | ![example](output/3_2_3/dot_product_matrix/vec21_matrix_0.png) 
Matrix 0 | distance | ![example](output/3_2_3/distance_matrix/vec5_matrix_0.png) | ![example](output/3_2_3/distance_matrix/vec12_matrix_0.png) | ![example](output/3_2_3/distance_matrix/vec21_matrix_0.png) 
Matrix 1 | dot product | ![example](output/3_2_3/dot_product_matrix/vec5_matrix_1.png) | ![example](output/3_2_3/dot_product_matrix/vec12_matrix_1.png) | ![example](output/3_2_3/dot_product_matrix/vec21_matrix_1.png) 
Matrix 1 | distance | ![example](output/3_2_3/distance_matrix/vec5_matrix_1.png) | ![example](output/3_2_3/distance_matrix/vec12_matrix_1.png) | ![example](output/3_2_3/distance_matrix/vec21_matrix_1.png) 

Because the colour scale is reversed in the distance-matrix figures, yellow indicates the strongest
response for both methods. A distance of zero indicates identical vectors, whereas a dot product of zero indicates orthogonal vectors with no overlap, so for the dot product a larger value indicates a stronger response.

With the dot product, the same node displayed the maximal response for all three vectors. The distance matrices display a greater variation in responses and more evidence of clustering. Going forward, the distance between the test vectors and the weight vectors will be used to evaluate the training parameters.
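
One likely reason for the dot product favouring a single node is that, for unnormalised vectors, $x \cdot w = \lVert x\rVert\,\lVert w\rVert\cos\varphi$ grows with the magnitude of the weight vector, so a node with unusually large weights can win for almost any input. The toy example below uses made-up data, not the notebook's, to illustrate this.

```python
import numpy as np

# Toy "SOM" of three nodes: two small unit vectors and one large vector
nodes = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [3.0, 3.0]])
inputs = np.array([[1.0, 0.0],
                   [0.0, 1.0]])

for x in inputs:
    dots = nodes @ x                                   # raw dot-product response
    dists = np.linalg.norm(nodes - x, axis=1)          # distance response
    cosines = dots / (np.linalg.norm(nodes, axis=1) * np.linalg.norm(x))
    print(f"x={x}: max dot -> node {np.argmax(dots)}, "
          f"min distance -> node {np.argmin(dists)}, "
          f"max cosine -> node {np.argmax(cosines)}")
```

The large node 2 gives the highest dot product for both inputs, whereas the distance (and the normalised cosine similarity) picks node 0 for the first input and node 1 for the second.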

### 3.2.4 Choosing the number of iterations

From inspection of all the outputs generated so far, the number of iterations
used to train the matrix should probably be between 100 and 1000.

Too many iterations lead to overfitting; too few lead to underfitting.

Both potential alpha functions depend on the number of iterations, so a ballpark number
should be chosen and used to compare the alpha functions.

For now, an exponential alpha function with an initial value of 1 and a half-life of $0.2 \times$ the number of iterations will be used.
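
Assuming the exponential option halves alpha every $hT$ iterations, where $T$ is `max_iterations` (consistent with the half-life of $0.2 \times 1200 = 240$ iterations quoted in the final selection), the learning rate at iteration $t$ and at the end of training would be:

$$\alpha(t) = \alpha_0 \, 2^{-t/(hT)}, \qquad \alpha(T) = \alpha_0 \, 2^{-1/h}$$

With $h = 0.2$, alpha therefore ends at $2^{-5} = 1/32 \approx 0.031$ of its initial value regardless of $T$, whereas with $h = 0.4$ it ends at $2^{-2.5} \approx 0.18$. Under this assumption, $h$ fixes the shape of the schedule and $T$ only stretches it.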

In [28]:
# Set up directories
test_iterations_path = outputs + '/3_2_4'

try_make_folder(test_iterations_path)

# Set up parameters
params = {'gaussian': True, 'alpha_type': 'exp', 'initial_val': 1, 'half_life': 0.2}
possible_max_iterations = list(range(100,2001,100))

# Randomly choose 3 vectors from the validation set
validation_set = [(i, vec) for (i, vec) in get_test_vectors(test_vectors, 3)]
print(f'Vectors tested: {[i for i,_ in validation_set]}')

for iterations in possible_max_iterations:
    params['max_iterations'] = iterations
    size = get_size(iterations)
    
    # reset weights matrix
    node_matrix = build_node_matrix(size, num_features)
    
    trained_matrix = train_matrix(params, node_matrix, training_data)
        
    # Iterate through the validation set
    for i, test_vector in validation_set:      
        # Evaluate the matrix by finding the distance between the test vector and all of the nodes
        distance_matrix = np.array([[distance(test_vector, node) for node in row] for row in trained_matrix])
        
        # set file path
        distance_image_path = test_iterations_path + f'/vec{i}_iter_{iterations}.png'
        
        # record response
        record_response(distance_matrix, distance_image_path)

Vectors tested: [4, 14, 24]


Comparing the responses side by side, a greater number of iterations appears to produce
smaller, clearer areas of maximal response, rather than almost the whole matrix responding.
Above 1000 iterations, the different clusters in the matrix are clearly visible (green), the cluster matching the input vector stands out (yellow), and the gaps and least-matching clusters are blue.

Number of iterations | Test vector 4 | Test vector 14 | Test vector 24 
---|---|---|---
100 | ![response](output/3_2_4/vec4_iter_100.png) | ![response](output/3_2_4/vec14_iter_100.png) | ![response](output/3_2_4/vec24_iter_100.png)
400 | ![response](output/3_2_4/vec4_iter_400.png) | ![response](output/3_2_4/vec14_iter_400.png) | ![response](output/3_2_4/vec24_iter_400.png)
600 | ![response](output/3_2_4/vec4_iter_600.png) | ![response](output/3_2_4/vec14_iter_600.png) | ![response](output/3_2_4/vec24_iter_600.png)
800 | ![response](output/3_2_4/vec4_iter_800.png) | ![response](output/3_2_4/vec14_iter_800.png) | ![response](output/3_2_4/vec24_iter_800.png)
1000 | ![response](output/3_2_4/vec4_iter_1000.png) | ![response](output/3_2_4/vec14_iter_1000.png) | ![response](output/3_2_4/vec24_iter_1000.png)
1200 | ![response](output/3_2_4/vec4_iter_1200.png) | ![response](output/3_2_4/vec14_iter_1200.png) | ![response](output/3_2_4/vec24_iter_1200.png)
1400 | ![response](output/3_2_4/vec4_iter_1400.png) | ![response](output/3_2_4/vec14_iter_1400.png) | ![response](output/3_2_4/vec24_iter_1400.png)
1600 | ![response](output/3_2_4/vec4_iter_1600.png) | ![response](output/3_2_4/vec14_iter_1600.png) | ![response](output/3_2_4/vec24_iter_1600.png)
1800 | ![response](output/3_2_4/vec4_iter_1800.png) | ![response](output/3_2_4/vec14_iter_1800.png) | ![response](output/3_2_4/vec24_iter_1800.png)
2000 | ![response](output/3_2_4/vec4_iter_2000.png) | ![response](output/3_2_4/vec14_iter_2000.png) | ![response](output/3_2_4/vec24_iter_2000.png)

### 3.2.5 Selecting the size of the weight matrix

So far, the size of the matrix has varied with the number of iterations according to equation $(1)$.
To help narrow down the best number of iterations, a range of matrix sizes will be tested for each number of iterations.
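
Rearranging equation (1) for a square matrix of side $s$ (so $M = s^2$) gives $N = (s^2/5)^2$, the number of training observations for which the Tian et al. heuristic would suggest that size. The sketch below (the helper name `implied_iterations` is hypothetical) evaluates this for the sizes compared in this section.

```python
def implied_iterations(side: int) -> float:
    """Invert equation (1), M = 5 * sqrt(N), for a side x side matrix."""
    num_nodes = side * side
    return (num_nodes / 5) ** 2

for side in [10, 12, 15, 20]:
    print(f"{side} x {side}: N = {implied_iterations(side):.0f}")
```

For example, a 15 × 15 matrix has 225 nodes, which equation (1) associates with $N = (225/5)^2 = 45^2 = 2025$ observations.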

In [10]:
# Set up directories
test_sizes_path = outputs + '/3_2_5'
try_make_folder(test_sizes_path)

# set up parameters and options
params = {'gaussian': True, 'alpha_type': 'exp', 'half_life':0.2, 'initial_val': 2}
possible_sizes = [10, 12, 15, 20]
possible_max_iterations = list(range(100, 1401, 100))

# Randomly choose 5 vectors from the validation set
validation_set = [(i, vec) for (i, vec) in get_test_vectors(test_vectors, 5)]
print(f'Vectors tested: {[i for i,_ in validation_set]}')


for size in possible_sizes:
    # reset node_matrix
    node_matrix = build_node_matrix(size, num_features)
    
    for iterations in possible_max_iterations:
        params['max_iterations'] = iterations
        
        trained_matrix = train_matrix(params, node_matrix, training_data)
        
        # Iterate through validation set
        for i, test_vector in validation_set:      
            # Evaluate the matrix by finding the distance between the test vector and all of the nodes
            distance_matrix = np.array([[distance(test_vector, node) for node in row] for row in trained_matrix])
        
            # set file path
            distance_image_path = test_sizes_path + f'/vec{i}_iter_{iterations}_size_{size}.png'
        
            # record response
            record_response(distance_matrix, distance_image_path)

Vectors tested: [20, 8, 9, 19, 16]


parameters | Vector 8 response | Vector 9 response | Vector 19 response
---|---|---|---
size 10, 400 iterations | ![example](output/3_2_5/vec8_iter_400_size_10.png) | ![example](output/3_2_5/vec9_iter_400_size_10.png) | ![example](output/3_2_5/vec19_iter_400_size_10.png) 
size 10, 1000 iterations | ![example](output/3_2_5/vec8_iter_1000_size_10.png) | ![example](output/3_2_5/vec9_iter_1000_size_10.png) | ![example](output/3_2_5/vec19_iter_1000_size_10.png) 
size 10, 1400 iterations | ![example](output/3_2_5/vec8_iter_1400_size_10.png) | ![example](output/3_2_5/vec9_iter_1400_size_10.png) | ![example](output/3_2_5/vec19_iter_1400_size_10.png) 
size 12, 400 iterations | ![example](output/3_2_5/vec8_iter_400_size_12.png) | ![example](output/3_2_5/vec9_iter_400_size_12.png) | ![example](output/3_2_5/vec19_iter_400_size_12.png) 
size 12, 1000 iterations | ![example](output/3_2_5/vec8_iter_1000_size_12.png) | ![example](output/3_2_5/vec9_iter_1000_size_12.png) | ![example](output/3_2_5/vec19_iter_1000_size_12.png) 
size 12, 1400 iterations | ![example](output/3_2_5/vec8_iter_1400_size_12.png) | ![example](output/3_2_5/vec9_iter_1400_size_12.png) | ![example](output/3_2_5/vec19_iter_1400_size_12.png) 
size 15, 400 iterations | ![example](output/3_2_5/vec8_iter_400_size_15.png) | ![example](output/3_2_5/vec9_iter_400_size_15.png) | ![example](output/3_2_5/vec19_iter_400_size_15.png) 
size 15, 1000 iterations | ![example](output/3_2_5/vec8_iter_1000_size_15.png) | ![example](output/3_2_5/vec9_iter_1000_size_15.png) | ![example](output/3_2_5/vec19_iter_1000_size_15.png) 
size 15, 1400 iterations | ![example](output/3_2_5/vec8_iter_1400_size_15.png) | ![example](output/3_2_5/vec9_iter_1400_size_15.png) | ![example](output/3_2_5/vec19_iter_1400_size_15.png) 
size 20, 400 iterations | ![example](output/3_2_5/vec8_iter_400_size_20.png) | ![example](output/3_2_5/vec9_iter_400_size_20.png) | ![example](output/3_2_5/vec19_iter_400_size_20.png) 
size 20, 1000 iterations | ![example](output/3_2_5/vec8_iter_1000_size_20.png) | ![example](output/3_2_5/vec9_iter_1000_size_20.png) | ![example](output/3_2_5/vec19_iter_1000_size_20.png) 
size 20, 1400 iterations | ![example](output/3_2_5/vec8_iter_1400_size_20.png) | ![example](output/3_2_5/vec9_iter_1400_size_20.png) | ![example](output/3_2_5/vec19_iter_1400_size_20.png) 

In the examples above, size 15 gives the clearest distinction between clusters. Smaller matrices respond almost equally across the whole matrix to each test vector, whereas larger matrices display large empty gaps.

### 3.2.6 Selecting the type of alpha function

Having settled on a 15 × 15 matrix, a distance matrix to evaluate results and a Gaussian theta function, we can now choose an appropriate combination of alpha function, alpha parameter(s) and number of iterations.

In [7]:
# Set up directories
test_alphas_path = outputs + '/3_2_6'
alpha_exp_path = test_alphas_path + '/exp'
alpha_per_path = test_alphas_path + '/per'
try_make_folder(alpha_exp_path)
try_make_folder(alpha_per_path)

# Choose vectors from the validation set and print their indices
validation_set = [(i, vec) for (i, vec) in get_test_vectors(test_vectors, 5, 11)]
print(f'Vectors tested: {[i for i,_ in validation_set]}')

# Set up parameters and options
params['gaussian']=True
possible_max_iterations = [400, 800, 1000, 1200, 1400]

# Try a broad range of initial values - large risks overfitting to small number of training vectors, small risks underfitting.
alpha_types = ['per', 'exp']
initial_values = [0.1, 0.25, 0.5, 1, 3, 5]

# reset weights matrix with size = 15
node_matrix = build_node_matrix(15,num_features)


for iterations in possible_max_iterations:
    params['max_iterations'] = iterations
    
    # Loop over both alpha functions
    for alpha_type in alpha_types:
        params['alpha_type'] = alpha_type
        
        # Loop through possible initial values
        for initial_value in initial_values:
            params['initial_val'] = initial_value
            
            # set half-lives
            if alpha_type == 'exp':
                decay_hlifes = [0.2, 0.4]
                base_filepath = alpha_exp_path
            else:
                decay_hlifes = [0]   # alpha function ignores decay rate for 'per' function
                base_filepath = alpha_per_path

            for decay_hlife in decay_hlifes:
                params['half_life'] = decay_hlife

                # train the matrix with the hyper parameters
                trained_matrix = train_matrix(params, node_matrix, training_data)

                # Iterate through validation set
                for i, test_vector in validation_set:
                    # Evaluate the matrix by finding the distance between a test vector and all of the nodes
                    distance_matrix = np.array([[distance(test_vector, node) for node in row] for row in trained_matrix])

                    # set the filepath
                    response_image_path = base_filepath
                    
                    if alpha_type == 'exp':
                        response_image_path += f'/vec{i}_iterations_{iterations}_initval_{initial_value}_decay_{decay_hlife}.png'
                    else:
                        response_image_path += f'/vec{i}_iterations_{iterations}_initval_{initial_value}.png'
                        
                    # save response
                    record_response(distance_matrix, response_image_path)

Vectors tested: [11]


## Results

##### Summary - extreme values

Description | Alpha type | Number of iterations (training size) | Initial value | $h$ (`half_life`) | Vector 11 | Vector 2
---|---|---|---|---|---|---
Exponential alpha function with small train size, small initial value, short half life | exp | 400 | 0.1 | 0.2 | ![response](output/3_2_6/exp/vec11_iterations_400_initval_0.1_decay_0.2.png)|![response](output/3_2_6/exp/vec2_iterations_400_initval_0.1_decay_0.2.png)
Exponential alpha function with small train size, large initial value, short half life | exp | 400 | 5 | 0.2 | ![response](output/3_2_6/exp/vec11_iterations_400_initval_5_decay_0.2.png)|![response](output/3_2_6/exp/vec2_iterations_400_initval_5_decay_0.2.png)
Exponential alpha function with small train size, small initial value, long half life | exp | 400 | 0.1 | 0.4 | ![response](output/3_2_6/exp/vec11_iterations_400_initval_0.1_decay_0.4.png)|![response](output/3_2_6/exp/vec2_iterations_400_initval_0.1_decay_0.4.png)
Exponential alpha function with small train size, large initial value, long half life | exp | 400 | 5 | 0.4 | ![response](output/3_2_6/exp/vec11_iterations_400_initval_5_decay_0.4.png)|![response](output/3_2_6/exp/vec2_iterations_400_initval_5_decay_0.4.png)
Exponential alpha function with large train size, small initial value, short half life | exp | 1000 | 0.1 | 0.2 | ![response](output/3_2_6/exp/vec11_iterations_1000_initval_0.1_decay_0.2.png)|![response](output/3_2_6/exp/vec2_iterations_1000_initval_0.1_decay_0.2.png)
Exponential alpha function with large train size, large initial value, short half life | exp | 1000 | 5 | 0.2 | ![response](output/3_2_6/exp/vec11_iterations_1000_initval_5_decay_0.2.png)|![response](output/3_2_6/exp/vec2_iterations_1000_initval_5_decay_0.2.png)
Exponential alpha function with large train size, small initial value, long half life | exp | 1000 | 0.1 | 0.4 | ![response](output/3_2_6/exp/vec11_iterations_1000_initval_0.1_decay_0.4.png)|![response](output/3_2_6/exp/vec2_iterations_1000_initval_0.1_decay_0.4.png)
Exponential alpha function with large train size, large initial value, long half life | exp | 1000 | 5 | 0.4 | ![response](output/3_2_6/exp/vec11_iterations_1000_initval_5_decay_0.4.png)|![response](output/3_2_6/exp/vec2_iterations_1000_initval_5_decay_0.4.png)
Fractional alpha function with small train size and small initial value | per | 400 | 0.1 | - | ![response](output/3_2_6/per/vec11_iterations_400_initval_0.1.png)|![response](output/3_2_6/per/vec2_iterations_400_initval_0.1.png)
Fractional alpha function with small train size and large initial value | per | 400 | 5 | - | ![response](output/3_2_6/per/vec11_iterations_400_initval_5.png)|![response](output/3_2_6/per/vec2_iterations_400_initval_5.png)
Fractional alpha function with large train size and small initial value | per | 1000 | 0.1 | - | ![response](output/3_2_6/per/vec11_iterations_1000_initval_0.1.png)|![response](output/3_2_6/per/vec2_iterations_1000_initval_0.1.png)
Fractional alpha function with large train size and large initial value | per | 1000 | 5 | - | ![response](output/3_2_6/per/vec11_iterations_1000_initval_5.png)|![response](output/3_2_6/per/vec2_iterations_1000_initval_5.png)

An initial value of 5 is far too high for the 'per' alpha function, as it overtrains the matrix.

The most distinct clusters arise from intermediate values. Four final candidates, two for each alpha function, were selected:

Alpha type | Number of iterations (training size) | Initial value | $h$ (`half_life`) | Vector 2 | Vector 11 | Vector 16 | Vector 24
---|---|---|---|---|---|---|---
exp | 800 | 0.25 | 0.4 | ![response](output/3_2_6/exp/vec2_iterations_800_initval_0.25_decay_0.4.png)|![response](output/3_2_6/exp/vec11_iterations_800_initval_0.25_decay_0.4.png)|![response](output/3_2_6/exp/vec16_iterations_800_initval_0.25_decay_0.4.png)|![response](output/3_2_6/exp/vec24_iterations_800_initval_0.25_decay_0.4.png)
exp | 1200 | 0.5 | 0.2 | ![response](output/3_2_6/exp/vec2_iterations_1200_initval_0.5_decay_0.2.png)|![response](output/3_2_6/exp/vec11_iterations_1200_initval_0.5_decay_0.2.png)|![response](output/3_2_6/exp/vec16_iterations_1200_initval_0.5_decay_0.2.png)|![response](output/3_2_6/exp/vec24_iterations_1200_initval_0.5_decay_0.2.png)
per | 1000 | 0.1 | - | ![response](output/3_2_6/per/vec2_iterations_1000_initval_0.1.png)  | ![response](output/3_2_6/per/vec11_iterations_1000_initval_0.1.png)  | ![response](output/3_2_6/per/vec16_iterations_1000_initval_0.1.png)  | ![response](output/3_2_6/per/vec24_iterations_1000_initval_0.1.png)  
per | 400 | 0.5 | - | ![response](output/3_2_6/per/vec2_iterations_400_initval_0.5.png)  | ![response](output/3_2_6/per/vec11_iterations_400_initval_0.5.png)  | ![response](output/3_2_6/per/vec16_iterations_400_initval_0.5.png)  | ![response](output/3_2_6/per/vec24_iterations_400_initval_0.5.png)


Of these four candidates, the second (exponential alpha function, 1200 iterations, initial value 0.5, $h = 0.2$) displays the greatest contrast between what appear to be 4 to 6 clusters.

The selected hyperparameters are therefore:

- Size of matrix: 15 x 15
- Number of iterations: 1200
- Theta function: Gaussian
- Alpha function: exponential decay with an initial value of 0.5 and a half-life of $0.2 \times 1200 = 240$ iterations
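
Expressed in the `params` dictionary format used throughout this notebook (keys as in the cells above; `FINAL_SIZE` and `final_params` are hypothetical names for illustration), the selected configuration is:

```python
FINAL_SIZE = 15  # 15 x 15 node matrix

final_params = {
    'gaussian': True,         # Gaussian theta function
    'alpha_type': 'exp',      # exponential-decay alpha function
    'initial_val': 0.5,       # initial learning rate
    'half_life': 0.2,         # half-life as a fraction of max_iterations
    'max_iterations': 1200,   # number of training iterations
}

# Half-life in iterations: 0.2 * 1200
half_life_iterations = final_params['half_life'] * final_params['max_iterations']
print(f"{FINAL_SIZE} x {FINAL_SIZE} matrix, half-life = {half_life_iterations:.0f} iterations")
```

In the cells above, such a dictionary would be passed to `train_matrix` together with `build_node_matrix(FINAL_SIZE, num_features)` and the training data.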

These hyperparameters will be stored in a final SOM training function in `featureextractionsom/somap`, which will be applied to the training data in the next notebook.

## References

Tian, J., Azarian, M. H. & Pecht, M. (2014) 'Anomaly Detection Using Self-Organizing Maps-Based K-Nearest Neighbor Algorithm', in _Proceedings of the European Conference of the Prognostics and Health Management Society_, available online at [SemanticScholar.org](https://www.semanticscholar.org/paper/Anomaly-Detection-Using-Self-Organizing-Maps-Based-Tian-Azarian/0cfcffcf796f0f2f2be202222a07584c9474541c) [Accessed 04/03/2020]