<center><h1>Classification by regional modeling</h1></center>

Classification by regional modeling consists of a six-step approach:
1. Setting the hyper-parameters. In this step, we specify the number of SOM prototypes $C$ and
the maximum number of regions $K_{max}$. Without any prior knowledge, we set $K_{max} = \sqrt{C}$ in this example.


2. SOM training. In order to build regional models, we follow the procedure introduced by Vesanto and Alhoniemi [1].
Thus, the very first step requires training the SOM as usual, with $C$ prototypes.


3. Clustering of the SOM. This step consists in performing clustering over the $C$ SOM prototypes. Although one may
use any clustering algorithm for this step, for the sake of simplicity, we use the standard K-means algorithm in
combination with the Davies–Bouldin (DB) index. The DB index is a clustering validity measure commonly used for
finding the optimal number of clusters, but any suitable measure can be used instead (see [2]). Since the DB index
needs at least two clusters, we compute a partitioning of the SOM prototypes for each $K = 2, \ldots, K_{max}$,
together with the corresponding DB index value. The optimal number of partitions, $K_{opt}$, is then the value of
$K$ which minimizes the DB index.


4. Partitioning SOM prototypes into regions. Once $K_{opt}$ is selected, the $r$-th cluster of SOM prototypes,
$r = 1, \ldots, K_{opt}$, is composed of all weight vectors $w_i$ that are mapped onto the prototype $p_r$ of the K-means
algorithm. More formally, the set of SOM prototypes associated with the $r$-th prototype of the K-means algorithm
is defined as: $$W_r = \{w_i \in \mathbb{R}^{p+q} \mid \|w_i-p_r\| < \|w_i-p_j\|, \ \forall j = 1, \ldots, K_{opt}, \ j \neq r \}$$


5. Mapping data points to regions. This step consists in finding $K_{opt}$ data partitions, denoted by
$\{X_1\}$, $\{X_2\}$, ... , $\{X_{K_{opt}}\}$, of the training dataset by mapping each data point to a region
$r \in \{1, ... , K_{opt}\}$. Let $N_r$ denote the number of data vectors in $\{X_r\}$.
Then, the partition $\{X_r\}$ is composed of those input vectors $x_{r\mu}$, $\mu = 1, ... , N_r$, whose closest SOM
prototype belongs to $W_r$.


6. Building classification models over the regions. Finally, once the original dataset has been divided into $K_{opt}$
subsets (one per region), the last step consists in building $K_{opt}$ regional classification models using
$X_r$, $r = 1, ... , K_{opt}$.
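
Steps 3 to 6 above can be sketched with scikit-learn alone. The snippet below is a minimal illustration, not the `RegionalModel` implementation used later in this notebook. It rests on several assumptions: the SOM prototypes are replaced by points sampled from a toy dataset instead of a trained SOM, scikit-learn's `davies_bouldin_score` stands in for the `DB` metric, and all variable names are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.metrics import davies_bouldin_score

rng = np.random.default_rng(0)

# Toy stand-ins (assumptions): 200 two-dimensional inputs with a binary target,
# and C = 16 "SOM prototypes" sampled from the data instead of a trained SOM.
X = rng.random((200, 2))
y = (X[:, 0] + X[:, 1] > 1.0).astype(float)
C = 16
W = X[rng.choice(len(X), size=C, replace=False)]

# Step 3: run K-means over the prototypes for each candidate K and keep the K
# with the smallest Davies-Bouldin index (the index needs at least 2 clusters).
k_max = int(np.sqrt(C))
candidates = {}
for k in range(2, k_max + 1):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(W)
    candidates[k] = (davies_bouldin_score(W, km.labels_), km)
k_opt = min(candidates, key=lambda k: candidates[k][0])
kmeans = candidates[k_opt][1]

# Step 4: region of each prototype = index of its nearest K-means centroid.
proto_region = kmeans.predict(W)

# Step 5: region of each data point = region of its closest prototype.
dists = np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2)  # shape (N, C)
winner = dists.argmin(axis=1)
point_region = proto_region[winner]

# Step 6: one model per region, fitted only on the points mapped to it.
models = {}
for r in range(k_opt):
    mask = point_region == r
    if mask.any():  # a region may receive no data points
        models[r] = LinearRegression().fit(X[mask], y[mask])
```

At prediction time, the same lookup applies: a new point is routed to the region of its closest prototype and evaluated by that region's model.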

* Vertebral Column
* Wall-Following
* Alzheimer


### References

[1] J. Vesanto, E. Alhoniemi, Clustering of the self-organizing map, IEEE Trans. Neural Netw. 11 (2000) 586–600.

[2] M. Halkidi, Y. Batistakis, M. Vazirgiannis, On clustering validation techniques, J. Intell. Inf. Syst. 17 (2001) 107–145.

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [4]:
%%time

import numpy as np 
import pandas as pd 

from math import ceil
from load_dataset import get_datasets

from devcode.utils import dummie2multilabel, scale_feat
from devcode.models.som import SOM
from devcode.models.regional_learning import RegionalModel
from devcode import run_round

from devcode.utils.metrics import DB

from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn import linear_model
from sklearn.cluster import KMeans


datasets    = get_datasets()
dataset_name='pk'

X = datasets[dataset_name]['features'].values
Y = datasets[dataset_name]['labels'].values

test_size = 0.2

# Train/Test split = 80%/20%
X_train, X_test, y_train, y_test = train_test_split(X,Y,test_size=test_size)

# scaling features
X_tr_norm, X_ts_norm = scale_feat(X_train, X_test, scaleType='min-max')

#N = len(dataset['features'].index) # number of datapoints
N = len(X_tr_norm) # number of datapoints in the train split
l = ceil((5*N**.5)**.5) # side length of square grid of neurons

som = SOM(l,l)
som_params={
    'alpha0':    0.01,
    'sigma0':    1,
    'nEpochs':   1,
    'verboses':  0            
}

C = l**2 # number of SOM neurons in the 2D grid
k_values = list(range(2, ceil(np.sqrt(C)))) # 2 to sqrt(C)-1 (range excludes its upper bound)
cluster_params={
    'n_clusters': {'metric':   DB,        # when a dictionary is passed, a search begins
                   'criteria': np.argmin, # search for the smallest DB score
                   'k_values': k_values}, # among the values provided in 'k_values'
    'n_init':     10, # number of initializations
    'init':       'random', 
    #'n_jobs':     -1
}

linearModel = linear_model.LinearRegression(n_jobs=-1)

rm = RegionalModel(som, linearModel)
rm.fit(X=X_tr_norm, Y=y_train, verboses=0,
        SOM_params     = som_params,
        Cluster_params = cluster_params)

# Evaluating in the test dataset
y_pred = rm.predict(X_ts_norm)
y_pred = np.round(np.clip(y_pred, 0, 1)) # rounding prediction numbers

cm = confusion_matrix(dummie2multilabel(y_test),
                      dummie2multilabel(y_pred))
#cm = np.asarray(cm).reshape(-1) # matrix => array
# accuracy = correctly classified samples (diagonal) / all samples
acc = np.trace(cm) / cm.sum()

CPU times: total: 719 ms
Wall time: 176 ms


In [5]:
acc

0.8205128205128205
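
The grid-size arithmetic used in the cell above can be checked in isolation. The sketch below repeats the same formulas with the standard library only. The helper name `som_grid_side` and the value `N = 156` are hypothetical and chosen purely for illustration.

```python
from math import ceil, sqrt


def som_grid_side(n_samples: int) -> int:
    """Side length of a square SOM grid with roughly 5 * sqrt(N) neurons."""
    return ceil(sqrt(5 * sqrt(n_samples)))


N = 156                  # hypothetical training-set size (illustration only)
l = som_grid_side(N)     # same formula as ceil((5*N**.5)**.5)
C = l ** 2               # number of SOM neurons in the 2D grid
k_values = list(range(2, ceil(sqrt(C))))  # candidate numbers of regions

print(l, C, k_values)    # 8 64 [2, 3, 4, 5, 6, 7]
```

Because `range` excludes its upper bound, the candidates stop at $\sqrt{C} - 1$. Using `range(2, l + 1)` instead would also include $K_{max} = \sqrt{C}$.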