<center><h1>Regional Least Squares Support Vector Machine (R-LSSVM)</h1></center>

## Summary:
1. [Methodology](#methodology)


2. [Simulations](#simulations)
    
    2.1 [Regioanl LSSVM](#r-lssvm)

# 1. Methodology <a class="anchor" id="methodology"></a>

The approach was:

1. For 50 times:
    
    1.1 Divide the data set between train/test in stratified manner;
    
    1.2 Used 5-fold stratified cross-validation on the training set to choose best hyperparameters;
    
    1.3 Fit model in the whole train set with best hyperparameters;
    
    1.3 Make predictions in test set;
    
2. Distribution of the performance metric on train and test sets was evaluated.

# 2. Simulations <a class="anchor" id="simulations"></a>

In [1]:
%run -i "load_dataset.py" # loading dataset
%run -i "aux_func.py"     # loading auxilary function

Dataset:  Features.shape:   # of classes:
vc2c      (310, 6)          2
vc3c      (310, 6)          3
wf24f     (5456, 24)        4
wf4f      (5456, 4)         4
wf2f      (5456, 2)         4
pk        (195, 22)         2


In [2]:
# takes confusion matrix and evaluate the accuracy
def cm2acc(cm):
    acc=0
    total=sum(sum(cm))
    for j in range(cm.shape[0]):
        acc += cm[j,j] # summing the diagonal
    acc/=total
    return acc

# convert dummies to multilabel
def dummie2multilabel(X):
    N = len(X)
    X_multi = np.zeros((N,1),dtype='int')
    for i in range(N):
        temp = np.where(X[i]==1)[0] # find where 1 is found in the array
        if temp.size == 0: # is an empty array, there is no '1' in the X[i] array
            X_multi[i] = 0 # so we denote this class '0'
        else:
            X_multi[i] = temp[0] + 1 # we have +1 because 0 denote the class with an empty array
    return X_multi.T[0]

## 2.1 Regional LSSVM <a class="anchor" id="r-lssvm"></a>

Repeating `RegionalModel` class below:

In [3]:
from sklearn import linear_model
from sklearn.cluster import KMeans
from copy import copy

class BiasModel:
    'Class the implements a dummy model in case of homogenous region.'
    
    def __init__(self, class_label):
        self.class_label = class_label
        
    def predict(self, X):
        return np.vstack( [self.class_label]*len(X) )
    

class RegionalModel:
    'Class of Regional Models.'
    
    def __init__(self, SOM_class, Model_class, Cluster_class=None):
        self.SOM             = SOM_class
        self.Cluster         = Cluster_class
        self.Model           = Model_class
        self.region_labels   = []
        self.regional_models = []
        self.empty_regions   = []
        self.targets_dim_    = None
        

    def fit(self, X, Y, verboses=0, SOM_params=None, Cluster_params=None, Model_params=None):
        self.targets_dim_ = Y.shape[1] # dimension of target values
        
        # SOM training
        if SOM_params is not None:
            if verboses==1: print("Start of SOM training at {}".format(datetime.datetime.now()))
            self.SOM.fit(X=X, **SOM_params)
        
        # Cluster training
        if Cluster_params is not None:
            if verboses==1: print("Start of clustering SOM prototypes at {}".format(datetime.datetime.now()))
            
            # Search for k_opt
            k_opt = None
            if type(Cluster_params['n_clusters']) is dict: # a search is implied:
                eval_function = Cluster_params['n_clusters']['metric']
                find_best     = Cluster_params['n_clusters']['criteria']
                k_values      = Cluster_params['n_clusters']['k_values']
                
                validation_index = [0]*len(k_values)
                for i in range(len(k_values)):
                    kmeans = KMeans(n_clusters=k_values[i],
                                    n_init=10,
                                    init='random'
                                    #n_jobs=-1
                                   ).fit(self.SOM.neurons)
                    # test if number of distinct clusters == number of clusters specified
                    centroids = kmeans.cluster_centers_
                    if len(centroids) == len(np.unique(centroids,axis=0)):
                        validation_index[i] = eval_function(kmeans,self.SOM.neurons)
                    else:
                        validation_index[i] = np.NaN
                
                k_opt = k_values[find_best(validation_index)]
                if verboses==1: print("Best k found: {}".format(k_opt))
            else:
                k_opt = Cluster_params['n_clusters']
            
            params = Cluster_params.copy()
            del params['n_clusters'] # deleting unecessary param
            # real training of clustering algorithm
            self.Cluster = KMeans(n_clusters=k_opt, **params).fit(self.SOM.neurons)
            
        
        # Model training
        self.region_labels = self.regionalize(X) # finding labels of datapoints
        if verboses==1: print("Start of Model training at {}".format(datetime.datetime.now()))
        
        self.regional_models = [None]*k_opt    
        for r in range(k_opt): # for each region
            Xr = X[np.where(self.region_labels == r)[0]]
            Yr = Y[np.where(self.region_labels == r)[0]]
            
            model_r = None
            if len(Xr) == 0: # empty region
                self.empty_regions.append(r)
            
            elif len( np.unique(Yr, axis=0) ) == 1: # homogenous region, only one class
                model_r = BiasModel(np.unique(Yr, axis=0))
                
            else: # region not empty or homogeneous
                self.Model.fit(Xr,Yr)
                model_r = copy(self.Model)

            self.regional_models[r] = model_r
            
                       
                
    def regionalize(self, X):
        regions = np.zeros(len(X), dtype='int')
        for i in range(len(X)): # for each datapoint
            winner_som_idx, dist_matrix = self.SOM.get_winner(
                                          X[i], dim=1, dist_matrix=True) # find closest neuron
            region = self.Cluster.labels_[winner_som_idx] # find neuron label index in kmeans
            
            # if the region don't have a model is because it didn't have datapoints in the train set
            if region in self.empty_regions: 
                dead_neurons, = np.where(self.Cluster.labels_==region)
                dist_matrix[dead_neurons] = np.inf # taking off dead neurons from the play
                
                temp = np.argmin(dist_matrix)
                region = self.Cluster.labels_[temp]
                
            regions[i] = region 
        
        return regions
          
    
    def predict(self, X):
        # searching for a non-empty region
        regions = [i for i in range(self.Cluster.n_clusters)]
        not_empty_regions = list(set(regions) - set(self.empty_regions))
        
#         temp = self.regional_models[not_empty_regions[0]].intercept_
        predictions = np.zeros((
            len(X),           # number of samples
            self.targets_dim_ # size of output Y
        ))
        
        regions = self.regionalize(X)
        for i in range(len(X)):
            predictions[i,:] = self.regional_models[regions[i]].predict(X[i].reshape(1, -1))
        
        return predictions

In [4]:
datasets_names = ['pk', 'vc2c', 'vc3c', 'wf2f', 'wf4f', 'wf24f']
# constant hyperparameters:
test_size = 0.5
scaleType = 'min-max'
n_init = 50 # number of independent runs


# hyperparameters grid search:
gammas = np.logspace(-6.0, 6.0, num=7).tolist()
sigmas = np.logspace(-0.5, 3.0, num=5).tolist()

print("gammas = {}".format(gammas))
print("sigmas = {}".format(sigmas))

hps_cases = [
    { "gamma": gamma,
      "sigma": sigma 
    }
    for gamma in gammas
    for sigma in sigmas
]
print("# of hps_cases = {}".format(len(hps_cases)))

# vector of random states for train/test split
random_states = np.unique( 
    pd.read_csv('../Local_Modeling/simulation_results/G-LSSVM - n_init=50 - 2019-08-28 12:58:37.036007.csv',usecols=['random_state']).values
).tolist()
# random_states = np.random.randint(np.iinfo(np.int32).max, size=n_init).tolist()
cases = [
    {
         "dataset_name": dataset_name
        ,"random_state": random_state
    }
    # hyperparameters possible values
    for dataset_name in datasets_names
    for random_state in random_states
]
print(' ')
print("# of cases: {}".format(len(cases)))

gammas = [1e-06, 0.0001, 0.01, 1.0, 100.0, 10000.0, 1000000.0]
sigmas = [0.31622776601683794, 2.371373705661655, 17.78279410038923, 133.3521432163324, 1000.0]
# of hps_cases = 35
 
# of cases: 300


**Índices Baseados em Critérios de Informação:**

In [5]:
from math import log
from scipy.spatial.distance import cdist

def IC_core(X, labels_pred):
    unique_labels = np.unique(labels_pred)
    N = len(X)
    P = len(unique_labels)*X.shape[1]
    
    # Mean Squared Quantization Error
    MSQE = 0
    for label in unique_labels:
        X_cluster = X[labels_pred==label]
        cluster = np.mean(X_cluster, axis=0)
        MSQE += np.sum((X_cluster - cluster)**2)
    MSQE = (1/N)*MSQE
           
    lhs = N*log(MSQE/N) # left-hand side
    
    return (N, P, lhs)

# Final Prediction Error
def FPE(X, labels_pred):
    N, P, lhs = IC_core(X, labels_pred)
    if ((N+P)/(N-P))<=0: # not define in math the rhs
        return np.inf
    else:
        return lhs + N*log((N+P)/(N-P))

# Akaike Information Criteria
def AIC(X, labels_pred):
    N, P, lhs = IC_core(X, labels_pred)
    return lhs + 2*P

# Bayesian Information Criteria
def BIC(X, labels_pred):
    N, P, lhs = IC_core(X, labels_pred)
    return lhs + P*log(N)

# Minimum Description Length
def MDL(X, labels_pred):
    N, P, lhs = IC_core(X, labels_pred)
    return lhs + (P/2)*log(N)

In [6]:
from sklearn import metrics
import base

# https://scikit-learn.org/stable/modules/clustering.html
cluster_val_metrics = [
#     {
#         'name'    : 'Adjusted Rand Index',
#         'f'       : metrics.adjusted_rand_score,
#         'get_best': np.argmax,
#         'type'    : 'supervised'
#     },
#     {
#         'name'    : 'Adjusted Mutual Information',
#         'f'       : metrics.adjusted_mutual_info_score,
#         'get_best': np.argmax,
#         'type'    : 'supervised'
#     },
#     {
#         'name'    : 'V-measure',
#         'f'       : metrics.v_measure_score,
#         'get_best': np.argmax,
#         'type'    : 'supervised'
#     },
#     {
#         'name'    : 'Fowlkes-Mallows',
#         'f'       : metrics.fowlkes_mallows_score,
#         'get_best': np.argmax,
#         'type'    : 'supervised'
#     },
    {
        'name'    : 'Silhouette',
        'f'       : metrics.silhouette_score,
        'get_best': np.argmax,
        'type'    : 'unsupervised'
    },
    {
        'name'    : 'Calinski-Harabasz',
        'f'       : metrics.calinski_harabasz_score,
        'get_best': np.argmax,
        'type'    : 'unsupervised'
    },
    {
        'name'    : 'Davies-Bouldin',
        'f'       : metrics.davies_bouldin_score,
        'get_best': np.argmin,
        'type'    : 'unsupervised'
    },
    {
        'name'    : 'Dunn',
        'f'       : base.dunn_fast,
        'get_best': np.argmax,
        'type'    : 'unsupervised'
    },
    {
        'name'    : 'Final Prediction Error',
        'f'       : FPE,
        'get_best': np.argmin,
        'type'    : 'unsupervised'
    },
    {
        'name'    : 'Akaike Information Criteria',
        'f'       : AIC,
        'get_best': np.argmin,
        'type'    : 'unsupervised'
    },
    {
        'name'    : 'Bayesian Information Criteria',
        'f'       : BIC,
        'get_best': np.argmin,
        'type'    : 'unsupervised'
    },
    {
        'name'    : 'Minimum Description Length',
        'f'       : MDL,
        'get_best': np.argmin,
        'type'    : 'unsupervised'
    },
]

print("# of cluster validation metrics: {}".format(len(cluster_val_metrics)))

# of cluster validation metrics: 8


Funções para avaliação dos agrupamentos:

In [7]:
# parar com a porra dos warnings do sklearn
def warn(*args, **kwargs):
    pass
import warnings
warnings.warn = warn

def eval_cluster(cluster_val_metrics, X, labels_true, labels_pred):
    names = [metric['name'] for metric in cluster_val_metrics]
    results = {}
    
    for metric in cluster_val_metrics:
        if metric['type']=='supervised':
            results[metric['name']] = metric['f'](labels_true, labels_pred)
        else:
            results[metric['name']] = metric['f'](X, labels_pred)
            
    return results
    
# eval_cluster(cluster_val_metrics, X_tr_norm, labels_true, labels_pred)

def get_k_opt_suggestions(X, y, ks, cluster_val_metrics):
    results = {metric['name']: [None]*len(ks) for metric in cluster_val_metrics}
    for i in range(len(ks)):
        kmeans = KMeans(n_clusters=ks[i], init='random').fit(X)
        labels_true = y.ravel()
        labels_pred = kmeans.labels_

        temp = eval_cluster(cluster_val_metrics, X, labels_true, labels_pred)
        for name in temp:
            results[name][i] = temp[name]

    suggestions = {metric['name']: np.nan for metric in cluster_val_metrics}
    for i in range(len(cluster_val_metrics)):
        metric = cluster_val_metrics[i]
        suggestions[metric['name']]  = ks[metric['get_best'](results[metric['name']])]
        
    return suggestions

## FUNCTION TO EVAL R-LSSVM

In [8]:
from lssvm import LSSVM
from som import SOM
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import StratifiedKFold
from sklearn.cluster import KMeans
from copy import copy

from pathlib import Path

# Objective function in validation strategy
def f_o(u):
    return np.mean(u) - 2*np.std(u)

# no SOM's hyperparameter optimization was done
som_params={
            'alpha0':    0.1,
            'sigma0':    10,
            'nEpochs':   300,
            'verboses':  0            
        }

# preparando o cabeçalho:
temp = [' ']*2*len(cluster_val_metrics)
count=0
for metric in cluster_val_metrics:
    temp[count]   = "$\gamma_{opt}$ "+"[{}]".format(metric['name'])
    temp[count+1] = "$\sigma_{opt}$ "+"[{}]".format(metric['name'])
    count+=2
# print(temp)
header  = [
    "dataset_name", "random_state",
    "# empty regions", "# homogeneous regions"] + \
    ["$\gamma_{opt}$ [CV]", "$\sigma_{opt}$ [CV]"] + \
    temp +\
    ["$k_{opt}$ [CV]"] + \
    ['$k_{opt}$ '+'[{}]'.format(metric['name']) for metric in cluster_val_metrics] + \
    ['cv_score [{}]'.format(metric['name']) for metric in cluster_val_metrics] + \
    ["eigenvalues", "eigenvalues_dtype", "cm_tr", "cm_ts"]
# cm_tr.dtype and cm_ts.dtype = 'int64'
# print(header)


# Regional Least-Squares Support Vector Machine
# global optimization of hyperparameters
def eval_RLSSVM(case):
    filename = ("./temp_rlssvm/results/R-LSSVM - {}.csv".format(case)).replace(':','-')
    my_file = Path(filename)
    
    if not my_file.is_file(): # compute if it doesn't exists
        dataset_name = case['dataset_name']
        random_state = case['random_state']

        X = datasets[dataset_name]['features'].values
        Y = datasets[dataset_name]['labels'].values

        # Train/Test split
        X_train, X_test, y_train, y_test = train_test_split(X,Y, stratify=np.unique(Y,axis=1), 
                                                            test_size=test_size, random_state=random_state)
        # scaling features
        X_tr_norm, X_ts_norm = scale_feat(X_train, X_test, scaleType=scaleType)

        # solving multilabel problem in wall-following data set
        y_temp = y_train
        if y_train.ndim==2: 
            if y_train.shape[1] >= 2: y_temp = dummie2multilabel(y_train)
                
        # We first fit the SOM in the whole train set so we can get suggestion for number
        # of clusters in K-Means
        N = len(X_tr_norm)
        l = int((5*N**.5)**.5) # size of square grid of neurons
        som_tr = SOM(l,l)
        som_tr.fit(X_tr_norm, **som_params)
        
        C  = l**2 # number of SOM neurons in the 2D grid
        ks = np.arange( 2, int(C**(1/2))+1 ).tolist() # 2 to sqrt(C)
        suggestions = get_k_opt_suggestions(som_tr.neurons, 
                                            np.empty(C), 
                                            ks, cluster_val_metrics)
        unique_suggestions = np.unique(list(suggestions.values())).tolist()
        
        validation_scores = np.empty((len(unique_suggestions), 2)) # [k, cv_score] 
        best_hps_list     = [{}]*len(unique_suggestions)
        count_v=0
        for k in unique_suggestions: # para cada proposta de k_{opt}
            # 5-fold stratified cross-validation for hyperparameter optimization
            n_cases = len(hps_cases)
            cv_scores = [0]*n_cases
            for i in range(n_cases):
                skf = StratifiedKFold(n_splits=5)
                acc=[0]*5
                count=0

                # train/validation split
                for tr_index, val_index in skf.split(X_tr_norm, y_temp): 
                    x_tr, x_val = X_tr_norm[tr_index], X_tr_norm[val_index]
                    y_tr, y_val = y_train[tr_index],   y_train[val_index]

                    # train the model on the train set
                    N = len(x_tr)
                    l = int((5*N**.5)**.5) # size of square grid of neurons
                    som = SOM(l,l)
                    cluster_params={'n_clusters': k, 'n_init': 10, 'init': 'random'}

                    model_alg = LSSVM(kernel='rbf', **hps_cases[i])
                    rm = RegionalModel(som, model_alg)
                    rm.fit(X=x_tr, Y=y_tr, verboses=0,
                           SOM_params     = som_params,
                           Cluster_params = cluster_params)
                    # eval model accuracy on validation set
                    acc[count] = accuracy_score(y_val, rm.predict(x_val))
                    count+=1
                    
                # apply objective function to cv accuracies
                cv_scores[i] = f_o(acc)
                
            # the best hyperparameters are the ones that maximize the objective function
            best_hps_list[count_v]       = hps_cases[ np.argmax(cv_scores) ]
            validation_scores[count_v,:] = [k, np.amax(cv_scores)]
            count_v+=1
            
        
        # k_opt as the one with best cv_score and smallest value
        best_k = int(validation_scores[validation_scores.argmax(axis=0)[1], 0])

        # the best hyperparameters are the ones that maximize the objective function
        best_hps = best_hps_list[validation_scores.argmax(axis=0)[1]]

        # fit the model on best global hyperparameters
        cluster_params={'n_clusters': best_k, 'n_init': 10, 'init': 'random'}
        model_alg = LSSVM(kernel='rbf', **best_hps)
            rm = RegionalModel(som_tr, model_alg)
            rm.fit(X=X_tr_norm, Y=y_train, verboses=0,
                   # SOM_params=som_params, # no need as the SOM was trained in the beginning
                   Cluster_params = cluster_params)

        # make predictions and evaluate model
        y_pred_tr, y_pred_ts = rm.predict(X_tr_norm), rm.predict(X_ts_norm)
        cm_tr = confusion_matrix(dummie2multilabel(y_train),
                                 dummie2multilabel(y_pred_tr))
        cm_ts = confusion_matrix(dummie2multilabel(y_test),
                                 dummie2multilabel(y_pred_ts))

        # getting eigenvalues of kernel matrices
        n_lssvms=0 # counter for LSSVM models
        models_idx = []
        for i in range(len(rm.regional_models)):         
            if isinstance(rm.regional_models[i], LSSVM):            
                n_lssvms+=1
                models_idx.append(i)
    #     print("models_idx={}".format(models_idx))

        X_labels = rm.regionalize(X_tr_norm)
        eigvals_list = [None, np.array([np.nan])]*n_lssvms
        count=0
        for i in models_idx:
            x_region = X_tr_norm[X_labels==i]
            K = rm.regional_models[i].kernel(x_region, x_region)
            temp = np.linalg.eigvals(K)
            eigvals_list[count] = temp#.tostring()
            count+=2

        eigvals = np.concatenate( eigvals_list, axis=0 )

        k_opt                 = len(rm.regional_models)
        n_empty_regions       = len(rm.empty_regions)
        n_homogeneous_regions = 0
        for i in range(len(rm.regional_models)):         
            if isinstance(rm.regional_models[i], BiasModel):            
                n_homogeneous_regions+=1

        # Organizing suggestion of the cluster metrics
        temp = [np.nan]*2*len(cluster_val_metrics)
        count=0
        for metric in cluster_val_metrics:
            temp[count] = best_hps_list[
                np.where( validation_scores[:,0]==suggestions[metric['name']] )[0][0]]['gamma']
            temp[count+1] = best_hps_list[
                np.where( validation_scores[:,0]==suggestions[metric['name']] )[0][0]]['sigma']
            count+=2        
        
        data = np.array(
            [dataset_name, random_state,
             n_empty_regions, n_homogeneous_regions] + \
            [best_hps['gamma'], best_hps['sigma']] + \
            temp + \
            [k_opt] + \
            [value for value in list(suggestions.values())] + \
            [validation_scores[ 
                np.where( validation_scores[:,0]==suggestions[metric['name']] )[0][0] ,
                1] for metric in cluster_val_metrics ] + \
            [eigvals.tostring(), eigvals.dtype, cm_tr.tostring(), cm_ts.tostring()]
        ).reshape(1,-1)
        
        
        results_df = pd.DataFrame(data, columns=header)
        results_df.to_csv(filename,index=False) # saving results in csv file
        
        # saving clusters to .csv
        kmeans_proto = rm.Cluster.cluster_centers_
        neurons      = rm.SOM.neurons
        
        kmeans_filename  = ("./temp_rlssvm/clusters/kmeans/L-LSSVM - {}.csv".format(case)).replace(':','-')
        neurons_filename = ("./temp_rlssvm/clusters/som/L-LSSVM - {}.csv".format(case)   ).replace(':','-')
        
        pd.DataFrame(kmeans_proto).to_csv(kmeans_filename, header=None, index=None)
        pd.DataFrame(neurons).to_csv(neurons_filename, header=None, index=None)

In [10]:
%%time
import datetime
print(datetime.datetime.now())
eval_RLSSVM(cases[1])

2019-09-13 19:56:15.743788
CPU times: user 1.38 ms, sys: 0 ns, total: 1.38 ms
Wall time: 1.38 ms


In [11]:
# %%time
# from random import shuffle
# shuffle(cases) # better estimation of remaining time

from joblib import Parallel, delayed
data = Parallel(n_jobs=-1, verbose=51)( #6
    delayed(eval_RLSSVM)(case) for case in reversed(cases)
)

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done   1 tasks      | elapsed: 1585.3min
[Parallel(n_jobs=-1)]: Done   2 tasks      | elapsed: 1589.6min
[Parallel(n_jobs=-1)]: Done   3 tasks      | elapsed: 1599.8min
[Parallel(n_jobs=-1)]: Done   4 tasks      | elapsed: 1608.4min
[Parallel(n_jobs=-1)]: Done   5 tasks      | elapsed: 2832.9min
[Parallel(n_jobs=-1)]: Done   6 tasks      | elapsed: 2833.1min
[Parallel(n_jobs=-1)]: Done   7 tasks      | elapsed: 3220.2min
[Parallel(n_jobs=-1)]: Done   8 tasks      | elapsed: 3234.3min
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed: 4396.3min
[Parallel(n_jobs=-1)]: Done  10 tasks      | elapsed: 4450.9min
[Parallel(n_jobs=-1)]: Done  11 tasks      | elapsed: 4833.9min
[Parallel(n_jobs=-1)]: Done  12 tasks      | elapsed: 4847.8min
[Parallel(n_jobs=-1)]: Done  13 tasks      | elapsed: 5592.9min
[Parallel(n_jobs=-1)]: Done  14 tasks      | elapsed: 6057.1min
[Parallel(n_jobs=-1)]: Done 

KeyboardInterrupt: 

In [None]:
# To see every column of pandas data frame on jupyter notebook
import pandas as pd
from IPython.display import display
pd.options.display.max_columns = None
# display(results_df)

import glob

path = r'./temp_rlssvm/results' # use your path
all_files = glob.glob(path + "/*.csv")

li = []

for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)

df_results = pd.concat(li, axis=0, ignore_index=True)
df_results
# pd.read_csv("./temp/R-LSSVM - {'dataset_name': 'vc2c', 'random_state': 73470257, 'kernel': 'linear'}.csv")

How to get back the eigenvalues of each region:

In [107]:
df_results=frame
idx=0
eigvals_full = np.frombuffer(eval(df_results['eigenvalues'][idx]), 
                             dtype=df_results['eigenvalues_dtype'][idx])
# print(eigvals_full)
nan_indices = np.argwhere(np.isnan(eigvals_full))
eigvals_list = [None]*len(nan_indices)
last_nan=-1
for i in range(len(nan_indices)):
    print(nan_indices[i][0])
    eigvals_list[i] = eigvals_full[last_nan+1:nan_indices[i][0]]
    last_nan=nan_indices[i][0]
    
for eig in eigvals_list:
    print(eig)

60
[ 1.19462052e+02+0.00000000e+00j  1.01115026e+01+0.00000000e+00j
  5.69025736e+00+0.00000000e+00j  2.89735797e+00+0.00000000e+00j
  2.59291355e+00+0.00000000e+00j  1.58522926e+00+0.00000000e+00j
  1.31880473e+00+0.00000000e+00j  9.71090509e-01+0.00000000e+00j
  6.36440390e-01+0.00000000e+00j  4.61385429e-01+0.00000000e+00j
  3.16833636e-01+0.00000000e+00j  1.05823181e-01+0.00000000e+00j
  6.90840090e-02+0.00000000e+00j  3.95962680e-02+0.00000000e+00j
  1.66614842e-02+0.00000000e+00j  1.31963195e-02+0.00000000e+00j
  6.28172756e-03+0.00000000e+00j  3.58913422e-03+0.00000000e+00j
  2.67738097e-03+0.00000000e+00j  5.78927150e-04+0.00000000e+00j
  1.56135245e-06+0.00000000e+00j  1.67729601e-07+0.00000000e+00j
 -2.83437488e-15+9.46607938e-16j -2.83437488e-15-9.46607938e-16j
  2.79478689e-15+0.00000000e+00j  2.42396305e-15+0.00000000e+00j
 -1.97511462e-15+0.00000000e+00j -1.90160504e-15+0.00000000e+00j
 -1.73828225e-15+1.77847715e-16j -1.73828225e-15-1.77847715e-16j
  2.04948379e-15+0.000

How to get back `cm_tr` and `cm_ts`:

In [108]:
for i in range(len(cases)):
    print(df_results['dataset_name'][i])
    temp_tr = np.frombuffer(eval( df_results['cm_tr'][i] ),
                         dtype=df_results['cm_tr_dtype'][i])
    temp_ts = np.frombuffer(eval( df_results['cm_ts'][i] ),
                         dtype=df_results['cm_ts_dtype'][i])
    print("cm_tr:")
    print(temp_tr.reshape( int(len(temp_tr)**(1/2)) ,-1))
    print(" ")
    print("cm_ts:")
    print(temp_ts.reshape( int(len(temp_ts)**(1/2)) ,-1))
    
    print("\n")

pk
cm_tr:
[[19  5]
 [ 1 72]]
 
cm_ts:
[[19  5]
 [12 62]]


vc2c
cm_tr:
[[99  6]
 [12 38]]
 
cm_ts:
[[101   4]
 [ 19  31]]


pk
cm_tr:
[[24  0]
 [ 0 73]]
 
cm_ts:
[[18  6]
 [ 5 69]]


pk
cm_tr:
[[19  5]
 [ 1 72]]
 
cm_ts:
[[15  9]
 [ 2 72]]


pk
cm_tr:
[[24  0]
 [ 0 73]]
 
cm_ts:
[[15  9]
 [ 5 69]]


vc2c
cm_tr:
[[98  7]
 [13 37]]
 
cm_ts:
[[100   5]
 [ 21  29]]




KeyError: 6