# Table 1 

This tutorial explains the experimental setup to obtain **one row** of Table 1 in our paper.

In [1]:
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
from sklearn.svm import LinearSVC #Linear SVM for feature maps
from sklearn.svm import SVC #Dual SVM for Kernels
from time import time
import numpy as np

The `FeatureMaps` and `DataReader` modules contain the implementations of the proposed feature maps and the dataset loading procedures, respectively.

In [2]:
import FeatureMaps as maps
import DataReader as DR
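
The later cells rely on two calling conventions: each loader in `DataReader` is called with no arguments and returns `(X, y)`, and each function in `FeatureMaps` is called as `f(X, p=...)` and returns a transformed matrix. The sketch below mimics only those signatures. `toy_loader` and `identity_map` are hypothetical stand-ins, not the implementations shipped with the paper.

```python
import numpy as np
from sklearn.datasets import make_classification

def toy_loader():
    """Hypothetical stand-in for a DataReader loader: returns (X, y)."""
    X, y = make_classification(n_samples=100, n_features=5, random_state=0)
    return X, y

def identity_map(X, p=0):
    """Hypothetical stand-in for a feature map; p is accepted but ignored."""
    return np.asarray(X)

X, y = toy_loader()
Z = identity_map(X, p=0)
print(X.shape, y.shape, Z.shape)
```

Any pair of functions with these signatures could be placed in `datasets` and `mapping_functions` to try the remaining cells without the original modules.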

Here, we list all of the datasets. To obtain a different row of Table 1, change the last two lines below.

In [3]:
# All datasets
#datasets= [DR.australian,DR.fourclass,DR.ionosphere,DR.heart,DR.pima,DR.wprognostic,DR.bupa,DR.fertility,DR.wdiagnostic]

# All dataset names
#data_names=['Australian','Fourclass', 'Ionosphere', 'Heart', 'PIMA', 'W. Prognostic','Bupa', 'Fertility', 'W. Diagnostic' ]

datasets=[DR.australian]
data_names=['Australian']

The following code block creates the lists of feature maps and kernel names in the order of the columns in Table 1. If a feature map is removed from the list `mapping_functions`, the `p_vals` and `kernel_names` lists must be adjusted accordingly. We therefore suggest leaving this code block as given.

In [4]:
# Mapping functions
mapping_functions=[maps.linear, maps.phi_p_1,maps.phi_p_1,maps.phi_p_d,maps.phi_p_d]

# p values
# p value of the linear kernel is used as a dummy variable
p_vals= [0,1,2,1,2]

# Mapping and kernel names
kernel_names=['LIN',r'$\phi_{1,1}$',r'$\phi_{2,1}$',r'$\phi_{1,d}$',r'$\phi_{2,d}$','POL','RBF']

Here, we create the parameter sets for the kernels and the SVM model, along with the random state used for splitting the datasets.

In [5]:
# Set of POL kernel degree parameters
d_params = [2, 3, 4]

# Set of RBF kernel gamma parameters
g_params = np.power(10.0, range(-5, 5))

# Set of C parameters (penalty parameter)
c_params = np.power(10.0, range(-5, 5))

# Random seed for splitting the dataset
random_state=42
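
To make the size of the search concrete, the short sketch below rebuilds the grids defined above and counts the candidates evaluated in each grid search. It uses only the values from the cell above.

```python
import numpy as np

c_params = np.power(10.0, range(-5, 5))   # 10^-5, 10^-4, ..., 10^4
g_params = np.power(10.0, range(-5, 5))
d_params = [2, 3, 4]

n_c = len(c_params)
n_pol = len(c_params) * len(d_params)
n_rbf = len(c_params) * len(g_params)

print(n_c)    # 10 values of C for each feature map
print(n_pol)  # 30 (C, degree) pairs for the POL kernel
print(n_rbf)  # 100 (C, gamma) pairs for the RBF kernel
```

Since `range(-5, 5)` excludes its upper bound, the largest value in both grids is 10^4, not 10^5.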

We store all of the performance metrics in a dictionary named `results` so that the obtained results can be printed in a readable format. The metrics are stored following the dataset, feature map, and kernel orders assigned above.

In [6]:
# A dictionary to store all results
results={'Dataset':[], 'Training Acc.':[], 'Test Acc.':[], 'Training Time':[]}

We perform stratified 10-fold cross validation for each given dataset. The best-performing hyperparameters are found by grid search with stratified two-fold cross validation on each training part. Here, we also create the cross validation objects with a fixed random state so that each feature map and kernel uses the same training and test parts.

In [7]:
for  i in range(len(datasets)):
    X,y=datasets[i]()
    # A dictionary to store performance metrics
    average_performance_metrics = {'Kernel': [], 'Training Acc': [], 'Test Acc': [], 'Training Time': []}
    # 10-fold cross validation object
    skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=random_state)
    # 2-fold cross validation object
    inner_cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=random_state)
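
The reason every feature map and kernel sees the same folds is that a `StratifiedKFold` object with `shuffle=True` and an integer `random_state` yields identical splits each time `split` is called. The sketch below checks this on synthetic data; the data from `make_classification` is an assumption made only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in data (assumption, not one of the Table 1 datasets)
X, y = make_classification(n_samples=100, n_features=5, random_state=0)
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

# Call split twice, as the later cells do once per feature map or kernel
first_pass = [test.tolist() for _, test in skf.split(X, y)]
second_pass = [test.tolist() for _, test in skf.split(X, y)]

same_folds = first_pass == second_pass
fold_sizes = [len(test) for test in first_pass]
print(same_folds)
print(fold_sizes)
```

With `random_state=None` the two passes would generally differ, and the columns of the table would no longer be compared on identical test parts.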



The following code block performs stratified 10-fold cross validation for the proposed feature maps, including the linear one. It also finds the best-performing hyperparameters by applying grid search with stratified two-fold cross validation on each training part.

In [8]:
#suppress all warnings, including the convergence warning for LinearSVC
import warnings
warnings.filterwarnings("ignore")

for m in range(len(mapping_functions)):
    average_performance_metrics['Kernel'].append(kernel_names[m])
    performance_metrics = {'Train_Acc': [], 'Test_Acc': [], 'Train_Time': []}
    #begin: 10-fold
    for train_index, test_index in skf.split(X, y):
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]
        acc_by_param = []
        #begin: grid search
        for ci in c_params:
            all_acc = []
            #begin: two-fold
            for inner_train_index, inner_test_index in inner_cv.split(X_train, y_train):
                #begin: scaling
                scaler = StandardScaler()
                X_inner_train = scaler.fit_transform(X_train[inner_train_index])
                X_inner_test = scaler.transform(X_train[inner_test_index])
                #end: scaling
                y_inner_train, y_inner_test = y_train[inner_train_index], y_train[inner_test_index]
                clf = LinearSVC(C=ci, dual=False).fit(mapping_functions[m](X_inner_train, p=p_vals[m]), y_inner_train)
                all_acc.append(accuracy_score(y_inner_test, clf.predict(mapping_functions[m](X_inner_test, p=p_vals[m]))))
            #end: two-fold
            acc_by_param.append(np.mean(all_acc))
        #end: grid search
        #get best hyperparameters
        best_c = c_params[np.argmax(acc_by_param)]
        #begin: scaling
        scaler = StandardScaler()
        X_train_scaled=scaler.fit_transform(X_train)
        X_test_scaled=scaler.transform(X_test)
        #end: scaling
        s = time()
        XD_train = mapping_functions[m](X_train_scaled, p=p_vals[m])
        clf = LinearSVC(C=best_c, dual=False).fit(XD_train, y_train)
        performance_metrics['Train_Time'].append(round(time() - s, 4))
        performance_metrics['Train_Acc'].append(accuracy_score(y_train, clf.predict(XD_train)))
        performance_metrics['Test_Acc'].append(accuracy_score(y_test, clf.predict(mapping_functions[m](X_test_scaled, p=p_vals[m]))))
    #end: 10-fold
    average_performance_metrics['Training Acc'].append(round(100 * np.mean(performance_metrics['Train_Acc']), 2))
    average_performance_metrics['Test Acc'].append(round(100 * np.mean(performance_metrics['Test_Acc']), 2))
    average_performance_metrics['Training Time'].append( round(np.mean(performance_metrics['Train_Time']), 4))
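
For readers who prefer scikit-learn's built-in tooling, the inner grid search above can also be written with `Pipeline` and `GridSearchCV`. This is an alternative formulation, not the code used for the paper. The sketch assumes synthetic data and a hypothetical feature map `toy_map` that merely follows the `f(X, p=...)` convention. Because the scaler sits inside the pipeline, it is refit on each inner training split, as in the explicit loops.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler
from sklearn.svm import LinearSVC

def toy_map(X, p=1):
    """Hypothetical feature map (NOT one of the paper's maps): appends |x|^p columns."""
    return np.hstack([X, np.abs(X) ** p])

# Synthetic stand-in for one outer training part (assumption)
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
c_params = np.power(10.0, range(-5, 5))
inner_cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=42)

pipe = Pipeline([
    ("scale", StandardScaler()),                              # fit on training split only
    ("map", FunctionTransformer(toy_map, kw_args={"p": 2})),  # explicit feature map
    ("svm", LinearSVC(dual=False)),
])
search = GridSearchCV(pipe, {"svm__C": c_params}, scoring="accuracy", cv=inner_cv)
search.fit(X, y)  # refits the best pipeline on all of X afterwards

best_c = search.best_params_["svm__C"]
print(best_c, round(search.best_score_, 4))
```

The final refit done by `GridSearchCV` corresponds to the step where the scaler and the classifier are fit on the whole outer training part with `best_c`. Ties between candidates may be resolved differently from `np.argmax`, so the selected `C` is not guaranteed to match in every case.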

The following code block performs stratified 10-fold cross validation for the POL kernel and finds the best-performing parameters by applying grid search with stratified two-fold cross validation.

In [9]:
average_performance_metrics['Kernel'].append('POL')
performance_metrics = {'Train_Acc': [], 'Test_Acc': [], 'Train_Time': []}
#begin: 10-fold
for train_index, test_index in skf.split(X, y):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    acc_by_param = {'ci': [], 'di': [], 'inner_test_acc': []}
    #begin: grid search
    for ci in c_params:
        for di in d_params:
            all_acc = []
            #begin: two-fold
            for inner_train_index, inner_test_index in inner_cv.split(X_train, y_train):
                #begin: scaling
                scaler = StandardScaler()
                X_inner_train = scaler.fit_transform(X_train[inner_train_index])
                X_inner_test = scaler.transform(X_train[inner_test_index])
                #end: scaling
                y_inner_train, y_inner_test = y_train[inner_train_index], y_train[inner_test_index]
                clf = SVC(kernel='poly', C=ci, degree=di).fit(X_inner_train, y_inner_train)
                all_acc.append(accuracy_score(y_inner_test, clf.predict(X_inner_test)))
            #end: two-fold
            acc_by_param['ci'].append(ci)
            acc_by_param['di'].append(di)
            acc_by_param['inner_test_acc'].append(np.mean(all_acc))
    #end: grid search
    #get best hyperparameters
    best_ind = np.argmax(acc_by_param['inner_test_acc'])
    best_c, best_d = acc_by_param['ci'][best_ind], acc_by_param['di'][best_ind]
    #begin: scaling
    scaler = StandardScaler()
    X_train_scale = scaler.fit_transform(X_train)
    X_test_scale = scaler.transform(X_test)
    #end: scaling
    s = time()
    clf = SVC(kernel='poly', C=best_c, degree=best_d).fit(X_train_scale, y_train)
    performance_metrics['Train_Time'].append(round(time() - s, 4))
    performance_metrics['Train_Acc'].append(accuracy_score(y_train, clf.predict(X_train_scale)))
    performance_metrics['Test_Acc'].append(accuracy_score(y_test, clf.predict(X_test_scale)))
#end: 10-fold
average_performance_metrics['Training Acc'].append(round(100 * np.mean(performance_metrics['Train_Acc']), 2))
average_performance_metrics['Test Acc'].append(round(100 * np.mean(performance_metrics['Test_Acc']), 2))
average_performance_metrics['Training Time'].append(round(np.mean(performance_metrics['Train_Time']), 4))
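
The two-parameter search above stores every `(C, degree)` pair in flat lists and picks the pair with `np.argmax`, which returns the first maximum. The sketch below replays that bookkeeping on a shortened grid. The scores in `toy_scores` are made-up numbers chosen to show the tie-breaking behaviour, not results from any dataset.

```python
import numpy as np

c_params = [0.1, 1.0, 10.0]   # shortened grid, for illustration only
d_params = [2, 3]
# Made-up mean inner accuracies, one per (C, degree) pair in loop order
toy_scores = [0.70, 0.72, 0.85, 0.80, 0.85, 0.79]

acc_by_param = {'ci': [], 'di': [], 'inner_test_acc': []}
scores = iter(toy_scores)
for ci in c_params:
    for di in d_params:
        acc_by_param['ci'].append(ci)
        acc_by_param['di'].append(di)
        acc_by_param['inner_test_acc'].append(next(scores))

# np.argmax returns the first index of the maximum
best_ind = np.argmax(acc_by_param['inner_test_acc'])
best_c, best_d = acc_by_param['ci'][best_ind], acc_by_param['di'][best_ind]
print(best_c, best_d)  # 1.0 2
```

Because `C` is the outer loop, a tie is resolved in favour of the smaller `C` and then the smaller degree.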

The following code block performs stratified 10-fold cross validation for the RBF kernel and finds the best-performing parameters by applying grid search with stratified two-fold cross validation.

In [10]:
average_performance_metrics['Kernel'].append('RBF')
performance_metrics = {'Train_Acc': [], 'Test_Acc': [], 'Train_Time': []}
#begin: 10-fold
for train_index, test_index in skf.split(X, y):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    acc_by_param = {'ci': [], 'gi': [], 'inner_test_acc': []}
    #begin: grid search
    for ci in c_params:
        for gi in g_params:
            all_acc = []
            #begin: two-fold
            for inner_train_index, inner_test_index in inner_cv.split(X_train, y_train):
                #begin: scaling
                scaler = StandardScaler()
                X_inner_train = scaler.fit_transform(X_train[inner_train_index])
                X_inner_test = scaler.transform(X_train[inner_test_index])
                #end: scaling
                y_inner_train, y_inner_test = y_train[inner_train_index], y_train[inner_test_index]
                clf = SVC(kernel='rbf', C=ci, gamma=gi).fit(X_inner_train, y_inner_train)
                all_acc.append(accuracy_score(y_inner_test, clf.predict(X_inner_test)))
            #end: two-fold
            acc_by_param['ci'].append(ci)
            acc_by_param['gi'].append(gi)
            acc_by_param['inner_test_acc'].append(np.mean(all_acc))
    #end: grid search
    #get best hyperparameters
    best_ind = np.argmax(acc_by_param['inner_test_acc'])
    best_c, best_g = acc_by_param['ci'][best_ind], acc_by_param['gi'][best_ind]
    #begin: scaling
    scaler = StandardScaler()
    X_train_scale = scaler.fit_transform(X_train)
    X_test_scale = scaler.transform(X_test)
    #end: scaling
    s = time()
    clf = SVC(kernel='rbf', C=best_c, gamma=best_g).fit(X_train_scale, y_train)
    performance_metrics['Train_Time'].append(round(time() - s, 4))
    performance_metrics['Train_Acc'].append(accuracy_score(y_train, clf.predict(X_train_scale)))
    performance_metrics['Test_Acc'].append(accuracy_score(y_test, clf.predict(X_test_scale)))
#end: 10-fold
average_performance_metrics['Training Acc'].append(round(100 * np.mean(performance_metrics['Train_Acc']), 2))
average_performance_metrics['Test Acc'].append(round(100 * np.mean(performance_metrics['Test_Acc']), 2))
average_performance_metrics['Training Time'].append(round(np.mean(performance_metrics['Train_Time']), 4))

#store all results
results['Dataset'].append(data_names[i])
results['Training Acc.'].append(average_performance_metrics['Training Acc'])
results['Test Acc.'].append(average_performance_metrics['Test Acc'])
results['Training Time'].append(average_performance_metrics['Training Time'])
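
As a reminder of what the `gamma` values in `g_params` control, the RBF kernel is k(x, x') = exp(-gamma * ||x - x'||^2). The sketch below computes it by hand on random data (an assumption for illustration) and compares the result with scikit-learn's `rbf_kernel`.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))   # random stand-in data
gamma = 0.1

# Pairwise squared Euclidean distances via broadcasting
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K_manual = np.exp(-gamma * sq_dists)
K_sklearn = rbf_kernel(X, X, gamma=gamma)

matches = np.allclose(K_manual, K_sklearn)
print(matches)  # True
```

Larger `gamma` values make the kernel decay faster with distance, which is why the grid spans several orders of magnitude.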

The following code block prints the performance metrics in a readable format.

In [11]:
print('***** Table 1: Average Test Accuracies')
n_datasets=len(datasets)
n_kernels=len(kernel_names)
for d in range(n_datasets):
    print()
    print('***'+ results['Dataset'][d])
    temp=''
    for k in range(n_kernels):
        temp+=kernel_names[k] +'\t'+ str(results['Test Acc.'][d][k])+ '\t'+ '\t'

    print(temp)

***** Table 1: Average Test Accuracies

***Australian
LIN	86.81		$\phi_{1,1}$	86.81		$\phi_{2,1}$	86.67		$\phi_{1,d}$	86.67		$\phi_{2,d}$	86.09		POL	84.2		RBF	85.51		
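
The tab-separated output above can be hard to scan when kernel names have different lengths. As an optional alternative, the sketch below prints the same row with fixed-width columns. The `results` dictionary is rebuilt here from the printed output so that the snippet runs on its own.

```python
kernel_names = ['LIN', r'$\phi_{1,1}$', r'$\phi_{2,1}$', r'$\phi_{1,d}$',
                r'$\phi_{2,d}$', 'POL', 'RBF']
# Values copied from the printed output above
results = {'Dataset': ['Australian'],
           'Test Acc.': [[86.81, 86.81, 86.67, 86.67, 86.09, 84.2, 85.51]]}

# Left-align the dataset name, right-align each kernel column
header = f"{'Dataset':<12}" + ''.join(f"{name:>14}" for name in kernel_names)
rows = [f"{name:<12}" + ''.join(f"{acc:>14.2f}" for acc in accs)
        for name, accs in zip(results['Dataset'], results['Test Acc.'])]

print(header)
for row in rows:
    print(row)
```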
