# Classifiers based on features extracted in MATLAB

In this notebook we use the ARFF files obtained from MATLAB. We use these features to train classifiers and evaluate them with cross-validation.

## A bit of set up

We need numpy and pandas to handle the data, and pickle and gzip to read the extracted features.

In [1]:
# set up the Python environment: numpy for numerical routines, pandas for tabular data
import numpy as np
import pandas as pd

# for storing the results
import pickle
import gzip

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

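In this example we evaluate Extreme Learning Machine (ELM) classifiers from sklearn_extensions (GenELMClassifier with random MLP hidden layers). The SVM and Gaussian NB classifiers from sklearn are imported as well, but they are not used below.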

In [2]:
from sklearn import svm
from sklearn.naive_bayes import GaussianNB
from sklearn_extensions.extreme_learning_machines.elm import GenELMClassifier
from sklearn_extensions.extreme_learning_machines.random_layer import RBFRandomLayer, MLPRandomLayer
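An ELM trains only its output layer: the hidden-layer weights are drawn at random and kept fixed, and the output weights are obtained by least squares. The numpy sketch below illustrates that idea on toy data. elm_fit and elm_predict are hypothetical helpers, not the sklearn_extensions API, and library details such as the weight scaling inside MLPRandomLayer are not reproduced.

```python
import numpy as np

def elm_fit(X, y, n_hidden=200, activation=np.tanh, seed=0):
    """Basic ELM (sketch): random fixed hidden layer + least-squares output weights."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    # one-hot targets, one column per class
    T = (y[:, None] == classes[None, :]).astype(float)
    # random input weights and biases: drawn once, never trained
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = activation(X @ W + b)
    # output weights from the Moore-Penrose pseudo-inverse of the hidden activations
    beta = np.linalg.pinv(H) @ T
    return W, b, beta, classes, activation

def elm_predict(model, X):
    W, b, beta, classes, activation = model
    H = activation(X @ W + b)
    # predicted class = output unit with the largest response
    return classes[np.argmax(H @ beta, axis=1)]

# same user-defined transfer function as in check_ELM below
sinsq = lambda x: np.power(np.sin(x), 2.0)

# toy two-class data (hypothetical): two well-separated Gaussian blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 1, size=(50, 2)), rng.normal(2, 1, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

train_acc = {}
for name, act in [("tanh", np.tanh), ("sin^2", sinsq)]:
    model = elm_fit(X, y, activation=act)
    train_acc[name] = np.mean(elm_predict(model, X) == y)
print(train_acc)
```

Because only a linear least-squares problem is solved, fitting is fast even with many hidden neurons; the price is that results depend on the random draw of the hidden layer.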


In [3]:
import sys
sys.path.append('../pycode/')
import utilsData
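The helper utilsData.readARFF is not shown here. Judging from how it is used below, it returns a dict with the attribute names under 'vars' and the values under 'data'. The following is a minimal sketch of such a reader, assuming a dense, all-numeric ARFF file with unquoted attribute names; read_numeric_arff is a hypothetical name, not the actual helper.

```python
import io
import numpy as np

def read_numeric_arff(f):
    """Hypothetical reader: parse a dense numeric ARFF stream into {'vars', 'data'}."""
    names, rows, in_data = [], [], False
    for raw in f:
        line = raw.strip()
        if not line or line.startswith('%'):
            continue  # skip blank lines and comments
        lower = line.lower()
        if lower.startswith('@attribute'):
            names.append(line.split()[1])  # assumes names without spaces or quotes
        elif lower.startswith('@data'):
            in_data = True
        elif in_data:
            rows.append([float(v) for v in line.split(',')])
    return {'vars': names, 'data': np.array(rows)}

example = io.StringIO("""@relation demo
@attribute var1 numeric
@attribute var2 numeric
@attribute var3 numeric
@data
0.5,1.5,0
2.0,3.0,1
""")
features = read_numeric_arff(example)
print(features['vars'])        # ['var1', 'var2', 'var3']
print(features['data'].shape)  # (2, 3)
```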

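## Cross-validation

In this case, the batches of each fold have been prepared beforehand: for every fold there is a training ARFF file and a test ARFF file, so no splitting is done in this notebook.

* First, we define a function that loads the 5 folds of a feature set, trains two ELM classifiers (tanh and sin² hidden layers) on each training file, and returns their mean accuracy on the corresponding test files: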
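The fold loop in check_ELM accumulates one test score per fold and divides by the number of folds. The sketch below shows the same accumulate-and-average pattern, assuming the folds are created in memory from synthetic data and using a simple nearest-centroid classifier as a stand-in for the ELMs; all names here are hypothetical.

```python
import numpy as np

def nearest_centroid_fit(X, y):
    # one centroid per class
    classes = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def nearest_centroid_score(model, X, y):
    classes, centroids = model
    # squared distance from every sample to every centroid
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return np.mean(classes[np.argmin(d, axis=1)] == y)

def cross_validate(X, y, num_folds=5, seed=0):
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), num_folds)
    score = 0.0
    for i in range(num_folds):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(num_folds) if j != i])
        model = nearest_centroid_fit(X[train_idx], y[train_idx])
        score += nearest_centroid_score(model, X[test_idx], y[test_idx])
    # mean of the per-fold accuracies, as in scores/num_folds
    return score / num_folds

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-3, 1, size=(100, 2)), rng.normal(3, 1, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)
acc = cross_validate(X, y)
print(acc)
```

Averaging per-fold accuracies equals the pooled accuracy only when all test folds have the same size; with unequal folds the two can differ slightly.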

In [4]:
def check_ELM(string_base):
    """Return the mean test accuracy over the 5 folds for each ELM classifier.

    string_base identifies the feature set, e.g. 'da0_r1_de0_v50_ci0'.
    """
    num_folds = 5
    nh = 1000  # number of hidden neurons in the random layer

    # pass a user-defined transfer function: sin^2(x)
    sinsq = (lambda x: np.power(np.sin(x), 2.0))
    srhl_sinsq = MLPRandomLayer(n_hidden=nh, activation_func=sinsq)

    # use one of the built-in transfer functions
    srhl_tanh = MLPRandomLayer(n_hidden=nh, activation_func='tanh')

    # no random_state is passed, so the hidden weights (and the scores) vary between runs
    classifiers = [GenELMClassifier(hidden_layer=srhl_tanh),
                   GenELMClassifier(hidden_layer=srhl_sinsq)]

    # accumulated test accuracy, one entry per classifier
    scores = np.zeros(len(classifiers))

    for i in range(num_folds):
        # folds are numbered from 1 in the file names
        features_train = utilsData.readARFF('../features/arff/file_{}_t2_to{:d}_training.arff'.format(string_base, i+1))
        features_test = utilsData.readARFF('../features/arff/file_{}_t2_to{:d}_test.arff'.format(string_base, i+1))

        # build DataFrames and rename the 1-based columns var1, var2, ... to var0, var1, ...
        features_train = pd.DataFrame(features_train['data'], columns=features_train['vars'])
        features_train = features_train.rename(columns=(lambda x: 'var'+str(int(x[3:])-1)))

        features_test = pd.DataFrame(features_test['data'], columns=features_test['vars'])
        features_test = features_test.rename(columns=(lambda x: 'var'+str(int(x[3:])-1)))

        # the last column holds the class label, the other columns are the features
        for idx, clf in enumerate(classifiers):
            clf.fit(features_train.iloc[:, :-1], features_train.iloc[:, -1])
            scores[idx] += clf.score(features_test.iloc[:, :-1], features_test.iloc[:, -1])

    return scores / num_folds
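The column renaming in check_ELM shifts the 1-based attribute names of the ARFF files (var1, var2, ...) to 0-based names (var0, var1, ...), presumably to match Python indexing. The small example below applies the same lambda to a hypothetical three-column frame and then separates features from the label column.

```python
import pandas as pd

# hypothetical frame with the 1-based column names found in the ARFF files
df = pd.DataFrame([[0.1, 0.2, 1]], columns=['var1', 'var2', 'var3'])

# same rule as in check_ELM: drop the 'var' prefix, subtract 1, add the prefix back
renamed = df.rename(columns=lambda x: 'var' + str(int(x[3:]) - 1))
print(list(renamed.columns))  # ['var0', 'var1', 'var2']

# features are all columns but the last; the label is the last column
X, y = renamed.iloc[:, :-1], renamed.iloc[:, -1]
print(X.shape, y.tolist())  # (1, 2) [1]
```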

In [7]:
list_to_pandas = []
for i in [0, 2]:
    print("database: {:d}".format(i))
    robot = 1 if i == 0 else 0
    for j in [0, 1]:
        print("descriptor: {:d}".format(j))
        for k in [50, 100, 200, 300]:
            print("descriptor_size: {:d}".format(k))
            results = check_ELM('da{:d}_r{:d}_de{:d}_v{:d}_ci0'.format(i, robot, j, k))
            print(results)
            # keep only the score of the second classifier (sin^2 ELM); 0 marks the ci0 run
            list_to_pandas.append([i, robot, j, k, 0, results[1]])
            print()

database: 0
descriptor: 0
descriptor_size: 50
[ 0.89033654  0.89350474]

descriptor_size: 100
[ 0.87299816  0.87000521]

descriptor_size: 200
[ 0.84694729  0.84976241]

descriptor_size: 300
[ 0.83427212  0.83620963]

descriptor: 1
descriptor_size: 50
[ 0.93328611  0.93117429]

descriptor_size: 100
[ 0.94789654  0.94939275]

descriptor_size: 200
[ 0.9515938   0.95027218]

descriptor_size: 300
[ 0.95361706  0.95432109]

database: 2
descriptor: 0
descriptor_size: 50
[ 0.84117461  0.84061062]

descriptor_size: 100
[ 0.82642798  0.82614682]

descriptor_size: 200
[ 0.79994275  0.80322918]

descriptor_size: 300
[ 0.7815341   0.78181509]

descriptor: 1
descriptor_size: 50
[ 0.92786642  0.93049584]

descriptor_size: 100
[ 0.94336436  0.94467927]

descriptor_size: 200
[ 0.9489059   0.94740267]

descriptor_size: 300
[ 0.94909277  0.9494702 ]
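The feature sets are identified by a base string built from the loop variables (database, robot, descriptor, descriptor size and a ci value), which check_ELM expands into one training and one test path per fold. The sketch below reproduces that naming; experiment_bases and fold_paths are hypothetical helpers, and the meaning of the ci field is not stated in this notebook.

```python
def experiment_bases(ci_by_database):
    """Base strings passed to check_ELM, following the notebook's nested loops."""
    bases = []
    for i in [0, 2]:
        robot = 1 if i == 0 else 0
        for j in [0, 1]:
            for k in [50, 100, 200, 300]:
                bases.append('da{:d}_r{:d}_de{:d}_v{:d}_ci{:d}'.format(
                    i, robot, j, k, ci_by_database[i]))
    return bases

def fold_paths(base, num_folds=5):
    """(training, test) ARFF paths for each fold, numbered from 1."""
    return [('../features/arff/file_{}_t2_to{:d}_training.arff'.format(base, f + 1),
             '../features/arff/file_{}_t2_to{:d}_test.arff'.format(base, f + 1))
            for f in range(num_folds)]

first_run = experiment_bases({0: 0, 2: 0})   # ci0 for both databases
second_run = experiment_bases({0: 3, 2: 1})  # ci3 for database 0, ci1 for database 2
print(len(first_run), first_run[0])    # 16 da0_r1_de0_v50_ci0
print(second_run[-1])                  # da2_r0_de1_v300_ci1
print(fold_paths(first_run[0])[0][0])  # ../features/arff/file_da0_r1_de0_v50_ci0_t2_to1_training.arff
```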



In [8]:
for i in [0, 2]:
    print("database: {:d}".format(i))
    if i == 0:
        robot = 1
        ci = 3
    else:
        robot = 0
        ci = 1
    for j in [0, 1]:
        print("descriptor: {:d}".format(j))
        for k in [50, 100, 200, 300]:
            print("descriptor_size: {:d}".format(k))
            results = check_ELM('da{:d}_r{:d}_de{:d}_v{:d}_ci{:d}'.format(i, robot, j, k, ci))
            print(results)
            # keep only the sin^2 ELM score; 1 marks this run (ci3 for database 0, ci1 for database 2)
            list_to_pandas.append([i, robot, j, k, 1, results[1]])
            print()

database: 0
descriptor: 0
descriptor_size: 50
[ 0.88171132  0.87854367]

descriptor_size: 100
[ 0.86032416  0.85900436]

descriptor_size: 200
[ 0.83295181  0.82846319]

descriptor_size: 300
[ 0.80135663  0.79783442]

descriptor: 1
descriptor_size: 50
[ 0.92580492  0.91718013]

descriptor_size: 100
[ 0.94199923  0.93777504]

descriptor_size: 200
[ 0.94516856  0.94182364]

descriptor_size: 300
[ 0.94824877  0.94094352]

database: 2
descriptor: 0
descriptor_size: 50
[ 0.8517886   0.85254026]

descriptor_size: 100
[ 0.83779438  0.83892242]

descriptor_size: 200
[ 0.81807068  0.82577229]

descriptor_size: 300
[ 0.80266774  0.80924179]

descriptor: 1
descriptor_size: 50
[ 0.92739658  0.92542475]

descriptor_size: 100
[ 0.94092181  0.93669597]

descriptor_size: 200
[ 0.9420491   0.93697722]

descriptor_size: 300
[ 0.9438334  0.9365073]
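The rows collected in list_to_pandas can be turned into a DataFrame for easier comparison. The sketch below uses four rows copied from the list printed next; the column names are assumptions inferred from the loop variables (the fifth field is 0 for the ci0 run and 1 for the second run, and the last field is the sin² ELM accuracy).

```python
import pandas as pd

# rows copied from list_to_pandas (database 0, descriptor 1)
rows = [[0, 1, 1, 50, 0, 0.93117428725454354],
        [0, 1, 1, 300, 0, 0.95432108710335029],
        [0, 1, 1, 50, 1, 0.91718013204612636],
        [0, 1, 1, 300, 1, 0.940943516355502]]

# assumed column names, inferred from the loop variables
columns = ['database', 'robot', 'descriptor', 'descriptor_size', 'ci_flag', 'accuracy']
df = pd.DataFrame(rows, columns=columns)

# one row per descriptor size, one column per ci run
table = df.pivot_table(index='descriptor_size', columns='ci_flag', values='accuracy')
print(table.round(4))
```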



In [9]:
list_to_pandas

[[0, 1, 0, 50, 0, 0.89350473717801759],
 [0, 1, 0, 100, 0, 0.87000520500920175],
 [0, 1, 0, 200, 0, 0.84976240527193081],
 [0, 1, 0, 300, 0, 0.83620963019648897],
 [0, 1, 1, 50, 0, 0.93117428725454354],
 [0, 1, 1, 100, 0, 0.94939274892646675],
 [0, 1, 1, 200, 0, 0.95027217860617286],
 [0, 1, 1, 300, 0, 0.95432108710335029],
 [2, 0, 0, 50, 0, 0.84061061531235326],
 [2, 0, 0, 100, 0, 0.82614681670735235],
 [2, 0, 0, 200, 0, 0.80322918251642306],
 [2, 0, 0, 300, 0, 0.7818150865424266],
 [2, 0, 1, 50, 0, 0.93049583551095216],
 [2, 0, 1, 100, 0, 0.94467926708521044],
 [2, 0, 1, 200, 0, 0.94740266871307688],
 [2, 0, 1, 300, 0, 0.94947020466308096],
 [0, 1, 0, 50, 1, 0.87854366630933867],
 [0, 1, 0, 100, 1, 0.85900435609698678],
 [0, 1, 0, 200, 1, 0.82846319005099667],
 [0, 1, 0, 300, 1, 0.79783442184121012],
 [0, 1, 1, 50, 1, 0.91718013204612636],
 [0, 1, 1, 100, 1, 0.93777504445945359],
 [0, 1, 1, 200, 1, 0.94182364313465483],
 [0, 1, 1, 300, 1, 0.940943516355502],
 [2, 0, 0, 50, 1, 0.85254