# Testing and evaluation
This notebook tests the models and parameters computed in [setup_feature_training.ipynb](https://github.com/sertansenturk/dlfm_makam_recognition/blob/master/setup_feature_training.ipynb) using the features also computed in the same notebook. Next, it aggregates the obtained results and evaluates the models/parameters.

Note that we also run the tests on the training recordings in order to obtain the training accuracy.

In [1]:
%matplotlib inline
import os
import itertools
import json
import numpy as np
from matplotlib import pyplot as plt
from morty.extras.foldgenerator import FoldGenerator
from dlfm_code.tester import plot_min_peak_ratio
from fileoperations.fileoperations import get_filenames_in_dir


## Setup the cluster

In [2]:
# ipyparallel
import ipyparallel

# get the clients
clients = ipyparallel.Client()
print(clients.ids)

# create a direct view into the engines
dview = clients.direct_view()

with dview.sync_imports():
    from dlfm_code.trainer import compute_recording_distributions
    from dlfm_code.trainer import train_single
    from dlfm_code.trainer import train_multi 
    from dlfm_code.tester import search_min_peak_ratio
    from dlfm_code.tester import test
    
# dummy method to check that the cluster works properly
def dummy(x):
    return x+2
print(dview.map_sync(dummy, range(0,10)))
    

[0, 1, 2, 3, 4, 5, 6, 7]
importing compute_recording_distributions from dlfm_code.trainer on engine(s)
importing train_single from dlfm_code.trainer on engine(s)
importing train_multi from dlfm_code.trainer on engine(s)
importing search_min_peak_ratio from dlfm_code.tester on engine(s)
importing test from dlfm_code.tester on engine(s)
[2, 3, 4, 5, 6, 7, 8, 9, 10, 11]


## Set the dataset, feature and training model paths and define the parameters

In [3]:
# paths
dataset_folder = os.path.abspath(os.path.join(
        './', 'data', 'ottoman_turkish_makam_recognition_dataset'))
    
# parameters
distribution_types = ['pd', 'pcd']
step_sizes = [7.5, 15.0, 25.0, 50.0, 100.0]
kernel_widths = [0, 7.5, 15.0, 25.0, 50.0, 100.0]
model_types = ['single', 'multi']


## Optimize *min_peak_ratio*
This parameter is used in tonic identification and joint estimation: a peak is selected as a tonic candidate only if its height, relative to the highest peak, is at least *min_peak_ratio*. For each value we check in how many recordings the annotated tonic is among the peaks above the *min_peak_ratio*, and the total number of peaks detected above the *min_peak_ratio*. We would like the number of peaks to be as small as possible: **1)** for time and computational performance, **2)** to increase the prior probability of selecting the correct tonic among the selected peaks.
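The thresholding described above can be sketched as follows. This is a minimal illustration, not the implementation of `search_min_peak_ratio` in `dlfm_code`; the helper `select_peaks` and the example peak heights are hypothetical.

```python
def select_peaks(peak_heights, min_peak_ratio):
    """Return the indices of the peaks whose height is at least
    min_peak_ratio times the height of the highest peak.

    Hypothetical helper for illustration only.
    """
    threshold = min_peak_ratio * max(peak_heights)
    return [i for i, h in enumerate(peak_heights) if h >= threshold]


# made-up peak heights of a pitch (class) distribution
peak_heights = [0.10, 0.40, 0.05, 0.25]

candidates_all = select_peaks(peak_heights, 0.0)   # no thresholding
candidates_half = select_peaks(peak_heights, 0.5)  # peaks >= 0.20
candidates_top = select_peaks(peak_heights, 1.0)   # only the highest peak
```

A lower ratio keeps more candidates (higher chance of including the tonic, more computation); a higher ratio keeps fewer.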

We search the *min_peak_ratio* from 0 (i.e. no thresholding) to 1 (i.e. selecting the pitch or pitch-class of the highest peak directly as the tonic) in steps of 0.05. We consider all the *kernel_width*s and *distribution_type*s. We only consider the *step_size*s from 7.5 to 25 cents, because for higher *step_size* values the maximum error introduced by the bin size (i.e. *step_size* / 2) is close to or exceeds the distance tolerance between the annotated and the estimated tonic frequencies (25 cents) used in the evaluation of tonic identification.
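As a rough sketch of the search space described above, the grid and the *step_size* restriction could be written as below. The variable names are hypothetical, and the strict comparison against the 25 cent tolerance is an assumption chosen so that the result matches the 7.5 to 25 cent range stated in the text.

```python
step_sizes = [7.5, 15.0, 25.0, 50.0, 100.0]
tonic_tolerance = 25.0  # cents, tolerance used in tonic evaluation

# 0, 0.05, ..., 1; dividing integers avoids accumulating float error
min_peak_ratio_grid = [i / 20.0 for i in range(21)]

# maximum quantization error introduced by the bin size
max_bin_error = {s: s / 2.0 for s in step_sizes}

# assumption: keep a step_size only if its maximum bin error stays
# strictly below the tonic tolerance
searched_step_sizes = [s for s in step_sizes
                       if max_bin_error[s] < tonic_tolerance]
```

With the values above, a *step_size* of 50 cents already gives a maximum bin error equal to the tolerance, and 100 cents gives twice the tolerance.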

## Testing Parameters

From the figure, we can observe that the probability of having the tonic among the detected peaks does not drop substantially between a *min_peak_ratio* of 0 and 0.35. Meanwhile, the number of detected peaks drops substantially with respect to 0 as we increase the *min_peak_ratio*. Depending on the application, any *min_peak_ratio* in this range can be selected. We select a *min_peak_ratio* of **0.15**, since our scenario can tolerate longer processing time for better accuracy.

Below we list the parameters to be optimized. Note that the training parameters are listed above.

In [None]:
# experimental setup
experiment_types = ['tonic', 'mode', 'joint']
fold_idx = np.arange(0,10,1)

# testing parameters
dis_measures = ['l1', 'l2', 'l3', 'bhat', 'dis_intersect', 'dis_corr']
k_neighbors = [1, 3, 5, 10, 15]
min_peak_ratios = [0.15]
ranks = [5]




# Testing

In [None]:
# get all the combinations. We also use the training data to obtain training accuracy later
# NOTE: There are many combinations, so we use generators
tcombs = itertools.product(step_sizes, kernel_widths, distribution_types, model_types, 
                           fold_idx, experiment_types, dis_measures, k_neighbors, min_peak_ratios, ranks)
# eliminate cases with kernel_width less than one third of step_size
# (a kernel_width of 0 is always kept)
# eliminate cases where single model_type is not called with 1 nearest neighbor
# tuple order: c[0] step_size, c[1] kernel_width, c[3] model_type, c[7] k_neighbor
tcombs = itertools.ifilter(lambda c: (c[1] == 0 or 3 * c[1] >= c[0]) and
                           (c[3] == 'multi' or c[7] == 1), tcombs)
tcombs = itertools.izip(*tcombs)
print "Running experiments"

test_result = dview.map_sync(test, *tcombs)

Running experiments
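To get a feel for the size of the experiment, the two elimination rules stated in the comments of the cell above can be applied to the parameter lists and the surviving combinations counted. This is a standalone sketch written for Python 3 (the notebook cells themselves are Python 2); the helper `keep` is hypothetical and nothing here calls `dlfm_code`.

```python
import itertools

# parameter lists copied from the cells above
step_sizes = [7.5, 15.0, 25.0, 50.0, 100.0]
kernel_widths = [0, 7.5, 15.0, 25.0, 50.0, 100.0]
distribution_types = ['pd', 'pcd']
model_types = ['single', 'multi']
fold_idx = range(10)
experiment_types = ['tonic', 'mode', 'joint']
dis_measures = ['l1', 'l2', 'l3', 'bhat', 'dis_intersect', 'dis_corr']
k_neighbors = [1, 3, 5, 10, 15]
min_peak_ratios = [0.15]
ranks = [5]


def keep(comb):
    """Hypothetical predicate implementing the two rules in the comments."""
    (step_size, kernel_width, _dist, model_type, _fold,
     _experiment, _dis_measure, k_neighbor, _ratio, _rank) = comb
    # kernel_width is either 0 or at least one third of step_size
    kernel_ok = kernel_width == 0 or 3 * kernel_width >= step_size
    # the single model is only tested with 1 nearest neighbor
    neighbor_ok = model_type == 'multi' or k_neighbor == 1
    return kernel_ok and neighbor_ok


all_combs = itertools.product(
    step_sizes, kernel_widths, distribution_types, model_types, fold_idx,
    experiment_types, dis_measures, k_neighbors, min_peak_ratios, ranks)

# count lazily, without building the full list in memory
n_kept = sum(1 for c in all_combs if keep(c))
print(n_kept)
```

Unpacking the tuple into named variables instead of indexing by position makes it harder to apply a rule to the wrong parameter.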


# Results