### Classification with other features
* Purpose: Validate classification with features other than includes and function calls.
* Matrix: Current state ("current") of the mozilla-central repository
* Features: Includes, Function Calls, Definitions, Names, Conditions
* Model: Support Vector Machine classifier

#### Setup
* Training set/test set: Stratified sampling on one matrix (2/3 : 1/3)
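
The stratified 2/3 : 1/3 split can be illustrated with a minimal, standard-library sketch. It is not the code used by `PredictionHelper`; the function name `stratified_split` and the toy labels are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=2.0 / 3, seed=0):
    """Return (train_indices, test_indices), keeping class proportions equal.

    Hypothetical helper for illustration; not part of the notebook's imports.
    """
    rng = random.Random(seed)

    # Group row indices by their class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Take the same fraction from every class.
        cut = int(round(len(indices) * train_fraction))
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


# Toy example: 9 negative and 3 positive rows.
labels = [0] * 9 + [1] * 3
train_idx, test_idx = stratified_split(labels)
```

Because the cut is applied per class, the share of positive rows is the same in the training set and the test set, which matters when the positive class is rare.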

#### Results
Tabular comparison of the average precision and recall values for the different features over n=5 experiments. The number of extracted features and the average time for training the model are also listed.
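
How the per-feature averages are formed can be sketched as follows. This is a simplified stand-in that assumes binary 0/1 labels; the helper names `precision_recall` and `average_over_runs` and the toy runs are hypothetical and do not come from the notebook.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / float(tp + fp) if tp + fp else 0.0
    recall = tp / float(tp + fn) if tp + fn else 0.0
    return precision, recall


def average_over_runs(runs):
    """Average precision and recall over (y_true, y_pred) pairs, one per run."""
    n = float(len(runs))
    scores = [precision_recall(y_true, y_pred) for y_true, y_pred in runs]
    mean_precision = sum(score[0] for score in scores) / n
    mean_recall = sum(score[1] for score in scores) / n
    return mean_precision, mean_recall


# Two toy runs standing in for the n=5 experiments.
runs = [
    ([1, 1, 0, 0], [1, 0, 0, 0]),  # precision 1.0, recall 0.5
    ([1, 1, 0, 0], [1, 1, 1, 0]),  # precision 2/3, recall 1.0
]
mean_precision, mean_recall = average_over_runs(runs)
```

Averaging over several runs smooths out the variation introduced by the random train/test split in each experiment.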

In [9]:
import matplotlib.pyplot as plt
import numpy as np
from prettytable import PrettyTable

from imports.matrix_helper import MatrixHelper
from imports.prediction_helper import PredictionHelper
from sklearn.metrics import precision_recall_curve

matrix_helper = MatrixHelper()
experiments_per_feature = 5
counter = 1

features = [('incl', 'Includes'), ('cond', 'Conditions'), ('defs', 'Definitions'), ('names', 'Names'), ('calls', 'Function Calls')]
table = PrettyTable(['Features', 'Feature count', 'Precision', 'Recall', 'Time for fitting'])

for feature in features:
    for h_type in ['current', 'history']:
        # Read pickle
        matrices = matrix_helper.load_from_parse('data/matrices/matrix_cla_' + feature[0] + '_' + h_type + '.pickle')
        
        precision_list = []
        recall_list = []
        time_list = []
        
        for i in range(experiments_per_feature):
            # Instantiate Prediction Helper Class and predict values for compare matrix with an SVM
            prediction_helper = PredictionHelper()
            prediction_helper.calculate_validation_compare_matrix(matrices, sampling_factor=(2.0/3), prediction_type='LinearSVC', crop_matrix=False)
            compare_matrix = prediction_helper.get_compare_matrix()

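            # Compute precision and recall (column 2: true labels, column 1: predictions).
            # Assuming the predictions are binary labels (0/1) and both values occur,
            # index 1 of the curve is the classifier's own operating point;
            # index 0 is the trivial point with recall 1.0.
            precision, recall, _ = precision_recall_curve(np.array(compare_matrix[:, 2], dtype='f'), np.array(compare_matrix[:, 1], dtype='f'))
            precision_list.append(precision[1])
            recall_list.append(recall[1])
            time_list.append(prediction_helper.time)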

        divisor = float(experiments_per_feature)
        feature_name = "{} ({})".format(feature[1], h_type)
        precision = "{:.3f}".format(sum(precision_list)/divisor)
        recall = "{:.3f}".format(sum(recall_list)/divisor)
        time = "{:.2f}min".format(sum(time_list)/divisor)
        
        table.add_row([feature_name, len(matrices[2]), precision, recall, time])
        print(' * {} - done {}/{}'.format(feature_name, counter, 2 * len(features)))
        counter += 1

print(table)

 * Includes (current) - done 1/10
 * Includes (history) - done 2/10
 * Conditions (current) - done 3/10
 * Conditions (history) - done 4/10
 * Definitions (current) - done 5/10
 * Definitions (history) - done 6/10
 * Names (current) - done 7/10
 * Names (history) - done 8/10
 * Function Calls (current) - done 9/10
 * Function Calls (history) - done 10/10
+--------------------------+---------------+-----------+--------+------------------+
|         Features         | Feature count | Precision | Recall | Time for fitting |
+--------------------------+---------------+-----------+--------+------------------+
|    Includes (current)    |     15362     |   0.556   | 0.391  |     0.05min      |
|    Includes (history)    |     16383     |   0.669   | 0.573  |     0.06min      |
|   Conditions (current)   |     19417     |   0.657   | 0.159  |     0.06min      |
|   Conditions (history)   |     19926     |   0.750   | 0.332  |     0.06min      |
|  Definitions (current)   |     77527     |   0