# MLE - Exercise 3 - Kaggle Competition
## Andreas Kocman (se19m024)


This exercise is in the form of a Kaggle competition. A few quick details on Kaggle & the competition format:

## Kaggle
* Kaggle (https://en.wikipedia.org/wiki/Kaggle) is a platform that allows a competition for a certain data set. Participants submit their prediction on a test set, and will get automated scoring on their results, and will enter the leaderboard.
* From Kaggle, you will be able to obtain a labelled training set, and an unlabelled test set.
* You can submit multiple entries to Kaggle; for each entry, you need to provide details on how you achieved the results - which software and which version of the software, which operating system, which algorithms, and which parameter settings for these algorithms; further, any processing applied to the data before training/predicting. There is a specific "description" field when submitting, you should fill in this information there, and you also need to include this description and the actual submission file in your final submission to Moodle.
* To submit to Kaggle, you need to create a specific submission file, which contains the predictions you obtain on the test set. Computing an aggregated evaluation criterion is done automatically by Kaggle
* The format of your submission is rather simple - it is a comma-separated file, where the first column is the identifier of the item that you are predicting, and the second column is the class you are predicting for that item. The first line should include a header, and is should use the names provided in the training set. An example is below:
```
ID,class
911366,B
852781,B
89524,B
857438,B
905686,B
```
* There is a limit of 7 submissions per day; finally, you also need to select your top 7 submissions to be counted in the competition
* Before you submit, you should evaluate the classifiers "locally" on your training set, i.e. by splitting that again in a training & test set (or using cross validation), to select a number of fitting algorithms & parameters. Then re-train your best models on the full local training set, and generate the predictions for the test set.
* Evaluation in Kaggle is split in two types of leaderboards - the private and public one. Here, the data is split into 50% / 50%, and as soon as you upload, you will know your results on one of these splits.
* The final results will only be visible once the competition closes, and as it is computed on a different split, might be slightly different than what you see initially (e.g. this is similar to a training/test/validation split)
* As it is a competition, there will be bonus points for the top 3 submissions.
* As reproducible science is great, there will be additional bonus points for submissions that use a notebook within the Kaggle competition (note: this was / partially still is called a "kernel" inside the Kaggle competition; Kernel obviously was a confusing term here, as it basically refers to code being executed in the environment of Kaggle itself (e.g. a jupyter notebook, or also a python or R script), and they seem to have realized that, and renamed it). see https://www.kaggle.com/notebooks or https://www.kaggle.com/getting-started/44939. You can first work locally, and then port your code to the notebook version. In Kaggle, your notebook will initially be private. Please share it with me (mayer@ifs.tuwien.ac.at), at least, though. You can also make it public at the end of the competition, to show off :-)

## Datasets
We will use the following datasets:
* Congressional Voting: a small dataset, a good entry point for your experiments (435 instances, 16 features)
  * Kaggle page: https://www.kaggle.com/t/c04c953c596e48099d857129f53fcbdb
* Amazon reviews: a dataset with many features (10k, extracted from text), but not that many instances (~800)
  * Kaggle page: https://www.kaggle.com/t/0bd2ac297dc242478b5979d5ee772136

## Submission
The Kaggle competition will close on the day displayed in Kaggle. After that, you still have time to submit to Moodle. Your submission to Moodle shall contain:

* A brief report, containing
  * A description of the datasets, including a short analysis of the features.
  * Details on the software you used for creating your solution
  * The algorithms and parameters you tried
  * The results you obtained on the locally split training/test set
    * And a comparison to the results that you received on Kaggle - how large was the difference, did the rank of the classifiers change (i.e. the first on your training set, was it still the best on the test set on Kaggle?)
* All the code needed to obtain your results
* The solution files that you uploaded to Kaggle

# Solution

## Helper Functions for Solution and Data Analysis

In [1]:
# global Imports
import pandas as pd
import numpy as numpy

#sk learn imports
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.model_selection import cross_validate
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from sklearn.metrics import make_scorer

#Data reporting
from IPython.display import display

# Global definitions:
overall_results_vote = []
overall_results_amazon = []
averaging_approach = 'macro'
zero_division_approach = 0
number_of_folds = 2
scoring = {'Accuracy': make_scorer(accuracy_score),
            'Precision': make_scorer(precision_score, average=averaging_approach, zero_division=zero_division_approach),
            'Recall': make_scorer(recall_score, average=averaging_approach, zero_division=zero_division_approach)}

# Helper functions
def parse_k_fold_results(results):
    return "m: " + str(numpy.average(results)) + " std: " + str(numpy.std(results))

def parse_argument_tuple_as_string(argumentsTuple):
    return "max Depth: " + str(argumentsTuple[0])  + \
           ", min Samples: " + str(argumentsTuple[1])

def calculate_results_holdout(classifier_used, X_train, X_test, y_train, y_test):
    classifier_used.fit(X_train, y_train)

    # predict the test set on our trained classifier
    y_test_predicted = classifier_used.predict(X_test)

    acc = metrics.accuracy_score(y_test, y_test_predicted)
    recall=metrics.recall_score(y_test, y_test_predicted)
    precision = metrics.precision_score(y_test, y_test_predicted)

    return pd.Series({
            'classifier': str(classifier_used),
            'arguments': "",
            'accuracy':acc,
            'precision':precision,
            'recall':recall
        })

def calculate_results_cross_validate(classifier_used, description_used, data, target):
   scores = cross_validate(classifier_used, data, target,
                                scoring = scoring,
                                cv = number_of_folds,
                                error_score = 0)

   return pd.Series({
            'classifier': str(classifier_used),
            'arguments': description_used,
            'mean_accuracy': numpy.average(scores.get('test_Accuracy')),
            'mean_precision': numpy.average(scores.get('test_Precision')),
            'mean_recall': numpy.average(scores.get('test_Recall')),
            'accuracy': parse_k_fold_results(scores.get('test_Accuracy')),
            'precision': parse_k_fold_results(scores.get('test_Precision')),
            'recall':parse_k_fold_results(scores.get('test_Recall'))
        })

def print_results(array, column_for_max, ascending=False):
    df = pd.DataFrame(array)
    df = df.sort_values(by=[column_for_max], ascending=False)
    display('Results', df)

    best = df.iloc[df[column_for_max].argmax()]
    display(best)

### Calculation Functions


#### k-NN Calculation

In [2]:
from sklearn import neighbors

def calculate_knn(data, target):
    knn_results = []

    n_neighbors = range(1,10,1)

    for n in n_neighbors:
        knn_classifier = neighbors.KNeighborsClassifier(n)
        description = "N = " + str(n)
        result = calculate_results_cross_validate(knn_classifier,
                                                  description,
                                                  data,
                                                  target)
        knn_results.append(result)
    return knn_results


#### Bayes Calculation

In [3]:
from sklearn import naive_bayes

def calculate_bayes(data, target):
    bayes_results = []

    alphas = numpy.arange(0.1,5,1)

    for alpha in alphas:
        classifier = naive_bayes.CategoricalNB(alpha = alpha)
        description = "Alpha = " + str(alpha)
        result = calculate_results_cross_validate(classifier,
                                                  description,
                                                  data,
                                                  target)
        bayes_results.append(result)

    return bayes_results

#### Perceptron Calculation

In [4]:
from sklearn import linear_model

def calculate_perceptron(data, target):
    perceptron_results=[]
    classifier = linear_model.Perceptron()
    description = "No additional args."
    result = calculate_results_cross_validate(classifier,
                                              description,
                                              data,
                                              target)
    perceptron_results.append(result)
    return perceptron_results

#### Decision Tree Calculation

In [5]:
from sklearn import tree
import itertools

def calculate_decision_tree(data, target):
    # Parameters for the decision tree
    max_depth_arguments = range(1, 10, 2)
    min_samples_leaf_arguments = [2,20,50,100]
    argumentTuples = list(itertools.product(max_depth_arguments,
                                            min_samples_leaf_arguments))
    decision_tree_results = []

    for argumentTuple in argumentTuples:
        max_depth = argumentTuple[0]
        min_samples_leaf = argumentTuple[1]

        classifier = tree.DecisionTreeClassifier(criterion = 'gini',
                                                 max_depth = max_depth,
                                                 min_samples_leaf = min_samples_leaf,
                                                 splitter = 'best')
        #result = calculate_results_holdout(classifier, X_train, X_test, y_train, y_test)
        result = calculate_results_cross_validate(classifier,
                                                  parse_argument_tuple_as_string(argumentTuple),
                                                  data,
                                                  target)
        decision_tree_results.append(result)
    return decision_tree_results

#### SVM Calculation

In [6]:
from sklearn import svm
import itertools

def calculate_svm(data, target):
    kernels = {"linear", "poly", "sigmoid", "rbf"}
    gamma = [0.001] #numpy.arange(0.001, 1., 0.1)
    gamma.append ("scale")
    gamma.append ("auto")
    c = range(1, 302, 100)
    argumentTuples = list(itertools.product(kernels,
                                            gamma,
                                            c))
    svm_results = []

    for argumentTuple in argumentTuples:
        kernel = argumentTuple[0]
        gamma = argumentTuple[1]
        c = argumentTuple[2]

        classifier = svm.SVC(kernel = kernel, gamma=gamma, C=c)

        #result = calculate_results_holdout(classifier, X_train, X_test, y_train, y_test)
        result = calculate_results_cross_validate(classifier,
                                                  "Kernel: " + kernel,
                                                  data,
                                                  target)
        svm_results.append(result)
    return svm_results

## Congressional Voting

In [7]:
#Read Data
votingDataLearn = pd.read_csv("data/voting/CongressionalVotingID.shuf.lrn.csv", na_values='unknown')
votingDataSolutionExample = pd.read_csv("data/voting/CongressionalVotingID.shuf.sol.ex.csv", na_values='unknown')
votingDataTest = pd.read_csv("data/voting/CongressionalVotingID.shuf.tes.csv", na_values='unknown')
display("Original Data", votingDataLearn)

#Recode values
votingDataLearn = votingDataLearn.dropna()
votingDataLearn = votingDataLearn.replace('y', 1)\
    .replace('n', 0)\
    .replace('democrat', 2)\
    .replace('republican', 3)
votingDataLearn = votingDataLearn[votingDataLearn.columns[1:17]].astype('category')

display("Recoded Data", votingDataLearn)

# Prepare a train/test set split in case of holdout calculation
X_train, X_test, y_train, y_test = train_test_split(votingDataLearn[votingDataLearn.columns[2:17]],
                                                    votingDataLearn[votingDataLearn.columns[1]], test_size=0.33)

'Original Data'

Unnamed: 0,ID,class,handicapped-infants,water-project-cost-sharing,adoption-of-the-budget-resolution,physician-fee-freeze,el-salvador-aid,religious-groups-in-schools,anti-satellite-test-ban,aid-to-nicaraguan-contras,mx-missile,immigration,synfuels-crporation-cutback,education-spending,superfund-right-to-sue,crime,duty-free-exports,export-administration-act-south-africa
0,213,democrat,n,n,y,n,n,n,y,y,y,n,y,n,n,n,y,y
1,94,democrat,y,n,y,n,n,n,y,n,y,y,y,n,n,n,y,y
2,188,democrat,y,n,y,n,n,n,y,y,y,n,n,n,n,n,y,
3,61,democrat,y,y,y,n,n,,y,y,y,y,n,n,n,n,y,
4,184,democrat,,,,,,,,,y,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
213,250,democrat,y,n,y,n,n,n,y,y,,n,y,n,n,n,y,y
214,26,democrat,y,n,y,n,n,n,y,y,y,y,n,n,n,n,y,y
215,110,democrat,y,,y,n,n,n,y,y,y,n,n,n,n,n,y,
216,34,republican,n,y,n,y,y,y,n,n,n,n,n,y,y,y,n,y


'Recoded Data'

Unnamed: 0,class,handicapped-infants,water-project-cost-sharing,adoption-of-the-budget-resolution,physician-fee-freeze,el-salvador-aid,religious-groups-in-schools,anti-satellite-test-ban,aid-to-nicaraguan-contras,mx-missile,immigration,synfuels-crporation-cutback,education-spending,superfund-right-to-sue,crime,duty-free-exports
0,2,0,0,1,0,0,0,1,1,1,0,1,0,0,0,1
1,2,1,0,1,0,0,0,1,0,1,1,1,0,0,0,1
5,2,0,0,1,0,0,0,1,1,1,1,0,0,1,0,1
8,3,0,0,0,1,1,1,0,0,0,1,0,1,1,1,0
9,3,0,0,1,1,1,1,1,1,0,1,0,0,0,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
207,3,0,0,0,1,1,1,0,0,0,0,0,1,1,1,0
212,3,1,1,1,1,1,1,1,1,0,1,0,0,1,1,0
214,2,1,0,1,0,0,0,1,1,1,1,0,0,0,0,1
216,3,0,1,0,1,1,1,0,0,0,0,0,1,1,1,0


### k-NN - Congressional Vote

In [8]:
knn_results = calculate_knn(votingDataLearn[votingDataLearn.columns[2:17]],
                            votingDataLearn[votingDataLearn.columns[1]])
overall_results_vote.extend(knn_results)

print_results(knn_results, "mean_accuracy")

'Results'

Unnamed: 0,classifier,arguments,mean_accuracy,mean_precision,mean_recall,accuracy,precision,recall
7,KNeighborsClassifier(n_neighbors=8),N = 8,0.670145,0.668046,0.586842,m: 0.6701451905626135 std: 0.06669691470054445,m: 0.6680459633246011 std: 0.11136580138128127,m: 0.5868421052631579 std: 0.03157894736842104
1,KNeighborsClassifier(n_neighbors=2),N = 2,0.669691,0.636876,0.555263,m: 0.6696914700544465 std: 0.014519056261343033,m: 0.6368760064412238 std: 0.0390499194847021,m: 0.5552631578947369 std: 0.01578947368421052
6,KNeighborsClassifier(n_neighbors=7),N = 7,0.661525,0.647057,0.592763,m: 0.661524500907441 std: 0.07531760435571688,m: 0.6470573077715935 std: 0.09528963100391669,m: 0.5927631578947368 std: 0.03881578947368419
8,KNeighborsClassifier(n_neighbors=9),N = 9,0.653055,0.662153,0.586184,m: 0.6530550514216575 std: 0.1013309134906231,m: 0.6621533613445378 std: 0.1364180672268907,m: 0.5861842105263158 std: 0.058552631578947356
2,KNeighborsClassifier(n_neighbors=3),N = 3,0.652601,0.60284,0.580921,m: 0.6526013309134906 std: 0.049153055051421646,m: 0.602839550454328 std: 0.05549378287900525,m: 0.5809210526315789 std: 0.03750000000000003
5,KNeighborsClassifier(n_neighbors=6),N = 6,0.63536,0.6208,0.541447,m: 0.6353599516031458 std: 0.06639443436176645,m: 0.6208002317113539 std: 0.1032563720622311,m: 0.5414473684210526 std: 0.024342105263157887
0,KNeighborsClassifier(n_neighbors=1),N = 1,0.635209,0.58891,0.553947,m: 0.6352087114337568 std: 0.0490018148820327,m: 0.5889097744360903 std: 0.04680451127819546,m: 0.5539473684210526 std: 0.011842105263157876
4,KNeighborsClassifier(),N = 5,0.626588,0.574744,0.548026,m: 0.6265880217785844 std: 0.05762250453720508,m: 0.5747441520467835 std: 0.057200292397660835,m: 0.5480263157894737 std: 0.03092105263157896
3,KNeighborsClassifier(n_neighbors=4),N = 4,0.626286,0.541942,0.528289,m: 0.6262855414398064 std: 0.022837265577737464,m: 0.5419421145568922 std: 0.005403653018430599,m: 0.5282894736842105 std: 0.01513157894736844


classifier                   KNeighborsClassifier(n_neighbors=8)
arguments                                                  N = 8
mean_accuracy                                           0.670145
mean_precision                                          0.668046
mean_recall                                             0.586842
accuracy          m: 0.6701451905626135 std: 0.06669691470054445
precision         m: 0.6680459633246011 std: 0.11136580138128127
recall            m: 0.5868421052631579 std: 0.03157894736842104
Name: 7, dtype: object

### Bayes - Congressional Vote

In [9]:
bayes_results = calculate_bayes(votingDataLearn[votingDataLearn.columns[2:17]],
                                votingDataLearn[votingDataLearn.columns[1]])
overall_results_vote.extend(bayes_results)

print_results(bayes_results, "mean_accuracy")

'Results'

Unnamed: 0,classifier,arguments,mean_accuracy,mean_precision,mean_recall,accuracy,precision,recall
5,CategoricalNB(alpha=5.1),Alpha = 5.1,0.696461,0.678174,0.682895,m: 0.6964609800362976 std: 0.09301270417422869,m: 0.6781739294472741 std: 0.08498396528956798,m: 0.6828947368421052 std: 0.08026315789473681
6,CategoricalNB(alpha=6.1),Alpha = 6.1,0.696461,0.678174,0.682895,m: 0.6964609800362976 std: 0.09301270417422869,m: 0.6781739294472741 std: 0.08498396528956798,m: 0.6828947368421052 std: 0.08026315789473681
7,CategoricalNB(alpha=7.1),Alpha = 7.1,0.696461,0.678174,0.682895,m: 0.6964609800362976 std: 0.09301270417422869,m: 0.6781739294472741 std: 0.08498396528956798,m: 0.6828947368421052 std: 0.08026315789473681
8,CategoricalNB(alpha=8.1),Alpha = 8.1,0.696461,0.678174,0.682895,m: 0.6964609800362976 std: 0.09301270417422869,m: 0.6781739294472741 std: 0.08498396528956798,m: 0.6828947368421052 std: 0.08026315789473681
9,CategoricalNB(alpha=9.1),Alpha = 9.1,0.696461,0.678174,0.682895,m: 0.6964609800362976 std: 0.09301270417422869,m: 0.6781739294472741 std: 0.08498396528956798,m: 0.6828947368421052 std: 0.08026315789473681
0,CategoricalNB(alpha=0.1),Alpha = 0.1,0.687689,0.668554,0.676316,m: 0.6876890502117362 std: 0.08424077434966726,m: 0.6685544415383124 std: 0.07536447738060642,m: 0.6763157894736842 std: 0.0736842105263158
1,CategoricalNB(alpha=1.1),Alpha = 1.1,0.687689,0.668554,0.676316,m: 0.6876890502117362 std: 0.08424077434966726,m: 0.6685544415383124 std: 0.07536447738060642,m: 0.6763157894736842 std: 0.0736842105263158
2,CategoricalNB(alpha=2.1),Alpha = 2.1,0.687689,0.668554,0.676316,m: 0.6876890502117362 std: 0.08424077434966726,m: 0.6685544415383124 std: 0.07536447738060642,m: 0.6763157894736842 std: 0.0736842105263158
3,CategoricalNB(alpha=3.1),Alpha = 3.1,0.687689,0.668554,0.676316,m: 0.6876890502117362 std: 0.08424077434966726,m: 0.6685544415383124 std: 0.07536447738060642,m: 0.6763157894736842 std: 0.0736842105263158
4,CategoricalNB(alpha=4.1),Alpha = 4.1,0.687689,0.668554,0.676316,m: 0.6876890502117362 std: 0.08424077434966726,m: 0.6685544415383124 std: 0.07536447738060642,m: 0.6763157894736842 std: 0.0736842105263158


classifier                              CategoricalNB(alpha=5.1)
arguments                                            Alpha = 5.1
mean_accuracy                                           0.696461
mean_precision                                          0.678174
mean_recall                                             0.682895
accuracy          m: 0.6964609800362976 std: 0.09301270417422869
precision         m: 0.6781739294472741 std: 0.08498396528956798
recall            m: 0.6828947368421052 std: 0.08026315789473681
Name: 5, dtype: object

### Perceptron - Congressional Vote

In [10]:
perceptron_results = calculate_perceptron(votingDataLearn[votingDataLearn.columns[2:17]],
                                          votingDataLearn[votingDataLearn.columns[1]])
overall_results_vote.extend(perceptron_results)

print_results(perceptron_results, "mean_accuracy")

'Results'

Unnamed: 0,classifier,arguments,mean_accuracy,mean_precision,mean_recall,accuracy,precision,recall
0,Perceptron(),No additional args.,0.643527,0.470458,0.551974,m: 0.6435269207501513 std: 0.005595886267392602,m: 0.47045807453416144 std: 0.14010093167701862,m: 0.5519736842105264 std: 0.06513157894736846


classifier                                           Perceptron()
arguments                                     No additional args.
mean_accuracy                                            0.643527
mean_precision                                           0.470458
mean_recall                                              0.551974
accuracy          m: 0.6435269207501513 std: 0.005595886267392602
precision         m: 0.47045807453416144 std: 0.14010093167701862
recall             m: 0.5519736842105264 std: 0.06513157894736846
Name: 0, dtype: object

### Decision Tree - Congressional Vote

In [11]:
decision_tree_results = calculate_decision_tree(votingDataLearn[votingDataLearn.columns[2:17]],
                                                votingDataLearn[votingDataLearn.columns[1]])
overall_results_vote.extend(decision_tree_results)

print_results(decision_tree_results, "mean_accuracy")

'Results'

Unnamed: 0,classifier,arguments,mean_accuracy,mean_precision,mean_recall,accuracy,precision,recall
0,"DecisionTreeClassifier(max_depth=1, min_sample...","max Depth: 1, min Samples: 2",0.69631,0.657906,0.652632,m: 0.6963097398669087 std: 0.07562008469449483,m: 0.6579059829059829 std: 0.08568376068376071,m: 0.6526315789473685 std: 0.08421052631578946
17,"DecisionTreeClassifier(max_depth=9, min_sample...","max Depth: 9, min Samples: 20",0.687689,0.679143,0.681579,m: 0.6876890502117362 std: 0.08424077434966726,m: 0.6791425420457678 std: 0.06444720154397576,m: 0.6815789473684211 std: 0.05526315789473685
5,"DecisionTreeClassifier(max_depth=3, min_sample...","max Depth: 3, min Samples: 20",0.687689,0.679143,0.681579,m: 0.6876890502117362 std: 0.08424077434966726,m: 0.6791425420457678 std: 0.06444720154397576,m: 0.6815789473684211 std: 0.05526315789473685
13,"DecisionTreeClassifier(max_depth=7, min_sample...","max Depth: 7, min Samples: 20",0.687689,0.679143,0.681579,m: 0.6876890502117362 std: 0.08424077434966726,m: 0.6791425420457678 std: 0.06444720154397576,m: 0.6815789473684211 std: 0.05526315789473685
9,"DecisionTreeClassifier(max_depth=5, min_sample...","max Depth: 5, min Samples: 20",0.687689,0.679143,0.681579,m: 0.6876890502117362 std: 0.08424077434966726,m: 0.6791425420457678 std: 0.06444720154397576,m: 0.6815789473684211 std: 0.05526315789473685
1,"DecisionTreeClassifier(max_depth=1, min_sample...","max Depth: 1, min Samples: 20",0.687689,0.679143,0.681579,m: 0.6876890502117362 std: 0.08424077434966726,m: 0.6791425420457678 std: 0.06444720154397576,m: 0.6815789473684211 std: 0.05526315789473685
4,"DecisionTreeClassifier(max_depth=3, min_sample...","max Depth: 3, min Samples: 2",0.678463,0.633221,0.619737,m: 0.6784633998790078 std: 0.023290986085904408,m: 0.633220818815331 std: 0.026077961672473837,m: 0.6197368421052631 std: 0.024999999999999967
11,"DecisionTreeClassifier(max_depth=5, min_sample...","max Depth: 5, min Samples: 100",0.66092,0.33046,0.5,m: 0.6609195402298851 std: 0.005747126436781602,m: 0.33045977011494254 std: 0.002873563218390801,m: 0.5 std: 0.0
18,"DecisionTreeClassifier(max_depth=9, min_sample...","max Depth: 9, min Samples: 50",0.66092,0.33046,0.5,m: 0.6609195402298851 std: 0.005747126436781602,m: 0.33045977011494254 std: 0.002873563218390801,m: 0.5 std: 0.0
15,"DecisionTreeClassifier(max_depth=7, min_sample...","max Depth: 7, min Samples: 100",0.66092,0.33046,0.5,m: 0.6609195402298851 std: 0.005747126436781602,m: 0.33045977011494254 std: 0.002873563218390801,m: 0.5 std: 0.0


classifier        DecisionTreeClassifier(max_depth=1, min_sample...
arguments                              max Depth: 1, min Samples: 2
mean_accuracy                                               0.69631
mean_precision                                             0.657906
mean_recall                                                0.652632
accuracy             m: 0.6963097398669087 std: 0.07562008469449483
precision            m: 0.6579059829059829 std: 0.08568376068376071
recall               m: 0.6526315789473685 std: 0.08421052631578946
Name: 0, dtype: object

### SVM - Congressional Vote

In [12]:
svm_results = calculate_svm(votingDataLearn[votingDataLearn.columns[2:17]],
                            votingDataLearn[votingDataLearn.columns[1]])
overall_results_vote.extend(svm_results)

print_results(svm_results, "mean_accuracy")

'Results'

Unnamed: 0,classifier,arguments,mean_accuracy,mean_precision,mean_recall,accuracy,precision,recall
15,"SVC(C=301, gamma=0.001, kernel='sigmoid')",Kernel: sigmoid,0.669994,0.627457,0.606579,m: 0.6699939503932244 std: 0.049304295220810646,m: 0.627457264957265 std: 0.05523504273504276,m: 0.6065789473684211 std: 0.03815789473684211
1,"SVC(C=101, gamma=0.001, kernel='linear')",Kernel: linear,0.669843,0.634183,0.624342,m: 0.6698427102238355 std: 0.031911675741076784,m: 0.6341831575906398 std: 0.02362415137945978,m: 0.6243421052631579 std: 0.007236842105263097
7,"SVC(C=301, kernel='linear')",Kernel: linear,0.669843,0.634183,0.624342,m: 0.6698427102238355 std: 0.031911675741076784,m: 0.6341831575906398 std: 0.02362415137945978,m: 0.6243421052631579 std: 0.007236842105263097
11,"SVC(C=301, gamma='auto', kernel='linear')",Kernel: linear,0.669843,0.634183,0.624342,m: 0.6698427102238355 std: 0.031911675741076784,m: 0.6341831575906398 std: 0.02362415137945978,m: 0.6243421052631579 std: 0.007236842105263097
10,"SVC(C=201, gamma='auto', kernel='linear')",Kernel: linear,0.669843,0.634183,0.624342,m: 0.6698427102238355 std: 0.031911675741076784,m: 0.6341831575906398 std: 0.02362415137945978,m: 0.6243421052631579 std: 0.007236842105263097
9,"SVC(C=101, gamma='auto', kernel='linear')",Kernel: linear,0.669843,0.634183,0.624342,m: 0.6698427102238355 std: 0.031911675741076784,m: 0.6341831575906398 std: 0.02362415137945978,m: 0.6243421052631579 std: 0.007236842105263097
22,"SVC(C=201, gamma='auto', kernel='sigmoid')",Kernel: sigmoid,0.669843,0.636476,0.6375,m: 0.6698427102238355 std: 0.031911675741076784,m: 0.6364760843021713 std: 0.025917078090991197,m: 0.6375000000000001 std: 0.020394736842105243
6,"SVC(C=201, kernel='linear')",Kernel: linear,0.669843,0.634183,0.624342,m: 0.6698427102238355 std: 0.031911675741076784,m: 0.6341831575906398 std: 0.02362415137945978,m: 0.6243421052631579 std: 0.007236842105263097
5,"SVC(C=101, kernel='linear')",Kernel: linear,0.669843,0.634183,0.624342,m: 0.6698427102238355 std: 0.031911675741076784,m: 0.6341831575906398 std: 0.02362415137945978,m: 0.6243421052631579 std: 0.007236842105263097
3,"SVC(C=301, gamma=0.001, kernel='linear')",Kernel: linear,0.669843,0.634183,0.624342,m: 0.6698427102238355 std: 0.031911675741076784,m: 0.6341831575906398 std: 0.02362415137945978,m: 0.6243421052631579 std: 0.007236842105263097


classifier              SVC(C=301, gamma=0.001, kernel='sigmoid')
arguments                                         Kernel: sigmoid
mean_accuracy                                            0.669994
mean_precision                                           0.627457
mean_recall                                              0.606579
accuracy          m: 0.6699939503932244 std: 0.049304295220810646
precision           m: 0.627457264957265 std: 0.05523504273504276
recall             m: 0.6065789473684211 std: 0.03815789473684211
Name: 15, dtype: object

### Overall Results for Congressional Vote

In [13]:
print_results(overall_results_vote, "mean_accuracy")

'Results'

Unnamed: 0,classifier,arguments,mean_accuracy,mean_precision,mean_recall,accuracy,precision,recall
18,CategoricalNB(alpha=9.1),Alpha = 9.1,0.696461,0.678174,0.682895,m: 0.6964609800362976 std: 0.09301270417422869,m: 0.6781739294472741 std: 0.08498396528956798,m: 0.6828947368421052 std: 0.08026315789473681
17,CategoricalNB(alpha=8.1),Alpha = 8.1,0.696461,0.678174,0.682895,m: 0.6964609800362976 std: 0.09301270417422869,m: 0.6781739294472741 std: 0.08498396528956798,m: 0.6828947368421052 std: 0.08026315789473681
16,CategoricalNB(alpha=7.1),Alpha = 7.1,0.696461,0.678174,0.682895,m: 0.6964609800362976 std: 0.09301270417422869,m: 0.6781739294472741 std: 0.08498396528956798,m: 0.6828947368421052 std: 0.08026315789473681
15,CategoricalNB(alpha=6.1),Alpha = 6.1,0.696461,0.678174,0.682895,m: 0.6964609800362976 std: 0.09301270417422869,m: 0.6781739294472741 std: 0.08498396528956798,m: 0.6828947368421052 std: 0.08026315789473681
14,CategoricalNB(alpha=5.1),Alpha = 5.1,0.696461,0.678174,0.682895,m: 0.6964609800362976 std: 0.09301270417422869,m: 0.6781739294472741 std: 0.08498396528956798,m: 0.6828947368421052 std: 0.08026315789473681
...,...,...,...,...,...,...,...,...
80,"SVC(C=1, kernel='poly')",Kernel: poly,0.609044,0.561792,0.552632,m: 0.6090441621294616 std: 0.04007864488808227,m: 0.5617918313570488 std: 0.013306982872200224,m: 0.5526315789473684 std: 0.0
56,"SVC(C=1, kernel='sigmoid')",Kernel: sigmoid,0.609044,0.558293,0.550658,m: 0.6090441621294616 std: 0.04007864488808227,m: 0.5582931256318353 std: 0.021754664093373732,m: 0.5506578947368421 std: 0.03750000000000003
74,"SVC(C=201, gamma='auto')",Kernel: rbf,0.591349,0.530637,0.532895,m: 0.5913490623109497 std: 0.005142165759225659,m: 0.5306372549019608 std: 0.030637254901960786,m: 0.5328947368421053 std: 0.032894736842105254
73,"SVC(C=101, gamma='auto')",Kernel: rbf,0.591349,0.530637,0.532895,m: 0.5913490623109497 std: 0.005142165759225659,m: 0.5306372549019608 std: 0.030637254901960786,m: 0.5328947368421053 std: 0.032894736842105254


classifier                              CategoricalNB(alpha=9.1)
arguments                                            Alpha = 9.1
mean_accuracy                                           0.696461
mean_precision                                          0.678174
mean_recall                                             0.682895
accuracy          m: 0.6964609800362976 std: 0.09301270417422869
precision         m: 0.6781739294472741 std: 0.08498396528956798
recall            m: 0.6828947368421052 std: 0.08026315789473681
Name: 18, dtype: object

## Amazon

In [14]:
from sklearn import preprocessing
#Read Data
amazonDataLearn = pd.read_csv("data/amazon/amazon_review_ID.shuf.lrn.csv")
amazonDataSolutionExample = pd.read_csv("data/amazon/amazon_review_ID.shuf.sol.ex.csv")
amazonDataTest = pd.read_csv("data/amazon/amazon_review_ID.shuf.tes.csv")
display("Original Data", amazonDataLearn)

#Recode values
#For One Hot Encoding of Class
#amazonDataLearn = pd.concat([amazonDataLearn, pd.get_dummies(amazonDataLearn["Class"], prefix='author_',drop_first=False)], axis=1)
#amazonDataLearn.drop(['Class'],axis=1, inplace=True)
#names_target = amazonDataLearn.loc[:, amazonDataLearn.columns.str.startswith('author_')]
#amazonDataLearn[names_target.columns] = amazonDataLearn[names_target.columns].apply(lambda x: x.astype('category'))

# For Label Encoding
le = preprocessing.LabelEncoder()
le.fit(amazonDataLearn['Class'])
amazonDataLearn['Class'] = le.transform(amazonDataLearn['Class'])
amazonDataLearn['Class'] = amazonDataLearn['Class'].astype('category')

names_data = amazonDataLearn.loc[:, amazonDataLearn.columns.str.startswith('V')]
#amazonDataLearn[0:10000] = amazonDataLearn[0:10000].apply(lambda x: x.astype('int'))

display("Recoded Data", amazonDataLearn)

amazon_data = amazonDataLearn[names_data.columns]
amazon_target = amazonDataLearn["Class"]

display("Data: ", amazon_data)
display("Target: ", amazon_target)

'Original Data'

Unnamed: 0,ID,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V9992,V9993,V9994,V9995,V9996,V9997,V9998,V9999,V10000,Class
0,0,9,5,5,9,7,0,8,7,1,...,0,1,0,1,0,0,0,0,2,Power
1,1,11,9,15,15,5,11,10,1,5,...,0,0,0,0,0,0,0,0,0,Goonan
2,2,11,10,13,12,6,5,0,3,1,...,0,0,0,0,0,0,0,1,0,Merritt
3,3,18,9,7,8,8,7,12,6,7,...,0,1,0,0,0,1,0,0,1,Goonan
4,4,11,7,10,11,4,5,1,8,4,...,0,0,0,0,0,1,0,0,3,Corn
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
745,745,5,5,8,2,8,0,5,1,2,...,1,0,0,0,0,0,0,0,0,Chachra
746,746,22,13,8,14,8,11,3,6,7,...,6,0,2,0,0,2,0,0,0,Morrison
747,747,10,3,5,5,7,1,14,2,6,...,0,0,4,1,0,0,2,0,0,Sherwin
748,748,9,13,8,5,11,9,9,3,3,...,0,0,0,1,0,0,0,0,0,Blankenship


'Recoded Data'

Unnamed: 0,ID,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V9992,V9993,V9994,V9995,V9996,V9997,V9998,V9999,V10000,Class
0,0,9,5,5,9,7,0,8,7,1,...,0,1,0,1,0,0,0,0,2,40
1,1,11,9,15,15,5,11,10,1,5,...,0,0,0,0,0,0,0,0,0,19
2,2,11,10,13,12,6,5,0,3,1,...,0,0,0,0,0,0,0,1,0,33
3,3,18,9,7,8,8,7,12,6,7,...,0,1,0,0,0,1,0,0,1,19
4,4,11,7,10,11,4,5,1,8,4,...,0,0,0,0,0,1,0,0,3,14
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
745,745,5,5,8,2,8,0,5,1,2,...,1,0,0,0,0,0,0,0,0,9
746,746,22,13,8,14,8,11,3,6,7,...,6,0,2,0,0,2,0,0,0,36
747,747,10,3,5,5,7,1,14,2,6,...,0,0,4,1,0,0,2,0,0,44
748,748,9,13,8,5,11,9,9,3,3,...,0,0,0,1,0,0,0,0,0,3


'Data: '

Unnamed: 0,V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,...,V9991,V9992,V9993,V9994,V9995,V9996,V9997,V9998,V9999,V10000
0,9,5,5,9,7,0,8,7,1,5,...,0,0,1,0,1,0,0,0,0,2
1,11,9,15,15,5,11,10,1,5,7,...,0,0,0,0,0,0,0,0,0,0
2,11,10,13,12,6,5,0,3,1,1,...,1,0,0,0,0,0,0,0,1,0
3,18,9,7,8,8,7,12,6,7,1,...,0,0,1,0,0,0,1,0,0,1
4,11,7,10,11,4,5,1,8,4,4,...,0,0,0,0,0,0,1,0,0,3
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
745,5,5,8,2,8,0,5,1,2,3,...,0,1,0,0,0,0,0,0,0,0
746,22,13,8,14,8,11,3,6,7,6,...,0,6,0,2,0,0,2,0,0,0
747,10,3,5,5,7,1,14,2,6,1,...,0,0,0,4,1,0,0,2,0,0
748,9,13,8,5,11,9,9,3,3,6,...,0,0,0,0,1,0,0,0,0,0


'Target: '

0      40
1      19
2      33
3      19
4      14
       ..
745     9
746    36
747    44
748     3
749    16
Name: Class, Length: 750, dtype: category
Categories (50, int64): [0, 1, 2, 3, ..., 46, 47, 48, 49]

### k-NN Calculation - Amazon

In [15]:
knn_results_amazon = calculate_knn(amazon_data,
                                   amazon_target)
overall_results_amazon.extend(knn_results)

print_results(knn_results_amazon, "mean_accuracy")

'Results'

Unnamed: 0,classifier,arguments,mean_accuracy,mean_precision,mean_recall,accuracy,precision,recall
0,KNeighborsClassifier(n_neighbors=1),N = 1,0.189333,0.221973,0.186298,m: 0.18933333333333335 std: 0.007999999999999993,m: 0.22197284432114153 std: 0.029111542224545345,m: 0.18629761904761905 std: 0.006607142857142867
8,KNeighborsClassifier(n_neighbors=9),N = 9,0.181333,0.257977,0.177044,m: 0.18133333333333335 std: 0.016,m: 0.25797659554105573 std: 0.023272033195189043,m: 0.17704365079365078 std: 0.01236904761904764
7,KNeighborsClassifier(n_neighbors=8),N = 8,0.165333,0.224061,0.161373,m: 0.16533333333333333 std: 0.010666666666666658,m: 0.22406133544227183 std: 0.02753761882072911,m: 0.16137301587301586 std: 0.010023809523809518
6,KNeighborsClassifier(n_neighbors=7),N = 7,0.16,0.215498,0.154357,m: 0.15999999999999998 std: 0.0026666666666666644,m: 0.21549759615252975 std: 0.03237130292225168,m: 0.15435714285714286 std: 0.003880952380952374
1,KNeighborsClassifier(n_neighbors=2),N = 2,0.158667,0.186586,0.153024,m: 0.15866666666666668 std: 0.0013333333333333391,m: 0.18658579634160127 std: 0.016517866335590783,m: 0.1530238095238095 std: 0.001944444444444443
2,KNeighborsClassifier(n_neighbors=3),N = 3,0.158667,0.20649,0.153956,m: 0.15866666666666668 std: 0.009333333333333332,m: 0.20649049339482103 std: 0.038039835924981974,m: 0.1539563492063492 std: 0.007710317460317459
3,KNeighborsClassifier(n_neighbors=4),N = 4,0.157333,0.21934,0.15344,m: 0.15733333333333333 std: 0.0026666666666666644,m: 0.21934005405400625 std: 0.01291024615980206,m: 0.15344047619047618 std: 0.002861111111111106
4,KNeighborsClassifier(),N = 5,0.152,0.214569,0.147246,m: 0.152 std: 0.005333333333333329,m: 0.2145693817754817 std: 0.023454336916958585,m: 0.14724603174603174 std: 0.006063492063492076
5,KNeighborsClassifier(n_neighbors=6),N = 6,0.149333,0.203144,0.143909,m: 0.14933333333333332 std: 0.0026666666666666644,m: 0.20314402311534732 std: 0.015384498598469881,m: 0.14390873015873012 std: 0.0014563492063491973


classifier                     KNeighborsClassifier(n_neighbors=1)
arguments                                                    N = 1
mean_accuracy                                             0.189333
mean_precision                                            0.221973
mean_recall                                               0.186298
accuracy          m: 0.18933333333333335 std: 0.007999999999999993
precision         m: 0.22197284432114153 std: 0.029111542224545345
recall            m: 0.18629761904761905 std: 0.006607142857142867
Name: 0, dtype: object

### Perceptron - Amazon

In [16]:
perceptron_results = calculate_perceptron(amazon_data,
                                          amazon_target)
overall_results_amazon.extend(perceptron_results)

print_results(perceptron_results, "mean_accuracy")

'Results'

Unnamed: 0,classifier,arguments,mean_accuracy,mean_precision,mean_recall,accuracy,precision,recall
0,Perceptron(),No additional args.,0.28,0.400806,0.27125,m: 0.28 std: 0.0026666666666666783,m: 0.4008061314854301 std: 0.05513438765878295,m: 0.27125 std: 0.0009484126984126984


classifier                                          Perceptron()
arguments                                    No additional args.
mean_accuracy                                               0.28
mean_precision                                          0.400806
mean_recall                                              0.27125
accuracy                      m: 0.28 std: 0.0026666666666666783
precision         m: 0.4008061314854301 std: 0.05513438765878295
recall                     m: 0.27125 std: 0.0009484126984126984
Name: 0, dtype: object

### Decision Tree - Amazon

In [17]:
decision_tree_results = calculate_decision_tree(amazon_data,
                                                amazon_target)
overall_results_amazon.extend(decision_tree_results)

print_results(decision_tree_results, "mean_accuracy")

'Results'

Unnamed: 0,classifier,arguments,mean_accuracy,mean_precision,mean_recall,accuracy,precision,recall
16,"DecisionTreeClassifier(max_depth=9, min_sample...","max Depth: 9, min Samples: 2",0.182667,0.173033,0.171095,m: 0.18266666666666664 std: 0.020000000000000004,m: 0.17303269855486603 std: 0.0013573241208709175,m: 0.17109523809523808 std: 0.018515873015873013
12,"DecisionTreeClassifier(max_depth=7, min_sample...","max Depth: 7, min Samples: 2",0.157333,0.123086,0.14175,m: 0.15733333333333333 std: 0.013333333333333336,m: 0.12308596857925298 std: 0.008915851256961746,m: 0.14175 std: 0.012027777777777776
13,"DecisionTreeClassifier(max_depth=7, min_sample...","max Depth: 7, min Samples: 20",0.152,0.046192,0.144262,m: 0.15200000000000002 std: 0.0026666666666666644,m: 0.04619178963450134 std: 0.0014119608665442637,m: 0.14426190476190476 std: 0.009222222222222229
17,"DecisionTreeClassifier(max_depth=9, min_sample...","max Depth: 9, min Samples: 20",0.152,0.046189,0.144262,m: 0.15200000000000002 std: 0.0026666666666666644,m: 0.04618850998565943 std: 0.0014053971454527706,m: 0.14426190476190476 std: 0.009222222222222229
9,"DecisionTreeClassifier(max_depth=5, min_sample...","max Depth: 5, min Samples: 20",0.137333,0.035608,0.130833,m: 0.13733333333333334 std: 0.0040000000000000036,m: 0.035607661229260595 std: 0.000755740552218...,m: 0.13083333333333336 std: 0.003761904761904758
8,"DecisionTreeClassifier(max_depth=5, min_sample...","max Depth: 5, min Samples: 2",0.130667,0.098178,0.117639,m: 0.13066666666666665 std: 0.0026666666666666644,m: 0.09817781104457751 std: 0.0023611111111111055,m: 0.11763888888888889 std: 0.0001388888888888...
5,"DecisionTreeClassifier(max_depth=3, min_sample...","max Depth: 3, min Samples: 20",0.1,0.019961,0.09002,m: 0.1 std: 0.012000000000000004,m: 0.01996109625485688 std: 0.004868303104232405,m: 0.09001984126984128 std: 0.015130952380952384
10,"DecisionTreeClassifier(max_depth=5, min_sample...","max Depth: 5, min Samples: 50",0.094667,0.01087,0.083333,m: 0.09466666666666668 std: 0.0013333333333333322,m: 0.01087032257422267 std: 0.0009068813991872909,m: 0.08333333333333333 std: 1.3877787807814457...
18,"DecisionTreeClassifier(max_depth=9, min_sample...","max Depth: 9, min Samples: 50",0.094667,0.01087,0.083333,m: 0.09466666666666668 std: 0.0013333333333333322,m: 0.01087032257422267 std: 0.0009068813991872909,m: 0.08333333333333333 std: 1.3877787807814457...
14,"DecisionTreeClassifier(max_depth=7, min_sample...","max Depth: 7, min Samples: 50",0.094667,0.01087,0.083333,m: 0.09466666666666668 std: 0.0013333333333333322,m: 0.01087032257422267 std: 0.0009068813991872909,m: 0.08333333333333333 std: 1.3877787807814457...


classifier        DecisionTreeClassifier(max_depth=9, min_sample...
arguments                              max Depth: 9, min Samples: 2
mean_accuracy                                              0.182667
mean_precision                                             0.173033
mean_recall                                                0.171095
accuracy           m: 0.18266666666666664 std: 0.020000000000000004
precision         m: 0.17303269855486603 std: 0.0013573241208709175
recall             m: 0.17109523809523808 std: 0.018515873015873013
Name: 16, dtype: object

### SVM - Amazon

In [18]:
svm_results = calculate_svm(amazon_data,
                            amazon_target)
overall_results_amazon.extend(svm_results)

print_results(svm_results, "mean_accuracy")

'Results'

Unnamed: 0,classifier,arguments,mean_accuracy,mean_precision,mean_recall,accuracy,precision,recall
29,SVC(C=101),Kernel: rbf,0.361333,0.394952,0.346849,m: 0.36133333333333334 std: 0.0040000000000000036,m: 0.3949521508034598 std: 0.02524727704767077,m: 0.34684920634920635 std: 0.004063492063492019
30,SVC(C=201),Kernel: rbf,0.361333,0.394952,0.346849,m: 0.36133333333333334 std: 0.0040000000000000036,m: 0.3949521508034598 std: 0.02524727704767077,m: 0.34684920634920635 std: 0.004063492063492019
31,SVC(C=301),Kernel: rbf,0.361333,0.394952,0.346849,m: 0.36133333333333334 std: 0.0040000000000000036,m: 0.3949521508034598 std: 0.02524727704767077,m: 0.34684920634920635 std: 0.004063492063492019
8,"SVC(C=1, gamma='auto', kernel='linear')",Kernel: linear,0.36,0.40497,0.350056,m: 0.36 std: 0.005333333333333329,m: 0.4049701959976727 std: 0.017853924326447557,m: 0.35005555555555556 std: 0.00567460317460311
1,"SVC(C=101, gamma=0.001, kernel='linear')",Kernel: linear,0.36,0.40497,0.350056,m: 0.36 std: 0.005333333333333329,m: 0.4049701959976727 std: 0.017853924326447557,m: 0.35005555555555556 std: 0.00567460317460311
10,"SVC(C=201, gamma='auto', kernel='linear')",Kernel: linear,0.36,0.40497,0.350056,m: 0.36 std: 0.005333333333333329,m: 0.4049701959976727 std: 0.017853924326447557,m: 0.35005555555555556 std: 0.00567460317460311
9,"SVC(C=101, gamma='auto', kernel='linear')",Kernel: linear,0.36,0.40497,0.350056,m: 0.36 std: 0.005333333333333329,m: 0.4049701959976727 std: 0.017853924326447557,m: 0.35005555555555556 std: 0.00567460317460311
0,"SVC(C=1, gamma=0.001, kernel='linear')",Kernel: linear,0.36,0.40497,0.350056,m: 0.36 std: 0.005333333333333329,m: 0.4049701959976727 std: 0.017853924326447557,m: 0.35005555555555556 std: 0.00567460317460311
7,"SVC(C=301, kernel='linear')",Kernel: linear,0.36,0.40497,0.350056,m: 0.36 std: 0.005333333333333329,m: 0.4049701959976727 std: 0.017853924326447557,m: 0.35005555555555556 std: 0.00567460317460311
6,"SVC(C=201, kernel='linear')",Kernel: linear,0.36,0.40497,0.350056,m: 0.36 std: 0.005333333333333329,m: 0.4049701959976727 std: 0.017853924326447557,m: 0.35005555555555556 std: 0.00567460317460311


classifier                                               SVC(C=101)
arguments                                               Kernel: rbf
mean_accuracy                                              0.361333
mean_precision                                             0.394952
mean_recall                                                0.346849
accuracy          m: 0.36133333333333334 std: 0.0040000000000000036
precision            m: 0.3949521508034598 std: 0.02524727704767077
recall             m: 0.34684920634920635 std: 0.004063492063492019
Name: 29, dtype: object

### Overall Results for Amazon

In [19]:
print_results(overall_results_amazon, "mean_accuracy")

'Results'

Unnamed: 0,classifier,arguments,mean_accuracy,mean_precision,mean_recall,accuracy,precision,recall
7,KNeighborsClassifier(n_neighbors=8),N = 8,0.670145,0.668046,0.586842,m: 0.6701451905626135 std: 0.06669691470054445,m: 0.6680459633246011 std: 0.11136580138128127,m: 0.5868421052631579 std: 0.03157894736842104
1,KNeighborsClassifier(n_neighbors=2),N = 2,0.669691,0.636876,0.555263,m: 0.6696914700544465 std: 0.014519056261343033,m: 0.6368760064412238 std: 0.0390499194847021,m: 0.5552631578947369 std: 0.01578947368421052
6,KNeighborsClassifier(n_neighbors=7),N = 7,0.661525,0.647057,0.592763,m: 0.661524500907441 std: 0.07531760435571688,m: 0.6470573077715935 std: 0.09528963100391669,m: 0.5927631578947368 std: 0.03881578947368419
8,KNeighborsClassifier(n_neighbors=9),N = 9,0.653055,0.662153,0.586184,m: 0.6530550514216575 std: 0.1013309134906231,m: 0.6621533613445378 std: 0.1364180672268907,m: 0.5861842105263158 std: 0.058552631578947356
2,KNeighborsClassifier(n_neighbors=3),N = 3,0.652601,0.602840,0.580921,m: 0.6526013309134906 std: 0.049153055051421646,m: 0.602839550454328 std: 0.05549378287900525,m: 0.5809210526315789 std: 0.03750000000000003
...,...,...,...,...,...,...,...,...
46,"SVC(C=1, kernel='sigmoid')",Kernel: sigmoid,0.024000,0.003471,0.019000,m: 0.024 std: 0.008,m: 0.003471282669683687 std: 0.002594089687227...,m: 0.019 std: 0.006333333333333333
53,"SVC(C=301, gamma='auto', kernel='sigmoid')",Kernel: sigmoid,0.012000,0.003046,0.009556,m: 0.012 std: 0.001333333333333334,m: 0.0030464919614317616 std: 0.00251030853654...,m: 0.009555555555555555 std: 0.000888888888888...
52,"SVC(C=201, gamma='auto', kernel='sigmoid')",Kernel: sigmoid,0.012000,0.003990,0.009556,m: 0.012 std: 0.001333333333333334,m: 0.003990187892690115 std: 0.003454004467803...,m: 0.009555555555555555 std: 0.000888888888888...
51,"SVC(C=101, gamma='auto', kernel='sigmoid')",Kernel: sigmoid,0.010667,0.001803,0.008444,m: 0.010666666666666668 std: 0.002666666666666667,m: 0.0018032835494248695 std: 0.00127234121468...,m: 0.008444444444444444 std: 0.002


classifier                   KNeighborsClassifier(n_neighbors=8)
arguments                                                  N = 8
mean_accuracy                                           0.670145
mean_precision                                          0.668046
mean_recall                                             0.586842
accuracy          m: 0.6701451905626135 std: 0.06669691470054445
precision         m: 0.6680459633246011 std: 0.11136580138128127
recall            m: 0.5868421052631579 std: 0.03157894736842104
Name: 7, dtype: object