# Comparison of Classifiers 

In this notebook, we use the `statsmodels` implementation of McNemar's test. This statistical test assesses whether two classifiers perform equally well on the same test set.
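For reference, the exact form of McNemar's test reduces to a two-sided binomial test on the discordant pairs, i.e. the paired observations on which the two samples disagree. The sketch below computes that p-value in plain Python. The function name `exact_mcnemar_pvalue` and the arguments `b` and `c` (the two discordant counts) are illustrative names, not part of `statsmodels`, and the example counts are made up.

```python
from math import comb

def exact_mcnemar_pvalue(b, c):
    """Two-sided exact p-value from the two discordant counts b and c."""
    n = b + c          # total number of discordant pairs
    k = min(b, c)      # the smaller of the two discordant counts
    # Lower tail P(X <= k) for X ~ Binomial(n, 0.5)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double the tail for a two-sided test and cap the result at 1
    return min(1.0, 2 * tail)

# Made-up counts: nearly balanced disagreement vs. lopsided disagreement
print(exact_mcnemar_pvalue(1, 2))
print(exact_mcnemar_pvalue(2, 12))
```

Only the discordant pairs enter the computation; pairs on which both samples agree carry no information about a difference.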

First, let's load the Breast Cancer dataset. We will build two random forests, with 50 and 51 estimators, expecting no real difference in their performance.

In [14]:
import sklearn
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf
from sklearn.datasets import load_iris, load_breast_cancer, load_digits
from sklearn.model_selection import train_test_split

dataset = load_breast_cancer()

data = dataset.data
target = dataset.target

Split the dataset into training and test sets, then build the classifiers and print their accuracy on the test set.

In [15]:
from sklearn.ensemble import RandomForestClassifier

X_train, X_test, Y_train, Y_test = train_test_split(data, target, test_size=0.2)

clf_A = RandomForestClassifier(n_estimators=50)
clf_B = RandomForestClassifier(n_estimators=51)

clf_A.fit(X_train, Y_train);
clf_B.fit(X_train, Y_train);

print(clf_A.score(X_test, Y_test))
print(clf_B.score(X_test, Y_test))

0.956140350877
0.947368421053


The difference here is small, so we expect McNemar's test not to reject the null hypothesis that the two classifiers perform equally well (i.e. have the same error rate).

In [16]:
from statsmodels.sandbox.stats.runs import mcnemar

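# The paired samples are the two prediction vectors over the same test set.
# exact=True uses the binomial distribution instead of the chi-square approximation.
stats, pval = mcnemar(clf_A.predict(X_test), clf_B.predict(X_test), exact=True)

print("P-Value : {}".format(pval))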

P-Value : 1.0


The p-value is well above 0.05, so we do not reject $H_0$.
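When the null hypothesis is stated in terms of error rates, McNemar's test is commonly set up on a 2x2 table that cross-tabulates whether each classifier got each test sample right. The following is a minimal sketch of building that table in plain Python. The helper `correctness_table` and the toy label lists are hypothetical, made up for illustration, and are not part of the cells above.

```python
def correctness_table(y_true, pred_a, pred_b):
    """Cross-tabulate the per-sample correctness of two classifiers.

    Returns [[both correct,   only A correct],
             [only B correct, both wrong    ]].
    """
    table = [[0, 0], [0, 0]]
    for truth, a, b in zip(y_true, pred_a, pred_b):
        row = 0 if a == truth else 1  # row 0: A correct, row 1: A wrong
        col = 0 if b == truth else 1  # col 0: B correct, col 1: B wrong
        table[row][col] += 1
    return table

# Toy labels and predictions (made up for illustration)
y_true = [1, 1, 0, 0, 1, 0]
pred_a = [1, 1, 0, 1, 1, 0]
pred_b = [1, 0, 0, 1, 0, 0]

table = correctness_table(y_true, pred_a, pred_b)
print(table)
```

The off-diagonal cells `table[0][1]` and `table[1][0]` are the discordant pairs, and they are the only counts the test statistic depends on.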

# Testing for different classifiers

In this case, we expect to see a difference between a RandomForest and a Naive Bayes classifier.

In [20]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import BernoulliNB

X_train, X_test, Y_train, Y_test = train_test_split(data, target, test_size=0.2)

clf_A = RandomForestClassifier(n_estimators=50)
clf_B = BernoulliNB()

clf_A.fit(X_train, Y_train);
clf_B.fit(X_train, Y_train);

print(clf_A.score(X_test, Y_test))
print(clf_B.score(X_test, Y_test))

0.956140350877
0.649122807018


In [21]:
from statsmodels.sandbox.stats.runs import mcnemar

stats, pval = mcnemar(clf_A.predict(X_test), clf_B.predict(X_test), exact=True)

print("P-Value : {}".format(pval))

P-Value : 2.27373675443e-13


Here the p-value is far below 0.05, so we reject $H_0$: there is strong evidence that the two classifiers do not perform equally well.
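As a sanity check on the magnitude, the reported value equals $2^{-42}$. For the exact test with discordant counts $b$ and $c$ and $n = b + c$, the two-sided p-value is

$$p = \min\left(1,\; 2 \sum_{i=0}^{\min(b,c)} \binom{n}{i} \, 2^{-n}\right)$$

so a result of this size is what one would get if, say, all $n = 43$ discordant pairs fell in the same cell ($b = 0$, $c = 43$):

$$p = 2 \cdot \binom{43}{0} \cdot 2^{-43} = 2^{-42} \approx 2.27 \times 10^{-13}$$

These counts are an assumption inferred from the printed p-value; the notebook does not display the contingency table itself.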