### Pitchfork Content Sandbox

This project uses a McNemar statistic to compare two classification algorithms, a naive bayes classifier and a support vector machine, on a binomial document classification task using a common corpus of music reviews.

In [2]:
# import libraries
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer, TfidfTransformer
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline

import matplotlib.pyplot as plt
import psycopg2
from statsmodels.stats.contingency_tables import mcnemar

np.set_printoptions(threshold=np.inf)
pd.option_context('display.max_colwidth', -1)
pd.options.display.max_rows = 1000
pd.options.display.max_seq_items = 5000

#### Import, explore and initially pre-process data

In [None]:
# create connection to postgres database
conn = psycopg2.connect("dbname=pitchfork_reviews")
cur = conn.cursor()

# query database
cur.execute("""
SELECT genres.genre, content.reviewid, content.content 
FROM content
INNER JOIN genres on content.reviewid = genres.reviewid;
""")

# cast to dataframe
df = pd.DataFrame(cur.fetchall())
df.columns = [i[0] for i in cur.description]

df.head(5), df.info()

# drop ~20K rows where nulls in genre columns
df = df.dropna(how='any')
df.info()

# create new column that collapses 8 non-rock genres into a single 'not_rock' category
df_2 = df['genre'].replace(['electronic', 'experimental', 'folk/country', 'global', 'jazz',
        'metal', 'pop/r&b', 'rap'], 'not_rock')

df['genre_dichot'] = df_2

df['genre_dichot'].value_counts()

# separate datasets into feature values and feature labels, respectively
data = df['content'].astype(str)
data.head(5)

df_genre = pd.DataFrame(df['genre_dichot'])
df_genre.info()

feature_names = df['genre_dichot'].astype(str)
feature_names[:5]

# converts label strings into numeric values, 0 and 1
label_encoder = LabelEncoder()
feature_names_arr = label_encoder.fit_transform(feature_names)
feature_names_arr.shape

np.vstack((feature_names_arr[:10], feature_names[:10]))

# two classes
label_encoder.classes_, len(label_encoder.classes_)

#### Partition Data

In [29]:
# split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(data, feature_names_arr, test_size=0.30, random_state=3)

len(X_train), len(X_test), len(y_train), len(y_test)

#### Vectorize data
This section turns a collection of Pitchfork music reviews into numerical feature vectors, including tokenization, counting and normalization. Bag of Words-tf-idf representation where reviews are described by word occurrences while completely ignoring the relative position information of the words in the document.

In [34]:
# converts words into vocabulary of 1000 cleaned word "tokens" represented in a sparse matrix 
#TODO (Lee) - count vect vs tfidf, ? for every document #TODO (Lee) - count vect vs tfidf
count_vect = CountVectorizer(max_features=1000, stop_words='english')

X_train_counts = count_vect.fit_transform(X_train)

X_train_counts.shape

(14223, 1000)

In [88]:
# count_vect.vocabulary_
# after fitting, the vectorizer, "count_vect" has built a dictionary of feature indices
# The index value of a word in the vocabulary is linked to its frequency in the whole training corpus.

TypeError: unhashable type: 'slice'

#### Convert to term frequency inverse document frequency (TF-IDF) using `TfidfTransformer`

In [45]:
# computes TF-IDF using TfidfTransformer
# TF-IDF downscale weights for words that occur in many documents in the corpus and are 
# therefore less informative than those that occur only in a smaller portion of the corpus.
tf_transformer = TfidfTransformer(use_idf=True).fit(X_train_counts) # fits/learns idf vector (global term weights)

tf_transformer

X_train_tfidf = tf_transformer.transform(X_train_counts) # transform count matrix to a tf-idf representation

X_train_tfidf.shape, len(y_train)

### Train naive bayes classifier to predict genres on test music reviews

In [49]:
# train Naive Bayes Classifier on training features (X_train_tfidf) and training targets (y_train)

# model = MultinomialNB()
# model.fit(X_train_tf, y_train)

In [50]:
# vectorizes X_test set, similar to above for train set EXCEPT call transform, NOT fit_transform, since fit on train set
# since they have already been fit to the training set:

# X_test_counts = count_vect.transform(X_test)

In [51]:
# X_test_counts

In [52]:
# X_test_tfidf = tf_transformer.transform(X_test_counts)

In [53]:
tf_vect = TfidfVectorizer()
tf_vect.fit(X_train)
X_train_tf = tf_vect.transform(X_train)

In [54]:
# TODO (Lee) - note here the X_train_tf convention that I think was from a Miles implementation
model = MultinomialNB()
model.fit(X_train_tf, y_train)

MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True)

In [55]:
X_test_tfidf = tf_vect.transform(X_test)

In [56]:
# predicted = model.predict(X_test_tfidf)
# X_new_counts = count_vect.transform(X_test)
# X_new_tfidf = TfidfTransformer.transform(X_test)

In [57]:
# when calling len() on sparse matrix - TypeError: sparse matrix length is ambiguous; use getnnz() or shape[0]
# inspect shapes - .shape[0] is for sparse matrices, and getnnz gets the count of explicitly-stored values (nonzeros)
X_test_tfidf.shape[0], X_test_tfidf.getnnz() 

(6096, 2261895)

In [58]:
# inspect shapes
X_test.shape, X_test_tfidf.shape[0]

((6096,), 6096)

In [59]:
preds = model.predict(X_test_tfidf)
probas = model.predict_proba(X_test_tfidf)

In [60]:
from collections import Counter
Counter(preds)

Counter({0: 4150, 1: 1946})

In [61]:
Counter(y_test)

Counter({0: 3282, 1: 2814})

In [62]:
# (TODO) Lee - this function not functioning
Counter(probas)

TypeError: unhashable type: 'numpy.ndarray'

In [63]:
# command to look up
np.vstack((y_test[:20], preds[:20]))

array([[0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0],
       [0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0]])

In [64]:
for doc, category in zip(X_test[:20], preds[:20]):
    print('%r => %s' % (doc[:20], category))

'The Legendary Pink D' => 0
'Can you resist a gir' => 0
'As Generationals,\xa0Ne' => 0
'Few electronic artis' => 0
'"We tried hard to so' => 1
'Nothing about Ezra F' => 0
"Childhood isn't kids" => 1
'One of the quietly b' => 0
'Pairing R&B chops wi' => 0
"If there's one compl" => 0
'Who will finally put' => 0
'Like his contemporar' => 0
'Earlier this year, t' => 0
'For those who think ' => 1
"Last year's self-tit" => 0
'"Headphones on. The ' => 0
"It's probably asking" => 1
'In their half-decade' => 1
'The Society of Rocke' => 1
'As a co-founder of B' => 0


#### Evaluation Metrics for Naive Bayes Classifier

In [65]:
np.mean(preds == y_test)

0.6981627296587927

In [66]:
# text_clf = Pipeline([
#    ('vect', CountVectorizer()),
#    ('tfidf', TfidfTransformer()),
#    ('clf', MultinomialNB()),
# ])

In [67]:
# twenty_test = fetch_20newsgroups(subset='test',
#    categories=categories, shuffle=True, random_state=42)
# docs_test = twenty_test.data
# predicted = text_clf.predict(docs_test)
# np.mean(predicted == twenty_test.target)            


#### linear support vector machine (SVM)

In [68]:
# text_clf = Pipeline([
#   ('vect', CountVectorizer()),
#   ('tfidf', TfidfTransformer()),
#   ('clf', SGDClassifier(loss='hinge', penalty='l2',
#                         alpha=1e-3, random_state=42,
#                         max_iter=5, tol=None)),
# ])

# text_clf.fit(X_train, y_train)  

# predicted_svm = text_clf.predict(X_test)
# np.mean(predicted_svm == y_test)          

In [69]:
# tokenize and vectorize per sklearn workflow
text_clf = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', SGDClassifier(loss='hinge', penalty='l2',
                          alpha=1e-3, random_state=42,
                          max_iter=5, tol=None)),
])

In [70]:
model_svm = SGDClassifier(loss='hinge', penalty='l2',
                          alpha=1e-3, random_state=42,
                          max_iter=5, tol=None)

model_svm.fit(X_train_tf, y_train)

preds_svm = model_svm.predict(X_test_tfidf)
# probas_svm = model_svm.predict_proba(X_test_tfidf) # TODO (Lee) - issue with probas

In [71]:
np.mean(preds_svm == y_test)

0.7129265091863517

#### McNemar test

Instance,	Classifier1 Correct,	Classifier2 Correct
1			Yes						No
2			No						No
3			No						Yes
4			No						No
5			Yes						Yes
6			Yes						Yes
7			Yes						Yes
8			No						No
9			Yes						No
10			Yes						Yes

In [73]:
len(preds), len(y_test)

(6096, 6096)

In [74]:
cont_table = np.vstack((y_test, preds, preds_svm)).T
cont_table[:5]

array([[0, 0, 0],
       [0, 0, 0],
       [0, 0, 1],
       [0, 0, 0],
       [1, 1, 1]])

In [75]:
cont_table[0]

array([0, 0, 0])

In [76]:
cont_table[0][0], cont_table[0][1], cont_table[0][2]

(0, 0, 0)

In [77]:
cont_table[0,0], cont_table[0,1], cont_table[0,2]

(0, 0, 0)

- both models predict correctly
- both models predict incorrectly 
- nb predicts correctly & svm predicts incorrectly
- nb predicts incorrectly & svm predicts correctly

In [79]:
# el idx 0 = both correct, el idx 1 = both incorrect
# el idx 2 = nbcorrect, svm incorrect, el idx 3 = svm correct, nb incorrect
def process_row(row):
    if row[0] == row[1] and row[0] == row[2]: # 
        result = [1,0,0,0]
    
    elif row[0] == row[1]:
        result = [0,1,0,0]
        
    elif row[0] == row[2]:
        result = [0,0,1,0]
        
    else:
        result = [0,0,0,1]
    
    return np.array(result)

In [80]:
def process_ndarray(array):
    result = sum([process_row(row) for row in array])
    return np.array([[result[0], result[2]], [result[1], result[3]]])

In [81]:
contingency_table = process_ndarray(cont_table)

In [82]:
# calculate mcnemar test
result = mcnemar(contingency_table, exact=True)

In [83]:
# summarize the finding
print('statistic=%.3f, p-value=%.3f' % (result.statistic, result.pvalue))

statistic=615.000, p-value=0.014


In [181]:
# interpret the p-value
alpha = 0.05

In [182]:
if result.pvalue > alpha:
	print('Same proportions of errors (fail to reject H0)')
else:
	print('Different proportions of errors (reject H0)')

Different proportions of errors (reject H0)


In [None]:
statistic = (Yes/No - No/Yes)^2 / (Yes/No + No/Yes)

In [165]:
(57 - 664)**2 / (57 + 664)

511.0249653259362

Where Yes/No is the count of test instances that Classifier1 got correct and Classifier2 got incorrect, and No/Yes is the count of test instances that Classifier1 got incorrect and Classifier2 got correct.

In [None]:
np.where()

In [None]:
(np.where(a>0.5,1,0)) 

In [None]:
for i in cont_table:
    if 

In [156]:
cont_table[cont_table>3]

array([8, 8, 6, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 7, 8, 7, 8, 8,
       8, 8, 8, 8, 8, 8, 8, 7, 8, 8, 8, 8, 6, 8, 7, 8, 8, 5, 8, 8, 8, 8,
       8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 7, 8, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8,
       8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 7,
       8, 7, 8, 8, 8, 8, 8, 8, 8, 8, 7, 8, 7, 7, 8, 7, 7, 8, 7, 8, 8, 8,
       8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
       8, 6, 8, 7, 8, 8, 8, 7, 8, 7, 8, 8, 8, 8, 8, 8, 8, 8, 4, 8, 8, 8,
       8, 8, 8, 8, 8, 5, 8, 8, 5, 8, 7, 8, 8, 8, 8, 4, 8, 7, 8, 8, 8, 8,
       8, 8, 8, 4, 8, 7, 8, 8, 8, 8, 8, 8, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8,
       8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
       8, 8, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 6, 8, 8, 8, 8, 8, 8, 8, 8, 8,
       8, 5, 8, 8, 8, 8, 4, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
       8, 8, 8, 4, 8, 8, 6, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
       8, 8, 8, 6, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,

In [198]:
cont_table.shape

(6096, 3)

In [216]:
cont_data = {'label': list(cont_table[:, 0]), 
        'nb_pred': list(cont_table[:,1]), 
        'svm_pred': list(cont_table[:, 2])}

In [143]:
df_conts = pd.DataFrame(cont_table).astype(int)

In [144]:
df_conts.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6096 entries, 0 to 6095
Data columns (total 3 columns):
0    6096 non-null int64
1    6096 non-null int64
2    6096 non-null int64
dtypes: int64(3)
memory usage: 143.0 KB


In [145]:
df_conts
df_conts.columns = ['label', 'nb_pred', 'svm_pred']
df_conts.head(5)

Unnamed: 0,label,nb_pred,svm_pred
0,0,8,8
1,6,8,8
2,0,8,8
3,0,8,0
4,8,8,8


In [146]:
df_conts['nb_bool'] = np.where(df_conts['label'] == df_conts['nb_pred'], 'True', 'False')

In [147]:
df_conts['svm_bool'] = np.where(df_conts['label'] == df_conts['svm_pred'], 'True', 'False')
df_conts.head()

Unnamed: 0,label,nb_pred,svm_pred,nb_bool,svm_bool
0,0,8,8,False,False
1,6,8,8,False,False
2,0,8,8,False,False
3,0,8,0,False,True
4,8,8,8,True,True


In [148]:
df_conts.rename(columns={"nb_pred_bool": "nb_bool"})

Unnamed: 0,label,nb_pred,svm_pred,nb_bool,svm_bool
0,0,8,8,False,False
1,6,8,8,False,False
2,0,8,8,False,False
3,0,8,0,False,True
4,8,8,8,True,True
5,8,8,8,True,True
6,8,8,8,True,True
7,7,8,7,False,True
8,8,8,8,True,True
9,3,8,8,False,False


In [149]:
df_conts['nb_bool'] = df_conts['nb_bool'].astype('bool')

In [150]:
df_conts['svm_bool'] = df_conts['svm_bool'].astype('bool')

In [151]:
# df_conts['yes_yes'] = df_conts[['nb_bool','svm_bool']].sum(axis=1) == 2
df_conts

Unnamed: 0,label,nb_pred,svm_pred,nb_bool,svm_bool
0,0,8,8,True,True
1,6,8,8,True,True
2,0,8,8,True,True
3,0,8,0,True,True
4,8,8,8,True,True
5,8,8,8,True,True
6,8,8,8,True,True
7,7,8,7,True,True
8,8,8,8,True,True
9,3,8,8,True,True


In [137]:
df_conts.head(5)
df_conts.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6096 entries, 0 to 6095
Data columns (total 5 columns):
label       6096 non-null int64
nb_pred     6096 non-null int64
svm_pred    6096 non-null int64
nb_bool     6096 non-null object
svm_bool    6096 non-null object
dtypes: int64(3), object(2)
memory usage: 238.2+ KB


In [None]:
df['Age'] = df['Age'].astype(str)

In [128]:
# df_conts['new'] = df_conts.sum(axis=1).where(df_conts['nb_bool'] == df_conts['svm_bool'])

Unnamed: 0,label,nb_pred,svm_pred,nb_bool,svm_bool,new
0,0,8,8,False,False,48.0
1,6,8,8,False,False,66.0
2,0,8,8,False,False,48.0
3,0,8,0,False,True,
4,8,8,8,True,True,72.0
5,8,8,8,True,True,72.0
6,8,8,8,True,True,72.0
7,7,8,7,False,True,
8,8,8,8,True,True,72.0
9,3,8,8,False,False,57.0


In [120]:
df_conts['true_true'] = np.where(df_conts['label'].bool(True) == df_conts['nb_pred'].bool(True), 'True', 'False')

TypeError: bool() takes 1 positional argument but 2 were given

In [217]:
cont_data

{'label': [0,
  6,
  0,
  0,
  8,
  8,
  8,
  7,
  8,
  3,
  8,
  0,
  7,
  2,
  6,
  3,
  5,
  0,
  8,
  0,
  0,
  0,
  0,
  7,
  1,
  8,
  8,
  8,
  0,
  1,
  8,
  0,
  8,
  8,
  0,
  8,
  7,
  0,
  8,
  8,
  7,
  7,
  7,
  8,
  0,
  8,
  0,
  8,
  1,
  0,
  8,
  0,
  8,
  8,
  6,
  8,
  7,
  2,
  8,
  8,
  4,
  8,
  8,
  5,
  5,
  0,
  0,
  4,
  8,
  0,
  1,
  4,
  0,
  0,
  8,
  7,
  8,
  8,
  8,
  0,
  8,
  8,
  0,
  8,
  0,
  1,
  8,
  8,
  7,
  8,
  0,
  0,
  0,
  6,
  8,
  0,
  8,
  5,
  0,
  4,
  8,
  1,
  8,
  8,
  1,
  8,
  4,
  6,
  8,
  0,
  1,
  8,
  0,
  0,
  2,
  8,
  6,
  8,
  8,
  8,
  8,
  6,
  0,
  8,
  8,
  8,
  8,
  0,
  8,
  0,
  0,
  0,
  7,
  8,
  8,
  7,
  0,
  8,
  8,
  3,
  8,
  7,
  8,
  8,
  8,
  6,
  1,
  7,
  3,
  1,
  1,
  0,
  8,
  7,
  8,
  8,
  8,
  2,
  7,
  7,
  0,
  6,
  1,
  6,
  7,
  6,
  8,
  0,
  2,
  8,
  8,
  6,
  7,
  8,
  8,
  2,
  1,
  8,
  8,
  8,
  8,
  8,
  2,
  7,
  7,
  8,
  8,
  5,
  8,
  8,
  6,
  0,
  0,
  1,
  0,
  2,
  6,
  0,
 

In [199]:
counter = 0
for i in cont_table:
    for j in i:
        if j == j == j:
            counter +=1
print(counter)
        

18288


In [207]:
df_label = pd.DataFrame(cont_table)
df_label.columns = ['label', 'nb_pred', 'svm_pred']
df_label.head(5)

Unnamed: 0,label,nb_pred,svm_pred
0,0,8,8
1,6,8,8
2,0,8,8
3,0,8,0
4,8,8,8


In [None]:
df_test['yes_no']

In [201]:
df_test = pd.DataFrame()

statistic = (Yes/No - No/Yes)^2 / (Yes/No + No/Yes)

statistic = (Yes/No - No/Yes)^2 / (Yes/No + No/Yes)
Where Yes/No is the count of test instances that Classifier1 got correct and Classifier2 got incorrect, and No/Yes is the count of test instances that Classifier1 got incorrect and Classifier2 got correct.

In [146]:
def yes_no(df):
    yes_no = []
    for i, j in df.iteritems:
        if df[0] == df[1] and i[0] != df[2]:
            yes_no.append()
        return yes_no
    for key, value in df.iteritems(): 
    print(key, value) 
    print() 

In [165]:
for idx, row in df_conts.iterrows():
    yes_no = []
    counter = 0
    if row['target'] == row['nb_pred']:
        counter +=1   

0
0
0
0
1
1
1
0
1
0
1
0
0
0
0
0
0
0
1
0
0
0
0
0
0
1
1
1
0
0
1
0
1
1
0
1
0
0
1
1
0
0
0
1
0
1
0
1
0
0
1
0
1
1
0
1
0
0
1
1
0
1
1
0
0
0
0
0
1
0
0
0
0
0
1
0
1
1
1
0
1
1
0
1
0
0
1
1
0
1
0
0
0
0
1
0
1
0
0
0
1
0
1
1
0
1
0
0
1
0
0
1
0
0
0
1
0
1
1
1
1
0
0
1
1
1
1
0
1
0
0
0
0
1
1
0
0
1
1
0
1
0
1
1
1
0
0
0
0
0
0
0
1
0
1
1
1
0
0
0
0
0
0
0
0
0
1
0
0
1
1
0
0
1
1
0
0
1
1
1
1
1
0
0
0
1
1
0
1
1
0
0
0
0
0
0
0
0
1
0
1
0
0
0
1
0
0
0
1
0
0
1
0
1
1
0
1
1
1
0
0
0
1
0
0
0
0
0
1
1
1
1
0
1
0
0
1
1
1
0
0
0
1
1
1
1
1
1
1
1
1
0
0
0
1
1
1
1
1
0
0
0
0
1
0
1
0
1
0
0
1
0
1
0
1
1
1
0
0
1
0
0
0
0
0
1
1
1
1
0
1
0
0
0
0
1
0
1
0
0
1
0
0
1
1
1
1
0
1
1
0
1
0
0
0
0
0
1
0
0
1
1
0
1
1
1
0
0
0
0
1
0
1
0
1
0
1
0
1
0
0
0
0
0
1
0
1
0
1
1
1
0
1
1
0
1
0
0
1
0
0
1
0
0
0
1
1
0
1
1
1
0
0
0
1
0
1
0
0
1
0
1
1
0
0
1
0
0
0
0
1
0
0
1
1
0
0
1
0
0
1
0
0
1
0
0
1
1
1
1
1
1
0
1
1
0
0
1
1
1
0
1
1
0
1
1
0
0
1
1
0
0
1
1
1
0
0
0
1
0
0
1
0
1
0
1
1
1
0
1
1
1
0
1
0
1
0
1
0
1
0
1
1
0
1
0
0
1
1
1
1
0
1
0
0
0
0
0
0
1
1
1
0
0
1
1
0
0
1
1
1
1
1
0
0
1
1
0
0
0


0
0
0
1
0
0
0
1
0
0
1
1
0
0
1
0
0
1
1
0
0
0
1
0
0
1
1
1
1
1
1
0
0
0
1
0
1
1
1
1
1
1
1
0
1
1
0
1
1
1
1
1
1
0
0
0
1
0
0
0
0
1
1
1
0
0
0
0
1
1
0
0
0
1
0
1
0
0
1
1
0
1
1
1
1
1
0
0
1
0
1
0
0
0
0
0
1
0
0
1
0
0
0
0
1
1
1
1
0
0
0
0
1
1
0
1
0
1
1
1
0
1
1
1
1
0
1
0
0
1
0
0
1
0
0
1
1
1
0
1
0
1
0
1
1
0
0
1
0
0
1
0
1
1
1
1
1
0
1
0
0
0
1
0
0
0
0
0
0
1
0
0
0
0
0
0
1
1
1
1
0
1
0
1
1
0
1
1
0
0
1
1
0
1
0
1
0
0
0
1
1
1
1
1
1
0
1
1
0
1
1
1
1
0
0
1
1
0
1
0
1
0
1
0
0
0
0
0
0
0
0
1
0
1
0
0
0
1
0
1
1
0
0
0
0
0
1
1
0
0
0
0
0
1
0
0
0
1
0
1
1
0
1
1
0
0
1
0
0
0
1
1
1
1
1
0
1
0
0
0
0
0
0
1
1
1
1
1
0
0
1
1
0
0
1
1
1
0
0
0
1
1
0
0
1
1
0
0
0
0
0
0
1
0
1
1
0
1
1
1
1
1
1
0
1
0
0
1
1
1
0
1
0
0
1
1
0
1
0
0
0
0
0
0
0
1
1
0
0
0
0
1
0
1
0
0
1
1
1
1
0
1
1
0
0
1
0
1
1
0
0
0
0
0
0
0
1
0
0
0
1
0
1
1
0
1
0
0
1
1
1
1
1
0
0
1
0
0
0
1
1
0
1
0
1
1
0
0
1
0
1
1
1
0
1
1
1
1
0
0
0
0
1
0
1
0
0
1
1
0
0
1
0
1
0
0
0
1
1
0
1
1
1
0
0
1
0
1
0
0
0
0
0
0
0
1
0
0
1
0
0
1
0
0
1
1
0
0
1
1
1
1
0
0
1
1
1
0
0
1
0
0
0
0
1
1
1
0
0
0
0
0
1
1
0
1
0
1
1
0


In [None]:
sum = 0
    for x in l:
        sum += x
    return sum

In [113]:
df_label.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Columns: 6096 entries, 0 to 6095
dtypes: int64(6096)
memory usage: 143.0 KB


In [114]:
df_label

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,6086,6087,6088,6089,6090,6091,6092,6093,6094,6095
0,0,6,0,0,8,8,8,7,8,3,...,7,8,8,3,8,1,8,0,0,0
1,8,8,8,8,8,8,8,8,8,8,...,8,8,8,8,8,8,8,8,8,8
2,8,8,8,0,8,8,8,7,8,8,...,7,8,8,8,0,8,8,8,8,7


In [None]:
# define contingency table
table = [[4, 2],
		 [1, 3]]


1
2
3
						Classifier2 Correct,	Classifier2 Incorrect
Classifier1 Correct 	??						??
Classifier1 Incorrect 	?? 						??
In the case of the first cell in the table, we must sum the total number of test instances that Classifier1 got correct and Classifier2 got correct. For example, the first instance that both classifiers predicted correctly was instance number 5. The total number of instances that both classifiers predicted correctly was 4.

Another more programmatic way to think about this is to sum each combination of Yes/No in the results table above.

						Classifier2 Correct,	Classifier2 Incorrect
Classifier1 Correct 	Yes/Yes					Yes/No
Classifier1 Incorrect 	No/Yes 					No/No
1
2
3
						Classifier2 Correct,	Classifier2 Incorrect
Classifier1 Correct 	Yes/Yes					Yes/No
Classifier1 Incorrect 	No/Yes 					No/No
The results organized into a contingency table are as follows:

						Classifier2 Correct,	Classifier2 Incorrect
Classifier1 Correct 	4						2
Classifier1 Incorrect 	1 						3
1
2
3
						Classifier2 Correct,	Classifier2 Incorrect
Classifier1 Correct 	4						2
Classifier1 Incorrect 	1 						3


In [None]:
# calculate mcnemar test
result = mcnemar(table, exact=True)

# summarize the finding
print('statistic=%.3f, p-value=%.3f' % (result.statistic, result.pvalue))

# interpret the p-value
alpha = 0.05

if result.pvalue > alpha:
    print('Same proportions of errors (fail to reject H0)')
else:
    print('Different proportions of errors (reject H0)')