### Cohen's Kappa

Variables

Treatment response: positive (+) or negative (-) (dichotomous).
There are 2 variables, the responses before and after treatment, recorded on a sample of 9 subjects.

Objective
To assess the effectiveness of a new medical treatment for a rare medical condition.

Choice of Method
We use Cohen's Kappa because the data are paired: each subject has a recorded presence or absence of disease before and after treatment.
The categories are measured on a nominal scale and we are comparing each patient's rating under 2 scenarios, so Cohen's Kappa is a suitable measure.

Hypothesis

* Null hypothesis: There is perfect agreement between the before and after treatment conditions, i.e., the drug has no significant effect in treating the disease.
K = 1
* Alternative hypothesis: There is not perfect agreement between the before and after treatment conditions.
K < 1

Level of Significance
We test the hypothesis at the 0.05 level of significance.

Assumptions

* Ratings are nominal.
* The categories are mutually exclusive and exhaustive.
* Ratings are independent: one rating does not influence another.
* Raters are consistent, meaning that they rate the same items in the same way over time.

In [12]:
import pandas as pd
data = pd.DataFrame({
    'before': ['+','+','+','-','+','+','+','-','-'],
    'after':['+','+','+','-','-','-','-','-','-']})
data

|   | before | after |
|---|--------|-------|
| 0 | + | + |
| 1 | + | + |
| 2 | + | + |
| 3 | - | - |
| 4 | + | - |
| 5 | + | - |
| 6 | + | - |
| 7 | - | - |
| 8 | - | - |


In [13]:
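# Encode both response series as categoricals with a fixed category order,
# so the crosstab always shows '-' first and '+' second
before = pd.Categorical(['+', '+', '+', '-', '+', '+', '+', '-', '-'],
                        categories=['-', '+'])

after = pd.Categorical(['+', '+', '+', '-', '-', '-', '-', '-', '-'],
                       categories=['-', '+'])

# 2x2 contingency table of before vs. after responses, with row and column totals
pd.crosstab(before, after, margins=True, rownames=['Before'], colnames=['After'])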

| Before \ After | - | + | All |
|----------------|---|---|-----|
| - | 3 | 0 | 3 |
| + | 3 | 3 | 6 |
| All | 6 | 3 | 9 |


'+' represents absence of disease

'-' represents presence of disease

In [14]:
from sklearn.metrics import cohen_kappa_score

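# Define the two sets of ratings: here the "raters" are the
# before-treatment (rater1) and after-treatment (rater2) responses
rater1 = ['+', '+', '+', '-', '+', '+', '+', '-', '-']
rater2 = ['+', '+', '+', '-', '-', '-', '-', '-', '-']

# Calculate Cohen's Kappa
cohen_kappa_score(rater1, rater2)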

0.4
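The value above can be cross-checked by hand. The following is a minimal sketch using only the standard library; the helper name `cohen_kappa_manual` is illustrative and not part of scikit-learn. It computes the observed agreement p_o (share of subjects with the same label before and after) and the chance agreement p_e (from the marginal proportions), then kappa = (p_o - p_e) / (1 - p_e).

```python
from collections import Counter

# Same responses as in the cells above
before = ['+', '+', '+', '-', '+', '+', '+', '-', '-']
after = ['+', '+', '+', '-', '-', '-', '-', '-', '-']


def cohen_kappa_manual(r1, r2):
    """Unweighted Cohen's kappa for two equal-length label sequences (illustrative helper)."""
    n = len(r1)
    # Observed agreement: proportion of subjects with identical labels
    p_o = sum(a == b for a, b in zip(r1, r2)) / n
    # Chance agreement: product of the two marginal proportions, summed over labels
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum(c1[label] * c2[label] for label in set(r1) | set(r2)) / n ** 2
    return (p_o - p_e) / (1 - p_e)


kappa_manual = cohen_kappa_manual(before, after)
print(round(kappa_manual, 4))
```

For these data p_o = 6/9 and p_e = 4/9, which gives the same kappa of 0.4 as `cohen_kappa_score`.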

In [15]:
# With only two categories, linear weighting gives the same value as unweighted kappa
cohen_kappa_score(rater1, rater2, weights='linear')

0.4

Interpretation

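Since the kappa value k = 0.4 is less than 1, the before and after responses are not in perfect agreement, and we conclude that the drug has an effect in treating the disease. Note that comparing the point estimate with 1 does not by itself give a test at the 0.05 level of significance; that would require a standard error or p-value for kappa, which is not computed here.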

### Try McNemar's test on the above example
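
One possible way to follow up on this is sketched below. It is an exact (binomial) McNemar test written with the standard library only; the helper name `mcnemar_exact_p` is hypothetical. The test uses only the discordant pairs from the Before/After crosstab: b = 0 subjects went from '-' to '+', and c = 3 went from '+' to '-'. statsmodels also offers a `mcnemar` function in `statsmodels.stats.contingency_tables`, which could be used instead.

```python
from math import comb

# Discordant pairs taken from the Before/After crosstab above:
# b = before '-' and after '+', c = before '+' and after '-'
b, c = 0, 3


def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar p-value from the discordant counts (illustrative helper)."""
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of change
    k = min(b, c)
    # Under H0 each discordant pair is equally likely to fall in either cell,
    # so the smaller count follows Binomial(n, 0.5)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


p_value = mcnemar_exact_p(b, c)
print(p_value)  # 0.25
```

Under these assumptions the p-value is 0.25, which is above 0.05, so with only 3 discordant pairs this test would not detect a significant change between the before and after responses.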

### Fleiss' Kappa

In [33]:
from statsmodels.stats import inter_rater as irr
df = pd.read_csv("fliess.csv")

In [34]:
import numpy as np
tbl = df.to_numpy()
tbl.T

array([[3, 3, 3, 4, 5, 5, 2, 3, 5, 2, 2, 6, 1, 5, 2, 2, 1, 2, 4, 3],
       [3, 6, 4, 6, 2, 4, 2, 4, 3, 3, 2, 3, 3, 3, 2, 2, 1, 3, 3, 4],
       [2, 1, 4, 4, 3, 2, 1, 6, 1, 1, 1, 2, 3, 3, 1, 1, 3, 3, 2, 2]])

In [39]:
dats, cats = irr.aggregate_raters(tbl)
dats

array([[0, 1, 2, 0, 0, 0],
       [1, 0, 1, 0, 0, 1],
       [0, 0, 1, 2, 0, 0],
       [0, 0, 0, 2, 0, 1],
       [0, 1, 1, 0, 1, 0],
       [0, 1, 0, 1, 1, 0],
       [1, 2, 0, 0, 0, 0],
       [0, 0, 1, 1, 0, 1],
       [1, 0, 1, 0, 1, 0],
       [1, 1, 1, 0, 0, 0],
       [1, 2, 0, 0, 0, 0],
       [0, 1, 1, 0, 0, 1],
       [1, 0, 2, 0, 0, 0],
       [0, 0, 2, 0, 1, 0],
       [1, 2, 0, 0, 0, 0],
       [1, 2, 0, 0, 0, 0],
       [2, 0, 1, 0, 0, 0],
       [0, 1, 2, 0, 0, 0],
       [0, 1, 1, 1, 0, 0],
       [0, 1, 1, 1, 0, 0]])

In [42]:
kappa = irr.fleiss_kappa(dats, method='fleiss')
kappa

-0.04107648725212462

Interpretation: The kappa value is slightly below zero (about -0.04), so the agreement among the raters is no better than what would be expected by chance.
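
The Fleiss' Kappa reported above can be reproduced by hand. The sketch below uses only the standard library; the ratings are copied from the `tbl.T` output (one list per rater), and the helper name `fleiss_kappa_manual` is illustrative rather than part of statsmodels. It first builds the subjects-by-categories count table, as `aggregate_raters` does, then compares the mean per-subject agreement with the agreement expected from the overall category proportions.

```python
from collections import Counter

# One list per rater, 20 subjects each, copied from the tbl.T output above
ratings = [
    [3, 3, 3, 4, 5, 5, 2, 3, 5, 2, 2, 6, 1, 5, 2, 2, 1, 2, 4, 3],
    [3, 6, 4, 6, 2, 4, 2, 4, 3, 3, 2, 3, 3, 3, 2, 2, 1, 3, 3, 4],
    [2, 1, 4, 4, 3, 2, 1, 6, 1, 1, 1, 2, 3, 3, 1, 1, 3, 3, 2, 2],
]


def fleiss_kappa_manual(ratings):
    """Fleiss' kappa from per-rater rating lists (illustrative helper)."""
    n_raters = len(ratings)
    subjects = list(zip(*ratings))  # one tuple of ratings per subject
    n_subjects = len(subjects)
    categories = sorted({r for row in ratings for r in row})

    # Count table: rows are subjects, columns are categories
    counts = []
    for subject in subjects:
        tally = Counter(subject)
        counts.append([tally[cat] for cat in categories])

    # Mean per-subject agreement among rater pairs
    p_i = [(sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / n_subjects

    # Chance agreement from the overall category proportions
    total = n_subjects * n_raters
    p_j = [sum(row[j] for row in counts) / total for j in range(len(categories))]
    p_e = sum(p * p for p in p_j)

    return (p_bar - p_e) / (1 - p_e)


kappa_manual = fleiss_kappa_manual(ratings)
print(round(kappa_manual, 4))
```

Here the mean observed agreement (about 0.183) is slightly lower than the chance agreement (about 0.216), which is why the kappa comes out just below zero.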