**Cohen's Kappa score** measures the level of agreement between two raters (or classifiers) that assign categorical labels to the same set of items, while correcting for the agreement that would be expected by chance.

### Definition

Let's say we have:
- A set of $n$ items.
- Two raters (or classifiers) assigning each item to one of $k$ possible categories.

Define:
- $O=$ observed agreement = proportion of cases where the two raters agree.
- $E=$ expected agreement $=$ proportion of agreement that would be expected by random chance.

Goal: we want a measure of how much better the observed agreement is than the agreement expected by chance.

The observed agreement is simply the proportion of cases where the two raters agree:

$$
O=\frac{\text { Number of agreements }}{n}
$$

where:
- $n=$ total number of items.
- Number of agreements = number of items where both raters assign the same category.
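As a minimal sketch of this definition, the snippet below counts matching labels and divides by $n$. The function name `observed_agreement` and the two label lists are illustrative assumptions, not part of any library.

```python
# Illustrative labels from two hypothetical raters (same items, same order).
rater_a = ["negative", "positive", "negative", "neutral", "positive"]
rater_b = ["negative", "positive", "negative", "neutral", "negative"]


def observed_agreement(a, b):
    """Proportion of items on which the two raters assign the same category."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")
    agreements = sum(x == y for x, y in zip(a, b))
    return agreements / len(a)


O = observed_agreement(rater_a, rater_b)
print(O)  # 0.8 (the raters agree on 4 of the 5 items)
```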

### **Expected Agreement by Chance**

The expected agreement $E$ is computed from each rater's marginal proportions, that is, how often each rater uses each category regardless of what the other rater chose.

Let:
- $p_i^A=$ proportion of items that Rater A assigns to category $i$.
- $p_i^B=$ proportion of items that Rater B assigns to category $i$.

The chance agreement for category $i$ is $p_i^A \cdot p_i^B$: if the two raters label independently, each according to their own marginal proportions, the probability that both pick category $i$ for the same item is the product of the two proportions.

Thus, the total expected agreement is:

$$
E=\sum_{i=1}^k p_i^A \cdot p_i^B
$$

where:
- $k=$ number of categories.
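The sum above can be sketched in a few lines of Python. The helper name `expected_agreement` and the label lists are assumptions made for illustration. A category that only one rater uses contributes zero to the sum.

```python
from collections import Counter

# Illustrative labels from two hypothetical raters (same items, same order).
rater_a = ["negative", "positive", "negative", "neutral", "positive"]
rater_b = ["negative", "positive", "negative", "neutral", "negative"]


def expected_agreement(a, b):
    """Chance agreement: sum over categories of p_i^A * p_i^B."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    categories = set(count_a) | set(count_b)
    # Counter returns 0 for a category the rater never used.
    return sum((count_a[c] / n) * (count_b[c] / n) for c in categories)


E = expected_agreement(rater_a, rater_b)
# Marginals: A = (2/5, 2/5, 1/5), B = (3/5, 1/5, 1/5) for negative, positive, neutral
# E = 0.4*0.6 + 0.4*0.2 + 0.2*0.2
print(round(E, 4))  # 0.36
```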

### **Cohen's Kappa Formula**

Cohen's Kappa expresses how far the observed agreement exceeds chance agreement, as a fraction of the maximum possible excess $1-E$. The result is at most 1 and never below -1:

$$
\kappa=\frac{O-E}{1-E}
$$

where:
- $O=$ observed agreement.
- $E=$ expected agreement by chance.

Interpretation:
- $\kappa=1 \rightarrow$ Perfect agreement.
- $\kappa=0 \rightarrow$ Agreement is no better than chance.
- $\kappa<0 \rightarrow$ Worse than chance (systematic disagreement).
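Putting the pieces together, here is a from-scratch sketch of the formula, mainly to make the three interpretation cases concrete. The function name `cohen_kappa` is hypothetical. Raising an error when $E=1$ (both raters put every item in the same single category, so the formula becomes $0/0$) is a choice made for this sketch, not a statement about how any library handles that case.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Unweighted Cohen's Kappa: (O - E) / (1 - E)."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Sum of p_i^A * p_i^B, computed from raw counts.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Sketch-level choice: kappa is undefined (0/0) in this degenerate case.
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (observed - expected) / (1 - expected)


rater_a = ["negative", "positive", "negative", "neutral", "positive"]
rater_b = ["negative", "positive", "negative", "neutral", "negative"]

print(round(cohen_kappa(rater_a, rater_b), 4))      # 0.6875, i.e. (0.8 - 0.36) / 0.64
print(cohen_kappa(["x", "y", "x"], ["x", "y", "x"]))  # 1.0, perfect agreement
print(cohen_kappa(["x", "y"], ["y", "x"]))            # -1.0, systematic disagreement
```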

```python
from sklearn.metrics import cohen_kappa_score

y1 = ["negative", "positive", "negative", "neutral", "positive"]
y2 = ["negative", "positive", "negative", "neutral", "negative"]
cohen_kappa_score(y1, y2)
# Output: np.float64(0.6875)
```

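Passing `weights='quadratic'` computes a weighted kappa, which penalizes each disagreement by the squared distance between the two categories and is therefore intended for ordinal labels:

```python
cohen_kappa_score(y1, y2, weights='quadratic')
# Output: np.float64(0.5)
```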
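To see where the quadratic-weighted value comes from, the sketch below recomputes it by hand. It assumes the categories are ranked by sorting the label strings, which here happens to give the natural order negative < neutral < positive. With other label names, an explicit ordering would be needed. The function name `quadratic_weighted_kappa` is hypothetical, and the code illustrates the weighted formula rather than reproducing scikit-learn's implementation.

```python
from collections import Counter


def quadratic_weighted_kappa(a, b):
    """Weighted kappa: 1 - (weighted observed disagreement / weighted expected disagreement)."""
    # Assumption: rank categories by sorted label order.
    categories = sorted(set(a) | set(b))
    rank = {c: i for i, c in enumerate(categories)}
    n = len(a)
    pair_counts = Counter(zip(a, b))          # joint counts of (label_a, label_b)
    count_a, count_b = Counter(a), Counter(b)  # marginal counts per rater

    observed = 0.0
    expected = 0.0
    for ca in categories:
        for cb in categories:
            weight = (rank[ca] - rank[cb]) ** 2  # 0 on the diagonal
            observed += weight * pair_counts[(ca, cb)] / n
            expected += weight * count_a[ca] * count_b[cb] / (n * n)
    if expected == 0.0:
        # Sketch-level choice: undefined when only one category is ever used.
        raise ValueError("weighted kappa is undefined when expected disagreement is 0")
    return 1 - observed / expected


y1 = ["negative", "positive", "negative", "neutral", "positive"]
y2 = ["negative", "positive", "negative", "neutral", "negative"]

# The single disagreement (positive vs. negative) is two ranks apart, so it gets weight 4.
print(round(quadratic_weighted_kappa(y1, y2), 4))  # 0.5
```

The weighted score is lower than the unweighted 0.6875 because the only disagreement in this example is between the two most distant categories.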