# Computing inter-annotator agreement for sentence classification tasks

<br><a target="_blank" href="https://colab.research.google.com/github/haukelicht/advanced_text_analysis/blob/main/notebooks/annotation/compute_ica_pledge_classification.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

We want to measure the extent to which annotators' sentence-level classifications agree with each other.
Specifically, following Krippendorff ([2022](https://doi.org/10.4135/9781071878781.n13)) and others (see Neuendorf, [2017](https://doi.org/10.4135/9781071802878.n6)), we want to compute a _chance-adjusted_ inter-annotator agreement metric, that is, a metric that corrects the observed agreement for the agreement expected to arise by chance.
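
To see why the chance adjustment matters, the sketch below contrasts raw percent agreement with Krippendorff's α for nominal data, computed by hand from the coincidence matrix. The toy labels and the helper names `percent_agreement` and `nominal_alpha` are assumptions made for illustration; they are not part of this notebook, which relies on the `krippendorff` package for the actual computation.

```python
from collections import Counter
from itertools import combinations, permutations


def percent_agreement(units):
    """Share of agreeing annotator pairs, pooled over all units."""
    pairs = [a == b for values in units for a, b in combinations(values, 2)]
    return sum(pairs) / len(pairs)


def nominal_alpha(units):
    """Krippendorff's alpha for nominal labels.

    `units` holds one list of labels per sentence (missing labels are omitted).
    """
    # coincidence matrix: each ordered pair of labels from two different
    # annotators of the same unit contributes 1 / (m - 1), m = labels in the unit
    coincidences = Counter()
    for values in units:
        m = len(values)
        if m < 2:
            continue  # units with fewer than two labels cannot be compared
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    # marginal totals per label and overall number of pairable labels
    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    # observed vs. chance-expected disagreement (both scaled by n)
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(totals[a] * totals[b] for a in totals for b in totals if a != b) / (n - 1)
    if expected == 0:
        return float("nan")  # only one label was ever used; alpha is undefined
    return 1 - observed / expected


# hypothetical toy data: 10 sentences, 2 annotators, heavily skewed labels
units = [["No", "No"]] * 8 + [["No", "Yes"], ["Yes", "No"]]

print(f"raw agreement: {percent_agreement(units):.3f}")
print(f"alpha:         {nominal_alpha(units):.3f}")
```

In this toy example the annotators agree on 80% of the sentences, yet α is slightly negative (about -0.056): two annotators who almost always choose "No" would reach that level of agreement by chance alone.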


## Setup

In [16]:
# check if on colab
COLAB = True
try:
    import google.colab
except ImportError:
    COLAB = False

if COLAB:
    # shallow clone of current state of main branch 
    !git clone --branch main --single-branch --depth 1 --filter=blob:none https://github.com/haukelicht/advanced_text_analysis.git
    # make repo root findable for python
    import sys
    sys.path.append("/content/advanced_text_analysis/")
    
    # install required packages
    !pip install krippendorff==0.8.1

In [17]:
from pathlib import Path

import pandas as pd
from scipy.stats import entropy

from krippendorff import alpha as k_alpha

base_path = Path("/content/advanced_text_analysis/" if COLAB else "../../")
data_path = base_path / "data" / "labeled" / "fornaciari_we_2021"

## Read the annotations

In [18]:
# TODO: change `"group3"` to the name of your group's folder
annotations_path = data_path / "annotations" / "classification" / "group3"

# list all annotation files produced by doccano 
#  (each records annotations by one annotator)
fps = list(annotations_path.glob('*.csv'))

# read the annotations into a long-format DataFrame (one row per annotator and sentence)
annotations = pd.concat({fp.stem: pd.read_csv(fp) for fp in fps}, ignore_index=False).reset_index(level=0, names=['annotator'])

# list unique annotators
annotations.annotator.unique().tolist()

['luisa.kutlar', 'johanneskuhling', 'lopatina']

In [20]:
annotations[annotations.annotator=='luisa.kutlar']

index,annotator,id,text,split_,text_id,metadata__year,metadata__party,label
0,luisa.kutlar,133935,10 . Protecting Indians overseas from exploita...,2,773,2014,INC,No Pledge
1,luisa.kutlar,133936,An ‘Extremely Backward Communities Development...,2,534,2009,BJP,Pledge
2,luisa.kutlar,133937,"Similarly, almost all remaining households hav...",2,313,2019,BJP,No Pledge
3,luisa.kutlar,133938,We will expand this initiative further to take...,2,444,2019,BJP,Pledge
4,luisa.kutlar,133939,"In this sacred endeavour, the Congress has joi...",2,19,2004,INC,No Pledge
5,luisa.kutlar,133940,The Indian National Congress has always stood ...,2,249,2009,INC,Pledge
6,luisa.kutlar,133941,We are committed to annulling Article 35A of t...,2,101,2019,BJP,Pledge
7,luisa.kutlar,133942,"A suitable law, enabling micro - credit operat...",2,518,2004,BJP,Pledge
8,luisa.kutlar,133943,"Education at all stages — primary, secondary a...",2,253,2009,INC,Pledge
9,luisa.kutlar,133944,The Indian National Congress is committed to p...,2,630,2014,INC,Pledge


In [5]:
annotations.value_counts(['annotator', 'label']).unstack(fill_value=0)

annotator,No Pledge,Pledge
johanneskuhling,24,26
lopatina,28,22
luisa.kutlar,26,24


## Compute inter-annotator agreement

In [6]:
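# keep only the columns needed to build the reliability matrix
tmp = annotations[['annotator', 'text_id', 'label']].copy()
# binarize the label: 1 = "Pledge", 0 = "No Pledge"
tmp['label'] = (tmp['label'].str.lower()=='pledge').astype(int)
# reshape to one row per annotator and one column per sentence
# NOTE: `pivot_table` averages duplicate (annotator, text_id) pairs and `fillna(0)` codes missing annotations as "No Pledge"
tmp = tmp.pivot_table(index='annotator', columns='text_id', values='label').fillna(0).astype(int)
# compute Krippendorff's alpha, treating the labels as nominal categories
k_alpha(tmp.values, level_of_measurement='nominal')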

0.6413966049382716

Krippendorff (cited in Neuendorf, 2017) names the following standards:

- Rely only on variables with reliabilities above α = .800.
- Consider variables with reliabilities between α = .667 and α = .800 only for drawing tentative conclusions.
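
These thresholds can be turned into a small helper for reading off an α value. This is a minimal sketch: the function name `interpret_alpha` and the wording of the returned labels are assumptions made for illustration, and so is the decision to count values exactly at a boundary toward the higher category.

```python
def interpret_alpha(alpha: float) -> str:
    """Map an alpha value onto the conventional reliability thresholds."""
    if alpha >= 0.800:
        return "reliable"
    if alpha >= 0.667:
        return "tentative conclusions only"
    return "below the conventional thresholds"


# the alpha value computed for our annotations
print(interpret_alpha(0.6413966049382716))
```

Applied to the value computed for our annotations, the helper returns "below the conventional thresholds", since 0.641 is smaller than .667.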

## Sentence-level disagreement analysis

In [7]:
def compute_entropy(x: pd.Series) -> float:
    """Compute entropy of a value counts series."""
    return entropy(x.value_counts(normalize=True).values, base=2)

n_annotators = 4
print("perfect agreement")
print(f" - on positive label: entropy = {compute_entropy(pd.Series(['Pledge']*n_annotators)):.03f}")
print(f" - on negative label: entropy = {compute_entropy(pd.Series(['No Pledge']*n_annotators)):.03f}")
print('some agreement:')
print(f" - on positive label: entropy = {compute_entropy(pd.Series(['Pledge']*(n_annotators-1) + ['No Pledge']*1)):.03f}")
print(f" - on negative label: entropy = {compute_entropy(pd.Series(['Pledge']*1 + ['No Pledge']*(n_annotators-1))):.03f}")
print(f"no agreement (tied):  entropy = {compute_entropy(pd.Series(['Pledge', 'No Pledge']*int(n_annotators/2))):.03f}")

perfect agreement
 - on positive label: entropy = 0.000
 - on negative label: entropy = 0.000
some agreement:
 - on positive label: entropy = 0.811
 - on negative label: entropy = 0.811
no agreement (tied):  entropy = 1.000
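
The values printed above follow from the Shannon entropy formula, H = -Σ p_i · log2(p_i), applied to the label shares within a sentence. The sketch below recomputes them with the standard library only and lists the entropy values that can occur for a given number of annotators choosing between two labels. The helper names `split_entropy` and `possible_entropies` are hypothetical and not part of the notebook.

```python
from math import log2


def split_entropy(counts):
    """Shannon entropy (in bits) of a label distribution given as counts."""
    total = sum(counts)
    shares = [c / total for c in counts if c > 0]  # drop labels nobody chose
    return sum(-p * log2(p) for p in shares)


def possible_entropies(n_annotators):
    """Entropy values reachable when n annotators choose between two labels."""
    return sorted(
        {round(split_entropy([k, n_annotators - k]), 3) for k in range(n_annotators + 1)}
    )


print(possible_entropies(4))  # 4-0, 3-1 and 2-2 splits
print(possible_entropies(3))  # 3-0 and 2-1 splits
```

With four annotators this gives 0, 0.811 (3-vs-1 split), and 1 (2-vs-2 tie). With three annotators only 0 and 0.918 (2-vs-1 split) are possible, because an odd number of annotators cannot tie on two labels.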


In [8]:
entropies = annotations.groupby('text_id').agg({'label': compute_entropy}).reset_index().rename(columns={'label': 'entropy'})

In [9]:
entropies.entropy.value_counts(dropna=False)

entropy
0.000000    35
0.918296    13
Name: count, dtype: int64

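- There is perfect agreement (entropy = 0) in 35 out of 48 cases.
- There is partial disagreement (entropy ≈ 0.918, i.e., a 2-vs-1 split among the three annotators) in 13 cases.
- There is no case of tied disagreement (entropy = 1): with three annotators and two labels, a tie cannot occur.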


#### Instances with partial disagreement

In [10]:
instances = entropies.query('entropy > 0 and entropy < 1').merge(annotations[['text_id', 'text']].drop_duplicates(), on='text_id').sort_values('text_id')
instances = instances.merge(annotations.groupby('text_id').agg({'label': list}).reset_index())
instances

index,text_id,entropy,text,label
0,36,0.918296,We reach out to the minorities and even at the...,"[Pledge, No Pledge, No Pledge]"
1,150,0.918296,The recently established National Security Cou...,"[No Pledge, No Pledge, Pledge]"
2,164,0.918296,a . We will enact central legislation on the S...,"[Pledge, No Pledge, Pledge]"
3,197,0.918296,Deployment of broadband in every village would...,"[No Pledge, Pledge, No Pledge]"
4,241,0.918296,3 . Announce a detailed Jobs Agenda to ensure ...,"[No Pledge, Pledge, Pledge]"
5,249,0.918296,The Indian National Congress has always stood ...,"[Pledge, No Pledge, No Pledge]"
6,343,0.918296,"There shall be a special survey, which will be...","[Pledge, Pledge, No Pledge]"
7,354,0.918296,The BJP will set up an experts committee to de...,"[Pledge, Pledge, No Pledge]"
8,503,0.918296,MGNREGA will also be harnessed to support the ...,"[No Pledge, Pledge, No Pledge]"
9,630,0.918296,The Indian National Congress is committed to p...,"[Pledge, No Pledge, Pledge]"


#### Instances with tied disagreement (entropy = 1)

In [15]:
instances = entropies.query('entropy == 1.0').merge(annotations[['text_id', 'text']].drop_duplicates(), on='text_id').sort_values('text_id')
instances = instances.merge(annotations.groupby('text_id').agg({'label': list}).reset_index())
instances

index,text_id,entropy,text,label
