# Computing inter-annotator agreement for sentence classification tasks

<br><a target="_blank" href="https://colab.research.google.com/github/haukelicht/advanced_text_analysis/blob/main/notebooks/annotation/compute_ica_pledge_classification.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

We want to quantify how closely annotators' sentence-level classifications agree with each other.
Specifically, following Krippendorff ([2022](https://doi.org/10.4135/9781071878781.n13)) and others (see Neuendorf, [2017](https://doi.org/10.4135/9781071802878.n6)), we compute a _chance-adjusted_ inter-annotator agreement metric, that is, one that corrects for the agreement expected to arise by chance alone.
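
To make "chance-adjusted" concrete, here is a minimal sketch of Krippendorff's α for nominal labels, computed as one minus the ratio of observed to expected disagreement. The function name `nominal_alpha` and the toy data are hypothetical and only for illustration. This is not the implementation used by the `krippendorff` package, and it assumes at least two distinct labels occur in the data.

```python
from collections import Counter
from itertools import permutations


def nominal_alpha(units):
    """Krippendorff's alpha for nominal data (illustrative sketch).

    `units` is a list with one inner list of labels per annotated item.
    """
    # coincidence counts: every ordered pair of labels given to the same item
    # by different annotators, weighted by 1 / (number of labels - 1)
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # items with fewer than two labels cannot show (dis)agreement
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    # marginal totals per label and overall number of pairable labels
    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    # observed disagreement: mass on mismatching label pairs
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    # disagreement expected if labels were paired at random
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    return 1 - observed / expected


# hypothetical toy data: 4 sentences, 3 annotators each (1 = Pledge, 0 = No Pledge)
toy = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 0, 0]]
print(round(nominal_alpha(toy), 3))  # 0.686
```

Although the annotators agree fully on three of the four toy sentences, α is well below 0.75 because part of that agreement would be expected by chance given how often each label is used.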


## Setup

In [20]:
# check whether we are running on Google Colab
try:
    import google.colab  # only importable on Colab
    COLAB = True
except ImportError:
    COLAB = False

if COLAB:
    # shallow clone of current state of main branch 
    !git clone --branch main --single-branch --depth 1 --filter=blob:none https://github.com/haukelicht/advanced_text_analysis.git
    # make repo root findable for python
    import os, sys
    sys.path.append(f"{os.getcwd()}/advanced_text_analysis/")
    
    # install required packages
    !pip install krippendorff==0.8.1

In [None]:
from pathlib import Path

import pandas as pd
from scipy.stats import entropy

from krippendorff import alpha as k_alpha

data_path = "data/labeled/fornaciari_we_2021"
data_path = ("/content/advanced_text_analysis/" if COLAB else "../../") + data_path
data_path = Path(data_path)

## Read the annotations

In [22]:
# TODO: change `"group3"` to the name of your group's folder
annotations_path = data_path / "annotations" / "classification" / "group3"

# list all annotation files produced by doccano 
#  (each records annotations by one annotator)
fps = list(annotations_path.glob('*.csv'))

# read the annotations into a long-format DataFrame
annotations = pd.concat({fp.stem: pd.read_csv(fp) for fp in fps}, ignore_index=False).reset_index(level=0, names=['annotator'])

# list unique annotators
annotations.annotator.unique().tolist()

['luisa.kutlar', 'johanneskuhling', 'lopatina']

In [25]:
annotations.value_counts(['annotator', 'label']).unstack(fill_value=0)

label            No Pledge  Pledge
annotator
johanneskuhling         24      26
lopatina                28      22
luisa.kutlar            26      24


## Compute inter-annotator agreement

In [31]:
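# keep only the columns needed for the reliability matrix
tmp = annotations[['annotator', 'text_id', 'label']].copy()
# binarize the labels: 1 = "Pledge", 0 = anything else
tmp['label'] = (tmp['label'].str.lower()=='pledge').astype(int)
# reshape to annotators (rows) x sentences (columns); note that fillna(0) codes a missing annotation as "No Pledge"
tmp = tmp.pivot_table(index='annotator', columns='text_id', values='label').fillna(0).astype(int)
k_alpha(tmp.values, level_of_measurement='nominal')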

0.6413966049382716

Krippendorff (cited in Neuendorf, 2017) names the following standards:

- Rely only on variables with reliabilities above α = .800.
- Consider variables with reliabilities between α = .667 and α = .800 only for drawing tentative conclusions.

With α ≈ 0.641, the agreement computed above falls below both thresholds.
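
The thresholds above can be turned into a small helper for reporting. This is only a sketch: the function name `interpret_alpha` and the returned wording are hypothetical, and treating values exactly at .667 or .800 as belonging to the higher band is an assumption, since the standards as quoted do not say how to handle the boundaries.

```python
def interpret_alpha(alpha: float) -> str:
    """Map an alpha value to the reliability bands quoted above (illustrative)."""
    if alpha >= 0.800:  # assumption: boundary value counts as reliable
        return "reliable"
    if alpha >= 0.667:  # assumption: boundary value counts as tentative
        return "tentative conclusions only"
    return "below both thresholds"


print(interpret_alpha(0.6413966049382716))  # below both thresholds
```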

## Sentence-level disagreement analysis

In [None]:
def compute_entropy(x: pd.Series) -> float:
    """Compute the Shannon entropy (in bits) of the label distribution in `x`."""
    return entropy(x.value_counts(normalize=True).values, base=2)

print("perfect agreement")
print(f" - on positive label: entropy = {compute_entropy(pd.Series(['Pledge']*4)):.03f}")
print(f" - on negative label: entropy = {compute_entropy(pd.Series(['No Pledge']*4)):.03f}")
print('some agreement:')
print(f" - on positive label: entropy = {compute_entropy(pd.Series(['Pledge']*3 + ['No Pledge']*1)):.03f}")
print(f" - on negative label: entropy = {compute_entropy(pd.Series(['Pledge']*1 + ['No Pledge']*3)):.03f}")
print(f"no agreement (tied):  entropy = {compute_entropy(pd.Series(['Pledge', 'No Pledge']*2)):.03f}")

perfect agreement
 - on positive label: entropy = 0.000
 - on negative label: entropy = 0.000
some agreement:
 - on positive label: entropy = 0.811
 - on negative label: entropy = 0.811
no agreement (tied):  entropy = 1.000
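
The values printed above follow directly from the definition of Shannon entropy with base 2. As a worked check for the four-annotation examples (the symbol $p_k$ is introduced here only for illustration):

```latex
H = -\sum_{k} p_k \log_2 p_k
\qquad \text{where } p_k \text{ is the share of annotations with label } k

% unanimous (4-to-0 split):
H = -\left(1 \cdot \log_2 1\right) = 0

% 3-to-1 split:
H = -\left(\tfrac{3}{4}\log_2\tfrac{3}{4} + \tfrac{1}{4}\log_2\tfrac{1}{4}\right)
  \approx 0.311 + 0.500 = 0.811

% 2-to-2 split (tie):
H = -\left(\tfrac{1}{2}\log_2\tfrac{1}{2} + \tfrac{1}{2}\log_2\tfrac{1}{2}\right) = 1
```

With two labels, entropy ranges from 0 (all annotators agree) to 1 (the labels are evenly split), and it does not depend on which label is in the majority.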


In [7]:
entropies = annotations.groupby('text_id').agg({'label': compute_entropy}).reset_index().rename(columns={'label': 'entropy'})

In [8]:
entropies.entropy.value_counts(dropna=False)

entropy
0.000000    39
0.811278     7
1.000000     4
Name: count, dtype: int64

- There is perfect agreement in 39 out of 50 cases (entropy = 0).
- There is partial disagreement in 7 cases (entropy ≈ 0.811).
- The annotations are evenly split between the two labels in 4 cases (entropy = 1).


### Instances with partial disagreement (0 < entropy < 1)

In [None]:
instances = entropies.query('entropy > 0 and entropy < 1').merge(annotations[['text_id', 'text']].drop_duplicates(), on='text_id').sort_values('text_id')
print(*instances.text.tolist(), sep='\n')

We will also introduce a multi - purpose identity card for all citizens .
More specialist battalions will be raised and positioned in key locations across the country .
A raw material use policy will be unveiled in the mines sector .
A national programme will be launched, in cooperation with State Governments, to provide bicycles to girls from Below Poverty Line Families who attend school .
Make potable drinking water available to all thus reducing water - borne diseases, which will automatically translate into Diarrhoea - free India .
Reservations for the poor among ‘Forward Classes’ will be introduced after receiving recommendations of the Commission set up for this purpose .
The number of courts and the number of judges will be doubled in five years for quicker judicial process .


### Instances with tied labels (entropy = 1)

In [9]:
instances = entropies.query('entropy == 1.0').merge(annotations[['text_id', 'text']].drop_duplicates(), on='text_id').sort_values('text_id')
print(*instances.text.tolist(), sep='\n')

India’s indigenous thorium technology programme will be expedited and given all financial assistance, correcting the grievous wrong done by the UPA Government .
Immediately after forming the governments in Chhattisgarh, Madhya Pradesh and Rajasthan, as promised, the 3 Congress Governments waived the loans of farmers .
New middle - level technical institutes in clusters where, for example, weavers and artisans are concentrated, will be started .
The Congress will identify those environmental management functions that could be delegated to the states and local bodies .
