# Agreement Study

We randomly selected three medium-length cases (70-150 sentences each) for double annotation to assess inter-annotator agreement under the revised annotation schema and guidelines. Michael and Nathan annotated the cases for this study.

This notebook covers:

- Some high-level stats about the annotations and disagreements
- Inter-annotator agreement (IAA) metrics, including F1 and Gamma
- Qualitative analysis of the disagreements

Cases for agreement study:

- 12625853_mixed_alito
- 12628561_ootc_sotomayor
- 12625931_dissent_thomas

## Setup


In [4]:
# Allows for seamless use of updated src

%load_ext autoreload
%autoreload 2

# Switch to top of curiam directory for easier paths
%cd ../../..


/home/mkranzlein/michael/dev/curiam


In [5]:
import os

from curiam import agreement

from curiam.inception import tsv_processing

## Summary Stats


In [6]:
agreement_folder = "data/full_scale/agreement_study"

# Each variable is a list of opinions; each opinion is a list of sentences; each sentence is a list of tokens.
# E.g., opinions[0][0][0] is the 0-th token of the 0-th sentence of the 0-th opinion in the agreement study.
# os.listdir returns files in arbitrary order, so sort the filenames to keep the two annotators'
# opinions aligned by index (this assumes both folders use the same filenames).
opinions = [tsv_processing.process_opinion_file(f"{agreement_folder}/michael/{filename}")
            for filename in sorted(os.listdir(f"{agreement_folder}/michael"))
            if filename.endswith(".tsv")]

opinions_n = [tsv_processing.process_opinion_file(f"{agreement_folder}/nathan/{filename}")
              for filename in sorted(os.listdir(f"{agreement_folder}/nathan"))
              if filename.endswith(".tsv")]

# Set 4th column of each token to Nathan's annotation
# Each token now has the format: [sentence_num, tok_str, michael_annotation, nathan_annotation]
for i, opinion in enumerate(opinions):
    for j, sentence in enumerate(opinion):
        for k, token in enumerate(sentence):
            token.append(opinions_n[i][j][k][2])
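
The merge above indexes Nathan's tokens by position, so it silently assumes that both annotators' files have identical sentence and token segmentation. Below is a minimal sketch of a sanity check that could be run before the merge. `check_alignment` is a hypothetical helper, not part of `curiam`, and it assumes the token string sits at index 1, as in the token format described above.

```python
def check_alignment(opinions_a, opinions_b, text_column=1):
    """Return a list of human-readable mismatch descriptions (empty if aligned)."""
    problems = []
    if len(opinions_a) != len(opinions_b):
        problems.append(f"opinion count: {len(opinions_a)} vs {len(opinions_b)}")
    for i, (opinion_a, opinion_b) in enumerate(zip(opinions_a, opinions_b)):
        if len(opinion_a) != len(opinion_b):
            problems.append(f"opinion {i}: {len(opinion_a)} vs {len(opinion_b)} sentences")
        for j, (sent_a, sent_b) in enumerate(zip(opinion_a, opinion_b)):
            if len(sent_a) != len(sent_b):
                problems.append(f"opinion {i}, sentence {j}: {len(sent_a)} vs {len(sent_b)} tokens")
            for k, (tok_a, tok_b) in enumerate(zip(sent_a, sent_b)):
                # Compare only the token text; annotations are expected to differ
                if tok_a[text_column] != tok_b[text_column]:
                    problems.append(f"opinion {i}, sentence {j}, token {k}: "
                                    f"{tok_a[text_column]!r} vs {tok_b[text_column]!r}")
    return problems


# Toy data in the [sentence_num, tok_str, annotation] layout (labels are placeholders)
toy_m = [[[[1, "The", "_"], [1, "court", "A"]]]]
toy_n = [[[[1, "The", "_"], [1, "Court", "A"]]]]
problems = check_alignment(toy_m, toy_n)
print(problems)
```

With the real data, `check_alignment(opinions, opinions_n)` returning an empty list would indicate that the positional merge is safe.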


### How many sentences?


In [193]:
sum([len(opinion) for opinion in opinions])

332


### How many tokens?


In [194]:
token_total = sum([len(sentence) for opinion in opinions for sentence in opinion])
print(token_total)

9109


### How many tokens received at least one label?

In [195]:
def get_token_coverage(sentence, annotation_column):
    # Count tokens whose annotation cell is not the empty marker "_"
    return sum(token[annotation_column] != "_" for token in sentence)

coverage_m = sum(get_token_coverage(sentence, 2) for opinion in opinions for sentence in opinion)
coverage_n = sum(get_token_coverage(sentence, 3) for opinion in opinions for sentence in opinion)

print("Tokens with at least one label:")
print(f"Michael: {coverage_m} ({coverage_m/token_total*100:.2f}%)")
print(f"Nathan: {coverage_n} ({coverage_n/token_total*100:.2f}%)")

Tokens with at least one label:
Michael: 4616 (50.68%)
Nathan: 4275 (46.93%)



### How many spans did each annotator annotate?
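
One way to answer this is to count the annotations returned per sentence. The sketch below assumes that `tsv_processing.get_annotations(sentence, annotation_column=...)`, which is used in the Gamma section of this notebook, returns one entry per span. `count_spans` and `toy_get_annotations` are hypothetical names introduced here for illustration; the toy extractor is only a stand-in so the example runs on its own and may not match how `curiam` parses spans.

```python
def count_spans(opinions, get_annotations, annotation_column):
    """Total number of spans across all sentences, for one annotator's column."""
    return sum(len(get_annotations(sentence, annotation_column=annotation_column))
               for opinion in opinions
               for sentence in opinion)


def toy_get_annotations(sentence, annotation_column=2):
    """Stand-in extractor: merges adjacent tokens with the same label into one span."""
    spans = []
    for idx, token in enumerate(sentence):
        label = token[annotation_column]
        if label == "_":
            continue
        if spans and spans[-1][0] == label and spans[-1][2] == idx - 1:
            # Extend the previous span by one token
            spans[-1] = (label, spans[-1][1], idx)
        else:
            spans.append((label, idx, idx))
    return spans


# Toy sentence in the merged [sentence_num, tok_str, michael, nathan] layout
toy_sentence = [[1, "The", "_", "_"], [1, "word", "A", "A"],
                [1, "means", "A", "_"], [1, "x", "_", "B"]]
toy_opinions = [[toy_sentence]]
spans_m = count_spans(toy_opinions, toy_get_annotations, 2)
spans_n = count_spans(toy_opinions, toy_get_annotations, 3)
print(f"Michael: {spans_m}, Nathan: {spans_n}")
```

On the real data, the equivalent calls would be `count_spans(opinions, tsv_processing.get_annotations, 2)` and `count_spans(opinions, tsv_processing.get_annotations, 3)`, provided the assumption about the return value holds.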



## Agreement

### Agreement Overall



#### Gamma


In [1]:
from pyannote.core import Segment
from pygamma_agreement import CombinedCategoricalDissimilarity, Continuum

In [27]:
def get_opinion_gamma(opinion):
    continuum = Continuum()
    # Shifts sentence-level token indices to opinion-level positions
    offset = 0
    for sentence in opinion:
        annotations_m = tsv_processing.get_annotations(sentence, annotation_column=2)
        annotations_n = tsv_processing.get_annotations(sentence, annotation_column=3)
        for annotation in annotations_m:
            category, start, end = annotation[0], annotation[1], annotation[2]
            continuum.add("m", Segment(start+offset, end+offset+1), category)
        for annotation in annotations_n:
            category, start, end = annotation[0], annotation[1], annotation[2]
            continuum.add("n", Segment(start+offset, end+offset+1), category)
        offset += len(sentence)
    # Weight positional (alpha) and categorical (beta) dissimilarity equally
    dissim = CombinedCategoricalDissimilarity(alpha=1, beta=1)
    gamma_results = continuum.compute_gamma(dissim)
    return gamma_results.gamma

In [30]:
for opinion in opinions:
    print(get_opinion_gamma(opinion))

0.8060260297658612
0.8241309632062263
0.8666907699474498



#### P, R, F1
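
As a rough illustration of how span-level precision, recall, and F1 could be computed, the sketch below treats one annotator as the reference and counts a span as matched only when category, start, and end are all identical. Exact matching is an assumption made here; how `curiam.agreement` scores spans is not shown in this notebook. `span_prf` is a hypothetical helper, and the categories "A" and "B" are placeholders.

```python
def span_prf(spans_ref, spans_other):
    """Exact-match precision, recall, and F1 over (category, start, end) spans."""
    ref, other = set(spans_ref), set(spans_other)
    matched = len(ref & other)
    precision = matched / len(other) if other else 0.0
    recall = matched / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy spans: one exact match, one boundary disagreement, one span only in the reference
toy_ref = [("A", 0, 2), ("B", 5, 6), ("A", 9, 9)]
toy_other = [("A", 0, 2), ("B", 5, 7)]
precision, recall, f1 = span_prf(toy_ref, toy_other)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Swapping which annotator is the reference swaps precision and recall but leaves F1 unchanged, so F1 is the more natural figure to report when neither annotator is a gold standard.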

### Agreement By Category
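
A per-category breakdown can follow the same pattern: group each annotator's spans by category and score each group separately. The sketch below is one possible approach under the assumption of exact-boundary matching; `f1_by_category` is a hypothetical helper, and the categories "A" and "B" are placeholders rather than labels from the annotation schema.

```python
from collections import defaultdict


def f1_by_category(spans_a, spans_b):
    """Exact-match F1 per category for two lists of (category, start, end) spans."""
    by_cat_a, by_cat_b = defaultdict(set), defaultdict(set)
    for category, start, end in spans_a:
        by_cat_a[category].add((start, end))
    for category, start, end in spans_b:
        by_cat_b[category].add((start, end))

    scores = {}
    for category in sorted(set(by_cat_a) | set(by_cat_b)):
        a, b = by_cat_a[category], by_cat_b[category]
        matched = len(a & b)
        total = len(a) + len(b)
        # F1 = 2 * matches / (spans from annotator A + spans from annotator B)
        scores[category] = 2 * matched / total if total else 0.0
    return scores


toy_a = [("A", 0, 2), ("B", 5, 6), ("A", 9, 9)]
toy_b = [("A", 0, 2), ("B", 5, 7)]
category_scores = f1_by_category(toy_a, toy_b)
print(category_scores)
```

Categories with very few spans will produce unstable scores, so it may help to report span counts alongside each per-category F1.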

## Qualitative Analysis