# Agreement Study

We randomly selected three cases of medium length (70-150 sentences) for double annotation to assess inter-annotator agreement under the revised annotation schema and guidelines. Michael and Nathan each annotated all three cases.

This notebook covers:

- Some high-level stats about the annotations and disagreements
- Inter-annotator agreement (IAA) metrics, including F1 and Gamma
- Qualitative analysis of the disagreements

Cases for agreement study:

- 12625853_mixed_alito
- 12628561_ootc_sotomayor
- 12625931_dissent_thomas

## Setup


In [1]:
# Allows for seamless use of updated src

%load_ext autoreload
%autoreload 2

# Switch to top of curiam directory for easier paths
%cd ../../..


/home/mkranzlein/michael/dev/curiam


In [3]:
import os

from curiam import agreement

from curiam.inception import tsv_processing

## Summary Stats


In [178]:
agreement_folder = "data/full_scale/agreement_study"

# `opinions` is a list of opinions; each opinion is a list of sentences; each sentence is a list of tokens.
# E.g., opinions[0][0][0] is the 0-th token of the 0-th sentence of the 0-th opinion in the agreement study.
# sorted() keeps both annotators' files in the same order (assuming matching filenames in both folders),
# since os.listdir() returns entries in arbitrary order.
opinions = [tsv_processing.process_opinion_file(f"{agreement_folder}/michael/{filename}")
            for filename in sorted(os.listdir(f"{agreement_folder}/michael"))
            if filename.endswith(".tsv")]

opinions_n = [tsv_processing.process_opinion_file(f"{agreement_folder}/nathan/{filename}")
              for filename in sorted(os.listdir(f"{agreement_folder}/nathan"))
              if filename.endswith(".tsv")]

# Append Nathan's annotation as the 4th column of each token.
# Each token then has the format: [sentence_num, tok_str, michael_annotation, nathan_annotation]
for i, opinion in enumerate(opinions):
    for j, sentence in enumerate(opinion):
        for k, token in enumerate(sentence):
            token.append(opinions_n[i][j][k][2])
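
The merge above relies on both annotators' files having identical sentence and token segmentation. A minimal sketch of a sanity check that could be run before merging is shown below. The helper name `find_misalignments` and the sample data are hypothetical; the only assumption taken from this notebook is that each token is a list whose second element is the token string.

```python
def find_misalignments(opinions_a, opinions_b):
    """Return (opinion, sentence, token) index triples where the two inputs diverge.

    A None in the triple means the mismatch is a length difference at that level.
    """
    problems = []
    if len(opinions_a) != len(opinions_b):
        problems.append((None, None, None))
    for i, (opinion_a, opinion_b) in enumerate(zip(opinions_a, opinions_b)):
        if len(opinion_a) != len(opinion_b):
            problems.append((i, None, None))
            continue
        for j, (sentence_a, sentence_b) in enumerate(zip(opinion_a, opinion_b)):
            if len(sentence_a) != len(sentence_b):
                problems.append((i, j, None))
                continue
            for k, (token_a, token_b) in enumerate(zip(sentence_a, sentence_b)):
                # Index 1 holds the token string in the assumed token format
                if token_a[1] != token_b[1]:
                    problems.append((i, j, k))
    return problems


# Hypothetical sample: one opinion with one two-token sentence per annotator
sample_a = [[[["1", "The", "_"], ["1", "Court", "Legal Source"]]]]
sample_b = [[[["1", "The", "_"], ["1", "court", "_"]]]]
misaligned = find_misalignments(sample_a, sample_b)
print(misaligned)
```

An empty result would suggest that indexing one annotator's tokens with the other's `i, j, k` positions is safe.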


### How many sentences?


In [30]:
sum(len(opinion) for opinion in opinions)

332


### How many tokens?


In [9]:
token_total = sum([len(sentence) for opinion in opinions for sentence in opinion])
print(token_total)

9109


### How many tokens received at least one label?

In [26]:
def get_token_coverage(sentence, annotation_column):
    # Count tokens whose annotation column is not the empty marker "_"
    return sum(1 for token in sentence if token[annotation_column] != "_")

coverage_m = sum([get_token_coverage(sentence, 2) for opinion in opinions for sentence in opinion])
coverage_n = sum([get_token_coverage(sentence, 3) for opinion in opinions for sentence in opinion])

print("Tokens with at least one label:")
print(f"Michael: {coverage_m} ({coverage_m/token_total*100:.2f}%)")
print(f"Nathan: {coverage_n} ({coverage_n/token_total*100:.2f}%)")

Tokens with at least one label:
Michael: 4616 (50.68%)
Nathan: 4275 (46.93%)



### How many spans did each annotator annotate?
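
One way this count could be computed is sketched below. It assumes spans arrive as `[label, start, end]` lists, the shape of the `tsv_processing.get_annotations` output shown in the Gamma section; the helper `count_spans` and the sample data are hypothetical and not part of the `curiam` package.

```python
from collections import Counter


def count_spans(sentence_annotations):
    """Count spans per label across a list of per-sentence span lists."""
    counts = Counter()
    for spans in sentence_annotations:
        for label, _start, _end in spans:
            counts[label] += 1
    return counts


# Hypothetical sample: spans for a single sentence in [label, start, end] format
sample = [[["Legal Source", 1, 1],
           ["Metalinguistic Cue", 5, 5],
           ["Legal Source", 10, 14]]]

span_counts = count_spans(sample)
total_spans = sum(span_counts.values())
print(total_spans, dict(span_counts))
```

Running the same helper once per annotator column would give per-annotator totals and per-label breakdowns.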



## Agreement

### Agreement Overall



#### Gamma


In [192]:
# Nathan's spans (column 3) for sentence 60 of the first opinion
tsv_processing.get_annotations(opinions[0][60], 3)

[['Legal Source', 1, 1],
 ['Metalinguistic Cue', 5, 5],
 ['Legal Source', 10, 14],
 ['Direct Quote', 19, 60],
 ['Direct Quote', 25, 36],
 ['Direct Quote', 42, 59]]


#### P, R, F1
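
A minimal sketch of exact-match span precision, recall, and F1 follows, treating one annotator as the reference and the other as the candidate. It assumes spans are `[label, start, end]` lists and that a match requires the same label and the same boundaries; the function name `span_prf` and the sample spans are hypothetical, and the study may well use a different matching criterion.

```python
def span_prf(reference_spans, candidate_spans):
    """Exact-match precision, recall, and F1 over (label, start, end) spans."""
    reference = {tuple(span) for span in reference_spans}
    candidate = {tuple(span) for span in candidate_spans}
    matches = len(reference & candidate)
    precision = matches / len(candidate) if candidate else 0.0
    recall = matches / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical spans for one sentence from two annotators
reference = [["Legal Source", 1, 1], ["Legal Source", 10, 14], ["Direct Quote", 19, 60]]
candidate = [["Legal Source", 1, 1], ["Direct Quote", 19, 59]]

precision, recall, f1 = span_prf(reference, candidate)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Swapping which annotator is the reference swaps precision and recall but leaves F1 unchanged, so F1 works as a symmetric agreement score between two annotators.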

### Agreement By Category
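
A per-category breakdown could be sketched by grouping spans by label and scoring each label separately. The example below assumes the `[label, start, end]` span format and exact boundary matching; `f1_by_label` and the sample spans are hypothetical.

```python
from collections import defaultdict


def f1_by_label(spans_a, spans_b):
    """Exact-match F1 per label between two annotators' span lists."""
    by_label_a = defaultdict(set)
    by_label_b = defaultdict(set)
    for label, start, end in spans_a:
        by_label_a[label].add((start, end))
    for label, start, end in spans_b:
        by_label_b[label].add((start, end))

    scores = {}
    for label in sorted(set(by_label_a) | set(by_label_b)):
        a, b = by_label_a[label], by_label_b[label]
        # Exact-match F1 is equivalent to 2 * |A & B| / (|A| + |B|)
        scores[label] = 2 * len(a & b) / (len(a) + len(b))
    return scores


# Hypothetical spans from two annotators
spans_a = [["Legal Source", 1, 1], ["Legal Source", 10, 14], ["Direct Quote", 19, 60]]
spans_b = [["Legal Source", 1, 1], ["Direct Quote", 19, 59], ["Metalinguistic Cue", 5, 5]]

scores = f1_by_label(spans_a, spans_b)
for label, score in scores.items():
    print(f"{label}: {score:.2f}")
```

A label used by only one annotator scores 0 here, which makes categories that one annotator never applies easy to spot.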

## Qualitative Analysis