# Task 1 - Congruency

Do we agree on whether there are any errors in the descriptions? This notebook provides an analysis.

In [2]:
import json
from sklearn.metrics import cohen_kappa_score, accuracy_score

# display and HTML are needed to show the images inside the notebook.
from IPython.display import display, HTML

## Loading the data

First load the data. This code also contains a function to display images with their generated captions, so that we can inspect them from inside the notebook.

In [3]:
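# Map each line number (as a string key) to an image filename or a description.
# strip() removes the trailing newline so filenames can be used in paths.
with open('./val_images.txt') as f:
    images = {str(i): line.strip() for i, line in enumerate(f)}

with open('./satyrid/19-sept-2016-error-analysis.dev.txt') as f:
    descriptions = {str(i): line.strip() for i, line in enumerate(f)}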

def display_image(i):
    "Display the image."
    image = images[i]
    description = descriptions[i]
    image_path = './static/images/' + image
    html = """<center>
              {} <b>image</b>: {}<br/>
              <img src="{}" width="300px"><br/>
              {}<br/>
              </center>""".format(i, image, image_path, description)
    display(HTML(html))

def load_json(filename):
    "Load and return the JSON file."
    with open(filename) as f:
        return json.load(f)

def get_annotation_lists(a, b):
    "Get two aligned lists of labels for annotation dicts A and B. (A should be the shortest one.)"
    a_list = []
    b_list = []
    ids = []
    for key, value in a.items():
        # Skip items that are missing from B, so both lists stay aligned.
        if key not in b:
            continue
        ids.append(key)
        a_list.append(value)
        b_list.append(b[key])
    return a_list, b_list, ids

congruency_desmond = load_json('./annotations_desmond/congruency_data.json')
incongruent_desmond = load_json('./annotations_desmond/incongruent_categorized.json')

# This image was in the guidelines. 
# Desmond annotated an additional image so as to keep the annotation honest.
del congruency_desmond["82"]
del incongruent_desmond["82"]


congruency_emiel = load_json('./annotations_emiel/congruency_data_final.json') 
incongruent_emiel = load_json('./annotations_emiel/incongruent_final.json') 
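
To make the alignment step concrete, here is a minimal sketch on toy data. The helper `align` is a hypothetical stand-in that mirrors what `get_annotation_lists` does (keep only the keys present in both dicts), and the toy labels are invented for illustration; they do not come from the annotation files.

```python
def align(a, b):
    "Hypothetical stand-in for get_annotation_lists: keep keys present in both dicts."
    ids = [key for key in a if key in b]
    return [a[key] for key in ids], [b[key] for key in ids], ids

# Invented toy annotations (not the real data).
toy_a = {"0": "congruent", "1": "incongruent", "2": "incongruent"}
toy_b = {"0": "congruent", "2": "congruent", "3": "incongruent"}

a_list, b_list, ids = align(toy_a, toy_b)

# Items "1" and "3" were labelled by only one annotator, so they are dropped.
print(ids)     # ['0', '2']
print(a_list)  # ['congruent', 'incongruent']
print(b_list)  # ['congruent', 'congruent']
```

Only items labelled by both annotators enter the agreement scores, which is why the image that was removed from Desmond's annotations above does not affect the comparison.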

## Compute inter-annotator agreement score

Below we compute the inter-annotator agreement score (Cohen's kappa) and the accuracy (percent overlap). The kappa score (0.67) indicates substantial agreement, and the accuracy (0.91) is very high. Since we double-annotated 100 examples, an accuracy of 0.91 means that the annotators disagree on 9 descriptions.

In [4]:
labels_desmond, labels_emiel, ids = get_annotation_lists(congruency_desmond, congruency_emiel)

kappa = cohen_kappa_score(labels_desmond, labels_emiel)
accuracy = accuracy_score(labels_desmond, labels_emiel)

print("Kappa score of:", kappa, "with an accuracy of:", accuracy)

Kappa score of: 0.674855491329 with an accuracy of: 0.91


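The kappa score is much lower than the accuracy because kappa corrects for chance agreement, and chance agreement is high when the majority class (incongruent) strongly outnumbers the minority class. Here are the absolute numbers: descriptions marked incongruent by Desmond, by Emiel, and by both.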

In [5]:
num_incongruent_desmond = labels_desmond.count('incongruent')
num_incongruent_emiel = labels_emiel.count('incongruent')
num_both = sum(a == b == 'incongruent' for a, b in zip(labels_desmond, labels_emiel))

print(num_incongruent_desmond, num_incongruent_emiel, num_both)

81 86 79
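
As a sanity check, the kappa score can be rebuilt by hand from these three counts. The sketch below assumes there are exactly two labels (so every description not marked incongruent falls in the other class) and takes the total of 100 items from the text above; the variable names are chosen for illustration.

```python
# Counts reported above; n = 100 is taken from the text.
n = 100
n_desmond = 81  # marked incongruent by Desmond
n_emiel = 86    # marked incongruent by Emiel
n_both = 79     # marked incongruent by both

# Remaining cells of the 2x2 contingency table (assumes two labels only).
only_desmond = n_desmond - n_both                  # incongruent for Desmond only
only_emiel = n_emiel - n_both                      # incongruent for Emiel only
neither = n - n_both - only_desmond - only_emiel   # other label for both

# Observed agreement: fraction of items with identical labels.
observed = (n_both + neither) / n

# Expected chance agreement, from each annotator's label frequencies.
expected = (n_desmond / n) * (n_emiel / n) + ((n - n_desmond) / n) * ((n - n_emiel) / n)

kappa = (observed - expected) / (1 - expected)
print(round(observed, 2), round(expected, 4), round(kappa, 4))  # 0.91 0.7232 0.6749
```

With both annotators choosing incongruent more than 80% of the time, about 72% agreement is expected by chance alone, so an observed agreement of 91% corresponds to a kappa of roughly 0.67. The 9 disagreements split into 2 where only Desmond chose incongruent and 7 where only Emiel did.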


In [6]:
# Show every description on which the annotators disagree, together with the
# reason given by the annotator who marked it incongruent.
for judgment_a, judgment_b, i in zip(*get_annotation_lists(congruency_desmond, congruency_emiel)):
    if judgment_a != judgment_b:
        display_image(i)
        display(HTML('<center>D: {} - E: {}</center>'.format(judgment_a, judgment_b)))
        if judgment_a == 'incongruent':
            display(HTML('<center>Reason for incongruent (D): {}</center>'.format(incongruent_desmond[i])))
        else:
            display(HTML('<center>Reason for incongruent (E): {}</center>'.format(incongruent_emiel[i])))