In [1]:
%matplotlib inline
import datetime
import json
import pathlib
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import toolz
from IPython.display import Image, HTML

# Measuring Specificity

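We use paired comparisons to measure how specific and accurate a caption is. For a target image $x$ and a fixed set of imposter images $Y$, each comparison pairs $x$ with one imposter $y \in Y$, and an annotator picks the image that the caption describes. The **specific accuracy** of a caption is the fraction of those comparisons in which the annotator chose $x$.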
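
Written out as a formula, the definition looks roughly as follows. The symbols $\mathrm{SA}$, $A$ (the set of annotators), $c$ (the caption), and $\mathrm{pick}_a$ (the image annotator $a$ chose) are our own shorthand for this note, not established notation:

$$
\mathrm{SA}(c) = \frac{1}{|A|\,|Y|} \sum_{a \in A} \sum_{y \in Y} \mathbb{1}\left[\mathrm{pick}_a(c, x, y) = x\right]
$$

A caption that applies only to $x$ should score close to 1. A caption that applies equally well to $x$ and to every imposter should score close to 0.5, since the annotator then has to guess.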

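We start with a simulated dataset of paired comparisons: four example captions, one target image, three imposter images, and, for each caption, the list of images it applies to.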

In [2]:
captions = [
    "exactly how are both the dog and the person going to fit on that skateboard?",
    "the dark haired dog is trying to ride on the skateboard.",
    "a person in shorts and a black dog both have one foot on a skateboard.",
    "a dog with a black head and black legs and ears standing up has one black paw on a black skateboard with white wheels and a guy with black and white shoes and white socks has one foot on the skateboard also and there are bikes and other people in the background"
]

In [3]:
alternatives = 'dog-and-guy-on-skateboard just-dog-on-skateboard guy-on-skateboard-holding-dog dog-and-guy-next-to-skateboard'.split()
target = alternatives[0]
imposters = alternatives[1:]
applies_to = [
    'dog-and-guy-on-skateboard dog-and-guy-next-to-skateboard'.split(),
    'just-dog-on-skateboard'.split(),
    'dog-and-guy-on-skateboard'.split(),
    'dog-and-guy-on-skateboard just-dog-on-skateboard guy-on-skateboard-holding-dog dog-and-guy-next-to-skateboard'.split()
]
applies_to = {cap: tgts for cap, tgts in zip(captions, applies_to)}
applies_to

{'exactly how are both the dog and the person going to fit on that skateboard?': ['dog-and-guy-on-skateboard',
  'dog-and-guy-next-to-skateboard'],
 'the dark haired dog is trying to ride on the skateboard.': ['just-dog-on-skateboard'],
 'a person in shorts and a black dog both have one foot on a skateboard.': ['dog-and-guy-on-skateboard'],
 'a dog with a black head and black legs and ears standing up has one black paw on a black skateboard with white wheels and a guy with black and white shoes and white socks has one foot on the skateboard also and there are bikes and other people in the background': ['dog-and-guy-on-skateboard',
  'just-dog-on-skateboard',
  'guy-on-skateboard-holding-dog',
  'dog-and-guy-next-to-skateboard']}

In [4]:
import random
random.seed(0)
pairs = [[target, imposter] for imposter in imposters]
for pair in pairs:
    random.shuffle(pair)
pairs

[['dog-and-guy-on-skateboard', 'just-dog-on-skateboard'],
 ['dog-and-guy-on-skateboard', 'guy-on-skateboard-holding-dog'],
 ['dog-and-guy-next-to-skateboard', 'dog-and-guy-on-skateboard']]

In [5]:
def fake_answer_pairs_for_caption(applies, pairs):
    outcomes = []
    for a, b in pairs:
        choices = []
        if a in applies:
            choices.append(0)
        if b in applies:
            choices.append(1)
        if len(choices) == 0:
            choices = [0, 1]
        outcomes.append(random.choice(choices))
    return outcomes
fake_answer_pairs_for_caption(applies_to[captions[0]], pairs)

[0, 0, 1]

In [6]:
fake_comparisons_data = []
for caption in captions:
    for annotator in range(5):
        for pair, outcome in zip(pairs, fake_answer_pairs_for_caption(applies_to[caption], pairs)):
            picked = pair[outcome]
            fake_comparisons_data.append(dict(
                caption=caption,
                annotator=annotator,
                pair=pair,
                picked=picked))

In [7]:
data = pd.DataFrame(fake_comparisons_data)
len(data)

60

In [8]:
# A comparison is correct when the annotator picked the target image.
data['picked_correct'] = data['picked'] == target
data.groupby('caption').picked_correct.mean().sort_values()

caption
a dog with a black head and black legs and ears standing up has one black paw on a black skateboard with white wheels and a guy with black and white shoes and white socks has one foot on the skateboard also and there are bikes and other people in the background    0.333333
the dark haired dog is trying to ride on the skateboard.                                                                                                                                                                                                                 0.400000
exactly how are both the dog and the person going to fit on that skateboard?                                                                                                                                                                                             0.933333
a person in shorts and a black dog both have one foot on a skateboard.                                                                                                    
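
The per-caption computation above can be wrapped in a small helper so that the target is passed in explicitly and the number of comparisons behind each score is visible. This is a minimal sketch: the function name `specific_accuracy`, the column name `n_comparisons`, and the toy data are all hypothetical and chosen only for illustration.

```python
import pandas as pd


def specific_accuracy(comparisons: pd.DataFrame, target: str) -> pd.DataFrame:
    """Per-caption fraction of comparisons in which `target` was picked.

    Assumes `comparisons` has a 'caption' column and a 'picked' column,
    mirroring the layout of the `data` frame built above.
    """
    # 1.0 where the annotator picked the target, 0.0 otherwise.
    picked_target = comparisons["picked"].eq(target).astype(float)
    return (
        picked_target.groupby(comparisons["caption"])
        .agg(specific_accuracy="mean", n_comparisons="size")
        .sort_values("specific_accuracy")
    )


# Toy data (hypothetical): two captions, three comparisons each.
toy = pd.DataFrame({
    "caption": ["cap A", "cap A", "cap A", "cap B", "cap B", "cap B"],
    "picked": ["target-img", "target-img", "imposter-1",
               "imposter-1", "imposter-2", "target-img"],
})
toy_summary = specific_accuracy(toy, target="target-img")
print(toy_summary)
```

Reporting the comparison count next to each score makes it easier to see how little data a single fraction rests on, which matters here because each caption has only 15 comparisons (5 annotators, 3 pairs).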

# Final analyses

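We test for a main effect of writing condition on outcome specificity using an aligned rank transform (ART) ANOVA with participant as a random effect. The `specificity` values in the cells below are drawn with `np.random.randn`, so the output illustrates the analysis procedure only.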

In [9]:
%load_ext rpy2.ipython

In [14]:
results = pd.DataFrame([
    dict(participant=participant, condition=condition)
    for participant in 'abc def ghi'.split() for condition in 'general specific norecs'.split()
])
results['participant'] = results['participant'].astype('category')
results['condition'] = results['condition'].astype('category')
results['specificity'] = np.random.randn(len(results))

In [15]:
%%R
#install.packages("ARTool")
library(ARTool)

In [16]:
%%R -i results
summary(results)

    condition participant  specificity     
 general :3   abc:3       Min.   :-2.3097  
 norecs  :3   def:3       1st Qu.:-0.9336  
 specific:3   ghi:3       Median :-0.2099  
                          Mean   :-0.3794  
                          3rd Qu.: 0.6537  
                          Max.   : 1.0500  


In [17]:
%%R -i results
transformed <- art(specificity ~ condition + (1|participant), data=results)
summary(transformed)
anova(transformed)

Analysis of Variance of Aligned Rank Transformed Data

Table Type: Analysis of Deviance Table (Type III Wald F tests with Kenward-Roger df) 
Model: Mixed Effects (lmer)
Response: art(specificity)

             F Df Df.res    Pr(>F)   
1 condition 31  2      4 0.0036731 **
---
Signif. codes:   0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
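
To make the transform less of a black box, here is a simplified sketch of the align-and-rank steps for a single fixed factor. It is not ARTool's implementation: the function name `align_and_rank_one_factor` and the toy data are hypothetical, and the sketch assumes that alignment uses only the fixed factor and ignores the participant term. It also stops before the mixed-effects ANOVA that `anova(transformed)` runs on the ranks.

```python
import pandas as pd


def align_and_rank_one_factor(df: pd.DataFrame, response: str, factor: str) -> pd.Series:
    """Aligned ranks for the main effect of one fixed factor (illustrative only)."""
    grand_mean = df[response].mean()
    cell_mean = df.groupby(factor)[response].transform("mean")
    residual = df[response] - cell_mean   # remove the fitted fixed effect
    effect = cell_mean - grand_mean       # estimated main effect of the factor
    aligned = residual + effect           # add back only the effect of interest
    return aligned.rank(method="average") # ties receive the average rank


# Toy data (hypothetical): two participants, three conditions each.
toy = pd.DataFrame({
    "participant": ["p1"] * 3 + ["p2"] * 3,
    "condition": ["general", "specific", "norecs"] * 2,
    "specificity": [0.2, 1.5, -0.7, 0.4, 1.1, -1.3],
})
toy["aligned_rank"] = align_and_rank_one_factor(toy, "specificity", "condition")
print(toy)
```

With only one fixed factor, the aligned response is just the raw response shifted by the grand mean, so the aligned ranks match the ranks of the raw values. The alignment step starts to matter once there are several factors and interactions to separate.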
