Find file History
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Failed to load latest commit information.


This directory contains the pluraSR200 dataset. If you use this dataset, please cite: Sen et al., "Turkers, Scholars, “Arafat” and “Peace”: Cultural Communities and Algorithmic Gold Standards", CSCW 2015. [ pdf ]. The paper also details the survey methodology used for the dataset.

General concept pairs

The "general" datasets contain general concept-pairs from WordSim 353 (50 questions). Responses from virtual workers on Amazon's Mechanical Turk ("turkers") and scholars are separated into different files. Each file contains three four-delimited fields: concept 1, concept 2, the mean response, and the number of subjects. The responses are on a one to five point scale.

The information below lists the filenames, along with statistics describing the number of responses for each concept pair.

    general-turker.txt n=50 (sum=461 min=6, mean=9.22, median=9.00, max=13)
    general-scholar.txt n=50 (sum=887 min=12, mean=17.74, median=18.00, max=27)

Domain-specific concept pairs

The "specific" dataset contains responses for concept-pairs from the fields of history, psychology, and biology. There are 50 pairs in each field, for a total of 150 pairs. The files below contain responses (in the same format as above) from turkers, scholars who are experts in the field ("scholar-expert") and other scholars.

    specific-turker.txt n=150 (sum=2218 min=7, mean=14.79, median=15.00, max=20)
    specific-scholar.txt n=150 (sum=3086 min=10, mean=20.57, median=21.00, max=28)
    specific-scholar-expert.txt n=150 (sum=1295 min=5, mean=8.63, median=9.00, max=13)

In addition, we have broken out the scholar-expert responses by subject:

    specific-history.txt n=50 (sum=434 min=5, mean=8.68, median=8.00, max=13)
    specific-biology.txt n=50 (sum=512 min=8, mean=10.24, median=10.00, max=13)
    specific-psychology.txt n=50 (sum=349 min=5, mean=6.98, median=7.00, max=10)