This notebook normalizes human-judged ground truth from various originality and creativity scoring studies.

This is an assortment of studies, with different demographics, goals, and test setups. It is most appropriate for supervised learning of automated scoring, where we're not necessarily trying to learn about the participants but about the *human judges* - how they interpret the originality scoring task in general.
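
As a rough illustration of what a normalized record carries: the cleaned tables in this notebook include `target`, `rater_count`, and `rating_std` columns for each response. The sketch below assumes `target` is approximately the mean of the available judge ratings and `rating_std` is their sample standard deviation. That reading is inferred from the sample rows, not confirmed against the `prep_general` source, and it ignores any rescaling the pipeline may apply. `summarize_ratings` is a hypothetical helper.

```python
from statistics import mean, stdev


def summarize_ratings(ratings):
    """Collapse one response's judge ratings into summary fields.

    `ratings` may contain None for judges who did not rate the response.
    """
    present = [r for r in ratings if r is not None]
    if not present:
        return None  # unrated item, nothing to summarize
    return {
        "target": mean(present),
        "rater_count": len(present),
        # Sample standard deviation needs at least two ratings;
        # returning 0.0 for a single rating is a choice made for this sketch.
        "rating_std": stdev(present) if len(present) > 1 else 0.0,
    }


summary = summarize_ratings([2, 2, 3, 3])
print(summary)
```

Four ratings of 2, 2, 3, 3 give a mean of 2.5 and a sample standard deviation of about 0.577, which is consistent with the `target` and `rating_std` pairs that appear in several sample rows.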

In [None]:
%load_ext autoreload
%autoreload 2

In [3]:
from ocsai.data import download_from_description, prep_general
import pandas as pd

## Dumas et al 2020

In [4]:
desc = {
    "name": "dod20",
    "test_type": "uses",
    "meta": {
        "inline": "Dumas et al 2020",
        "download": {"url": "https://osf.io/download/u3yv4/", "extension": "csv"}
    },
    "null_marker": "!!!",
    "column_mappings": {},
    "range": [1, 5],
    "language": "eng",
}

fname = download_from_description(desc, '../data/raw')
df = pd.read_csv(fname[0], index_col=0)
cleaned = prep_general(df, **desc, save_dir='../data/datasets')
cleaned.sample(2)

### Loading *Dumas et al 2020*

Replacing !!! with NaN in response column


- name: dod20
- no_of_prompts: 10
- no_of_participants: 92
- no_of_data_points: 5490
- prompts: ['book', 'bottle', 'brick', 'fork', 'pants', 'rope', 'shoe', 'shovel', 'table', 'tire']
- ICC2k: 0.85
- ICC2k_CI: 0.79-0.88
- ICC3k: 0.87
- rater_cols: ['rater1', 'rater2', 'rater3', 'rater4']
- no_of_raters: 4




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
3008,uses,dod20,What is a surprising use for a BOTTLE?,bottle,utensil holder,dod20_bottle-74d036,3.0,dod2041,3,eng,4,0.0
442,uses,dod20,What is a surprising use for a TIRE?,tire,to make a swing set from a tree,dod20_tire-ee61be,2.5,dod202,1,eng,4,0.57735


## Silvia et al 2009

In [5]:
desc = {
    "name": "snbmo09",
    "test_type": "uses",
    "meta": {
        "inline": "Silvia et al. 2009",
        "citation": "Silvia, P. J., Nusbaum, E. C., Berg, C., Martin, C., & O'Connor, A. (2009). Openness to experience, plasticity, and creativity: Exploring lower-order, high-order, and interactive effects. Journal of Research in Personality, 43(6), 1087–1090. https://doi.org/10.1016/j.jrp.2009.04.015",
        "download": {"url": "https://osf.io/download/qdrv8/", "ext": "csv"}
    },
    "column_mappings": {
        "subject":"participant",
        "response_order":"response_num"
        },
    "range": [1, 5],
    "language": "eng",
}
fname = download_from_description(desc, '../data/raw')
df = pd.read_csv(fname[0])
df['prompt'] = df.task.apply(lambda x: x.split('_')[-1])
cleaned = prep_general(df, **desc, save_dir='../data/datasets')
cleaned.sample(2)

### Loading *Silvia et al. 2009*

Silvia, P. J., Nusbaum, E. C., Berg, C., Martin, C., & O'Connor, A. (2009). Openness to experience, plasticity, and creativity: Exploring lower-order, high-order, and interactive effects. Journal of Research in Personality, 43(6), 1087–1090. https://doi.org/10.1016/j.jrp.2009.04.015

- Renaming columns {'subject': 'participant', 'response_order': 'response_num'}

Dropping 10 unrated items


- name: snbmo09
- no_of_prompts: 3
- no_of_participants: 202
- no_of_data_points: 4099
- prompts: ['brick', 'knife', 'box']
- ICC2k: 0.69
- ICC2k_CI: 0.57-0.77
- ICC3k: 0.76
- rater_cols: ['rater_1', 'rater_2', 'rater_3', 'rater_4']
- no_of_raters: 4




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
2063,uses,snbmo09,What is a surprising use for a KNIFE?,knife,cut meat,snbmo09_2_knife-c5e94e,1.0,snbmo09103,2,eng,4,0.0
3680,uses,snbmo09,What is a surprising use for a KNIFE?,knife,hammer,snbmo09_2_knife-ec0c55,2.5,snbmo09179,3,eng,4,0.57735


## Hass 2017

This study looked at uses for *bottle* and *brick*. There were 54 participants after data cleaning.

Rating was on a 5-point scale. For verification, the reported inter-rater reliability (ICC2k) was 0.80 for brick and 0.78 for bottle, which is about what we see below.
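
For reference, ICC2k and ICC3k can be computed from a complete responses-by-raters matrix using two-way ANOVA mean squares (the ICC(2,k) and ICC(3,k) forms). The sketch below is a pure-Python illustration that assumes no missing ratings; `icc_2k_3k` is a hypothetical helper, not the routine that `prep_general` uses.

```python
def icc_2k_3k(matrix):
    """Return (ICC2k, ICC3k) for a list of rows.

    Each row is one rated response, each column is one rater.
    Assumes every cell is filled.
    """
    n = len(matrix)       # number of rated responses
    k = len(matrix[0])    # number of raters
    grand = sum(sum(row) for row in matrix) / (n * k)
    row_means = [sum(row) / k for row in matrix]
    col_means = [sum(row[j] for row in matrix) / n for j in range(k)]

    # Sums of squares for responses (rows), raters (columns), and residual
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in matrix for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    icc2k = (ms_rows - ms_err) / (ms_rows + (ms_cols - ms_err) / n)
    icc3k = (ms_rows - ms_err) / ms_rows
    return icc2k, icc3k


# Rater 2 always scores one point higher than rater 1
icc2k, icc3k = icc_2k_3k([[1, 2], [2, 3], [3, 4]])
print(round(icc2k, 2), round(icc3k, 2))
```

In this toy case the raters rank responses identically but differ by a constant offset. ICC3k (consistency) ignores the offset, while ICC2k (absolute agreement) is penalized by it, which is consistent with ICC3k being higher than ICC2k in each dataset summary shown in this notebook.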

The rating data was stoplisted, so I need to reconstruct the original responses here.
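
The reconstruction idea is to apply a stoplisting function to each raw response, then use the stoplisted text together with the prompt as the join key into the ratings. A toy sketch follows. The stopword set, the sample responses, and the ratings are all made up for illustration; the study's actual stoplist is not known.

```python
STOPS = {"a", "an", "the", "to", "it", "as", "for", "use"}  # illustrative only


def stoplist(text):
    """Lowercase and drop stopwords, keeping word order."""
    return " ".join(w for w in text.lower().split() if w not in STOPS)


# Made-up ratings keyed by (prompt, stoplisted response),
# mimicking rating data that only contains the stoplisted text
ratings = {
    ("brick", "build house"): [2, 1, 2],
    ("bottle", "hold water"): [1, 1, 1],
}

# Made-up raw responses as participants might have typed them
raw_responses = [
    ("brick", "To build a house"),
    ("bottle", "use it to hold water"),
    ("bottle", "Make a lamp"),
]

joined = []
for prompt, response in raw_responses:
    key = (prompt, stoplist(response))
    joined.append((prompt, response, ratings.get(key)))  # None if no match

for row in joined:
    print(row)
```

Rows that come back with `None` are the ones worth inspecting when tuning the stoplist, since a mismatch usually means the cleaning function differs from whatever the original study applied.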

In [6]:
desc = {
    "name": 'hass17',
    "test_type": "uses",
    "meta": {
        "inline": "Hass 2017",
        "citation": "Hass, R. W. (2017). Semantic search during divergent thinking. Cognition, 166, 344–357. https://doi.org/10.1016/j.cognition.2017.05.039",
        "url": "https://osf.io/ng598",
        "download": [
            {"url": 'https://osf.io/download/mcykr/', "ext": "xlsx"}, # rater scores
            {"url": 'https://osf.io/download/27bx8/', "ext": "xlsx"},  # responses 1
            {"url": 'https://osf.io/download/rzvyd/', "ext": "xlsx"}  # responses 2
        ],
    },
    "column_mappings": {
        "subject":"participant",
        "response_order":"response_num"
        },
    "range": [1, 5],
    "rater_cols": ['r1','r2','r3'],
    "language": "eng",
}

(ratings_fname, responses_fname, responses2_fname) = download_from_description(desc, '../data/raw')

# custom parsing specific to this dataset
all_ratings = []
for sheet, prompt in [('br_exp1', 'brick'),('br_exp2', 'brick'),('bot_exp1', 'bottle'),('bot_exp2', 'bottle')]:
    data = pd.read_excel(ratings_fname, sheet_name=sheet) #.rename(columns={'subject':'participant','response_order':'response_num'})
    data['prompt'] = prompt
    all_ratings.append(data)
hassratings = pd.concat(all_ratings).rename(columns={'response':'cleaned'})
participants = pd.concat([pd.read_excel(responses_fname), pd.read_excel(responses2_fname)])

# melt original responses to long, reconstruct the cleaned columns, then join with ratings
long_part = participants.melt(id_vars='ID', value_name='response').rename(columns={'ID':'participant'})
long_part = long_part[long_part.variable.str.contains('resp') & ~long_part.variable.str.contains('time')].dropna()
long_part[['prompt', 'response_num']] = long_part.variable.str.split('_', expand=True)

long_part.loc[long_part.prompt.str.contains('resp1'), 'prompt'] = 'bottle'
long_part.loc[long_part.prompt.str.contains('resp2'), 'prompt'] = 'brick'
long_part.sample(10)

import nltk
nltk.download('punkt')
nltk.download('stopwords')

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
stops = stopwords.words('english')
# not sure which list the study used, so just adjust based on testing
stops += ['could']
stops = [w for w in stops if w not in ['can']]
stops = set(stops)

def stop_clean(x):
    x = x.lower()
    x = x.replace('i.e.', 'e') # quirk of the tokenization in original study
    for c in list("/\\'-()"):
        x = x.replace(c, '')
    words = [word for word in word_tokenize(x) if word not in stops]
    return " ".join(words)

long_part['cleaned'] = long_part.response.apply(stop_clean)
hass17 = long_part.merge(hassratings, how='left', on=['prompt', 'cleaned'])

cleaned = prep_general(hass17, **desc, save_dir='../data/datasets')
cleaned.sample(2)

[nltk_data] Downloading package punkt to
[nltk_data]     /Users/peter.organisciak/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/peter.organisciak/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


### Loading *Hass 2017*

Hass, R. W. (2017). Semantic search during divergent thinking. Cognition, 166, 344–357. https://doi.org/10.1016/j.cognition.2017.05.039

- Renaming columns {'subject': 'participant', 'response_order': 'response_num'}

- name: hass17
- no_of_prompts: 2
- no_of_participants: 57
- no_of_data_points: 1093
- prompts: ['bottle', 'brick']
- ICC2k: 0.79
- ICC2k_CI: 0.75-0.82
- ICC3k: 0.8
- rater_cols: ['r1', 'r2', 'r3']
- no_of_raters: 3




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
17,uses,hass17,What is a surprising use for a BOTTLE?,bottle,storage,hass17_bottle-220d00,1.666,hass1712,1,eng,3,1.154701
35,uses,hass17,What is a surprising use for a BOTTLE?,bottle,Break it,hass17_bottle-b776dd,1.333,hass1723,1,eng,3,0.57735


## Silvia et al 2008

This was the order of creativity tasks:

1. Please list all of the creative, unusual uses for a brick that you can think of.
2. Please list all of the creative, unusual instances of things that are round that you can think of.
3. Imagine that people no longer needed to sleep. Please list creative, unusual consequences that would follow.
4. Please list all of the creative, unusual uses for a knife that you can think of.
5. Please list all of the creative, unusual instances of things that will make a noise that you can think of.
6. Imagine that everyone shrank to 12 inches tall. Please list creative, unusual consequences that would follow.

Numbers 1 and 4 are AUT.



In [7]:
# Support .sav files
import pyreadstat
desc = {
    "name": "setal08",
    "meta": {
        "inline": "Silvia et al. 2008",
        "citation": "Silvia, P. J., Winterstein, B. P., Willse, J. T., Barona, C. M., Cram, J. T., Hess, K. I., Martinez, J. L., & Richard, C. A. (2008). Assessing creativity with divergent thinking tasks: Exploring the reliability and validity of new subjective scoring methods. Psychology of Aesthetics, Creativity, and the Arts, 2(2), 68–85. https://doi.org/10.1037/1931-3896.2.2.68",
        "url": "https://osf.io/dh7ey/",
        "download": {"url": "https://files.osf.io/v1/resources/4ketx/providers/osfstorage/5dd70d1f83135e000ec3c242/?zip=",
                    "extension": "zip",
                    "archive_files": ['DT_Responses_PACA_2008_Study_2.sav']
                    }
    },
    "column_mappings": {
        "subject":"participant",
        "order":"response_num"
        },
    "replace_values": {
        "prompt": {
            1: "brick",
            2: "round",
            3: "no sleep",
            4: "knife",
            5: "noise",
            6: "shrank"
        },
    },
    "question_mappings": {
        "brick": "What is a surprising use for a BRICK?",
        "round": "What is a surprising thing that is ROUND?", 
        "no sleep": "What would be a surprising consequence if PEOPLE NEEDED NO SLEEP?", 
        "knife": "What is a surprising use for a KNIFE?",
        "noise": "What is a surprising thing that makes a NOISE?",
        "shrank": "What would be a surprising consequence if EVERYONE SHRANK TO 12 INCHES TALL?"
    },
    "type_mappings": {
        "brick": "uses",
        "round": "uses",
        "no sleep": "consequences",
        "knife": "uses",
        "noise": "instances",
        "shrank": "consequences"
    },
    "range": [1, 5],
    "language": "eng",
}

# Download data
fnames = download_from_description(desc, '../data/raw', extension='zip')

# Some manual cleanup
df, meta = pyreadstat.read_sav(fnames[0])
# all three are mapped from task
for col in ['prompt', 'type', 'question']:
    df[col] = df['task'].astype(int)
df['subject'] = df['subject'].astype(int)
# doublecheck - burczak reported ICC2k as 0.48 for uses
cleaned = prep_general(df, **desc, save_dir='../data/datasets')
cleaned.sample(2)

### Loading *Silvia et al. 2008*

Silvia, P. J., Winterstein, B. P., Willse, J. T., Barona, C. M., Cram, J. T., Hess, K. I., Martinez, J. L., & Richard, C. A. (2008). Assessing creativity with divergent thinking tasks: Exploring the reliability and validity of new subjective scoring methods. Psychology of Aesthetics, Creativity, and the Arts, 2(2), 68–85. https://doi.org/10.1037/1931-3896.2.2.68

- Renaming columns {'subject': 'participant', 'order': 'response_num'}

- Inferring questions {'brick': 'What is a surprising use for a BRICK?', 'round': 'What is a surprising thing that is ROUND?', 'no sleep': 'What would be a surprising consequence if PEOPLE NEEDED NO SLEEP?', 'knife': 'What is a surprising use for a KNIFE?', 'noise': 'What is a surprising thing that makes a NOISE?', 'shrank': 'What would be a surprising consequence if EVERYONE SHRANK TO 12 INCHES TALL?'}

- Inferring types {'brick': 'uses', 'round': 'uses', 'no sleep': 'consequences', 'knife': 'uses', 'noise': 'instances', 'shrank': 'consequences'}

Dropping 37 unrated items


- name: setal08
- no_of_prompts: 6
- no_of_participants: 242
- no_of_data_points: 11490
- prompts: ['brick', 'round', 'no sleep', 'knife', 'noise', 'shrank']
- ICC2k: 0.43
- ICC2k_CI: 0.22-0.57
- ICC3k: 0.54
- rater_cols: ['rater1', 'rater2', 'rater3']
- no_of_raters: 3




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
9882,uses,setal08,What is a surprising thing that is ROUND?,round,cake,setal08_2.0-9ade8b,1.0,setal08207,8.0,eng,3,0.0
515,uses,setal08,What is a surprising use for a BRICK?,brick,weapon,setal08_1.0-cd380a,1.667,setal0813,9.0,eng,3,0.57735


## Hofelich Mohr, Sell, and Lindsay 2016

In [8]:
desc = {
    "name": "hmsl",
    "meta": {
        "inline": "Hofelich Mohr et al. 2016",
        "citation": "Hofelich Mohr, A., Sell, A., & Lindsay, T. (2016). Thinking Inside the Box: Visual Design of the Response Box Affects Creative Divergent Thinking in an Online Survey. Social Science Computer Review, 34(3), 347–359. https://doi.org/10.1177/0894439315588736",
        "url": "https://doi.org/10.1177/0894439315588736",
        "download": {
            "url": "https://conservancy.umn.edu/bitstream/handle/11299/172116/HMSL_CSV%20Data%20Files.zip?sequence=28&isAllowed=y",
            "extension": "zip",
            "archive_files": ['HMSL_Originality_scores_all.csv']   
        }
    },
    "null_marker": 11,
    "column_mappings": {'Item': 'prompt', 'QLogin_1':'participant'},
    "rater_cols": ['J1_Rating','J2_Rating','J3_Rating','J4_Rating'],
    "range": [1, 5],
    "language": "eng",
}

fname = download_from_description(desc, '../data/raw')[0]
df = pd.read_csv(fname)
# Doublecheck ICC2k - burczak paper had icc2k=0.67
cleaned = prep_general(df, **desc, save_dir='../data/datasets')
cleaned.sample(2)

### Loading *Hofelich Mohr et al. 2016*

Hofelich Mohr, A., Sell, A., & Lindsay, T. (2016). Thinking Inside the Box: Visual Design of the Response Box Affects Creative Divergent Thinking in an Online Survey. Social Science Computer Review, 34(3), 347–359. https://doi.org/10.1177/0894439315588736

- Renaming columns {'Item': 'prompt', 'QLogin_1': 'participant'}

Replacing 11 with NaN in response column
Dropping 23 unrated items


- name: hmsl
- no_of_prompts: 2
- no_of_participants: 638
- no_of_data_points: 3843
- prompts: ['paperclip', 'brick']
- ICC2k: 0.67
- ICC2k_CI: 0.53-0.75
- ICC3k: 0.74
- rater_cols: ['J1_Rating', 'J2_Rating', 'J3_Rating', 'J4_Rating']
- no_of_raters: 4




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
2481,uses,hmsl,What is a surprising use for a PAPERCLIP?,paperclip,String together to make a chain,hmsl_paperclip-a33819,1.5,hmslgEOtW7kc,2.0,eng,4,0.57735
2209,uses,hmsl,What is a surprising use for a PAPERCLIP?,paperclip,make a necklace,hmsl_paperclip-5c76b5,1.75025,hmslw4r5kjlA,3.0,eng,4,0.5


## Datasets used by Beaty and Johnson 2021

From SemDis paper:

- Study 1 was a re-analysis of AUT responses from Beaty et al., 2018 to see if ensemble approaches work better. Two tests: `box` and `rope`
   - According to their paper, additive composition gave a slightly negative correlation, while for multiplicative composition 'results revealed a large correlation between latent semantic distance and human ratings:$r=.91$, p<.001'. This uses a model that weighs the factors, but is (I think) tailored to the dataset without held-out data.

- Study 2 was a re-analysis of results from Silvia et al. 2017, also on box and rope
- Study 3 was brick - yet again - via Beaty and Silvia 2012
- Studies 4 and 5 (Heinen and Johnson, 2018) were noun matching, not relevant here
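
To make the additive versus multiplicative distinction concrete: both approaches compose the word vectors of a multi-word response into a single vector (element-wise sum or element-wise product), and semantic distance is then one minus the cosine similarity to the prompt vector. The vectors below are toy values chosen for illustration, not embeddings from any SemDis model, and the helper names are hypothetical.

```python
import math


def compose(vectors, how="multiplicative"):
    """Combine word vectors element-wise by sum or by product."""
    if how not in ("additive", "multiplicative"):
        raise ValueError(f"unknown composition: {how}")
    combined = list(vectors[0])
    for vec in vectors[1:]:
        if how == "additive":
            combined = [a + b for a, b in zip(combined, vec)]
        else:
            combined = [a * b for a, b in zip(combined, vec)]
    return combined


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def semantic_distance(prompt_vec, response_vecs, how="multiplicative"):
    """One minus cosine similarity between prompt and composed response."""
    return 1 - cosine(prompt_vec, compose(response_vecs, how))


prompt = [0.9, 0.1, 0.2]                       # toy vector for a prompt word
response = [[0.8, 0.3, 0.1], [0.1, 0.9, 0.4]]  # toy vectors for a two-word response

for how in ("additive", "multiplicative"):
    print(how, round(semantic_distance(prompt, response, how), 3))
```

The product emphasizes dimensions where all words in the response are jointly strong, whereas the sum lets any single word dominate, so the two compositions can rank the same responses differently.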

In [9]:
desc = {
    "name": "bj21",
    "meta": {
        "inline": "Beaty and Johnson 2021",
        "citation": "Beaty, R. E., & Johnson, D. R. (2021). Automating creativity assessment with SemDis: An open platform for computing semantic distance. Behavior Research Methods, 53(2), 757–780. https://doi.org/10.3758/s13428-020-01453-w",
        "url": "https://doi.org/10.3758/s13428-020-01453-w",
        "download": {
            "url": "https://files.osf.io/v1/resources/gz4fc/providers/osfstorage/5e45b6c73e86a800be6e662e/?zip=",
            "extension": "zip",
            "archive_files": ['Study 1/s1_data_long.xlsx',
                              'Study 2/s2_data_long.xlsx',
                              'Study 3/s3_data_long.xlsx']   
        }
    },
    "column_mappings": {'id':'participant', 'item':'prompt'},
    "range": [1, 5],
    "language": "eng",
}

substudies = [
    {
        "name": "betal18",
        "meta": {
            "inline": "Beaty et al., 2018",
            "citation": "Beaty, R. E., Kenett, Y. N., Christensen, A. P., Rosenberg, M. D., Benedek, M., Chen, Q., Fink, A., Qiu, J., Kwapil, T. R., Kane, M. J., & Silvia, P. J. (2018). Robust prediction of individual creative ability from brain functional connectivity. Proceedings of the National Academy of Sciences, 115(5), 1087–1092. https://doi.org/10.1073/pnas.1713532115"
        }
    },
    {
        "name": "snb17",
        "meta": {
            "inline": "Silvia et al., 2017",
            "citation": "Silvia, P. J., Nusbaum, E. C., & Beaty, R. E. (2017). Old or New? Evaluating the Old/New Scoring Method for Divergent Thinking Tasks. The Journal of Creative Behavior, 51(3), 216–224. https://doi.org/10.1002/jocb.101"
        }
    },
    {
        "name": "bs12",
        "meta": {
            "inline": "Beaty & Silvia, 2012",
            "citation": "Beaty, R. E., & Silvia, P. J. (2012). Why do ideas get more creative across time? An executive interpretation of the serial order effect in divergent thinking tasks. Psychology of Aesthetics, Creativity, and the Arts, 6(4), 309–319. https://doi.org/10.1037/a0029171"
        }
    },

]

fnames = download_from_description(desc, '../data/raw')
# the data comes from past studies, so we'll rename the files
# individually to their original studies
for fname,substudy in zip(fnames, substudies):
    new_desc = desc.copy()
    new_desc.update(substudy)
    df = pd.read_excel(fname)
    cleaned = prep_general(df, **new_desc, save_dir='../data/datasets')
    display(cleaned.sample(2))


### Loading *Beaty et al., 2018*

Beaty, R. E., Kenett, Y. N., Christensen, A. P., Rosenberg, M. D., Benedek, M., Chen, Q., Fink, A., Qiu, J., Kwapil, T. R., Kane, M. J., & Silvia, P. J. (2018). Robust prediction of individual creative ability from brain functional connectivity. Proceedings of the National Academy of Sciences, 115(5), 1087–1092. https://doi.org/10.1073/pnas.1713532115

- Renaming columns {'id': 'participant', 'item': 'prompt'}

- name: betal18
- no_of_prompts: 2
- no_of_participants: 171
- no_of_data_points: 2918
- prompts: ['box', 'rope']
- ICC2k: 0.82
- ICC2k_CI: 0.77-0.85
- ICC3k: 0.84
- rater_cols: ['rater1', 'rater2', 'rater3', 'rater4']
- no_of_raters: 4




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
2127,uses,betal18,What is a surprising use for a ROPE?,rope,handcuffs,betal18_rope-cd23af,1.0,betal182071,,eng,4,0.0
511,uses,betal18,What is a surprising use for a BOX?,box,a table,betal18_box-9c09ed,1.0,betal182054,,eng,4,0.0


### Loading *Silvia et al., 2017*

Silvia, P. J., Nusbaum, E. C., & Beaty, R. E. (2017). Old or New? Evaluating the Old/New Scoring Method for Divergent Thinking Tasks. The Journal of Creative Behavior, 51(3), 216–224. https://doi.org/10.1002/jocb.101

- Renaming columns {'id': 'participant', 'item': 'prompt'}

- name: snb17
- no_of_prompts: 2
- no_of_participants: 142
- no_of_data_points: 2372
- prompts: ['box', 'rope']
- ICC2k: 0.67
- ICC2k_CI: 0.55-0.75
- ICC3k: 0.72
- rater_cols: ['rater1', 'rater2', 'rater3']
- no_of_raters: 3




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
1906,uses,snb17,What is a surprising use for a ROPE?,rope,to tie something with,snb17_rope-e27768,1.0,snb1797,,eng,3,0.0
1304,uses,snb17,What is a surprising use for a ROPE?,rope,learn to lasso,snb17_rope-29cfd2,1.333,snb1715,,eng,3,0.57735


### Loading *Beaty & Silvia, 2012*

Beaty, R. E., & Silvia, P. J. (2012). Why do ideas get more creative across time? An executive interpretation of the serial order effect in divergent thinking tasks. Psychology of Aesthetics, Creativity, and the Arts, 6(4), 309–319. https://doi.org/10.1037/a0029171

- Renaming columns {'id': 'participant', 'item': 'prompt'}

- name: bs12
- no_of_prompts: 1
- no_of_participants: 133
- no_of_data_points: 1807
- prompts: ['brick']
- ICC2k: 0.72
- ICC2k_CI: 0.56-0.8
- ICC3k: 0.78
- rater_cols: ['br_rater1', 'br_rater2', 'br_rater3']
- no_of_raters: 3




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
604,uses,bs12,What is a surprising use for a BRICK?,brick,step,bs12_brick-cd8449,1.0,bs1242,,eng,3,0.0
1107,uses,bs12,What is a surprising use for a BRICK?,brick,Bases for monuments.,bs12_brick-f8d86f,1.0,bs1281,,eng,3,0.0


## MOTES Pilot

MOTES is related to the "Measuring Original Thinking in Elementary Students: A Text-Mining Approach" project (IES #R305A200519). This data is related to a high-stakes test and is limited to research access. If you're a creativity researcher, please reach out to request it from <peter.organisciak@du.edu> and/or <selcuk.acar@unt.edu>.

In [10]:
desc = {
    "name": "motesp",
    "meta": {
        "inline": "Acar et al., 2023",
        "citation": "Acar, S., Dumas, D., Organisciak, P., Berthiaume, K. (2023). Measuring original thinking in elementary school: Development and validation of a computational psychometric approach. Journal of Educational Psychology. http://dx.doi.org/10.13140/RG.2.2.19804.56968",
        "url": "http://dx.doi.org/10.13140/RG.2.2.19804.56968",
        "download": {}
    },
    "rater_cols": ['D', 'K', 'T'],
    "range": [1, 7],
    "replace_values": {
        "prompt": {
            "lightbulb": "light bulb"
        },
        "question": {
            "What is a surprising use for a LIGHTBULB?": "What is a surprising use for a LIGHT BULB?"
        }
    },
    "language": "eng",
}
df = pd.read_csv('../data/raw/motesp_0.csv')
cleaned = prep_general(df, **desc, save_dir='../data/datasets')
cleaned.sample(2)

### Loading *Acar et al., 2023*

Acar, S., Dumas, D., Organisciak, P., Berthiaume, K. (2023). Measuring original thinking in elementary school: Development and validation of a computational psychometric approach. Journal of Educational Psychology. http://dx.doi.org/10.13140/RG.2.2.19804.56968

- name: motesp
- no_of_prompts: 29
- no_of_participants: 35
- no_of_data_points: 963
- prompts: ['backpack', 'ball', 'bottle', 'hat', 'light bulb', 'pencil', 'shoe', 'sock', 'spoon', 'toothbrush', 'big', 'cold', 'fun', 'red', 'smelly', 'soft', 'tasty', 'wet', 'aliens landed', 'kid president', 'rain soda', 'teacher read minds', 'time travel', 'friend phone', 'library', 'playground', 'school bus', 'sleepover', 'teacher talking']
- ICC2k: 0.73
- ICC2k_CI: 0.66-0.78
- ICC3k: 0.75
- rater_cols: ['D', 'K', 'T']
- no_of_raters: 3




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
152,uses,motesp,What is a surprising use for a LIGHT BULB?,light bulb,Different colored light bulbs.,motesp_g1_lightbulb-223e84,2.555333,motesp28JB,,eng,3,0.57735
120,uses,motesp,What is a surprising use for a HAT?,hat,use the hat inside out,motesp_g1_hat-0cc028,2.555333,motesp28JB,,eng,3,0.57735


## MOTES

This is the post-pilot data. As with the pilot data, this dataset is available on request. Please reach out!

In [11]:
desc = {
    "name": "motesf",
    "meta": {
        "inline": "Acar et al., 2023",
        "citation": "Acar, S., Dumas, D., Organisciak, P., Berthiaume, K. (2023). Measuring original thinking in elementary school: Development and validation of a computational psychometric approach. Journal of Educational Psychology. http://dx.doi.org/10.13140/RG.2.2.19804.56968",
        "url": "http://dx.doi.org/10.13140/RG.2.2.19804.56968",
        "download": {}
    },
    "column_mappings": {'ID':'participant'},
    "null_marker": -999,
    "rater_cols": ["Kscore", "Hscore", "Cscore", "Tscore", "Mscore"],
    "replace_values": {
        "prompt": {
            "lightbulb": "light bulb"
        },
        "question": {
            "What is a surprising use for a LIGHTBULB?": "What is a surprising use for a LIGHT BULB?"
        }
    },
    "range": [1, 5], # different scale than motesp pilot
    "language": "eng"
}
# data was already reshaped to long format for previous study
df = pd.read_csv('../data/raw/motesf_0.csv')
cleaned = prep_general(df, **desc, save_dir='../data/datasets')
cleaned.sample(2)

### Loading *Acar et al., 2023*

Acar, S., Dumas, D., Organisciak, P., Berthiaume, K. (2023). Measuring original thinking in elementary school: Development and validation of a computational psychometric approach. Journal of Educational Psychology. http://dx.doi.org/10.13140/RG.2.2.19804.56968

- Renaming columns {'ID': 'participant'}

Replacing -999 with NaN in response column


- name: motesf
- no_of_prompts: 24
- no_of_participants: 386
- no_of_data_points: 8563
- prompts: ['ball', 'sock', 'pencil', 'spoon', 'light bulb', 'hat', 'bottle', 'toothbrush', 'smelly', 'soft', 'red', 'frozen', 'wet', 'huge', 'fun', 'tasty', 'school bus', 'games', 'library', 'lecture', 'phone', 'rain', 'closet', 'lunchroom']
- ICC2k: 0.79
- ICC2k_CI: 0.69-0.85
- ICC3k: 0.84
- rater_cols: ['Kscore', 'Hscore', 'Cscore', 'Tscore', 'Mscore']
- no_of_raters: 5




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
1249,uses,motesf,What is a surprising use for a SPOON?,spoon,put a spoon in a can,motesf_spoon-9faa43,2.1998,motesfb73ce3,7,eng,5,0.447214
8640,completion,motesf,"Complete this sentence in a surprising way: ""W...",lunchroom,I was in trouble because I was throwing food.,motesf_lunchroom-02e142,2.3996,motesf41f1f1,2,eng,5,0.547723


## Multilingual semantic distance

From: Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.


The paper doesn't specify the *exact* provenance of each sub-dataset, so further investigation is needed to determine which additional citations are required:

> We received 30 datasets, with a combined sample size of 6,522, reflecting data from 22 labs and 12 languages: Arabic, Chinese, Dutch, English, Farsi, French German, Hebrew, Italian, Polish, Russian, and Spanish (see Figure 1). Several datasets came from published studies, whereas others have not been used for publication.

For verification, here are the published ICC values (which look to be the ICC3k values):

| Dataset  | Raters | ICC  |
|----------|--------|------|
| Arabic1  | 1      | N/A  |
| Chinese1 | 4      | 0.49 |
| Chinese2 | 4      | 0.64 |
| Dutch1   | 2      | 0.81 |
| Dutch2   | 2      | 0.94 |
| Dutch3   | 2      | 0.87 |
| Dutch4   | 3      | 0.85 |
| English1 | 4      | 0.84 |
| English2 | 4      | 0.77 |
| English3 | 3      | 0.56 |
| English4 | 3      | 0.72 |
| English5 | 3      | 0.78 |
| English6 | 3      | 0.64 |
| Farsi1   | 3      | 0.69 |
| Farsi2   | 3      | 0.75 |
| French1  | 4      | 0.8  |
| French2  | 3      | 0.64 |
| French3  | 3      | 0.75 |
| French4  | 3      | 0.8  |
| German1  | 4      | 0.71 |
| German2  | 3      | 0.78 |
| German3  | 3      | 0.86 |
| Hebrew1  | 45     | 0.88 |
| Italian1 | 2      | 0.89 |
| Italian2 | 2      | 0.88 |
| Polish1  | 3      | 0.82 |
| Polish2  | 2      | 0.6  |
| Russian1 | 3      | 0.72 |
| Russian2 | 3      | 0.79 |
| Spanish1 | 3      | 0.74 |

In [12]:
desc = {
    "name": "multiaut",
    "test_type": "uses",
    "meta": {
        "inline": "Patterson et al., 2023",
        "citation": "Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.",
        "url": "https://doi.org/10.1037/aca0000618",
        "download": {
            "url": "https://files.osf.io/v1/resources/5cy9n/providers/github/processed-data/?zip=",
            "extension": "zip",
            # Excluded for redundancy
            # english6 is the same as setal08
            # english5 is the same as bs12
            # english4 is the same as snb17
            # english1 is the same as betal18
            "archive_files": [
                "arabic1.csv", "dutch4.csv", "german3.csv", "russian1.csv",
                "chinese1.csv", "french2.csv", "hebrew1.csv", "russian2.csv",
                "chinese2.csv", "english2.csv", "french3.csv", "italian1.csv", "spanish1.csv",
                "dutch1.csv", "english3.csv", "french4.csv", "italian2.csv",
                "dutch2.csv", "german1.csv", "polish1.csv",
                "dutch3.csv", "german2.csv" #"polish2.csv" parsed separately
            ]
        }
    },
    "column_mappings": {'ID': 'participant', 'order':'response_num'}
}

fnames = download_from_description(desc, '../data/raw')

In [13]:
# There's a problem with the rater 2 column in the processed polish2.csv, so re-process from raw data
polish2_desc = {
    "name": "multiaut_polish2",
    "test_type": "uses",
    "meta": {
        "inline": "Patterson et al., 2023",
        "citation": "Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.",
        "url": "https://doi.org/10.1037/aca0000618",
        "download": {
            "url": "https://osf.io/download/8xr4d",
            "extension": "xlsx",
        }
    },
    "column_mappings": {'item': 'prompt', 'ID': 'participant', 'order':'response_num'},
    'replace_values': {
        "response_num": { ('p' + str(x)): x for x in range(1, 21) }
    },
    "language": "pol",
    "rater_cols": ['coder1','coder2','coder3','coder4'],
    "range": [1, 7],
}

polish_fname = download_from_description(polish2_desc, '../data/raw')[0]
df = pd.read_excel(polish_fname)

In [14]:
# additional info specific to subset datasets
dutch_replace_vals = {
    "prompt": {
        "brick": "backstreen",
        "fork": "vork",
            "towel": "handoek"
    }
}
subsets = {
    'arabic1': {
        'range': [1, 5],
        'replace_values': {
            "prompt": {
                'Tin cans': 'علب الصفيح'
            }
        }
    },
    'chinese2': { 'range': [1, 5] },
    'italian2': { 'range': [1, 5] },
    "french3": { "range": [1, 5] },
    "english3": { "range": [1, 5] },
    "dutch1": {
        "replace_values": dutch_replace_vals
    },
    "dutch2": {
        "replace_values": dutch_replace_vals
    },
    "dutch3": {
        "replace_values": dutch_replace_vals
    },
    "dutch4": {
        "replace_values": dutch_replace_vals
    },
}

language_to_iso = {
    'arabic': 'ara', 'chinese': 'chi', 'dutch': 'dut', 'english': 'eng',
    'farsi': 'per', 'french': 'fre', 'german': 'ger', 'hebrew': 'heb',
    'italian': 'ita', 'polish': 'pol', 'russian': 'rus', 'spanish': 'spa'
}
for fname in [polish_fname] + fnames:
    # language is automatically detected from filename
    subset_desc = desc.copy()
    print(fname.stem)
    if fname.stem == 'multiaut_polish2_0':
        subset_desc.update(polish2_desc)
    else:
        subset_desc['language'] = language_to_iso[fname.stem[:-1]]
        subset_desc['name'] = desc['name'] + '_' + fname.stem
        if subsets.get(fname.stem):
            subset_desc.update(subsets[fname.stem])

    if fname.stem == 'english2':
        print("NOTE: english2 had an encoding error. Opening and re-saving it seems to fix it.")
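    if fname.suffix == '.csv':
        df = pd.read_csv(fname, encoding='utf-8')
    elif fname.suffix == '.xlsx':
        df = pd.read_excel(fname)
    else:
        # avoid silently reusing df from the previous iteration
        raise ValueError(f"Unexpected file type: {fname}")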
    cleaned = prep_general(df, **subset_desc, save_dir='../data/datasets')
    display(cleaned.sample(2))

multiaut_polish2_0


### Loading *Patterson et al., 2023*

Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.

- Renaming columns {'item': 'prompt', 'ID': 'participant', 'order': 'response_num'}

WARNING: ICC has an undefined error for this dataset

- name: multiaut_polish2
- no_of_prompts: 3
- no_of_participants: 497
- no_of_data_points: 3054
- prompts: ['sznurek', 'puszka', 'cegła']
- ICC2k: None
- ICC2k_CI: None
- ICC3k: None
- rater_cols: ['coder1', 'coder2', 'coder3', 'coder4']
- no_of_raters: 4




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
1215,uses,multiaut_polish2,Jakie jest zaskakujące zastosowanie dla PUSZKI?,puszka,jako ozdoba jako flakon,multiaut_polish2_puszka-9f07e7,2.666667,multiaut_polish277646,1,pol,2,0.707107
1843,uses,multiaut_polish2,Jakie jest zaskakujące zastosowanie dla PUSZKI?,puszka,jako pojemnik na przybory kuchenne,multiaut_polish2_puszka-709c72,2.0,multiaut_polish2438680,1,pol,2,0.707107


arabic1


### Loading *Patterson et al., 2023*

Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.

- Renaming columns {'ID': 'participant', 'order': 'response_num'}

- name: multiaut_arabic1
- no_of_prompts: 1
- no_of_participants: 160
- no_of_data_points: 1524
- prompts: ['علب الصفيح']
- ICC2k: None
- ICC2k_CI: None
- ICC3k: None
- rater_cols: ['rater1']
- no_of_raters: 1




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
504,uses,multiaut_arabic1,ما هو الاستخدام المفاجئ لـ علب الصفيح؟,علب الصفيح,منظم للأحذية,multiaut_arabic1_علب الصفيح-8abc7c,4.0,multiaut_arabic168,8,ara,1,
1095,uses,multiaut_arabic1,ما هو الاستخدام المفاجئ لـ علب الصفيح؟,علب الصفيح,حصالة,multiaut_arabic1_علب الصفيح-b629bd,1.0,multiaut_arabic1141,1,ara,1,


dutch4


### Loading *Patterson et al., 2023*

Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.

- Renaming columns {'ID': 'participant', 'order': 'response_num'}

- Inferred range of original data: (0.0, 5.0)

- name: multiaut_dutch4
- no_of_prompts: 1
- no_of_participants: 99
- no_of_data_points: 2414
- prompts: ['backstreen']
- ICC2k: 0.84
- ICC2k_CI: 0.82-0.86
- ICC3k: 0.85
- rater_cols: ['rater1', 'rater2', 'rater3']
- no_of_raters: 3




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
2013,uses,multiaut_dutch4,Wat is een verrassend gebruik voor een BACKSTR...,backstreen,Een put van bouwen,multiaut_dutch4_backstreen-aa75ec,2.3336,multiaut_dutch422,,dut,3,0.57735
471,uses,multiaut_dutch4,Wat is een verrassend gebruik voor een BACKSTR...,backstreen,Huis bouwen,multiaut_dutch4_backstreen-343967,1.8,multiaut_dutch475,,dut,3,0.0


german3


### Loading *Patterson et al., 2023*

Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.

- Renaming columns {'ID': 'participant', 'order': 'response_num'}

- Inferred range of original data: (1, 5)

- name: multiaut_german3
- no_of_prompts: 16
- no_of_participants: 51
- no_of_data_points: 8065
- prompts: ['Axt', 'Trompete', 'Erbse', 'Tisch', 'Flöte', 'Zange', 'Gurke', 'Bett', 'Tomate', 'Geige', 'Schaufel', 'Schrank', 'Paprika', 'Stuhl', 'Trommel', 'Säge']
- ICC2k: 0.85
- ICC2k_CI: 0.83-0.87
- ICC3k: 0.86
- rater_cols: ['rater1', 'rater2', 'rater3']
- no_of_raters: 3




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
3762,uses,multiaut_german3,Was ist eine überraschende Verwendung für eine...,Tisch,sich drunter verstecken,multiaut_german3_Tisch-923514,2.0,multiaut_german326,55,ger,3,0.0
6880,uses,multiaut_german3,Was ist eine überraschende Verwendung für eine...,Erbse,einfrieren,multiaut_german3_Erbse-84327a,1.0,multiaut_german348,99,ger,3,0.0


russian1


### Loading *Patterson et al., 2023*

Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.

- Renaming columns {'ID': 'participant', 'order': 'response_num'}

- Inferred range of original data: (1, 5)

- name: multiaut_russian1
- no_of_prompts: 2
- no_of_participants: 111
- no_of_data_points: 1728
- prompts: ['газета', 'деревянная линейка']
- ICC2k: 0.72
- ICC2k_CI: 0.69-0.74
- ICC3k: 0.72
- rater_cols: ['rater1', 'rater2', 'rater3']
- no_of_raters: 3




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
1019,uses,multiaut_russian1,Какое удивительное применение для ДЕРЕВЯННОЙ Л...,деревянная линейка,Гриф гитары,multiaut_russian1_деревянная линейка-50202f,4.0,multiaut_russian135,,rus,3,0.0
1056,uses,multiaut_russian1,Какое удивительное применение для ДЕРЕВЯННОЙ Л...,деревянная линейка,Доставать,multiaut_russian1_деревянная линейка-30b62e,2.333,multiaut_russian1108,,rus,3,0.57735


chinese1


### Loading *Patterson et al., 2023*

Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.

- Renaming columns {'ID': 'participant', 'order': 'response_num'}

- Inferred range of original data: (0, 5)

- name: multiaut_chinese1
- no_of_prompts: 122
- no_of_participants: 466
- no_of_data_points: 14176
- prompts: ['积木', '漏斗', '轮胎', '盘子', '皮带', '塑料袋', '西瓜', '牙签', '锅', '蚊帐', '头发', '纸盒', '铁链', '硬币', '字典', '报纸', '靴子', '易拉罐', '玉米', '南瓜', '耳机', '喇叭', '领带', '扑克', '手套', '银行卡', '袜子', '镊子', '卫生纸', '蛋清', '小米', '耳机线', '鞋带', '发簪', '柳树', '生姜', '红酒', '白酒', '木头', '火柴', '纽扣', '图钉', '吸管', '牙膏', '船桨', '面团', '西红柿', '荷叶', '磁铁', '弹弓', '钉子', '光盘', '毛巾', '橡皮擦', '球拍', '鹅卵石', '贝壳', '椰子', '花生', '核桃', '铃铛', '酒瓶', '西瓜皮', '冰块', '马来貘', '咖啡', '擀面杖', '戒指', '气球', '芦荟', '橄榄油', '花瓣', '土豆', '曲别针', '浴缸', '茶壶', '灌木', '无花果', '韭菜', '花椒', '纸巾', '皮筋', '黄金', '音响', '画像', '狐狸', '发带', '纸杯', '棉签', '柿子', '白纸', '杯子', '窗帘', '蛋糕', '地图', '风车', '夹子', '胶水', '蜡烛', '毛笔', '墨水', '铅笔', '钳子', '扇子', '勺子', '梳子', '柳条', '水壶', '算盘', '蛋壳', '台灯', '围巾', '温度计', '相机', '香蕉', '牙刷', '钥匙', '衣架', '砖头', '笛子', '酸奶', '吹风机']
- ICC2k: 0.48
- ICC2k_CI: 0.44-0.51
- ICC3k: 0.49
- rater_cols: ['rater1', 'rater2', 'rater3', 'rater4']
- no_of_raters: 4




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
2029,uses,multiaut_chinese1,手套的一个令人惊讶的用途是什么？,手套,沙包,multiaut_chinese1_手套-feff54,3.1998,multiaut_chinese11068,,chi,4,0.957427
12153,uses,multiaut_chinese1,杯子的一个令人惊讶的用途是什么？,杯子,做头饰,multiaut_chinese1_杯子-3b795d,2.6,multiaut_chinese11406,,chi,4,0.816497


french2


### Loading *Patterson et al., 2023*

Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.

- Renaming columns {'ID': 'participant', 'order': 'response_num'}

- Inferred range of original data: (1.0, 5.0)

Dropping 47 unrated items


- name: multiaut_french2
- no_of_prompts: 1
- no_of_participants: 82
- no_of_data_points: 449
- prompts: ['chapeau']
- ICC2k: 0.52
- ICC2k_CI: 0.25-0.68
- ICC3k: 0.64
- rater_cols: ['rater1', 'rater2', 'rater3']
- no_of_raters: 3




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
252,uses,multiaut_french2,Quel est un usage surprenant pour un CHAPEAU?,chapeau,Un Ã©ventail,multiaut_french2_chapeau-a566d6,3.0,multiaut_french22794,,fre,3,2.0
422,uses,multiaut_french2,Quel est un usage surprenant pour un CHAPEAU?,chapeau,Bateau,multiaut_french2_chapeau-f1113a,4.334,multiaut_french23049,,fre,3,1.154701


hebrew1


### Loading *Patterson et al., 2023*

Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.

- Renaming columns {'ID': 'participant', 'order': 'response_num'}

- Inferred range of original data: (1.0, 5.0)

WARNING: ICC has an undefined error for this dataset

- name: multiaut_hebrew1
- no_of_prompts: 10
- no_of_participants: 51
- no_of_data_points: 2027
- prompts: ['סכין', 'נעל', 'עיפרון', 'עיתון', 'מברג', 'קולב', 'צמיג', 'מטאטא', 'כיסא', 'כרית']
- ICC2k: None
- ICC2k_CI: None
- ICC3k: None
- rater_cols: ['rater1', 'rater2', 'rater3', 'rater4', 'rater5', 'rater6', 'rater7', 'rater8', 'rater9', 'rater10', 'rater11', 'rater12', 'rater13', 'rater14', 'rater15', 'rater16', 'rater17', 'rater18', 'rater19', 'rater20', 'rater21', 'rater22', 'rater23', 'rater24', 'rater25', 'rater26', 'rater27', 'rater28', 'rater29', 'rater30', 'rater31', 'rater32', 'rater33', 'rater34', 'rater35', 'rater36', 'rater37', 'rater38', 'rater39', 'rater40', 'rater41', 'rater42', 'rater43', 'rater44', 'rater45']
- no_of_raters: 45




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
47,uses,multiaut_hebrew1,מהו שימוש מפתיע לסכין?,סכין,מחורר דפים,multiaut_hebrew1_סכין-568b04,3.124875,multiaut_hebrew11904,1,heb,8,0.991031
749,uses,multiaut_hebrew1,מהו שימוש מפתיע לעיתון?,עיתון,להכין מודעה,multiaut_hebrew1_עיתון-53cd65,1.24975,multiaut_hebrew1392,2,heb,8,0.707107


russian2


### Loading *Patterson et al., 2023*

Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.

- Renaming columns {'ID': 'participant', 'order': 'response_num'}

- Inferred range of original data: (1, 5)

- name: multiaut_russian2
- no_of_prompts: 1
- no_of_participants: 45
- no_of_data_points: 370
- prompts: ['картонная коробка']
- ICC2k: 0.78
- ICC2k_CI: 0.74-0.82
- ICC3k: 0.79
- rater_cols: ['rater1', 'rater2', 'rater3']
- no_of_raters: 3




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
102,uses,multiaut_russian2,Какое удивительное применение для КАРТОННОЙ КО...,картонная коробка,"Из них можно сделать макеты города, а также ка...",multiaut_russian2_картонная коробка-5273d7,2.333,multiaut_russian212,,rus,3,0.57735
93,uses,multiaut_russian2,Какое удивительное применение для КАРТОННОЙ КО...,картонная коробка,Из разрезанной большой коробки можно сделать к...,multiaut_russian2_картонная коробка-ec400f,3.0,multiaut_russian211,,rus,3,0.0


chinese2


### Loading *Patterson et al., 2023*

Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.

- Renaming columns {'ID': 'participant', 'order': 'response_num'}

WARNING: 3 out-of-range values clipped from rater1

- name: multiaut_chinese2
- no_of_prompts: 2
- no_of_participants: 217
- no_of_data_points: 1302
- prompts: ['筷子', '易拉罐']
- ICC2k: 0.73
- ICC2k_CI: 0.64-0.79
- ICC3k: 0.77
- rater_cols: ['rater1', 'rater2', 'rater3', 'rater4']
- no_of_raters: 4




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
283,uses,multiaut_chinese2,筷子有什么令人惊讶的用途？,筷子,拧绳子的工具,multiaut_chinese2_筷子-ad6fa8,2.9995,multiaut_chinese2148,,chi,4,1.414214
60,uses,multiaut_chinese2,筷子有什么令人惊讶的用途？,筷子,发簪,multiaut_chinese2_筷子-067672,2.0,multiaut_chinese2111,,chi,4,0.816497


english2
NOTE: english2 had an encoding error. Opening and re-saving it seems to fix it.


### Loading *Patterson et al., 2023*

Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.

- Renaming columns {'ID': 'participant', 'order': 'response_num'}

- Inferred range of original data: (1.0, 5.0)

- name: multiaut_english2
- no_of_prompts: 2
- no_of_participants: 182
- no_of_data_points: 3723
- prompts: ['rope', 'box']
- ICC2k: 0.71
- ICC2k_CI: 0.59-0.78
- ICC3k: 0.77
- rater_cols: ['rater1', 'rater2', 'rater3', 'rater4']
- no_of_raters: 4




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
1331,uses,multiaut_english2,What is a surprising use for a ROPE?,rope,lasso,multiaut_english2_rope-3e5508,2.0,multiaut_english263,,eng,4,0.816497
2177,uses,multiaut_english2,What is a surprising use for a ROPE?,rope,climb a mountain or tree,multiaut_english2_rope-52f338,1.75025,multiaut_english2106,,eng,4,0.5


french3


### Loading *Patterson et al., 2023*

Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.

- Renaming columns {'ID': 'participant', 'order': 'response_num'}

WARNING: 11 out-of-range values clipped from rater3

Dropping 10 unrated items


- name: multiaut_french3
- no_of_prompts: 2
- no_of_participants: 277
- no_of_data_points: 2181
- prompts: ['ceinture', 'brouette']
- ICC2k: 0.73
- ICC2k_CI: 0.68-0.77
- ICC3k: 0.75
- rater_cols: ['rater1', 'rater2', 'rater3']
- no_of_raters: 3




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
1355,uses,multiaut_french3,Quel est un usage surprenant pour une CEINTURE?,ceinture,Faire Brancard,multiaut_french3_ceinture-150c95,3.667,multiaut_french3174,,fre,3,0.57735
22,uses,multiaut_french3,Quel est un usage surprenant pour une CEINTURE?,ceinture,En bracelet autour du poignet,multiaut_french3_ceinture-360c7d,1.666,multiaut_french34,,fre,3,1.154701


italian1


### Loading *Patterson et al., 2023*

Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.

- Renaming columns {'ID': 'participant', 'order': 'response_num'}

- Inferred range of original data: (1, 5)

- name: multiaut_italian1
- no_of_prompts: 6
- no_of_participants: 151
- no_of_data_points: 4269
- prompts: ['Attaccapanni', 'Barile', 'Bottiglia di plastica', 'Lampadina', 'Libro', 'Sedia']
- ICC2k: 0.89
- ICC2k_CI: 0.89-0.9
- ICC3k: 0.89
- rater_cols: ['rater1', 'rater2']
- no_of_raters: 2




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
299,uses,multiaut_italian1,Qual è un uso sorprendente per un BARILE?,Barile,portafiori,multiaut_italian1_Barile-9f68ae,2.0,multiaut_italian112,,ita,2,0.0
3551,uses,multiaut_italian1,Qual è un uso sorprendente per una BOTTIGLIA D...,Bottiglia di plastica,posacenere,multiaut_italian1_Bottiglia di plastica-bddfee,2.0,multiaut_italian11046,,ita,2,0.0


spanish1


### Loading *Patterson et al., 2023*

Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.

- Renaming columns {'ID': 'participant', 'order': 'response_num'}

- Inferred range of original data: (1.0, 5.0)

- name: multiaut_spanish1
- no_of_prompts: 1
- no_of_participants: 491
- no_of_data_points: 2735
- prompts: ['ladrillo']
- ICC2k: 0.57
- ICC2k_CI: 0.18-0.75
- ICC3k: 0.75
- rater_cols: ['rater1', 'rater2', 'rater3']
- no_of_raters: 3




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
1590,uses,multiaut_spanish1,¿Cuál es un uso sorprendente para un LADRILLO?,ladrillo,mesas,multiaut_spanish1_ladrillo-75c5d1,2.0,multiaut_spanish1239,4,spa,3,0.0
519,uses,multiaut_spanish1,¿Cuál es un uso sorprendente para un LADRILLO?,ladrillo,decoraciÃ³n,multiaut_spanish1_ladrillo-3b1d46,2.666,multiaut_spanish179,3,spa,3,1.154701


dutch1


### Loading *Patterson et al., 2023*

Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.

- Renaming columns {'ID': 'participant', 'order': 'response_num'}

- Inferred range of original data: (0.0, 5.0)

- name: multiaut_dutch1
- no_of_prompts: 4
- no_of_participants: 633
- no_of_data_points: 10549
- prompts: ['backstreen', 'vork', 'paperclip', 'handoek']
- ICC2k: 0.81
- ICC2k_CI: 0.81-0.82
- ICC3k: 0.81
- rater_cols: ['rater1', 'rater2']
- no_of_raters: 2




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
6992,uses,multiaut_dutch1,Wat is een verrassend gebruik voor een VORK?,vork,sla opscheppen,multiaut_dutch1_vork-8511f9,1.8,multiaut_dutch1123,,dut,2,0.0
8866,uses,multiaut_dutch1,Wat is een verrassend gebruik voor een paperclip?,paperclip,paier bij elkaar houden,multiaut_dutch1_paperclip-c67e7d,1.8,multiaut_dutch1778,,dut,2,0.0


english3


### Loading *Patterson et al., 2023*

Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.

- Renaming columns {'ID': 'participant', 'order': 'response_num'}

WARNING: 13 out-of-range values clipped from rater3

Dropping 11 unrated items


- name: multiaut_english3
- no_of_prompts: 2
- no_of_participants: 209
- no_of_data_points: 3225
- prompts: ['box', 'rope']
- ICC2k: 0.42
- ICC2k_CI: 0.16-0.58
- ICC3k: 0.56
- rater_cols: ['rater1', 'rater2', 'rater3']
- no_of_raters: 3




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
85,uses,multiaut_english3,What is a surprising use for a ROPE?,rope,climb a building,multiaut_english3_rope-2bdc71,1.666,multiaut_english381817,,eng,3,1.154701
1250,uses,multiaut_english3,What is a surprising use for a ROPE?,rope,as a hair-tie,multiaut_english3_rope-1c4944,1.333,multiaut_english382889,,eng,3,0.57735


french4


### Loading *Patterson et al., 2023*

Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.

- Renaming columns {'ID': 'participant', 'order': 'response_num'}

- Inferred range of original data: (1.0, 5.0)

- name: multiaut_french4
- no_of_prompts: 2
- no_of_participants: 238
- no_of_data_points: 2332
- prompts: ['brouette', 'ceinture']
- ICC2k: 0.79
- ICC2k_CI: 0.78-0.81
- ICC3k: 0.8
- rater_cols: ['rater1', 'rater2', 'rater3']
- no_of_raters: 3




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
1680,uses,multiaut_french4,Quel est un usage surprenant pour une BROUETTE?,brouette,Fauteuil,multiaut_french4_brouette-b5dd4e,1.333,multiaut_french4170,,fre,3,0.57735
884,uses,multiaut_french4,Quel est un usage surprenant pour une BROUETTE?,brouette,Utiliser comme un parapluie,multiaut_french4_brouette-044ae4,3.666,multiaut_french492,,fre,3,1.154701


italian2


### Loading *Patterson et al., 2023*

Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.

- Renaming columns {'ID': 'participant', 'order': 'response_num'}

WARNING: 7 out-of-range values clipped from rater2

Dropping 3 unrated items


- name: multiaut_italian2
- no_of_prompts: 21
- no_of_participants: 80
- no_of_data_points: 6895
- prompts: ['guanto', 'lampadina', 'libro', 'martello', 'mattone', 'cappello', 'cestino', 'coltello', 'cucchiaio', 'graffetta', 'accendino', 'accetta', 'appendino', 'aspirapolvere', 'banana', 'barattolo', 'bicicletta', 'borsa', 'botte', 'bottiglietta', 'capello']
- ICC2k: 0.88
- ICC2k_CI: 0.87-0.89
- ICC3k: 0.88
- rater_cols: ['rater1', 'rater2']
- no_of_raters: 2




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
5579,uses,multiaut_italian2,Qual è un uso sorprendente per un APPENDINO?,appendino,arma letale,multiaut_italian2_appendino-c9db17,3.0,multiaut_italian2125,,ita,2,0.0
2419,uses,multiaut_italian2,Qual è un uso sorprendente per una BOTTE?,botte,tavolo da giardino,multiaut_italian2_botte-5dbd9c,2.0,multiaut_italian231,,ita,2,0.0


dutch2


### Loading *Patterson et al., 2023*

Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.

- Renaming columns {'ID': 'participant', 'order': 'response_num'}

- Inferred range of original data: (0.0, 5.0)

- name: multiaut_dutch2
- no_of_prompts: 2
- no_of_participants: 111
- no_of_data_points: 1640
- prompts: ['backstreen', 'paperclip']
- ICC2k: 0.94
- ICC2k_CI: 0.93-0.95
- ICC3k: 0.94
- rater_cols: ['rater1', 'rater2']
- no_of_raters: 2




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
1019,uses,multiaut_dutch2,Wat is een verrassend gebruik voor een paperclip?,paperclip,Haarschuifje,multiaut_dutch2_paperclip-f62c96,3.4,multiaut_dutch21144,,dut,2,0.0
861,uses,multiaut_dutch2,Wat is een verrassend gebruik voor een paperclip?,paperclip,als architecturele inspiratie gebruiken,multiaut_dutch2_paperclip-649503,4.2,multiaut_dutch21209,,dut,2,0.0


german1


### Loading *Patterson et al., 2023*

Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.

- Renaming columns {'ID': 'participant', 'order': 'response_num'}

- Inferred range of original data: (0.0, 3.0)

- name: multiaut_german1
- no_of_prompts: 3
- no_of_participants: 298
- no_of_data_points: 8116
- prompts: ['konservendose', 'messer', 'haarfoehn']
- ICC2k: 0.7
- ICC2k_CI: 0.68-0.72
- ICC3k: 0.71
- rater_cols: ['rater1', 'rater2', 'rater3', 'rater4']
- no_of_raters: 4




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
2131,uses,multiaut_german1,Was ist eine überraschende Verwendung für ein ...,messer,fisch entschuppen,multiaut_german1_messer-7e4431,1.666667,multiaut_german186,,ger,4,0.57735
5314,uses,multiaut_german1,Was ist eine überraschende Verwendung für ein ...,messer,Ein herz in Baum schnitzen,multiaut_german1_messer-dd27e0,2.333333,multiaut_german1197,,ger,4,0.0


polish1


### Loading *Patterson et al., 2023*

Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.

- Renaming columns {'ID': 'participant', 'order': 'response_num'}

- Inferred range of original data: (1, 7)

- name: multiaut_polish1
- no_of_prompts: 2
- no_of_participants: 791
- no_of_data_points: 7415
- prompts: ['puszka', 'cegła']
- ICC2k: 0.82
- ICC2k_CI: 0.81-0.83
- ICC3k: 0.82
- rater_cols: ['rater1', 'rater2', 'rater3']
- no_of_raters: 3




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
1054,uses,multiaut_polish1,Jakie jest zaskakujące zastosowanie dla PUSZKI?,puszka,jako pojemnik,multiaut_polish1_puszka-e01cc6,1.0,multiaut_polish1103610,p2,pol,3,0.0
4639,uses,multiaut_polish1,Jakie jest zaskakujące zastosowanie dla CEGŁY?,cegła,nowa dyscyplina sportu,multiaut_polish1_cegła-48e9d5,2.555333,multiaut_polish180793,p2,pol,3,0.57735


dutch3


### Loading *Patterson et al., 2023*

Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.

- Renaming columns {'ID': 'participant', 'order': 'response_num'}

- Inferred range of original data: (0, 5)

- name: multiaut_dutch3
- no_of_prompts: 1
- no_of_participants: 111
- no_of_data_points: 1004
- prompts: ['backstreen']
- ICC2k: 0.86
- ICC2k_CI: 0.79-0.89
- ICC3k: 0.87
- rater_cols: ['rater1', 'rater2']
- no_of_raters: 2




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
623,uses,multiaut_dutch3,Wat is een verrassend gebruik voor een BACKSTR...,backstreen,gebruiken als stempel,multiaut_dutch3_backstreen-544604,4.2,multiaut_dutch31454,,dut,2,0.0
33,uses,multiaut_dutch3,Wat is een verrassend gebruik voor een BACKSTR...,backstreen,plein,multiaut_dutch3_backstreen-0de49d,1.4,multiaut_dutch31281,,dut,2,0.707107


german2


### Loading *Patterson et al., 2023*

Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.

- Renaming columns {'ID': 'participant', 'order': 'response_num'}

- Inferred range of original data: (1, 5)

- name: multiaut_german2
- no_of_prompts: 3
- no_of_participants: 154
- no_of_data_points: 3530
- prompts: ['Büroklammer', 'Mülltüte', 'Seil']
- ICC2k: 0.71
- ICC2k_CI: 0.54-0.8
- ICC3k: 0.77
- rater_cols: ['rater1', 'rater2', 'rater3']
- no_of_raters: 3




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
1399,uses,multiaut_german2,Was ist eine überraschende Verwendung für eine...,Büroklammer,um einen sehr kleinen Knopf zu drücken,multiaut_german2_Büroklammer-096dfd,2.667,multiaut_german21649MAAP161,,ger,3,0.57735
1350,uses,multiaut_german2,Was ist eine überraschende Verwendung für eine...,Büroklammer,Ohrring,multiaut_german2_Büroklammer-93af12,1.667,multiaut_german21645RGJU170,,ger,3,0.57735
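
The `range` entries, the "Inferred range of original data" lines, and the "out-of-range values clipped" warnings above suggest that ratings are clipped to each study's scale and then mapped onto a common one. The sampled targets look consistent with a linear mapping onto 1-5 (a rating of 1 on a 0-5 scale would become 1.8, a value that appears in the Dutch samples), but the actual `prep_general` implementation is not shown here. A minimal sketch under that assumption, with a hypothetical helper name `rescale_ratings`:

```python
import numpy as np

def rescale_ratings(values, orig_range, new_range=(1, 5)):
    """Clip to orig_range, then map linearly onto new_range (hypothetical helper)."""
    lo, hi = orig_range
    new_lo, new_hi = new_range
    # Values outside the declared scale are pulled back to its endpoints
    clipped = np.clip(np.asarray(values, dtype=float), lo, hi)
    return new_lo + (clipped - lo) * (new_hi - new_lo) / (hi - lo)

# The 7 is out of range for a 0-5 scale and gets clipped to 5 first
print(rescale_ratings([0, 1, 3, 5, 7], orig_range=(0, 5)))
```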


## TransDis

A Chinese-language AUT dataset.

In [15]:
desc = {
    "name": "transdis",
    "test_type": "uses",
    "meta": {
        "inline": "Yang et al., 2023",
        "citation": "Yang, T., Zhang, Q., Sun, Z., & Hou, Y. (2023). Automatic Assessment of Divergent Thinking in Chinese Language with TransDis: A Transformer-Based Language Model Approach. arXiv preprint arXiv:2306.14790.",
        "url": "https://arxiv.org/abs/2306.14790",
        "download": [{
            "url": "https://osf.io/download/3fk8y",
            "extension": "xlsx"
        }, {
            "url": "https://osf.io/download/mcwtu",
            "extension": "xlsx"
        }],
    },
    "null_marker": "NA",
    "column_mappings": {'Item': 'prompt', 'Response': 'response',
                        'ParticipantID': 'participant'},
    "range": [0, 4],
    "rater_cols": ['Originality_Rater1', 'Originality_Rater2'],
    "language": "chi"
}

fnames = download_from_description(desc, '../data/raw')
df = pd.concat([pd.read_excel(fname) for fname in fnames])
# number each participant's responses in row order (assumes rows are already sorted by responseID)
df['response_num'] = df.groupby('ParticipantID').cumcount() + 1
cleaned = prep_general(df, **desc, save_dir='../data/datasets')
cleaned.sample(2)


### Loading *Yang et al., 2023*

Yang, T., Zhang, Q., Sun, Z., & Hou, Y. (2023). Automatic Assessment of Divergent Thinking in Chinese Language with TransDis: A Transformer-Based Language Model Approach. arXiv preprint arXiv:2306.14790.

- Renaming columns {'Item': 'prompt', 'Response': 'response', 'ParticipantID': 'participant'}

Replacing NA with NaN in response column
Dropping 4 unrated items


- name: transdis
- no_of_prompts: 4
- no_of_participants: 350
- no_of_data_points: 8007
- prompts: ['床单', '筷子', '拖鞋', '牙刷']
- ICC2k: 0.67
- ICC2k_CI: 0.6-0.73
- ICC3k: 0.7
- rater_cols: ['Originality_Rater1', 'Originality_Rater2']
- no_of_raters: 2




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
3853,uses,transdis,拖鞋的一个令人惊讶的用途是什么？,拖鞋,花盆,transdis_拖鞋-09554f,3.0,transdis99,10,chi,2,0.0
3130,uses,transdis,筷子有什么令人惊讶的用途？,筷子,做秤杆,transdis_筷子-f95270,3.0,transdis280,8,chi,2,0.0
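
The `response_num` column for TransDis comes from `groupby(...).cumcount()`, which numbers rows within each participant in their current row order. If the file order cannot be trusted, sorting first makes the numbering explicit. A small sketch on made-up data; the `ResponseID` column name is an assumption and has not been checked against the TransDis files:

```python
import pandas as pd

toy = pd.DataFrame({
    "ParticipantID": ["a", "b", "a", "b", "a"],
    "ResponseID": [3, 1, 1, 2, 2],  # assumed column name
    "Response": ["x3", "y1", "x1", "y2", "x2"],
})

# Order each participant's rows by ResponseID before numbering
toy = toy.sort_values(["ParticipantID", "ResponseID"])
# cumcount() is zero-based, hence the + 1
toy["response_num"] = toy.groupby("ParticipantID").cumcount() + 1
print(toy)
```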


## DiStefano, Beaty, Patterson, 2023 (Metaphors)

According to the paper, semantic distance with BERT DSI correlated with human ratings at $r=.42$ on the held-out dataset, compared with $r=.70$ for GPT-2 and $r=.72$ for RoBERTa.
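
For reference, a comparable check of model scores against human ratings could look like the sketch below. It assumes the reported values are Pearson correlations, and the arrays are invented for illustration, not taken from the paper or the dataset.

```python
import numpy as np

# Invented example values: human ratings and model scores for five responses
human = np.array([1.0, 2.0, 2.5, 4.0, 3.0])
model = np.array([0.2, 0.35, 0.5, 0.9, 0.55])

# Pearson correlation is the off-diagonal entry of the 2x2 correlation matrix
r = np.corrcoef(human, model)[0, 1]
print(f"r = {r:.2f}")
```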

In [16]:
desc = {
    "name": "dbc23",
    "test_type": "metaphors",
    "meta": {
        "inline": "DiStefano, Beaty, Patterson, 2023",
        "citation": "DiStefano, P. V., Patterson, J. D., & Beaty, R. (2023). Automatic Scoring of Metaphor Creativity with Large Language Models. https://doi.org/10.31234/osf.io/6jtxb",
        "url": "https://doi.org/10.31234/osf.io/6jtxb",
        "download": [{
            "url": "https://osf.io/download/mr5a3",
            "extension": "csv"
        }],
    },
    "column_mappings": {'item': 'prompt', 'Story': 'response',
                        'ID': 'participant'},
    # the released dataset provides a single rating, already merged across raters
    "rater_cols": ['rating'],
    "language": "eng",
    "question_mappings": {
        'boring class': 'Think of the most boring high-school or college class you’ve ever had. What was it like to sit through?',
        'gross food': 'Think about the most disgusting thing you ever ate or drank. What was it like to eat or drink it?',
        'bad movie': 'Think about the worst movie or TV show you have ever seen. What was it like to watch it?',
        'messy room': 'Think of the messiest room that you’ve ever had to live in. What was it like to live there?'
        }
}

fname = download_from_description(desc, '../data/raw')[0]
df = pd.read_csv(fname)
cleaned = prep_general(df, **desc, save_dir='../data/datasets')
cleaned.sample(2)

### Loading *DiStefano, Beaty, Patterson, 2023*

DiStefano, P. V., Patterson, J. D., & Beaty, R. (2023). Automatic Scoring of Metaphor Creativity with Large Language Models. https://doi.org/10.31234/osf.io/6jtxb

- Renaming columns {'item': 'prompt', 'Story': 'response', 'ID': 'participant'}

- Inferring questions {'boring class': 'Think of the most boring high-school or college class you’ve ever had. What was it like to sit through?', 'gross food': 'Think about the most disgusting thing you ever ate or drank. What was it like to eat or drink it?', 'bad movie': 'Think about the worst movie or TV show you have ever seen. What was it like to watch it?', 'messy room': 'Think of the messiest room that you’ve ever had to live in. What was it like to live there?'}

- Inferred range of original data: (-1.675025493, 4.115607135)

- name: dbc23
- no_of_prompts: 4
- no_of_participants: 1546
- no_of_data_points: 4589
- prompts: ['boring class', 'gross food', 'bad movie', 'messy room']
- ICC2k: None
- ICC2k_CI: None
- ICC3k: None
- rater_cols: ['rating']
- no_of_raters: 1




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
1803,metaphors,dbc23,Think about the worst movie or TV show you hav...,bad movie,watching that movie was like watching poorly ...,dbc23_bad movie-642cc5,3.705507,dbc237066,,eng,1,
1191,metaphors,dbc23,Think of the most boring high-school or colleg...,boring class,That class was like taking a shit boring but ...,dbc23_boring class-fb200a,3.056157,dbc235059,,eng,1,
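
Several other datasets in this notebook declare `"range": [1, 5]`, whereas this one has a continuous inferred range. One way to bring such scores onto a common scale is a min-max rescale. A minimal sketch, assuming a linear mapping onto [1, 5]; the helper `rescale` is hypothetical, and whether `prep_general` normalizes this way is not shown here:

```python
import pandas as pd

def rescale(scores: pd.Series, old_range, new_range=(1, 5)) -> pd.Series:
    """Linearly map scores from old_range onto new_range (hypothetical helper)."""
    old_min, old_max = old_range
    new_min, new_max = new_range
    return (scores - old_min) / (old_max - old_min) * (new_max - new_min) + new_min

# Endpoints come from the "Inferred range" line; the middle value is made up.
raw = pd.Series([-1.675025493, 1.0, 4.115607135])
scaled = rescale(raw, old_range=(raw.min(), raw.max()))
print(scaled.round(3).tolist())
```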


## Hass et al 2018

This dataset uses online Mechanical Turk judges to re-judge the responses from Silvia et al. 2008. While the rating quality is likely to be lower, the position taken with Ocsai is that more raters are better, so these ratings augment those of the better-trained raters.

In [17]:
desc = {
    "name": "h18",
    "meta": {
        "inline": "Hass, Rivera, Silvia 2018",
        "citation": "Hass, R. W., Rivera, M., & Silvia, P. J. (2018). On the Dependability and Feasibility of Layperson Ratings of Divergent Thinking. Frontiers in Psychology, 9. https://www.frontiersin.org/articles/10.3389/fpsyg.2018.01343",
        "url": "https://doi.org/10.3389/fpsyg.2018.01343",
        "download": [{
            "url": "https://osf.io/download/p2b9c", 
            "extension": "csv"
            }],
    },
    "null_marker": " ",
    "column_mappings": {'subject': 'participant', 'task':'prompt',
                        'order':'response_num'},
    "range": [1, 5],
    "rater_cols": ['rater1', 'rater2', 'rater3'],
    "replace_values": {
        "prompt": {
            'Brick': "brick",
            'Round Things': "round",
            'Sleep': "no sleep",
            'Knife': "knife",
            'Make Noises': "noise",
            '12 Inches': "shrank"
        },
    },
    "question_mappings": {
        "brick": "What is a surprising use for a BRICK?",
        "round": "What is a surprising thing that is ROUND?", 
        "no sleep": "What would be a surprising consequence if PEOPLE NEEDED NO SLEEP?", 
        "knife": "What is a surprising use for a KNIFE?",
        "noise": "What is a surprising thing that makes a NOISE?",
        "shrank": "What would be a surprising consequence if EVERYONE SHRANK TO 12 INCHES TALL?"
    },
    "type_mappings": {
        "brick": "uses",
        "round": "uses",
        "no sleep": "consequences",
        "knife": "uses",
        "noise": "instances",
        "shrank": "consequences"
    },
    "language":"eng"
}

fname = download_from_description(desc, '../data/raw')[0]
df = pd.read_csv(fname)
cleaned = prep_general(df, **desc, save_dir='../data/datasets')
cleaned.sample(2)

### Loading *Hass, Rivera, Silvia 2018*

Hass, R. W., Rivera, M., & Silvia, P. J. (2018). On the Dependability and Feasibility of Layperson Ratings of Divergent Thinking. Frontiers in Psychology, 9. https://www.frontiersin.org/articles/10.3389/fpsyg.2018.01343

- Renaming columns {'subject': 'participant', 'task': 'prompt', 'order': 'response_num'}

- Inferring questions {'brick': 'What is a surprising use for a BRICK?', 'round': 'What is a surprising thing that is ROUND?', 'no sleep': 'What would be a surprising consequence if PEOPLE NEEDED NO SLEEP?', 'knife': 'What is a surprising use for a KNIFE?', 'noise': 'What is a surprising thing that makes a NOISE?', 'shrank': 'What would be a surprising consequence if EVERYONE SHRANK TO 12 INCHES TALL?'}

- Inferring types {'brick': 'uses', 'round': 'uses', 'no sleep': 'consequences', 'knife': 'uses', 'noise': 'instances', 'shrank': 'consequences'}

Replacing   with NaN in response column
Dropping 37 unrated items


- name: h18
- no_of_prompts: 6
- no_of_participants: 242
- no_of_data_points: 11490
- prompts: ['shrank', 'brick', 'knife', 'noise', 'round', 'no sleep']
- ICC2k: 0.43
- ICC2k_CI: 0.22-0.57
- ICC3k: 0.54
- rater_cols: ['rater1', 'rater2', 'rater3']
- no_of_raters: 3




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
2856,consequences,h18,What would be a surprising consequence if PEOP...,no sleep,the world will be chaos,h18_no sleep-17cbd0,1.0,h1864,1,eng,3,0.0
10121,uses,h18,What is a surprising use for a BRICK?,brick,to juggle with,h18_brick-d5917d,2.334,h18213,8,eng,3,1.154701
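
The `target`, `rater_count`, and `rating_std` columns summarize the individual rater columns. A minimal sketch of that aggregation, assuming `target` is the row-wise mean and `rating_std` is the sample standard deviation (`ddof=1`); the ratings below are invented, and `prep_general` may differ in details such as rounding:

```python
import pandas as pd

rater_cols = ["rater1", "rater2", "rater3"]
# Invented ratings; the missing value marks an item that one rater skipped.
ratings = pd.DataFrame({
    "rater1": [1, 1, 2],
    "rater2": [1, 3, None],
    "rater3": [1, 3, 4],
})

summary = pd.DataFrame({
    "target": ratings[rater_cols].mean(axis=1),        # mean over available raters
    "rater_count": ratings[rater_cols].count(axis=1),  # non-null ratings per row
    "rating_std": ratings[rater_cols].std(axis=1),     # pandas default is ddof=1
})
print(summary.round(3))
```

Under these assumptions, ratings of 1, 3, and 3 give a mean of about 2.333 and a standard deviation of about 1.155, the same spread reported for the brick row above.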


# Summary of Stats

(Also check for redundancy)
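
One quick redundancy signal is two datasets reporting the same number of participants and data points. A minimal sketch using `DataFrame.duplicated`, run on a toy stand-in for the stats table (the dataset names and counts are made up):

```python
import pandas as pd

# Toy stand-in for the stats table; names and counts are invented.
toy_stats = pd.DataFrame({
    "name": ["study_a", "study_b", "study_a_copy"],
    "no_of_participants": [92, 242, 92],
    "no_of_data_points": [5490, 11490, 5490],
})

# keep=False flags every member of a duplicate group, not just the repeats.
keys = ["no_of_participants", "no_of_data_points"]
suspects = toy_stats[toy_stats.duplicated(subset=keys, keep=False)]
print(suspects["name"].tolist())
```

Matching counts are only a hint; confirming a duplicate means comparing the responses themselves.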

In [18]:
import duckdb
conn = duckdb.connect("../data/datasets/stats_db.duckdb")
# fetch through DuckDB's own API; pandas.read_sql warns on non-SQLAlchemy connections
stats = conn.execute('select * from stats').df()
# sort to check for redundancy
stats.sort_values('no_of_data_points')



Unnamed: 0,name,no_of_prompts,no_of_participants,no_of_data_points,prompts,ICC2k,ICC2k_CI,ICC3k,rater_cols,no_of_raters
22,multiaut_russian2,1,45,370,[картонная коробка],0.78,0.74-0.82,0.79,"[rater1, rater2, rater3]",3
20,multiaut_french2,1,82,449,[chapeau],0.52,0.25-0.68,0.64,"[rater1, rater2, rater3]",3
12,motesp,29,35,963,"[backpack, ball, bottle, hat, light bulb, penc...",0.73,0.66-0.78,0.75,"[D, K, T]",3
35,multiaut_dutch3,1,111,1004,[backstreen],0.86,0.79-0.89,0.87,"[rater1, rater2]",2
6,hass17,2,57,1093,"[bottle, brick]",0.79,0.75-0.82,0.8,"[r1, r2, r3]",3
23,multiaut_chinese2,2,217,1302,"[筷子, 易拉罐]",0.73,0.64-0.79,0.77,"[rater1, rater2, rater3, rater4]",4
15,multiaut_arabic1,1,160,1524,[علب الصفيح],,,,[rater1],1
32,multiaut_dutch2,2,111,1640,"[backstreen, paperclip]",0.94,0.93-0.95,0.94,"[rater1, rater2]",2
18,multiaut_russian1,2,111,1728,"[газета, деревянная линейка]",0.72,0.69-0.74,0.72,"[rater1, rater2, rater3]",3
11,bs12,1,133,1807,[brick],0.72,0.56-0.8,0.78,"[br_rater1, br_rater2, br_rater3]",3
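
For reference, `ICC2k` and `ICC3k` are average-measures intraclass correlations that can be derived from a two-way ANOVA decomposition (Shrout & Fleiss conventions). A minimal NumPy sketch, assuming a complete items-by-raters matrix with no missing ratings; the function `icc2k_icc3k` is hypothetical and is not necessarily how the values in this table were produced:

```python
import numpy as np

def icc2k_icc3k(x):
    """Return (ICC2k, ICC3k) for a complete items x raters array (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()  # between items
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()  # between raters
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    icc2k = (msr - mse) / (msr + (msc - mse) / n)  # raters treated as random
    icc3k = (msr - mse) / msr                      # raters treated as fixed
    return icc2k, icc3k

# Six items rated by four judges (the worked example from Shrout & Fleiss, 1979).
example = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
           [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
icc2k, icc3k = icc2k_icc3k(example)
print(round(icc2k, 2), round(icc3k, 2))
```

Because ICC2k also penalizes systematic differences between raters, it is never larger than ICC3k when raters differ in their mean ratings, which matches the pattern in the table.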


In [19]:
pd.read_csv('../data/datasets/multiaut_polish2.csv').sort_values('target', ascending=False)

Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
1815,uses,multiaut_polish2,Jakie jest zaskakujące zastosowanie dla PUSZKI?,puszka,do zbudowania tłoka,multiaut_polish2_puszka-7374fb,5.000000,multiaut_polish2423427,1,pol,2,0.000000
1295,uses,multiaut_polish2,Jakie jest zaskakujące zastosowanie dla PUSZKI?,puszka,jako masażer do stóp,multiaut_polish2_puszka-30158a,5.000000,multiaut_polish291370,6,pol,2,0.000000
1908,uses,multiaut_polish2,Jakie jest zaskakujące zastosowanie dla PUSZKI?,puszka,można z kilku puszek zrobi pas kuloodporny,multiaut_polish2_puszka-006bb5,5.000000,multiaut_polish2450763,4,pol,2,0.000000
1084,uses,multiaut_polish2,Jakie jest zaskakujące zastosowanie dla PUSZKI?,puszka,Odpowiesnio przygotowana pozwala na konstrukcj...,multiaut_polish2_puszka-bc3c91,4.666667,multiaut_polish230277,3,pol,2,0.707107
1642,uses,multiaut_polish2,Jakie jest zaskakujące zastosowanie dla PUSZKI?,puszka,jak się włoży głośniczek do środka to wzmacnia...,multiaut_polish2_puszka-b71196,4.666667,multiaut_polish2372153,5,pol,2,0.707107
...,...,...,...,...,...,...,...,...,...,...,...,...
1358,uses,multiaut_polish2,Jakie jest zaskakujące zastosowanie dla PUSZKI?,puszka,Jako popielniczkę,multiaut_polish2_puszka-c0cba8,1.000000,multiaut_polish2105584,1,pol,2,0.000000
1353,uses,multiaut_polish2,Jakie jest zaskakujące zastosowanie dla PUSZKI?,puszka,popielniczka,multiaut_polish2_puszka-80b0eb,1.000000,multiaut_polish2104698,1,pol,2,0.000000
1351,uses,multiaut_polish2,Jakie jest zaskakujące zastosowanie dla PUSZKI?,puszka,popielniczkę,multiaut_polish2_puszka-16da0d,1.000000,multiaut_polish2104259,3,pol,2,0.000000
2718,uses,multiaut_polish2,Jakie jest zaskakujące zastosowanie dla CEGŁY?,cegła,do budowy,multiaut_polish2_cegła-4d8edd,1.000000,multiaut_polish2386928,1,pol,3,0.000000


In [20]:
from pathlib import Path
# match only files with a .csv extension; note that `all` shadows the Python builtin
all = pd.concat([pd.read_csv(x) for x in Path('../data/datasets/').glob('*.csv')])
print("Total number of rows", len(all))
display(all['type'].value_counts())

Total number of rows 162872


uses            140193
instances         8572
consequences      6561
metaphors         4589
completion        2957
Name: type, dtype: int64

In [22]:
all.groupby(['src', 'type']).size().sort_values(ascending=False)

src                type        
multiaut_chinese1  uses            14176
multiaut_dutch1    uses            10549
multiaut_german1   uses             8116
multiaut_german3   uses             8065
transdis           uses             8007
multiaut_polish1   uses             7415
multiaut_italian2  uses             6895
setal08            uses             5582
h18                uses             5582
dod20              uses             5490
dbc23              metaphors        4589
multiaut_italian1  uses             4269
snbmo09            uses             4099
hmsl               uses             3843
multiaut_english2  uses             3723
multiaut_german2   uses             3530
multiaut_english6  uses             3425
multiaut_english3  uses             3225
h18                consequences     3198
setal08            consequences     3198
multiaut_polish2   uses             3054
betal18            uses             2918
motesf             uses             2913
                   instan