This notebook normalizes human judged ground truth from various originality and creativity scoring studies.

This is an assortment of studies, with different demographics, goals, and test setups. It is most appropriate for supervised learning of automated scoring, where we're not necessarily trying to learn about the participants but about the *human judges* - how they interpret the originality scoring task in general.

In [217]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [218]:
from ocsai.data import download_from_description, prep_general
import pandas as pd

## Dumas et al 2020

In [186]:
desc = {
    "name": "dod20",
    "test_type": "uses",
    "meta": {
        "inline": "Dumas et al 2020",
        "download": {"url": "https://osf.io/download/u3yv4/", "extension": "csv"}
    },
    "null_marker": "!!!",
    "column_mappings": {},
    "range": [1, 5],
    "language": "eng",
}

fname = download_from_description(desc, '../data/raw')
df = pd.read_csv(fname[0], index_col=0)
cleaned = prep_general(df, **desc, save_dir='../data/datasets')
cleaned.sample(2)

### Loading *Dumas et al 2020*

Replacing !!! with NaN in response column


- name: dod20
- no_of_prompts: 10
- no_of_participants: 92
- no_of_data_points: 5490
- prompts: ['book', 'bottle', 'brick', 'fork', 'pants', 'rope', 'shoe', 'shovel', 'table', 'tire']
- ICC2k: 0.85
- ICC2k_CI: 0.79-0.88
- ICC3k: 0.87
- rater_cols: ['rater1', 'rater2', 'rater3', 'rater4']
- no_of_raters: 4




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
743,uses,dod20,What is a surprising use for BOTTLE?,bottle,vase,dod20_bottle-8db151,2.5,dod2081,6,eng,4,0.57735
1133,uses,dod20,What is a surprising use for ROPE?,rope,hanging a swing. use old rope for hanging tree...,dod20_rope-bb9f17,2.75025,dod2010,0,eng,4,1.258306


## Silvia et al 2009

In [187]:
desc = {
    "name": "snbmo09",
    "test_type": "uses",
    "meta": {
        "inline": "Silvia et al. 2009",
        "citation": "Silvia, P. J., Nusbaum, E. C., Berg, C., Martin, C., & O'Connor, A. (2009). Openness to experience, plasticity, and creativity: Exploring lower-order, high-order, and interactive effects. Journal of Research in Personality, 43(6), 1087–1090. https://doi.org/10.1016/j.jrp.2009.04.015",
        "download": {"url": "https://osf.io/download/qdrv8/", "ext": "csv"}
    },
    "column_mappings": {
        "subject":"participant",
        "response_order":"response_num"
        },
    "range": [1, 5],
    "language": "eng",
}
fname = download_from_description(desc, '../data/raw')
df = pd.read_csv(fname[0])
df['prompt'] = df.task.apply(lambda x: x.split('_')[-1])
cleaned = prep_general(df, **desc, save_dir='../data/datasets')
cleaned.sample(2)

### Loading *Silvia et al. 2009*

Silvia, P. J., Nusbaum, E. C., Berg, C., Martin, C., & O'Connor, A. (2009). Openness to experience, plasticity, and creativity: Exploring lower-order, high-order, and interactive effects. Journal of Research in Personality, 43(6), 1087–1090. https://doi.org/10.1016/j.jrp.2009.04.015

- Renaming columns {'subject': 'participant', 'response_order': 'response_num'}

Dropping 10 unrated items


- name: snbmo09
- no_of_prompts: 3
- no_of_participants: 202
- no_of_data_points: 4099
- prompts: ['brick', 'knife', 'box']
- ICC2k: 0.69
- ICC2k_CI: 0.57-0.77
- ICC3k: 0.76
- rater_cols: ['rater_1', 'rater_2', 'rater_3', 'rater_4']
- no_of_raters: 4




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
1553,uses,snbmo09,What is a surprising use for BOX?,box,hold food,snbmo09_3_box-f37fae,1.0,snbmo0974,14,eng,4,0.0
3599,uses,snbmo09,What is a surprising use for BOX?,box,for a person to live in,snbmo09_3_box-52ab3b,1.5,snbmo09175,2,eng,4,0.57735


## Hass 2017

This study looked at uses for *bottle* and *brick*. There were 54 participants after data cleaning.

Rating was on a 5-point scale. For verification, their reported inter-rater reliability was ICC2k was 0.80 for brick and 0.78 for bottle, which is about what we see below.

The rating data was stoplisted, so I need to reconstruct the original responses here.

In [188]:
desc = {
    "name": 'hass17',
    "test_type": "uses",
    "meta": {
        "inline": "Hass 2017",
        "citation": "Hass, R. W. (2017). Semantic search during divergent thinking. Cognition, 166, 344–357. https://doi.org/10.1016/j.cognition.2017.05.039",
        "url": "https://osf.io/ng598",
        "download": [
            {"url": 'https://osf.io/download/mcykr/', "ext": "xlsx"}, # rater scores
            {"url": 'https://osf.io/download/27bx8/', "ext": "xlsx"},  # responses 1
            {"url": 'https://osf.io/download/rzvyd/', "ext": "xlsx"}  # responses 2
        ],
    },
    "column_mappings": {
        "subject":"participant",
        "response_order":"response_num"
        },
    "range": [1, 5],
    "rater_cols": ['r1','r2','r3'],
    "language": "eng",
}

(ratings_fname, responses_fname, responses2_fname) = download_from_description(desc, '../data/raw')

# custom parsing specific to this dataset
all_ratings = []
for sheet, prompt in [('br_exp1', 'brick'),('br_exp2', 'brick'),('bot_exp1', 'bottle'),('bot_exp2', 'bottle')]:
    data = pd.read_excel(ratings_fname, sheet_name=sheet) #.rename(columns={'subject':'participant','response_order':'response_num'})
    data['prompt'] = prompt
    all_ratings.append(data)
hassratings = pd.concat(all_ratings).rename(columns={'response':'cleaned'})
participants = pd.concat([pd.read_excel(responses_fname), pd.read_excel(responses2_fname)])

# melt original responses to long, reconstructe the cleaned columns, then join with ratings
long_part = participants.melt(id_vars='ID', value_name='response').rename(columns={'ID':'participant'})
long_part = long_part[long_part.variable.str.contains('resp') & ~long_part.variable.str.contains('time')].dropna()
long_part[['prompt', 'response_num']] = long_part.variable.str.split('_', expand=True)

long_part.loc[long_part.prompt.str.contains('resp1'), 'prompt'] = 'bottle'
long_part.loc[long_part.prompt.str.contains('resp2'), 'prompt'] = 'brick'
long_part.sample(10)

import nltk
nltk.download('punkt')
nltk.download('stopwords')

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
stops = stopwords.words('english')
# not sure which list the study used, so just adjust based on testing
stops += ['could']
stops = [w for w in stops if w not in ['can']]
stops = set(stops)

def stop_clean(x):
    x = x.lower()
    x = x.replace('i.e.', 'e') # quirk of the tokenization in original study
    for c in list("/\\'-()"):
        x = x.replace(c, '')
    words = [word for word in word_tokenize(x) if word not in stops]
    return " ".join(words)

long_part['cleaned'] = long_part.response.apply(stop_clean)
hass07 = long_part.merge(hassratings, how='left', on=['prompt', 'cleaned'])

cleaned = prep_general(hass07, **desc, save_dir='../data/datasets')
cleaned.sample(2)

[nltk_data] Downloading package punkt to
[nltk_data]     /Users/peter.organisciak/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/peter.organisciak/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


### Loading *Hass 2017*

Hass, R. W. (2017). Semantic search during divergent thinking. Cognition, 166, 344–357. https://doi.org/10.1016/j.cognition.2017.05.039

- Renaming columns {'subject': 'participant', 'response_order': 'response_num'}

- name: hass17
- no_of_prompts: 2
- no_of_participants: 57
- no_of_data_points: 1093
- prompts: ['bottle', 'brick']
- ICC2k: 0.79
- ICC2k_CI: 0.75-0.82
- ICC3k: 0.8
- rater_cols: ['r1', 'r2', 'r3']
- no_of_raters: 3




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
798,uses,hass17,What is a surprising use for BRICK?,brick,build a house,hass17_brick-1ad7cb,1.0,hass1753,3,eng,3,0.0
505,uses,hass17,What is a surprising use for BOTTLE?,bottle,put coins in it,hass17_bottle-8680ec,2.0,hass1747,9,eng,3,0.0


## Silvia et al 2008

This was the order of creativity tasks:

1. Please list all of the creative, unusual uses for a brick that you can think of.
2. Please list all of the creative, unusual instances of things that are round that you can think of.
3. Imagine that people no longer needed to sleep. Please list creative, unusual consequences that would follow.
4. Please list all of the creative, unusual uses for a knife that you can think of.
5. Please list all of the creative, unusual instances of things that will make a noise that you can think of.
6. Imagine that everyone shrank to 12 inches tall. Please list creative, unusual consequences that would follow.

Numbers 1 and 4 are AUT.



In [189]:
# Support .sav files
import pyreadstat
desc = {
    "name": "setal08",
    "meta": {
        "inline": "Silvia et al. 2008",
        "citation": "Silvia, P. J., Winterstein, B. P., Willse, J. T., Barona, C. M., Cram, J. T., Hess, K. I., Martinez, J. L., & Richard, C. A. (2008). Assessing creativity with divergent thinking tasks: Exploring the reliability and validity of new subjective scoring methods. Psychology of Aesthetics, Creativity, and the Arts, 2(2), 68–85. https://doi.org/10.1037/1931-3896.2.2.68",
        "url": "https://osf.io/dh7ey/",
        "download": {"url": "https://files.osf.io/v1/resources/4ketx/providers/osfstorage/5dd70d1f83135e000ec3c242/?zip=",
                    "extension": "zip",
                    "archive_files": ['DT_Responses_PACA_2008_Study_2.sav']
                    }
    },
    "column_mappings": {
        "subject":"participant",
        "order":"response_num"
        },
    "replace_values": {
        "prompt": {
            1: "brick",
            2: "round",
            3: "no sleep",
            4: "knife",
            5: "noise",
            6: "shrank"
        },
    },
    "question_mappings": {
        "brick": "What is a surprising use for a BRICK?",
        "round": "What is a surprising thing that is ROUND?", 
        "no sleep": "What would be a surprising consequence if PEOPLE NEEDED NO SLEEP?", 
        "knife": "What is a surprising use for a KNIFE?",
        "noise": "What is a surprising thing that makes a NOISE?",
        "shrank": "What would be a surprising consequence if EVERYONE SHRANK TO 12 INCHES TALL?"
    },
    "type_mappings": {
        "brick": "uses",
        "round": "uses",
        "no sleep": "consequences",
        "knife": "uses",
        "noise": "instances",
        "shrank": "consequences"
    },
    "range": [1, 5],
    "language": "eng",
}

# Download data
fnames = download_from_description(desc, '../data/raw', extension='zip')

# Some manual cleanup
df, meta = pyreadstat.read_sav(fnames[0])
# all three are mapped from task
for col in ['prompt', 'type', 'question']:
    df[col] = df['task'].astype(int)
df['subject'] = df['subject'].astype(int)
# doublecheck - burczak reported ICC2k as 0.48 for uses
cleaned = prep_general(df, **desc, save_dir='../data/datasets')
cleaned.sample(2)

### Loading *Silvia et al. 2008*

Silvia, P. J., Winterstein, B. P., Willse, J. T., Barona, C. M., Cram, J. T., Hess, K. I., Martinez, J. L., & Richard, C. A. (2008). Assessing creativity with divergent thinking tasks: Exploring the reliability and validity of new subjective scoring methods. Psychology of Aesthetics, Creativity, and the Arts, 2(2), 68–85. https://doi.org/10.1037/1931-3896.2.2.68

- Renaming columns {'subject': 'participant', 'order': 'response_num'}

- Inferring questions {'brick': 'What is a surprising use for a BRICK?', 'round': 'What is a surprising thing that is ROUND?', 'no sleep': 'What would be a surprising consequence if PEOPLE NEEDED NO SLEEP?', 'knife': 'What is a surprising use for a KNIFE?', 'noise': 'What is a surprising thing that makes a NOISE?', 'shrank': 'What would be a surprising consequence if EVERYONE SHRANK TO 12 INCHES TALL?'}

- Inferring types {'brick': 'uses', 'round': 'uses', 'no sleep': 'consequences', 'knife': 'uses', 'noise': 'instances', 'shrank': 'consequences'}

Dropping 37 unrated items


- name: setal08
- no_of_prompts: 6
- no_of_participants: 242
- no_of_data_points: 11490
- prompts: ['brick', 'round', 'no sleep', 'knife', 'noise', 'shrank']
- ICC2k: 0.43
- ICC2k_CI: 0.22-0.57
- ICC3k: 0.54
- rater_cols: ['rater1', 'rater2', 'rater3']
- no_of_raters: 3




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
10764,instances,setal08,What is a surprising thing that makes a NOISE?,noise,laying on a tanning bed,setal08_5.0-2f2e7d,1.667,setal08225,12.0,eng,3,0.57735
9629,uses,setal08,What is a surprising thing that is ROUND?,round,heart valves,setal08_2.0-bc56af,2.0,setal08203,9.0,eng,3,1.0


## Hofelich Mohr, Sell, and Lindsay 2016

In [190]:
desc = {
    "name": "hmsl",
    "meta": {
        "inline": "Hofelich Mohr et al. 2016",
        "citation": "Hofelich Mohr, A., Sell, A., & Lindsay, T. (2016). Thinking Inside the Box: Visual Design of the Response Box Affects Creative Divergent Thinking in an Online Survey. Social Science Computer Review, 34(3), 347–359. https://doi.org/10.1177/0894439315588736",
        "url": "https://doi.org/10.1177/0894439315588736",
        "download": {
            "url": "https://conservancy.umn.edu/bitstream/handle/11299/172116/HMSL_CSV%20Data%20Files.zip?sequence=28&isAllowed=y",
            "extension": "zip",
            "archive_files": ['HMSL_Originality_scores_all.csv']   
        }
    },
    "null_marker": 11,
    "column_mappings": {'Item': 'prompt', 'QLogin_1':'participant'},
    "rater_cols": ['J1_Rating','J2_Rating','J3_Rating','J4_Rating'],
    "range": [1, 5],
    "language": "eng",
}

fname = download_from_description(desc, '../data/raw')[0]
df = pd.read_csv(fname)
# Doublecheck ICC2k - burczak paper had icc2k=0.67
cleaned = prep_general(df, **desc, save_dir='../data/datasets')
cleaned.sample(2)

### Loading *Hofelich Mohr et al. 2016*

Hofelich Mohr, A., Sell, A., & Lindsay, T. (2016). Thinking Inside the Box: Visual Design of the Response Box Affects Creative Divergent Thinking in an Online Survey. Social Science Computer Review, 34(3), 347–359. https://doi.org/10.1177/0894439315588736

- Renaming columns {'Item': 'prompt', 'QLogin_1': 'participant'}

Replacing 11 with NaN in response column
Dropping 23 unrated items


- name: hmsl
- no_of_prompts: 2
- no_of_participants: 638
- no_of_data_points: 3843
- prompts: ['paperclip', 'brick']
- ICC2k: 0.67
- ICC2k_CI: 0.53-0.75
- ICC3k: 0.74
- rater_cols: ['J1_Rating', 'J2_Rating', 'J3_Rating', 'J4_Rating']
- no_of_raters: 4




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
3555,uses,hmsl,What is a surprising use for BRICK?,brick,supporting a shelf,hmsl_brick-99132d,2.24975,hmsly42hKMsQ,2.0,eng,4,0.5
848,uses,hmsl,What is a surprising use for BRICK?,brick,As a door stop,hmsl_brick-a7fca3,1.24975,hmslWMPE0i6n,2.0,eng,4,0.5


## Datasets used by Beaty and Johnson 2021

From SemDis paper:

- Study 1 was re-analysis of AUT responses from Beaty et al., 2018 to see if ensemble approaches work better. Two tests: `box` and `rope`
   - according to their paper, using additive composition was slightly negative correlation, while multiplicative 'results revealed a large correlation between latent semantic distance and human ratings:$r=.91$, p<.001'. This uses a model that weighs the factors, but is (I think) tailored to the dataset without held out data.

- Study 2 was re-analysis of results from Silvia et al. 2017, also on box and rope 
- Study 3 was brick - yet again - via Beaty and Silvia 2012
- Study 4 and 5- Heinen and Johnson (2018) - were noun matching, not relevant here

In [191]:
desc = {
    "name": "bj21",
    "meta": {
        "inline": "Beaty and Johnson 2021",
        "citation": "Beaty, R. E., & Johnson, D. R. (2021). Automating creativity assessment with SemDis: An open platform for computing semantic distance. Behavior Research Methods, 53(2), 757–780. https://doi.org/10.3758/s13428-020-01453-w",
        "url": "https://doi.org/10.3758/s13428-020-01453-w",
        "download": {
            "url": "https://files.osf.io/v1/resources/gz4fc/providers/osfstorage/5e45b6c73e86a800be6e662e/?zip=",
            "extension": "zip",
            "archive_files": ['Study 1/s1_data_long.xlsx',
                              'Study 2/s2_data_long.xlsx',
                              'Study 3/s3_data_long.xlsx']   
        }
    },
    "column_mappings": {'id':'participant', 'item':'prompt'},
    "range": [1, 5],
    "language": "eng",
}

substudies = [
    {
        "name": "betal18",
        "meta": {
            "inline": "Beaty et al., 2018",
            "citation": "Beaty, R. E., Kenett, Y. N., Christensen, A. P., Rosenberg, M. D., Benedek, M., Chen, Q., Fink, A., Qiu, J., Kwapil, T. R., Kane, M. J., & Silvia, P. J. (2018). Robust prediction of individual creative ability from brain functional connectivity. Proceedings of the National Academy of Sciences, 115(5), 1087–1092. https://doi.org/10.1073/pnas.1713532115"
        }
    },
    {
        "name": "snb17",
        "meta": {
            "inline": "Silvia et al., 2017",
            "citation": "Silvia, P. J., Nusbaum, E. C., & Beaty, R. E. (2017). Old or New? Evaluating the Old/New Scoring Method for Divergent Thinking Tasks. The Journal of Creative Behavior, 51(3), 216–224. https://doi.org/10.1002/jocb.101"
        }
    },
    {
        "name": "bs12",
        "meta": {
            "inline": "Beaty & Silvia, 2012",
            "citation": "Beaty, R. E., & Silvia, P. J. (2012). Why do ideas get more creative across time? An executive interpretation of the serial order effect in divergent thinking tasks. Psychology of Aesthetics, Creativity, and the Arts, 6(4), 309–319. https://doi.org/10.1037/a0029171"
        }
    },

]

fnames = download_from_description(desc, '../data/raw')
# the data comes from past studies, so we'll rename the files
# individually to their original studies
for fname,substudy in zip(fnames, substudies):
    new_desc = desc.copy()
    new_desc.update(substudy)
    df = pd.read_excel(fname)
    cleaned = prep_general(df, **new_desc, save_dir='../data/datasets')
    display(cleaned.sample(2))


### Loading *Beaty et al., 2018*

Beaty, R. E., Kenett, Y. N., Christensen, A. P., Rosenberg, M. D., Benedek, M., Chen, Q., Fink, A., Qiu, J., Kwapil, T. R., Kane, M. J., & Silvia, P. J. (2018). Robust prediction of individual creative ability from brain functional connectivity. Proceedings of the National Academy of Sciences, 115(5), 1087–1092. https://doi.org/10.1073/pnas.1713532115

- Renaming columns {'id': 'participant', 'item': 'prompt'}

- name: betal18
- no_of_prompts: 2
- no_of_participants: 171
- no_of_data_points: 2918
- prompts: ['box', 'rope']
- ICC2k: 0.82
- ICC2k_CI: 0.77-0.85
- ICC3k: 0.84
- rater_cols: ['rater1', 'rater2', 'rater3', 'rater4']
- no_of_raters: 4




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
955,uses,betal18,What is a surprising use for BOX?,box,punching bag,betal18_box-6c9947,2.5,betal182102,,eng,4,0.57735
957,uses,betal18,What is a surprising use for BOX?,box,Hat,betal18_box-624790,1.0,betal182104,,eng,4,0.0


### Loading *Silvia et al., 2017*

Silvia, P. J., Nusbaum, E. C., & Beaty, R. E. (2017). Old or New? Evaluating the Old/New Scoring Method for Divergent Thinking Tasks. The Journal of Creative Behavior, 51(3), 216–224. https://doi.org/10.1002/jocb.101

- Renaming columns {'id': 'participant', 'item': 'prompt'}

- name: snb17
- no_of_prompts: 2
- no_of_participants: 142
- no_of_data_points: 2372
- prompts: ['box', 'rope']
- ICC2k: 0.67
- ICC2k_CI: 0.55-0.75
- ICC3k: 0.72
- rater_cols: ['rater1', 'rater2', 'rater3']
- no_of_raters: 3




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
622,uses,snb17,What is a surprising use for BOX?,box,Recreate a scene from your favorite movie or b...,snb17_box-48137f,2.333,snb1778,,eng,3,0.57735
529,uses,snb17,What is a surprising use for BOX?,box,Trash Can,snb17_box-38f92a,1.333,snb1761,,eng,3,0.57735


### Loading *Beaty & Silvia, 2012*

Beaty, R. E., & Silvia, P. J. (2012). Why do ideas get more creative across time? An executive interpretation of the serial order effect in divergent thinking tasks. Psychology of Aesthetics, Creativity, and the Arts, 6(4), 309–319. https://doi.org/10.1037/a0029171

- Renaming columns {'id': 'participant', 'item': 'prompt'}

- name: bs12
- no_of_prompts: 1
- no_of_participants: 133
- no_of_data_points: 1807
- prompts: ['brick']
- ICC2k: 0.72
- ICC2k_CI: 0.56-0.8
- ICC3k: 0.78
- rater_cols: ['br_rater1', 'br_rater2', 'br_rater3']
- no_of_raters: 3




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
425,uses,bs12,What is a surprising use for BRICK?,brick,cruise control,bs12_brick-611408,2.333,bs1229,,eng,3,0.57735
89,uses,bs12,What is a surprising use for BRICK?,brick,You can use to make door knobs but it will be ...,bs12_brick-17d8c2,3.333,bs129,,eng,3,0.57735


## MOTES Pilot

MOTES is related to the "Measuring Original Thinking in Elementary Students: A Text-Mining Approach" (IES #R305A200519). This data is related to a high stakes test and is limited to research access. If you're a creativity research, please reach out to request it from <peter.organisciak@du.edu> and/or <selcuk.acar@unt.edu>.

In [192]:
desc = {
    "name": "motesp",
    "meta": {
        "inline": "Acar et al., 2023",
        "citation": "Acar, S., Dumas, D., Organisciak, P., Berthiaume, K. (2023). Measuring original thinking in elementary school: Development and validation of a computational psychometric approach. Journal of Educational Psychology. http://dx.doi.org/10.13140/RG.2.2.19804.56968",
        "url": "http://dx.doi.org/10.13140/RG.2.2.19804.56968",
        "download": {}
    },
    "rater_cols": ['D', 'K', 'T'],
    "range": [1, 7],
    "replace_values": {
        "prompt": {
            "lightbulb": "light bulb"
        },
        "question": {
            "What is a surprising use for a LIGHTBULB?": "What is a surprising use for a LIGHT BULB?"
        }
    },
    "language": "eng",
}
df = pd.read_csv('../data/raw/motesp_0.csv')
cleaned = prep_general(df, **desc, save_dir='../data/datasets')
cleaned.sample(2)

### Loading *Acar et al., 2023*

Acar, S., Dumas, D., Organisciak, P., Berthiaume, K. (2023). Measuring original thinking in elementary school: Development and validation of a computational psychometric approach. Journal of Educational Psychology. http://dx.doi.org/10.13140/RG.2.2.19804.56968

- name: motesp
- no_of_prompts: 29
- no_of_participants: 35
- no_of_data_points: 963
- prompts: ['backpack', 'ball', 'bottle', 'hat', 'light bulb', 'pencil', 'shoe', 'sock', 'spoon', 'toothbrush', 'big', 'cold', 'fun', 'red', 'smelly', 'soft', 'tasty', 'wet', 'aliens landed', 'kid president', 'rain soda', 'teacher read minds', 'time travel', 'friend phone', 'library', 'playground', 'school bus', 'sleepover', 'teacher talking']
- ICC2k: 0.73
- ICC2k_CI: 0.66-0.78
- ICC3k: 0.75
- rater_cols: ['D', 'K', 'T']
- no_of_raters: 3




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
540,instances,motesp,What is a surprising example of something TASTY?,tasty,"Ice cream with macaroons, lollipops, skittles,...",motesp_g2_tasty-b810e2,2.555333,motesp13OO,,eng,3,0.57735
277,uses,motesp,What is a surprising use for a SPOON?,spoon,Use it as a small sword,motesp_g1_spoon-f6636d,3.222,motesp17ZR,,eng,3,1.527525


## MOTES

This is the post-pilot data. As with the pilot data, this dataset is available on request. Please reach out!

In [193]:
desc = {
    "name": "motesf",
    "meta": {
        "inline": "Acar et al., 2023",
        "citation": "Acar, S., Dumas, D., Organisciak, P., Berthiaume, K. (2023). Measuring original thinking in elementary school: Development and validation of a computational psychometric approach. Journal of Educational Psychology. http://dx.doi.org/10.13140/RG.2.2.19804.56968",
        "url": "http://dx.doi.org/10.13140/RG.2.2.19804.56968",
        "download": {}
    },
    "column_mappings": {'ID':'participant'},
    "null_marker": -999,
    "rater_cols": ["Kscore", "Hscore", "Cscore", "Tscore", "Mscore"],
    "replace_values": {
        "prompt": {
            "lightbulb": "light bulb"
        },
        "question": {
            "What is a surprising use for a LIGHTBULB?": "What is a surprising use for a LIGHT BULB?"
        }
    },
    "range": [1, 5], # different scale than motesp pilot
    "language": "eng"
}
# data was already reshaped to long format for previous study
df = pd.read_csv('../data/raw/motesf_0.csv')
cleaned = prep_general(df, **desc, save_dir='../data/datasets')
cleaned.sample(2)

### Loading *Acar et al., 2023*

Acar, S., Dumas, D., Organisciak, P., Berthiaume, K. (2023). Measuring original thinking in elementary school: Development and validation of a computational psychometric approach. Journal of Educational Psychology. http://dx.doi.org/10.13140/RG.2.2.19804.56968

- Renaming columns {'ID': 'participant'}

Replacing -999 with NaN in response column


- name: motesf
- no_of_prompts: 24
- no_of_participants: 386
- no_of_data_points: 8563
- prompts: ['ball', 'sock', 'pencil', 'spoon', 'light bulb', 'hat', 'bottle', 'toothbrush', 'smelly', 'soft', 'red', 'frozen', 'wet', 'huge', 'fun', 'tasty', 'school bus', 'games', 'library', 'lecture', 'phone', 'rain', 'closet', 'lunchroom']
- ICC2k: 0.79
- ICC2k_CI: 0.69-0.85
- ICC3k: 0.84
- rater_cols: ['Kscore', 'Hscore', 'Cscore', 'Tscore', 'Mscore']
- no_of_raters: 5




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
6803,completion,motesf,"Complete this sentence in a surprising way: ""W...",library,they yelled banana,motesf_library-b37f15,2.3996,motesf63923f,1,eng,5,0.547723
7836,completion,motesf,"Complete this sentence in a surprising way: ""I...",rain,my shoes got dirty from the mud,motesf_rain-50438f,2.1998,motesf621bf6,6,eng,5,0.447214


## Multilingual semantic distance

From: Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.


In the paper, the *exact* provenance of subdata isn't specified, so needs further investigation to determine why additional citations are needed:

> We received 30 datasets, with a combined sample size of 6,522, reflecting data from 22 labs and 12 languages: Arabic, Chinese, Dutch, English, Farsi, French German, Hebrew, Italian, Polish, Russian, and Spanish (see Figure 1). Several datasets came from published studies, whereas others have not been used for publication.

For verification, here are the published ICC values (which look to be the ICC3k values):

| Dataset  | Raters | ICC  |
|----------|--------|------|
| Arabic1  | 1      | N/A  |
| Chinese1 | 4      | 0.49 |
| Chinese2 | 4      | 0.64 |
| Dutch1   | 2      | 0.81 |
| Dutch2   | 2      | 0.94 |
| Dutch3   | 2      | 0.87 |
| Dutch4   | 3      | 0.85 |
| English1 | 4      | 0.84 |
| English2 | 4      | 0.77 |
| English3 | 3      | 0.56 |
| English4 | 3      | 0.72 |
| English5 | 3      | 0.78 |
| English6 | 3      | 0.64 |
| Farsi1   | 3      | 0.69 |
| Farsi2   | 3      | 0.75 |
| French1  | 4      | 0.8  |
| French2  | 3      | 0.64 |
| French3  | 3      | 0.75 |
| French4  | 3      | 0.8  |
| German1  | 4      | 0.71 |
| German2  | 3      | 0.78 |
| German3  | 3      | 0.86 |
| Hebrew1  | 45     | 0.88 |
| Italian1 | 2      | 0.89 |
| Italian2 | 2      | 0.88 |
| Polish1  | 3      | 0.82 |
| Polish2  | 2      | 0.6  |
| Russian1 | 3      | 0.72 |
| Russian2 | 3      | 0.79 |
| Spanish1 | 3      | 0.74 |

In [220]:
desc = {
    "name": "multiaut",
    "test_type": "uses",
    "meta": {
        "inline": "Patterson et al., 2023",
        "citation": "Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.",
        "url": "https://doi.org/10.1037/aca0000618",
        "download": {
            "url": "https://files.osf.io/v1/resources/5cy9n/providers/github/processed-data/?zip=",
            "extension": "zip",
            # Excluded for redundancy
            # english5 is the same as bs12
            # english4 is the same as snb17
            # english1 is the same as betal18
            "archive_files": [
                "arabic1.csv", "dutch4.csv", "english6.csv", "german3.csv", "russian1.csv",
                "chinese1.csv", "french2.csv", "hebrew1.csv", "russian2.csv",
                "chinese2.csv", "english2.csv", "french3.csv", "italian1.csv", "spanish1.csv",
                "dutch1.csv", "english3.csv", "french4.csv", "italian2.csv",
                "dutch2.csv", "german1.csv", "polish1.csv",
                "dutch3.csv", "german2.csv", "polish2.csv"
            ]
        }
    },
    "column_mappings": {'ID': 'participant', 'order':'response_num'}
}


In [216]:
fnames = download_from_description(desc, '../data/raw')

# additional info specific to subset datasets
dutch_replace_vals = {
    "prompt": {
        "brick": "backstreen",
        "fork": "vork",
            "towel": "handoek"
    }
}
subsets = {
    'arabic1': {
        'replace_values': {
            "prompt": {
                'Tin cans': 'علب الصفيح'
            }
        }
    },
    "dutch1": {
        "replace_values": dutch_replace_vals
    },
    "dutch2": {
        "replace_values": dutch_replace_vals
    },
    "dutch3": {
        "replace_values": dutch_replace_vals
    },
    "dutch4": {
        "replace_values": dutch_replace_vals
    },
}

language_to_iso = {
    'arabic': 'ara', 'chinese': 'chi', 'dutch': 'dut', 'english': 'eng',
    'farsi': 'per', 'french': 'fre', 'german': 'ger', 'hebrew': 'heb',
    'italian': 'ita', 'polish': 'pol', 'russian': 'rus', 'spanish': 'spa'
}
for fname in fnames:
    # language is automatically detected from filename
    subset_desc = desc.copy()
    print(fname.stem)
    subset_desc['language'] = language_to_iso[fname.stem[:-1]]
    subset_desc['name'] = desc['name'] + '_' + fname.stem
    if subsets.get(fname.stem):
        subset_desc.update(subsets[fname.stem])
    if fname.stem == 'english2':
        print("NOTE: english2 had an encoding error. Opening and re-saving it seems to fix it.")
    df = pd.read_csv(fname, encoding='utf-8')
    cleaned = prep_general(df, **subset_desc, save_dir='../data/datasets')
    display(cleaned.sample(2))

arabic1


### Loading *Patterson et al., 2023*

Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.

- Renaming columns {'ID': 'participant', 'order': 'response_num'}

- Inferred range of original data: (1, 5)

- name: multiaut_arabic1
- no_of_prompts: 1
- no_of_participants: 160
- no_of_data_points: 1524
- prompts: ['Tin cans']
- ICC2k: None
- ICC2k_CI: None
- ICC3k: None
- rater_cols: ['rater1']
- no_of_raters: 1




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
1481,uses,multiaut_arabic1,ما هو الاستخدام المفاجئ لـ TIN CANS؟,Tin cans,وسيلة لتعليم الألوان,multiaut_arabic1_Tin cans-d9925e,3.0,multiaut_arabic1189,3,ara,1,
642,uses,multiaut_arabic1,ما هو الاستخدام المفاجئ لـ TIN CANS؟,Tin cans,صناعة جاروف,multiaut_arabic1_Tin cans-105f96,4.0,multiaut_arabic184,11,ara,1,


dutch4


### Loading *Patterson et al., 2023*

Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.

- Renaming columns {'ID': 'participant', 'order': 'response_num'}

- Inferred range of original data: (0.0, 5.0)

- name: multiaut_dutch4
- no_of_prompts: 1
- no_of_participants: 99
- no_of_data_points: 2414
- prompts: ['backstreen']
- ICC2k: 0.84
- ICC2k_CI: 0.82-0.86
- ICC3k: 0.85
- rater_cols: ['rater1', 'rater2', 'rater3']
- no_of_raters: 3




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
654,uses,multiaut_dutch4,Wat is een verrassend gebruik voor een BACKSTR...,backstreen,Muren,multiaut_dutch4_backstreen-9d94b6,1.8,multiaut_dutch444,,dut,3,0.0
83,uses,multiaut_dutch4,Wat is een verrassend gebruik voor een BACKSTR...,backstreen,Boekenhouder,multiaut_dutch4_backstreen-b332e4,3.4,multiaut_dutch411,,dut,3,1.0


english6


### Loading *Patterson et al., 2023*

Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.

- Renaming columns {'ID': 'participant', 'order': 'response_num'}

- Inferred range of original data: (1.0, 5.0)

Dropping 7 unrated items


- name: multiaut_english6
- no_of_prompts: 2
- no_of_participants: 241
- no_of_data_points: 3425
- prompts: ['brick', 'knife']
- ICC2k: 0.48
- ICC2k_CI: 0.15-0.65
- ICC3k: 0.64
- rater_cols: ['rater1', 'rater2', 'rater3']
- no_of_raters: 3




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
575,uses,multiaut_english6,What is a surprising use for a KNIFE?,knife,cut ropes,multiaut_english6_knife-69d7c0,1.0,multiaut_english644,,eng,3,0.0
3171,uses,multiaut_english6,What is a surprising use for a KNIFE?,knife,cut tress,multiaut_english6_knife-1613f0,1.333,multiaut_english6223,,eng,3,0.57735


german3


### Loading *Patterson et al., 2023*

Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.

- Renaming columns {'ID': 'participant', 'order': 'response_num'}

- Inferred range of original data: (1, 5)

- name: multiaut_german3
- no_of_prompts: 16
- no_of_participants: 51
- no_of_data_points: 8065
- prompts: ['Axt', 'Trompete', 'Erbse', 'Tisch', 'Flöte', 'Zange', 'Gurke', 'Bett', 'Tomate', 'Geige', 'Schaufel', 'Schrank', 'Paprika', 'Stuhl', 'Trommel', 'Säge']
- ICC2k: 0.85
- ICC2k_CI: 0.83-0.87
- ICC3k: 0.86
- rater_cols: ['rater1', 'rater2', 'rater3']
- no_of_raters: 3




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
3827,uses,multiaut_german3,Was ist eine überraschende Verwendung für eine...,Gurke,sexspielzeug,multiaut_german3_Gurke-b525bd,3.0,multiaut_german327,17,ger,3,0.0
8058,uses,multiaut_german3,Was ist eine überraschende Verwendung für eine...,Säge,in den Boden stecken als Zaun für Meerschweinc...,multiaut_german3_Säge-3b70a2,4.333,multiaut_german355,33,ger,3,0.57735


russian1


### Loading *Patterson et al., 2023*

Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.

- Renaming columns {'ID': 'participant', 'order': 'response_num'}

- Inferred range of original data: (1, 5)

- name: multiaut_russian1
- no_of_prompts: 2
- no_of_participants: 111
- no_of_data_points: 1728
- prompts: ['газета', 'деревянная линейка']
- ICC2k: 0.72
- ICC2k_CI: 0.69-0.74
- ICC3k: 0.72
- rater_cols: ['rater1', 'rater2', 'rater3']
- no_of_raters: 3




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
376,uses,multiaut_russian1,Какое удивительное применение для ГАЗЕТЫ?,газета,Можно сделать одноразовую одежду,multiaut_russian1_газета-b666e0,2.333,multiaut_russian132,,rus,3,0.57735
1410,uses,multiaut_russian1,Какое удивительное применение для ДЕРЕВЯННОЙ Л...,деревянная линейка,Плот для муравьёв,multiaut_russian1_деревянная линейка-64c54b,4.0,multiaut_russian133,,rus,3,0.0


chinese1


### Loading *Patterson et al., 2023*

Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.

- Renaming columns {'ID': 'participant', 'order': 'response_num'}

- Inferred range of original data: (0, 5)

- name: multiaut_chinese1
- no_of_prompts: 122
- no_of_participants: 466
- no_of_data_points: 14176
- prompts: ['积木', '漏斗', '轮胎', '盘子', '皮带', '塑料袋', '西瓜', '牙签', '锅', '蚊帐', '头发', '纸盒', '铁链', '硬币', '字典', '报纸', '靴子', '易拉罐', '玉米', '南瓜', '耳机', '喇叭', '领带', '扑克', '手套', '银行卡', '袜子', '镊子', '卫生纸', '蛋清', '小米', '耳机线', '鞋带', '发簪', '柳树', '生姜', '红酒', '白酒', '木头', '火柴', '纽扣', '图钉', '吸管', '牙膏', '船桨', '面团', '西红柿', '荷叶', '磁铁', '弹弓', '钉子', '光盘', '毛巾', '橡皮擦', '球拍', '鹅卵石', '贝壳', '椰子', '花生', '核桃', '铃铛', '酒瓶', '西瓜皮', '冰块', '马来貘', '咖啡', '擀面杖', '戒指', '气球', '芦荟', '橄榄油', '花瓣', '土豆', '曲别针', '浴缸', '茶壶', '灌木', '无花果', '韭菜', '花椒', '纸巾', '皮筋', '黄金', '音响', '画像', '狐狸', '发带', '纸杯', '棉签', '柿子', '白纸', '杯子', '窗帘', '蛋糕', '地图', '风车', '夹子', '胶水', '蜡烛', '毛笔', '墨水', '铅笔', '钳子', '扇子', '勺子', '梳子', '柳条', '水壶', '算盘', '蛋壳', '台灯', '围巾', '温度计', '相机', '香蕉', '牙刷', '钥匙', '衣架', '砖头', '笛子', '酸奶', '吹风机']
- ICC2k: 0.48
- ICC2k_CI: 0.44-0.51
- ICC3k: 0.49
- rater_cols: ['rater1', 'rater2', 'rater3', 'rater4']
- no_of_raters: 4




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
8089,uses,multiaut_chinese1,芦荟的一个令人惊讶的用途是什么？,芦荟,做青团,multiaut_chinese1_芦荟-7d4aa6,2.6,multiaut_chinese11270,,chi,4,1.154701
8180,uses,multiaut_chinese1,橄榄油的一个令人惊讶的用途是什么？,橄榄油,按摩,multiaut_chinese1_橄榄油-ac0593,2.6,multiaut_chinese11273,,chi,4,0.816497


french2


### Loading *Patterson et al., 2023*

Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.

- Renaming columns {'ID': 'participant', 'order': 'response_num'}

- Inferred range of original data: (1.0, 5.0)

Dropping 47 unrated items


- name: multiaut_french2
- no_of_prompts: 1
- no_of_participants: 82
- no_of_data_points: 449
- prompts: ['chapeau']
- ICC2k: 0.52
- ICC2k_CI: 0.25-0.68
- ICC3k: 0.64
- rater_cols: ['rater1', 'rater2', 'rater3']
- no_of_raters: 3




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
482,uses,multiaut_french2,Quel est un usage surprenant pour un CHAPEAU?,chapeau,Pour tenir bonheur,multiaut_french2_chapeau-f73049,3.333,multiaut_french23061,,fre,3,1.527525
284,uses,multiaut_french2,Quel est un usage surprenant pour un CHAPEAU?,chapeau,Decorer,multiaut_french2_chapeau-7f221f,2.666,multiaut_french22809,,fre,3,2.081666


hebrew1


### Loading *Patterson et al., 2023*

Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.

- Renaming columns {'ID': 'participant', 'order': 'response_num'}

- Inferred range of original data: (1.0, 5.0)

WARNING: ICC has an undefined error for this dataset

- name: multiaut_hebrew1
- no_of_prompts: 10
- no_of_participants: 51
- no_of_data_points: 2027
- prompts: ['סכין', 'נעל', 'עיפרון', 'עיתון', 'מברג', 'קולב', 'צמיג', 'מטאטא', 'כיסא', 'כרית']
- ICC2k: None
- ICC2k_CI: None
- ICC3k: None
- rater_cols: ['rater1', 'rater2', 'rater3', 'rater4', 'rater5', 'rater6', 'rater7', 'rater8', 'rater9', 'rater10', 'rater11', 'rater12', 'rater13', 'rater14', 'rater15', 'rater16', 'rater17', 'rater18', 'rater19', 'rater20', 'rater21', 'rater22', 'rater23', 'rater24', 'rater25', 'rater26', 'rater27', 'rater28', 'rater29', 'rater30', 'rater31', 'rater32', 'rater33', 'rater34', 'rater35', 'rater36', 'rater37', 'rater38', 'rater39', 'rater40', 'rater41', 'rater42', 'rater43', 'rater44', 'rater45']
- no_of_raters: 45




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
1977,uses,multiaut_hebrew1,מהו שימוש מפתיע לכרית?,כרית,נפנף כדי לעשות רוח,multiaut_hebrew1_כרית-f4d62b,3.667,multiaut_hebrew11596,4,heb,9,1.224745
1906,uses,multiaut_hebrew1,מהו שימוש מפתיע לכרית?,כרית,להוציא את המילוי שלה ולעשות ממנו צמות,multiaut_hebrew1_כרית-80ec92,4.75025,multiaut_hebrew11903,6,heb,8,0.707107


russian2


### Loading *Patterson et al., 2023*

Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.

- Renaming columns {'ID': 'participant', 'order': 'response_num'}

- Inferred range of original data: (1, 5)

- name: multiaut_russian2
- no_of_prompts: 1
- no_of_participants: 45
- no_of_data_points: 370
- prompts: ['картонная коробка']
- ICC2k: 0.78
- ICC2k_CI: 0.74-0.82
- ICC3k: 0.79
- rater_cols: ['rater1', 'rater2', 'rater3']
- no_of_raters: 3




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
52,uses,multiaut_russian2,Какое удивительное применение для КАРТОННОЙ КО...,картонная коробка,Воздушный шар,multiaut_russian2_картонная коробка-06a8b8,2.0,multiaut_russian24,,rus,3,1.0
237,uses,multiaut_russian2,Какое удивительное применение для КАРТОННОЙ КО...,картонная коробка,"Если делать коробки внутри мягкими, то и запак...",multiaut_russian2_картонная коробка-215bb5,1.666,multiaut_russian228,,rus,3,1.154701


chinese2


### Loading *Patterson et al., 2023*

Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.

- Renaming columns {'ID': 'participant', 'order': 'response_num'}

- Inferred range of original data: (1, 42)

- name: multiaut_chinese2
- no_of_prompts: 2
- no_of_participants: 217
- no_of_data_points: 1302
- prompts: ['筷子', '易拉罐']
- ICC2k: 0.6
- ICC2k_CI: 0.54-0.66
- ICC3k: 0.64
- rater_cols: ['rater1', 'rater2', 'rater3', 'rater4']
- no_of_raters: 4




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
873,uses,multiaut_chinese2,易拉罐的一个令人惊讶的用途是什么？,易拉罐,制作模型,multiaut_chinese2_易拉罐-3419d4,1.14639,multiaut_chinese2342,,chi,4,1.0
1,uses,multiaut_chinese2,筷子有什么令人惊讶的用途？,筷子,插在土里固定农作物,multiaut_chinese2_筷子-89a91e,1.146293,multiaut_chinese2101,,chi,4,1.0


english2
NOTE: english2 had an encoding error. Opening and re-saving it seems to fix it.


### Loading *Patterson et al., 2023*

Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.

- Renaming columns {'ID': 'participant', 'order': 'response_num'}

- Inferred range of original data: (1.0, 5.0)

- name: multiaut_english2
- no_of_prompts: 2
- no_of_participants: 182
- no_of_data_points: 3723
- prompts: ['rope', 'box']
- ICC2k: 0.71
- ICC2k_CI: 0.59-0.78
- ICC3k: 0.77
- rater_cols: ['rater1', 'rater2', 'rater3', 'rater4']
- no_of_raters: 4




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
3015,uses,multiaut_english2,What is a surprising use for a ROPE?,rope,leash,multiaut_english2_rope-e5c684,2.24975,multiaut_english2152,,eng,4,1.258306
149,uses,multiaut_english2,What is a surprising use for a ROPE?,rope,mimic shapes,multiaut_english2_rope-1e54d2,2.74975,multiaut_english27,,eng,4,0.957427


french3


### Loading *Patterson et al., 2023*

Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.

- Renaming columns {'ID': 'participant', 'order': 'response_num'}

- Inferred range of original data: (1.0, 11.0)

Dropping 10 unrated items


- name: multiaut_french3
- no_of_prompts: 2
- no_of_participants: 277
- no_of_data_points: 2181
- prompts: ['ceinture', 'brouette']
- ICC2k: 0.72
- ICC2k_CI: 0.67-0.77
- ICC3k: 0.75
- rater_cols: ['rater1', 'rater2', 'rater3']
- no_of_raters: 3




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
1506,uses,multiaut_french3,Quel est un usage surprenant pour une BROUETTE?,brouette,Jeux pour enfant au sein d’une kermesse,multiaut_french3_brouette-4fcd98,1.2668,multiaut_french3192,,fre,3,0.57735
771,uses,multiaut_french3,Quel est un usage surprenant pour une BROUETTE?,brouette,Mettre un enfant dedans pour jouer comme si c'...,multiaut_french3_brouette-5c51ff,1.1332,multiaut_french3106,,fre,3,0.57735


italian1


### Loading *Patterson et al., 2023*

Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.

- Renaming columns {'ID': 'participant', 'order': 'response_num'}

- Inferred range of original data: (1, 5)

- name: multiaut_italian1
- no_of_prompts: 6
- no_of_participants: 151
- no_of_data_points: 4269
- prompts: ['Attaccapanni', 'Barile', 'Bottiglia di plastica', 'Lampadina', 'Libro', 'Sedia']
- ICC2k: 0.89
- ICC2k_CI: 0.89-0.9
- ICC3k: 0.89
- rater_cols: ['rater1', 'rater2']
- no_of_raters: 2




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
427,uses,multiaut_italian1,Qual è un uso sorprendente per una LAMPADINA?,Lampadina,strumento,multiaut_italian1_Lampadina-4524c4,2.0,multiaut_italian118,,ita,2,0.0
3983,uses,multiaut_italian1,Qual è un uso sorprendente per una LAMPADINA?,Lampadina,base per cartapesta per fare forme tonde,multiaut_italian1_Lampadina-5c62b6,3.0,multiaut_italian11059,,ita,2,0.0


spanish1


### Loading *Patterson et al., 2023*

Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.

- Renaming columns {'ID': 'participant', 'order': 'response_num'}

- Inferred range of original data: (1.0, 5.0)

- name: multiaut_spanish1
- no_of_prompts: 1
- no_of_participants: 491
- no_of_data_points: 2735
- prompts: ['ladrillo']
- ICC2k: 0.57
- ICC2k_CI: 0.18-0.75
- ICC3k: 0.75
- rater_cols: ['rater1', 'rater2', 'rater3']
- no_of_raters: 3




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
748,uses,multiaut_spanish1,¿Cuál es un uso sorprendente para un LADRILLO?,ladrillo,fogatas,multiaut_spanish1_ladrillo-fe9d47,1.666,multiaut_spanish1109,7,spa,3,1.154701
826,uses,multiaut_spanish1,¿Cuál es un uso sorprendente para un LADRILLO?,ladrillo,muros o paredes,multiaut_spanish1_ladrillo-dd77ed,1.0,multiaut_spanish1120,4,spa,3,0.0


dutch1


### Loading *Patterson et al., 2023*

Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.

- Renaming columns {'ID': 'participant', 'order': 'response_num'}

- Inferred range of original data: (0.0, 5.0)

- name: multiaut_dutch1
- no_of_prompts: 4
- no_of_participants: 633
- no_of_data_points: 10549
- prompts: ['backstreen', 'vork', 'paperclip', 'handoek']
- ICC2k: 0.81
- ICC2k_CI: 0.81-0.82
- ICC3k: 0.81
- rater_cols: ['rater1', 'rater2']
- no_of_raters: 2




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
6443,uses,multiaut_dutch1,Wat is een verrassend gebruik voor een VORK?,vork,prikken,multiaut_dutch1_vork-fc0e41,1.8,multiaut_dutch1166,,dut,2,0.0
5137,uses,multiaut_dutch1,Wat is een verrassend gebruik voor een VORK?,vork,hark,multiaut_dutch1_vork-d85aa1,3.8,multiaut_dutch1753,,dut,2,0.707107


english3


### Loading *Patterson et al., 2023*

Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.

- Renaming columns {'ID': 'participant', 'order': 'response_num'}

- Inferred range of original data: (1.0, 33.0)

Dropping 11 unrated items


- name: multiaut_english3
- no_of_prompts: 2
- no_of_participants: 209
- no_of_data_points: 3225
- prompts: ['box', 'rope']
- ICC2k: 0.38
- ICC2k_CI: 0.18-0.52
- ICC3k: 0.49
- rater_cols: ['rater1', 'rater2', 'rater3']
- no_of_raters: 3




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
2793,uses,multiaut_english3,What is a surprising use for a BOX?,box,spray paint booth,multiaut_english3_box-5a6001,1.166625,multiaut_english383898,,eng,3,1.527525
1359,uses,multiaut_english3,What is a surprising use for a ROPE?,rope,belt,multiaut_english3_rope-c50a92,1.041625,multiaut_english382969,,eng,3,0.57735


french4


### Loading *Patterson et al., 2023*

Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.

- Renaming columns {'ID': 'participant', 'order': 'response_num'}

- Inferred range of original data: (1.0, 5.0)

- name: multiaut_french4
- no_of_prompts: 2
- no_of_participants: 238
- no_of_data_points: 2332
- prompts: ['brouette', 'ceinture']
- ICC2k: 0.79
- ICC2k_CI: 0.78-0.81
- ICC3k: 0.8
- rater_cols: ['rater1', 'rater2', 'rater3']
- no_of_raters: 3




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
419,uses,multiaut_french4,Quel est un usage surprenant pour une CEINTURE?,ceinture,En tant que corde,multiaut_french4_ceinture-464447,1.0,multiaut_french448,,fre,3,0.0
794,uses,multiaut_french4,Quel est un usage surprenant pour une CEINTURE?,ceinture,Se pendre,multiaut_french4_ceinture-f2f33a,1.667,multiaut_french483,,fre,3,0.57735


italian2


### Loading *Patterson et al., 2023*

Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.

- Renaming columns {'ID': 'participant', 'order': 'response_num'}

- Inferred range of original data: (0.0, 5.0)

Dropping 3 unrated items


- name: multiaut_italian2
- no_of_prompts: 21
- no_of_participants: 80
- no_of_data_points: 6895
- prompts: ['guanto', 'lampadina', 'libro', 'martello', 'mattone', 'cappello', 'cestino', 'coltello', 'cucchiaio', 'graffetta', 'accendino', 'accetta', 'appendino', 'aspirapolvere', 'banana', 'barattolo', 'bicicletta', 'borsa', 'botte', 'bottiglietta', 'capello']
- ICC2k: 0.88
- ICC2k_CI: 0.87-0.89
- ICC3k: 0.88
- rater_cols: ['rater1', 'rater2']
- no_of_raters: 2




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
3615,uses,multiaut_italian2,Qual è un uso sorprendente per una BOTTE?,botte,comodino,multiaut_italian2_botte-e2ef7f,2.6,multiaut_italian2103,,ita,2,0.0
5496,uses,multiaut_italian2,Qual è un uso sorprendente per un LIBRO?,libro,ventaglio,multiaut_italian2_libro-8afa8b,2.6,multiaut_italian2123,,ita,2,0.0


dutch2


### Loading *Patterson et al., 2023*

Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.

- Renaming columns {'ID': 'participant', 'order': 'response_num'}

- Inferred range of original data: (0.0, 5.0)

- name: multiaut_dutch2
- no_of_prompts: 2
- no_of_participants: 111
- no_of_data_points: 1640
- prompts: ['backstreen', 'paperclip']
- ICC2k: 0.94
- ICC2k_CI: 0.93-0.95
- ICC3k: 0.94
- rater_cols: ['rater1', 'rater2']
- no_of_raters: 2




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
1406,uses,multiaut_dutch2,Wat is een verrassend gebruik voor een paperclip?,paperclip,ring van maken,multiaut_dutch2_paperclip-f854b7,2.6,multiaut_dutch21168,,dut,2,0.0
371,uses,multiaut_dutch2,Wat is een verrassend gebruik voor een BACKSTR...,backstreen,kei,multiaut_dutch2_backstreen-6876c2,1.8,multiaut_dutch21225,,dut,2,0.0


german1


### Loading *Patterson et al., 2023*

Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.

- Renaming columns {'ID': 'participant', 'order': 'response_num'}

- Inferred range of original data: (0.0, 3.0)

- name: multiaut_german1
- no_of_prompts: 3
- no_of_participants: 298
- no_of_data_points: 8116
- prompts: ['konservendose', 'messer', 'haarfoehn']
- ICC2k: 0.7
- ICC2k_CI: 0.68-0.72
- ICC3k: 0.71
- rater_cols: ['rater1', 'rater2', 'rater3', 'rater4']
- no_of_raters: 4




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
1587,uses,multiaut_german1,Was ist eine überraschende Verwendung für eine...,haarfoehn,Federn im Raum verteilen,multiaut_german1_haarfoehn-fd8d73,2.666333,multiaut_german166,,ger,4,0.5
1708,uses,multiaut_german1,Was ist eine überraschende Verwendung für eine...,haarfoehn,"um mit nem Tischtennisball zu spielen, ohne ih...",multiaut_german1_haarfoehn-6fc4e4,4.333333,multiaut_german171,,ger,4,0.57735


polish1


### Loading *Patterson et al., 2023*

Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.

- Renaming columns {'ID': 'participant', 'order': 'response_num'}

- Inferred range of original data: (1, 7)

- name: multiaut_polish1
- no_of_prompts: 2
- no_of_participants: 791
- no_of_data_points: 7415
- prompts: ['puszka', 'cegła']
- ICC2k: 0.82
- ICC2k_CI: 0.81-0.83
- ICC3k: 0.82
- rater_cols: ['rater1', 'rater2', 'rater3']
- no_of_raters: 3




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
2402,uses,multiaut_polish1,Jakie jest zaskakujące zastosowanie dla PUSZKI?,puszka,po zgnieceniu jako formę mozaiki na ścianę,multiaut_polish1_puszka-5dd583,2.333333,multiaut_polish1480110,p3,pol,3,0.0
6386,uses,multiaut_polish1,Jakie jest zaskakujące zastosowanie dla CEGŁY?,cegła,obciążnik,multiaut_polish1_cegła-62dbf6,1.888667,multiaut_polish1515364,p2,pol,3,0.57735


dutch3


### Loading *Patterson et al., 2023*

Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.

- Renaming columns {'ID': 'participant', 'order': 'response_num'}

- Inferred range of original data: (0, 5)

- name: multiaut_dutch3
- no_of_prompts: 1
- no_of_participants: 111
- no_of_data_points: 1004
- prompts: ['backstreen']
- ICC2k: 0.86
- ICC2k_CI: 0.79-0.89
- ICC3k: 0.87
- rater_cols: ['rater1', 'rater2']
- no_of_raters: 2




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
929,uses,multiaut_dutch3,Wat is een verrassend gebruik voor een BACKSTR...,backstreen,smelten,multiaut_dutch3_backstreen-c5d027,3.8,multiaut_dutch31530,,dut,2,0.707107
165,uses,multiaut_dutch3,Wat is een verrassend gebruik voor een BACKSTR...,backstreen,bed,multiaut_dutch3_backstreen-2325a5,3.4,multiaut_dutch31316,,dut,2,0.0


german2


### Loading *Patterson et al., 2023*

Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.

- Renaming columns {'ID': 'participant', 'order': 'response_num'}

- Inferred range of original data: (1, 5)

- name: multiaut_german2
- no_of_prompts: 3
- no_of_participants: 154
- no_of_data_points: 3530
- prompts: ['Büroklammer', 'Mülltüte', 'Seil']
- ICC2k: 0.71
- ICC2k_CI: 0.54-0.8
- ICC3k: 0.77
- rater_cols: ['rater1', 'rater2', 'rater3']
- no_of_raters: 3




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
3404,uses,multiaut_german2,Was ist eine überraschende Verwendung für eine...,Mülltüte,jemanden würgen :),multiaut_german2_Mülltüte-90638d,2.667,multiaut_german22322RAJU190,,ger,3,0.57735
1656,uses,multiaut_german2,Was ist eine überraschende Verwendung für eine...,Mülltüte,im Winter als Schlitten,multiaut_german2_Mülltüte-69ab1f,3.333,multiaut_german21733ANAU192,,ger,3,1.527525


polish2


### Loading *Patterson et al., 2023*

Patterson, J. D., Merseal, H. M., Johnson, D. R., Agnoli, S., Baas, M., Baker, B. S., ... & Beaty, R. E. (2023). Multilingual semantic distance: Automatic verbal creativity assessment in many languages. Psychology of Aesthetics, Creativity, and the Arts, 17(4), 495.

- Renaming columns {'ID': 'participant', 'order': 'response_num'}

- Inferred range of original data: (1.0, 7.0)

WARNING: ICC has an undefined error for this dataset

- name: multiaut_polish2
- no_of_prompts: 3
- no_of_participants: 497
- no_of_data_points: 3054
- prompts: ['sznurek', 'puszka', 'cegła']
- ICC2k: None
- ICC2k_CI: None
- ICC3k: None
- rater_cols: ['rater1', 'rater2', 'rater3', 'rater4']
- no_of_raters: 4




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
302,uses,multiaut_polish2,Jakie jest zaskakujące zastosowanie dla SZNUREKA?,sznurek,robienie ozdób,multiaut_polish2_sznurek-fbf549,2.0,multiaut_polish2102016,p2,pol,2,0.707107
759,uses,multiaut_polish2,Jakie jest zaskakujące zastosowanie dla SZNUREKA?,sznurek,którym później owinie się pień rośliny doniczk...,multiaut_polish2_sznurek-19e14c,2.0,multiaut_polish2419895,p2,pol,2,0.707107


## TransDis

A Chinese-language AUT dataset.

In [196]:
desc = {
    "name": "transdis",
    "test_type": "uses",
    "meta": {
        "inline": "Yang et al., 2023",
        "citation": "Yang, T., Zhang, Q., Sun, Z., & Hou, Y. (2023). Automatic Assessment of Divergent Thinking in Chinese Language with TransDis: A Transformer-Based Language Model Approach. arXiv preprint arXiv:2306.14790.",
        "url": "https://arxiv.org/abs/2306.14790",
        "download": [{
            "url": "https://osf.io/download/3fk8y", 
            "extension": "xlsx"
            }, {
            "url": "https://osf.io/download/mcwtu", 
            "extension": "xlsx"
            }],
    },
    "null_marker": "NA",
    "column_mappings": {'Item': 'prompt', 'Response': 'response',
                        'ParticipantID': 'participant'},
    "range": [0, 4],
    "rater_cols": ['Originality_Rater1', 'Originality_Rater2'],
    "language":"chi"
}

fnames = download_from_description(desc, '../data/raw')
df = pd.concat([pd.read_excel(fname) for fname in fnames])
# number each participant's responses in order (based on responseID)
df['response_num'] = df.groupby('ParticipantID').cumcount() + 1
cleaned = prep_general(df, **desc, save_dir='../data/datasets')
cleaned.sample(2)


### Loading *Yang et al., 2023*

Yang, T., Zhang, Q., Sun, Z., & Hou, Y. (2023). Automatic Assessment of Divergent Thinking in Chinese Language with TransDis: A Transformer-Based Language Model Approach. arXiv preprint arXiv:2306.14790.

- Renaming columns {'Item': 'prompt', 'Response': 'response', 'ParticipantID': 'participant'}

Replacing NA with NaN in response column
Dropping 4 unrated items


- name: transdis
- no_of_prompts: 4
- no_of_participants: 350
- no_of_data_points: 8007
- prompts: ['床单', '筷子', '拖鞋', '牙刷']
- ICC2k: 0.67
- ICC2k_CI: 0.6-0.73
- ICC3k: 0.7
- rater_cols: ['Originality_Rater1', 'Originality_Rater2']
- no_of_raters: 2




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
4873,uses,transdis,什么是拖鞋的一个令人惊讶的用途？,拖鞋,变成飞船,transdis_拖鞋-84b552,3.5,transdis349,12,chi,2,0.707107
6043,uses,transdis,什么是牙刷的一个令人惊讶的用途？,牙刷,发圈,transdis_牙刷-0e3acd,3.0,transdis267,39,chi,2,0.0


## DiStefano, Beaty, Patterson, 2023 (Metaphors)

Based on the paper, semantic distance with BERT DSI correlated with human ratings  $r=.42$ on the held out dataset; GPT-2 at $r=.70$, and RoBERTa at $r=.72$.

In [197]:
desc = {
    "name": "dbc23",
    "test_type": "metaphors",
    "meta": {
        "inline": "DiStefano, Beaty, Patterson, 2023",
        "citation": "DiStefano, P. V., Patterson, J. D., & Beaty, R. (2023). Automatic Scoring of Metaphor Creativity with Large Language Models. https://doi.org/10.31234/osf.io/6jtxb",
        "url": "https://arxiv.org/abs/2306.14790",
        "download": [{
            "url": "https://osf.io/download/mr5a3", 
            "extension": "csv"
            }],
    },
    "column_mappings": {'item': 'prompt', 'Story': 'response',
                        'ID': 'participant'},
    # the released dataset is already merged among the raters
    "rater_cols": ['rating'],
    "language":"eng",
    "question_mappings": {
        'boring class': 'Think of the most boring high-school or college class you’ve ever had. What was it like to sit through?',
        'gross food': 'Think about the most disgusting thing you ever ate or drank. What was it like to eat or drink it?',
        'bad movie': 'Think about the worst movie or TV show you have ever seen. What was it like to watch it?',
        'messy room': 'Think of the messiest room that you’ve ever had to live in. What was it like to live there?'
        }
}

fname = download_from_description(desc, '../data/raw')[0]
df = pd.read_csv(fname)
cleaned = prep_general(df, **desc, save_dir='../data/datasets')
cleaned.sample(2)

### Loading *DiStefano, Beaty, Patterson, 2023*

DiStefano, P. V., Patterson, J. D., & Beaty, R. (2023). Automatic Scoring of Metaphor Creativity with Large Language Models. https://doi.org/10.31234/osf.io/6jtxb

- Renaming columns {'item': 'prompt', 'Story': 'response', 'ID': 'participant'}

- Inferring questions {'boring class': 'Think of the most boring high-school or college class you’ve ever had. What was it like to sit through?', 'gross food': 'Think about the most disgusting thing you ever ate or drank. What was it like to eat or drink it?', 'bad movie': 'Think about the worst movie or TV show you have ever seen. What was it like to watch it?', 'messy room': 'Think of the messiest room that you’ve ever had to live in. What was it like to live there?'}

- Inferred range of original data: (-1.675025493, 4.115607135)

- name: dbc23
- no_of_prompts: 4
- no_of_participants: 1546
- no_of_data_points: 4589
- prompts: ['boring class', 'gross food', 'bad movie', 'messy room']
- ICC2k: None
- ICC2k_CI: None
- ICC3k: None
- rater_cols: ['rating']
- no_of_raters: 1




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
1997,metaphors,dbc23,Think about the most disgusting thing you ever...,gross food,Eating that food was as bad as stepping on a Lego,dbc23_gross food-5dbd0c,2.105517,dbc237115,,eng,1,
4440,metaphors,dbc23,Think of the messiest room that you’ve ever ha...,messy room,That room was messier than a retail store on B...,dbc23_messy room-c3edfb,2.907474,dbc2394347,,eng,1,


## Haas 2018

This data uses online Mechanical Turk judges to re-judge Silvia et al. 2008. While the quality is likely to be lower, the position taken with Ocsai is that more raters is better, augmenting the btter trained raters.

In [198]:
desc = {
    "name": "h18",
    "meta": {
        "inline": "Hass, Rivera, Silvia 2018",
        "citation": "Hass, R. W., Rivera, M., & Silvia, P. J. (2018). On the Dependability and Feasibility of Layperson Ratings of Divergent Thinking. Frontiers in Psychology, 9. https://www.frontiersin.org/articles/10.3389/fpsyg.2018.01343",
        "url": "https://doi.org/10.3389/fpsyg.2018.01343",
        "download": [{
            "url": "https://osf.io/download/p2b9c", 
            "extension": "csv"
            }],
    },
    "null_marker": " ",
    "column_mappings": {'subject': 'participant', 'task':'prompt',
                        'order':'response_num'},
    "range": [1, 5],
    "rater_cols": ['rater1', 'rater2', 'rater3'],
    "replace_values": {
        "prompt": {
            'Brick': "brick",
            'Round Things': "round",
            'Sleep': "no sleep",
            'Knife': "knife",
            'Make Noises': "noise",
            '12 Inches': "shrank"
        },
    },
    "question_mappings": {
        "brick": "What is a surprising use for a BRICK?",
        "round": "What is a surprising thing that is ROUND?", 
        "no sleep": "What would be a surprising consequence if PEOPLE NEEDED NO SLEEP?", 
        "knife": "What is a surprising use for a KNIFE?",
        "noise": "What is a surprising thing that makes a NOISE?",
        "shrank": "What would be a surprising consequence if EVERYONE SHRANK TO 12 INCHES TALL?"
    },
    "type_mappings": {
        "brick": "uses",
        "round": "uses",
        "no sleep": "consequences",
        "knife": "uses",
        "noise": "instances",
        "shrank": "consequences"
    },
    "language":"eng"
}

fname = download_from_description(desc, '../data/raw')[0]
df = pd.read_csv(fname)
cleaned = prep_general(df, **desc, save_dir='../data/datasets')
cleaned.sample(2)

### Loading *Hass, Rivera, Silvia 2018*

Hass, R. W., Rivera, M., & Silvia, P. J. (2018). On the Dependability and Feasibility of Layperson Ratings of Divergent Thinking. Frontiers in Psychology, 9. https://www.frontiersin.org/articles/10.3389/fpsyg.2018.01343

- Renaming columns {'subject': 'participant', 'task': 'prompt', 'order': 'response_num'}

- Inferring questions {'brick': 'What is a surprising use for a BRICK?', 'round': 'What is a surprising thing that is ROUND?', 'no sleep': 'What would be a surprising consequence if PEOPLE NEEDED NO SLEEP?', 'knife': 'What is a surprising use for a KNIFE?', 'noise': 'What is a surprising thing that makes a NOISE?', 'shrank': 'What would be a surprising consequence if EVERYONE SHRANK TO 12 INCHES TALL?'}

- Inferring types {'brick': 'uses', 'round': 'uses', 'no sleep': 'consequences', 'knife': 'uses', 'noise': 'instances', 'shrank': 'consequences'}

Replacing   with NaN in response column
Dropping 37 unrated items


- name: h18
- no_of_prompts: 6
- no_of_participants: 242
- no_of_data_points: 11490
- prompts: ['shrank', 'brick', 'knife', 'noise', 'round', 'no sleep']
- ICC2k: 0.43
- ICC2k_CI: 0.22-0.57
- ICC3k: 0.54
- rater_cols: ['rater1', 'rater2', 'rater3']
- no_of_raters: 3




Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
2815,consequences,h18,What would be a surprising consequence if PEOP...,no sleep,economy would go down because of high demand,h18_no sleep-02c559,2.0,h1863,10,eng,3,1.0
10635,uses,h18,What is a surprising use for a BRICK?,brick,play football,h18_brick-4cc0c0,2.0,h18223,15,eng,3,1.0


# Summary of Stats

(Also check for redundancy)

In [199]:
import duckdb
conn = duckdb.connect("../data/datasets/stats_db.duckdb")
stats = pd.read_sql('select * from stats', conn)
# sort to check for redundancy
stats.sort_values('no_of_data_points')



Unnamed: 0,name,no_of_prompts,no_of_participants,no_of_data_points,prompts,ICC2k,ICC2k_CI,ICC3k,rater_cols,no_of_raters
21,multiaut_russian2,1,45,370,[картонная коробка],0.78,0.74-0.82,0.79,"[rater1, rater2, rater3]",3
19,multiaut_french2,1,82,449,[chapeau],0.52,0.25-0.68,0.64,"[rater1, rater2, rater3]",3
11,motesp,29,35,963,"[backpack, ball, bottle, hat, light bulb, penc...",0.73,0.66-0.78,0.75,"[D, K, T]",3
34,multiaut_dutch3,1,111,1004,[brick],0.86,0.79-0.89,0.87,"[rater1, rater2]",2
5,hass17,2,57,1093,"[bottle, brick]",0.79,0.75-0.82,0.8,"[r1, r2, r3]",3
22,multiaut_chinese2,2,217,1302,"[筷子, 易拉罐]",0.6,0.54-0.66,0.64,"[rater1, rater2, rater3, rater4]",4
13,multiaut_arabic1,1,160,1524,[Tin cans],,,,[rater1],1
31,multiaut_dutch2,2,111,1640,"[brick, paperclip]",0.94,0.93-0.95,0.94,"[rater1, rater2]",2
17,multiaut_russian1,2,111,1728,"[газета, деревянная линейка]",0.72,0.69-0.74,0.72,"[rater1, rater2, rater3]",3
10,bs12,1,133,1807,[brick],0.72,0.56-0.8,0.78,"[br_rater1, br_rater2, br_rater3]",3


In [200]:
pd.read_csv('../data/datasets/multiaut_polish2.csv').sort_values('target', ascending=False)

Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num,language,rater_count,rating_std
56,uses,multiaut_polish2,Jakie jest zaskakujące zastosowanie dla SZNUREK?,sznurek,Do stworzenia zegara poprzez wyliczenie czasu ...,multiaut_polish2_sznurek-6db2e6,4.666667,multiaut_polish230277,p1,pol,2,0.707107
57,uses,multiaut_polish2,Jakie jest zaskakujące zastosowanie dla SZNUREK?,sznurek,Sznurek moze byc takze forma rurki przesylajac...,multiaut_polish2_sznurek-1c48eb,4.666667,multiaut_polish230277,p2,pol,2,0.707107
802,uses,multiaut_polish2,Jakie jest zaskakujące zastosowanie dla SZNUREK?,sznurek,Do skonstruowania liny po ktorej mozna uciec z...,multiaut_polish2_sznurek-75e214,4.333333,multiaut_polish2442188,p1,pol,2,1.414214
804,uses,multiaut_polish2,Jakie jest zaskakujące zastosowanie dla SZNUREK?,sznurek,Ze sznurka można zrobić sztucznego węża i nast...,multiaut_polish2_sznurek-b26865,4.333333,multiaut_polish2444381,p1,pol,2,1.414214
414,uses,multiaut_polish2,Jakie jest zaskakujące zastosowanie dla SZNUREK?,sznurek,do tamowania krwotoku,multiaut_polish2_sznurek-678d06,4.333333,multiaut_polish2334939,p5,pol,2,1.414214
...,...,...,...,...,...,...,...,...,...,...,...,...
1809,uses,multiaut_polish2,Jakie jest zaskakujące zastosowanie dla PUSZKA?,puszka,zrobić wazon na kwiaty,multiaut_polish2_puszka-e453cb,1.000000,multiaut_polish2420809,p1,pol,2,0.000000
1811,uses,multiaut_polish2,Jakie jest zaskakujące zastosowanie dla PUSZKA?,puszka,Jako popielniczkę,multiaut_polish2_puszka-e50b47,1.000000,multiaut_polish2421847,p1,pol,2,0.000000
1812,uses,multiaut_polish2,Jakie jest zaskakujące zastosowanie dla PUSZKA?,puszka,jako doniczkę,multiaut_polish2_puszka-a8fda8,1.000000,multiaut_polish2421847,p2,pol,2,0.000000
1817,uses,multiaut_polish2,Jakie jest zaskakujące zastosowanie dla PUSZKA?,puszka,do zrobienia popielniczki,multiaut_polish2_puszka-feac8c,1.000000,multiaut_polish2423427,p3,pol,2,0.000000


In [201]:
from pathlib import Path
all = pd.concat([pd.read_csv(x) for x in Path('../data/datasets/').glob('*csv')])
print("Total number of rows", len(all))
display(all['type'].value_counts())

Total number of rows 162872


uses            140193
instances         8572
consequences      6561
metaphors         4589
completion        2957
Name: type, dtype: int64