This notebook normalizes human-judged ground truth from various originality and creativity scoring studies.

This is an assortment of studies, with different demographics, goals, and test setups. It is most appropriate for supervised learning of automated scoring, where we're not necessarily trying to learn about the participants but about the *human judges* - how they interpret the originality scoring task in general.
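
As a rough illustration of what "normalizing" could mean here, the sketch below averages the rater scores for one response and linearly rescales that mean from a study's own rating range onto a shared 1-5 scale. The helper name `normalize_target` and the linear rescaling are assumptions for illustration only; the real logic lives in `ocsai.data.prep_general` and may differ.

```python
from statistics import mean


def normalize_target(ratings, source_range, target_range=(1, 5)):
    """Average the available rater scores, then rescale linearly.

    Hypothetical helper for illustration; not part of ocsai.
    """
    scores = [r for r in ratings if r is not None]  # skip missing ratings
    if not scores:
        return None  # unrated item
    lo, hi = source_range
    new_lo, new_hi = target_range
    # Position of the mean within the source range, as a 0-1 fraction
    frac = (mean(scores) - lo) / (hi - lo)
    return new_lo + frac * (new_hi - new_lo)


# Four raters on a 1-5 scale: the scale is unchanged, so this is just the mean
print(normalize_target([2, 3, 2, 2], source_range=(1, 5)))
# Three raters on a 1-7 scale, mapped onto 1-5
print(normalize_target([3, 3, 3], source_range=(1, 7)))
```

Putting every study on one scale is what lets responses from differently designed studies be pooled for supervised learning.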

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
from ocsai.data import download_from_description, prep_general
import pandas as pd

## Dumas et al 2020

In [239]:
desc = {
    "name": "dod20",
    "test_type": "uses",
    "meta": {
        "inline": "Dumas et al 2020",
        "download": {"url": "https://osf.io/download/u3yv4/", "extension": "csv"}
    },
    "null_marker": "!!!",
    "column_mappings": {},
    "range": [1, 5],
    "language": "eng",
}

fname = download_from_description(desc, '../data/raw')
df = pd.read_csv(fname[0], index_col=0)
cleaned = prep_general(df, **desc, save_dir='../data/datasets')
cleaned.sample(2)

### Loading *Dumas et al 2020*

Replacing !!! with NaN in response column
Rater cols: ['rater1', 'rater2', 'rater3', 'rater4']
# of prompts 10
# of participants 92
# of data points 5490
Prompts ['book' 'bottle' 'brick' 'fork' 'pants' 'rope' 'shoe' 'shovel' 'table'
 'tire']
# of raters 4
Intraclass correlation coefficients (report ICC2k)
    Type              Description   ICC     F   df1    df2  pval         CI95%
0   ICC1   Single raters absolute  0.57  6.27  5376  16131   0.0  [0.56, 0.58]
1   ICC2     Single random raters  0.58  7.78  5376  16128   0.0  [0.49, 0.65]
2   ICC3      Single fixed raters  0.63  7.78  5376  16128   0.0  [0.62, 0.64]
3  ICC1k  Average raters absolute  0.84  6.27  5376  16131   0.0  [0.83, 0.85]
4  ICC2k    Average random raters  0.85  7.78  5376  16128   0.0  [0.79, 0.88]
5  ICC3k     Average fixed raters  0.87  7.78  5376  16128   0.0  [0.87, 0.88]


Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num
560,uses,dod20,What is a surprising use for a BRICK?,brick,doorstop,dod20_brick-97b3cd,2.24975,dod206,0
2617,uses,dod20,What is a surprising use for a BOOK?,book,escape,dod20_book-77df8f,2.334,dod2035,7


## Silvia et al 2009

In [240]:
desc = {
    "name": "snbmo09",
    "test_type": "uses",
    "meta": {
        "inline": "Silvia et al. 2009",
        "citation": "Silvia, P. J., Nusbaum, E. C., Berg, C., Martin, C., & O'Connor, A. (2009). Openness to experience, plasticity, and creativity: Exploring lower-order, high-order, and interactive effects. Journal of Research in Personality, 43(6), 1087–1090. https://doi.org/10.1016/j.jrp.2009.04.015",
        "download": {"url": "https://osf.io/download/qdrv8/", "ext": "csv"}
    },
    "column_mappings": {
        "subject":"participant",
        "response_order":"response_num"
        },
    "range": [1, 5],
    "language": "eng",
}
fname = download_from_description(desc, '../data/raw')
df = pd.read_csv(fname[0])
df['prompt'] = df.task.apply(lambda x: x.split('_')[-1])
cleaned = prep_general(df, **desc, save_dir='../data/datasets')
cleaned.sample(2)

### Loading *Silvia et al. 2009*

Silvia, P. J., Nusbaum, E. C., Berg, C., Martin, C., & O'Connor, A. (2009). Openness to experience, plasticity, and creativity: Exploring lower-order, high-order, and interactive effects. Journal of Research in Personality, 43(6), 1087–1090. https://doi.org/10.1016/j.jrp.2009.04.015

Renaming columns {'subject': 'participant', 'response_order': 'response_num'}
Rater cols: ['rater_1', 'rater_2', 'rater_3', 'rater_4']
Dropping 10 unrated items
# of prompts 3
# of participants 202
# of data points 4099
Prompts ['brick' 'knife' 'box']
# of raters 4
Intraclass correlation coefficients (report ICC2k)
    Type              Description   ICC     F   df1    df2  pval         CI95%
0   ICC1   Single raters absolute  0.33  2.97  4097  12294   0.0  [0.31, 0.35]
1   ICC2     Single random raters  0.36  4.10  4097  12291   0.0  [0.25, 0.45]
2   ICC3      Single fixed raters  0.44  4.10  4097  12291   0.0  [0.42, 0.45]
3  ICC1k  Average raters absolute  0.66  2.97  4097  12294   0.0  [0.65, 0.68]
4  ICC2k    Average random raters  0.69  4.10  4097  12291   0.0  [0.57, 0.77]
5  ICC3k     Average fixed raters  0.76  4.10  4097  12291   0.0  [0.74, 0.77]


Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num
1571,uses,snbmo09,What is a surprising use for a BOX?,box,a door mat,snbmo09_3_box-d08fee,1.75025,snbmo0975,6
2309,uses,snbmo09,What is a surprising use for a KNIFE?,knife,peeling,snbmo09_2_knife-0a52cf,1.0,snbmo09112,9


## Hass 2017

This study looked at uses for *bottle* and *brick*. There were 54 participants after data cleaning.

Rating was on a 5-point scale. For verification, their reported inter-rater reliability (ICC2k) was 0.80 for brick and 0.78 for bottle, which is about what we see below.
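
For readers unfamiliar with the statistic, here is a minimal pure-Python sketch of ICC2k (two-way random effects, average of k raters) computed from the usual ANOVA mean squares. It assumes a complete items-by-raters matrix with no missing ratings; the function name `icc2k` is made up for this example and is not the code that produced the tables in this notebook. The toy matrix is the six-item, four-rater worked example from Shrout and Fleiss (1979).

```python
def icc2k(ratings):
    """ICC(2,k) for a complete items-by-raters matrix (list of equal-length rows)."""
    n = len(ratings)      # number of rated items
    k = len(ratings[0])   # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: items (rows), raters (columns), residual
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)


example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc2k(example), 2))  # 0.62
```

To compare against per-prompt values like the ones reported in the paper, the same computation would be run separately on the rows for each prompt.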

The rating data was stoplisted, so I need to reconstruct the original responses here.
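
Reconstruction here means re-deriving the stoplisted form of each raw response and using it as a join key against the ratings. The sketch below shows that idea in plain Python with a toy stop list and made-up responses and scores (none of them come from the study). It also collects the responses that fail to match, which is a useful check when the original stop list is unknown.

```python
# Toy stop list for illustration; the list used in the study is not known
STOPS = {"a", "an", "the", "it", "as", "to", "for", "of"}


def stop_clean(text):
    """Lowercase, split on whitespace, and drop stop words."""
    words = text.lower().split()
    return " ".join(w for w in words if w not in STOPS)


# Ratings keyed by (prompt, stoplisted response); the values are invented
ratings = {
    ("brick", "use doorstop"): 1.5,
    ("bottle", "flower vase"): 2.0,
}

# Raw participant responses as (prompt, original text)
raw = [
    ("brick", "Use it as a doorstop"),
    ("bottle", "A flower vase"),
    ("bottle", "message to the future"),
]

matched, unmatched = [], []
for prompt, response in raw:
    key = (prompt, stop_clean(response))
    if key in ratings:
        matched.append((prompt, response, ratings[key]))
    else:
        unmatched.append((prompt, response))

print(f"{len(matched)} matched, {len(unmatched)} unmatched")
print(unmatched)
```

A high unmatched count would suggest that the cleaning function differs from the one used to produce the rated strings.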

In [241]:
desc = {
    "name": 'hass17',
    "test_type": "uses",
    "meta": {
        "inline": "Hass 2017",
        "citation": "Hass, R. W. (2017). Semantic search during divergent thinking. Cognition, 166, 344–357. https://doi.org/10.1016/j.cognition.2017.05.039",
        "url": "https://osf.io/ng598",
        "download": [
            {"url": 'https://osf.io/download/mcykr/', "ext": "xlsx"}, # rater scores
            {"url": 'https://osf.io/download/27bx8/', "ext": "xlsx"},  # responses 1
            {"url": 'https://osf.io/download/rzvyd/', "ext": "xlsx"}  # responses 2
        ],
    },
    "column_mappings": {
        "subject":"participant",
        "response_order":"response_num"
        },
    "range": [1, 5],
    "rater_cols": ['r1','r2','r3'],
    "language": "eng",
}

(ratings_fname, responses_fname, responses2_fname) = download_from_description(desc, '../data/raw')

# custom parsing specific to this dataset
all_ratings = []
for sheet, prompt in [('br_exp1', 'brick'),('br_exp2', 'brick'),('bot_exp1', 'bottle'),('bot_exp2', 'bottle')]:
    data = pd.read_excel(ratings_fname, sheet_name=sheet) #.rename(columns={'subject':'participant','response_order':'response_num'})
    data['prompt'] = prompt
    all_ratings.append(data)
hassratings = pd.concat(all_ratings).rename(columns={'response':'cleaned'})
participants = pd.concat([pd.read_excel(responses_fname), pd.read_excel(responses2_fname)])

# melt original responses to long, reconstruct the cleaned column, then join with ratings
long_part = participants.melt(id_vars='ID', value_name='response').rename(columns={'ID':'participant'})
long_part = long_part[long_part.variable.str.contains('resp') & ~long_part.variable.str.contains('time')].dropna()
long_part[['prompt', 'response_num']] = long_part.variable.str.split('_', expand=True)

long_part.loc[long_part.prompt.str.contains('resp1'), 'prompt'] = 'bottle'
long_part.loc[long_part.prompt.str.contains('resp2'), 'prompt'] = 'brick'
long_part.sample(10)

import nltk
nltk.download('punkt')
nltk.download('stopwords')

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
stops = stopwords.words('english')
# not sure which list the study used, so just adjust based on testing
stops += ['could']
stops = [w for w in stops if w not in ['can']]
stops = set(stops)

def stop_clean(x):
    x = x.lower()
    x = x.replace('i.e.', 'e') # quirk of the tokenization in original study
    for c in list("/\\'-()"):
        x = x.replace(c, '')
    words = [word for word in word_tokenize(x) if word not in stops]
    return " ".join(words)

long_part['cleaned'] = long_part.response.apply(stop_clean)
# left join keeps every original response, even if no stoplisted rating matched it
hass17 = long_part.merge(hassratings, how='left', on=['prompt', 'cleaned'])

cleaned = prep_general(hass17, **desc, save_dir='../data/datasets')
cleaned.sample(2)

[nltk_data] Downloading package punkt to
[nltk_data]     /Users/peter.organisciak/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/peter.organisciak/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


### Loading *Hass 2017*

Hass, R. W. (2017). Semantic search during divergent thinking. Cognition, 166, 344–357. https://doi.org/10.1016/j.cognition.2017.05.039

Renaming columns {'subject': 'participant', 'response_order': 'response_num'}
Rater cols: ['r1', 'r2', 'r3']
# of prompts 2
# of participants 57
# of data points 1093
Prompts ['bottle' 'brick']
# of raters 3
Intraclass correlation coefficients (report ICC2k)
    Type              Description   ICC     F  df1   df2  pval         CI95%
0   ICC1   Single raters absolute  0.55  4.68  794  1590   0.0  [0.51, 0.59]
1   ICC2     Single random raters  0.56  5.04  794  1588   0.0    [0.5, 0.6]
2   ICC3      Single fixed raters  0.57  5.04  794  1588   0.0  [0.54, 0.61]
3  ICC1k  Average raters absolute  0.79  4.68  794  1590   0.0  [0.76, 0.81]
4  ICC2k    Average random raters  0.79  5.04  794  1588   0.0  [0.75, 0.82]
5  ICC3k     Average fixed raters  0.80  5.04  794  1588   0.0  [0.78, 0.82]


Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num
382,uses,hass17,What is a surprising use for a BOTTLE?,bottle,recycle it,hass17_bottle-5cf2d2,1.999,hass1756,5
80,uses,hass17,What is a surprising use for a BOTTLE?,bottle,drink,hass17_bottle-943855,1.0,hass1752,1


## Silvia et al 2008

This was the order of creativity tasks:

1. Please list all of the creative, unusual uses for a brick that you can think of.
2. Please list all of the creative, unusual instances of things that are round that you can think of.
3. Imagine that people no longer needed to sleep. Please list creative, unusual consequences that would follow.
4. Please list all of the creative, unusual uses for a knife that you can think of.
5. Please list all of the creative, unusual instances of things that will make a noise that you can think of.
6. Imagine that everyone shrank to 12 inches tall. Please list creative, unusual consequences that would follow.

Numbers 1 and 4 are AUT.
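
The task order above can be written as a lookup from task number to task type and prompt label, which makes it easy to pick out the AUT ("uses") items. This is a small sketch: the labels mirror the ones used in the cell below, while the helper `aut_rows` and the sample rows are made up for illustration.

```python
# Task number -> (task type, short prompt label)
TASKS = {
    1: ("uses", "brick"),
    2: ("instances", "round"),
    3: ("consequences", "no sleep"),
    4: ("uses", "knife"),
    5: ("instances", "noise"),
    6: ("consequences", "shrank"),
}


def aut_rows(rows):
    """Keep only rows whose task is an Alternate Uses Task ("uses")."""
    kept = []
    for row in rows:
        # Task codes may arrive as floats (e.g. 1.0), so cast to int first
        task_type, prompt = TASKS[int(row["task"])]
        if task_type == "uses":
            kept.append({**row, "type": task_type, "prompt": prompt})
    return kept


# Invented rows, for illustration only
sample = [
    {"task": 1.0, "response": "doorstop"},
    {"task": 2.0, "response": "the moon"},
    {"task": 4.0, "response": "letter opener"},
]
for row in aut_rows(sample):
    print(row["prompt"], "-", row["response"])
```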



In [242]:
# Support .sav files
import pyreadstat
desc = {
    "name": "setal08",
    "meta": {
        "inline": "Silvia et al. 2008",
        "citation": "Silvia, P. J., Winterstein, B. P., Willse, J. T., Barona, C. M., Cram, J. T., Hess, K. I., Martinez, J. L., & Richard, C. A. (2008). Assessing creativity with divergent thinking tasks: Exploring the reliability and validity of new subjective scoring methods. Psychology of Aesthetics, Creativity, and the Arts, 2(2), 68–85. https://doi.org/10.1037/1931-3896.2.2.68",
        "url": "https://osf.io/dh7ey/",
        "download": {"url": "https://files.osf.io/v1/resources/4ketx/providers/osfstorage/5dd70d1f83135e000ec3c242/?zip=",
                    "extension": "zip",
                    "archive_files": ['DT_Responses_PACA_2008_Study_2.sav']
                    }
    },
    "column_mappings": {
        "subject":"participant",
        "order":"response_num"
        },
    "replace_values": {
        "prompt": {
            1: "brick",
            2: "round",
            3: "no sleep",
            4: "knife",
            5: "noise",
            6: "shrank"
        },
        "type": {
            1: "uses",
            2: "instances",
            3: "consequences",
            4: "uses",
            5: "instances",
            6: "consequences"
        },
        "question": {
            1:"What is a surprising use for a BRICK?",
            2: "What is a surprising thing that is ROUND?", 
            3: "What would be a surprising consequence if PEOPLE NEEDED NO SLEEP?", 
            4: "What is a surprising use for a KNIFE?",
            5: "What is a surprising thing that makes a NOISE?",
            6: "What would be a surprising consequence if EVERYONE SHRANK TO 12 INCHES TALL?"
        }
    },
    "range": [1, 5],
    "language": "eng",
}

# Download data
fnames = download_from_description(desc, '../data/raw', extension='zip')

# Some manual cleanup
df, meta = pyreadstat.read_sav(fnames[0])
# all three are mapped from task
for col in ['prompt', 'type', 'question']:
    df[col] = df['task'].astype(int)
df['subject'] = df['subject'].astype(int)
# doublecheck - burczak reported ICC2k as 0.48 for uses
cleaned = prep_general(df, **desc, save_dir='../data/datasets')
cleaned.sample(2)

### Loading *Silvia et al. 2008*

Silvia, P. J., Winterstein, B. P., Willse, J. T., Barona, C. M., Cram, J. T., Hess, K. I., Martinez, J. L., & Richard, C. A. (2008). Assessing creativity with divergent thinking tasks: Exploring the reliability and validity of new subjective scoring methods. Psychology of Aesthetics, Creativity, and the Arts, 2(2), 68–85. https://doi.org/10.1037/1931-3896.2.2.68

Renaming columns {'subject': 'participant', 'order': 'response_num'}
Rater cols: ['rater1', 'rater2', 'rater3']
Dropping 37 unrated items
# of prompts 6
# of participants 242
# of data points 11490
Prompts ['brick' 'round' 'no sleep' 'knife' 'noise' 'shrank']
# of raters 3
Intraclass correlation coefficients (report ICC2k)
    Type              Description   ICC     F    df1    df2  pval         CI95%
0   ICC1   Single raters absolute  0.12  1.40  11488  22978   0.0  [0.11, 0.13]
1   ICC2     Single random raters  0.20  2.16  11488  22976   0.0   [0.09, 0.3]
2   ICC3      Single fixed raters  0.28  2.16  11488  22976   0.0  [0.27, 0.29]
3  ICC1k  Average raters absolute  0.29  1.40  11488  22978   0.0  [0.26, 0.31]
4  ICC2k    Average random raters  0.43  2.16  11488  22976   0.0  [0.22, 0.57]
5  ICC3k     Average fixed raters  0.54  2.16  11488  22976   0.0  [0.52, 0.55]


Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num
3070,uses,setal08,What is a surprising use for a BRICK?,brick,make a snow man by decorating,setal08_1.0-0ec5b5,2.334,setal0869,2.0
7618,uses,setal08,What is a surprising use for a BRICK?,brick,arm exercises,setal08_1.0-35aa55,1.667,setal08159,5.0


## Hofelich Mohr, Sell, and Lindsay 2016

In [243]:
desc = {
    "name": "hmsl",
    "meta": {
        "inline": "Hofelich Mohr et al. 2016",
        "citation": "Hofelich Mohr, A., Sell, A., & Lindsay, T. (2016). Thinking Inside the Box: Visual Design of the Response Box Affects Creative Divergent Thinking in an Online Survey. Social Science Computer Review, 34(3), 347–359. https://doi.org/10.1177/0894439315588736",
        "url": "https://doi.org/10.1177/0894439315588736",
        "download": {
            "url": "https://conservancy.umn.edu/bitstream/handle/11299/172116/HMSL_CSV%20Data%20Files.zip?sequence=28&isAllowed=y",
            "extension": "zip",
            "archive_files": ['HMSL_Originality_scores_all.csv']   
        }
    },
    "null_marker": 11,
    "column_mappings": {'Item': 'prompt', 'QLogin_1':'participant'},
    "rater_cols": ['J1_Rating','J2_Rating','J3_Rating','J4_Rating'],
    "range": [1, 5],
    "language": "eng",
}

fname = download_from_description(desc, '../data/raw')[0]
df = pd.read_csv(fname)
# Doublecheck ICC2k - burczak paper had icc2k=0.67
cleaned = prep_general(df, **desc, save_dir='../data/datasets')
cleaned.sample(2)

### Loading *Hofelich Mohr et al. 2016*

Hofelich Mohr, A., Sell, A., & Lindsay, T. (2016). Thinking Inside the Box: Visual Design of the Response Box Affects Creative Divergent Thinking in an Online Survey. Social Science Computer Review, 34(3), 347–359. https://doi.org/10.1177/0894439315588736

Renaming columns {'Item': 'prompt', 'QLogin_1': 'participant'}
Replacing 11 with NaN in response column
Rater cols: ['J1_Rating', 'J2_Rating', 'J3_Rating', 'J4_Rating']
Dropping 23 unrated items
# of prompts 2
# of participants 638
# of data points 3843
Prompts ['paperclip' 'brick']
# of raters 4
Intraclass correlation coefficients (report ICC2k)
    Type              Description   ICC     F   df1    df2  pval         CI95%
0   ICC1   Single raters absolute  0.30  2.71  3713  11142   0.0  [0.28, 0.32]
1   ICC2     Single random raters  0.33  3.86  3713  11139   0.0  [0.22, 0.43]
2   ICC3      Single fixed raters  0.42  3.86  3713  11139   0.0   [0.4, 0.43]
3  ICC1k  Average raters absolute  0.63  2.71  3713  11142   0.0  [0.61, 0.65]
4  ICC2k    Average random raters  0.67  3.86  3713  11139   0.0  [0.53, 0.75]
5  ICC3k     Average fixed raters  0.74  3.86  3713  11139   0.0  [0.73, 0.75]


Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num
769,uses,hmsl,What is a surprising use for a PAPERCLIP?,paperclip,use to spread cells on plates in a microbiolog...,hmsl_paperclip-125aa5,4.75025,hmslW8K1dDJu,4.0
3214,uses,hmsl,What is a surprising use for a BRICK?,brick,house,hmsl_brick-103a35,1.24975,hmslEoD82940,1.0


## Datasets used by Beaty and Johnson 2021

From SemDis paper:

- Study 1 was a re-analysis of AUT responses from Beaty et al., 2018 to see if ensemble approaches work better. Two tests: `box` and `rope`.
   - According to their paper, additive composition yielded a slightly negative correlation, whereas for multiplicative composition 'results revealed a large correlation between latent semantic distance and human ratings:$r=.91$, p<.001'. This uses a model that weights the factors, but is (I think) tailored to the dataset without held-out data.

- Study 2 was a re-analysis of results from Silvia et al. 2017, also on `box` and `rope`.
- Study 3 was `brick` - yet again - via Beaty and Silvia 2012.
- Studies 4 and 5 - Heinen and Johnson (2018) - were noun matching, not relevant here.
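
To make the additive versus multiplicative distinction concrete, the sketch below composes toy word vectors into a single response vector in both ways and measures cosine distance to a prompt vector. The three-dimensional vectors are invented for illustration. SemDis itself works with real embedding models, and nothing here reproduces its pipeline or its results.

```python
import math


def cosine_distance(u, v):
    """1 minus cosine similarity; 0 means the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1 - dot / norm


def compose(vectors, how):
    """Combine word vectors element-wise into one response vector."""
    dims = range(len(vectors[0]))
    if how == "additive":
        return [sum(v[i] for v in vectors) for i in dims]
    if how == "multiplicative":
        return [math.prod(v[i] for v in vectors) for i in dims]
    raise ValueError(f"unknown composition: {how}")


# Made-up vectors standing in for a prompt word and two response words
prompt_vec = [0.9, 0.1, 0.3]
response_vecs = [[0.2, 0.8, 0.5], [0.4, 0.6, 0.1]]

for how in ("additive", "multiplicative"):
    distance = cosine_distance(prompt_vec, compose(response_vecs, how))
    print(how, round(distance, 3))
```

Multiplication emphasizes dimensions where all words in the response are strong, so the two compositions can rank the same responses differently.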

In [244]:
desc = {
    "name": "bj21",
    "meta": {
        "inline": "Beaty and Johnson 2021",
        "citation": "Beaty, R. E., & Johnson, D. R. (2021). Automating creativity assessment with SemDis: An open platform for computing semantic distance. Behavior Research Methods, 53(2), 757–780. https://doi.org/10.3758/s13428-020-01453-w",
        "url": "https://doi.org/10.3758/s13428-020-01453-w",
        "download": {
            "url": "https://files.osf.io/v1/resources/gz4fc/providers/osfstorage/5e45b6c73e86a800be6e662e/?zip=",
            "extension": "zip",
            "archive_files": ['Study 1/s1_data_long.xlsx',
                              'Study 2/s2_data_long.xlsx',
                              'Study 3/s3_data_long.xlsx']   
        }
    },
    "column_mappings": {'id':'participant', 'item':'prompt'},
    "range": [1, 5],
    "language": "eng",
}

substudies = [
    {
        "name": "betal18",
        "meta": {
            "inline": "Beaty et al., 2018",
            "citation": "Beaty, R. E., Kenett, Y. N., Christensen, A. P., Rosenberg, M. D., Benedek, M., Chen, Q., Fink, A., Qiu, J., Kwapil, T. R., Kane, M. J., & Silvia, P. J. (2018). Robust prediction of individual creative ability from brain functional connectivity. Proceedings of the National Academy of Sciences, 115(5), 1087–1092. https://doi.org/10.1073/pnas.1713532115"
        }
    },
    {
        "name": "snb17",
        "meta": {
            "inline": "Silvia et al., 2017",
            "citation": "Silvia, P. J., Nusbaum, E. C., & Beaty, R. E. (2017). Old or New? Evaluating the Old/New Scoring Method for Divergent Thinking Tasks. The Journal of Creative Behavior, 51(3), 216–224. https://doi.org/10.1002/jocb.101"
        }
    },
    {
        "name": "bs12",
        "meta": {
            "inline": "Beaty & Silvia, 2012",
            "citation": "Beaty, R. E., & Silvia, P. J. (2012). Why do ideas get more creative across time? An executive interpretation of the serial order effect in divergent thinking tasks. Psychology of Aesthetics, Creativity, and the Arts, 6(4), 309–319. https://doi.org/10.1037/a0029171"
        }
    },

]

fnames = download_from_description(desc, '../data/raw')
# the data comes from past studies, so we'll rename the files
# individually to their original studies
for fname,substudy in zip(fnames, substudies):
    new_desc = desc.copy()
    new_desc.update(substudy)
    df = pd.read_excel(fname)
    cleaned = prep_general(df, **new_desc, save_dir='../data/datasets')
    display(cleaned.sample(2))


### Loading *Beaty et al., 2018*

Beaty, R. E., Kenett, Y. N., Christensen, A. P., Rosenberg, M. D., Benedek, M., Chen, Q., Fink, A., Qiu, J., Kwapil, T. R., Kane, M. J., & Silvia, P. J. (2018). Robust prediction of individual creative ability from brain functional connectivity. Proceedings of the National Academy of Sciences, 115(5), 1087–1092. https://doi.org/10.1073/pnas.1713532115

Renaming columns {'id': 'participant', 'item': 'prompt'}
Rater cols: ['rater1', 'rater2', 'rater3', 'rater4']
# of prompts 2
# of participants 171
# of data points 2918
Prompts ['box' 'rope']
# of raters 4
Intraclass correlation coefficients (report ICC2k)
    Type              Description   ICC     F   df1   df2  pval         CI95%
0   ICC1   Single raters absolute  0.52  5.26  2894  8685   0.0   [0.5, 0.53]
1   ICC2     Single random raters  0.52  6.18  2894  8682   0.0  [0.46, 0.58]
2   ICC3      Single fixed raters  0.56  6.18  2894  8682   0.0  [0.55, 0.58]
3  ICC1k  Average raters absolute  0.81  5.26  2894  8685   0.0   [0.8, 0.82]
4  ICC2k    Average random raters  0.82  6.18  2894  8682   0.0  [0.77, 0.85]
5  ICC3k     Average fixed raters  0.84  6.18  2894  8682   0.0  [0.83, 0.85]


Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num
2090,uses,betal18,What is a surprising use for a ROPE?,rope,a carpet beater,betal18_rope-26ef9e,2.5,betal182063,
2056,uses,betal18,What is a surprising use for a ROPE?,rope,trail marker,betal18_rope-6ed4eb,1.75025,betal182059,


### Loading *Silvia et al., 2017*

Silvia, P. J., Nusbaum, E. C., & Beaty, R. E. (2017). Old or New? Evaluating the Old/New Scoring Method for Divergent Thinking Tasks. The Journal of Creative Behavior, 51(3), 216–224. https://doi.org/10.1002/jocb.101

Renaming columns {'id': 'participant', 'item': 'prompt'}
Rater cols: ['rater1', 'rater2', 'rater3']
# of prompts 2
# of participants 142
# of data points 2372
Prompts ['box' 'rope']
# of raters 3
Intraclass correlation coefficients (report ICC2k)
    Type              Description   ICC     F   df1   df2  pval         CI95%
0   ICC1   Single raters absolute  0.38  2.82  2352  4706   0.0   [0.35, 0.4]
1   ICC2     Single random raters  0.40  3.57  2352  4704   0.0   [0.29, 0.5]
2   ICC3      Single fixed raters  0.46  3.57  2352  4704   0.0  [0.44, 0.49]
3  ICC1k  Average raters absolute  0.65  2.82  2352  4706   0.0  [0.62, 0.67]
4  ICC2k    Average random raters  0.67  3.57  2352  4704   0.0  [0.55, 0.75]
5  ICC3k     Average fixed raters  0.72  3.57  2352  4704   0.0   [0.7, 0.74]


Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num
2363,uses,snb17,What is a surprising use for a ROPE?,rope,to use as a necklace,snb17_rope-aec441,1.667,snb17155,
1661,uses,snb17,What is a surprising use for a ROPE?,rope,playing red rover,snb17_rope-1de706,1.666,snb1757,


### Loading *Beaty & Silvia, 2012*

Beaty, R. E., & Silvia, P. J. (2012). Why do ideas get more creative across time? An executive interpretation of the serial order effect in divergent thinking tasks. Psychology of Aesthetics, Creativity, and the Arts, 6(4), 309–319. https://doi.org/10.1037/a0029171

Renaming columns {'id': 'participant', 'item': 'prompt'}
Rater cols: ['br_rater1', 'br_rater2', 'br_rater3']
# of prompts 1
# of participants 133
# of data points 1807
Prompts ['brick']
# of raters 3
Intraclass correlation coefficients (report ICC2k)
    Type              Description   ICC     F   df1   df2  pval         CI95%
0   ICC1   Single raters absolute  0.43  3.24  1800  3602   0.0   [0.4, 0.46]
1   ICC2     Single random raters  0.46  4.46  1800  3600   0.0   [0.3, 0.57]
2   ICC3      Single fixed raters  0.54  4.46  1800  3600   0.0  [0.51, 0.56]
3  ICC1k  Average raters absolute  0.69  3.24  1800  3602   0.0  [0.67, 0.72]
4  ICC2k    Average random raters  0.72  4.46  1800  3600   0.0   [0.56, 0.8]
5  ICC3k     Average fixed raters  0.78  4.46  1800  3600   0.0  [0.76, 0.79]


Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num
1503,uses,bs12,What is a surprising use for a BRICK?,brick,play frisbee,bs12_brick-9bd94e,1.667,bs12109,
1497,uses,bs12,What is a surprising use for a BRICK?,brick,put on a loose roof during a hurricane or stor...,bs12_brick-cf2dad,1.333,bs12108,


## MOTES Pilot

MOTES is related to the "Measuring Original Thinking in Elementary Students: A Text-Mining Approach" project (IES #R305A200519). This data is related to a high-stakes test and is limited to research access. If you're a creativity researcher, please reach out to request it from <peter.organisciak@du.edu> and/or <selcuk.acar@unt.edu>.

In [245]:
desc = {
    "name": "motesp",
    "meta": {
        "inline": "Acar et al., 2023",
        "citation": "Acar, S., Dumas, D., Organisciak, P., Berthiaume, K. (2023). Measuring original thinking in elementary school: Development and validation of a computational psychometric approach. Journal of Educational Psychology. http://dx.doi.org/10.13140/RG.2.2.19804.56968",
        "url": "http://dx.doi.org/10.13140/RG.2.2.19804.56968",
        "download": {}
    },
    "rater_cols": ['D', 'K', 'T'],
    "range": [1, 7],
    "language": "eng",
}
df = pd.read_csv('../data/raw/motesp_0.csv')
cleaned = prep_general(df, **desc, save_dir='../data/datasets')
cleaned.sample(2)

### Loading *Acar et al., 2023*

Acar, S., Dumas, D., Organisciak, P., Berthiaume, K. (2023). Measuring original thinking in elementary school: Development and validation of a computational psychometric approach. Journal of Educational Psychology. http://dx.doi.org/10.13140/RG.2.2.19804.56968

Rater cols: ['D', 'K', 'T']
# of prompts 29
# of participants 35
# of data points 963
Prompts ['backpack' 'ball' 'bottle' 'hat' 'lightbulb' 'pencil' 'shoe' 'sock'
 'spoon' 'toothbrush' 'big' 'cold' 'fun' 'red' 'smelly' 'soft' 'tasty'
 'wet' 'aliens landed' 'kid president' 'rain soda' 'teacher read minds'
 'time travel' 'friend phone' 'library' 'playground' 'school bus'
 'sleepover' 'teacher talking']
# of raters 3
Intraclass correlation coefficients (report ICC2k)
    Type              Description   ICC     F  df1   df2  pval         CI95%
0   ICC1   Single raters absolute  0.46  3.52  962  1926   0.0  [0.42, 0.49]
1   ICC2     Single random raters  0.47  4.02  962  1924   0.0  [0.39, 0.53]
2   ICC3      Single fixed raters  0.50  4.02  962  1924   0.0  [0.46, 0.54]
3  ICC1k  Average raters absolute  0.72  3.52  962  1926   0.0  [0.68, 0.75]
4  ICC2k    Average random raters  0.73  4.02  962  1924   0.0  [0.66, 0.78]
5  ICC3k     Average fixed raters  0.75  4.02  962  1924   0.0  [0.72

Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num
139,uses,motesp,What is a surprising use for a LIGHTBULB?,lightbulb,Use it for a big thing like using 100 light bu...,motesp_g1_lightbulb-f6f31c,1.888667,motesp14ML,
498,instances,motesp,What is a surprising example of something SMELLY?,smelly,An example of something smelly is trash! Trash...,motesp_g2_smelly-f64aae,2.333333,motesp4SG,


In [None]:
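# NOTE: this cell has no execution count; the output below it comes from an earlier run.
# `base_dir` and `datasets` are not defined in the cells shown here and must be set first.
import os

src = 'motesp'
motes_pilot = pd.read_csv(os.path.join(base_dir, 'motes_pilot_gt_scores.csv')).rename(columns=dict(D='rater1', K='rater2', T='rater3', ID='participant'))
motes_pilot['type'] = motes_pilot.task.apply(lambda x: x.split('_')[0])
# NOTE: `df` is the frame loaded in the previous cell, not `motes_pilot`
datasets[desc['name']] = prep_general(df, **desc)
datasets[desc['name']].sample(2)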

Rater cols: ['rater1', 'rater2', 'rater3']
# of prompts 10
# of participants 35
# of data points 339
Prompts ['backpack' 'ball' 'bottle' 'hat' 'lightbulb' 'pencil' 'shoe' 'sock'
 'spoon' 'toothbrush']
# of raters 3
Intraclass correlation coefficients (report ICC2k)


Unnamed: 0,Type,Description,ICC,F,df1,df2,pval,CI95%
0,ICC1,Single raters absolute,0.58,5.15,338,678,0.0,"[0.52, 0.63]"
1,ICC2,Single random raters,0.58,5.28,338,676,0.0,"[0.52, 0.64]"
2,ICC3,Single fixed raters,0.59,5.28,338,676,0.0,"[0.53, 0.64]"
3,ICC1k,Average raters absolute,0.81,5.15,338,678,0.0,"[0.77, 0.84]"
4,ICC2k,Average random raters,0.81,5.28,338,676,0.0,"[0.77, 0.84]"
5,ICC3k,Average fixed raters,0.81,5.28,338,676,0.0,"[0.77, 0.84]"


## MOTES

This is the post-pilot data. As with the pilot data, this dataset is available on request. Please reach out!

In [246]:
desc = {
    "name": "motesf",
    "meta": {
        "inline": "Acar et al., 2023",
        "citation": "Acar, S., Dumas, D., Organisciak, P., Berthiaume, K. (2023). Measuring original thinking in elementary school: Development and validation of a computational psychometric approach. Journal of Educational Psychology. http://dx.doi.org/10.13140/RG.2.2.19804.56968",
        "url": "http://dx.doi.org/10.13140/RG.2.2.19804.56968",
        "download": {}
    },
    "column_mappings": {'ID':'participant'},
    "null_marker": -999,
    "rater_cols": ["Kscore", "Hscore", "Cscore", "Tscore", "Mscore"],
    "range": [1, 5], # note different scale than motesp pilot
    "language": "eng",
}
# data was already reshaped to long format for previous study
df = pd.read_csv('../data/raw/motesf_0.csv')
cleaned = prep_general(df, **desc, save_dir='../data/datasets')
cleaned.sample(2)

### Loading *Acar et al., 2023*

Acar, S., Dumas, D., Organisciak, P., Berthiaume, K. (2023). Measuring original thinking in elementary school: Development and validation of a computational psychometric approach. Journal of Educational Psychology. http://dx.doi.org/10.13140/RG.2.2.19804.56968

Renaming columns {'ID': 'participant'}
Replacing -999 with NaN in response column
Rater cols: ['Kscore', 'Hscore', 'Cscore', 'Tscore', 'Mscore']
# of prompts 24
# of participants 386
# of data points 8563
Prompts ['ball' 'sock' 'pencil' 'spoon' 'lightbulb' 'hat' 'bottle' 'toothbrush'
 'smelly' 'soft' 'red' 'frozen' 'wet' 'huge' 'fun' 'tasty' 'school bus'
 'games' 'library' 'lecture' 'phone' 'rain' 'closet' 'lunchroom']
# of raters 5
Intraclass correlation coefficients (report ICC2k)
    Type              Description   ICC     F   df1    df2  pval         CI95%
0   ICC1   Single raters absolute  0.40  4.34  8559  34240   0.0  [0.39, 0.41]
1   ICC2     Single random raters  0.42  6.41  8559  34236   0.0   [0.3, 0.52]
2   ICC3      Single fixed raters  0.52  6.41  8559  34236   0.0  [0.51, 0.53]
3  ICC1k  Average raters absolute  0.77  4.34  8559  34240   0.0  [0.76, 0.78]
4  ICC2k    Average random raters  0.79  6.41  8559  34236   0.0  [0.69, 0.85]
5  ICC3k     Average fixed raters  0.8

Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num
5500,instances,motesf,What is a surprising example of something TASTY?,tasty,The cafeteria's corn-bread,motesf_tasty-1b4507,2.3996,motesfc9f0f8,6
6807,completion,motesf,"Complete this sentence in a surprising way: ""W...",library,there was people screaming,motesf_library-d0f74f,2.3996,motesfd39577,7
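
The `range` entry in the description above records the raters' scale, and the inline comment notes that it differs from the pilot. One plausible use of such a field is a linear rescale onto a shared scale. Whether `prep_general` does exactly this is an assumption, not something shown here. The helper name `rescale`, its default output range, and the 1-3 input scale in the usage lines are all hypothetical.

```python
def rescale(score, in_range, out_range=(1.0, 5.0)):
    """Linearly map `score` from `in_range` onto `out_range`.

    Assumption of this sketch: scores are combined across studies by a
    simple min-max style mapping of each study's rating scale.
    """
    lo, hi = in_range
    new_lo, new_hi = out_range
    if hi == lo:
        raise ValueError("in_range must span more than one value")
    return new_lo + (score - lo) * (new_hi - new_lo) / (hi - lo)


# A hypothetical 1-3 scale stretched onto 1-5
print(rescale(2, in_range=(1, 3)))
# A 1-5 scale mapped onto 1-5 is left unchanged
print(rescale(4, in_range=(1, 5)))
```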


In [None]:
src = 'motesf'

corrected = True # use spelling corrected columns
items = [col.replace('_prompt', '') for col in df.columns if col.startswith('G') and col.endswith('_prompt')]

collector = []
# reshape from wide (one column group per item) to long (one row per response)
for item in items:
    subset = df[['participant'] + [col for col in df.columns if col.startswith(item)]].copy()
    subset.columns = [col.split('_')[-1] for col in subset.columns]
    subset['game'] = item.split('_')[0]
    subset['prompt_code'] = item
    collector.append(subset)
reshaped = pd.concat(collector)
# remove non-responses
reshaped = reshaped[~reshaped.raw.isna()]
# restore original wording in the test
reshaped.prompt = (
    reshaped.prompt
    .str.replace('light bulbs', 'lightbulb')
    .str.replace('hat cap', 'hat')
    .str.replace('soccer ball', 'ball')
    .str.replace('lead pencil', 'pencil')
    .str.replace('spoons', 'spoon')
)

# add display order
displayorder = df[['participant'] + [col for col in df.columns if 'DO' in col]]
displayorder = displayorder.melt(id_vars='participant', value_name='prompt_code')
displayorder['response_num'] = displayorder.variable.apply(lambda x:x[-1])
reshaped = reshaped.merge(displayorder[['participant', 'prompt_code', 'response_num']])
# use spelling corrected response, unless set otherwise
reshaped = reshaped.rename(columns={('corrected' if corrected else 'raw'):'response'})
reshaped['type'] = reshaped['game'].replace({'G1':'uses', 'G2': 'instances', 'G3':'completion'})

completion_ref = {
    "playground": "When the friends met on the playground...",
    "school bus": "When I got on the school bus...",
    "games": "At a sleepover we...",
    'library': "When the kids were in the library...",
    'lecture': "When the teacher was talking...",
    'phone': "My friend called me on the phone to tell me...",
    'rain': "It started raining and...",
    'closet': "When I opened my closet...",
    'lunchroom': "When I was at lunch..."
}
reshaped.loc[reshaped.type == 'uses', 'question'] = reshaped.loc[reshaped.type == 'uses', 'prompt'].apply(lambda x: f"What is a surprising use for a {x.upper()}?")
reshaped.loc[reshaped.type == 'instances','question'] = reshaped.loc[reshaped.type == 'instances', 'prompt'].apply(lambda x: f"What is a surprising example of something {x.upper()}?")
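# anchor the pattern so it matches once; a bare "(.*)" can also match the empty string at the end and repeat the template
reshaped.loc[reshaped.type == 'completion', 'question'] = reshaped.loc[reshaped.type == 'completion', 'prompt'].replace(completion_ref).str.replace("^(.*)$", 'Complete this sentence in a surprising way: "\\1"...', regex=True)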

datasets[src] = prep_general(reshaped, src,
                             rater_cols=[col for col in reshaped if 'score' in col.lower()],
                             include_rater_std=include_rater_std, inputrange=(1,5))

datasets[src].sample()

Unnamed: 0,Type,Description,ICC,F,df1,df2,pval,CI95%
0,ICC1,Single raters absolute,0.4,4.34,8535,34144,0.0,"[0.39, 0.41]"
1,ICC2,Single random raters,0.42,6.4,8535,34140,0.0,"[0.3, 0.52]"
2,ICC3,Single fixed raters,0.52,6.4,8535,34140,0.0,"[0.51, 0.53]"
3,ICC1k,Average raters absolute,0.77,4.34,8535,34144,0.0,"[0.76, 0.78]"
4,ICC2k,Average random raters,0.79,6.4,8535,34140,0.0,"[0.69, 0.85]"
5,ICC3k,Average fixed raters,0.84,6.4,8535,34140,0.0,"[0.84, 0.85]"


Unnamed: 0,type,src,question,prompt,response,id,target,participant,response_num
5093,instances,motesf,What is a surprising example of something HUGE?,huge,my dad is bigger than me.,motesf_huge-6cae,2.6,motesf138bb0,8
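
The question text in the cell above is assembled from a per-type template plus the prompt, with sentence-completion prompts first expanded to their full stem. The standalone sketch below reproduces that wording with the standard library only, so the templates can be checked without the dataset. The helper name `build_question` is hypothetical, and the stem lookup is an abbreviated copy of `completion_ref`.

```python
import re

# Abbreviated copy of the completion stems used in the notebook
completion_ref = {
    "library": "When the kids were in the library...",
    "rain": "It started raining and...",
}

COMPLETION_TEMPLATE = 'Complete this sentence in a surprising way: "\\1"...'


def build_question(task_type, prompt):
    """Return the question shown for a given task type and prompt."""
    if task_type == "uses":
        return f"What is a surprising use for a {prompt.upper()}?"
    if task_type == "instances":
        return f"What is a surprising example of something {prompt.upper()}?"
    if task_type == "completion":
        stem = completion_ref.get(prompt, prompt)
        # ^...$ keeps the pattern to a single match of the whole stem
        return re.sub(r"^(.*)$", COMPLETION_TEMPLATE, stem)
    raise ValueError(f"unknown task type: {task_type}")


print(build_question("uses", "lightbulb"))
print(build_question("instances", "huge"))
print(build_question("completion", "rain"))
```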
