# Cleaning and pre-processing

The data/raw folder has files from four branches of the computational_helping repository. These branches, which collected data that was used for the analysis in the 2024 Cogsci paper, are v102b, v103, v104, and v105. While there were slight variations in these experiment versions, we believe they were not scientifically meaningful enough to warrant considering them as different experiences. The differences between these versions is described below:

- v102b was the first experiment version that collected data i included in my final analysis. it fixed some bugs from the pilot that occurred when people refreshed their browser and also added some instructions to ask people not to refresh their browser.
- v103 decreased the max number of instruction quiz attempts to 5 from 6. it added the following sentence to the instructions: "The round ends when all the veggies are in the farm box."
- v104 hand-assigned conditions (rather than random assignment) since i needed to fill in some conditions that were getting fewer participants.
- v105 returned to random condition assignment, and clarified instructions for participants who didn't complete the full task so they would click through properly to get partial compensation (previously they would just return the task and then email me about it)


In [1]:
# imports
import numpy as np
import pandas as pd
import os

import json
import math
import datetime

import gzip
import warnings

warnings.filterwarnings("ignore")

# display all columns of dataframes
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", 60)

In [2]:
# for each branch (version) of the experiment we have a separate data file. these are the included versions for the dataset.
fname = [
    "real-all-v105-data.json.gz",
    "real-all-v104-data.json.gz",
    "real-all-v103-data.json.gz",
    "real-all-v102b-data.json.gz",
]
bigdf = pd.DataFrame()

dataframes = []
for file in fname:
    with gzip.open("../data/raw/" + file, "rb") as f:
        data = json.load(f)
        dataframes.append(pd.DataFrame(data))

bigdf = pd.concat(dataframes, ignore_index=True)

# at this point, bigdf is two columns: id and data. the data column is nested dictionaries with all of that participant's data
# across all stages of the experiment (quiz, captcha, task, demographics, etc)
# unpack the data dictionary
# bigdf = pd.concat(
#     [bigdf.drop(["data"], axis=1), bigdf["data"].apply(pd.Series)], axis=1
# )

# now, bigdf has the following columns:
    # ['id', 'status', 'started_game', 'trial_num', 'bonus_points',
    #    'quiz_form', 'counterbalance', 'game_setup', 'done', 'user_data',
    #    'quiz_attempts', 'withdraw_data', 'bonus_amount', 'datetime',
    #    'conditions', 'captcha_data', 'mode', 'game_data', 'finalsurvey_data',
    #    'demographic_form', 'smileconfig', 'issues', 'recruitment_service',
    #    'time_data', 'recruitment_info', 'consented', 'browser_data',
    #    'withdraw']

print(bigdf.shape)
display(bigdf.columns)

(900, 26)


Index(['id', 'status', 'started_game', 'trial_num', 'bonus_points',
       'quiz_form', 'counterbalance', 'game_setup', 'done', 'user_data',
       'quiz_attempts', 'withdraw_data', 'bonus_amount', 'datetime',
       'conditions', 'captcha_data', 'mode', 'game_data', 'finalsurvey_data',
       'demographic_form', 'issues', 'recruitment_service', 'time_data',
       'consented', 'browser_data', 'withdraw'],
      dtype='object')

Now, "bigdf" has the following columns:
- 'id': document ID set by firebase/firestore addDoc method
- 'status': not used in all experiment versions, but sets participant status during expt: "instructions", "quiz", "endFailedQuiz", "waiting" (waiting to be paired with partner), "play" (started playing the game), "endEarly" (couldn't find partner to pair), "endFinished" (completed all 12 games), "endDisconnect" (partner disconnected)
- 'started_game': false until subj is initialized as a user in the helping game (to be paired with another player)
- 'trial_num': not used
- 'bonus_points': bonus points earned (updated at end of each game)
- 'quiz_form': array of quiz attempts
- 'counterbalance': one of 16 block orders that determined order of the 12 game contexts
- 'game_setup': not used
- 'done': false until helping game emits "done" event
- 'user_data': data about player game session set when they are partnered, start single player game, or timeout to be paired and end task
- 'quiz_attempts': number of attempts at instruction quiz
- 'withdraw_data': if subj withdraws, data from form described in molecules/WithdrawFormModal.vue
- 'bonus_amount': calculated dollar amount based on bonus points
- 'datetime': datetime when smilestore is created
- 'conditions': resource, cost, and visibility condition settings
- 'captcha_data': data from the captcha game (moving apple and banana sprites to baskets)
- 'mode': appconfig.mode
- 'game_data': list of game data for the helping game! :-) 
- 'finalsurvey_data': not used (no post-game survey)
- 'demographic_form': data from demographics form at beginning of exp (DOB etc)
- 'issues': not used
- 'recruitment_service': default "web" but setRecruitmentService will change it (e.g., to prolific)
- 'time_data': not used
- 'consented': false until participant consents
- 'browser_data': fills with data on participants switching windows etc.
- 'withdraw': false unless participant withdraws

REMOVED FROM THIS DATA FOR SHARING:
- 'smileconfig': smile experiment config (sensitive lab data)
- 'recruitment_info': for prolific, prolific ID, study ID, and session ID 

In [3]:
bigdf.head()

Unnamed: 0,id,status,started_game,trial_num,bonus_points,quiz_form,counterbalance,game_setup,done,user_data,quiz_attempts,withdraw_data,bonus_amount,datetime,conditions,captcha_data,mode,game_data,finalsurvey_data,demographic_form,issues,recruitment_service,time_data,consented,browser_data,withdraw
0,1CJbe7WKBAIaC5ncSe6K,endDisconnect,True,0,410,"[{'pass': 'Press the space bar', 'goal': 'When...",3,{},True,"{'partnerName': 'C7D', 'playerName': 'Jbe', 'j...",2,{},0.41,2023-04-04T15:29:36.823Z,"{'resourceCond': 'uneven', 'costCond': 'high',...","[{'timestamp': 1680622559836, 'dragY': 8, 'dra...",production,"[{'redPoints': None, 'redFinished': False, 'fa...",{},"{'fluent_english': 'Yes', 'dob': '1975-06-24',...",[],prolific,[],True,"[{'event_type': 'blur', 'timestamp': {'seconds...",False
1,1SzT5XLyZefoThroWYjy,endFinished,True,0,759,"[{'ngames': '12', 'goal': 'When all the vegeta...",0,{},True,"{'playerName': 'zT5', 'partnerName': 'Yvn', 's...",1,{},0.76,2023-04-04T18:47:03.287Z,"{'costCond': 'high', 'resourceCond': 'even', '...","[{'dragY': 0, 'timestamp': 1680634169061, 'eve...",production,"[{'purpleYloc': 16, 'counterbalance': 0, 'turn...",{},"{'country': 'United States', 'gender': 'Male',...",[],prolific,[],True,"[{'event_type': 'blur', 'timestamp': {'seconds...",False
2,1WAhASeGQ2FIRcljqvjG,play,True,0,266,"[{'attempt': 1, 'pass': 'Click on the pillow b...",13,{},False,"{'joinAt': 1679951114590, 'status': 'play', 'p...",1,{},0.27,2023-03-27T21:03:51.380Z,"{'resourceCond': 'uneven', 'costCond': 'low', ...","[{'timestamp': 1679951050209, 'event': 'dragen...",production,"[{'condition': '{""costCond"":""low"",""resourceCon...",{},"{'household_income': '', 'color_blind': '', 'c...",[],web,[],True,"[{'event_type': 'resize', 'timestamp': {'secon...",False
3,1Xmby2bWFQ8qq4F44DS9,endFinished,True,0,1295,"[{'attempt': 1, 'bonus': 'Number of red veggie...",1,{},True,"{'session': 'GaoxGQc1Xmby2b', 'playerName': 'm...",1,{},1.3,2023-04-04T18:46:44.851Z,"{'visibilityCond': 'full', 'costCond': 'high',...","[{'event': 'dragend', 'timestamp': 16806340878...",production,"[{'gameNum': 0, 'redFirst': True, 'purpleBackp...",{},"{'gender': 'Female', 'dob': '1991-07-22', 'his...",[],prolific,[],True,"[{'event_type': 'resize', 'timestamp': {'secon...",False
4,1sWuRopCJyArujSEkN7e,endFinished,True,0,269,"[{'pass': 'Click on the pillow by your name', ...",4,{},True,"{'status': 'play', 'playerId': '1sWuRopCJyAruj...",2,{},0.27,2023-03-28T17:10:04.714Z,"{'costCond': 'high', 'resourceCond': 'uneven',...","[{'dragY': 8, 'dragX': 9, 'event': 'dragend', ...",production,"[{'target': None, 'redPointsCumulative': 0, 'g...",{},"{'gender': 'Male', 'normal_vision': 'Yes', 'do...",[],prolific,[],True,"[{'timestamp': {'seconds': 1680024076, 'nanose...",False


The "userdata" column has the player names and their session names, so we will expand that nested json dict and append those columns onto the bigdf.

We are replacing "userdata" with its columns:
- "partnerName": three character string of the partner of this participant
- "playerName": three character string randomly assigned as a username for this participant
- "joinAt": timestamp of joining the waiting room
- "playerId": same as "id", this is the document id
- "session": unique session id shared by two players
- "partner": document id of the participant this subj was partnered with
- "status": game status (either "play", "endEarly", or nan)

In [4]:
# expand userdata column so that each row says what session it is
user_data = pd.concat([bigdf["id"], bigdf["user_data"].apply(pd.Series)], axis=1) 
bigdf = pd.concat(
    [bigdf.drop(["user_data"], axis=1), bigdf["user_data"].apply(pd.Series)], axis=1
)

In [5]:
# Cut out subjects who never got partnered
bigdf = bigdf.loc[bigdf["partner"] != "NONE"]
bigdf = bigdf.loc[bigdf["partner"].notna()]

print(bigdf.shape)

(750, 32)


In [6]:
# Rename each player by their session and whether they were red or purple
bigdf["agent"] = ""

for sesh in bigdf["session"].unique():
    ids = bigdf.loc[bigdf["session"] == sesh]["id"].unique()
    if len(ids) != 2:
        print(
            "need two subjects per session. drop this session due to incomplete data: "
            + sesh
        )
        bigdf = bigdf.loc[bigdf["session"] != sesh]
    else:
        # Good, two people in this session. name them properly
        if (
            bigdf.loc[bigdf["id"] == ids[0], "joinAt"].item()
            > bigdf.loc[bigdf["id"] == ids[1], "joinAt"].item()
        ):
            bigdf.loc[bigdf["id"] == ids[0], ["agent"]] = "red"
            bigdf.loc[bigdf["id"] == ids[1], ["agent"]] = "purple"
        else:
            bigdf.loc[bigdf["id"] == ids[0], ["agent"]] = "purple"
            bigdf.loc[bigdf["id"] == ids[1], ["agent"]] = "red"

bigdf["subjid"] = (
    bigdf["session"] + bigdf["agent"]
)  # unique string for each player in each session so don't have to use id
bigdf.insert(0, "subjid", bigdf.pop("subjid"))  # put subjid at the front
bigdf = bigdf.drop(["id"], axis=1)  # remove id

print("Currently have: ")
print("N=" + str(len(bigdf["subjid"].unique())))
# Now we tentatively have this many sessions:
print("N sessions: " + str(len(bigdf["session"].unique())))

need two subjects per session. drop this session due to incomplete data: PinIaC22JNyfbX
need two subjects per session. drop this session due to incomplete data: eRC3vSJeJwOEMO
need two subjects per session. drop this session due to incomplete data: hBZigw9g27JWr7
need two subjects per session. drop this session due to incomplete data: se3SBWNpfm8zM5
need two subjects per session. drop this session due to incomplete data: qyWPm3w1kRh2kA
need two subjects per session. drop this session due to incomplete data: BXYfIZA4WsOHsq
need two subjects per session. drop this session due to incomplete data: WBW92Uv9T6gnGv
need two subjects per session. drop this session due to incomplete data: p6Dp8tOHBzIvhk
need two subjects per session. drop this session due to incomplete data: a67fDSdYlyWqpY
need two subjects per session. drop this session due to incomplete data: wiZI2j5kdexg6Q
need two subjects per session. drop this session due to incomplete data: nDbjpgnS3IzD8b
need two subjects per session. d

In [7]:
# Unfortunately, not all of these sessions are usable because many times, one participant leaves early so they don't make it through all 12 games.
# To determine which sessions to drop, we need to melt the gamedata column to take a closer look at how much data is available per session.

# This will take several minutes to run.

# game_data has one row for each subject, and the columns are trials
game_data = pd.concat(
    [bigdf[["subjid", "session"]], bigdf.pop("game_data").apply(pd.Series)], axis=1
)  # bigdf["game_data"].apply(pd.Series)

# melt to get the trial as rows
game_data = pd.melt(
    game_data,
    id_vars=["subjid", "session"],
    value_vars=game_data.columns[3:],
    var_name="trialNum",
    value_name="trialData",
)
game_data = game_data[game_data["trialData"].notna()]

# sort by subjid with trialNum increasing
game_data = (
    game_data.groupby(["subjid", "session"])
    .apply(lambda x: x.sort_values(["trialNum"], ascending=True))
    .reset_index(drop=True)
)

# unpack trialData column
game_data = pd.concat(
    [
        game_data[["subjid", "session", "trialNum"]],
        game_data["trialData"].apply(pd.Series),
    ],
    axis=1,
)
game_data

Unnamed: 0,subjid,session,trialNum,purplePointsCumulative,gameNum,purpleYloc,purpleScore,responseTime,timestamp,redScore,redEnergy,turnCount,counterbalance,farmBox,redPointsCumulative,legalMoves,objectLayer,agent,decisionMadeTimestamp,purpleEnergy,turnStartTimestamp,redBackpackSize,redXloc,farmItems,redPoints,purpleFinished,purpleXloc,purpleBackpack,redFirst,purplePoints,target,redBackpack,redYloc,gameover,eventName,purpleBackpackSize,condition,redFinished,bonuspoints,bonus
0,3SMEx832vRG5h1purple,3SMEx832vRG5h1,1,0.0,0.0,16.0,0.0,1507.0,1678824668315,0.0,100.0,2.0,10.0,,0.0,"purple_none(3,16) Tomato00(7,8) Turnip01(12,13...",Items01,purple,1.678825e+12,100.0,1.678825e+12,5.0,2.0,"Tomato00(7,8) Turnip01(12,13) Turnip00(8,7) St...",,False,3.0,,False,,Eggplant00,,15.0,False,targetPicked,3.0,"{""costCond"":""high"",""resourceCond"":""uneven"",""vi...",False,,
1,3SMEx832vRG5h1purple,3SMEx832vRG5h1,2,0.0,0.0,15.0,0.0,1507.0,1678824670864,0.0,100.0,2.0,10.0,,0.0,,Items01,purple,1.678825e+12,78.0,1.678825e+12,5.0,2.0,"Tomato00(7,8) Turnip01(12,13) Turnip00(8,7) St...",,False,12.0,"Eggplant00(22,5)",False,,Eggplant00,,15.0,False,objectEncountered,3.0,"{""costCond"":""high"",""resourceCond"":""uneven"",""vi...",False,,
2,3SMEx832vRG5h1purple,3SMEx832vRG5h1,3,0.0,0.0,15.0,0.0,2132.0,1678824672997,0.0,100.0,3.0,10.0,,0.0,"red_none(2,15) Tomato00(7,8) Turnip01(12,13) T...",Items01,red,1.678825e+12,78.0,1.678825e+12,5.0,2.0,"Tomato00(7,8) Turnip01(12,13) Turnip00(8,7) St...",,False,12.0,"Eggplant00(22,5)",False,,Tomato00,,15.0,False,targetPicked,3.0,"{""costCond"":""high"",""resourceCond"":""uneven"",""vi...",False,,
3,3SMEx832vRG5h1purple,3SMEx832vRG5h1,4,0.0,0.0,14.0,0.0,2132.0,1678824675796,0.0,76.0,3.0,10.0,,0.0,,Items01,red,1.678825e+12,78.0,1.678825e+12,5.0,7.0,"Tomato00(22,2) Turnip01(12,13) Turnip00(8,7) S...",,False,12.0,"Eggplant00(22,5)",False,,Tomato00,"Tomato00(22,2)",9.0,False,objectEncountered,3.0,"{""costCond"":""high"",""resourceCond"":""uneven"",""vi...",False,,
4,3SMEx832vRG5h1purple,3SMEx832vRG5h1,5,0.0,0.0,14.0,0.0,4502.0,1678824680299,0.0,76.0,4.0,10.0,,0.0,"purple_none(12,14) box(16,5) Turnip01(12,13) T...",Items01,purple,1.678825e+12,78.0,1.678825e+12,5.0,7.0,"Tomato00(22,2) Turnip01(12,13) Turnip00(8,7) S...",,False,12.0,"Eggplant00(22,5)",False,,Strawberry01,"Tomato00(22,2)",9.0,False,targetPicked,3.0,"{""costCond"":""high"",""resourceCond"":""uneven"",""vi...",False,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
259783,zRNUjyaiWBgEP0red,zRNUjyaiWBgEP0,368,960.0,11.0,7.0,0.0,393.0,1680024305747,3.0,66.0,13.0,3.0,"Strawberry01(22,8) Strawberry00(23,8) Tomato00...",818.0,"red_none(9,7) box(16,5)",Items08,red,1.680024e+12,41.0,1.680024e+12,5.0,9.0,"Eggplant01(23,2) Eggplant00(25,2) Turnip00(22,...",,True,15.0,,False,,box,"Turnip00(22,2) Eggplant01(23,2) Turnip02(24,2)...",7.0,False,targetPicked,3.0,"{""costCond"":""high"",""resourceCond"":""uneven"",""vi...",False,,
259784,zRNUjyaiWBgEP0red,zRNUjyaiWBgEP0,369,960.0,11.0,7.0,5.0,393.0,1680024307787,3.0,48.0,13.0,3.0,"Strawberry01(22,8) Strawberry00(23,8) Tomato00...",818.0,,Items08,red,1.680024e+12,41.0,1.680024e+12,5.0,16.0,"Eggplant01(23,9) Eggplant00(26,8) Turnip00(24,...",,True,15.0,,False,,box,,6.0,False,objectEncountered,3.0,"{""costCond"":""high"",""resourceCond"":""uneven"",""vi...",False,,
259785,zRNUjyaiWBgEP0red,zRNUjyaiWBgEP0,370,960.0,11.0,7.0,5.0,393.0,1680024308736,3.0,42.0,15.0,3.0,"Strawberry01(22,8) Strawberry00(23,8) Tomato00...",818.0,"red_none(16,7)",Items08,red,1.680024e+12,41.0,1.680024e+12,5.0,16.0,"Eggplant01(23,9) Eggplant00(26,8) Turnip00(24,...",,True,15.0,,False,,none,,7.0,False,targetPicked,3.0,"{""costCond"":""high"",""resourceCond"":""uneven"",""vi...",False,,
259786,zRNUjyaiWBgEP0red,zRNUjyaiWBgEP0,371,1165.0,11.0,7.0,5.0,393.0,1680024313744,3.0,42.0,15.0,3.0,"Strawberry01(22,8) Strawberry00(23,8) Tomato00...",944.0,,Items08,red,1.680024e+12,41.0,1.680024e+12,5.0,17.0,"Eggplant01(23,9) Eggplant00(26,8) Turnip00(24,...",126.0,True,15.0,,False,205.0,none,,7.0,True,gameFinished,3.0,"{""costCond"":""high"",""resourceCond"":""uneven"",""vi...",True,,


In [8]:
# Are there are 12 games in each session?
print(str(game_data["session"].nunique()) + " sessions")
print(str(game_data["subjid"].nunique()) + " subjects (data records in smile)")
incomplete = [
    name for name, gp in game_data.groupby(["session"])["gameNum"] if gp.nunique() < 12
]
print(incomplete)

print("number of incomplete sessions: " + str(len(incomplete)))

print("Dropping incomplete sessions... ")
game_data = game_data.loc[~game_data["session"].isin(incomplete)]

print("\nN=" + str(game_data.subjid.nunique()))
print("or " + str(game_data.session.nunique()) + " twelve-game sessions.")

368 sessions
735 subjects (data records in smile)
[('3SMEx832vRG5h1',), ('86ny31f51Kyf8X',), ('8XRZXj87KnA1dx',), ('8bkmVMD1WAhASe',), ('DH3YDiFAy9blYY',), ('HH7fvQQGfSwFyD',), ('M9cEyrw2UXKlaG',), ('SSzMZUdI5zogmR',), ('TJiWEdWHp2OZFv',), ('UUvucQBOIgF0tp',), ('UxvWmcHR1egm0Q',), ('WjqSdCAGrYnNnA',), ('YtnI1bg8SNE2jW',), ('ZZ0Ic81LHRBvsp',), ('ZgfTzR1LU9gsLL',), ('Zho6bqfYERQYCf',), ('cED3rPb2CC3sSu',), ('epStUGMF8gyYUf',), ('fYlMwv024bnMa5',), ('fkRzaSlWQHKpqw',), ('fs3Q7MRKzokPPS',), ('i5vzEU82Nf9I3d',), ('i7wlRIzR8jhlNm',), ('iFYw4xJHXUUeTt',), ('j7XaVdZ31zBUhW',), ('kCCl6lLdQrYIKV',), ('mJfL2XUTu9D8Tr',), ('mOS7vIIkM92azZ',), ('mObqD3CDzCg1ZG',), ('n2BY9lb5w2txPK',), ('nAxSKJP80Zlemo',), ('nKh4lWQW76ECI8',), ('nsRFzukWzCsYgb',), ('nwDOyVxc54jsOH',), ('o0B0Jiy2Gi1Hf6',), ('oFC7DL71CJbe7W',), ('orxdhsueCyWbPy',), ('qBDf5rejg83qF2',), ('ruJI1RISQDUtbP',), ('sYel4LjqP4TQeL',), ('sb7OrjrSWyQgnJ',), ('t7OvwtJ2nEgXsB',), ('tR80aKa9Y2jy5u',), ('tzd3bnuLGsExuV',), ('ubi3bAZpuYDV0Q',), ('ud

In [9]:
# Furthermore, even though these sessions have 12 games (great sign!) there are cases where players quit in the last game.
# in that case, they still have incomplete data (even though they have data for 12 games, the last one is incomplete) so need to be excluded.

df = game_data.loc[game_data["eventName"].isin(["targetPicked", "objectEncountered"])]

gamelengths = {}
incomplete_sessions = []

# for each session for each game, find the length of that game. e.g., turnCount when box is last encountered (when len(farmBox)==len(farmItems))
grouped = df.groupby(["session", "gameNum"])
for name, group in grouped:
    # turn count when game is done
    lastturns = group.loc[group["target"] == "box"].loc[
        group["farmItems"].str.len() == group["farmBox"].str.len()
    ]["turnCount"]

    # are there games where there are multiple values for game length (e.g., got two different values from the two participants?)
    # print(lastturns)
    if len(lastturns) > 2:
        print(
            "err: more than two values for the last turn (should be only two since one per subject)"
        )
        print(lastturns)
        print(name)
        # print(group)
    elif len(lastturns) == 1:
        print(
            "err: only one value for last turn. this prob happened on a final game where only one player's data saved"
        )
        print(name)
        print(lastturns)
        lastturn = lastturns.item()  # lastturns.iloc[0].item()
    elif len(lastturns) == 0:
        print("err: this game did not finish so it is incomplete data!!!")
        incomplete_sessions.append(name[0])
        print(name)
    else:  # two values exactly
        if lastturns.iloc[0].item() == lastturns.iloc[1].item():
            # good! same number as they should be
            lastturn = lastturns.iloc[0].item()
        else:
            print("uh oh, two different values for last turn: ")
            print(lastturns)
            print(name)
            # print(group)
            lastturn = 10000

    # print('last turn: '+str(lastturn))
    gamelengths[name] = lastturn

# gamelengths
print("sessions to drop: ")
print(incomplete_sessions)

print("Dropping incomplete sessions... ")
df = df.loc[~df["session"].isin(incomplete_sessions)]

print("\nN=" + str(df.subjid.nunique()))
print("or " + str(df.session.nunique()) + " FULL AND COMPLETE sessions.")

# We also need to clean up the ends of the games to easily group them
# mark the last trial of the game as such
df["lastTrial"] = df.apply(
    lambda row: (
        True
        if row["turnCount"] == gamelengths[(row["session"], row["gameNum"])]
        else False
    ),
    axis=1,
)

# keep only the trials before the last move of the game
df["keep"] = df.apply(
    lambda row: (
        True
        if row["turnCount"] <= gamelengths[(row["session"], row["gameNum"])]
        else False
    ),
    axis=1,
)
df = df[df["keep"]]

# keep trials with targetPicked only, except also keep lastTrial objectEncountered (that's our new "gameFinished" row)
df["keep"] = df.apply(
    lambda row: (
        False
        if (row["eventName"] == "objectEncountered" and not row["lastTrial"] == True)
        else True
    ),
    axis=1,
)
df = df[df["keep"]]

# mark those rows as the row ending the game
df["gameover"] = df.apply(
    lambda row: True if row["eventName"] == "objectEncountered" else False, axis=1
)

# calculate the points earned for that game
df["redPoints"] = df.apply(
    lambda row: row["redEnergy"] * row["redScore"] if row["gameover"] == True else 0,
    axis=1,
)
df["purplePoints"] = df.apply(
    lambda row: (
        row["purpleEnergy"] * row["purpleScore"] if row["gameover"] == True else 0
    ),
    axis=1,
)

df = df.drop(
    ["purpleFinished", "redFinished", "keep", "bonuspoints", "bonus"], axis=1
)  # dont need these one
df

err: this game did not finish so it is incomplete data!!!
('3SMEx832vRG5h1', np.float64(6.0))
err: this game did not finish so it is incomplete data!!!
('86ny31f51Kyf8X', np.float64(6.0))
err: this game did not finish so it is incomplete data!!!
('8XRZXj87KnA1dx', np.float64(6.0))
err: only one value for last turn. this prob happened on a final game where only one player's data saved
('Bj5ET9K9eXGWRE', np.float64(11.0))
12432    12.0
Name: turnCount, dtype: float64
err: this game did not finish so it is incomplete data!!!
('DH3YDiFAy9blYY', np.float64(5.0))
err: only one value for last turn. this prob happened on a final game where only one player's data saved
('GIpt06E7scyQAs', np.float64(11.0))
20914    12.0
Name: turnCount, dtype: float64
err: this game did not finish so it is incomplete data!!!
('HH7fvQQGfSwFyD', np.float64(2.0))
err: only one value for last turn. this prob happened on a final game where only one player's data saved
('LLlh0LYAxsUnae', np.float64(11.0))
31024    11.

Unnamed: 0,subjid,session,trialNum,purplePointsCumulative,gameNum,purpleYloc,purpleScore,responseTime,timestamp,redScore,redEnergy,turnCount,counterbalance,farmBox,redPointsCumulative,legalMoves,objectLayer,agent,decisionMadeTimestamp,purpleEnergy,turnStartTimestamp,redBackpackSize,redXloc,farmItems,redPoints,purpleXloc,purpleBackpack,redFirst,purplePoints,target,redBackpack,redYloc,gameover,eventName,purpleBackpackSize,condition,lastTrial
492,4ISIiFA1J99JScpurple,4ISIiFA1J99JSc,1,0.0,0.0,16.0,0.0,3506.0,1678818024693,0.0,100.0,2.0,12.0,,0.0,"red_none(2,15) Tomato04(9,13) Turnip02(14,15) ...",Items04,red,1.678818e+12,100.0,1.678818e+12,4.0,2.0,"Tomato04(9,13) Turnip02(14,15) Turnip00(11,7) ...",0.0,3.0,,True,0.0,Tomato04,,15.0,False,targetPicked,4.0,"{""costCond"":""high"",""resourceCond"":""even"",""visi...",False
494,4ISIiFA1J99JScpurple,4ISIiFA1J99JSc,3,0.0,0.0,16.0,0.0,3786.0,1678818030573,0.0,82.0,3.0,12.0,,0.0,"purple_none(3,16) Turnip02(14,15) Turnip00(11,...",Items04,purple,1.678818e+12,100.0,1.678818e+12,4.0,9.0,"Tomato04(22,2) Turnip02(14,15) Turnip00(11,7) ...",0.0,3.0,,True,0.0,Eggplant01,"Tomato04(22,2)",14.0,False,targetPicked,4.0,"{""costCond"":""high"",""resourceCond"":""even"",""visi...",False
496,4ISIiFA1J99JScpurple,4ISIiFA1J99JSc,5,0.0,0.0,11.0,0.0,2502.0,1678818034902,0.0,82.0,4.0,12.0,,0.0,"red_none(9,13) box(16,5) Turnip02(14,15) Turni...",Items04,red,1.678818e+12,84.0,1.678818e+12,4.0,9.0,"Tomato04(22,2) Turnip02(14,15) Turnip00(11,7) ...",0.0,5.0,"Eggplant01(22,5)",True,0.0,Strawberry02,"Tomato04(22,2)",13.0,False,targetPicked,4.0,"{""costCond"":""high"",""resourceCond"":""even"",""visi...",False
498,4ISIiFA1J99JScpurple,4ISIiFA1J99JSc,7,0.0,0.0,10.0,0.0,1768.0,1678818036701,0.0,80.0,5.0,12.0,,0.0,"purple_none(5,10) box(16,5) Turnip02(14,15) Tu...",Items04,purple,1.678818e+12,84.0,1.678818e+12,4.0,9.0,"Tomato04(22,2) Turnip02(14,15) Turnip00(11,7) ...",0.0,5.0,"Eggplant01(22,5)",True,0.0,Turnip01,"Tomato04(22,2) Strawberry02(23,2)",13.0,False,targetPicked,4.0,"{""costCond"":""high"",""resourceCond"":""even"",""visi...",False
500,4ISIiFA1J99JScpurple,4ISIiFA1J99JSc,9,0.0,0.0,10.0,0.0,1501.0,1678818038234,0.0,80.0,6.0,12.0,,0.0,"red_none(9,14) box(16,5) Turnip02(14,15) Turni...",Items04,red,1.678818e+12,82.0,1.678818e+12,4.0,9.0,"Tomato04(22,2) Turnip02(14,15) Turnip00(11,7) ...",0.0,5.0,"Eggplant01(22,5) Turnip01(23,5)",True,0.0,Strawberry01,"Tomato04(22,2) Strawberry02(23,2)",14.0,False,targetPicked,4.0,"{""costCond"":""high"",""resourceCond"":""even"",""visi...",False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
259778,zRNUjyaiWBgEP0red,zRNUjyaiWBgEP0,363,960.0,11.0,7.0,0.0,8002.0,1680024300039,3.0,68.0,10.0,3.0,"Strawberry01(22,8) Strawberry00(23,8) Tomato00...",818.0,"purple_none(15,7) Turnip01(9,7)",Items08,purple,1.680024e+12,46.0,1.680024e+12,5.0,8.0,"Eggplant01(23,2) Eggplant00(25,2) Turnip00(22,...",0.0,15.0,,False,0.0,purplePillow,"Turnip00(22,2) Eggplant01(23,2) Turnip02(24,2)...",8.0,False,targetPicked,3.0,"{""costCond"":""high"",""resourceCond"":""uneven"",""vi...",False
259780,zRNUjyaiWBgEP0red,zRNUjyaiWBgEP0,365,960.0,11.0,7.0,0.0,278.0,1680024302833,3.0,68.0,11.0,3.0,"Strawberry01(22,8) Strawberry00(23,8) Tomato00...",818.0,"red_none(8,7) box(16,5) Turnip01(9,7)",Items08,red,1.680024e+12,41.0,1.680024e+12,5.0,8.0,"Eggplant01(23,2) Eggplant00(25,2) Turnip00(22,...",0.0,15.0,,False,0.0,Turnip01,"Turnip00(22,2) Eggplant01(23,2) Turnip02(24,2)...",7.0,False,targetPicked,3.0,"{""costCond"":""high"",""resourceCond"":""uneven"",""vi...",False
259782,zRNUjyaiWBgEP0red,zRNUjyaiWBgEP0,367,960.0,11.0,7.0,0.0,278.0,1680024302854,3.0,66.0,12.0,3.0,"Strawberry01(22,8) Strawberry00(23,8) Tomato00...",818.0,"purple_none(15,7)",Items08,purple,1.680024e+12,41.0,1.680024e+12,5.0,8.0,"Eggplant01(23,2) Eggplant00(25,2) Turnip00(22,...",0.0,15.0,,False,0.0,none,"Turnip00(22,2) Eggplant01(23,2) Turnip02(24,2)...",7.0,False,targetPicked,3.0,"{""costCond"":""high"",""resourceCond"":""uneven"",""vi...",False
259783,zRNUjyaiWBgEP0red,zRNUjyaiWBgEP0,368,960.0,11.0,7.0,0.0,393.0,1680024305747,3.0,66.0,13.0,3.0,"Strawberry01(22,8) Strawberry00(23,8) Tomato00...",818.0,"red_none(9,7) box(16,5)",Items08,red,1.680024e+12,41.0,1.680024e+12,5.0,9.0,"Eggplant01(23,2) Eggplant00(25,2) Turnip00(22,...",0.0,15.0,,False,0.0,box,"Turnip00(22,2) Eggplant01(23,2) Turnip02(24,2)...",7.0,False,targetPicked,3.0,"{""costCond"":""high"",""resourceCond"":""uneven"",""vi...",True


In [10]:
# I will note that there are some games within a session where we only have one partner's data (this is always the last game of a session
# and I presume it happens because one player finishes too quickly and the smile buffer prevents fast writes to data)

print("cases in which there are not two subjects per game: ")

# another way of confirming the same thing. trying to get those session and game numbers that we need to handle a little differently
# because there is only one subject to use data from.
onesubjectonly = []
for name, grp in df.groupby(["session", "gameNum"]):
    ngameovers = len(grp.loc[grp["gameover"] == True])
    if ngameovers != 2:
        id = grp["subjid"].unique()
        # print(id[0])
        onesubjectonly.append(name)
        # print(name)
        # print(ngameovers)
print(str(len(onesubjectonly)) + " games, " + str(onesubjectonly))

# print(onesubjectonly)

cases in which there are not two subjects per game: 
17 games, [('Bj5ET9K9eXGWRE', np.float64(11.0)), ('GIpt06E7scyQAs', np.float64(11.0)), ('LLlh0LYAxsUnae', np.float64(11.0)), ('MnUx0OULOzy46v', np.float64(11.0)), ('Ujdl2K6GeflUWs', np.float64(11.0)), ('Uyet9JuJtlLzZb', np.float64(11.0)), ('YTLpZxpJ8wFKYn', np.float64(11.0)), ('bSxiPUDOjnyQuF', np.float64(11.0)), ('bVmYLCYVLxicfd', np.float64(11.0)), ('gA6I24mQCRlplv', np.float64(11.0)), ('hqdXjJ50oSQrNk', np.float64(11.0)), ('mB2Zp8vYtp5D8Q', np.float64(11.0)), ('mpcGaIj4HGABbb', np.float64(11.0)), ('nBkV82AICknxGh', np.float64(11.0)), ('oGOi0GITrDQ7sp', np.float64(11.0)), ('qR0OZtlpBD00E3', np.float64(11.0)), ('utdwJjYg9S6Dx6', np.float64(11.0))]


In [11]:
# for each game, keep only the data from one subject (since each subject saves all trials from each game)
# if two subjects data available, make sure you pick one who has complete data (because sometimes one of them doesn't D:)
# if one subject available, that is the one

for name, grp in df.groupby(["session", "gameNum"]):
    sesh = name[0]
    game = name[1]

    if name in onesubjectonly:
        # handle this case - we only have one subject's copy of the data from this game
        # what color is the player we DO have the data from?
        id = grp["subjid"].unique()[0]  # .item()
        color = "red" if "red" in id else "purple" if "purple" in id else "whatdawhat"
        partnercolor = (
            "red"
            if color == "purple"
            else "purple" if color == "red" else "whatdeewhat"
        )

        # find the rows in this game that were choices made by the partner of the subjid player
        mask = (
            (df["session"] == sesh)
            & (df["gameNum"] == game)
            & (df["agent"] == partnercolor)
        )
        # rename the subjid for all the partner's data so it's associated with the correct subjid (e.g., for subj level analysis)
        df.loc[mask, "subjid"] = sesh + partnercolor

# now we are done handling the special cases so everyone left should have two copies of the game data for each game
# so we have two copies of the game data for this game (one from each subject)
# drop trials where subjid and agent color mismatch
mask = ~(
    (df["agent"] == "purple") & (df["subjid"].str.contains("red"))
)  # (df['agent']=='purple') & (df['subjid'].str.contains('purple')) or (df['agent']=='red') & (df['subjid'].str.contains('red'))
df = df.loc[mask]  # keep everyone BUT mask
mask = ~((df["agent"] == "red") & (df["subjid"].str.contains("purple")))
df = df.loc[mask]
# now we only have one copy of each game trial, labeled correctly by subjid as to who performed the action

# sort by trialNum increasing to interleave red and purple's data so it's consecutive within game
df = df.groupby(["session", "gameNum"]).apply(
    lambda x: x.sort_values(["turnCount"], ascending=True)
)  # .sort_values(ascending=True)

# how did we do?
df = df.reset_index(drop=True)

# anythin else to clean up before we export?
# unpack the condition column
conditions = df["condition"].dropna().map(eval).apply(pd.Series)
df = pd.concat([df, conditions], axis=1)

# reorder the columns to look nice
df = df[
    [
        "subjid",
        "session",
        "trialNum",
        "gameNum",
        "costCond",
        "resourceCond",
        "visibilityCond",
        "redFirst",
        "counterbalance",
        "objectLayer",  # details about this game setup
        "eventName",
        "turnCount",
        "agent",
        "target",  # details about this trial specifically
        "turnStartTimestamp",
        "responseTime",
        "decisionMadeTimestamp",
        "timestamp",  # timing information
        "redXloc",
        "redYloc",
        "purpleXloc",
        "purpleYloc",
        "farmItems",
        "farmBox",
        "purpleBackpack",
        "redBackpack",  # what the game looks like on this trial to both players
        "gameover",
        "legalMoves",  # whether the game is over, what legal moves each player could take
        "purpleBackpackSize",
        "purpleEnergy",
        "purpleScore",
        "purplePoints",
        "purplePointsCumulative",  # purple's current items, energy, and scores
        "redBackpackSize",
        "redEnergy",
        "redScore",
        "redPoints",
        "redPointsCumulative",  # red's current items, energy, and scores
        "lastTrial",  # we added this one to keep track of last trial
    ]
]

# i would like purplePointsCumulative and redPC to update on the gameover trial rather than the first trial of the next game
df.loc[df["gameover"] == True, "purplePointsCumulative"] = (
    df.loc[df["gameover"] == True, "purplePointsCumulative"]
    + df.loc[df["gameover"] == True, "purplePoints"]
)
df.loc[df["gameover"] == True, "redPointsCumulative"] = (
    df.loc[df["gameover"] == True, "redPointsCumulative"]
    + df.loc[df["gameover"] == True, "redPoints"]
)

# how does timestamp differ from turnStartTimestamp and decisionMadeTimestamp?
# "timestamp" is the time when the data was saved to smile. let's rename it for clarity.
df = df.rename(columns={"timestamp": "dataSavedTimestamp"})

# integer columns
df["trialNum"].astype("int")
df = df.astype(
    {
        "trialNum": "int",
        "gameNum": "int",
        "turnCount": "int",
        "responseTime": "int",
        "turnStartTimestamp": "int",
        "decisionMadeTimestamp": "int",
        "dataSavedTimestamp": "int",
        "redXloc": "int",
        "redYloc": "int",
        "purpleXloc": "int",
        "purpleYloc": "int",
        "purpleBackpackSize": "int",
        "purpleEnergy": "int",
        "purpleScore": "int",
        "purplePoints": "int",
        "purplePointsCumulative": "int",
        "redBackpackSize": "int",
        "redEnergy": "int",
        "redScore": "int",
        "redPoints": "int",
        "redPointsCumulative": "int",
    }
)

# booleans
df = df.astype({"gameover": "boolean", "lastTrial": "boolean", "redFirst": "boolean"})

# categorical
df = df.astype(
    {
        "subjid": "category",
        "session": "category",
        "costCond": "category",
        "resourceCond": "category",
        "visibilityCond": "category",
        "counterbalance": "category",
        "objectLayer": "category",
        "eventName": "category",
        "agent": "category",
        "target": "category",
    }
)

In [12]:
# We have now fully processed the gamedata. In doing so, we determined which sessions to keep and which to exclude. We cleaned the data and removed redundant copies.
include_sessions = df["session"].unique()
include_subjects = df["subjid"].unique()

print("Num sessions: " + str(len(include_sessions)))
print("N participants: " + str(len(include_subjects)))

Num sessions: 315
N participants: 630


In [13]:
# Let's export the game data to one entire big file and then as separate csv for each session.

# save this gamedata save all in one
path = "../data/alldata"
os.makedirs(path, exist_ok=True)

write_to_file = False


if write_to_file:

    print("writing " + str(df["session"].nunique()) + " files, one per session")
    for name, grp in df.groupby(["session"]):
        fname = path + "/gamedata_" + name + ".csv"
        grp.to_csv(fname, index=False)
        print(".", end=" ")
        # print('file saved to: '+ fname)
    print("\ndone!")

    fname = path + "/gamedata_all" + ".csv"
    df.to_csv(fname, index=False)
    print("file saved to: " + fname)

    # save a list of the valid sessions in case you need to grab them from the raw uncleaned data later
    fname = path + "/valid_sessions" + ".csv"
    np.savetxt(fname, df["session"].unique(), delimiter=",", fmt="%s")
    print("file saved to: " + fname)

In [14]:
# Let's filter the rest of the dataset (e.g., demographics, quiz, captcha) by the included subjects only.
# Then we can save subject level data for these other parts of the experiment if we want to explore them and relate them to the game data.
bigdf = bigdf.loc[bigdf["session"].isin(df["session"])]
bigdf

Unnamed: 0,subjid,status,started_game,trial_num,bonus_points,quiz_form,counterbalance,game_setup,done,quiz_attempts,withdraw_data,bonus_amount,datetime,conditions,captcha_data,mode,finalsurvey_data,demographic_form,issues,recruitment_service,time_data,consented,browser_data,withdraw,partnerName,playerName,joinAt,playerId,session,partner,status.1,agent
1,d2YvnIb1SzT5XLred,endFinished,True,0,759,"[{'ngames': '12', 'goal': 'When all the vegeta...",0,{},True,1,{},0.76,2023-04-04T18:47:03.287Z,"{'costCond': 'high', 'resourceCond': 'even', '...","[{'dragY': 0, 'timestamp': 1680634169061, 'eve...",production,{},"{'country': 'United States', 'gender': 'Male',...",[],prolific,[],True,"[{'event_type': 'blur', 'timestamp': {'seconds...",False,Yvn,zT5,1.680634e+12,1SzT5XLyZefoThroWYjy,d2YvnIb1SzT5XL,d2YvnIbNKnEc0MgWd7Sc,play,red
2,8bkmVMD1WAhASepurple,play,True,0,266,"[{'attempt': 1, 'pass': 'Click on the pillow b...",13,{},False,1,{},0.27,2023-03-27T21:03:51.380Z,"{'resourceCond': 'uneven', 'costCond': 'low', ...","[{'timestamp': 1679951050209, 'event': 'dragen...",production,{},"{'household_income': '', 'color_blind': '', 'c...",[],web,[],True,"[{'event_type': 'resize', 'timestamp': {'secon...",False,kmV,AhA,1.679951e+12,1WAhASeGQ2FIRcljqvjG,8bkmVMD1WAhASe,8bkmVMDoKiwmwMJi7tPF,play,purple
3,GaoxGQc1Xmby2bred,endFinished,True,0,1295,"[{'attempt': 1, 'bonus': 'Number of red veggie...",1,{},True,1,{},1.30,2023-04-04T18:46:44.851Z,"{'visibilityCond': 'full', 'costCond': 'high',...","[{'event': 'dragend', 'timestamp': 16806340878...",production,{},"{'gender': 'Female', 'dob': '1991-07-22', 'his...",[],prolific,[],True,"[{'event_type': 'resize', 'timestamp': {'secon...",False,oxG,mby,1.680634e+12,1Xmby2bWFQ8qq4F44DS9,GaoxGQc1Xmby2b,GaoxGQcfiE7Nnu7Nm0df,play,red
4,TivIAtz1sWuRopred,endFinished,True,0,269,"[{'pass': 'Click on the pillow by your name', ...",4,{},True,2,{},0.27,2023-03-28T17:10:04.714Z,"{'costCond': 'high', 'resourceCond': 'uneven',...","[{'dragY': 8, 'dragX': 9, 'event': 'dragend', ...",production,{},"{'gender': 'Male', 'normal_vision': 'Yes', 'do...",[],prolific,[],True,"[{'timestamp': {'seconds': 1680024076, 'nanose...",False,vIA,WuR,1.680025e+12,1sWuRopCJyArujSEkN7e,TivIAtz1sWuRop,TivIAtzSuCe17yCg6VsJ,play,red
7,Z7SpYO22HIBbPMpurple,endFinished,True,0,2148,[{'goal': 'When all the vegetables are in the ...,5,{},True,1,{},2.15,2023-04-04T15:27:39.036Z,"{'visibilityCond': 'self', 'resourceCond': 'un...","[{'event': 'dragend', 'dragX': -7, 'dragY': 20...",production,{},"{'household_income': '$60,000–$79,999', 'norma...",[],prolific,[],True,"[{'timestamp': {'seconds': 1680622059, 'nanose...",False,SpY,IBb,1.680622e+12,2HIBbPMuiZdUXX7zj8jL,Z7SpYO22HIBbPM,Z7SpYO2pWD5Vqay3zFXU,play,purple
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
891,w2ixw51E2kOOXppurple,,True,0,3180,[{'goal': 'When all the vegetables are in the ...,6,{},True,3,{},3.18,2023-03-09T20:52:56.153Z,"{'visibilityCond': 'self', 'resourceCond': 'ev...","[{'event': 'dragend', 'timestamp': 16783952628...",production,{},"{'color_blind': 'No', 'household_income': '$10...",[],prolific,[],True,"[{'event_type': 'blur', 'timestamp': {'seconds...",False,kOO,ixw,1.678396e+12,w2ixw51ETMVQaBTvvSNg,w2ixw51E2kOOXp,E2kOOXpl8eJvxmDXVSJw,play,purple
893,yExSMPVAyBy9Yipurple,,True,0,422,"[{'cent': '10', 'move': 'Mouse clicks', 'bonus...",4,{},True,1,{},0.43,2023-03-08T22:33:57.653Z,"{'visibilityCond': 'full', 'resourceCond': 'un...","[{'event': 'dragend', 'dragY': 8.0227355957, '...",production,{},"{'fluent_english': 'Yes', 'race': 'Caucasian/W...",[],prolific,[],True,"[{'timestamp': {'seconds': 1678314837, 'nanose...",False,By9,xSM,1.678315e+12,yExSMPVN9tcjP3NnYGhG,yExSMPVAyBy9Yi,AyBy9Yi8zdH5QerDLMOj,play,purple
894,yM7Mq6iCXeB9Svpurple,,True,0,3030,"[{'move': 'Mouse clicks', 'cent': '10', 'goal'...",5,{},False,1,{},3.03,2023-03-08T22:37:55.253Z,"{'resourceCond': 'uneven', 'costCond': 'low', ...","[{'dragX': 5.1999816895, 'timestamp': 16783151...",production,{},"{'household_income': '$60,000–$79,999', 'norma...",[],prolific,[],True,[],False,eB9,7Mq,1.678315e+12,yM7Mq6i512GasHQXDkVP,yM7Mq6iCXeB9Sv,CXeB9SvCqcbYA185YwuN,play,purple
895,yPQzz9iPJVUCQmred,,True,0,1090,"[{'pass': 'Click on the pillow by your name', ...",12,{},True,2,{},1.09,2023-03-08T21:28:37.338Z,"{'visibilityCond': 'full', 'resourceCond': 'un...","[{'timestamp': 1678311041512, 'dragY': 24, 'ev...",production,{},{'education_level': 'Technical/Community Colle...,[],prolific,[],True,"[{'event_type': 'blur', 'timestamp': {'seconds...",False,VUC,Qzz,1.678312e+12,yPQzz9iQt3T6FxZaWDZm,yPQzz9iPJVUCQm,PJVUCQmybYYnpFzpjNXq,play,red


In [16]:
# expand columns of dictionaries and drop original column
captcha_data = pd.concat(
    [bigdf["subjid"], bigdf.pop("captcha_data").apply(pd.Series)], axis=1
)
quiz_form = pd.concat(
    [bigdf["subjid"], bigdf.pop("quiz_form").apply(pd.Series)], axis=1
)
demographic_form = pd.concat(
    [bigdf["subjid"], bigdf.pop("demographic_form").apply(pd.Series)], axis=1
)
conditions = pd.concat(
    [bigdf["subjid"], bigdf.pop("conditions").apply(pd.Series)], axis=1
)
browser_data = pd.concat(
    [bigdf["subjid"], bigdf.pop("browser_data").apply(pd.Series)], axis=1
)

bigdf

Unnamed: 0,subjid,status,started_game,trial_num,bonus_points,counterbalance,game_setup,done,quiz_attempts,withdraw_data,bonus_amount,datetime,mode,finalsurvey_data,issues,recruitment_service,time_data,consented,withdraw,partnerName,playerName,joinAt,playerId,session,partner,status.1,agent
1,d2YvnIb1SzT5XLred,endFinished,True,0,759,0,{},True,1,{},0.76,2023-04-04T18:47:03.287Z,production,{},[],prolific,[],True,False,Yvn,zT5,1.680634e+12,1SzT5XLyZefoThroWYjy,d2YvnIb1SzT5XL,d2YvnIbNKnEc0MgWd7Sc,play,red
2,8bkmVMD1WAhASepurple,play,True,0,266,13,{},False,1,{},0.27,2023-03-27T21:03:51.380Z,production,{},[],web,[],True,False,kmV,AhA,1.679951e+12,1WAhASeGQ2FIRcljqvjG,8bkmVMD1WAhASe,8bkmVMDoKiwmwMJi7tPF,play,purple
3,GaoxGQc1Xmby2bred,endFinished,True,0,1295,1,{},True,1,{},1.30,2023-04-04T18:46:44.851Z,production,{},[],prolific,[],True,False,oxG,mby,1.680634e+12,1Xmby2bWFQ8qq4F44DS9,GaoxGQc1Xmby2b,GaoxGQcfiE7Nnu7Nm0df,play,red
4,TivIAtz1sWuRopred,endFinished,True,0,269,4,{},True,2,{},0.27,2023-03-28T17:10:04.714Z,production,{},[],prolific,[],True,False,vIA,WuR,1.680025e+12,1sWuRopCJyArujSEkN7e,TivIAtz1sWuRop,TivIAtzSuCe17yCg6VsJ,play,red
7,Z7SpYO22HIBbPMpurple,endFinished,True,0,2148,5,{},True,1,{},2.15,2023-04-04T15:27:39.036Z,production,{},[],prolific,[],True,False,SpY,IBb,1.680622e+12,2HIBbPMuiZdUXX7zj8jL,Z7SpYO22HIBbPM,Z7SpYO2pWD5Vqay3zFXU,play,purple
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
891,w2ixw51E2kOOXppurple,,True,0,3180,6,{},True,3,{},3.18,2023-03-09T20:52:56.153Z,production,{},[],prolific,[],True,False,kOO,ixw,1.678396e+12,w2ixw51ETMVQaBTvvSNg,w2ixw51E2kOOXp,E2kOOXpl8eJvxmDXVSJw,play,purple
893,yExSMPVAyBy9Yipurple,,True,0,422,4,{},True,1,{},0.43,2023-03-08T22:33:57.653Z,production,{},[],prolific,[],True,False,By9,xSM,1.678315e+12,yExSMPVN9tcjP3NnYGhG,yExSMPVAyBy9Yi,AyBy9Yi8zdH5QerDLMOj,play,purple
894,yM7Mq6iCXeB9Svpurple,,True,0,3030,5,{},False,1,{},3.03,2023-03-08T22:37:55.253Z,production,{},[],prolific,[],True,False,eB9,7Mq,1.678315e+12,yM7Mq6i512GasHQXDkVP,yM7Mq6iCXeB9Sv,CXeB9SvCqcbYA185YwuN,play,purple
895,yPQzz9iPJVUCQmred,,True,0,1090,12,{},True,2,{},1.09,2023-03-08T21:28:37.338Z,production,{},[],prolific,[],True,False,VUC,Qzz,1.678312e+12,yPQzz9iQt3T6FxZaWDZm,yPQzz9iPJVUCQm,PJVUCQmybYYnpFzpjNXq,play,red


In [17]:
# fnamedict = {
#     "smileconfig_data": smileconfig,
#     "captcha_data": captcha_data,
#     "quiz_data": quiz_form,
#     "demographic_data": demographic_form,
#     "conditions_data": conditions,
#     "recruitment_data": recruitment_info,
#     "browser_data": browser_data,
#     "misc_data": bigdf,
# }

# for f, df in fnamedict.items():
#     fname = path + "/" + f + ".csv"
#     df.to_csv(fname, index=False)
#     print("file saved to: " + fname)