# SAMPLE DATA PROCESSING PIPELINES

Hi! Say you've collected some data on PCIbex, transformed the raw data into a cleaner file (using `process_raw_data.R`), and now you want to analyze what you've got. GRIS experiments produce a lot of data, though, and it can be difficult to know where to start.

This notebook outlines three sample data pipelines: two for gradient templates and one for a categorical template (the header of each sample pipeline indicates its type). Rather than writing all of the data analysis code yourself, you can use the (optimized) functions we've provided in `utils.py` to get what you need from your data.

In [1]:
import numpy as np
import pandas as pd 
import ast
from utils import *

## SAMPLE PIPELINE 1 (GRADIENT): 
#### Semantic Proto-Roles data provided by Zander Lynch (zcl7@cornell.edu)

Zander Lynch ran a GRIS experiment that studies how people process differences in semantic proto-roles. *To avoid issues with generalization, we do not line up the experimental items (and their conditions) with the results files.* However, we do process the results file so that it can be easily aligned with the stimuli.

Note that Zander's experiment uses an earlier version of GRIS in which coordinates are 2D (x, y); the most recent version of GRIS uses 4D coordinates (x_cat, y_cat, x, y). The functions provided in `utils.py` handle both formats.

In [2]:
# Read in the data
data = pd.read_csv('data/demo-1-cleaned.csv')

In [3]:
# See what the data look like
data.head()

Unnamed: 0,Results.reception.time,MD5.hash.of.participant.s.IP.address,Controller.name,Order.number.of.item,Inner.element.number,Label,Latin.Square.Group,PennElementType,PennElementName,Parameter,...,Sentence4Cond,Sentence5,Sentence5Cond,Sentence6,Sentence6Cond,Sentence7,Sentence7Cond,Sentence8,Sentence8Cond,Comments
0,1742304662,abda418378729d84a1a46fe1f54043bc,PennController,1,0,Prolific-ID,,PennController,0,_Trial_,...,,,,,,,,,,
1,1742304662,abda418378729d84a1a46fe1f54043bc,PennController,1,0,Prolific-ID,,PennController,0,_Header_,...,,,,,,,,,,
2,1742304662,abda418378729d84a1a46fe1f54043bc,PennController,1,0,Prolific-ID,,PennController,0,_Header_,...,,,,,,,,,,
3,1742304662,abda418378729d84a1a46fe1f54043bc,PennController,1,0,Prolific-ID,,Var,subject,Final,...,,,,,,,,,,
4,1742304662,abda418378729d84a1a46fe1f54043bc,PennController,1,0,Prolific-ID,,TextInput,Pro-ID-input,EnterReturn,...,,,,,,,,,,


### Time series analyses

If we want to focus on the incremental drags and drops of an individual trial, as well as the total time it took a participant to complete that trial, we can use the `compute_action_times` function on the relevant rows from the results file.


In [4]:
# Make a copy of the data for our incremental purposes
incremental = data.copy()

In [5]:
# Keeping only the rows that are trials
incremental = incremental[incremental['Label'] == 'trials'].copy()

In [6]:
# Visualizing our trials (note the "Label" column)
incremental.head()

Unnamed: 0,Results.reception.time,MD5.hash.of.participant.s.IP.address,Controller.name,Order.number.of.item,Inner.element.number,Label,Latin.Square.Group,PennElementType,PennElementName,Parameter,...,Sentence4Cond,Sentence5,Sentence5Cond,Sentence6,Sentence6Cond,Sentence7,Sentence7Cond,Sentence8,Sentence8Cond,Comments
47,1742304662,abda418378729d84a1a46fe1f54043bc,PennController,6,0,trials,,PennController,7,_Trial_,...,ModAware4,Dignitaries condemn the rebellious act.,ModAware4,He had this habit of telling you everything.,NoModAware2,I e-mailed your assistant earlier this morning .,NoModAware4,Ben made another happy announcement on March 28.,ModAware2,
48,1742304662,abda418378729d84a1a46fe1f54043bc,PennController,6,0,trials,,PennController,7,_Header_,...,ModAware4,Dignitaries condemn the rebellious act.,ModAware4,He had this habit of telling you everything.,NoModAware2,I e-mailed your assistant earlier this morning .,NoModAware4,Ben made another happy announcement on March 28.,ModAware2,
49,1742304662,abda418378729d84a1a46fe1f54043bc,PennController,6,0,trials,,PennController,7,_Header_,...,ModAware4,Dignitaries condemn the rebellious act.,ModAware4,He had this habit of telling you everything.,NoModAware2,I e-mailed your assistant earlier this morning .,NoModAware4,Ben made another happy announcement on March 28.,ModAware2,
50,1742304662,abda418378729d84a1a46fe1f54043bc,PennController,6,0,trials,,DragDrop,experimental-trials,Drag,...,ModAware4,Dignitaries condemn the rebellious act.,ModAware4,He had this habit of telling you everything.,NoModAware2,I e-mailed your assistant earlier this morning .,NoModAware4,Ben made another happy announcement on March 28.,ModAware2,Dropped on (8%2C 2)
51,1742304662,abda418378729d84a1a46fe1f54043bc,PennController,6,0,trials,,DragDrop,experimental-trials,Drop,...,ModAware4,Dignitaries condemn the rebellious act.,ModAware4,He had this habit of telling you everything.,NoModAware2,I e-mailed your assistant earlier this morning .,NoModAware4,Ben made another happy announcement on March 28.,ModAware2,Dopped Dignitaries condemn the rebellious act.


In [7]:
# Compute the incremental action times
incremental = compute_action_times(incremental)

If we look at `incremental` now, we will see the following columns:
- EventIndex (the order of events within a trial)
- TimeSinceLastEvent (the time elapsed since the previous event)
- TotalItemTime (how long the full trial took for this participant)
- EventTime (the exact time at which this event occurred)
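
For intuition, here is a minimal sketch of how timing columns like these could be derived with plain pandas. It is not the code in `utils.py`: the function name `sketch_action_times`, the participant ID, and the toy data frame are made up for illustration, and the sketch assumes that `EventTime` is a Unix timestamp in milliseconds and that a trial's total time runs from its first to its last event.

```python
import pandas as pd

# Toy events for a single trial. The EventTime values are the first five
# demo-2 events shown further down; "p1" is a hypothetical participant ID.
events = pd.DataFrame({
    "Participant": ["p1"] * 5,
    "Item": [1] * 5,
    "EventTime": [
        1743628545760, 1743628547117, 1743628547404,
        1743628548137, 1743628548440,
    ],
})

def sketch_action_times(df):
    """Add per-trial timing columns (illustrative; not the utils.py code)."""
    df = df.sort_values(["Participant", "Item", "EventTime"]).copy()
    times = df.groupby(["Participant", "Item"])["EventTime"]
    # Position of each event within its trial: 0, 1, 2, ...
    df["EventIndex"] = times.cumcount()
    # Gap to the previous event in the same trial (0 for the first event)
    df["TimeSinceLastEvent"] = times.diff().fillna(0.0)
    # Span between the first and last event of the trial
    df["TotalItemTime"] = times.transform("max") - times.transform("min")
    return df

timed = sketch_action_times(events)
print(timed[["EventIndex", "TimeSinceLastEvent", "TotalItemTime"]])
```

Grouping by participant and item keeps one trial's timings from leaking into the next, which is why the first event of each trial gets a gap of 0.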

In [8]:
# Visualizing the end of the file
incremental.tail()

Unnamed: 0,Participant,Item,EventIndex,EventTime,TimeSinceLastEvent,TotalItemTime,Results.reception.time,Controller.name,Inner.element.number,Label,...,Sentence4Cond,Sentence5,Sentence5Cond,Sentence6,Sentence6Cond,Sentence7,Sentence7Cond,Sentence8,Sentence8Cond,Comments
1250,e8cbee887ff48a5f732382ffa9fac6e1,9,19,1742307856536,1202.0,31798,1742307860,PennController,0,trials,...,ModVolition2,The president demoted the expert from his cabi...,NoModVolition2,We hauled the horse to Corning.,NoModVolition4,We took our bold cat Mitten camping .,ModVolition2,We trust the caring teacher.,ModVolition4,Dropped on (4%2C 7)
1251,e8cbee887ff48a5f732382ffa9fac6e1,9,20,1742307857435,899.0,31798,1742307860,PennController,0,trials,...,ModVolition2,The president demoted the expert from his cabi...,NoModVolition2,We hauled the horse to Corning.,NoModVolition4,We took our bold cat Mitten camping .,ModVolition2,We trust the caring teacher.,ModVolition4,Dopped They chased the proud rebels out of the...
1252,e8cbee887ff48a5f732382ffa9fac6e1,9,21,1742307858294,859.0,31798,1742307860,PennController,0,trials,...,ModVolition2,The president demoted the expert from his cabi...,NoModVolition2,We hauled the horse to Corning.,NoModVolition4,We took our bold cat Mitten camping .,ModVolition2,We trust the caring teacher.,ModVolition4,
1253,e8cbee887ff48a5f732382ffa9fac6e1,9,22,1742307858295,1.0,31798,1742307860,PennController,0,trials,...,ModVolition2,The president demoted the expert from his cabi...,NoModVolition2,We hauled the horse to Corning.,NoModVolition4,We took our bold cat Mitten camping .,ModVolition2,We trust the caring teacher.,ModVolition4,Value at the end of the trial
1254,e8cbee887ff48a5f732382ffa9fac6e1,9,23,1742307858295,0.0,31798,1742307860,PennController,0,trials,...,ModVolition2,The president demoted the expert from his cabi...,NoModVolition2,We hauled the horse to Corning.,NoModVolition4,We took our bold cat Mitten camping .,ModVolition2,We trust the caring teacher.,ModVolition4,


In [9]:
# Saving to CSV
incremental.to_csv('outputs/demo-1-incremental.csv', index=False)

### Distance analyses
Fundamentally, GRIS is used to measure the distance between objects in space. We can see how participants positioned these objects in relation to one another by keeping only the rows that hold the `Final` positions, cleaning those positions with `clean_string`, building the graphs with `expand_graphs`, and finally calculating the pairwise distances between all objects in a trial with `compute_pairwise_distances`.

In [10]:
# Keeping the rows which have the "Final" positions of all objects. 
distance = incremental[(incremental['Parameter'] == 'Final') & (incremental['PennElementName'] == 'experimental-trials')].copy()

In [11]:
distance.head()

Unnamed: 0,Participant,Item,EventIndex,EventTime,TimeSinceLastEvent,TotalItemTime,Results.reception.time,Controller.name,Inner.element.number,Label,...,Sentence4Cond,Sentence5,Sentence5Cond,Sentence6,Sentence6Cond,Sentence7,Sentence7Cond,Sentence8,Sentence8Cond,Comments
49,32ab20bbeccaa48346503c065698e438,6,49,1742304542400,3998.0,122914,1742304762,PennController,0,trials,...,ModAware4,I put my angry foot in the cool water.,ModAware4,Jane calls the bluff.,NoModAware2,Dr. Hart killed our pet.,NoModAware4,She loves playful giraffes.,ModAware2,
93,32ab20bbeccaa48346503c065698e438,7,41,1742304623301,1800.0,80890,1742304762,PennController,0,trials,...,ModInstigation2,The managers staffed the organization with toa...,NoModInstigation4,I found the agreeable office to be very clean.,ModInstigation4,The politician pinched a few energetic nerves ...,ModInstigation2,Dr Johnson fixed my neck from a snowboard injury.,NoModInstigation2,
135,32ab20bbeccaa48346503c065698e438,8,39,1742304689784,2063.0,66475,1742304762,PennController,0,trials,...,ModSentience4,In the end%2C John kept his promise.,NoModSentience2,I saw this place.,NoModSentience4,Pound for pound%2C mountain lions eat more gen...,ModSentience4,I enjoyed my lively tour.,ModSentience2,
167,32ab20bbeccaa48346503c065698e438,9,29,1742304756247,1824.0,66451,1742304762,PennController,0,trials,...,ModVolition2,They chased the rebels out of the capital.,NoModVolition2,You want your dog to admire you as his pack le...,NoModVolition4,I encourage you to do this joyful dance.,ModVolition2,Sam used his duplicitous company to buy items ...,ModVolition4,
191,3cc42984c8fd67719dbad98c057b931f,6,21,1742308199127,5566.0,117058,1742308418,PennController,0,trials,...,NoModAware4,I put my foot in the cool water.,NoModAware4,Jane calls the aggressive bluff.,ModAware2,Dr. Hart killed our kind pet.,ModAware4,She loves giraffes.,NoModAware2,


In [12]:
# What does the output string look like? Ugly!
print(distance['Value'].iloc[0])

Ben made another happy announcement on March 28.:(3%2C 1);I e-mailed your assistant earlier this morning .:(19%2C 1);He had this habit of telling you everything.:(0%2C 5);Dignitaries condemn the rebellious act.:(11%2C 4);I put my angry foot in the cool water.:(18%2C 5);Jane calls the bluff. :(30%2C 1);Dr. Hart killed our pet.:(13%2C 2);She loves playful giraffes.:(30%2C 5)


In [13]:
# Cleaning the output strings like the one shown in the cell above
distance['final_graphs'] = distance['Value'].apply(clean_string)

In [14]:
# And showing you the cleaned data:
for item in distance['final_graphs'].iloc[0]:
    print(item)

('Ben made another happy announcement on March 28.', (3, 1))
('I e-mailed your assistant earlier this morning .', (19, 1))
('He had this habit of telling you everything.', (0, 5))
('Dignitaries condemn the rebellious act.', (11, 4))
('I put my angry foot in the cool water.', (18, 5))
('Jane calls the bluff. ', (30, 1))
('Dr. Hart killed our pet.', (13, 2))
('She loves playful giraffes.', (30, 5))
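
If you're curious what a cleaning step like this involves, the sketch below parses the same kind of string by hand. It is only an approximation of what `clean_string` might do: the helper name `parse_final_positions` is hypothetical, and the sketch assumes that `%2C` is a URL-encoded comma, that objects are separated by `;`, and that no object name contains a `;` itself.

```python
import ast
from urllib.parse import unquote

def parse_final_positions(raw):
    """Turn 'name:(x%2C y);name:(x%2C y)' into [(name, (x, y)), ...]."""
    decoded = unquote(raw)  # '%2C' becomes ','
    parsed = []
    for chunk in decoded.split(";"):
        if not chunk.strip():
            continue  # skip empty pieces, e.g. after a trailing ';'
        # Split on the LAST colon so that colons inside a name survive
        name, coords = chunk.rsplit(":", 1)
        parsed.append((name, ast.literal_eval(coords)))
    return parsed

raw = "Jane calls the bluff. :(30%2C 1);Dr. Hart killed our pet.:(13%2C 2)"
for item in parse_final_positions(raw):
    print(item)
```

Note that the name is kept exactly as it appears in the results file, trailing space included, which matters later if you match objects against your stimuli by string.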


Now, we'll run `expand_graphs` to prepare our data for all of the pairwise-distance calculations between each object for each trial:

In [15]:
graphs = expand_graphs(distance)

In [16]:
graphs.tail()

Unnamed: 0,Participant,Item,EventIndex,EventTime,TimeSinceLastEvent,TotalItemTime,Results.reception.time,Controller.name,Inner.element.number,Label,...,Sentence6,Sentence6Cond,Sentence7,Sentence7Cond,Sentence8,Sentence8Cond,Comments,final_graphs,object,location
1252,e8cbee887ff48a5f732382ffa9fac6e1,9,21,1742307858294,859.0,31798,1742307860,PennController,0,trials,...,We hauled the horse to Corning.,NoModVolition4,We took our bold cat Mitten camping .,ModVolition2,We trust the caring teacher.,ModVolition4,,(They chased the proud rebels out of the capit...,They chased the proud rebels out of the capital.,"(4, 7)"
1252,e8cbee887ff48a5f732382ffa9fac6e1,9,21,1742307858294,859.0,31798,1742307860,PennController,0,trials,...,We hauled the horse to Corning.,NoModVolition4,We took our bold cat Mitten camping .,ModVolition2,We trust the caring teacher.,ModVolition4,,(The president demoted the expert from his cab...,The president demoted the expert from his cabi...,"(8, 4)"
1252,e8cbee887ff48a5f732382ffa9fac6e1,9,21,1742307858294,859.0,31798,1742307860,PennController,0,trials,...,We hauled the horse to Corning.,NoModVolition4,We took our bold cat Mitten camping .,ModVolition2,We trust the caring teacher.,ModVolition4,,"(We hauled the horse to Corning., (27, 4))",We hauled the horse to Corning.,"(27, 4)"
1252,e8cbee887ff48a5f732382ffa9fac6e1,9,21,1742307858294,859.0,31798,1742307860,PennController,0,trials,...,We hauled the horse to Corning.,NoModVolition4,We took our bold cat Mitten camping .,ModVolition2,We trust the caring teacher.,ModVolition4,,"(We took our bold cat Mitten camping ., (21, 4))",We took our bold cat Mitten camping .,"(21, 4)"
1252,e8cbee887ff48a5f732382ffa9fac6e1,9,21,1742307858294,859.0,31798,1742307860,PennController,0,trials,...,We hauled the horse to Corning.,NoModVolition4,We took our bold cat Mitten camping .,ModVolition2,We trust the caring teacher.,ModVolition4,,"(We trust the caring teacher., (26, 6))",We trust the caring teacher.,"(26, 6)"
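
The repeated index in the output above (one row per object, all sharing the index of the original trial row) is what a list column looks like after it has been "exploded". The sketch below reproduces that shape with `DataFrame.explode` on a toy data frame. It is an illustration under our own assumptions, not the body of `expand_graphs`, and the participant ID is made up.

```python
import pandas as pd

# One trial whose final_graphs cell holds a list of (object, location) pairs
toy = pd.DataFrame({
    "Participant": ["p1"],  # hypothetical participant ID
    "final_graphs": [[("cat", (1, 1, 18, 7)), ("dog", (1, 1, 19, 13))]],
})

# One row per pair; the other columns are repeated and the index is kept
expanded = toy.explode("final_graphs")
expanded["object"] = [pair[0] for pair in expanded["final_graphs"]]
expanded["location"] = [pair[1] for pair in expanded["final_graphs"]]

print(expanded)
```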


In [17]:
# Compute pairwise distances:
pairwise_df = compute_pairwise_distances(
    graphs,
    group_cols=['Participant', 'item'],
    location_col='location',
    object_col='object'
)

In [18]:
# Check out the "distance" column all the way on the right!
pairwise_df.head()

Unnamed: 0,Participant,Item,EventIndex,EventTime,TimeSinceLastEvent,TotalItemTime,Results.reception.time,Controller.name,Inner.element.number,Label,...,Sentence7,Sentence7Cond,Sentence8,Sentence8Cond,Comments,final_graphs,object,object_2,location_2,distance
0,32ab20bbeccaa48346503c065698e438,6,49,1742304542400,3998.0,122914,1742304762,PennController,0,trials,...,Dr. Hart killed our pet.,NoModAware4,She loves playful giraffes.,ModAware2,,(Ben made another happy announcement on March ...,Ben made another happy announcement on March 28.,I e-mailed your assistant earlier this morning .,"(19, 1)",16.0
1,32ab20bbeccaa48346503c065698e438,6,49,1742304542400,3998.0,122914,1742304762,PennController,0,trials,...,Dr. Hart killed our pet.,NoModAware4,She loves playful giraffes.,ModAware2,,(Ben made another happy announcement on March ...,Ben made another happy announcement on March 28.,He had this habit of telling you everything.,"(0, 5)",5.0
2,32ab20bbeccaa48346503c065698e438,6,49,1742304542400,3998.0,122914,1742304762,PennController,0,trials,...,Dr. Hart killed our pet.,NoModAware4,She loves playful giraffes.,ModAware2,,(Ben made another happy announcement on March ...,Ben made another happy announcement on March 28.,Dignitaries condemn the rebellious act.,"(11, 4)",8.544004
3,32ab20bbeccaa48346503c065698e438,6,49,1742304542400,3998.0,122914,1742304762,PennController,0,trials,...,Dr. Hart killed our pet.,NoModAware4,She loves playful giraffes.,ModAware2,,(Ben made another happy announcement on March ...,Ben made another happy announcement on March 28.,I put my angry foot in the cool water.,"(18, 5)",15.524175
4,32ab20bbeccaa48346503c065698e438,6,49,1742304542400,3998.0,122914,1742304762,PennController,0,trials,...,Dr. Hart killed our pet.,NoModAware4,She loves playful giraffes.,ModAware2,,(Ben made another happy announcement on March ...,Ben made another happy announcement on March 28.,Jane calls the bluff.,"(30, 1)",27.0
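
The values in the `distance` column above match the ordinary Euclidean distance between the two objects' final positions. As a sanity check, the sketch below recomputes a few of them by hand with the standard library (`math.dist` needs Python 3.8 or later). It uses four of the positions from the cleaned output earlier in this pipeline and does not call `compute_pairwise_distances`.

```python
from itertools import combinations
from math import dist

# Final positions copied from the first trial's cleaned output
positions = {
    "Ben made another happy announcement on March 28.": (3, 1),
    "I e-mailed your assistant earlier this morning .": (19, 1),
    "He had this habit of telling you everything.": (0, 5),
    "Dignitaries condemn the rebellious act.": (11, 4),
}

# One distance per unordered pair of objects
pairwise = {
    (a, b): dist(positions[a], positions[b])
    for a, b in combinations(positions, 2)
}

for (a, b), d in pairwise.items():
    print(f"{d:9.6f}  {a}  <->  {b}")
```

Four objects give six unordered pairs; a full trial with eight objects gives 28.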


In [19]:
# Saving to CSV
pairwise_df.to_csv('outputs/demo-1-distances.csv', index=False)

Thus concludes the general pipeline for extracting data. Again, we leave it to you to align your stimuli and conditions with the data. 

The remaining two sample pipelines will be less detailed, but comments are included to help you follow what's going on.

## SAMPLE PIPELINE 2 (GRADIENT): 
#### Typicality data provided by John R. Starr (jrs673@cornell.edu)

I ran a GRIS experiment that studies how people process differences in category typicality. The data provided here are from a demonstration of the task, meaning there are very few trials and items. As in the first pipeline, *we do not line up the experimental items (and their conditions) with the results files to avoid issues with generalization.* 

Note that this experiment uses the four-value (4D) coordinate format (x_cat, y_cat, x, y).

In [20]:
# Load in the data
data2 = pd.read_csv('data/demo-2-cleaned.csv')

In [21]:
# Look at the data
data2.head()

Unnamed: 0,Results.reception.time,MD5.hash.of.participant.s.IP.address,Controller.name,Order.number.of.item,Inner.element.number,Label,Latin.Square.Group,PennElementType,PennElementName,Parameter,Value,EventTime,item,Comments
0,1743628553,8625c054d78abdfb4dce4fa7872cfd0e,PennController,2,0,demo,,PennController,2,_Trial_,Start,1743628525864,item2,
1,1743628553,8625c054d78abdfb4dce4fa7872cfd0e,PennController,2,0,demo,,DragDrop,pt-1,Drag,sprite,1743628527893,item2,Dropped on (1%2C 1%2C 19%2C 5)
2,1743628553,8625c054d78abdfb4dce4fa7872cfd0e,PennController,2,0,demo,,DragDrop,pt-1,Drop,(1%2C 1%2C 19%2C 5),1743628528570,item2,Dopped sprite
3,1743628553,8625c054d78abdfb4dce4fa7872cfd0e,PennController,2,0,demo,,DragDrop,pt-1,Drag,pepsi,1743628530951,item2,Dropped on (1%2C 1%2C 11%2C 6)
4,1743628553,8625c054d78abdfb4dce4fa7872cfd0e,PennController,2,0,demo,,DragDrop,pt-1,Drop,(1%2C 1%2C 11%2C 6),1743628531966,item2,Dopped pepsi


In [22]:
# Compute the incremental action times
incremental2 = compute_action_times(data2)

In [23]:
# Check out what `compute_action_times` gets us:
incremental2.head()

Unnamed: 0,Participant,Item,EventIndex,EventTime,TimeSinceLastEvent,TotalItemTime,Results.reception.time,Controller.name,Inner.element.number,Label,Latin.Square.Group,PennElementType,PennElementName,Parameter,Value,item,Comments
0,8625c054d78abdfb4dce4fa7872cfd0e,1,0,1743628545760,0.0,6897,1743628553,PennController,0,demo,,PennController,1,_Trial_,Start,item1,
1,8625c054d78abdfb4dce4fa7872cfd0e,1,1,1743628547117,1357.0,6897,1743628553,PennController,0,demo,,DragDrop,pt-1,Drag,cat,item1,Dropped on (1%2C 1%2C 18%2C 7)
2,8625c054d78abdfb4dce4fa7872cfd0e,1,2,1743628547404,287.0,6897,1743628553,PennController,0,demo,,DragDrop,pt-1,Drop,(1%2C 1%2C 18%2C 7),item1,Dopped cat
3,8625c054d78abdfb4dce4fa7872cfd0e,1,3,1743628548137,733.0,6897,1743628553,PennController,0,demo,,DragDrop,pt-1,Drag,snake,item1,Dropped on (1%2C 1%2C 11%2C 11)
4,8625c054d78abdfb4dce4fa7872cfd0e,1,4,1743628548440,303.0,6897,1743628553,PennController,0,demo,,DragDrop,pt-1,Drop,(1%2C 1%2C 11%2C 11),item1,Dopped snake


In [24]:
# Saving to CSV
incremental2.to_csv('outputs/demo-2-incremental.csv', index=False)

In [25]:
# Keeping only final positions
distance2 = incremental2[incremental2['Parameter'] == 'Final'].copy()

In [26]:
# Cleaning the output strings 
distance2['final_graphs'] = distance2['Value'].apply(clean_string)

In [27]:
# Ensuring our strings look cleaned in the 'final_graphs' column
distance2

Unnamed: 0,Participant,Item,EventIndex,EventTime,TimeSinceLastEvent,TotalItemTime,Results.reception.time,Controller.name,Inner.element.number,Label,Latin.Square.Group,PennElementType,PennElementName,Parameter,Value,item,Comments,final_graphs
11,8625c054d78abdfb4dce4fa7872cfd0e,1,11,1743628552657,851.0,6897,1743628553,PennController,0,demo,,DragDrop,pt-1,Final,cat:(1%2C 1%2C 18%2C 7);dog:(1%2C 1%2C 19%2C 1...,item1,,"[(cat, (1, 1, 18, 7)), (dog, (1, 1, 19, 13)), ..."
24,8625c054d78abdfb4dce4fa7872cfd0e,2,11,1743628545757,3050.0,19893,1743628553,PennController,0,demo,,DragDrop,pt-1,Final,sprite:(1%2C 1%2C 19%2C 5);pepsi:(1%2C 1%2C 11...,item2,,"[(sprite, (1, 1, 19, 5)), (pepsi, (1, 1, 11, 6..."


In [28]:
# Expanding our dataframes
graphs2 = expand_graphs(distance2)

In [29]:
# Compute pairwise distances
pairwise_df2 = compute_pairwise_distances(
    graphs2,
    group_cols=['Participant', 'item'],
    location_col='location',
    object_col='object'
)

In [30]:
# Check out the "distance" column all the way on the right!
pairwise_df2.head()

Unnamed: 0,Participant,Item,EventIndex,EventTime,TimeSinceLastEvent,TotalItemTime,Results.reception.time,Controller.name,Inner.element.number,Label,...,PennElementName,Parameter,Value,item,Comments,final_graphs,object,object_2,location_2,distance
0,8625c054d78abdfb4dce4fa7872cfd0e,1,11,1743628552657,851.0,6897,1743628553,PennController,0,demo,...,pt-1,Final,cat:(1%2C 1%2C 18%2C 7);dog:(1%2C 1%2C 19%2C 1...,item1,,"(cat, (1, 1, 18, 7))",cat,dog,"(1, 1, 19, 13)",6.082763
1,8625c054d78abdfb4dce4fa7872cfd0e,1,11,1743628552657,851.0,6897,1743628553,PennController,0,demo,...,pt-1,Final,cat:(1%2C 1%2C 18%2C 7);dog:(1%2C 1%2C 19%2C 1...,item1,,"(cat, (1, 1, 18, 7))",cat,snake,"(1, 1, 11, 11)",8.062258
2,8625c054d78abdfb4dce4fa7872cfd0e,1,11,1743628552657,851.0,6897,1743628553,PennController,0,demo,...,pt-1,Final,cat:(1%2C 1%2C 18%2C 7);dog:(1%2C 1%2C 19%2C 1...,item1,,"(cat, (1, 1, 18, 7))",cat,hamster,"(1, 1, 18, 10)",3.0
3,8625c054d78abdfb4dce4fa7872cfd0e,1,11,1743628552657,851.0,6897,1743628553,PennController,0,demo,...,pt-1,Final,cat:(1%2C 1%2C 18%2C 7);dog:(1%2C 1%2C 19%2C 1...,item1,,"(dog, (1, 1, 19, 13))",dog,snake,"(1, 1, 11, 11)",8.246211
4,8625c054d78abdfb4dce4fa7872cfd0e,1,11,1743628552657,851.0,6897,1743628553,PennController,0,demo,...,pt-1,Final,cat:(1%2C 1%2C 18%2C 7);dog:(1%2C 1%2C 19%2C 1...,item1,,"(dog, (1, 1, 19, 13))",dog,hamster,"(1, 1, 18, 10)",3.162278


In [31]:
# Saving to CSV
pairwise_df2.to_csv('outputs/demo-2-distances.csv', index=False)

## SAMPLE PIPELINE 3 (CATEGORICAL): 
#### Acceptability data provided by John R. Starr (jrs673@cornell.edu)

I ran a GRIS experiment that studies how people process differences in word acceptability. The data provided here are from a demonstration of the task, meaning there are very few trials and items. Again, *we do not line up the experimental items (and their conditions) with the results files to avoid issues with generalization.* 

Note that this experiment uses the four-value (4D) coordinate format (x_cat, y_cat, x, y).

In [32]:
# Load in the data
data3 = pd.read_csv('data/demo-3-cleaned.csv')

In [33]:
# Check out the data
data3.head()

Unnamed: 0,Results.reception.time,MD5.hash.of.participant.s.IP.address,Controller.name,Order.number.of.item,Inner.element.number,Label,Latin.Square.Group,PennElementType,PennElementName,Parameter,Value,EventTime,item,Comments
0,1743629442,8625c054d78abdfb4dce4fa7872cfd0e,PennController,2,0,demo,,PennController,2,_Trial_,Start,1743629417343,item2,
1,1743629442,8625c054d78abdfb4dce4fa7872cfd0e,PennController,2,0,demo,,DragDrop,pt-1,Drag,sprite,1743629418701,item2,Dropped on (3%2C 0%2C 0%2C 0)
2,1743629442,8625c054d78abdfb4dce4fa7872cfd0e,PennController,2,0,demo,,DragDrop,pt-1,Drop,(3%2C 0%2C 0%2C 0),1743629419633,item2,Dopped sprite
3,1743629442,8625c054d78abdfb4dce4fa7872cfd0e,PennController,2,0,demo,,DragDrop,pt-1,Drag,pepsi,1743629421361,item2,Dropped on (5%2C 0%2C 1%2C 1)
4,1743629442,8625c054d78abdfb4dce4fa7872cfd0e,PennController,2,0,demo,,DragDrop,pt-1,Drop,(5%2C 0%2C 1%2C 1),1743629422244,item2,Dopped pepsi


In [34]:
# Compute the incremental action times
incremental3 = compute_action_times(data3)

In [35]:
# Saving to CSV
incremental3.to_csv('outputs/demo-3-incremental.csv', index=False)

In [36]:
# Keeping the final graph positions
distance3 = incremental3[incremental3['Parameter'] == 'Final'].copy()

In [37]:
# Cleaning the output strings 
distance3['final_graphs'] = distance3['Value'].apply(clean_string)

In [38]:
# Ensuring our graphs are cleaned
distance3

Unnamed: 0,Participant,Item,EventIndex,EventTime,TimeSinceLastEvent,TotalItemTime,Results.reception.time,Controller.name,Inner.element.number,Label,Latin.Square.Group,PennElementType,PennElementName,Parameter,Value,item,Comments,final_graphs
10,8625c054d78abdfb4dce4fa7872cfd0e,1,10,1743629441512,852.0,8479,1743629442,PennController,0,demo,,DragDrop,pt-1,Final,cat:(5%2C 0%2C 0%2C 0);dog:(2%2C 0%2C 1%2C 1);...,item1,,"[(cat, (5, 0, 0, 0)), (dog, (2, 0, 1, 1)), (sn..."
24,8625c054d78abdfb4dce4fa7872cfd0e,2,12,1743629433030,1571.0,15688,1743629442,PennController,0,demo,,DragDrop,pt-1,Final,sprite:(3%2C 0%2C 0%2C 0);pepsi:(5%2C 0%2C 1%2...,item2,,"[(sprite, (3, 0, 0, 0)), (pepsi, (5, 0, 1, 1))..."


In [39]:
# Expanding the graphs
graphs3 = expand_graphs(distance3)

In [40]:
# Compute pairwise distances
pairwise_df3 = compute_pairwise_distances(
    graphs3,
    group_cols=['Participant', 'item'],
    location_col='location',
    object_col='object',
    categorical=True        # Note: categorical=True computes distances
                            # from the first two coordinates (x_cat, y_cat)
                            # rather than the last two (x, y).
)

In [41]:
# Check out the "distance" column all the way on the right!
pairwise_df3.head()

Unnamed: 0,Participant,Item,EventIndex,EventTime,TimeSinceLastEvent,TotalItemTime,Results.reception.time,Controller.name,Inner.element.number,Label,...,PennElementName,Parameter,Value,item,Comments,final_graphs,object,object_2,location_2,distance
0,8625c054d78abdfb4dce4fa7872cfd0e,1,10,1743629441512,852.0,8479,1743629442,PennController,0,demo,...,pt-1,Final,cat:(5%2C 0%2C 0%2C 0);dog:(2%2C 0%2C 1%2C 1);...,item1,,"(cat, (5, 0, 0, 0))",cat,dog,"(2, 0, 1, 1)",3.0
1,8625c054d78abdfb4dce4fa7872cfd0e,1,10,1743629441512,852.0,8479,1743629442,PennController,0,demo,...,pt-1,Final,cat:(5%2C 0%2C 0%2C 0);dog:(2%2C 0%2C 1%2C 1);...,item1,,"(cat, (5, 0, 0, 0))",cat,snake,"(3, 0, 0, 1)",2.0
2,8625c054d78abdfb4dce4fa7872cfd0e,1,10,1743629441512,852.0,8479,1743629442,PennController,0,demo,...,pt-1,Final,cat:(5%2C 0%2C 0%2C 0);dog:(2%2C 0%2C 1%2C 1);...,item1,,"(cat, (5, 0, 0, 0))",cat,hamster,"(1, 0, 0, 1)",4.0
3,8625c054d78abdfb4dce4fa7872cfd0e,1,10,1743629441512,852.0,8479,1743629442,PennController,0,demo,...,pt-1,Final,cat:(5%2C 0%2C 0%2C 0);dog:(2%2C 0%2C 1%2C 1);...,item1,,"(dog, (2, 0, 1, 1))",dog,snake,"(3, 0, 0, 1)",1.0
4,8625c054d78abdfb4dce4fa7872cfd0e,1,10,1743629441512,852.0,8479,1743629442,PennController,0,demo,...,pt-1,Final,cat:(5%2C 0%2C 0%2C 0);dog:(2%2C 0%2C 1%2C 1);...,item1,,"(dog, (2, 0, 1, 1))",dog,hamster,"(1, 0, 0, 1)",1.0
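
To make the effect of `categorical=True` concrete, the sketch below recomputes one distance from this pipeline and one from pipeline 2 by hand. The helper `pair_distance` is hypothetical and only mimics the flag's documented behavior (first two coordinates versus last two); it assumes Euclidean distance, which is what the outputs shown in this notebook are consistent with.

```python
from math import dist

def pair_distance(a, b, categorical=False):
    """Euclidean distance on the first two (categorical) or last two values."""
    if categorical:
        return dist(a[:2], b[:2])    # compare (x_cat, y_cat)
    return dist(a[-2:], b[-2:])      # compare (x, y); also fits 2D tuples

# cat vs. dog in the categorical demo (this pipeline)
print(pair_distance((5, 0, 0, 0), (2, 0, 1, 1), categorical=True))   # 3.0

# cat vs. dog in the gradient demo (pipeline 2)
print(round(pair_distance((1, 1, 18, 7), (1, 1, 19, 13)), 6))        # 6.082763
```

Both results agree with the first row of the corresponding `distance` column.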


In [42]:
# Saving to CSV
pairwise_df3.to_csv('outputs/demo-3-distances.csv', index=False)