# Perception experiment data preparation

* Data collected in psychological experiments with dedicated experimental software are complex.
* The data were collected with OpenSesame, which is based on Python; the experiment was implemented in JavaScript so that it could be run online.
* The collected data consist of 4 files, one for each of the 4 versions (groups) of the experiment.
* Each output file contains **210 variables** (most of them generated automatically by the program) and **1230 observations**.
* A few variables of interest are the keyboard responses and reaction times (as well as the time elapsed since the beginning of the experiment) on 4 different scales (described later in this file).

* Design of the experiment:
    * there are 4 groups in the experiment, each seeing a different experimental file
    * each file consists of 4 experimental blocks
    * since race and sex are experimental factors, the 4 blocks are formed as combinations of those factors, with 20 different pictures presented in each block (80 in total):
        * 10 male and 10 female pictures of white faces
        * 10 male and 10 female pictures of black faces 
        * 5 male and 5 female pictures of (1) white and (2) black faces
        * 5 male and 5 female pictures of (1) black and (2) white faces
    * the initial goal of the experiment was to examine whether the experimental design could influence aesthetic evaluations of faces, i.e. whether a framing effect can occur in experimental designs and influence the observed results.
* Four estimation scales were used:
    * (1) beauty - lepo
    * (2) pleasantness - prijatno
    * (3) attractiveness - privlacno
    * (4) harmony - skladno

* the goal of this analysis: to detect whether there is a framing effect as a function of time during the experiment
* there were 88 observations for each subject: 80 experimental trials and 8 practice trials

**DEPENDENT VARIABLES OF INTEREST in the dataset:** 

* **aesthetic estimation** on the four aesthetic scales, scored 1-7 (beauty, pleasantness, attractiveness and harmony):
    * “response_lepo_odg”, “response_prijatno_odg”, “response_privlacno_odg” , “response_skladno_odg”
    
* **response time - rt (time until response):** 
    * “response_time_lepo_odg”, “response_time_prijatno_odg”, “response_time_privlacno_odg”, “response_time_skladno_odg”
    * in psychological experiments, reaction time can be used as an indicator of latent processes that are not captured by participants' simple ratings on the scale
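
Because the four scales differ only in their Serbian stem, the column names above can be generated from a single list rather than typed out. A minimal sketch; the `SCALES` dictionary and its English glosses are our own helper, not something OpenSesame produces:

```python
# English glosses for the Serbian scale stems used in the OpenSesame column names
SCALES = {
    'lepo': 'beauty',
    'prijatno': 'pleasantness',
    'privlacno': 'attractiveness',
    'skladno': 'harmony',
}

# dependent-variable column names built from the stems (dicts keep insertion order)
var_estimation = [f'response_{stem}_odg' for stem in SCALES]
var_rt = [f'response_time_{stem}_odg' for stem in SCALES]

print(var_estimation)
print(var_rt)
```

Keeping the stems in one place means a typo in a column name shows up once instead of in eight hand-written strings.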


**TIME VARIABLES:**
* **presentation order:** value - 3 (the counter starts at zero and the first 4 trials are practice trials, which we delete; the first experimental trial therefore has the value 4, and subtracting 3 gives a 1-based order)
    * variables: “count_lepo_odg”, “count_prijatno_odg”, “count_privlacno_odg”, “count_skladno_odg” (their values appear to be identical across the four scales, so using one of them should be enough)
* **time since the beginning of the experiment, for each scale:** “time_lepo_odg”, “time_prijatno_odg”, “time_privlacno_odg”, “time_skladno_odg”
* **time of the beginning of the experimental block:** “time_blok1”
* **time of the beginning of the experiment:** the first recorded value of “time_fiksacionatacka” (excluding practice trials)
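
The two derived time variables described above amount to simple arithmetic. A minimal sketch, assuming practice trials are already removed and rows are in presentation order within each participant; the function names are ours and the toy values are made up purely to show the arithmetic:

```python
import pandas as pd

N_PRACTICE = 4  # practice trials removed before analysis (counter values 0-3)


def presentation_order(count: pd.Series) -> pd.Series:
    """Turn the zero-based counter into a 1-based order of experimental trials."""
    # first experimental trial has count 4, so subtracting 3 gives order 1
    return count - (N_PRACTICE - 1)


def time_since_start(df: pd.DataFrame, time_col: str,
                     start_col: str = 'time_fiksacionatacka') -> pd.Series:
    """Response time stamp relative to each participant's first start_col value."""
    start = df.groupby('subject_nr')[start_col].transform('first')
    return df[time_col] - start


# toy data (made-up values), two participants with two trials each
toy = pd.DataFrame({
    'subject_nr': [1, 1, 2, 2],
    'count_lepo_odg': [4, 5, 4, 5],
    'time_fiksacionatacka': [1000, 4000, 500, 2500],
    'time_lepo_odg': [3500, 6200, 1800, 4100],
})
order = presentation_order(toy['count_lepo_odg'])
elapsed = time_since_start(toy, 'time_lepo_odg')
print(order.tolist())    # [1, 2, 1, 2]
print(elapsed.tolist())  # [2500, 5200, 1300, 3600]
```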


In [11]:
import pandas as pd
import numpy as np

In [27]:
data = pd.read_excel('v1sve.xlsx')
data.head()

Unnamed: 0,acc,accuracy,average_response_time,avg_rt,background,bidi,blok,canvas_backend,clock_backend,color_backend,...,time_sledieksperiment_4,time_slediproba,time_uputstvo,time_vezba1,title,total_correct,total_response_time,total_responses,uniform_coordinates,width
0,0,0,2337,2337,white,no,,psycho,psycho,psycho,...,,106731,59388,138,New experiment,0,100510,43,yes,1024
1,0,0,2326,2326,white,no,,psycho,psycho,psycho,...,,106731,59388,138,New experiment,0,109303,47,yes,1024
2,0,0,2295,2295,white,no,,psycho,psycho,psycho,...,,106731,59388,138,New experiment,0,117055,51,yes,1024
3,0,0,2253,2253,white,no,,psycho,psycho,psycho,...,,106731,59388,138,New experiment,0,123918,55,yes,1024
4,0,0,2230,2230,white,no,blok2,psycho,psycho,psycho,...,155854.0,106731,59388,138,New experiment,0,131563,59,yes,1024
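
This notebook reads only `v1sve.xlsx`, although the experiment produced one file per group. If all four are to be analyzed together, they could be stacked with a group label. A sketch only: the function name is ours, and the other three file names in the usage comment are hypothetical placeholders:

```python
import pandas as pd


def load_groups(paths, reader=pd.read_excel):
    """Read one output file per experimental group and stack them.

    paths  -- dict mapping a group label to a file path
    reader -- function used to read each file (pd.read_excel for the .xlsx exports)
    """
    frames = [reader(path).assign(group=label) for label, path in paths.items()]
    return pd.concat(frames, ignore_index=True)


# usage (only 'v1sve.xlsx' is known; the other names are hypothetical):
# data_all = load_groups({'v1': 'v1sve.xlsx', 'v2': 'v2sve.xlsx',
#                         'v3': 'v3sve.xlsx', 'v4': 'v4sve.xlsx'})
```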


In [28]:
### we should select variables of interest ###

# information about participants
var_subject = ['subject_nr']
# information about stimuli (gender and race of the face, photo nr., specific stimuli block mark)
var_stimulus = ['pol', 'rasa', 'fotografija', 'miks']
# information about the experimental block (blok1-blok4; empty (NaN) for practice trials)
var_expblock = ['blok']

# DEPENDENT VARIABLES 
# dependent variables - aesthetic estimation on 4 different scales (beauty, pleasant, attractive and harmonious)
var_estimation = ['response_lepo_odg', 'response_prijatno_odg', 'response_privlacno_odg' , 'response_skladno_odg']
# dependent variables - reaction time on 4 different scales (beauty, pleasant, attractive and harmonious)
var_rt = ['response_time_lepo_odg', 'response_time_prijatno_odg', 'response_time_privlacno_odg', 'response_time_skladno_odg']

# TIME VARIABLES
# order of appearance (response counter, i.e. order of exposure)
var_order = ['count_lepo_odg', 'count_prijatno_odg', 'count_privlacno_odg', 'count_skladno_odg']
# exact time of the given response for different scales (it could be used as time variable)
var_from_beggining = ['time_lepo_odg', 'time_prijatno_odg', 'time_privlacno_odg', 'time_skladno_odg']
# # time of the beginning of the experimental block (if we want to calculate the exact time of response within a block)
# var_begg_expblock = ['time_blok1']
# # time of the beginning of the whole experiment (if we want to calculate the exact time of response within the experiment)
# var_begg_exp = ['time_fiksacionatacka']

In [29]:
all_vars = var_subject + var_stimulus + var_expblock + var_estimation + \
           var_rt + var_order + var_from_beggining  # + var_begg_expblock + var_begg_exp
print(all_vars)

['subject_nr', 'pol', 'rasa', 'fotografija', 'miks', 'blok', 'response_lepo_odg', 'response_prijatno_odg', 'response_privlacno_odg', 'response_skladno_odg', 'response_time_lepo_odg', 'response_time_prijatno_odg', 'response_time_privlacno_odg', 'response_time_skladno_odg', 'count_lepo_odg', 'count_prijatno_odg', 'count_privlacno_odg', 'count_skladno_odg', 'time_lepo_odg', 'time_prijatno_odg', 'time_privlacno_odg', 'time_skladno_odg']


In [30]:
data_used = data[all_vars]

In [31]:
data_used

Unnamed: 0,subject_nr,pol,rasa,fotografija,miks,blok,response_lepo_odg,response_prijatno_odg,response_privlacno_odg,response_skladno_odg,...,response_time_privlacno_odg,response_time_skladno_odg,count_lepo_odg,count_prijatno_odg,count_privlacno_odg,count_skladno_odg,time_lepo_odg,time_prijatno_odg,time_privlacno_odg,time_skladno_odg
0,17,v,v,vBF1.jpg,v,,5,5,5,5,...,7205,6053,0,0,0,0,127347,124236,110957,118179
1,17,v,v,VWM3.jpg,v,,3,3,4,3,...,3548,2243,1,1,1,1,138446,137098,131291,134852
2,17,v,v,VBM2.jpg,v,,2,2,3,2,...,3432,1936,2,2,2,2,147132,146017,140626,144071
3,17,v,v,vWF5.jpg,v,,4,4,4,4,...,1983,1038,3,3,3,3,153139,151991,148946,150943
4,17,f,b,BF08.jpg,1b.56,blok2,3,3,4,3,...,3119,905,4,4,4,4,166501,165502,161459,164587
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1233,181,f,b,BF05.jpg,1b.56,blok2,4,4,4,4,...,778,270,80,80,80,80,230910,230676,229611,230395
1234,181,m,b,BM01.jpg,1b.56,blok2,4,4,4,4,...,673,269,81,81,81,81,232892,232692,231725,232408
1235,181,f,b,BF08.jpg,1b.56,blok2,4,4,4,4,...,3657,282,82,82,82,82,238040,237790,233828,237497
1236,181,f,b,BF10.jpg,1b.56,blok2,4,4,4,4,...,836,234,83,83,83,83,240156,239889,238790,239640


In [32]:
# let's review our data

# missing values 
data_used.isna().sum()

subject_nr                      0
pol                             0
rasa                            0
fotografija                     0
miks                            0
blok                           60
response_lepo_odg               0
response_prijatno_odg           0
response_privlacno_odg          0
response_skladno_odg            0
response_time_lepo_odg          0
response_time_prijatno_odg      0
response_time_privlacno_odg     0
response_time_skladno_odg       0
count_lepo_odg                  0
count_prijatno_odg              0
count_privlacno_odg             0
count_skladno_odg               0
time_lepo_odg                   0
time_prijatno_odg               0
time_privlacno_odg              0
time_skladno_odg                0
dtype: int64
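
Only `blok` has missing values. Before relying on `dropna()` to remove the practice trials, one can check that these NaNs coincide exactly with the rows coded as `'v'` in `pol`. A minimal sketch (the function name is ours):

```python
import pandas as pd


def practice_coding_consistent(df: pd.DataFrame) -> bool:
    """True if 'blok' is missing exactly on the rows coded as practice ('v') in 'pol'."""
    practice = df['pol'] == 'v'
    return bool((df['blok'].isna() == practice).all())


# usage on the notebook's data:
# practice_coding_consistent(data_used)
```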

In [46]:
# number of recorded trials per subject
data_used.groupby('subject_nr').count()['response_lepo_odg']

subject_nr
0       6
3      88
8      88
9      88
10     88
11     88
14     88
15     88
16     88
17     88
18     88
81     88
161    88
171    88
181    88
Name: response_lepo_odg, dtype: int64

In [47]:
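# drop all practice trials: they are marked as 'v' in some variables (e.g. 'pol') and as NaN in 'blok';
# since 'blok' is the only column with missing values, dropna() removes exactly these rows
df = data_used.dropna()
# drop the unfinished case (subject nr. 0, only 6 recorded trials)
df = df[df['subject_nr'] != 0]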
# review number of measures for each participant
df.groupby('subject_nr').count()['response_lepo_odg']

subject_nr
3      84
8      84
9      84
10     84
11     84
14     84
15     84
16     84
17     84
18     84
81     84
161    84
171    84
181    84
Name: response_lepo_odg, dtype: int64
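
`dropna()` removes every row with a NaN in *any* selected column, and subject 0 is excluded by hand. A more explicit alternative, sketched here as an option rather than the method used above, filters on `blok` and keeps only participants with the full 84 experimental trials seen in the output (the function name is ours):

```python
import pandas as pd


def keep_complete_experimental_trials(df: pd.DataFrame, n_trials: int = 84) -> pd.DataFrame:
    """Drop practice trials (missing 'blok') and participants without n_trials experimental trials."""
    exp = df[df['blok'].notna()]
    # number of experimental trials of each row's participant
    per_subject = exp.groupby('subject_nr')['blok'].transform('count')
    return exp[per_subject == n_trials]


# usage: df = keep_complete_experimental_trials(data_used)
```

This way an incomplete participant is dropped because of missing data, not because of a hard-coded ID.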

In [43]:
# review number of measures for each block
df.groupby('blok').count()['response_lepo_odg']

blok
blok1    294
blok2    294
blok3    294
blok4    294
Name: response_lepo_odg, dtype: int64
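
Both totals fit a balanced design (84 = 4 × 21 trials per participant, 294 = 14 × 21 trials per block), but totals alone cannot rule out a participant having, say, 22 trials in one block and 20 in another. A cross-tabulation checks every participant-block cell; a sketch (function names are ours):

```python
import pandas as pd


def trials_per_block(df: pd.DataFrame) -> pd.DataFrame:
    """Participants x blocks table of trial counts."""
    return pd.crosstab(df['subject_nr'], df['blok'])


def is_balanced(table: pd.DataFrame) -> bool:
    """True if every participant has the same number of trials in every block."""
    values = table.to_numpy()
    return bool((values == values[0, 0]).all())


# usage: is_balanced(trials_per_block(df))
```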

The data table now looks clean. Next, we split it into separate time-series files, one for each participant and each scale.
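
One way to think about this split is to first stack the four scales into a single long table with a `scale` column, so that every (participant, scale) pair becomes one group. This is a sketch of an alternative layout, not what the cells below do; the function and the `time`/`rt` column names are ours:

```python
import pandas as pd

SCALE_STEMS = ['lepo', 'prijatno', 'privlacno', 'skladno']


def to_long(df: pd.DataFrame, scales=SCALE_STEMS) -> pd.DataFrame:
    """Stack the per-scale time and RT columns into one table with a 'scale' column."""
    parts = []
    for stem in scales:
        part = (df[['subject_nr', f'time_{stem}_odg', f'response_time_{stem}_odg']]
                .rename(columns={f'time_{stem}_odg': 'time',
                                 f'response_time_{stem}_odg': 'rt'})
                .assign(scale=stem))
        parts.append(part)
    return pd.concat(parts, ignore_index=True)


# each (participant, scale) pair is then one time series:
# for (subject, scale), ts in to_long(df).groupby(['subject_nr', 'scale']):
#     ...
```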

In [58]:
# choosing relevant variables 
df_chosen = df[['subject_nr', 
                'response_time_lepo_odg', 'response_time_prijatno_odg', 'response_time_privlacno_odg', 'response_time_skladno_odg', 
                'time_lepo_odg', 'time_prijatno_odg', 'time_privlacno_odg', 'time_skladno_odg']]
# df_chosen

In [66]:
df_chosen[df_chosen['subject_nr'] == 17]

Unnamed: 0,subject_nr,response_time_lepo_odg,response_time_prijatno_odg,response_time_privlacno_odg,response_time_skladno_odg,time_lepo_odg,time_prijatno_odg,time_privlacno_odg,time_skladno_odg
4,17,2629,992,3119,905,166501,165502,161459,164587
5,17,275,183,2648,843,173357,173173,169662,172325
6,17,285,587,1545,644,176967,176370,174172,175719
7,17,290,353,2639,427,181228,180861,177783,180428
8,17,367,195,1173,207,183673,183473,182076,183257
...,...,...,...,...,...,...,...,...,...
83,17,207,228,623,482,362219,361986,360854,361487
84,17,211,219,1109,194,364515,364282,362968,364082
85,17,230,204,1275,217,366978,366761,365247,366529
86,17,206,202,853,253,369074,368858,367727,368592


In [112]:
import os

# list of participant IDs
participants = list(df_chosen['subject_nr'].unique())
print(participants)

# ### loop saving one .csv file per participant and scale ###
# columns are renamed to mjd (time since the start of the experiment, ms),
# mag (response time, ms) and magerr (a constant placeholder error of 0.05)
scales = ['lepo', 'prijatno', 'privlacno', 'skladno']
os.makedirs('timeseries', exist_ok=True)
for participant in participants:
    subset = df_chosen[df_chosen['subject_nr'] == participant]
    for scale in scales:
        ts = (subset[[f'time_{scale}_odg', f'response_time_{scale}_odg']]
              .assign(err=0.05)
              .set_axis(['mjd', 'mag', 'magerr'], axis=1))
        ts.to_csv(f'timeseries/{scale}{participant}.csv', index=False)

[17, 8, 10, 15, 14, 9, 11, 81, 16, 18, 161, 3, 171, 181]
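
The column names `mjd`, `mag` and `magerr` follow the convention of astronomical light curves (Modified Julian Date, magnitude, magnitude error). Presumably the downstream time-series tool expects them, which would also explain the constant 0.05 error. Such tools typically assume time-ordered observations, so a quick check of each written file could look like this (a sketch; the function name is ours):

```python
import pandas as pd


def check_timeseries(ts: pd.DataFrame) -> list:
    """Return a list of problems found in one mjd/mag/magerr table (empty if none)."""
    if list(ts.columns) != ['mjd', 'mag', 'magerr']:
        return ['unexpected columns']
    problems = []
    # is_monotonic_increasing allows ties, so duplicates are checked separately
    if not ts['mjd'].is_monotonic_increasing:
        problems.append('mjd not sorted')
    if ts['mjd'].duplicated().any():
        problems.append('duplicate mjd values')
    if ts.isna().any().any():
        problems.append('missing values')
    return problems


# usage: check_timeseries(pd.read_csv('timeseries/lepo17.csv'))
```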


In [118]:
# # plotting data for one participant
# par_no=161
# example_lepo = pd.read_csv(f'timeseries/lepo{par_no}.csv')
# example_prijatno = pd.read_csv(f'timeseries/prijatno{par_no}.csv')
# example_privlacno = pd.read_csv(f'timeseries/privlacno{par_no}.csv')
# example_skladno = pd.read_csv(f'timeseries/skladno{par_no}.csv')
# example_lepo.plot.line('mjd', 'mag')
# example_prijatno.plot.line('mjd', 'mag')
# example_privlacno.plot.line('mjd', 'mag')
# example_skladno.plot.line('mjd', 'mag')



# Preparing data for the analysis of the stimuli

It is interesting to see what the time series look like if we organize them by stimulus, using the order of presentation as the time axis, instead of by participant.

In [136]:
# choosing relevant variables (.copy() makes an independent DataFrame, so adding
# the 'tip' column below does not trigger a SettingWithCopyWarning)
df_chosen_stim = df[['fotografija', 'pol', 'rasa',
                     'response_time_lepo_odg', 'response_time_prijatno_odg', 'response_time_privlacno_odg', 'response_time_skladno_odg',
                     'count_lepo_odg', 'count_prijatno_odg', 'count_privlacno_odg', 'count_skladno_odg']].copy()
# stimulus type = gender + race of the face, e.g. 'fb' = female, black
df_chosen_stim['tip'] = df_chosen_stim['pol'] + df_chosen_stim['rasa']
df_chosen_stim


Unnamed: 0,fotografija,pol,rasa,response_time_lepo_odg,response_time_prijatno_odg,response_time_privlacno_odg,response_time_skladno_odg,count_lepo_odg,count_prijatno_odg,count_privlacno_odg,count_skladno_odg,tip
4,BF08.jpg,f,b,2629,992,3119,905,4,4,4,4,fb
5,BF10.jpg,f,b,275,183,2648,843,5,5,5,5,fb
6,BM07.jpg,m,b,285,587,1545,644,6,6,6,6,mb
7,BM02.jpg,m,b,290,353,2639,427,7,7,7,7,mb
8,BM05.jpg,m,b,367,195,1173,207,8,8,8,8,mb
...,...,...,...,...,...,...,...,...,...,...,...,...
1233,BF05.jpg,f,b,241,230,778,270,80,80,80,80,fb
1234,BM01.jpg,m,b,264,196,673,269,81,81,81,81,mb
1235,BF08.jpg,f,b,213,239,3657,282,82,82,82,82,fb
1236,BF10.jpg,f,b,201,255,836,234,83,83,83,83,fb


In [137]:
## checking
# df_chosen_stim[df['fotografija']=='BF09.jpg']

In [130]:
import os

# list of stimulus file names
stimuli = list(df_chosen_stim['fotografija'].unique())
print(stimuli)

# ### loop saving one .csv file per stimulus and scale ###
# columns: mjd (presentation position, count_*), mag (response time, ms),
# magerr (a constant placeholder error of 0.05)
scales = ['lepo', 'prijatno', 'privlacno', 'skladno']
os.makedirs('timeseries2', exist_ok=True)
for stimulus in stimuli:
    subset = df_chosen_stim[df_chosen_stim['fotografija'] == stimulus]
    for scale in scales:
        ts = (subset[[f'count_{scale}_odg', f'response_time_{scale}_odg']]
              .assign(err=0.05)
              .set_axis(['mjd', 'mag', 'magerr'], axis=1))
        ts.to_csv(f'timeseries2/{scale}{stimulus}.csv', index=False)

['BF08.jpg', 'BF10.jpg', 'BM07.jpg', 'BM02.jpg', 'BM05.jpg', 'BF02.jpg', 'BM01.jpg', 'BF07.jpg', 'BM06.jpg', 'BF05.jpg', 'BF03.jpg', 'BF04.jpg', 'BF01.jpg', 'BM04.jpg', 'BF06.jpg', 'BM10.jpg', 'BF09.jpg', 'BM08.jpg', 'BM09.jpg', 'BM03.jpg', 'WM04.jpg', 'WM08.jpg', 'WM01.jpg', 'WF03.jpg', 'WM09.jpg', 'WM02.jpg', 'WM05.jpg', 'WF05.jpg', 'WM03.jpg', 'WF02.jpg', 'WF04.jpg', 'WF06.jpg', 'WF08.jpg', 'WF09.jpg', 'WF07.jpg', 'WM06.jpg', 'WF01.jpg', 'WF10.jpg', 'WM07.jpg', 'WM10.jpg', 'WM17.jpg', 'BM14.jpg', 'BF13.jpg', 'BF12.jpg', 'WF19.jpg', 'BF14.jpg', 'BM15.jpg', 'BF15.jpg', 'WF16.jpg', 'WM18.jpg', 'WF17.jpg', 'BM12.jpg', 'WM20.jpg', 'WF18.jpg', 'BM13.jpg', 'BM11.jpg', 'WM19.jpg', 'WM16.jpg', 'BF11.jpg', 'WF20.jpg', 'BM20.jpg', 'BF19.jpg', 'WM15.jpg', 'BM17.jpg', 'WF13.jpg', 'WM12.jpg', 'BF20.jpg', 'BM19.jpg', 'BF17.jpg', 'BF18.jpg', 'WF14.jpg', 'WM11.jpg', 'WM13.jpg', 'WF11.jpg', 'BM18.jpg', 'WM14.jpg', 'BM16.jpg', 'WF12.jpg', 'WF15.jpg', 'BF16.jpg']


There seem to be too few measurements per stimulus (14-16) for a time-series analysis, so we decided to group the stimuli by the gender and race factors and analyze the groups instead.
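
The number of measurements per stimulus, and per stimulus type after grouping, can be read off directly. A sketch, assuming `df_chosen_stim` as defined above (the function name is ours):

```python
import pandas as pd


def measures_per_stimulus(df: pd.DataFrame, col: str = 'fotografija') -> pd.Series:
    """Number of recorded trials for each value of col."""
    return df.groupby(col).size()


# usage:
# measures_per_stimulus(df_chosen_stim).agg(['min', 'max'])  # per picture
# measures_per_stimulus(df_chosen_stim, col='tip')           # per gender x race group
```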

# Organizing stimuli by grouping

In [139]:
# let's try to group the stimuli
import os

# list of stimulus types (pol + rasa, e.g. 'fb' = female, black)
stimuli_tip = list(df_chosen_stim['tip'].unique())
print(stimuli_tip)

# ### loop saving one .csv file per stimulus type and scale ###
# columns: mjd (presentation position, count_*), mag (response time, ms),
# magerr (a constant placeholder error of 0.05)
scales = ['lepo', 'prijatno', 'privlacno', 'skladno']
os.makedirs('timeseries3', exist_ok=True)
for stimulus_tip in stimuli_tip:
    subset = df_chosen_stim[df_chosen_stim['tip'] == stimulus_tip]
    for scale in scales:
        ts = (subset[[f'count_{scale}_odg', f'response_time_{scale}_odg']]
              .assign(err=0.05)
              .set_axis(['mjd', 'mag', 'magerr'], axis=1))
        ts.to_csv(f'timeseries3/{scale}_{stimulus_tip}.csv', index=False)

['fb', 'mb', 'mw', 'fw']
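
In the grouped files `mjd` is the presentation position, so the same position appears many times (once per participant and stimulus) and the rows are not in position order. If the downstream tool needs one observation per time point, one option, which is an assumption on our part and not what the notebook does, is to average the response times per position and use their standard error as `magerr`:

```python
import pandas as pd


def aggregate_by_position(ts: pd.DataFrame) -> pd.DataFrame:
    """Collapse repeated positions: mean RT per position, standard error as magerr.

    ts -- table with columns mjd (presentation position) and mag (response time)
    Positions observed only once have an undefined standard error (NaN); these
    fall back to the notebook's constant placeholder of 0.05.
    """
    grouped = ts.groupby('mjd')['mag']
    out = pd.DataFrame({'mag': grouped.mean(), 'magerr': grouped.sem()})
    out['magerr'] = out['magerr'].fillna(0.05)
    return out.reset_index()  # groupby sorts the positions, so mjd is increasing


# usage: aggregate_by_position(pd.read_csv('timeseries3/lepo_fb.csv'))
```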
