# Crossmodality experiment - trial list generator

The code below generates a dataframe that gets saved as a csv file. We generate the trial information for a single participant, based on the condition they are assigned to. The information for each participant notes the target domains and tokens and the distractors for each trial, along with the particular stimuli used and their locations on the experiment page. 

In this way, we have all the experiment information for 1 participant, which is fed into the code that produces the experiment interface.

Go through each function and try to figure out what it does. Each line is commented to show you what is happening, but how do the lines fit together to produce the end result?

If you can't figure out what a line or section is doing, try adding a ```print``` statement beneath it.

e.g. 

```
selector_code = df.loc[(row, col1)] + str(selector)

```

Try adding a print statement:
```
selector_code = df.loc[(row, col1)] + str(selector)
print(selector_code)
```

to figure out what it does.
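
If you want to try this outside the full function, here is a minimal sketch. The toy data frame and the values of `row`, `col1` and `selector` are made up for illustration; they only stand in for what the real function receives.

```python
import pandas as pd

# toy data frame standing in for the real one (hypothetical values)
df = pd.DataFrame({"Inducer": ["Pitch", "Shape"],
                   "Concurrent": ["Affect", "Amp"]})

# hypothetical inputs: first row, the Inducer column, selector tag 1
row, col1, selector = 0, "Inducer", 1

# look up the cell at (row, col1) and append the selector tag as text
selector_code = df.loc[(row, col1)] + str(selector)

# printing the intermediate value shows what the line produced
print(selector_code)
```

Printing after each step like this is a quick way to see how a value changes as it moves through a function.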

And don't be afraid to ask for help if you need it!

In [1]:
# import necessary modules

# you can import a whole module
# e.g. import itertools

# or you can import specific methods from those modules
# e.g. from copy import deepcopy

# you can also shorten the handle of a module, so you don't have to type out the full name every time
# e.g. import pandas as pd

# do a google search for each of these modules to find out what they are for

import itertools
import pandas as pd
import numpy as np
from copy import deepcopy
import random

In [2]:
# sets for each domain
# this is a dictionary that takes the form {domain: [list of sets]}

sets = {"Pitch": ["Sine","Sung","Piano","Whistle",],
"Amp": ["Sine","Sung", "Piano", "Whistle"],
"Noise":["Sine","Sung","Piano", "Whistle"],
"Shape":["BK1", "BK2","BK3", "BK4"],
"Size":["Circles", "Squares", "Triangles", "Diamonds"],
"Brightness":["Circles", "Squares", "Triangles", "Diamonds"],
"Color": ["RG", "RY","YB", "RB"],
"Speed":["SP1", "SP2", "SP3", "SP4"],
"Affect":["HS", "SC", "EB", "PD"]}


In [3]:
# cross-modality domains, taken as the keys from the dictionary above
domains = sorted(list(sets.keys()))

# print it and we can see!
print(domains)

['Affect', 'Amp', 'Brightness', 'Color', 'Noise', 'Pitch', 'Shape', 'Size', 'Speed']


In [4]:
# conditions for the experiment, as groups of focal features
conditions = [["Pitch", "Shape", "Affect"],
["Amp", "Size", "Speed"],
["Noise", "Brightness", "Color"],
["Pitch", "Size", "Color"],
["Brightness", "Amp", "Affect"],
["Noise", "Shape", "Speed"]]

In [5]:
def create_combinations_df(focals, non_focals):
    
    '''
    This function creates possible combinations of focal domains (i.e
    those assigned in a condition) and the other domains.
    
    e.g. pitch with shape, pitch with size, pitch with brightness etc.
    
    We can use this to create lists of inducers (the stimuli participants will
    respond to) and lists of concurrents (the options participants can use to 
    respond)
    
    Takes as input a list of focal domains (a condition) and a list of 
    non-focals (all other domains)
    
    '''
    
    # create list of combinations between non-focals and focals
    # this returns an iterator, converted to a list
    # giving tuples of (non-focal, focal)
    # e.g. (Affect, Pitch)
    combination_list = list(itertools.product(non_focals, focals))

    # split tuples into a list of inducers and a list of concurrents
    concurrent_list, inducer_list = list(zip(*combination_list))
    
    # convert those lists to data series, with labels
    # concatenate into a data frame
    combination_df = pd.concat([pd.Series(inducer_list, name = "Inducer"),
                               pd.Series(concurrent_list, name = "Concurrent")],
                              axis = 1)
    
    # use only rows where inducer and concurrent are different
    combination_df = combination_df[combination_df["Inducer"]!=
                                   combination_df["Concurrent"]]

    # return resulting data frame
    return combination_df

Running the function in the example below, we can see how it works. <br>
First, it creates each combination of the focal domains (of the condition) with all other domains. It removes instances where the two are the same, because we don't want to test that.

Then, it creates a dataframe where each focal domain (target domain, in the **inducer** column) is paired up with every other domain (in the **concurrent** column). Note that the focal domains only appear twice in the **concurrent** column. This is because a domain is never paired with itself (e.g. Affect cannot be paired with Affect, so Affect is only listed as a concurrent for Pitch and Shape).
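
The pairing and splitting steps described above can be sketched in isolation, without pandas. The shortened `focals` and `non_focals` lists here are made up for illustration, and the list comprehension is a stand-in for the data frame filter, not the function's actual code.

```python
import itertools

focals = ["Pitch", "Shape"]                # a shortened, hypothetical condition
non_focals = ["Affect", "Pitch", "Shape"]  # a shortened list of all domains

# every (non-focal, focal) pair, in the same argument order the function uses
pairs = list(itertools.product(non_focals, focals))

# zip(*pairs) "unzips" the pairs into two parallel tuples
concurrents, inducers = zip(*pairs)

# keep only the pairs whose two members differ
kept = [(i, c) for i, c in zip(inducers, concurrents) if i != c]

print(pairs)
print(kept)
```

Try changing the two lists and see how the number of pairs changes before and after the filter.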

Below, I add the ```.reset_index(drop=True) ``` command to reorganise the index of the data frame so that it is consistent for later referencing.

In [6]:
# example using above function
# takes as input a condition (focal domains) and then the list of domains (non-focals/all other domains)
df_create_combinations_ex = create_combinations_df(conditions[0], domains).reset_index(drop=True)

# example data frame
df_create_combinations_ex

Unnamed: 0,Inducer,Concurrent
0,Pitch,Affect
1,Shape,Affect
2,Pitch,Amp
3,Shape,Amp
4,Affect,Amp
5,Pitch,Brightness
6,Shape,Brightness
7,Affect,Brightness
8,Pitch,Color
9,Shape,Color


In [8]:
def add_token_selection(df, col1, col2, selector):

    '''
    This function adds a number to inducers and concurrents for later choosing tokens fit to each domain.
    These numbers are used later in create_stim_list to evenly match focal tokens to focal domains
    Takes as input a dataframe, 2 column names (inducer/concurrent) and a selector tag (usu. 1 or 2).
    
    Can do this for a dataframe of inducers or a dataframe of concurrents
    '''

    # col1 is column of interest - inducer, for inducer df
    df[col1 + "Token"] = pd.Series(np.random.randn(len(df)), index = df.index)

    # col2 is other column, e.g. concurrent
    df[col2 + "Token"] = pd.Series(np.random.randn(len(df)), index = df.index)

    #for each row, token that goes into the token column is col1 + selector tag, or token is col2 value
    # we are using the selector tags 1 and 2
    for row in range(len(df)):
        
        # create string for token
        # code is col1 value + selector tag
        selector_code = df.loc[(row, col1)] + str(selector)
        
        # add token to each column
        df.loc[(row, col1 + "Token")] = selector_code
        # code for col2Token is col2 value
        df.loc[(row, col2 + "Token")] = df.loc[(row, col2)]

    # return resulting data frame
    return df

In the example below, we can see how this function works. It simply copies the Inducer name and adds the selector tag (1 here). This may seem very simple, but in later functions we call it in a loop to attach different selector tags (1 and 2), and to do this for both Inducer tokens and Concurrent tokens.

In [9]:
# example run of above function
df_add_token_selection_ex = add_token_selection(df_create_combinations_ex, "Inducer", "Concurrent", 1)

# example df
df_add_token_selection_ex

Unnamed: 0,Inducer,Concurrent,InducerToken,ConcurrentToken
0,Pitch,Affect,Pitch1,Affect
1,Shape,Affect,Shape1,Affect
2,Pitch,Amp,Pitch1,Amp
3,Shape,Amp,Shape1,Amp
4,Affect,Amp,Affect1,Amp
5,Pitch,Brightness,Pitch1,Brightness
6,Shape,Brightness,Shape1,Brightness
7,Affect,Brightness,Affect1,Brightness
8,Pitch,Color,Pitch1,Color
9,Shape,Color,Shape1,Color
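
As a point of comparison, the same token columns can be built without a row-by-row loop, because pandas can add a string to a whole column at once. This is only a sketch of an alternative: `add_token_selection_vectorised` is a hypothetical name, not a function used elsewhere in this notebook. Unlike the original, it returns a copy and leaves the input untouched.

```python
import pandas as pd

def add_token_selection_vectorised(df, col1, col2, selector):
    # hypothetical alternative to add_token_selection, for illustration only
    out = df.copy()  # work on a copy so the input data frame is not modified
    # append the selector tag to every value in col1 in one step
    out[col1 + "Token"] = out[col1] + str(selector)
    # the col2 token is just a copy of the col2 value
    out[col2 + "Token"] = out[col2]
    return out

# toy input with made-up rows
toy = pd.DataFrame({"Inducer": ["Pitch", "Shape"],
                    "Concurrent": ["Affect", "Amp"]})

result = add_token_selection_vectorised(toy, "Inducer", "Concurrent", 1)
print(result)
```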


In [10]:
# takes as input: a dataframe, a column name, and selector codes

def format_df_tokens_and_locations(df, tag, selectors):
    
    '''
    Adds domain tags (for use later) and location tags.
    Takes a single data frame, a tag for the focal columns, and a list of selector tags (usu. [1,2]).
    
    '''
    # duplicate dataframe
    # use deepcopy, otherwise changes will occur to both dataframes
    df1 = df
    df2 = deepcopy(df1)
    
    # put dataframes in list, so we can loop over the two dfs
    dframes = [df1, df2]
    
    # choose order of tags
    # if the input tag is INDUCER, the other tag is CONCURRENT, and vice versa
    if tag == "Inducer":
        tag2 = "Concurrent"
    else:
        tag2 = "Inducer"
        
    # create location lists that will match high low values for each stimulus item
    left_right = ["L", "H"]
    # multiply list above by 12 to give a list of 24 L/H values
    left_right2 = left_right * 12
    
    # create list to hold selected tokens
    df_tokens = [] 
    
    # for each data frame:
    for frame in range(len(dframes)):
        
        # create token columns using function above
        df_token = add_token_selection(dframes[frame], tag, tag2, selectors[frame])

        # sort alphabetically by inducer/concurrent - this is just for readability
        df_token.sort_values(by = tag, inplace = True)

        # shuffle the lists of L/H values so that they are in random order
        random.shuffle(left_right)
        random.shuffle(left_right2)
        
        # add location columns with location tags
        df_token[tag + "Loc"] = pd.Series(left_right*12, index = df_token.index)
        df_token[tag2 + "Loc"] = pd.Series(left_right2, index = df_token.index)
        
        # add data frame to a list, so the function can return more than 1 dataframe
        df_tokens.append(df_token)
    
    # return resulting data frames
    return df_tokens[0], df_tokens[1]

In the example below, we can see how this function works. Here it produces 2 inducer data frames, calling the ```add_token_selection``` function internally. It also adds markers (L or H) to signal the value of each stimulus item for the inducer and concurrent tokens.

Why do you think this function produces 2 dataframes?

In [11]:
# run example of above function
df_format_ex = format_df_tokens_and_locations(df_add_token_selection_ex, "Inducer", [1,2])


In [12]:
# example df 1
df_format_ex[0]

Unnamed: 0,Inducer,Concurrent,InducerToken,ConcurrentToken,InducerLoc,ConcurrentLoc
23,Affect,Speed,Affect1,Speed,H,H
20,Affect,Size,Affect1,Size,L,H
4,Affect,Amp,Affect1,Amp,H,L
17,Affect,Shape,Affect1,Shape,L,H
7,Affect,Brightness,Affect1,Brightness,H,H
15,Affect,Pitch,Affect1,Pitch,L,L
13,Affect,Noise,Affect1,Noise,H,L
10,Affect,Color,Affect1,Color,L,L
21,Pitch,Speed,Pitch1,Speed,H,H
18,Pitch,Size,Pitch1,Size,L,L
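
The two location columns are filled in slightly different ways, which this small sketch tries to make visible. It reuses the list names from the function but runs on its own, assuming 24 rows as in the example above.

```python
import random

left_right = ["L", "H"]
left_right2 = left_right * 12  # 24 values: 12 L and 12 H

# shuffling the 2-item list only decides whether the pattern starts with L or H
random.shuffle(left_right)
# shuffling the 24-item list puts the L and H values in a random order
random.shuffle(left_right2)

# the first location column repeats the 2-item list, so it strictly alternates
alternating = left_right * 12

# both lists stay balanced, whatever the shuffle does
print(alternating.count("L"), alternating.count("H"))
print(left_right2.count("L"), left_right2.count("H"))
```

Either way, each frame ends up with an equal number of L and H values in each location column.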


In [13]:
# example df 2
df_format_ex[1]

Unnamed: 0,Inducer,Concurrent,InducerToken,ConcurrentToken,InducerLoc,ConcurrentLoc
23,Affect,Speed,Affect2,Speed,L,L
20,Affect,Size,Affect2,Size,H,H
4,Affect,Amp,Affect2,Amp,L,H
17,Affect,Shape,Affect2,Shape,H,H
7,Affect,Brightness,Affect2,Brightness,L,L
15,Affect,Pitch,Affect2,Pitch,H,L
13,Affect,Noise,Affect2,Noise,L,L
10,Affect,Color,Affect2,Color,H,H
21,Pitch,Speed,Pitch2,Speed,L,L
18,Pitch,Size,Pitch2,Size,H,L
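
The comment in the function says that ```deepcopy``` is needed so that changes do not occur in both data frames. Here is a minimal sketch of that difference, using a made-up one-column data frame.

```python
from copy import deepcopy
import pandas as pd

original = pd.DataFrame({"Inducer": ["Pitch", "Shape"]})

alias = original                  # a second name for the SAME data frame
independent = deepcopy(original)  # a separate copy with its own data

# adding a column through the alias also changes the original
alias["InducerToken"] = alias["Inducer"] + "1"

print(list(original.columns))     # the new column shows up here too
print(list(independent.columns))  # the deep copy is unaffected
```

This is why the function can give the two frames different selector tags without one overwriting the other.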


In [33]:
def create_stim_list(dimensions):
    
    '''
    Builds the stimulus list, assigning tokens and locations.
    Takes as input only the condition as a list of dimensions.
    
    '''
    
    # declare domains as a global variable
    # (this is optional when we only read a module-level variable, but it makes the dependency explicit)
    global domains
    
    # create data frames giving combinations of inducers/concurrents
    inducer_df = create_combinations_df(dimensions, domains).reset_index(drop=True)
    concurrent_df = create_combinations_df(domains,dimensions).reset_index(drop=True)
    
    # create 2 full inducer frames, with locations and token markers
    inducers1, inducers2 = format_df_tokens_and_locations(inducer_df, "Inducer", [1,2])
    
    # create 2 full concurrent frames, with locations and token markers
    concurrents1, concurrents2 = format_df_tokens_and_locations(concurrent_df, "Concurrent",[3,4])
    
    # concatenate all data frames into one big df
    all_trials = pd.concat([inducers1, inducers2, concurrents1, concurrents2], ignore_index = True)
    
    # create list of focal tokens from focal dimensions
    # so all tokens from focal dimensions are picked out once
    # but order is shuffled
    FocalTokens = []
    # loop through all focal domains
    # dimensions = focal domains for that condition
    # I sort the dimension list so we know what order it is in
    for f in sorted(dimensions):
        # randomly shuffles the tokens for each domain
        # does this by creating a random sample from the tokens that is the size of the token list
        # does not allow replacement, because we don't want any token to be selected twice
        f_tokens = np.random.choice(sets[f], size = len(sets[f]), replace = False)
        # add to focal token list
        FocalTokens += list(f_tokens)
    
    # for non-focal tokens:
    # for each domain, randomly select one (and ONLY one) non-focal token that goes with that domain
    # e.g. {Pitch: ["Sine","Sung","Piano","Whistle"]} - "Sung" may be randomly selected
    NonFocalTokens = [np.random.choice(sets[d]) for d in domains]
    
    # create a list of focal tags that map onto the tokens we added earlier
    # e.g. Affect1, Pitch1
    FocalColumns = []
    # for each focal domain (i.e. in condition)
    for d in sorted(dimensions):
        # for number in 1,2,3,4
        for i in range(1,5):
            # add to the empty list FocalColumns the domain + each number
            # then we have a list of e.g. Pitch1, Pitch2, Pitch3, Pitch4
            FocalColumns.append(d + str(i))
            
    # concat columns and tokens, so we have a list of columns
    # list of columns is all focal columns + all non-focal domains
    # list of tokens is focal token + non-focal tokens
    
    # allcolumns = focal domains * 4 and the rest of the other domains
    AllColumns = FocalColumns + domains
    # all tokens = the focal tokens we selected + the randomly selected non-focal tokens
    AllTokens = FocalTokens + NonFocalTokens
    
    
    # THIS IS A BIT COMPLICATED! AARGH!
    
    # we are going to create a dictionary that will map  AllColumns to AllTokens
    # zip(listA, listB) creates pairs of the 1st elements of listA with the 1st element of listB...
    # ...the 2nd element of listA with the 2nd element of listB, etc
    # so zipping the focal columns with focal tokens matches columns to tokens
    zipped_cols_and_tokens = zip(AllColumns, AllTokens)
    
    # and then it takes those pairs (listA[0], listB[0]), and puts them into dictionary format; e.g. {listA[0]: listB[0]}
    # map focal tokens onto focal columns, so that one token gets mapped to Affect1, the next to Affect2, etc.
    # for non-focal tokens, one token maps to the domain
    mapping_dict = {k:v for k,v in zipped_cols_and_tokens}
    
    # then we have a dictionary of domain names and token names
    
    # replace col names with tokens
    # so it replaces the InducerToken column values with the values (tokens) from the mapping dictionary above
    # for each column (inducers and concurrents)
    # this is where the selector numbers (1 to 4) come in. 
    # now we can match the domains in the token column e.g. Pitch1 with its chosen token from the dictionary
    for tag in ["Inducer", "Concurrent"]:
        # column of interest
        t_col = tag + "Token"
        
        # all_trials is our full dataframe
        # we are going to replace the values from the column of interest
        # with the values from the dictionary we created above
        all_trials = all_trials.replace({t_col:mapping_dict})
        
        # now we are going to create a new column that shows the left-hand stimulus
        # which is a copy of the loc column
        all_trials[tag + "Left"] = all_trials[tag + "Loc"]
        
        # Now we are going to create the column that shows the right-hand stimulus
        # right is just the opposite of left column
        # create right column as copy of left column
        # whatever the value in the left column, put opposite value in right column
        # e.g. if left == L, right = H
        all_trials[tag + "Right"] = all_trials[tag + "Left"]
        all_trials = all_trials.replace({tag + "Right":{"L":0, "H":1}})
        all_trials = all_trials.replace({tag + "Right":{0:"H", 1:"L"}})
        
        # string together stim info for left/right hand stims
        # so that the stim name and location is given as a single string
        for pos in ["Left", "Right"]:
            # we use pos[0] in the column name so that the name is abbreviated to L/R
            # the line below subsets the values for the Inducer domain, the Inducer token and the L/H value of the stimulus
            # and then it joins them together as a single string, which is the name of the specific stimulus item
            # e.g. it takes pitch + sine + L and writes it as pitch-sine-L in the InducerL column
            # in order to designate that stimulus item to the left hand side of the screen
            all_trials[tag + pos[0]] = all_trials[[tag, tag + "Token", tag + pos]].apply(lambda x: '-'.join(x), axis = 1)
            
    # shuffle rows of df, so order of trials is randomised
    all_trials = all_trials.sample(frac = 1)
    
    # add trial number column
    # we create a dataframe column that writes a number from 1 to n in each row,
    # where n is the length of the data frame
    # i.e. it writes 1 to 96 for 96 trials
    all_trials["TrialNum"] = pd.Series(range(1, len(all_trials)+1), index = all_trials.index)
    
    # add focal dimensions for the condition as individual columns, for reference
    for focal in range(1,4):
        all_trials["Focal" + str(focal)] = dimensions[focal - 1]
    
    # subset only necessary columns
    # for example, we no longer need our 'loc' columns, or our 'token' columns
    all_trials = all_trials[["TrialNum", "Focal1", "Focal2", "Focal3",
                             "Inducer", "Concurrent",
                             "InducerL", "InducerR", "ConcurrentL", "ConcurrentR"]]
    
    # we have shuffled and sliced this a lot, so reset the data frame's index
    all_trials.reset_index(drop=True, inplace=True)
    
    # rename concurrent columns so they reflect actual position
    all_trials = all_trials.rename(columns={'ConcurrentL': 'ConcurrentTop',
                                           'ConcurrentR': 'ConcurrentBottom'})
    
    # return full stim list for a single participant
    return all_trials

We can generate a full trial list for a single participant using this function. The other functions from above are embedded within this function to create the full trial structure.

So first, we create the combinations of each focal domain with all other domains.
Then we add some token tags to these domains, so that we can match them up with a token, and assign each stimulus item a location.

Then we randomly match tokens to domains (with some constraints in place).

Then we format our dataframe, so that we have a location for each stimulus item, and all the necessary information (domain, token (category), value (L/H)) for each stimulus item. We add some other information, like condition and trial number. And we're done!
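
The token-matching step is probably the hardest one to picture, so here is a cut-down sketch of it for a single focal domain. It assumes the same approach as the function (shuffle without replacement, zip into a dictionary, then replace), but the tiny `trials` data frame is made up for illustration.

```python
import numpy as np
import pandas as pd

# one domain taken from the sets dictionary at the top of the notebook
sets = {"Pitch": ["Sine", "Sung", "Piano", "Whistle"]}

# shuffle the four tokens; replace=False means no token is picked twice
shuffled = [str(t) for t in np.random.choice(sets["Pitch"],
                                             size=len(sets["Pitch"]),
                                             replace=False)]

# placeholder tags Pitch1 ... Pitch4, as created by the selector numbers
tags = ["Pitch" + str(i) for i in range(1, 5)]

# pair each tag with one shuffled token
mapping = dict(zip(tags, shuffled))

# toy trial list that still holds the placeholder tags
trials = pd.DataFrame({"InducerToken": ["Pitch1", "Pitch2", "Pitch3", "Pitch4"]})

# swap every placeholder for its chosen token
trials = trials.replace({"InducerToken": mapping})

print(mapping)
print(trials)
```

Run it a few times: the mapping changes on each run, but every token is always used exactly once.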

In the example below we create a trial list for a participant in the first condition in the list at the top - [Pitch, Shape, Affect]. 

In [31]:
stim_list_ex = create_stim_list(conditions[0])

stim_list_ex

Unnamed: 0,TrialNum,Focal1,Focal2,Focal3,Inducer,Concurrent,InducerL,InducerR,ConcurrentL,ConcurrentR
0,1,Pitch,Shape,Affect,Noise,Shape,Noise-Sung-L,Noise-Sung-H,Shape-BK1-H,Shape-BK1-L
1,2,Pitch,Shape,Affect,Noise,Pitch,Noise-Sung-H,Noise-Sung-L,Pitch-Sung-H,Pitch-Sung-L
2,3,Pitch,Shape,Affect,Shape,Pitch,Shape-BK1-L,Shape-BK1-H,Pitch-Sung-L,Pitch-Sung-H
3,4,Pitch,Shape,Affect,Pitch,Color,Pitch-Whistle-H,Pitch-Whistle-L,Color-RY-H,Color-RY-L
4,5,Pitch,Shape,Affect,Affect,Color,Affect-PD-L,Affect-PD-H,Color-RY-H,Color-RY-L
5,6,Pitch,Shape,Affect,Brightness,Affect,Brightness-Diamonds-L,Brightness-Diamonds-H,Affect-EB-L,Affect-EB-H
6,7,Pitch,Shape,Affect,Brightness,Shape,Brightness-Diamonds-L,Brightness-Diamonds-H,Shape-BK4-L,Shape-BK4-H
7,8,Pitch,Shape,Affect,Speed,Shape,Speed-SP4-L,Speed-SP4-H,Shape-BK1-L,Shape-BK1-H
8,9,Pitch,Shape,Affect,Pitch,Affect,Pitch-Sine-L,Pitch-Sine-H,Affect-EB-H,Affect-EB-L
9,10,Pitch,Shape,Affect,Affect,Speed,Affect-HS-L,Affect-HS-H,Speed-SP4-L,Speed-SP4-H


In [34]:
def write_to_csv(write_folder, dimensions, n):
    
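    ''' 
    Creates a stimulus list for a set number of participants and writes each one to a csv file.
    Takes as input the folder to write to (write_folder), the condition (dimensions)
    and the number of participants, n.
    write_folder is joined directly to the file name, so it should be empty or end with a path separator.
    '''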
    
    # add the condition name
    # joins 3 focal domains together in one string
    # e.g. A-B-C
    con_name = ('-').join(dimensions)
    
    # for the number of participants, starting from 1
    for i in range(1, n + 1):
        #create df of stimuli
        stim_list = create_stim_list(dimensions)
        # add participant id made up of con name and number
        # adds participant id as first column in data frame
        p_id = [con_name + '-' + str(i)] * len(stim_list)
        stim_list.insert(0, "Id", p_id)
        
        # write to csv
        stim_list.to_csv(write_folder + "Stimuli-" + con_name + "-" + str(i) + ".csv", index = False)

In [35]:
# create 1 example file

# NOTE this will write a file to your computer
# this creates a trial list for 1 participant in the first condition

write_to_csv('', conditions[0], 1)

The function above writes trial lists for a single condition, for *n* number of participants, depending on how many trial lists you want to pre-generate. In the cell below, I run this code to create trial lists for each condition, for 10 participants per condition.

NOTE: If you run the block below it will generate csv files on your machine. The **folder** variable sets the path for the csv files to write to. Currently it is left blank, so the files will be written to the current working directory.

In [21]:
### operational code
## i.e. code that runs the above functions and generates stim lists

#folder name
# either put in the path to the folder you want the csv files to write to
# or navigate python to that path and leave the folder field blank
folder = ""

# for each condition, produce stimulus lists for 10 participants
for condition in conditions:
    write_to_csv(folder, condition, 10)
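
Once files have been written, it can be worth reading one back to check it looks right. The sketch below does this with a made-up two-row data frame and a temporary folder, so it does not touch the real stimulus files. The file name only imitates the naming pattern used above.

```python
import os
import tempfile
import pandas as pd

# toy stand-in for a stimulus list (hypothetical values)
toy = pd.DataFrame({"Id": ["A-B-C-1"] * 2, "TrialNum": [1, 2]})

# write into a temporary folder so nothing is left in the working directory
folder = tempfile.mkdtemp()
path = os.path.join(folder, "Stimuli-A-B-C-1.csv")
toy.to_csv(path, index=False)

# read the file back and inspect it
check = pd.read_csv(path)
print(check.shape)
print(check["TrialNum"].tolist())
```

For a real trial list you would expect 96 rows, one per trial, as noted in the comments of `create_stim_list`.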