=============================================================================================================

This script batch-generates RSA hypothesis matrices for the quickread experiment. The code here only loops through subjects, conditions, and measures. The actual work of calculating dissimilarity values and arranging them into matrices is done by a set of pre-defined functions in a custom module (make_rsa_model_functions.py).

Measures:
- articulatory (feature-weighted phonological edit distance)
- phonological (Euclidean distance of G2P consistency vectors)
- orthographic (correlation distance of open bigram vectors)
- semantic (cosine distance of word2vec vectors)
- visual (correlation distance of silhouette vectors)
- word length (abs(length of word1 - length of word2))

=============================================================================================================
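Several of the measures listed above are standard pairwise distances. As a rough illustration only (the real implementations live in make_rsa_model_functions.py and are not reproduced here), the sketch below computes the word length measure as defined in the list, plus a generic cosine distance, on toy inputs. The names `word_length_matrix` and `cosine_distance` and the example words are hypothetical.

```python
import math


def word_length_matrix(words):
    """Pairwise absolute difference in word length, as nested lists."""
    return [[abs(len(w1) - len(w2)) for w2 in words] for w1 in words]


def cosine_distance(u, v):
    """1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Toy word list (not taken from the experiment)
words = ['cat', 'house', 'tree']
length_matrix = word_length_matrix(words)
# length_matrix == [[0, 2, 1], [2, 0, 1], [1, 1, 0]]
```

Every measure yields a square matrix with one row and one column per word, which is what allows the matrices to be compared with one another later.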

### Import base packages

Note: we only import packages used directly in this notebook. All dependencies of our custom functions are imported by make_rsa_model_functions.py.

In [None]:
import os
import sys

import pandas as pd  # used below to read the word lists

### Import module with custom functions

This step requires the path to our custom module to be defined. The path is relative to "top_dir", which in turn is read from a text file (top_dir_[OS].txt). The top_dir file should be stored ONE directory level up from the folder containing this script.

Important: make_rsa_model_functions relies on various assets that are required by certain custom functions. The paths to these assets are defined relative to the path in top_dir_[OS].txt, which the module also reads. HOWEVER, because Python resolves relative paths against the current working directory rather than the location of the module file, the imported module locates top_dir_[OS].txt relative to THIS script. So the top_dir file path used in make_rsa_model_functions should still point one directory level up from here, regardless of where make_rsa_model_functions.py is actually stored.
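The path behaviour described above can be reproduced in isolation. The sketch below builds a throwaway folder layout and shows a relative path being resolved against the working directory of the running process. All folder and file names in it (such as `top_dir_demo.txt`) are made up for illustration.

```python
import os
import tempfile

# Build a throwaway layout: <root>/top_dir_demo.txt and <root>/notebooks/
root = os.path.realpath(tempfile.mkdtemp())
notebook_dir = os.path.join(root, 'notebooks')
os.makedirs(notebook_dir)
with open(os.path.join(root, 'top_dir_demo.txt'), 'w') as f:
    f.write('/path/to/project\n')

original_cwd = os.getcwd()
try:
    # Pretend the notebook is running from <root>/notebooks
    os.chdir(notebook_dir)

    # '../top_dir_demo.txt' is looked up relative to the working directory,
    # whichever file (notebook or imported module) contains this line
    with open('../top_dir_demo.txt') as f:
        demo_top_dir = f.read().strip()
    resolved = os.path.abspath('../top_dir_demo.txt')
finally:
    # Restore the working directory so the demo has no side effects
    os.chdir(original_cwd)
```

A module that needs a path relative to its own location can instead build it from `__file__`, for example with `os.path.dirname(os.path.abspath(__file__))`.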

In [None]:
# Define paths
with open('../top_dir_win.txt') as f:
    top_dir = f.read().strip()
custom_func_dir = os.path.join(top_dir, 'scripts', '0_custom_functions', 'python')

# Add custom_func_dir to system path
sys.path.insert(0, custom_func_dir)

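# Import custom functions.
# NOTE: the loop further down also uses `assets_dir` and `subfolder`, which are
# not defined in this notebook; they are assumed to come from this star import.
from make_rsa_model_functions import *

# Note - if the above line throws a warning about Levenshtein distance,
# you can safely ignore it (we don't use Levenshtein distance)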

### Define subjects, conditions, experiment, and measures

In [None]:
subjects = ['subject-001', 'subject-002', 'subject-003', 'subject-004', 'subject-005', 'subject-006',
            'subject-007', 'subject-008', 'subject-009', 'subject-010', 'subject-011', 'subject-012',
            'subject-013', 'subject-014', 'subject-015', 'subject-016', 'subject-017', 'subject-018',
            'subject-019', 'subject-020', 'subject-021', 'subject-022', 'subject-023', 'subject-024',
            'subject-025', 'subject-026', 'subject-027', 'subject-028', 'subject-029', 'subject-030']

# Note that the 'alltrials' condition refers to models comprising ALL stimuli
# (regardless of condition). These models are not used for RSA itself; they are
# treated later on as "exemplar" models for estimating (potential) collinearity
# between the models that are actually used for RSA.
conditions = ['alltrials', 'aloud', 'silent']

experiment = 'quickread'

measures = [ 
            'articulatory',
            'orthographic',
            'phonological',
            'semantic',
            'visual',
            'wordlength'
           ]
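The comment on the 'alltrials' condition mentions estimating collinearity between models. One common way to do this (an assumption here, not necessarily the approach used later in this project) is to correlate the below-diagonal entries of two matrices built from the same word list. The sketch uses plain Python lists and toy values; `lower_triangle` and `pearson` are hypothetical helper names.

```python
import math


def lower_triangle(matrix):
    """Flatten the entries below the diagonal of a square matrix."""
    return [matrix[i][j] for i in range(len(matrix)) for j in range(i)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy dissimilarity matrices for the same three words under two measures
model_a = [[0, 2, 1],
           [2, 0, 1],
           [1, 1, 0]]
model_b = [[0.0, 0.9, 0.4],
           [0.9, 0.0, 0.5],
           [0.4, 0.5, 0.0]]

# A value near 1 or -1 would suggest the two models are strongly collinear
r = pearson(lower_triangle(model_a), lower_triangle(model_b))
```

Only the entries below the diagonal are used because a dissimilarity matrix is symmetric and its diagonal is all zeros, so the remaining entries carry no extra information.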


### Loop through subjects and conditions, using the pre-made functions to create matrices for each measure

Note that the visual measure may take a long time, because it generates a unique image for every single word (per participant and condition).

In [None]:
for subject in subjects:
    print(subject)

    # subject-009 did not complete the quickread experiment
    if subject == 'subject-009':
        continue

    # Define output directory (created if it does not already exist)
    output_dir = os.path.join(assets_dir, subject, 'RSA_models', experiment)
    os.makedirs(output_dir, exist_ok=True)

    # Read in word lists
    words_aloud_fn = os.path.join(top_dir, 'behavioural_data', subfolder, subject, 'aloud_words.txt')
    words_aloud = pd.read_csv(words_aloud_fn, header=None).sort_values(by=0)[0].tolist()

    words_silent_fn = os.path.join(top_dir, 'behavioural_data', subfolder, subject, 'silent_words.txt')
    words_silent = pd.read_csv(words_silent_fn, header=None).sort_values(by=0)[0].tolist()

    words_alltrials = sorted(words_aloud + words_silent)

    # Map each condition name to its word list
    word_lists = {
        'alltrials': words_alltrials,
        'aloud': words_aloud,
        'silent': words_silent,
    }

    for condition in conditions:
        word_list = word_lists[condition]

        for measure in measures:

            # Look up the function for this measure by name
            # (make_<measure>_matrix, provided by the star import above)
            get_matrix = globals()['make_' + measure + '_matrix']

            # Generate a matrix for this condition
            x = get_matrix(word_list)

            # Save matrix to disk
            output_fn = os.path.join(output_dir, f'{experiment}_{subject}_{condition}_{measure}.csv')
            x.to_csv(output_fn)
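Once the loop has run, it can be worth spot-checking a saved file. The sketch below assumes each matrix is a square table labelled by word in both rows and columns and written in the default `to_csv` layout (a header row, then one labelled row per word). The project's actual matrix layout is not shown here, so treat that as an assumption. The example reads from an in-memory stand-in rather than a real output file, and `read_matrix` and `is_valid_dissimilarity_matrix` are hypothetical helper names.

```python
import csv
import io


def read_matrix(handle):
    """Read a labelled square matrix from CSV text."""
    rows = list(csv.reader(handle))
    col_labels = rows[0][1:]
    row_labels = [row[0] for row in rows[1:]]
    values = [[float(v) for v in row[1:]] for row in rows[1:]]
    return row_labels, col_labels, values


def is_valid_dissimilarity_matrix(row_labels, col_labels, values, tol=1e-9):
    """Check matching labels, a zero diagonal, and symmetry."""
    if row_labels != col_labels:
        return False
    n = len(values)
    for i in range(n):
        if len(values[i]) != n or abs(values[i][i]) > tol:
            return False
        for j in range(i):
            if abs(values[i][j] - values[j][i]) > tol:
                return False
    return True


# In-memory stand-in for a saved matrix file (toy values)
example_csv = io.StringIO(
    ",cat,house,tree\n"
    "cat,0,2,1\n"
    "house,2,0,1\n"
    "tree,1,1,0\n"
)
labels_r, labels_c, matrix = read_matrix(example_csv)
ok = is_valid_dissimilarity_matrix(labels_r, labels_c, matrix)
```

To check a real output, the `io.StringIO` object could be swapped for an open file handle on one of the CSV files written by the loop.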