# analyze_reaction_times

Performs analysis of reaction time data for subjects categorizing images of animals. The data is simulated and the code is meant only to demonstrate the basics of paths and file handling. See the `README.md` for further information.

Part of the *Computational Fluency Short Course* offered in 2024 by Brown University's Carney Institute. Intended only for educational purposes; use at your own risk.

In [None]:
# Import block
# Loads packages that extend base Python

import os  # Operating system utilities

import pandas as pd  # Common package for working with tabular data

## Getting paths to the data

As described in the course, different operating systems use slightly different conventions for describing paths. The most notable differences are that Linux and Mac use `/` as a separator while Windows uses `\`, and that the root directory on Linux and Mac is `/`, while the Windows root is (usually) `C:\`.

In order to write code that works across operating systems, we have to check the OS and figure out the right convention to use. As a convenience, `python` provides the `os` module that does the work for us.
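To see the two conventions side by side: the standard library exposes the POSIX and Windows flavors of path handling as the modules `posixpath` and `ntpath`, and `os.path` is simply whichever of the two matches the OS the code is running on. A small sketch (the file name is borrowed from the course data; nothing is read from disk):

```python
import ntpath
import os
import posixpath

# The same path pieces, joined under each convention
posix_style = posixpath.join('testdata', 'sub-1', 'sub-1_task-class_beh.csv')
windows_style = ntpath.join('testdata', 'sub-1', 'sub-1_task-class_beh.csv')

print(posix_style)    # testdata/sub-1/sub-1_task-class_beh.csv
print(windows_style)  # testdata\sub-1\sub-1_task-class_beh.csv

# os.path uses the convention of the OS this code is running on
print(f'This OS uses {os.sep!r} as its separator')
print(os.path.join('testdata', 'sub-1', 'sub-1_task-class_beh.csv'))
```

Because `os.path.join` picks the separator for us, the same line of code produces a valid path on every OS.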

A second issue is that we have to know where the data is. If we use an absolute path that is correct on our computer, the code will break when moved to a different computer. One way to handle this is with *relative paths*, assuming we know the working directory set when the notebook starts up (this can also break, especially on remote systems, and some troubleshooting may be required). By default, opening a notebook within Jupyter Lab sets the notebook's own directory as the working directory.
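The catch with relative paths is that they are resolved against the working directory, so the same relative path points to different places depending on where Python "is". The sketch below builds a throwaway directory tree (the `project` and `notebooks` names are hypothetical, chosen only for illustration) and resolves `../testdata` from two different working directories:

```python
import os
import tempfile

rel_path = os.path.join('..', 'testdata')

start_dir = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: <tmp>/project/notebooks
    project = os.path.join(tmp, 'project')
    notebooks = os.path.join(project, 'notebooks')
    os.makedirs(notebooks)
    try:
        os.chdir(notebooks)
        from_notebooks = os.path.abspath(rel_path)  # ends in project/testdata
        os.chdir(project)
        from_project = os.path.abspath(rel_path)    # one level too high
    finally:
        os.chdir(start_dir)  # always restore the original working directory

print(from_notebooks)
print(from_project)
```

Only the first result lands inside `project`, which is why relative paths are only as reliable as our assumption about the starting working directory.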

In [None]:
# First pass at loading a data file:
# Just assume it is in the current working directory

#df = pd.read_csv('sub-1/sub-1_task-class_beh.csv')  # This will fail!

In [None]:
# Second pass at loading a data file:
# Use the absolute path on "my" computer

#df = pd.read_csv('/Users/jritt/QENC/Teaching/2024_Spring/ComputationalFluency_2024/Content/Public/cfsc2024-ex-paths/testdata/sub-1/sub-1_task-class_beh.csv')
# This will fail on everyone's computer except mine!

So how do we make a more shareable, portable data load? We need to build a path to a data directory. This repo includes small data files (remember to avoid large files in git repos!) in a directory called `testdata`.

We also use the `os` module to deal with directory names.

In [None]:
# Put together a path to the data

# Look at the python kernel's working directory
# Note this directory may not match wherever you started Jupyter Lab
cwd = os.getcwd()  
print(f'Current working directory:\n{cwd}\n')

# Make a relative path to the data
data_dir_rel = os.path.join('..','testdata')  # join adds the correct path separator
# Double dots ".." mean "go up one directory"
print(f'Relative path to data:\n{data_dir_rel}\n')

# Convert a relative path to absolute path
data_dir = os.path.abspath(data_dir_rel)  
print(f'Absolute path to data:\n{data_dir}\n')
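The standard library's `pathlib` module offers an object-oriented alternative to `os.path` for the same steps. This is a swapped-in technique rather than what the course uses; a minimal sketch of the equivalent cell:

```python
from pathlib import Path

# Path objects overload "/" to join pieces with the correct separator
data_dir_rel_p = Path('..') / 'testdata'
print(f'Relative path to data:\n{data_dir_rel_p}\n')

# resolve() makes the path absolute and collapses ".."
# (unlike os.path.abspath, it also follows symbolic links)
data_dir_p = data_dir_rel_p.resolve()
print(f'Absolute path to data:\n{data_dir_p}\n')
```

A `Path` can be passed to most functions that accept a string path, including `os.listdir` and `pd.read_csv`.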

Now we look in the data directory to find subject-wise subdirectories, and continue down to the data files.

**Note**: It's usually a bad idea to blindly loop through everything within a directory. There will often be other kinds of files in there, either intentionally (like an index of subjects) or unintentionally (automatically generated files and/or dot-files).

In [None]:
print(f'Content of {data_dir}:\n')
for name in os.listdir(data_dir):
    is_subject = name.startswith('sub-')  # Test whether the name marks a subject
    if is_subject:
        print(f'Found subject data directory: {name}')
    else:
        print(f'Skipping other file {name}')
print('\n')
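The name test above still accepts a plain *file* whose name happens to start with `sub-`. A slightly stricter filter also checks that each entry is a directory and sorts the result, since `os.listdir` and `os.scandir` return entries in arbitrary order. The helper name `find_subject_dirs` is hypothetical, and the demo runs on a throwaway directory rather than the course data:

```python
import os
import tempfile


def find_subject_dirs(data_dir, prefix='sub-'):
    """Return the sorted names of subdirectories of data_dir that start with prefix."""
    subject_dirs = []
    with os.scandir(data_dir) as entries:
        for entry in entries:
            # Dot-files fail the prefix test; plain files fail is_dir()
            if entry.name.startswith(prefix) and entry.is_dir():
                subject_dirs.append(entry.name)
    return sorted(subject_dirs)


with tempfile.TemporaryDirectory() as tmp:
    for name in ['sub-2', 'sub-1', '.ipynb_checkpoints']:
        os.mkdir(os.path.join(tmp, name))
    with open(os.path.join(tmp, 'sub-list.txt'), 'w'):
        pass  # an empty file whose name also starts with 'sub-'
    found = find_subject_dirs(tmp)

print(found)  # ['sub-1', 'sub-2']
```

Note that `sorted` orders names as text, so `sub-10` would come before `sub-2`; zero-padded IDs (`sub-01`, `sub-02`, ...) avoid that.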

## Manually load some data

As a demonstration, we load a single data file, and look at its contents.

In [None]:
filepath = os.path.join(data_dir,'sub-1','sub-1_task-class_beh.csv')
print(f'Loading data at {filepath}')

df = pd.read_csv(filepath)  # Load data as a Pandas DataFrame

In [None]:
df.info()

In [None]:
df

In [None]:
# Extract columns by name
print('The set of reaction times is\n', df["ReactionTime"] )

# Apply operations to columns via "methods"
print(f'\nThe mean reaction time is { df["ReactionTime"].mean() } seconds')

In [None]:
# Optional Exercise: find the standard deviation of RTs

print(f'The standard deviation of reaction times is { df["ReactionTime"].std() } seconds')

In [None]:
# Optional Exercise: use the above results to make "z-scored" RTs, that is, 
#   subtract the mean and then divide by the standard deviation for all trials

rt_mean = df["ReactionTime"].mean()
rt_std = df["ReactionTime"].std()

df["ReactionTimeZScore"] = (df["ReactionTime"] - rt_mean) / rt_std
df
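A quick sanity check for the z-scoring exercise: up to floating-point error, z-scored values should have mean 0 and standard deviation 1 (pandas' `.std()` uses the sample standard deviation by default, for both steps). The sketch uses a few made-up reaction times rather than the course data:

```python
import pandas as pd

# Toy reaction times in seconds (made-up values, not the course data)
toy = pd.DataFrame({"ReactionTime": [0.42, 0.55, 0.61, 0.38, 0.70]})

rt = toy["ReactionTime"]
toy["ReactionTimeZScore"] = (rt - rt.mean()) / rt.std()

z_mean = toy["ReactionTimeZScore"].mean()
z_std = toy["ReactionTimeZScore"].std()
print(f'Mean of z-scores: {z_mean:.6f}')
print(f'Std of z-scores:  {z_std:.6f}')
```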

In [None]:
# Optional Exercise: find the three mean RTs separately for each type of trial (this
# one is a little more advanced)

trial_type_list = ['dog', 'bird', 'cat']

for trial_type in trial_type_list:
    df_cur = df[ df["TrialType"] == trial_type ]
    print(f'For { trial_type } trials, the mean reaction time is { df_cur["ReactionTime"].mean() } seconds')
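pandas can also compute one mean per trial type in a single step with `groupby`, without listing the types by hand. A minimal sketch on a small made-up table that uses the same column names as the course data:

```python
import pandas as pd

# Toy trials (made-up values) with the same column names as the course data
toy = pd.DataFrame({
    "TrialType":    ["dog", "cat", "bird", "dog", "cat", "bird"],
    "Response":     ["dog", "dog", "bird", "dog", "cat", "cat"],
    "ReactionTime": [0.50, 0.70, 0.40, 0.60, 0.30, 0.80],
})

# Split rows by TrialType, then take the mean ReactionTime within each group
mean_by_type = toy.groupby("TrialType")["ReactionTime"].mean()
print(mean_by_type)
```

The result is a Series indexed by trial type, so `mean_by_type["dog"]` retrieves a single value.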

In [None]:
# Optional Exercise: find the two mean RTs separately for correct and incorrect
# trials (this one is harder still, but follows from the previous exercise)

is_correct = df["TrialType"] == df["Response"]
df_correct = df[ is_correct ]
df_incorrect = df[ ~is_correct ]  # ~ performs "logical not"

print(f'The mean reaction time on correct trials is { df_correct["ReactionTime"].mean() } seconds')
print(f'The mean reaction time on incorrect trials is { df_incorrect["ReactionTime"].mean() } seconds')
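The same `groupby` idea works with the boolean correctness test as the grouping key, giving both means at once. Again a sketch on made-up trials with the course's column names:

```python
import pandas as pd

# Toy trials (made-up values) with the same column names as the course data
toy = pd.DataFrame({
    "TrialType":    ["dog", "cat", "bird", "dog", "cat", "bird"],
    "Response":     ["dog", "dog", "bird", "dog", "cat", "cat"],
    "ReactionTime": [0.50, 0.70, 0.40, 0.60, 0.30, 0.80],
})

is_correct = toy["TrialType"] == toy["Response"]

# Group rows by the True/False values of is_correct
mean_by_correct = toy.groupby(is_correct)["ReactionTime"].mean()
print(mean_by_correct)

print(f'Correct trials:   {mean_by_correct.loc[True]:.2f} seconds')
print(f'Incorrect trials: {mean_by_correct.loc[False]:.2f} seconds')
```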

## Paths for loading code

Sometimes we'd prefer to use our own custom code across multiple notebooks, for example to do some standardized data loading or pre-processing. We then put that code into a module (basically just a `.py` file) or package, and `import` it in the notebook.

However, the Python interpreter needs to know where to look for that code. It searches a "path list" that is set automatically when Python is launched.
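To ask where a module would be found without actually importing it, `importlib.util.find_spec` runs the same search that `import` does and returns `None` when nothing matches. A small sketch (`datahandling` is the course's helper module, which will only be found if its directory is on the search path):

```python
import importlib.util

for module_name in ['json', 'pandas', 'datahandling']:
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        print(f'{module_name}: not found on the current search path')
    else:
        print(f'{module_name}: {spec.origin}')
```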

In [None]:
import sys  # Access python system properties

In [None]:
path_list = sys.path

print('Python will search for code in the following directories (in order):\n')
for path in path_list:
    print(path)

Note the empty line in the output: nothing is missing, it is actually an empty string. Python interprets the empty string as meaning whatever the current working directory is.

We thus have several choices to load our code:
- Put the module in the same directory as the notebook
- Change the working directory to wherever the module is
- Change the system path to include the location of the module
- Package the module in a way that can be placed in a "global" location
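As an illustration of the third option, the sketch below writes a tiny throwaway module (the name `toy_utils` and its `greet` function are hypothetical), adds its directory to the front of `sys.path`, imports it, and then undoes the change. In this notebook the directory to add would be something like `os.path.abspath(os.path.join('..', 'utils'))`.

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as code_dir:
    # Write a hypothetical module, toy_utils.py, into a temporary directory
    with open(os.path.join(code_dir, 'toy_utils.py'), 'w') as f:
        f.write('def greet(name):\n    return "hello, " + name\n')

    sys.path.insert(0, code_dir)  # searched before every other entry
    try:
        importlib.invalidate_caches()  # make sure the new file is noticed
        toy_utils = importlib.import_module('toy_utils')
        message = toy_utils.greet('subject 1')
    finally:
        sys.path.remove(code_dir)  # restore the original search path

print(message)  # hello, subject 1
```

Changing `sys.path` affects every later import in the session, which is why the sketch removes the entry again once it is done.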

There are many opinions about best practice, and it often depends on the details of your project (for example, how globally useful the extra code is, and how many people are sharing it). For code that will be reused often, making a package may be the best option, though it takes a little extra work.

We will demonstrate a kind of hack that can be useful in simple situations: we change directories to where the code should be, import it, then change back to where we started.
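The "change directory, then change back" pattern can also be packaged as a context manager, so that restoring the working directory cannot be forgotten. A minimal sketch; the name `working_directory` is hypothetical (Python 3.11 and later ship an equivalent as `contextlib.chdir`):

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def working_directory(path):
    """Temporarily change the working directory, restoring it on exit."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)  # runs even if the body raises an error


start = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    with working_directory(tmp):
        inside = os.getcwd()
    after = os.getcwd()

print(inside)
print(after == start)  # True
```

In this notebook, usage would look like `with working_directory(wd_code): import datahandling as dh`.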

In [None]:
# First try to import with just the module name (this will fail!)

#import datahandling as dh

In [None]:
# Hack to import code in another directory without changing the path list

wd_orig = os.getcwd()

wd_code = os.path.abspath( os.path.join('..','utils') )
print(f'Attempt to load module in:\n{wd_code}\n')

# A try-finally block guarantees the working directory is restored,
# even if there's an error during the import
try:
    os.chdir(wd_code)
    import datahandling as dh
    print('Import succeeded')
except Exception as err:  # If the try block crashes we end up here
    print(f'Failed to load module: {err}')
finally:  # Always runs, whether or not the try block raised an error
    os.chdir(wd_orig)

In [None]:
print(f'Current working directory is {os.getcwd()}')

Now let's use the utilities in `datahandling` to load the data.

In [None]:
df2 = dh.load_subject(filepath)
df2.info()

In [None]:
# Check that error handling in dh works as expected
# This is a simple, informal example of using a "unit test"

df3 = dh.load_subject('/nonsense/file/path')  # Should print an error message
assert df3 is None  # Passes silently if df3 is None; otherwise raises AssertionError
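The code inside `datahandling` isn't shown in this notebook. For illustration only, here is a hypothetical loader with the behavior the test above relies on (print a message and return `None` when the file is missing), plus a few assert-based checks in the same informal style. This is not the course's implementation:

```python
import os
import tempfile

import pandas as pd


def load_subject_sketch(filepath):
    """Hypothetical stand-in for dh.load_subject: a DataFrame, or None on failure."""
    if not os.path.isfile(filepath):
        print(f'Error: no data file at {filepath}')
        return None
    return pd.read_csv(filepath)


# Missing file: should print an error and return None
assert load_subject_sketch('/nonsense/file/path') is None

# Existing file: should return the table that was written (made-up values)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'toy.csv')
    pd.DataFrame({"ReactionTime": [0.5, 0.6]}).to_csv(path, index=False)
    loaded = load_subject_sketch(path)

assert list(loaded.columns) == ["ReactionTime"]
assert len(loaded) == 2
```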