# Crossmodal Capture Data Analysis

I have pilot data for my first-year project (a multisensory attention study), and I want to create a data analysis pipeline that imports and cleans my data, runs the statistics, and visualizes the results.

# Data Cleaning

### Behavioral Data (PsychoPy output)
1. Import the data by searching a given folder for all CSV files and concatenating them into a single dataframe
2. Create a function that will calculate accuracy for each participant:

def SubjectAccuracy(df):<br>
count number of correct trials for each subject<br>
count number of incorrect trials for each subject<br>
calculate accuracy for each subject<br>
return (dataframe that shows accuracy for each subject)
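
A minimal sketch of how this accuracy function could look in pandas. The column names `participant` and `correct` (a 0/1 flag per trial) are assumptions for illustration, not the actual PsychoPy output names, so they are exposed as arguments.

```python
import pandas as pd

def subject_accuracy(df, subject_col="participant", correct_col="correct"):
    """Return one row per subject with correct/incorrect counts and accuracy."""
    # Ignore trials with no recorded correctness value (assumed to be unusable)
    valid = df.dropna(subset=[correct_col])
    is_correct = valid[correct_col].astype(bool)
    grouped = is_correct.groupby(valid[subject_col])

    summary = pd.DataFrame({
        "n_correct": grouped.sum(),    # True counts as 1
        "n_trials": grouped.count(),
    })
    summary["n_incorrect"] = summary["n_trials"] - summary["n_correct"]
    summary["accuracy"] = summary["n_correct"] / summary["n_trials"]
    return summary.reset_index()
```

Calling `subject_accuracy(behavior_df)` would give a small table that can also be used to flag participants below an accuracy cutoff, if the study ends up using one.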

3. Rename the variables in my dataframe to be easier to interpret, remove unnecessary columns
4. Create a function that will remove outliers by participant by condition. E.g.:

def RemoveOutliers(dataframe):<br>
    for each condition, for each subject:<br>
        calculate and remove trials with reaction times that are outliers (based on an interquartile range calculation)<br>
    return (cleaned dataframe with outlier RT trials removed)<br>
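
One way the outlier step could be written, as a sketch. It assumes columns named `participant`, `condition`, and `rt` (hypothetical names), and it uses the common 1.5 × IQR fences; the multiplier is an assumption and is left as the argument `k`.

```python
import pandas as pd

def remove_outliers(df, rt_col="rt", group_cols=("participant", "condition"), k=1.5):
    """Drop trials whose RT falls outside [Q1 - k*IQR, Q3 + k*IQR] within each group."""
    grouped_rt = df.groupby(list(group_cols))[rt_col]

    # transform() broadcasts each group's quartile back onto that group's rows
    q1 = grouped_rt.transform(lambda s: s.quantile(0.25))
    q3 = grouped_rt.transform(lambda s: s.quantile(0.75))
    iqr = q3 - q1

    # Rows with a missing RT compare as False and are therefore dropped too
    keep = df[rt_col].between(q1 - k * iqr, q3 + k * iqr)
    return df[keep].copy()
```

Because the fences are computed per participant and per condition, a slow participant is not penalized for being slower than the group overall.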
    
5. Remove incorrect trials (or create two dataframes, one with and one without incorrect trials)
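
Step 1 of this list could be handled by a small loader like the sketch below. The function name `load_csv_folder` and the added `source_file` column are illustrative choices, not part of the PsychoPy output.

```python
from pathlib import Path

import pandas as pd

def load_csv_folder(folder, pattern="*.csv"):
    """Read every CSV matching `pattern` in `folder` and stack them into one dataframe."""
    paths = sorted(Path(folder).glob(pattern))
    if not paths:
        raise FileNotFoundError(f"No files matching {pattern} in {folder}")

    frames = []
    for path in paths:
        frame = pd.read_csv(path)
        frame["source_file"] = path.name  # remember which file each row came from
        frames.append(frame)

    # ignore_index gives the combined dataframe a clean 0..n-1 index
    return pd.concat(frames, ignore_index=True)
```

The same helper should work for the eye-tracking CSVs, since it only assumes that every file in the folder shares a compatible set of columns.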

### Eye-Tracking Data
1. Import the data by searching a given folder for all CSV files and concatenating them into a single dataframe
2. Rename columns to something more readable
3. Create a function to turn the dataframe into something more manageable. EyeLink outputs multiple rows for each trial, so I just want one summary row for each trial (e.g., first fixation, latency) to facilitate merging with the behavioral dataframe, running stats and creating figures. Something like this:

def EyeDataframeCleaner(df):<br>
make a dataframe with columns for trial, participant, total # of fixations, dwell time on each fixation, latency, interest areas visited on first, second, and third fixations<br>
return (TidyNewEyeTrackingDataframe)
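
A rough sketch of the trial-level summary. It assumes the fixation rows have already been renamed to `participant`, `trial`, `fixation_index`, `fixation_start`, `fixation_duration`, and `interest_area`; these names are hypothetical. It also assumes that latency means the onset time of the first fixation in the trial, which may not match the definition the study ends up using.

```python
import pandas as pd

def summarize_eye_trials(df, n_first=3):
    """Collapse a long fixation table to one summary row per participant and trial."""
    keys = ["participant", "trial"]
    # Order fixations within each trial and give the rows a clean 0..n-1 index
    df = df.sort_values(keys + ["fixation_index"]).reset_index(drop=True)
    grouped = df.groupby(keys)

    summary = grouped.agg(
        n_fixations=("fixation_index", "count"),
        mean_dwell=("fixation_duration", "mean"),
        first_fix_dwell=("fixation_duration", "first"),
        first_fix_latency=("fixation_start", "first"),  # assumed definition of latency
    )

    # Interest area visited on the 1st, 2nd, ... fixation of each trial;
    # trials with fewer fixations get a missing value in the later columns
    position = grouped.cumcount()
    for i in range(n_first):
        nth_area = df.loc[position == i].set_index(keys)["interest_area"]
        summary[f"ia_fix{i + 1}"] = nth_area

    return summary.reset_index()
```

The result has one row per participant and trial, which is the shape needed for merging with the behavioral dataframe.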


### Merge Behavioral and Eye-tracking Data
1. Merge the two dataframes on participant and trial
2. Output the merged dataframe to CSV
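
The merge could be sketched as follows, assuming both dataframes carry `participant` and `trial` columns with matching dtypes and exactly one row per trial. The function name and the `out_path` argument are illustrative.

```python
import pandas as pd

def merge_behavior_and_eye(behavior, eye, keys=("participant", "trial"), out_path=None):
    """Join trial-level behavioral and eye-tracking rows, optionally saving to CSV."""
    # validate="one_to_one" raises an error if either side has duplicate keys,
    # which would otherwise silently multiply rows
    merged = behavior.merge(eye, on=list(keys), how="inner", validate="one_to_one")
    if out_path is not None:
        merged.to_csv(out_path, index=False)
    return merged
```

An inner join keeps only trials present in both sources. Switching to `how="outer"` with `indicator=True` is one way to check which trials are missing from either side before deciding what to drop.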

# Data Analysis

Since I do not have a full dataset yet, I do not want to over-analyze these data, but I do want to build the analysis pipeline so it is ready once I collect the rest. The analyses that I want to conduct based on these data are:
1. For viewing purposes, look at mean reaction times, mean latency, most frequent first fixation, and mean dwell time on each fixation for each condition across subjects
    - This will require me to create a dataframe/table that lists all of these summary statistics as columns with condition labels as rows
2. Conduct a 3-way ANOVA for reaction time across 8 conditions (study is a 2x2x2 design)
3. Test the difference in proportion of first saccades to each interest area by condition
    - This will require me to create a function that calculates the proportion of first saccades to each interest area by condition and tests the differences between conditions. My function will look something like this:

def FirstFixProportions(dataframe):<br>
count first fixations to each interest area by condition<br>
calculate the proportion of first fixations to each interest area by condition<br>
return (dataframe of proportions of first saccades to each interest area by condition)<br>
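
The proportion calculation could be sketched with `pd.crosstab`. The column names `condition` and `ia_fix1` (the interest area of the first fixation) are assumptions, and the statistical test itself is left out of this sketch.

```python
import pandas as pd

def first_fix_proportions(df, condition_col="condition", ia_col="ia_fix1"):
    """Proportion of first fixations landing in each interest area, per condition."""
    # normalize="index" divides each row (condition) by its own total,
    # so every row sums to 1
    return pd.crosstab(df[condition_col], df[ia_col], normalize="index")
```

The returned table has conditions as rows and interest areas as columns, so something like `first_fix_proportions(df).plot(kind="bar", stacked=True)` should give a stacked bar plot with each bar summing to 1.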

The first step can be generalized to fit other datasets for future use; analyses 2 and 3 may be more specific to this study.
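
A sketch of the reusable summary table from step 1. All column names (`condition`, `rt`, `first_fix_latency`, `first_fix_dwell`, `ia_fix1`) are hypothetical and are passed as arguments so the function can be pointed at a different dataset.

```python
import pandas as pd

def _most_frequent(values):
    """Most common value; ties resolve to the first value in sorted order."""
    modes = values.mode()
    return modes.iloc[0] if not modes.empty else None

def condition_summary(df, condition_col="condition",
                      mean_cols=("rt", "first_fix_latency", "first_fix_dwell"),
                      mode_col="ia_fix1"):
    """One row per condition: means of the numeric columns plus the modal first fixation."""
    grouped = df.groupby(condition_col)
    summary = grouped[list(mean_cols)].mean().add_prefix("mean_")
    summary["most_frequent_" + mode_col] = grouped[mode_col].agg(_most_frequent)
    return summary
```

Note that this averages over trials pooled across subjects. Averaging within each subject first and then across subjects is a different estimate, and the study may prefer that one.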

# Figures

1. Accuracy by condition
    - Categorical bar plot with % accuracy on the y-axis and condition on the x-axis
2. Proportion of first saccades made in the direction of the target or the distractor for each condition
    - Create a stacked bar plot with conditions on the x-axis, and proportion of first saccades on the y-axis, which will add up to 100%
3. Reaction time by condition
    - bar plot with reaction time (in seconds) on the y-axis and condition on the x-axis
4. Reaction time by condition for trials in which participants first fixate on the high probability target location (this is a little hard to explain without explaining the study)
    - categorical bar plot, this will require me to subset data into trials where participants look at the high probability target location on their first fixation
5. Reaction time by condition for trials in which participants first fixate on the low probability target location
    - categorical bar plot, this will require me to subset data into trials where participants look at the low probability target location on their first fixation
6. Plot reaction time by subject
    - line plot to look at variability across participants
7. Plot distributions of reaction times across conditions
    - violin plots to look for systematic variability in different conditions
8. Mean dwell time on first fixation by condition.
    - categorical bar plot
9. Mean latency by condition.
    - categorical bar plot
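
Figures 4 and 5 both need the data subset by where the first fixation landed. A sketch of that step, assuming a hypothetical `ia_fix1` column whose values include labels such as `"high_prob"` and `"low_prob"` (the real interest area labels will differ):

```python
import pandas as pd

def rt_by_condition_for_first_fix(df, location, ia_col="ia_fix1",
                                  condition_col="condition", rt_col="rt"):
    """Mean RT per condition, restricted to trials whose first fixation hit `location`."""
    subset = df[df[ia_col] == location]
    return subset.groupby(condition_col)[rt_col].agg(
        mean_rt="mean",
        sem_rt="sem",      # standard error of the mean; missing when a cell has one trial
        n_trials="count",  # worth checking, since some cells may be nearly empty
    )
```

Reporting `n_trials` alongside the mean seems useful here, because conditioning on first fixation can leave very few trials in some conditions with pilot data.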


Because many of these plots will be categorical bar plots with condition labels on the x-axis and some time variable on the y-axis, I think it would be most efficient to create a function to make these plots that allows me to input arguments for data, y-axis variable, title, y-limit, and color palette. This would be cleaner code than reproducing very similar seaborn barplot functions that take up a lot of room. However, if this seems redundant given seaborn's capabilities, I might just create each one individually.
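
A sketch of what such a plotting helper might look like. The function name, the default `condition` column, and the default palette are assumptions for illustration.

```python
import matplotlib.pyplot as plt
import seaborn as sns

def condition_barplot(data, y, title, ylim=None, palette="deep", x="condition", ax=None):
    """Bar plot of the mean of `y` per condition, with seaborn's default error bars."""
    if ax is None:
        _, ax = plt.subplots(figsize=(6, 4))

    # hue is set to the x variable so the palette is applied one color per condition;
    # dodge=False keeps each bar centered on its tick
    sns.barplot(data=data, x=x, y=y, hue=x, palette=palette, dodge=False, ax=ax)
    if ax.get_legend() is not None:
        ax.get_legend().remove()  # the legend would only repeat the x-axis labels

    ax.set_title(title)
    ax.set_xlabel("Condition")
    ax.set_ylabel(y)
    if ylim is not None:
        ax.set_ylim(*ylim)
    return ax
```

A call such as `condition_barplot(df, y="rt", title="Reaction time by condition", ylim=(0, 2))` would then replace several lines of repeated setup per figure. Whether the wrapper is worth it probably depends on how much the figures share beyond the `sns.barplot` call itself, such as titles, axis limits, and labels.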