# Cognitive Neuroscience: Group Project 2022

## Final Group Project Code Instructions

Marijn van Wingerden, Department of Cognitive Science and Artificial Intelligence – Tilburg University Academic Year 21-22

## Handing in of your code

You can adapt this script template and hand it in for the weekly Group Project Assignments. Whenever you encounter ... in the code, you should complete the code in place (of course you can add lines before and after). "Your code here" indicates where code blocks should go.

Whenever you are asked to make a plot, complete it with a meaningful plot title and xlabel and ylabel texts. Figures are started with a Matplotlib figure handle: "fig_Q2A, ax = plt.subplots()". This saves a link (called a handle) to your figure in the variable fig_Q2A, so we can easily inspect it when checking your scripts. Whenever a naming convention for a variable is given, use it, because it allows semi-automatic grading of your project script.
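As an illustration of the plotting convention described above, here is a minimal sketch on synthetic data. The sine wave, the label texts and the `Agg` backend line are assumptions made for this example only; in the notebook itself you would rely on the `%matplotlib` magic instead.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere (assumption; not needed in the notebook)
import matplotlib.pyplot as plt
import numpy as np

# synthetic 10 Hz sine wave, purely for illustration
t = np.linspace(0, 1, 500)
signal = np.sin(2 * np.pi * 10 * t)

# plt.subplots() returns the figure handle and an Axes object
fig_Q2A, ax = plt.subplots()
ax.plot(t, signal)

# every plot gets a title and axis labels
ax.set_title("Example trace (synthetic data)")
ax.set_xlabel("Time (s)")
ax.set_ylabel("Amplitude (a.u.)")
```

Note the parentheses after `plt.subplots`: without them the function is not called and no figure is created.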

## Group members:

Please list the contributors and their U-numbers here in comments:

- 
-
- 
- 
- 

## Setting up: list your modules to import
For loading/saving purposes, we will make use of the **os** package.
An example worksheet with instructions on how to use the os package will be provided.
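Until you have gone through that worksheet, the following minimal sketch shows the three os functions that matter most here. The temporary directory and all file names are hypothetical and only serve to make the example runnable on any machine.

```python
import os
import tempfile

# create a throw-away directory with a few empty files (hypothetical names)
with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ["b_control.npy", "a_control.npy", "notes.txt"]:
        open(os.path.join(tmp_dir, name), "w").close()

    # os.listdir returns the entries in arbitrary order, so sort them
    all_files = sorted(os.listdir(tmp_dir))

    # keep only the NumPy datafiles
    npy_files = [f for f in all_files if f.endswith(".npy")]

# os.path.join inserts the correct separator for your operating system
example_path = os.path.join("base", "group_xx", "file.npy")

print(npy_files)
print(example_path)
```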

In [43]:
%matplotlib notebook

import os
import numpy as np
from pprint import pprint
import pandas as pd 
import matplotlib.pyplot as plt
import scipy.fft as fft

## Data loading

In your assignment, you will compare neural data in different trial conditions from the same participant: this is a *within-subject* comparison. You can think of this as a contrast: which spectral features are more present in condition A vs. condition B?

The second level of the analysis focuses on group statistics. You will answer a question like: "As a group, do the participants in group [RM/RB/RL] show more spectral power in the [delta/theta/alpha/beta/gamma] band in the ambiguous sentences vs. the non-ambiguous sentences?"
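To make "spectral power in a band" concrete, here is a minimal sketch on a synthetic signal. The band boundaries (alpha 8-12 Hz, beta 13-30 Hz) and the test signal are assumptions for illustration, not values prescribed by this assignment. The sketch uses `np.fft`; `scipy.fft.rfft` and `scipy.fft.rfftfreq` offer the same interface.

```python
import numpy as np

fs = 500                          # sampling rate in Hz (assumed for this sketch)
t = np.arange(751) / fs - 0.5     # time axis from -0.5 s to +1.0 s

# synthetic signal: strong 10 Hz component plus a weaker 20 Hz component
signal = np.sin(2 * np.pi * 10 * t) + 0.2 * np.sin(2 * np.pi * 20 * t)

# power spectrum of a real-valued signal
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(signal.size, d=1 / fs)
power = np.abs(spectrum) ** 2

# hypothetical band definitions in Hz
bands = {"alpha": (8, 12), "beta": (13, 30)}

# mean power inside each band
band_power = {}
for name, (f_low, f_high) in bands.items():
    mask = (freqs >= f_low) & (freqs <= f_high)
    band_power[name] = power[mask].mean()

print(band_power)
```

Because the 10 Hz component dominates the synthetic signal, the alpha value comes out larger than the beta value.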

The analysis will start with setting up data structures (refer to WorkSheet 0) that will hold the relevant data. Because EEG activity can differ between participants (due to e.g. anatomical differences like skull thickness or skin conductivity), the **absolute** voltages that we record are not completely informative. Instead, we will be looking at **relative** differences within an individual to remove the between-subject effects that we cannot control. 
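The following minimal sketch shows why relative differences are more comparable across participants than absolute ones. The power values are made up, and expressing the difference as a fraction of the control value is only one possible normalisation, assumed here for illustration.

```python
import numpy as np

# hypothetical band power per participant (arbitrary units);
# the overall signal level differs strongly between participants
power_control = np.array([10.0, 40.0, 4.0])
power_experimental = np.array([12.0, 48.0, 4.8])

# absolute differences depend on each participant's overall signal level
absolute_diff = power_experimental - power_control

# relative change: the difference expressed as a fraction of the control value
relative_change = (power_experimental - power_control) / power_control

print(absolute_diff)
print(relative_change)
```

In this made-up example the absolute differences vary tenfold between participants, while the relative change is the same 20% increase for each of them.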

Each datafile that you have been given has the trials related to a particular condition (NA-IR and AM-IR, for example). "Control" refers to the non-ambiguous conditions, and "Experimental" to the ambiguous condition. 
Please note that the SF and OF trial types have been ignored (that is, added together). The datafiles are NumPy arrays that have been saved to disk. These arrays are 3D arrays: 
- the 0th dimension is the trial repetitions
- the 1st dimension is the number of channels
- the 2nd dimension is the number of samples in a trial
    - for the baseline, this is 0.55s of data (276 samples: from -0.05s to +0.5s)
    - for the evoked period, this is 1.5s of data (751 samples: from -0.5s to +1.0s)
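The sample counts and time windows listed above can be turned into time axes, as in this minimal sketch. It assumes that both endpoints of each window are included in the samples; the variable names are hypothetical.

```python
import numpy as np

# evoked period: 751 samples from -0.5 s to +1.0 s
n_samples_evoked = 751
t_evoked = np.linspace(-0.5, 1.0, n_samples_evoked)

# baseline: 276 samples from -0.05 s to +0.5 s
n_samples_base = 276
t_base = np.linspace(-0.05, 0.5, n_samples_base)

# the sampling rate is the inverse of the time step between two samples
fs_evoked = 1 / (t_evoked[1] - t_evoked[0])
fs_base = 1 / (t_base[1] - t_base[0])

print("Sampling rate (evoked):", round(fs_evoked), "Hz")
print("Sampling rate (baseline):", round(fs_base), "Hz")
```

Under this assumption, both windows correspond to the same sampling rate of 500 Hz.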

You will need to load the datafiles from all participants and add them all together so that we end up with a 4D matrix that has nParticipants x nTrials x nChannels x nTime. You can make your work easier by organising the datafiles in such a way that you put the control.npy files in their own subdirectory, and the experimental.npy files as well. 
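The idea of combining several 3D arrays into one 4D array can be sketched on small synthetic arrays. The array sizes, the random data and all variable names below are hypothetical. The sketch also assumes that participants may have different numbers of trials, so it sizes the trial dimension to the largest count.

```python
import numpy as np

rng = np.random.default_rng(0)

# three hypothetical participants with 5, 3 and 4 trials,
# each array shaped trials x channels x samples
participants = [rng.standard_normal((n, 4, 10)) for n in (5, 3, 4)]

n_participants = len(participants)
max_trials = max(p.shape[0] for p in participants)
n_chans, n_samples = participants[0].shape[1:]

# pre-allocate the 4D array with zeros
combined = np.zeros((n_participants, max_trials, n_chans, n_samples))

# fill one participant slice per iteration
for i_part, eeg in enumerate(participants):
    n_trials_current = eeg.shape[0]
    combined[i_part, :n_trials_current, :, :] = eeg

print(combined.shape)
```

Trial slots that a participant did not fill stay at zero, so keep track of each participant's trial count if you later average over trials.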

In order to load the files, we can use the os package.

Adapt the following so that it works on your machine:

In [44]:
# enter the path to the base directory where the folder called group_xx is located
path_base = os.path.normpath('...')

In [47]:
group = 'group_04/' # update this to reflect the name of your group with 2 digits or SX for S1, S2, S3

# path_base + group
path_data = os.path.join(path_base,group)
files = os.listdir(path_data)
control_files = list()
experimental_files = list()
control_files_baseline = list()
experimental_files_baseline = list()


for f in files:
    # check for files that end with a specific suffix
    # if a given file would need to be excluded, this is how to do it
    #if f.rfind("part_10") > -1:
    #    continue
    if f.endswith("control.npy"):
        control_files.append(f)
    elif f.endswith("experimental.npy"):
        experimental_files.append(f)
    elif f.endswith("control_baseline.npy"):
        control_files_baseline.append(f)
    elif f.endswith("experimental_baseline.npy"):
        experimental_files_baseline.append(f)
            

# check that the length of your files list matches the provided datafiles, and contains only .npy datafiles

## EVOKED files
control_files.sort()
pprint(control_files)
print("the number of control files is: ", len(control_files), "\n")
experimental_files.sort()
pprint(experimental_files)
print("the number of experimental files is: ", len(experimental_files), "\n")

## BASELINE files
control_files_baseline.sort()
pprint(control_files_baseline)
print("the number of baseline control files is: ", len(control_files_baseline), "\n")
experimental_files_baseline.sort()
pprint(experimental_files_baseline)
print("the number of baseline experimental files is: ", len(experimental_files_baseline), "\n")

['group_04_part_01_control.npy',
 'group_04_part_02_control.npy',
 'group_04_part_03_control.npy',
 'group_04_part_04_control.npy',
 'group_04_part_05_control.npy',
 'group_04_part_06_control.npy',
 'group_04_part_07_control.npy',
 'group_04_part_08_control.npy',
 'group_04_part_09_control.npy',
 'group_04_part_10_control.npy',
 'group_04_part_11_control.npy',
 'group_04_part_12_control.npy',
 'group_04_part_13_control.npy',
 'group_04_part_14_control.npy',
 'group_04_part_15_control.npy']
the number of control files is:  15 

['group_04_part_01_experimental.npy',
 'group_04_part_02_experimental.npy',
 'group_04_part_03_experimental.npy',
 'group_04_part_04_experimental.npy',
 'group_04_part_05_experimental.npy',
 'group_04_part_06_experimental.npy',
 'group_04_part_07_experimental.npy',
 'group_04_part_08_experimental.npy',
 'group_04_part_09_experimental.npy',
 'group_04_part_10_experimental.npy',
 'group_04_part_11_experimental.npy',
 'group_04_part_12_experimental.npy',
 'group_04_

## Combining data and matrix pre-allocation
Next, you will need to load these files one by one and extract the data for each participant. 
The data in the NumPy arrays are stored as Trials x Channels x Time. To aggregate across participants, you will thus need to add a 4th dimension to store the data.

To be able to adequately pre-allocate the data from the different subjects, we will load the datafile of one subject manually to have a look at the shape/dimensionality of the data:

In [None]:
EEG = np.load(os.path.join(path_data,control_files[0]))
              
# control_files is a list of strings, so indexing its first element returns a string
# in this case, we are loading the first entry of control_files, i.e. participant 1

# verify that the number of trials equals 44, 
# verify that the number of channels equals 64 or 65 
# and verify that there are 751 samples per trace

print("Number of trials = ",...)
print("Number of channels = ", ...)
print("Number of timepoints = ", ...)

# do the same for one of the baseline datafiles (they have a different number of samples)

EEG_base = np.load(os.path.join(path_data,control_files_baseline[0]))

print("Number of trials (base) = ", ...)
print("Number of channels  (base) = ", ...)
print("Number of timepoints (base) = ", ...)

## Q1 - setting up the data structure and loading data from all participants

The EEG data is currently stored as a 3-dimensional NumPy array. But to run our time-frequency analysis, we need some more information like the sampling rate and the time axis that corresponds to the stimulus-locked analysis window. In order to set up (=pre-allocate) a matrix that will hold all traces for all participants, we need to know the sizes of the dimensions of this 4-dimensional matrix, and fill up this matrix by looping over participants:

In [None]:
# There are 64 or 65 channels in the dataset. Only channels 1-59 (channel numbers, not Python indices!) are EEG channels
# the remaining channels are EMG and EOG channels that we will ignore in this analysis
# subset your EEG array so that only the EEG channels remain

##
## Your code here
##

# Define nTrials, nChans (=channels), nSamples, nSamples_base and nParticipants. 
nTrials = ...
nChans = ...
nSamples = ...
nSamples_base = ...
nParticipants = ...

# Then, pre-allocate a matrix filled with zeros and with size nParticipants x nTrials x nChans x nSamples
# (use nSamples_base instead of nSamples for the two baseline matrices),
# one each for the control, experimental, control_baseline and experimental_baseline data. 
# Name them: 
# data_control 
# data_experimental
# data_control_base
# data_experimental_base

data_control = ...
data_experimental = ...
data_control_base = ...
data_experimental_base = ...


# next, we need to loop over all participant datafiles and add them to the appropriate slice in your 4-D arrays
# For this, you need to use specific array indexing to indicate where in data_(control/experimental)
# each participant's data needs to go. You can and should reuse the data-reading code above.

# CAREFUL! Not every participant may have the same number of (correct) trials in their dataset. 
# So for each newly loaded datafile, you need to establish the current number of trials again

# loop over participants, and within each iteration of the loop, load the
# next datafile and fill data_(control/experimental) with the EEG traces (nTrials x nChans x nSamples)
# check the shape of the matrices after filling them

for iPart in range(len(control_files)):
    ##
    ## Your code here
    ##
    
print("Shape of data_control:",...)
print("Shape of data_experimental:",...)
print("Shape of data_control_base:",...)
print("Shape of data_experimental_base:",...)

Congratulations on completing this question of the Group Assignment!
Please check the instructions for submitting this code in the Canvas Assignment. You may need to upload two files for certain assignments.