# Subject-level windowed mean analysis

This notebook demonstrates a simple analysis of epoched EEG data. It introduces the windowed mean: two conditions are compared by averaging the signal over a chosen time window and set of sensors, then testing whether the resulting means differ. The notebook also covers some basic plotting and statistical testing using t-tests.

## Setting up Python
First of all, we need to make sure that we are working in the `env` environment.


1. Run `bash env_to_ipynb_kernel.sh` from the `EEG` folder if you have not already done so. This will make sure that the `env` environment is available as a kernel in this notebook.

2. Press `Select Kernel`, then `Jupyter kernel...` and select `env`. If `env` does not show up, press the little refresh symbol!

**Note:** You might have to install the Jupyter extension for VS Code to be able to select the kernel.

In [None]:
# import libraries
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path
import mne

## Loading the data
First, we load in preprocessed EEG data from a single subject.

In [None]:
# load in the data

data_path = Path("/work/EEG_lab/example_data")

epochs = mne.read_epochs(data_path / "Group1-epo.fif", verbose=False, preload=True)

# only keep eeg channels
epochs.pick(["eeg"])

## Single participant analysis
In this notebook we will compare the correct and incorrect button presses for one participant. We will do this by taking the mean of the signal in a certain time window for each condition and comparing the means. We will also plot the data and perform a t-test to see if the difference is significant.


### Extracting data from the epochs

In [None]:
# extract the trials you are interested in
epochs_incorrect = epochs["Incorrect"] # all the button presses for incorrect trials
epochs_correct = epochs["Correct"] # all the button presses for correct trials

Now we have our two conditions: correct and incorrect button presses. One of the simplest ways to test whether the signal differs significantly between the two conditions is by:

1. Segmenting our data to channels and a time window chosen a priori.

2. Taking the mean over that time window and across the chosen channels.

3. Running statistical tests on the windowed means from the two conditions.

In an experiment with multiple participants, we would also average over trials within each participant, so that each participant contributes only one data point per condition (this keeps the observations entering the test independent). However, since we have one participant (for now), we can keep one dimension of the individual data, i.e. the trials.
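
For a multi-participant study, the trial-averaging step could look roughly like the sketch below. It uses synthetic values only: the number of participants, the trial counts, and the names `windowed_means` and `participant_means` are assumptions made for illustration and are not part of the example dataset.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical study: 5 participants, each with a different number of trials.
# Each array holds one windowed mean per trial (synthetic values, in volts).
n_trials_per_participant = [40, 35, 50, 42, 38]
windowed_means = {
    "Correct": [rng.normal(1e-6, 2e-6, n) for n in n_trials_per_participant],
    "Incorrect": [rng.normal(3e-6, 2e-6, n) for n in n_trials_per_participant],
}

# Average over trials so that each participant contributes one value per condition.
participant_means = {
    condition: np.array([trials.mean() for trials in per_participant])
    for condition, per_participant in windowed_means.items()
}

for condition, means in participant_means.items():
    print(condition, means.shape)
```

Because the same participants appear in both conditions, these per-participant values would typically go into a paired test such as `scipy.stats.ttest_rel()` rather than an independent-samples test.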

If you are conducting a windowed mean analysis, you should rely on previous literature to determine which channels and time windows to use. 

**Note:** The time window and channels used in this example are arbitrarily chosen for the purpose of demonstrating how to calculate the windowed mean and run a statistical test on it.

### Preparing the data for t-test
The aim is to conduct a t-test on the averaged data, to establish whether the means of the two conditions (correct and incorrect) are different.

We can use the `get_data()` method of the epochs object to get the numerical values of the signal for the t-test. `tmin` and `tmax` define the time window (in seconds), and `picks` lists the channels in which we expect to see an effect.
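
To make the role of the time window and channel selection concrete, here is a rough NumPy-only sketch of the same kind of selection on a synthetic array. The array sizes, the sampling rate `sfreq`, and the channel list `ch_names` are assumptions for illustration, and the exact boundary handling inside MNE may differ slightly from the inclusive mask used here.

```python
import numpy as np

# Synthetic stand-in for epoched data with shape (n_trials, n_channels, n_times)
sfreq = 250  # assumed sampling rate in Hz
times = np.arange(-50, 201) / sfreq  # time axis from -0.2 s to 0.8 s
ch_names = ["Fz", "Cz", "Pz", "Oz", "Fp1", "Fp2"]  # hypothetical channel list

rng = np.random.default_rng(seed=0)
data = rng.normal(size=(20, len(ch_names), times.size))

picks = ["Fz", "Cz", "Pz", "Oz"]
tmin, tmax = 0.2, 0.4

# Translate channel names to row indices and the time window to a boolean mask
channel_idx = [ch_names.index(name) for name in picks]
time_mask = (times >= tmin) & (times <= tmax)

# Keep all trials, only the picked channels, only the samples inside the window
window = data[:, channel_idx, :][:, :, time_mask]
print(window.shape)  # (trials, picked channels, samples in window)
```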

In [None]:
# chosen channels
picks = ["Fz", "Cz", "Pz", "Oz"]

# time window in seconds
tmin = 0.2
tmax = 0.4

In [None]:
# HINT 
# you can use the following to get the channel names
channel_names = epochs_incorrect.ch_names
print(channel_names)

In [None]:
# extract the data
data_incorrect = epochs_incorrect.get_data(picks = picks, tmin = tmin, tmax = tmax)
data_correct = epochs_correct.get_data(picks = picks, tmin = tmin, tmax = tmax)

Investigate the resulting data: how many dimensions does it have? What do you think they represent (i.e. which dimension is channels, trials, etc.)? Is there a difference in the number of trials between the two conditions?

In [None]:
print(data_incorrect.shape)
print(data_correct.shape)

Now we can average over the time window and channels to get one data point per trial. We can use the `np.mean()` function to do this, and specify the axis over which to average. 

In [None]:
data_incorrect_mean = np.mean(data_incorrect, axis=2) # averaging over the third dimension of the data (time)
print(data_incorrect_mean.shape)

data_incorrect_mean = np.mean(data_incorrect_mean, axis=1) # averaging over the second dimension of the data (channels)
print(data_incorrect_mean.shape)

# you can also do this in one line
data_correct_mean = np.mean(data_correct, axis=(1,2)) # averaging over the second and third dimension of the data (channels and time)
print(data_correct_mean.shape)
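
As a quick sanity check, the two-step and one-line versions give the same result when every trial has the same number of channels and time points. The sketch below checks this on a synthetic array; the shape `(30, 4, 51)` is an arbitrary assumption, not the shape of the example data.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
data = rng.normal(size=(30, 4, 51))  # synthetic (n_trials, n_channels, n_times)

# Two steps: first over time (axis 2), then over channels (axis 1)
two_steps = data.mean(axis=2).mean(axis=1)

# One step: over channels and time together
one_step = data.mean(axis=(1, 2))

print(two_steps.shape, one_step.shape)  # one value per trial in both cases
print(np.allclose(two_steps, one_step))
```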

### Running the t-test
Now that we have one numerical value per trial, we can compare the means of the two conditions using a t-test. We can use the `scipy.stats.ttest_ind()` function to do this.

In [None]:
from scipy import stats

In [None]:
stats.ttest_ind(data_correct_mean, data_incorrect_mean)
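
By default, `ttest_ind()` assumes that the two groups have equal variances. When the conditions differ in trial count and possibly in variance, Welch's t-test (`equal_var=False`) is a common alternative. The sketch below runs it on synthetic values (the group sizes, means, and spreads are made up for illustration) and recomputes the t statistic by hand to show what the function measures.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)

# Synthetic per-trial windowed means with unequal group sizes and variances
correct = rng.normal(loc=0.0, scale=1.0, size=120)
incorrect = rng.normal(loc=0.5, scale=2.0, size=40)

# Welch's t-test: does not assume equal variances
welch = stats.ttest_ind(correct, incorrect, equal_var=False)

# Manual Welch t statistic: difference in means over its standard error
var_c = correct.var(ddof=1) / correct.size
var_i = incorrect.var(ddof=1) / incorrect.size
t_manual = (correct.mean() - incorrect.mean()) / np.sqrt(var_c + var_i)

print(welch.statistic, welch.pvalue)
print(np.isclose(welch.statistic, t_manual))
```

On the notebook's data, the analogous call would be `stats.ttest_ind(data_correct_mean, data_incorrect_mean, equal_var=False)`.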

### Plotting
Now let's plot the time courses averaged over trials and channels (note that the code below averages over all EEG channels, not only `picks`), and shade the time window that we used for the t-test.

In [None]:
plot_data_incorrect = epochs_incorrect.get_data(copy = True).mean(axis=(0, 1))
plot_data_correct = epochs_correct.get_data(copy = True).mean(axis=(0, 1))

times = epochs_incorrect.times

fig, ax = plt.subplots(1, figsize=(10, 5), dpi=300)

# plot the time window
ax.axvspan(tmin, tmax, color="grey", alpha=0.2)

# plot the time course
ax.plot(times, plot_data_incorrect, label="Incorrect")
ax.plot(times, plot_data_correct, label="Correct")

# vertical line at 0
ax.axvline(x=0, color="black", linestyle="--", label = "Button press", linewidth=1)

ax.set(xlabel="Time (s)", ylabel="Amplitude (V)", title="ERP")
ax.legend()