# EEG Statistics

Using some of the data you collected, we are going to go through a couple of different ways to discern whether the difference in signal between your conditions is statistically significant.

#### Setting up Python
Before we start analysing our own EEG data, we need to make sure we are using the virtual environment we created during the `MNE-tutorial`.

1. Press `Select Kernel`, then `Python Environments...`, and choose any Python kernel.
2. Run the code chunk below.
3. Change the kernel used to run the code in this notebook: press where it says `Python X.XX.XX` in the top right corner, then `Select Another Kernel`, then `Jupyter kernel...`, and select `env`. If `env` does not show up, press the little refresh symbol!

In [None]:
!bash ../env_to_ipynb_kernel.sh

#### Import packages

In [None]:
import mne
import numpy as np
import pandas as pd
import os


## Epochs
We need epochs for the statistical tests. You can either copy the code that creates your epochs from another notebook into this one, or run that code in the other notebook and save the epochs with:

```epochs.save("../SUPERCOOLFILENAMEHERE-epo.fif", overwrite=True)```

Then you can read the saved epochs into this file by using:

```epochs = mne.read_epochs("../SUPERCOOLFILENAMEHERE-epo.fif", preload=True)```

In [None]:
# LOAD OR CREATE EPOCHS HERE!

In [None]:
# Extract the contrast you are interested in using the following code as an example

# epochs_condition1 = epochs["incongruent"]
# epochs_condition2 = epochs["congruent"]

## Windowed mean
Now we have our two conditions (e.g. trials with words vs. images). One of the simplest ways to determine whether the difference in signal between the two conditions is statistically significant is by:

1) Segmenting our data using only certain channels in a specific time window. Keep in mind that the time window and channels should be chosen a priori, for instance based on the literature.
2) Taking the mean of that window across channels and samples.
3) Running statistical tests on the windowed means from the two conditions.
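The three steps above can be sketched on synthetic data before applying them to your own epochs. This is a minimal sketch, assuming the data is already a NumPy array of shape `(n_trials, n_channels, n_times)` in microvolts; the channel names, sampling rate, time window and injected effect are all made up for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical setup (illustration only): 3 channels, 250 Hz, epochs from -0.2 s to 0.8 s
rng = np.random.default_rng(seed=0)
ch_names = ["Fz", "Cz", "Pz"]
sfreq = 250.0
times = np.arange(-0.2, 0.8, 1 / sfreq)

# Synthetic data with shape (n_trials, n_channels, n_times), in microvolts
cond1 = rng.normal(0.0, 5.0, size=(40, len(ch_names), times.size))
cond2 = rng.normal(0.0, 5.0, size=(40, len(ch_names), times.size))
cond2[:, :, (times >= 0.3) & (times <= 0.5)] += 3.0  # inject an effect in 0.3-0.5 s

# 1) Select the a priori channels and time window
roi_channels = ["Cz", "Pz"]
ch_idx = [ch_names.index(ch) for ch in roi_channels]
time_mask = (times >= 0.3) & (times <= 0.5)

def windowed_mean(data):
    """Return one mean value per trial over the selected channels and samples."""
    # Index channels first, then time, so the two selections do not broadcast together
    window = data[:, ch_idx, :][:, :, time_mask]
    # 2) Average over channels (axis 1) and samples (axis 2)
    return window.mean(axis=(1, 2))

mean1 = windowed_mean(cond1)
mean2 = windowed_mean(cond2)

# 3) Compare the per-trial windowed means between the two conditions
result = stats.ttest_ind(mean1, mean2)
print(mean1.shape, mean2.shape)  # (40,) (40,)
print(result.pvalue)
```

Indexing channels and time in two steps is deliberate: `data[:, ch_idx, time_mask]` would try to pair the two index arrays element by element and fail.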

In an experiment with multiple participants we would also average over the trials of each participant, so that each participant contributes only one data point per condition (trials from the same person are not independent observations). Since we only have one participant, we instead keep the trials as the observations in our test.
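For comparison, the multi-participant case could look like the sketch below. It assumes a hypothetical within-subject design (every participant saw both conditions) with made-up windowed means; because both values per participant come from the same person, a paired t-test (`scipy.stats.ttest_rel`) is used here rather than the independent test used later in this notebook.

```python
import numpy as np
from scipy import stats

# Hypothetical design: 12 participants, 40 trials per condition.
# Each entry is already a windowed mean (one value per trial), in microvolts.
rng = np.random.default_rng(seed=1)
n_participants, n_trials = 12, 40
cond1 = rng.normal(0.0, 5.0, size=(n_participants, n_trials))
cond2 = rng.normal(1.0, 5.0, size=(n_participants, n_trials))

# Average over trials within each participant -> one value per participant
cond1_per_participant = cond1.mean(axis=1)
cond2_per_participant = cond2.mean(axis=1)
print(cond1_per_participant.shape)  # (12,)

# Same participants in both conditions, so compare them pairwise
result = stats.ttest_rel(cond1_per_participant, cond2_per_participant)
print(result.statistic, result.pvalue)
```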

### T-test
We can now do a t-test on the trials from the two conditions to establish whether the difference between the means of the two groups is statistically significant.
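To see what the independent t-test actually computes, the sketch below recomputes Student's two-sample t statistic by hand and checks it against `scipy.stats.ttest_ind`. With its default `equal_var=True`, SciPy uses a pooled variance and a two-sided p-value; the samples here are random numbers, not EEG data.

```python
import numpy as np
from scipy import stats

def pooled_t(a, b):
    """Student's two-sample t statistic and two-sided p-value (equal variances assumed)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n1, n2 = a.size, b.size
    # Sample variances with Bessel's correction (ddof=1)
    s1, s2 = a.var(ddof=1), b.var(ddof=1)
    pooled_var = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
    se = np.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (a.mean() - b.mean()) / se
    df = n1 + n2 - 2
    # Two-sided p-value from the t distribution with df degrees of freedom
    p = 2 * stats.t.sf(abs(t), df)
    return t, p

rng = np.random.default_rng(seed=2)
a = rng.normal(0.0, 1.0, size=30)
b = rng.normal(0.5, 1.0, size=25)

t_manual, p_manual = pooled_t(a, b)
res = stats.ttest_ind(a, b)  # default equal_var=True -> pooled-variance test
print(np.isclose(t_manual, res.statistic), np.isclose(p_manual, res.pvalue))  # True True
```

In words: the t statistic is the difference between the two means divided by its standard error, so it grows when the conditions differ more or when there are more trials.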

We can use the `get_data()` method to get the numerical values of the signal for the t-test. Note that MNE returns EEG data in volts by default; pass `units="uV"` to get microvolts instead. `tmin` and `tmax` define the time window (in seconds), and `picks` are the channels in which we expect to see an effect.
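As a rough illustration of what cropping to `tmin`/`tmax` and converting units amount to, here is a plain NumPy sketch. It is not MNE's implementation; it assumes a hypothetical epoch time axis from -0.2 s to 0.8 s at 100 Hz and synthetic data stored in volts, which is how MNE stores EEG internally. The window variables are named `window_start`/`window_end` so they do not overwrite your own `tmin`/`tmax`.

```python
import numpy as np

# Hypothetical epoch time axis: -0.2 s to 0.8 s at 100 Hz (101 samples)
sfreq = 100.0
times = np.linspace(-0.2, 0.8, 101)

# Synthetic EEG in volts, shape (n_epochs, n_channels, n_times)
rng = np.random.default_rng(seed=3)
data_volts = rng.normal(0.0, 5e-6, size=(10, 4, times.size))

window_start, window_end = 0.3, 0.5  # seconds, relative to stimulus onset
# Keep samples inside [window_start, window_end]; half a sample of tolerance
# guards against floating-point rounding of the time values
tol = 0.5 / sfreq
mask = (times >= window_start - tol) & (times <= window_end + tol)

window_uv = data_volts[:, :, mask] * 1e6  # crop in time, then volts -> microvolts
print(window_uv.shape)  # (10, 4, 21)
```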

In [None]:
# INSERT CHANNEL NAMES YOU ARE INTERESTED IN BELOW
picks = ["", "", ""]


# DETERMINE THE TIME WINDOW YOU WANT TO LOOK AT (in seconds, relative to stimulus onset)
tmin = # INSERT NUMBER HERE
tmax = # INSERT NUMBER HERE


In [None]:
# Now we can extract the data using the following logic (units="uV" returns microvolts instead of volts)
# data_condition1 = epochs_condition1.get_data(picks=picks, tmin=tmin, tmax=tmax, units="uV")
# data_condition2 = epochs_condition2.get_data(picks=picks, tmin=tmin, tmax=tmax, units="uV")

Investigate the resulting data: how many dimensions does it have? What do you think each dimension represents (i.e. which dimension is trials, which is channels, etc.)?

In [None]:
print(data_condition1.shape)
print(data_condition2.shape)

Now we can average over the data so that only the trials dimension remains, giving one windowed mean per trial.

In [None]:
data_condition1_mean = np.mean(data_condition1, axis=2) # averaging over the third dimension of the data
print(data_condition1_mean.shape)

data_condition1_mean = np.mean(data_condition1_mean, axis=1) # averaging over the second dimension of the data
print(data_condition1_mean.shape)

## Now do the same for the second condition!

# INSERT CODE HERE
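The two `np.mean` calls above can also be collapsed into one by passing a tuple of axes. The sketch below uses a synthetic array with the same `(n_trials, n_channels, n_times)` layout to show that both approaches give the same per-trial values; this holds because every channel has the same number of samples, so the mean of the per-channel means equals the overall mean.

```python
import numpy as np

rng = np.random.default_rng(seed=4)
# Synthetic windowed data with shape (n_trials, n_channels, n_times)
data = rng.normal(size=(30, 3, 50))

# Two-step average: first over time (axis 2), then over channels (axis 1)
two_step = np.mean(np.mean(data, axis=2), axis=1)

# Single call averaging over both axes at once
one_step = data.mean(axis=(1, 2))

print(two_step.shape, np.allclose(two_step, one_step))  # (30,) True
```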

### Running the t-test

In [None]:
# installing additional packages (%pip installs into the environment of the running kernel)
%pip install scipy
from scipy import stats as st

In [None]:
# running the t-test
st.ttest_ind(a=data_condition1_mean, b=data_condition2_mean)
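The object returned by `ttest_ind` has a `statistic` (the t value) and a `pvalue`. A minimal sketch of reading them is shown below, using synthetic per-trial means as stand-ins for `data_condition1_mean` and `data_condition2_mean`; the significance threshold of 0.05 is a conventional choice assumed here and should be fixed before looking at the data.

```python
import numpy as np
from scipy import stats

# Synthetic per-trial windowed means (microvolts), standing in for the real conditions
rng = np.random.default_rng(seed=5)
cond1_mean = rng.normal(2.0, 1.5, size=60)
cond2_mean = rng.normal(1.0, 1.5, size=55)

result = stats.ttest_ind(a=cond1_mean, b=cond2_mean)
alpha = 0.05  # significance threshold, chosen in advance
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")
if result.pvalue < alpha:
    print("The difference between conditions is significant at alpha = 0.05")
else:
    print("No significant difference at alpha = 0.05")
```

A positive t means the first condition (`a`) has the higher mean; with the default settings the test has `n1 + n2 - 2` degrees of freedom (113 in this example).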