# EEG Statistics

## Epochs
We need epochs for the statistical tests, so you can insert your preprocessed data from earlier tutorials under this section. I'll read in some epochs from your FaceWord data, which I prepared earlier, and I'll be looking at the words/images contrast.

(Tip: You can save the epochs object you created in your preprocessing notebook with `epochs.save('your_epochs-epo.fif')`.)

(Extra tip: You can run terminal commands from cells using the `os.system()` function, or by simply writing an exclamation mark before the command.)
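As a small illustration of the extra tip, the sketch below runs a harmless `echo` command through `os.system()`. The command itself is just a placeholder chosen for the example. Note that the exclamation-mark form only works inside IPython/Jupyter cells, while `os.system()` also works in plain Python scripts.

```python
import os

# Run a shell command from Python; 'echo' is only a placeholder command.
# os.system() returns the command's exit status (0 normally means success).
exit_code = os.system('echo hello from the shell')
print('exit status:', exit_code)

# In a Jupyter cell, the equivalent shortcut would be:
#   !echo hello from the shell
```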

## Importing modules

In [4]:
!python -m pip install mne --quiet
!pip install scikit-learn --quiet
!pip install pandas --quiet


import numpy as np
import pandas as pd
import mne

## Importing epochs

In [6]:
epochs = mne.read_epochs('ownExperiment_epochs-epo.fif')

Reading /work/studybuddies_neuroScience/own_experiment/ownExperiment_epochs-epo.fif ...
    Found the data of interest:
        t =    -200.00 ...     496.00 ms
        0 CTF compensation matrices available
0 bad epochs dropped
Not setting metadata
523 matching events found
No baseline correction applied
0 projection items activated


### Dividing into different conditions

In [7]:
incorr2_epocs = epochs['recog_phase/incorr2/second']
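# NOTE: this key is identical to the one used for incorr2_epocs, so both variables
# hold the same trials. Replace it with the key of the intended condition (see epochs.event_id).
incorr4_epocs = epochs['recog_phase/incorr2/second']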
corr5_epocs = epochs['recog_phase/corr/fifth']

## Windowed mean
Now we have our two conditions: trials with words vs. images. One of the simplest ways in which we can determine whether the signal differs significantly between our two conditions is by:

1) Segmenting our data using only certain channels in a specific time window. **Keep in mind that the time window and channels should be established a priori, for instance according to the literature.**
2) Taking the mean of that window across channels and samples.
3) Running statistical tests on the windowed means from the two conditions.

In an experiment with multiple participants we would also average over the trials of each individual participant, in order to have only one data point per participant (and thereby keep the observations independent). However, since we have only one participant, we can keep one dimension of the individual data, i.e. the trials.
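The first two steps above can be sketched in plain NumPy on synthetic data. Everything in this block is an assumption made for illustration: the random data, the channel list, the 250 Hz sampling rate and the 100-200 ms window are placeholders, not values taken from the recording. The sketch uses integer sample indices for the window to avoid floating-point surprises when comparing time values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical recording: 40 trials, 5 channels, 175 samples at 250 Hz,
# with each epoch starting 0.2 s before stimulus onset.
sfreq = 250.0
epoch_start = -0.2
ch_names = ['O1', 'Oz', 'O2', 'P3', 'P4']
data = rng.normal(size=(40, len(ch_names), 175))

# Step 1: select the channels and time window chosen a priori.
picks = [ch_names.index(ch) for ch in ['O1', 'Oz', 'O2']]
start = int(round((0.1 - epoch_start) * sfreq))  # first sample of the window
stop = int(round((0.2 - epoch_start) * sfreq))   # one past the last sample
segment = data[:, picks, start:stop]

# Step 2: average over channels and samples, leaving one value per trial.
windowed_mean = segment.mean(axis=(1, 2))

print(segment.shape)        # (trials, channels, samples)
print(windowed_mean.shape)  # (trials,)
```

With real data, `get_data()` does the selection in step 1 for you, as shown in the next section.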

### T-test
We can now do a t-test on the trials from the two conditions, to establish whether the means of the two groups differ significantly.

We can use the `get_data()` method to get the numerical values of the signal for the t-test. Note that MNE stores EEG data in volts, so the returned values are in volts rather than microvolts. `tmin` and `tmax` define the time window (in seconds), and `picks` are the channels in which we expect to see an effect.

In [19]:
incorr2_data = incorr2_epocs.get_data(picks=['O1', 'Oz', 'O2'], tmin=.1, tmax=.2)  # O1, Oz and O2 lie over the visual cortex
print(incorr2_data.shape)

incorr4_data = incorr4_epocs.get_data(picks=['O1', 'Oz', 'O2'], tmin=.1, tmax=.2)
print(incorr4_data.shape)

corr5_data = corr5_epocs.get_data(picks=['O1', 'Oz', 'O2'], tmin=.1, tmax=.2)
print(corr5_data.shape)

(25, 3, 25)
(25, 3, 25)
(31, 3, 25)


I've also tried adding `'P3', 'P4', 'Pz'` to the electrodes (picks), without any luck.

Investigate the resulting data: how many dimensions does it have? What do you think they represent (i.e. which dimension is channels, which is trials, etc.)?

Now we can average over the data so we only have the trials dimension.

In [20]:
# Incorrect second
incorr2_mean = np.mean(incorr2_data, axis=2) # averaging over the third dimension of the data
incorr2_mean = np.mean(incorr2_mean, axis=1) # averaging over the second dimension of the data
print(incorr2_mean.shape)

# Incorrect fourth
incorr4_mean = np.mean(incorr4_data, axis=2)
incorr4_mean = np.mean(incorr4_mean, axis=1)
print(incorr4_mean.shape)

# Correct fifth
corr5_mean = np.mean(corr5_data, axis=2)
corr5_mean = np.mean(corr5_mean, axis=1)
print(corr5_mean.shape)

(25,)
(25,)
(31,)
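The two-step averaging above can also be written as a single call by passing a tuple of axes to `np.mean`. The sketch below checks this on a random array with the same shape as one of the conditions; the array is made up for the demonstration. Because every trial has the same number of channels and samples, both approaches give the same result.

```python
import numpy as np

rng = np.random.default_rng(1)

# Placeholder array shaped like one condition: (trials, channels, samples)
demo_data = rng.normal(size=(25, 3, 25))

# Two-step version: first over samples (axis 2), then over channels (axis 1)
two_step = np.mean(np.mean(demo_data, axis=2), axis=1)

# One-step version: average over channels and samples at once
one_step = np.mean(demo_data, axis=(1, 2))

print(one_step.shape)
print(np.allclose(two_step, one_step))
```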


In [27]:
from scipy import stats as st  # independent-samples t-test lives in scipy.stats

st.ttest_ind(a=incorr4_mean, b=corr5_mean)

Ttest_indResult(statistic=0.8060170719372484, pvalue=0.4237683846193888)
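By default, `ttest_ind` assumes that both groups have equal variance. Since the groups here have different sizes (25 vs. 31 trials), one option worth considering is Welch's t-test, which drops that assumption and is available through the `equal_var=False` argument. The sketch below compares the two variants on synthetic data; the group means, spreads and seed are made up for illustration and are not the tutorial's results.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical windowed means for two conditions with unequal trial counts
group_a = rng.normal(loc=0.0, scale=1.0, size=25)
group_b = rng.normal(loc=0.0, scale=1.5, size=31)

# Student's t-test (the default): assumes equal variances in both groups
student = stats.ttest_ind(group_a, group_b)

# Welch's t-test: does not assume equal variances
welch = stats.ttest_ind(group_a, group_b, equal_var=False)

print('Student: t = %.3f, p = %.3f' % (student.statistic, student.pvalue))
print('Welch:   t = %.3f, p = %.3f' % (welch.statistic, welch.pvalue))
```

With real data you would pass the windowed means (for example `incorr4_mean` and `corr5_mean`) in place of the synthetic groups.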