# Cluster-based permutation test

The cluster-based permutation test is a non-parametric test of whether two conditions differ significantly. I know that this test can be a bit difficult to wrap your head around, so here are some additional resources that I found helpful:

- [Intro to cluster permutation statistics](https://benediktehinger.de/blog/science/statistics-cluster-permutation-test/)
- [How not to interpret the results](https://www.fieldtriptoolbox.org/faq/how_not_to_interpret_results_from_a_cluster-based_permutation_test/)
- [MNE-tutorial](https://mne.tools/stable/auto_tutorials/stats-sensor-space/75_cluster_ftest_spatiotemporal.html#sphx-glr-auto-tutorials-stats-sensor-space-75-cluster-ftest-spatiotemporal-py)
- [Fieldtrip tutorial (includes a video)](https://www.fieldtriptoolbox.org/tutorial/cluster_permutation_timelock/)
- [The original paper introducing the test](https://pubmed.ncbi.nlm.nih.gov/17517438/)

## Introduction to the cluster-based permutation test


Shoutout to Mina for the following steps to perform the cluster-based permutation test:

1. Set a threshold for the t-values of the contrast; this threshold is used for clustering.
2. Compare the conditions: compute a t-value for the contrast (the difference between the two conditions) at each time point and channel.
3. Form clusters out of the t-values that go beyond the threshold and are adjacent in time and space (neighbouring channels). Sum the t-values within each cluster. Do this for all the data.
4. Take the highest t-value sum among all the clusters you identified. Save this number.
5. PERMUTATION: Shuffle the condition labels, i.e., create data in which any difference between the conditions is due to chance.
6. Repeat steps 2-4 for the shuffled data. It is recommended to do this 10,000 times. Save the highest cluster t-value sum of each run (i.e., you will end up with 10,000 highest t-value sums).
7. Make a distribution of the 10,000 highest t-value sums. This is the empirical null distribution.
8. If the initial highest t-value sum (*from step 4*) falls in the extreme tail of this distribution (larger or smaller than nearly all of the permuted values), you say that you have a significant difference between the two conditions.
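
The steps above can be sketched in plain NumPy. This is a minimal illustration under simplifying assumptions, not what `mne-python` does internally: it uses a single channel (so clusters are formed in time only), Welch t-values, simulated data, and only 200 permutations to keep it fast. All function names (`t_values`, `max_cluster_sum`, `cluster_permutation_test`) are made up for this example.

```python
import numpy as np


def t_values(a, b):
    """Welch t-value at every time point; a and b have shape (n_trials, n_times)."""
    var_a = a.var(axis=0, ddof=1) / a.shape[0]
    var_b = b.var(axis=0, ddof=1) / b.shape[0]
    return (a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(var_a + var_b)


def max_cluster_sum(t, threshold):
    """Largest absolute sum of t-values over runs of adjacent samples
    that exceed the threshold and share the same sign (steps 3 and 4)."""
    best, current, prev_sign = 0.0, 0.0, 0
    for value in t:
        sign = int(np.sign(value)) if abs(value) > threshold else 0
        if sign != 0 and sign == prev_sign:
            current += value      # extend the running cluster
        elif sign != 0:
            current = value       # start a new cluster
        else:
            current = 0.0         # below threshold: no cluster here
        prev_sign = sign
        best = max(best, abs(current))
    return best


def cluster_permutation_test(a, b, threshold=2.0, n_permutations=200, seed=0):
    rng = np.random.default_rng(seed)
    observed = max_cluster_sum(t_values(a, b), threshold)   # steps 2-4
    pooled = np.concatenate([a, b], axis=0)
    n_a = a.shape[0]
    null = np.empty(n_permutations)
    for i in range(n_permutations):
        order = rng.permutation(pooled.shape[0])            # step 5: shuffle labels
        perm_a, perm_b = pooled[order[:n_a]], pooled[order[n_a:]]
        null[i] = max_cluster_sum(t_values(perm_a, perm_b), threshold)
    # proportion of permuted statistics at least as extreme as the observed one
    p_value = (np.sum(null >= observed) + 1) / (n_permutations + 1)
    return observed, null, p_value


# simulated data: 40 trials per condition, 100 time points,
# with an effect added to condition a between samples 40 and 60
rng = np.random.default_rng(42)
cond_a = rng.normal(size=(40, 100))
cond_b = rng.normal(size=(40, 100))
cond_a[:, 40:60] += 1.0

observed, null, p_value = cluster_permutation_test(cond_a, cond_b)
print(f"observed cluster sum: {observed:.1f}, p = {p_value:.3f}")
```

Because only the largest cluster statistic of each permutation is kept, the comparison in the last step controls for the many time points tested at once. The spatial dimension is left out here; handling it requires knowing which channels are neighbours.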

Luckily, we don't have to do this by hand: `mne-python` provides functions that do it for us. It is still important to understand the steps above, because they explain what the function returns.

In [None]:
import mne
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path

In [None]:
# load in the data
all_epochs = []
# relative path to the data, two directories up from the current working directory
data_path = Path.cwd().parents[1] / "data" / "preprocessed"

for participant in ["Group1", "Group5", "Group6"]:
    epochs = mne.read_epochs(data_path / f"{participant}-epo.fif", verbose=False, preload=True)

    # only keep eeg channels
    epochs.pick_types(eeg=True)

    all_epochs.append(epochs)


print(type(all_epochs)) # we have now created a list of epochs objects
print(len(all_epochs)) # we have 3 epochs objects in the list
print(type(all_epochs[0])) # we can access the first element of the list, which is an Epochs object

In [None]:
# collect the channel names of each participant so we can check whether they are the same
channel_names = []
for epochs in all_epochs:
    channel_names.append(epochs.ch_names)



In [None]:
print(channel_names[0])
print(channel_names[1])
print(channel_names[2])
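
Comparing the printed lists by eye gets tedious with many channels. A small helper based on sets can do the comparison instead. This is only a sketch: `summarize_channels` is a hypothetical name, and the channel lists below are toy values, not the channels in these recordings.

```python
def summarize_channels(channel_lists):
    """Return the channels shared by every participant (in the order of the
    first participant) and the channels missing from at least one participant."""
    shared = set(channel_lists[0]).intersection(*channel_lists[1:])
    union = set().union(*channel_lists)
    shared_ordered = [ch for ch in channel_lists[0] if ch in shared]
    not_shared = sorted(union - shared)
    return shared_ordered, not_shared


# toy example with three participants
example = [
    ["Fz", "Cz", "Pz", "Oz"],
    ["Fz", "Cz", "Pz"],
    ["Fz", "Cz", "Oz", "Pz"],
]
shared, not_shared = summarize_channels(example)
print(shared)      # channels present for everyone
print(not_shared)  # channels missing for at least one participant
```

In this notebook, the same function could be called as `summarize_channels(channel_names)`. The check matters because the data are later concatenated across participants, which only works if every participant has the same number of channels.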

For the purpose of this example, we will test whether there is a difference between incorrect and correct button presses.

In [None]:
epochs_incorrect = [epochs["Incorrect"] for epochs in all_epochs] # we can use a list comprehension to extract the incorrect trials for each participant
epochs_correct = [epochs["Correct"] for epochs in all_epochs] # we can use a list comprehension to extract the correct trials for each participant

# obtain the data as a 3D matrix and transpose it such that
# the dimensions are as expected for the cluster permutation test:
# n_epochs × n_times × n_channels
data_incorrect = [np.transpose(epochs.get_data(), (0, 2, 1)) for epochs in epochs_incorrect]
data_correct = [np.transpose(epochs.get_data(), (0, 2, 1)) for epochs in epochs_correct]

print(data_correct[1].shape) # we can see that the dimensions are now as expected
print(data_incorrect[1].shape) # we can see that the dimensions are now as expected

In [None]:
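# get the adjacency of the sensors
# (we take the channel info from the first participant explicitly instead of
# relying on a leftover loop variable; this assumes all participants share the same channels)
info = all_epochs[0].info
adjacency, ch_names = mne.channels.find_ch_adjacency(info, ch_type="eeg")

# plot the adjacency
mne.viz.plot_ch_adjacency(info, adjacency, ch_names)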

In [None]:
X_incorrect = np.concatenate(data_incorrect, axis=0) # concatenate the data for all participants
X_correct = np.concatenate(data_correct, axis=0) # concatenate the data for all participants
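
To see what the transpose and the concatenation do to the array shapes, here is a small sketch with random numbers. The trial counts, channel count, and number of time points are invented for the illustration and do not correspond to the real recordings.

```python
import numpy as np

rng = np.random.default_rng(0)

# three fake participants with different numbers of trials,
# each array shaped (n_epochs, n_channels, n_times)
fake_participants = [rng.normal(size=(n_trials, 4, 50)) for n_trials in (12, 9, 15)]

# move the channel axis last: (n_epochs, n_times, n_channels)
transposed = [np.transpose(data, (0, 2, 1)) for data in fake_participants]

# stack all trials along the first axis
X = np.concatenate(transposed, axis=0)
print(X.shape)  # (36, 50, 4)
```

After concatenation every trial is one observation, regardless of which participant it came from.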

In [None]:
f_obs, clusters, cluster_p_values, H0 = mne.stats.spatio_temporal_cluster_test(
    [X_correct, X_incorrect], # the data for the two conditions
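    n_permutations=100, # the number of permutations (the steps above recommend 10,000)
    threshold=dict(start=.5, step=.2), # a dict with start and step makes MNE use threshold-free cluster enhancement (TFCE) instead of one fixed threshold
    n_jobs=1, # number of parallel jobs (1 means no parallelization)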
    adjacency=adjacency, # the adjacency matrix   
)


# ADD PLOTTING OF THE CLUSTERS
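
Before plotting, it helps to pick out the clusters whose p-value is below the chosen alpha level. The sketch below assumes that each cluster is a tuple of two index arrays (time indices, channel indices), which to my understanding is what the function returns by default; please check this against the `mne-python` documentation for your version. `significant_mask` is a hypothetical helper, and the toy clusters and p-values stand in for the real output.

```python
import numpy as np


def significant_mask(clusters, cluster_p_values, shape, alpha=0.05):
    """Boolean array of shape (n_times, n_channels) that is True wherever
    a cluster with p < alpha is located."""
    mask = np.zeros(shape, dtype=bool)
    for (time_inds, channel_inds), p in zip(clusters, cluster_p_values):
        if p < alpha:
            mask[time_inds, channel_inds] = True
    return mask


# toy stand-ins for the output of the cluster test
toy_clusters = [
    (np.array([2, 3, 3]), np.array([0, 0, 1])),  # time indices, channel indices
    (np.array([7]), np.array([2])),
]
toy_p_values = np.array([0.01, 0.40])

mask = significant_mask(toy_clusters, toy_p_values, shape=(10, 3))
print(mask.sum())  # number of significant time-channel points
```

With the real output, the call would presumably be `significant_mask(clusters, cluster_p_values, f_obs.shape)`. The resulting mask can then be used to highlight the significant time points and channels in a plot.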