# EEG Preprocessing using independent component analysis (ICA)
The code in this notebook uses ICA to remove artifacts, and epochs the EEG data both to the time at which the stimulus was presented and to the time at which the response was given.

ICA can be used for artefact detection, since it identifies separate components of the signal that were mixed together during recording. This means that noise components, such as eye blinks, can be separated from the rest of the signal and excluded.
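
The idea can be illustrated with a toy linear mixing model. This is only a sketch: the two sources and the mixing matrix below are made up, and the true unmixing matrix is used directly, whereas ICA has to estimate it from the recorded channels.

```python
import numpy as np

n_samples = 1000

# Two hypothetical sources: a smooth "brain" oscillation and a short "blink" artefact
brain = np.sin(2 * np.pi * 10 * np.arange(n_samples) / 250)
blink = np.zeros(n_samples)
blink[200:220] = 5.0
sources = np.vstack([brain, blink])          # shape (n_sources, n_samples)

# Each channel records a different weighted sum of the sources
mixing = np.array([[1.0, 0.8],
                   [0.9, 0.1],
                   [0.7, 0.3]])              # shape (n_channels, n_sources)
channels = mixing @ sources

# ICA would estimate this unmixing matrix from the data; here the true one is used
unmixing = np.linalg.pinv(mixing)
components = unmixing @ channels

# Zero out the blink component and project back to channel space
components[1] = 0.0
cleaned = mixing @ components
```

After the projection, `cleaned` contains only the contribution of the oscillation to each channel, which is what excluding a component amounts to.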


**Links:** https://arnauddelorme.com/ica_for_dummies/


## Loading modules & data

In [None]:
# importing modules
import numpy as np
import mne
import pandas as pd
import helper_functions as hf
#! pip install mne
#! pip install scikit-learn

In [None]:
raw = mne.io.read_raw_brainvision('Stroop_mouse_EEG_data/EEG/Group7_own.vhdr', eog=('EOG1', 'EOG2'), preload = True)

### Removing EEG data which was recorded before and after the experiment

In [None]:
raw.crop(tmin=0.0, tmax=410, include_tmax=True)

### Specifying the channel locations using the montage-related functions

In [None]:
montage = mne.channels.make_standard_montage('standard_1020') 
raw.set_montage(montage, verbose=False)

## Redefine the reference to a common average

In [None]:
raw.set_eeg_reference('average', projection=False, verbose=False)
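
Conceptually, a common average reference subtracts the mean over all EEG channels from every channel at each time point. A minimal NumPy sketch on a made-up array (not the notebook's data) could look like this:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(4, 1000))            # hypothetical (n_channels, n_samples) array

# Subtract the across-channel mean from every channel, sample by sample
data_car = data - data.mean(axis=0, keepdims=True)
```

After re-referencing, the channels sum to zero at every sample.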

## Preparing data for ICA
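Before the ICA is fitted, a copy of the data is high-pass filtered at 1 Hz, since slow drifts can degrade the decomposition. The 0.1 Hz high-pass and 40 Hz low-pass filters are applied to the original data after the ICA has been applied.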

In [None]:
# marking two noisy channels as bad, so they are left out of the ICA fit
raw.info['bads'] = ['Fp1', 'Fp2']

In [None]:
filt_raw = raw.copy().filter(l_freq=1., h_freq=None)

## Setting up and fitting the ICA
The ICA is fitted with a maximum of 800 iterations and a random seed of 97. Setting `n_components=0.95` keeps the smallest number of components that together explain at least 95% of the variance in the data.

In [None]:
ica = mne.preprocessing.ICA(n_components=0.95, random_state=97, max_iter=800)
ica.fit(filt_raw)
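
How a variance threshold translates into a number of components can be sketched with a plain PCA in NumPy. The data and the variable names below are invented for illustration, and the selection rule is my understanding of what a float `n_components` does rather than a copy of MNE's code.

```python
import numpy as np

rng = np.random.default_rng(97)

# Hypothetical data: 10 channels whose variance falls off steeply
scales = np.array([10, 6, 3, 1, 0.5, 0.3, 0.2, 0.1, 0.1, 0.1])
data = rng.normal(size=(5000, 10)) * scales  # shape (n_samples, n_channels)

# Eigenvalues of the covariance matrix are the variances of the principal components
eigvals = np.linalg.eigvalsh(np.cov(data, rowvar=False))[::-1]   # sorted descending
cumulative = np.cumsum(eigvals) / eigvals.sum()

# Smallest number of components whose cumulative explained variance exceeds 95%
n_keep = int(np.argmax(cumulative > 0.95)) + 1
```

With real EEG the number of retained components depends on the recording, so it is worth checking `ica.n_components_` after fitting.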

## Plotting of ICA
### Plotting of components

In [None]:
ica.plot_components();

### Plotting of the time series of the ICA components that are assumed to be noise

In [None]:
ica.plot_sources(raw, picks = [0,1], show_scrollbars=False, start = 40, stop = 45);


## Exclusion of components
The blinks can be seen very clearly in ICA001, especially in the time series plot, so this component is removed.
Furthermore, ICA000 seems to capture the effects of eye movements. In the time series plot, saccades show up as discontinuities surrounded by relatively stationary stretches. The scalp topography reinforces this, as it indicates that the source lies near the eyes.


In [None]:
ica.exclude = [0,1]
ica.plot_properties(raw, picks=ica.exclude, dB = False, verbose = False)

## Applying ICA to the data

In [None]:
ica.apply(raw)

# High and low pass filtering after applying the ICA
raw = raw.filter(0.1, None)
raw = raw.filter(None, 40)

In [None]:
# plotting the data after ICA
raw.plot(n_channels = 33, scalings = {'eeg': 50e-6}, duration = 10, start = 90);

### Epoching the data
Using the `events_from_annotations` function an array of the events is extracted. The first column is the time stamp in samples, and the third column contains the event id.

In [None]:
events, _  = mne.events_from_annotations(raw)
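
The layout of such an events array can be explored with a small made-up example. The sample values and the sampling rate below are assumptions for illustration only.

```python
import numpy as np

# Hypothetical events array in the layout MNE uses: [sample, previous value, event id]
events_demo = np.array([[1000, 0, 11],
                        [1500, 0, 12],
                        [4000, 0, 31],
                        [4600, 0, 32],
                        [7000, 0, 11]])

sfreq = 1000.0                               # assumed sampling rate in Hz
onsets_s = events_demo[:, 0] / sfreq         # first column converted to seconds

# Count how often each event id occurs in the third column
ids, counts = np.unique(events_demo[:, 2], return_counts=True)
trigger_counts = {int(i): int(c) for i, c in zip(ids, counts)}
```

Counting the triggers per id like this is a quick sanity check that the number of events matches the number of trials in each condition.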

#### Deleting triggers from practise trials and incorrect trials

Incorrect trials: 22, 48, 86, 145, 153

In [None]:
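incorrect_trials = [22, 48, 86, 145, 153]
# row of the image trigger of each incorrect trial in the events array
incorrectinds = [(i*2+17) for i in incorrect_trials]
# row of the word trigger that follows each of those image triggers
incorrectinds.extend([i + 1 for i in incorrectinds])
# rows 1-18 hold the triggers from the practise trials
practisetrials = list(range(1, 19))
remove = practisetrials + incorrectinds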

In [None]:
events = np.delete(events, remove, 0)
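
The index arithmetic above can be made explicit with a small helper. The function name and its defaults are hypothetical; it assumes that row 0 is the recording-start trigger, rows 1-18 belong to the practise trials, and every experimental trial contributes exactly two triggers.

```python
def trial_to_event_rows(trial, n_leading=19, triggers_per_trial=2):
    """Return the two event rows that belong to a 1-based experimental trial.

    Assumes n_leading rows (recording start plus practise triggers) precede
    the first experimental trial, and each trial adds two consecutive rows.
    """
    first = n_leading + (trial - 1) * triggers_per_trial
    return [first, first + 1]


incorrect_demo = [22, 48, 86, 145, 153]
rows = [row for trial in incorrect_demo for row in trial_to_event_rows(trial)]
```

Under these assumptions the helper reproduces the `i*2+17` and `i*2+18` rows used in the cell above.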

#### Adding events
This is done by importing some data extracted from the mousetracking data in R using the `mousetrap` package. 


TODO:
* **Find out which variable to use**; the current one does not seem to be the right one.

In [None]:
# Loading in csv with info
mouse_df_inc = pd.read_csv('Stroop_mouse_EEG_data/additional_triggers_incongruent.csv')
mouse_df_neu = pd.read_csv('Stroop_mouse_EEG_data/additional_triggers_neutral.csv')
mouse_df_con = pd.read_csv('Stroop_mouse_EEG_data/additional_triggers_congruent.csv')

##### Aligning timing from behavioural data with EEG


In [None]:
# Determining timing for display of first image in EEG (measured in samples)
sample_time_first_image = events[1][0]          # index 1, since the first trigger marks the start of the recording

# Sampling rate
sampling_rate_eeg = 1000/1                      # 1000 Hz

# Timing for display of first images in mousetracking (measured in seconds, first trial is in the incongruent dataframe)
mouse_first_img_display = mouse_df_inc.iloc[0,3]

# Adding trigger for max MAD in incongruent condition
events = hf.add_trigger(mouse_df_inc, 
                        trigger = 33,
                        array = events, 
                        first_img_display = mouse_first_img_display, 
                        sr_eeg = sampling_rate_eeg, 
                        st_first_image = sample_time_first_image, 
                        columnname = 'MAD_time')
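
The alignment itself boils down to expressing a behavioural timestamp relative to the first image and converting that offset to samples. The function below is a hypothetical stand-in written for illustration, not the code in `helper_functions`, and it assumes that both timestamps are given in seconds.

```python
import numpy as np


def behavioural_time_to_sample(t_event, t_first_image, sample_first_image, sfreq):
    """Map behavioural timestamps (in seconds) onto the EEG sample axis.

    Hypothetical helper: the offset to the first image is converted to samples
    and added to the sample at which the first image trigger was recorded.
    """
    offset_s = np.asarray(t_event, dtype=float) - t_first_image
    return sample_first_image + np.round(offset_s * sfreq).astype(int)


samples = behavioural_time_to_sample([12.5, 20.25], t_first_image=10.0,
                                     sample_first_image=35000, sfreq=1000.0)

# New rows in the events layout [sample, previous value, event id], using trigger 33
new_rows = np.column_stack([samples,
                            np.zeros_like(samples),
                            np.full_like(samples, 33)])
```

If the behavioural column is stored in milliseconds instead, it would need to be divided by 1000 before the conversion.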

The events array now also contains these additional triggers.

In [None]:
# Determining all the unique triggers
np.unique(events[:,2])

In [None]:
# Creating a dictionary with event ids
event_id = {'Image/cNeu': 11, # Image trigger neutral condition 
            'Image/cCon': 21, # Image trigger congruent condition  
            'Image/cInc': 31, # Image trigger incongruent condition 
            'Word/cNeu': 12, # Word trigger neutral condition
            'Word/cCon': 22, # Word trigger congruent condition
            'Word/cInc': 32, # Word trigger incongruent condition
            'Max_MAD/cInc': 33 # Max MAD trigger, incongruent condition
} 

In [None]:
mne.viz.plot_events(events, first_samp=raw.first_samp, event_id=event_id);

**Note:** Determine sensible time window

In [None]:
# establishing time window
tmin, tmax = -0.2, 0.5

In [None]:
# rejecting all epochs with a peak-to-peak amplitude exceeding 150 microvolts - cannot be brain data
reject = {'eeg': 150e-6}
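
The rejection criterion is based on the peak-to-peak amplitude within each epoch: an epoch is dropped as soon as one channel exceeds the threshold. A sketch of that rule on a made-up array (the shapes and the injected artefact are assumptions) could look like this:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical epochs array in volts: (n_epochs, n_channels, n_times)
epochs_data = rng.normal(scale=10e-6, size=(5, 3, 200))
epochs_data[2, 1, 50] += 400e-6              # inject a large artefact into epoch 2

# Peak-to-peak amplitude for every epoch and channel
ptp = epochs_data.max(axis=2) - epochs_data.min(axis=2)

# Keep an epoch only if all of its channels stay below the threshold
threshold = 150e-6
keep = (ptp < threshold).all(axis=1)
clean = epochs_data[keep]
```

In MNE, `epochs.drop_log` records which channels caused each epoch to be dropped.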

In [None]:
# choosing only EEG channels for epoching
picks = mne.pick_types(raw.info, eeg=True, eog=False)

In [None]:
# creating the epochs using the variables created in the cells above, and timelocking to the events
# baseline time interval spans from beginning of the data (-0.2 s) to 0 s (stimulus onset)
# we use the reject variable we created earlier in order to remove artefacts
epochs = mne.Epochs(raw, events, event_id, tmin, tmax, picks=picks, baseline=(None, 0), reject=reject, preload=True, verbose = False)


# downsampling to 250 Hz
epochs = epochs.resample(250)
epochs.save("Stroop_mouse_EEG_data/epochs/epochs_epo.fif", overwrite=True)
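
Baseline correction with `baseline=(None, 0)` subtracts, for every epoch and channel, the mean over the interval from the start of the epoch to time 0. The NumPy sketch below shows that operation on a single made-up epoch; the offset of 5 is arbitrary.

```python
import numpy as np

sfreq = 1000.0
times = np.arange(-200, 500) / sfreq         # epoch time axis from -0.2 s to just under 0.5 s

rng = np.random.default_rng(0)
epoch = rng.normal(size=(3, times.size)) + 5.0   # hypothetical (n_channels, n_times) with an offset

# Mean of each channel over the pre-stimulus interval, subtracted from the whole epoch
baseline_mask = times <= 0
epoch_bc = epoch - epoch[:, baseline_mask].mean(axis=1, keepdims=True)
```

After the correction, each channel has a mean of zero over the baseline interval, so post-stimulus deflections are expressed relative to the pre-stimulus level.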

### Plot epochs sorted by reaction times
In order to plot the epochs sorted by reaction time, we need to provide the `plot_epochs_image` function with the overlay time and order. 

#### Overlay time and order

**Overlay time:**
Times (in seconds) at which to draw a line on the corresponding row of the image (e.g., a reaction time associated with each epoch). Note that overlay_times should be ordered to correspond with the Epochs object (i.e., overlay_times[0] corresponds to epochs[0], etc).

**Order:**
Order is used to reorder the epochs along the y-axis of the image. If it is an array of int, its length should match the number of good epochs. If it is a callable it should accept two positional parameters (times and data, where data.shape == (len(good_epochs), len(times))) and return an array of indices that will sort data along its first axis.

In [None]:
# reading in data frame with reaction times
data_mouse = pd.read_csv('Stroop_mouse_EEG_data/behavioural/trial_info.csv')

# extracting reaction times
overlay_times_mouse = data_mouse['rt']

# getting the order (argsort returns the indices which sort the data)
order = np.argsort(overlay_times_mouse)
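
What `argsort` returns is easiest to see on a handful of made-up reaction times:

```python
import numpy as np

rt_demo = np.array([0.92, 0.55, 1.30, 0.71])    # hypothetical reaction times in seconds

order_demo = np.argsort(rt_demo)                 # epoch indices from fastest to slowest
sorted_rt = rt_demo[order_demo]                  # reaction times in plotting order
```

The overlay times stay in the original epoch order; only `order` tells the plotting function how to rearrange the rows.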

In [None]:
# getting events
events1, _  = mne.events_from_annotations(raw)

# deleting the 18 first events which are practise trials
events1 = np.delete(events1, np.arange(1, 19), 0)


# for loop to delete events that are not images being shown (word triggers, and the trigger that starts the experiment)
delete_index = []

# looping over the events
for i in range(len(events1)):
    trigger = events1[i][2]

    if trigger in [12, 22, 32, 99999]:
        # appending the index to the delete_index list
        delete_index.append(i)


# delete the events not wanted
events1 = np.delete(events1, delete_index, axis=0)
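
The same selection can also be written without an explicit loop by building a boolean mask over the third column. This is an equivalent sketch on a made-up events array, not a change to the notebook's logic.

```python
import numpy as np

# Hypothetical events array: [sample, previous value, event id]
events_demo = np.array([[0, 0, 99999],
                        [1000, 0, 11],
                        [1500, 0, 12],
                        [4000, 0, 31],
                        [4600, 0, 32]])

unwanted = [12, 22, 32, 99999]

# True for rows whose event id is one of the unwanted triggers
is_unwanted = np.isin(events_demo[:, 2], unwanted)
events_kept = events_demo[~is_unwanted]
```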

In [None]:
# dictionary containing information of the events
event_id1 = {'Image/cNeu': 11, # Image trigger neutral condition 
            'Image/cCon': 21, # Image trigger congruent condition  
            'Image/cInc': 31  # Image trigger incongruent condition 
}

# tmin and tmax
tmin_rt, tmax_rt = -0.2, 1.75

# choosing only EEG channels for epoching
picks1 = mne.pick_types(raw.info, eeg=True, eog=False)

# creating epochs
epochs_overlay = mne.Epochs(raw, events1, event_id1, tmin_rt, tmax_rt, picks = picks1, baseline=(None, 0), reject=None, preload=True, verbose = True)
epochs_overlay = epochs_overlay.resample(250)

In [None]:
# Plotting select channels
fig = mne.viz.plot_epochs_image(epochs_overlay, order=order, overlay_times=overlay_times_mouse, group_by={'FC5, FC1, C3' : [5, 6, 10]}, combine = 'mean')#, vmin = -30, vmax = 30)
# setting the figure size (in inches) and resolution before saving
fig[0].set_size_inches(20, 10)
fig[0].set_dpi(300)
fig[0].savefig('figures/sorted_rt.png')

## Epoching with RT

In [None]:
# reading in data frame with reaction times
data_mouse = pd.read_csv('Stroop_mouse_EEG_data/behavioural/trial_info.csv')

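# deleting incorrect trials (trial numbers are 1-based, data frame rows are 0-based)
incorrect_trials = [22, 48, 86, 145, 153] 
# drop returns a new data frame, so the result is assigned back; the index is reset so that rt[i] matches events[i]
data_mouse = data_mouse.drop(index=[trial - 1 for trial in incorrect_trials]).reset_index(drop=True)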

rt = data_mouse['rt']


# Sampling rate
sampling_rate_eeg = 1000/1                      # 1000 Hz


# getting events
events, _  = mne.events_from_annotations(raw)

# deleting the 18 first events which are practise trials and incorrect trials
incorrectinds = [(i*2+17) for i in incorrect_trials]
incorrectinds.extend([i+1 for i in incorrectinds])

practisetrials = list(range(1, 19))

remove =  practisetrials + incorrectinds

events = np.delete(events, remove, 0)



# for loop to delete events that are not images being shown (word triggers, and the trigger that starts the experiment)
delete_index = []

# looping over the events
for i in range(len(events)):
    trigger = events[i][2]

    if trigger in [12, 22, 32, 99999]:
        # appending the index to the delete_index list
        delete_index.append(i)


# delete the events not wanted
events = np.delete(events, delete_index, axis=0)

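# moving each image trigger forward by the reaction time of that trial, so that the events mark the response
for i in range(len(events)):
    sample_time_img_trigger = events[i][0]
    additional_samples = rt[i] * sampling_rate_eeg

    events[i][0] = sample_time_img_trigger + additional_samples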


# dictionary containing information of the events
event_id = {'cNeu': 11, # Image trigger neutral condition 
            'cCon': 21, # Image trigger congruent condition  
            'cInc': 31, # Image trigger incongruent condition 
}

# tmin and tmax
tmin_tf, tmax_tf = -0.7, 0.7

# baseline
baseline = (0.3, 0.5)

picks = mne.pick_types(raw.info, eeg=True, eog=False)

# creating epochs
epochs_tf = mne.Epochs(raw, events, event_id, tmin_tf, tmax_tf, picks = picks, baseline=baseline, reject=None, preload=True, verbose = True)
epochs_tf = epochs_tf.resample(250)

epochs_tf.save("Stroop_mouse_EEG_data/epochs/epochs_RT_epo.fif", overwrite=True)
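
The shift from stimulus-locked to response-locked events in the cell above can be sketched in vectorised form. The events and reaction times below are made up, and the reaction times are assumed to be in seconds. Rounding is done explicitly, because assigning a float into an integer array truncates it.

```python
import numpy as np

sfreq = 1000.0                                   # assumed sampling rate in Hz

# Hypothetical image events and the reaction time of each trial in seconds
events_demo = np.array([[1000, 0, 11],
                        [4000, 0, 31],
                        [7000, 0, 21]])
rt_demo = np.array([0.812, 1.2, 0.6504])

# Move every image trigger forward by its reaction time, rounded to whole samples
response_events = events_demo.copy()
response_events[:, 0] += np.round(rt_demo * sfreq).astype(int)
```

This only works if the reaction times and the events are in the same trial order and have the same length, so it is worth checking that `len(rt)` equals `len(events)` before shifting.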