<a href="https://colab.research.google.com/github/myielin/EEG_analysis_toolkit/blob/main/preprocessing.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Preprocessing

A standard pipeline: set a generic electrode montage, add annotations from a CSV file, filter the signal, and exclude noisy components via ICA.

To use this notebook, download it via File -> Download (upper left corner).

### 1.
Installs and imports

In [None]:
!pip install mne
!pip install mne-icalabel

In [None]:
import mne
import warnings
warnings.filterwarnings("ignore")

import numpy as np
from pandas import read_csv
from sklearn.preprocessing import OneHotEncoder as ohe
from numpy.lib.stride_tricks import sliding_window_view
import matplotlib.pyplot as plt
import mne_icalabel


### 2.
Define functions

In [None]:
def set_stnd_mon(data, mon='standard_1020'):
  # build the montage object and attach it to the recording (modifies data in place)
  montage = mne.channels.make_standard_montage(mon)
  data.set_montage(montage)
  return data

In [None]:
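def create_csv_annotations(fname, dur=2, desc="mark"):
  # timestamps are strings in the pattern mins:secs:ms, e.g. "01:04:00"
  # note: read_csv treats the first row of the file as a header by default
  anns_csv = read_csv(fname).values.reshape(-1)
  # onset in seconds = secs + mins*60 (the ms field is ignored)
  ann_arr = np.array([int(i[3:5]) + int(i[:2])*60 for i in anns_csv])

  return mne.Annotations(ann_arr, dur, desc)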


In [None]:
def preprocess(data, lfreq=1, hfreq=80, notch=60, ica=True):
  # band-pass and notch filters modify the Raw object in place
  data = data.filter(l_freq=lfreq, h_freq=hfreq, picks='all')
  data = data.notch_filter(notch, picks='all')
  if ica:
    c = mne.preprocessing.ICA(n_components=len(data.ch_names), max_iter="auto", method="infomax", fit_params=dict(extended=True))
    c.fit(data, picks='all')

    components = mne_icalabel.label_components(data, c, method='iclabel')
    print(f"\n\nICA components: {components['labels']}\nCertainty: {components['y_pred_proba']}")
    # a component is removed only if it is classified as a non-brain source
    # (including the 'other' class) with more than 90% probability
    artifact_indexes = [i for i in range(c.n_components_) if components['y_pred_proba'][i] > 0.9 and components['labels'][i] != 'brain']
    c.exclude = artifact_indexes
    # setting exclude alone does not change the data; apply() removes the excluded components
    c.apply(data)
    print("\nRemoved artifacts: ", artifact_indexes)

  return data
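
The removal rule used in preprocess can be looked at in isolation. The block below is a minimal sketch, assuming inputs shaped like the `labels` and `y_pred_proba` entries printed by the function above. The helper name `select_artifacts` and the sample values are hypothetical and are not real ICLabel output.

```python
def select_artifacts(labels, probas, threshold=0.9):
    """Return indexes of components labeled non-brain with probability above threshold."""
    return [
        i
        for i, (label, proba) in enumerate(zip(labels, probas))
        if label != "brain" and proba > threshold
    ]


# hypothetical values for a 4-component decomposition
labels = ["brain", "eye blink", "muscle artifact", "other"]
probas = [0.97, 0.95, 0.62, 0.91]

print(select_artifacts(labels, probas))  # [1, 3]
```

Under this rule a component labeled 'other' is also removed when its probability exceeds the threshold, while a likely artifact with a lower probability (index 2 here) is kept.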


### 3.
Execute the appropriate functions

First, set the data path (or use the standard data path from the repo) and the basic recording information. Note that ch_names must be compatible with the standard montage that will be used. Comment out the second line if the data already has its info set by default.

In [None]:
p = "/content/drive/MyDrive/tcc/data/"
info = mne.create_info(ch_names = ['P4', 'P3', 'Fp2', 'Fp1'], sfreq = 200, ch_types='eeg')

Then, the data is loaded as an MNE object. In this example the raw data comes as a CSV file; if yours is of another file type, you can delete the first line and use mne.io.read_raw() in the second one. See more at https://mne.tools/stable/generated/mne.io.read_raw.html#mne.io.read_raw

The last line adds annotations from a CSV file in the pattern mins:secs:ms (so an annotation like 01:04:00 means an event occurred 1 min and 4 secs after the beginning of the recording). Refer to the default parameters in the function definition above, or remove this line if it doesn't apply to your context.
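
As a quick check of the timestamp pattern, the conversion can be sketched on its own. This is a minimal illustration, assuming strings shaped like the example above. The helper name `timestamp_to_seconds` is hypothetical, and dropping the ms field mirrors what create_csv_annotations does.

```python
def timestamp_to_seconds(stamp):
    """Convert a 'mins:secs:ms' string to whole seconds, ignoring the ms field."""
    mins, secs, _ms = stamp.strip().split(":")
    return int(mins) * 60 + int(secs)


print(timestamp_to_seconds("01:04:00"))  # 64
print(timestamp_to_seconds("12:30:50"))  # 750
```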


In [None]:
raw_train = read_csv(p+"sampleEEG2.txt")[[' EXG Channel 0', ' EXG Channel 1', ' EXG Channel 2', ' EXG Channel 3']].values.T
mne_train = mne.io.RawArray(raw_train, info)

annotated = mne_train.set_annotations(create_csv_annotations(p+"sampleEVS2.csv", 1))

Creating RawArray with float64 data, n_channels=4, n_times=170020
    Range : 0 ... 170019 =      0.000 ...   850.095 secs
Ready.


For the preprocessing pipeline, first the standard electrode scheme is set. Remove this line if the data already has a montage. Then, the preprocess function applies a band-pass filter (high-pass and low-pass) and a notch filter, followed by removal of noisy components. Refer to the default parameters in the function definition above.
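
Before filtering, it can help to confirm that the cutoffs fit the sampling rate, since a digital filter cannot use a cutoff at or above the Nyquist frequency (half of sfreq). The block below is a small sketch of such a check, assuming the default values from preprocess. The helper name `check_filter_params` is hypothetical and not part of MNE.

```python
def check_filter_params(sfreq, lfreq=1, hfreq=80, notch=60):
    """Return a list of problems with the filter settings (empty if none)."""
    nyquist = sfreq / 2
    problems = []
    if not 0 < lfreq < hfreq:
        problems.append("lfreq must be positive and below hfreq")
    if hfreq >= nyquist:
        problems.append(f"hfreq must be below the Nyquist frequency ({nyquist} Hz)")
    if notch >= nyquist:
        problems.append(f"notch must be below the Nyquist frequency ({nyquist} Hz)")
    return problems


print(check_filter_params(sfreq=200))            # []
print(check_filter_params(sfreq=128, hfreq=80))  # reports the hfreq problem
```

With the 200 Hz recording used in this notebook the defaults pass; a recording sampled at 128 Hz would need a lower hfreq.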

**Standard montage list:** These are the standard electrode positioning schemes from MNE. You can choose one and pass it as the mon argument to the set_stnd_mon function. If no argument is passed, the 10-20 scheme is used.

`
'standard_1005', 'standard_1020', 'standard_alphabetic', 'standard_postfixed', 'standard_prefixed', 'standard_primed',
'biosemi16', 'biosemi32', 'biosemi64', 'biosemi128', 'biosemi160', 'biosemi256',
'easycap-M1', 'easycap-M10', 'easycap-M43',
'EGI_256',
'GSN-HydroCel-32', 'GSN-HydroCel-64_1.0', 'GSN-HydroCel-65_1.0', 'GSN-HydroCel-128', 'GSN-HydroCel-129', 'GSN-HydroCel-256', 'GSN-HydroCel-257',
'mgh60', 'mgh70',
'artinis-octamon', 'artinis-brite23',
'brainproducts-RNP-BA-128'
`

In [None]:
train_processed = set_stnd_mon(mne_train, mon='standard_1020')
train_processed = preprocess(train_processed)

Filtering raw data in 1 contiguous segment
Setting up band-pass filter from 1 - 80 Hz

FIR filter parameters
---------------------
Designing a one-pass, zero-phase, non-causal bandpass filter:
- Windowed time-domain design (firwin) method
- Hamming window with 0.0194 passband ripple and 53 dB stopband attenuation
- Lower passband edge: 1.00
- Lower transition bandwidth: 1.00 Hz (-6 dB cutoff frequency: 0.50 Hz)
- Upper passband edge: 80.00 Hz
- Upper transition bandwidth: 20.00 Hz (-6 dB cutoff frequency: 90.00 Hz)
- Filter length: 661 samples (3.305 s)

Filtering raw data in 1 contiguous segment
Setting up band-stop filter from 59 - 61 Hz

FIR filter parameters
---------------------
Designing a one-pass, zero-phase, non-causal bandstop filter:
- Windowed time-domain design (firwin) method
- Hamming window with 0.0194 passband ripple and 53 dB stopband attenuation
- Lower passband edge: 59.35
- Lower transition bandwidth: 0.50 Hz (-6 dB cutoff frequency: 59.10 Hz)
- Upper passband edge

Finally, you can save the preprocessed data for use in later analyses. The file is written to the same path used for loading the data; to save elsewhere, update the value of the variable p.

In [None]:
train_processed.save(fname=p+"train1.fif", overwrite=True)

Writing /content/drive/MyDrive/tcc/data/train1.fif
Closing /content/drive/MyDrive/tcc/data/train1.fif
[done]


[PosixPath('/content/drive/MyDrive/tcc/data/train1.fif')]