
Discussion of correctness of ICLabel-python vs EEGLab #5

Closed
adam2392 opened this issue Mar 4, 2022 · 14 comments
Comments

@adam2392
Member

adam2392 commented Mar 4, 2022

@mscheltienne reposting here as an issue to begin discussion again.

Quick update: our intern should have started on the 1st of March, but he has not yet received his work permit (mandatory in Switzerland), so his start has been delayed and we hope he will be on board next week.


I finally took the time to test it! It looks very promising, but it seems to favor 'Other' a lot.
Am I correct in assuming that the output classes are in this order:

classes = ['Brain', 'Muscle', 'Eye', 'Heart', 'Line Noise', 'Channel Noise', 'Other']

I ran it on this file quickly:

from matplotlib import pyplot as plt
import mne
import numpy as np

from mne_icalabel.ica_label import ica_eeg_features
from mne_icalabel.ica_net import run_iclabel

#%% Load
raw = mne.io.read_raw_fif('ica-test-raw.fif', preload=True)

#%% ICA
ica = mne.preprocessing.ICA(method='picard', max_iter='auto')
ica.fit(raw, picks='eeg')

#%% Features
features = ica_eeg_features(raw, ica)
topo = features[0].astype(np.float32)
psds = features[1].astype(np.float32)
autocorr = features[2].astype(np.float32)
labels = run_iclabel(topo, psds, autocorr)

classes = ['Brain', 'Muscle', 'Eye', 'Heart', 'Line Noise', 'Channel Noise', 
           'Other']

#%% Plot
f, axes = plt.subplots(1, 7, sharex=True, sharey=True, figsize=(20, 5))
for k, ax in enumerate(axes):
    ax.bar(np.arange(0, labels.shape[0]), labels[:, k])
    ax.set_title(classes[k])

for i, label in enumerate(labels):
    k = np.argmax(label)
    axes[k].bar(i, label[k], color='crimson')
    
f.tight_layout()

#%% ICA plots
ica.plot_components()
ica.plot_sources(raw)

[Screenshot 2022-03-04: bar plots of the predicted class probabilities per component, one panel per class, with each component's winning class highlighted]

Eye and heartbeat are correct. The last time we talked, you told me it failed at classifying even simple blinks, did you find what caused this?

Originally posted by @mscheltienne in #4 (comment)

@adam2392
Member Author

adam2392 commented Mar 4, 2022

@mscheltienne reposting here as an issue to begin discussion again.
Quick update: our intern should have started on the 1st of March, but he has not yet received his work permit (mandatory in Switzerland), so his start has been delayed and we hope he will be on board next week.

Okay let us know when they are on board and we can schedule another time to chat.

I finally took the time to test it! It looks very promising, but it seems to favor 'Other' a lot. Am I correct in assuming that the output classes are in this order:

classes = ['Brain', 'Muscle', 'Eye', 'Heart', 'Line Noise', 'Channel Noise', 'Other']

Yes this order is correct.

So last time @jacobf18 and I spoke, we identified this issue as well. There are three main components to the ICLabel pipeline:

  • generate features from data (features)
  • format the features to be fed into the network (formatting)
  • actually run the data through the network (network)

According to Jacob, he verified that the network weights match those of EEGLab by running through various examples and confirming that the outputs match numerically. However, a unit test that verifies this would be nice.

He also manually tested the formatting part.

We unit-tested the feature generation on simulated data and on raw data loaded from EEGLab, and everything matches EEGLab numerically except for rpsd, which has a random component that makes it difficult to match with EEGLab.

I believe the differences in output can stem from any of these three parts, so we really need to be systematic in testing each of the three components of the pipeline against EEGLab in order to pinpoint exactly where the differences are.
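
For reference, here is a minimal sketch of what a per-stage test of the network could look like, assuming the MATLAB features and ICLabel outputs are exported to .mat files (the file and variable names below are hypothetical, not existing assets):

# Minimal sketch of a network-stage test; .mat file names and keys are hypothetical.
import numpy as np
from scipy.io import loadmat

from mne_icalabel.ica_net import run_iclabel


def test_network_matches_eeglab():
    # Features computed by EEGLab/MATLAB for the same recording
    feats = loadmat("iclabel_matlab_features.mat")
    topo = feats["topo"].astype(np.float32)
    psds = feats["psd"].astype(np.float32)
    autocorr = feats["autocorr"].astype(np.float32)

    # Class probabilities produced by the MATLAB implementation for those features
    expected = loadmat("iclabel_matlab_labels.mat")["labels"]

    labels = run_iclabel(topo, psds, autocorr)
    # Tolerances account for single-precision differences between the two backends
    np.testing.assert_allclose(labels, expected, rtol=1e-5, atol=1e-6)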

Eye and heartbeat are correct. The last time we talked, you told me it failed at classifying even simple blinks, did you find what caused this?

Yes that was an incorrect specification of the sampling rate, which I fixed.

@mscheltienne
Member

mscheltienne commented Mar 8, 2022

Alright, I think the first task for @anandsaini024 will be to run the network in Python and MATLAB, compare the outputs, and familiarize himself with the code. He can also work on the unit test to verify that the network weights match and to verify the formatting part (if you don't beat him to it 😉). As you said, we have to be very systematic in our testing approach and compare MATLAB and Python output at every stage. For rpsd, I wonder if there is a way to use the same random generator in both implementations, but let's put that one aside for now.

At the moment, he is still waiting for his work permit, but he will pick up his laptop and a MATLAB license this week or at the beginning of next week.

@adam2392
Member Author

adam2392 commented Mar 23, 2022

Notes 3/23/22: cc: @jacobf18

Short term goals (to aim to complete by summer)

Our short-term goal is to replicate all aspects of EEGLab's ICLabel as closely as possible. This would result in v0.1 of mne-icalabel, for which we could submit a short paper to JOSS to generate a DOI for this initial version.

It is still possible that the MATLAB and Python runs do not match exactly. Unit tests for the network weights, feature generation, and data pipeline would be enough to convince us that any remaining differences between the two implementations come solely from the randomness injected into the rpsd features.

  • Anand to work on a unit test of EEGLab's weights vs. the PyTorch weights that Jacob ported under assets/ (see the sketch after this list)

  • API design TODO: should be compatible with MNE-Python.
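
A minimal sketch of what that weight-comparison test could look like (the paths, key names, and layout of the ported weights are assumptions, not the actual assets/ contents):

# Minimal sketch of a weight-comparison test; paths and variable names are hypothetical.
import numpy as np
import torch
from scipy.io import loadmat


def test_ported_weights_match_eeglab():
    state_dict = torch.load("assets/iclabel_net.pt")  # ported PyTorch weights (assumed path)
    matlab = loadmat("assets/iclabel_matconvnet_weights.mat")  # weights exported from EEGLab (assumed path)
    for name, tensor in state_dict.items():
        expected = np.asarray(matlab[name], dtype=np.float32)
        np.testing.assert_allclose(
            tensor.detach().cpu().numpy(), expected.reshape(tuple(tensor.shape)), rtol=1e-6
        )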

Longer term goals

The longer-term goal is to substantially improve the ICLabel classifier. This involves two efforts: improving the model and improving the benchmarking. Any new model needs to be benchmarked, so the benchmark is the bulk of the effort; it needs to account for real-world variability and requires high-quality training/testing data. The result would be v0.2+ of mne-icalabel and could lead to a full-fledged study.

  • Future improvements of the network: propose other designs that are benchmarked on a diverse dataset. See point below.

  • Benchmarking: incorporate a suite of montages, recording systems, and electrode counts to determine the true "performance" of an ICLabel classifier; this is unclear from the original publication. @mscheltienne can provide 1 subject per system (up to, say, 20 subjects). Note that all these datasets would initially be "unlabeled".

Datasets available: NK (I think 1020), EGI, ...

Other "semi-labeled" datasets could come from Alexandre G. from https://swipe4ica.github.io/#/.


@adam2392
Member Author

adam2392 commented Apr 5, 2022

@agramfort we are thinking of leveraging https://swipe4ica.github.io/#/ to build out an independent and bigger dataset to improve the ICLabel ported over from EEGLab.

I was thinking of having a high-school student who is interested in working with me contribute to this.

  • How easy is it to get access to this dataset from the web app?
  • Is it ready for usage?

@mscheltienne, in addition, you mentioned you have some external data. How difficult would it be to convert that data to BIDS? Then we could have someone (i.e., this high-school student) run through all the data and label the IC components by hand. The main work on my/our end would be setting up an almost tutorial-like IPython notebook so he can understand and conceptualize what he's doing.

@agramfort
Member

agramfort commented Apr 5, 2022 via email

@mscheltienne
Member

All our datasets have to be converted to a common format anyway, so let's go with BIDS. For the ANT Neuro and EGI datasets I can provide, I don't think it will be too much work to convert them to BIDS and to save the ICA decomposition in a BIDS format.
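
A minimal sketch of what the raw-data part of that conversion could look like with mne-bids (the source file, line frequency, and BIDS entities below are placeholders; saving the ICA decomposition is a separate step):

# Minimal sketch of converting one EGI recording to BIDS with mne-bids;
# file names and BIDS entities are placeholders.
import mne
from mne_bids import BIDSPath, write_raw_bids

raw = mne.io.read_raw_egi("sub-01.mff", preload=True)
raw.info["line_freq"] = 50  # mne-bids requires the power line frequency to be set

bids_path = BIDSPath(subject="01", task="rest", datatype="eeg", root="bids_dataset")
# .mff is not a BIDS-compliant EEG format, so let mne-bids convert it to BrainVision
write_raw_bids(raw, bids_path, format="BrainVision", allow_preload=True, overwrite=True)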

@adam2392
Member Author

adam2392 commented Apr 6, 2022

Note for @adam2392: I need to push up the data files under tests/data and modify the .gitignore.

@adam2392
Member Author

adam2392 commented Apr 6, 2022

FYI: @mscheltienne and @anandsaini024 #11

@adam2392
Member Author

FYI, @jacobf18 and I talked briefly today. He found an issue in the autocorrelation function that he'll push up a PR for.

@adam2392
Member Author

Now that this is "correct", should we release v0.1 on PyPI?

@mscheltienne

@mscheltienne
Member

I guess we can. Just one question: do we want to be backward compatible and use a deprecation cycle from release 0.1, or can we keep some flexibility until a later release, e.g. when we move to mne-tools?

I'm thinking about the output of the main label_components function: currently it is an array of shape (n_components, n_classes), which is general and can be applied to any model, but it might not be very user friendly, since you still have to figure out which component belongs to which class. That said, I am not in favor of a single hard-coded method, e.g. a hard threshold, to attribute a component to a class either.

@adam2392
Member Author

Fair point. Let's finalize this API issue first before moving to v0.1.

Do you know where EEGLab documents the output order of the predicted class probabilities? I think the order I wrote in the examples is right, but it would be nice to double-check it and have a reference for it.

Re how to output:

  • We could perhaps structure it like a scikit-learn Transformer, since it just transforms the raw data
  • We could have a kwarg to return either the predicted probabilities or the argmax with string labels (see the sketch after this list)?
  • Other ideas?
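
For illustration, a minimal sketch of the argmax/string-label option on top of the current (n_components, n_classes) output (the helper name is hypothetical, not the actual API):

# Minimal sketch of mapping class probabilities to string labels; helper name is hypothetical.
import numpy as np

classes = ['Brain', 'Muscle', 'Eye', 'Heart', 'Line Noise', 'Channel Noise', 'Other']


def probabilities_to_labels(proba):
    """Return the most probable class name for each component."""
    return [classes[idx] for idx in np.argmax(proba, axis=1)]

# e.g. probabilities_to_labels(labels) with the array returned by run_iclabel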

@mscheltienne
Member

Does this count as documentation? It's probably where I found it the first time..
https://github.com/sccn/ICLabel/blob/e8abc99e0c371ff49eff115cf7955fafc7f7969a/iclabel.m#L60-L62

I'll have a look at the Transformer; I am not familiar with it.

@adam2392
Member Author

Does this count as documentation? It's probably where I found it the first time..

https://github.com/sccn/ICLabel/blob/e8abc99e0c371ff49eff115cf7955fafc7f7969a/iclabel.m#L60-L62

I'll have a look at the Transformer; I am not familiar with it.

Ah yes that counts :p
