MRG: Clean-up and add comparison/all implementations for feature extraction #18

mscheltienne · 2022-04-22T15:42:48Z

Seeing all the good progress made, I wanted to finalize the first big step of this project: a completely tested and similar port of the network and features in Python. In this PR:

I replaced ALL the test files with small tests files that were generated from the EEGLAB sample dataset. I also added the MATLAB scripts to generate those files and replicate all the results.
Since we now know that the network has been perfectly ported, I focus on the feature extractions. I intend to go over all 3 and their implementations, in the coming couple of days (finally got some free time to work on this!).
And a bit of clean-up / re-structuring

Progress:

Topo
PSD
Autocorr

For each, I replicated as close as possible the MATLAB outputs and added the corresponding tests to compare the data arrays. Contrary to your code @jacobf18 that I used, I did not change the shapes and instead matched the shapes from MATLAB to simplify testing and comparison.

And finally, the ICLabel feature extraction function differs depending on:

raw or epoch
If raw, how long? (3 behaviors for [0, 1] seconds for [1, 5] seconds and for above 5 seconds)

So I added test files for each scenario:

-epo for epochs
-raw for raws (above 5 secs)
-short-raw for raws (1 to 5 secs)
-very-short-raw for raws (0 to 1 sec)

Side note:

it is possible to match the random number generator between numpy and MATLAB in most cases. So we should be able to get good tests also for the PSD feature.
trials represents epochs: 1 -> Raw; above 1 -> Epochs

…w smaller test files

adam2392 · 2022-04-27T14:41:45Z

Whoot! Nice @mscheltienne if it matches, that's exciting news. This is a large diff, so I'll take some time soon to do an in-depth review.

@jacobf18 and @anandsaini024 should sanity check as well.

I have to think a bit about this re-referencing and its impact. If you have any suggestion, please comment :)

Hmm so my opinion is that if it's happening in EEGLab, it's a "bug" but solely on their end. We simply want to match the EEGLab's first pipeline run (before saving the data to disc). Basically, I think it's fine.

A quick question, so if we re-reference to average multiple times it shouldn't change right? Perhaps misunderstanding, but why does icawinv change then?

adam2392 · 2022-04-27T14:44:16Z

I will review tonight if possible.

Some spelling mistakes and missing items to add to the MANIFEST file from the CI.

mne_icalabel/features.py:106: Additionnaly ==> Additionally
[13](https://github.com/jacobf18/iclabel-python/runs/6195445912?check_suite_focus=true#step:5:13)
mne_icalabel/features.py:130: degress ==> degrees, digress
[14](https://github.com/jacobf18/iclabel-python/runs/6195445912?check_suite_focus=true#step:5:14)
make[1]: *** [Makefile:105: codespell-error] Error 65
[15](https://github.com/jacobf18/iclabel-python/runs/6195445912?check_suite_focus=true#step:5:15)
check-manifest --ignore .circleci*,doc,logo,.DS_Store
[16](https://github.com/jacobf18/iclabel-python/runs/6195445912?check_suite_focus=true#step:5:16)
lists of files in version control and sdist do not match!
[17](https://github.com/jacobf18/iclabel-python/runs/6195445912?check_suite_focus=true#step:5:17)
missing from sdist:
[18](https://github.com/jacobf18/iclabel-python/runs/6195445912?check_suite_focus=true#step:5:18)
  mne_icalabel/tests/data/network/netICL.txt
[19](https://github.com/jacobf18/iclabel-python/runs/6195445912?check_suite_focus=true#step:5:19)
suggested MANIFEST.in rules:
[20](https://github.com/jacobf18/iclabel-python/runs/6195445912?check_suite_focus=true#step:5:20)
  recursive-include mne_icalabel *.txt

Perhaps recursive-exclude mne_icalabel *.txt since I don't think we need the .txt files in the distribution.

mscheltienne · 2022-04-27T14:58:08Z

I should have pinged after writing this comment 😅

That covers it entirely. I think it's as close as it gets, and this test suit will be a good base for any further improvement. The code here does not focus on efficiency but focuses on reproducing the MATLAB output. Any improvement that does not fail the test will maintain equivalence with MATLAB.

For the common average reference, as said the feature extraction starts with:

% assuming chanlocs are correct
if ~strcmp(EEG.ref, 'averef')
    [~, EEG] = evalc('pop_reref(EEG, [], ''exclude'', setdiff(1:EEG.nbchan, EEG.icachansind));');
end

which is buggy for the following reasons:

'averef' is not the only value that seems to denote a CAR. For instance, the sample dataset's EEG.ref field is set to 'common'; and the EEG.ref field of a dataset outputted by pop_reref is set to 'average'. Issue related: Check if data is referenced to CAR sccn/ICLabel#28
pop_reref breaks the ICA solution. Even if you forget the I/O part (weird in itself), after calling pop_reref, the equality between EEG.icawinv and inv(EEG.icaweights * EEG.icasphere) is lost!

-> IMO, those are several bugs on the EEGLAB side, so let's just assume that we have a dataset that is referenced with a CAR. For all the tests, I've changed the field EEG.ref from 'common' to 'averef' to prevent the use of pop_reref.

Note that if you call pop_reref before running the ICA decomposition, pop_reref does not impact the empty ICA fields. Thus, after the ICA decomposition, the equality between EEG.icawinv and inv(EEG.icaweights * EEG.icasphere) is respected. BUT the ICA activation computed in MATLAB and in Python do not match well anymore. Instead of a 1e-5/6 tolerance, you drop to.. 0.1? 0.5? And I have no idea why. This could be caused either by EEGLAB, or by MNE in read_ica_eeglab or by the ICA activation computation in python. If someone wants to investigate, I can drop a code example.

The code here assumes that all the channels have been provided to the ICA. I did not implement the picks and channel selection (which was partially implemented in @jacobf18 code). Basically:

raw = ...
raw.info['bads'] = [...]
raw.pick_types(eeg=True, exclude='bads')
ica = mne.preprocessing.ICA(...)
ica.fit(raw)

-> Run ICLabel with this picked raw and this ICA that uses all the channels from raw.

It would be nice to test this on a couple of datasets where we also look at the component to make sure the output looks correct.

mscheltienne · 2022-04-27T15:02:54Z

For the MANIFEST.in, we can fix it in a follow-up. It needs more clean-up than just this .txt, and also do we include the data for the test files or not?

Let me know if the previous comment answers your question for the CAR issue.

mne_icalabel/tests/test_features.py

adam2392 · 2022-04-28T15:24:55Z

mne_icalabel/tests/test_features.py

+    assert len(set(list(subset_eeglab[0, :] - 1)).difference(set(list(subset)))) == 0
+
+
+def test_eeg_rpsd_compute_psdmed():


It seems we don't need this extra test given the below test_eeg_rpsd(?)

Fine by me, the only added information was that if only test_eeg_rpsd fails, then the problem lies within _eeg_rpsd_format. But that is not worth an additional test and can be debugged on the fly if test_eeg_rpsd ever fails.

mne_icalabel/tests/test_features.py

mne_icalabel/features.py

adam2392

Awesome refactor and sprint @mscheltienne !

It looks great to me. Just a few minor changes to maybe make things easier to maintain down the line.

Perhaps the next step is to tidy up the example with a bunch of different simulations and also tests on real sample data from MNE-Python.

Co-authored-by: Adam Li <adam2392@gmail.com>

mscheltienne · 2022-04-28T15:45:46Z

I am getting an error with make black following the last commit. I am not familiar with it and the traceback is.. lacking. Any idea?

adam2392 · 2022-04-28T15:47:58Z

I am getting an error with make black following the last commit. I am not familiar with it and the traceback is.. lacking. Any idea?

Sorry the make recipe only "checks" for formatting errors.

You'll need to run "black ."

I think I can fix that in another PR to make the dev exp easier.

mscheltienne · 2022-04-28T15:50:33Z

I am really exhausted today... why the hell am I typing make black... anyway done.

adam2392 · 2022-04-28T15:52:33Z

Sorry... just noticed that. I've previously set it up to just format it.

adam2392 · 2022-04-28T15:59:20Z

@jacobf18 and @anandsaini024 skim the changes to see if you spot any issues. Meanwhile ill merge this.

mscheltienne and others added 30 commits April 21, 2022 13:37

clean gitignore

d0dc772

remove large sample fif file

db1809b

add sample dataset processed in MATLAB from EEGLAB sample dataset

3f5ecf9

better

c0938d4

clean-up test_iclabel_net.py and remove associated data files, add ne…

72af249

…w smaller test files

remove unused imports

33ff72d

remove old

be1e130

fix wrong files

d46cd2d

add sha1 hash

eb49bd0

improve notes

d6fa064

improvements

245fd4a

simpler

d6e04c9

add retrieve_eeglab_icawinv

205eb02

add icaact.mat file

b9bacc8

add compute icaact

406f060

update docstring

0a00a4f

run black

5195960

clean-up data

2bc80c8

fix paths and add epoch support

7a561e9

fix typos

83a5427

better order

acd02c8

update test

09b33e9

add eeglab sample dataset and simplify test datasets description

3a81de4

regen files and fix paths

1305a8c

fix missed

acfa756

fix ica activation

b775a2a

add structure

8c0094f

add short raw to test autocorr

7cc7423

add autocorr for short raw

31bfb82

add test for very short raws (less than 1 sec)

75031ad

mscheltienne and others added 4 commits April 27, 2022 15:53

add formatted feature files

34e5497

add test for the entire pipeline

93cf051

add sha checksum

0de6140

run black

888a3e1

mscheltienne marked this pull request as ready for review April 27, 2022 14:39

mscheltienne requested a review from adam2392 April 27, 2022 14:40

fix spelling

5435f90

mscheltienne changed the title ~~WIP: Clean-up and add comparison/all implementations for feature extraction~~ MRG: Clean-up and add comparison/all implementations for feature extraction Apr 27, 2022

adam2392 reviewed Apr 28, 2022

View reviewed changes

mne_icalabel/tests/test_features.py Outdated Show resolved Hide resolved

adam2392 reviewed Apr 28, 2022

View reviewed changes

mne_icalabel/tests/test_features.py Outdated Show resolved Hide resolved

adam2392 reviewed Apr 28, 2022

View reviewed changes

mne_icalabel/features.py Outdated Show resolved Hide resolved

adam2392 approved these changes Apr 28, 2022

View reviewed changes

adam2392 mentioned this pull request Apr 28, 2022

Tidy Manifest #20

Closed

mscheltienne and others added 3 commits April 28, 2022 17:36

move _next_power_of_2 to utils

db969cb

remove intermediate psdmed test

ee70189

Update mne_icalabel/tests/test_features.py

2315f0e

Co-authored-by: Adam Li <adam2392@gmail.com>

adam2392 mentioned this pull request Apr 28, 2022

Discussion about the API for mne_icalabel and mne #19

Open

run black

2aa63ca

adam2392 merged commit c1eaedf into mne-tools:main Apr 28, 2022

mscheltienne deleted the clean_up branch April 28, 2022 16:33

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MRG: Clean-up and add comparison/all implementations for feature extraction #18

MRG: Clean-up and add comparison/all implementations for feature extraction #18

mscheltienne commented Apr 22, 2022 •

edited

Loading

adam2392 commented Apr 27, 2022

adam2392 commented Apr 27, 2022

mscheltienne commented Apr 27, 2022 •

edited

Loading

mscheltienne commented Apr 27, 2022 •

edited

Loading

adam2392 Apr 28, 2022

mscheltienne Apr 28, 2022

adam2392 left a comment •

edited

Loading

mscheltienne commented Apr 28, 2022

adam2392 commented Apr 28, 2022

mscheltienne commented Apr 28, 2022

adam2392 commented Apr 28, 2022

adam2392 commented Apr 28, 2022

		assert len(set(list(subset_eeglab[0, :] - 1)).difference(set(list(subset)))) == 0


		def test_eeg_rpsd_compute_psdmed():

MRG: Clean-up and add comparison/all implementations for feature extraction #18

MRG: Clean-up and add comparison/all implementations for feature extraction #18

Conversation

mscheltienne commented Apr 22, 2022 • edited Loading

adam2392 commented Apr 27, 2022

adam2392 commented Apr 27, 2022

mscheltienne commented Apr 27, 2022 • edited Loading

mscheltienne commented Apr 27, 2022 • edited Loading

adam2392 Apr 28, 2022

Choose a reason for hiding this comment

mscheltienne Apr 28, 2022

Choose a reason for hiding this comment

adam2392 left a comment • edited Loading

Choose a reason for hiding this comment

mscheltienne commented Apr 28, 2022

adam2392 commented Apr 28, 2022

mscheltienne commented Apr 28, 2022

adam2392 commented Apr 28, 2022

adam2392 commented Apr 28, 2022

mscheltienne commented Apr 22, 2022 •

edited

Loading

mscheltienne commented Apr 27, 2022 •

edited

Loading

mscheltienne commented Apr 27, 2022 •

edited

Loading

adam2392 left a comment •

edited

Loading