Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MRG: Clean-up and add comparison/all implementations for feature extraction #18

Merged
merged 83 commits into from
Apr 28, 2022
Merged

MRG: Clean-up and add comparison/all implementations for feature extraction #18

merged 83 commits into from
Apr 28, 2022

Conversation

mscheltienne
Copy link
Member

@mscheltienne mscheltienne commented Apr 22, 2022

Hey! @jacobf18 @adam2392 @anandsaini024

Seeing all the good progress made, I wanted to finalize the first big step of this project: a completely tested and similar port of the network and features in Python. In this PR:

  • I replaced ALL the test files with small tests files that were generated from the EEGLAB sample dataset. I also added the MATLAB scripts to generate those files and replicate all the results.
  • Since we now know that the network has been perfectly ported, I focus on the feature extractions. I intend to go over all 3 and their implementations, in the coming couple of days (finally got some free time to work on this!).
  • And a bit of clean-up / re-structuring

Progress:

  • Topo
  • PSD
  • Autocorr

For each, I replicated as close as possible the MATLAB outputs and added the corresponding tests to compare the data arrays. Contrary to your code @jacobf18 that I used, I did not change the shapes and instead matched the shapes from MATLAB to simplify testing and comparison.

And finally, the ICLabel feature extraction function differs depending on:

  • raw or epoch
  • If raw, how long? (3 behaviors for [0, 1] seconds for [1, 5] seconds and for above 5 seconds)

So I added test files for each scenario:

  • -epo for epochs
  • -raw for raws (above 5 secs)
  • -short-raw for raws (1 to 5 secs)
  • -very-short-raw for raws (0 to 1 sec)

Side note:

  • it is possible to match the random number generator between numpy and MATLAB in most cases. So we should be able to get good tests also for the PSD feature.
  • trials represents epochs: 1 -> Raw; above 1 -> Epochs

@mscheltienne mscheltienne marked this pull request as ready for review April 27, 2022 14:39
@adam2392
Copy link
Member

Whoot! Nice @mscheltienne if it matches, that's exciting news. This is a large diff, so I'll take some time soon to do an in-depth review.

@jacobf18 and @anandsaini024 should sanity check as well.

I have to think a bit about this re-referencing and its impact. If you have any suggestion, please comment :)

Hmm so my opinion is that if it's happening in EEGLab, it's a "bug" but solely on their end. We simply want to match the EEGLab's first pipeline run (before saving the data to disc). Basically, I think it's fine.

A quick question, so if we re-reference to average multiple times it shouldn't change right? Perhaps misunderstanding, but why does icawinv change then?

@adam2392
Copy link
Member

I will review tonight if possible.

Some spelling mistakes and missing items to add to the MANIFEST file from the CI.

mne_icalabel/features.py:106: Additionnaly ==> Additionally
[13](https://github.com/jacobf18/iclabel-python/runs/6195445912?check_suite_focus=true#step:5:13)
mne_icalabel/features.py:130: degress ==> degrees, digress
[14](https://github.com/jacobf18/iclabel-python/runs/6195445912?check_suite_focus=true#step:5:14)
make[1]: *** [Makefile:105: codespell-error] Error 65
[15](https://github.com/jacobf18/iclabel-python/runs/6195445912?check_suite_focus=true#step:5:15)
check-manifest --ignore .circleci*,doc,logo,.DS_Store
[16](https://github.com/jacobf18/iclabel-python/runs/6195445912?check_suite_focus=true#step:5:16)
lists of files in version control and sdist do not match!
[17](https://github.com/jacobf18/iclabel-python/runs/6195445912?check_suite_focus=true#step:5:17)
missing from sdist:
[18](https://github.com/jacobf18/iclabel-python/runs/6195445912?check_suite_focus=true#step:5:18)
  mne_icalabel/tests/data/network/netICL.txt
[19](https://github.com/jacobf18/iclabel-python/runs/6195445912?check_suite_focus=true#step:5:19)
suggested MANIFEST.in rules:
[20](https://github.com/jacobf18/iclabel-python/runs/6195445912?check_suite_focus=true#step:5:20)
  recursive-include mne_icalabel *.txt

Perhaps recursive-exclude mne_icalabel *.txt since I don't think we need the .txt files in the distribution.

@mscheltienne
Copy link
Member Author

mscheltienne commented Apr 27, 2022

I should have pinged after writing this comment 😅


That covers it entirely. I think it's as close as it gets, and this test suit will be a good base for any further improvement. The code here does not focus on efficiency but focuses on reproducing the MATLAB output. Any improvement that does not fail the test will maintain equivalence with MATLAB.


For the common average reference, as said the feature extraction starts with:

% assuming chanlocs are correct
if ~strcmp(EEG.ref, 'averef')
    [~, EEG] = evalc('pop_reref(EEG, [], ''exclude'', setdiff(1:EEG.nbchan, EEG.icachansind));');
end

which is buggy for the following reasons:

  • 'averef' is not the only value that seems to denote a CAR. For instance, the sample dataset's EEG.ref field is set to 'common'; and the EEG.ref field of a dataset outputted by pop_reref is set to 'average'. Issue related: Check if data is referenced to CAR sccn/ICLabel#28
  • pop_reref breaks the ICA solution. Even if you forget the I/O part (weird in itself), after calling pop_reref, the equality between EEG.icawinv and inv(EEG.icaweights * EEG.icasphere) is lost!

-> IMO, those are several bugs on the EEGLAB side, so let's just assume that we have a dataset that is referenced with a CAR. For all the tests, I've changed the field EEG.ref from 'common' to 'averef' to prevent the use of pop_reref.

Note that if you call pop_reref before running the ICA decomposition, pop_reref does not impact the empty ICA fields. Thus, after the ICA decomposition, the equality between EEG.icawinv and inv(EEG.icaweights * EEG.icasphere) is respected. BUT the ICA activation computed in MATLAB and in Python do not match well anymore. Instead of a 1e-5/6 tolerance, you drop to.. 0.1? 0.5? And I have no idea why. This could be caused either by EEGLAB, or by MNE in read_ica_eeglab or by the ICA activation computation in python. If someone wants to investigate, I can drop a code example.


The code here assumes that all the channels have been provided to the ICA. I did not implement the picks and channel selection (which was partially implemented in @jacobf18 code). Basically:

raw = ...
raw.info['bads'] = [...]
raw.pick_types(eeg=True, exclude='bads')
ica = mne.preprocessing.ICA(...)
ica.fit(raw)

-> Run ICLabel with this picked raw and this ICA that uses all the channels from raw.

It would be nice to test this on a couple of datasets where we also look at the component to make sure the output looks correct.

@mscheltienne
Copy link
Member Author

mscheltienne commented Apr 27, 2022

For the MANIFEST.in, we can fix it in a follow-up. It needs more clean-up than just this .txt, and also do we include the data for the test files or not?

Let me know if the previous comment answers your question for the CAR issue.

@mscheltienne mscheltienne changed the title WIP: Clean-up and add comparison/all implementations for feature extraction MRG: Clean-up and add comparison/all implementations for feature extraction Apr 27, 2022
assert len(set(list(subset_eeglab[0, :] - 1)).difference(set(list(subset)))) == 0


def test_eeg_rpsd_compute_psdmed():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems we don't need this extra test given the below test_eeg_rpsd(?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine by me, the only added information was that if only test_eeg_rpsd fails, then the problem lies within _eeg_rpsd_format. But that is not worth an additional test and can be debugged on the fly if test_eeg_rpsd ever fails.

mne_icalabel/features.py Outdated Show resolved Hide resolved
Copy link
Member

@adam2392 adam2392 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome refactor and sprint @mscheltienne !

It looks great to me. Just a few minor changes to maybe make things easier to maintain down the line.

Perhaps the next step is to tidy up the example with a bunch of different simulations and also tests on real sample data from MNE-Python.

@adam2392 adam2392 mentioned this pull request Apr 28, 2022
@mscheltienne
Copy link
Member Author

I am getting an error with make black following the last commit. I am not familiar with it and the traceback is.. lacking. Any idea?

@adam2392
Copy link
Member

I am getting an error with make black following the last commit. I am not familiar with it and the traceback is.. lacking. Any idea?

Sorry the make recipe only "checks" for formatting errors.

You'll need to run "black ."

I think I can fix that in another PR to make the dev exp easier.

@mscheltienne
Copy link
Member Author

I am really exhausted today... why the hell am I typing make black... anyway done.

@adam2392
Copy link
Member

Sorry... just noticed that. I've previously set it up to just format it.

@adam2392
Copy link
Member

@jacobf18 and @anandsaini024 skim the changes to see if you spot any issues. Meanwhile ill merge this.

@adam2392 adam2392 merged commit c1eaedf into mne-tools:main Apr 28, 2022
@mscheltienne mscheltienne deleted the clean_up branch April 28, 2022 16:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants