Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Split ICA into three steps #857

Closed
SophieHerbst opened this issue Feb 23, 2024 · 27 comments · Fixed by #865
Closed

ENH: Split ICA into three steps #857

SophieHerbst opened this issue Feb 23, 2024 · 27 comments · Fixed by #865

Comments

@SophieHerbst
Copy link
Collaborator

SophieHerbst commented Feb 23, 2024

I wanted to re-run the apply_ICA step after changing the thresholds for automatic component rejection in my config.

I had three surprises/ observations:

  • mismatch between documentation website and actual nme of the step: 07 versus 08
  • the output prints each run (raw), while I expected the ICA being applied to epochs given the order in the pipeline
  • nothing was computed (Computation unnecessary (cached) …)

@hoechenberger how do I rerun this step? If the other points are just a matter of changing the doc, I can do it with a few how-to pointers from you

(versions: latest main of mne, pipeline, and mne-bids)

(mne) sh254795@is152869:/neurospin/meg/meg_tmp/TimeInWM_Izem_2019/BIDS_anonymized$ mne_bids_pipeline --config=/neurospin/meg/meg_tmp/TimeInWM_Izem_2019/BIDS_anonymized/code/config_sophie.py --steps=preprocessing/apply_ica
┌────────┬ Welcome aboard MNE-BIDS-Pipeline! 👋 ────────────────────────────────────────────────────────────────
│13:51:23│ 📝 Using configuration: /neurospin/meg/meg_tmp/TimeInWM_Izem_2019/BIDS_anonymized/code/config_sophie.py
└────────┴ 
┌────────┬ init/_01_init_derivatives_dir ───────────────────────────────────────────────────────────────────────
│13:51:23│ ✅ Output directories already exist …
└────────┴ done (6s)
┌────────┬ init/_02_find_empty_room ────────────────────────────────────────────────────────────────────────────
│13:51:29│ ✅ sub-155 run-01 Computation unnecessary (cached) …
└────────┴ done (2s)
┌────────┬ preprocessing/_08a_apply_ica ────────────────────────────────────────────────────────────────────────
│13:51:36│ ✅ sub-155 Computation unnecessary (cached) …
│13:51:36│ ✅ sub-155 run-04 Computation unnecessary (cached) …
│13:51:36│ ✅ sub-155 run-05 Computation unnecessary (cached) …
│13:51:36│ ✅ sub-155 run-06 Computation unnecessary (cached) …
│13:51:36│ ✅ sub-155 run-07 Computation unnecessary (cached) …
│13:51:36│ ✅ sub-155 run-08 Computation unnecessary (cached) …
│13:51:36│ ✅ sub-155 run-rest Computation unnecessary (cached) …
│13:51:36│ ✅ sub-155 run-noise Computation unnecessary (cached) …
│13:51:41│ ✅ sub-155 run-01 Computation unnecessary (cached) …
│13:51:41│ ✅ sub-155 run-03 Computation unnecessary (cached) …
│13:51:41│ ✅ sub-155 run-02 Computation unnecessary (cached) …
└────────┴ done (12s)
@SophieHerbst SophieHerbst changed the title Confused about pipeline steps and re-run Pipeline stepnames and re-run Feb 23, 2024
@larsoner
Copy link
Member

This first cached line is for the epochs

┌────────┬ preprocessing/_08a_apply_ica ────────────────────────────────────────────────────────────────────────
│13:51:36│ ✅ sub-155 Computation unnecessary (cached) …

The thresholds affect fitting ICA, so you need to rerun that step and the applying step

@SophieHerbst
Copy link
Collaborator Author

Thanks for the reply @larsoner. I don't understand why the thresholds affect the fitting.
I meant those parameters:
ica_ctps_ecg_threshold = 0.05
ica_eog_threshold = 2.0

@hoechenberger
Copy link
Member

The thresholds affect fitting ICA, so you need to rerun that step and the applying step

Isn't that a bug then? Shouldn't we auto-rerun ICA fitting if config values related to the ICA step change?

I don't understand why the thresholds affect the fitting.

They don't, but fitting and ExG artifact detection are performed in the same step (run-ica), hence that step needs to be re-run. Maybe it should be split into two steps, a) fitting and b) artifact detection

@SophieHerbst
Copy link
Collaborator Author

I thought that the whole point of having two separate files for run and apply ICA was to run the ICA once and be able to adjust the thresholds for eog/ecg rejection without recomputing ICA

@larsoner
Copy link
Member

@larsoner larsoner reopened this Feb 23, 2024
@SophieHerbst
Copy link
Collaborator Author

that is how it was in 'the old pipeline': https://github.com/brainthemind/CogBrainDyn_MEG_Pipeline/blob/master/06a-apply_ica.py
Did it get lost?

@larsoner
Copy link
Member

larsoner commented Feb 23, 2024

Isn't that a bug then? Shouldn't we auto-rerun ICA fitting if config values related to the ICA step change?

To detect this in the general case this would require traversing the whole processing tree. In other words, if a user selects a single step, don't treat it like "just run this" but instead like a stopping step, "run everything up to this". Like if a user changed their filter settings but only asks to run source estimation, quite a few preceding steps actually need to be rerun.

@hoechenberger
Copy link
Member

I thought that the whole point of having two separate files for run and apply ICA was to run the ICA once and be able to adjust the thresholds for eog/ecg rejection without recomputing ICA

The report generated through run-ica is supposed to help you determine which components to remove in the next step (apply-ica)

To assist you, we run the automated artifact detection

The results end up together with all components in the ica+components report, e.g.
http://mne.tools/mne-bids-pipeline/stable/examples/ERP_CORE/sub-015_ses-ERN_task-ERN_proc-ica+components_report.html

The apply-ica step then looks into the ICA component CSV file to figure which components you would like removed

Like I said above, I think your point of separating the artifact detection step from the fitting step is fair. I think we should do, in three steps:

  1. fit ICA
  2. detect artifacts
  3. apply ICA

so you could fiddle with artifact detection settings (step 2) without having to re-fit ICA (step 1)

@hoechenberger
Copy link
Member

To detect this in the general case this would require traversing the whole processing tree. In other words, if a user selects a single step, don't treat it like "just run this" but instead like a stopping step, "run everything up to this".

So would would have been a viable solution for @SophieHerbst now?
Simply re-run the entire preprocessing pipeline and rely on caching to skip e.g. filtering etc?

@hoechenberger
Copy link
Member

So would would have been a viable solution for @SophieHerbst now? Simply re-run the entire preprocessing pipeline and rely on caching to skip e.g. filtering etc?

We don't have a "please run everything up to this step" functionality right now, do we? Do you think we could implement this somehow?

@SophieHerbst
Copy link
Collaborator Author

And why is the run-ica step before make_epochs?
As I see from the code it creates epochs and runs ICA on them.

@hoechenberger
Copy link
Member

hoechenberger commented Feb 23, 2024

And why is the run-ica step before make_epochs? As I see from the code it creates epochs and runs ICA on them.

ICA creates an entirely separate set of epochs to allow for applying different filters before fitting (e.g., 1 Hz high-pass for ICA while you'll maybe want 0.1 Hz high-pass for your "actual" data)

@SophieHerbst
Copy link
Collaborator Author

I was just confused to see that it appears before epochs in the steps and that the output talks about the raw runs. But I get it now.

@hoechenberger
Copy link
Member

hoechenberger commented Feb 23, 2024

I was just confused to see that it appears before epochs in the steps and that the output talks about the raw runs. But I get it now.

Yes so it iterates over the raw files in order to create epochs

it's actually a good sign this caught your attention – Eric just recently changed the order of things to what it is now to highlight the fact that ICA is not fitted to your "regular" epochs

but it appears this needs better documentation and / or logging output

@SophieHerbst
Copy link
Collaborator Author

but it appears this needs better documentation and / or logging output

And the steps in the documentation need to be renamed accordingly

@larsoner
Copy link
Member

larsoner commented Feb 23, 2024

Like I said above, I think your point of separating the artifact detection step from the fitting step is fair. I think we should do, in three steps:

Yes this is doable. We could probably consider renaming the steps a bit so we don't have to keep renaming the steps that follow if possible (like with 05a1, 05a2 or something instead of 05a, 06a but only a 05a for SSP and no 06a at all)

I was just confused to see that it appears before epochs in the steps and that the output talks about the raw runs. But I get it now.

Yes so it iterates over the raw files in order to create epochs... it's actually a good sign this caught your attention...

Yes this is exactly why I reordered it, I initially also thought ICA was fit on the task epochs but it's not 🙂

So would would have been a viable solution for @SophieHerbst now? Simply re-run the entire preprocessing pipeline and rely on caching to skip e.g. filtering etc?

Yes personally nowadays I always just mne_bids_pipeline config.py anytime I change anything and rely on caching to skip stuff. Data on SSDs this only has ~10s overhead for ~20 subjects data. Maybe it's not as viable on NFS-stored data or something -- @SophieHerbst can you give some information on how long it takes mne_bids_pipeline config.py to run on your data without changing anything, i.e., to be a no-op that just says "cached" for every step? Maybe the mtime checks are expensive or something.

We don't have a "please run everything up to this step" functionality right now, do we? Do you think we could implement this somehow?

Yeah it would be possible in principle to implement this, I'll put this in a new issue

And the steps in the documentation need to be renamed accordingly

@SophieHerbst what would you like to work on, if anything? Looks like we have

  1. Update docs
  2. Reorder/reconfigure ICA steps as per ENH: Split ICA into three steps #857 (comment)

I'm happy to do any of these

@SophieHerbst
Copy link
Collaborator Author

@larsoner I am happy to give both a try but likely will need some supervision to get it right. Also, I might not get to it before next week, so if you have time to start go ahead.
For the documentation, can you point me to how the website is updated?

@larsoner
Copy link
Member

Docs are all built from https://github.com/mne-tools/mne-bids-pipeline/tree/main/docs/source and also the contents of https://github.com/mne-tools/mne-bids-pipeline/blob/main/mne_bids_pipeline/_config.py , so depends on what you need to update. I would use git grep with a few words from the offending text to find the right file

@larsoner
Copy link
Member

Upon reflecting a bit... I think we should probably change both docs/source/features/steps.md and docs/source/features/overview.md to be generated using Python code. We can use the module __doc__ (ideally the title, but could be some other private var). @SophieHerbst okay for me to do this? That way you don't have to manually update anything and we'll automatically stay up-to-date...

@larsoner
Copy link
Member

Hah, looks like we already do that with steps.md. So we need to do it with features.md as well...

@larsoner larsoner changed the title Pipeline stepnames and re-run ENH: Split ICA into three steps Feb 23, 2024
@larsoner
Copy link
Member

Since the doc part is in #860 now (I hope), I re-titled this issue to be about splitting ICA into three steps:

  1. ICA.fit (which is done on task epochs, but they're separate from the ones eventually used because filtering differs)
  2. Artifact identification
  3. ICA.apply

@hoechenberger
Copy link
Member

ICA.fit (which is done on task epochs, but they're separate from the ones eventually used because filtering differs)

yes this is what I meant to say, sorry if there was any confusion!

@SophieHerbst
Copy link
Collaborator Author

@hoechenberger @larsoner do you think a new release could be done with the changes in ICA?
Then I could run with the stable version once more, hand over to the student, and hopefully not touch preprocessing for this dataset again.

@larsoner
Copy link
Member

I'd like to get #863 in, then I don't see why not 👍

@hoechenberger
Copy link
Member

somebody would need to implement the "ICA splitting" first 😅

@larsoner
Copy link
Member

I can tackle that this week I think, I don't think it'll be too bad

@SophieHerbst
Copy link
Collaborator Author

sorry, I didn't have the time to contribute to this. let me know I can test anything

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants