run-noise Output file hash mismatch for _task-noise_scores.json #876

Closed
SophieHerbst opened this issue Mar 7, 2024 · 11 comments · Fixed by #890
Labels
bug Something isn't working

Comments

@SophieHerbst
Collaborator

I just finished a complete pipeline run (version 1.6), and now wanted to improve ica_cleaning.
The only parameter I changed is ica_ctps_ecg_threshold, so I did not expect any steps before that one to be rerun, but I receive:

│11:31:44│ 🚫 sub-215 run-noise Output file hash mismatch for /neurospin/meg/meg_tmp/TimeInWM_Izem_2019/BIDS_anonymized/derivatives/sub-emptyroom/ses-19230318/meg/sub-emptyroom_ses-19230318_task-noise_scores.json, will recompute …

This takes a lot more time and happens for every participant.

I never observed this behavior before.
In the new complete run of the pipeline, I started using:
find_flat_channels_meg = True
find_noisy_channels_meg = True
Do these modify the empty room information in a later step, which triggers the re-run?

I'd be happy about any insights into whether the re-run can be avoided.

@hoechenberger
Member

hoechenberger commented Mar 7, 2024

In the new complete run of the pipeline, I started using
find_flat_channels_meg = True
find_noisy_channels_meg = True
Do these modify the empty room information in a later step, which triggers the re-run?

Without looking at the documentation or code, I would say yes, because this can change which channels are marked as bad before Maxwell filtering, and we try to keep the bad channels in sync between experimental runs and empty-room recordings.
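To illustrate why those two settings could make the empty-room outputs depend on which subject is being processed, here is a minimal Python sketch. It assumes, purely for illustration, that "keeping bads in sync" means marking a channel as bad in the empty-room recording whenever it is bad in any of that subject's runs; `sync_bads` and the channel names are hypothetical and not the pipeline's actual API or rule.

```python
def sync_bads(run_bads: list[list[str]], er_bads: list[str]) -> list[str]:
    """Hypothetical rule: a channel bad in any experimental run is also
    marked bad in the empty-room recording."""
    merged = set(er_bads)
    for bads in run_bads:
        merged.update(bads)
    return sorted(merged)


# Flat/noisy-channel detection found different bads for two subjects' runs,
# so the same empty-room recording ends up with subject-specific bads.
bads_a = sync_bads([["MEG 0113"], ["MEG 0113", "MEG 2443"]], er_bads=[])
bads_b = sync_bads([["MEG 1331"]], er_bads=[])
print(bads_a)  # ['MEG 0113', 'MEG 2443']
print(bads_b)  # ['MEG 1331']
```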

@SophieHerbst
Collaborator Author

Hm ok. So no way to prevent the lengthy recomputation?

@hoechenberger
Member

hoechenberger commented Mar 7, 2024

Ah wait. You first finished a complete pipeline run, then adjusted the ECG threshold, and when you re-run now, some earlier step is being re-run? Which one is that, preprocessing/_01_data_quality? That should not happen, no. And it only appears for the empty-room recording??

@SophieHerbst
Collaborator Author

yep, it happens in preprocessing/_01_data_quality

@SophieHerbst
Collaborator Author

and only for empty room, yes.
Also, it only happens the first time; when I re-run it, it does not happen anymore.

@hoechenberger
Copy link
Member

this shouldn't happen… I don't have time to reproduce or look into this now, though, sorry

@SophieHerbst
Copy link
Collaborator Author

No problem, I'll just wait for it to finish once; I wouldn't want to use the development version for this project anyway. But it would be good to fix it in the future.

@larsoner
Member

larsoner commented Mar 7, 2024

Can you upload one subject's raw BIDS data plus your config.py? I can take a look.

@hoechenberger hoechenberger added the bug Something isn't working label Mar 11, 2024
@larsoner
Member

@SophieHerbst given this is an issue with the empty-room data can you upload sub-emptyroom/ses-19230318 (not the derivatives one but the bids_root / original one)?

@larsoner
Member

Okay, I think I see how this can happen. If two subjects A and B are matched to the same empty-room recording, the bad-channel detection for that file can run twice, first for A and then for B (assuming n_jobs=1). When you then re-run the pipeline, a problem is detected with the output file's modification time, because both A and B will have written, e.g.:

$ ls -l ~/mne_data/derivatives/mne-bids-pipeline/ds000117/sub-emptyroom/ses-20090409/meg/
total 176
-rw-rw-r-- 1 larsoner larsoner     12 Mar 14 15:15 sub-emptyroom_ses-20090409_task-noise_bads.tsv
-rw-rw-r-- 1 larsoner larsoner 174558 Mar 14 15:15 sub-emptyroom_ses-20090409_task-noise_scores.json
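The mechanism described above can be sketched as a simplified Python model. It assumes that each subject's step records a fingerprint of its output file and recomputes when that fingerprint no longer matches the file on disk. A content hash stands in here for whatever the pipeline actually compares (the comment above mentions the modification time), and `run_step` / `needs_recompute` are hypothetical names.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_hash(path: Path) -> str:
    """Content hash of a file (stand-in for the pipeline's own output check)."""
    return hashlib.md5(path.read_bytes()).hexdigest()


def run_step(subject: str, out_path: Path, cache: dict) -> None:
    """Hypothetical step: write a subject-dependent result, then record its hash."""
    # The content differs per subject, e.g. because each subject's runs
    # contribute different bad channels to the shared empty-room file.
    out_path.write_text(json.dumps({"computed_for": subject}))
    cache[subject] = file_hash(out_path)


def needs_recompute(subject: str, out_path: Path, cache: dict) -> bool:
    """True if the file on disk no longer matches what this subject last wrote."""
    return cache.get(subject) != file_hash(out_path)


with tempfile.TemporaryDirectory() as tmp:
    # One empty-room output file matched to both subjects.
    shared = Path(tmp) / "sub-emptyroom_task-noise_scores.json"
    cache: dict = {}
    for subject in ["A", "B"]:  # n_jobs=1: A runs first, then B overwrites
        run_step(subject, shared, cache)
    # On the next pipeline invocation:
    stale_a = needs_recompute("A", shared, cache)  # B overwrote A's output
    stale_b = needs_recompute("B", shared, cache)  # B's write is the latest

print(stale_a, stale_b)  # True False
```

In this model, subject B wrote last, so only subject A's record is out of date.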

Although it will cause redundant computations, the cleanest solution here is probably to save the _bads.tsv separately in subject A's and subject B's derivatives folders. This is what ends up happening in the Maxwell filter step anyway, since that step can use different sets of bads for the two subjects.
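The proposed fix amounts to giving each subject its own copy of the empty-room _bads.tsv instead of one shared file. A minimal path sketch, reusing the ds000117 session from the listing above; the per-subject file name and the `emptyroom_bads_path` helper are a hypothetical layout, not necessarily what the fix in #890 uses:

```python
from pathlib import Path


def emptyroom_bads_path(
    deriv_root: Path, subject: str, er_session: str, per_subject: bool
) -> Path:
    """Hypothetical location of the empty-room _bads.tsv file.

    per_subject=False mirrors the shared sub-emptyroom/ layout from the listing;
    per_subject=True writes one copy into each experimental subject's folder.
    """
    if per_subject:
        return deriv_root / f"sub-{subject}" / "meg" / f"sub-{subject}_task-noise_bads.tsv"
    return (
        deriv_root
        / "sub-emptyroom"
        / f"ses-{er_session}"
        / "meg"
        / f"sub-emptyroom_ses-{er_session}_task-noise_bads.tsv"
    )


root = Path("derivatives/mne-bids-pipeline/ds000117")
subjects = ["A", "B"]
shared = {s: emptyroom_bads_path(root, s, "20090409", per_subject=False) for s in subjects}
separate = {s: emptyroom_bads_path(root, s, "20090409", per_subject=True) for s in subjects}

print(shared["A"] == shared["B"])  # True: both subjects write the same file
print(separate["A"] == separate["B"])  # False: no more collision
```

The per-subject layout recomputes the same empty-room bads once per subject, which is the redundancy mentioned above, in exchange for each subject's cached outputs no longer being overwritten by another subject.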

@SophieHerbst
Collaborator Author

Sorry, I was completely offline for a few days. I will try the fixes now!
