Issue preprocessing multiple multi echo / multi band data with fmriprep update #2728

Closed · njr175 opened this issue Mar 3, 2022 · 14 comments · Fixed by #2898

njr175 commented Mar 3, 2022

What happened?

We updated fMRIPrep to version 21.0.0. We have been using fMRIPrep for a few years now and have preprocessed multiple datasets with this pipeline. After the upgrade, we wanted to re-preprocess two of the datasets. We were able to re-preprocess the first with no problem (NOT multi-echo data). We then went to preprocess the second dataset (multi-band/multi-echo) and found we can't preprocess more than one participant at a time. As I mentioned, these data have been preprocessed with fMRIPrep in the past without issue; I could run 10 at a time. (All work directories have been deleted, and 'old' data have been moved out of the fMRIPrep path.) I am testing with 2-3 participants at a time, and have tried the following flags:
--low-mem
--mem-mb 55000
--use-plugin plugin.yml, with a corresponding plugin.yml file containing:

plugin: LegacyMultiProc
plugin_args: {maxtasksperchild: 1, memory_gb: 60, n_procs: 10, raise_insufficient: false}

Without the flags, and with the '--low-mem' flag, I get a 'broken process pool' error and fMRIPrep doesn't finish. With the other two flags, I get a node error: fmriprep_wf.single_subject_3156_wf.func_preproc_ses_1_task...._echo_1_wf.bold_t2smap_wf.t2smap_node
(full error below)
I get this node error for each participant, with a different number of failing nodes each time.
If I run the participants separately, they run with no problem.

What command did you use?

singularity run --cleanenv -B /data/MNA/data /data/MNA/fmriprep_21.0.0.simg /data/MNA/data/Nifti /data/MNA/data/Preprocessed participant --participant_label sub-3158 3159 3162 --ignore slicetiming sbref --fs-license-file /home/nicoler/license.txt

(This is the 'original' command; I also tried the various flags outlined above.)

What version of fMRIPrep are you running?

21.0.0

How are you running fMRIPrep?

Singularity

Is your data BIDS valid?

Yes

Are you reusing any previously computed results?

No

Please copy and paste any relevant log output.

When fMRIPrep doesn't finish/just hangs, the error is as follows:

File "/opt/conda/lib/python3.8/concurrent/futures/_base.py", line 437, in result
    return self.__get_result()
File "/opt/conda/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

When it does run, an example of the node error:
File: /data/MNA/data/Preprocessed/sub-3158/log/20220301-105737_4b639652-12b4-452d-9125-1d5c84795b5b/crash-20220301-135236-nicoler-t2smap_node-473aa1f0-7183-40bb-b5f9-29a4a3e11102.txt
Working Directory: /data/MNA/data/work/fmriprep_wf/single_subject_3158_wf/func_preproc_ses_1_task_sop_run_001_echo_1_wf/bold_t2smap_wf/t2smap_node
Inputs:

    args:
    echo_times: [0.0132, 0.03119, 0.04918]
    environ: {}
    fittype: curvefit
    in_files:
    mask_file:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/nipype/pipeline/plugins/legacymultiproc.py", line 67, in run_node
    result["result"] = node.run(updatehash=updatehash)
  File "/opt/conda/lib/python3.8/site-packages/nipype/pipeline/engine/nodes.py", line 516, in run
    result = self._run_interface(execute=True)
  File "/opt/conda/lib/python3.8/site-packages/nipype/pipeline/engine/nodes.py", line 635, in _run_interface
    return self._run_command(execute)
  File "/opt/conda/lib/python3.8/site-packages/nipype/pipeline/engine/nodes.py", line 741, in _run_command
    result = self._interface.run(cwd=outdir)
  File "/opt/conda/lib/python3.8/site-packages/nipype/interfaces/base/core.py", line 428, in run
    runtime = self._run_interface(runtime)
  File "/opt/conda/lib/python3.8/site-packages/nipype/interfaces/base/core.py", line 822, in _run_interface
self.raise_exception(runtime)
  File "/opt/conda/lib/python3.8/site-packages/nipype/interfaces/base/core.py", line 749, in raise_exception
    raise RuntimeError(
RuntimeError: Command:
t2smap -d /data/MNA/data/work/fmriprep_wf/single_subject_3158_wf/func_preproc_ses_1_task_sop_run_001_echo_1_wf/unwarp_wf/_echoidx_0/merge/vol0000_unwarped_merged.nii.gz /data/MNA/data/work/fmriprep_wf/single_subject_3158_wf/func_preproc_ses_1_task_sop_run_001_echo_1_wf/unwarp_wf/_echoidx_1/merge/vol0000_unwarped_merged.nii.gz /data/MNA/data/work/fmriprep_wf/single_subject_3158_wf/func_preproc_ses_1_task_sop_run_001_echo_1_wf/unwarp_wf/_echoidx_2/merge/vol0000_unwarped_merged.nii.gz -e 13.2 31.189999999999998 49.18 --mask /data/MNA/data/work/fmriprep_wf/single_subject_3158_wf/func_preproc_ses_1_task_sop_run_001_echo_1_wf/unwarp_wf/brainextraction_wf/_echoidx_0/masker/clipped_mask.nii.gz --fittype curvefit
Standard output:

Standard error:
INFO     t2smap:t2smap_workflow:229 Using output directory: /data/MNA/data/work/fmriprep_wf/single_subject_3158_wf/func_preproc_ses_1_task_sop_run_001_echo_1_wf/bold_t2smap_wf/t2smap_node
INFO     t2smap:t2smap_workflow:239 Loading input data: ['/data/MNA/data/work/fmriprep_wf/single_subject_3158_wf/func_preproc_ses_1_task_sop_run_001_echo_1_wf/unwarp_wf/_echoidx_0/merge/vol0000_unwarped_merged.nii.gz', '/data/MNA/data/work/fmriprep_wf/single_subject_3158_wf/func_preproc_ses_1_task_sop_run_001_echo_1_wf/unwarp_wf/_echoidx_1/merge/vol0000_unwarped_merged.nii.gz', '/data/MNA/data/work/fmriprep_wf/single_subject_3158_wf/func_preproc_ses_1_task_sop_run_001_echo_1_wf/unwarp_wf/_echoidx_2/merge/vol0000_unwarped_merged.nii.gz']
INFO     t2smap:t2smap_workflow:255 Using user-defined mask
INFO     t2smap:t2smap_workflow:258 Computing adaptive T2* map
Killed
Return code: 137

Additional information / screenshots

No response

njr175 added the bug label Mar 3, 2022

effigies commented Mar 3, 2022

This could be an issue from using a more recent version of tedana. @tsalo could I bug you to take a look?


effigies commented Mar 3, 2022

It may also be that we need unwarp_wf to output uncompressed NIfTI files when --low-mem is passed. It's possible that in previous versions the inputs to t2smap were uncompressed.


tsalo commented Mar 3, 2022

It looks like tedana's memory usage is the problem here. tedana should be able to handle compressed or uncompressed NIfTI files equally well, since it just loads them into numpy arrays with nibabel. tedana does have a --lowmem option that could be passed from fMRIPrep when --low-mem is used, although I don't think it would have an effect on the problematic step here.

Two options I could see are:

  1. Within tedana, we could see if we can improve the memory usage of the curvefit option when --lowmem is used.
  2. Within fMRIPrep, set --fittype to loglin instead of curvefit when --low-mem is used. The log-linear T2* estimation method is less accurate, but faster and less memory-intensive.
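
To illustrate the trade-off in option 2, here is a minimal, self-contained sketch (not tedana's actual code) of the two fitting approaches on a single voxel, using the echo times from the crash report above and a made-up noiseless signal:

import numpy as np
from scipy.optimize import curve_fit

# Monoexponential decay S(TE) = S0 * exp(-TE / T2*); echo times (s) are
# taken from the crash report, and T2* = 45 ms is an arbitrary example.
tes = np.array([0.0132, 0.03119, 0.04918])
signal = 1000.0 * np.exp(-tes / 0.045)

# "loglin": one linear least-squares solve on log(S) per voxel -- cheap.
design = np.column_stack([np.ones_like(tes), -tes])
log_s0, r2s = np.linalg.lstsq(design, np.log(signal), rcond=None)[0]
t2s_loglin = 1.0 / r2s

# "curvefit": iterative nonlinear fit on the raw data -- more robust to
# noise, but every voxel requires repeated model evaluations, which is
# why it is slower and more memory-hungry over a whole brain.
def monoexp(te, s0, t2s):
    return s0 * np.exp(-te / t2s)

popt, _ = curve_fit(monoexp, tes, signal, p0=(np.exp(log_s0), t2s_loglin))
print(t2s_loglin, popt[1])  # both ~0.045 s on noiseless data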


njr175 commented Mar 7, 2022

Is there anything I can do to help test, next steps, etc.?

Thanks so much for taking a look at this issue - obviously we don't want to have to preprocess an entire dataset one participant at a time!


tsalo commented Mar 10, 2022

@njr175 I've opened an issue in the tedana repo (ME-ICA/tedana#856), but it will probably be a while before we can propagate any changes to an fMRIPrep release, since we'll need to (1) fix the problem on tedana's side, (2) make a new release, (3) pin the new version in fMRIPrep's requirements, and (4) make a new fMRIPrep release.

If you're able to build the Docker image from a fork, you could manually switch from nonlinear T2* estimation to the log-linear version. Basically, you would just need to switch the order here:

fittype = traits.Enum(
    'curvefit',
    'loglin',
    argstr='--fittype %s',
    position=4,
    usedefault=True,
    desc=('Desired fitting method: '
          '"loglin" means that a linear model is fit '
          'to the log of the data. '
          '"curvefit" means that a more computationally '
          'demanding monoexponential model is fit '
          'to the raw data.'),
)

Instead of 'curvefit', 'loglin', which makes curvefit (the nonlinear approach) the default, you would have 'loglin', 'curvefit', so that the log-linear method becomes the default.
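
Concretely, the swapped declaration would read:

fittype = traits.Enum(
    'loglin',
    'curvefit',
    argstr='--fittype %s',
    position=4,
    usedefault=True,
    desc=('Desired fitting method: '
          '"loglin" means that a linear model is fit '
          'to the log of the data. '
          '"curvefit" means that a more computationally '
          'demanding monoexponential model is fit '
          'to the raw data.'),
)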


njr175 commented Mar 21, 2022

Thanks for offering a solution while we wait for the fix, @tsalo. Do I need to include one of the flags I was using before (--low-mem, --mem-mb 55000, or --use-plugin plugin.yml)?

I switched the order as you indicated and tried to re-run fMRIPrep (without any of the above flags), and I got the broken process pool error again. I was testing with 3 individuals.

ERROR
File "/opt/conda/lib/python3.8/concurrent/futures/_base.py", line 389, in getresult rasie self.exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was termindated abruptly while the future was running or pending


njr175 commented Apr 21, 2022

Just wanted to send an update: we still have not been able to get the new version of fMRIPrep to work on more than one participant with multi-echo data.
With the fix you mentioned, I tried with and without the 3 flags; it still doesn't work.
To double-check that nothing had happened to our Ubuntu server, I ran a handful of participants through an old version of fMRIPrep, and they ran without error.


tsalo commented Apr 25, 2022

I'm sorry for the delay in responding. If you look at the logs, do they show the t2smap command used?


mateuszpawlik commented Jul 5, 2022

Yesterday I tried fmriprep 22.0.0rc3 on one subject (one session) and got the following error. It seems to me it's relevant, but if it's not, please let me know and I'll delete my comment.

Error message:
Node: fmriprep_22_0_wf.single_subject_s003_wf.func_preproc_ses_3_task_vw_acq_mb4mesl56_run_2_echo_1_wf.bold_t2smap_wf.t2smap_node
Working directory: /work/fmriprep_22_0_wf/single_subject_s003_wf/func_preproc_ses_3_task_vw_acq_mb4mesl56_run_2_echo_1_wf/bold_t2smap_wf/t2smap_node

Node inputs:

args = <undefined>
echo_times = [0.0124, 0.03013, 0.04786, 0.06559]
environ = {}
fittype = curvefit
in_files = <undefined>
mask_file = <undefined>

Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/site-packages/nipype/pipeline/plugins/multiproc.py", line 67, in run_node
    result["result"] = node.run(updatehash=updatehash)
  File "/opt/conda/lib/python3.9/site-packages/nipype/pipeline/engine/nodes.py", line 524, in run
    result = self._run_interface(execute=True)
  File "/opt/conda/lib/python3.9/site-packages/nipype/pipeline/engine/nodes.py", line 642, in _run_interface
    return self._run_command(execute)
  File "/opt/conda/lib/python3.9/site-packages/nipype/pipeline/engine/nodes.py", line 750, in _run_command
    raise NodeExecutionError(
nipype.pipeline.engine.nodes.NodeExecutionError: Exception raised while executing Node t2smap_node.

Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/site-packages/nipype/interfaces/base/core.py", line 454, in aggregate_outputs
    setattr(outputs, key, val)
  File "/opt/conda/lib/python3.9/site-packages/nipype/interfaces/base/traits_extension.py", line 330, in validate
    value = super(File, self).validate(objekt, name, value, return_pathlike=True)
  File "/opt/conda/lib/python3.9/site-packages/nipype/interfaces/base/traits_extension.py", line 135, in validate
    self.error(objekt, name, str(value))
  File "/opt/conda/lib/python3.9/site-packages/traits/base_trait_handler.py", line 74, in error
    raise TraitError(
traits.trait_errors.TraitError: The 't2star_map' trait of a T2SMapOutputSpec instance must be a pathlike object or string representing an existing file, but a value of '/work/fmriprep_22_0_wf/single_subject_s003_wf/func_preproc_ses_3_task_vw_acq_mb4mesl56_run_2_echo_1_wf/bold_t2smap_wf/t2smap_node/T2starmap.nii.gz' <class 'str'> was specified.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/site-packages/nipype/interfaces/base/core.py", line 401, in run
    outputs = self.aggregate_outputs(runtime)
  File "/opt/conda/lib/python3.9/site-packages/nipype/interfaces/base/core.py", line 461, in aggregate_outputs
    raise FileNotFoundError(msg)
FileNotFoundError: No such file or directory '/work/fmriprep_22_0_wf/single_subject_s003_wf/func_preproc_ses_3_task_vw_acq_mb4mesl56_run_2_echo_1_wf/bold_t2smap_wf/t2smap_node/T2starmap.nii.gz' for output 't2star_map' of a T2SMap interface

Command I executed:

singularity run --cleanenv --bind /mnt/dbgroup-share/mpawlik/data-ccns/anclang:/data --bind /mnt/dbgroup-share/mpawlik/scratch:/work --bind /home/mpawlik/bids/freesurfer_license.txt:/opt/freesurfer/license.txt fmriprep_22.0.0rc3 /data /data/derivatives/fmriprep participant --nprocs 16 --me-output-echos --fs-no-reconall --notrack --skip_bids_validation -v -w /work

Update: fmriprep 21.0.2 threw numpy.core._exceptions.MemoryError. This was most probably due to too many parallel jobs requesting too much memory.

Here's the error:
Node: fmriprep_wf.single_subject_s003_wf.func_preproc_ses_3_task_vw_acq_mb4mesl56_run_1_echo_1_wf.bold_t2smap_wf.t2smap_node
Working directory: /work/fmriprep_wf/single_subject_s003_wf/func_preproc_ses_3_task_vw_acq_mb4mesl56_run_1_echo_1_wf/bold_t2smap_wf/t2smap_node

Node inputs:

args = <undefined>
echo_times = [0.0124, 0.03013, 0.04786, 0.06559]
environ = {}
fittype = curvefit
in_files = <undefined>
mask_file = <undefined>

Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/nipype/pipeline/plugins/multiproc.py", line 67, in run_node
    result["result"] = node.run(updatehash=updatehash)
  File "/opt/conda/lib/python3.8/site-packages/nipype/pipeline/engine/nodes.py", line 516, in run
    result = self._run_interface(execute=True)
  File "/opt/conda/lib/python3.8/site-packages/nipype/pipeline/engine/nodes.py", line 635, in _run_interface
    return self._run_command(execute)
  File "/opt/conda/lib/python3.8/site-packages/nipype/pipeline/engine/nodes.py", line 741, in _run_command
    result = self._interface.run(cwd=outdir)
  File "/opt/conda/lib/python3.8/site-packages/nipype/interfaces/base/core.py", line 428, in run
    runtime = self._run_interface(runtime)
  File "/opt/conda/lib/python3.8/site-packages/nipype/interfaces/base/core.py", line 822, in _run_interface
    self.raise_exception(runtime)
  File "/opt/conda/lib/python3.8/site-packages/nipype/interfaces/base/core.py", line 749, in raise_exception
    raise RuntimeError(
RuntimeError: Command:
t2smap -d /work/fmriprep_wf/single_subject_s003_wf/func_preproc_ses_3_task_vw_acq_mb4mesl56_run_1_echo_1_wf/unwarp_wf/_echoidx_0/merge/vol0000_unwarped_merged.nii.gz /work/fmriprep_wf/single_subject_s003_wf/func_preproc_ses_3_task_vw_acq_mb4mesl56_run_1_echo_1_wf/unwarp_wf/_echoidx_1/merge/vol0000_unwarped_merged.nii.gz /work/fmriprep_wf/single_subject_s003_wf/func_preproc_ses_3_task_vw_acq_mb4mesl56_run_1_echo_1_wf/unwarp_wf/_echoidx_2/merge/vol0000_unwarped_merged.nii.gz /work/fmriprep_wf/single_subject_s003_wf/func_preproc_ses_3_task_vw_acq_mb4mesl56_run_1_echo_1_wf/unwarp_wf/_echoidx_3/merge/vol0000_unwarped_merged.nii.gz -e 12.4 30.13 47.86 65.58999999999999 --mask /work/fmriprep_wf/single_subject_s003_wf/func_preproc_ses_3_task_vw_acq_mb4mesl56_run_1_echo_1_wf/unwarp_wf/brainextraction_wf/_echoidx_0/masker/clipped_mask.nii.gz --fittype curvefit
Standard output:

Standard error:
INFO     t2smap:t2smap_workflow:241 Using output directory: /work/fmriprep_wf/single_subject_s003_wf/func_preproc_ses_3_task_vw_acq_mb4mesl56_run_1_echo_1_wf/bold_t2smap_wf/t2smap_node
INFO     t2smap:t2smap_workflow:251 Loading input data: ['/work/fmriprep_wf/single_subject_s003_wf/func_preproc_ses_3_task_vw_acq_mb4mesl56_run_1_echo_1_wf/unwarp_wf/_echoidx_0/merge/vol0000_unwarped_merged.nii.gz', '/work/fmriprep_wf/single_subject_s003_wf/func_preproc_ses_3_task_vw_acq_mb4mesl56_run_1_echo_1_wf/unwarp_wf/_echoidx_1/merge/vol0000_unwarped_merged.nii.gz', '/work/fmriprep_wf/single_subject_s003_wf/func_preproc_ses_3_task_vw_acq_mb4mesl56_run_1_echo_1_wf/unwarp_wf/_echoidx_2/merge/vol0000_unwarped_merged.nii.gz', '/work/fmriprep_wf/single_subject_s003_wf/func_preproc_ses_3_task_vw_acq_mb4mesl56_run_1_echo_1_wf/unwarp_wf/_echoidx_3/merge/vol0000_unwarped_merged.nii.gz']
INFO     t2smap:t2smap_workflow:267 Using user-defined mask
INFO     t2smap:t2smap_workflow:270 Computing adaptive T2* map
Traceback (most recent call last):
  File "/opt/conda/bin/t2smap", line 8, in <module>
    sys.exit(_main())
  File "/opt/conda/lib/python3.8/site-packages/tedana/workflows/t2smap.py", line 343, in _main
    t2smap_workflow(**kwargs)
  File "/opt/conda/lib/python3.8/site-packages/tedana/workflows/t2smap.py", line 272, in t2smap_workflow
    (t2s_limited, s0_limited, t2s_full, s0_full) = decay.fit_decay(
  File "/opt/conda/lib/python3.8/site-packages/tedana/decay.py", line 372, in fit_decay
    data_masked = data[mask, :, :]
numpy.core._exceptions.MemoryError: Unable to allocate 1.73 GiB for an array with shape (90714, 4, 640) and data type float64
Return code: 1
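
Incidentally, the reported allocation matches the array shape exactly: (90714 masked voxels) × (4 echoes) × (640 volumes) in float64 is

print(90714 * 4 * 640 * 8 / 2**30)  # 8 bytes per float64 -> ~1.73 GiB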

@zachladwig

Hi @mateuszpawlik @tsalo,

Did you ever get this working? I am having the same problem with t2smap, but within a subject (many runs per subject).
Thanks

@mateuszpawlik

We successfully executed fmriprep 21.0.2. Here's my command:

singularity run --cleanenv --bind /mnt/dbgroup-share/mpawlik/data-ccns/anclang:/data --bind /mnt/dbgroup-share/mpawlik/scratch:/work --bind /home/mpawlik/bids/freesurfer_license.txt:/opt/freesurfer/license.txt fmriprep_21.0.2.sif /data /data/derivatives/fmriprep participant --me-output-echos --nprocs 4 --resource-monitor --fs-no-reconall --notrack --skip-bids-validation --task-id vw -v -w /work

We haven't tried with newer versions.


effigies commented Dec 2, 2022

So the problem here is that T2SMap is not annotated with its memory consumption (https://github.com/nipreps/fmriprep/blob/master/fmriprep/workflows/bold/t2s.py#L101), which means we assume it will use 100 MiB. From @mateuszpawlik's node we can see that memory scales roughly as voxels × echoes × time points × 8 bytes (float64).

@tsalo are there any extra coefficients or multiplicative factors that should be considered?

At the very least, taking the full image set size and throwing in a factor of two will be a better estimate than 100 MiB.
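
Something along these lines would do it (a rough sketch only, not the eventual fix in #2898; the helper function and the T2SMap import path are illustrative assumptions):

import numpy as np
import nibabel as nb
from nipype.pipeline import engine as pe
from fmriprep.interfaces.multiecho import T2SMap  # import path assumed

def estimate_t2smap_mem_gb(echo_files, factor=2.0):
    """Uncompressed float64 size of all echo series, times a safety factor."""
    total_voxels = sum(int(np.prod(nb.load(f).shape)) for f in echo_files)
    return factor * total_voxels * 8 / (1024 ** 3)

echo_files = ["sub-01_echo-1_bold.nii.gz", "sub-01_echo-2_bold.nii.gz"]  # placeholders

t2smap_node = pe.Node(
    T2SMap(fittype="curvefit"),
    name="t2smap_node",
    # Annotate memory from the data size instead of the ~100 MiB default
    mem_gb=estimate_t2smap_mem_gb(echo_files),
)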


tsalo commented Dec 2, 2022

I ran a local test and the relevant code (the bit that seemed to be failing) was maxing out at ~2 GB on a 288 MB file with three echoes. The proposed change in #2898 came out to 1.73 GB, so it might need a bit more? The function also uses scipy.optimize.curve_fit, but I have no clue what the memory usage tends to be for that, sorry.


effigies commented Dec 2, 2022

Okay, a factor of 2.5 would get pretty close: scaling the 1.73 GB estimate (which used a factor of 2) by 2.5/2 gives about 2.2 GB, just above the ~2 GB you observed.

effigies added this to the 22.1.0 milestone Dec 7, 2022
effigies added a commit that referenced this issue Dec 11, 2022
## Changes proposed in this pull request

T2SMap was tagged with the default memory consumption (~100MB), when it
should be based on the size of the input files. This implements #1100 in
order to get a direct estimate of that, and then passes that along, with
an additional factor based on the number of echoes and a safety factor of 2.

Fixes #1100.
Fixes #2728.

effigies mentioned this issue Nov 1, 2023