Singularity "No such file or directory" issue #780

Closed
celstark opened this issue Oct 19, 2017 · 14 comments

@celstark celstark commented Oct 19, 2017

I'm a bit stumped here.

  • I have a local cluster that can run fmriprep just fine under Docker.
docker run -ti --rm \
    -v /tmp/mridata2/testdata/BIDS_singleclean:/data:ro \
    -v /tmp/mridata2/testdata/singularity_out:/out \
    poldracklab/fmriprep:latest \
    /data /out/out \
    participant --participant_label 02 --omp-nthreads 4
  • If I build the Singularity image and copy it to my HPC, I can run just fine
  • If I use that same Singularity image and run it on my local cluster, it fails
PYTHONPATH="" singularity run \
    /tmp/mridata2/testdata/poldracklab_fmriprep_latest-2017-09-25-80b6c6d68bd3.img \
    /tmp/mridata2/testdata/BIDS_singleclean /tmp/mridata2/testdata/singularity_out \
    participant \
    --participant-label 02 --debug
  • I've tried unsetting numerous environment variables -- anything I could see that might be linked to this.
  • Both systems are running Singularity 2.3.1

Any thoughts?

Log:

171019-22:55:50,210 cli WARNING:
	 Captured warning (<class 'UserWarning'>): [Errno 2] No such file or directory.  joblib will operate in serial mode
171019-22:55:50,227 cli WARNING:
	 Captured warning (<class 'UserWarning'>): [Errno 2] No such file or directory.  joblib will operate in serial mode
171019-22:55:51,270 cli INFO:
	 
Running fMRIPREP version 1.0.0-rc5:
  * Participant list: ['02'].
  * Run identifier: 20171019-225551_219905f7-475a-4d55-aca0-59d9488828f8.

171019-22:55:52,393 workflow INFO:
	 Creating bold processing workflow for "/tmp/mridata2/testdata/BIDS_singleclean/sub-02/func/sub-02_task-4min_bold.nii".
171019-22:55:52,638 workflow INFO:
	 Slice-timing correction will be included.
171019-22:55:53,79 workflow INFO:
	 Fieldmap estimation: type "epi" found
171019-22:55:53,964 workflow INFO:
	 Creating BOLD surface-sampling workflow.
171019-22:55:55,488 workflow INFO:
	 Creating bold processing workflow for "/tmp/mridata2/testdata/BIDS_singleclean/sub-02/func/sub-02_task-5min_bold.nii".
171019-22:55:55,669 workflow INFO:
	 Slice-timing correction will be included.
171019-22:55:56,116 workflow INFO:
	 Fieldmap estimation: type "epi" found
171019-22:55:56,871 workflow INFO:
	 Creating BOLD surface-sampling workflow.
Traceback (most recent call last):
  File "/usr/local/miniconda/bin/fmriprep", line 11, in <module>
    load_entry_point('fmriprep==1.0.0rc5', 'console_scripts', 'fmriprep')()
  File "/usr/local/miniconda/lib/python3.6/site-packages/fmriprep/cli/run.py", line 207, in main
    create_workflow(opts)
  File "/usr/local/miniconda/lib/python3.6/site-packages/fmriprep/cli/run.py", line 322, in create_workflow
    fmriprep_wf.run(**plugin_settings)
  File "/usr/local/miniconda/lib/python3.6/site-packages/niworkflows/nipype/pipeline/engine/workflows.py", line 568, in run
    runner = plugin_mod(plugin_args=plugin_args)
  File "/usr/local/miniconda/lib/python3.6/site-packages/niworkflows/nipype/pipeline/plugins/multiproc.py", line 161, in __init__
    self.pool = NonDaemonPool(processes=self.processors)
  File "/usr/local/miniconda/lib/python3.6/multiprocessing/pool.py", line 150, in __init__
    self._setup_queues()
  File "/usr/local/miniconda/lib/python3.6/multiprocessing/pool.py", line 243, in _setup_queues
    self._inqueue = self._ctx.SimpleQueue()
  File "/usr/local/miniconda/lib/python3.6/multiprocessing/context.py", line 112, in SimpleQueue
    return SimpleQueue(ctx=self.get_context())
  File "/usr/local/miniconda/lib/python3.6/multiprocessing/queues.py", line 324, in __init__
    self._rlock = ctx.Lock()
  File "/usr/local/miniconda/lib/python3.6/multiprocessing/context.py", line 67, in Lock
    return Lock(ctx=self.get_context())
  File "/usr/local/miniconda/lib/python3.6/multiprocessing/synchronize.py", line 163, in __init__
    SemLock.__init__(self, SEMAPHORE, 1, 1, ctx=ctx)
  File "/usr/local/miniconda/lib/python3.6/multiprocessing/synchronize.py", line 60, in __init__
    unlink_now)
FileNotFoundError: [Errno 2] No such file or directory
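
The failure can likely be reproduced without fMRIPREP: multiprocessing.Lock() allocates a POSIX semaphore, which glibc backs with a file under /dev/shm. A minimal sketch, assuming the same image path as in the command above:

PYTHONPATH="" singularity exec /tmp/mridata2/testdata/poldracklab_fmriprep_latest-2017-09-25-80b6c6d68bd3.img \
    python -c "import multiprocessing; multiprocessing.Lock()"

If /dev/shm is unusable inside the container, this one-liner should raise the same FileNotFoundError.
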
@chrisgorgo chrisgorgo (Contributor) commented Oct 19, 2017

Could you try setting the working directory explicitly (using -w) to somewhere in /tmp?

@oesteban This reminds me of our discussion - we should revisit the idea of setting /tmp as the default working directory.

@oesteban oesteban (Contributor) commented Oct 19, 2017

@chrisfilo it seems that the problem arises when multiprocessing tries to write the semaphores for MultiProc to /dev/shm.

@celstark, please confirm that /dev/shm is writable. Additionally, you can try --n_procs 1 so that the Linear runner is used (and you'll likely run into trouble somewhere else, because I bet some other standard Python operations write to /dev/shm).
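
One minimal way to check that from inside the image (a sketch, reusing the image path from the report above):

singularity shell /tmp/mridata2/testdata/poldracklab_fmriprep_latest-2017-09-25-80b6c6d68bd3.img

# inside the container: does /dev/shm resolve, and is it writable?
ls -ld /dev/shm
touch /dev/shm/shm-write-test && rm /dev/shm/shm-write-test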

@celstark celstark (Author) commented Oct 19, 2017

-w /tmp doesn't help, but --nthreads 1 does. --omp-nthreads doesn't affect it. So there's likely some multi-threading library that isn't installed.
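
For reference, the invocation that works would then roughly be the original command plus --nthreads 1 (a sketch; exact flag placement assumed):

PYTHONPATH="" singularity run \
    /tmp/mridata2/testdata/poldracklab_fmriprep_latest-2017-09-25-80b6c6d68bd3.img \
    /tmp/mridata2/testdata/BIDS_singleclean /tmp/mridata2/testdata/singularity_out \
    participant \
    --participant-label 02 --debug --nthreads 1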

@celstark celstark (Author) commented Oct 19, 2017

@oesteban - you may be on to something here:

stark@titan:/tmp/mridata2/testdata$ ls -l /dev/shm
lrwxrwxrwx 1 root root 8 Oct 9 21:12 /dev/shm -> /run/shm
stark@titan:/tmp/mridata2/testdata$ ls -l /run
total 12
-rw-r--r-- 1 root root 0 Jun 30 16:45 init.upgraded
drwxrwxrwt 2 root root 4096 Dec 13 2016 lock
drwxr-xr-x 2 root root 4096 Dec 13 2016 mount
drwxr-xr-x 2 root root 4096 Dec 15 2016 systemd
-rw-rw-r-- 1 root utmp 0 Dec 13 2016 utmp

So /dev/shm is a symlink to /run/shm, which doesn't exist inside the container -- it points to nothing.

@oesteban oesteban (Contributor) commented Oct 19, 2017

There is an alternative hypothesis: that your docker processes are run as non-root. In that case docker does not allow writing to /dev/shm.

EDIT: I see you were using singularity in the cluster, so scratch this alternative.

@celstark celstark (Author) commented Oct 19, 2017

In Docker on my local cluster, I've added myself to the docker group and I run as my own user. What's odd to me is that on the HPC here, the Singularity container runs just fine.

@celstark celstark (Author) commented Oct 19, 2017

Just to be clear - that "ls" of /dev/shm is from inside the container, after doing a singularity shell. On the host machine, /run/shm is there and as a user I can touch files in it.

@chrisgorgo chrisgorgo (Contributor) commented Oct 20, 2017

With a bit of digging I found this: sylabs/singularity#875

If you can confirm it's the same issue, we should move the conversation over there.

@celstark celstark (Author) commented Oct 23, 2017

Yes, that does seem to be the same issue.

@oesteban oesteban (Contributor) commented Oct 23, 2017

Thanks a lot @celstark, let's move the conversation over to sylabs/singularity#875

@oesteban oesteban closed this Oct 23, 2017
@celstark celstark (Author) commented Dec 4, 2017

FWIW, the issue crept back in the latest releases and was no longer fixed by --nthreads 1. What did fix it, I found, was binding /run/shm into the container by passing -B /run/shm:/run/shm to singularity.
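
A sketch of what that invocation looks like, based on the command from the original report (the bind option goes before the image path; exact flag placement assumed):

PYTHONPATH="" singularity run -B /run/shm:/run/shm \
    /tmp/mridata2/testdata/poldracklab_fmriprep_latest-2017-09-25-80b6c6d68bd3.img \
    /tmp/mridata2/testdata/BIDS_singleclean /tmp/mridata2/testdata/singularity_out \
    participant \
    --participant-label 02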

@oesteban oesteban (Contributor) commented Dec 4, 2017

Thanks for posting the solution.
