Error running a container on multi-node #35

Closed
Smahane opened this issue May 12, 2021 · 5 comments
Labels: bug, needs more info

Comments

Smahane commented May 12, 2021

I'm running a sandbox version of a LAMMPS container on 4 nodes in an HPC cluster, and I'm getting the errors below. Any ideas, please?

Version of Singularity:

What version of Singularity are you using? Run:

$ singularity version

singularity version 3.7.1

Expected behavior

What did you expect to see when you do...?

run the container without errors on multi-node

Actual behavior

00007ffd72d2e8e0: <0000000000000000  00007ffd72d2e928 FATAL:   container creation failed: mount tmpfs->${path}/singularity/3.7.1/var/singularity/mnt/session error: while mounting tmpfs: can't mount tmpfs filesystem to ${path}/singularity/3.7.1/var/singularity/mnt/session: write unix @->@: write: broken pipe

00007ffd72d2e8f0:  000055ce447f233a <runtime.mmap.func1+90>  0000000000000000
00007ffd72d2e900:  0000000000210808  00007ffd72d2e950
00007ffd72d2e910:  00007ffd72d2e960  0000000000000040
00007ffd72d2e920:  0000000000000040  0000000000000001
00007ffd72d2e930:  000000006e43a318  000055ce44de0f6c
00007ffd72d2e940:  0000000000000000  000055ce447fa69e <runtime.callCgoMmap+62>
00007ffd72d2e950:  00007ffd72d2e950  0000000000000000
00007ffd72d2e960:  fffffffe7fffffff  ffffffffffffffff
00007ffd72d2e970:  ffffffffffffffff  ffffffffffffffff
00007ffd72d2e980:  ffffffffffffffff  ffffffffffffffff
00007ffd72d2e990:  ffffffffffffffff  ffffffffffffffff
00007ffd72d2e9a0:  ffffffffffffffff  ffffffffffffffff
00007ffd72d2e9b0:  ffffffffffffffff  ffffffffffffffff
00007ffd72d2e9c0:  ffffffffffffffff  ffffffffffffffff
00007ffd72d2e9d0:  ffffffffffffffff  ffffffffffffffff

goroutine 1 [chan receive, locked to thread]:
runtime.gopark(0x55ce45260dc8, 0xc000054058, 0x170e, 0x2)
        runtime/proc.go:304 +0xe6
runtime.chanrecv(0xc000054000, 0x0, 0xc000000101, 0x55ce447a0101)
        runtime/chan.go:535 +0x2f9
runtime.chanrecv1(0xc000054000, 0x0)
        runtime/chan.go:412 +0x2b
runtime.gcenable()
        runtime/mgc.go:217 +0xae
runtime.main()
        runtime/proc.go:166 +0x11d
runtime.goexit()
        runtime/asm_amd64.s:1373 +0x1

rax    0x0
rbx    0x6
rcx    0x1477e97627ff
rdx    0x0
rdi    0x2
rsi    0x7ffd72d2e8e0
rbp    0x55ce44eee9b9
rsp    0x7ffd72d2e8e0
r8     0x0
r9     0x7ffd72d2e8e0
r10    0x8
r11    0x246
r12    0x55ce46006580
r13    0x0
r14    0x55ce44ed0fb4
r15    0x0
rip    0x1477e97627ff
rflags 0x246
cs     0x33
fs     0x0
gs     0x0
FATAL:   container creation failed: mount tmpfs->${path}/singularity/3.7.1/var/singularity/mnt/session error: while mounting tmpfs: can't mount tmpfs filesystem to ${path}/singularity/3.7.1/var/singularity/mnt/session: write unix @->@: write: broken pipe
FATAL:   container creation failed: mount tmpfs->${path}/singularity/3.7.1/var/singularity/mnt/session error: while mounting tmpfs: can't mount tmpfs filesystem to ${path}/singularity/3.7.1/var/singularity/mnt/session: read unix @->@: read: connection reset by peer
.. .. .. .. .. /opt/intel/psxe_runtime_2020.4.17/linux/mpi/intel64/bin/mpirun

Steps to reproduce this behavior

How can others reproduce this issue/problem?
$ mpiexec.hydra -hostfile hostfile -ppn $ppn -np $np singularity run <sandboxContainer>

What OS/distro are you running

$ cat /etc/os-release
NAME="CentOS Linux"
VERSION="8"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="CentOS Linux 8"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:8"
HOME_URL="https://centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-8"
CENTOS_MANTISBT_PROJECT_VERSION="8"

How did you install Singularity

Write here how you installed Singularity. Eg. RPM, source.
Not sure

Smahane added the bug label May 12, 2021
dtrudg (Member) commented May 12, 2021

Hello. The mount tmpfs->${path} portion of the error message likely indicates the problem here. The ${path} portion should instead be part of the path under which singularity was installed.

Please could you give the output of singularity buildcfg so we can see the configuration with which singularity was compiled?

Smahane (Author) commented May 12, 2021

Hello,

The ${path} is the full path where singularity is installed. I just replaced it with a variable.

Here is the configuration:

$ singularity buildcfg
PACKAGE_NAME=singularity
PACKAGE_VERSION=3.7.1
BUILDDIR=/home/rdah/singularity/builddir
PREFIX=/opt/cr/singularity/3.7.1
EXECPREFIX=/opt/cr/singularity/3.7.1
BINDIR=/opt/cr/singularity/3.7.1/bin
SBINDIR=/opt/cr/singularity/3.7.1/sbin
LIBEXECDIR=/opt/cr/singularity/3.7.1/libexec
DATAROOTDIR=/opt/cr/singularity/3.7.1/share
DATADIR=/opt/cr/singularity/3.7.1/share
SYSCONFDIR=/opt/cr/singularity/3.7.1/etc
SHAREDSTATEDIR=/opt/cr/singularity/3.7.1/com
LOCALSTATEDIR=/opt/cr/singularity/3.7.1/var
RUNSTATEDIR=/opt/cr/singularity/3.7.1/var/run
INCLUDEDIR=/opt/cr/singularity/3.7.1/include
DOCDIR=/opt/cr/singularity/3.7.1/share/doc/singularity
INFODIR=/opt/cr/singularity/3.7.1/share/info
LIBDIR=/opt/cr/singularity/3.7.1/lib
LOCALEDIR=/opt/cr/singularity/3.7.1/share/locale
MANDIR=/opt/cr/singularity/3.7.1/share/man
SINGULARITY_CONFDIR=/opt/cr/singularity/3.7.1/etc/singularity
SESSIONDIR=/opt/cr/singularity/3.7.1/var/singularity/mnt/session
PLUGIN_ROOTDIR=/opt/cr/singularity/3.7.1/libexec/singularity/plugin
SINGULARITY_CONF_FILE=/opt/cr/singularity/3.7.1/etc/singularity/singularity.conf
SINGULARITY_SUID_INSTALL=0
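
For reference, a minimal sketch of pulling a single value (for example SESSIONDIR, the path named in the error) out of KEY=VALUE output like the above, so that the path can be inspected on each node. The function name buildcfg_value is hypothetical and is not part of Singularity; whether the session directory is actually relevant to this failure is an assumption.

```shell
# Hypothetical helper (not part of Singularity): read KEY=VALUE lines on
# stdin and print the value of the requested key. Returns 1 if not found.
buildcfg_value() {
    key="$1"
    while IFS='=' read -r name value; do
        if [ "$name" = "$key" ]; then
            printf '%s\n' "$value"
            return 0
        fi
    done
    return 1
}

# Usage sketch (not run here):
#   session_dir=$(singularity buildcfg | buildcfg_value SESSIONDIR)
#   ls -ld "$session_dir"    # repeat on every node in the hostfile
```

Reading the value from the installed binary, rather than copying the path by hand, avoids checking a directory that differs from the one Singularity was compiled with.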

Smahane (Author) commented May 17, 2021

Hello @dtrudg, do you have any ideas on this issue, please?

Thank you

dtrudg (Member) commented May 17, 2021

Can you paste the debug log output, i.e. from running with singularity -d run ..., as there isn't a lot to go on here so far? The write unix @->@: write: broken pipe error indicates that the mount instruction could not be sent across the socket that is used for that communication.

It's not something that has been seen before tbh. I cannot reproduce under the MPI environments I have available to me here. If you can verify it works under a different MPI stack, e.g. OpenMPI / MVAPICH etc. that'd be useful also.
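
One way to collect those debug logs from a multi-node run is to give every rank its own log file, so that output from different ranks is not interleaved as it is in the report above. The sketch below is only an illustration: the helper name debug_log_name is hypothetical, and it assumes that the Hydra launcher exports PMI_RANK to each process and that Singularity writes its debug messages to stderr.

```shell
# Hypothetical helper: build a per-host, per-rank log file name.
# $1 = host name, $2 = rank (falls back to "unknown" if empty or unset).
debug_log_name() {
    printf 'singularity-debug.%s.rank%s.log\n' "$1" "${2:-unknown}"
}

# Usage sketch (not run here), placed in a small wrapper script that
# mpiexec.hydra launches in place of calling singularity directly:
#   log=$(debug_log_name "$(hostname)" "$PMI_RANK")
#   singularity -d run <sandboxContainer> 2> "$log"
```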

dtrudg (Member) commented Jan 12, 2022

Closing as there hasn't been further information provided. Please re-open if you are still having the problem and are able to provide additional diagnostic information.

dtrudg closed this as completed Jan 12, 2022