Started jobs hang at "Running calculations ..." #261

Answered by mschubert
mhesselbarth asked this question in Q&A

Ok, that makes it easier because now we know the issue is a connection problem from the workers to the login node, and not related to ssh.

Your login node likely has multiple network interfaces, and when a worker tries to connect to Sys.info()["nodename"], that name resolves to the wrong interface.

You likely need to set options(clustermq.host="<interface that accepts worker connections>").

You can list your network interfaces with the ifconfig command; its output will look something like the following:

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        ...

em3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  …
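
As a rough sketch of the setting described above: the snippet below first prints the node name that workers try by default, then points clustermq.host at an interface. The name "em3" is only borrowed from the sample ifconfig output as a placeholder; which interface actually accepts worker connections on your cluster is an assumption you need to check yourself.

```r
# The default host that workers try to reach (per the answer above)
Sys.info()[["nodename"]]

# Placeholder value: "em3" comes from the sample ifconfig output and is
# NOT known to be the right interface; substitute the one your workers can reach.
options(clustermq.host = "em3")

# Confirm the option is set in this session
getOption("clustermq.host")
```

The option has to be set in the R session on the login node before jobs are submitted; placing the options() call in ~/.Rprofile is one way to make it persistent.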

Answer selected by mschubert

This discussion was converted from issue #259 on April 29, 2021 13:11.