"localhost" does not always work for "multicore" #192
Comments
Thanks for the report. I definitely want to support containers as much out of the box as possible, so maybe referencing the IP instead of the host name is a good idea here. The only drawback I see is that if a network is IPv6-only (quite unlikely), this would fail. In the meantime, does If not, does starting your container adding docker run --add-host localhost:127.0.0.1
I checked, and
On my laptop, it appears as
I am using Singularity, and I can't find an equivalent to I did quickly check if I can specify an IPv6 address, and it doesn't work for In the meantime, I have forked and made the change I suggested. I won't submit a pull request for now, because it might not be a general solution.
Another option: use That succeeded on the HPC for me, but not on my laptop. The inverse problem.
This will be solved more generally with #170. Thank you for reporting!
clustermq/R/qsys_multicore.r
Line 11 in a080a61
I am using clustermq v0.8.8.1 inside a container on an HPC, with drake.
I have not been able to use the PBS scheduler because I haven't figured out how to get "qsub" to work from the container; I can't just bind "qsub" in from the host system.
I would also prefer not to SSH from my laptop to the server for the long-running job, since I'm moving a lot of data around.
I would be happy to use the multicore scheduler, which will still let me use the 24 core nodes on the HPC.
However, while everything runs well on my laptop, on the HPC I get "Invalid Argument" from each worker when I call workers(5). After a bit of poking, it seems like "localhost" will not work on the HPC, while "127.0.0.1" does.
Would there be any issues with changing the node argument to "127.0.0.1", or allowing it to be passed in as a parameter?
Some web pages that helped me debug the issue:
http://api.zeromq.org/2-1:zmq-tcp
https://stackoverflow.com/questions/6024003/why-doesnt-zeromq-work-on-localhost
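For anyone hitting the same symptom, one way to compare the two hosts is to check how "localhost" resolves on each machine and whether a TCP socket can be bound to it. The following is only a diagnostic sketch in Python using the standard library; the helper names resolve_host and can_bind are made up for illustration, and ZeroMQ performs its own name resolution, so a success here does not guarantee that clustermq will behave the same way.

```python
import socket


def resolve_host(host, port=0):
    """Return the sorted unique addresses that `host` resolves to ([] on failure)."""
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # Name could not be resolved at all (e.g. no hosts-file entry).
        return []
    return sorted({info[4][0] for info in infos})


def can_bind(host):
    """Return True if a TCP socket can bind to `host` on an ephemeral port."""
    try:
        infos = socket.getaddrinfo(host, 0, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return False
    for family, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.bind(sockaddr)  # port 0 lets the OS pick a free port
                return True
        except OSError:
            # Try the next address family (e.g. IPv4 after IPv6).
            continue
    return False


if __name__ == "__main__":
    for name in ("localhost", "127.0.0.1"):
        print(name, resolve_host(name), can_bind(name))
```

If "localhost" resolves to nothing, or only to an IPv6 address such as ::1, while "127.0.0.1" binds fine, that would be consistent with the behaviour described above, though it is not proof of the cause.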