Problem with long hostname resolution #10

Open
GoogleCodeExporter opened this issue Mar 22, 2015 · 0 comments

What steps will reproduce the problem?

1. Run a multi-node SCOOP job using full domain names in the --hosts line, e.g.,
python -m scoop.__main__ --backend ZMQ -vv --hosts node1.default.domain node2.default.domain -n 32 scoopCode.py


What is the expected output? 

I would expect this command
python -m scoop.__main__ --backend ZMQ -vv --hosts node1.default.domain node2.default.domain -n 32 scoopCode.py

to do the same thing as this command
python -m scoop.__main__ --backend ZMQ -vv --hosts node1 node2 -n 32 scoopCode.py

What do you see instead?

Using long host names, I get the following error:

ERROR:root:Error while launching SCOOP subprocesses:
ERROR:root:Traceback (most recent call last):
  File "/pkg/suse11/python/scoop/0.7.2/lib/python2.7/site-packages/scoop-0.7.2.dev-py2.7.egg/scoop/launcher.py", line 469, in main
    rootTaskExitCode = thisScoopApp.run()
  File "/pkg/suse11/python/scoop/0.7.2/lib/python2.7/site-packages/scoop-0.7.2.dev-py2.7.egg/scoop/launcher.py", line 258, in run
    backend=self.backend,
  File "/pkg/suse11/python/scoop/0.7.2/lib/python2.7/site-packages/scoop-0.7.2.dev-py2.7.egg/scoop/launch/brokerLaunch.py", line 148, in __init__
    "SSH process stderr:\n{stderr}".format(**locals()))
Exception: Could not successfully launch the remote broker.
Requested remote broker ports, received:

Port number decoding error:
need more than 1 value to unpack
SSH process stderr:
Connection to cl2n091.default.domain closed.

But it runs perfectly fine with only the short host names.
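
For anyone reading the traceback: nothing follows "Requested remote broker ports, received:", which suggests the launcher got an empty reply from the remote broker and then failed to unpack it. The sketch below reproduces that kind of failure in isolation. It assumes the reply is a single comma-separated pair of ports; `decode_ports` is a hypothetical name for illustration, not SCOOP's actual code.

```python
def decode_ports(line):
    """Hypothetical helper: split a 'port1,port2' line into two integers."""
    # Unpacking into two names fails if the line holds fewer than two fields.
    broker_port, info_port = line.strip().split(",")
    return int(broker_port), int(info_port)


# A well-formed reply decodes to a pair of integers.
print(decode_ports("5555,5556\n"))

# An empty reply yields a single empty field, so the unpacking raises ValueError.
try:
    decode_ports("")
except ValueError as err:
    print("decoding failed: {0}".format(err))
```

On Python 2.7 the message for this ValueError matches the "need more than 1 value to unpack" text in the log; newer Python versions word it differently. This only shows why an empty reply produces that message. It does not explain why the broker returns nothing when the long host name is used.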


What version of the product are you using? 

Python 2.7.5
Scoop version 0.7.2


On what operating system?

SUSE Linux 11 


Please provide any additional information below.

I am actually trying to run this on our SGI cluster (SGI customized SUSE11), 
which uses PBS Pro as the scheduler. If I submit a job without the --hosts 
line, scoop correctly detects the hosts PBS has given the job, but PBS provides 
the full hostnames. If I submit a multinode interactive job and manually 
provide the short names it works fine, but this is really not ideal as it 
should be able to go through the batch system properly. 
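
As a stopgap inside a batch job, the short names could be derived from the PBS node file and passed to --hosts explicitly. This is a minimal sketch, not a fix for SCOOP itself: `short_hostnames` is a hypothetical helper, and it assumes the node file (the path in the `PBS_NODEFILE` environment variable) lists one fully qualified host name per line rather than IP addresses.

```python
import os


def short_hostnames(nodefile_lines):
    """Return unique short host names (text before the first dot), in first-seen order."""
    seen = []
    for line in nodefile_lines:
        name = line.strip()
        if not name:
            continue  # skip blank lines
        short = name.split(".", 1)[0]
        if short not in seen:
            seen.append(short)
    return seen


if __name__ == "__main__":
    # PBS sets PBS_NODEFILE to the path of the file listing the allocated hosts.
    path = os.environ.get("PBS_NODEFILE")
    if path:
        with open(path) as nodefile:
            print(" ".join(short_hostnames(nodefile)))
```

The printed names can then be substituted into the command, e.g. `--hosts node1 node2 -n 32`. Note that the node file may repeat a host once per allocated slot; the sketch collapses those repeats, so the worker count still has to come from -n.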

Original issue reported on code.google.com by david.wa...@qut.edu.au on 27 Aug 2014 at 10:22
