PBS job templates not picked up? #37

Closed
paulmelis opened this issue Sep 10, 2015 · 7 comments

@paulmelis

I'm using ipyparallel 4.0.2 and friends (with Python 3.4) installed in a virtualenv with pip. I'm trying to get a set of engines+controller running on a system with the Torque batch system, i.e. PBS. I've followed the documentation at https://ipyparallel.readthedocs.org/en/stable/process.html#using-ipcluster-in-pbs-mode to get the configuration set up.

When I launch with "ipcluster start --profile=1engine-per-node -n 2" my job templates are not being used (the actual job name is different from what I specify in the templates with -N and the walltime is incorrect).

The steps I've done:

  1. ipython profile create --parallel --profile=1engine-per-node
  2. Edit in ~/.ipython/profile_1engine-per-node
    a) ipcluster_config.py:

      c.IPClusterEngines.engine_launcher_class = 'PBS'
      c.IPClusterStart.controller_launcher_class = 'PBS'
      c.IPClusterEngines.n = 4
      c.PBSControllerLauncher.batch_file_name = 'controller.template'
      c.PBSEngineSetLauncher.batch_file_name = 'engine.template'

    b) ipcontroller_config.py:

      c.HubFactory.ip = '*'

  3. Added templates:
    a) .ipython/profile_1engine-per-node/controller.template

      #PBS -N ipyparallel-controller
      #PBS -j oe
      #PBS -l walltime=01:00:00
      #PBS -l nodes=1:ppn=4

      cd $PBS_O_WORKDIR
      source $HOME/pyenv/3.4/bin/activate
      ipcontroller --profile-dir={profile_dir}

    b) .ipython/profile_1engine-per-node/engine.template

      #PBS -N ipyparallel-engine
      #PBS -j oe
      #PBS -l walltime=01:00:00
      #PBS -l nodes={n}:ppn=1

      cd $PBS_O_WORKDIR
      source $HOME/pyenv/3.4/bin/activate
      module load openmpi/gnu/1.6.5
      `which mpiexec` -n {n} ipengine --profile-dir={profile_dir}
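To check what would actually be submitted, it can help to render a template by hand. The sketch below is a simplification, assuming only the `{n}` and `{profile_dir}` placeholders used in these templates: it fills them with plain `str.format`, whereas ipyparallel's launcher uses its own formatter (the documentation's examples also use expressions such as `{n//4}`, which plain `str.format` would not handle). `render_template` is a hypothetical helper, not part of ipyparallel.

```python
import os

# Engine template from this issue, embedded so the sketch is self-contained.
ENGINE_TEMPLATE = """\
#PBS -N ipyparallel-engine
#PBS -j oe
#PBS -l walltime=01:00:00
#PBS -l nodes={n}:ppn=1

cd $PBS_O_WORKDIR
source $HOME/pyenv/3.4/bin/activate
module load openmpi/gnu/1.6.5
`which mpiexec` -n {n} ipengine --profile-dir={profile_dir}
"""


def render_template(template, n, profile_dir):
    """Hypothetical helper: fill {n} and {profile_dir} with plain str.format."""
    return template.format(n=n, profile_dir=profile_dir)


if __name__ == "__main__":
    profile_dir = os.path.expanduser("~/.ipython/profile_1engine-per-node")
    # Same engine count as `ipcluster start --profile=1engine-per-node -n 2`.
    print(render_template(ENGINE_TEMPLATE, n=2, profile_dir=profile_dir))
```

Printing the rendered text is a cheap way to confirm that the `#PBS -N` and walltime lines are the ones you expect before debugging the launcher itself.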

Btw, are the profiles that get generated with "profile create" written based on inspecting Python classes or something? I generated a new dummy parallel profile so I could compare it with what I had changed in my 1engine-per-node profile, but a diff shows a vastly different order of the config items, making direct comparison hard.
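Until the ordering is stable, one workaround is to compare only the active assignments, sorted. Files generated by `profile create` keep every default commented out, so the uncommented `c.Class.trait = value` lines are exactly the local changes. A minimal sketch, assuming each setting sits on a single line; `config_assignments` and `diff_configs` are made-up names for this example:

```python
import difflib


def config_assignments(text):
    """Return the uncommented `c.X.y = ...` lines of a config file, sorted."""
    lines = (line.strip() for line in text.splitlines())
    return sorted(line for line in lines if line.startswith("c."))


def diff_configs(old_text, new_text, old_name="default", new_name="custom"):
    """Unified diff of the sorted assignments found in two config files."""
    return "\n".join(
        difflib.unified_diff(
            config_assignments(old_text),
            config_assignments(new_text),
            fromfile=old_name,
            tofile=new_name,
            lineterm="",
        )
    )


if __name__ == "__main__":
    default = "# c.IPClusterEngines.n = 2\n"
    custom = (
        "c.IPClusterEngines.n = 4\n"
        "c.IPClusterEngines.engine_launcher_class = 'PBS'\n"
    )
    print(diff_configs(default, custom))
```

In practice you would pass the contents of the two `ipcluster_config.py` files, read with `open(...).read()`.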

@minrk
Member

minrk commented Sep 10, 2015

Btw, are the profiles that get generated with "profile create" written based on inspecting Python classes or something?

The unstable ordering is a bug that will be fixed in traitlets 4.1.

@paulmelis
Author

Okay, so I was using the wrong configuration item. c.PBSControllerLauncher.batch_file_name should be c.PBSControllerLauncher.batch_template_file (and similar for the engine template).

But it seems the template file name needs to be a full path to a file, as otherwise it won't be found. That is, with

c.PBSControllerLauncher.batch_template_file = 'controller.template'
c.PBSEngineSetLauncher.batch_template_file = 'engine.template'

and both template files located in $HOME/.ipython/profile_1engine-per-node I get:

(3.4) paulm@login:~$ ipcluster start --profile=1engine-per-node -n 2
2015-09-11 14:09:35.377 [IPClusterStart] Starting ipcluster with [daemon=False]
2015-09-11 14:09:35.378 [IPClusterStart] Creating pid file: /home/paulm/.ipython/profile_1engine-per-node/pid/ipcluster.pid
2015-09-11 14:09:35.379 [IPClusterStart] Starting Controller with PBS
2015-09-11 14:09:35.380 [IPClusterStart] ERROR | Controller start failed
Traceback (most recent call last):
  File "/home/paulm/pyenv/3.4/lib/python3.4/site-packages/ipyparallel/apps/ipclusterapp.py", line 503, in start_controller
    self.controller_launcher.start()
  File "/home/paulm/pyenv/3.4/lib/python3.4/site-packages/ipyparallel/apps/launcher.py", line 1188, in start
    return super(PBSControllerLauncher, self).start(1)
  File "/home/paulm/pyenv/3.4/lib/python3.4/site-packages/ipyparallel/apps/launcher.py", line 1135, in start
    self.write_batch_script(n)
  File "/home/paulm/pyenv/3.4/lib/python3.4/site-packages/ipyparallel/apps/launcher.py", line 1099, in write_batch_script
    with open(self.batch_template_file) as f:
FileNotFoundError: [Errno 2] No such file or directory: 'controller.template'
ERROR:tornado.application:Exception in callback functools.partial(<function wrap.<locals>.wrapped at 0x7f462a5de0d0>)
Traceback (most recent call last):
  File "/home/paulm/pyenv/3.4/lib/python3.4/site-packages/zmq/eventloop/minitornado/ioloop.py", line 463, in _run_callback
    callback()
  File "/home/paulm/pyenv/3.4/lib/python3.4/site-packages/zmq/eventloop/minitornado/stack_context.py", line 331, in wrapped
    raise_exc_info(exc)
  File "<string>", line 3, in raise_exc_info
  File "/home/paulm/pyenv/3.4/lib/python3.4/site-packages/zmq/eventloop/minitornado/stack_context.py", line 302, in wrapped
    ret = fn(*args, **kwargs)
  File "/home/paulm/pyenv/3.4/lib/python3.4/site-packages/ipyparallel/apps/ipclusterapp.py", line 548, in start
    self.start_controller()
  File "/home/paulm/pyenv/3.4/lib/python3.4/site-packages/ipyparallel/apps/ipclusterapp.py", line 503, in start_controller
    self.controller_launcher.start()
  File "/home/paulm/pyenv/3.4/lib/python3.4/site-packages/ipyparallel/apps/launcher.py", line 1188, in start
    return super(PBSControllerLauncher, self).start(1)
  File "/home/paulm/pyenv/3.4/lib/python3.4/site-packages/ipyparallel/apps/launcher.py", line 1135, in start
    self.write_batch_script(n)
  File "/home/paulm/pyenv/3.4/lib/python3.4/site-packages/ipyparallel/apps/launcher.py", line 1099, in write_batch_script
    with open(self.batch_template_file) as f:
FileNotFoundError: [Errno 2] No such file or directory: 'controller.template'

@minrk
Member

minrk commented Sep 11, 2015

And if you specify them as full paths?

@paulmelis
Author

Well, yes, that's what I wrote. I need to specify their full paths, and then they are found.
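For reference, here is a sketch of the `ipcluster_config.py` settings from this issue with the corrected trait name (`batch_template_file`) and absolute paths. It assumes the profile lives at the default `~/.ipython/profile_1engine-per-node`; `os.path.expanduser` turns `~` into the home directory, so the paths no longer depend on where `ipcluster` is started. This is a config fragment, meant to be loaded by `ipcluster`, not run on its own.

```python
# ipcluster_config.py (sketch; assumes the default profile location)
import os

c = get_config()  # noqa: F821 (get_config is provided by the config loader)

profile_dir = os.path.expanduser("~/.ipython/profile_1engine-per-node")

c.IPClusterEngines.engine_launcher_class = 'PBS'
c.IPClusterStart.controller_launcher_class = 'PBS'
c.IPClusterEngines.n = 4
# batch_template_file (not batch_file_name) selects the template to use;
# an absolute path avoids resolving the name against the current directory.
c.PBSControllerLauncher.batch_template_file = os.path.join(profile_dir, 'controller.template')
c.PBSEngineSetLauncher.batch_template_file = os.path.join(profile_dir, 'engine.template')
```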

@minrk
Member

minrk commented Sep 11, 2015

Then this issue can be closed?

@minrk minrk added this to the no action milestone Sep 11, 2015
@paulmelis
Author

Well, the documentation might still need a remark added, as it doesn't show full paths to the template files.

@minrk
Member

minrk commented Jun 2, 2021

Hi! I’m going through and cleaning up old/stale issues on this repo. Sorry for leaving it forever. I'm hoping to bring this repo back to a healthier state.

Template paths are evaluated relative to the cwd, like most file path arguments (e.g. cat pbs.engine.template). Absolute paths always work, and relative paths are not treated specially, but are sensitive to the working directory.
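A small, self-contained illustration of that behaviour (plain Python, not ipyparallel code): `open()` resolves a relative name against `os.getcwd()`. In the traceback above, `ipcluster` was started from `~`, so `'controller.template'` was looked up in the home directory rather than in the profile directory. Here temporary directories stand in for the profile directory and for `~`, and `template_path` is a hypothetical helper.

```python
import os
import tempfile


def template_path(profile_dir, name):
    """Hypothetical helper: anchor a relative template name to the profile dir."""
    return name if os.path.isabs(name) else os.path.join(profile_dir, name)


def can_open(path):
    """True if open() succeeds; relative paths are resolved against the cwd."""
    try:
        with open(path):
            return True
    except FileNotFoundError:
        return False


def demo():
    original_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as profile_dir, \
            tempfile.TemporaryDirectory() as elsewhere:
        with open(os.path.join(profile_dir, "controller.template"), "w") as f:
            f.write("#PBS -N ipyparallel-controller\n")
        try:
            os.chdir(elsewhere)  # like starting ipcluster from ~
            relative_elsewhere = can_open("controller.template")
            absolute_elsewhere = can_open(
                template_path(profile_dir, "controller.template"))
            os.chdir(profile_dir)  # relative name works only from here
            relative_in_profile = can_open("controller.template")
        finally:
            os.chdir(original_cwd)  # restore cwd before the dirs are removed
    return relative_elsewhere, absolute_elsewhere, relative_in_profile


if __name__ == "__main__":
    print(demo())  # (False, True, True)
```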

@minrk minrk closed this as completed Jun 2, 2021