Merge pull request #5605 from garlick/rexec_shell
shell: add rexec plugin
mergify[bot] committed Mar 1, 2024
2 parents 650223d + 6693d11 commit 40d6f4c
Showing 12 changed files with 544 additions and 100 deletions.
2 changes: 0 additions & 2 deletions doc/Makefile.am
@@ -382,7 +382,6 @@ if ENABLE_DOCS
man_MANS = $(MAN1_FILES) $(MAN3_FILES) $(MAN5_FILES) $(MAN7_FILES)
$(RST_FILES): \
man1/common/resources.rst \
man1/common/nodeset.rst \
man1/common/job-param-additional.rst \
man1/common/job-param-batch.rst \
man1/common/job-param-common.rst \
@@ -461,7 +460,6 @@ EXTRA_DIST = \
$(RST_FILES) \
man1/index.rst \
man1/common/resources.rst \
man1/common/nodeset.rst \
man1/common/job-param-additional.rst \
man1/common/job-param-batch.rst \
man1/common/job-param-common.rst \
22 changes: 0 additions & 22 deletions doc/man1/common/nodeset.rst

This file was deleted.

115 changes: 76 additions & 39 deletions doc/man1/flux-exec.rst
@@ -2,66 +2,85 @@
flux-exec(1)
============


SYNOPSIS
========

**flux** **exec** [*--noinput*] [*--label-io*] [*--dir=DIR*] [*--rank=NODESET*] [*--verbose*] *COMMANDS...*
**flux** **exec** [*--noinput*] [*--label-io*] [*--dir=DIR*] [*--rank=IDSET*] [*--verbose*] *COMMAND...*

DESCRIPTION
===========

.. program:: flux exec

:program:`flux exec` runs commands across one or more Flux broker ranks using
the *broker.exec* service. The commands are executed as direct children
of the broker, and the broker handles buffering stdout and stderr and
sends the output back to :program:`flux exec` which copies output to its own
stdout and stderr.

On receipt of SIGINT and SIGTERM signals, :program:`flux exec` shall forward
the received signal to all currently running remote processes.

In the event subprocesses are hanging or ignoring SIGINT, two SIGINT
signals (typically sent via Ctrl+C) in short succession can force
:program:`flux exec` to exit.

:program:`flux exec` is meant as an administrative and test utility, and cannot
be used to launch Flux jobs.
:program:`flux exec` remotely executes one or more copies of *COMMAND*,
similar to :linux:man1:`pdsh`. It bypasses the scheduler and is intended
for launching administrative commands or tool daemons, not for launching
parallel jobs. For that, see :man1:`flux-run`.

By default, *COMMAND* runs across all :man1:`flux-broker` processes. If the
:option:`--jobid` option is specified, the commands are run across a job's
:man1:`flux-shell` processes. Normally this means that one copy of *COMMAND*
is executed per node, but in unusual cases it could mean more (e.g. if the
Flux instance was started with multiple brokers per node).

EXIT STATUS
===========

In the case that all processes are successfully launched, the exit status
of :program:`flux exec` is the largest of the remote process exit codes.

If a non-existent rank is targeted, :program:`flux exec` will return with
code 68 (EX_NOHOST from sysexits.h).

If one or more remote commands are terminated by a signal, then
:program:`flux exec` exits with exit code 128+signo.
Standard output and standard error of the remote commands are captured
and combined on the :program:`flux exec` standard output and standard error.
Standard input of :program:`flux exec` is captured and broadcast to standard
input of the remote commands.

On receipt of SIGINT and SIGTERM signals, :program:`flux exec` forwards
the received signal to the remote processes. When standard input of
:program:`flux exec` is a terminal, :kbd:`Control-C` may be used to send
SIGINT. Two of those in short succession can force :program:`flux exec`
to exit in the event that remote processes are hanging.

OPTIONS
=======

.. option:: -l, --label-io

Label lines of output with the source RANK.
Label lines of output with the source broker RANK. This option is not
affected by :option:`--jobid`.

.. option:: -n, --noinput

Do not attempt to forward stdin. Send EOF to remote process stdin.

.. option:: -d, --dir=DIR

Set the working directory of remote *COMMANDS* to *DIR*. The default is to
Set the working directory of remote *COMMAND* to *DIR*. The default is to
propagate the current working directory of flux-exec(1).

.. option:: -r, --rank=NODESET
.. option:: -r, --rank=IDSET

Target specific ranks, where *IDSET* is a set of zero-origin node ranks in
RFC 22 format. If :option:`--jobid` is specified, the ranks are interpreted
as an index into the list of nodes assigned to the job. Otherwise, they
refer to the nodes assigned to the Flux instance.

The default is to target all ranks. As a special case, :option:`--rank=all`
is accepted and behaves the same as the default.

.. option:: -x, --exclude=IDSET

Target specific ranks in *NODESET*. Default is to target "all" ranks.
See `NODESET FORMAT`_ below for more information.
Exclude specific ranks. *IDSET* is as described in :option:`--rank`.

.. option:: -j, --jobid=JOBID

Run *COMMAND* on the nodes allocated to *JOBID* instead of the nodes
assigned to the Flux instance.

This uses the exec service embedded in :man1:`flux-shell` rather than
:man1:`flux-broker`.

The interpretation of :option:`--rank` and :option:`--exclude` is adjusted
as noted in their descriptions. For example, :option:`flux exec -j ID -r 0`
will run only on the first node assigned to *JOBID*, and
:option:`flux exec -j ID -x 0` will run on all nodes assigned to *JOBID*
except the first node.

This option is only available when the job owner is the same as the Flux
instance owner.
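
The way :option:`--rank`, :option:`--exclude`, and :option:`--jobid` interact
can be sketched in a few lines of Python. This is only an illustration, not
the :program:`flux exec` implementation: ``parse_idset`` handles a simplified
subset of the RFC 22 syntax (comma-separated integers and ``a-b`` ranges),
and ``select_targets`` and the host names are hypothetical.

```python
def parse_idset(text):
    """Parse a simplified idset such as "0-3,7" into a sorted list of ints."""
    ids = set()
    for part in text.strip("[]").split(","):
        if "-" in part:
            lo, hi = part.split("-")
            ids.update(range(int(lo), int(hi) + 1))
        else:
            ids.add(int(part))
    return sorted(ids)


def select_targets(nodes, rank="all", exclude=None):
    """Pick entries of `nodes` by zero-origin index.

    `nodes` stands for the nodes assigned to the job when --jobid is
    given, otherwise for the nodes assigned to the Flux instance.
    """
    indices = list(range(len(nodes))) if rank == "all" else parse_idset(rank)
    missing = [i for i in indices if i >= len(nodes)]
    if missing:
        # Assumption: a non-existent rank is simply reported as an error here.
        raise ValueError(f"non-existent rank(s): {missing}")
    excluded = set(parse_idset(exclude)) if exclude else set()
    return [nodes[i] for i in indices if i not in excluded]


# Hypothetical node list assigned to a job.
job_nodes = ["node4", "node5", "node6"]
first_only = select_targets(job_nodes, rank="0")       # like: -j ID -r 0
all_but_first = select_targets(job_nodes, exclude="0")  # like: -j ID -x 0
```

With the node list above, ``first_only`` holds only the first node assigned
to the job and ``all_but_first`` holds every node except the first.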

.. option:: -v, --verbose

@@ -73,22 +92,40 @@ OPTIONS

.. option:: --with-imp

Prepend the full path to :program:`flux-imp run` to *COMMANDS*. This option
Prepend the full path to :program:`flux-imp run` to *COMMAND*. This option
is mostly meant for testing or as a convenience to execute a configured
``prolog`` or ``epilog`` command under the IMP. Note: When this option is
used, or if :program:`flux-imp` is detected as the first argument of
*COMMANDS*, :program:`flux exec` will use :program:`flux-imp kill` to
*COMMAND*, :program:`flux exec` will use :program:`flux-imp kill` to
signal remote commands instead of the normal builtin subprocess signaling
mechanism.

CAVEATS
=======

In a multi-user flux instance, access to the rank 0 broker execution
service is restricted to requests that originate from the local broker.
Therefore, :program:`flux exec` (without :option:`--jobid`) must be run
from the rank 0 broker if rank 0 is included in the target *IDSET*.

EXIT STATUS
===========

NODESET FORMAT
==============
In the case that all processes are successfully launched, the exit status
of :program:`flux exec` is the largest of the remote process exit codes.

.. include:: common/nodeset.rst
If a non-existent rank is targeted, :program:`flux exec` will return with
code 68 (EX_NOHOST from sysexits.h).

If one or more remote commands are terminated by a signal, then
:program:`flux exec` exits with exit code 128+signo.
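
As a rough sketch of the rules above, the following Python function combines
per-process results into one exit status. The ``aggregate_exit_status`` name
and the ``("exit", code)`` / ``("signal", signo)`` tuple layout are
hypothetical, and how signaled and normally exited processes are combined is
an assumption made for illustration (here each signaled process counts as
128+signo and the largest value wins).

```python
def aggregate_exit_status(results):
    """Combine remote results into a single exit status.

    `results` is a list of ("exit", code) or ("signal", signo) tuples,
    one per remote process that was successfully launched.
    """
    status = 0
    for kind, value in results:
        # Assumption: a signaled process contributes 128 + signal number.
        code = 128 + value if kind == "signal" else value
        status = max(status, code)
    return status


largest = aggregate_exit_status([("exit", 0), ("exit", 2), ("exit", 1)])
signaled = aggregate_exit_status([("exit", 0), ("signal", 15)])
```

Here ``largest`` is the largest of the three exit codes, and ``signaled``
reflects a process terminated by signal 15.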

RESOURCES
=========

.. include:: common/resources.rst

FLUX RFC
========

:doc:`rfc:spec_22`
3 changes: 1 addition & 2 deletions doc/test/spell.en.pws
@@ -65,7 +65,6 @@ graphviz
prepended
multipart
Tpng
nodesets
keepalives
emerg
gettimeofday
@@ -117,7 +116,6 @@ ary
baz
EPGM
modopts
nodeset
noexec
pre
slurm
@@ -831,3 +829,4 @@ unlinks
VM
fred
unmapped
kbd
