debugging: add instructions for debugging with DDT #266

Merged 1 commit on Mar 4, 2024
37 changes: 35 additions & 2 deletions jobs/debugging.rst
@@ -4,8 +4,10 @@
Debugging Jobs
==============

Flux supports parallel debuggers such as Rogue Wave Software (RWS)'s
`TotalView parallel debugger <https://totalview.io>`_.
Debugging Flux jobs has been tested with Rogue Wave Software (RWS)'s
`TotalView parallel debugger <https://totalview.io>`_ and
Linaro `DDT <https://www.linaroforge.com/linaroDdt/>`_. More detailed
instructions for specific debuggers are included in the sections below.

----------------------------------
Parallel Debugging using TotalView
@@ -94,6 +96,37 @@
Notice that it is designed to support not only Flux but also Slurm's
srun and IBM JSM's jsrun commands. The ``regex`` syntax of
``exec_handling`` within TotalView can be found in `TotalView user guide`_.

---------------------------
Parallel Debugging with DDT
---------------------------

DDT does not currently have native support for Flux. However, small- to
medium-sized jobs can be debugged with DDT by combining the
:core:man1:`flux job` :command:`hostpids` command with the :command:`ddt
--attach` option. For example, to attach :command:`ddt` to the most recently
submitted job:

.. code-block:: console

   $ ddt --attach=$(flux job hostpids $(flux job last))

Flux can launch a job with every task stopped in :linux:man2:`exec` when the
``stop-tasks-in-exec`` job shell option is provided. Launching a job under
the control of DDT can therefore be simulated with a command such as:

.. code-block:: console

   $ ddt --attach=$(flux job hostpids $(flux submit -n 265 -o stop-tasks-in-exec myapp))

The :command:`flux job hostpids` command blocks until the job has started
running and the process IDs for all tasks are available, and therefore
:command:`ddt` does not launch until the job has started and is ready
for debugger attach. Because the tasks are stopped in :linux:man2:`exec`,
the debugger gains control of the job tasks before execution begins.
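
The same sequence can be written out step by step, which makes it easier
to see where a failure occurs. The following is a minimal sketch, assuming
a POSIX shell with :command:`flux` and :command:`ddt` on ``PATH``; the
function name ``launch_under_ddt`` is hypothetical and not part of Flux
or DDT.

.. code-block:: sh

   # Hypothetical helper: launch_under_ddt NTASKS COMMAND [ARGS...]
   launch_under_ddt() {
       ntasks=$1
       shift
       # Submit the job with every task stopped in exec; flux submit
       # prints the job ID on stdout.
       jobid=$(flux submit -n "$ntasks" -o stop-tasks-in-exec "$@") || return 1
       # Blocks until the job is running and all task PIDs are available.
       hostpids=$(flux job hostpids "$jobid") || return 1
       # Attach DDT to the stopped tasks.
       ddt --attach="$hostpids"
   }

For example, ``launch_under_ddt 4 myapp`` would submit a four-task job and
attach :command:`ddt` once the job has started.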

.. note::

   :command:`flux job hostpids` was added in flux-core v0.60.0.

------------
Known Issues