Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-32843: Backport v23 release notes #82

Merged
merged 4 commits into from
Dec 10, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
96 changes: 0 additions & 96 deletions doc/changes/DM-28653.feature.rst

This file was deleted.

6 changes: 0 additions & 6 deletions doc/changes/DM-29756.feature.rst

This file was deleted.

8 changes: 0 additions & 8 deletions doc/changes/DM-29893.feature.rst

This file was deleted.

1 change: 0 additions & 1 deletion doc/changes/DM-31476.other.rst

This file was deleted.

2 changes: 0 additions & 2 deletions doc/changes/DM-31541.bugfix.rst

This file was deleted.

1 change: 0 additions & 1 deletion doc/changes/DM-31541.feature.rst

This file was deleted.

1 change: 0 additions & 1 deletion doc/changes/DM-31541.misc.rst

This file was deleted.

1 change: 0 additions & 1 deletion doc/changes/DM-31841.bugfix.rst

This file was deleted.

1 change: 0 additions & 1 deletion doc/changes/DM-31841.feature.rst

This file was deleted.

1 change: 0 additions & 1 deletion doc/changes/DM-31859.feature.rst

This file was deleted.

1 change: 0 additions & 1 deletion doc/changes/DM-31884.feature.rst

This file was deleted.

3 changes: 0 additions & 3 deletions doc/changes/DM-31887.bugfix.rst

This file was deleted.

1 change: 0 additions & 1 deletion doc/changes/DM-31887.feature.rst

This file was deleted.

1 change: 0 additions & 1 deletion doc/changes/DM-31944.bugfix.rst

This file was deleted.

1 change: 0 additions & 1 deletion doc/changes/DM-31944.misc.rst

This file was deleted.

2 changes: 0 additions & 2 deletions doc/changes/DM-31970.bugfix.rst

This file was deleted.

1 change: 0 additions & 1 deletion doc/changes/DM-32027.misc.rst

This file was deleted.

1 change: 0 additions & 1 deletion doc/changes/DM-32066.bugfix.rst

This file was deleted.

1 change: 0 additions & 1 deletion doc/changes/DM-32074.bugfix.rst

This file was deleted.

1 change: 0 additions & 1 deletion doc/changes/DM-32074.feature.rst

This file was deleted.

1 change: 0 additions & 1 deletion doc/changes/DM-32201.bugfix.rst

This file was deleted.

1 change: 0 additions & 1 deletion doc/changes/DM-32201.misc.rst

This file was deleted.

1 change: 0 additions & 1 deletion doc/changes/DM-32217.bugfix.rst

This file was deleted.

1 change: 0 additions & 1 deletion doc/changes/DM-32217.misc.rst

This file was deleted.

1 change: 0 additions & 1 deletion doc/changes/DM-32220.bugfix.rst

This file was deleted.

2 changes: 0 additions & 2 deletions doc/changes/DM-32241.bugfix.rst

This file was deleted.

1 change: 0 additions & 1 deletion doc/changes/DM-32241.perf.rst

This file was deleted.

1 change: 0 additions & 1 deletion doc/changes/DM-32435.bugfix.rst

This file was deleted.

3 changes: 0 additions & 3 deletions doc/changes/DM-32594.other.rst

This file was deleted.

1 change: 1 addition & 0 deletions doc/conf.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,3 +10,4 @@
html_title = project
html_short_title = project
doxylink = {}
exclude_patterns = ["changes/*"]
171 changes: 171 additions & 0 deletions doc/lsst.ctrl.bps/CHANGES.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,171 @@
ctrl_bps v23.0.0 2021-12-10
===========================

New Features
------------

- * Added bps htcondor job setting that should put jobs that
get the signal 7 when exceeding memory on hold. Held
message will say: "Job raised a signal 7. Usually means
job has gone over memory limit." Until bps has the
automatic memory exceeded retries, you can restart these
the same way as with jobs that htcondor held for exceeding
memory limits (condor_qedit and condor_release).

* Too many files were being written to single directories in
``job/<label>``. There is now a template for it defined in yaml:

.. code-block:: yaml

subDirTemplate: "{label}/{tract}/{patch}/{visit.day_obs}/{exposure.day_obs}/{band}/{subfilter}/{physical_filter}/{visit}/{exposure}"

To revert back to previous behavior, in your submit yaml set:

.. code-block:: yaml

subDirTemplate: "{label}"

* bps now has defaults so submit yamls should be a lot simpler and
require less changes when bps or pipetask changes. For default
values see ``${CTRL_BPS_DIR}/python/lsst/ctrl/bps/etc/bps_defaults.yaml``.
See ``${CTRL_BPS_DIR}/doc/lsst.ctrl.bps/pipelines_check.yaml`` for
an example of much simpler submit yaml.

Values in ``bps_defaults.yaml`` are overridden by values in submit
yaml (be careful of scoping rules e.g., values in a pipetask
section override the global setting).

STRONGLY recommend removing (commenting out) settings in the
submit yaml that are set in the default yaml (i.e., the settings
that are same across runs across repos, ...)

It would be helpful to know in what cases submit yamls have to
override default settings, in particular the command lines.

* With the above defaults one can more easily append options to the
pipetask command lines as variables in submit yaml:

* ``extraQgraphOptions``: Adds given string to end of command line for
creating QuantumGraph (e.g., for specifying a task wit -t)

* ``extraInitOptions``: Adds given string to end of pipetaskInit
command line

* ``extraRunQuantumOptions``: Adds given string to end of the pipetask
command line for running a Quantum (e.g., ``--no-versions``)

These can also be specified on the command line (see ``bps submit --help``).

* ``--extra-qgraph-options TEXT``
* ``--extra-init-options TEXT``
* ``--extra-run-quantum-options TEXT``

Settings on command line override values set in submit yaml.

The default commands no longer include ``--no-versions`` or saving
a dot version of the QuantumGraph. Use the appropriate new variable
or command-line option to add those back.

* Can specify some pipetask options on command line (see ``bps submit --help``):

* ``-b``, ``--butler-config TEXT``
* ``-i``, ``--input COLLECTION ...``
* ``-o``, ``--output COLL``
* ``--output-run COLL``
* ``-d``, ``--data-query QUERY``
* ``-p``, ``--pipeline FILE``
* ``-g``, ``--qgraph TEXT``

Settings on command line override values set in submit yaml.

* bps now saves yaml in run's submit directory. One is
just a copy of the submit yaml (uses original filename). And
one is a dump of the config after combining command-line options,
defaults and submit yaml (``<run>_config.yaml``).

* If pipetask starts reporting errors about database connections
(e.g., remaining connection slots are reserved for non-replication
superuser connections) ask on ``#dm-middleware-support`` about
using execution butler in bps. This greatly reduces the number of
connections to the central database per run. It is not yet the default
behavior of bps, but one can modify the submit yaml to use it. See
``${CTRL_BPS_DIR}/doc/lsst.ctrl.bps/pipelines_check_execution_butler.yaml``

The major differences visible to users are:

* bps report shows new job called ``mergeExecutionButler`` in detailed view.
This is what saves the run info into the central butler repository.
As with any job, it can succeed or fail. Different from other jobs, it
will execute at the end of a run regardless of whether a job failed or
not. It will even execute if the run is cancelled unless the cancellation
is while the merge is running. Its output will go where other jobs go (at
NCSA in ``jobs/mergeExecutionButler`` directory).

* See new files in submit directory:

* ``EXEC_REPO-<run>``: Execution butler (yaml + initial sqlite file)
* ``execution_butler_creation.out``: output of command to create execution butler
* ``final_job.bash``: Script that is executed to do the merging of the run info into the central repo.
* ``final_post_mergeExecutionButler.out``: An internal file for debugging incorrect reporting of final run status. (`DM-28653 <https://jira.lsstcorp.org/browse/DM-28653>`_)
- * Add ``numberOfRetries`` option which specifies the maximum number of retries
allowed for a job.
* Add ``memoryMultiplier`` option to allow for increasing the memory
requirements automatically between retries for jobs which exceeded memory
during their execution. At the moment this option is only supported by
HTCondor plugin. (`DM-29756 <https://jira.lsstcorp.org/browse/DM-29756>`_)
- * ``bps report``

* Columns now are as wide as the widest value/heading
and some other minor formatting changes.

* Detailed report (``--id``) now has an Expected column
that shows expected counts per PipelineTask label
from the QuantumGraph. (`DM-29893 <https://jira.lsstcorp.org/browse/DM-29893>`_)
- Create list of node ids for the ``pipetask --init-only`` job. (`DM-31541 <https://jira.lsstcorp.org/browse/DM-31541>`_)
- Add a new configuration option, ``preemptible``, which indicates whether a job can be safely preempted. (`DM-31841 <https://jira.lsstcorp.org/browse/DM-31841>`_)
- Add user-defined dimension clustering algorithm. (`DM-31859 <https://jira.lsstcorp.org/browse/DM-31859>`_)
- Add ``--log-label`` option to ``bps`` command to allow extra information to be injected into the log record. (`DM-31884 <https://jira.lsstcorp.org/browse/DM-31884>`_)
- Make using an execution butler the default. (`DM-31887 <https://jira.lsstcorp.org/browse/DM-31887>`_)
- Change HTCondor bps plugin to use HTCondor curl plugin for local job transfers. (`DM-32074 <https://jira.lsstcorp.org/browse/DM-32074>`_)


Bug Fixes
---------

- * Fix issue with accessing non-existing attributes when creating the final job.
* Fix issue preventing ``bps report`` from getting the run name correctly. (`DM-31541 <https://jira.lsstcorp.org/browse/DM-31541>`_)
- Fix issue with job attributes not being set. (`DM-31841 <https://jira.lsstcorp.org/browse/DM-31841>`_)
- * Fix variable substitution in merge job commands.
* Fix bug where final job doesn't appear in report.
* Fix bug in HTCondor plugin for reporting final job status when --id <path>. (`DM-31887 <https://jira.lsstcorp.org/browse/DM-31887>`_)
- Fix single concurrency limit splitting. (`DM-31944 <https://jira.lsstcorp.org/browse/DM-31944>`_)
- * Fix AttributeError during submission if explicitly not using execution butler.
* Fix bps report summary PermissionsError caused by certain runs with previous version in queue. (`DM-31970 <https://jira.lsstcorp.org/browse/DM-31970>`_)
- Fix the bug in the formula governing memory scaling. (`DM-32066 <https://jira.lsstcorp.org/browse/DM-32066>`_)
- Fix single quantum cluster missing node number. (`DM-32074 <https://jira.lsstcorp.org/browse/DM-32074>`_)
- Fix execution butler with HTCondor plugin bug when output collection has period. (`DM-32201 <https://jira.lsstcorp.org/browse/DM-32201>`_)
- Fix issues with bps commands displaying inaccurate timings (`DM-32217 <https://jira.lsstcorp.org/browse/DM-32217>`_)
- Disable HTCondor auto detection of files to copy back from jobs. (`DM-32220 <https://jira.lsstcorp.org/browse/DM-32220>`_)
- * Fixed bug when not using lazy commands but using execution butler.
* Fixed bug in ``htcondor_service.py`` that overwrote message in bps report. (`DM-32241 <https://jira.lsstcorp.org/browse/DM-32241>`_)
- * Fixed bug when a pipetask process killed by a signal on the edge node did not expose the failing status. (`DM-32435 <https://jira.lsstcorp.org/browse/DM-32435>`_)


Performance Enhancement
-----------------------

- Cache values by labels to reduce number of config lookups to speed up multiple submission stages. (`DM-32241 <https://jira.lsstcorp.org/browse/DM-32241>`_)


Other Changes and Additions
---------------------------

- Complain about missing memory limit only if memory autoscaling is enabled. (`DM-31541 <https://jira.lsstcorp.org/browse/DM-31541>`_)
- Persist bps DAG attributes across manual restarts. (`DM-31944 <https://jira.lsstcorp.org/browse/DM-31944>`_)
- Change ``outCollection`` in submit YAML to ``outputRun``. (`DM-32027 <https://jira.lsstcorp.org/browse/DM-32027>`_)
- Change default for bpsUseShared to True. (`DM-32201 <https://jira.lsstcorp.org/browse/DM-32201>`_)
- Switch default logging level from WARN to INFO. (`DM-32217 <https://jira.lsstcorp.org/browse/DM-32217>`_)
- Provide a cleaned up version of default config yaml for PanDA-pluging on IDF (`DM-31476 <https://jira.lsstcorp.org/browse/DM-31476>`_)
- Rolled back changes in BpsConfig that were added for flexibility when looking up config values
(e.g., snake case keys will no longer match camel case keys nor will either match lower case keys).
This also removed dependence on third-party inflection package. (`DM-32594 <https://jira.lsstcorp.org/browse/DM-32594>`_)
10 changes: 10 additions & 0 deletions doc/lsst.ctrl.bps/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,16 @@ lsst.ctrl.bps

.. Paragraph that describes what this Python module does and links to related modules and frameworks.

.. _lsst.ctrl.bps-changes:

Changes
=======

.. toctree::
:maxdepth: 1

CHANGES.rst

.. _lsst.ctrl.bps-using:

Using lsst.ctrl.bps
Expand Down