DM-29041: Support repository names as ap_verify --dataset argument and deprecate old names #122

Merged
merged 5 commits on Mar 31, 2021
2 changes: 1 addition & 1 deletion config/dataset_config.yaml
@@ -1,5 +1,5 @@
---
datasets:
datasets: # TODO: remove in DM-29042
HiTS2015: ap_verify_hits2015
CI-HiTS2015: ap_verify_ci_hits2015
CI-CosmosPDR2: ap_verify_ci_cosmos_pdr2
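The YAML change above keeps the old name-to-package mapping alive only for backward compatibility. A minimal sketch of how such a lookup with a deprecation fallback can behave (the function name `resolve_dataset` and the inline dictionary are illustrative; the real logic lives in `lsst.ap.verify.dataset.Dataset`):

```python
import warnings

# Mirrors config/dataset_config.yaml; the alias table is deprecated (DM-29042).
_DATASET_ALIASES = {
    "HiTS2015": "ap_verify_hits2015",
    "CI-HiTS2015": "ap_verify_ci_hits2015",
    "CI-CosmosPDR2": "ap_verify_ci_cosmos_pdr2",
}


def resolve_dataset(dataset_id):
    """Return the dataset package name for either an old-style alias
    or a new-style package name."""
    package = _DATASET_ALIASES.get(dataset_id)
    if package is not None:
        # Old-style names still work, but emit a deprecation warning.
        warnings.warn(f"The {dataset_id} name is deprecated; "
                      f"use {package} instead.", category=FutureWarning)
        return package
    # Anything else is assumed to already be a package name.
    return dataset_id
```

Accepting unknown IDs unvalidated is deliberate: validation happens later, when the package is actually looked up on disk.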
10 changes: 5 additions & 5 deletions doc/lsst.ap.verify/command-line-reference.rst
@@ -77,16 +77,16 @@ Required arguments are :option:`--dataset` and :option:`--output`.
This option is identical to :option:`--id`, and will become the primary data ID argument as Gen 2 is retired.
It is recommended over :option:`--id` for :option:`--gen3` runs.

.. option:: --dataset <dataset_name>
.. option:: --dataset <dataset_package>

**Input dataset designation.**
**Input dataset package.**

The :doc:`input dataset <datasets>` is required for all ``ap_verify`` runs except when using :option:`--help`.

The argument is a unique name for the dataset, which can be associated with a repository in the :ref:`configuration file<ap-verify-configuration-dataset>`.
See :ref:`ap-verify-dataset-name` for more information on dataset names.
The argument is the name of the Git LFS repository containing the dataset to process.
The repository must be set up before running ``ap_verify``.

:ref:`Allowed names <ap-verify-datasets-index>` can be queried using the :option:`--help` argument.
This documentation includes a :ref:`list of supported datasets <ap-verify-datasets-index>`.

.. option:: --dataset-metrics-config <filename>

30 changes: 0 additions & 30 deletions doc/lsst.ap.verify/configuration.rst

This file was deleted.

10 changes: 0 additions & 10 deletions doc/lsst.ap.verify/datasets-creation.rst
@@ -12,7 +12,6 @@ Packaging data as a dataset

:doc:`datasets` is designed to be as generic as possible, and should be able to accommodate any collection of observations so long as the source observatory has an :ref:`observatory interface (obs) package<obs-framework>` in the LSST software stack.
This page describes how to create and maintain a dataset.
It does not include :ref:`configuring ap_verify to use the dataset<ap-verify-configuration-dataset>`.

.. _ap-verify-datasets-creation-gitlfs:

@@ -78,12 +77,3 @@ The observatory package must be named in two files:
If any other packages are required to process the data, they should have their own ``setupRequired`` lines.
* :file:`repo/_mapper` must contain a single line with the name of the obs package's mapper class.
For DECam data this is ``lsst.obs.decam.DecamMapper``.

.. _ap-verify-datasets-creation-name:

Registering a dataset name
==========================

In order to be recognized by :option:`ap_verify.py --dataset`, datasets must be registered in ``ap_verify``'s :ref:`configuration file<ap-verify-configuration-dataset>`.
The line for the new dataset should be committed to the ``ap_verify`` Git repository.
To avoid accidental downloads, datasets **should not** be registered as an EUPS dependency of ``ap_verify``, even an optional one.
2 changes: 1 addition & 1 deletion doc/lsst.ap.verify/datasets-install.rst
@@ -27,7 +27,7 @@ Installation procedure
======================

Use the `LSST Software Build Tool <https://developer.lsst.io/stack/lsstsw.html>`_ to request the dataset by its package name.
A :ref:`list of existing datasets <ap-verify-datasets-index>` is maintained as part of this documentation.
A :ref:`list of supported datasets <ap-verify-datasets-index>` is maintained as part of this documentation.
Because of their large size (typically hundreds of GB), datasets are *never* installed as a dependency of another package; they must be requested explicitly.

For example, to install the `Cosmos PDR2 <https://github.com/lsst/ap_verify_ci_cosmos_pdr2/>`_ CI dataset,
13 changes: 7 additions & 6 deletions doc/lsst.ap.verify/datasets.rst
@@ -34,14 +34,15 @@ In depth

.. _ap-verify-datasets-index:

Existing datasets
=================
Supported datasets
==================

These datasets are also listed when running :option:`ap_verify.py -h`.
These datasets are maintained by the ``ap_verify`` group.
There may be other datasets :ref:`formatted<ap-verify-datasets-structure>` for use with ``ap_verify``.

* `HiTS2015 (HiTS 2015 with 2014 templates) <https://github.com/lsst/ap_verify_hits2015/>`_
* `CI-HiTS2015 (HiTS 2015 CI Subset) <https://github.com/lsst/ap_verify_ci_hits2015/>`_
* `CI-CosmosPDR2 (Cosmos DR2 ultradeep fields) <https://github.com/lsst/ap_verify_ci_cosmos_pdr2/>`_
* `ap_verify_hits2015 (HiTS 2015 with 2014 templates) <https://github.com/lsst/ap_verify_hits2015/>`_
* `ap_verify_ci_hits2015 (HiTS 2015 CI Subset) <https://github.com/lsst/ap_verify_ci_hits2015/>`_
* `ap_verify_ci_cosmos_pdr2 (Cosmos DR2 ultradeep fields) <https://github.com/lsst/ap_verify_ci_cosmos_pdr2/>`_

..
TODO: switch to toctree once these docs included in pipelines.lsst.io
1 change: 0 additions & 1 deletion doc/lsst.ap.verify/index.rst
@@ -26,7 +26,6 @@ Using lsst.ap.verify
failsafe
new-metrics
command-line-reference
configuration

.. _lsst.ap.verify-contributing:

25 changes: 7 additions & 18 deletions doc/lsst.ap.verify/running.rst
@@ -13,17 +13,6 @@ While :command:`ap_verify.py` is not a :doc:`command-line task</modules/lsst.pip
This page describes the most common options used to run ``ap_verify``.
For more details, see the :doc:`command-line-reference` or run :option:`ap_verify.py -h`.

.. _ap-verify-dataset-name:

Datasets as input arguments
===========================

Since ``ap_verify`` begins with an uningested :doc:`dataset<datasets>`, the input argument is a dataset name rather than a repository.

Datasets are identified by a name that gets mapped to an :doc:`installed eups-registered package <datasets-install>` containing the data.
The mapping is :ref:`configurable<ap-verify-configuration-dataset>`.
The dataset names are a placeholder for a future data repository versioning system, and may be replaced in a later version of ``ap_verify``.

.. _ap-verify-run-output:

How to run ap_verify in a new workspace (Gen 2 pipeline)
@@ -35,11 +24,11 @@ Using the `Cosmos PDR2`_ CI dataset as an example, one can run :command:`ap_veri

.. prompt:: bash

ap_verify.py --dataset CI-CosmosPDR2 --gen2 --id "visit=59150^59160 filter=HSC-G" --output workspaces/cosmos/
ap_verify.py --dataset ap_verify_ci_cosmos_pdr2 --gen2 --id "visit=59150^59160 filter=HSC-G" --output workspaces/cosmos/

Here the inputs are:

* :command:`CI-CosmosPDR2` is the ``ap_verify`` :ref:`dataset name <ap-verify-dataset-name>`,
* :command:`ap_verify_ci_cosmos_pdr2` is the ``ap_verify`` :ref:`dataset <ap-verify-datasets>` to process,
* :option:`--gen2` specifies to process the dataset using the Gen 2 pipeline framework,
* :command:`visit=59150^59160 filter=HSC-G` is the :ref:`dataId<command-line-task-dataid-howto-about-dataid-keys>` to process,

@@ -53,7 +42,7 @@ It's also possible to run an entire dataset by omitting the :option:`--id` argum

.. prompt:: bash

ap_verify.py --dataset CI-CosmosPDR2 --gen2 --output workspaces/cosmos/
ap_verify.py --dataset ap_verify_ci_cosmos_pdr2 --gen2 --output workspaces/cosmos/

.. note::

@@ -71,11 +60,11 @@ Using the `Cosmos PDR2`_ CI dataset as an example, one can run :command:`ap_veri

.. prompt:: bash

ap_verify.py --dataset CI-CosmosPDR2 --gen3 --data-query "visit in (59150, 59160) and band='g'" --output workspaces/cosmos/
ap_verify.py --dataset ap_verify_ci_cosmos_pdr2 --gen3 --data-query "visit in (59150, 59160) and band='g'" --output workspaces/cosmos/

Here the inputs are:

* :command:`CI-CosmosPDR2` is the ``ap_verify`` :ref:`dataset name <ap-verify-dataset-name>`,
* :command:`ap_verify_ci_cosmos_pdr2` is the ``ap_verify`` :ref:`dataset <ap-verify-datasets>` to process,
* :option:`--gen3` specifies to process the dataset using the Gen 3 pipeline framework,
* :command:`visit in (59150, 59160) and band='g'` is the :ref:`data ID query <daf_butler_dimension_expressions>` to process,

@@ -89,7 +78,7 @@ It's also possible to run an entire dataset by omitting the :option:`--data-quer

.. prompt:: bash

ap_verify.py --dataset CI-CosmosPDR2 --gen3 --output workspaces/cosmos/
ap_verify.py --dataset ap_verify_ci_cosmos_pdr2 --gen3 --output workspaces/cosmos/

.. note::

@@ -110,7 +99,7 @@ Using the `Cosmos PDR2`_ dataset as an example, one can run ``ingest_dataset`` i

.. prompt:: bash

ingest_dataset.py --dataset CI-CosmosPDR2 --gen2 --output workspaces/cosmos/
ingest_dataset.py --dataset ap_verify_ci_cosmos_pdr2 --gen2 --output workspaces/cosmos/

The :option:`--dataset`, :option:`--output`, :option:`--gen2`, :option:`--gen3`, and :option:`--processes` arguments behave the same way as for :command:`ap_verify.py`.
Other options from :command:`ap_verify.py` are not available.
2 changes: 1 addition & 1 deletion python/lsst/ap/verify/ap_verify.py
@@ -51,7 +51,7 @@ class _InputOutputParser(argparse.ArgumentParser):
def __init__(self):
# Help and documentation will be handled by main program's parser
argparse.ArgumentParser.__init__(self, add_help=False)
self.add_argument('--dataset', action=_DatasetAction, choices=Dataset.getSupportedDatasets(),
self.add_argument('--dataset', action=_DatasetAction,
required=True, help='The source of data to pass through the pipeline.')
self.add_argument('--output', required=True,
help='The location of the workspace to use for pipeline repositories.')
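Dropping the ``choices`` keyword is what lets arbitrary package names through the parser; the name is validated later, when the package is looked up on disk. A self-contained sketch of the relaxed parser (the parser shown here is a stand-in for ``_InputOutputParser``, not the class itself):

```python
import argparse

# Help and documentation are handled by the main program's parser,
# hence add_help=False, matching the diff above.
parser = argparse.ArgumentParser(add_help=False)
# No `choices` restriction: any string is accepted as a dataset package name.
parser.add_argument('--dataset', required=True,
                    help='The source of data to pass through the pipeline.')

# Both new-style package names and old-style aliases now parse successfully.
new_style = parser.parse_args(['--dataset', 'ap_verify_ci_cosmos_pdr2'])
old_style = parser.parse_args(['--dataset', 'CI-CosmosPDR2'])
```

With ``choices`` present, the second call would have raised ``SystemExit`` for any name missing from the configuration file; now both spellings reach the ``Dataset`` constructor, which decides what to do with them.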
1 change: 1 addition & 0 deletions python/lsst/ap/verify/config.py
@@ -24,6 +24,7 @@
from lsst.daf.persistence import Policy


# TODO: remove in DM-29042
class Config:
"""Configuration manager for ``ap_verify``.

24 changes: 17 additions & 7 deletions python/lsst/ap/verify/dataset.py
@@ -22,6 +22,9 @@
#

import os
import warnings

from deprecated.sphinx import deprecated

import lsst.daf.persistence as dafPersistence
import lsst.daf.butler as dafButler
@@ -43,15 +46,15 @@ class Dataset:
Parameters
----------
datasetId : `str`
A tag identifying the dataset.
The name of the dataset package. A tag identifying the dataset is also
accepted, but this usage is deprecated.

Raises
------
RuntimeError
Raised if `datasetId` exists but is not correctly organized or is incomplete
ValueError
Raised if `datasetId` is not a recognized ap_verify dataset. No side
effects if this exception is raised.
Raised if `datasetId` could not be loaded.
"""

def __init__(self, datasetId):
@@ -62,15 +65,18 @@ def __init__(self, datasetId):
datasetPackage = self._getDatasetInfo()[datasetId]
if datasetPackage is None:
raise KeyError
else:
warnings.warn(f"The {datasetId} name is deprecated, and will be removed after v24.0. "
f"Use {datasetPackage} instead.", category=FutureWarning)
except KeyError:
raise ValueError('Unsupported dataset: ' + datasetId)
# if datasetId not known, assume it's a package name
datasetPackage = datasetId

try:
self._dataRootDir = getPackageDir(datasetPackage)
except pexExcept.NotFoundError as e:
error = 'Dataset %s requires the %s package, which has not been set up.' \
% (datasetId, datasetPackage)
raise RuntimeError(error) from e
error = f"Cannot find the {datasetPackage} package; is it set up?"
raise ValueError(error) from e
else:
self._validatePackage()

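The hunk above also changes the failure mode: a package that is not set up now raises ``ValueError`` instead of ``RuntimeError``, so all bad ``--dataset`` arguments surface as the same exception type. A hedged sketch of that error path, with ``get_package_dir`` standing in for ``lsst.utils.getPackageDir`` (the environment-variable lookup is an illustrative assumption, not the real EUPS mechanism):

```python
import os


def get_package_dir(package):
    """Hypothetical stand-in for lsst.utils.getPackageDir: resolve a
    set-up package to its root directory via <PACKAGE>_DIR."""
    path = os.environ.get(package.upper() + "_DIR")
    if path is None:
        raise LookupError(f"Package {package} not found")
    return path


def find_dataset_root(dataset_package):
    """Mirror the new handling: a missing package is reported as
    ValueError, matching the 'unsupported dataset' contract."""
    try:
        return get_package_dir(dataset_package)
    except LookupError as e:
        raise ValueError(
            f"Cannot find the {dataset_package} package; is it set up?") from e
```

Chaining with ``from e`` preserves the original lookup failure in the traceback, which is the same pattern the diff uses with ``pexExcept.NotFoundError``.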
@@ -87,7 +93,10 @@ def _initPackage(self, name):
# No initialization required at present
pass

# TODO: remove in DM-29042
@staticmethod
@deprecated(reason="The concept of 'supported' datasets is deprecated. This "
"method will be removed after v24.0.", version="v22.0", category=FutureWarning)
def getSupportedDatasets():
"""The ap_verify dataset IDs that can be passed to this class's constructor.

@@ -105,6 +114,7 @@ def getSupportedDatasets():
"""
return Dataset._getDatasetInfo().keys()

# TODO: remove in DM-29042
@staticmethod
def _getDatasetInfo():
"""Return external data on supported ap_verify datasets.
2 changes: 2 additions & 0 deletions python/lsst/ap/verify/testUtils.py
@@ -54,6 +54,7 @@ class DataTestCase(lsst.utils.tests.TestCase):
datasetKey = 'test'
"""The ap_verify dataset name that would be used on the command line (`str`).
"""
# TODO: remove datasetKey in DM-29042

@classmethod
def setUpClass(cls):
@@ -70,4 +71,5 @@ def setUpClass(cls):
# Hack the config for testing purposes
# Note that Config.instance is supposed to be immutable, so, depending on initialization order,
# this modification may cause other tests to see inconsistent config values
# TODO: remove in DM-29042
Config.instance._allInfo['datasets.' + cls.datasetKey] = cls.testDataset
6 changes: 5 additions & 1 deletion python/lsst/ap/verify/workspace.py
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@
import re
import stat

import lsst.skymap
import lsst.daf.persistence as dafPersist
import lsst.daf.butler as dafButler
import lsst.obs.base as obsBase
@@ -323,13 +324,16 @@ def workButler(self):
try:
# Hard-code the collection names because it's hard to infer the inputs from the Butler
queryButler = dafButler.Butler(self.repo, writeable=True) # writeable for _workButler
inputs = {"skymaps", "refcats"}
inputs = {
lsst.skymap.BaseSkyMap.SKYMAP_RUN_COLLECTION_NAME,
}
for dimension in queryButler.registry.queryDataIds('instrument'):
instrument = obsBase.Instrument.fromName(dimension["instrument"], queryButler.registry)
rawName = instrument.makeDefaultRawIngestRunName()
inputs.add(rawName)
self._ensureCollection(queryButler.registry, rawName, dafButler.CollectionType.RUN)
inputs.add(instrument.makeCalibrationCollectionName())
inputs.add(instrument.makeRefCatCollectionName())
inputs.update(queryButler.registry.queryCollections(re.compile(r"templates/\w+")))

# Create an output chain here, so that workButler can see it.
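The workspace hunk replaces the hard-coded ``{"skymaps", "refcats"}`` set with names derived from the stack's own constants and per-instrument helpers. A sketch of that assembly with the LSST classes stubbed out (``FakeInstrument`` and the collection-name formats are illustrative stand-ins for ``lsst.obs.base.Instrument``; the ``"skymaps"`` value is assumed to match ``lsst.skymap.BaseSkyMap.SKYMAP_RUN_COLLECTION_NAME``):

```python
# Assumed value of lsst.skymap.BaseSkyMap.SKYMAP_RUN_COLLECTION_NAME.
SKYMAP_RUN_COLLECTION_NAME = "skymaps"


class FakeInstrument:
    """Illustrative stand-in for lsst.obs.base.Instrument."""

    def __init__(self, name):
        self.name = name

    def makeDefaultRawIngestRunName(self):
        return f"{self.name}/raw/all"

    def makeCalibrationCollectionName(self):
        return f"{self.name}/calib"

    def makeRefCatCollectionName(self):
        return "refcats"


def gather_inputs(instruments):
    """Collect input collection names as in Workspace.workButler:
    one skymap collection, plus raw, calib, and refcat collections
    for every instrument registered in the repository."""
    inputs = {SKYMAP_RUN_COLLECTION_NAME}
    for instrument in instruments:
        inputs.add(instrument.makeDefaultRawIngestRunName())
        inputs.add(instrument.makeCalibrationCollectionName())
        inputs.add(instrument.makeRefCatCollectionName())
    return inputs
```

Deriving the names from the instrument keeps the workspace in sync with whatever conventions the obs packages adopt, which is the point of the change.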