DM-29041: Support repository names as ap_verify --dataset argument and deprecate old names #122

Merged
merged 5 commits on Mar 31, 2021
2 changes: 1 addition & 1 deletion config/dataset_config.yaml
@@ -1,5 +1,5 @@
---
datasets:
datasets: # TODO: remove in DM-29042
HiTS2015: ap_verify_hits2015
CI-HiTS2015: ap_verify_ci_hits2015
CI-CosmosPDR2: ap_verify_ci_cosmos_pdr2
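The YAML change above keeps the old name-to-package mapping alive only for backward compatibility. A minimal sketch of how such a lookup with a deprecation fallback can behave (the function name `resolve_dataset` and the inline dictionary are illustrative; the real logic lives in `lsst.ap.verify.dataset.Dataset`):

```python
import warnings

# Mirrors config/dataset_config.yaml; the alias table is deprecated (DM-29042).
_DATASET_ALIASES = {
    "HiTS2015": "ap_verify_hits2015",
    "CI-HiTS2015": "ap_verify_ci_hits2015",
    "CI-CosmosPDR2": "ap_verify_ci_cosmos_pdr2",
}


def resolve_dataset(dataset_id):
    """Return the dataset package name for either an old-style alias
    or a new-style package name."""
    package = _DATASET_ALIASES.get(dataset_id)
    if package is not None:
        # Old-style names still work, but emit a deprecation warning.
        warnings.warn(f"The {dataset_id} name is deprecated; "
                      f"use {package} instead.", category=FutureWarning)
        return package
    # Anything else is assumed to already be a package name.
    return dataset_id
```

Accepting unknown IDs unvalidated is deliberate: validation happens later, when the package is actually looked up on disk.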
10 changes: 5 additions & 5 deletions doc/lsst.ap.verify/command-line-reference.rst
@@ -77,16 +77,16 @@ Required arguments are :option:`--dataset` and :option:`--output`.
This option is identical to :option:`--id`, and will become the primary data ID argument as Gen 2 is retired.
It is recommended over :option:`--id` for :option:`--gen3` runs.

.. option:: --dataset <dataset_name>
.. option:: --dataset <dataset_package>

**Input dataset designation.**
**Input dataset package.**

The :doc:`input dataset <datasets>` is required for all ``ap_verify`` runs except when using :option:`--help`.

The argument is a unique name for the dataset, which can be associated with a repository in the :ref:`configuration file<ap-verify-configuration-dataset>`.
See :ref:`ap-verify-dataset-name` for more information on dataset names.
The argument is the name of the Git LFS repository containing the dataset to process.
The repository must be set up before running ``ap_verify``.

:ref:`Allowed names <ap-verify-datasets-index>` can be queried using the :option:`--help` argument.
This documentation includes a :ref:`list of supported datasets <ap-verify-datasets-index>`.

.. option:: --dataset-metrics-config <filename>

30 changes: 0 additions & 30 deletions doc/lsst.ap.verify/configuration.rst

This file was deleted.

10 changes: 0 additions & 10 deletions doc/lsst.ap.verify/datasets-creation.rst
@@ -12,7 +12,6 @@ Packaging data as a dataset

:doc:`datasets` is designed to be as generic as possible, and should be able to accommodate any collection of observations so long as the source observatory has an :ref:`observatory interface (obs) package<obs-framework>` in the LSST software stack.
This page describes how to create and maintain a dataset.
It does not include :ref:`configuring ap_verify to use the dataset<ap-verify-configuration-dataset>`.

.. _ap-verify-datasets-creation-gitlfs:

@@ -78,12 +77,3 @@ The observatory package must be named in two files:
If any other packages are required to process the data, they should have their own ``setupRequired`` lines.
* :file:`repo/_mapper` must contain a single line with the name of the obs package's mapper class.
For DECam data this is ``lsst.obs.decam.DecamMapper``.

.. _ap-verify-datasets-creation-name:

Registering a dataset name
==========================

In order to be recognized by :option:`ap_verify.py --dataset`, datasets must be registered in ``ap_verify``'s :ref:`configuration file<ap-verify-configuration-dataset>`.
The line for the new dataset should be committed to the ``ap_verify`` Git repository.
To avoid accidental downloads, datasets **should not** be registered as an EUPS dependency of ``ap_verify``, even an optional one.
2 changes: 1 addition & 1 deletion doc/lsst.ap.verify/datasets-install.rst
@@ -27,7 +27,7 @@ Installation procedure
======================

Use the `LSST Software Build Tool <https://developer.lsst.io/stack/lsstsw.html>`_ to request the dataset by its package name.
A :ref:`list of existing datasets <ap-verify-datasets-index>` is maintained as part of this documentation.
A :ref:`list of supported datasets <ap-verify-datasets-index>` is maintained as part of this documentation.
Because of their large size (typically hundreds of GB), datasets are *never* installed as a dependency of another package; they must be requested explicitly.

For example, to install the `Cosmos PDR2 <https://github.com/lsst/ap_verify_ci_cosmos_pdr2/>`_ CI dataset,
13 changes: 7 additions & 6 deletions doc/lsst.ap.verify/datasets.rst
@@ -34,14 +34,15 @@ In depth

.. _ap-verify-datasets-index:

Existing datasets
=================
Supported datasets
==================

These datasets are also listed when running :option:`ap_verify.py -h`.
These datasets are maintained by the ``ap_verify`` group.
There may be other datasets :ref:`formatted<ap-verify-datasets-structure>` for use with ``ap_verify``.

* `HiTS2015 (HiTS 2015 with 2014 templates) <https://github.com/lsst/ap_verify_hits2015/>`_
* `CI-HiTS2015 (HiTS 2015 CI Subset) <https://github.com/lsst/ap_verify_ci_hits2015/>`_
* `CI-CosmosPDR2 (Cosmos DR2 ultradeep fields) <https://github.com/lsst/ap_verify_ci_cosmos_pdr2/>`_
* `ap_verify_hits2015 (HiTS 2015 with 2014 templates) <https://github.com/lsst/ap_verify_hits2015/>`_
* `ap_verify_ci_hits2015 (HiTS 2015 CI Subset) <https://github.com/lsst/ap_verify_ci_hits2015/>`_
* `ap_verify_ci_cosmos_pdr2 (Cosmos DR2 ultradeep fields) <https://github.com/lsst/ap_verify_ci_cosmos_pdr2/>`_

..
TODO: switch to toctree once these docs included in pipelines.lsst.io
1 change: 0 additions & 1 deletion doc/lsst.ap.verify/index.rst
@@ -26,7 +26,6 @@ Using lsst.ap.verify
failsafe
new-metrics
command-line-reference
configuration

.. _lsst.ap.verify-contributing:

25 changes: 7 additions & 18 deletions doc/lsst.ap.verify/running.rst
@@ -13,17 +13,6 @@ While :command:`ap_verify.py` is not a :doc:`command-line task</modules/lsst.pip
This page describes the most common options used to run ``ap_verify``.
For more details, see the :doc:`command-line-reference` or run :option:`ap_verify.py -h`.

.. _ap-verify-dataset-name:

Datasets as input arguments
===========================

Since ``ap_verify`` begins with an uningested :doc:`dataset<datasets>`, the input argument is a dataset name rather than a repository.

Datasets are identified by a name that gets mapped to an :doc:`installed eups-registered package <datasets-install>` containing the data.
The mapping is :ref:`configurable<ap-verify-configuration-dataset>`.
The dataset names are a placeholder for a future data repository versioning system, and may be replaced in a later version of ``ap_verify``.

.. _ap-verify-run-output:

How to run ap_verify in a new workspace (Gen 2 pipeline)
@@ -35,11 +24,11 @@ Using the `Cosmos PDR2`_ CI dataset as an example, one can run :command:`ap_veri

.. prompt:: bash

ap_verify.py --dataset CI-CosmosPDR2 --gen2 --id "visit=59150^59160 filter=HSC-G" --output workspaces/cosmos/
ap_verify.py --dataset ap_verify_ci_cosmos_pdr2 --gen2 --id "visit=59150^59160 filter=HSC-G" --output workspaces/cosmos/

Here the inputs are:

* :command:`CI-CosmosPDR2` is the ``ap_verify`` :ref:`dataset name <ap-verify-dataset-name>`,
* :command:`ap_verify_ci_cosmos_pdr2` is the ``ap_verify`` :ref:`dataset <ap-verify-datasets>` to process,
* :option:`--gen2` specifies to process the dataset using the Gen 2 pipeline framework,
* :command:`visit=59150^59160 filter=HSC-G` is the :ref:`dataId<command-line-task-dataid-howto-about-dataid-keys>` to process,

@@ -53,7 +42,7 @@ It's also possible to run an entire dataset by omitting the :option:`--id` argum

.. prompt:: bash

ap_verify.py --dataset CI-CosmosPDR2 --gen2 --output workspaces/cosmos/
ap_verify.py --dataset ap_verify_ci_cosmos_pdr2 --gen2 --output workspaces/cosmos/

.. note::

@@ -71,11 +60,11 @@ Using the `Cosmos PDR2`_ CI dataset as an example, one can run :command:`ap_veri

.. prompt:: bash

ap_verify.py --dataset CI-CosmosPDR2 --gen3 --data-query "visit in (59150, 59160) and band='g'" --output workspaces/cosmos/
ap_verify.py --dataset ap_verify_ci_cosmos_pdr2 --gen3 --data-query "visit in (59150, 59160) and band='g'" --output workspaces/cosmos/

Here the inputs are:

* :command:`CI-CosmosPDR2` is the ``ap_verify`` :ref:`dataset name <ap-verify-dataset-name>`,
* :command:`ap_verify_ci_cosmos_pdr2` is the ``ap_verify`` :ref:`dataset <ap-verify-datasets>` to process,
* :option:`--gen3` specifies to process the dataset using the Gen 3 pipeline framework,
* :command:`visit in (59150, 59160) and band='g'` is the :ref:`data ID query <daf_butler_dimension_expressions>` to process,

@@ -89,7 +78,7 @@ It's also possible to run an entire dataset by omitting the :option:`--data-quer

.. prompt:: bash

ap_verify.py --dataset CI-CosmosPDR2 --gen3 --output workspaces/cosmos/
ap_verify.py --dataset ap_verify_ci_cosmos_pdr2 --gen3 --output workspaces/cosmos/

.. note::

@@ -110,7 +99,7 @@ Using the `Cosmos PDR2`_ dataset as an example, one can run ``ingest_dataset`` i

.. prompt:: bash

ingest_dataset.py --dataset CI-CosmosPDR2 --gen2 --output workspaces/cosmos/
ingest_dataset.py --dataset ap_verify_ci_cosmos_pdr2 --gen2 --output workspaces/cosmos/

The :option:`--dataset`, :option:`--output`, :option:`--gen2`, :option:`--gen3`, and :option:`--processes` arguments behave the same way as for :command:`ap_verify.py`.
Other options from :command:`ap_verify.py` are not available.
2 changes: 1 addition & 1 deletion python/lsst/ap/verify/ap_verify.py
@@ -51,7 +51,7 @@ class _InputOutputParser(argparse.ArgumentParser):
def __init__(self):
# Help and documentation will be handled by main program's parser
argparse.ArgumentParser.__init__(self, add_help=False)
self.add_argument('--dataset', action=_DatasetAction, choices=Dataset.getSupportedDatasets(),
self.add_argument('--dataset', action=_DatasetAction,
required=True, help='The source of data to pass through the pipeline.')
self.add_argument('--output', required=True,
help='The location of the workspace to use for pipeline repositories.')
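Dropping the ``choices`` keyword is what lets arbitrary package names through the parser; the name is validated later, when the package is looked up on disk. A self-contained sketch of the relaxed parser (the parser shown here is a stand-in for ``_InputOutputParser``, not the class itself):

```python
import argparse

# Help and documentation are handled by the main program's parser,
# hence add_help=False, matching the diff above.
parser = argparse.ArgumentParser(add_help=False)
# No `choices` restriction: any string is accepted as a dataset package name.
parser.add_argument('--dataset', required=True,
                    help='The source of data to pass through the pipeline.')

# Both new-style package names and old-style aliases now parse successfully.
new_style = parser.parse_args(['--dataset', 'ap_verify_ci_cosmos_pdr2'])
old_style = parser.parse_args(['--dataset', 'CI-CosmosPDR2'])
```

With ``choices`` present, the second call would have raised ``SystemExit`` for any name missing from the configuration file; now both spellings reach the ``Dataset`` constructor, which decides what to do with them.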
1 change: 1 addition & 0 deletions python/lsst/ap/verify/config.py
@@ -24,6 +24,7 @@
from lsst.daf.persistence import Policy


# TODO: remove in DM-29042
class Config:
"""Configuration manager for ``ap_verify``.

24 changes: 17 additions & 7 deletions python/lsst/ap/verify/dataset.py
@@ -22,6 +22,9 @@
#

import os
import warnings

from deprecated.sphinx import deprecated

import lsst.daf.persistence as dafPersistence
import lsst.daf.butler as dafButler
@@ -43,15 +46,15 @@ class Dataset:
Parameters
----------
datasetId : `str`
A tag identifying the dataset.
The name of the dataset package. A tag identifying the dataset is also
accepted, but this usage is deprecated.

Raises
------
RuntimeError
Raised if `datasetId` exists but is not correctly organized or is incomplete
ValueError
Raised if `datasetId` is not a recognized ap_verify dataset. No side
effects if this exception is raised.
Raised if `datasetId` could not be loaded.
"""

def __init__(self, datasetId):
@@ -62,15 +65,18 @@ def __init__(self, datasetId):
datasetPackage = self._getDatasetInfo()[datasetId]
if datasetPackage is None:
raise KeyError
else:
warnings.warn(f"The {datasetId} name is deprecated, and will be removed after v24.0. "
f"Use {datasetPackage} instead.", category=FutureWarning)
except KeyError:
raise ValueError('Unsupported dataset: ' + datasetId)
# if datasetId not known, assume it's a package name
datasetPackage = datasetId

try:
self._dataRootDir = getPackageDir(datasetPackage)
except pexExcept.NotFoundError as e:
error = 'Dataset %s requires the %s package, which has not been set up.' \
% (datasetId, datasetPackage)
raise RuntimeError(error) from e
error = f"Cannot find the {datasetPackage} package; is it set up?"
raise ValueError(error) from e
else:
self._validatePackage()

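The hunk above also changes the failure mode: a package that is not set up now raises ``ValueError`` instead of ``RuntimeError``, so all bad ``--dataset`` arguments surface as the same exception type. A hedged sketch of that error path, with ``get_package_dir`` standing in for ``lsst.utils.getPackageDir`` (the environment-variable lookup is an illustrative assumption, not the real EUPS mechanism):

```python
import os


def get_package_dir(package):
    """Hypothetical stand-in for lsst.utils.getPackageDir: resolve a
    set-up package to its root directory via <PACKAGE>_DIR."""
    path = os.environ.get(package.upper() + "_DIR")
    if path is None:
        raise LookupError(f"Package {package} not found")
    return path


def find_dataset_root(dataset_package):
    """Mirror the new handling: a missing package is reported as
    ValueError, matching the 'unsupported dataset' contract."""
    try:
        return get_package_dir(dataset_package)
    except LookupError as e:
        raise ValueError(
            f"Cannot find the {dataset_package} package; is it set up?") from e
```

Chaining with ``from e`` preserves the original lookup failure in the traceback, which is the same pattern the diff uses with ``pexExcept.NotFoundError``.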
@@ -87,7 +93,10 @@ def _initPackage(self, name):
# No initialization required at present
pass

# TODO: remove in DM-29042
@staticmethod
@deprecated(reason="The concept of 'supported' datasets is deprecated. This "
"method will be removed after v24.0.", version="v22.0", category=FutureWarning)
def getSupportedDatasets():
"""The ap_verify dataset IDs that can be passed to this class's constructor.

@@ -105,6 +114,7 @@ def getSupportedDatasets():
"""
return Dataset._getDatasetInfo().keys()

# TODO: remove in DM-29042
@staticmethod
def _getDatasetInfo():
"""Return external data on supported ap_verify datasets.
2 changes: 2 additions & 0 deletions python/lsst/ap/verify/testUtils.py
@@ -54,6 +54,7 @@ class DataTestCase(lsst.utils.tests.TestCase):
datasetKey = 'test'
"""The ap_verify dataset name that would be used on the command line (`str`).
"""
# TODO: remove datasetKey in DM-29042

@classmethod
def setUpClass(cls):
@@ -70,4 +71,5 @@ def setUpClass(cls):
# Hack the config for testing purposes
# Note that Config.instance is supposed to be immutable, so, depending on initialization order,
# this modification may cause other tests to see inconsistent config values
# TODO: remove in DM-29042
Config.instance._allInfo['datasets.' + cls.datasetKey] = cls.testDataset
6 changes: 5 additions & 1 deletion python/lsst/ap/verify/workspace.py
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@
import re
import stat

import lsst.skymap
import lsst.daf.persistence as dafPersist
import lsst.daf.butler as dafButler
import lsst.obs.base as obsBase
@@ -323,13 +324,16 @@ def workButler(self):
try:
# Hard-code the collection names because it's hard to infer the inputs from the Butler
queryButler = dafButler.Butler(self.repo, writeable=True) # writeable for _workButler
inputs = {"skymaps", "refcats"}
inputs = {
lsst.skymap.BaseSkyMap.SKYMAP_RUN_COLLECTION_NAME,
}
for dimension in queryButler.registry.queryDataIds('instrument'):
instrument = obsBase.Instrument.fromName(dimension["instrument"], queryButler.registry)
rawName = instrument.makeDefaultRawIngestRunName()
inputs.add(rawName)
self._ensureCollection(queryButler.registry, rawName, dafButler.CollectionType.RUN)
inputs.add(instrument.makeCalibrationCollectionName())
inputs.add(instrument.makeRefCatCollectionName())
inputs.update(queryButler.registry.queryCollections(re.compile(r"templates/\w+")))

# Create an output chain here, so that workButler can see it.
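The workspace hunk replaces the hard-coded ``{"skymaps", "refcats"}`` set with names derived from the stack's own constants and per-instrument helpers. A sketch of that assembly with the LSST classes stubbed out (``FakeInstrument`` and the collection-name formats are illustrative stand-ins for ``lsst.obs.base.Instrument``; the ``"skymaps"`` value is assumed to match ``lsst.skymap.BaseSkyMap.SKYMAP_RUN_COLLECTION_NAME``):

```python
# Assumed value of lsst.skymap.BaseSkyMap.SKYMAP_RUN_COLLECTION_NAME.
SKYMAP_RUN_COLLECTION_NAME = "skymaps"


class FakeInstrument:
    """Illustrative stand-in for lsst.obs.base.Instrument."""

    def __init__(self, name):
        self.name = name

    def makeDefaultRawIngestRunName(self):
        return f"{self.name}/raw/all"

    def makeCalibrationCollectionName(self):
        return f"{self.name}/calib"

    def makeRefCatCollectionName(self):
        return "refcats"


def gather_inputs(instruments):
    """Collect input collection names as in Workspace.workButler:
    one skymap collection, plus raw, calib, and refcat collections
    for every instrument registered in the repository."""
    inputs = {SKYMAP_RUN_COLLECTION_NAME}
    for instrument in instruments:
        inputs.add(instrument.makeDefaultRawIngestRunName())
        inputs.add(instrument.makeCalibrationCollectionName())
        inputs.add(instrument.makeRefCatCollectionName())
    return inputs
```

Deriving the names from the instrument keeps the workspace in sync with whatever conventions the obs packages adopt, which is the point of the change.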