Implement framework for ap_verify.
The framework supports the current dataset system (see DM-11116). It
has been designed to be easy to modify and extend, as the details of
the verification framework are still being worked out.
kfindeisen committed Jul 28, 2017
1 parent cf2fbc9 commit 1e210be
Showing 11 changed files with 1,039 additions and 0 deletions.
30 changes: 30 additions & 0 deletions README.md
@@ -3,3 +3,33 @@
This package manages end-to-end testing and metric generation for the LSST DM Alert Production pipeline. Metrics are tested against both project- and lower-level requirements, and will be deliverable to the SQuaSH metrics service.

`ap_verify` is part of the LSST Science Pipelines. You can learn how to install the Pipelines at https://pipelines.lsst.io/install/index.html.

## Configuration

`ap_verify` is configured from `config/dataset_config.yaml`. The file currently must contain a single dictionary named `datasets`, which maps user-visible dataset names to the eups packages that implement them (see `Setting Up a Package`, below). Other configuration options may be added in the future.
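
For example (the `HiTS2015` entry matches the actual `config/dataset_config.yaml`; the second entry is hypothetical, included only to illustrate registering multiple datasets):

```yaml
---
datasets:
  HiTS2015: ap_verify_hits2015
  # hypothetical entry, for illustration only:
  Cosmos2017: ap_verify_cosmos2017
...
```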

### Setting Up a Package

`ap_verify` requires that all data be in a [dataset package](https://github.com/lsst-dm/ap_verify_dataset_template). It will create a workspace modeled after the package's `data` directory, then process any data found in the `raw` and `ref_cats` directories of the new workspace. Anything placed in `data` will be copied to an `ap_verify` run's workspace as-is, and must at least include a `_mapper` file naming the CameraMapper for the data.

The dataset package must work with eups, and must be registered in `config/dataset_config.yaml` in order for `ap_verify` to support it. `ap_verify` will use `eups setup` to prepare the dataset package and any dependencies; these will typically include the `obs_` package for the instrument that took the data.
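
A plausible layout for such a package is sketched below. The directory names follow the description above; the authoritative layout is defined by the [dataset template](https://github.com/lsst-dm/ap_verify_dataset_template).

```
my_dataset/             # hypothetical dataset package, set up with eups
├── ups/                # eups table file for the package
└── data/               # copied as-is into each run's workspace
    ├── _mapper         # names the CameraMapper for the data (required)
    ├── raw/            # raw science frames, ingested from the workspace copy
    └── ref_cats/       # reference catalogs, ingested from the workspace copy
```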

## Running ap_verify

A basic run on HiTS data:

```sh
python python/lsst/ap/verify/ap_verify.py --dataset HiTS2015 --output workspace/hits/ --dataIdString "visit=54123"
```

This will create a workspace (a Butler repository) in `workspace/hits` based on `<hits-data>/data/`, ingest the HiTS data into it, then run visit 54123 through the entire AP pipeline. `ap_verify` also supports the `--rerun` system:

```sh
python python/lsst/ap/verify/ap_verify.py --dataset HiTS2015 --rerun run1 --dataIdString "visit=54123"
```

This will create a workspace in `<hits-data>/rerun/run1/`. Since datasets are not, in general, repositories, many of the complexities of `--rerun` for Tasks (e.g., always using the highest-level repository) do not apply. In addition, the `--rerun` argument does not support input directories; the input for `ap_verify` will always be determined by the `--dataset`.

### Optional Arguments

`--silent`: Normally, `ap_verify` submits measurements to SQuaSH for quality tracking. This argument disables reporting for test runs. `ap_verify` will dump measurements to `ap_verify.verify.json` regardless of whether this flag is set.
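
The JSON file can be inspected directly; here is a minimal sketch, assuming the `lsst.verify` `Job` serialization in which measurements are stored in a top-level `measurements` list with `metric` and `value` keys:

```python
import json

# Load the measurements ap_verify dumped to disk. The key names below are
# assumptions based on lsst.verify's Job serialization format.
with open('ap_verify.verify.json') as f:
    job = json.load(f)

for measurement in job.get('measurements', []):
    print(measurement.get('metric'), measurement.get('value'))
```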

`-j, --processes`: Specify a particular degree of parallelism. As with Tasks, the value is taken at face value, with no intelligent thread management.

`-h, --help`: These arguments print a brief usage guide.

`--version`: This argument prints the program version.
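
For example, a local test run that skips SQuaSH reporting and uses four processes (this simply combines flags documented above with the earlier basic invocation):

```sh
python python/lsst/ap/verify/ap_verify.py --dataset HiTS2015 --output workspace/hits/ \
    --dataIdString "visit=54123" --silent -j 4
```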
4 changes: 4 additions & 0 deletions SConstruct
@@ -0,0 +1,4 @@
# -*- python -*-
from lsst.sconsUtils import scripts
scripts.BasicSConstruct("ap_verify")

5 changes: 5 additions & 0 deletions config/dataset_config.yaml
@@ -0,0 +1,5 @@
---
datasets:
HiTS2015: ap_verify_hits2015
...

162 changes: 162 additions & 0 deletions python/lsst/ap/verify/appipe.py
@@ -0,0 +1,162 @@
#
# LSST Data Management System
# Copyright 2017 LSST Corporation.
#
# This product includes software developed by the
# LSST Project (http://www.lsst.org/).
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the LSST License Statement and
# the GNU General Public License along with this program. If not,
# see <http://www.lsstcorp.org/LegalNotices/>.
#

from __future__ import absolute_import, division, print_function

__all__ = ["ApPipeParser", "ApPipe"]

import argparse

import lsst.log
from lsst.ap.verify.dataset import Dataset
from lsst.ap.verify.pipeline import Pipeline


class ApPipeParser(argparse.ArgumentParser):
    """An argument parser for data needed by ap_pipe activities.

    This parser is not complete, and is designed to be passed to another
    parser using the `parents` parameter.
    """

    def __init__(self):
        # Help and documentation will be handled by main program's parser
        argparse.ArgumentParser.__init__(self, add_help=False)
        self.add_argument('--dataIdString', dest='dataId', required=True,
                          help='An identifier for the data to process. '
                               'May not support all features of a Butler dataId; '
                               'see the ap_pipe documentation for details.')
        self.add_argument("-j", "--processes", default=1, type=int, help="Number of processes to use")


class ApPipe(Pipeline):
    """Wrapper for `lsst.ap.pipe` that executes all steps through source
    association.

    This class is not designed to have subclasses.

    Parameters
    ----------
    dataset: `dataset.Dataset`
        The dataset on which the pipeline will be run.
    working_dir: `str`
        The repository in which temporary products will be created. Must be
        compatible with `dataset`.
    parsed_cmd_line: `argparse.Namespace`
        Command-line arguments, including all arguments supported by `ApPipeParser`.
    """

    def __init__(self, dataset, working_dir, parsed_cmd_line):
        Pipeline.__init__(self, dataset, working_dir)
        self._dataId = parsed_cmd_line.dataId
        self._parallelization = parsed_cmd_line.processes

    def _ingest_raws(self):
        """Ingest the raw data for use by LSST.

        The original data directory shall not be modified.
        """
        # use self.dataset and self.repo
        raise NotImplementedError

    def _ingest_calibs(self):
        """Ingest the raw calibrations for use by LSST.

        The original calibration directory shall not be modified.
        """
        # use self.dataset and self.repo
        raise NotImplementedError

    def _ingest_templates(self):
        """Ingest precomputed templates for use by LSST.

        The templates may be either LSST `calexp` or LSST
        `deepCoadd_psfMatchedWarp`. The original template directory shall not
        be modified.
        """
        # use self.dataset and self.repo
        raise NotImplementedError

    def _process(self, metrics_job):
        """Run single-frame processing on a dataset.

        Parameters
        ----------
        metrics_job: `verify.Job`
            The Job object to which to add any metric measurements made.
        """
        # use self.repo, self._dataId, self._parallelization
        raise NotImplementedError

    def _difference(self, metrics_job):
        """Run image differencing on a dataset.

        Parameters
        ----------
        metrics_job: `verify.Job`
            The Job object to which to add any metric measurements made.
        """
        # use self.repo, self._dataId, self._parallelization
        raise NotImplementedError

    def _associate(self, metrics_job):
        """Run source association on a dataset.

        Parameters
        ----------
        metrics_job: `verify.Job`
            The Job object to which to add any metric measurements made.
        """
        # use self.repo, self._parallelization
        raise NotImplementedError

    def _post_process(self):
        """Run post-processing on a dataset.

        This step is called the "afterburner" in some design documents.
        """
        # use self.repo
        pass

    def run(self, metrics_job):
        """Run `ap_pipe` on this object's dataset.

        Parameters
        ----------
        metrics_job: `verify.Job`
            The Job object to which to add any metric measurements made.
        """
        log = lsst.log.Log.getLogger('ap.verify.appipe.ApPipe.run')

        self._ingest_raws()
        self._ingest_calibs()
        self._ingest_templates()
        log.info('Data ingested')

        self._process(metrics_job)
        log.info('Single-frame processing complete')
        self._difference(metrics_job)
        log.info('Image differencing complete')
        self._associate(metrics_job)
        log.info('Source association complete')
        self._post_process()
        log.info('Pipeline complete')
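
How these pieces fit together, as a minimal driver sketch: `ApPipeParser` is composed into a larger parser via argparse's `parents` mechanism (which is why it sets `add_help=False`), and `ApPipe` is constructed from a dataset, a workspace, and the parsed arguments. The `Dataset` constructor signature and the `--dataset`/`--output` flags are assumptions based on the README; `run` will raise `NotImplementedError` until the pipeline steps are implemented.

```python
import argparse

import lsst.verify
from lsst.ap.verify.appipe import ApPipeParser, ApPipe
from lsst.ap.verify.dataset import Dataset

# Compose ApPipeParser into a top-level parser; ApPipeParser disables its
# own help precisely so it can serve as a parent parser.
parser = argparse.ArgumentParser(parents=[ApPipeParser()])
parser.add_argument('--dataset', required=True)   # assumed flag, per the README
parser.add_argument('--output', required=True)    # assumed flag, per the README
args = parser.parse_args(['--dataset', 'HiTS2015',
                          '--output', 'workspace/hits/',
                          '--dataIdString', 'visit=54123',
                          '-j', '4'])

dataset = Dataset(args.dataset)   # hypothetical constructor signature
job = lsst.verify.Job()           # collects metric measurements

pipeline = ApPipe(dataset, args.output, args)
pipeline.run(job)                 # currently raises NotImplementedError at
                                  # the first unimplemented ingestion step
```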
