DM-11118: Build stubbed out verify_ap #1

Merged (1 commit) on Jul 28, 2017
30 changes: 30 additions & 0 deletions README.md
@@ -3,3 +3,33 @@
This package manages end-to-end testing and metric generation for the LSST DM Alert Production pipeline. Metrics are tested against both project- and lower-level requirements, and will be deliverable to the SQuaSH metrics service.

`ap_verify` is part of the LSST Science Pipelines. You can learn how to install the Pipelines at https://pipelines.lsst.io/install/index.html.

## Configuration

`ap_verify` is configured from `config/dataset_config.yaml`. The file must currently contain a single dictionary named `datasets`, which maps each user-visible dataset name to the eups package that implements it (see `Setting Up a Package`, below). Other configuration options may be added in the future.
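For illustration, the registered mapping can be read with any YAML parser. Below is a minimal sketch (assuming PyYAML is available; `ap_verify` performs its own loading internally, so this is not its actual code) that resolves a dataset name to its eups package:

    import yaml

    # Read the dataset registry (path is relative to the ap_verify package root).
    with open('config/dataset_config.yaml') as f:
        config = yaml.safe_load(f)

    # Map a user-visible dataset name to the eups package that implements it.
    print(config['datasets']['HiTS2015'])  # prints "ap_verify_hits2015"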

### Setting Up a Package

`ap_verify` requires that all data be in a [dataset package](https://github.com/lsst-dm/ap_verify_dataset_template). It will create a workspace modeled after the package's `data` directory, then process any data found in the `raw` and `ref_cats` directories of the new workspace. Anything placed in `data` will be copied to an `ap_verify` run's workspace as-is, and must at least include a `_mapper` file naming the CameraMapper for the data.

The dataset package must work with eups and must be registered in `config/dataset_config.yaml` in order for `ap_verify` to support it. `ap_verify` will use `eups setup` to prepare the dataset package and any dependencies; typically, these include the `obs_` package for the instrument that took the data.

## Running ap_verify

A basic run on HiTS data:

python python/lsst/ap/verify/ap_verify.py --dataset HiTS2015 --output workspace/hits/ --dataIdString "visit=54123"

This will create a workspace (a Butler repository) in `workspace/hits` based on `<hits-data>/data/`, ingest the HiTS data into it, then run visit 54123 through the entire AP pipeline. `ap_verify` also supports the `--rerun` system:

python python/lsst/ap/verify/ap_verify.py --dataset HiTS2015 --rerun run1 --dataIdString "visit=54123"

This will create a workspace in `<hits-data>/rerun/run1/`. Since datasets are not, in general, repositories, many of the complexities of `--rerun` for Tasks (e.g., always using the highest-level repository) do not apply. In addition, the `--rerun` argument does not support input directories; the input for `ap_verify` is always determined by `--dataset`.

### Optional Arguments

`--silent`: Normally, `ap_verify` submits measurements to SQuaSH for quality tracking. This argument disables reporting for test runs. `ap_verify` will dump measurements to `ap_verify.verify.json` regardless of whether this flag is set.

`-j, --processes`: Specify the degree of parallelism. As with Tasks, this argument may be taken at face value, with no intelligent thread management.

`-h, --help`: Prints a brief usage guide.

`--version`: Prints the program version.
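For example, a test run that disables SQuaSH reporting and uses four processes could combine these flags with the basic invocation shown earlier:

    python python/lsst/ap/verify/ap_verify.py --dataset HiTS2015 --output workspace/hits/ --dataIdString "visit=54123" --silent -j 4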
4 changes: 4 additions & 0 deletions SConstruct
@@ -0,0 +1,4 @@
# -*- python -*-
from lsst.sconsUtils import scripts
scripts.BasicSConstruct("ap_verify")

5 changes: 5 additions & 0 deletions config/dataset_config.yaml
@@ -0,0 +1,5 @@
---
datasets:
HiTS2015: ap_verify_hits2015
...

158 changes: 158 additions & 0 deletions python/lsst/ap/verify/ap_verify.py
@@ -0,0 +1,158 @@
#
# LSST Data Management System
# Copyright 2017 LSST Corporation.
#
# This product includes software developed by the
# LSST Project (http://www.lsst.org/).
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the LSST License Statement and
# the GNU General Public License along with this program. If not,
# see <http://www.lsstcorp.org/LegalNotices/>.
#

"""Command-line program for running and analyzing AP pipeline.

In addition to containing ap_verify's main function, this module manages
command-line argument parsing.
"""

from __future__ import absolute_import, division, print_function

__all__ = []

import argparse
import os
import re

import lsst.log
from lsst.ap.verify.dataset import Dataset
from lsst.ap.verify.metrics import MetricsParser, check_squash_ready, AutoJob
from lsst.ap.verify.appipe import ApPipeParser, ApPipe


class _VerifyApParser(argparse.ArgumentParser):
"""An argument parser for data needed by this script.
"""

def __init__(self):
argparse.ArgumentParser.__init__(
self,
description='Executes the LSST DM AP pipeline and analyzes its performance using metrics.',
epilog='',
parents=[ApPipeParser(), MetricsParser()],
add_help=True)
self.add_argument('--dataset', choices=Dataset.get_supported_datasets(), required=True,
help='The source of data to pass through the pipeline.')

output = self.add_mutually_exclusive_group(required=True)
output.add_argument('--output', help='The location of the repository to use for program output.')
output.add_argument(
'--rerun', metavar='OUTPUT',
type=_FormattedType('[^:]+',
'Invalid name "%s"; ap_verify supports only output reruns. '
'You have entered something that appears to be of the form INPUT:OUTPUT. '
'Please specify only OUTPUT.'),
help='The location of the repository to use for program output, as DATASET/rerun/OUTPUT')

self.add_argument('--version', action='version', version='%(prog)s 0.1.0')


class _FormattedType:
"""An argparse type converter that requires strings in a particular format.

Leaves the input as a string if it matches, else raises ArgumentTypeError.

Parameters
----------
fmt: `str`
A regular expression that values must satisfy to be accepted. The *entire* string must match the
expression in order to pass.
msg: `str`
An error string to display for invalid values. The first "%s" shall be filled with the
invalid argument.
"""
def __init__(self, fmt, msg='"%s" does not have the expected format.'):
full_format = fmt
if not full_format.startswith('^'):
full_format = '^' + full_format
if not full_format.endswith('$'):
full_format += '$'
self._format = re.compile(full_format)
self._message = msg

def __call__(self, value):
if self._format.match(value):
return value
else:
raise argparse.ArgumentTypeError(self._message % value)
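
# For example, _FormattedType('[^:]+')('run1') returns 'run1' unchanged, while
# _FormattedType('[^:]+')('input:output') raises argparse.ArgumentTypeError.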


def _get_output_dir(input_dir, output_arg, rerun_arg):
"""Choose an output directory based on program arguments.

Parameters
----------
input_dir: `str`
The root directory of the input dataset.
output_arg: `str`
The directory given using the `--output` command line argument.
rerun_arg: `str`
The subdirectory given using the `--rerun` command line argument. Must
be relative to the `rerun` directory under `input_dir`.

Raises
------
`ValueError`:
Raised if both `output_arg` and `rerun_arg` are provided, or if neither is.
"""
if output_arg and rerun_arg:
raise ValueError('Cannot provide both --output and --rerun.')
if not output_arg and not rerun_arg:
raise ValueError('Must provide either --output or --rerun.')
if output_arg:
return output_arg
else:
return os.path.join(input_dir, "rerun", rerun_arg)
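
# For example, _get_output_dir('/datasets/hits2015', None, 'run1') returns
# '/datasets/hits2015/rerun/run1' (the path is illustrative), matching the
# --rerun behavior described in README.md.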


def _measure_final_properties(metrics_job):
"""Measure any metrics that apply to the final result of the AP pipeline,
rather than to a particular processing stage.

Parameters
----------
metrics_job: `verify.Job`
The Job object to which to add any metric measurements made.
"""
pass


if __name__ == '__main__':
lsst.log.configure()
log = lsst.log.Log.getLogger('ap.verify.ap_verify.main')
# TODO: what is LSST's policy on exceptions escaping into main()?
args = _VerifyApParser().parse_args()
check_squash_ready(args)
log.debug('Command-line arguments: %s', args)

test_data = Dataset(args.dataset)
log.info('Dataset %s set up.', args.dataset)
output = _get_output_dir(test_data.dataset_root, args.output, args.rerun)
test_data.make_output_repo(output)
log.info('Output repo at %s created.', output)

with AutoJob(args) as job:
log.info('Running pipeline...')
pipeline = ApPipe(test_data, output, args)
pipeline.run(job)
_measure_final_properties(job)
162 changes: 162 additions & 0 deletions python/lsst/ap/verify/appipe.py
@@ -0,0 +1,162 @@
#
# LSST Data Management System
# Copyright 2017 LSST Corporation.
#
# This product includes software developed by the
# LSST Project (http://www.lsst.org/).
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the LSST License Statement and
# the GNU General Public License along with this program. If not,
# see <http://www.lsstcorp.org/LegalNotices/>.
#

from __future__ import absolute_import, division, print_function

__all__ = ["ApPipeParser", "ApPipe"]

import argparse

import lsst.log
from lsst.ap.verify.dataset import Dataset
from lsst.ap.verify.pipeline import Pipeline


class ApPipeParser(argparse.ArgumentParser):
"""An argument parser for data needed by ap_pipe activities.

This parser is not complete, and is designed to be passed to another parser
using the `parents` parameter.
"""

def __init__(self):
# Help and documentation will be handled by main program's parser
argparse.ArgumentParser.__init__(self, add_help=False)
self.add_argument('--dataIdString', dest='dataId', required=True,
help='An identifier for the data to process. '
'May not support all features of a Butler dataId; '
'see the ap_pipe documentation for details.')
self.add_argument("-j", "--processes", default=1, type=int, help="Number of processes to use")


class ApPipe(Pipeline):
"""Wrapper for `lsst.ap.pipe` that executes all steps through source
association.

This class is not designed to have subclasses.

Parameters
----------
dataset: `dataset.Dataset`
The dataset on which the pipeline will be run.
working_dir: `str`
The repository in which temporary products will be created. Must be
compatible with `dataset`.
parsed_cmd_line: `argparse.Namespace`
Command-line arguments, including all arguments supported by `ApPipeParser`.
"""

def __init__(self, dataset, working_dir, parsed_cmd_line):
Pipeline.__init__(self, dataset, working_dir)
self._dataId = parsed_cmd_line.dataId
self._parallelization = parsed_cmd_line.processes

def _ingest_raws(self):
"""Ingest the raw data for use by LSST.

The original data directory shall not be modified.
"""
# use self.dataset and self.repo
raise NotImplementedError

def _ingest_calibs(self):
"""Ingest the raw calibrations for use by LSST.

The original calibration directory shall not be modified.
"""
# use self.dataset and self.repo
raise NotImplementedError

def _ingest_templates(self):
"""Ingest precomputed templates for use by LSST.

The templates may be either LSST `calexp` or LSST
`deepCoadd_psfMatchedWarp`. The original template directory shall not
be modified.
"""
# use self.dataset and self.repo
raise NotImplementedError

def _process(self, metrics_job):
"""Run single-frame processing on a dataset.

Parameters
----------
metrics_job: `verify.Job`
The Job object to which to add any metric measurements made.
"""
# use self.repo, self._dataId, self._parallelization
raise NotImplementedError

def _difference(self, metrics_job):
"""Run image differencing on a dataset.

Parameters
----------
metrics_job: `verify.Job`
The Job object to which to add any metric measurements made.
"""
# use self.repo, self._dataId, self._parallelization
raise NotImplementedError

def _associate(self, metrics_job):
"""Run source association on a dataset.

Parameters
----------
metrics_job: `verify.Job`
The Job object to which to add any metric measurements made.
"""
# use self.repo, self._parallelization
raise NotImplementedError

def _post_process(self):
"""Run post-processing on a dataset.

This step is called the "afterburner" in some design documents.
"""
# use self.repo
pass

def run(self, metrics_job):
"""Run `ap_pipe` on this object's dataset.

Parameters
----------
metrics_job: `verify.Job`
The Job object to which to add any metric measurements made.
"""
log = lsst.log.Log.getLogger('ap.verify.appipe.ApPipe.run')

self._ingest_raws()
self._ingest_calibs()
self._ingest_templates()
log.info('Data ingested')

self._process(metrics_job)
log.info('Single-frame processing complete')
self._difference(metrics_job)
log.info('Image differencing complete')
self._associate(metrics_job)
log.info('Source association complete')
self._post_process()
log.info('Pipeline complete')