Implement framework for ap_verify.
The framework supports the current dataset system (see DM-11116). It
has been designed to be easy to modify and extend, as the details of
the verification framework are still being worked out.
kfindeisen committed Jul 28, 2017
1 parent cf2fbc9 commit 1e210be
Showing 11 changed files with 1,039 additions and 0 deletions.
30 changes: 30 additions & 0 deletions README.md
@@ -3,3 +3,33 @@
This package manages end-to-end testing and metric generation for the LSST DM Alert Production pipeline. Metrics are tested against both project- and lower-level requirements, and will be deliverable to the SQuaSH metrics service.

`ap_verify` is part of the LSST Science Pipelines. You can learn how to install the Pipelines at https://pipelines.lsst.io/install/index.html.

## Configuration

`ap_verify` is configured from `config/dataset_config.yaml`. The file currently must contain a single dictionary named `datasets`, which maps user-visible dataset names to the eups packages that implement them (see `Setting Up a Package`, below). Other configuration options may be added in the future.
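
For example (the `HiTS2015` entry matches the actual `config/dataset_config.yaml`; the second entry is hypothetical, included only to illustrate registering multiple datasets):

```yaml
---
datasets:
  HiTS2015: ap_verify_hits2015
  # hypothetical entry, for illustration only:
  Cosmos2017: ap_verify_cosmos2017
...
```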

### Setting Up a Package

`ap_verify` requires that all data be in a [dataset package](https://github.com/lsst-dm/ap_verify_dataset_template). It will create a workspace modeled after the package's `data` directory, then process any data found in the `raw` and `ref_cats` directories of the new workspace. Anything placed in `data` will be copied to an `ap_verify` run's workspace as-is, and must at least include a `_mapper` file naming the CameraMapper for the data.

The dataset package must work with eups, and must be registered in `config/dataset_config.yaml` in order for `ap_verify` to support it. `ap_verify` will use `eups setup` to prepare the dataset package and any dependencies; these will typically include the `obs_` package for the instrument that took the data.
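
A plausible layout for such a package is sketched below. The directory names follow the description above; the authoritative layout is defined by the [dataset template](https://github.com/lsst-dm/ap_verify_dataset_template).

```
my_dataset/             # hypothetical dataset package, set up with eups
├── ups/                # eups table file for the package
└── data/               # copied as-is into each run's workspace
    ├── _mapper         # names the CameraMapper for the data (required)
    ├── raw/            # raw science frames, ingested from the workspace copy
    └── ref_cats/       # reference catalogs, ingested from the workspace copy
```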

## Running ap_verify

A basic run on HiTS data:

```sh
python python/lsst/ap/verify/ap_verify.py --dataset HiTS2015 --output workspace/hits/ --dataIdString "visit=54123"
```

This will create a workspace (a Butler repository) in `workspace/hits` based on `<hits-data>/data/`, ingest the HiTS data into it, then run visit 54123 through the entire AP pipeline. `ap_verify` also supports the `--rerun` system:

```sh
python python/lsst/ap/verify/ap_verify.py --dataset HiTS2015 --rerun run1 --dataIdString "visit=54123"
```

This will create a workspace in `<hits-data>/rerun/run1/`. Since datasets are not, in general, repositories, many of the complexities of `--rerun` for Tasks (e.g., always using the highest-level repository) do not apply. In addition, the `--rerun` argument does not support input directories; the input for `ap_verify` will always be determined by the `--dataset`.

### Optional Arguments

`--silent`: Normally, `ap_verify` submits measurements to SQuaSH for quality tracking. This argument disables reporting for test runs. `ap_verify` will dump measurements to `ap_verify.verify.json` regardless of whether this flag is set.
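
The JSON file can be inspected directly; here is a minimal sketch, assuming the `lsst.verify` `Job` serialization in which measurements are stored in a top-level `measurements` list with `metric` and `value` keys:

```python
import json

# Load the measurements ap_verify dumped to disk. The key names below are
# assumptions based on lsst.verify's Job serialization format.
with open('ap_verify.verify.json') as f:
    job = json.load(f)

for measurement in job.get('measurements', []):
    print(measurement.get('metric'), measurement.get('value'))
```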

`-j, --processes`: Specify a particular degree of parallelism. As with Tasks, the value is taken at face value, with no intelligent thread management.

`-h, --help`: These arguments print a brief usage guide.

`--version`: This argument prints the program version.
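
For example, a local test run that skips SQuaSH reporting and uses four processes (this simply combines flags documented above with the earlier basic invocation):

```sh
python python/lsst/ap/verify/ap_verify.py --dataset HiTS2015 --output workspace/hits/ \
    --dataIdString "visit=54123" --silent -j 4
```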
4 changes: 4 additions & 0 deletions SConstruct
@@ -0,0 +1,4 @@
# -*- python -*-
from lsst.sconsUtils import scripts
scripts.BasicSConstruct("ap_verify")

5 changes: 5 additions & 0 deletions config/dataset_config.yaml
@@ -0,0 +1,5 @@
---
datasets:
HiTS2015: ap_verify_hits2015
...

162 changes: 162 additions & 0 deletions python/lsst/ap/verify/appipe.py
@@ -0,0 +1,162 @@
#
# LSST Data Management System
# Copyright 2017 LSST Corporation.
#
# This product includes software developed by the
# LSST Project (http://www.lsst.org/).
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the LSST License Statement and
# the GNU General Public License along with this program. If not,
# see <http://www.lsstcorp.org/LegalNotices/>.
#

from __future__ import absolute_import, division, print_function

__all__ = ["ApPipeParser", "ApPipe"]

import argparse

import lsst.log
from lsst.ap.verify.dataset import Dataset
from lsst.ap.verify.pipeline import Pipeline


class ApPipeParser(argparse.ArgumentParser):
    """An argument parser for data needed by ap_pipe activities.

    This parser is not complete, and is designed to be passed to another
    parser using the `parents` parameter.
    """

    def __init__(self):
        # Help and documentation will be handled by main program's parser
        argparse.ArgumentParser.__init__(self, add_help=False)
        self.add_argument('--dataIdString', dest='dataId', required=True,
                          help='An identifier for the data to process. '
                               'May not support all features of a Butler dataId; '
                               'see the ap_pipe documentation for details.')
        self.add_argument("-j", "--processes", default=1, type=int, help="Number of processes to use")


class ApPipe(Pipeline):
    """Wrapper for `lsst.ap.pipe` that executes all steps through source
    association.

    This class is not designed to have subclasses.

    Parameters
    ----------
    dataset: `dataset.Dataset`
        The dataset on which the pipeline will be run.
    working_dir: `str`
        The repository in which temporary products will be created. Must be
        compatible with `dataset`.
    parsed_cmd_line: `argparse.Namespace`
        Command-line arguments, including all arguments supported by `ApPipeParser`.
    """

    def __init__(self, dataset, working_dir, parsed_cmd_line):
        Pipeline.__init__(self, dataset, working_dir)
        self._dataId = parsed_cmd_line.dataId
        self._parallelization = parsed_cmd_line.processes

    def _ingest_raws(self):
        """Ingest the raw data for use by LSST.

        The original data directory shall not be modified.
        """
        # use self.dataset and self.repo
        raise NotImplementedError

    def _ingest_calibs(self):
        """Ingest the raw calibrations for use by LSST.

        The original calibration directory shall not be modified.
        """
        # use self.dataset and self.repo
        raise NotImplementedError

    def _ingest_templates(self):
        """Ingest precomputed templates for use by LSST.

        The templates may be either LSST `calexp` or LSST
        `deepCoadd_psfMatchedWarp`. The original template directory shall not
        be modified.
        """
        # use self.dataset and self.repo
        raise NotImplementedError

    def _process(self, metrics_job):
        """Run single-frame processing on a dataset.

        Parameters
        ----------
        metrics_job: `verify.Job`
            The Job object to which to add any metric measurements made.
        """
        # use self.repo, self._dataId, self._parallelization
        raise NotImplementedError

    def _difference(self, metrics_job):
        """Run image differencing on a dataset.

        Parameters
        ----------
        metrics_job: `verify.Job`
            The Job object to which to add any metric measurements made.
        """
        # use self.repo, self._dataId, self._parallelization
        raise NotImplementedError

    def _associate(self, metrics_job):
        """Run source association on a dataset.

        Parameters
        ----------
        metrics_job: `verify.Job`
            The Job object to which to add any metric measurements made.
        """
        # use self.repo, self._parallelization
        raise NotImplementedError

    def _post_process(self):
        """Run post-processing on a dataset.

        This step is called the "afterburner" in some design documents.
        """
        # use self.repo
        pass

    def run(self, metrics_job):
        """Run `ap_pipe` on this object's dataset.

        Parameters
        ----------
        metrics_job: `verify.Job`
            The Job object to which to add any metric measurements made.
        """
        log = lsst.log.Log.getLogger('ap.verify.appipe.ApPipe.run')

        self._ingest_raws()
        self._ingest_calibs()
        self._ingest_templates()
        log.info('Data ingested')

        self._process(metrics_job)
        log.info('Single-frame processing complete')
        self._difference(metrics_job)
        log.info('Image differencing complete')
        self._associate(metrics_job)
        log.info('Source association complete')
        self._post_process()
        log.info('Pipeline complete')
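
How these pieces fit together, as a minimal driver sketch: `ApPipeParser` is composed into a larger parser via argparse's `parents` mechanism (which is why it sets `add_help=False`), and `ApPipe` is constructed from a dataset, a workspace, and the parsed arguments. The `Dataset` constructor signature and the `--dataset`/`--output` flags are assumptions based on the README; `run` will raise `NotImplementedError` until the pipeline steps are implemented.

```python
import argparse

import lsst.verify
from lsst.ap.verify.appipe import ApPipeParser, ApPipe
from lsst.ap.verify.dataset import Dataset

# Compose ApPipeParser into a top-level parser; ApPipeParser disables its
# own help precisely so it can serve as a parent parser.
parser = argparse.ArgumentParser(parents=[ApPipeParser()])
parser.add_argument('--dataset', required=True)   # assumed flag, per the README
parser.add_argument('--output', required=True)    # assumed flag, per the README
args = parser.parse_args(['--dataset', 'HiTS2015',
                          '--output', 'workspace/hits/',
                          '--dataIdString', 'visit=54123',
                          '-j', '4'])

dataset = Dataset(args.dataset)   # hypothetical constructor signature
job = lsst.verify.Job()           # collects metric measurements

pipeline = ApPipe(dataset, args.output, args)
pipeline.run(job)                 # currently raises NotImplementedError at
                                  # the first unimplemented ingestion step
```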
