DM-11118: Build stubbed out verify_ap #1

Merged (1 commit) on Jul 28, 2017
30 changes: 30 additions & 0 deletions README.md
@@ -3,3 +3,33 @@
This package manages end-to-end testing and metric generation for the LSST DM Alert Production pipeline. Metrics are tested against both project- and lower-level requirements, and will be deliverable to the SQuaSH metrics service.

`ap_verify` is part of the LSST Science Pipelines. You can learn how to install the Pipelines at https://pipelines.lsst.io/install/index.html.

## Configuration

`ap_verify` is configured from `config/dataset_config.yaml`. The file must currently contain a single dictionary named `datasets`, which maps each user-visible dataset name to the eups package that implements it (see `Setting Up a Package`, below). Other configuration options may be added in the future.
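For illustration, the registered mapping can be read with any YAML parser. Below is a minimal sketch (assuming PyYAML is available; `ap_verify` performs its own loading internally, so this is not its actual code) that resolves a dataset name to its eups package:

    import yaml

    # Read the dataset registry (path is relative to the ap_verify package root).
    with open('config/dataset_config.yaml') as f:
        config = yaml.safe_load(f)

    # Map a user-visible dataset name to the eups package that implements it.
    print(config['datasets']['HiTS2015'])  # prints "ap_verify_hits2015"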

### Setting Up a Package

`ap_verify` requires that all data be in a [dataset package](https://github.com/lsst-dm/ap_verify_dataset_template). It will create a workspace modeled after the package's `data` directory, then process any data found in the `raw` and `ref_cats` directories of the new workspace. Anything placed in `data` will be copied to an `ap_verify` run's workspace as-is, and must at least include a `_mapper` file naming the CameraMapper for the data.

The dataset package must work with eups and must be registered in `config/dataset_config.yaml` in order for `ap_verify` to support it. `ap_verify` will use `eups setup` to prepare the dataset package and any dependencies; typically, these include the `obs_` package for the instrument that took the data.

## Running ap_verify

A basic run on HiTS data:

python python/lsst/ap/verify/ap_verify.py --dataset HiTS2015 --output workspace/hits/ --dataIdString "visit=54123"

This will create a workspace (a Butler repository) in `workspace/hits` based on `<hits-data>/data/`, ingest the HiTS data into it, then run visit 54123 through the entire AP pipeline. `ap_verify` also supports the `--rerun` system:

python python/lsst/ap/verify/ap_verify.py --dataset HiTS2015 --rerun run1 --dataIdString "visit=54123"

This will create a workspace in `<hits-data>/rerun/run1/`. Since datasets are not, in general, repositories, many of the complexities of `--rerun` for Tasks (e.g., always using the highest-level repository) do not apply. In addition, the `--rerun` argument does not support input directories; the input for `ap_verify` is always determined by `--dataset`.

### Optional Arguments

`--silent`: Normally, `ap_verify` submits measurements to SQuaSH for quality tracking. This argument disables reporting for test runs. `ap_verify` will dump measurements to `ap_verify.verify.json` regardless of whether this flag is set.

`-j, --processes`: Specify the degree of parallelism. As with Tasks, this argument may be taken at face value, with no intelligent thread management.

`-h, --help`: Prints a brief usage guide.

`--version`: Prints the program version.
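For example, a test run that disables SQuaSH reporting and uses four processes could combine these flags with the basic invocation shown earlier:

    python python/lsst/ap/verify/ap_verify.py --dataset HiTS2015 --output workspace/hits/ --dataIdString "visit=54123" --silent -j 4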
4 changes: 4 additions & 0 deletions SConstruct
@@ -0,0 +1,4 @@
# -*- python -*-
from lsst.sconsUtils import scripts
scripts.BasicSConstruct("ap_verify")

5 changes: 5 additions & 0 deletions config/dataset_config.yaml
@@ -0,0 +1,5 @@
---
datasets:
HiTS2015: ap_verify_hits2015
...

158 changes: 158 additions & 0 deletions python/lsst/ap/verify/ap_verify.py
@@ -0,0 +1,158 @@
#
# LSST Data Management System
# Copyright 2017 LSST Corporation.
#
# This product includes software developed by the
# LSST Project (http://www.lsst.org/).
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the LSST License Statement and
# the GNU General Public License along with this program. If not,
# see <http://www.lsstcorp.org/LegalNotices/>.
#

"""Command-line program for running and analyzing AP pipeline.

In addition to containing ap_verify's main function, this module manages
command-line argument parsing.
"""

from __future__ import absolute_import, division, print_function

__all__ = []

import argparse
import os
import re

import lsst.log
from lsst.ap.verify.dataset import Dataset
from lsst.ap.verify.metrics import MetricsParser, check_squash_ready, AutoJob
from lsst.ap.verify.appipe import ApPipeParser, ApPipe


class _VerifyApParser(argparse.ArgumentParser):
"""An argument parser for data needed by this script.
"""

def __init__(self):
argparse.ArgumentParser.__init__(
self,
description='Executes the LSST DM AP pipeline and analyzes its performance using metrics.',
epilog='',
parents=[ApPipeParser(), MetricsParser()],
add_help=True)
self.add_argument('--dataset', choices=Dataset.get_supported_datasets(), required=True,
help='The source of data to pass through the pipeline.')

output = self.add_mutually_exclusive_group(required=True)
output.add_argument('--output', help='The location of the repository to use for program output.')
output.add_argument(
'--rerun', metavar='OUTPUT',
type=_FormattedType('[^:]+',
'Invalid name "%s"; ap_verify supports only output reruns. '
'You have entered something that appears to be of the form INPUT:OUTPUT. '
'Please specify only OUTPUT.'),
help='The location of the repository to use for program output, as DATASET/rerun/OUTPUT')

self.add_argument('--version', action='version', version='%(prog)s 0.1.0')


class _FormattedType:
"""An argparse type converter that requires strings in a particular format.

Leaves the input as a string if it matches, else raises ArgumentTypeError.

Parameters
----------
fmt: `str`
A regular expression that values must satisfy to be accepted. The *entire* string must match the
expression in order to pass.
msg: `str`
An error string to display for invalid values. The first "%s" shall be filled with the
invalid argument.
"""
def __init__(self, fmt, msg='"%s" does not have the expected format.'):
full_format = fmt
if not full_format.startswith('^'):
full_format = '^' + full_format
if not full_format.endswith('$'):
full_format += '$'
self._format = re.compile(full_format)
self._message = msg

def __call__(self, value):
if self._format.match(value):
return value
else:
raise argparse.ArgumentTypeError(self._message % value)
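
# For example, _FormattedType('[^:]+')('run1') returns 'run1' unchanged, while
# _FormattedType('[^:]+')('input:output') raises argparse.ArgumentTypeError.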


def _get_output_dir(input_dir, output_arg, rerun_arg):
"""Choose an output directory based on program arguments.

Parameters
----------
input_dir: `str`
The root directory of the input dataset.
output_arg: `str`
The directory given using the `--output` command line argument.
rerun_arg: `str`
The subdirectory given using the `--rerun` command line argument. Must
be relative to the `rerun` directory under `input_dir`.

Raises
------
`ValueError`:
Raised if both `output_arg` and `rerun_arg` are provided, or if neither is.
"""
if output_arg and rerun_arg:
raise ValueError('Cannot provide both --output and --rerun.')
if not output_arg and not rerun_arg:
raise ValueError('Must provide either --output or --rerun.')
if output_arg:
return output_arg
else:
return os.path.join(input_dir, "rerun", rerun_arg)
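
# For example, _get_output_dir('/datasets/hits2015', None, 'run1') returns
# '/datasets/hits2015/rerun/run1' (the path is illustrative), matching the
# --rerun behavior described in README.md.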


def _measure_final_properties(metrics_job):
"""Measure any metrics that apply to the final result of the AP pipeline,
rather than to a particular processing stage.

Parameters
----------
metrics_job: `verify.Job`
The Job object to which to add any metric measurements made.
"""
pass


if __name__ == '__main__':
lsst.log.configure()
log = lsst.log.Log.getLogger('ap.verify.ap_verify.main')
# TODO: what is LSST's policy on exceptions escaping into main()?
args = _VerifyApParser().parse_args()
check_squash_ready(args)
log.debug('Command-line arguments: %s', args)

test_data = Dataset(args.dataset)
log.info('Dataset %s set up.', args.dataset)
output = _get_output_dir(test_data.dataset_root, args.output, args.rerun)
test_data.make_output_repo(output)
log.info('Output repo at %s created.', output)

with AutoJob(args) as job:
log.info('Running pipeline...')
pipeline = ApPipe(test_data, output, args)
pipeline.run(job)
_measure_final_properties(job)
162 changes: 162 additions & 0 deletions python/lsst/ap/verify/appipe.py
@@ -0,0 +1,162 @@
#
# LSST Data Management System
# Copyright 2017 LSST Corporation.
#
# This product includes software developed by the
# LSST Project (http://www.lsst.org/).
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the LSST License Statement and
# the GNU General Public License along with this program. If not,
# see <http://www.lsstcorp.org/LegalNotices/>.
#

from __future__ import absolute_import, division, print_function

__all__ = ["ApPipeParser", "ApPipe"]

import argparse

import lsst.log
from lsst.ap.verify.dataset import Dataset
from lsst.ap.verify.pipeline import Pipeline


class ApPipeParser(argparse.ArgumentParser):
"""An argument parser for data needed by ap_pipe activities.

This parser is not complete, and is designed to be passed to another parser
using the `parents` parameter.
"""

def __init__(self):
# Help and documentation will be handled by main program's parser
argparse.ArgumentParser.__init__(self, add_help=False)
self.add_argument('--dataIdString', dest='dataId', required=True,
help='An identifier for the data to process. '
'May not support all features of a Butler dataId; '
'see the ap_pipe documentation for details.')
self.add_argument("-j", "--processes", default=1, type=int, help="Number of processes to use")


class ApPipe(Pipeline):
"""Wrapper for `lsst.ap.pipe` that executes all steps through source
association.

This class is not designed to have subclasses.

Parameters
----------
dataset: `dataset.Dataset`
The dataset on which the pipeline will be run.
working_dir: `str`
The repository in which temporary products will be created. Must be
compatible with `dataset`.
parsed_cmd_line: `argparse.Namespace`
Command-line arguments, including all arguments supported by `ApPipeParser`.
"""

def __init__(self, dataset, working_dir, parsed_cmd_line):
Pipeline.__init__(self, dataset, working_dir)
self._dataId = parsed_cmd_line.dataId
self._parallelization = parsed_cmd_line.processes

def _ingest_raws(self):
"""Ingest the raw data for use by LSST.

The original data directory shall not be modified.
"""
# use self.dataset and self.repo
raise NotImplementedError

def _ingest_calibs(self):
"""Ingest the raw calibrations for use by LSST.

The original calibration directory shall not be modified.
"""
# use self.dataset and self.repo
raise NotImplementedError

def _ingest_templates(self):
"""Ingest precomputed templates for use by LSST.

The templates may be either LSST `calexp` or LSST
`deepCoadd_psfMatchedWarp`. The original template directory shall not
be modified.
"""
# use self.dataset and self.repo
raise NotImplementedError

def _process(self, metrics_job):
"""Run single-frame processing on a dataset.

Parameters
----------
metrics_job: `verify.Job`
The Job object to which to add any metric measurements made.
"""
# use self.repo, self._dataId, self._parallelization
raise NotImplementedError

def _difference(self, metrics_job):
"""Run image differencing on a dataset.

Parameters
----------
metrics_job: `verify.Job`
The Job object to which to add any metric measurements made.
"""
# use self.repo, self._dataId, self._parallelization
raise NotImplementedError

def _associate(self, metrics_job):
"""Run source association on a dataset.

Parameters
----------
metrics_job: `verify.Job`
The Job object to which to add any metric measurements made.
"""
# use self.repo, self._parallelization
raise NotImplementedError

def _post_process(self):
"""Run post-processing on a dataset.

This step is called the "afterburner" in some design documents.
"""
# use self.repo
pass

def run(self, metrics_job):
"""Run `ap_pipe` on this object's dataset.

Parameters
----------
metrics_job: `verify.Job`
The Job object to which to add any metric measurements made.
"""
log = lsst.log.Log.getLogger('ap.verify.appipe.ApPipe.run')

self._ingest_raws()
self._ingest_calibs()
self._ingest_templates()
log.info('Data ingested')

self._process(metrics_job)
log.info('Single-frame processing complete')
self._difference(metrics_job)
log.info('Image differencing complete')
self._associate(metrics_job)
log.info('Source association complete')
self._post_process()
log.info('Pipeline complete')