Running ap_verify From the Command Line

ap_verify is a Python script designed to be run on both developer machines and verification servers. While ap_verify is not itself a command-line task, its interface resembles that of command-line tasks where practical. This page describes the minimum options needed to run ap_verify. For more details, see the Command-Line Reference or run ap_verify.py -h.

Datasets as Input Arguments

Since ap_verify begins with an uningested dataset, the input argument is a dataset name rather than a repository.

Datasets are identified by a name that is mapped to an eups-registered directory containing the data. The mapping is configurable. Dataset names are a placeholder for a future data repository versioning system and may be replaced in a later version of ap_verify.
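
For illustration, here is a minimal Python sketch of the name-to-directory lookup described above, assuming the dataset name resolves to an eups-registered package. The mapping table and package name used here are hypothetical; the real mapping is configurable.

    from lsst.utils import getPackageDir

    # Hypothetical name-to-package mapping; the real mapping is configurable.
    DATASET_PACKAGES = {"HiTS2015": "ap_verify_hits2015"}

    def dataset_dir(name):
        """Return the on-disk directory of a named, eups-setup dataset."""
        return getPackageDir(DATASET_PACKAGES[name])

    print(dataset_dir("HiTS2015"))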

How to Run ap_verify in a New Workspace

Using the HiTS 2015 dataset as an example, one can run ap_verify as follows:

python ap_verify/bin/ap_verify.py --dataset HiTS2015 --output workspace/hits/ --id "visit=54123 ccd=25 filter=g" --silent

Here:

  • HiTS2015 is the dataset name,
  • workspace/hits/ is the location of the Butler repository in which the pipeline will work,
  • visit=54123 ccd=25 filter=g is the dataId to process, and
  • --silent disables SQuaSH metrics reporting.

This will create a workspace (a Butler repository) in workspace/hits/ based on <hits-data>/data/, ingest the HiTS data into it, and then run the specified data (CCD 25 of visit 54123 in the g filter) through the entire AP pipeline.
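
Once the run finishes, the workspace is an ordinary Butler repository and can be inspected programmatically. Below is a minimal sketch, assuming the Gen2 Butler API and that the difference image is stored under the deepDiff_differenceExp dataset type; the exact dataset type may vary with pipeline configuration.

    from lsst.daf.persistence import Butler

    # Open the workspace created by ap_verify and fetch one output.
    butler = Butler("workspace/hits/")
    diffim = butler.get("deepDiff_differenceExp", visit=54123, ccd=25, filter="g")
    print(diffim.getDimensions())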

Note

The command-line interface for ap_verify is at present much more limited than those of command-line tasks. In particular, only file-based repositories are supported, and compound dataIds cannot be provided. See the Command-Line Reference for details.

Warning

ap_verify.py does not support running multiple instances concurrently. Running two or more instances, particularly from the same working directory, may cause them to compete for access to the workspace or to overwrite each other's metrics.
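
If runs must be launched from scripts that might overlap, one workaround is to serialize them externally. Below is a minimal sketch using a POSIX lock file; the lock path is arbitrary, and this is not a feature of ap_verify itself.

    import fcntl
    import subprocess
    import sys

    # Hold an exclusive lock so overlapping launches wait for each other
    # instead of racing on the workspace or the metrics file.
    with open("ap_verify.lock", "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # blocks until any other holder exits
        sys.exit(subprocess.call(
            ["python", "ap_verify/bin/ap_verify.py",
             "--dataset", "HiTS2015", "--output", "workspace/hits/",
             "--id", "visit=54123 ccd=25 filter=g", "--silent"]))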

How to Run ap_verify in the Dataset Directory

It is also possible to place a workspace in a subdirectory of a dataset directory. The syntax for this mode is:

python ap_verify/bin/ap_verify.py --dataset HiTS2015 --rerun run1 --id "visit=54123 ccd=25 filter=g" --silent

The --rerun run1 argument will create a workspace in <hits-data>/rerun/run1/. Since datasets are not, in general, repositories, the --rerun parameter only superficially resembles the analogous argument for command-line tasks. In particular, ap_verify's --rerun does not support repository chaining (as in --rerun input:output); the input for ap_verify is always determined by --dataset.

How to Use Measurements of Metrics

After ap_verify has run, it will produce a file named ap_verify.verify.json in the working directory. This file contains metric measurements in lsst.verify format, and can be loaded and read as described in the lsst.verify documentation or in SQR-019. The file name is currently hard-coded, but may be customizable in a future version.
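
Here is a minimal sketch of reading the file, following the Job.deserialize pattern from the lsst.verify documentation; the loop simply prints whatever metrics were measured.

    import json

    from lsst.verify import Job

    # Load the measurements ap_verify wrote to the working directory.
    with open("ap_verify.verify.json") as f:
        job = Job.deserialize(**json.load(f))

    # List every measured metric and its value.
    for name, measurement in job.measurements.items():
        print(name, measurement.quantity)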

Unless the --silent argument is provided, ap_verify will also upload measurements to the SQuaSH service on completion. See the SQuaSH documentation for details.
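
Measurements can also be uploaded by hand, assuming lsst.verify's Job.dispatch method; the credentials and endpoint below are placeholders, and ap_verify normally performs this step itself. Consult the SQuaSH documentation for the correct values.

    import json

    from lsst.verify import Job

    with open("ap_verify.verify.json") as f:
        job = Job.deserialize(**json.load(f))

    # Placeholder credentials and URL; see the SQuaSH documentation.
    job.dispatch(api_user="...", api_password="...",
                 api_url="https://squash-restful-api.lsst.codes")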

If the pipeline is interrupted by a fatal error, completed measurements will be saved to ap_verify.verify.json for debugging purposes, but nothing will be sent to SQuaSH. See the error-handling policy for details.