Packaging Data as a Dataset
The Dataset Framework is designed to be as generic as possible. It should be able to accommodate any collection of observations, so long as the source observatory has an observatory interface (obs) package in the LSST software stack. This page describes how to create and maintain a dataset. It does not cover configuring ap_verify to use the dataset.
Creating a Dataset Repository
Datasets are Git LFS repositories with a particular directory and file structure. The easiest way to create a new dataset is to create an empty repository and add a copy of the dataset template repository as its initial commit. This creates empty directories for all data and adds placeholder files for dataset metadata.
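The copy step can be sketched in Python. This is a minimal sketch, assuming the template repository has already been cloned locally next to the new, empty repository; the function name copy_dataset_template is hypothetical and not part of any LSST package. It copies everything except the template's own .git directory, so the new repository keeps its own history.

```python
import shutil
from pathlib import Path


def copy_dataset_template(template_dir, new_repo_dir):
    """Copy a locally cloned dataset template into a new repository.

    Hypothetical helper: both arguments are local paths. The template's
    .git directory is skipped so the new repository keeps its own history.
    """
    template = Path(template_dir)
    target = Path(new_repo_dir)
    shutil.copytree(
        template,
        target,
        ignore=shutil.ignore_patterns(".git"),  # never copy the template's history
        dirs_exist_ok=True,  # the new repository directory already exists
    )
    # Return the copied top-level entries (excluding .git) for a quick check.
    return sorted(p.name for p in target.iterdir() if p.name != ".git")
```

After copying, the files would be added and committed with ordinary git add and git commit commands in the new repository, making them its initial commit.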
Organizing the Data
- The raw and calib directories contain science and calibration data, respectively. The directories may have any internal structure.
- The templates directory contains an LSST Butler repository holding processed images usable as templates. Template files must be TemplateCoadd files produced by a compatible version of the LSST science pipeline.
- The refcats directory contains one or more tar files, each containing one or more astrometric or photometric reference catalogs in HTM shard format.
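The directory rules above lend themselves to a quick layout check. The following is a sketch, not an ap_verify tool: check_dataset_layout is a hypothetical helper that only confirms the four data directories exist and that every non-hidden entry in refcats is a tar archive. It does not inspect raw formats, TemplateCoadd products, or the HTM shards inside the archives, and skipping hidden files is an assumption about placeholder files.

```python
import tarfile
from pathlib import Path

# Top-level data directories named on this page.
DATA_DIRS = ("raw", "calib", "templates", "refcats")


def check_dataset_layout(root):
    """Return a list of human-readable problems with a dataset's data directories.

    Only the layout is checked; file contents are not inspected.
    """
    root = Path(root)
    problems = []
    for name in DATA_DIRS:
        if not (root / name).is_dir():
            problems.append(f"missing directory: {name}")

    refcats = root / "refcats"
    if refcats.is_dir():
        for entry in sorted(refcats.iterdir()):
            # Hidden files (assumed to be placeholders) are skipped;
            # every other entry must be a readable tar archive.
            if entry.name.startswith("."):
                continue
            if not (entry.is_file() and tarfile.is_tarfile(str(entry))):
                problems.append(f"refcats entry is not a tar file: {entry.name}")
    return problems
```

An empty list means the layout matches the rules on this page; it says nothing about whether the data themselves are valid.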
The templates and reference catalogs need not be all-sky, but should cover the combined footprint of all the raw images.
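Coverage of the raw footprint can be checked conservatively. This sketch assumes every image and every template or catalog region is approximated by an RA/Dec rectangle in degrees that does not cross RA = 0; the Box type and uncovered_images function are hypothetical, and extracting the rectangles from real files is left out. It reports any raw image that is not inside a single coverage rectangle, which is stricter than true union coverage.

```python
from typing import List, NamedTuple, Sequence


class Box(NamedTuple):
    """An RA/Dec rectangle in degrees, assumed not to cross RA = 0."""
    ra_min: float
    ra_max: float
    dec_min: float
    dec_max: float

    def contains(self, other: "Box") -> bool:
        """True if `other` lies entirely inside this rectangle."""
        return (self.ra_min <= other.ra_min and other.ra_max <= self.ra_max
                and self.dec_min <= other.dec_min and other.dec_max <= self.dec_max)


def uncovered_images(raw_boxes: Sequence[Box], coverage: Sequence[Box]) -> List[int]:
    """Return indices of raw images not fully inside any single coverage box.

    Conservative: an image straddling two adjacent coverage boxes is
    reported even though the two boxes together may cover it.
    """
    return [i for i, raw in enumerate(raw_boxes)
            if not any(region.contains(raw) for region in coverage)]
```

The same check would be run twice, once against the template footprint and once against the reference catalog footprint.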
Registering an Observatory Package
The observatory package must be named in two files:
- ups/<package>.table must contain a line reading setupRequired(<obs-package>). For example, for DECam data this would read setupRequired(obs_decam). If any other packages are required to process the data, they should have their own setupRequired lines.
- repo/_mapper must contain a single line with the name of the obs package’s mapper class. For DECam data this is lsst.obs.decam.DecamMapper.
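Both requirements can be verified with a short script. This is a sketch under stated assumptions: check_obs_registration is a hypothetical helper, the package name in the example is made up, and it only recognizes setupRequired lines written on a single line (commented-out lines are ignored). It reads the files rather than rewriting them, so other content in the template's table file is left untouched.

```python
import re
from pathlib import Path

# Matches an uncommented line such as "setupRequired(obs_decam)".
SETUP_REQUIRED = re.compile(r"^\s*setupRequired\(\s*([A-Za-z0-9_]+)\s*\)")


def check_obs_registration(root, package, obs_package, mapper_class):
    """Return problems with ups/<package>.table and repo/_mapper (hypothetical helper)."""
    root = Path(root)
    problems = []

    table = root / "ups" / f"{package}.table"
    required = set()
    if table.is_file():
        for line in table.read_text().splitlines():
            match = SETUP_REQUIRED.match(line)
            if match:
                required.add(match.group(1))
    if obs_package not in required:
        problems.append(f"ups/{package}.table lacks setupRequired({obs_package})")

    mapper_file = root / "repo" / "_mapper"
    lines = mapper_file.read_text().splitlines() if mapper_file.is_file() else []
    # Ignore blank lines so a trailing newline is not counted as a second line.
    lines = [line.strip() for line in lines if line.strip()]
    if lines != [mapper_class]:
        problems.append(f"repo/_mapper should contain exactly: {mapper_class}")
    return problems
```

For the DECam example above, a call would look like check_obs_registration(".", "ap_verify_example", "obs_decam", "lsst.obs.decam.DecamMapper"), where ap_verify_example stands in for the dataset's real package name.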
Registering a Dataset Name
To be supported by ap_verify, a dataset must be registered in ap_verify's configuration file and declared as an optional EUPS dependency of ap_verify.

The line for the new dataset should be committed to the ap_verify Git repository.