# Define command line tasks for pre-ingest transformation

* [DM-2191](https://jira.lsstcorp.org/browse/DM-2191), John Swinbank, 6 SPs.

Issue description:

> DM-1903 provided a command line task which would transform a src catalogue into calibrated form. Here, we build on that to provide command line tasks for all source catalogues which will need to be ingested; will include at least `deepCoadd_src`, `goodSeeingCoadd_src`, `chiSquaredCoadd_src`.

This notebook has been checked to run against the release tagged `w_2015_26`.


The idea here is that we start with a source catalogue containing the raw outputs of some measurement algorithm -- say, pixel position -- and we want to transform them to a calibrated form -- in this case, celestial position. In previous stories (DM-1903, DM-2192), we've defined things such that each measurement algorithm has a particular "transformation" from raw to calibrated form associated with it. That transformation can make use of the configuration of the measurement algorithm as well as WCS and photometric calibration information provided by the end user.
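
As a rough illustration of the "one transformation per measurement algorithm" idea, here is a minimal, self-contained sketch. It does not use the LSST stack: the registry, the `register_transform` decorator, the `"centroid"` algorithm name and the linear `toy_wcs` are all hypothetical stand-ins invented for this example, and a real WCS is not a simple linear scaling.

```python
# Hypothetical registry: measurement algorithm name -> transformation function.
TRANSFORMS = {}

def register_transform(name):
    """Associate a transformation with the named measurement algorithm."""
    def decorator(func):
        TRANSFORMS[name] = func
        return func
    return decorator

def toy_wcs(x, y):
    """Toy linear pixel -> (ra, dec) mapping in degrees; illustrative only."""
    ra0, dec0, scale = 79.0, -9.0, 0.2 / 3600.0  # assumed reference point and deg/pixel
    return ra0 + x * scale, dec0 + y * scale

@register_transform("centroid")
def transform_centroid(record, config, wcs):
    """Turn a raw pixel position into a celestial position."""
    ra, dec = wcs(record["x"], record["y"])
    return {"ra": ra, "dec": dec}

def transform_record(record, algorithm, config, wcs):
    """Look up and apply the transformation registered for an algorithm."""
    return TRANSFORMS[algorithm](record, config, wcs)

calibrated = transform_record({"x": 100.0, "y": 200.0}, "centroid", {}, toy_wcs)
```

The transformation receives the algorithm's configuration alongside the calibration information, mirroring the description above, even though this toy centroid transform does not need to consult it.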

Here, we provide a generic set of command line tasks which, for each `dataRef` in an input repository, reads a source catalogue and a `calexp`. The `Wcs` and `Calib` objects are retrieved from the latter, and used to transform the source catalogue to a calibrated form, which is written back to the repository.
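
The control flow just described can be sketched without the stack. In this sketch a plain dictionary stands in for the repository, and the names `run_transform`, `toy_transform`, `transformed_src` and `flux_mag0` are assumptions made for illustration rather than the tasks' actual interfaces.

```python
import math

def toy_transform(record, wcs, calib):
    """Calibrate one record: pixel -> sky position, flux -> magnitude."""
    ra, dec = wcs(record["x"], record["y"])
    mag = -2.5 * math.log10(record["flux"] / calib["flux_mag0"])
    return {"ra": ra, "dec": dec, "mag": mag}

def run_transform(repo, data_ids, transform):
    """For each data ID, read the catalogue and calexp, transform, and write back."""
    for data_id in data_ids:
        catalogue = repo[("src", data_id)]
        calexp = repo[("calexp", data_id)]
        # The WCS and photometric calibration come from the calexp.
        wcs, calib = calexp["wcs"], calexp["calib"]
        # "transformed_src" is a placeholder output dataset name.
        repo[("transformed_src", data_id)] = [
            transform(record, wcs, calib) for record in catalogue
        ]

# A dictionary keyed by (dataset type, visit) plays the role of the repository.
repo = {
    ("src", 1): [{"x": 10.0, "y": 20.0, "flux": 1000.0}],
    ("calexp", 1): {
        "wcs": lambda x, y: (79.0 + x * 1e-4, -9.0 + y * 1e-4),  # toy linear WCS
        "calib": {"flux_mag0": 1.0e5},
    },
}
run_transform(repo, [1], toy_transform)
```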

We will demonstrate this by transforming `src` records, which is straightforward using the contents of `obs_test`. We'll use the following repository:

In [1]:
import os.path
import lsst.utils
inputDir = os.path.join(lsst.utils.getPackageDir('obs_test'), "data", "input")
inputDir

'/Users/jds/Projects/Astronomy/LSST/stack/DarwinX86/obs_test/10.1-4-g461b62d+42/data/input'

We'll configure the `ProcessCcd` task to store just some minimal information in our catalogue:

In [2]:
from lsst.pipe.tasks.processCcd import ProcessCcdTask, ProcessCcdConfig
cfg = ProcessCcdConfig()
cfg.measurement.value.slots.centroid = "base_GaussianCentroid"
cfg.measurement.value.plugins = ["base_GaussianCentroid", "base_SkyCoord"]
cfg.measurement.value.slots.shape = None
cfg.measurement.value.slots.psfFlux = None
cfg.measurement.value.slots.apFlux = None
cfg.measurement.value.slots.instFlux = None
cfg.measurement.value.slots.modelFlux = None

We'll store the output from our measurement in a temporary directory, and remember to clean it up later.

In [3]:
from tempfile import mkdtemp
outputDir = mkdtemp()

Now we can run the `ProcessCcd` task, pretending we're on the command line. Note that we ask for and capture the result, which includes the list of sources measured.

In [4]:
measResult = ProcessCcdTask.parseAndRun(args=[inputDir, "--output", outputDir, "--id", "visit=1"],
                                        config=cfg, doReturnResults=True)
len(measResult.resultList[0].result.sources)

167

Even with the simple measurement configuration we chose, we end up with a lot of fields in the output table:

In [5]:
measResult.resultList[0].result.sources.getSchema().getNames()

('base_GaussianCentroid_flag',
 'base_GaussianCentroid_flag_noPeak',
 'base_GaussianCentroid_x',
 'base_GaussianCentroid_y',
 'calib_detected',
 'calib_psfCandidate',
 'calib_psfUsed',
 'coord_dec',
 'coord_ra',
 'flags_negative',
 'id',
 'parent')

Let's take the results of that task and feed them into our transformation task. We can configure it to copy ("pass through") arbitrary fields from the measurement table in addition to performing the transformation itself. We'll just ask for a single field to be copied -- the source ID.

Note that we point it at the temporary directory in which we stored the measurements. We also need to tell it the dataset type of the configuration used for the measurement (here, `processCcd_config`) -- given that name, the task can retrieve the configuration itself from the Butler.

In [6]:
from lsst.pipe.tasks.transformMeasurement import SrcTransformTask, RunTransformConfig
cfg = RunTransformConfig()
cfg.transform.value.copyFields = ['id']
trResult = SrcTransformTask.parseAndRun(args=[outputDir, "--id", "visit=1", "-c", "inputConfigType=processCcd_config"],
                                        config=cfg, doReturnResults=True)
len(trResult.resultList[0].result)

167

Note that we end up with the same number of sources, but the schema is much abbreviated: we have the `id` field we asked for, the flags relating to the `GaussianCentroid` algorithm, and the _transformed_ position (i.e., instead of `x`, `y`, we have `ra`, `dec`):

In [7]:
trResult.resultList[0].result.getSchema().getNames()

('base_GaussianCentroid_dec',
 'base_GaussianCentroid_flag',
 'base_GaussianCentroid_flag_noPeak',
 'base_GaussianCentroid_ra',
 'id')
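
To make the relationship between the two schemas explicit, the following sketch reproduces the name bookkeeping in plain Python. It is not how the stack builds its output schema: `transformed_names` is a hypothetical helper that assumes flags are passed through, `x`/`y` are replaced by `ra`/`dec`, and every field outside the centroid algorithm is dropped unless it is listed for copying.

```python
def transformed_names(input_names, centroid_prefix, copy_fields):
    """Predict output field names for a centroid-only transformation."""
    output = set(copy_fields)  # fields passed through unchanged on request
    for name in input_names:
        if not name.startswith(centroid_prefix + "_"):
            continue  # not produced by the centroid algorithm: dropped
        suffix = name[len(centroid_prefix) + 1:]
        if suffix in ("x", "y"):
            # Pixel coordinates are replaced by celestial coordinates.
            output.add(centroid_prefix + "_ra")
            output.add(centroid_prefix + "_dec")
        elif suffix.startswith("flag"):
            output.add(name)  # flags are carried over as they are
    return tuple(sorted(output))

measured = ('base_GaussianCentroid_flag', 'base_GaussianCentroid_flag_noPeak',
            'base_GaussianCentroid_x', 'base_GaussianCentroid_y',
            'calib_detected', 'calib_psfCandidate', 'calib_psfUsed',
            'coord_dec', 'coord_ra', 'flags_negative', 'id', 'parent')
predicted = transformed_names(measured, 'base_GaussianCentroid', ['id'])
```

With the input names listed earlier, `predicted` contains the same five names as the schema shown above.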

The position calculated by transforming the `GaussianCentroid` position should correspond to the position recorded in the original source record. Let's check:

In [8]:
from lsst.afw.table import CoordKey
trCoordKey = CoordKey(trResult.resultList[0].result.getSchema()["base_GaussianCentroid"])
print("Original position: %s" % measResult.resultList[0].result.sources[0].getCoord())
print("Transformation result: %s" % trCoordKey.get(trResult.resultList[0].result[0]))

Original position: IcrsCoord(79.0833925, -9.2650917)
Transformation result: IcrsCoord(79.0833925, -9.2650917)
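
Comparing printed values only checks agreement to the displayed precision. A numerical check on the angular separation is one possible alternative; the sketch below uses the standard haversine formula in plain Python, with the coordinates copied from the output above. The helper `separation_arcsec` is written for this example and is not part of the stack.

```python
import math

def separation_arcsec(ra1, dec1, ra2, dec2):
    """Angular separation in arcseconds (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = [math.radians(v) for v in (ra1, dec1, ra2, dec2)]
    h = (math.sin((dec2 - dec1) / 2.0) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2.0) ** 2)
    return math.degrees(2.0 * math.asin(math.sqrt(h))) * 3600.0

# Values copied from the printed positions above.
original = (79.0833925, -9.2650917)
transformed = (79.0833925, -9.2650917)
sep = separation_arcsec(original[0], original[1], transformed[0], transformed[1])
```

A threshold on `sep` (for example, a small fraction of a pixel's angular size) could then serve as an automated check.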


Don't forget to clean up the working directory.

In [9]:
import shutil
shutil.rmtree(outputDir)
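
In a script rather than a notebook, one way to make sure the cleanup happens even if an intermediate step fails is to wrap the work in `try`/`finally`. This is a generic standard-library sketch; the placeholder file stands in for the task outputs, and `work_dir` is a name chosen for this example.

```python
import os
import shutil
from tempfile import mkdtemp

work_dir = mkdtemp()
try:
    # Stand-in for running the measurement and transformation tasks.
    with open(os.path.join(work_dir, "placeholder.txt"), "w") as handle:
        handle.write("intermediate output\n")
    existed_during_run = os.path.isdir(work_dir)
finally:
    # Runs whether or not the block above raised an exception.
    shutil.rmtree(work_dir)
```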

Note that similar transformation tasks are available for `forced_src` (`transformForcedSrcMeasurement.py`) and for sources measured on coadds (`transformCoaddMeasurement.py`). Defining similar transformations for other Butler dataset types is easy if you follow the examples in the [code](https://github.com/lsst/pipe_tasks/blob/master/python/lsst/pipe/tasks/transformMeasurement.py#L272): in most cases, it's just a matter of providing a few [standard attributes](https://github.com/lsst/pipe_tasks/blob/master/python/lsst/pipe/tasks/transformMeasurement.py#L196).