DM-21501: Implement internal aperture corrections for fgcmcal tract mode #19

Merged: 5 commits, Oct 25, 2019. Changes shown from all commits.
4 changes: 2 additions & 2 deletions cookbook/README.md
@@ -27,8 +27,8 @@ The environment should be set up as follows:
```
setup lsst_distrib

export RCRERUN=RC/w_2019_18/DM-19151-sfm
export COOKBOOKRERUN=fgcm_cookbook_w_2019_18
export RCRERUN=RC/w_2019_38/DM-21386-sfm
export COOKBOOKRERUN=fgcm_cookbook_w_2019_38
```

The `RCRERUN` env variable should be set to the most recent completed rerun
10 changes: 8 additions & 2 deletions cookbook/fgcmFitCycleHscCookbook_cycle00_config.py
@@ -37,8 +37,14 @@
config.expGrayHighCut = (0.2, 0.2, 0.2, 0.2, 0.2)
# Number of bins to do aperture correction. Not currently supported in LSST stack.
config.aperCorrFitNBins = 0
# "Fudge factors" for computing SED slope (best values for HSC not determined yet)
config.sedFudgeFactors = (1.0, 1.0, 1.0, 1.0, 1.0)
# Aperture correction input slope parameters. There should be one slope per band.
# This is used when there is insufficient data to fit the parameters from the data
# itself (e.g. tract mode or RC2).
config.aperCorrInputParameters = (-1.0150, -0.9694, -1.7229, -1.4549, -1.1998)
Contributor:

"should" or must be one slope per band? How does the algorithm know if there is insufficient data to fit the parameters? It seems that setting this list (side note: I've never seen ListFields set a s tuples before, but I guess there's no harm?!) is also serving as the config controlling the decision (i.e. if aperCorrInputParameters is set to a non-emply list, then use them, else, perform a fit?) If this is the case, this should be made VERY clear in the documentation. Another (but not necessarily preferable) pattern I've seen is to have a doFitAperCorrSlopes that controls this decision...it is an extra config parameter and can lead to confusion when parsing a config file and seeing aperCorrInputParameters, but not realizing that they aren't necessarily being used (although your doctstring can indicate they are not used when doFitAperCorrSlopes=True), and it's a pattern common throughout the stack.

Contributor (author):

A few things on this! First: "must" is correct, I will change that. The number of elements is checked by the fgcm code, I'll make that clear. (This is also the case with the other config vars which are per band).
Second: the controlling variable for whether or not to perform a fit is aperCorrFitNBins, but this is not actually described in the fgcmcal documentation.
So the idea is that these are the input parameters (optional!) and it will use these at the start of the run, and if aperCorrFitNBins == 0 then the output parameters will equal the input parameters. Otherwise they will be different.
This is not explicitly spelled out in the documentation, and I will improve that. But hopefully this logic does make sense, or do I need another config parameter or to rename something as well?
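
To make the described interaction concrete, a small sketch using the config names from this hunk (note that the input-slope field is renamed to `aperCorrInputSlopes` elsewhere in this PR; `aperCorrFitNBins` alone controls whether a fit is performed):

```python
# Optional input slopes seed the first fit iteration (one value per band).
config.aperCorrInputParameters = (-1.0150, -0.9694, -1.7229, -1.4549, -1.1998)

# With 0 bins no aperture-correction fit is performed, so the output
# parameters will simply equal the input parameters above.
config.aperCorrFitNBins = 0

# With >0 bins the slopes are refit from the data, and the output
# parameters will generally differ from the inputs.
# config.aperCorrFitNBins = 10
```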

Contributor:

It's convoluted, but as long as it is well documented, I'm ok with whatever you feel is clear enough.

# "Fudge factors" for computing SED slope
# These are approximated by looking at stellar SED templates, and are the same
# as used in DES calibrations (-E. Rykoff)
config.sedFudgeFactors = (0.25, 1.0, 1.0, 0.25, 0.25)
Contributor:

Where does this fit in with allowing fixed calibration slopes? Should this maybe be a separate commit, with an explanation of where the numbers come from and what they are used for? Also, same error-prone ordering issue as above.

Contributor (author):

So these numbers came from eyeballing some stellar loci that I did ages ago, but never checked in for this very reason. I could say these are what I use for DES...

Contributor:

Future you might appreciate the comment!

# Color cuts for stars to use for calibration. Each element is a string with
# band1, band2, range_low, range_high such that range_low < (band1 - band2) < range_high
config.starColorCuts = ('g,r,-0.25,2.25',
14 changes: 12 additions & 2 deletions python/lsst/fgcmcal/fgcmBuildStars.py
@@ -509,6 +509,8 @@ def findAndGroupDataRefs(self, butler, dataRefs):
# only says that a given raw is available, and the src may not
# be accessible or processed in a specific repo.
dataRefs = butler.subset('src')
elif self.config.checkAllCcds:
scanAllCcds = True

groupedDataRefs = {}
for dataRef in dataRefs:
@@ -522,10 +524,18 @@
for ccdId in ccdIds:
dataId[self.config.ccdDataRefName] = ccdId
if butler.datasetExists('src', dataId=dataId):
groupedDataRefs[visit].append(butler.dataRef('src', dataId=dataId))
goodDataRef = butler.dataRef('src', dataId=dataId)
if visit in groupedDataRefs:
if (goodDataRef.dataId[self.config.ccdDataRefName] not in
[d.dataId[self.config.ccdDataRefName] for d in groupedDataRefs[visit]]):
groupedDataRefs[visit].append(goodDataRef)
else:
groupedDataRefs[visit] = [goodDataRef]
else:
if visit in groupedDataRefs:
groupedDataRefs[visit].append(dataRef)
if (dataRef.dataId[self.config.ccdDataRefName] not in
[d.dataId[self.config.ccdDataRefName] for d in groupedDataRefs[visit]]):
groupedDataRefs[visit].append(dataRef)
else:
groupedDataRefs[visit] = [dataRef]

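The new branches above appear to guard against appending the same ccd twice for a given visit. A stripped-down sketch of that de-duplication (a hypothetical simplification; `'ccd'` stands in for `self.config.ccdDataRefName`):

```python
# Only append this dataRef if its ccd is not already grouped for this visit.
existingCcds = [d.dataId['ccd'] for d in groupedDataRefs.get(visit, [])]
if dataRef.dataId['ccd'] not in existingCcds:
    groupedDataRefs.setdefault(visit, []).append(dataRef)
```
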
14 changes: 8 additions & 6 deletions python/lsst/fgcmcal/fgcmCalibrateTract.py
@@ -81,7 +81,7 @@ class FgcmCalibrateTractConfig(pexConfig.Config):
def setDefaults(self):
pexConfig.Config.setDefaults(self)

self.fgcmBuildStars.checkAllCcds = True
self.fgcmBuildStars.checkAllCcds = False
self.fgcmFitCycle.useRepeatabilityForExpGrayCuts = True
self.fgcmFitCycle.quietMode = True
self.fgcmOutputProducts.doReferenceCalibration = False
@@ -226,6 +226,8 @@ def runDataRef(self, butler, dataRefs):

if not self.config.fgcmBuildStars.doReferenceMatches:
raise RuntimeError("Must run FgcmCalibrateTract with fgcmBuildStars.doReferenceMatches")
if self.config.fgcmBuildStars.checkAllCcds:
raise RuntimeError("Cannot run FgcmCalibrateTract with fgcmBuildStars.checkAllCcds set to True")
Contributor:

I get that the visits need to be explicitly provided, but why does the ccd list need to be too (this comment is really based on the commit message)?

Contributor (author):

It's not that the ccd list needs to be given; it's that this flag should only be used in testing (in particular, when you have only a few ccds per visit). This is in the docs for checkAllCcds. It's also the case that in tract mode you will have specified the visit list and the tract on the command line, and the PerTractCcdDataIdContainer takes care of a lot of this.
My expectation is that the change to PipelineTask will make most/all of this obsolete.


self.makeSubtask("fgcmBuildStars", butler=butler)
self.makeSubtask("fgcmOutputProducts", butler=butler)
@@ -433,11 +435,11 @@ def runDataRef(self, butler, dataRefs):
stdStruct = fgcmFitCycle.fgcmStars.retrieveStdStarCatalog(fgcmFitCycle.fgcmPars)
stdCat = makeStdCat(stdSchema, stdStruct)

outStruct = self.fgcmOutputProducts.generateOutputProducts(butler, tract,
visitCat,
zptCat, atmCat, stdCat,
self.config.fgcmBuildStars,
self.config.fgcmFitCycle)
outStruct = self.fgcmOutputProducts.generateTractOutputProducts(butler, tract,
visitCat,
zptCat, atmCat, stdCat,
self.config.fgcmBuildStars,
self.config.fgcmFitCycle)
outStruct.repeatability = fgcmFitCycle.fgcmPars.compReservedRawRepeatability

return outStruct
32 changes: 24 additions & 8 deletions python/lsst/fgcmcal/fgcmFitCycle.py
@@ -62,13 +62,15 @@ class FgcmFitCycleConfig(pexConfig.Config):
fitFlag = pexConfig.ListField(
doc=("Flag for which bands are directly constrained in the FGCM fit. "
"Bands set to 0 will have the atmosphere constrained from observations "
"in other bands on the same night."),
"in other bands on the same night. Must be same length as config.bands, "
"and matched band-by-band."),
dtype=int,
default=(0,),
)
requiredFlag = pexConfig.ListField(
doc=("Flag for which bands are required for a star to be considered a calibration "
"star in the FGCM fit. Typically this should be the same as fitFlag."),
"star in the FGCM fit. Typically this should be the same as fitFlag. Must "
"be same length as config.bands, and matched band-by-band."),
dtype=int,
default=(0,),
)
@@ -263,13 +265,13 @@ class FgcmFitCycleConfig(pexConfig.Config):
)
expGrayPhotometricCut = pexConfig.ListField(
doc=("Maximum (negative) exposure gray for a visit to be considered photometric. "
"There will be one value per band."),
"Must be same length as config.bands, and matched band-by-band."),
dtype=float,
default=(0.0,),
)
expGrayHighCut = pexConfig.ListField(
doc=("Maximum (positive) exposure gray for a visit to be considered photometric. "
"There will be one value per band."),
"Must be same length as config.bands, and matched band-by-band."),
dtype=float,
default=(0.0,),
)
@@ -293,12 +295,24 @@
default=0.05,
)
aperCorrFitNBins = pexConfig.Field(
doc="Aperture correction number of bins",
doc=("Number of aperture bins used in aperture correction fit. When set to 0"
"no fit will be performed, and the config.aperCorrInputSlopes will be "
"used if available."),
dtype=int,
default=None,
default=10,
)
aperCorrInputSlopes = pexConfig.ListField(
doc=("Aperture correction input slope parameters. These are used on the first "
"fit iteration, and aperture correction parameters will be updated from "
"the data if config.aperCorrFitNBins > 0. It is recommended to set this"
"when there is insufficient data to fit the parameters (e.g. tract mode). "
"If set, must be same length as config.bands, and matched band-by-band."),
dtype=float,
default=[],
Contributor:

It seems like it would be all-too-easy to get the order of these wrong. I assume the filter ordering is dictated by how they are set in bands, but I had to scroll up off the page (as I'm currently looking at it in GitHub) to find that config parameter, and your docstring is not specific about the ordering. Might a mapping be less error-prone?

I'm also puzzled by the name of this parameter. Using the term "input" implies (to me, at least) that there will be some accompanying "output". And, if these are "slopes", why call them "parameters"? This boils down to me wondering why the name isn't just aperCorrSlopes (with maybe a "fallback"/"default" or some such descriptor), as, if I'm reading all this correctly, these are only meant to be used as the slopes if a live fit is not possible.

Contributor (author):

So the reason they aren't called aperCorrSlopes is that originally I had a slope/offset/pivot tuple here, and then I realized only the slope was useful. And I didn't think to change the name. Which I will now!
As for the ordering, there are a bunch of these that should be matched to bands. I will make sure that at least the documentation makes this clear. If you think a mapping would be useful, I can do that, but there are a bunch of these and I'd probably do it on a separate ticket.
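
As one way to make the band-matching requirement hard to get wrong, a hypothetical length check (not part of this PR) could live in `FgcmFitCycleConfig.validate()`:

```python
# Hypothetical sketch: fail early if the per-band list doesn't match config.bands.
def validate(self):
    super().validate()
    if len(self.aperCorrInputSlopes) not in (0, len(self.bands)):
        raise ValueError("aperCorrInputSlopes must be empty or have one entry "
                         "per band in config.bands (%d bands configured)" %
                         len(self.bands))
```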

Contributor:

I leave that decision up to you (just so long as whatever is there is clearly documented!)

)
sedFudgeFactors = pexConfig.ListField(
doc="Fudge factors for computing linear SED from colors",
doc=("Fudge factors for computing linear SED from colors. Must be same length as "
"config.bands, and matched band-by-band."),
dtype=float,
default=(0,),
)
@@ -318,7 +332,9 @@
default=0.10,
)
approxThroughput = pexConfig.ListField(
doc="Approximate overall throughput at start of calibration observations",
doc=("Approximate overall throughput at start of calibration observations. "
"May be 1 element (same for all bands) or the same length as config.bands, "
"and matched band-by-band."),
dtype=float,
default=(1.0, ),
)
34 changes: 20 additions & 14 deletions python/lsst/fgcmcal/fgcmOutputProducts.py
@@ -347,7 +347,7 @@ def runDataRef(self, butler):

# Output the standard stars in stack format
if self.config.doRefcatOutput:
self._outputStandardStars(butler, stdCat, offsets)
self._outputStandardStars(butler, stdCat, offsets, self.config.datasetConfig)
Contributor:

I didn't check the code flow super closely, but does this call happen when in "tract mode"? I assume not because I think it should not work properly if the special datasetConfig with the tract added is not passed in...but maybe it does and not only repeats the call already made in generateOutputProducts(), but also gets an erroneous persistence path, so I thought I'd point it out and let you do the tracing :)

Contributor (author):

This code does not get run in tract mode. In that mode runDataRef isn't used; generateOutputProducts is called instead. It's one or the other, and I can add a code comment.

Contributor (author):

In fact, I will rename generateOutputProducts to generateTractOutputProducts.
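
For context, a sketch of how the per-tract reference catalog written this way can be read back; this mirrors the check added in `tests/fgcmcalTestBase.py` below, and assumes a `butler` and `tract` are in scope and that `fgcm_stars` is the configured base `ref_dataset_name`:

```python
from lsst.meas.algorithms import (LoadIndexedReferenceObjectsConfig,
                                  LoadIndexedReferenceObjectsTask)

# In tract mode the dataset name gets a '_<tract>' suffix appended.
config = LoadIndexedReferenceObjectsConfig()
config.ref_dataset_name = 'fgcm_stars_%d' % (tract)
refObjLoader = LoadIndexedReferenceObjectsTask(butler, config=config)
```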


del stdCat

@@ -365,11 +365,11 @@ def runDataRef(self, butler):
# We return the zp offsets
return pipeBase.Struct(offsets=offsets)

def generateOutputProducts(self, butler, tract,
visitCat, zptCat, atmCat, stdCat,
fgcmBuildStarsConfig, fgcmFitCycleConfig):
def generateTractOutputProducts(self, butler, tract,
visitCat, zptCat, atmCat, stdCat,
fgcmBuildStarsConfig, fgcmFitCycleConfig):
"""
Generate the output products, as specified in the config.
Generate the output products for a given tract, as specified in the config.

This method is here to have an alternate entry-point for
FgcmCalibrateTract.
@@ -404,14 +404,18 @@ def generateOutputProducts(self, butler, tract,
self.log.warn("doReferenceCalibration is set, and is possibly redundant with "
"fitCycleConfig.doReferenceCalibration")

if self.config.doRefcatOutput:
raise RuntimeError("Cannot do reference catalog output in `tract` mode.")

if self.config.doReferenceCalibration:
offsets = self._computeReferenceOffsets(butler, stdCat)
else:
offsets = np.zeros(len(self.bands))

if self.config.doRefcatOutput:
# Create a special config that has the tract number in it
datasetConfig = copy.copy(self.config.datasetConfig)
datasetConfig.ref_dataset_name = '%s_%d' % (self.config.datasetConfig.ref_dataset_name,
tract)
self._outputStandardStars(butler, stdCat, offsets, datasetConfig)

if self.config.doZeropointOutput:
self._outputZeropoints(butler, zptCat, visitCat, offsets, tract=tract)

@@ -599,7 +603,7 @@ def _computeOffsetOneBand(self, sourceMapper, badStarKey,

return struct

def _outputStandardStars(self, butler, stdCat, offsets):
def _outputStandardStars(self, butler, stdCat, offsets, datasetConfig):
"""
Output standard stars in indexed reference catalog format.

@@ -610,9 +614,11 @@ def _outputStandardStars(self, butler, stdCat, offsets):
FGCM standard star catalog from fgcmFitCycleTask
offsets: `numpy.array` of floats
Per band zeropoint offsets
datasetConfig: `lsst.meas.algorithms.DatasetConfig`
Config for reference dataset
"""

self.log.info("Outputting standard stars to %s" % (self.config.datasetConfig.ref_dataset_name))
self.log.info("Outputting standard stars to %s" % (datasetConfig.ref_dataset_name))

# We determine the conversion from the native units (typically radians) to
# degrees for the first star. This allows us to treat coord_ra/coord_dec as
@@ -627,7 +633,7 @@

# Write the master schema
dataId = self.indexer.makeDataId('master_schema',
self.config.datasetConfig.ref_dataset_name)
datasetConfig.ref_dataset_name)
masterCat = afwTable.SimpleCatalog(formattedCat.schema)
addRefCatMetadata(masterCat)
butler.put(masterCat, 'ref_cat', dataId=dataId)
@@ -647,12 +653,12 @@

# Write the individual pixel
dataId = self.indexer.makeDataId(indices[i1a[0]],
self.config.datasetConfig.ref_dataset_name)
datasetConfig.ref_dataset_name)
butler.put(formattedCat[selected], 'ref_cat', dataId=dataId)

# And save the dataset configuration
dataId = self.indexer.makeDataId(None, self.config.datasetConfig.ref_dataset_name)
butler.put(self.config.datasetConfig, 'ref_cat_config', dataId=dataId)
dataId = self.indexer.makeDataId(None, datasetConfig.ref_dataset_name)
butler.put(datasetConfig, 'ref_cat_config', dataId=dataId)

self.log.info("Done outputting standard stars.")

1 change: 1 addition & 0 deletions python/lsst/fgcmcal/utilities.py
@@ -154,6 +154,7 @@ def makeConfigDict(config, log, camera, maxIter,
'illegalValue': -9999.0, # internally used by fgcm.
'starColorCuts': starColorCutList,
'aperCorrFitNBins': config.aperCorrFitNBins,
'aperCorrInputSlopes': np.array(config.aperCorrInputSlopes),
'sedFudgeFactors': np.array(config.sedFudgeFactors),
'colorSplitIndices': np.array(config.colorSplitIndices),
'sigFgcmMaxErr': config.sigFgcmMaxErr,
6 changes: 6 additions & 0 deletions tests/fgcmcalTestBase.py
@@ -460,6 +460,12 @@ def _testFgcmCalibrateTract(self, visits, tract,
self.assertTrue(butler.datasetExists('transmission_atmosphere_fgcm_tract',
tract=tract, visit=visit))

# Check that we got the reference catalog output.
# This will raise an exception if the catalog is not there.
config = LoadIndexedReferenceObjectsConfig()
config.ref_dataset_name = 'fgcm_stars_%d' % (tract)
task = LoadIndexedReferenceObjectsTask(butler, config=config) # noqa F841

def _checkResult(self, result):
"""
Check the result output from the task
10 changes: 7 additions & 3 deletions tests/test_fgcmcal_hsc.py
@@ -105,7 +105,7 @@ def test_fgcmcalTasks(self):
nGoodZp = 26
nOkZp = 26
nBadZp = 1206
nStdStars = 389
nStdStars = 390
Contributor:

Why?!

Contributor (author):

When I updated the tests to use the aperture corrections as the input, the fits got just a bit better and one more star passed the cuts.

nPlots = 34

self._testFgcmFitCycle(nZp, nGoodZp, nOkZp, nBadZp, nStdStars, nPlots, skipChecks=True)
@@ -142,7 +142,7 @@ def test_fgcmcalTasks(self):
self.config.refObjLoader.ref_dataset_name = "sdss-dr9-fink-v5b"

filterMapping = {'r': 'HSC-R', 'i': 'HSC-I'}
zpOffsets = np.array([-0.001419307198, -0.001693746657])
zpOffsets = np.array([-0.0013903317740, -0.0020539460238])

self._testFgcmOutputProducts(visitDataRefName, ccdDataRefName,
filterMapping, zpOffsets,
@@ -172,10 +172,13 @@ def test_fgcmcalTract(self):

self.config = fgcmcal.FgcmCalibrateTractConfig()
self.fillDefaultBuildStarsConfig(self.config.fgcmBuildStars, visitDataRefName, ccdDataRefName)
self.config.fgcmBuildStars.checkAllCcds = False
self.fillDefaultFitCycleConfig(self.config.fgcmFitCycle)
self.config.maxFitCycles = 2

rawRepeatability = np.array([0.006505881543, 0.008963255070])
self.config.fgcmOutputProducts.doRefcatOutput = True

rawRepeatability = np.array([0.007070288705, 0.0074971053995])
filterNCalibMap = {'HSC-R': 13,
'HSC-I': 13}

@@ -270,6 +273,7 @@ def fillDefaultFitCycleConfig(self, config):
config.expGrayPhotometricCut = (-0.05, -0.05)
config.expGrayHighCut = (0.2, 0.2)
config.aperCorrFitNBins = 0
config.aperCorrInputSlopes = (-0.9694, -1.7229)
config.sedFudgeFactors = (1.0, 1.0)
config.starColorCuts = ('r,i,-0.50,2.25',)
config.freezeStdAtmosphere = True