Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-21501: Implement internal aperture corrections for fgcmcal tract mode #19

Merged
merged 5 commits into from Oct 25, 2019

Conversation

erykoff
Copy link
Contributor

@erykoff erykoff commented Oct 18, 2019

Add the ability to use externally calibrated aperture corrections for use in tract mode when there isn't sufficient data to fit them empirically. Also fix a dataRef logic error when running globally (e.g. PDR1) and the ability to output stars in tract mode, used for qa and debugging.

This is supported in the latest version of fgcm to use externally calibrated
aperture correction slope parameters.
Also require that checkAllCcds is False for tract-based mode, because it should
be False (since visits and ccds will be specified explicitly in the dataRefs in
this mode).
This feature should not be generally needed, but is very useful for testing and debugging.
Copy link
Contributor

@laurenam laurenam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good to me...but I did have a few comment/questions that may be worth considering.

# Aperture correction input slope parameters. There should be one slope ber band.
# This is used when there is insufficient data to fit the parameters from the data
# itself (e.g. tract mode or RC2).
config.aperCorrInputParameters = (-1.0150, -0.9694, -1.7229, -1.4549, -1.1998)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"should" or must be one slope per band? How does the algorithm know if there is insufficient data to fit the parameters? It seems that setting this list (side note: I've never seen ListFields set a s tuples before, but I guess there's no harm?!) is also serving as the config controlling the decision (i.e. if aperCorrInputParameters is set to a non-emply list, then use them, else, perform a fit?) If this is the case, this should be made VERY clear in the documentation. Another (but not necessarily preferable) pattern I've seen is to have a doFitAperCorrSlopes that controls this decision...it is an extra config parameter and can lead to confusion when parsing a config file and seeing aperCorrInputParameters, but not realizing that they aren't necessarily being used (although your doctstring can indicate they are not used when doFitAperCorrSlopes=True), and it's a pattern common throughout the stack.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few things on this! First: "must" is correct, I will change that. The number of elements is checked by the fgcm code, I'll make that clear. (This is also the case with the other config vars which are per band).
Second: the controlling variable on whether or not to perform a fit is aperCorrFitNBins but this is not actually described in the fgcmcal documentation.
So the idea is that these are the input parameters (optional!) and it will use these at the start of the run, and if aperCorrFitNBins == 0 then the output parameters will equal the input parameters. Otherwise they will be different.
This is not explicitly spelled out in the documentation, and I will improve that. But hopefully this logic does make sense, or do I need another config parameter or to rename something as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's convoluted, but as long as it is well-documented, I'm ok with whatever you feel is clear-enough.

@@ -105,7 +105,7 @@ def test_fgcmcalTasks(self):
nGoodZp = 26
nOkZp = 26
nBadZp = 1206
nStdStars = 389
nStdStars = 390
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I updated the tests to use the aperture corrections as the input, the fits got just a bit better and one more star passed the cuts.

@@ -226,6 +226,8 @@ def runDataRef(self, butler, dataRefs):

if not self.config.fgcmBuildStars.doReferenceMatches:
raise RuntimeError("Must run FgcmCalibrateTract with fgcmBuildStars.doReferenceMatches")
if self.config.fgcmBuildStars.checkAllCcds:
raise RuntimeError("Cannot run FgcmCalibrateTract with fgcmBuildStars.checkAllCcds set to True")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get that the visit need to be explicitly provided, but why does ccd list need to be too (this comment is really based on the commit message)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not that the ccd list needs to be given, it's that this flag should only be used in testing (in particular, when you have only a few ccds per visit.) This is in the docs for checkAllCcds. It's also the case that in tract mode you will have specified the visit list and the tract on the command line and the PerTractCcdDataIdContainer takes care of a lot of this.
My expectation is that the change to PipelineTask will make most/all of this obsolete.

"slope per band. This is used when there is insufficient data to "
"fit the parameters from the data itself (e.g. tract mode)."),
dtype=float,
default=[],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like it would be all-too-easy to get the order of these wrong. I assume the filter ordering is dictated by how they are set in bands, but I had to scroll up off the page (as I'm currently looking at it in GitHub) to find that config parameter, and your docstring is not specific about the ordering. Might a mapping be less error-prone?

I'm also puzzled by the name of this parameter. Using the term "input" implies (to me, at least) that there will be some accompanied "output". And, if these are "slopes", why call them "parameters"? This boils down to me wondering why the name isn't just aperCorrSlopes (with maybe a "fallback"/"default" or some such descriptor as, if I'm reading all this correctly, these are only meant to be used as the slopes if a live fitting was not possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the reason they aren't called aperCorrSlopes is that originally I had an slope/offset/pivot tuple here, and then I realized only the slope was useful. And didn't think to change the name. Which I will now!
As for the ordering, there are bunch of these that should be matched to bands. I will make sure that at least the documentation makes this clear. If you think a mapping would be useful, I can do that but there are a bunch of these and I'd probably do that on a separate ticket.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I leave that decision up to you (just so long as whatever is there is clearly documented!)

@@ -172,10 +172,14 @@ def test_fgcmcalTract(self):

self.config = fgcmcal.FgcmCalibrateTractConfig()
self.fillDefaultBuildStarsConfig(self.config.fgcmBuildStars, visitDataRefName, ccdDataRefName)
self.config.fgcmBuildStars.checkAllCcds = False

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra space needed?

# itself (e.g. tract mode or RC2).
config.aperCorrInputParameters = (-1.0150, -0.9694, -1.7229, -1.4549, -1.1998)
# "Fudge factors" for computing SED slope
config.sedFudgeFactors = (0.25, 1.0, 1.0, 0.25, 0.25)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does this fit in with allowing fixed calibration slopes? Should this maybe be a separate commit with an explanation where the numbers are coming from/used for?? Also, same issue with the error-prone ordering issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So these numbers came from eyeballing some stellar loci that I did ages ago, but never checked in for this very reason. I could say these are what I use for DES...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Future you might appreciate the comment!

@@ -347,7 +347,7 @@ def runDataRef(self, butler):

# Output the standard stars in stack format
if self.config.doRefcatOutput:
self._outputStandardStars(butler, stdCat, offsets)
self._outputStandardStars(butler, stdCat, offsets, self.config.datasetConfig)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't check the code flow super closely, but does this call happen when in "tract mode"? I assume not because I think it should not work properly if the special datasetConfig with the tract added is not passed in...but maybe it does and not only repeats the call already made in generateOutputProducts(), but also gets an erroneous persistence path, so I thought I'd point it out and let you do the tracing :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code does not get run in tract mode. Then the runDataRef isn't used, but generateOutputProducts is instead. It's one or the other, and I can add a code comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, I will rename generateOutputProducts to generateTractOutputProducts.

@erykoff erykoff merged commit 39230f8 into master Oct 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants