New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-31345: Fix the underlying bug resulting in PSF-matching failures in ci_hsc and ci_imsim #19
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find the wording around the need to compute the PSF at center
confusing. In the commit message you imply wanting it to be the center of the exposure's bounding box, but I think you mean that you want to evaluate the PSF at the centroid of the source. Also, the statement in the PR:
The PSF shape is be computed at the centroid of the source being measured, even if the exposure is already clipped to the bounding box of the corresponding footprint
is confusing as what I think you mean to imply is that clipping to the source's footprint bounding box does not guarantee (or even necessarily suggest) that this will be centered on the source's centroid. Either a clarification of that, or just a simple statement that the PSF needs to be evaluated at the source of interest's centroid (as opposed to whatever default you get without explicitly specifying a position) would be more clear.
Also (could I be more nit-picky!), I find the statement:
the messages about unable to find PSF-matching kernel do not appear any more
a bit misleading as such a message still should appear in the case of a failure! (i.e. the point here is that you fixed the issue that was causing so many failures, not that you just got rid of a superfluous error message, right?)
@@ -549,7 +549,7 @@ def _gaussianizeAndMeasure(self, measRecord: lsst.afw.table.SourceRecord, | |||
raise measBase.FatalAlgorithmError("No PSF in exposure") | |||
wcs = exposure.getWcs() | |||
|
|||
seeing = psf.computeShape(center).getTraceRadius() | |||
seeing = psf.computeShape(center).getDeterminantRadius() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a casual reader of this code, I have no idea what "seeing" is explicitly supposed to represent, so have no way of assessing if this change is correct (even though I'm sure you know that it is!) Perhaps some words defining what "seeing" means or, even better, a variable name that is more explicit about what the variable represents. I'll note that you can find the stack definitions of these here: https://github.com/lsst/afw/blob/master/include/lsst/afw/geom/ellipses/BaseCore.h#L99-L112
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is somewhat intentional that seeing
is vague. It can either be trace radius or determinant radius, as long as it is used consistently. I have renamed it to psfSigma
, as the factor in _gaussianizePsf
task that converts to a FWHM is called sigma2fwhm
.
fwhmModel = targetPsfModel.computeShape().getDeterminantRadius()*sigma2fwhm | ||
self.log.debug(f"Ratio of model to science PSF = {fwhmModel/fwhmScience}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't keep up with the proper use of format strings in logging, but I will note that the use of f
here does not match the other self.log.debug
statements I see in this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For logging, we're not supposed to use f
strings. It should be self.log.debug("Ratio of model to science PSF = %f", fwhmModel/fwhmScience)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am also happy to remove this logging from here. It will appear something like: psfMatch DEBUG: Ratio of model to science PSF = 1.15
, which may still not be super helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally your call given that it's at debug
level 😄
except Exception as e: | ||
self.log.error("Failed to solve for PSF matching kernel in GAaP for (%f, %f): %s", | ||
(center.getX(), center.getY(), e)) | ||
raise e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to add this new raise to the docs. Also, the previous behavior did not include a raise
statement (even though the docs in _solve()
say that it does)...please give a thought to what the intended behavior is (i.e. should this really be a raise
or just a logging of the error and move on?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not exactly a new raise
, but just bubbling up of an exception raised in the _solve
method. The only new change is that some useful information is logged before the exception reaches the plugin framework, which then calls the fail
method of the plugin. The plugin framework doesn't have the contextual information about the location (center
) where the failure occurred, so it is logged here before propagating the exception up. And since this is not the origin of the exception, I don't think this needs to mentioned in the docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I moved the logging from here and the plugin's fail
method does the logging now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The modified logging helped uncover a bug in meas_base
about how undeblended plugins were initialiized. DM-31442 has to be merged before merging this branch to master.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, right...somehow I missed the raise e
in the _solve()
try block...sorry for the noise! Can you kick off a fresh Jenkins with the new implementation (which I'll have a look at now).
728f542
to
731f848
Compare
The ``center`` variable needs to be specified explicitly to the ``computeShape`` method.
731f848
to
d6cdb49
Compare
I modified the description in the PR and in the first commit message. Of course, the comment about "Unable to find PSF-matching kernel" is not a generic one but that it does not appear when run on |
d6cdb49
to
762aff6
Compare
@@ -277,6 +278,7 @@ class BaseGaapFluxMixin: | |||
""" | |||
|
|||
ConfigClass = BaseGaapFluxConfig | |||
hasLogName = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Used for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The plugin logging framework needs this.
https://pipelines.lsst.io/modules/lsst.meas.base/tasks_and_algorithms.html#how-a-plugin-can-get-its-log-name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah...saw that when the meas_base
PR popped up!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That documentation is out of date. It shouldn't be referring to lsst.log in the "how to use" section.
except Exception as e: | ||
self.log.error("Failed to solve for PSF matching kernel in GAaP for (%f, %f): %s", | ||
(center.getX(), center.getY(), e)) | ||
raise e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, right...somehow I missed the raise e
in the _solve()
try block...sorry for the noise! Can you kick off a fresh Jenkins with the new implementation (which I'll have a look at now).
tests/test_gaap.py
Outdated
'ext_gaap_GaapFlux', | ||
schema, None) | ||
"ext_gaap_GaapFlux", | ||
schema, None, logName="testGaap.log") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be a long-shot...but would it be possible to force a failure to _solve()
in the test code to make sure it "does the right thing"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, the testFail
method does that, especially at L266 where flag_gaussianization
is set to True by the fail
method after logging. But running pytest or scons doesn't create the log file. Running the test script as python tests/test_gaap.py
(not recommended and is going to be phased out) prints it on the command line. It looks as follows:
.ERROR: Unable to calculate psf matching kernel
measurement.ext_gaap_GaapFlux ERROR: Failed to solve for PSF matching kernel in GAaP for (100.000000, 670.000000): Problematic scaling factors = 100.0 Errors: Exception('Unable to determine kernel sum; 0 candidates')
ERROR: Unable to calculate psf matching kernel
measurement.ext_gaap_GaapFlux ERROR: Failed to solve for PSF matching kernel in GAaP for (100.000000, 870.000000): Problematic scaling factors = 100.0 Errors: Exception('Unable to determine kernel sum; 0 candidates')
ERROR: Unable to calculate psf matching kernel
measurement.ext_gaap_GaapFlux ERROR: Failed to solve for PSF matching kernel in GAaP for (-10.000000, -20.000000): Problematic scaling factors = 100.0 Errors: Exception('Unable to determine kernel sum; 0 candidates')
Any inputs on how to run the test_gaap.py in a manner to produce a log file? I was going to run forcedPhotCoadd
with just the logging changes on the ci_imsim
and ci_hsc
branches and see how the error messages look like, but it is of course not principled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hhhmmm...no, I have no idea how to do that. It seems there are now no GAaP failures in the limited ci datasets (which is fundamentally a very good thing if the numbers are all sensible!) But what you have shown here does sufficiently scratch the itch I had to see an example of the log output on error, so thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.log
seems odd for the name of a logger -- it's not a file name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed specifying any value for logName
, since the Python logging can handle None
. The logging aspects themselves are checked in the testFail
method explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change makes clear the meas_base documentation should have been updated to use python logging.
@@ -331,6 +333,9 @@ def __init__(self, config: BaseGaapFluxConfig, name, schema, logName=None) -> No | |||
flagName = self.ConfigClass._getGaapResultName(scalingFactor, "flag_gaussianization") | |||
flagDefs.add(flagName, "PSF Gaussianization failed when trying to scale by this factor.") | |||
|
|||
self.logName = logName | |||
self.log = lsst.log.getLogger(self.getLogName()) # getLogName is inherited from the plugin classes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No new python code should be using lsst.log
. meas_base
does not. Use logging.getLogger()
.
What log name does this end up as in practice? Should the logName
default to something that has gaap
in it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The BaseMeasurementTask
seems to assign the logName
, with the plugin name included. https://github.com/lsst/meas_base/blob/c63f6ee5de4496055ae9f4a2d7b9beb5fae67e73/python/lsst/meas/base/baseMeasurement.py#L272
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The log name will end up as measurement.ext_gaap_GaapFlux
in practice. Now, unlike lsst.log
, Python's default logging method doesn't seem to being the message with the logger name. So running the testFail
method from the command line produces
ERROR: Unable to calculate psf matching kernel
Failed to solve for PSF matching kernel in GAaP for (100.000000, 670.000000): Problematic scaling factors = 100.0 Errors: Exception('Unable to determine kernel sum; 0 candidates')
ERROR: Unable to calculate psf matching kernel
Failed to solve for PSF matching kernel in GAaP for (100.000000, 870.000000): Problematic scaling factors = 100.0 Errors: Exception('Unable to determine kernel sum; 0 candidates')
ERROR: Unable to calculate psf matching kernel
Failed to solve for PSF matching kernel in GAaP for (-10.000000, -20.000000): Problematic scaling factors = 100.0 Errors: Exception('Unable to determine kernel sum; 0 candidates')
unlike the previous comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that’s actually ok. See discussion on this PR: lsst/meas_extensions_piff#4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is default python logging and we configure it when pipetask or command line tasks run. If you want to test that a log message is actually being issued you should use self.assertLogs
and with that you can check that an ERROR is issued and also what the message contains. Testing that an ERROR log message is issued might well be a good idea for this ticket.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -331,6 +333,9 @@ def __init__(self, config: BaseGaapFluxConfig, name, schema, logName=None) -> No | |||
flagName = self.ConfigClass._getGaapResultName(scalingFactor, "flag_gaussianization") | |||
flagDefs.add(flagName, "PSF Gaussianization failed when trying to scale by this factor.") | |||
|
|||
self.logName = logName |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You default this to None
so are you happy for this to be None
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The job of providing the logName
is taken care of by the plugin framework. It actually can never handle None
case within the plugin, which is why I had to give some string in the unit tests, where I run the plugin outside of the measurement framework. I thought about getting the default logger if logName
is None
. Should I go that way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. Okay, it's not clear from the parameter defaulting that this is impossible in general but it sounds like it's common knowledge for all plugin writers. I suppose a comment wouldn't hurt to remind people.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now mentioned in the docstrings, where logName
is defined.
@@ -277,6 +278,7 @@ class BaseGaapFluxMixin: | |||
""" | |||
|
|||
ConfigClass = BaseGaapFluxConfig | |||
hasLogName = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That documentation is out of date. It shouldn't be referring to lsst.log in the "how to use" section.
Log the exact error message if the PSF Gaussianization fails along with the location in the exposure.
Use determinant radius consistently and remove any use of trace radius.
762aff6
to
1d602ff
Compare
The PSF shape needs to be computed at the centroid of the source of interest. There was a subtler issue with the scaling being done on trace radius in some places and on determinant radius in some other places, although this inconsistency had little role to play. Both of these are fixed in this ticket, and the messages about unable to find PSF-matching kernel do not appear any more.