Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-31345: Fix the underlying bug resulting in PSF-matching failures in ci_hsc and ci_imsim #19

Merged
merged 4 commits into from Aug 19, 2021

Conversation

arunkannawadi
Copy link
Member

@arunkannawadi arunkannawadi commented Aug 16, 2021

The PSF shape needs to be computed at the centroid of the source of interest. There was a subtler issue with the scaling being done on trace radius in some places and on determinant radius in some other places, although this inconsistency had little role to play. Both of these are fixed in this ticket, and the messages about unable to find PSF-matching kernel do not appear any more.

Copy link

@laurenam laurenam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the wording around the need to compute the PSF at center confusing. In the commit message you imply wanting it to be the center of the exposure's bounding box, but I think you mean that you want to evaluate the PSF at the centroid of the source. Also, the statement in the PR:

The PSF shape is be computed at the centroid of the source being measured, even if the exposure is already clipped to the bounding box of the corresponding footprint

is confusing as what I think you mean to imply is that clipping to the source's footprint bounding box does not guarantee (or even necessarily suggest) that this will be centered on the source's centroid. Either a clarification of that, or just a simple statement that the PSF needs to be evaluated at the source of interest's centroid (as opposed to whatever default you get without explicitly specifying a position) would be more clear.

Also (could I be more nit-picky!), I find the statement:

the messages about unable to find PSF-matching kernel do not appear any more

a bit misleading as such a message still should appear in the case of a failure! (i.e. the point here is that you fixed the issue that was causing so many failures, not that you just got rid of a superfluous error message, right?)

@@ -549,7 +549,7 @@ def _gaussianizeAndMeasure(self, measRecord: lsst.afw.table.SourceRecord,
raise measBase.FatalAlgorithmError("No PSF in exposure")
wcs = exposure.getWcs()

seeing = psf.computeShape(center).getTraceRadius()
seeing = psf.computeShape(center).getDeterminantRadius()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a casual reader of this code, I have no idea what "seeing" is explicitly supposed to represent, so have no way of assessing if this change is correct (even though I'm sure you know that it is!) Perhaps some words defining what "seeing" means or, even better, a variable name that is more explicit about what the variable represents. I'll note that you can find the stack definitions of these here: https://github.com/lsst/afw/blob/master/include/lsst/afw/geom/ellipses/BaseCore.h#L99-L112

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is somewhat intentional that seeing is vague. It can either be trace radius or determinant radius, as long as it is used consistently. I have renamed it to psfSigma, as the factor in _gaussianizePsf task that converts to a FWHM is called sigma2fwhm.

fwhmModel = targetPsfModel.computeShape().getDeterminantRadius()*sigma2fwhm
self.log.debug(f"Ratio of model to science PSF = {fwhmModel/fwhmScience}")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't keep up with the proper use of format strings in logging, but I will note that the use of f here does not match the other self.log.debug statements I see in this file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For logging, we're not supposed to use f strings. It should be self.log.debug("Ratio of model to science PSF = %f", fwhmModel/fwhmScience)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also happy to remove this logging from here. It will appear something like: psfMatch DEBUG: Ratio of model to science PSF = 1.15, which may still not be super helpful.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally your call given that it's at debug level 😄

except Exception as e:
self.log.error("Failed to solve for PSF matching kernel in GAaP for (%f, %f): %s",
(center.getX(), center.getY(), e))
raise e

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to add this new raise to the docs. Also, the previous behavior did not include a raise statement (even though the docs in _solve() say that it does)...please give a thought to what the intended behavior is (i.e. should this really be a raise or just a logging of the error and move on?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not exactly a new raise, but just bubbling up of an exception raised in the _solve method. The only new change is that some useful information is logged before the exception reaches the plugin framework, which then calls the fail method of the plugin. The plugin framework doesn't have the contextual information about the location (center) where the failure occurred, so it is logged here before propagating the exception up. And since this is not the origin of the exception, I don't think this needs to mentioned in the docs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I moved the logging from here and the plugin's fail method does the logging now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The modified logging helped uncover a bug in meas_base about how undeblended plugins were initialiized. DM-31442 has to be merged before merging this branch to master.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, right...somehow I missed the raise e in the _solve() try block...sorry for the noise! Can you kick off a fresh Jenkins with the new implementation (which I'll have a look at now).

The ``center`` variable needs to be specified explicitly to the
``computeShape`` method.
@arunkannawadi
Copy link
Member Author

I modified the description in the PR and in the first commit message. Of course, the comment about "Unable to find PSF-matching kernel" is not a generic one but that it does not appear when run on ci_hsc and ci_imsim.

@@ -277,6 +278,7 @@ class BaseGaapFluxMixin:
"""

ConfigClass = BaseGaapFluxConfig
hasLogName = True

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used for?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah...saw that when the meas_base PR popped up!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That documentation is out of date. It shouldn't be referring to lsst.log in the "how to use" section.

except Exception as e:
self.log.error("Failed to solve for PSF matching kernel in GAaP for (%f, %f): %s",
(center.getX(), center.getY(), e))
raise e

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, right...somehow I missed the raise e in the _solve() try block...sorry for the noise! Can you kick off a fresh Jenkins with the new implementation (which I'll have a look at now).

'ext_gaap_GaapFlux',
schema, None)
"ext_gaap_GaapFlux",
schema, None, logName="testGaap.log")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be a long-shot...but would it be possible to force a failure to _solve() in the test code to make sure it "does the right thing"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, the testFail method does that, especially at L266 where flag_gaussianization is set to True by the fail method after logging. But running pytest or scons doesn't create the log file. Running the test script as python tests/test_gaap.py (not recommended and is going to be phased out) prints it on the command line. It looks as follows:

.ERROR: Unable to calculate psf matching kernel
measurement.ext_gaap_GaapFlux ERROR: Failed to solve for PSF matching kernel in GAaP for (100.000000, 670.000000): Problematic scaling factors = 100.0 Errors: Exception('Unable to determine kernel sum; 0 candidates')
ERROR: Unable to calculate psf matching kernel
measurement.ext_gaap_GaapFlux ERROR: Failed to solve for PSF matching kernel in GAaP for (100.000000, 870.000000): Problematic scaling factors = 100.0 Errors: Exception('Unable to determine kernel sum; 0 candidates')
ERROR: Unable to calculate psf matching kernel
measurement.ext_gaap_GaapFlux ERROR: Failed to solve for PSF matching kernel in GAaP for (-10.000000, -20.000000): Problematic scaling factors = 100.0 Errors: Exception('Unable to determine kernel sum; 0 candidates')

Any inputs on how to run the test_gaap.py in a manner to produce a log file? I was going to run forcedPhotCoadd with just the logging changes on the ci_imsim and ci_hsc branches and see how the error messages look like, but it is of course not principled.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hhhmmm...no, I have no idea how to do that. It seems there are now no GAaP failures in the limited ci datasets (which is fundamentally a very good thing if the numbers are all sensible!) But what you have shown here does sufficiently scratch the itch I had to see an example of the log output on error, so thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.log seems odd for the name of a logger -- it's not a file name.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed specifying any value for logName, since the Python logging can handle None. The logging aspects themselves are checked in the testFail method explicitly.

Copy link
Member

@timj timj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change makes clear the meas_base documentation should have been updated to use python logging.

@@ -331,6 +333,9 @@ def __init__(self, config: BaseGaapFluxConfig, name, schema, logName=None) -> No
flagName = self.ConfigClass._getGaapResultName(scalingFactor, "flag_gaussianization")
flagDefs.add(flagName, "PSF Gaussianization failed when trying to scale by this factor.")

self.logName = logName
self.log = lsst.log.getLogger(self.getLogName()) # getLogName is inherited from the plugin classes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No new python code should be using lsst.log. meas_base does not. Use logging.getLogger().

What log name does this end up as in practice? Should the logName default to something that has gaap in it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The BaseMeasurementTask seems to assign the logName, with the plugin name included. https://github.com/lsst/meas_base/blob/c63f6ee5de4496055ae9f4a2d7b9beb5fae67e73/python/lsst/meas/base/baseMeasurement.py#L272

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The log name will end up as measurement.ext_gaap_GaapFlux in practice. Now, unlike lsst.log, Python's default logging method doesn't seem to being the message with the logger name. So running the testFail method from the command line produces

ERROR: Unable to calculate psf matching kernel
Failed to solve for PSF matching kernel in GAaP for (100.000000, 670.000000): Problematic scaling factors = 100.0 Errors: Exception('Unable to determine kernel sum; 0 candidates')
ERROR: Unable to calculate psf matching kernel
Failed to solve for PSF matching kernel in GAaP for (100.000000, 870.000000): Problematic scaling factors = 100.0 Errors: Exception('Unable to determine kernel sum; 0 candidates')
ERROR: Unable to calculate psf matching kernel
Failed to solve for PSF matching kernel in GAaP for (-10.000000, -20.000000): Problematic scaling factors = 100.0 Errors: Exception('Unable to determine kernel sum; 0 candidates')

unlike the previous comment.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that’s actually ok. See discussion on this PR: lsst/meas_extensions_piff#4

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is default python logging and we configure it when pipetask or command line tasks run. If you want to test that a log message is actually being issued you should use self.assertLogs and with that you can check that an ERROR is issued and also what the message contains. Testing that an ERROR log message is issued might well be a good idea for this ticket.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -331,6 +333,9 @@ def __init__(self, config: BaseGaapFluxConfig, name, schema, logName=None) -> No
flagName = self.ConfigClass._getGaapResultName(scalingFactor, "flag_gaussianization")
flagDefs.add(flagName, "PSF Gaussianization failed when trying to scale by this factor.")

self.logName = logName
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You default this to None so are you happy for this to be None?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The job of providing the logName is taken care of by the plugin framework. It actually can never handle None case within the plugin, which is why I had to give some string in the unit tests, where I run the plugin outside of the measurement framework. I thought about getting the default logger if logName is None. Should I go that way?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Okay, it's not clear from the parameter defaulting that this is impossible in general but it sounds like it's common knowledge for all plugin writers. I suppose a comment wouldn't hurt to remind people.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now mentioned in the docstrings, where logName is defined.

@@ -277,6 +278,7 @@ class BaseGaapFluxMixin:
"""

ConfigClass = BaseGaapFluxConfig
hasLogName = True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That documentation is out of date. It shouldn't be referring to lsst.log in the "how to use" section.

Log the exact error message if the PSF Gaussianization fails along with
the location in the exposure.
Use determinant radius consistently and remove any use of trace radius.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants