New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-25782: Add an "detect_isPrimary" flag to single frame processing schema #400
Conversation
doc="true if source has no children and is in the inner region of a coadd patch " | ||
"and is in the inner region of a coadd tract " | ||
"and is not \"detected\" in a pseudo-filter (see config.pseudoFilterList)", | ||
doc=primaryDoc, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need both "doc" and "primaryDoc"? Where do these each appear? I also noticed the coadd primary doc does not specify anything about sky sources, while the single frame one does. Is that intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, the doc string for SetPrimaryFlagsTask
is different depending on whether or not self.isSingleFrame
is True
(line 55) or False
(line 43). If true, the doc string refers only to children and sky sources (or pseudo-filters, see below), however, if false, the doc string additionally refers to coadd patch and tract selections (which would not make sense in the single frame output). We use the if
logic above to define primaryDoc
, which is then subsequently placed into the main self.isPrimaryKey
doc string. I think that's the best way to do it, but happy to consider another approach if you have something in mind?
With regards not mentioning sky sources in the coadd primary doc, the coadd primary doc refers to "pseduo-filters", of which sky sources are included. I agree that the language here may not be immediately obvious, but this was a decision made prior to this ticket and so we were reluctant to change it. I think the logic is that other additional 'pseudo-filters' can be added in addition to sky sources which would then be picked up by this task at the coadd level, but currently, only sky sources are implemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, that is subtle! The order in which they're defined threw me off. For coadds, you define primaryDoc after the two docs because it includes them both. Can you follow this order for single frame even though it's not necessary, for consistency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I don't quite follow. The two extra docs being defined for coadds are isPatchInnerKey
and isTractInnerKey
. These docs don't apply in the single frame case, so they're not being set there (so there's nothing to move primaryDoc
above or below of). The single frame __init__
doc string is just the primaryDoc
, which then gets added to isPrimaryKey
. Sorry if I've missed something here, please let me know! (Edit: we could move the primaryDoc
to the top of the coadd if
block so that it's the first thing read, if that helps?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry :) I'm suggesting delete line 56, explicitly set doc = "true if source has no children and is not a sky source"
on line 59, and then explicitly define primaryDoc
as the last thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again for the feedback here. Marking the issue as resolved following our earlier discussion, cheers!
) | ||
|
||
def run(self, sources, skyMap, tractInfo, patchInfo, includeDeblend=True): | ||
def run(self, sources, skyMap=None, tractInfo=None, patchInfo=None, | ||
includeDeblend=True): | ||
"""Set is-primary and related flags on sources | ||
|
||
@param[in,out] sources a SourceTable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please take this opportunity to numpydoc this run method 🙂 I also suggest expanding the main doc string a bit to remind us what "is-primary" means and what any "related flags" may be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated this docstring to below (please let me know if this looks okay):
"""Set is-patch-inner, is-tract-inner and is-primary flags on sources.
For coadded imaging, the is-primary flag returns True when an object
has no children, is in the inner region of a coadd patch, is in the
inner region of a coadd trach, and is not detected in a pseudo-filter
(e.g., a sky_object).
For single frame imaging, the is-primary flag returns True when a
source has no children and is not a sky source.
Parameters
----------
sources :
A sourceTable. Reads in centroid fields and an nChild field.
Writes is-patch-inner, is-tract-inner, and is-primary flags.
skyMap : lsst.skymap.BaseSkyMap
Sky tessellation object
tractInfo : lsst.skymap.TractInfo
Tract object
patchInfo : lsst.skymap.PatchInfo
Patch object
includeDeblend : `bool`
Include deblend information in isPrimary?
"""
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Please do add whatever type sources
is. And put all the types in backticks, even the lsst ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks!
|
||
source.setFlag(self.isPrimaryKey, isPatchInner and isTractInner and not isPseudo) | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add comment indicating this is the single frame case, especially since the if
was quite long
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments added prior to the start of both if
and else
.
# set inner flags for each source and set primary flags for sources with no children | ||
# (or all sources if deblend info not available) | ||
innerFloatBBox = Box2D(patchInfo.getInnerBBox()) | ||
if not self.isSingleFrame: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add comment explicitly stating this is the coadd case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment added.
hasSkySources = True if "sky_source" in sources.schema else False | ||
for source in sources: | ||
if nChildKey is None or source.get(nChildKey) == 0: | ||
if hasSkySources: | ||
if not source["sky_source"]: | ||
source.setFlag(self.isPrimaryKey, True) | ||
else: | ||
source.setFlag(self.isPrimaryKey, True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm quite surprised this is so short when the corresponding code for coadds is so long. Please consider pulling out this core bit into its own "actually set primary key now that we've dealt with whatever-TF coadd pseudofilters are" function so there is less code duplication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you suggesting that setFlag
from both the coadd case and the single-frame case should be pulled out of the above if statement and put into its own section, or just that this single-frame case section should be cleaned up? If the latter, we did discuss this at length, but couldn't come up with a clean way to do it that was in keeping with the coadd case above - suggestions welcome!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought that both cases had code identical to
for source in sources:
if nChildKey is None or source.get(nChildKey) == 0:
if hasSkySources:
if not source["sky_source"]:
source.setFlag(self.isPrimaryKey, True)
else:
source.setFlag(self.isPrimaryKey, True)
but now I see there's a sneaky other pseudoFilterKeys
loop hiding in the coadd case. Honestly the whole thing feels a bit too meandery in the if/else logic train; I don't want you to spend forever optimizing the coadd side of things when you were just going in trying to add the single frame side of things. But if you can come up with a way to create a "setPrimaryFlag" function that works for both, that would be a lot nicer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, after having thought about it for a while, I think the only option to achieve this would be to move the for loop one level higher and check whether or not we're working on coadd data or single-frame data within each iteration of the for loop. This might make the code clearer, but the introduction of an if-statement check for each iteration of the for loop may impact efficiency somewhat. With that said, and keeping the coadd block and the single-frame block separate, I've modified the single frame block to this (below) which should hopefully be more readable. The primary key is now only being assigned once (formerly twice). Thoughts welcome!
# single frame case
else:
hasSkySources = True if "sky_source" in sources.schema else False
for source in sources:
hasNoChildren = True if nChildKey is None or source.get(nChildKey) == 0 else False
isSkySource = False
if hasSkySources:
if source["sky_source"]:
isSkySource = True
source.setFlag(self.isPrimaryKey, hasNoChildren and not isSkySource)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is good - I like hasNoChildren
, that adds a lot of clarity. It's fine to keep the coadd block and single frame blocks separate to avoid slowing down the looping.
tests/test_isPrimaryFlag.py
Outdated
# make sure all sky sources are flagged as not primary | ||
self.assertTrue(sum((outputCat["detect_isPrimary"]) & (outputCat["sky_source"])) == 0) | ||
# make sure all parent sources are flagged as not primary | ||
self.assertTrue(sum((outputCat["detect_isPrimary"]) & (outputCat["deblend_nChild"] > 0)) == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this is efficient, but asserting the sum of two things being &
-ed should be equal to zero is ... not a very intuitive check to me. Could you please break this down a bit, or explain why you did it this way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are bitwise ANDs for two boolean lists, i.e. (for line 59 above), we're checking that zero objects are both detect_isPrimary == True
and sky_source == True
. If any object has been labelled as a sky source, then it should not also be labelled as primary. Does that make sense? Happy to add some extra comments to the code if you think it might be clearer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, the sum is what threw me off.
It might read a little easier as a few assertEqual statements instead? e.g., self.assertEqual(sum((outputCat["detect_isPrimary"]) & (outputCat["sky_source"])), 0)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That looks great, thanks for the suggestion - both asserts changed.
tests/test_isPrimaryFlag.py
Outdated
del self.exposure | ||
|
||
def testIsPrimaryFlag(self): | ||
"""Tests detect_isPrimary column gets added when run. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like it does more than this, please elaborate briefly, and specify this test is just for single frame processing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the docstring to read Tests detect_isPrimary column gets added when run, and that sources labelled as detect_isPrimary are not sky sources and have no children.
.
from lsst.pipe.tasks.calibrate import CalibrateTask, CalibrateConfig | ||
|
||
|
||
class IsPrimaryTestCase(lsst.utils.tests.TestCase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it would be quick to add a parallel test for the coadd case, that would be lovely, but it's not essential.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback on this. On balance, as this ticket is focussed on the single frame case, we think it best to push the coadd unit test to a future ticket and leave this one as-is, but this is certainly something worth coming back to once this has been merged.
@@ -36,25 +36,31 @@ class SetPrimaryFlagsConfig(Config): | |||
class SetPrimaryFlagsTask(Task): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a nice friendly class docstring per https://developer.lsst.io/python/numpydoc.html?documentation#py-docstring-class-structure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added this brief class docstring:
"""Add isPrimaryKey to a given schema.
Parameters
----------
schema :
The input schema.
isSingleFrame : `bool`
Flag specifying if task is operating with single frame imaging.
kwargs :
Keyword arguments passed to the task.
"""
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! The only missing piece is the type for schema
. I don't know it offhand... and that's exactly why having it in the docs is useful :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, thanks! Added (it's lsst.afw.table.Schema
!)
python/lsst/pipe/tasks/calibrate.py
Outdated
@@ -254,6 +255,10 @@ class CalibrateConfig(pipeBase.PipelineTaskConfig, pipelineConnections=Calibrate | |||
target=SingleFrameMeasurementTask, | |||
doc="Measure sources" | |||
) | |||
setPrimaryFlags = pexConfig.ConfigurableField( | |||
target=SetPrimaryFlagsTask, | |||
doc="Set flags for primary source classification in single frame processing" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please very briefly define what a primary source means here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to this: Set flags for primary source classification in single frame processing. True if sources are not sky sources and not a parent.
The coadd level catalogs include a detect_isPrimary flag, useful for culling catalogs of duplicate (patch/tract overlap), non-deblended parent sources (deblend_nChild>0), and sky objects. This commit adds an equivalent flag for single frame processing. Currently, the only two conditions for a source to be considered primary are that it is not a parent (undeblended) source, and not a sky source.
52ae739
to
d043b1a
Compare
d043b1a
to
d850385
Compare
No description provided.