Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-25782: Add an "detect_isPrimary" flag to single frame processing schema #400

Merged
merged 4 commits into from Aug 4, 2020

Conversation

leeskelvin
Copy link
Contributor

No description provided.

doc="true if source has no children and is in the inner region of a coadd patch "
"and is in the inner region of a coadd tract "
"and is not \"detected\" in a pseudo-filter (see config.pseudoFilterList)",
doc=primaryDoc,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need both "doc" and "primaryDoc"? Where do these each appear? I also noticed the coadd primary doc does not specify anything about sky sources, while the single frame one does. Is that intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, the doc string for SetPrimaryFlagsTask is different depending on whether or not self.isSingleFrame is True (line 55) or False (line 43). If true, the doc string refers only to children and sky sources (or pseudo-filters, see below), however, if false, the doc string additionally refers to coadd patch and tract selections (which would not make sense in the single frame output). We use the if logic above to define primaryDoc, which is then subsequently placed into the main self.isPrimaryKey doc string. I think that's the best way to do it, but happy to consider another approach if you have something in mind?

With regards not mentioning sky sources in the coadd primary doc, the coadd primary doc refers to "pseduo-filters", of which sky sources are included. I agree that the language here may not be immediately obvious, but this was a decision made prior to this ticket and so we were reluctant to change it. I think the logic is that other additional 'pseudo-filters' can be added in addition to sky sources which would then be picked up by this task at the coadd level, but currently, only sky sources are implemented.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that is subtle! The order in which they're defined threw me off. For coadds, you define primaryDoc after the two docs because it includes them both. Can you follow this order for single frame even though it's not necessary, for consistency?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I don't quite follow. The two extra docs being defined for coadds are isPatchInnerKey and isTractInnerKey. These docs don't apply in the single frame case, so they're not being set there (so there's nothing to move primaryDoc above or below of). The single frame __init__ doc string is just the primaryDoc, which then gets added to isPrimaryKey. Sorry if I've missed something here, please let me know! (Edit: we could move the primaryDoc to the top of the coadd if block so that it's the first thing read, if that helps?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry :) I'm suggesting delete line 56, explicitly set doc = "true if source has no children and is not a sky source" on line 59, and then explicitly define primaryDoc as the last thing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for the feedback here. Marking the issue as resolved following our earlier discussion, cheers!

)

def run(self, sources, skyMap, tractInfo, patchInfo, includeDeblend=True):
def run(self, sources, skyMap=None, tractInfo=None, patchInfo=None,
includeDeblend=True):
"""Set is-primary and related flags on sources

@param[in,out] sources a SourceTable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please take this opportunity to numpydoc this run method 🙂 I also suggest expanding the main doc string a bit to remind us what "is-primary" means and what any "related flags" may be.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated this docstring to below (please let me know if this looks okay):

    """Set is-patch-inner, is-tract-inner and is-primary flags on sources.
    For coadded imaging, the is-primary flag returns True when an object
    has no children, is in the inner region of a coadd patch, is in the
    inner region of a coadd trach, and is not detected in a pseudo-filter
    (e.g., a sky_object).
    For single frame imaging, the is-primary flag returns True when a
    source has no children and is not a sky source.

    Parameters
    ----------
    sources :
        A sourceTable. Reads in centroid fields and an nChild field.
        Writes is-patch-inner, is-tract-inner, and is-primary flags.
    skyMap : lsst.skymap.BaseSkyMap
        Sky tessellation object
    tractInfo : lsst.skymap.TractInfo
        Tract object
    patchInfo : lsst.skymap.PatchInfo
        Patch object
    includeDeblend : `bool`
        Include deblend information in isPrimary?
    """

Copy link
Contributor

@mrawls mrawls Jul 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Please do add whatever type sources is. And put all the types in backticks, even the lsst ones.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!


source.setFlag(self.isPrimaryKey, isPatchInner and isTractInner and not isPseudo)
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comment indicating this is the single frame case, especially since the if was quite long

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments added prior to the start of both if and else.

# set inner flags for each source and set primary flags for sources with no children
# (or all sources if deblend info not available)
innerFloatBBox = Box2D(patchInfo.getInnerBBox())
if not self.isSingleFrame:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comment explicitly stating this is the coadd case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment added.

Comment on lines 125 to 132
hasSkySources = True if "sky_source" in sources.schema else False
for source in sources:
if nChildKey is None or source.get(nChildKey) == 0:
if hasSkySources:
if not source["sky_source"]:
source.setFlag(self.isPrimaryKey, True)
else:
source.setFlag(self.isPrimaryKey, True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm quite surprised this is so short when the corresponding code for coadds is so long. Please consider pulling out this core bit into its own "actually set primary key now that we've dealt with whatever-TF coadd pseudofilters are" function so there is less code duplication.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you suggesting that setFlag from both the coadd case and the single-frame case should be pulled out of the above if statement and put into its own section, or just that this single-frame case section should be cleaned up? If the latter, we did discuss this at length, but couldn't come up with a clean way to do it that was in keeping with the coadd case above - suggestions welcome!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that both cases had code identical to

for source in sources:
     if nChildKey is None or source.get(nChildKey) == 0:
         if hasSkySources:
             if not source["sky_source"]:
                 source.setFlag(self.isPrimaryKey, True)
         else:
             source.setFlag(self.isPrimaryKey, True)

but now I see there's a sneaky other pseudoFilterKeys loop hiding in the coadd case. Honestly the whole thing feels a bit too meandery in the if/else logic train; I don't want you to spend forever optimizing the coadd side of things when you were just going in trying to add the single frame side of things. But if you can come up with a way to create a "setPrimaryFlag" function that works for both, that would be a lot nicer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, after having thought about it for a while, I think the only option to achieve this would be to move the for loop one level higher and check whether or not we're working on coadd data or single-frame data within each iteration of the for loop. This might make the code clearer, but the introduction of an if-statement check for each iteration of the for loop may impact efficiency somewhat. With that said, and keeping the coadd block and the single-frame block separate, I've modified the single frame block to this (below) which should hopefully be more readable. The primary key is now only being assigned once (formerly twice). Thoughts welcome!

    # single frame case
    else:
        hasSkySources = True if "sky_source" in sources.schema else False
        for source in sources:
            hasNoChildren = True if nChildKey is None or source.get(nChildKey) == 0 else False
            isSkySource = False
            if hasSkySources:
                if source["sky_source"]:
                    isSkySource = True
            source.setFlag(self.isPrimaryKey, hasNoChildren and not isSkySource)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good - I like hasNoChildren, that adds a lot of clarity. It's fine to keep the coadd block and single frame blocks separate to avoid slowing down the looping.

Comment on lines 58 to 61
# make sure all sky sources are flagged as not primary
self.assertTrue(sum((outputCat["detect_isPrimary"]) & (outputCat["sky_source"])) == 0)
# make sure all parent sources are flagged as not primary
self.assertTrue(sum((outputCat["detect_isPrimary"]) & (outputCat["deblend_nChild"] > 0)) == 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this is efficient, but asserting the sum of two things being &-ed should be equal to zero is ... not a very intuitive check to me. Could you please break this down a bit, or explain why you did it this way?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are bitwise ANDs for two boolean lists, i.e. (for line 59 above), we're checking that zero objects are both detect_isPrimary == True and sky_source == True. If any object has been labelled as a sky source, then it should not also be labelled as primary. Does that make sense? Happy to add some extra comments to the code if you think it might be clearer?

Copy link
Contributor

@mrawls mrawls Jul 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, the sum is what threw me off.

It might read a little easier as a few assertEqual statements instead? e.g., self.assertEqual(sum((outputCat["detect_isPrimary"]) & (outputCat["sky_source"])), 0) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks great, thanks for the suggestion - both asserts changed.

del self.exposure

def testIsPrimaryFlag(self):
"""Tests detect_isPrimary column gets added when run.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it does more than this, please elaborate briefly, and specify this test is just for single frame processing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the docstring to read Tests detect_isPrimary column gets added when run, and that sources labelled as detect_isPrimary are not sky sources and have no children..

from lsst.pipe.tasks.calibrate import CalibrateTask, CalibrateConfig


class IsPrimaryTestCase(lsst.utils.tests.TestCase):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it would be quick to add a parallel test for the coadd case, that would be lovely, but it's not essential.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback on this. On balance, as this ticket is focussed on the single frame case, we think it best to push the coadd unit test to a future ticket and leave this one as-is, but this is certainly something worth coming back to once this has been merged.

@@ -36,25 +36,31 @@ class SetPrimaryFlagsConfig(Config):
class SetPrimaryFlagsTask(Task):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added this brief class docstring:

"""Add isPrimaryKey to a given schema.

Parameters
----------
schema :
    The input schema.
isSingleFrame : `bool`
    Flag specifying if task is operating with single frame imaging.
kwargs :
    Keyword arguments passed to the task.
"""

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! The only missing piece is the type for schema. I don't know it offhand... and that's exactly why having it in the docs is useful :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thanks! Added (it's lsst.afw.table.Schema!)

@@ -254,6 +255,10 @@ class CalibrateConfig(pipeBase.PipelineTaskConfig, pipelineConnections=Calibrate
target=SingleFrameMeasurementTask,
doc="Measure sources"
)
setPrimaryFlags = pexConfig.ConfigurableField(
target=SetPrimaryFlagsTask,
doc="Set flags for primary source classification in single frame processing"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please very briefly define what a primary source means here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to this: Set flags for primary source classification in single frame processing. True if sources are not sky sources and not a parent.

The coadd level catalogs include a detect_isPrimary flag, useful for culling
catalogs of duplicate (patch/tract overlap), non-deblended parent sources
(deblend_nChild>0), and sky objects. This commit adds an equivalent flag for
single frame processing. Currently, the only two conditions for a source to
be considered primary are that it is not a parent (undeblended) source, and
not a sky source.
@leeskelvin leeskelvin merged commit 6654009 into master Aug 4, 2020
@leeskelvin leeskelvin deleted the tickets/DM-25782 branch August 4, 2020 18:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants