Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-28542: Implement RFC-750 #457

Merged
merged 2 commits into from Mar 17, 2021
Merged

DM-28542: Implement RFC-750 #457

merged 2 commits into from Mar 17, 2021

Conversation

fred3m
Copy link
Contributor

@fred3m fred3m commented Jan 29, 2021

This PR fixes a few things:

  1. It refactors isPrimary to use vectorized columns
    instead of iteratinf over sources
  2. It appropriately uses the scarlet outputs to choose
    which type of isolated sources are characterized
    by isPrimary, and it adds the columns isSimpleLeaf
    and isModelLeaf to the catalog to differentiate
    between the two different types of isolted sources
  3. Sets isPrimary to isSimpleLeaf, mimicking the behavior
    of meas_deblender

python/lsst/pipe/tasks/setPrimaryFlags.py Outdated Show resolved Hide resolved
python/lsst/pipe/tasks/setPrimaryFlags.py Outdated Show resolved Hide resolved
python/lsst/pipe/tasks/setPrimaryFlags.py Outdated Show resolved Hide resolved
python/lsst/pipe/tasks/setPrimaryFlags.py Outdated Show resolved Hide resolved
python/lsst/pipe/tasks/setPrimaryFlags.py Outdated Show resolved Hide resolved
isolated parents, and a ModelLeaf, which uses
the scarlet models for both isolated and blended sources.
In the case of meas_deblender, a ModelLeaf is the same thing
as a blended source, and a SimpleModel has the same definition.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SimpleModel?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, SimpleLeaf.

nChildKey = "deblend_nChild"
nChild = sources[nChildKey]
parent = sources["parent"]
isModelLeaf = (parent != 0) & (nChild == 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this explicitly exclude all truly isolated sources (i.e. with parent == 0 & nChild == 0)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because there are no parent==0 & nChild == 0 sources in the scarlet outputs. nChild describes the number of children in the catalog deblended from a parent, and because scarlet will deblend all of the isolated sources, all of the isolated parents have nChild == 1. Maybe that's what you were missing? If you can think of a way for me to make that more clear in the docs let me know.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, not in scarlet...but what if someone retargets with meas_deblender? Another way to ask the question is "should there be a difference between isSimpleLeaf and isModelLeaf for catalogs deblended with meas_deblender? Taking it a step further, I'm inclined argue that meas_deblender deblended catalogues should not have the isSimpleLeaf and isModelLeaf columns added since they don't really hold any meaning there.

I just ran processCcd on this branch, so only meas_deblender is in play. The source catalog reveals the following:

In [51]: np.all(srcNew["detect_isPrimary"] == srcNew["detect_isSimpleLeaf"])
Out[51]: True
In [52]: np.all(srcNew["detect_isPrimary"] == srcNew["detect_isModelLeaf"])
Out[52]: False

So...either the difference between isSimpleLeaf and isModelLeaf is meaningless and those columns should not even be there, or they have meaning and I'm still not "getting it" 😜

python/lsst/pipe/tasks/setPrimaryFlags.py Outdated Show resolved Hide resolved
python/lsst/pipe/tasks/setPrimaryFlags.py Outdated Show resolved Hide resolved
python/lsst/pipe/tasks/setPrimaryFlags.py Outdated Show resolved Hide resolved
@laurenam
Copy link
Contributor

Commit message: iteratinf --> iterating?

Also, I don't know if it's quite accurate to say anything is "mimicking" the isPrimary setting of meas_deblender. Perhaps state it more loosely, something along the lines "coming as close as possible to..."

@fred3m
Copy link
Contributor Author

fred3m commented Feb 16, 2021

Also, I don't know if it's quite accurate to say anything is "mimicking" the isPrimary setting of meas_deblender. Perhaps state it more loosely, something along the lines "coming as close as possible to..."

It think that this was already answered in my previous comments, but just to make sure that this comment isn't dangling, isPrimary returns the exact same set of sources (unblended isolated sources and children from parent blends with more than 1 source) as meas_deblender.

python/lsst/pipe/tasks/setPrimaryFlags.py Outdated Show resolved Hide resolved
nChildKey = "deblend_nChild"
nChild = sources[nChildKey]
parent = sources["parent"]
isModelLeaf = (parent != 0) & (nChild == 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, not in scarlet...but what if someone retargets with meas_deblender? Another way to ask the question is "should there be a difference between isSimpleLeaf and isModelLeaf for catalogs deblended with meas_deblender? Taking it a step further, I'm inclined argue that meas_deblender deblended catalogues should not have the isSimpleLeaf and isModelLeaf columns added since they don't really hold any meaning there.

I just ran processCcd on this branch, so only meas_deblender is in play. The source catalog reveals the following:

In [51]: np.all(srcNew["detect_isPrimary"] == srcNew["detect_isSimpleLeaf"])
Out[51]: True
In [52]: np.all(srcNew["detect_isPrimary"] == srcNew["detect_isModelLeaf"])
Out[52]: False

So...either the difference between isSimpleLeaf and isModelLeaf is meaningless and those columns should not even be there, or they have meaning and I'm still not "getting it" 😜

isPatchInner = getPatchInner(sources, patchInfo)
isTractInner = getTractInner(sources, tractInfo, skyMap)
isPseudo = getPseudoSources(sources, self.config.pseudoFilterList, self.schema, self.log)
isPrimary = isTractInner & isPatchInner & ~isPseudo
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, so this is not really isPrimary as defined until line 272 is executed...it's just a step on its way to being "for reals" isPrimary.

tests/test_isPrimaryFlag.py Outdated Show resolved Hide resolved
tests/test_isPrimaryFlag.py Show resolved Hide resolved
tests/test_isPrimaryFlag.py Show resolved Hide resolved
tests/test_isPrimaryFlag.py Outdated Show resolved Hide resolved
outputCat = calibResults.outputCat
self.assertTrue("detect_isPrimary" in outputCat.schema.getNames())
# make sure all sky sources are flagged as not primary
self.assertEqual(sum((outputCat["detect_isPrimary"]) & (outputCat["sky_source"])), 0)
# make sure all parent sources are flagged as not primary
self.assertEqual(sum((outputCat["detect_isPrimary"]) & (outputCat["deblend_nChild"] > 0)), 0)

def testIsScarletPrimaryFlag(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this test has some "issues"...debugging output follows (I wasn't following your skyMap mocking, so had to go digging...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be cleaned up in the new commit. The previous version was using an older version of the mock skymap, tract, etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks. Now that you have the functionality, would you mind adding a similar test for meas_deblender runs and, in particular, making sure any scarlet-specific columns (whatever they be named) are NOT present? I.e. something like (warning: not fully tested):

    def testIsMeasDeblenderPrimaryFlag(self):
        """Test detect_isPrimary column for meas_deblender runs.
        """
        coadd = self.exposure
        # Create a SkyMap with a tract that contains a portion of the image,
        # subdivided into 3x3 patches.
        wcs = coadd.getWcs()
        tractBBox = Box2I(Point2I(100, 100), Extent2I(900, 900))
        skyMap = MockSkyMap([tractBBox], wcs, 3)
        tractInfo = skyMap[0]
        patchInfo = tractInfo[0, 0]
        patchBBox = patchInfo.getInnerBBox()

        schema = SourceCatalog.Table.makeMinimalSchema()
        # Initialize the detection task
        detectionTask = SourceDetectionTask(schema=schema)

        # Initialize the fake source injection task
        skyConfig = SkyObjectsTask.ConfigClass()
        skySourcesTask = SkyObjectsTask(name="skySources", config=skyConfig)
        schema.addField("merge_peak_sky", type="Flag")

        # Initialize the deblender task
        deblendTask = SourceDeblendTask(schema=schema)

        # We'll customize the configuration of measurement to just run the
        # minimal number of plugins to make setPrimaryFlags work.
        measureConfig = SingleFrameMeasurementTask.ConfigClass()
        measureConfig.plugins.names = ["base_SdssCentroid", "base_SkyCoord"]
        measureConfig.slots.psfFlux = None
        measureConfig.slots.apFlux = None
        measureConfig.slots.shape = None
        measureConfig.slots.modelFlux = None
        measureConfig.slots.calibFlux = None
        measureConfig.slots.gaussianFlux = None
        measureTask = SingleFrameMeasurementTask(config=measureConfig, schema=schema)
        primaryConfig = SetPrimaryFlagsTask.ConfigClass()
        setPrimaryTask = SetPrimaryFlagsTask(config=primaryConfig, schema=schema,
                                             name="setPrimaryFlags", isSingleFrame=False)
        table = SourceCatalog.Table.make(schema)
        # detect sources
        detectionResult = detectionTask.run(table=table, exposure=coadd)
        catalog = detectionResult.sources
        # add fake sources
        skySources = skySourcesTask.run(mask=coadd.mask, seed=0)
        for foot in skySources[:5]:
            src = catalog.addNew()
            src.setFootprint(foot)
            src.set("merge_peak_sky", True)
        # deblend
        deblendTask.run(exposure=coadd, sources=catalog)
        catalog = catalog.copy(deep=True)
        measureTask.run(measCat=catalog, exposure=coadd)
        outputCat = catalog.copy(deep=True)
        # Set the primary flags
        setPrimaryTask.run(outputCat, skyMap=skyMap, tractInfo=tractInfo, patchInfo=patchInfo)

        # Check that SCARLET-only columns are NOT in the schema
        self.assertTrue("SCARLET_NAME" not in outputCat.schema.getNames())

        # Check that the sources contained in a tract are all marked
        # appropriately.
        x = outputCat["slot_Centroid_x"]
        y = outputCat["slot_Centroid_y"]
        tractInner = tractBBox.contains(x, y)
        # tractInner = bboxContainsColumn(tractBBox, x, y)
        np.testing.assert_array_equal(outputCat["detect_isTractInner"], tractInner)

        # Check that the sources contained in a patch are all marked
        # appropriately.
        patchInner = bboxContainsColumn(patchBBox, x, y)
        np.testing.assert_array_equal(outputCat["detect_isPatchInner"], patchInner)

        # make sure all sky sources are flagged as not primary
        self.assertEqual(sum((outputCat["detect_isPrimary"]) & (outputCat["merge_peak_sky"])), 0)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that this is necessary, since meas_deblender (SourceDeblendTask) is run in testIsSinglePrimaryFlag. I can add some checks to make sure that the scarlet columns are not contained in the outputs, but I don't think that this test will touch any code that isn't covered by the two existing tests. Please correct me if I missed something.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may seem like slight overkill, but testIsSinglePrimary above is for a single frame-like catalog (no tract or patch, different sky column name...different code path to setting the flags). It actually wasn’t tested before because no one had gone to the trouble of mocking up a skymap...but you did, so I’m taking full advantage of your efforts (but wrote the above myself in hopes of sparing you too much extra work for your efforts!)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that my latest push corrected everything but this. It isn't that it's an inconvenience to implement this test, it would be trivial to implement a deblender generic function and call it once with ScarletDeblendTask and once with SourceDeblendTask. My issue with this request is that it doesn't actually test any new code paths but will add to the overall runtime of the already long pipe_tasks tests. All of the deblend flags are logically separate from the tract/patch code blocks, and looking back through the code I am fairly confident that this module is fully covered without it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We seem to be at odds on this one. My point is that the code path followed by a meas_deblender multiband run and the flag setting therein is not tested. It has never had a test (no mocked up skymap, yadda, yadda...), but now that you've added the functionality, an explicit test of the flag setting of this code path could easily be added. This long-standing failure-to-test the existing code path is not on you, though, so I will leave this to your discretion if you prefer not to add one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, either I'm not understanding you or you're not understanding me, but I'll give it one last shot.

You are correct that when meas_deblender is run, the flags set by deblending are not tested with the flags from a skymap. So if the results of the deblender were dependent on those flags in any way, then not testing that would be missing a possible source of failure. Or, if the flags set by the deblender or isPrimary were in any way correlated then this would also be something that needs to be tested.

But... the deblender is completely isolated and independent of those flags. Similarly, the creation of the deblender flags is completely independent of the patch/tract flags. So it doesn't matter which deblender the patch/tract flags are tested against, as long as they are tested against a set of deblender flags, which they are. I'm fairly certain that coverage of this module is 100%, but if you can find a code path that isn't tested then please let me know.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'm convinced this can be left as is!

isSimpleLeaf: `bool` array
True for each source that is a `SimpleLeaf` as defined above.
isModelLeaf: `bool` array
True for each sources that is a `ModelLeaf` as defined above.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll add a comment there. Should I also bring up the question of whether the scarlet-based "leaf" columns should be added to a meas_deblender schema?

"detect_isModelLeaf", type="Flag",
doc=primaryDoc + " and is a deblended child")

def run(self, sources, skyMap=None, tractInfo=None, patchInfo=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not quite my point...

@fred3m
Copy link
Contributor Author

fred3m commented Feb 22, 2021

Thanks for looking into the tests more. I actually tested them all in a jupyter notebook and verified that they were all working properly, not returning all False. So I was surprised to see your comments but after seeing I checked the tests again it looks like I didn’t update the repo with the correct version of the tests to make them work correctly. :(

So I'll update the tests and push those, as well as try to address some of your other comments.

tests/test_isPrimaryFlag.py Show resolved Hide resolved
tests/test_isPrimaryFlag.py Outdated Show resolved Hide resolved
tests/test_isPrimaryFlag.py Outdated Show resolved Hide resolved
python/lsst/pipe/tasks/setPrimaryFlags.py Outdated Show resolved Hide resolved
python/lsst/pipe/tasks/setPrimaryFlags.py Outdated Show resolved Hide resolved
python/lsst/pipe/tasks/setPrimaryFlags.py Outdated Show resolved Hide resolved

Some categories of sources, for example sky objects,
are not really detected sources and should not be considered primary
sources.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having trouble trying to find a pattern on your choices for line-breaks. You don't seem to ever really use the full 79 chars, and sometimes truncate differently in the same set of lines (as above!).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with having a line limit for docstrings and code comments is that if you write all the way to the end of the line and want to edit it later, you have to reformat the entire paragraph after the edit, even if you're a single character or two over the very small line limit. So I always give myself a large buffer and try to break at logical points where there might be an edit later, for example a comma or period. But at the end of the paragraph, if there are only a few words on the next line, I don't really care as much about leaving space so I tend to go write all the way to the end.

It's why I'm a huge supporter of a suggested line length for docstrings and comments but no strict enforcement, which keeps the diffs down and makes it easier (and thus more likely) that I'll keep docs up to date.

outputCat = calibResults.outputCat
self.assertTrue("detect_isPrimary" in outputCat.schema.getNames())
# make sure all sky sources are flagged as not primary
self.assertEqual(sum((outputCat["detect_isPrimary"]) & (outputCat["sky_source"])), 0)
# make sure all parent sources are flagged as not primary
self.assertEqual(sum((outputCat["detect_isPrimary"]) & (outputCat["deblend_nChild"] > 0)), 0)

def testIsScarletPrimaryFlag(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks. Now that you have the functionality, would you mind adding a similar test for meas_deblender runs and, in particular, making sure any scarlet-specific columns (whatever they be named) are NOT present? I.e. something like (warning: not fully tested):

    def testIsMeasDeblenderPrimaryFlag(self):
        """Test detect_isPrimary column for meas_deblender runs.
        """
        coadd = self.exposure
        # Create a SkyMap with a tract that contains a portion of the image,
        # subdivided into 3x3 patches.
        wcs = coadd.getWcs()
        tractBBox = Box2I(Point2I(100, 100), Extent2I(900, 900))
        skyMap = MockSkyMap([tractBBox], wcs, 3)
        tractInfo = skyMap[0]
        patchInfo = tractInfo[0, 0]
        patchBBox = patchInfo.getInnerBBox()

        schema = SourceCatalog.Table.makeMinimalSchema()
        # Initialize the detection task
        detectionTask = SourceDetectionTask(schema=schema)

        # Initialize the fake source injection task
        skyConfig = SkyObjectsTask.ConfigClass()
        skySourcesTask = SkyObjectsTask(name="skySources", config=skyConfig)
        schema.addField("merge_peak_sky", type="Flag")

        # Initialize the deblender task
        deblendTask = SourceDeblendTask(schema=schema)

        # We'll customize the configuration of measurement to just run the
        # minimal number of plugins to make setPrimaryFlags work.
        measureConfig = SingleFrameMeasurementTask.ConfigClass()
        measureConfig.plugins.names = ["base_SdssCentroid", "base_SkyCoord"]
        measureConfig.slots.psfFlux = None
        measureConfig.slots.apFlux = None
        measureConfig.slots.shape = None
        measureConfig.slots.modelFlux = None
        measureConfig.slots.calibFlux = None
        measureConfig.slots.gaussianFlux = None
        measureTask = SingleFrameMeasurementTask(config=measureConfig, schema=schema)
        primaryConfig = SetPrimaryFlagsTask.ConfigClass()
        setPrimaryTask = SetPrimaryFlagsTask(config=primaryConfig, schema=schema,
                                             name="setPrimaryFlags", isSingleFrame=False)
        table = SourceCatalog.Table.make(schema)
        # detect sources
        detectionResult = detectionTask.run(table=table, exposure=coadd)
        catalog = detectionResult.sources
        # add fake sources
        skySources = skySourcesTask.run(mask=coadd.mask, seed=0)
        for foot in skySources[:5]:
            src = catalog.addNew()
            src.setFootprint(foot)
            src.set("merge_peak_sky", True)
        # deblend
        deblendTask.run(exposure=coadd, sources=catalog)
        catalog = catalog.copy(deep=True)
        measureTask.run(measCat=catalog, exposure=coadd)
        outputCat = catalog.copy(deep=True)
        # Set the primary flags
        setPrimaryTask.run(outputCat, skyMap=skyMap, tractInfo=tractInfo, patchInfo=patchInfo)

        # Check that SCARLET-only columns are NOT in the schema
        self.assertTrue("SCARLET_NAME" not in outputCat.schema.getNames())

        # Check that the sources contained in a tract are all marked
        # appropriately.
        x = outputCat["slot_Centroid_x"]
        y = outputCat["slot_Centroid_y"]
        tractInner = tractBBox.contains(x, y)
        # tractInner = bboxContainsColumn(tractBBox, x, y)
        np.testing.assert_array_equal(outputCat["detect_isTractInner"], tractInner)

        # Check that the sources contained in a patch are all marked
        # appropriately.
        patchInner = bboxContainsColumn(patchBBox, x, y)
        np.testing.assert_array_equal(outputCat["detect_isPatchInner"], patchInner)

        # make sure all sky sources are flagged as not primary
        self.assertEqual(sum((outputCat["detect_isPrimary"]) & (outputCat["merge_peak_sky"])), 0)

python/lsst/pipe/tasks/setPrimaryFlags.py Outdated Show resolved Hide resolved
python/lsst/pipe/tasks/setPrimaryFlags.py Show resolved Hide resolved
python/lsst/pipe/tasks/setPrimaryFlags.py Outdated Show resolved Hide resolved
python/lsst/pipe/tasks/setPrimaryFlags.py Outdated Show resolved Hide resolved
their own.
isIsolated : array-like of `bool`
True for isolated sources, regardless of whether or not they
were deblended.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be missing something, but aren’t these redundant columns (i.e. doesn’t isIsolated imply ~isBlended?). If there are cases for which they can both both be True (or False), can you describe those here (and perhaps update the names such that they don't read as if they are the NOTs of each other)? I don’t get from the docs what category a parent with multiple children would be in...I believe it has to be isBlended, but a parent is not “deblended from a parent”.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking back at the docs I don't know how to be more clear, so perhaps you can help me trim down this long explanation into something that can fit into the docs.

top level parents with multiple peaks: not contained in either isBlended or isIsolated.
top level parents skipped for some reason: not contained in either isBlended or isIsolated.
top level parents with only a single peak: isIsolated
source deblended from a parent with a single peak: isIsolated
source deblended from a parent with multiple peaks: isBlended

So the two are not complementary and we should keep the names, as they describe exactly what they contain. It would be possible to add another flag, isBlend, where isBlend = (parent == 0) & ~isIsolated, so that the entire source catalog would be the combination of the orthogonal flags isBlended | isIolated | isBlend. If you really want me to I can add it, but our catalogs are already large enough and I don't see the point of adding a new flag that will never be used just to clarify isBlended and isIsolated. We should be able to do this in the docs, but obviously I haven't hit on the right explanation yet.

Copy link
Contributor

@laurenam laurenam Mar 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess my main sticking point is in part the tense implied by isBlended. Really, this represents a source that wasBlended or isDeblendedChildOfBlend, i.e. the "has been deblended from a parent with more than one child" nature just doesn't come across naturally to me with isBlended. These suggestions might be (ok, are!) terrible for other reasons, so I will leave it to you if you want to change them but I suspect I'm not the only one who will be confused...as I mentioned below, just the first line in the doc jars with the name (isBlended --> "...source that was deblended..."). At the very least, could you add those double-False cases to the docs (prolly for isBlended)...I totally agree we don't want to add yet another flag column!

isolated parents, and a DeblendedModelPrimary, which uses
the scarlet models for both isolated and blended sources.
In the case of meas_deblender, DeblendedModelPrimary is
`None` because it is not contained in the output catalog.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, was there consensus on using Primary in a name that does not imply any of its usual meanings? I would prefer using (your original) Leaf or something else (Branch, Channel, Fork?) that does not already carry the weight of deeply ingrained implied conditions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, this is my mistake. I had an additional comment in JIRA but I apparently never hit Add so while I thought I made a post over a week ago, apparently I did not. I just added (a slightly modified version of) that post now, so let's discuss this there.

"deblended child from a parent with 'deblend_nChild' > 1")
self.isBlendedKey = self.schema.addField(
"detect_isBlended", type="Flag",
doc="This source is deblended from a parent with more than one child."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it a bit head spinning for the doc of a column with isBlended in its name to read “This source is deblended...”

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion? The source is blended in the image, the deblender just attempts to make a model for it as if it wasn't. But if you're looking at an output catalog and want to know if a source is isolated or blended in an image, then ones with this flag are blended.

outputCat = calibResults.outputCat
self.assertTrue("detect_isPrimary" in outputCat.schema.getNames())
# make sure all sky sources are flagged as not primary
self.assertEqual(sum((outputCat["detect_isPrimary"]) & (outputCat["sky_source"])), 0)
# make sure all parent sources are flagged as not primary
self.assertEqual(sum((outputCat["detect_isPrimary"]) & (outputCat["deblend_nChild"] > 0)), 0)

def testIsScarletPrimaryFlag(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We seem to be at odds on this one. My point is that the code path followed by a meas_deblender multiband run and the flag setting therein is not tested. It has never had a test (no mocked up skymap, yadda, yadda...), but now that you've added the functionality, an explicit test of the flag setting of this code path could easily be added. This long-standing failure-to-test the existing code path is not on you, though, so I will leave this to your discretion if you prefer not to add one.

tests/test_isPrimaryFlag.py Outdated Show resolved Hide resolved
# over which model to use for the isolated sources.
self.assertEqual(
np.sum(outputCat["detect_isDeblendedPrimary"]),
np.sum(outputCat["detect_isDeblendedModelPrimary"]))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, check that these sums are non-zero (preferably checking against the known number)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a check that they are non-zero, but checking against a known number is not advisable. As you know so many things upstream affect the creation of footprints and peaks that the number of peaks and footprints changes with practically every weekly, causing this test to trip frequently when people make changes upstream.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: this is adjusted in DM-27929 because they will be slightly different now that meas_extensions_scarlet skips sky sources, but in a way that can be predicted and tested.


if "deblend_scarletFlux" in sources.schema:
# The number of peaks in the sources footprint.
# This is different than nChild, the number of sources in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't leave me hangin', man!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it's building suspense! :)

python/lsst/pipe/tasks/setPrimaryFlags.py Outdated Show resolved Hide resolved
their own.
isIsolated : array-like of `bool`
True for isolated sources, regardless of whether or not they
were deblended.
Copy link
Contributor

@laurenam laurenam Mar 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess my main sticking point is in part the tense implied by isBlended. Really, this represents a source that wasBlended or isDeblendedChildOfBlend, i.e. the "has been deblended from a parent with more than one child" nature just doesn't come across naturally to me with isBlended. These suggestions might be (ok, are!) terrible for other reasons, so I will leave it to you if you want to change them but I suspect I'm not the only one who will be confused...as I mentioned below, just the first line in the doc jars with the name (isBlended --> "...source that was deblended..."). At the very least, could you add those double-False cases to the docs (prolly for isBlended)...I totally agree we don't want to add yet another flag column!

outputCat = calibResults.outputCat
self.assertTrue("detect_isPrimary" in outputCat.schema.getNames())
# make sure all sky sources are flagged as not primary
self.assertEqual(sum((outputCat["detect_isPrimary"]) & (outputCat["sky_source"])), 0)
# make sure all parent sources are flagged as not primary
self.assertEqual(sum((outputCat["detect_isPrimary"]) & (outputCat["deblend_nChild"] > 0)), 0)

def testIsScarletPrimaryFlag(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'm convinced this can be left as is!

@laurenam
Copy link
Contributor

In principle, this PR should be titled (https://developer.lsst.io/work/flow.html#make-a-pull-request):
DM-28542: Implement RFC-750

@fred3m fred3m changed the title tickets/DM-28542: Refactor isPrimary to work with scarlet DM-28542: Implement RFC-750 Mar 15, 2021
@fred3m fred3m force-pushed the tickets/DM-28542 branch 2 times, most recently from 216ab49 to d42e70e Compare March 15, 2021 19:05
fred3m added 2 commits March 16, 2021 20:13
This commit fixes a few things:
1. It refactors isPrimary to use vectorized columns
   instead of iteratinf over sources
2. It appropriately uses the scarlet outputs to choose
   which type of isolated sources are characterized
   by isPrimary, and it adds new columns to differentiate
   between the different deblender results.
@fred3m fred3m merged commit 0be6b20 into master Mar 17, 2021
@fred3m fred3m deleted the tickets/DM-28542 branch March 17, 2021 01:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants