Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-14534: Fix measurementInvestigationLib.makeRerunCatalog parent keys #117

Merged
merged 1 commit into from Jun 26, 2018

Conversation

taranu
Copy link
Contributor

@taranu taranu commented May 30, 2018

Adds two options to makeRerunCatalog to either resetParents (default=True) to zero or addParents (default=False), so that child objects with parents outside of the rerunCatalog aren't skipped by runPlugins.

@taranu taranu requested a review from natelust May 30, 2018 16:57
if not isinstance(oldCatalog, SourceCatalog):
raise RuntimeError("oldCatalog must be an instance of "
"lsst.afw.table.SourceCatalogiterable")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it does not really make sense to specify both add parents and reset parents both, there should be a condition to check mutual exclusivity, and raise if they are both set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting resetParents=True will have no effect if addParents=True, so it seems excessive to raise an exception given clearer documentation for resetParents.

@@ -113,13 +131,30 @@ def makeRerunCatalog(schema, oldCatalog, idList, fields=None):
if entry not in schema:
schema.addField(oldCatalog.schema.find(entry).field)

# It's likely better to convert to a list and append
idList = list(idList)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This loops through the whole iterable of idList once to create a new set object. You are then going to loop through again below. It is better to just make and empty container and then add to it in the loop below. Moreover this can be moved inside the if addParents loop, so nothing is done if there is nothing to be added.

idList = list(idList)

if addParents:
lenIdList = len(idList)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is very c like code consider the following:

if addParents:
    newIdList = set() # For faster inserts and lookups
    for srcId in idList:
        newIdList.add(srcId) # add the srcId, will be unique if someone specified an id twice
        parentId = oldCatalog.find(srcId).getParent()
        if parentId:
            newIdList.add(parentId) # No need to check if it is in the list already as sets are unique
    idList = newIdList
idList = sorted(idList) # sort unconditionally outside the if block, sorted will take any iterable 

measCat = SourceCatalog(schema)
for srcId in idList:
oldSrc = oldCatalog.find(srcId)
src = measCat.addNew()
src.setId(srcId)
src.setFootprint(oldSrc.getFootprint())
src.setParent(oldSrc.getParent())
parent = oldSrc.getParent()
if parent != 0 and resetParents and parent not in idList:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not all of this behavior is documented in the docstring. I would expect via the docstring that all parents are reset, not just the ones whos parents are not in the list.

resetParents is a strict bool, so it should go first in the logic, so no comparisons must be done if it is false. And or related to this, like above, you should not check if somethings is not equal to zero, as non zero values in python are truth-y

@@ -97,13 +98,30 @@ def makeRerunCatalog(schema, oldCatalog, idList, fields=None):
keys that exist in both the old catalog and input schema. Fields listed
will be copied from the old catalog into the new catalog.

resetParents: boolean
Flag to indicate that child objects should have their parents set to 0.
Otherwise, lsst.meas.base.SingleFrameMeasurementTask.runPlugins() will
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not really documentation about this parameter so much as it is about the function itself. Document what this parameter does, and put the rest of the description about what will happen in the general discussion at the start of the docstring



At the moment, measTask.runPlugins will not run on any child
objects if their parents are not also in the catalog. This issue can be
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider the following:
"...in the catalog. The makeRerunCatalog function has two different optional flags to ensure measurement records are rerun. The reset..."

children whose parents are not included in the idList, effectively turning
them into parents. The ``addParents`` flag (False by default) will add these
parents to the idList. Some timed examples follow:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the example below, all the timing code is unrelated to the topic of this document, and obfuscates the point being made. Examples should have the bare minimum amount of extra utility and concepts to demonstrate a point.

Consider instead counting the number of nans (or zeros if that is the default) in a column (pick which ever makes sense) before and after turning on resetParents, to show that the number decreases. You can also show that before turning on reset parents the number of bad values is equal to the number of sources that are children. To demonstrate adding parents, you can show the number of bad values as above, but you can also show the length of the inputId list vs the length of the output rerun catalog and how the later is now larger.

@taranu taranu self-assigned this Jun 8, 2018
)
print("New (addParents=True, resetParents=False): ", timeRunPlugins())
measTask.runPlugins(rebuildNoiseReplacer(exposure, srcCat), newSrcCatalog, exposure)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only last comment will be to add a code comment explaining the expected output, aka with parents more sources will be in the output, like you did in the above two cases

…keys

Add two options to either resetParents (default) to zero or addParents
(default=False) to the rerunCatalog in makeRerunCatalog so that runPlugins
doesn't skip children with no parent.
@taranu taranu merged commit 37e07f5 into master Jun 26, 2018
@ktlim ktlim deleted the tickets/DM-14534 branch August 25, 2018 06:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants