New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-14534: Fix measurementInvestigationLib.makeRerunCatalog parent keys #117
Conversation
if not isinstance(oldCatalog, SourceCatalog): | ||
raise RuntimeError("oldCatalog must be an instance of " | ||
"lsst.afw.table.SourceCatalogiterable") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As it does not really make sense to specify both add parents and reset parents both, there should be a condition to check mutual exclusivity, and raise if they are both set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Setting resetParents=True will have no effect if addParents=True, so it seems excessive to raise an exception given clearer documentation for resetParents.
@@ -113,13 +131,30 @@ def makeRerunCatalog(schema, oldCatalog, idList, fields=None): | |||
if entry not in schema: | |||
schema.addField(oldCatalog.schema.find(entry).field) | |||
|
|||
# It's likely better to convert to a list and append | |||
idList = list(idList) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This loops through the whole iterable of idList once to create a new set object. You are then going to loop through again below. It is better to just make and empty container and then add to it in the loop below. Moreover this can be moved inside the if addParents loop, so nothing is done if there is nothing to be added.
idList = list(idList) | ||
|
||
if addParents: | ||
lenIdList = len(idList) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is very c like code consider the following:
if addParents:
newIdList = set() # For faster inserts and lookups
for srcId in idList:
newIdList.add(srcId) # add the srcId, will be unique if someone specified an id twice
parentId = oldCatalog.find(srcId).getParent()
if parentId:
newIdList.add(parentId) # No need to check if it is in the list already as sets are unique
idList = newIdList
idList = sorted(idList) # sort unconditionally outside the if block, sorted will take any iterable
measCat = SourceCatalog(schema) | ||
for srcId in idList: | ||
oldSrc = oldCatalog.find(srcId) | ||
src = measCat.addNew() | ||
src.setId(srcId) | ||
src.setFootprint(oldSrc.getFootprint()) | ||
src.setParent(oldSrc.getParent()) | ||
parent = oldSrc.getParent() | ||
if parent != 0 and resetParents and parent not in idList: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not all of this behavior is documented in the docstring. I would expect via the docstring that all parents are reset, not just the ones whos parents are not in the list.
resetParents is a strict bool, so it should go first in the logic, so no comparisons must be done if it is false. And or related to this, like above, you should not check if somethings is not equal to zero, as non zero values in python are truth-y
@@ -97,13 +98,30 @@ def makeRerunCatalog(schema, oldCatalog, idList, fields=None): | |||
keys that exist in both the old catalog and input schema. Fields listed | |||
will be copied from the old catalog into the new catalog. | |||
|
|||
resetParents: boolean | |||
Flag to indicate that child objects should have their parents set to 0. | |||
Otherwise, lsst.meas.base.SingleFrameMeasurementTask.runPlugins() will |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not really documentation about this parameter so much as it is about the function itself. Document what this parameter does, and put the rest of the description about what will happen in the general discussion at the start of the docstring
doc/ReconstructingMeasurements.rst
Outdated
|
||
|
||
At the moment, measTask.runPlugins will not run on any child | ||
objects if their parents are not also in the catalog. This issue can be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider the following:
"...in the catalog. The makeRerunCatalog function has two different optional flags to ensure measurement records are rerun. The reset..."
doc/ReconstructingMeasurements.rst
Outdated
children whose parents are not included in the idList, effectively turning | ||
them into parents. The ``addParents`` flag (False by default) will add these | ||
parents to the idList. Some timed examples follow: | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the example below, all the timing code is unrelated to the topic of this document, and obfuscates the point being made. Examples should have the bare minimum amount of extra utility and concepts to demonstrate a point.
Consider instead counting the number of nans (or zeros if that is the default) in a column (pick which ever makes sense) before and after turning on resetParents, to show that the number decreases. You can also show that before turning on reset parents the number of bad values is equal to the number of sources that are children. To demonstrate adding parents, you can show the number of bad values as above, but you can also show the length of the inputId list vs the length of the output rerun catalog and how the later is now larger.
doc/ReconstructingMeasurements.rst
Outdated
) | ||
print("New (addParents=True, resetParents=False): ", timeRunPlugins()) | ||
measTask.runPlugins(rebuildNoiseReplacer(exposure, srcCat), newSrcCatalog, exposure) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only last comment will be to add a code comment explaining the expected output, aka with parents more sources will be in the output, like you did in the above two cases
…keys Add two options to either resetParents (default) to zero or addParents (default=False) to the rerunCatalog in makeRerunCatalog so that runPlugins doesn't skip children with no parent.
Adds two options to makeRerunCatalog to either resetParents (default=True) to zero or addParents (default=False), so that child objects with parents outside of the rerunCatalog aren't skipped by runPlugins.