New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-34174: Replace PropagateVisitFlagsTask with PropagateSourceFlagsTask #659
Conversation
bef49f8
to
b9c72ea
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything looks good with one exception: without having looked at the ticket I wouldn't understand the difference between the source table and the finalized source table, so I think that throughout the PR the docs need to be updated to explain the difference between the two.
I'm also still confused about whether or not it is expected for there to be a scenario where both the source table flags and finalized source table flags are propagated. At least the comments in the default config make it appear as if they won't be regularly used. My recommendation would be to write the entire task to use a single input table and if there is a case where the user will need to propagate both sets of flags, they could run the task twice using different sets of inputs (sources vs finalized sources). Or am I missing something?
selected_src['calib_psf_candidate'] = np.zeros(len(selected_src), dtype=bool) | ||
selected_src['calib_psf_used'] = np.zeros(len(selected_src), dtype=bool) | ||
selected_src['calib_psf_reserved'] = np.zeros(len(selected_src), dtype=bool) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused. calib_psf_candidate
and calib_psf_reserved
are both set (not updated) below, so the above code seems unnecessary except for calib_psf_used
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've update the comment that I do this just to be extra sure that information from the input tables doesn't leak through.
type='Flag', | ||
doc=('set if source was used in the PSF determination by ' | ||
'FinalizeCharacterizationTask.'), | ||
) | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this code block commented out? It looks like it should be necessary based on compute_psf_and_ap_corr_map
and if it isn't, it should be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was a mistake. It is not necessary since the names are being carried through from the input catalog.
|
||
|
||
class PropagateSourceFlagsConfig(pexConfig.Config): | ||
"""Configuration for propagating source flags to coadd objects.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe change coadd objects
to coadd sources
here and everywhere else in the file. "Coadd objects" sounds more like some sort of class associated with a coadd as opposed to a source catalog.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we decided to leave this as coadd objects vs single-frame sources, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
source_flags = pexConfig.DictField( | ||
keytype=str, | ||
itemtype=float, | ||
default={ | ||
# TODO: DM-34391: when doApplyFinalizedPsf is the default, these flags | ||
# should be set below and not here. | ||
"calib_psf_candidate": 0.2, | ||
"calib_psf_used": 0.2, | ||
"calib_psf_reserved": 0.2, | ||
"calib_astrometry_used": 0.2, | ||
"calib_photometry_used": 0.2, | ||
"calib_photometry_reserved": 0.2 | ||
}, | ||
doc=("Source flags to propagate, with the threshold of relative occurrence " | ||
"(valid range: [0-1]). Coadd object will have flag set if fraction " | ||
"of input visits in which it is flagged is greater than the threshold."), | ||
) | ||
finalized_source_flags = pexConfig.DictField( | ||
keytype=str, | ||
itemtype=float, | ||
default={ | ||
# TODO: DM-34391: when doApplyFinalizedPsf is the default, these flags | ||
# should be set here and not above. | ||
# "calib_psf_candidate": 0.2, | ||
# "calib_psf_used": 0.2, | ||
# "calib_psf_reserved": 0.2 | ||
}, | ||
doc=("Finalized source flags to propagate, with the threshold of relative " | ||
"occurrence (valid range: [0-1]). Coadd object will have flag set if " | ||
"fraction of input visits in which it is flagged is greater than the " | ||
"threshold."), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused. From the looks of it, only one of source_flags
and finalized_source_flags
will be used by default, right? So why is this entire task setup to use both instead of having a single set of parameters and a config option useFinalized
or something similar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added more documentation to make it clear that there will be a mix of flags from different tables.
for flag in source_flag_counts: | ||
thresh = num_overlaps*self.config.source_flags[flag] | ||
object_flag = (source_flag_counts[flag] > thresh) | ||
coadd_object_cat[flag] = object_flag | ||
self.log.info("Propagated %d sources with flag %s", object_flag.sum(), flag) | ||
|
||
for flag in finalized_source_flag_counts: | ||
thresh = num_overlaps*self.config.finalized_source_flags[flag] | ||
object_flag = (finalized_source_flag_counts[flag] > thresh) | ||
coadd_object_cat[flag] = object_flag | ||
self.log.info("Propagated %d finalized sources with flag %s", object_flag.sum(), flag) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this should be a separate function or class method instead of using duplicate code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered this, but there are so many variables to pass I went with the rule-of-3:
is a code refactoring rule of thumb to decide when a replicated piece of code should be replaced by a new procedure. It states that you are allowed to copy and paste the code once, but that when the same code is replicated three times, it should be extracted into a new procedure. The rule was introduced by Martin Fowler in his text "Refactoring" and attributed to Don Roberts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough
counts_list, | ||
x_column_list, | ||
y_column_list): | ||
if handle_dict is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also check that len(columns) > 0
, that way if you are only using one of source_columns
and finalized_source_columns
it doesn't match up against a catalog with no flags to propagate.
b9c72ea
to
1746cf4
Compare
1746cf4
to
0edda7f
Compare
No description provided.