Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-32034: Create MatchProbabilisticTask #157

Merged
merged 1 commit into from Dec 6, 2021
Merged

DM-32034: Create MatchProbabilisticTask #157

merged 1 commit into from Dec 6, 2021

Conversation

taranu
Copy link
Contributor

@taranu taranu commented Dec 1, 2021

No description provided.

Copy link
Contributor

@morriscb morriscb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my main comment is to break up the parts of the match method in matcher_probabilistic into some smaller methods sub methods. As add some comments to explain the more complicated parts.

columns_target_select_false = pexConfig.ListField(
dtype=str,
default=('merge_peak_sky',),
doc='Target table columns to require to be True for selecting match candidates',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this doc right?

finite = np.isfinite(chi)
n_finite = np.sum(finite, axis=1)
chisq_good = n_finite >= config.match_n_finite_min
if np.any(chisq_good):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So does this mean you don't look at any object with NaN errors even if other columns are valid?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I suppose that's true. I guess I could add a min/max/default error or something, but then the user can also just modify the catalogs before passing them in.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to only compare the non-NaN columns?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's already based on the non-NaN columns, and will succeed as long as there are more than n_finite finite chi values. It's just that it doesn't distinguish between measurements vs errors being NaN (but again, the user can modify the input catalogs if they care, and we really shouldn't be producing finite values with NaN errors in the first place).

):
self.config = config

def match(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code could use some inline comments describing what is happening at a give step.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, would it make sense to break some of this function up into sub sections? That would certainly make the testing easier.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there would be much benefit in breaking it up, as any of the sub-functions I could think of defining would have large numbers of arguments/return values but still not really make sense to call in any other context.

class MatchProbabilisticTask(pipeBase.Task):
"""Run MatchProbabilistic on a reference and target catalog covering the same tract.
"""
ConfigClass = MatchProbabilisticConfig
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there no configs specific to the task? Would it make more sense to have the prob matcher be a configurable subtask of the main task? It's fine if not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify, I could have folded the Matcher into the Task instead of having a separate class, but this way the Matcher has no stack dependencies besides pex_config (which I believe can be installed as a standalone package itself), and in principle can be imported independently without building the package. Whether that ever ends up being useful to anyone remains to be seen.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Like I said on the other repo, I think you already take care of this by having the configurable in the pipeline class.

for column in config.columns_target_select_false:
select_target &= ~catalog_target[column].values

logger.info(f'Beginning MatcherProbabilistic.match with {np.sum(select_ref)}/{len(select_ref)}'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do not use f-strings with loggers. The logger must use % formatting and must be given as parameters at the end. The reason is to defer the stringification until we know we need it. There are four other places (at least) that need to be fixed.

@taranu taranu force-pushed the tickets/DM-32034 branch 2 times, most recently from f7929f0 to e72c3a3 Compare December 6, 2021 16:25
@taranu taranu merged commit ac2d92b into main Dec 6, 2021
@taranu taranu deleted the tickets/DM-32034 branch December 6, 2021 17:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants