New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-26987: Update filtering of matched catalog to operate before creating GroupView #118
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a few small code comments, overall looks like a good fix.
pipelines/preparation/preparation_matched_multi_jointcal_fgcm.yaml
Outdated
Show resolved
Hide resolved
faintMagCut: 21.5 | ||
selectExtended: False | ||
python: | | ||
config.connections.outputCatalog = 'matchedCatalogTractMag17to21p5' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't deduce what the p5
means.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's my shorthand for 21.5, since I don't want to put the '.' in there...
Jenkins job kicked off here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Colin's comments look to be addressed. I have not additional comments
37ce503
to
90e9d96
Compare
Successful Jenkins run here |
Implemented filtering of catalogs before running multiMatch to create the matched catalogs. This involved:
a) writing a new "prefilter" function, which is called from matcher.py
b) adding config params to control the filtering to the base matched catalog class
c) re-writing some pipelines to allow for different cuts to be applied for different metrics (e.g., the "default" SNR cut for photometric repeatability, but magnitude ranges for astrometric repeatability)
Part (c) has the effect of writing additional matched catalogs rather than using the same one for all metrics. However, I thought it was worth the extra steps to make the entire pipeline more efficient. We could reconsider this if needed.
Some simple tests suggest that for small amounts of data, pipelines with prefiltering of catalogs before matching run in ~half the time, and use ~half the memory. It's possible that the effect will be more dramatic when processing large chunks of data.