-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-42825: Prompt Processing does slow calib queries against /repo/embargo #153
Conversation
Dataset queries done using fixed IDs are 2-3 times faster than queries done using the where keyword, a significant savings when querying the central repo. However, queries against a set of IDs cannot be converted to this form and must continue using where.
The comment justifies why we can't filter by validity range, but we have been doing so for a while.
Butler registry queries against all dataset types are an order of magnitude less efficient than an equivalent set of queries that enumerates all possible types.
@@ -638,11 +638,15 @@ def _export_calibs(self, detector_id, filter): | |||
""" | |||
# TAI observation start time should be used for calib validity range. | |||
calib_date = astropy.time.Time(self.visit.private_sndStamp, format="unix_tai") | |||
# Querying by specific types is much faster than querying by ... | |||
# Some calibs have an exposure ID (of the source dataset?), but these can't be used in AP. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, calibs with an exposure
dimension don't make sense to me either. I wonder if they're accidents.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to be a mix of things, some more reasonable than others:
- Produced by CPP:
- cpBiasProc, biasIsr_metadata, biasIsr_log
- cpDarkIsr, darkIsr_metadata, darkIsr_log
- cpDarkProc, cpDark_metadata, cpDark_log
- cpDarkProcForDefects, darkIsrForDefects_metadata, darkIsrForDefects_log
- cpFlatProc, flatIsr_metadata, flatIsr_log
- flatStats, cpFlatMeasure_metadata, cpFlatMeasure_log
- cpDefectsProc
- cpPartialDefects, measureDefects_metadata, measureDefects_log
- cpPtcProc, ptcIsr_metadata, ptcIsr_log
- cpPtcExtract
- cpCrosstalkProc
- crosstalkRatios, crosstalkFluxes, cpCrosstalkRatio, crosstalkExtract_metadata, crosstalkExtract_log
- Produced in the verification counterparts of the above:
- verifyBiasProc, verifyBiasApply_metadata, verifyBiasApply_log
- verifyBiasDetStats, verifyMeasureStats_metadata, verifyMeasureStats_log
- verifyBiasExpStats, verifyExposureStats_log, verifyExposureStats_metadata
- verifyDarkProc, verifyDarkApply_metadata, verifyDarkApply_log
- verifyDarkDetStats, verifyDarkChip_metadata, verifyDarkChip_log
- verifyDarkExpStats, verifyDarkExp_metadata, verifyDarkExp_log
- verifyFlatProc, verifyFlatApply_metadata, verifyFlatApply_log
- verifyFlatDetStats, verifyFlatChip_metadata, verifyFlatChip_log
- verifyFlatExpStats, verifyFlatExp_metadata, verifyFlatExp_log
- verifyDefectProc, verifyDefectApply_metadata, verifyDefectApply_log
- verifyDefectDetStats
- verifyDefectExpStats
- verifyDefectIsr, verifyDefectIsr_metadata, verifyDefectIsr_log
- verifyUncorrDefectIsr, verifyUncorrectedDefectIsr_metadata, verifyUncorrectedDefectIsr_log
- verifyCrosstalkProc, verifyCrosstalkApply_metadata, verifyCrosstalkApply_log
- verifyCrosstalkRatio, verifyCrosstalkExtract_metadata, verifyCrosstalkExtract_log
- verifyLinearityProc, verifyLinearityApply_metadata, verifyLinearityApply_log
- verifyLinearityPtcExtract
- Miscellaneous:
- raw
- rawSpectrum
- postISRCCD, isrStatistics, isr_metadata, isr_log
- timing_isr_metadata, timing_isr_log, cputiming_isr_metadata, cputiming_isr_log [Did somebody run
ApVerify.yaml
on /repo/embargo???]
- Unidentified (possibly obsolete):
- quickLookExp, quickLookTask_metadata, quickLookTask_log
- crosstalkSignals, crosstalkResults, crosstalkBackgroundXTilts, crosstalkBackgroundYTilts, crosstalkBackgroundZOffsets, crosstalkCoefficients, crosstalkCoefficientErrors, crosstalkNonLinearCoefficients, crosstalkTask_metadata, crosstalkTask_log
- backgroundResults
- plotPtcTask_metadata, plotPtcTask_log
- cpCRisr, cpCRproc
- spotSrc, characterizeSpots_metadata, characterizeSpots_log
- eoFlatIsr, eoFlatStats, eoFlatMeasure_metadata, eoFlatMeasure_log
- eoDarkIsr, eoDarkProc, eoDarkTask_log, eoDarkTask_metadata
- myIsr_metadata, myIsr_log
- verifyBiasExpMatrix, verifyBiasExpResults, verifyBiasExp_metadata, verifyBiasExp_log
- verifyBiasDetMatrix, verifyBiasDetResults, verifyBiasChip_metadata, verifyBiasChip_log
- verifyDarkExpResults, verifyDarkDetResults
- verifyFlatDetResults, verifyFlatExpResults
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect bias/dark/flat calibrations to all be exposure calibrations and not visit.
@@ -610,6 +610,8 @@ def _export_skymap_and_templates(self, center, detector, wcs, filter): | |||
collections=self._collection_template, | |||
instrument=self.instrument.getName(), | |||
skymap=self.skymap_name, | |||
tract=tract.tract_id, | |||
physical_filter=filter, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my own curiousity, do you see this going meaningfully faster in a single query (or a small number of queries), or is it only when doing many queries that it becomes noticeable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not "meaningfully", no -- templates went from 0.0801, 0.0834 to 0.0697, 0.0678, 0.0840, 0.0846, 0.0721. But if nothing else, it will keep the code from getting cargo-culted to cases where it does matter.
Creating a function that returns the types makes it possible to replace the hardcoded list with something more flexible later.
This still forces us to load refcats the pipelines don't use, but at least prevents us from having to manually curate a list of known types.
e936c99
to
e886784
Compare
This PR does some optimization of
Registry.queryDatasets
calls against the central repo, moving data IDs out ofwhere
clauses and avoiding the use of...
as a dataset type identifier....
is still used to implement the "cache nothing" hack (which is against the relatively small local repo, and needs to pick up absolutely everything to work) and as an explicit argument toqueryDatasetTypes
(where unfiltered queries are the norm).This PR also uses experience gained with
queryDatasetTypes
to remove the explicit list of refcat dataset types (which was itself a workaround for the impracticality of...
); since Middleware still doesn't have an explicit "this is a refcat" concept, the new implementation uses a heuristic filter on dimensions and storage class.