DM-39514: Add validity-range workaround to prep_butler #77

kfindeisen · 2023-07-31T17:24:47Z

This PR uses queryDatasetAssociations to filter out expired calibs during local repo setup. This is not a foolproof solution; in particular, error detection is quite poor because the current code needs to be agnostic to exactly which calibs the pipeline(s) need(s). Error detection will be improved once DM-40245 rewrites calib loading to be more dynamic.

This is a crude function for vetting calibs by their validity range after an initial list has been returned by queryDatasets.

The call is done from _filter_datasets to make it possible in principle to detect when there are no valid calibs. In practice, there are always unbounded calibs, making it hard to tell that the limited-duration calibs are missing.

Since the default is for all logs from the PP framework itself to be at DEBUG level, the "trace" logs are attacked to a separate logger to turn them off unless specifically requested in the service config.

python/activator/middleware_interface.py

TallJimbo · 2023-08-18T14:25:43Z

python/activator/middleware_interface.py

+    t = astropy.time.Time(date, scale='utc')
+    # DatasetAssociation.timespan guaranteed not None
+    return [ref for ref in unfiltered_calibs
+            if ref in associations and associations[ref].timespan.contains(t)]


If more than one CALIBRATION-type collection is passed in (either directly as part of a CHAINED collection), this can return more than one DatasetRef of the same type and data ID (there's no logic like findFirst=True argument to queryDatasets happening here). It looks like that's okay, since you export all of the collections you'll want to search, too, and that will export the search order so later lookups do the right thing, but I wanted to check.

Yes, this has actually already happened (test runs picked up multiple sky datasets with overlapping validity ranges).

Since (according to a TODO) we can't use findFirst with calibs, is there some other way to get _export_calibs to return unique calibs?

queryDatasetAssociations will return results in the right order (I don't think this is documented but I'd be happy to make it a guarantee), so after you restrict the results to one instant in time, you could deduplicate manually on dataset type + data ID by using the first one returned.

The only other option at present is to use Registry.findDataset on one dataset type + data ID combination at a time.

I don't think deduplication should be in scope for _filter_calibs_by_date, since the only way it can produce duplicate output is if it has duplicate input; it's not an artifact of using queryDatasetAssociations internally.

Actually, on second thought, wouldn't using findDataset make queryDatasetAssociations unnecessary? 🤔

It might be a net gain. My understanding is that with this PR's changes all calibration lookups are still correct, but you export more than you have to (especially in the limit that there are many unnecessary CALIBRATION collections in the chain). With findDataset you'd be doing many small database queries (to the point where the per-query latency overhead can become a performance problem; this is what makes step1 QG generation slow). I don't know if that's a good trade either overall or on the basis of where the time is spent.

I also think I saw a comment saying that you don't know which dataset types will be needed up-front, and that probably rules out going with findDataset right now anyway.

I thought the proposal was to use findDataset to filter the results of queryDatasets, in which case I have (a superset of) the dataset types already.

Yes, I was thinking of findDataset as an alternative to queryDatasetAssociations.

Yes, doing a full queryDatasets with findFirst=False to find the candidate dataset refs and then following up with findDatasets would indeed work. If time spent here in prep_butler is "cheap" compared to time spent doing the export or import, that's a pretty good plan.

python/activator/middleware_interface.py

kfindeisen force-pushed the tickets/DM-39514 branch 2 times, most recently from d58f702 to 7d30afa Compare August 17, 2023 19:08

kfindeisen added 3 commits August 17, 2023 15:29

Implement _filter_calibs_by_date.

fbd1929

This is a crude function for vetting calibs by their validity range after an initial list has been returned by queryDatasets.

Use _filter_calibs_by_date in prep_butler.

1b4b245

The call is done from _filter_datasets to make it possible in principle to detect when there are no valid calibs. In practice, there are always unbounded calibs, making it hard to tell that the limited-duration calibs are missing.

Add trace-level logging to calib filtering.

089cf09

Since the default is for all logs from the PP framework itself to be at DEBUG level, the "trace" logs are attacked to a separate logger to turn them off unless specifically requested in the service config.

kfindeisen force-pushed the tickets/DM-39514 branch from f165244 to 089cf09 Compare August 17, 2023 22:32

kfindeisen marked this pull request as ready for review August 17, 2023 22:40

kfindeisen requested a review from TallJimbo August 17, 2023 23:21

TallJimbo approved these changes Aug 18, 2023

View reviewed changes

kfindeisen merged commit 798174d into main Aug 18, 2023

kfindeisen deleted the tickets/DM-39514 branch August 18, 2023 18:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

DM-39514: Add validity-range workaround to prep_butler #77

DM-39514: Add validity-range workaround to prep_butler #77

Uh oh!

kfindeisen commented Jul 31, 2023 •

edited

Loading

Uh oh!

Uh oh!

TallJimbo Aug 18, 2023

Uh oh!

kfindeisen Aug 18, 2023 •

edited

Loading

Uh oh!

TallJimbo Aug 18, 2023

Uh oh!

kfindeisen Aug 18, 2023 •

edited

Loading

Uh oh!

kfindeisen Aug 18, 2023

Uh oh!

TallJimbo Aug 18, 2023

Uh oh!

kfindeisen Aug 18, 2023 •

edited

Loading

Uh oh!

TallJimbo Aug 18, 2023

Uh oh!

TallJimbo Aug 18, 2023

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

DM-39514: Add validity-range workaround to prep_butler #77

DM-39514: Add validity-range workaround to prep_butler #77

Uh oh!

Conversation

kfindeisen commented Jul 31, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

TallJimbo Aug 18, 2023

Choose a reason for hiding this comment

Uh oh!

kfindeisen Aug 18, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

TallJimbo Aug 18, 2023

Choose a reason for hiding this comment

Uh oh!

kfindeisen Aug 18, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kfindeisen Aug 18, 2023

Choose a reason for hiding this comment

Uh oh!

TallJimbo Aug 18, 2023

Choose a reason for hiding this comment

Uh oh!

kfindeisen Aug 18, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

TallJimbo Aug 18, 2023

Choose a reason for hiding this comment

Uh oh!

TallJimbo Aug 18, 2023

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

kfindeisen commented Jul 31, 2023 •

edited

Loading

kfindeisen Aug 18, 2023 •

edited

Loading

kfindeisen Aug 18, 2023 •

edited

Loading

kfindeisen Aug 18, 2023 •

edited

Loading