Conversation

Member

@kfindeisen kfindeisen commented Dec 15, 2023

This PR replaces our first call to queryDatasetAssociations with a much faster loop over individual datasets. We still need the second call to queryDatasetAssociations, which provides the information we need to recertify the calibs we've preloaded (see DM-41915), but that call will be skipped if calib preload becomes a no-op.

@kfindeisen kfindeisen force-pushed the tickets/DM-41713 branch 2 times, most recently from a274af0 to 93ace76 Compare December 15, 2023 22:12
@kfindeisen kfindeisen requested a review from TallJimbo December 15, 2023 22:24
Member

@TallJimbo TallJimbo left a comment

Looks good. I also have an idea for the second pass you referred to in the PR description, but I'll put that on the Jira ticket you referenced.

queryDatasetAssociations is extremely slow (the loop over dataset types
takes a minute to complete), so its use makes filtering a bottleneck.
Use find_dataset instead to select by timestamp.

We cannot yet use find_dataset to search for calibs directly, because
it can only be run on one dataset type at a time and we don't yet have
the sophistication to organically determine which types the pipelines
need.

The unfiltered calibs contain many sets of calibs with the same type and
data ID, but different runs and validity ranges. Calls to find_dataset
on each such calib return the same result. Do the filtering locally so
that we call find_dataset only once per filtered calib.
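
A minimal sketch of the local filtering described above, not the code merged in this PR. `CalibRef` and `filter_calibs` are hypothetical names, and `find_one` is a stand-in callable for a timestamp-aware `find_dataset` lookup (assumed to already be bound to the collections and timestamp of interest); its signature here is an assumption and does not mirror the Butler API.

```python
from collections.abc import Callable, Iterable
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CalibRef:
    """Hypothetical stand-in for a calib dataset reference."""

    dataset_type: str
    data_id: tuple  # hashable data ID, e.g. (("detector", 5),)
    run: str


def filter_calibs(
    unfiltered: Iterable[CalibRef],
    find_one: Callable[[str, tuple], Optional[CalibRef]],
) -> list[CalibRef]:
    """Select calibs with one lookup per unique (dataset type, data ID).

    Refs that differ only in run or validity range share a key, so the
    lookup would return the same result for each of them.
    """
    # dict.fromkeys deduplicates while preserving first-seen order.
    keys = dict.fromkeys((ref.dataset_type, ref.data_id) for ref in unfiltered)
    selected = []
    for dataset_type, data_id in keys:
        ref = find_one(dataset_type, data_id)
        if ref is not None:  # no calib is valid for this key
            selected.append(ref)
    return selected
```

With this shape, the number of lookups scales with the number of distinct (type, data ID) pairs rather than with the number of certified runs.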
@kfindeisen kfindeisen merged commit 83549c9 into main Dec 16, 2023
@kfindeisen kfindeisen deleted the tickets/DM-41713 branch December 16, 2023 00:17