DM-36586: Use a single output run in Prompt Prototype #45
Merged
b21047b to 99794b1
The old test conflated two different questions ("do datasets exist?" and "what IDs do they have?"), making it inflexible. Breaking it up into two separate assertions makes the code more readable, more flexible, and more naturally extensible to cases with multiple data IDs.
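As a rough illustration of the split described above, the two questions can be checked by two independent helpers. This is only a sketch: the helper names, the `exposure` key, and the list-of-dicts representation of datasets are assumptions made for the example, not the actual test code in this PR.

```python
def assert_datasets_exist(datasets):
    """Question 1: do any datasets exist at all?"""
    assert len(datasets) > 0, "expected at least one dataset"


def assert_data_ids(datasets, expected_ids):
    """Question 2: which IDs do the datasets have?

    Compares as sets, so the check extends naturally to several data IDs
    and does not depend on the order in which datasets are returned.
    """
    actual_ids = {dataset["exposure"] for dataset in datasets}
    assert actual_ids == set(expected_ids), (
        f"expected IDs {sorted(expected_ids)}, found {sorted(actual_ids)}"
    )


# Hypothetical stand-in for the result of a repository query.
datasets = [{"exposure": 1}, {"exposure": 2}]
assert_datasets_exist(datasets)
assert_data_ids(datasets, [1, 2])
```

Keeping the checks separate means a failure message says which of the two questions got the wrong answer.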
Giving the input exposures (and output exposures) distinct IDs is more realistic, and will be necessary once we put different visits in the same run.
Previously, there was a single raw collection for each group, and raw/all was a chained collection that linked them together. This architecture not only deviated from repository conventions but also had the same scaling problems as timestamped output runs. Now that raws are guaranteed to have unique IDs in simulated data, we can safely put them all in a single standard run.
99794b1 to eb0ce71
The new docs provide the exact location for Butler commands, and update the contents to reflect DM-37072 and DM-37751.
prompt_prototype is an LSST-DM package and, as usual, needs to be set up once per session.
eb0ce71 to a6035a9
6d0e9b7 to 20e98e2
hsinfang
approved these changes
Feb 24, 2023
Looks like a great improvement to the collection management!
This change allows the pipeline file to be looked up by other code, and will make it easier to make the pipeline configurable later.
Reducing the number of possible runs allows the merging of collections that share the same configuration and provenance. Merging any further would result in Butler conflicts.
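One way to picture a per-pipeline output run is a name that depends only on the instrument and the pipeline file, so every worker using the same pipeline writes to the same run. The sketch below is an assumption for illustration: `output_run_name`, the `prompt/output` path layout, and the instrument name `HypoCam` are hypothetical and are not taken from `MiddlewareInterface`.

```python
from pathlib import PurePosixPath


def output_run_name(instrument, pipeline_file):
    """Build a run name from the instrument and the pipeline file name.

    Hypothetical naming scheme: no timestamp or visit is included, so
    repeated calls with the same pipeline always yield the same run.
    """
    pipeline_name = PurePosixPath(pipeline_file).stem
    return f"{instrument}/prompt/output/{pipeline_name}"


# Two workers using the same pipeline share a run...
run_a = output_run_name("HypoCam", "pipelines/ApPipe.yaml")
run_b = output_run_name("HypoCam", "pipelines/ApPipe.yaml")
# ...while a different pipeline (different configuration) gets its own.
run_c = output_run_name("HypoCam", "pipelines/Isr.yaml")
print(run_a == run_b, run_a == run_c)
```

Under this scheme the number of runs is bounded by the number of pipelines rather than by the number of processing attempts.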
This commit is merely a string substitution; no attempt has been made to change the structure of MiddlewareInterface. However, a unit test that assumed pipelines are never unique had to be tweaked.
The old variable potentially conflicted with the `time` package.
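The kind of conflict meant here can be shown with a small, generic example. It assumes the old variable was literally named `time`; the function and variable names below are made up for illustration and do not come from the package.

```python
import time


def shadowed():
    # The local variable hides the imported module inside this function,
    # so module functions such as time.sleep are no longer reachable.
    time = 12.5
    return hasattr(time, "sleep")


def renamed():
    # A distinct name leaves the module accessible.
    timestamp = 12.5
    return hasattr(time, "sleep") and timestamp > 0


print(shadowed(), renamed())
```

Renaming the variable costs nothing and removes the risk that a later edit calls the module through the shadowed name.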
A large output chain is not desired in the central repo, and the chaining process was prone to races between different workers.
The cleanup keeps the local repo from growing unnecessarily large, reducing the resource load on the prompt processing cluster. This cleanup also supersedes the init-output removal hack added in DM-37068.
20e98e2 to 633d399
This PR replaces the per-visit raw collection and per-processing output collection with a single raw collection and a per-pipeline output collection, avoiding some scaling and concurrency issues in the central repository. However, the latter change required some reworking of `MiddlewareInterface`, which had assumed that each output collection was new and unique.