DM-36586: Use a single output run in Prompt Prototype #45

kfindeisen · 2023-01-10T00:53:35Z

This PR replaces the per-visit raw collection and per-processing output collection with a single raw collection and a per-pipeline output collection, avoiding some scaling and concurrency issues in the central repository. However, the latter change required some reworking of MiddlewareInterface, which had assumed that each output collection was new and unique.

The old test conflated two different questions ("do datasets exist?" and "what IDs do they have?"), which made it inflexible. Breaking it into two separate assertions makes the code more readable, more flexible, and more naturally extensible to cases with multiple data IDs.
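A minimal sketch of the two-assertion pattern, assuming datasets can be represented as hypothetical `(dataset_type, data_id)` pairs. The helper and class names are illustrative and are not the actual test code:

```python
import unittest


def found_ids(refs, dataset_type):
    """Return the exposure IDs of all refs with the given dataset type."""
    return {data_id["exposure"] for dtype, data_id in refs if dtype == dataset_type}


class OutputCheck(unittest.TestCase):
    def check_outputs(self, refs, dataset_type, expected_ids):
        ids = found_ids(refs, dataset_type)
        # Question 1: do any datasets of this type exist?
        self.assertTrue(ids, f"no {dataset_type} datasets found")
        # Question 2: do they have exactly the expected IDs?
        self.assertEqual(ids, set(expected_ids))
```

Because the ID check compares sets, covering several data IDs only means passing a longer `expected_ids`.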

Previously, there was a single raw collection for each group, and raw/all was a chained collection that linked them together. This architecture not only deviated from repository conventions but also had the same scaling problems as timestamped output runs. Now that raws are guaranteed to have unique IDs in simulated data, we can safely put them all in a single standard run.
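To illustrate why unique IDs are a precondition, here is a toy model of a run (not the Butler's actual implementation) in which each (dataset type, data ID) pair may appear only once. Under that rule, raws from different groups can share one run only if their exposure IDs differ. The run name `HSC/raw/all` is an illustrative example:

```python
class ToyRun:
    """Toy model of a run: at most one dataset per (dataset type, data ID)."""

    def __init__(self, name):
        self.name = name
        self._datasets = {}

    def put(self, dataset_type, data_id, value):
        # Normalize the data ID so that key order does not matter.
        key = (dataset_type, tuple(sorted(data_id.items())))
        if key in self._datasets:
            raise ValueError(f"{dataset_type} {data_id} already exists in {self.name}")
        self._datasets[key] = value

    def __len__(self):
        return len(self._datasets)
```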

hsinfang

Looks like a great improvement to the collection management!

python/activator/middleware_interface.py

kfindeisen force-pushed the tickets/DM-36586 branch from b21047b to 99794b1 on January 12, 2023 23:36

kfindeisen added 3 commits February 16, 2023 10:49

Use distinct exposure IDs in MiddlewareInterfaceWriteableTest.

927e179

Giving the input exposures (and output exposures) distinct IDs is more realistic, and will be necessary once we put different visits in the same run.

kfindeisen force-pushed the tickets/DM-36586 branch from 99794b1 to eb0ce71 on February 16, 2023 18:57

kfindeisen added 3 commits February 22, 2023 13:53

Update documentation of central repo.

665dbf9

The new docs provide the exact location for Butler commands, and update the contents to reflect DM-37072 and DM-37751.

Update documentation of kubectl credentials.

fa5c706

Document that prompt_prototype must be set up before use.

d7e51b4

prompt_prototype is an LSST-DM package and, as usual, must be set up once per session.

kfindeisen force-pushed the tickets/DM-36586 branch from eb0ce71 to a6035a9 on February 23, 2023 00:27

kfindeisen marked this pull request as ready for review February 23, 2023 21:18

kfindeisen force-pushed the tickets/DM-36586 branch from 6d0e9b7 to 20e98e2 on February 23, 2023 21:51

kfindeisen requested a review from hsinfang February 23, 2023 21:54

hsinfang approved these changes Feb 24, 2023

kfindeisen added 7 commits February 24, 2023 13:29

Factor pipeline file retrieval from pipeline generation.

8904376

This change allows the pipeline file to be looked up by other code, and will make it easier to make the pipeline configurable later.
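A minimal sketch of the separation, assuming a hypothetical `<pipeline_dir>/<instrument>/ApPipe.yaml` layout. The function names and layout are illustrative, not the ones in `middleware_interface.py`, and real code would build a pipeline object rather than return raw text:

```python
from pathlib import Path


def get_pipeline_file(pipeline_dir, instrument):
    """Decide which pipeline file applies, without reading it."""
    return Path(pipeline_dir) / instrument / "ApPipe.yaml"


def load_pipeline(pipeline_file):
    """Generate the pipeline from whatever file the lookup returned.

    Stand-in: returns the file's text instead of a real pipeline object.
    """
    return Path(pipeline_file).read_text()
```

Once lookup is its own step, other code can ask which file applies (for example, to derive names from its path) without loading the pipeline.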

Define a new output run that is only unique per-pipeline.

7eaf886

Reducing the number of possible runs allows the merging of collections that share the same configuration and provenance. Merging any further would result in Butler conflicts.
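One way to make a run unique per pipeline but not per visit, sketched under the assumption of a hypothetical `<instrument>/prompt/<pipeline name>` naming scheme (the actual run name may include other components):

```python
from pathlib import Path


def get_output_run(instrument, pipeline_file):
    """Return an output run name that depends only on the pipeline.

    The visit being processed is deliberately not an input, so every
    visit processed with the same pipeline writes to the same run.
    """
    pipeline_name = Path(pipeline_file).stem  # "ApPipe" from "ApPipe.yaml"
    return f"{instrument}/prompt/{pipeline_name}"
```

Keying on anything finer, such as the visit, would recreate one run per processing; keying on anything coarser would mix outputs with different configurations, which is what leads to the Butler conflicts mentioned above.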

Use new-style run in _prep_collections.

aa7adc9

This commit is merely a string substitution; no attempt has been made to change the structure of MiddlewareInterface. However, a unit test that assumed pipelines are never unique had to be tweaked.

Rename local variable in unit test setup.

d6fc8be

The old variable potentially conflicted with the `time` package.
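The commit message does not give the old name, so the following assumes, hypothetically, a local literally named `time`. It shows the failure mode such a name invites: inside the function, the local hides the module entirely.

```python
import time


def shadowed():
    time = 30.0  # this local hides the `time` module for the whole function
    return time.time()  # AttributeError: 'float' object has no attribute 'time'


def renamed():
    exposure_time = 30.0  # hypothetical new name; the module stays reachable
    return time.time() > 0 and exposure_time == 30.0
```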

Remove chain support from MiddlewareInterface.export_output.

9cec157

A large output chain is not desired in the central repo, and the chaining process was prone to races between different workers.
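The race can be reproduced deterministically with a toy read-modify-write on a chain stored as a list. The collection names are hypothetical:

```python
def prepend_to_chain(chain, run):
    """The non-atomic update: compute the new chain from a snapshot."""
    return [run] + chain


def interleaved_workers():
    central = {"HSC/prompt": ["HSC/prompt/run1"]}
    # Both workers read the chain before either writes it back.
    snapshot_a = list(central["HSC/prompt"])
    snapshot_b = list(central["HSC/prompt"])
    central["HSC/prompt"] = prepend_to_chain(snapshot_a, "HSC/prompt/run2")
    central["HSC/prompt"] = prepend_to_chain(snapshot_b, "HSC/prompt/run3")
    # Worker B's write overwrites worker A's, so run2 is silently dropped.
    return central["HSC/prompt"]
```

Exporting to a fixed run with no chain to update removes the shared read-modify-write entirely.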

Add function for removing specific collections from chains.

9eca69b
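A minimal pure-Python sketch of the operation, treating a chain as an ordered list of child collection names. Against a real repository the chain would presumably be read and written back through the registry (daf_butler's `Registry` provides `getCollectionChain` and `setCollectionChain`); the function name here is hypothetical:

```python
def remove_from_chain(chain, to_remove):
    """Return a copy of `chain` without the named collections.

    The order of the remaining children is preserved, since chain order
    determines search priority. Names not in the chain are ignored.
    """
    doomed = set(to_remove)
    return [name for name in chain if name not in doomed]
```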

Add a post-processing cleanup step.

633d399

The cleanup keeps the local repo from growing unnecessarily large, reducing the resource load on the prompt processing cluster. This cleanup also supersedes the init-output removal hack added in DM-37068.
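A toy sketch of the export-then-clean sequence, modelling each repository as a dict from run name to a set of dataset names (a hypothetical stand-in, not the Butler API):

```python
def export_and_clean(local, central, runs):
    """Copy the given runs to the central repo, then drop them locally."""
    for run in runs:
        # Export first, so nothing is deleted before it has been copied.
        central.setdefault(run, set()).update(local.get(run, set()))
    for run in runs:
        # Cleanup: the local repo no longer needs these runs.
        local.pop(run, None)
```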

kfindeisen force-pushed the tickets/DM-36586 branch from 20e98e2 to 633d399 on February 24, 2023 23:20

kfindeisen merged commit 2f0745e into main Feb 24, 2023

kfindeisen deleted the tickets/DM-36586 branch February 24, 2023 23:41