Conversation

kfindeisen (Member) commented Mar 28, 2023

This PR redesigns MiddlewareInterface to be a per-exposure rather than a per-worker object. This change will make it much easier and more foolproof to work with MiddlewareInterface in the future, because it no longer needs to avoid storing per-exposure state. With that done, the PR replaces MiddlewareInterface's use of SimplePipelineExecutor with the SeparablePipelineExecutor introduced in lsst/ctrl_mpexec#230.

The changes to build-service.yml and Dockerfile.activator are not intended as part of this PR; they are temporary hacks to get a service container that has access to SeparablePipelineExecutor. I intend to wait until lsst/ctrl_mpexec#230 is incorporated into a nightly/weekly before merging this PR; that will allow all actions to complete without hacks.

kfindeisen marked this pull request as ready for review April 10, 2023 22:55
kfindeisen requested a review from TallJimbo April 10, 2023 22:55

central_butler = get_central_butler(calib_repo, instrument_name)
# local_repo is a temporary directory with the same lifetime as this process.
local_repo = make_local_repo(local_repos, central_butler, instrument_name)

Member commented:

I can imagine why we can't use a context manager for cleanup here; do we need to worry about that, or is the lifetime of this process tied to some kind of more complete reset of the compute resource that will take care of that?

kfindeisen (Member, Author) commented:

For performance reasons, we want to reuse the same local repo for processing multiple exposures. And AFAIK Flask doesn't let us pass arguments to next_visit_handler, so the repo needs to be (directly or indirectly) a global. I think that precludes using a context manager.

Basically, the way the Prompt Processing architecture is set up, the Python process represented by activator.py has the same lifetime as the (virtual) filesystem it's running on. I don't know if that answers your second question.
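
To make that concrete, here is a minimal sketch (not the actual activator.py) of the arrangement described above: the central butler and the local repo live at module scope, so they share the worker process's lifetime, while a fresh MiddlewareInterface is constructed inside the Flask handler for each visit. The import path, environment-variable names, and the MiddlewareInterface constructor arguments are assumptions for illustration only.

import os

from flask import Flask, request

# Import path assumed for illustration; these are the names discussed in this PR.
from activator.middleware_interface import (
    MiddlewareInterface,
    get_central_butler,
    make_local_repo,
)

app = Flask(__name__)

# Hypothetical environment variables standing in for the real configuration.
calib_repo = os.environ["CALIB_REPO"]
instrument_name = os.environ["RUBIN_INSTRUMENT"]
local_repos = os.environ.get("LOCAL_REPOS", "/tmp")

# Created once at import time: these globals live as long as the worker process
# (and its filesystem), so datasets cached in the local repo persist across visits.
central_butler = get_central_butler(calib_repo, instrument_name)
local_repo = make_local_repo(local_repos, central_butler, instrument_name)


@app.route("/next-visit", methods=["POST"])
def next_visit_handler():
    # Flask calls this view with no arguments, so the shared state above has to be
    # reachable as (direct or indirect) globals rather than passed in.
    visit = request.get_json()
    # One MiddlewareInterface per visit: an (almost) clean slate for per-exposure
    # state, but it reuses the long-lived local repo. Constructor arguments assumed.
    mwi = MiddlewareInterface(central_butler, local_repo, visit, instrument_name)
    # ... prep, ingest, and run the pipeline through mwi ...
    return "", 204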

# repos may have been modified by other MWI instances.
# TODO: get a proper synchronization API for Butler
self.central_butler.registry.refresh()
self.butler.registry.refresh()

Member commented:

I'm curious what this refresh and the one below are doing for you, especially after the switch to per-visit MiddlewareInterface objects. Do workers ever need to use something that is only added to the repo by some other worker? Or is this to avoid races when creating some shared repo content?

kfindeisen (Member, Author) commented Apr 13, 2023:

The refresh might not be strictly necessary, because currently there's still only one MiddlewareInterface instance per process/worker at a time. But MiddlewareInterface itself no longer guarantees that there are no concurrent writes to the local repo, and we might add a more complex execution framework in the activator later without realizing that MWI was making assumptions. So I feel better having a refresh before each operation, even if it slows things down a bit.

To answer your more specific questions, we expect to only import calibs/templates/refcats/etc. once, the first time they're needed (this is the main advantage of reusing the local repo). So while each local repo is unique to its worker, a MiddlewareInterface object may indeed need to use something that was added by a different object.

kfindeisen (Member, Author) commented Apr 13, 2023:

Oh, sorry, I misread the original question. Yes, the central_butler refreshes are to avoid conflicts between concurrently writing workers. So far the only such conflicts have involved things like collection creation, and there shouldn't be any right now, but as with the local repo I'm trying to be robust to unknown future changes. There's still a lot of churn in architecture, execution strategy, repo organization, etc.
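
As a rough illustration of that pattern (not code from this PR): refresh the shared central butler's registry before any write that another worker might also attempt, such as collection creation, so each worker operates on an up-to-date view. The repo path and collection name below are placeholders.

from lsst.daf.butler import Butler, CollectionType

# Placeholder path; in the service this is the shared central repo.
central_butler = Butler("/path/to/central/repo", writeable=True)

# Pick up any collections other workers have registered since this client last
# queried, so subsequent queries and registrations see a current view.
central_butler.registry.refresh()

# registerCollection returns False rather than raising if a collection with the
# same name and type already exists, so workers racing on the same name is benign.
output_chain = "LATISS/prompt/output"  # hypothetical collection name
central_butler.registry.registerCollection(output_chain, CollectionType.CHAINED)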

The make_local_repo function makes it possible to decouple the lifetime of the local repo from the lifetime of individual MiddlewareInterface objects, allowing more flexibility in how the latter are created. This change makes it possible for multiple MiddlewareInterface objects to be defined, in turn, for the same local repo. Since MiddlewareInterface no longer guarantees that it has exclusive access to its repo, I've added a pre-emptive refresh call before any operations that access it.

Using a new object for each visit makes it possible to start processing from an (almost) clean slate without needing to carefully manage object state. Since the local repo is shared between objects, we still get the benefits of local dataset caching.

SeparablePipelineExecutor is configured to give equivalent behavior to the old use of SimplePipelineExecutor. Actually taking advantage of its new capabilities will be done on later tickets.
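
For reference, a rough sketch of driving a pipeline through SeparablePipelineExecutor in a way that mirrors the old SimplePipelineExecutor flow. Method names follow the API introduced in lsst/ctrl_mpexec#230 as I understand it, and the repo path, collections, and pipeline file are placeholders rather than this package's actual configuration.

from lsst.ctrl.mpexec import SeparablePipelineExecutor
from lsst.daf.butler import Butler

# The executor expects a writeable butler whose input collections and output run
# are already fully specified (placeholder values here).
butler = Butler(
    "/path/to/local/repo",
    collections=["LATISS/defaults"],
    run="prompt/output-run",
    writeable=True,
)

executor = SeparablePipelineExecutor(butler, clobber_output=False)

pipeline = executor.make_pipeline("/path/to/ApPipe.yaml")         # load the pipeline definition
graph = executor.make_quantum_graph(pipeline, where="")           # build the quantum graph
executor.pre_execute_qgraph(graph, register_dataset_types=True)   # write init-outputs, register dataset types
executor.run_pipeline(graph)                                      # execute the quanta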
kfindeisen force-pushed the tickets/DM-36162 branch 2 times, most recently from adc6f55 to 8ab8cbd, April 17, 2023 17:36
kfindeisen merged commit e7a37c1 into main Apr 17, 2023
kfindeisen deleted the tickets/DM-36162 branch April 17, 2023 18:06