-
Notifications
You must be signed in to change notification settings - Fork 0
DM-36162: Upgrade from SimplePipelineExecutor in prompt prototype #56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
befb1e0
to
2a48b42
Compare
836e96e
to
cce18c4
Compare
|
||
central_butler = get_central_butler(calib_repo, instrument_name) | ||
# local_repo is a temporary directory with the same lifetime as this process. | ||
local_repo = make_local_repo(local_repos, central_butler, instrument_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can image why we can't use a context manager for cleanup here; do we need to worry about that, or is the lifetime of this process tied to some kind of more complete reset of the compute resource that will take care of that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For performance reasons, we want to reuse the same local repo for processing multiple exposures. And AFAIK Flask doesn't let us pass in arguments to next_visit_handler
, so it needs to be (directly or indirectly) a global. I think that precludes use of a context manager.
Basically, the way the Prompt Processing architecture is set up, the Python process represented by activator.py
has the same lifetime as the (virtual) filesystem it's running on. I don't know if that answers your second question.
# repos may have been modified by other MWI instances. | ||
# TODO: get a proper synchronization API for Butler | ||
self.central_butler.registry.refresh() | ||
self.butler.registry.refresh() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious what this refresh
and the one below are doing for you, especially after the switch to per-visit MiddlewareInterface
objects. Do workers ever need to use something that is only added to the repo by some other worker? Or is this to avoid races when creating some shared repo content?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The refresh
might not be strictly necessary, because currently there's still only one MiddlewareInterface
instance per process/worker at a time. But MiddlewareInterface
itself no longer guarantees that there are no concurrent writes to the local repo, and we might add a more complex execution framework in the activator later without realizing that MWI was making assumptions. So I feel better having a refresh before each operation, even if it slows things down a bit.
To answer your more specific questions, we expect to only import calibs/templates/refcats/etc. once, the first time they're needed (this is the main advantage of reusing the local repo). So while each local repo is unique to its worker, a MiddlewareInterface
object may indeed need to use something that was added by a different object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, sorry, I misread the original question. Yes, the central_butler
refreshes are to avoid conflicts between concurrently writing workers. So far the only such conflicts have involved things like collection creation, and there shouldn't be any right now, but as with the local repo I'm trying to be robust to unknown future changes. There's still a lot of churn in architecture, execution strategy, repo organization, etc.
The make_local_repo function makes it possible to decouple the lifetime of the local repo from the lifetime of individual MiddlewareInterface objects, allowing more flexibility in how the latter are created.
This change makes it possible for multiple MiddlewareInterface objects to be defined, in order, for the same local repo. Since MiddlewareInterface no longer guarantees that it has exclusive access to its repo, I've added a pre-emptive refresh call before any operations that access it.
Using a new object for each visit makes it possible to start processing from an (almost) clean slate without needing to carefully manage object state. Since the local repo is shared between objects, we still get the benefits of local dataset caching.
SeparablePipelineExecutor is configured to give equivalent behavior to the old use of SimplePipelineExecutor. Actually taking advantage of its new capabilities will be done on later tickets.
adc6f55
to
8ab8cbd
Compare
This PR redesigns
MiddlewareInterface
to be a per-exposure rather than a per-worker object. This change will make it much easier and more foolproof to work withMiddlewareInterface
in the future, because it no longer needs to avoid storing per-exposure state. This done, the PR replacesMiddlewareInterface
's use ofSimplePipelineExecutor
with theSeparablePipelineExecutor
introduced in lsst/ctrl_mpexec#230.The changes toI intend to wait until lsst/ctrl_mpexec#230 is incorporated into a nightly/weekly before merging this PR; this will allow all actions to complete without hacks.build-service.yml
andDockerfile.activator
are not part of the PR, and were temporary hacks to get a service container that had access toSeparablePipelineExecutor
.