-
Notifications
You must be signed in to change notification settings - Fork 0
DM-36080: Separate GCP-specific code in Prompt Processing prototype #27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
233b610
to
b32f534
Compare
Added minor comments on Google Cloud Storage imports and storage client library still being there. Fine for now, but if you were intending to remove all GCP dependencies. |
No, the goal was to remove everything that wasn't a GCP dependency... at least, as far as practical. See the PR description and the original DM-36080 description (yes, they don't quite match). |
1a7bffa
to
e3c6646
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good in general: a few comments on labelling things.
There are several new TODOs here: should they get tickets? We probably do want a "figure out how much overhead various steps cause" ticket in the future, which is what some of the TODOs are for. If they're just on instantiating the class, it may not be a problem, unless we instantiate a new class for each Visit.
def __str__(self): | ||
"""Return a short string that disambiguates the image. | ||
""" | ||
return f"(exposure {self.exp_id}, group {self.group}/{self.snap})" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want an =
after the names, instead of spaces?
This avoids leaving any global variables dangling in `activator.py`. Unfortunately, I don't see a way to unit test `setup_google_logger`: its reliance on global state and standard error means that neither assertLogs nor a custom stream will work.
Putting the base path in an environment variable lets us control the (environment-dependent) path inside the activator, while letting MiddlewareInterface take responsibility for the repo without needing to know about the environment.
All tests have been rewritten to use the MiddlewareInterface's internal Butler, allowing future changes to how MiddlewareInterface is constructed.
This change not only partially decouples the activator from LSST/Butler code, it also allows MiddlewareInterface to guarantee that the Butler is unique and isolated from all other MiddlewareInterface instances. Previously, this had to be enforced by the creator as a precondition.
There is no guarantee that each MiddlewareInterface object will be associated with a unique PID in all future architectures; for example, on GCP each worker always has PID 503, and the repos are disambiguated by being on different (virtual) filesystems.
This change removes any need for activator.py to know about the Butler, although a Butler object is still returned from one function call in the activator. This is a lesser evil than having the details of the central Butler definition be coded into the MiddlewareInterface class.
A temporary Instrument object is still needed to translate between class name and short name.
Making MWI more flexible on this matter gives us a lot more freedom in how we handle instrument information in activator.py.
The short name is used in most contexts, including (by requirement) the next_visit protocol. Eliminating all uses of the class name allows us to remove conversion code between the two, thereby removing the last LSST imports from the activator. This is a breaking API change to the service's environment variables.
The parser is a fairly self-contained block of code, and this change makes the rest of the handler easier to read and more focused on the actual subscription handling.
This change keeps the activator code from depending on the details of the raw filename convention, particularly the (optimized) directory order, and prevents drift between the two places where the filename is parsed.
This makes the activator code simpler and easier to read, and avoids distracting from the main subscription loop.
There is already a debug log for raws reported through the subscription system, but already present is by far the more common case in testing.
This allows more flexibility in how the APDB and registry databases are set up, such as having them be different databases on the same PostgreSQL server.
The previous implementation stored the URI component(s) in object fields and assembled them on the fly. Now there's a private method for computing the URI. This is currently run once and stored on __init__ for convenience, but in the future the APDB may be accessed using a situational namespace.
This feature is essential to being able to run on the USDF development APDB, which is shared by multiple users. It's not logically related to DM-36080, but it's not worth its own ticket and we shouldn't make DM-36505 a blocker on the USDF migration.
c1f86b2
to
9bd8f58
Compare
This PR does a large amount of refactoring centered on
activator.py
, plus some minor reimplementation, with the following goals:Butler
andInstrument
, fromactivator.py
.activator.py
that depends on Google components (Storage and PubSub) from that which does not.activator.py
, which cannot be imported within the standardrubin-devl
environment.