Skip to content

Conversation

kfindeisen
Copy link
Member

@kfindeisen kfindeisen commented Oct 5, 2022

This PR does a large amount of refactoring centered on activator.py, plus some minor reimplementation, with the following goals:

  • Removing all LSST dependencies, particularly Butler and Instrument, from activator.py.
  • Providing a more platform-independent guarantee that local Butler repos will never collide.
  • Separating code within activator.py that depends on Google components (Storage and PubSub) from that which does not.
  • More unit test coverage of functionality that was previously in activator.py, which cannot be imported within the standard rubin-devl environment.
  • Providing more platform-independent configuration of the databases, especially the APDB.

@kfindeisen kfindeisen force-pushed the tickets/DM-36080 branch 4 times, most recently from 233b610 to b32f534 Compare October 7, 2022 02:02
@kfindeisen kfindeisen marked this pull request as ready for review October 7, 2022 16:52
@dspeck1
Copy link
Collaborator

dspeck1 commented Oct 7, 2022

Added minor comments on Google Cloud Storage imports and storage client library still being there. Fine for now, but if you were intending to remove all GCP dependencies.

@kfindeisen
Copy link
Member Author

kfindeisen commented Oct 7, 2022

Added minor comments on Google Cloud Storage imports and storage client library still being there. Fine for now, but if you were intending to remove all GCP dependencies.

No, the goal was to remove everything that wasn't a GCP dependency... at least, as far as practical. See the PR description and the original DM-36080 description (yes, they don't quite match).

Copy link
Contributor

@parejkoj parejkoj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good in general: a few comments on labelling things.

There are several new TODOs here: should they get tickets? We probably do want a "figure out how much overhead various steps cause" ticket in the future, which is what some of the TODOs are for. If they're just on instantiating the class, it may not be a problem, unless we instantiate a new class for each Visit.

def __str__(self):
"""Return a short string that disambiguates the image.
"""
return f"(exposure {self.exp_id}, group {self.group}/{self.snap})"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want an = after the names, instead of spaces?

This avoids leaving any global variables dangling in `activator.py`.

Unfortunately, I don't see a way to unit test `setup_google_logger`: its
reliance on global state and standard error means that neither assertLogs nor
a custom stream will work.
Putting the base path in an environment variable lets us control the
(environment-dependent) path inside the activator, while letting
MiddlewareInterface take responsibility for the repo without needing to
know about the environment.
All tests have been rewritten to use the MiddlewareInterface's internal
Butler, allowing future changes to how MiddlewareInterface
is constructed.
This change not only partially decouples the activator from LSST/Butler
code, it also allows MiddlewareInterface to guarantee that the Butler
is unique and isolated from all other MiddlewareInterface instances.
Previously, this had to be enforced by the creator as a precondition.
There is no guarantee that each MiddlewareInterface object will be
associated with a unique PID in all future architectures; for example,
on GCP each worker always has PID 503, and the repos are disambiguated
by being on different (virtual) filesystems.
This change removes any need for activator.py to know about the Butler,
although a Butler object is still returned from one function call in
the activator. This is a lesser evil than having the details of the
central Butler definition be coded into the MiddlewareInterface class.
A temporary Instrument object is still needed to translate between
class name and short name.
Making MWI more flexible on this matter gives us a lot more freedom in
how we handle instrument information in activator.py.
The short name is used in most contexts, including (by requirement) the
next_visit protocol. Eliminating all uses of the class name allows us
to remove conversion code between the two, thereby removing the last
LSST imports from the activator.

This is a breaking API change to the service's environment variables.
The parser is a fairly self-contained block of code, and this change
makes the rest of the handler easier to read and more focused on the
actual subscription handling.
This change keeps the activator code from depending on the details of
the raw filename convention, particularly the (optimized)
directory order, and prevents drift between the two places where the
filename is parsed.
This makes the activator code simpler and easier to read, and avoids
distracting from the main subscription loop.
There is already a debug log for raws reported through the subscription
system, but already present is by far the more common case in testing.
This allows more flexibility in how the APDB and registry databases are
set up, such as having them be different databases on the same
PostgreSQL server.
The previous implementation stored the URI component(s) in object
fields and assembled them on the fly. Now there's a private method
for computing the URI. This is currently run once and stored on
__init__ for convenience, but in the future the APDB may be accessed
using a situational namespace.
This feature is essential to being able to run on the USDF development
APDB, which is shared by multiple users. It's not logically related to
DM-36080, but it's not worth its own ticket and we shouldn't make
DM-36505 a blocker on the USDF migration.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants