DM-36080: Separate GCP-specific code in Prompt Processing prototype #27

kfindeisen · 2022-10-05T17:38:39Z

This PR does a large amount of refactoring centered on activator.py, plus some minor reimplementation, with the following goals:

Removing all LSST dependencies, particularly Butler and Instrument, from activator.py.
Providing a more platform-independent guarantee that local Butler repos will never collide.
Separating code within activator.py that depends on Google components (Storage and PubSub) from that which does not.
Adding more unit test coverage of functionality that was previously in activator.py, which cannot be imported within the standard rubin-devl environment.
Providing more platform-independent configuration of the databases, especially the APDB.

python/activator/activator.py

dspeck1 · 2022-10-07T19:45:57Z

Added minor comments on Google Cloud Storage imports and storage client library still being there. Fine for now, but if you were intending to remove all GCP dependencies.

kfindeisen · 2022-10-07T19:58:55Z

Added minor comments on Google Cloud Storage imports and storage client library still being there. Fine for now, but if you were intending to remove all GCP dependencies.

No, the goal was to remove everything that wasn't a GCP dependency... at least, as far as practical. See the PR description and the original DM-36080 description (yes, they don't quite match).

parejkoj

Looks good in general: a few comments on labelling things.

There are several new TODOs here: should they get tickets? We probably do want a "figure out how much overhead various steps cause" ticket in the future, which is what some of the TODOs are for. If they're just on instantiating the class, it may not be a problem, unless we instantiate a new class for each Visit.

python/activator/activator.py

python/activator/logger.py

python/activator/middleware_interface.py

parejkoj · 2022-10-13T21:43:55Z

python/activator/raw.py

+    def __str__(self):
+        """Return a short string that disambiguates the image.
+        """
+        return f"(exposure {self.exp_id}, group {self.group}/{self.snap})"


Do we want an = after the names, instead of spaces?
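For comparison, the two label styles under discussion can be put side by side. This is only a sketch: `RawLabel` and `with_equals` are hypothetical names, and only the three fields and the first format string come from the diff above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawLabel:
    """Hypothetical stand-in for the object whose __str__ is under review."""
    exp_id: int
    group: str
    snap: int

    def __str__(self):
        # Format from the diff: names and values separated by spaces.
        return f"(exposure {self.exp_id}, group {self.group}/{self.snap})"

    def with_equals(self):
        # Alternative raised in review: name=value pairs.
        return f"(exposure={self.exp_id}, group={self.group}/{self.snap})"
```

The `=` form is easier to search for in logs; the space form reads more like prose.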

tests/test_middleware_interface.py

tests/test_raw.py

This avoids leaving any global variables dangling in `activator.py`. Unfortunately, I don't see a way to unit test `setup_google_logger`: its reliance on global state and standard error means that neither assertLogs nor a custom stream will work.
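Although the setup function itself resists unit testing, a formatter can usually be exercised in isolation. The sketch below assumes a JSON-emitting formatter; `JsonSeverityFormatter` and `format_one` are hypothetical names and are not the prototype's classes.

```python
import io
import json
import logging


class JsonSeverityFormatter(logging.Formatter):
    """Hypothetical formatter; not the prototype's actual class."""

    def format(self, record):
        # One JSON object per record, with the level under "severity".
        return json.dumps({"severity": record.levelname,
                           "message": record.getMessage()})


def format_one(level, message):
    """Run a single record through the formatter via a private logger."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonSeverityFormatter())
    # Instantiated directly, so it is not registered in the global logger tree.
    logger = logging.Logger("formatter-test")
    logger.addHandler(handler)
    logger.log(level, message)
    return json.loads(stream.getvalue())
```

Because the logger and stream are both local, nothing here touches the root logger or standard error.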

Putting the base path in an environment variable lets us control the (environment-dependent) path inside the activator, while letting MiddlewareInterface take responsibility for the repo without needing to know about the environment.

All tests have been rewritten to use the MiddlewareInterface's internal Butler, allowing future changes to how MiddlewareInterface is constructed.

This change not only partially decouples the activator from LSST/Butler code, it also allows MiddlewareInterface to guarantee that the Butler is unique and isolated from all other MiddlewareInterface instances. Previously, this had to be enforced by the creator as a precondition.

There is no guarantee that each MiddlewareInterface object will be associated with a unique PID in all future architectures; for example, on GCP each worker always has PID 503, and the repos are disambiguated by being on different (virtual) filesystems.
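One way to get PID-independent uniqueness is to let the standard library choose the directory name under a base path read from the environment. This is a minimal sketch under assumptions: the variable name `LOCAL_REPOS`, the `butler-` prefix, and the function name are hypothetical and not taken from the PR.

```python
import os
import tempfile


def make_local_repo_dir(env_var="LOCAL_REPOS", prefix="butler-"):
    """Create a uniquely named directory for a local repo.

    ``env_var`` and ``prefix`` are hypothetical names for illustration.
    """
    # Fall back to the system temp directory if the variable is unset.
    base = os.environ.get(env_var, tempfile.gettempdir())
    # TemporaryDirectory creates a directory with an unused name, so two
    # callers sharing a base path get different directories even if
    # their PIDs are identical.
    return tempfile.TemporaryDirectory(dir=base, prefix=prefix)
```

The caller keeps the returned object alive for as long as the repo is needed; calling `cleanup()` on it, or letting it be garbage-collected, removes the directory.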

This change removes any need for activator.py to know about the Butler, although a Butler object is still returned from one function call in the activator. This is a lesser evil than having the details of the central Butler definition be coded into the MiddlewareInterface class.

A temporary Instrument object is still needed to translate between class name and short name.

Making MWI more flexible on this matter gives us a lot more freedom in how we handle instrument information in activator.py.

The short name is used in most contexts, including (by requirement) the next_visit protocol. Eliminating all uses of the class name allows us to remove conversion code between the two, thereby removing the last LSST imports from the activator. This is a breaking API change to the service's environment variables.

The parser is a fairly self-contained block of code, and this change makes the rest of the handler easier to read and more focused on the actual subscription handling.
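As an illustration of what a standalone parser might look like, the sketch below decodes a Pub/Sub push envelope into a small value object. The `Visit` fields shown and the JSON payload encoding are assumptions made for the example, not the prototype's actual next_visit schema.

```python
import base64
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Visit:
    """Hypothetical subset of next_visit fields, for illustration only."""
    instrument: str
    detector: int
    group: str
    snaps: int


def parse_next_visit(envelope):
    """Decode a Pub/Sub push envelope (already loaded as a dict) into a Visit."""
    # Pub/Sub push delivery base64-encodes the payload under message.data.
    payload = base64.b64decode(envelope["message"]["data"])
    # Assumption: the payload is a JSON object whose keys match Visit's fields.
    fields = json.loads(payload)
    return Visit(**fields)
```

With the decoding isolated like this, the handler only needs to call the parser and act on the result, and the parser can be tested without a running subscription.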

This change keeps the activator code from depending on the details of the raw filename convention, particularly the (optimized) directory order, and prevents drift between the two places where the filename is parsed.
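A single shared parser for the path might be sketched as follows. The directory layout in the pattern is hypothetical and chosen only for the example; it is not the prototype's real raw filename convention.

```python
import re

# Hypothetical layout, NOT the prototype's real convention:
#   <instrument>/<detector>/<group>/<snap>/<exp_id>.fits
_RAW_PATTERN = re.compile(
    r"(?P<instrument>[^/]+)/(?P<detector>\d+)/(?P<group>[^/]+)"
    r"/(?P<snap>\d+)/(?P<exp_id>[^/]+)\.fits$"
)


def parse_raw_path(path):
    """Return the fields encoded in a raw file path, or raise ValueError."""
    match = _RAW_PATTERN.match(path)
    if match is None:
        raise ValueError(f"{path} is not a recognized raw path")
    fields = match.groupdict()
    # Numeric fields are converted once, here, so callers never re-parse.
    fields["detector"] = int(fields["detector"])
    fields["snap"] = int(fields["snap"])
    return fields
```

Keeping the pattern in one module means a change to the directory order touches one regular expression rather than every caller.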

This makes the activator code simpler and easier to read, and avoids distracting from the main subscription loop.

There is already a debug log for raws reported through the subscription system, but the already-present case is by far the more common one in testing.

This allows more flexibility in how the APDB and registry databases are set up, such as having them be different databases on the same PostgreSQL server.

The previous implementation stored the URI component(s) in object fields and assembled them on the fly. Now there's a private method for computing the URI. The method is currently run once in __init__ and its result stored for convenience, but in the future the APDB may be accessed using a situational namespace.
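The pattern described here, environment-driven settings plus a private URI builder run once at construction, could look roughly like this. The class name, the variable names `DB_USER`, `DB_HOST`, and `DB_APDB`, and the default database name are all assumptions; credentials are assumed to be supplied by some other mechanism.

```python
import os


class DbSettings:
    """Hypothetical holder for the APDB connection URI."""

    def __init__(self, environ=None):
        environ = os.environ if environ is None else environ
        # Computed once for convenience; could be recomputed per call later.
        self._apdb_uri = self._make_apdb_uri(environ)

    @staticmethod
    def _make_apdb_uri(environ):
        # Variable names are assumptions made for this example.
        user = environ["DB_USER"]
        host = environ["DB_HOST"]
        database = environ.get("DB_APDB", "postgres")
        return f"postgresql://{user}@{host}/{database}"

    @property
    def apdb_uri(self):
        return self._apdb_uri
```

Pointing the registry and the APDB at different databases on the same server then only requires a second variable for the database name, with the host and user shared.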

This feature is essential to being able to run on the USDF development APDB, which is shared by multiple users. It's not logically related to DM-36080, but it's not worth its own ticket and we shouldn't make DM-36505 a blocker on the USDF migration.

kfindeisen force-pushed the tickets/DM-36080 branch 4 times, most recently from 233b610 to b32f534, on October 7, 2022 02:02

kfindeisen marked this pull request as ready for review October 7, 2022 16:52

dspeck1 reviewed Oct 7, 2022

python/activator/activator.py

dspeck1 reviewed Oct 7, 2022

python/activator/activator.py

kfindeisen requested a review from parejkoj October 10, 2022 17:46

kfindeisen force-pushed the tickets/DM-36080 branch from 1a7bffa to e3c6646 on October 11, 2022 23:50

parejkoj reviewed Oct 13, 2022

kfindeisen added 18 commits October 13, 2022 16:01

Clean up redundant GoogleFormatterTest method.

89ca223

Move local repo path into envvar.

6b03d91

Putting the base path in an environment variable lets us control the (environment-dependent) path inside the activator, while letting MiddlewareInterface take responsibility for the repo without needing to know about the environment.

Avoid assuming external Butler in test code.

039ece8

All tests have been rewritten to use the MiddlewareInterface's internal Butler, allowing future changes to how MiddlewareInterface is constructed.

Remove redundant variable from test setup.

e8b8264

Factor Instrument object out of activator.

7f1508e

A temporary Instrument object is still needed to translate between class name and short name.

Allow MiddlewareInterface code to take either instrument name.

5414c72

Making MWI more flexible on this matter gives us a lot more freedom in how we handle instrument information in activator.py.

Document environment variables.

18b8183

Factor message parsing out of message handler.

4cc06a2

The parser is a fairly self-contained block of code, and this change makes the rest of the handler easier to read and more focused on the actual subscription handling.

Factor filename parsing out of message handler.

5a3e2b1

This change keeps the activator code from depending on the details of the raw filename convention, particularly the (optimized) directory order, and prevents drift between the two places where the filename is parsed.

Factor snap-visit comparisons out of activator.

2d6f96b

This makes the activator code simpler and easier to read, and avoids distracting from the main subscription loop.

Add debug log for already-existing raws.

d562425

There is already a debug log for raws reported through the subscription system, but the already-present case is by far the more common one in testing.

Move all DB configuration to environment variables.

5eaefd3

This allows more flexibility in how the APDB and registry databases are set up, such as having them be different databases on the same PostgreSQL server.

kfindeisen force-pushed the tickets/DM-36080 branch from c1f86b2 to 9bd8f58 on October 13, 2022 23:30

kfindeisen merged commit a45a862 into main Oct 14, 2022

kfindeisen deleted the tickets/DM-36080 branch October 14, 2022 00:12

kfindeisen mentioned this pull request Oct 14, 2022

Fix oversights in envvar descriptions #29

Merged

kfindeisen mentioned this pull request Oct 24, 2022

Update service environment variables. slaclab/rubin-usdf-prompt-processing#2

Merged
