DM-49670: Add option to use a service for Butler database writes #330
Conversation
Force-pushed from a707deb to 85a2cd2.
Force-pushed from aa9810b to e8768ca.
Sorry for the slow reply -- I'm returning this ahead of the Phalanx review to not keep you waiting any longer.
While the basic approach looks good, I'm concerned about how the environment variable interface is organized (I might have more specific advice once I see Phalanx), and about the introduction of certain Middleware-isms (specifically, split responsibilities and implementation-level dependencies instead of modularization, and the use of `assert` to tape over structural issues).
Please also clean up the commit history; the current history splits changes in an "overlapping" fashion, so that no single commit corresponds to a complete logical change. This makes the code hard to review, hard to understand for future development, and hard to reorganize or revert should the need arise.
-def _export_exposure_dimensions(src_butler, dest_butler, **kwargs):
-    """Transfer dimensions generated from an exposure to the central repo.
+def _export_exposure_dimensions(src_butler, **kwargs) -> dict[str, list[DimensionRecord]]:
+    """Retrieve dimension records generated from an exposure that need to
+    be transferred to the central repo.
Please rename this method and its parameters to fit its new purpose (perhaps `_get_dimensions_to_export`?) and avoid confusion. Also rewrite the docs, as almost none of it is appropriate for a local repo query.
Returns
-------
dimension_records : `dict` [ `str` , `list` [ `lsst.daf.butler.DimensionRecord` ] ]
Excessive whitespace that impairs readability; see the DM style guide:
Suggested change:
-dimension_records : `dict` [ `str` , `list` [ `lsst.daf.butler.DimensionRecord` ] ]
+dimension_records : `dict` [`str` , `list` [`lsst.daf.butler.DimensionRecord`]]
Do these actually need to be specced as a `dict` and `list` instead of, say, a mapping and a collection? Surely the records can't be ordered in any meaningful way.
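For concreteness, a hypothetical spelling of what that question points at, using the rename floated above (`_get_dimensions_to_export` comes from this review; the mapping-based annotation is only an illustration, and the inner element type is kept concrete in view of the type-invariance point raised later in this review):

```python
from collections.abc import Mapping

from lsst.daf.butler import Butler, DimensionRecord


def _get_dimensions_to_export(butler: Butler, **kwargs) -> Mapping[str, list[DimensionRecord]]:
    """Retrieve dimension records generated from an exposure that need to
    be transferred to the central repo.

    Returns
    -------
    dimension_records : `~collections.abc.Mapping` [`str`, `list` [`lsst.daf.butler.DimensionRecord`]]
        Records to transfer, keyed by dimension name; no meaningful ordering
        is implied.
    """
    ...
```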
tests/test_kafka_butler_writer.py (outdated)
def test_transfer_outputs(self):
    data_dir = os.path.join(os.path.abspath(os.path.dirname(__file__)), "data")
    repository_dir = os.path.join(data_dir, "central_repo")
    butler = Butler(repository_dir, writeable=False)
Please use `setUp` as appropriate, to leave room for more tests in the future.
I still recommend using `setUp` or `setUpClass` for anything that would be shared among tests, such as the test repo. Not sure about the datasets/dimension records, I can see those changing from case to case.
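A minimal sketch of that layout, reusing the repo construction from the snippet above (the class name is hypothetical; whether the datasets and dimension records also move into a fixture is left to the author):

```python
import os
import unittest

from lsst.daf.butler import Butler


class KafkaButlerWriterTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # The central test repo is opened read-only, so it is safe to share
        # it across all test methods in the class.
        data_dir = os.path.join(os.path.abspath(os.path.dirname(__file__)), "data")
        cls.repository_dir = os.path.join(data_dir, "central_repo")
        cls.central_butler = Butler(cls.repository_dir, writeable=False)

    def test_transfer_outputs(self):
        # Per-test datasets and dimension records would still be built here
        # (or in setUp) if they vary from case to case.
        ...
```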
tests/test_kafka_butler_writer.py (outdated)
dimension_record_count = 0
for dimension in ["instrument", "skymap"]:
    records = butler.query_dimension_records(dimension)
    dimension_record_count += len(records)
What is the purpose of tracking `dimension_record_count` separately? `dimension_records` is self-contained.
`dimension_records` is a dict of lists though, so this:
- Avoids a second loop to get the count.
- Helps catch potential errors where the code we are calling modifies `dimension_records` when it shouldn't.
I'm worried it could introduce errors in the test code, but it's true that nonmodification is hard to enforce. Maybe add a comment that that's what you're testing?
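One way to act on that suggestion, sketched against the snippet above (everything outside those four lines, including the shared-butler attribute and the final assertion, is hypothetical):

```python
def test_transfer_outputs(self):
    butler = self.central_butler  # shared repo, e.g. from setUpClass

    # Count the records up front.  Besides avoiding a second loop, keeping a
    # separate count lets the test detect the case where the code under test
    # modifies the dimension_records dict-of-lists it was handed.
    dimension_records = {}
    dimension_record_count = 0
    for dimension in ["instrument", "skymap"]:
        records = list(butler.query_dimension_records(dimension))
        dimension_records[dimension] = records
        dimension_record_count += len(records)

    # ... exercise the writer under test ...

    # The inputs must be left untouched by the transfer.
    self.assertEqual(
        dimension_record_count,
        sum(len(records) for records in dimension_records.values()),
    )
```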
Force-pushed from 0f874a4 to 1939acd.
Looks good, thanks!
Returns
-------
transferred : `list` [`DatasetRef`]
Inconsistent with the previous style:
Suggested change:
-transferred : `list` [`DatasetRef`]
+transferred : `list` [`lsst.daf.butler.DatasetRef`]
Returns
-------
transferred : `list` [`DatasetRef`]
    List of datasets actually transferred.
Again, do these actually need to be specced as concrete classes like `list` instead of a generic `Collection`? While you could define an order for (serial) transfers, logically the transferred datasets are a set -- what matters is whether a particular dataset got transferred or not.
I realize that nested collections like `dimension_records` run into type invariance and need to use concrete element types, but that doesn't apply here.
@connect.retry(2, DATASTORE_EXCEPTIONS, wait=repo_retry)
 def _export_subset(self, exposure_ids: set[int],
-                   dataset_types: typing.Any, in_collections: typing.Any) -> None:
+                   dataset_types: typing.Any, in_collections: typing.Any) -> list[DatasetRef]:
Note that the docs spec this as a generic collection. As stated above, I think that's the right level of abstraction, but the method should at least be self-consistent.
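For illustration, a self-consistent version along those lines might read as follows (the signature comes from the diff above; the `connect.retry` decorator is omitted and the summary line and body are elided to keep the sketch self-contained):

```python
import typing
from collections.abc import Collection

from lsst.daf.butler import DatasetRef


def _export_subset(self, exposure_ids: set[int],
                   dataset_types: typing.Any, in_collections: typing.Any,
                   ) -> Collection[DatasetRef]:
    """Sketch only.

    Returns
    -------
    transferred : `collections.abc.Collection` [`lsst.daf.butler.DatasetRef`]
        The datasets actually transferred; logically a set, so no ordering
        is implied.
    """
    ...
```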
-    The butler from which to transfer dimension records.
-dest_butler : `lsst.daf.butler.Butler`
-    The butler to which to transfer records.
+    The butler from which to retrieve dimension records.
The name `src_butler` still seems inappropriate in the new context. For a query operation, can this just be `butler`?
return [dstype.name for dstype in butler.registry.queryDatasetTypes(...)
        if "detector" in dstype.dimensions]

@connect.retry(2, DATASTORE_EXCEPTIONS, wait=repo_retry)
Sorry, one last question: does this need to be updated to account for `KafkaButlerWriter.transfer_outputs`? I assume it raises a completely different set of exceptions on e.g. network problems.
Good question.
The exceptions thrown when there are S3 issues should be the same as they were previously -- so the exceptions in the existing list are still useful.
The Kafka client should be doing some amount of retrying/reconnecting internally, so we may not want to attempt to retry on a Kafka exception here. The `confluent_kafka` documentation isn't very clear on the exact sorts of errors we might expect. I should be able to simulate some of the more obvious issues like the broker being down, so I'll poke at this a bit when I'm testing the changes in DM-52180.
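If testing on DM-52180 does show that the client's internal retries are not enough, one hypothetical way to widen the retried exception set, following the existing `connect.retry` call pattern shown above (this assumes `DATASTORE_EXCEPTIONS` is a tuple; `confluent_kafka.KafkaException` is the client library's base exception class, and whether retrying it here is appropriate is exactly the open question):

```python
import confluent_kafka

# connect.retry, DATASTORE_EXCEPTIONS, and repo_retry are the existing names
# from the module under review; only the Kafka addition is new here.
TRANSFER_EXCEPTIONS = DATASTORE_EXCEPTIONS + (confluent_kafka.KafkaException,)


@connect.retry(2, TRANSFER_EXCEPTIONS, wait=repo_retry)
def _export_subset(self, exposure_ids, dataset_types, in_collections):
    ...
```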
Force-pushed from 1939acd to a9a06af.
To prepare for transferring datasets to the central Butler using Kafka, add an interface for an object that performs the Butler writes back to the central repo, allowing for alternate implementations. This reorders the middleware export process so that all writes to the central repo happen at a single point in the code.
Added an option to send a Kafka message to a microservice to write output datasets to the central Butler database, instead of connecting directly to the database. This is intended to reduce database contention, as detailed in DMTN-310.
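A hypothetical outline of the interface these two paragraphs describe (only `KafkaButlerWriter` and `transfer_outputs` appear elsewhere in this review; the other names and the exact method signature are illustrative):

```python
from abc import ABC, abstractmethod
from collections.abc import Collection

from lsst.daf.butler import Butler, DatasetRef


class ButlerWriter(ABC):
    """Interface for objects that write Prompt Processing outputs back to
    the central repository.
    """

    @abstractmethod
    def transfer_outputs(self, local_butler: Butler,
                         datasets: Collection[DatasetRef]) -> Collection[DatasetRef]:
        """Transfer the given datasets to the central repo, returning the
        datasets actually transferred.
        """


# Two implementations would then satisfy this interface: one that connects
# directly to the central database (the original behavior), and
# KafkaButlerWriter, which sends a Kafka message to a service instead.
```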
Force-pushed from a9a06af to 7bf8843.
Added an alternate implementation for writing Butler outputs to the central repository, following the design in DMTN-310. When `USE_KAFKA_BUTLER_WRITER=1`, writes to the central Butler database will be done indirectly by sending a Kafka message to a service, instead of connecting directly from Prompt Processing.
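A sketch of how that switch could be wired up, building on the interface sketched earlier (only the variable name `USE_KAFKA_BUTLER_WRITER` comes from the description; the factory, the class names, and the constructor arguments are hypothetical, and the review above notes that the environment-variable interface is still under discussion):

```python
import os


def _make_butler_writer():
    """Hypothetical factory selecting the Butler writer implementation."""
    if os.environ.get("USE_KAFKA_BUTLER_WRITER") == "1":
        # Indirect writes: publish a Kafka message for the writer service.
        return KafkaButlerWriter(...)
    # Direct writes: connect to the central Butler database as before.
    return DirectButlerWriter(...)
```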