DM-38498: rewrite QuantumGraph generation #370

TallJimbo · 2023-08-21T21:08:47Z

Checklist

ran Jenkins
added a release note for user-visible changes to doc/changes

codecov · 2023-08-21T21:19:05Z

Codecov Report

Patch coverage: 65.41% and project coverage change: -0.19% ⚠️

Comparison is base (870df52) 83.52% compared to head (6457335) 83.33%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #370      +/-   ##
==========================================
- Coverage   83.52%   83.33%   -0.19%     
==========================================
  Files          77       81       +4     
  Lines        9212     9464     +252     
  Branches     1782     1772      -10     
==========================================
+ Hits         7694     7887     +193     
- Misses       1227     1289      +62     
+ Partials      291      288       -3

Files Changed	Coverage Δ
python/lsst/pipe/base/__init__.py	`100.00% <ø> (ø)`
python/lsst/pipe/base/connections.py	`79.67% <ø> (+0.65%)`	⬆️
python/lsst/pipe/base/pipeline_graph/_edges.py	`83.76% <ø> (+3.14%)`	⬆️
python/lsst/pipe/base/prerequisite_helpers.py	`48.45% <48.45%> (ø)`
...n/lsst/pipe/base/pipeline_graph/_pipeline_graph.py	`92.22% <59.09%> (-1.62%)`	⬇️
.../pipe/base/all_dimensions_quantum_graph_builder.py	`64.50% <64.50%> (ø)`
python/lsst/pipe/base/quantum_graph_builder.py	`64.59% <64.59%> (ø)`
...on/lsst/pipe/base/pipeline_graph/_dataset_types.py	`90.12% <66.66%> (-0.91%)`	⬇️
python/lsst/pipe/base/pipeline_graph/_tasks.py	`91.98% <80.00%> (-0.54%)`	⬇️
python/lsst/pipe/base/quantum_graph_skeleton.py	`81.61% <81.61%> (ø)`
... and 6 more

... and 2 files with indirect coverage changes


andy-slac

I only looked at the first four commits and have to stop for today. Here are a few comments; I will continue tomorrow.

python/lsst/pipe/base/pipeline_graph/_edges.py

python/lsst/pipe/base/quantum_graph_builder.py

andy-slac · 2023-08-22T20:51:35Z

python/lsst/pipe/base/quantum_graph_builder.py

+        # starts with the output run collection, as an optimization to avoid
+        # queries later.
+        if self.skip_existing_in and self.output_run_exists:
+            first, *_ = self.butler.registry.queryCollections(self.skip_existing_in, flattenChains=True)


Can it happen that queryCollections does not find any collections (returns empty list)?

Good catch; it'd definitely be a caller error, but it is possible, and we should re-raise with a better error message.
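The failure mode in question is that `first, *_ = ...` raises a bare `ValueError` ("not enough values to unpack") when the flattened collection list is empty. The following is a minimal sketch of re-raising that with a clearer message. It assumes the registry query has already been evaluated into a plain sequence, and `MissingCollectionsError` and `first_skip_existing_collection` are hypothetical names, not the actual pipe_base code.

```python
from collections.abc import Sequence


class MissingCollectionsError(RuntimeError):
    """Hypothetical error type for an unusable ``skip_existing_in``."""


def first_skip_existing_collection(flattened: Sequence[str], skip_existing_in: Sequence[str]) -> str:
    """Return the first collection of an already-flattened search path.

    ``flattened`` stands in (assumption) for the result of
    ``registry.queryCollections(skip_existing_in, flattenChains=True)``.
    """
    try:
        first, *_ = flattened
    except ValueError as err:
        # Unpacking an empty sequence raises "not enough values to unpack".
        raise MissingCollectionsError(
            f"No collections found for skip_existing_in={list(skip_existing_in)!r}; "
            "check that the given collections exist."
        ) from err
    return first
```

Chaining with `from err` keeps the original unpacking error visible in the traceback while the new message tells the caller what they actually got wrong.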

python/lsst/pipe/base/quantum_graph_builder.py

andy-slac · 2023-08-22T22:20:16Z

python/lsst/pipe/base/quantum_graph_skeleton.py

+        ----------
+        task_key : `QuantumKey` or `TaskInitKey`
+            Identifier for the quantum node.
+        dataset_key : `DatasetKey` or `PrerequisiteKey`


Are prerequisites inputs-only?

python/lsst/pipe/base/prerequisite_helpers.py

TallJimbo

I've addressed the first round of review comments (lots of fixup commits that will be squashed later) and fixed some bugs discovered by Jenkins (mostly missing exception handling around registry queries).

TallJimbo · 2023-08-23T14:38:05Z

python/lsst/pipe/base/quantum_graph_builder.py


TallJimbo · 2023-08-23T14:44:21Z

python/lsst/pipe/base/quantum_graph_skeleton.py

+    def update(self, other: QuantumGraphSkeleton) -> None:
+        """Copy all nodes from ``other`` to ``self``."""
+        for task_label, (_, quanta) in other._tasks.items():
+            self._tasks[task_label][1].update(quanta)


Yes. That's fine for the only use case right now, and that requirement keeps the implementation simple, so I'll include it in the documentation.

andy-slac

I checked all the remaining commits and they look great; just a couple of minor comments.

python/lsst/pipe/base/graphBuilder.py

andy-slac · 2023-08-23T16:40:00Z

python/lsst/pipe/base/quantum_graph_builder.py

@@ -169,6 +167,7 @@ def __init__(
        skip_existing_in: Sequence[str] = (),
        clobber: bool = False,
    ):
+        self.log = getLogger(__name__)


General comment: I usually try to avoid Logger attributes because they make instances unpicklable, but I think that should not be an issue in this case. The context where it could matter is multiprocessing, and we'll probably never use that for graph building.
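Standard-library `logging.Logger` objects have pickled by name since Python 3.7, so whether this bites depends on the exact logger or adapter type in use. If it ever does matter, a common defensive pattern is to leave the logger out of the pickled state and look it up again on unpickling. The following is a minimal sketch with a hypothetical class, not the builder's actual code.

```python
import logging
import pickle


class GraphBuilderSketch:
    """Hypothetical class that holds a logger attribute."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.log = logging.getLogger(__name__)

    def __getstate__(self) -> dict:
        # Drop the logger so pickling never depends on the logger type.
        state = self.__dict__.copy()
        del state["log"]
        return state

    def __setstate__(self, state: dict) -> None:
        # Restore plain attributes, then look the logger up again by name.
        self.__dict__.update(state)
        self.log = logging.getLogger(__name__)


builder = GraphBuilderSketch("demo")
clone = pickle.loads(pickle.dumps(builder))
```

Because `logging.getLogger` returns the same object for the same name within a process, the unpickled copy ends up sharing the original's logger.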

Thanks for the heads-up; I did not know about the issue with loggers and pickling/multiprocessing.

I think I will leave this as is: as you said, multiprocessing is unlikely to be an issue, and I need both self.log and self.metadata for the timeMethod decorator to work.
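To see why a timing decorator constrains the instance's attributes, consider a much-simplified, hypothetical stand-in (`time_method`). It is not the real `timeMethod`, whose behavior and metadata keys are not shown here; it only illustrates that such a decorator reaches into `self.log` and `self.metadata` at call time.

```python
import functools
import logging
import time
from collections.abc import Callable
from typing import Any, TypeVar

F = TypeVar("F", bound=Callable[..., Any])


def time_method(func: F) -> F:
    """Hypothetical timing decorator that requires ``self.log`` and ``self.metadata``."""

    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        start = time.perf_counter()
        try:
            return func(self, *args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            # Both attributes must exist on the instance, or this fails.
            self.metadata[f"{func.__name__}_duration"] = elapsed
            self.log.debug("%s took %.3f s", func.__name__, elapsed)

    return wrapper  # type: ignore[return-value]


class BuilderSketch:
    def __init__(self) -> None:
        self.log = logging.getLogger("qg_sketch")
        self.metadata: dict[str, float] = {}

    @time_method
    def build(self, n: int) -> int:
        return sum(range(n))
```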

andy-slac · 2023-08-23T17:39:46Z

python/lsst/pipe/base/quantum_graph_skeleton.py

+    dataset_id_bytes: bytes
+    """Dataset ID (UUID) as raw bytes."""


I could not find how dataset_id_bytes is used in the code. Would it be more efficient to store it as int?

It's never actually accessed directly, but it is used as part of the (generated) namedtuple __eq__ and __hash__.

I guess int might be more efficient if it could be boiled down to a native uint128 on platforms that support it, but I'm pretty certain this is not a bottleneck at all. Converting the DataCoordinate attributes to tuples in this commit was a huge performance win (several orders of magnitude), but the hotspot where the set operations on those occur is in the loop over query results, before any prerequisites are added.
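The point about a field that is "never accessed directly" but still matters can be checked with a small sketch. `PrerequisiteKeySketch` and its field names are hypothetical; they only mirror the idea of a namedtuple key that carries raw UUID bytes.

```python
import uuid
from typing import NamedTuple


class PrerequisiteKeySketch(NamedTuple):
    """Hypothetical key; field names are illustrative, not pipe_base's."""

    parent_dataset_type_name: str
    dataset_id_bytes: bytes


dataset_id = uuid.uuid4()
key = PrerequisiteKeySketch("bias", dataset_id.bytes)
same = PrerequisiteKeySketch("bias", uuid.UUID(bytes=dataset_id.bytes).bytes)
other = PrerequisiteKeySketch("bias", uuid.uuid4().bytes)

# NamedTuple equality and hashing are tuple equality and hashing over all
# fields, so dataset_id_bytes matters even though nothing reads it by name.
unique_keys = {key, same, other}

# The int alternative raised in review: a lossless 128-bit conversion.
round_trip = uuid.UUID(int=dataset_id.int) == dataset_id
```

Either representation round-trips losslessly; any speed difference between them is not measured here.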

These mostly use the same algorithm as the one in graphBuilder.py, but they split the pipeline into disconnected subgraphs first, which will provide a nice performance boost for some pipelines. The other algorithmic difference is that pruning via adjustQuantum now happens directly in QG generation; this makes the pruning code in QuantumGraph construction unnecessary and allows adjustQuantum to raise NoWorkFound even during QG generation. By building on PipelineGraph's more careful handling of storage class overrides, this should make it much, much harder for storage class problems to creep back in. Finally, splitting this into an ABC and an implementation should make it much easier to handle special QG generation cases, like the ones for gathering resource usage and HiPS generation, since they can now delegate everything they don't want to customize to the base class. This may also be useful for generating simple QGs in Prompt Processing (though I'm not convinced it'd gain us much), or as a starting point for a more advanced general-purpose QG generation algorithm (though it's unlikely the base class could stay _exactly_ as is for that).
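Splitting a pipeline into disconnected subgraphs amounts to finding connected components of the task/dataset-type graph with edge direction ignored. The following is a minimal sketch of that idea over plain `(node, node)` edge pairs. `split_independent` is hypothetical and is not PipelineGraph's actual splitting method; it also only sees nodes that appear in at least one edge.

```python
from collections import defaultdict
from collections.abc import Iterable


def split_independent(edges: Iterable[tuple[str, str]]) -> list[set[str]]:
    """Group graph nodes into weakly connected components (hypothetical helper)."""
    adjacency: dict[str, set[str]] = defaultdict(set)
    for a, b in edges:
        # Ignore direction: a task and its inputs/outputs are all "connected".
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen: set[str] = set()
    components: list[set[str]] = []
    for start in adjacency:
        if start in seen:
            continue
        stack = [start]
        component: set[str] = set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        components.append(component)
    return components


edges = [
    ("task:isr", "dataset:raw"),
    ("task:isr", "dataset:postISRCCD"),
    ("task:hips", "dataset:coadd"),
]
components = split_independent(edges)
```

Each component can then be processed on its own, which is where the performance gain for pipelines with independent branches would come from.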


Make use of the butler serialization caching mechanisms to make sure objects are actually cached instead of being needlessly reconstructed. Also lower the LZMA compression level. This results in slightly larger graph files, but that is offset by a large runtime gain.
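The LZMA trade-off can be illustrated with Python's standard `lzma` module, whose `preset` ranges from 0 (fastest, least compression) to 9 (slowest, most compression), with 6 as the default. The preset actually chosen in this change is not stated here; `preset=1` below is an arbitrary illustration, and the payload is made-up data.

```python
import lzma
import pickle

# Made-up serialized payload standing in for a graph's bytes.
payload = pickle.dumps([{"run": "u/demo", "id": i} for i in range(2000)])

# Lower presets trade a somewhat larger output for faster compression.
fast = lzma.compress(payload, preset=1)
default = lzma.compress(payload)  # preset 6
```

Both outputs decompress to the same bytes; only size and time differ.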

QuantumGraph doesn't need to be able to prune itself anymore, since that's now handled earlier, inside QuantumGraphBuilder (where we can do it more efficiently). Without that pruning code, it's easy to replace QuantumGraph's _datasetRefDict attribute with a temporary networkx graph, and this sidesteps a problem in which DatasetRefs with different storage classes don't compare as equal, which was causing QGs built by QuantumGraphBuilder to lack some edges they should have had. I believe the old QG generation algorithm didn't hit this because it passed DatasetRefs with incorrect storage classes to QuantumGraph, which also sidestepped the bug.
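The missing-edge failure mode can be reproduced with a toy ref type whose equality includes the storage class. `RefSketch`, the storage class names, and the id-based workaround are hypothetical; the sketch only shows why lookups keyed on full refs can miss a producer, not the networkx-based fix used in this commit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefSketch:
    """Hypothetical ref whose equality includes the storage class."""

    dataset_type_name: str
    dataset_id: str
    storage_class: str


producer_view = RefSketch("calexp", "id-1", "ExposureF")
consumer_view = RefSketch("calexp", "id-1", "Exposure")

# Keyed by the full ref, two views of one dataset never match, so the
# producer -> consumer edge is silently lost.
by_ref = {producer_view: "producer quantum"}
edge_found_by_ref = consumer_view in by_ref

# Keyed by identity fields only, the same lookup succeeds.
by_id = {(producer_view.dataset_type_name, producer_view.dataset_id): "producer quantum"}
edge_found_by_id = (consumer_view.dataset_type_name, consumer_view.dataset_id) in by_id
```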

This class was previously used for both task <-> dataset type graphs and quantum <-> dataset graphs, and now it's just used for the former. So it doesn't need to be generic, and it doesn't need to support node removal anymore. Eventually it should be replaced entirely by PipelineGraph, but that's out of scope for now (it's part of DM-40442).

With the switch to tuples instead of DataCoordinates as keys, adding nodes implicitly when edges are added gets trickier; we can only support it for input datasets, because something else (registry queries or upstream tasks) is responsible for making the DatasetRefs for those.
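The asymmetry between input and output edges could look like the following minimal sketch. `SkeletonSketch` and its methods are hypothetical and only loosely named after the skeleton's edge methods; inputs may create their dataset node implicitly, while outputs must already have been added explicitly.

```python
class SkeletonSketch:
    """Hypothetical bipartite graph keyed by plain tuples."""

    def __init__(self) -> None:
        self.nodes: set[tuple] = set()
        self.edges: set[tuple[tuple, tuple]] = set()

    def add_node(self, key: tuple) -> None:
        self.nodes.add(key)

    def add_input_edge(self, quantum: tuple, dataset: tuple) -> None:
        # Implicit node creation is fine here: whatever found this input
        # (a registry query or an upstream task) is responsible for its ref.
        self.nodes.add(dataset)
        self.edges.add((dataset, quantum))

    def add_output_edge(self, quantum: tuple, dataset: tuple) -> None:
        # Outputs need an explicitly added node; no implicit creation.
        if dataset not in self.nodes:
            raise KeyError(f"Output dataset node {dataset} was not added first.")
        self.edges.add((quantum, dataset))
```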


TallJimbo added 3 commits August 20, 2023 12:11

Move 'packages' init-output definition to automatic-connections module.

88022fb

Add some convenience accessors to PipelineGraph node classes.

581557d

Add method to split a PipelineGraph into independent subgraphs.

0069bb5

TallJimbo force-pushed the tickets/DM-38498 branch 2 times, most recently from c5ffc7e to f2093ce on August 21, 2023 21:14

andy-slac reviewed Aug 23, 2023


TallJimbo commented Aug 23, 2023


andy-slac approved these changes Aug 23, 2023


TallJimbo force-pushed the tickets/DM-38498 branch 2 times, most recently from 5f0c15a to 72e9272 on August 24, 2023 02:46

TallJimbo and others added 6 commits August 24, 2023 10:34

Make GraphBuilder delegate to QuantumGraphBuilder.

a430115

Use per-instance loggers instead of module-level loggers in QG gen.

3e8cf2c

It's easier for a user who wants to adjust verbosity to only deal with one logger for all of QG generation.
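With Python's standard `logging`, this works because `getLogger` returns one shared object per name, so every class that asks for the same name gets the same logger. The following is a minimal sketch; `QG_LOGGER_NAME` and both classes are hypothetical, and the logger name actually used for QG generation is not shown here.

```python
import logging

# Hypothetical logger name shared by every QG-generation class.
QG_LOGGER_NAME = "qg_generation_sketch"


class BuilderPart:
    def __init__(self) -> None:
        self.log = logging.getLogger(QG_LOGGER_NAME)


class HelperPart:
    def __init__(self) -> None:
        self.log = logging.getLogger(QG_LOGGER_NAME)


# A single call controls verbosity for all of QG generation.
logging.getLogger(QG_LOGGER_NAME).setLevel(logging.DEBUG)
builder, helper = BuilderPart(), HelperPart()
```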

Add QG generation timing to QG metadata.

10de19a

Avoid DataCoordinate as comparison keys in QG gen.

66f2f1f

We're always comparing data IDs with the same dimensions, so we can do it much faster by just comparing value tuples.
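The speedup relies on every data ID in a comparison sharing the same dimensions, so each can be reduced to a tuple of values in one fixed order. The following is a minimal sketch using plain dicts as data IDs; `DIMENSIONS` and `data_id_key` are hypothetical and do not reproduce the DataCoordinate API.

```python
from collections.abc import Mapping

# Hypothetical fixed dimension order shared by all data IDs being compared.
DIMENSIONS = ("instrument", "visit", "detector")


def data_id_key(data_id: Mapping[str, object]) -> tuple:
    """Reduce a mapping-style data ID to its values in a fixed order."""
    return tuple(data_id[name] for name in DIMENSIONS)


a = {"instrument": "HSC", "visit": 903334, "detector": 16}
b = {"detector": 16, "visit": 903334, "instrument": "HSC"}

seen = {data_id_key(a)}
```

Tuples of simple values hash and compare cheaply, which is the point; the approach would be wrong if data IDs with different dimensions were ever mixed in one set.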

TallJimbo force-pushed the tickets/DM-38498 branch from 72e9272 to f12fe70 on August 24, 2023 14:34

TallJimbo added 12 commits August 24, 2023 11:01

Add changelog entry.

3eab2cd

Guard against missing dataset types in prerequisite lookup.

aa40451

More guards against failures in prerequisite lookups.

c2e4441

Reraise with better error message when given bad skip_existing_in.

8e76925

Document precondition for QuantumGraphSkeleton.update.

6e3a48a

Fix argument type in QuantumGraphSkeleton.add_output_edge.

0d8f41e

Take more care with missing skip-existing-in collections.

636e097

We want to allow users to include at least the output run here without it actually existing yet, and let that be a no-op, instead of complaining.
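A minimal sketch of that rule follows. `resolve_skip_existing_in` is hypothetical. A not-yet-created output run is dropped silently; whether other missing collections should also be tolerated isn't settled here, so this sketch raises for them.

```python
from collections.abc import Iterable, Sequence


def resolve_skip_existing_in(
    requested: Sequence[str], existing: Iterable[str], output_run: str
) -> list[str]:
    """Hypothetical filter for skip-existing-in collections."""
    available = set(existing)
    resolved = []
    for name in requested:
        if name in available:
            resolved.append(name)
        elif name == output_run:
            # The run will be created later, so there is nothing to skip yet.
            continue
        else:
            raise LookupError(f"skip_existing_in collection {name!r} does not exist.")
    return resolved
```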

Make sure skip-existing-in collections are not None.

bc6752a

Sort DatasetRefs in Quanta to make order deterministic.

6457335
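Iterating over a set is not a reliable order: `str` hashes are randomized per process by default, so a set of string-containing tuples can iterate differently between runs. Sorting on stable fields fixes that. The following is a minimal sketch; `RefSketch` and the sort key are hypothetical, and the key actually used in this commit is not shown here.

```python
import uuid
from typing import NamedTuple


class RefSketch(NamedTuple):
    """Hypothetical stand-in for a DatasetRef."""

    dataset_type_name: str
    id: uuid.UUID


def sorted_refs(refs: set[RefSketch]) -> list[RefSketch]:
    # Sort on stable fields so the stored order is the same in every run.
    return sorted(refs, key=lambda ref: (ref.dataset_type_name, ref.id.int))


refs = {
    RefSketch("calexp", uuid.UUID(int=2)),
    RefSketch("calexp", uuid.UUID(int=1)),
    RefSketch("bias", uuid.UUID(int=3)),
}
ordered = sorted_refs(refs)
```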

TallJimbo force-pushed the tickets/DM-38498 branch from f12fe70 to 6457335 on August 24, 2023 15:20

TallJimbo merged commit 6bb983e into main Aug 24, 2023
12 of 14 checks passed

TallJimbo deleted the tickets/DM-38498 branch August 24, 2023 15:30

timj mentioned this pull request Sep 25, 2023

DM-31701: Enable deterministic dataset loading order in Gen3 middleware #241

Closed
