Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-38498: rewrite QuantumGraph generation #370

Merged
merged 21 commits into from Aug 24, 2023
Merged

DM-38498: rewrite QuantumGraph generation #370

merged 21 commits into from Aug 24, 2023

Conversation

TallJimbo
Copy link
Member

@TallJimbo TallJimbo commented Aug 21, 2023

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes

@TallJimbo TallJimbo force-pushed the tickets/DM-38498 branch 2 times, most recently from c5ffc7e to f2093ce Compare August 21, 2023 21:14
@codecov
Copy link

codecov bot commented Aug 21, 2023

Codecov Report

Patch coverage: 65.41% and project coverage change: -0.19% ⚠️

Comparison is base (870df52) 83.52% compared to head (6457335) 83.33%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #370      +/-   ##
==========================================
- Coverage   83.52%   83.33%   -0.19%     
==========================================
  Files          77       81       +4     
  Lines        9212     9464     +252     
  Branches     1782     1772      -10     
==========================================
+ Hits         7694     7887     +193     
- Misses       1227     1289      +62     
+ Partials      291      288       -3     
Files Changed Coverage Δ
python/lsst/pipe/base/__init__.py 100.00% <ø> (ø)
python/lsst/pipe/base/connections.py 79.67% <ø> (+0.65%) ⬆️
python/lsst/pipe/base/pipeline_graph/_edges.py 83.76% <ø> (+3.14%) ⬆️
python/lsst/pipe/base/prerequisite_helpers.py 48.45% <48.45%> (ø)
...n/lsst/pipe/base/pipeline_graph/_pipeline_graph.py 92.22% <59.09%> (-1.62%) ⬇️
.../pipe/base/all_dimensions_quantum_graph_builder.py 64.50% <64.50%> (ø)
python/lsst/pipe/base/quantum_graph_builder.py 64.59% <64.59%> (ø)
...on/lsst/pipe/base/pipeline_graph/_dataset_types.py 90.12% <66.66%> (-0.91%) ⬇️
python/lsst/pipe/base/pipeline_graph/_tasks.py 91.98% <80.00%> (-0.54%) ⬇️
python/lsst/pipe/base/quantum_graph_skeleton.py 81.61% <81.61%> (ø)
... and 6 more

... and 2 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@andy-slac andy-slac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only looked at the first four commits and I have to stop for today. Here is a small number of comments, will continue tomorrow.

python/lsst/pipe/base/pipeline_graph/_edges.py Outdated Show resolved Hide resolved
python/lsst/pipe/base/quantum_graph_builder.py Outdated Show resolved Hide resolved
# starts with the output run collection, as an optimization to avoid
# queries later.
if self.skip_existing_in and self.output_run_exists:
first, *_ = self.butler.registry.queryCollections(self.skip_existing_in, flattenChains=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can it happen that queryCollections does not find any collections (returns empty list)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch; it'd definitely be a caller error, but it is possible, and we should re-raise with a better error message.

python/lsst/pipe/base/quantum_graph_builder.py Outdated Show resolved Hide resolved
python/lsst/pipe/base/quantum_graph_builder.py Outdated Show resolved Hide resolved
----------
task_key : `QuantumKey` or `TaskInitKey`
Identifier for the quantum node.
dataset_key : `DatasetKey` or `PrerequisiteKey`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prerequisites are inputs-only?

python/lsst/pipe/base/prerequisite_helpers.py Outdated Show resolved Hide resolved
python/lsst/pipe/base/prerequisite_helpers.py Outdated Show resolved Hide resolved
python/lsst/pipe/base/prerequisite_helpers.py Outdated Show resolved Hide resolved
python/lsst/pipe/base/prerequisite_helpers.py Outdated Show resolved Hide resolved
Copy link
Member Author

@TallJimbo TallJimbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've addressed the first round of review comments (lots of fixup commits that will be squashed later) and squashed a bug discovered by Jenkins (mostly missing exception-handling around registry queries).

# starts with the output run collection, as an optimization to avoid
# queries later.
if self.skip_existing_in and self.output_run_exists:
first, *_ = self.butler.registry.queryCollections(self.skip_existing_in, flattenChains=True)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch; it'd definitely be a caller error, but it is possible, and we should re-raise with a better error message.

def update(self, other: QuantumGraphSkeleton) -> None:
"""Copy all nodes from ``other`` to ``self``."""
for task_label, (_, quanta) in other._tasks.items():
self._tasks[task_label][1].update(quanta)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. That's fine for the only use case right now and that requirement keeps the implementation simple, so I'll include it in the documentation.

Copy link
Contributor

@andy-slac andy-slac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked all remaining commits, looks great, a couple of minor comments.

python/lsst/pipe/base/graphBuilder.py Outdated Show resolved Hide resolved
@@ -169,6 +167,7 @@ def __init__(
skip_existing_in: Sequence[str] = (),
clobber: bool = False,
):
self.log = getLogger(__name__)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General comment - I usually try to avoid having Logger attributes, it makes instances un-picklable, but I think it should not be an issue in this case. Context where it could matter is multiprocessing, we'll probably never use that for graph building.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the heads-up; I did not know about the issue with logs and pickling/multiprocessing.

I think I will leave this as it is - as you said, multiprocessing is unlikely to be an issue, and I need both self.log and self.metadata to allow the timeMethod decorator to work.

Comment on lines +115 to +116
dataset_id_bytes: bytes
"""Dataset ID (UUID) as raw bytes."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could not find how dataset_id_bytes is used in the code. Would it be more efficient to store it as int?

Copy link
Member Author

@TallJimbo TallJimbo Aug 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's never actually accessed directly, but it is used as part of the (generated) namedtuple __eq__ and __hash__.

I guess int might be more efficient if that can be boiled down to a native uint128 on platforms that support it, but I am pretty certain this is not at all a bottleneck - converting the DataCoordinate attributes to tuples on this commit was huge for performance (several orders of magnitude), but the hotspot where the set operations on those occur is in the loop over query results before any prerequisites are added.

@TallJimbo TallJimbo force-pushed the tickets/DM-38498 branch 2 times, most recently from 5f0c15a to 72e9272 Compare August 24, 2023 02:46
TallJimbo and others added 6 commits August 24, 2023 10:34
These mostly use the same algorithm as the one in graphBuilder.py, but
they split the pipeline up into disconnected subgraphs first, which
will provide a nice performance boost for some pipelines.

The other algorithmic difference is that pruning via adjustQuantum now
happens directly in QG generation, making the stuff for pruning in
QuantumGraph construction unnecessary, and allowing adjustQuantum to
raise NoWorkFound even during QG generation.

By building on PipelineGraph's more careful handling of storage class
overrides, this should make it much, much harder for those problems to
creep back in.

And finally, the split here into an ABC and implementation should make
it much easier to handle special QG generation cases, like the ones
for gathering resource usage and HiPS generation, as they can now
delegate the stuff they don't want to customize to the base class.
This may be useful for generating simple QGs in Prompt Processing, too
(but I'm not convinced it'd gain us much) or as a starting point for a
more advanced general-purpose QG generation algorithm (though it's
unlikely the base class could stay _exactly_ as is for that.
It's easier for a user who wants to adjust verbosity to only deal with
one logger for all of QG generation.
We're always comparing data IDs with the same dimensions, so we can
do it much faster by just comparing value tuples.
Make use of the butler serialization caching mechanisms to make
sure object are effectively cached instead of reconstructing
objects needlessly. Also lower the compression ratio of LZMA.
This results in slightly larger graph sizes, but is offset by
a large runtime gain.
QuantumGraph doesn't need to be able to prune itself anymore, since
that's now handled earlier, inside QuantumGraphBuilder (where we can
do it more efficiently).

And without that pruning code, it's easy to replace QuantumGraph's
_datasetRefDict attribute with a temporary networkx graph, and this
sidesteps a problem in which different DatasetRefs don't compare as
equal if they have different storage classes, which was causing QGs to
lack some edges they should have had with the QuantumGraphBuilder.  I
believe this was because the old QG generation algorithm passed
DatasetRefs with incorrect storage classes to QuantumGraph, and that
also sidestepped the bug.
This class was previously used for both task <-> dataset type graphs
and quantum <-> dataset graphs, and now it's just used for the former.
So it doesn't need to be generic, and it doesn't need to support node
removal anymore.

Eventually it should be replaced entirely by PipelineGraph, but that's
out of scope for now (it's part of DM-40442).
With the switch to tuples instead of DataCoordinates as keys, adding
nodes implicitly when edges are added gets trickier; we can only
support it for input datasets, because something else (registry queries
or upstream tasks) are responsible for making DatasetRefs for those.
We want to allow users to include at least the output run here without
it actually existing yet, and let that be a no-op, instead of
complaining.
@TallJimbo TallJimbo merged commit 6bb983e into main Aug 24, 2023
12 of 14 checks passed
@TallJimbo TallJimbo deleted the tickets/DM-38498 branch August 24, 2023 15:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants