Skip to content

Conversation

@TallJimbo
Copy link
Member

@TallJimbo TallJimbo commented Nov 5, 2025

Checklist

  • ran Jenkins
  • ran and inspected package-docs build
  • added a release note for user-visible changes to doc/changes

@codecov
Copy link

codecov bot commented Nov 5, 2025

Codecov Report

❌ Patch coverage is 91.07143% with 35 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.93%. Comparing base (c57584a) to head (dbd4413).
⚠️ Report is 18 commits behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
...sst/pipe/base/quantum_graph/aggregator/_scanner.py 82.05% 7 Missing and 7 partials ⚠️
...st/pipe/base/quantum_graph/aggregator/_progress.py 68.42% 10 Missing and 2 partials ⚠️
python/lsst/pipe/base/_status.py 90.90% 1 Missing and 2 partials ⚠️
python/lsst/pipe/base/single_quantum_executor.py 85.00% 1 Missing and 2 partials ⚠️
python/lsst/pipe/base/quantum_graph/_provenance.py 96.77% 0 Missing and 2 partials ⚠️
python/lsst/pipe/base/log_capture.py 97.36% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##             main     #529    +/-   ##
========================================
  Coverage   88.93%   88.93%            
========================================
  Files         150      150            
  Lines       20634    20779   +145     
  Branches     2456     2464     +8     
========================================
+ Hits        18350    18479   +129     
- Misses       1702     1714    +12     
- Partials      582      586     +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@TallJimbo TallJimbo force-pushed the tickets/DM-53019 branch 5 times, most recently from 319866a to 1898a95 Compare November 5, 2025 16:52
@TallJimbo TallJimbo marked this pull request as ready for review November 5, 2025 21:09
Copy link
Contributor

@MichelleGower MichelleGower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple typos and a question about previous_process_quanta.

Not part of the code changes, but while testing I found it confusing that the ProvenanceQuantumGraphReader.read* functions return the updated reader instead of whatever the function was to read. Returning None might be less confusing.

Merge approved.

# later via Pydantic; don't want one weird value bringing down
# our ability to report on an entire run.
if isinstance(v, float | int | str | bool):
result.metadata[k] = v
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might we want a debugging message about the skipped metadata?

raise
else:
# If the quantum succeeded, we don't need to save the
# metadata in the logs, becauase we'll have saved them
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: becauase -> because

FAILED = -1
"""Execution of the quantum failed.
This is always set if the task metadata dataset was not written but logs,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but the log was?

This is always set if the task metadata dataset was not written but logs,
as is the case when a Python exception is caught and handled by the
execution system. It may also be set in cases where log were not written
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the log was not written?

"""

attempts: list[ProvenanceQuantumAttemptModel]
"""Information about each attempt to run this quantum.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this includes the last attempt as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. That just seemed simpler, even though some of the attributes of the last attempt are lifted up to siblings of this attribute for convenience (note that this lifting is not done for the corresponding storage model).

"""

previous_process_quanta: list[uuid.UUID] = pydantic.Field(default_factory=list)
"""The IDs of other quanta executed in the same process as this one.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would include "previously" in the description as well since the quanta executed in same process after this one aren't included.

This seems like a bunch of repetitive information when clustering (or straight pipetask run). Same comment for this entry in quantum_provenance_graph.py.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 to "previously".

I admit I was just thinking about small clusters, not what happens when you execute a big graphs in one process - though I suppose there's a practical upper limit when you're talking about a single process (i.e. not even pipetask -j, which spawns/forks a new process for every quantum).

But I also don't see a great alternative. The SingleQuantumExecutor that's writing this information doesn't know the order that we will try to run quanta, and if we don't write anything until the last quantum, we run the risk of a failure that prevents any information from being written (and since quanta don't have any linkage to what's run later in the process, we wouldn't have a way to get from a middle quantum to the list of what was run first).

This assumes we reuse the SingleQuantumExector instance, but that's
what we do in MPGraphExecutor (when it runs in one process) and
SimplePipelineExecutor, and everything else uses one of those two.
Writing task metadata datasets if and only if we succeed is an
important invariant of the execution system that would be difficult to
change, but we still want to record failure metadata for provenance
(for resource usage at least).  So we stuff the metadata in the log
datasets instead.
This involves refactoring the Scanner class a bit to separate the
concept of metadata existence from success.  Note that metadata
_dataset_ existence does still indicate success, so we don't ingest
metadata for failures; instead they'll only be available as
components of provenance datasets, once provenance ingest is
implemented.
This factors the per-attempt information from the main quantum
provenance data structures so we can store it in lists, and it modifies
the low-level interfaces for reading logs and metadata to let those
return lists as well.  Right now this is set up to to be populated only
via the extra information SingleQuantumExecutor stuffs into the log
datasets (in practice, that means BPS auto-retries), but we should be
able to use it to represent post-provenance-ingest graph-level restarts
in the future.
The quantum_provenance_graph (i.e. the implementation for
pipetask report v2) will be deprecated after the new provenance system
reaches feature-parity, so we don't want to be importing types from
that module in the new system.

- ExceptionInfo has just moved without change to _status.py, where
  we define the exceptions it's designed to serialize.

- QuantumRunStatus has been copied to _status.py and renamed to
  QuantumAttemptStatus, with a small change to one enum variant to
  make its meaning more intuitive and docs that reflect the precise
  meanings in the new provenance system.
@TallJimbo TallJimbo merged commit aa98aac into main Nov 12, 2025
22 checks passed
@TallJimbo TallJimbo deleted the tickets/DM-53019 branch November 12, 2025 20:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants