DM-37163: A manifest checker on the workflow output data #374

eigerx · 2023-09-21T18:30:30Z

Checklist

ran Jenkins
added a release note for user-visible changes to doc/changes

codecov · 2023-09-21T18:34:14Z

Codecov Report

Attention: 29 lines in your changes are missing coverage. Please review.

Comparison is base (c71f3aa) 82.26% compared to head (ea43222) 82.62%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #374      +/-   ##
==========================================
+ Coverage   82.26%   82.62%   +0.35%     
==========================================
  Files          90       92       +2     
  Lines       10185    10323     +138     
  Branches     1913     1945      +32     
==========================================
+ Hits         8379     8529     +150     
+ Misses       1478     1452      -26     
- Partials      328      342      +14

Files	Coverage Δ
tests/test_execution_reports.py	`100.00% <100.00%> (ø)`
python/lsst/pipe/base/execution_reports.py	`76.22% <76.22%> (ø)`

... and 1 file with indirect coverage changes


TallJimbo

Some general comments that didn't fit on any particular line:

Standard practice for us is to put the Jira ticket into "In Review" and include links to all PRs as a comment there (or at least check to see if Jira linked them already - but it always misses a few of our packages, including pipe_base).
I think you need to add a new automodapi entry to doc/lsst.pipe.base/index.rst for this new module (you'll see entries for other modules and subpackages you can copy there). You can then run package-docs build and then open doc/_build/html/index.html in a web browser to see if the docs look as you'd expect.
While I know we're relying on ci_middleware for most of the testing, it'd be good to at least run these functions in pipe_base itself. The hard part is manufacturing the QG and a butler to test with, but the lsst.pipe.base.tests.simpleQGraph.makeSimpleQGraph function does both. Could you add a test that calls that with no arguments, runs make_reports and to_summary_dict, and checks the results? Since that QG won't have actually been run, I expect it to look like everything failed, and since we've got better tests in ci_middleware that's okay.
There are a number of failed linting checks from GitHub Actions that you need to resolve. Some of my PR comments will deal with some of them, but I doubt that will take care of them all. Feel free to ask for help interpreting any you don't understand.

TallJimbo · 2023-09-26T14:00:00Z

python/lsst/pipe/base/__init__.py

@@ -1,4 +1,5 @@
 from . import automatic_connection_constants, connectionTypes, pipeline_graph, pipelineIR
+from ._check_qg_outputs import *


Now that we've settled on names for the classes, we should rename the module to match; how about execution_reports?

I've left out the leading underscore because I also think we should remove the import from this __init__.py, and instead expect users to do

from lsst.pipe.base.execution_reports import QuantumGraphExecutionReport

(etc.)

That's a bit more verbose, but since the vast majority of lsst.pipe.base imports aren't going to involve this module, it's best to not have all of its import-time logic executed all the time.
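
To illustrate the import-time point, here is a minimal sketch using a throwaway package written to a temporary directory. The package name demo_pipe_base and its contents are hypothetical stand-ins, not the real lsst.pipe.base layout; the sketch only shows that a submodule left out of __init__.py is not loaded until it is imported explicitly.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package name; stands in for lsst.pipe.base.
PACKAGE = "demo_pipe_base"


def write_demo_package(root: Path) -> None:
    """Create a tiny package whose __init__.py does not import its submodule."""
    pkg = root / PACKAGE
    pkg.mkdir()
    (pkg / "__init__.py").write_text("# deliberately no 'from .execution_reports import *'\n")
    (pkg / "execution_reports.py").write_text(
        '__all__ = ("QuantumGraphExecutionReport",)\n\n\n'
        "class QuantumGraphExecutionReport:\n"
        '    """Placeholder class for the demonstration."""\n'
    )


with tempfile.TemporaryDirectory() as tmp:
    write_demo_package(Path(tmp))
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        # Importing the package alone does not load the submodule.
        importlib.import_module(PACKAGE)
        loaded_by_package_import = f"{PACKAGE}.execution_reports" in sys.modules
        # The submodule's code runs only once it is requested by name.
        submodule = importlib.import_module(f"{PACKAGE}.execution_reports")
        loaded_by_explicit_import = f"{PACKAGE}.execution_reports" in sys.modules
    finally:
        sys.path.remove(tmp)

print(loaded_by_package_import, loaded_by_explicit_import)
```

The trade-off is the one described above: callers type a longer import, and everyone else skips the cost of loading the module.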

TallJimbo · 2023-09-26T14:01:58Z

python/lsst/pipe/base/_check_qg_outputs.py

+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+from __future__ import annotations
+


Suggested change

__all__ = (
    "QuantumGraphExecutionReport",
    "TaskExecutionReport",
    "DatasetTypeExecutionReport",
    "lookup_quantum_data_id",
)

All modules should have an __all__ entry. Among other things, that tells Sphinx which things should appear in the documentation.
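
As a small illustration of what __all__ controls on the Python side, the sketch below builds a hypothetical in-memory module (demo_execution_reports, not the real file) and star-imports from it. It demonstrates only the star-import behaviour, not the Sphinx side.

```python
import sys
import types

# Source for a hypothetical module; the names mirror the suggestion above
# but the bodies are placeholders.
SOURCE = '''
from typing import Any

__all__ = ("QuantumGraphExecutionReport", "lookup_quantum_data_id")


class QuantumGraphExecutionReport:
    pass


def lookup_quantum_data_id():
    pass


def helper_not_exported():
    pass
'''

module = types.ModuleType("demo_execution_reports")
exec(SOURCE, module.__dict__)
sys.modules[module.__name__] = module

# Star-import into a fresh namespace and see which names arrive.
namespace: dict[str, object] = {}
exec("from demo_execution_reports import *", namespace)
exported = sorted(name for name in namespace if not name.startswith("__"))
print(exported)
```

Neither the imported Any nor helper_not_exported leaks into the importing namespace, because only the names listed in __all__ are exported.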

TallJimbo · 2023-09-26T14:05:14Z

python/lsst/pipe/base/_check_qg_outputs.py

+    """Datasets not produced because their inputs were not produced or not
+    found
+    """
+    # bool: predicted inputs to this task were not produced


Should merge this code comment into the docstring above.
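
A minimal sketch of what the merged form could look like, assuming the attribute-docstring convention already used in this file. The class and field names here are hypothetical, since the attribute itself is not shown in the diff context.

```python
from __future__ import annotations

import dataclasses


@dataclasses.dataclass
class ExampleDatasetReport:
    """Hypothetical per-dataset-type report, used only for illustration."""

    not_produced_upstream: bool = False
    """Whether datasets were not produced because the predicted inputs to
    this task were not produced or not found (`bool`).
    """

    n_produced: int = 0
    """Count of datasets produced by this run (`int`)."""


report = ExampleDatasetReport()
print(report)
```

Putting the type and meaning in one string literal keeps the information where documentation tools and readers look for it, rather than splitting it between a docstring and a trailing comment.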

TallJimbo · 2023-09-26T14:05:22Z

python/lsst/pipe/base/_check_qg_outputs.py

+
+
+@dataclasses.dataclass
+class DatasetTypeExecutionReport:


Needs a class docstring.

TallJimbo · 2023-09-26T14:05:37Z

python/lsst/pipe/base/_check_qg_outputs.py

+    """Counts of datasets produced by this run.
+    """
+
+    def to_summary_dict(self) -> dict[str, Any]:


Needs a docstring.

TallJimbo · 2023-09-26T14:10:01Z

python/lsst/pipe/base/_check_qg_outputs.py

+
+
+@dataclasses.dataclass
+class QuantumGraphExecutionReport:


Needs a class docstring.

TallJimbo · 2023-09-26T14:10:09Z

python/lsst/pipe/base/_check_qg_outputs.py

+class QuantumGraphExecutionReport:
+    tasks: dict[str, TaskExecutionReport] = dataclasses.field(default_factory=dict)
+
+    def to_summary_dict(self, butler: Butler, logs: bool = True) -> dict[str, Any]:


Needs a docstring.

TallJimbo · 2023-09-26T14:10:16Z

python/lsst/pipe/base/_check_qg_outputs.py

+    def to_summary_dict(self, butler: Butler, logs: bool = True) -> dict[str, Any]:
+        return {task: report.to_summary_dict(butler, logs=logs) for task, report in self.tasks.items()}
+
+    def write_summary_yaml(self, butler: Butler, filename: str, logs: bool = True) -> None:


Needs a docstring.

TallJimbo · 2023-09-26T14:10:52Z

python/lsst/pipe/base/_check_qg_outputs.py

+                    dataset_type.name, collections=collection, findFirst=False
+                )
+            }
+        for taskDef in qg.iterTaskGraph():


Since pretty much everything in this file is snake_case, taskDef should be, too (also below).

TallJimbo · 2023-09-26T14:11:30Z

python/lsst/pipe/base/_check_qg_outputs.py

+        return "\n".join(f"{tasklabel}:{report}" for tasklabel, report in self.tasks.items())
+
+
+def lookup_quantum_dataId(graph_uri: ResourcePathExpression, nodes: Iterable[uuid.UUID]):


Needs a docstring and a -> list[DataCoordinate] return type annotation.
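
For shape only, here is a sketch of a numpydoc-style docstring with an explicit return annotation. The function below is a hypothetical stand-in that looks IDs up in a plain mapping; it does not load a quantum graph from a URI, and plain dicts replace DataCoordinate.

```python
from __future__ import annotations

import uuid
from collections.abc import Iterable, Mapping


def lookup_data_ids(
    index: Mapping[uuid.UUID, dict[str, int]], nodes: Iterable[uuid.UUID]
) -> list[dict[str, int]]:
    """Look up the data ID for each requested node.

    Parameters
    ----------
    index : `~collections.abc.Mapping` [ `uuid.UUID`, `dict` ]
        Hypothetical mapping from node ID to data ID.
    nodes : `~collections.abc.Iterable` [ `uuid.UUID` ]
        Node IDs to look up.

    Returns
    -------
    data_ids : `list` [ `dict` ]
        Data IDs in the same order as ``nodes``.
    """
    return [index[node] for node in nodes]


node_a, node_b = uuid.uuid4(), uuid.uuid4()
demo_index = {node_a: {"visit": 1}, node_b: {"visit": 2}}
print(lookup_data_ids(demo_index, [node_b, node_a]))
```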

eigerx · 2023-10-04T00:04:21Z

Ok, I've written some documentation and I got package-docs build to run. It looks mostly okay but could definitely use a once-over for formatting; some of the data types aren't looking bolded quite right.

I should be able to add a test pretty soon.

I'm confused though as to why mypy is failing. I don't understand what I can do to avoid this issue:

python/lsst/pipe/base/execution_reports.py:209: error: Value of type "Iterable[DatasetRef]" is not indexable  [index]
python/lsst/pipe/base/execution_reports.py:215: error: Incompatible types in assignment (expression has type "DataCoordinate | None", target has type "DataCoordinate")  [assignment]
python/lsst/pipe/base/execution_reports.py:227: error: Value of type "Iterable[DatasetRef]" is not indexable  [index]
python/lsst/pipe/base/execution_reports.py:261: error: Incompatible types in assignment (expression has type "list[<nothing>]", target has type "dict[str, int | str | None]")  [assignment]
python/lsst/pipe/base/execution_reports.py:263: error: Incompatible types in assignment (expression has type "list[Any]", target has type "dict[str, int | str | None]")  [assignment]
python/lsst/pipe/base/execution_reports.py:374: error: Value of type "MappingProxyType[str, Any] | None" is not indexable  [index]
python/lsst/pipe/base/execution_reports.py:404: error: Argument 3 to "inspect_quantum" of "TaskExecutionReport" has incompatible type "dict[str, dict[UUID, DatasetRef]]"; expected "Iterable[DatasetRef]"  [arg-type]
python/lsst/pipe/base/execution_reports.py:406: error: Argument "log_name" to "inspect_quantum" of "TaskExecutionReport" has incompatible type "str | None"; expected "str"  [arg-type]
python/lsst/pipe/base/execution_reports.py:434: error: List comprehension has incompatible type List[DataCoordinate | None]; expected List[DataCoordinate]  [misc]
Found 9 errors in 1 file (checked 76 source files)
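
For reference, most of these errors fall into a few recurring categories that can be reproduced in isolation. The sketch below uses plain str values as stand-ins for DatasetRef and DataCoordinate, and all function names are hypothetical; it shows common ways such errors are usually resolved, not necessarily the right fix for each line above.

```python
from __future__ import annotations

from collections.abc import Iterable
from typing import Any


def first_ref(refs: Iterable[str]) -> str:
    """Return the first element without indexing an Iterable."""
    # `refs[0]` is what mypy rejects as "not indexable"; iterating is allowed.
    return next(iter(refs))


def require_data_id(data_id: str | None) -> str:
    """Narrow an optional value before storing it where None is not allowed."""
    if data_id is None:
        raise ValueError("expected a data ID, got None")
    return data_id


def drop_missing(data_ids: Iterable[str | None]) -> list[str]:
    """Build a list whose element type excludes None."""
    return [data_id for data_id in data_ids if data_id is not None]


def make_summary(failed: list[str]) -> dict[str, Any]:
    """Annotate the container so that list values are permitted."""
    summary: dict[str, Any] = {"n_failed": len(failed)}
    summary["failed"] = failed
    return summary


print(first_ref(["a", "b"]), drop_missing(["x", None, "y"]), make_summary(["q1"]))
```

Whether to raise, skip, or widen the annotation depends on what a None actually means at each call site, so each case still needs its own decision.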

timj · 2023-10-04T18:07:41Z