New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-40715: move resource-usage summary tasks from analysis_drp #143
Conversation
bf4eacd
to
66eec3a
Compare
a21c9e3
to
6a253d3
Compare
base_dataset_type_filter = re.compile(r"\w+_metadata") | ||
input_dataset_types: Any | ||
if not dataset_type_names: | ||
input_dataset_types = base_dataset_type_filter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would more the re compile into this block
# Start by querying for metadata datasets, since we'll need to know | ||
# which dataset types exist in the input collections in order to | ||
# build the pipeline. | ||
base_dataset_type_filter = re.compile(r"\w+_metadata") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For organizational / readability reasons, consider moving much of this logic out of init into a method/function that is called by init.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've moved the logic inside the single biggest loop into a separate method; everything else was too intertwined for that to work, as I didn't want to end up with methods that expected the class to be in different stages of constructed-ness or use a lot of output parameters.
default="", | ||
help="Data ID expression used when querying for input metadata datasets.", | ||
) | ||
parser.add_argument( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its not entirely clear to me what happens when both output and output-run are specified
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same as usual with pipetask run
: the given output-run name is used instead of creating one by appending a timestamp.
type=str, | ||
action="extend", | ||
help=( | ||
"Glob-style patterns for input metadata dataset types. If a pattern matches a " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the wording of the second sentence is unclear, or convoluted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just removed the second sentence as I don't think it adds anything, and I'm not sure what it was supposed to mean either.
abe5729
to
0de5e69
Compare
Logic has not changed, but some type annotations have been adjusted since analysis_tools actually checks those.
Inheriting from the new QuantumGraphBuilder base class is a significant simplification, and it avoids a dependency on soon-to-be deprecated classes like TaskDef and PipelineDatasetTypes.
0de5e69
to
1425926
Compare
No description provided.