DM-38682: Add task for summarizing resource usage into a single table. #70
Looks good to me!
"quanta",
"task",
"memory",
"init_time",
I see that we store init_time and run_time. Would it be valuable to have a human-readable stop-time also? I'm not sure how useful these quantities are for identifying transient infrastructure errors.
These are actually both durations, not timestamps: init_time is the time it takes to do all of the stuff pipetask does before it actually calls the task's runQuantum method, which amounts to a lot of file-existence checks most of the time. That's the time when things like "Nothing to do for task" or "Skipping already-successful quantum" happen.
).reset_index()
df["task"] = input_name.replace("_resource_usage", "")
df["quanta"] = len(ru_table)
df["integrated_runtime"] = ru_table["run_time"].sum()
Is the integrated runtime over a workflow, step, graph... ?
It's integrated over all of the task metadata files it finds in the input collection you give it, so the scope varies with the collection.
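To illustrate what "integrated" means here, the following is a minimal sketch rather than the code in this PR. It assumes each per-task table has one row per quantum with a run_time column, as the diff above suggests. The helper name summarize_task, the example task names, and the numbers are all hypothetical.

```python
import pandas as pd


def summarize_task(input_name: str, ru_table: pd.DataFrame) -> pd.DataFrame:
    """Collapse one per-quantum resource-usage table into a single summary row.

    Hypothetical helper for illustration; it mirrors the column assignments
    shown in the diff but is not the task's actual implementation.
    """
    # Strip the dataset-type suffix to recover the task label, as in the diff.
    task = input_name.replace("_resource_usage", "")
    return pd.DataFrame(
        {
            "task": [task],
            # One row per quantum, so the row count is the number of quanta.
            "quanta": [len(ru_table)],
            # Sum of per-quantum run_time durations over whatever rows were found.
            "integrated_runtime": [ru_table["run_time"].sum()],
        }
    )


# Made-up inputs standing in for the tables found in an input collection.
tables = {
    "isr_resource_usage": pd.DataFrame({"run_time": [1.5, 2.5, 3.0]}),
    "calibrate_resource_usage": pd.DataFrame({"run_time": [10.0, 20.0]}),
}

summary = pd.concat(
    [summarize_task(name, table) for name, table in tables.items()],
    ignore_index=True,
)
print(summary)
```

Because the sum runs over whichever rows are present, pointing the same code at a collection covering one step or a whole workflow changes integrated_runtime accordingly.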
]
)

qq = pd.concat(quantiles)
Is qq for quantiles? If so a different name might be easier to follow.
Renamed to full_quantiles.
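For readers without the full diff, here is a rough sketch of the pattern the renamed variable belongs to: per-task quantile tables built separately and then concatenated. The quantile levels, column choice, helper name task_quantiles, and sample data are assumptions made for illustration and are not taken from the PR.

```python
import pandas as pd

# Assumed quantile levels; the PR may use a different set.
QUANTILES = [0.5, 1.0]


def task_quantiles(task: str, ru_table: pd.DataFrame) -> pd.DataFrame:
    """Return one row per quantile level for a single task (hypothetical helper)."""
    q = ru_table[["memory", "run_time"]].quantile(QUANTILES)
    # Name the index so reset_index() yields a "quantile" column.
    q = q.rename_axis("quantile").reset_index()
    q["task"] = task
    return q


# Made-up per-task tables, one row per quantum.
tables = {
    "isr": pd.DataFrame(
        {"memory": [1.0, 2.0, 3.0], "run_time": [10.0, 20.0, 30.0]}
    ),
    "calibrate": pd.DataFrame(
        {"memory": [4.0, 8.0, 6.0], "run_time": [5.0, 7.0, 9.0]}
    ),
}

quantiles = [task_quantiles(name, table) for name, table in tables.items()]
# The descriptive name makes clear this holds quantiles for every task.
full_quantiles = pd.concat(quantiles, ignore_index=True)
print(full_quantiles)
```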
3d5a2c9 to 65aa428
No description provided.