DM-38682: Add task for summarizing resource usage into a single table. #70

Merged
TallJimbo merged 1 commit into main on Sep 8, 2023

TallJimbo
Member

No description provided.

@eigerx left a comment

Looks good to me!

"quanta",
"task",
"memory",
"init_time",

I see that we store init_time and run_time. Would it be valuable to have a human-readable stop-time also? I'm not sure how useful these quantities are for identifying transient infrastructure errors.

Member Author


These are actually both durations, not timestamps: init_time is the time it takes to do everything pipetask does before it actually calls the task's runQuantum method, which most of the time amounts to a lot of file-existence checks. That's when things like "Nothing to do for task" or "Skipping already-successful quantum" happen.
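
To make the distinction concrete, here is a minimal sketch of recording the two durations separately. It is not how pipetask measures them; `timed_quantum`, `prepare`, and `run` are hypothetical names standing in for the pre-execution checks and the task's runQuantum call.

```python
import time


def timed_quantum(prepare, run):
    """Return (init_time, run_time) in seconds for one quantum.

    Hypothetical helper: `prepare` stands in for the pre-execution checks
    (e.g. file-existence checks) and returns False when there is nothing
    to do; `run` stands in for the task's runQuantum call.
    """
    start = time.perf_counter()
    should_run = prepare()
    after_init = time.perf_counter()
    if should_run:
        run()
    end = time.perf_counter()
    # Both values are elapsed durations, not wall-clock timestamps.
    return after_init - start, end - after_init
```

In this sketch a skipped quantum still reports a non-trivial init_time but a near-zero run_time.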

).reset_index()
df["task"] = input_name.replace("_resource_usage", "")
df["quanta"] = len(ru_table)
df["integrated_runtime"] = ru_table["run_time"].sum()

Is the integrated runtime over a workflow, step, graph... ?

Member Author


Integrated over all of the task metadata files it finds in the input collection you give it, so it varies.
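
As a rough illustration of that answer, the sketch below sums run_time over whatever rows the per-task table happens to contain. The helper `summarize_task`, the dataset name `isr_resource_usage`, and the use of a per-column maximum for `memory` are assumptions for illustration; only the `task`, `quanta`, and `integrated_runtime` assignments mirror the diff.

```python
import pandas as pd


def summarize_task(input_name: str, ru_table: pd.DataFrame) -> pd.DataFrame:
    """Collapse one task's per-quantum table into a single summary row.

    Hypothetical helper: the real task's aggregation may differ.
    """
    # Stand-in aggregation (assumption): peak memory across all quanta.
    df = pd.DataFrame({"memory": [ru_table["memory"].max()]})
    df["task"] = input_name.replace("_resource_usage", "")
    # One row per quantum, so the row count is the number of quanta found.
    df["quanta"] = len(ru_table)
    # "Integrated" means summed over every row present in the table, so the
    # value depends on which metadata ended up in the input collection.
    df["integrated_runtime"] = ru_table["run_time"].sum()
    return df
```

Calling `summarize_task("isr_resource_usage", table)` on a table with three quanta would yield one row with `quanta` equal to 3 and `integrated_runtime` equal to the sum of the three run times.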

]
)

qq = pd.concat(quantiles)

Is qq for quantiles? If so a different name might be easier to follow.

Member Author


Renamed to full_quantiles.
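
For context, a small sketch of the concatenation under the new name. The per-task tables and their columns are made up for illustration, and `ignore_index=True` is an addition that the diff does not show.

```python
import pandas as pd

# Hypothetical per-task quantile tables; names and values are illustrative.
quantiles = [
    pd.DataFrame({"task": ["isr"], "quantile": [0.5], "run_time": [12.0]}),
    pd.DataFrame({"task": ["calibrate"], "quantile": [0.5], "run_time": [40.0]}),
]

# Stack the per-task tables into one table with a fresh 0..n-1 index.
full_quantiles = pd.concat(quantiles, ignore_index=True)
```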

@TallJimbo merged commit 8a53795 into main Sep 8, 2023
2 checks passed
@TallJimbo deleted the tickets/DM-38682 branch September 8, 2023 19:32

3 participants