New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Log long running workunits when the dynamic UI is disabled #10890
Log long running workunits when the dynamic UI is disabled #10890
Conversation
…its method. [ci skip-build-wheels]
[ci skip-build-wheels]
src/rust/engine/src/scheduler.rs
Outdated
// When enabled, the interval at which all stragglers that have been running for longer than a | ||
// threshold should be logged. The threshold might become configurable, but this might not need | ||
// to be. | ||
const STRAGGLER_INTERVAL: Duration = Duration::from_secs(30); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe name this STRAGGLER_LOGGING_INTERVAL
to clarify that it is the interval for logging of stragglers? Just STRAGGLER_INTERVAL
could be confused with "stragger_threshold" since they both involve time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
// The dynamic UI is disabled, and we should use only logging. | ||
Logging { | ||
straggler_threshold: Duration, | ||
straggler_deadline: Option<Instant>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe call this straggler_logging_deadline
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's in a struct called Logging
, so I think that we can skip that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! Thanks.
SessionDisplay::Logging { | ||
// TODO: This threshold should likely be configurable, but the interval we render at | ||
// probably does not need to be. | ||
straggler_threshold: Duration::from_secs(60), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To confirm, this will only log if the workunit has been taking > 60 seconds? I wonder if we want this to be lower, maybe 45 seconds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expect that in a CI environment, 60 is sufficient. Let's try it out for a while and see whether it is responsive enough.
# Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
Problem
The
--dynamic-ui
is enabled by default for end users, but is generally disabled in CI. Because our log levels have been tuned for the amount of information provided by the dynamic UI, disabling the UI does not leave quite enough information to determine where time is being spent.Solution
Adjusting log-levels based on whether the UI is enabled would add an additional dimension of complexity to choosing the level of a workunit. So instead, we attempt to ensure that the same sort of information usually rendered by the dynamic UI is rendered when it is disabled.
To avoid spamming the log, we periodically (iff
--no-dynamic-ui
) log any "visible, non-blocked workunits" (i.e., the same heuristic used by the dynamic UI) that have been running for longer than a threshold (aka: "stragglers").Result
[ci skip-build-wheels]