Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log long running workunits when the dynamic UI is disabled #10890

Merged

Conversation

stuhood
Copy link
Sponsor Member

@stuhood stuhood commented Oct 1, 2020

Problem

The --dynamic-ui is enabled by default for end users, but is generally disabled in CI. Because our log levels have been tuned for the amount of information provided by the dynamic UI, disabling the UI does not leave quite enough information to determine where time is being spent.

Solution

Adjusting log-levels based on whether the UI is enabled would add an additional dimension of complexity to choosing the level of a workunit. So instead, we attempt to ensure that the same sort of information usually rendered by the dynamic UI is rendered when it is disabled.

To avoid spamming the log, we periodically (iff --no-dynamic-ui) log any "visible, non-blocked workunits" (i.e., the same heuristic used by the dynamic UI) that have been running for longer than a threshold (aka: "stragglers").

Result

./pants --no-dynamic-ui test src/python/pants/backend/python/::
<snip>
20:57:10.96 [INFO] Completed: Run tests - src/python/pants/backend/python/lint/pylint/rules_integration_test.py:integration succeeded.
20:57:17.55 [INFO] Long running tasks:
  284.57s 	Run Pytest for src/python/pants/backend/python/typecheck/mypy/rules_integration_test.py:integration
  319.51s 	Run Pytest for src/python/pants/backend/python/goals/pytest_runner_integration_test.py:integration
20:57:36.78 [INFO] Completed: Run tests - src/python/pants/backend/python/goals/pytest_runner_integration_test.py:integration succeeded.
20:57:47.59 [INFO] Long running tasks:
  314.61s 	Run Pytest for src/python/pants/backend/python/typecheck/mypy/rules_integration_test.py:integration
20:58:16.18 [INFO] Completed: Run tests - src/python/pants/backend/python/typecheck/mypy/rules_integration_test.py:integration succeeded.

[ci skip-build-wheels]

// When enabled, the interval at which all stragglers that have been running for longer than a
// threshold should be logged. The threshold might become configurable, but this might not need
// to be.
const STRAGGLER_INTERVAL: Duration = Duration::from_secs(30);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe name this STRAGGLER_LOGGING_INTERVAL to clarify that it is the interval for logging of stragglers? Just STRAGGLER_INTERVAL could be confused with "stragger_threshold" since they both involve time.

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

// The dynamic UI is disabled, and we should use only logging.
Logging {
straggler_threshold: Duration,
straggler_deadline: Option<Instant>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe call this straggler_logging_deadline?

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's in a struct called Logging, so I think that we can skip that.

@coveralls
Copy link

coveralls commented Oct 1, 2020

Coverage Status

Coverage remained the same at 0.0% when pulling 4f34d52 on stuhood:stuhood/long-running-workunits-logging into c03b89d on pantsbuild:master.

Copy link
Contributor

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! Thanks.

SessionDisplay::Logging {
// TODO: This threshold should likely be configurable, but the interval we render at
// probably does not need to be.
straggler_threshold: Duration::from_secs(60),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To confirm, this will only log if the workunit has been taking > 60 seconds? I wonder if we want this to be lower, maybe 45 seconds.

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect that in a CI environment, 60 is sufficient. Let's try it out for a while and see whether it is responsive enough.

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
@stuhood stuhood merged commit c5a397f into pantsbuild:master Oct 1, 2020
@stuhood stuhood deleted the stuhood/long-running-workunits-logging branch October 1, 2020 17:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants