
Add --measure-time flag to libtest #64714

Closed
wants to merge 1 commit

Conversation

popzxc
Contributor

@popzxc popzxc commented Sep 23, 2019

Overview

This PR introduces a new option flag for the Rust test harness: --measure-time. With this flag applied, libtest will measure the execution time of each test and print it out (to the log file, or to the terminal if the pretty formatter is chosen).

--measure-time is intended for a single-threaded run, so using it with --test-threads set to any value other than 1 will result in an error.

Time is measured only when the flag is applied, so there is no overhead for ordinary runs.

In the output, the time is written in red if the test duration is >= RUST_TEST_TIME_CRITICAL (defaults to 1000 milliseconds), in yellow if it is >= RUST_TEST_TIME_WARN (defaults to 500 milliseconds); otherwise the default formatting is applied.
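As a rough sketch (not the PR's actual code; the helper name is made up), the thresholds described above could be read from the environment like this:

use std::time::Duration;

// Hypothetical helper: read a millisecond threshold from an environment
// variable, falling back to the default when it is unset or unparsable.
fn threshold_from_env(var: &str, default_ms: u64) -> Duration {
    std::env::var(var)
        .ok()
        .and_then(|s| s.parse::<u64>().ok())
        .map(Duration::from_millis)
        .unwrap_or(Duration::from_millis(default_ms))
}

// e.g. threshold_from_env("RUST_TEST_TIME_WARN", 500)
//      threshold_from_env("RUST_TEST_TIME_CRITICAL", 1000)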

(Screenshot: rustc_timed_tests)

Rationale

Usually, big Rust projects have a lot of tests, and tests are not always written carefully. As a result, if a poorly written test gets merged into a big project, it is difficult to find that test, because there is no tool for that. libtest checks for timed-out tests, but the constant TEST_WARN_TIMEOUT_S is not configurable, and it's not really an instrument for detecting unoptimized tests.

This PR allows developers to inspect their codebase to detect long-running tests and optimize/rewrite/remove them.

Backward compatibility

This PR is fully backward compatible, it does not introduce any breaking changes.

r? @Centril

@rust-highfive
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @Centril (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 23, 2019
@fbstj
Contributor

fbstj commented Sep 23, 2019

Is this different to #64663?

@popzxc
Contributor Author

popzxc commented Sep 23, 2019

Oh... Well, I started implementing this before that PR was submitted :c

However, sort of yes.

This PR does (and the linked one does not):

  • verify that only 1 test thread is used (for tests running in parallel, results may be imprecise);
  • highlight tests with long execution times;
  • not report time for failed tests (a failed test is usually not executed fully, so it's not quite correct to report its execution time);
  • not introduce overhead for common runs (Instant::now may require a Mutex lock in some cases);
  • offer a bit more configurability.


let time_str = format!(" <{}.{} s>", duration.as_secs(), duration.subsec_millis());
match &self.test_time_params {
    Some(params) => {

Use guards here instead so we can fuse the .write_plains.
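A sketch of the guard-based form being suggested (the threshold field names warn and critical are assumptions, not the PR's actual fields):

match &self.test_time_params {
    Some(params) if duration >= params.critical => {
        self.write_pretty(&time_str, term::color::RED)?;
    }
    Some(params) if duration >= params.warn => {
        self.write_pretty(&time_str, term::color::YELLOW)?;
    }
    _ => self.write_plain(&time_str)?,
}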

@@ -391,6 +392,7 @@ fn optgroups() -> getopts::Options {
.optflag("", "exclude-should-panic", "Excludes tests marked as should_panic")
.optflag("", "test", "Run tests and not benchmarks")
.optflag("", "bench", "Run benchmarks instead of tests")
.optflag("", "measure-time", "Measure tests execution time")

Suggested change
.optflag("", "measure-time", "Measure tests execution time")
.optflag("", "measure-time", "Measure test execution time")

let test_threads = match matches.opt_str("test-threads") {
    Some(n_str) => match n_str.parse::<usize>() {
        Ok(0) => return Some(Err("argument for --test-threads must not be 0".to_string())),
        Ok(n) => Some(n),
        Ok(n) => {
            if measure_time && n != 1 {

A guard would be better here.

Ok(n) => Some(n),
Ok(n) => {
    if measure_time && n != 1 {
        return Some(Err("The \"measure-time\" flag excepts only one thread".into()));

Suggested change
return Some(Err("The \"measure-time\" flag excepts only one thread".into()));
return Some(Err("The \"measure-time\" flag is only compatible with one thread".into()));

@@ -600,7 +615,14 @@ pub fn parse_opts(args: &[String]) -> Option<OptRes> {
)));
}
},
None => None,
None => {
if measure_time {

Same; use a guard.
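A hypothetical guard form of the None arm (assuming the intent of the elided body is to fall back to a single thread when --measure-time is set, which is a guess):

None if measure_time => Some(1),
None => None,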

@@ -1436,6 +1520,10 @@ pub fn run_test(
}
}

let nocapture = opts.nocapture;
let measure_time = opts.measure_time;
let run_test_opts = RunTestOpts { nocapture, concurrency, measure_time };

Why unpack this just to pack it again?
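That is, the struct could be built directly from opts without the intermediate locals:

let run_test_opts = RunTestOpts {
    nocapture: opts.nocapture,
    concurrency,
    measure_time: opts.measure_time,
};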

fn calc_result(
    desc: &TestDesc,
    task_result: Result<(), Box<dyn Any + Send>>,
    duration: Option<Duration>

Suggested change
duration: Option<Duration>
duration: Option<Duration>,

@@ -203,6 +204,44 @@ fn parse_include_ignored_flag() {
assert_eq!(opts.run_ignored, RunIgnored::Yes);
}

#[test]
fn parse_measure_time_flag() {
// Provide --measure-time and --test flags.

Suggested change
// Provide --measure-time and --test flags.
// Provide `--measure-time` and `--test` flags.

assert!(opts.measure_time);
assert_eq!(opts.test_threads, Some(1));

// Provide --measure-time flag without --test flag and with --bench flag.

Suggested change
// Provide --measure-time flag without --test flag and with --bench flag.
// Provide `--measure-time` flag without `--test` flag and with `--bench` flag.

let opts_error = parse_opts(&args_bench).unwrap();
opts_error.expect_err(expected_err);

// Provide --measure-time flag with custom --test-threads flag.

Suggested change
// Provide --measure-time flag with custom --test-threads flag.
// Provide `--measure-time` flag with custom `--test-threads` flag.

@Centril
Contributor

Centril commented Sep 23, 2019

Seems reasonable -- should this be the default perhaps?

Not sure if this is feature gated or not, but I think we'd want to gate the behavior initially.

cc @alexcrichton @gnzlbg @killercup

@gnzlbg
Contributor

gnzlbg commented Sep 23, 2019

See also #64663, which will probably be merged. Maybe this should build on that?

@Centril
Contributor

Centril commented Sep 23, 2019

Seems like a good idea to do so yeah.

@gnzlbg
Contributor

gnzlbg commented Sep 23, 2019

--measure-time is intended for a single-threaded run, so using it with --test-threads set to any value other than 1 will result in an error.

Why is this? It should be possible to measure the time it takes to run a test independently of whether the tests are run in parallel or not.

In the output, the time is written in red if the test duration is >= RUST_TEST_TIME_CRITICAL (defaults to 1000 milliseconds), in yellow if it is >= RUST_TEST_TIME_WARN (defaults to 500 milliseconds); otherwise the default formatting is applied.

[...] if a poorly written test gets merged into a big project, it is difficult to find that test, because there is no tool for that. libtest checks for timed-out tests, but the constant TEST_WARN_TIMEOUT_S is not configurable, and it's not really an instrument for detecting unoptimized tests.

With #64663, the test times are written to the libtest JSON output, so if this is a real issue for real projects, that should allow them to write a simple tool that parses that output and rejects tests with longer runtimes using whatever metrics they want. That feels like a better solution to this problem than trying to solve it on the libtest side.
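For illustration only, such a parsing tool could be a few lines of Rust; the JSON event shape and the exec_time field name below are guesses about the #64663 output (not its documented format), and serde_json is an extra dependency:

use std::io::{self, BufRead};

fn main() {
    let limit_secs = 0.5; // project-specific threshold
    let stdin = io::stdin();
    for line in stdin.lock().lines() {
        let line = line.unwrap();
        // Assumed event shape: {"type":"test","event":"ok","name":"...","exec_time":1.234}
        let v: serde_json::Value = match serde_json::from_str(&line) {
            Ok(v) => v,
            Err(_) => continue, // ignore non-JSON lines
        };
        if let (Some(name), Some(t)) = (v["name"].as_str(), v["exec_time"].as_f64()) {
            if t > limit_secs {
                eprintln!("test {} took {:.3}s (limit {:.3}s)", name, t, limit_secs);
            }
        }
    }
}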

fn write_ok_with_duration(&mut self, duration: Duration) -> io::Result<()> {
    self.write_pretty("ok", term::color::GREEN)?;

    let time_str = format!(" <{}.{} s>", duration.as_secs(), duration.subsec_millis());

If subsec_millis returns less than three digits the output is wrong:

assert_ne!(" <1.023 s>", format!(" <{}.{} s>", 1, 23));

@popzxc
Contributor Author

popzxc commented Sep 24, 2019

Hm... OK, all the comments seem reasonable.
Would it be OK if this PR is closed, and once #64663 gets merged, I create a new PR based on the updated libtest with all the review comments addressed?

@gnzlbg, regarding test threads -- OK, then maybe use the following approach: if neither --test-threads nor RUST_TEST_THREADS is provided, tests will be run in one thread; otherwise the provided number of threads will be used?

@gnzlbg
Contributor

gnzlbg commented Sep 24, 2019

OK, then maybe use the following approach: if neither --test-threads nor RUST_TEST_THREADS is provided, tests will be run in one thread; otherwise the provided number of threads will be used?

Why do you want to force tests to run in one thread when reporting the run-time?

@popzxc
Contributor Author

popzxc commented Sep 24, 2019

Why do you want to force tests to run in one thread when reporting the run-time?

Imagine some resource that is acquired only synchronously (e.g. it can be reached only after locking a static Mutex). In that hypothetical case, if tests are run in multiple threads (let's take 4), it's possible with 3 quick tests and 1 very long one to get a situation where all 4 tests are reported as long (the long test locked the mutex first, the 3 others waited for it, and as a result those 3 tests were blocked until the long one completed). You can see the gist for the actual example.
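Here is a minimal sketch of that situation (hypothetical test names, not the code from the gist):

use std::sync::Mutex;
use std::{thread, time::Duration};

static RESOURCE: Mutex<()> = Mutex::new(());

#[test]
fn slow_test() {
    let _guard = RESOURCE.lock().unwrap();
    thread::sleep(Duration::from_secs(2)); // the genuinely slow test
}

#[test]
fn quick_test() {
    // Does almost nothing, but when run in parallel its measured wall-clock
    // time includes waiting for slow_test to release the lock.
    let _guard = RESOURCE.lock().unwrap();
}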

I understand that it's not a common case, but still a possible one, and I believe that compiler tools should avoid ambiguity when possible.

Overall, if developers are sure that it's OK to run tests in parallel, they will always be able to provide a flag for it.

Or am I missing something here?

@gnzlbg
Contributor

gnzlbg commented Sep 24, 2019

libtest runs tests in parallel by default and offers an option to choose the number of threads.
A libtest option for reporting how long it takes for a unit test to run should do just that, and is orthogonal to the number of threads that are used.

In that hypothetical case, if tests are run in multiple threads (let's take 4), it's possible with 3 quick tests and 1 very long one to get a situation where all 4 tests are reported as long (the long test locked the mutex first, the 3 others waited for it, and as a result those 3 tests were blocked until the long one completed) [...] I believe that compiler tools should avoid ambiguity when possible.

That's not an "ambiguity", that's printing out run-times that reflect reality. Changing the number of threads used by libtest silently when a user tries to debug these issues would be super confusing: "My tests run very slowly, let's print the run-times... oh! when I print the run-times my tests run fast, ?!?!?! (confusion)".

@popzxc
Contributor Author

popzxc commented Sep 24, 2019

Yeah, but when a user notices that tests run very slowly and decides to measure time, they will get N tests reported as slow, when in fact only one of them is slow. Thus this command won't be helping at all.

IMHO, the purpose of that subcommand is to help the user find the slow tests, and not just "report run-times that reflect reality".

@gnzlbg
Contributor

gnzlbg commented Sep 24, 2019

Yeah, but when a user notices that tests run very slowly and decides to measure time, they will get N tests reported as slow, when in fact only one of them is slow.

This is incorrect. If a test is reported as taking N seconds to run, then it is because it took N seconds to run. That's objective and correct data.

Whether a test takes N seconds to run because it does a lot of computation, or because it's waiting and doing nothing while some other test holds a mutex lock, is a different issue.

It appears that what you want is a libtest feature that lets you know which test is making other tests run slowly. Silently messing with the number of threads doesn't achieve that:

use std::sync::atomic::{AtomicUsize, Ordering::SeqCst};

static COUNT: AtomicUsize = AtomicUsize::new(0);
#[test] fn foo() { for _ in 0..COUNT.load(SeqCst) { println!("!"); } }
#[test] fn bar() { COUNT.store(usize::max_value(), SeqCst); }

Here, even when using one thread, if bar runs before foo, foo will take a long time to run. Otherwise, foo will take 0 seconds to run - bar always takes 0 seconds. Changing the number of threads to 1 doesn't help the user in any way to find this problem.

@gnzlbg
Contributor

gnzlbg commented Sep 24, 2019

IMHO, the purpose of that subcommand is to help the user find the slow tests, and not just "report run-times that reflect reality".

The purpose of the other PR is to report unit test run-times. What users do with that data is up to them.

If your intent is to solve some other problem (adding some sort of functionality to libtest that points users to the tests that are causing other tests to run slowly), then it is probably better to open an issue to discuss that. The top post in this PR does not mention anything about this.

@popzxc
Contributor Author

popzxc commented Sep 24, 2019

OK, you're right. I will leave the test threads as they are then.

@gnzlbg
Contributor

gnzlbg commented Sep 24, 2019

FWIW, I manually have to set the number of threads to 1 as well when running libtest to debug race conditions and similar issues (and I'm not alone), so I think it is worth discussing somewhere whether there are things we could be doing to make this easier for users.

@jakoschiko
Contributor

Some thoughts:

I like colored execution times, but I don't like default values. The proposed default values (500ms warn, 1000ms critical) don't fit everyone's use case. For example, property-based tests that take several seconds are not unusual. Running the tests with --measure-time and seeing only red execution times would confuse me unless I had chosen the thresholds myself.

Alternatives:

  • Do not enable colored execution times by default. If the user needs it, he must configure the thresholds.
  • Do not implement the feature at all. If a user really needs it, he can use something like grep:
    cargo test -- --gimme-time | egrep --color -z '[1-9]{1,}[0-9]{0,}\.[0-9]{3}s'
    

@Centril
Contributor

Centril commented Sep 24, 2019

r? @wesleywiser

@rust-highfive rust-highfive assigned wesleywiser and unassigned Centril Sep 24, 2019
@wesleywiser
Member

@Centril I don't really know anything about libtest. Are you assigning me for some reason specifically or just because you need a reviewer?

@Centril
Contributor

Centril commented Sep 24, 2019

@wesleywiser Basically I did the code review, but I'm not best placed to judge whether the motivation of the PR is enough to land / account for the discussion. @varkor thought you'd be well placed as you've dealt with profiling and such things.

@wesleywiser
Member

Gotcha, that's fine.

@bors
Contributor

bors commented Sep 25, 2019

☔ The latest upstream changes (presumably #64663) made this pull request unmergeable. Please resolve the merge conflicts.

Member

@wesleywiser wesleywiser left a comment


Overall, I think this is a good feature addition and I don't have concerns about the overall approach. Here are the changes I'd like to see in order to land this PR:

  • Suggested changes from the other reviewers should be done.
  • I agree with @gnzlbg that we shouldn't be messing with the number of test threads. While the motivation is good, I think users will potentially be more confused if we don't replicate the original test environment as closely as possible when this flag is turned on.
  • Since #64663 (libtest: Add --report-time flag to print test execution time) was landed behind an unstable flag, I think this option should also be behind the unstable flag.

I think including default values for the warning and critical times is a good idea, but I think we should use different values for unit tests vs integration tests. Unit tests are supposed to be very, very fast, so for those I'd recommend 100ms as critical and 50ms as warn. For integration tests, I think the current values you have are ok. If detecting what kind of test is running is complicated, then feel free to ignore this. (Note: the Cargo book says that tests in src/* are unit tests and tests in tests/* are integration tests.)

I'd also like to see some documentation for this flag and its configuration options in the Unstable Features page of the Cargo book.

When those three main issues are addressed, I'll happily approve this!

@popzxc
Contributor Author

popzxc commented Sep 25, 2019

Sure, I'll perform the required changes as soon as I have time for this (hopefully this or next week).

@wesleywiser
Member

@popzxc Sounds good! If you decide to close this PR and open another, feel free to include

r? @wesleywiser

in the PR body so that I'll be assigned to it since I have the context for this.

@gnzlbg
Contributor

gnzlbg commented Sep 25, 2019

If detecting what kind of test is running is complicated, then feel free to ignore this.

I don't think we have an easy way to detect this right now (IIRC, in both cases, the only thing that libtest "sees" is a list of #[test] items). It might be better not to block the rest of this PR on trying to achieve that, and maybe leave making the distinction between unit and integration tests to a subsequent PR.

@popzxc popzxc closed this Sep 27, 2019
@popzxc popzxc deleted the measure-tests-time branch September 27, 2019 05:39
bors added a commit that referenced this pull request Oct 12, 2019
Enhance report-time option

## Short overview

This PR is a follow-up to a previously closed #64714 PR.

## Changes introduced by this PR

* `libtest` now retrieves the type of the test within `TestDesc` (available types are: `UnitTest`, `IntegrationTest`, `DocTest`, `Unknown`).
* The `--report-time` flag of `libtest` now supports colored output (disabled by default).
* Colorized output depends on the threshold values. Default values (proposed by @wesleywiser):
  - For unit-tests: 50ms warn/100ms critical,
  - For integration-tests: 500ms warn/1000ms critical,
  - For doctests: same as for integration tests,
  - For unknown tests: `TEST_WARN_TIMEOUT_S` warn / `TEST_WARN_TIMEOUT_S * 2` critical (this only applies in single-threaded mode, because otherwise the test will be interrupted after reaching `TEST_WARN_TIMEOUT_S`).
  - These values can be overridden by setting environment variables (since those thresholds are somewhat constant for every project, it's more flexible to use environment variables than command line arguments).
* New optional flag `--ensure-test-time` for `libtest`. With this flag applied, exceeding the execution time limit will cause the test to fail.

## What have not been done

There was a comment that it would be nice to have an entry in the Cargo book about it.

However, the changes introduced by this PR (and #64663, in which the `report-time` flag was added) aren't directly related to `cargo`; they are more about `libtest` itself.
I'm considering [The Unstable Book](https://doc.rust-lang.org/unstable-book/) to be a more appropriate place, but I'm not sure if I'm right (and if so, how exactly it should be described).

As one possible option, this PR may be merged without denoting it in the documentation, and in the next PR, which will add support for this feature to `cargo` itself, I'll add a note in the Cargo book.

## Scope of this PR

The logical scope of this PR is `libtest` only. However, to get the test types, I also had to modify `libsyntax_ext` and `librustdoc` so that they provide information about the test type.

## Rationale

The rationale for colored output was given in #64714.

Providing information about the kind of test was also proposed in #64714, and as an additional benefit this information may be useful for tools built on `libtest` (e.g. `cargo`).

Adding a flag to treat exceeding the time limit as a failure seems logical to me, so projects that care about test execution time won't have to reinvent the wheel.

## Backward compatibility

All the changes are completely backward compatible.

## Demo

![rustc_enhanced_time](https://user-images.githubusercontent.com/12111581/65818381-c04f6800-e219-11e9-9875-322463abe24f.gif)

r? @wesleywiser