
Add --measure-time flag to libtest #64714

Closed
wants to merge 1 commit

Conversation

popzxc
Contributor

@popzxc popzxc commented Sep 23, 2019

Overview

This PR introduces a new option flag for the Rust test harness: --measure-time. With this flag applied, libtest will measure the execution time of each test and print it out (to the log file, or to the terminal if the pretty formatter is chosen).

--measure-time is intended for a single-threaded run, so using it with --test-threads set to any value other than 1 will result in an error.

Time is measured only when the flag is applied, so there is no overhead for ordinary runs.

In the output, the time is written in red if the test duration is >= RUST_TEST_TIME_CRITICAL (defaults to 1000 milliseconds), in yellow if it is >= RUST_TEST_TIME_WARN (defaults to 500 milliseconds); otherwise the default formatting is applied.
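As a rough sketch (not the PR's actual code; the helper name is made up), the thresholds described above could be read from the environment like this:

use std::time::Duration;

// Hypothetical helper: read a millisecond threshold from an environment
// variable, falling back to the default when it is unset or unparsable.
fn threshold_from_env(var: &str, default_ms: u64) -> Duration {
    std::env::var(var)
        .ok()
        .and_then(|s| s.parse::<u64>().ok())
        .map(Duration::from_millis)
        .unwrap_or(Duration::from_millis(default_ms))
}

// e.g. threshold_from_env("RUST_TEST_TIME_WARN", 500)
//      threshold_from_env("RUST_TEST_TIME_CRITICAL", 1000)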

(Screenshot: rustc_timed_tests)

Rationale

Usually, big Rust projects have a lot of tests, and tests are not always written carefully. As a result, if a poorly written test gets merged into a big project, it is difficult to find that test, because there is no tool for that. libtest checks for timed-out tests, but the constant TEST_WARN_TIMEOUT_S is not configurable, and it's not really an instrument for detecting unoptimized tests.

This PR allows developers to inspect their codebase to detect long-running tests and optimize/rewrite/remove them.

Backward compatibility

This PR is fully backward compatible, it does not introduce any breaking changes.

r? @Centril

@rust-highfive
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @Centril (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 23, 2019
@fbstj
Contributor

fbstj commented Sep 23, 2019

Is this different to #64663?

@popzxc
Contributor Author

popzxc commented Sep 23, 2019

Oh... Well, I started implementing this before that PR was submitted :c

However, sort of yes.

This PR does (and the linked one does not):

  • verify that only 1 test thread is used (for tests running in parallel, results may be imprecise);
  • highlight tests with long execution times;
  • not report time for failed tests (a failed test is usually not executed fully, so it's not quite correct to report its execution time);
  • not introduce overhead for common runs (Instant::now may require a Mutex lock in some cases);
  • offer a bit more configurability.


let time_str = format!(" <{}.{} s>", duration.as_secs(), duration.subsec_millis());
match &self.test_time_params {
    Some(params) => {

Use guards here instead so we can fuse the .write_plains.
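A sketch of the guard-based form being suggested (the threshold field names warn and critical are assumptions, not the PR's actual fields):

match &self.test_time_params {
    Some(params) if duration >= params.critical => {
        self.write_pretty(&time_str, term::color::RED)?;
    }
    Some(params) if duration >= params.warn => {
        self.write_pretty(&time_str, term::color::YELLOW)?;
    }
    _ => self.write_plain(&time_str)?,
}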

@@ -391,6 +392,7 @@ fn optgroups() -> getopts::Options {
.optflag("", "exclude-should-panic", "Excludes tests marked as should_panic")
.optflag("", "test", "Run tests and not benchmarks")
.optflag("", "bench", "Run benchmarks instead of tests")
.optflag("", "measure-time", "Measure tests execution time")

Suggested change
.optflag("", "measure-time", "Measure tests execution time")
.optflag("", "measure-time", "Measure test execution time")

let test_threads = match matches.opt_str("test-threads") {
    Some(n_str) => match n_str.parse::<usize>() {
        Ok(0) => return Some(Err("argument for --test-threads must not be 0".to_string())),
        Ok(n) => Some(n),
        Ok(n) => {
            if measure_time && n != 1 {

A guard would be better here.

Ok(n) => Some(n),
Ok(n) => {
    if measure_time && n != 1 {
        return Some(Err("The \"measure-time\" flag excepts only one thread".into()));

Suggested change
return Some(Err("The \"measure-time\" flag excepts only one thread".into()));
return Some(Err("The \"measure-time\" flag is only compatible with one thread".into()));

@@ -600,7 +615,14 @@ pub fn parse_opts(args: &[String]) -> Option<OptRes> {
)));
}
},
None => None,
None => {
if measure_time {

Same; use a guard.
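A hypothetical guard form of the None arm (assuming the intent of the elided body is to fall back to a single thread when --measure-time is set, which is a guess):

None if measure_time => Some(1),
None => None,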

@@ -1436,6 +1520,10 @@ pub fn run_test(
}
}

let nocapture = opts.nocapture;
let measure_time = opts.measure_time;
let run_test_opts = RunTestOpts { nocapture, concurrency, measure_time };

Why unpack this just to pack it again?
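That is, the struct could be built directly from opts without the intermediate locals:

let run_test_opts = RunTestOpts {
    nocapture: opts.nocapture,
    concurrency,
    measure_time: opts.measure_time,
};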

fn calc_result(
    desc: &TestDesc,
    task_result: Result<(), Box<dyn Any + Send>>,
    duration: Option<Duration>

Suggested change
duration: Option<Duration>
duration: Option<Duration>,

@@ -203,6 +204,44 @@ fn parse_include_ignored_flag() {
assert_eq!(opts.run_ignored, RunIgnored::Yes);
}

#[test]
fn parse_measure_time_flag() {
// Provide --measure-time and --test flags.

Suggested change
// Provide --measure-time and --test flags.
// Provide `--measure-time` and `--test` flags.

assert!(opts.measure_time);
assert_eq!(opts.test_threads, Some(1));

// Provide --measure-time flag without --test flag and with --bench flag.

Suggested change
// Provide --measure-time flag without --test flag and with --bench flag.
// Provide `--measure-time` flag without `--test` flag and with `--bench` flag.

let opts_error = parse_opts(&args_bench).unwrap();
opts_error.expect_err(expected_err);

// Provide --measure-time flag with custom --test-threads flag.

Suggested change
// Provide --measure-time flag with custom --test-threads flag.
// Provide `--measure-time` flag with custom `--test-threads` flag.

@Centril
Contributor

Centril commented Sep 23, 2019

Seems reasonable -- should this be the default perhaps?

Not sure if this is feature gated or not, but I think we'd want to gate the behavior initially.

cc @alexcrichton @gnzlbg @killercup

@gnzlbg
Contributor

gnzlbg commented Sep 23, 2019

See also #64663, which will probably be merged. Maybe this should build on that?

@Centril
Contributor

Centril commented Sep 23, 2019

Seems like a good idea to do so yeah.

@gnzlbg
Contributor

gnzlbg commented Sep 23, 2019

--measure-time is intended for a single-threaded run, so using it with --test-threads set to any value other than 1 will result in an error.

Why is this? It should be possible to measure the time it takes to run a test independently of whether the tests are run in parallel or not.

In the output, the time is written in red if the test duration is >= RUST_TEST_TIME_CRITICAL (defaults to 1000 milliseconds), in yellow if it is >= RUST_TEST_TIME_WARN (defaults to 500 milliseconds); otherwise the default formatting is applied.

[...] if a poorly written test gets merged into a big project, it is difficult to find that test, because there is no tool for that. libtest checks for timed-out tests, but the constant TEST_WARN_TIMEOUT_S is not configurable, and it's not really an instrument for detecting unoptimized tests.

With #64663, the test times are written to the libtest JSON output, so if this is a real issue for real projects, that should allow them to write a simple tool that parses that output and rejects tests with longer runtimes using whatever metrics they want. That feels like a better solution to this problem than trying to solve it on the libtest side.
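For illustration only, such a parsing tool could be a few lines of Rust; the JSON event shape and the exec_time field name below are guesses about the #64663 output (not its documented format), and serde_json is an extra dependency:

use std::io::{self, BufRead};

fn main() {
    let limit_secs = 0.5; // project-specific threshold
    let stdin = io::stdin();
    for line in stdin.lock().lines() {
        let line = line.unwrap();
        // Assumed event shape: {"type":"test","event":"ok","name":"...","exec_time":1.234}
        let v: serde_json::Value = match serde_json::from_str(&line) {
            Ok(v) => v,
            Err(_) => continue, // ignore non-JSON lines
        };
        if let (Some(name), Some(t)) = (v["name"].as_str(), v["exec_time"].as_f64()) {
            if t > limit_secs {
                eprintln!("test {} took {:.3}s (limit {:.3}s)", name, t, limit_secs);
            }
        }
    }
}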

fn write_ok_with_duration(&mut self, duration: Duration) -> io::Result<()> {
    self.write_pretty("ok", term::color::GREEN)?;

    let time_str = format!(" <{}.{} s>", duration.as_secs(), duration.subsec_millis());

If subsec_millis returns less than three digits the output is wrong:

assert_ne!(" <1.023 s>", format!(" <{}.{} s>", 1, 23));

@popzxc
Contributor Author

popzxc commented Sep 24, 2019

Hm... OK, all the comments seem reasonable.
Would it be OK if this PR is closed, and once #64663 gets merged, I create a new PR based on the updated libtest with all the review comments addressed?

@gnzlbg, regarding test threads -- OK, then maybe use the following approach: if neither --test-threads nor RUST_TEST_THREADS is provided, tests will be run in one thread; otherwise the provided number of threads will be used?

@gnzlbg
Contributor

gnzlbg commented Sep 24, 2019

OK, then maybe use the following approach: if neither --test-threads nor RUST_TEST_THREADS is provided, tests will be run in one thread; otherwise the provided number of threads will be used?

Why do you want to force tests to run in one thread when reporting the run-time?

@popzxc
Contributor Author

popzxc commented Sep 24, 2019

Why do you want to force tests to run in one thread when reporting the run-time?

Imagine some resource that is acquired only synchronously (e.g. it can be reached only after locking a static Mutex). In that hypothetical case, if tests are run in multiple threads (let's take 4), it's possible with 3 quick tests and 1 very long one to get a situation where all 4 tests are reported as long (the long test locked the mutex first, the 3 others waited for it, and as a result those 3 tests were blocked until the long one completed). You can see the gist for the actual example.
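Here is a minimal sketch of that situation (hypothetical test names, not the code from the gist):

use std::sync::Mutex;
use std::{thread, time::Duration};

static RESOURCE: Mutex<()> = Mutex::new(());

#[test]
fn slow_test() {
    let _guard = RESOURCE.lock().unwrap();
    thread::sleep(Duration::from_secs(2)); // the genuinely slow test
}

#[test]
fn quick_test() {
    // Does almost nothing, but when run in parallel its measured wall-clock
    // time includes waiting for slow_test to release the lock.
    let _guard = RESOURCE.lock().unwrap();
}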

I understand that it's not a common case, but still a possible one, and I believe that compiler tools should avoid ambiguity when possible.

Overall, if developers are sure that it's OK to run tests in parallel, they will always be able to provide a flag for it.

Or am I missing something here?

@gnzlbg
Contributor

gnzlbg commented Sep 24, 2019

libtest runs tests in parallel by default and offers an option to choose the number of threads.
A libtest option for reporting how long it takes for a unit test to run should do just that, and is orthogonal to the number of threads that are used.

In that hypothetical case, if tests are run in multiple threads (let's take 4), it's possible with 3 quick tests and 1 very long one to get a situation where all 4 tests are reported as long (the long test locked the mutex first, the 3 others waited for it, and as a result those 3 tests were blocked until the long one completed) [...] I believe that compiler tools should avoid ambiguity when possible.

That's not an "ambiguity", that's printing out run-times that reflect reality. Changing the number of threads used by libtest silently when a user tries to debug these issues would be super confusing: "My tests run very slowly, let's print the run-times... oh! when I print the run-times my tests run fast, ?!?!?! (confusion)".

@popzxc
Contributor Author

popzxc commented Sep 24, 2019

Yeah, but when a user notices that tests run very slowly and decides to measure time, they will get N tests reported as slow, when in fact only one of them is slow. Thus this command won't be helping at all.

IMHO, the purpose of that subcommand is to help the user find the slow tests, and not just "report run-times that reflect reality".

@gnzlbg
Contributor

gnzlbg commented Sep 24, 2019

Yeah, but when a user notices that tests run very slowly and decides to measure time, they will get N tests reported as slow, when in fact only one of them is slow.

This is incorrect. If a test is reported as taking N seconds to run, then it is because it took N seconds to run. That's objective and correct data.

Whether a test takes N seconds to run because it does a lot of computation, or because it's waiting and doing nothing while some other test holds a mutex lock, is a different issue.

It appears that what you want is a libtest feature that lets you know which test is making other tests run slowly. Silently messing with the number of threads doesn't achieve that:

use std::sync::atomic::{AtomicUsize, Ordering::SeqCst};

static COUNT: AtomicUsize = AtomicUsize::new(0);
#[test] fn foo() { for _ in 0..COUNT.load(SeqCst) { println!("!"); } }
#[test] fn bar() { COUNT.store(usize::max_value(), SeqCst); }

Here, even when using one thread, if bar runs before foo, foo will take a long time to run. Otherwise, foo will take 0 seconds to run - bar always takes 0 seconds. Changing the number of threads to 1 doesn't help the user in any way to find this problem.

@gnzlbg
Contributor

gnzlbg commented Sep 24, 2019

IMHO, the purpose of that subcommand is to help the user find the slow tests, and not just "report run-times that reflect reality".

The purpose of the other PR is to report unit test run-times. What users do with that data is up to them.

If your intent is to solve some other problem (adding some sort of functionality to libtest that points users to the tests that are causing other tests to run slowly), then it is probably better to open an issue to discuss that. The top post in this PR does not mention anything about this.

@popzxc
Contributor Author

popzxc commented Sep 24, 2019

OK, you're right. I will leave the test threads as they are then.

@gnzlbg
Contributor

gnzlbg commented Sep 24, 2019

FWIW, I manually have to set the number of threads to 1 as well when running libtest to debug race conditions and similar issues (and I'm not alone), so I think it is worth discussing somewhere whether there are things we could be doing to make this easier for users.

@jakoschiko
Contributor

Some thoughts:

I like colored execution times, but I don't like default values. The proposed default values (500ms warn, 1000ms critical) don't fit everyone's use case. For example, property-based tests that take several seconds are not unusual. Running the tests with --measure-time and seeing only red execution times would confuse me unless I had chosen the thresholds myself.

Alternatives:

  • Do not enable colored execution times by default. If the user needs it, he must configure the thresholds.
  • Do not implement the feature at all. If a user really needs it, he can use something like grep:
    cargo test -- --gimme-time | egrep --color -z '[1-9]{1,}[0-9]{0,}\.[0-9]{3}s'
    

@Centril
Contributor

Centril commented Sep 24, 2019

r? @wesleywiser

@rust-highfive rust-highfive assigned wesleywiser and unassigned Centril Sep 24, 2019
@wesleywiser
Member

@Centril I don't really know anything about libtest. Are you assigning me for some reason specifically or just because you need a reviewer?

@Centril
Contributor

Centril commented Sep 24, 2019

@wesleywiser Basically I did the code review, but I'm not best placed to judge whether the motivation of the PR is enough to land / account for the discussion. @varkor thought you'd be well placed as you've dealt with profiling and such things.

@wesleywiser
Member

Gotcha, that's fine.

@bors
Contributor

bors commented Sep 25, 2019

☔ The latest upstream changes (presumably #64663) made this pull request unmergeable. Please resolve the merge conflicts.

Member

@wesleywiser wesleywiser left a comment


Overall, I think this is a good feature addition and I don't have concerns about the overall approach. Here are the changes I'd like to see in order to land this PR:

  • Suggested changes from the other reviewers should be done.
  • I agree with @gnzlbg that we shouldn't be messing with the number of test threads. While the motivation is good, I think users will potentially be more confused if we don't replicate the original test environment as closely as possible when this flag is turned on.
  • Since #64663 (libtest: Add --report-time flag to print test execution time) was landed behind an unstable flag, I think this option should also be behind the unstable flag.

I think including default values for the warning and critical times is a good idea, but I think we should use different values for unit tests vs integration tests. Unit tests are supposed to be very, very fast, so for those I'd recommend 100ms as critical and 50ms as warn. For integration tests, I think the current values you have are ok. If detecting what kind of test is running is complicated, then feel free to ignore this. (Note: the Cargo book says that tests in src/* are unit tests and tests in tests/* are integration tests.)

I'd also like to see some documentation for this flag and its configuration options in the Unstable Features page of the Cargo book.

When those three main issues are addressed, I'll happily approve this!

@popzxc
Contributor Author

popzxc commented Sep 25, 2019

Sure, I'll perform the required changes as soon as I have time for this (hopefully this or next week).

@wesleywiser
Member

@popzxc Sounds good! If you decide to close this PR and open another, feel free to include

r? @wesleywiser

in the PR body so that I'll be assigned to it since I have the context for this.

@gnzlbg
Contributor

gnzlbg commented Sep 25, 2019

If detecting what kind of test is running is complicated, then feel free to ignore this.

I don't think we have an easy way to detect this right now (IIRC, in both cases, the only thing that libtest "sees" is a list of #[test] items). It might be better not to block the rest of this PR on trying to achieve that, and maybe leave making the distinction between unit and integration tests to a subsequent PR.

@popzxc popzxc closed this Sep 27, 2019
@popzxc popzxc deleted the measure-tests-time branch September 27, 2019 05:39
bors added a commit that referenced this pull request Oct 12, 2019
Enhance report-time option

## Short overview

This PR is a follow-up to a previously closed #64714 PR.

## Changes introduced by this PR

* `libtest` now retrieves the type of the test within `TestDesc` (available types are: `UnitTest`, `IntegrationTest`, `DocTest`, `Unknown`).
* The `--report-time` flag of `libtest` now supports colored output (disabled by default).
* Colorized output depends on the threshold values. Default values (proposed by @wesleywiser):
  - For unit-tests: 50ms warn/100ms critical,
  - For integration-tests: 500ms warn/1000ms critical,
  - For doctests: same as for integration tests,
  - For unknown tests: `TEST_WARN_TIMEOUT_S` warn / `TEST_WARN_TIMEOUT_S * 2` critical (this only applies in single-threaded mode, because otherwise the test will be interrupted after reaching `TEST_WARN_TIMEOUT_S`).
  - These values can be overridden by setting environment variables (since those thresholds are somewhat constant for every project, it's more flexible to use environment variables than command line arguments).
* New optional flag `--ensure-test-time` for `libtest`. With this flag applied, exceeding the execution time limit will cause the test to fail.

## What have not been done

There was a comment that it would be nice to have an entry in the Cargo book about it.

However, the changes introduced by this PR (and #64663, in which the `report-time` flag was added) aren't directly related to `cargo`; they are more about `libtest` itself.
I'm considering [The Unstable Book](https://doc.rust-lang.org/unstable-book/) to be a more appropriate place, but I'm not sure if I'm right (and if so, how exactly it should be described).

As one possible option, this PR may be merged without denoting it in the documentation, and in the next PR, which will add support for this feature to `cargo` itself, I'll add a note in the Cargo book.

## Scope of this PR

The logical scope of this PR is `libtest` only. However, to get the test types, I also had to modify `libsyntax_ext` and `librustdoc` so that they provide information about the test type.

## Rationale

The rationale for colored output was given in #64714.

Providing information about the kind of test was also proposed in #64714, and as an additional benefit this information may be useful for tools built on `libtest` (e.g. `cargo`).

Adding a flag to treat exceeding the time limit as a failure seems logical to me, so projects that care about test execution time won't have to reinvent the wheel.

## Backward compatibility

All the changes are completely backward compatible.

## Demo

![rustc_enhanced_time](https://user-images.githubusercontent.com/12111581/65818381-c04f6800-e219-11e9-9875-322463abe24f.gif)

r? @wesleywiser