mtest: test output changes #9930

trhd · 2022-02-01T19:50:54Z

This is the second iteration of #9232.

I've done my best to better split the changes into smaller commits although while doing that I ended writing even more changes.

The changes can now be summarized:

Changes in the textual test log output

Test specific metadata was split to multiple lines to improve readability.
Visual hints were added to group togehter all the lines related to a
specifc test.

Textual test log (too) now respects the and
options.

Changes in the test output in the console

The way test details (when being verbose or printing details for a failed
test) get printed was updated. Most notable changes are:

each line is prefixed with the relevant test number

"headers" and indentation is used to group relevant lines together

the style in which subtest results are listed changed

Results of individual subtests are displaed only in verbose mode or when
printing details of a failed test.

The "live" output (when being verbose and using only one process) now
follows the style of the normal verbose output more closely.

The "live" output is now automatically used when running only one test in
verbose mode.

The statusline was updated to a multi line one. The status line is also
disabled if the size of the terminal is not at least 80 characters wide
and 24 rows tall.

The status line in quiet mode was simplified.

I made a recording where I try to demo the changes: https://asciinema.org/a/dE9DsiPzUXJcW5X0q0XyO4vQ9

I have some concers w.r.t these changes:

Multiple status lines tend to "flicker" a bit when lots of lines are being outputted. Some people might find this annoying.
There is currently no way to disable "live" output when using only one process (though this is true even withtout these changes). I thought about adding an option for that but too may options for the command is not that great either.
The first iteration received feedback that the statusline is not nice on some windows terminal. I failed to reproduce this but then again my Windows skills are lacking. I tried these changes with the "command prompt for VS 2019" but didn't notice any issues there. I feel the statusline should just be disabled when using an incapable terminal (think meson test | cat) but without being able to reproduce the issue I don't quite know what to do.
I've only tried this out in Sourcehut's and GitLab's CI systems. Other CI sytems (Travis, Azures, ...) have not been tested at all.
Do other people like the change?

I'll mark this PR as draft to avoid premature merging until the above concerns get clarified and we can get a blessing from the relevant people. I'll mention @bonzini and @anarazel here by name since they joined the discussion during the first iteration.

lgtm-com · 2022-02-01T20:07:56Z

This pull request introduces 4 alerts when merging b173990 into 5d932f3 - view on LGTM.com

new alerts:

3 for Unused local variable
1 for Unused import

codecov · 2022-02-01T20:20:44Z

Codecov Report

Merging #9930 (e004701) into master (ae6c8d4) will decrease coverage by 6.73%.
The diff coverage is n/a.

❗ Current head e004701 differs from pull request most recent head c77c58b. Consider uploading reports for the commit c77c58b to get more accurate results

@@            Coverage Diff             @@
##           master    #9930      +/-   ##
==========================================
- Coverage   65.10%   58.37%   -6.74%     
==========================================
  Files         207      207              
  Lines       44893    45007     +114     
  Branches     9275     9313      +38     
==========================================
- Hits        29229    26273    -2956     
- Misses      13369    16408    +3039     
- Partials     2295     2326      +31

Impacted Files	Coverage Δ
scripts/run_tool.py	`0.00% <0.00%> (-100.00%)`	⬇️
scripts/test_loaded_modules.py	`0.00% <0.00%> (-100.00%)`	⬇️
scripts/clangtidy.py	`0.00% <0.00%> (-93.34%)`	⬇️
scripts/uninstall.py	`0.00% <0.00%> (-71.88%)`	⬇️
mconf.py	`15.41% <0.00%> (-70.84%)`	⬇️
templates/cstemplates.py	`35.48% <0.00%> (-64.52%)`	⬇️
ast/introspection.py	`15.25% <0.00%> (-63.56%)`	⬇️
templates/javatemplates.py	`36.66% <0.00%> (-63.34%)`	⬇️
rewriter.py	`17.20% <0.00%> (-63.27%)`	⬇️
templates/cpptemplates.py	`37.50% <0.00%> (-62.50%)`	⬇️
... and 91 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

anarazel · 2022-02-01T21:06:25Z

There is currently no way to disable "live" output when using only one process (though this is true even without these changes). I thought about adding an option for that but too may options for the command is not that great either.

That seems fine to me.

The output doesn't look good when the terminal isn't high. The RUNNING lines take up all the space. Occasionally one of the "OK" lines flickers in. Part of that is just due to the number of running tests shown. It's exacerbated by the existing problem of the line-length often being very long and wrapping around.

jpakkane · 2022-02-01T21:35:26Z

How will this tie in with #9834?

trhd · 2022-02-01T21:46:38Z

How will this tie in with #9834?

Needs to be resolved by hand for sure but shouldn't be that big of a problem (guessing after looking at the commits).

lgtm-com · 2022-02-01T21:51:44Z

This pull request introduces 3 alerts when merging 8ad7c66 into 6c1bb02 - view on LGTM.com

new alerts:

2 for Unused local variable
1 for Unused import

trhd · 2022-02-01T22:11:01Z

The output doesn't look good when the terminal isn't high. The RUNNING lines take up all the space. Occasionally one of the "OK" lines flickers in. Part of that is just due to the number of running tests shown. It's exacerbated by the existing problem of the line-length often being very long and wrapping around.

I tried improving handling of small terminals by pushing an additional commit in. Would you say it worked?

lgtm-com · 2022-02-01T22:17:25Z

This pull request introduces 2 alerts when merging 13fd52b into 6c1bb02 - view on LGTM.com

new alerts:

2 for Unused local variable

trhd · 2022-02-01T22:31:25Z

This pull request introduces 2 alerts when merging 13fd52b into 6c1bb02 - view on LGTM.com

new alerts:
* 2 for Unused local variable

False positives.

anarazel · 2022-02-01T22:35:22Z

I tried improving handling of small terminals by pushing an additional commit in. Would you say it worked?

It's a bit better. Previously the progress was useful without using as much space, even in "lower" terminals. But that's perhaps OK to give up.

The run counter seems confusing to me:

205/209 postgresql:tap+ssl / ssl/t/001_ssltests.pl                                         OK                6.48s   110 subtests passed
  2/209 postgresql:pg_regress+main / main/pg_regress                                       OK               29.53s
158/209 postgresql:tap+recovery / recovery/t/018_wal_optimize.pl                           OK               14.42s   38 subtests passed
189/209 postgresql:tap+subscription / subscription/t/022_twophase_cascade.pl               OK                9.81s   41 subtests passed
194/209 postgresql:tap+subscription / subscription/t/027_nosuperuser.pl                    OK                8.45s   14 subtests passed
195/209 postgresql:tap+subscription / subscription/t/100_bugs.pl                           OK               13.08s   7 subtests passed
  3/209 postgresql:isolation+main / main/isolation                                         OK               35.68s

I guess that's the number in which tasks started. But that's not useful information next to the number of tests in total...

trhd · 2022-02-01T22:55:03Z

The run counter seems confusing to me:

[snip]

I guess that's the number in which tasks started. But that's not useful information next to the number of tests in total...

Should be fixed now (again).

Although, the effect comes back if we print the "RUNNING" lines when starting tests (can be enabled by passing two -v options) but I don't think we can avoid that.

lgtm-com · 2022-02-01T23:02:55Z

This pull request introduces 3 alerts when merging c0cd11a into 6c1bb02 - view on LGTM.com

new alerts:

3 for Unused local variable

lgtm-com · 2022-02-02T21:28:39Z

This pull request introduces 3 alerts when merging b1d488a into 9a1a5c2 - view on LGTM.com

new alerts:

3 for Unused local variable

lgtm-com · 2022-02-02T22:24:32Z

This pull request introduces 3 alerts when merging 4a9e187 into 9a1a5c2 - view on LGTM.com

new alerts:

3 for Unused local variable

bonzini · 2022-02-04T13:39:12Z

Thanks, I'll review this early next week.

mesonbuild/mtest.py

Rotating the names of run tests on a single line can make it difficult to read them (or alternatively the rotation becomes very slow). Instead of displaying one of the running tests at a time on the status line use multiple status lines to list the statuses of multiple running tests. Make it so that the longest running tests are shown first and the rest are shown if there is enough room for them. The number of tests being run at a given time shown in the form of $n-$m/$y can be difficult to read (and do math on) when tests are quick and the numbers change rapidly. When displaying any counts of running tests, keep the numbers as static as possible to make them easy to read. Limit the number of tests / status lines to 10 to make things manageable even with large numbers of tests running in parallel. Let the user know how many tests are being run in the backround with an additional status line.

This can be used to print a test's status in a consistent manner.

The current status line(s) is suboptimal when run in quiet mode (it is not that "quiet"). Implement a simple, "quiet", (updating) status message stating the test counts / progress in an easy to ready format. As with the main status line(s) avoid having any rapidly moving parts in it to ensure easy reading even when running lots of short tests.

After moving to callback functions in output handling the prefix argument is no longer needed.

Do not show individual status lines for tests that have been running for less than one second to avoid unnecessary rewriting of the status information due to very quick tests. This reduces the "flickering" of the status lines as they do not need to be updated as much.

The name direct_output makes more sense as it applies not only to stdout but to stderr as well.

Add (or restore) a horizontal line in between tests' outputs to make it more clear where the output of one test ends and another one's starts. Also use a dashed line in between of the test output and the (single) status line when running in "live" verbose mode.

Use horizontal lines also when printing test details of a failed test. This change is an extension to the previous commit where the lines of dashes were added.

Trying to parse the test output while also using GDB causes problems and must not be done.

The previous "look and feel" update of the test outputs left subtest statuses being printed right after the name of the subtest. Align tests' subtest statuses. Align subtests' statuses for each test separately (i.e. do not try to align all subtests from all tests together).

Limit the number of status lines to min(10, rows/5) and replace the default status lines altogether with the quiet version if terminal size is not at least 24 rows tall and 80 characters wide.

The test output changes of recent commits reverted the change c3f145c introduced. Restore this functionality and print stderr of parsed tests at the end of the test.

Show full test output with two verbose options even if the test output was parsed (e.g. TAP output). Use a separate subheading in the output to differentiate it from the parsed output.

Test status lines longer than the width of the terminal make the output in general look ugly and can confuse the new multi-line status lines of running tests. Shorten such status lines by cutting the name of the test short so that the line as a whole fits on a single line on the terminal. Indicate this has happened by appending three periods to the end of the shortened test name.

By aligning the test statuses to the right (within the status lines) we can use the "padding white spaces" to display more of the test's name. This is important with small terminals as the name of the test get cut off if the terminal is not wide enough making every character count.

Omit test exit details from the end of a test status line if there is not enough width left on the outputting terminal. This avoids the need to either shift everything left (and break alignment) or continue the line to a second one (and breaking alignment). Also unify the status line generating code w.r.t. actual test results and the "live" status lines of the running tests.

Rename one of the two identically named variables to avoid mixing those two up by accident.

Make subtest results visually different from the rest of the test results to make it easier to differentiate them from each other when glancing over the test output. Visual differences include lower casing the actual test result and using italic for the subtest's name.

lgtm-com · 2022-11-06T19:13:39Z

This pull request introduces 3 alerts when merging c77c58b into 0b7d935 - view on LGTM.com

new alerts:

3 for Unused local variable

jpakkane · 2022-11-15T11:07:45Z

The test failures are due to the current Mac CI image brokenness.

How ready is this otherwise? Are there known issues or is it ready to merge?

trhd · 2022-11-20T21:35:48Z

I'm not aware of any problems. I believe I've implemented all the suggestions made above.

jpakkane · 2022-11-20T21:46:16Z

If someone has issues that are still pending, speak up quickly.

bonzini · 2022-11-21T06:13:09Z

I will test it today.

bonzini · 2022-11-21T21:25:40Z

At least "mtest: use statusline even with live output" is wrong because the status line's usage of \r can conflict with the test's. In addition the status line has to be limited to the case when stdout is a tty, otherwise the output file gets messy.

So my opinion of this pull request has not changed: it is still doing too much. There are a mix of:

bugfixes, e.g. "mtest: restore printing of stderr on parsed tests". If possible, these should have a Fixes tag to document what's broken.
obviously good new features, e.g. -vv to print the output of parsed tests
unclear changes that seem sensible but aren't obviously correct, for example "mtest: do not parse test outputs when using GDB". It's not clear if that causes the test to falsely report success for example.
changes that seem to be badly sorted. For example, "mtest: show subtest results in real time in verbose" being last means that this feature is broken in the middle of the series. "mtest: add a line of dashes between tests in verbose output" does not explain exactly what it does since there is already a line of dashes between tests, for example:

 72/315 libvirt / viriscsitest                                                         OK              0.21s
―――――――――――――――――――――――――――――――――――――――――――――――――――――――― ✀  ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――
stderr:
TEST: viriscsitest
      .........                                9   OK
―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――

With this kind of pull request, the main thing that will happen is that it will be ignored and you will waste a lot of time. Please look at #7830 and how it was split in 7-8 pull requests: one for console output apart from the progress report, one for keyboard input, one for progress report, one for test parsing, a few for bugfixes and so on.

I know it seems like I'm moving the goalposts, but honestly several of these comments have been the same ever since the beginning of #9232. And the bigger a PR is, the more reviewers are going to ask you for every minute detail, because it's much harder to see the big picture than when there are 2-3 small commits.

eli-schwartz · 2022-11-21T21:38:09Z

+1, please split this into individually reviewable PRs.

I'm weary of the frustration that this PR has already caused, such as when we accepted a bunch of changes that turned out to be broken, and which was only broken because it assumed that later commits would fix that breakage, except those commits weren't part of the list that got merged.

Quite frankly, I've just been ignoring this PR altogether. I don't trust it to be correct, I have no will to review it, and I live in terror of the day it gets merged but feel guilty at the idea of being a downer so I've been avoiding saying anything when I cannot point to concrete problems. But I think all those meta-problems could be fixed if this was split into easily digestible chunks, so I'm finally going to say something.

trhd · 2022-11-22T21:21:20Z

At least "mtest: use statusline even with live output" is wrong because the status line's usage of \r can conflict with the test's.

True. Then again output is not a tty in this case and it could be argued that \r shouldn't be used.

In addition the status line has to be limited to the case when stdout is a tty, otherwise the output file gets messy.

That should be the case already. At least redirecting to a file or piping to a catcommand works fine for me. This detection was there already so I didn't need to do this myself (IIRC).

The whole thing was a bit problematic from the beginning. 99% of these changes were unplanned and the first version ended up being more or less one big commit. I then tried to split the changes into smaller commits retroactively (which is not ideal) and that explains some of the weirdness in the commits. As there is not (enough at least) automated tests for output chacnges testing individual commits is very laborous and that is why e.g. some of the latest suggested changes were left as separate commits instead of merged into earlier commits (manual test for each intermediate commit).

So I get that this has been a difficult one. I will give up on this now at least for the time being and close the PR. Perhaps some day I'll give this another go with smaller PRs.

trhd requested a review from jpakkane as a code owner February 1, 2022 19:50

trhd marked this pull request as draft February 1, 2022 19:51

trhd force-pushed the tyrkky2 branch from 706eb18 to b173990 Compare February 1, 2022 19:59

trhd force-pushed the tyrkky2 branch from b173990 to 8ad7c66 Compare February 1, 2022 21:42

trhd force-pushed the tyrkky2 branch from 8ad7c66 to 13fd52b Compare February 1, 2022 22:08

trhd force-pushed the tyrkky2 branch from 13fd52b to c0cd11a Compare February 1, 2022 22:53

trhd force-pushed the tyrkky2 branch from c0cd11a to b1d488a Compare February 2, 2022 21:18

trhd force-pushed the tyrkky2 branch from b1d488a to 4a9e187 Compare February 2, 2022 22:16