Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mtest: test output changes #9930

Closed
wants to merge 26 commits into from
Closed

mtest: test output changes #9930

wants to merge 26 commits into from

Conversation

trhd
Copy link
Contributor

@trhd trhd commented Feb 1, 2022

This is the second iteration of #9232.

I've done my best to better split the changes into smaller commits although while doing that I ended writing even more changes.

The changes can now be summarized:

Changes in the textual test log output

Test specific metadata was split to multiple lines to improve readability.
Visual hints were added to group togehter all the lines related to a
specifc test.

Textual test log (too) now respects the and
options.

Changes in the test output in the console

The way test details (when being verbose or printing details for a failed
test) get printed was updated. Most notable changes are:

  • each line is prefixed with the relevant test number
  • "headers" and indentation is used to group relevant lines together
  • the style in which subtest results are listed changed

Results of individual subtests are displaed only in verbose mode or when
printing details of a failed test.

The "live" output (when being verbose and using only one process) now
follows the style of the normal verbose output more closely.

The "live" output is now automatically used when running only one test in
verbose mode.

The statusline was updated to a multi line one. The status line is also
disabled if the size of the terminal is not at least 80 characters wide
and 24 rows tall.

The status line in quiet mode was simplified.

I made a recording where I try to demo the changes: https://asciinema.org/a/dE9DsiPzUXJcW5X0q0XyO4vQ9

I have some concers w.r.t these changes:

  • Multiple status lines tend to "flicker" a bit when lots of lines are being outputted. Some people might find this annoying.
  • There is currently no way to disable "live" output when using only one process (though this is true even withtout these changes). I thought about adding an option for that but too may options for the command is not that great either.
  • The first iteration received feedback that the statusline is not nice on some windows terminal. I failed to reproduce this but then again my Windows skills are lacking. I tried these changes with the "command prompt for VS 2019" but didn't notice any issues there. I feel the statusline should just be disabled when using an incapable terminal (think meson test | cat) but without being able to reproduce the issue I don't quite know what to do.
  • I've only tried this out in Sourcehut's and GitLab's CI systems. Other CI sytems (Travis, Azures, ...) have not been tested at all.
  • Do other people like the change?

I'll mark this PR as draft to avoid premature merging until the above concerns get clarified and we can get a blessing from the relevant people. I'll mention @bonzini and @anarazel here by name since they joined the discussion during the first iteration.

@lgtm-com
Copy link

lgtm-com bot commented Feb 1, 2022

This pull request introduces 4 alerts when merging b173990 into 5d932f3 - view on LGTM.com

new alerts:

  • 3 for Unused local variable
  • 1 for Unused import

@codecov
Copy link

codecov bot commented Feb 1, 2022

Codecov Report

Merging #9930 (e004701) into master (ae6c8d4) will decrease coverage by 6.73%.
The diff coverage is n/a.

❗ Current head e004701 differs from pull request most recent head c77c58b. Consider uploading reports for the commit c77c58b to get more accurate results

@@            Coverage Diff             @@
##           master    #9930      +/-   ##
==========================================
- Coverage   65.10%   58.37%   -6.74%     
==========================================
  Files         207      207              
  Lines       44893    45007     +114     
  Branches     9275     9313      +38     
==========================================
- Hits        29229    26273    -2956     
- Misses      13369    16408    +3039     
- Partials     2295     2326      +31     
Impacted Files Coverage Δ
scripts/run_tool.py 0.00% <0.00%> (-100.00%) ⬇️
scripts/test_loaded_modules.py 0.00% <0.00%> (-100.00%) ⬇️
scripts/clangtidy.py 0.00% <0.00%> (-93.34%) ⬇️
scripts/uninstall.py 0.00% <0.00%> (-71.88%) ⬇️
mconf.py 15.41% <0.00%> (-70.84%) ⬇️
templates/cstemplates.py 35.48% <0.00%> (-64.52%) ⬇️
ast/introspection.py 15.25% <0.00%> (-63.56%) ⬇️
templates/javatemplates.py 36.66% <0.00%> (-63.34%) ⬇️
rewriter.py 17.20% <0.00%> (-63.27%) ⬇️
templates/cpptemplates.py 37.50% <0.00%> (-62.50%) ⬇️
... and 91 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@anarazel
Copy link
Contributor

anarazel commented Feb 1, 2022

There is currently no way to disable "live" output when using only one process (though this is true even without these changes). I thought about adding an option for that but too may options for the command is not that great either.

That seems fine to me.

The output doesn't look good when the terminal isn't high. The RUNNING lines take up all the space. Occasionally one of the "OK" lines flickers in. Part of that is just due to the number of running tests shown. It's exacerbated by the existing problem of the line-length often being very long and wrapping around.

@jpakkane
Copy link
Member

jpakkane commented Feb 1, 2022

How will this tie in with #9834?

@trhd
Copy link
Contributor Author

trhd commented Feb 1, 2022

How will this tie in with #9834?

Needs to be resolved by hand for sure but shouldn't be that big of a problem (guessing after looking at the commits).

@lgtm-com
Copy link

lgtm-com bot commented Feb 1, 2022

This pull request introduces 3 alerts when merging 8ad7c66 into 6c1bb02 - view on LGTM.com

new alerts:

  • 2 for Unused local variable
  • 1 for Unused import

@trhd
Copy link
Contributor Author

trhd commented Feb 1, 2022

The output doesn't look good when the terminal isn't high. The RUNNING lines take up all the space. Occasionally one of the "OK" lines flickers in. Part of that is just due to the number of running tests shown. It's exacerbated by the existing problem of the line-length often being very long and wrapping around.

I tried improving handling of small terminals by pushing an additional commit in. Would you say it worked?

@lgtm-com
Copy link

lgtm-com bot commented Feb 1, 2022

This pull request introduces 2 alerts when merging 13fd52b into 6c1bb02 - view on LGTM.com

new alerts:

  • 2 for Unused local variable

@trhd
Copy link
Contributor Author

trhd commented Feb 1, 2022

This pull request introduces 2 alerts when merging 13fd52b into 6c1bb02 - view on LGTM.com

new alerts:

* 2 for Unused local variable

False positives.

@anarazel
Copy link
Contributor

anarazel commented Feb 1, 2022

I tried improving handling of small terminals by pushing an additional commit in. Would you say it worked?

It's a bit better. Previously the progress was useful without using as much space, even in "lower" terminals. But that's perhaps OK to give up.

The run counter seems confusing to me:

205/209 postgresql:tap+ssl / ssl/t/001_ssltests.pl                                         OK                6.48s   110 subtests passed
  2/209 postgresql:pg_regress+main / main/pg_regress                                       OK               29.53s
158/209 postgresql:tap+recovery / recovery/t/018_wal_optimize.pl                           OK               14.42s   38 subtests passed
189/209 postgresql:tap+subscription / subscription/t/022_twophase_cascade.pl               OK                9.81s   41 subtests passed
194/209 postgresql:tap+subscription / subscription/t/027_nosuperuser.pl                    OK                8.45s   14 subtests passed
195/209 postgresql:tap+subscription / subscription/t/100_bugs.pl                           OK               13.08s   7 subtests passed
  3/209 postgresql:isolation+main / main/isolation                                         OK               35.68s

I guess that's the number in which tasks started. But that's not useful information next to the number of tests in total...

@trhd
Copy link
Contributor Author

trhd commented Feb 1, 2022

The run counter seems confusing to me:

[snip]

I guess that's the number in which tasks started. But that's not useful information next to the number of tests in total...

Should be fixed now (again).

Although, the effect comes back if we print the "RUNNING" lines when starting tests (can be enabled by passing two -v options) but I don't think we can avoid that.

@lgtm-com
Copy link

lgtm-com bot commented Feb 1, 2022

This pull request introduces 3 alerts when merging c0cd11a into 6c1bb02 - view on LGTM.com

new alerts:

  • 3 for Unused local variable

@lgtm-com
Copy link

lgtm-com bot commented Feb 2, 2022

This pull request introduces 3 alerts when merging b1d488a into 9a1a5c2 - view on LGTM.com

new alerts:

  • 3 for Unused local variable

@lgtm-com
Copy link

lgtm-com bot commented Feb 2, 2022

This pull request introduces 3 alerts when merging 4a9e187 into 9a1a5c2 - view on LGTM.com

new alerts:

  • 3 for Unused local variable

@bonzini
Copy link
Contributor

bonzini commented Feb 4, 2022

Thanks, I'll review this early next week.

mesonbuild/mtest.py Outdated Show resolved Hide resolved
mesonbuild/mtest.py Outdated Show resolved Hide resolved
mesonbuild/mtest.py Outdated Show resolved Hide resolved
mesonbuild/mtest.py Outdated Show resolved Hide resolved
mesonbuild/mtest.py Outdated Show resolved Hide resolved
mesonbuild/mtest.py Outdated Show resolved Hide resolved
mesonbuild/mtest.py Outdated Show resolved Hide resolved
Rotating the names of run tests on a single line can make it difficult
to read them (or alternatively the rotation becomes very slow). Instead
of displaying one of the running tests at a time on the status line use
multiple status lines to list the statuses of multiple running tests.
Make it so that the longest running tests are shown first and the rest
are shown if there is enough room for them.

The number of tests being run at a given time shown in the form of
$n-$m/$y can be difficult to read (and do math on) when tests are quick
and the numbers change rapidly. When displaying any counts of running
tests, keep the numbers as static as possible to make them easy to read.

Limit the number of tests / status lines to 10 to make things manageable
even with large numbers of tests running in parallel. Let the user know
how many tests are being run in the backround with an additional status
line.
This can be used to print a test's status in a consistent manner.
The current status line(s) is suboptimal when run in quiet mode (it is
not that "quiet").

Implement a simple, "quiet", (updating) status message stating the test
counts / progress in an easy to ready format. As with the main status
line(s) avoid having any rapidly moving parts in it to ensure easy
reading even when running lots of short tests.
After moving to callback functions in output handling the prefix
argument is no longer needed.
Do not show individual status lines for tests that have been running for
less than one second to avoid unnecessary rewriting of the status
information due to very quick tests. This reduces the "flickering" of
the status lines as they do not need to be updated as much.
The name direct_output makes more sense as it applies not only to stdout
but to stderr as well.
Add (or restore) a horizontal line in between tests' outputs to make it
more clear where the output of one test ends and another one's starts.

Also use a dashed line in between of the test output and the (single)
status line when running in "live" verbose mode.
Use horizontal lines also when printing test details of a failed test.

This change is an extension to the previous commit where the lines of
dashes were added.
Trying to parse the test output while also using GDB causes problems and
must not be done.
The previous "look and feel" update of the test outputs left subtest
statuses being printed right after the name of the subtest. Align tests'
subtest statuses. Align subtests' statuses for each test separately (i.e.
do not try to align all subtests from all tests together).
Limit the number of status lines to min(10, rows/5) and replace the
default status lines altogether with the quiet version if terminal size
is not at least 24 rows tall and 80 characters wide.
The test output changes of recent commits reverted the change
c3f145c introduced. Restore this
functionality and print stderr of parsed tests at the end of the test.
Show full test output with two verbose options even if the test output
was parsed (e.g. TAP output). Use a separate subheading in the output to
differentiate it from the parsed output.
Test status lines longer than the width of the terminal make the output
in general look ugly and can confuse the new multi-line status lines of
running tests. Shorten such status lines by cutting the name of the test
short so that the line as a whole fits on a single line on the terminal.
Indicate this has happened by appending three periods to the end of the
shortened test name.
By aligning the test statuses to the right (within the status lines) we
can use the "padding white spaces" to display more of the test's name.

This is important with small terminals as the name of the test get cut
off if the terminal is not wide enough making every character count.
Omit test exit details from the end of a test status line if there is
not enough width left on the outputting terminal. This avoids the need
to either shift everything left (and break alignment) or continue the
line to a second one (and breaking alignment).

Also unify the status line generating code w.r.t. actual test results
and the "live" status lines of the running tests.
Rename one of the two identically named variables to avoid mixing those
two up by accident.
Make subtest results visually different from the rest of the test
results to make it easier to differentiate them from each other when
glancing over the test output.

Visual differences include lower casing the actual test result and using
italic for the subtest's name.
@lgtm-com
Copy link

lgtm-com bot commented Nov 6, 2022

This pull request introduces 3 alerts when merging c77c58b into 0b7d935 - view on LGTM.com

new alerts:

  • 3 for Unused local variable

@jpakkane
Copy link
Member

The test failures are due to the current Mac CI image brokenness.

How ready is this otherwise? Are there known issues or is it ready to merge?

@trhd
Copy link
Contributor Author

trhd commented Nov 20, 2022

I'm not aware of any problems. I believe I've implemented all the suggestions made above.

@jpakkane
Copy link
Member

If someone has issues that are still pending, speak up quickly.

@bonzini
Copy link
Contributor

bonzini commented Nov 21, 2022

I will test it today.

@bonzini
Copy link
Contributor

bonzini commented Nov 21, 2022

At least "mtest: use statusline even with live output" is wrong because the status line's usage of \r can conflict with the test's. In addition the status line has to be limited to the case when stdout is a tty, otherwise the output file gets messy.

So my opinion of this pull request has not changed: it is still doing too much. There are a mix of:

  • bugfixes, e.g. "mtest: restore printing of stderr on parsed tests". If possible, these should have a Fixes tag to document what's broken.
  • obviously good new features, e.g. -vv to print the output of parsed tests
  • unclear changes that seem sensible but aren't obviously correct, for example "mtest: do not parse test outputs when using GDB". It's not clear if that causes the test to falsely report success for example.
  • changes that seem to be badly sorted. For example, "mtest: show subtest results in real time in verbose" being last means that this feature is broken in the middle of the series. "mtest: add a line of dashes between tests in verbose output" does not explain exactly what it does since there is already a line of dashes between tests, for example:
 72/315 libvirt / viriscsitest                                                         OK              0.21s
―――――――――――――――――――――――――――――――――――――――――――――――――――――――― ✀  ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――
stderr:
TEST: viriscsitest
      .........                                9   OK
―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――

With this kind of pull request, the main thing that will happen is that it will be ignored and you will waste a lot of time. Please look at #7830 and how it was split in 7-8 pull requests: one for console output apart from the progress report, one for keyboard input, one for progress report, one for test parsing, a few for bugfixes and so on.

I know it seems like I'm moving the goalposts, but honestly several of these comments have been the same ever since the beginning of #9232. And the bigger a PR is, the more reviewers are going to ask you for every minute detail, because it's much harder to see the big picture than when there are 2-3 small commits.

@eli-schwartz
Copy link
Member

eli-schwartz commented Nov 21, 2022

+1, please split this into individually reviewable PRs.

I'm weary of the frustration that this PR has already caused, such as when we accepted a bunch of changes that turned out to be broken, and which was only broken because it assumed that later commits would fix that breakage, except those commits weren't part of the list that got merged.

Quite frankly, I've just been ignoring this PR altogether. I don't trust it to be correct, I have no will to review it, and I live in terror of the day it gets merged but feel guilty at the idea of being a downer so I've been avoiding saying anything when I cannot point to concrete problems. But I think all those meta-problems could be fixed if this was split into easily digestible chunks, so I'm finally going to say something.

@trhd
Copy link
Contributor Author

trhd commented Nov 22, 2022

At least "mtest: use statusline even with live output" is wrong because the status line's usage of \r can conflict with the test's.

True. Then again output is not a tty in this case and it could be argued that \r shouldn't be used.

In addition the status line has to be limited to the case when stdout is a tty, otherwise the output file gets messy.

That should be the case already. At least redirecting to a file or piping to a catcommand works fine for me. This detection was there already so I didn't need to do this myself (IIRC).

The whole thing was a bit problematic from the beginning. 99% of these changes were unplanned and the first version ended up being more or less one big commit. I then tried to split the changes into smaller commits retroactively (which is not ideal) and that explains some of the weirdness in the commits. As there is not (enough at least) automated tests for output chacnges testing individual commits is very laborous and that is why e.g. some of the latest suggested changes were left as separate commits instead of merged into earlier commits (manual test for each intermediate commit).

So I get that this has been a difficult one. I will give up on this now at least for the time being and close the PR. Perhaps some day I'll give this another go with smaller PRs.

@trhd trhd closed this Nov 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants