Tests: Don't show visually strong errors in successful block tests #683

Closed

Conversation

@jlokier (Contributor) commented May 26, 2021

When developing, and especially when tidying up commits for upstream, it's common (at least for me) to run `make test` often, and watch for a scrolling terminal full of good, green OK lines all the way to the end, or any bad, red FAILED lines.

It only takes one red line in a sea of green. Then the run is stopped and we start looking into what the cause might be. Usually it doesn't matter which test failed; it matters that something changed to cause a failure at all.

It's not feasible to actually read all the output; there are about 13,000 lines. And it's not reasonable to wait for the end during edit-compile-debug cycles because of the time taken.

The point is, a big red error message in the ocean of scrolling test output is a key visual signal, and it matters.

As well, real failures often have errors about hash mismatches specifically: "expected" hashes, "Wrong state root", that sort of thing. Unsurprisingly, a clear red ERR reporting a hash mismatch, more diagnostic data, and a green OK 8, 16 or more lines below it looks a lot like a failure surrounded by successes.

For example, it's not really obvious that the text below is 100% success unless you're used to it. (Failure output is almost the same.) Of course the OK looks clear here, but imagine 13,000 lines scrolling past, when the normal output is thousands of clean "[OK] name" lines and this is the anomaly:

      [OK] tests/fixtures/eth_tests/BlockchainTests/InvalidBlocks/bcInvalidHeaderTest/DifficultyIsZero.json
    ERR 2021-05-25 23:01:37.184+01:00 Wrong state root in block
      tid: 275064
      file: executor.nim:173
      blockNumber: 1
      expected: DA68013FA88783CD4DAE287C5D8BAFD03CA3CA3977A28B6379146467E6B405B7
      actual: 717E99EBF6BE0A556559C88DB2224039BD215C36E1C37F65144E6B24A22B6808
      arrivedFrom: F9A2682EA3B44BA015697535403A3A357710EA1983B6CC5B1AD46A504A45E63F

    ERR 2021-05-25 23:01:37.184+01:00 Wrong state root in block
      tid: 275064
      file: executor.nim:173
      blockNumber: 1
      expected: DA68013FA88783CD4DAE287C5D8BAFD03CA3CA3977A28B6379146467E6B405B7
      actual: 717E99EBF6BE0A556559C88DB2224039BD215C36E1C37F65144E6B24A22B6808
      arrivedFrom: F9A2682EA3B44BA015697535403A3A357710EA1983B6CC5B1AD46A504A45E63F

      [OK] tests/fixtures/eth_tests/BlockchainTests/InvalidBlocks/bcInvalidHeaderTest/GasLimitIsZero.json

Even when you're expecting these messages, they prevent seeing at a glance whether everything succeeds in the scrolling test output.

The test expects this condition, so don't use a prominent error when testing.

(I think we're close to the point where this particular `error` should be downgraded even in non-test operation; it's just another validation error from network data. We're quite confident in our EVM now. But let's wait until we have a 100% pass on Hive consensus tests before removing the `error` entirely.)

@arnetheduck (Member) commented:
this looks off for many reasons:

- `error` in general should be reserved for logs where one cannot make meaningful progress. If a test is testing errors, the logs should obviously agree and log an error.
- Introducing special flags and magic for tests is a good way to make the tests less representative of what real execution looks like; tests should as closely as possible reflect real usage of the module.

If the logging output distracts attention from the test outcome, the solution would be to redirect logging output to a different stream (stderr or a file) - ideally such output is captured per test and stored in the test result.
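For example - a sketch only, assuming Chronicles' compile-time sink configuration, with the flag placed wherever the test target's compile options live - log records could be sent to a file so the terminal keeps only the test outcomes:

    # nim.cfg for the test binaries (hypothetical location):
    # send Chronicles log records to a file instead of the default stdout sink
    -d:chronicles_sinks=textlines[file]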

@jlokier (Contributor, Author) commented May 26, 2021

These tests are not testing errors; they are ignoring errors, when the fixture says so. There's a difference: if there were no block validation failure, the tests would still pass.

Moreover, when there are other kinds of block validation failure, the tests get the same result from `processBlock`, but there's no message.

The error isn't useful in my view; as I said in my last paragraph, I think it should be removed anyway. These are blocks which would be discarded as invalid on the network; they do not indicate conditions where no progress can be made. The correct progress is to discard the block as invalid.

You can see this in the result, which is a boolean: there are many ways a block can be found invalid, and all of them tell the caller to discard the block, except that this one particular condition prints a prominent error message before returning the same boolean as all the other discard cases.
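A minimal sketch of the shape being described - not the actual nimbus-eth1 code; the type, the helper, and the values are invented for illustration:

    import chronicles

    type Header = object
      blockNumber: int
      gasLimit: int
      difficulty: int
      stateRoot: string

    # Hypothetical stand-in for the real state computation in executor.nim.
    proc computedStateRoot(h: Header): string =
      "717E99EB..."

    proc processBlock(h: Header): bool =
      # Every rejection path tells the caller the same thing: discard the block.
      if h.gasLimit == 0: return false      # rejected, no log line
      if h.difficulty == 0: return false    # rejected, no log line
      if computedStateRoot(h) != h.stateRoot:
        # ...but this path alone announces itself with a prominent red ERR
        # before returning the same boolean as the silent cases.
        error "Wrong state root in block",
          blockNumber = h.blockNumber,
          expected = h.stateRoot,
          actual = computedStateRoot(h)
        return false
      true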

I'm not sure which action your feedback is recommending (please explain). Is it one of the below?

  1. If the idea going forward is that passing test results now (starting recently) consist of hundreds of pages of error messages, I'll have to follow @Zahary's advice, which is essentially to consider the output of `make test` useless unless it is run through grep to cut out the junk, or filtered in another way. I don't think it's good practice to make people do that, i.e. the defaults should be sane instead, but I can adapt if I have to.

There is also a problem when the output so resembles a bug in the test itself (failing to fail) that it makes someone spend time investigating the test suite.

So I don't mind using grep and an add-back-the-removed-colour sed with my compile commands from now on, if that's the proposal? (But I don't like it, or think it's good practice, for a test suite, or any tool, to only be useful when its defaults are overridden.)

Or,

  2. You agree that the output should not have loud "can't make meaningful progress" error messages, and what you are asking is that we should log auxiliary output from tests in log files separate from the terminal output (this means two files in practice). Until that's done, we leave the anomalous error messages in the mix with test outcomes on the terminal (requiring grep)?

Please note that these are new messages. There are about 13,000 lines of output from `make test`, and until recently they were entirely OK/FAILED lines with a few headers. The new error messages are anomalous, and confined to one suite; that's why they seem out of place for the purpose of the test suites so far.

Or,

  3. You agree with me that the error isn't useful anyway and can be removed from block validation in normal operation as well, restoring harmony with the tests.

@arnetheduck (Member) commented:
My main point is not to have special cases for tests: either the error is useful to log, or it is not - this is an orthogonal issue with respect to test running.

The other point is that if test output is noisy, the proper way to fix it is to capture log output for each test individually and save it to a file (or an xml/json report) so it can be viewed per test - especially per failed test. This requires fixing the test runner, rather than changing the production code.

Finally, one could consider not printing anything for tests that succeeded - that typically requires a more sophisticated test runner that can show a progress bar or something like that, and simply print the failures.

@jlokier (Contributor, Author) commented May 26, 2021

Thanks - that's a helpful answer for proceeding.

I'll close this PR without merging then.

For the specific message, I'll change its level to the same one ("debug") as the other reasons for block rejection, because I think it has served its purpose in early EVM development and isn't useful now. (We have better ways to validate the EVM thanks to good third-party tests.) We'll still count block rejections in normal operation; consensus failures are expected to be rare but have many causes, and will be investigated as in any other client. Most rejection causes don't log above debug level; this one was an anomaly, and as there is no true error or bug, I don't believe it justifies trumpets any more.
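Concretely, that change would look something like this - a sketch against Chronicles' log-level templates, with the call site and field assumed from the log excerpt above:

    import chronicles

    # Before: a prominent red ERR line in the middle of the test scroll.
    error "Wrong state root in block", blockNumber = 1

    # After: the same structured record, at the level the other
    # block-rejection causes already use.
    debug "Wrong state root in block", blockNumber = 1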

I agree with your ideas about test output, and if we have this situation again but the output justifies being kept in normal operation, I'll keep in mind your philosophy of avoiding changes to program output. (That isn't possible for some other aspects of testing, as some tests are intrusive and demand support in the main code.) It could be a bit of a problem if that makes "write a better test runner / script" an impediment to doing the main work at hand, though, but we don't have that problem now.

Short output on success would be nice. I find listing the successes useful sometimes, but there's `--verbose` for that. (Incidentally, "don't print detailed logs if the return value is high-level success" is something I've found useful in scope-structured logging of normal long-lived programs too. It composes well to produce a good level of diagnostic detail, focused where there are reasons to find it useful.)

If there ever is a more sophisticated test runner that captures output to hide it on success, it would be good to run tests in parallel too, and use the capture functionality to serialise their outputs, both to the log and to the terminal. I'm pretty sure there's a tool for this, but I don't recall its name.

@arnetheduck (Member) commented:
> If there ever is a more sophisticated test runner that captures output to hide it on success,

nim-unittest2 - and the way to approach such issues is simply to write the code for it, if it's important enough. The way to not get into that mess is to make the test runner write logs to a separate file without splitting them; unittest2 makes that easier as well.

@jlokier (Contributor, Author) commented May 27, 2021

It is not important enough at the moment, for me anyway, because there are more important things to do. The current `make test` is adequate, if not ideal. Faster parallel testing would be welcome if someone else wants to implement it.

Hiding output (to show it only on errors) while running tests in parallel threads and diverting the output to a log file seems... tricky. Chronicles output is easily diverted to a function, but stdout/stderr are not thread-local values, so other output (e.g. assertion failures from Nim, `echo` in various places, C libraries doing `printf`) is not easily diverted to a separate file or pipe per thread.
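The first half of that is sketched below - assuming Chronicles is compiled with a dynamic sink (`-d:chronicles_sinks=textlines[dynamic]`) and using a hypothetical per-thread buffer; the writer signature follows Chronicles' documented dynamic-output API:

    import chronicles

    var capturedLog {.threadvar.}: seq[string]  # hypothetical per-thread buffer

    # With a dynamic sink, each formatted log record passes through a
    # user-supplied proc instead of going to stdout, so Chronicles output
    # can be captured per thread and shown only when a test fails.
    defaultChroniclesStream.output.writer =
      proc (logLevel: LogLevel, msg: LogOutputStr) {.gcsafe.} =
        capturedLog.add msg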

@jlokier closed this May 28, 2021
@jlokier deleted the jl/clean-test-output branch May 28, 2021 04:56