Tests: Don't show visually strong errors in successful block tests #683
Conversation
When developing, and especially when tidying up commits for upstream, it's common (at least for me) to run `make test` often, and watch for a scrolling terminal full of good, green OK lines all the way to the end, or any bad, red FAILED lines.

It only takes one red line in a sea of green. Then the run is stopped and we start looking into what the cause might be. Usually it doesn't matter which test failed; it matters that something changed to cause a failure at all.

It's not feasible to actually read all the output; there are about 13,000 lines. And it's not reasonable to wait for the end during edit-compile-debug cycles because of the time taken.

The point is, a big red coloured error message in the ocean of scrolling test output is a key visual, and it matters.

As well, real failures often have errors specifically about hash mismatches: "expected" hashes, "Wrong state root", that sort of thing. Unsurprisingly, a clear red ERR reporting a hash mismatch, more diagnostic data, and a green OK 8, 16 or more lines below it, looks a lot like failures surrounded by successes.

For example, it's not really obvious that the text below is 100% success unless you're used to it. (Failure output is _almost_ the same.)
Of course the OK looks clear here, but imagine 13,000 lines scrolling past, when the normal output is thousands of clean "[OK] name" lines and this is the anomaly:

```
[OK] tests/fixtures/eth_tests/BlockchainTests/InvalidBlocks/bcInvalidHeaderTest/DifficultyIsZero.json
ERR 2021-05-25 23:01:37.184+01:00 Wrong state root in block tid: 275064 file: executor.nim:173 blockNumber: 1 expected: DA68013FA88783CD4DAE287C5D8BAFD03CA3CA3977A28B6379146467E6B405B7 actual: 717E99EBF6BE0A556559C88DB2224039BD215C36E1C37F65144E6B24A22B6808 arrivedFrom: F9A2682EA3B44BA015697535403A3A357710EA1983B6CC5B1AD46A504A45E63F
ERR 2021-05-25 23:01:37.184+01:00 Wrong state root in block tid: 275064 file: executor.nim:173 blockNumber: 1 expected: DA68013FA88783CD4DAE287C5D8BAFD03CA3CA3977A28B6379146467E6B405B7 actual: 717E99EBF6BE0A556559C88DB2224039BD215C36E1C37F65144E6B24A22B6808 arrivedFrom: F9A2682EA3B44BA015697535403A3A357710EA1983B6CC5B1AD46A504A45E63F
[OK] tests/fixtures/eth_tests/BlockchainTests/InvalidBlocks/bcInvalidHeaderTest/GasLimitIsZero.json
```

Even when you're expecting these, it prevents seeing at a glance whether everything succeeds in the scrolling test output. The test expects this condition, so don't use a prominent error when testing.

(I think we're close to the point where this particular `error` should be downgraded even in non-test operation; it's just another validation error from network data. We're quite confident in our EVM now. But let's wait until we have 100% pass on Hive consensus tests before removing the `error` entirely.)

Signed-off-by: Jamie Lokier <jamie@shareable.org>
this looks off for many reasons:
If the logging output is distracting attention from the test outcome, the solution would be to redirect logging output to a different stream (stderr or a file). Ideally such output is captured per test and stored in the test result.
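The per-test capture described here is something some runners provide out of the box. For instance, Python's `unittest` can attach log records to an individual test via `assertLogs`, so the noise surfaces with that test's result rather than in the shared stream. A sketch under that assumption (illustrative only; the project in question is Nim, and `process_invalid_block` is a made-up stand-in for block validation):

```python
import logging
import unittest

logger = logging.getLogger("executor")

def process_invalid_block() -> bool:
    """Pretend block validation: logs the rejection, returns 'discard'."""
    logger.error("Wrong state root in block")  # noisy in a shared stream
    return False

class InvalidBlockTest(unittest.TestCase):
    def test_block_is_rejected(self):
        # assertLogs captures records for this test only; they end up in
        # the test result instead of scrolling past on the terminal.
        with self.assertLogs("executor", level="ERROR") as captured:
            self.assertFalse(process_invalid_block())
        self.assertIn("Wrong state root", captured.output[0])
```

The test still asserts that the block is rejected, but the ERR line never reaches the terminal unless the test fails.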
These tests are not testing errors. These tests are ignoring errors, when the fixture says so. There's a difference: if there was no block validation failure, the tests would still pass. Moreover, when there are other kinds of block validation failure, the tests get the same result.

The error isn't useful in my view; as I said in my last paragraph, I think it should be removed anyway. These are blocks which would be discarded as invalid on the network; they do not indicate conditions where no progress can be made. The correct progress is to discard the block as invalid. You can see this in the result, which is a boolean. There are many ways a block can be found invalid, and all of them tell the caller to discard the block, except that this one particular condition prints a prominent error message before returning the same boolean as all the other discard cases.

I'm not sure which action your feedback is recommending (please explain). Is it either of the below?
There is also a problem when it so closely resembles a bug in the test itself (failing to fail) that it makes someone spend time investigating the test suite. So I don't mind using … Or,
Please note that these are new messages. There are about 13,000 lines of output from … Or,
My main point is not to have special cases for tests: either the error is useful to log, or it is not. This is an orthogonal issue with respect to test running.

The other point is that if test output is noisy, the proper way to fix it is to capture log output for each test individually and save it to a file (or an XML/JSON report) so it can be viewed per test, especially per failed test. This requires fixing the test runner, rather than changing the production code.

Finally, one could consider not printing anything for tests that succeed. That typically requires a more sophisticated test runner that can show a progress bar or something like that; it would simply print the failures.
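The "print only the failures" runner suggested above can be sketched in a few lines: buffer each test's output and replay it only when the test fails. This is a minimal illustration in Python, not the project's actual runner; `run_quietly` and its `(name, fn)` test pairs are made-up names:

```python
import contextlib
import io

def run_quietly(tests):
    """Run (name, fn) pairs; replay captured output only for failures."""
    failures = 0
    for name, fn in tests:
        buf = io.StringIO()
        try:
            # Capture everything the test writes to stdout/stderr.
            with contextlib.redirect_stdout(buf), contextlib.redirect_stderr(buf):
                fn()
        except Exception as exc:
            failures += 1
            print(f"[FAILED] {name}: {exc}")
            # Show the captured noise only when it matters.
            print(buf.getvalue(), end="")
    print(f"{len(tests) - failures} passed, {failures} failed")
    return failures == 0
```

A successful run then prints a single summary line, while a failing run shows the failed test's name plus everything it logged.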
Thanks - it's a helpful answer for proceeding. I'll close this PR without merging then.

For the specific message, I'll change its level to the same ("debug") as the other reasons for block rejection, because I think it has served its purpose in early EVM development and isn't useful now. (We have better ways to validate the EVM thanks to good third-party tests.) We'll still count block rejections in normal operation; consensus failures are expected to be rare but have many causes, and will be investigated as in any other client. Most rejection causes don't log below debug; this one was an anomaly. As there is no true error or bug, I don't believe it justifies trumpets any more.

I agree with your ideas about test output, and if we have this situation again but the output justifies being kept in normal operation, I'll keep in mind your philosophy of avoiding changes to the production code. (That isn't possible for some other aspects of testing, as some tests are intrusive and demand support in the main code.) It could be a bit of a problem if that makes "write a better test runner / script" an impediment to doing the main work at hand, but we don't have that problem now.

Short output on success would be nice. I find listing the successes useful sometimes, but there's …

If there ever is a more sophisticated test runner that captures output to hide it on success, it would be good to run tests in parallel too, and use the capture functionality to serialise their outputs, both to the log and to the terminal. I'm pretty sure there's a tool for this, but I don't recall its name.
nim-unittest2 - and the way to approach such issues is to simply write the code for it, if it's important enough. The way to not get into that mess is to make the test runner write logs to a separate file without splitting them; unittest2 makes that easier as well.
It is not important enough at the moment, for me anyway, because there are more important things to do. The current …

Hiding output (to show it only on errors) while running tests in parallel threads and diverting the output to a log file seems... tricky. Chronicles output is easily diverted to a function, but stdout/stderr are not thread-local values, so other output (e.g. assertion failures from Nim, …)
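The difficulty described here is the same in most runtimes: output funneled through a log sink can be routed per thread, but the process-global stdout/stderr file descriptors cannot. A sketch of the workable half, in Python for illustration (the `ThreadLocalTee` class is invented for this example; it only catches writes made through the wrapper object, not writes made directly to the underlying OS file descriptor, which is the analogue of Nim's assertion output):

```python
import threading

class ThreadLocalTee:
    """File-like wrapper routing write() calls to a per-thread buffer.

    Threads that called start_capture() accumulate their own output;
    everything else falls through to the real stream. Writes that bypass
    this object (direct fd writes) still interleave globally.
    """
    def __init__(self, fallback):
        self._local = threading.local()
        self._fallback = fallback

    def start_capture(self):
        self._local.buf = []          # this thread now captures its output

    def captured(self) -> str:
        return "".join(getattr(self._local, "buf", []))

    def write(self, text: str) -> int:
        buf = getattr(self._local, "buf", None)
        if buf is not None:
            buf.append(text)          # per-thread, no interleaving
        else:
            self._fallback.write(text)
        return len(text)

    def flush(self):
        self._fallback.flush()
```

A runner would install it once (e.g. `sys.stdout = ThreadLocalTee(sys.stdout)`) before spawning workers, then serialise each worker's `captured()` text under a lock when the test finishes.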