
Allow programmer to ask to stop on first failure. #319

Closed
wants to merge 5 commits into from

Conversation

@rocky commented Jun 10, 2012

Many times when I run a test suite, what I want is to stop at the first failure. For example, when installing modules from CPAN, CPAN will by default not install if there is any failure, so you may as well save time and stop after the first one.

To a first approximation it seems pretty easy to add this; I mean one or two lines of Perl code (testing and documentation not included in that count). In the merge request code, the behaviour is enabled by setting the environment variable TEST_MORE_BAIL_EARLY.

The forked repository also has the corresponding code for the Test-Builder-1.5 branch. The test code, however, I've not been able to work out yet.

Also, I haven't updated the documentation to mention this. However, if the approach and code seem reasonable, I'll add that and update the merge request.
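
For illustration, a rough sketch of the kind of check meant here (not necessarily the committed code), placed at the end of Test::Builder::ok():

# Sketch only: bail out as soon as an assertion fails, if the environment
# variable asks for it.
sub ok {
    my ( $self, $test, $name ) = @_;

    # ... Test::Builder's normal ok() bookkeeping and TAP output ...

    if ( !$test && $ENV{TEST_MORE_BAIL_EARLY} ) {
        $self->BAIL_OUT("Stopping on first failure (TEST_MORE_BAIL_EARLY is set)");
    }

    return $test ? 1 : 0;
}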

@rocky (Author) commented Jun 10, 2012

Test::Most looks interesting; I'll have to try it. Thanks!

Alas, there are too many packages out there using Test::More. With the last commit I have changed the name of the environment variable to BAIL_ON_FAIL to match Test::Most, and added a mention of this in the POD.

@schwern (Contributor) commented Jun 10, 2012

This idea has come up before. It's a good idea, but unfortunately there is a fundamental problem in the Test::Builder architecture. To see it, simply try BAIL_ON_FAIL with an is() or like() test. You will not see the usual got/expected diagnostics. This is because they come after ok() is done in a separate and unrelated function call. The test will fail and stop without giving you vital information about WHY it failed and stopped.

What would work is to bail at the end of the test if there is any failure. On failure, set a flag. When the test is finishing, bail out if that flag is set. That means if a .t file fails, at least the whole rest of the suite doesn't run.
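
A rough sketch of that shape, assuming a flag set in ok() and checked in Test::Builder's end-of-test handling (the flag name is illustrative):

# Record any failure as it happens...
sub ok {
    my ( $self, $test, $name ) = @_;
    # ... normal ok() processing; later diag() calls still run as usual ...
    $self->{Saw_Failure} = 1 unless $test;
    return $test ? 1 : 0;
}

# ...and only bail when the test file is finishing, so the got/expected
# diagnostics have already been printed.
sub _ending {
    my $self = shift;
    if ( $self->{Saw_Failure} && $ENV{BAIL_ON_FAIL} ) {
        $self->BAIL_OUT("Stopping: this test file had failures");
    }
    # ... normal _ending() processing ...
}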

…ssibly along the lines suggested by Michael Schwern.
@rocky (Author) commented Jun 11, 2012

Is this along the lines of the approach you were thinking of?

@karenetheridge (Member) commented:

Alas there are too many packages out there using Test::More

Just run your test like this:

perl -Ilib -MTest::Most=die t/mytest.t

IMO it's not a good practice to actually edit the bail-on-first-failure behaviour directly into the test, as when you ship a dist, you want all tests to run, even if some fail. It's only during development that the bail-on-first-failure behaviour is desirable.

@rocky (Author) commented Jun 12, 2012

@karenetheridge reports:

Just run your test like this:

perl -Ilib -MTest::Most=die t/mytest.t

or

  perl -Ilib -MTest::Most=bail t/mytest.t

or

  BAIL_ON_FAIL=1 perl -Ilib -MTest::Most t/mytest.t

There's more than one way to do it (sm).

But I don't see how this helps with my second scenario. Perhaps I wasn't clear.

When I wrote "installing code from CPAN", I meant using perl -MCPAN -e shell or cpanp. And my CPAN/Config.pm contains:

'build_requires_install_policy' => q[follow/yes],
'prerequisites_policy' => q[follow],

So I don't have to babysit the install process. Many of the existing packages out there use Test::More or Test::Simple. If a change is made to the Test::More that I have installed, then I can control bail-out, say, via an environment variable. I suppose I could install Test::Most in a way that it is used when Test::More is requested, but that seems a bit drastic.

By the way, when I installed my latest change along the lines I understand Michael suggested, I can't pass the Test::Most tests, I think because Test::More is bailing out during the Test::Most bail-out test. I think this is the same situation I see in my inability to write a bail-out test for the change.

So for my own use, I like the simpler solution even if in some cases I don't get the got/expected output. The times I am likely to use BAIL_ON_FAIL are those times when all I want to know is whether everything passes, not why. If I want to know why, I can always go back and redo that test.

@karenetheridge (Member) commented:

rocky: what I meant to say is "when you are developing your own code, you can run tests with Test::Most in 'die' mode to achieve this effect". What I left off, because I thought it was implicit, was "You really shouldn't be altering the behaviour of module installations from the CPAN". I would argue that the minute time savings from aborting earlier do not outweigh the drawbacks of increased complexity in the test system and the loss of repeatability in module installations. For example, the cpantesters network installs and tests every (new and old) CPAN release and mails back test results to the authors, which includes full documentation of all tests that fail. Adding a configuration option to bail on the first failure would diminish the benefits of such reports.

@rocky (Author) commented Jun 13, 2012

Ok. Yes, I became aware that Test::Most had a BAIL_ON_FAIL from Ovid.

How cpantesters works and how individuals who run CPAN or cpanp work are different problems. That's why CPAN and cpanp come with configuration files that allow for customization. They allow one, for example, to specify whether dependencies should be followed or asked about. Smokers almost certainly don't use "ask", which would force someone to type "yes" all the time. However, "ask" is the default for most users. This is similar. Some may not be interested in seeing all the failures for all the possible required modules; just whether there is at least one failure, since any failure stops the installation. Just as I doubt a smoker would use "ask" to follow dependencies, I doubt a smoker would ask to bail on the first error.

@pdl (Contributor) commented Jun 14, 2012

unfortunately there is a fundamental problem in the Test::Builder architecture ... got/expected diagnostics ... come after ok() is done in a separate and unrelated function call.

Is there a reason for this?

Is it configurable?

@schwern (Contributor) commented Jun 15, 2012

@karenetheridge wrt adding complexity and so on, while I generally agree with you... that doesn't mean we're paralyzed. What @rocky is proposing (early abort controlled by an environment variable) is simple to do and entirely optional. AFAIK, bailing out early doesn't break test automation; your test still fails. Maybe you don't get all the stats, but that's the user's choice.

@schwern (Contributor) commented Jun 15, 2012

@pdl I wrote up the problem, why it's a problem, some common solutions and their issues. https://github.com/schwern/test-more/wiki/Stop-On-Fail

@pdl (Contributor) commented Jun 15, 2012

@schwern OK, so the reason we can't die on ok(0) is that

  • TB2 doesn't control when the relevant diagnostics are printed, any more than it knows about print calls.
  • Relevant modules put their diag()s AFTER the ok() by default.

Now, I may be talking pie in the sky here, but let's suppose we changed ok() so that it took another argument, a callback/string to be run on fail - and changed all the Test::* modules that use Test::Builder so that they include fail diags in the ok(). We could then control whether the diags appeared before the ok() or after: that seems to be the ideal solution architecturally.
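
For concreteness, a sketch of roughly what that could look like; the extra argument and its handling are hypothetical, not an existing Test::Builder interface:

# Hypothetical: ok() grows an extra argument carrying failure diagnostics,
# either as a plain string or as a coderef that produces one.
sub ok {
    my ( $self, $test, $name, $on_fail ) = @_;

    # ... normal ok() processing and TAP output ...

    if ( !$test && defined $on_fail ) {
        my $diag = ref $on_fail eq 'CODE' ? $on_fail->() : $on_fail;
        $self->diag($diag) if defined $diag;
        # Any bail-on-fail behaviour could now fire *after* the diags.
    }

    return $test ? 1 : 0;
}

A Test::* module would then pass its got/expected text in the same call, e.g. $tb->ok( $have eq $want, $name, "got: '$have', expected: '$want'" ).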

Then, if we decided this was a) better, and b) worth it (!), we could say "This is the solution" and create a flag such that a call to ok() would run the test and, if it fails, run the diags code, report "not ok", then abort the test run.

The downside is that even once we've trawled through all the public Test:: modules on CPAN, many other non-test distributions will still put their diags in the .t file after the ok(). You could argue any sort of 'bail on fail' flag is subject to a big caveat of 'don't assume you'll get all the diags'.

Leaving aside the magnitude of the task, is that an accurate summary of the problems for that approach?

@pdl (Contributor) commented Jun 15, 2012

Stop after the next ok.
What if that test fails, too?

I don't follow.

At the end of Test::Builder::ok, set a flag for the result of the ok().

Then at the beginning of Test::Builder::ok, bail if TEST_MORE_BAIL_EARLY is set and the result of the previous ok() indicates that it failed. Doesn't matter if the second ok() would have passed; it never gets evaluated.
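
A sketch of that ordering (the flag name is made up):

sub ok {
    my ( $self, $test, $name ) = @_;

    # Bail *before* running the next assertion, so any diag() calls that
    # followed the previous, failing ok() have already been printed.
    if ( $ENV{TEST_MORE_BAIL_EARLY} && $self->{Last_Test_Failed} ) {
        $self->BAIL_OUT("Stopping: previous test failed");
    }

    # ... normal ok() processing ...

    $self->{Last_Test_Failed} = $test ? 0 : 1;
    return $test ? 1 : 0;
}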

@rocky (Author) commented Jun 15, 2012

@pdl what you describe is almost the logic of my second attempt. However, if the failure is the last ok, then there is no subsequent one. So for that I also put a check in Test::Builder->_ending as well. A downside is that you still run some user code after the failure, in particular the code after the ok/is, etc. and before the next one. Still not completely ideal, but it catches more than the 90% case of hooking into just Test::Builder->_ending.

Whether my implementation lives up to my intent is a different story.

@schwern (Contributor) commented Jun 16, 2012

@pdl There are two separate issues here, and it's best to keep them separated. 1) Something that works with the existing way tests are written with Test::Builder. 2) Fixing the design. I'd like to focus on 1 here, and 2 elsewhere. 2 will take a very long time to percolate through CPAN, and there are additional issues to take into account when fixing it which simply bolting a flag onto ok() doesn't cover.

The larger issue is that it's very useful to know when a test function starts and when it ends. Related is eliminating the need for $Level. Have a look at lib/Test/Builder2.pm to see the attempt, currently on hold, to solve the problem. It doesn't have to be solved that way, but it does outline some of the issues. I stopped work on Test::Builder2 to focus on Test::Builder 1.5, but I'd love it if you want to pick it up. Crack open a new issue to continue the discussion.
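
For anyone not familiar with it, $Level is the bookkeeping test libraries currently need so failures are reported at the caller's line; a typical wrapper looks like this:

# The $Test::Builder::Level idiom that Test::Builder2 aims to make
# unnecessary: every wrapper must bump $Level so failure output points at
# the caller's file and line rather than at the wrapper itself.
use Test::More;

sub my_ok {
    my ( $test, $name ) = @_;
    local $Test::Builder::Level = $Test::Builder::Level + 1;
    return ok( $test, $name );
}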

@rocky brings up the excellent point that bailing on the next test means some user code gets run after the failure. I find this to be disquieting, but I can't really come up with a concrete problem this would cause. Maybe it just seems embarrassing that tests have to coast to a stop. :-) Plow ahead with what you're doing please.

@pdl (Contributor) commented Jun 16, 2012

@schwern " I'd like to focus on 1 here, and 2 elsewhere. "
Makes sense!

@karenetheridge (Member) commented:

@schwern Regarding https://github.com/schwern/test-more/wiki/Stop-On-Fail I have another potential solution to propose:

Change the semantics of $tb->ok so that accompanying diagnostic messages are all part of the same call.

Therefore, change this:

# Do the actual test, prints ok #, the line and file and so on.
my $ok = $tb->ok( $have eq $want, $name );

# Prints out the failure diagnostics.
$tb->diag(<<DIAGNOSTIC);

to this:

# Do the actual test, prints ok #, the line and file and also a more
# detailed message on failure.
my $ok = $tb2->ok( $have eq $want, $name, <<DIAGNOSTIC );

That way, we can still stop on the first failure, and be sure to print all the necessary diagnostics, all in one go.

I think this is pretty doable - just change the prototype of ok() to ($$;$$). Existing test libraries that use ok() don't have to immediately change, but they can pass a fourth parameter to have that diagnostic string printed even in die-on-first-failure mode.

@schwern (Contributor) commented Aug 18, 2012

@karenetheridge That is a possibility that has long been considered as a way to fix it at the design level and eventually have it trickle out. And it might be what Test::Builder has to do because of how it's written.

But there's trouble. Not insurmountable.

The foremost is: if you do it for ok(), what about is()? And like()? And is_deeply()? And what about other modules like Test::Exception and Test::Deep and ... you get the idea. Everybody has to change their interface to accept more diagnostic information.

The additional problem is you have to pass in your diagnostic information at the point you run the test. If your diagnostics involve expensive calculations you have to do them pessimistically (or employ hacks).

And maybe these are surmountable.

Other ideas worth considering: if the $name is a hash ref, it would be interpreted as diagnostics (the name is then passed in with the diagnostics). This presumes that nothing is doing anything with the name except Test::Builder, which is probably safe. That has the benefit of not having to update every interface. The downside is that now we're guessing at the user's intent, which is a big no-no.

The other approach is what Test::Builder2 (the class) tries to do. It keeps track of the stack of test functions which have been called (for example, it will know that ok() was called by is()). This was partly to get rid of the need for $Test::Builder::Level (Test::Builder2 knows where the stack of assert calls started), but it also means the result does not get posted (and thus output) until the stack has completely returned. What does this mean? It means the result is not output when $tb->ok is called, but when is() (or whatever) finally returns. This allows...

sub is {
    my($have, $want, $name) = @_;

    my $result = $tb->ok($have eq $want, $name);
    $result->diagnostics({ have => $have, want => $want });

    # Only now is $result turned into TAP
    return $result;
}

It's very flexible and it eliminates the need for $Test::Builder::Level, but it also turned out to have a lot of corner cases and was de-prioritized. It also requires people to use Test::Builder2 instead of Test::Builder, which would take a long time for CPAN to convert to.

What do you think?

@pdl (Contributor) commented Aug 18, 2012

The additional problem is you have to pass in your diagnostic information at the point you run the test. If your diagnostics involve expensive calculations you have to do them pessimistically (or employ hacks).

(Appreciate this is a tangent but...) this could (I think) be solved if the onfail argument was a coderef?
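
For example (using the hypothetical extra argument sketched earlier; expensive_diff() is made up):

# The coderef is only invoked when the assertion fails, so the expensive
# comparison work is never done for passing tests.
$tb->ok( $got eq $expected, 'structures match',
    sub { expensive_diff( $got, $expected ) } );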

@schwern (Contributor) commented Aug 18, 2012

@pdl Clever! Now all we have to do is add an onfail argument to everything. But I guess by "onfail" you mean the diagnostic argument? Structured diagnostics (not the strings we currently pass into diag()) would be desirable on passing tests as well. They just might not be displayed, but they'd be available for other purposes. For example, you could add timing information.

@karenetheridge (Member) commented:

Just to be clear, when we say "user" here we mean "author of a test library", not "author of tests". So it is Test::Builder::ok (and friends) that would change, not the users of Test::More::ok etc.

Since it already looks like every test library will need to be upgraded anyway to use Test::Tester rather than Test::Builder::Tester, this can be thrown in as just one more API change (and one that is backwards compatible too).

I also like the idea of onfail being a coderef.

@rocky (Author) commented Aug 19, 2012

And just to be clear: what I was looking for was a change to Test::More/Test::Simple that would stop on the first failure without having to change any of the vast existing code that uses these.

As a writer of tests, if I want to allow folks to be able to stop at the first failure, I'd use Test::Most or something already available that allows this.

I am coming at this from the standpoint of people who are installing the Perl modules of others and only want to ask the question: does everything check out? If not, please tell me as fast as possible with the least amount of other extraneous information. A secondary concern as the writer of the tests is that I may want to stop at the first failure for the same reason and because my brain wants to focus on one problem at a time.

@pdl (Contributor) commented Aug 19, 2012

I am coming at this from the standpoint of people who are installing the Perl modules of others and only want to ask the question: does everything check out? If not, please tell me as fast as possible with the least amount of other extraneous information

@rocky I think, in summary, that while this is obviously possible (I'd also like to be able to do this!), the question is how we keep as much relevant information as possible, i.e. diagnostics calculated after the failure. The options seem to be:

  1. Fail, and stop immediately, losing all post-test diagnostics; encourage authors to diagnose first, ok() later.
  2. Fail and stop at a future point, as soon as we can reasonably determine the failing test has ended, and hope that the test author has put test diags before they run the next test, not somewhere at the end as a summary.
  3. Modify Test::Builder::ok() et al. so that test authors can explicitly link diags with tests, then wait for the best-practice to percolate through the CPAN.

If we can do 3 in the long run, it would be better for your use case. The solution for 3 might affect which of 1 and 2 is more suitable as an interim solution.

@schwern (Contributor) commented Aug 20, 2012

@karenetheridge

Just to be clear, when we say "user" here we mean "author of a test library", not "author of tests". So it is Test::Builder::ok (and friends) that would change, not the users of Test::More::ok etc.

Test::More users as well. For example, I was just talking to Salve Nilsen who was using Test::More to run system checks for Nagios. He wanted to attach a severity flag to test failures which Nagios could then interpret.

ok $something, { name => "Is that thing working?", severity => "HIGH" };

Since it already looks like every test library will need to be upgraded anyway to use Test::Tester rather than Test::Builder::Tester

Nope. TBT will remain backwards compatible, but we have better ways now.

I also like the idea of onfail being a coderef.

Yeah, it's worth investigating. But remember, diagnostics are not just for errors. They can be used for the sort of information we currently put into test names. Going back to the Nagios example, here's a loop checking whether a list of hosts have a working web server.

for my $host (@host) {
    ok check_service($host, 80), {
        name => "$host:80 works",
        host => $host,
        port => 80
    };
}

That would produce something like:

ok 1 - example.com:80 works
    host: example.com
    port: 80
    file: httpd.t
    line: 32
not ok 2 - foo.com:80 works
    host: foo.com
    port: 80
    file: httpd.t
    line: 32

This can then be read by a machine and produce a report on what hosts and services are up or down.

@schwern (Contributor) commented Aug 20, 2012

@rocky

I am coming at this from the standpoint of people who are installing the Perl modules of others and only want to ask the question: does everything check out? If not, please tell me as fast as possible with the least amount of other extraneous information.

You know, you can do that from the Test::Harness side as well. It could have a prove option / environment variable to stop running .t files as soon as one fails. That would likely be simple and easy to convince the Test::Harness folks to accept (Ovid does Test::Harness, did Test::Most, and is very into "stop on fail"). And it doesn't matter what module the tests are written in!

"Stop the test suite at the end of a failing test file", whether that's via
BAIL_OUT or a switch on Test::Harness, is fairly straight forward and gets 90%
of what you want. It also doesn't suffer from the "test 23 failed, but test
24 tells me more about why test 23 failed" problem.
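
A rough sketch of that harness-side approach (not an existing prove flag; it just drives TAP::Harness one file at a time):

#!/usr/bin/env perl
# Sketch: run each .t file in its own harness run and stop as soon as one
# file has failures. This is not how prove works today; it only illustrates
# the "stop between files" idea using the public TAP::Harness API.
use strict;
use warnings;
use TAP::Harness;

my @files   = @ARGV ? @ARGV : sort glob 't/*.t';
my $harness = TAP::Harness->new({ verbosity => 0, lib => ['lib'] });

for my $file (@files) {
    my $aggregate = $harness->runtests($file);
    die "Stopping: $file had failures\n" if $aggregate->has_errors;
}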

@schwern (Contributor) commented Aug 20, 2012

I've proposed the "bail on fail" option to Test::Harness. Perl-Toolchain-Gang/Test-Harness#4

@schwern (Contributor) commented Apr 23, 2013

This pull request has a lot of interesting discussion about many aspects of stopping the test run after a failure. I'd like to close it up, but capture something concrete. I've opened #319 for the idea of providing a means to bail out at the end of a test file.

@schwern closed this Apr 23, 2013
@schwern (Contributor) commented Apr 23, 2013

Whoops, I meant #373
