
Failure to fail when a test fails #936

Closed
kenahoo opened this issue Sep 11, 2019 · 32 comments

@kenahoo
Contributor

kenahoo commented Sep 11, 2019

I'm tracking down a Heisenbug where sometimes a test failure in a testthat script doesn't actually cause failure of the test suite.

Changing various seemingly unrelated things in the tests can get it to fail properly. Most importantly, running under devtools::test() makes it fail properly, while running under R CMD check masks the failure. The failure is also masked when I run tools:::.check_packages(), the function in R that implements R CMD check.

Also, importantly, changing my testthat.R file from tmp <- test_check("gene", reporter="tap") to tmp <- test_check("gene") also makes it fail properly when running under R CMD check. So I'm suspecting a problem in TapReporter somehow.

The testthat.Rout file does show a failure in all cases - it includes the following (I've put in [...] various places to elide the output):

R version 3.6.1 (2019-07-05) -- "Action of the Toes"
Copyright (C) 2019 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin17.7.0 (64-bit)
[...]
Loading required package: gene
WARN [2019-09-11 14:27:47] New power surface models need at least a timestamp argument ruc.ts, falling back to old models
WARN [2019-09-11 14:27:47] New power surface models need at least a timestamp argument ruc.ts, falling back to old models
WARN [2019-09-11 14:27:47] New power surface models need at least a timestamp argument ruc.ts, falling back to old models
WARN [2019-09-11 14:27:47] New power surface models need at least a timestamp argument ruc.ts, falling back to old models
WARN [2019-09-11 14:27:47] New power surface models need at least a timestamp argument ruc.ts, falling back to old models
WARN [2019-09-11 14:27:47] New power surface models need at least a timestamp argument ruc.ts, falling back to old models
1..23
# Context Testing sag and windspeed model generation
ok 1 # WARNING namespace 'wra.ops' is not available and has been replaced
by .GlobalEnv when processing object ''
ok 2 # WARNING namespace 'wra.ops' is not available and has been replaced
by .GlobalEnv when processing object ''
[...]
ok 20 # WARNING the condition has length > 1 and only the first element will be used
ok 21 # WARNING the condition has length > 1 and only the first element will be used
not ok 22 (unknown)
  object 'latitude' not found
  1: runTest(outputDirectory = "test-output", expectedFile = "expectedDailyClassification.rds") at testthat/test_daily-classification_output-files.R:68
  2: run.daily.classification(jsonInputFilePath) at testthat/test_daily-classification_output-files.R:38
[...]
  24: map(.x[sel], .f, ...)
  25: .f(.x[[i]], ...)
ok 23 multiplication works
Warning message:
replacing previous import 'dplyr::rename' by 'plyr::rename' when loading 'gene' 
> cat('Tests done', file='/tmp/testsdone')
> 
> proc.time()
   user  system elapsed 
  3.727   0.546   4.936 

Notice the not ok 22 in the output. But the R CMD check output contains this:

* checking tests ...
  Running ‘testthat.R’
 OK

To help me track this down - does this file get written and then parsed to determine whether there was a failure? Or do the results get tracked internally and then TapReporter simply writes out a file, the contents of which testthat never does anything with?

I've been trying to isolate the problem by doing the following:

# Location in R version 3.6.1:
trace(tools:::.check_packages, browser, at=list(c(34, 3, 3, 4, 3, 3, 14)))
tools:::.check_packages('../gene_2.18.tar.gz')

But as you probably know, debugging the R CMD check code is super difficult because of how it keeps re-invoking R on the command line over and over, so I'm trying to boil this down to something inside testthat that will be easier to figure out.

Thanks, hope that was coherent enough! 😄
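
One way to probe this without going through R CMD check each time is to run the suite directly and inspect the returned results. This is only a sketch under testthat 2.x assumptions (a results object coercible to a data frame with `failed` and `error` columns, as shown later in this thread); the path is illustrative:

```r
# Sketch (testthat 2.x): run the tests directly and inspect the results
# data frame instead of re-running R CMD check each time. The path is
# illustrative.
library(testthat)

res <- as.data.frame(test_dir("tests/testthat", reporter = "tap"))

# 'failed' counts failed expectations per test; 'error' flags tests that
# died with an uncaught error.
if (any(res$failed > 0) || any(res$error)) {
  stop("Test failures")
}
```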

@hadley
Member

hadley commented Sep 11, 2019

Do you have the dev version? It might be the same problem as #851

@kenahoo
Contributor Author

kenahoo commented Sep 11, 2019

I don't have the dev version, but I'll give that a shot.

I don't understand #851 very well yet, but I don't think it's the same problem - in my case the end result is a "failure to fail" (succeeds when it shouldn't), whereas it looks like in #851 it was failing correctly but just not soon enough?

@kenahoo
Contributor Author

kenahoo commented Sep 11, 2019

I just confirmed that I'm seeing the same problem on testthat_2.2.1.9000.tar.gz from GitHub:

tools:::.check_packages('../gene_2.18.tar.gz') completes without reporting an error, despite there being a failure in the tests.

After the run, gene.Rcheck/tests/testthat.Rout does show the failure. It doesn't get moved to gene.Rcheck/tests/testthat.Rout.fail since the failure wasn't detected.

@kenahoo
Contributor Author

kenahoo commented Sep 11, 2019

I figured out that this actually doesn't depend on R CMD check or its in-R equivalent tools:::.check_packages(); it can be demonstrated using just testthat::test_dir() (much easier and faster!) or even testthat::test_file():

> result <- testthat::test_file('tests/testthat/test_daily-classification_output-files.R')
[... bunch of output, ending with: ]
══ Results ═════════════════════════════════════════════════════════════════════════════════════════════
Duration: 0.3 s

OK:       0
Failed:   1
Warnings: 6
Skipped:  0

> result
 [1] file    context test    nb      failed  skipped error   warning user    system  real    passed 
<0 rows> (or 0-length row.names)

So even though it shows a failure in the results, there's no information about the failure in the return value, so it doesn't fail the suite overall.

@kenahoo
Contributor Author

kenahoo commented Sep 11, 2019

Simpler demonstration:

> cat("stop('failing test')\n", file="/tmp/test-file.R")
> result <- testthat::test_file('/tmp/test-file.R')
✔ |  OK F W S | Context
⠋ |   0 1     | 
══ Results ═════════════════════════════════════════════════════════════════════════════════════════════
OK:       0
Failed:   1
Warnings: 0
Skipped:  0
> result
 [1] file    context test    nb      failed  skipped error   warning user    system  real    passed 
<0 rows> (or 0-length row.names)

I think this shows that if a test fails by dying, instead of (or, in my original case, in addition to) a failed assertion, the failure won't be propagated upward: the overall tests will appear to succeed, and an automated testing harness (like R CMD check) won't be able to detect the failure from the exit value.
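
Until the root cause is fixed, a defensive tests/testthat.R can surface such failures explicitly. A sketch under testthat 2.x ("gene" is the package from this issue; the empty-results check follows from the behaviour demonstrated above):

```r
# Defensive tests/testthat.R sketch: keep the TAP output, but raise an
# error whenever the results record a failure or an error, so that
# R CMD check cannot report OK on a failing run.
library(testthat)
library(gene)

res <- as.data.frame(test_check("gene", reporter = "tap"))

# An error outside a test_that() block can yield an empty results frame
# (as demonstrated in this issue), so treat emptiness as suspect too.
if (nrow(res) == 0L || any(res$failed > 0) || any(res$error)) {
  stop("Test failures detected")
}
```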

@hadley
Member

hadley commented Sep 11, 2019

Here the problem isn't so much a test failure as an uncaught error, but it may have the same root cause as #781

@kenahoo
Contributor Author

kenahoo commented Sep 12, 2019

I don't see the connection to #781 myself, but it could be.

I believe this bug didn't appear in previous versions of testthat, because an older test server we had, running an older testthat, didn't show this problem. I haven't confirmed that yet, though. Might do a git bisect on it.

@kenahoo
Contributor Author

kenahoo commented Sep 12, 2019

Looks like this was introduced in 25e7d93:

index 9117518..0856985 100644
--- a/R/reporter-progress.R
+++ b/R/reporter-progress.R
@@ -60,6 +60,10 @@ ProgressReporter <- R6::R6Class("ProgressReporter",
 
     start_file = function(file) {
       self$file_name <- file
+
+      # Need to set here in case file doesn't contain any tests
+      self$ctxt_issues <- Stack$new()
+      self$ctxt_start_time <- proc.time()
     },
 
     start_test = function(context, test) {
diff --git a/R/test-files.R b/R/test-files.R
index 1582b44..5ad8247 100644
--- a/R/test-files.R
+++ b/R/test-files.R
@@ -128,7 +128,7 @@ test_file <- function(path, reporter = default_reporter(), env = test_env(),
     {
       # We need to notify the lister separately from the reporter, which is why
       # we call start_file methods twice.
-      get_reporter()$start_file(basename(path))
+      reporter$start_file(basename(path))
       lister$start_file(basename(path))
 
       source_file(
@@ -136,7 +136,8 @@ test_file <- function(path, reporter = default_reporter(), env = test_env(),
         chdir = TRUE, wrap = wrap
       )
 
-      end_context()
+      reporter$end_context()
+      reporter$end_file()
     }
   )
 

Before that commit:

% Rscript -e 'devtools::load_all(); test_files("/tmp/test-file.R", stop_on_failure=TRUE)'
Loading testthat
✔ |  OK F W S | Context
Error in x[[method]](...) : attempt to apply non-function
Calls: test_files ... <Anonymous> -> o_apply -> lapply -> FUN -> <Anonymous>

══ Results ═════════════════════════════════════════════════════════════════════
OK:       0
Failed:   2
Warnings: 0
Skipped:  0
Execution halted

% echo $?
1

After that commit:

% Rscript -e 'devtools::load_all(); test_files("/tmp/test-file.R", stop_on_failure=TRUE)'
Loading testthat
✔ |  OK F W S | Context
✖ |   0 1     | 
────────────────────────────────────────────────────────────────────────────────
test-file.R:1: error: (unknown)
failing test
1: stop("failing test") at /tmp/test-file.R:1
────────────────────────────────────────────────────────────────────────────────

══ Results ═════════════════════════════════════════════════════════════════════
OK:       0
Failed:   1
Warnings: 0
Skipped:  0

% echo $?
0

Note the difference in exit status.

@kenahoo
Contributor Author

kenahoo commented Sep 12, 2019

For completeness (to make sure I wasn't doing something dumb), I also ran a non-failing test before & after this commit, here are the results:

% echo "expect_equal(2, 2)" > /tmp/test-ok.R

% co 25e7d93030bc594aa92f287e10d8eb3d678e4977^                                         
Previous HEAD position was 25e7d93 Call end_file() on reporter
HEAD is now at 5db9225 Better error messages for missing arguments

% Rscript -e 'devtools::load_all(); test_files("/tmp/test-ok.R", stop_on_failure=TRUE)'
Loading testthat
✔ |  OK F W S | Context
⠋ |   1       | 
══ Results ═════════════════════════════════════════════════════════════════════
OK:       1
Failed:   0
Warnings: 0
Skipped:  0

Nice code.

% co 25e7d93030bc594aa92f287e10d8eb3d678e4977                                          
Previous HEAD position was 5db9225 Better error messages for missing arguments
HEAD is now at 25e7d93 Call end_file() on reporter

% Rscript -e 'devtools::load_all(); test_files("/tmp/test-ok.R", stop_on_failure=TRUE)'
Loading testthat
✔ |  OK F W S | Context
✔ |   1       | 

══ Results ═════════════════════════════════════════════════════════════════════
OK:       1
Failed:   0
Warnings: 0
Skipped:  0

@kenahoo kenahoo changed the title TapReporter fails to fail when a test fails Failure to fail when a test fails Sep 12, 2019
@kenahoo
Contributor Author

kenahoo commented Sep 23, 2019

I have created a branch here that shows a failing test for this problem: https://github.com/kenahoo/test_that/tree/bug-936-failure-to-fail-when-test-fails

I also started on a fix: I found the problem point in the code (see the 2nd commit), but I don't know how to tell the testing infrastructure that something failed. Any help?

@kenahoo
Contributor Author

kenahoo commented Oct 3, 2019

@hadley I'd love a pointer in the right direction if you could; this is hitting us kind of hard lately.

@hadley
Member

hadley commented Oct 3, 2019

@lionel- can you take a look please?

@kenahoo
Contributor Author

kenahoo commented Oct 9, 2019

@hadley and @lionel- this seems like quite a major bug to be lying dormant; any updates?

@lionel-
Member

lionel- commented Oct 10, 2019

Will take a look today, sorry for missing this.

@lionel-
Member

lionel- commented Oct 10, 2019

I'm still discovering testthat internals but it looks like:

  • An empty results value is to be expected if you're stopping or failing outside a testthat block. It looks unrelated to the problem at hand.

  • Most reporters are not designed to throw an error if a failure occurs. Exceptions are the "check" and "fail" reporters. This is why the tap reporter doesn't stop on failure.

  • The documentation of the "fail" reporter mentions it is designed to be combined with other reporters to force them to throw an error when an unexpected result occurs.

The following seems to fix the issue:

reporter <- MultiReporter$new(list(
  FailReporter$new(),
  TapReporter$new()
))
test_check("mypackage", reporter = reporter)

Perhaps test_check() should always combine non-default reporters with the fail reporter. This would make it easy to use reporters within tests/ files.

@hadley
Member

hadley commented Oct 10, 2019

This doesn't sound like a major bug; in fact, it sounds like testthat is acting as designed.

@lionel-
Member

lionel- commented Oct 10, 2019

That's my take as well. Perhaps the design could be improved along the lines suggested above, since test_check() is meant for checking packages and it doesn't make much sense to not report failures to R CMD check (IIUC).


@hadley
Member

hadley commented Oct 10, 2019

The point of most of the reporters is to produce output, and you don't want them to fail because that would prevent other code from running (I think there may have been a previous bug where errors outside of test blocks caused reporters to fail). If you're using a reporter other than CheckReporter you have to figure out yourself what should happen if the tests fail.


@lionel-
Member

lionel- commented Oct 10, 2019

Another possible improvement would be for test_check() to test the reporter on a dummy failing test and signal an informative warning when the reporter doesn't fail.

Both of the possible behaviours (auto-combining with "fail" reporter or informative warning) should probably occur only when the reporter was passed as a string.

@kenahoo
Contributor Author

kenahoo commented Oct 10, 2019

Am I correct that the TapReporter object is never informed that a test died? Could that be made a common property of all reporters - to know whether the tests completed successfully without dying? That does seem like something within its purview, inasmuch as it's something worth "producing output" about.

Then the failure fact wouldn't have to be passed using exceptions (or combining with another reporter like FailReporter), it could be a simple property lookup.
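
For illustration only, that idea might look like the following sketch. RecordingTapReporter is hypothetical (not part of testthat); it assumes the 2.x add_result() signature and the exported expectation_broken() helper:

```r
# Hypothetical sketch: a TAP reporter that remembers whether any broken
# expectation (failure or error) was reported, so a harness can inspect
# a field afterwards instead of relying on an exception being thrown.
library(testthat)
library(R6)

RecordingTapReporter <- R6Class("RecordingTapReporter",
  inherit = TapReporter,
  public = list(
    any_broken = FALSE,
    add_result = function(context, test, result) {
      # Record failures and errors, then delegate to the normal TAP output.
      if (expectation_broken(result)) self$any_broken <- TRUE
      super$add_result(context, test, result)
    }
  )
)
```

A harness could then run test_check() with an instance of this reporter and check its any_broken field at the end of the run.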

@lionel-
Member

lionel- commented Oct 11, 2019

A redesign of reporters is most likely not warranted at this point.

@hadley
Member

hadley commented Oct 13, 2019

Minimal reprex for future reference:

cat("stop('failing test')\n", file="/tmp/test-file.R")
(testthat::test_file('/tmp/test-file.R', reporter = NULL))
#>  [1] file    context test    nb      failed  skipped error   warning
#>  [9] user    system  real    passed 
#> <0 rows> (or 0-length row.names)

Created on 2019-10-13 by the reprex package (v0.3.0)

@hadley
Member

hadley commented Oct 13, 2019

@kenahoo Ok, now I get what you're saying — very shortly you'll get these results:

cat("stop('failing test')\n", file="/tmp/test-file.R")
(testthat::test_file('/tmp/test-file.R', reporter = NULL))
#>          file     context   test nb failed skipped error warning user
#> 1 test-file.R test-file.R <bare>  0      0   FALSE  TRUE       0   NA
#>   system real passed result
#> 1     NA   NA      0   NULL
(testthat:::test_files('/tmp/test-file.R', stop_on_failure = TRUE))
#> ✔ |  OK F W S | Context
#> |   0 1     | 
#> ══ Results ═════════════════════════════════════════════════════════════════════════════════════════════════
#> Duration: 0.1 s
#> 
#> OK:       0
#> Failed:   1
#> Warnings: 0
#> Skipped:  0
#> Error: Test failures

Created on 2019-10-13 by the reprex package (v0.3.0)

@hadley hadley closed this as completed in 6463e58 Oct 13, 2019
@kenahoo
Contributor Author

kenahoo commented Oct 13, 2019

Perfect, @hadley - thanks for sticking with me. This will be a big help and let us remove a bunch of workarounds for detecting failures.
