test_that in v3 and higher does not work with some autograding tools due to reporter change? #1430

Closed
ttimbers opened this issue Aug 26, 2021 · 3 comments · Fixed by #1443
Labels: feature (a feature request or enhancement), reporter 📝

ttimbers commented Aug 26, 2021

A couple of autograding tools (nbgrader and otter-grader), which can autograde R code via software unit tests in Jupyter notebooks and R Markdown, require failing tests to throw errors in an interactive environment. If no error is thrown, the autograding software incorrectly marks student code as correct.

For example, we might ask a question like:

In R, calculate sin(pi / 4) and name it answer.

The students would then provide some code like the following (the answer below is intentionally incorrect):

answer <- sin(pi / 3)

We would then test it to award marks if it is correct (which in this case we would not, because the answer is wrong).
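
For reference, the intended correct solution is:

answer <- sin(pi / 4)  # evaluates to 0.7071068...

The incorrect submission above evaluates to sin(pi / 3) = 0.8660254..., which is why the failure output below reports 0.707 - 0.866 == -0.159.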

test_that() statements worked well up until the move to version 3.0; at that point the default reporter changed, and test_that() tests run interactively no longer threw an error. Below I contrast the two behaviours:

Behaviour of {testthat} 2.3.2 (which is what is needed for the autograders):

library(testthat)

test_that("trigonometric functions match identities", {
  expect_equal(sin(pi / 4), answer)
})

print("test kept running")

output:

Error: Test failed: 'trigonometric functions match identities'
* <text>:2: sin(pi/4) not equal to `answer`.
1/1 mismatches
[1] 0.707 - 0.866 == -0.159
Traceback:

1. test_that("trigonometric functions match identities", {
 .     expect_equal(sin(pi/4), answer)
 . })
2. test_code(desc, code, env = parent.frame())
3. get_reporter()$end_test(context = get_reporter()$.context, test = test)
4. stop(message, call. = FALSE)

Behaviour of {testthat} 3.0.0:

library(testthat)

test_that("trigonometric functions match identities", {
  expect_equal(sin(pi / 4), answer)
})

print("test kept running")

output:

── Failure (<text>:2:3): trigonometric functions match identities ──────────────
sin(pi/4) not equal to `answer`.
1/1 mismatches
[1] 0.707 - 0.866 == -0.159

[1] "test kept running"

I have tried changing the reporter to a MultiReporter (combining a CheckReporter, to give the students information, with a FailReporter, to force a failure). This works, but only if a new MultiReporter is created each time the tests are called.

This works if run in one code chunk/code cell:

library(testthat)

reporter <- MultiReporter$new(list(
        CheckReporter$new(),
        FailReporter$new()
    ))

with_reporter(reporter, {
  test_that("trigonometric functions match identities", {
    expect_equal(sin(pi / 4), answer)
  })
})

print("test kept running")

Output after running the cell many times (as a student might while working through the problem):

══ Failed tests ══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
── Failure (<text>:10:5): trigonometric functions match identities ─────────────────────
sin(pi/4) not equal to `answer`.
1/1 mismatches
[1] 0.707 - 0.866 == -0.159

[ FAIL 1 | WARN 0 | SKIP 0 | PASS 0 ]
Error: Failures detected.
Traceback:

1. with_reporter(reporter, {
 .     test_that("trigonometric functions match identities", {
 .         expect_equal(sin(pi/4), answer)
 .     })
 . })
2. reporter$end_reporter()
3. o_apply(self$reporters, "end_reporter")
4. lapply(objects, f)
5. FUN(X[[i]], ...)
6. x$end_reporter(...)
7. stop("Failures detected.", call. = FALSE)

But if we create the MultiReporter once in one code cell/chunk and then use it in another (so we do not keep repeating ourselves), the past failures and passes persist across runs. Below I show the result of running the same test in another code cell/chunk three times incorrectly and then finally once correctly:

with_reporter(reporter, {
  test_that("trigonometric functions match identities", {
    expect_equal(sin(pi / 4), answer)
  })
})

print("test kept running")

output:

══ Failed tests ══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
── Failure (<text>:10:5): trigonometric functions match identities ─────────────────────
sin(pi/4) not equal to `answer`.
1/1 mismatches
[1] 0.707 - 0.866 == -0.159
── Failure (<text>:3:5): trigonometric functions match identities ─────────────────────
sin(pi/4) not equal to `answer`.
1/1 mismatches
[1] 0.707 - 0.866 == -0.159
── Failure (<text>:3:5): trigonometric functions match identities ─────────────────────
sin(pi/4) not equal to `answer`.
1/1 mismatches
[1] 0.707 - 0.866 == -0.159

[ FAIL 3 | WARN 0 | SKIP 0 | PASS 1 ]
Error: Failures detected.
Traceback:

1. with_reporter(reporter, {
 .     test_that("trigonometric functions match identities", {
 .         expect_equal(sin(pi/4), answer)
 .     })
 . })
2. reporter$end_reporter()
3. o_apply(self$reporters, "end_reporter")
4. lapply(objects, f)
5. FUN(X[[i]], ...)
6. x$end_reporter(...)
7. stop("Failures detected.", call. = FALSE)
hadley added the feature (a feature request or enhancement) and reporter 📝 labels on Sep 10, 2021

hadley commented Sep 13, 2021

I think I might've accidentally introduced this bug in 9b207fa. Prior to that change, StopReporter always stopped; afterward it only did so if you deliberately opted in (I think maybe I confused myself as to what the default was?). There's also a separate bug in 1a6d95f where I used 1 instead of 0 😬


hadley commented Sep 13, 2021

And fixing it doesn't break any tests, which makes me more confident I did it by accident 😢

ttimbers (Author) commented

Thanks so much @hadley for looking into this and working on a PR to fix it!
