Skip to content

testthat 3e#1165

Open
VisruthSK wants to merge 27 commits intomasterfrom
testthat-3e
Open

testthat 3e#1165
VisruthSK wants to merge 27 commits intomasterfrom
testthat-3e

Conversation

@VisruthSK
Copy link
Copy Markdown
Member

@VisruthSK VisruthSK commented Mar 24, 2026

Migrated to latest testthat edition.

Closes #1155.

@jgabry
Copy link
Copy Markdown
Member

jgabry commented Mar 24, 2026

Thanks for working on this!

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Mar 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.85%. Comparing base (5809552) to head (d0348c6).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1165   +/-   ##
=======================================
  Coverage   90.85%   90.85%           
=======================================
  Files          14       14           
  Lines        5924     5924           
=======================================
  Hits         5382     5382           
  Misses        542      542           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@VisruthSK
Copy link
Copy Markdown
Member Author

@jgabry do you know why macos devel would be failing in the R dep setup? Doesn't seem to happen in main--could this be a cache/GHA error? https://github.com/stan-dev/cmdstanr/actions/runs/23573417706/job/68640740296

@VisruthSK VisruthSK marked this pull request as ready for review March 26, 2026 16:14
@VisruthSK
Copy link
Copy Markdown
Member Author

Looks like something is broken in R Core or pak which is causing macos devel to fail.

@VisruthSK VisruthSK requested a review from jgabry March 26, 2026 22:16
@jgabry
Copy link
Copy Markdown
Member

jgabry commented Mar 26, 2026

Looks like something is broken in R Core or pak which is causing macos devel to fail.

Took a quick glance at the logs, I agree this seems likely. I’ll try to take a look at the actual PR tomorrow.

@VisruthSK
Copy link
Copy Markdown
Member Author

VisruthSK commented Mar 26, 2026

Thanks! Looks like this (new) function in tools is broken, planning on submit a patch.

Got fixed, so should be gtg soon

Copy link
Copy Markdown
Member

@jgabry jgabry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks great, just a few comments/questions.

before_time <- Sys.time()
mod <- expect_interactive_message(constructor_call, "Compiling Stan program...")
constructor_call <- substitute(constructor_call)
before_time <- Sys.time() - 10
Copy link
Copy Markdown
Member

@jgabry jgabry Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to subtract 10 here when we didn't need to do that previously? If we do need to do something, why not do what you did above in expect_compilation? I'm worried that the 10 seconds could result in false positives here since in our tests we often repeatedly rebuild the same test models in quick succession. But also I might be misunderstanding this.

expect_snapshot_output({
fit$print(c("theta", "tau", "lp__", "lp_approx__"))
})
expect_snapshot_error(fit$print(variable = "unknown", max_rows = 20))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why you migrated vb, mle, and laplace like this (snapshotting full print output) but didn't do that for gq and mcmc? I'm a bit worried that snapshotting the full print output like you do here will be brittle since it hardcodes the exact floating point output, which can be sensitive to OS/toolchain/CmdStan. I guess it's passing now on different OSes, but still.

Before you change anything, I'm curious if there was a reason for the different approach here compared to the MCMC tests.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to ask codex about this concern and I had it run these locally and it found a few things:

  • fit-mle on the PR worktree passed cleanly.
  • fit-vb on the PR worktree failed, while the same fit-vb test on current master passed on this machine. So there is at least one real local regression signal in the PR, not just a theoretical concern.
  • fit-laplace executed successfully, but testthat tried to add fresh snapshots in the temp PR worktree instead of cleanly matching the checked-in ones, which reinforces the concern that these new full-output snapshots are over-specified / need closer validation

"Download of CmdStan failed with error: cannot open URL 'https://github.com/stan-dev/cmdstan/releases/download/v2.35.5/cmdstan-2.35.5.tar.gz'\nPlease check if the supplied version number is valid."
)
expect_error(
expect_snapshot_error(install_cmdstan(version = "2.35.5", wsl = os_is_wsl()))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(applies generally, not just to this particular test or error messages, just putting it here)

I have mixed feelings about using expect_snapshot_error (and to some extent, but a lesser extent, expect_snapshot more broadly). It's cleaner but less transparent. I have to check separate files to see what the expected output is that we're checking for. It’s also not explicitly enforcing behavior we want and instead asking just “did it change?”, which is also very useful but not exactly the same thing. It more or less amounts to the same thing if we’ve verified and trust everything before creating the snapshots, but it’s not actually the same thing.

What's your take on this tradeoff?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Swap to testthat 3e

3 participants