testthat 3e by VisruthSK · Pull Request #1165 · stan-dev/cmdstanr

VisruthSK · 2026-03-24T17:37:12Z

Migrated to latest testthat edition.

Closes #1155.

jgabry · 2026-03-24T18:22:04Z

Thanks for working on this!

codecov-commenter · 2026-03-24T21:24:53Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.85%. Comparing base (5809552) to head (d0348c6).

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #1165   +/-   ##
=======================================
  Coverage   90.85%   90.85%           
=======================================
  Files          14       14           
  Lines        5924     5924           
=======================================
  Hits         5382     5382           
  Misses        542      542


VisruthSK · 2026-03-26T03:29:37Z

@jgabry do you know why macos devel would be failing in the R dep setup? Doesn't seem to happen in main--could this be a cache/GHA error? https://github.com/stan-dev/cmdstanr/actions/runs/23573417706/job/68640740296

VisruthSK · 2026-03-26T22:16:03Z

Looks like something is broken in R Core or pak which is causing macos devel to fail.

jgabry · 2026-03-26T22:39:13Z

> Looks like something is broken in R Core or pak which is causing macos devel to fail.

Took a quick glance at the logs, I agree this seems likely. I’ll try to take a look at the actual PR tomorrow.

VisruthSK · 2026-03-26T22:40:28Z

Thanks! Looks like this (new) function in tools is broken, planning on submitting a patch.

Got fixed, so should be gtg soon

jgabry

Mostly looks great, just a few comments/questions.

jgabry · 2026-03-27T17:08:58Z

tests/testthat/helper-custom-expectations.R

-  before_time <- Sys.time()
-  mod <- expect_interactive_message(constructor_call, "Compiling Stan program...")
+  constructor_call <- substitute(constructor_call)
+  before_time <- Sys.time() - 10


Why do we need to subtract 10 here when we didn't need to do that previously? If we do need to do something, why not do what you did above in expect_compilation? I'm worried that the 10 seconds could result in false positives here since in our tests we often repeatedly rebuild the same test models in quick succession. But also I might be misunderstanding this.
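To make the false-positive concern concrete, here is a minimal sketch in base R. It assumes the helper decides whether a model was recompiled by comparing the executable's modification time against `before_time`; that is an assumption about the helper, and `was_modified_since()` and the temp file are hypothetical stand-ins rather than anything in cmdstanr.

```r
# Hypothetical check: TRUE if `path` was modified at or after `since`.
was_modified_since <- function(path, since) {
  file.mtime(path) >= since
}

# Stand-in for a compiled executable that an earlier test built 5 seconds ago.
exe <- tempfile("model")
writeLines("binary", exe)
Sys.setFileTime(exe, Sys.time() - 5)

before_time <- Sys.time()
# ... a constructor call that does NOT recompile would run here ...

# Strict comparison: the old build is not mistaken for a new one.
strict <- was_modified_since(exe, before_time)

# With 10 seconds of slack, the 5-second-old build counts as "just compiled".
lenient <- was_modified_since(exe, before_time - 10)
```

Under these assumptions `strict` is `FALSE` and `lenient` is `TRUE`, which is the situation that can arise when the same test model is rebuilt several times within a few seconds.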

jgabry · 2026-03-27T17:17:49Z

tests/testthat/test-fit-vb.R

+  expect_snapshot_output({
+    fit$print(c("theta", "tau", "lp__", "lp_approx__"))
+  })
+  expect_snapshot_error(fit$print(variable = "unknown", max_rows = 20))


Is there a reason why you migrated vb, mle, and laplace like this (snapshotting full print output) but didn't do that for gq and mcmc? I'm a bit worried that snapshotting the full print output like you do here will be brittle since it hardcodes the exact floating point output, which can be sensitive to OS/toolchain/CmdStan. I guess it's passing now on different OSes, but still.

Before you change anything, I'm curious if there was a reason for the different approach here compared to the MCMC tests.

I decided to ask codex about this concern and I had it run these locally and it found a few things:

- fit-mle on the PR worktree passed cleanly.

- fit-vb on the PR worktree failed, while the same fit-vb test on current master passed on this machine. So there is at least one real local regression signal in the PR, not just a theoretical concern.

- fit-laplace executed successfully, but testthat tried to add fresh snapshots in the temp PR worktree instead of cleanly matching the checked-in ones, which reinforces the concern that these new full-output snapshots are over-specified / need closer validation.
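One possible way to keep a print snapshot while dropping the exact floating point values is to mask numbers before comparison. The sketch below is base R only; `mask_numbers()` is a hypothetical helper and the printed lines are made-up placeholders, not real fit output. testthat's `expect_snapshot()` accepts a `transform` function, so a helper like this could be passed there.

```r
# Hypothetical transform: replace decimal numbers with a fixed placeholder
# so the snapshot records layout and variable names, not exact values.
mask_numbers <- function(lines) {
  gsub("-?[0-9]+\\.[0-9]+", "<num>", lines)
}

# Made-up lines standing in for printed summary output.
printed <- c(
  " variable  mean median",
  " theta     0.25   0.23",
  " lp__     -6.28  -5.97"
)

masked <- mask_numbers(printed)

# Possible use inside a test (not run here):
# expect_snapshot(fit$print(c("theta", "lp__")), transform = mask_numbers)
```

This trades precision for stability: the snapshot would still catch a missing variable or a changed header, but not a changed estimate, so numeric checks would need a separate tolerance-based expectation.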

jgabry · 2026-03-27T17:25:09Z

tests/testthat/test-install.R

-    "Download of CmdStan failed with error: cannot open URL 'https://github.com/stan-dev/cmdstan/releases/download/v2.35.5/cmdstan-2.35.5.tar.gz'\nPlease check if the supplied version number is valid."
-  )
-  expect_error(
+  expect_snapshot_error(install_cmdstan(version = "2.35.5", wsl = os_is_wsl()))


(applies generally, not just to this particular test or error messages, just putting it here)

I have mixed feelings about using expect_snapshot_error (and to some extent, but a lesser extent, expect_snapshot more broadly). It's cleaner but less transparent. I have to check separate files to see what the expected output is that we're checking for. It’s also not explicitly enforcing behavior we want and instead asking just “did it change?”, which is also very useful but not exactly the same thing. It more or less amounts to the same thing if we’ve verified and trust everything before creating the snapshots, but it’s not actually the same thing.

What's your take on this tradeoff?
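For reference, a small sketch of the two styles side by side, assuming testthat is installed. `fail_download()` is a hypothetical stand-in, not `install_cmdstan()`, and its message is shortened for illustration.

```r
library(testthat)

# Hypothetical function that always fails with a known message.
fail_download <- function(version) {
  stop(
    "Download failed. Please check if the supplied version number is valid.",
    call. = FALSE
  )
}

# Explicit style: the required message is visible in the test itself.
result <- test_that("error message mentions the version number", {
  expect_error(
    fail_download("0.0.0"),
    "Please check if the supplied version number is valid",
    fixed = TRUE
  )
})

# Snapshot style (not run here): the expected message lives in a file
# under tests/testthat/_snaps/ and the test only asks "did it change?".
# test_that("error message is unchanged", {
#   expect_snapshot_error(fail_download("0.0.0"))
# })
```

The explicit version keeps the contract next to the assertion; the snapshot version is shorter but moves the expected text into a separate file that has to be reviewed when it is first recorded.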

VisruthSK added 2 commits March 24, 2026 10:04

Moved to 3e and removed context calls

1e10511

LLMd testthat 3e syntax changes

5115c34

VisruthSK and others added 2 commits March 24, 2026 12:52

Tweak some minor testing things

a91cff8

Fixed tests

c98740e

VisruthSK and others added 22 commits March 24, 2026 14:40

Refresh test binaries

e1399ba

Use withr for tests

ce5f289

Fix LF issue in tests

2970907

Using more withr and testthat 3e features; setting up parallelization potentially

a46a125

Stabilize parallel test runs

e5e7de5

Avoid pak local install in CI

c66a807

Serialize test model compilation

e2d2e5f

Run stateful tests sequentially

8d926bb

Stabilize OpenCL test

6797833

Trim unstable OpenCL checks

86bfac6

Run threaded tests sequentially

29d75db

Simplify test harness

1366910

Use repo pak on macOS devel

773ca2b

Use devel pak on macOS devel

159335c

Install local package outside pak on macOS devel

dfaee86

No parallel tests

09c3fcd

Cleaning tests up; more snapshots

6a01a93

More snapshots

faf24a8

Small changes

08ef444

Bump testthat requirement

b7cb687

Removed brittle snapshots

e55042f

Transform windows snapshots to remove .exe

d684bc3

More snapshots

d0348c6

VisruthSK marked this pull request as ready for review March 26, 2026 16:14

VisruthSK requested a review from jgabry March 26, 2026 22:16

jgabry reviewed Mar 27, 2026

VisruthSK wants to merge 27 commits into master from testthat-3e
