
Feature Request: Run unit test multiple times #11354

Open
JarredAllen opened this issue Nov 9, 2022 · 10 comments
Labels
C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` Command-test S-needs-design Status: Needs someone to work further on the design for the feature or fix. NOT YET accepted.

Comments

@JarredAllen

Problem

In my code, there are some flaky unit tests that pass most, but not all, of the time. To check whether a change has fixed a test's flakiness, it would be convenient to run the test many times and see whether it ever fails, but as far as I can tell cargo doesn't have a way of doing this.

Proposed Solution

It'd be nice if there were a command-line flag to run tests repeatedly. I'm imagining a syntax like cargo test --repeat=100 testname, which would search for tests named "testname" (as cargo presently does) and then run each matching test 100 times, but I'm not too picky about the exact syntax.
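Until something like a `--repeat` flag exists, the effect can be approximated with a shell loop around `cargo test`. A minimal sketch, assuming POSIX sh; the flag does not exist in cargo, and the test name in the usage comment is a placeholder:

```shell
# repeat_test: run the given command N times and print how many runs failed.
repeat_test() {
    n="$1"; shift
    failures=0
    i=0
    while [ "$i" -lt "$n" ]; do
        # Discard output; count only the exit status of each run.
        "$@" > /dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures"
}

# Usage (hypothetical test name):
# echo "failed $(repeat_test 100 cargo test my_flaky_test --quiet) of 100 runs"
```

This counts failures rather than stopping early, which is what you want when estimating how flaky a test is.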

Notes

No response

@JarredAllen JarredAllen added the C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` label Nov 9, 2022

jswang commented Nov 9, 2022

This would be very helpful for my use cases as well!

ehuss (Contributor) commented Nov 9, 2022

This would definitely be useful, I have a macro in my editor for repeating a test. Something built-in would be nicer. However, it is not clear exactly how this should work. For example, it may be better for this to be implemented in the harness itself, in which case rust-lang/rust#65218 would be the issue for that.

@andrewgazelka

While this is in development, you can also use cargo nextest, which does this.


ImmanuelSegol commented Mar 4, 2023

@JarredAllen How do I get this working?

@andrewgazelka

> @JarredAllen How do I get this working?

This is a feature request. It still needs to be implemented.

@weihanglo weihanglo added the S-needs-design Status: Needs someone to work further on the design for the feature or fix. NOT YET accepted. label May 15, 2023
heisen-li (Contributor) commented Dec 22, 2023

This feature seems fairly important. I am currently learning the testing-related code; may I try to complete this issue? Is there anything I should pay special attention to, or are there other plans?

epage (Contributor) commented Dec 22, 2023

This is marked needs-design, meaning someone needs to put forward a more detailed proposal for what to do before we move forward with implementation.

In particular, we need to figure out which layer, or combination of layers, this belongs in:

  • if in libtest, t-libs-api is likely to defer that to custom test harnesses
  • if in cargo test, cargo would just re-invoke the selected tests itself
  • or some other design that mixes these

@heisen-li (Contributor)

Sorry for sharing my humble view: it seems best to make the modifications in libtest, since fine-grained control is not possible from cargo.

epage (Contributor) commented Dec 27, 2023

Our plans for cargo would allow fine control in the future.

We are looking at having cargo and libtest communicate through a greater knowledge of the CLI, including being able to enable JSON output and putting the responsibility for rendering on cargo.

This would allow cargo test to track individual tests and decide what to do with them, like re-running a failed test.

What would be good is to explore prior art to see if it has any effect on the design. For example, would people want to be able to annotate individual tests about retrying? If so, we'd either want retrying within the test harness, or that would be good feedback for the test runner/harness communication.


bjackman commented Apr 15, 2024

> For example, would people want to be able to annotate individual tests about retrying?

The way you say "retry" and "annotate" makes me think there are two separate use cases in people's minds here:

  1. I have discovered that a test is flaky. I am trying to debug/fix it, I need to run it 1000 times to reproduce the failure/evaluate my fix.
  2. Our tests are flaky, we will use retries to make our CI dashboard green.

For use case 1, I have found that a single global "repeat all selected tests N times" command-line flag works fine. I have seen various names for this; I think --runs-per-test is my favourite because it makes the "repeat all tests" behavior obvious. rust-lang/rust#65218 mentions a couple of examples of prior art for this; I have used both of those examples and they seem to work well. I think this is also what the OP suggested.
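The --runs-per-test behavior for reproducing a flaky failure can be emulated today with a small wrapper that stops at the first failure. A sketch in POSIX sh; the wrapped cargo invocation in the usage comment uses a hypothetical test name:

```shell
# run_until_failure: run the given command up to $1 times, stopping at the
# first failure -- useful when trying to reproduce an intermittent failure.
run_until_failure() {
    max="$1"; shift
    i=1
    while [ "$i" -le "$max" ]; do
        if ! "$@" > /dev/null 2>&1; then
            echo "failed on run $i of $max"
            return 1
        fi
        i=$((i + 1))
    done
    echo "passed all $max runs"
}

# Usage (hypothetical test name):
# run_until_failure 1000 cargo test my_flaky_test --quiet
```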

For use case 2, there is the philosophical question of how opinionated Cargo wants to be about software engineering. I have worked on a project where the test tools have a global "retry any test that fails" flag. This probably sounds like a bad idea, and my experience has indeed been that, exactly as you would suspect, using it means your tests get flakier over time instead of less flaky. My impression of the Rust culture is that people would instinctively agree that this is a harmful feature for test tools to have.

A less toxic design IMO is to be able to annotate individual tests as "known flaky", and then leave it up to the person/tool running the tests to decide whether that means "don't bother running it at all" or "run it up to N times until it passes". Google's monorepo has a tag that works like that and it seems OK to me. This means you can maintain your nice green CI dashboard, but you have a ratchet where you at least notice if a formerly stable test becomes flaky. It also means that if you run a test on your WIP PR, and the test fails, and it doesn't have the "flaky" tag, you know you probably broke the test. Whereas with the global retry flag you have to go and look through your CI history and do some informal Bayesian analysis.
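The "run it up to N times until it passes" policy for tests tagged as known-flaky can be sketched as a wrapper (POSIX sh; how a runner would discover the flaky tag is hypothetical and not shown):

```shell
# retry_until_pass: run the given command up to $1 times, succeeding as soon
# as one attempt passes -- the semantics a test runner might apply only to
# tests annotated as known-flaky.
retry_until_pass() {
    max="$1"; shift
    i=1
    while [ "$i" -le "$max" ]; do
        if "$@" > /dev/null 2>&1; then
            echo "passed on attempt $i of $max"
            return 0
        fi
        i=$((i + 1))
    done
    echo "failed all $max attempts"
    return 1
}
```

Because only tagged tests get this treatment, an untagged test that fails still fails the run, which preserves the ratchet described above.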

Anyway, I think that this feature request was probably motivated by use case 1, and that use case seems much easier to solve, so it might make sense to focus on that.
