Tracking issue for eRFC 2318, Custom test frameworks #50297
Centril added the B-RFC-approved, T-dev-tools, T-cargo, A-libtest, and C-tracking-issue labels Apr 28, 2018
killercup referenced this issue May 14, 2018: Reusable handle to thread-local replaceable stdout/err #50457 (closed)
On stdout/err capture (after doing some research around the proof-of-concept #50457): The current state is sub-optimal, to say the least. (Relevant libstd snippets: lines 23 to 27, lines 38 to 42, and lines 614 to 711 in 935a2f1.) The existing stdout/err capture for the test harness works by setting a thread-local sink which, if set, the printing machinery writes to instead of the real stream. The reason for this is that any solution that's generic over […]. The way I see it, there are three ways to resolve this quirk: […]
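To make the described mechanism concrete, here is a minimal, self-contained sketch of the thread-local-sink scheme. It is an illustration only: `SINK` and `print_captured` are invented names, and libstd's real implementation differs.

```rust
use std::cell::RefCell;
use std::io::{self, Write};

thread_local! {
    // The per-thread sink. When it is `Some`, printing is captured into the
    // buffer; when it is `None`, printing falls through to the real stdout.
    static SINK: RefCell<Option<Vec<u8>>> = RefCell::new(None);
}

fn print_captured(s: &str) {
    SINK.with(|sink| match &mut *sink.borrow_mut() {
        Some(buf) => buf.extend_from_slice(s.as_bytes()), // captured
        None => io::stdout().write_all(s.as_bytes()).unwrap(), // real stdout
    });
}

fn main() {
    SINK.with(|s| *s.borrow_mut() = Some(Vec::new())); // begin capture
    print_captured("hello from a test\n");
    let captured = SINK.with(|s| s.borrow_mut().take()).unwrap();
    assert_eq!(captured, b"hello from a test\n");
}
```

The quirk discussed above follows directly: only writes that go through the checking helper are captured, while anything that writes to the file descriptor directly (C libraries, child processes) bypasses the sink.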
TL;DR: To have sane capture behavior for stdout and stderr for […]
This would be significantly slower on certain platforms where process creation is slow, notably Windows. Especially when the tests are all very quick, this will cause severe regressions in total time.
Another possibility for process separation of tests is to spawn a process for each unit of parallelism, with each process reused for multiple tests. Of course, we could also just promote the thread-local replaceable stdout/err to […]. The issue is that stdout is, as far as I can tell, a process-local resource. To capture […]
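A rough sketch of the process-per-unit-of-parallelism idea, assuming the runner re-executes itself as a worker; the env var name and the newline-delimited protocol are invented for illustration, and a real runner would spawn one worker per CPU and stream results back as they complete:

```rust
use std::env;
use std::io::{BufRead, Write};
use std::process::{Command, Stdio};

const WORKER_ENV: &str = "CTF_WORKER"; // invented name

fn run_one_test(name: &str) {
    // Look up `name` in the collected test table and run it; stubbed here.
    println!("running {}", name);
}

fn worker_loop() {
    // A worker reads test names from stdin until EOF, so one process is
    // reused for arbitrarily many tests.
    for line in std::io::stdin().lock().lines() {
        let name = line.unwrap();
        run_one_test(&name);
        println!("ok {}", name);
    }
}

fn main() {
    if env::var_os(WORKER_ENV).is_some() {
        return worker_loop();
    }
    // Parent: spawn a single reusable worker and feed it the whole test list.
    let exe = env::current_exe().unwrap();
    let mut child = Command::new(exe)
        .env(WORKER_ENV, "1")
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()
        .unwrap();
    let mut stdin = child.stdin.take().unwrap();
    for test in &["test_a", "test_b"] {
        writeln!(stdin, "{}", test).unwrap();
    }
    drop(stdin); // close the pipe so the worker's loop terminates
    let out = child.wait_with_output().unwrap();
    print!("{}", String::from_utf8_lossy(&out.stdout));
}
```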
fitzgen referenced this issue May 22, 2018: Tracking upstream Rust feature stability and RFCs that affect wasm #63 (closed)
geofft referenced this issue Jun 1, 2018: Come up with a strategy for writing automated tests #34 (closed)
SoniEx2 commented Jun 13, 2018
If I want to argue about this, do I argue about it here? So, like, I made […], but I still want the users to be able to use plain old vanilla Rust. (Disclaimer: I don't fully understand the RFC, but it seems to require either replacing everything, or not having tests.)
@SoniEx2 this RFC is about how we'd like to change the underlying mechanism that is used to arrange testing in Rust, so that authors can write new test frameworks (e.g., quickcheck) that have access to the same mechanisms as the existing built-in tests (able to access private functions, can find all tagged functions, etc.). The current proposal is relatively wholesale (you change the test framework for a whole crate), but this is something that may be improved upon. We decided against more sophisticated designs in the initial implementation phase (this is just an "experimental" RFC) so that we don't bite off more than we can chew, but I think down the line we'd want more fine-grained test framework selection too. As for your specific example, you don't actually need custom frameworks for that feature :) It seems like you want to only run a test if hexchat isn't running? To do that, you could either mark the test with […]
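For reference, both variants of gating a test are possible with today's built-in harness; a short sketch, with `hexchat_is_running` as a hypothetical stand-in for whatever detection logic applies:

```rust
// Hypothetical helper; real detection logic would go here.
fn hexchat_is_running() -> bool {
    false
}

#[test]
#[ignore] // skipped by default; opt in with `cargo test -- --ignored`
fn needs_special_environment() {
    // test body that assumes the environment is present
}

#[test]
fn skips_itself_at_runtime() {
    if hexchat_is_running() {
        return; // only run when hexchat isn't running, per the suggestion above
    }
    // actual assertions go here
}
```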
SoniEx2 commented Jun 13, 2018
No, you don't understand: they have to run inside hexchat, loaded with dlopen (or equivalent), after plugin registration.
Hey all, I've come up with a simplified proposal and I have an implementation now (though it needs a little cleaning up before a PR): https://blog.jrenner.net/rust/testing/2018/08/06/custom-test-framework-prop.html Lemme know what you think!
I've thought some more about stdout/stderr capture. Here's a draft of a system that should be able to "perfectly" capture stdout/stderr, though not without O(1) (once per test suite) overhead: […]
@djrenren finally had a chance to read your proposal. I'm excited by the progress on this, though your write-up did raise some questions for me: […]
I think in most cases the CUT won't need to change itself to integrate with the IDE test runner, so long as the IDE understands the test runner you're working with. As I understand it, JetBrains' IntelliJ IDEA integrates with both JUnit and Rust's libtest by communicating over the standard API and stdin/out. JUnit's integration is obviously tighter, as IDEA was a Java IDE first, but the integration with libtest tests in Rust is already quite good even without any specific integration required. Obviously the IDE needs to understand how the test runner works, or the test runner needs to understand how the IDE works, but in most cases of using a popular test runner the CUT shouldn't need to change to be tested with in-IDE functionality.
SoniEx2 commented Aug 21, 2018
I'm gonna assume this isn't for me, as a crate author, but for IDE developers. Let me know when crate authors are part of the plan.
I don't want to get too bikesheddy on how that particular macro would work, but basically we'd just produce a single const that had the […].

I agree.

This is still entirely possible under the current design; it just requires the test runner to be designed to be pluggable.
@SoniEx2 It is for crate authors! For most crate authors (unless you're authoring test machinery), you'll just import any macros you want for declaring tests or benchmarks, and then you'll pick a test runner. For most cases the built-in one should be fine though.
SoniEx2 commented Aug 21, 2018
But how do you run tests inside a hexchat instance, considering the whole hexchat plugin loading system and stuff?
SoniEx2 commented Aug 21, 2018
I mean, you can't (easily) run hexchat (a GUI-based networked program) in cargo, can you?
SoniEx2 commented Aug 21, 2018
Can I assume this won't help test integrations (how your plugin behaves running inside a host program) and is more geared towards testing your plugin's independent parts?
For those worried about stdout capture, I've just demonstrated how this proposal can allow for @CAD97's style of multiprocess stdout capturing: […]. This code compiles and runs under my PR #53410. The output of […]
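The demonstration itself is linked rather than inlined above; a minimal sketch of the shape such a multiprocess capturing runner can take follows, with the env var name and the inline test table invented for illustration:

```rust
use std::env;
use std::process::Command;

fn run_test_by_name(name: &str) {
    match name {
        "test_a" => println!("some chatty test output"),
        "test_b" => assert_eq!(2 + 2, 4),
        _ => panic!("unknown test {}", name),
    }
}

fn main() {
    // Child mode: run exactly one test; stdout/stderr go to the pipes the
    // parent attached, so capture is exact even for fd-level writes.
    if let Ok(name) = env::var("RUN_EXACT_TEST") {
        return run_test_by_name(&name);
    }
    // Parent mode: re-execute ourselves once per test and capture the output.
    for name in &["test_a", "test_b"] {
        let out = Command::new(env::current_exe().unwrap())
            .env("RUN_EXACT_TEST", name)
            .output()
            .unwrap();
        let status = if out.status.success() { "ok" } else { "FAILED" };
        println!("test {} ... {}", name, status);
        if !out.status.success() {
            println!("---- stdout ----");
            print!("{}", String::from_utf8_lossy(&out.stdout));
        }
    }
}
```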
And just for clarity, the example does spawn a process for each test, which is suboptimal on OSes where process creation is slow, e.g. Windows. A real runner would implement a scheme like I described earlier to avoid spawning more processes than necessary. (And actually parallelize the running of tests.)
Whoops, I'm (very) late to the party. I have read the blog post but not looked at the rustc implementation, and I'm concerned with the presence of […]. How will this proposal work with the version of […]? Also, it would be best if using a custom test runner didn't require an allocator, as that requires quite a bit of boilerplate (e.g. […]). Of course, I'm not going to block any PR since this is in eRFC state and the idea is to experiment. cc @Manishearth: I thought […]
@japaric: support for the embedded use case has been a goal for me from day one, though I will confess I'm not intimately familiar with it. The test runners definitely don't require dynamic allocation now, because the slice that's passed is static. As for the […]
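To illustrate why a static slice avoids allocation, here is a loose sketch; the `Testable` trait here is illustrative (it echoes the minimal interface discussed later in this thread), not the exact interface from the PR:

```rust
pub trait Testable {
    fn run(&self); // panics on failure
    fn name(&self) -> &'static str;
}

struct Trivial;

impl Testable for Trivial {
    fn run(&self) {
        assert_eq!(2 + 2, 4);
    }
    fn name(&self) -> &'static str {
        "trivial"
    }
}

// The generated entry point can pass the collected `#[test_case]` items as a
// `&'static` slice, so the runner needs no heap at all.
static TESTS: &[&dyn Testable] = &[&Trivial];

fn runner(tests: &'static [&'static dyn Testable]) {
    for t in tests {
        t.run();
        // On an embedded target, success would be reported over
        // semihosting/serial instead of stdout.
    }
}

fn main() {
    runner(TESTS);
}
```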
Yeah, cargo-fuzz uses no_main, but I thought we had dropped that use case from the original eRFC? (Or had considered it low priority.) cargo-fuzz can be changed to be non-no_main with some libfuzzer edits.
Hey all, as I'm wrapping up my internship and will have less time to work on this, I figured I'd leave some notes for the community on the state of things merged so far and going forward. (I'll still be around, but a full-time job will definitely slow my contributions.)

Current Status: […]

Work to be done: […]

Having presented this once at Mozilla and at a meetup, I've gotten a bit of feedback that's worth mentioning. There's concern about only having a single test runner for the crate. @SergioBenitez recommended having […]. I think this adds a reasonably heavy requirement on configuration and build tools, so I'm not sure I agree, but the point is well taken that cooperation between tests and runners could be tenuous under the current circumstances. I believe that having a default […]. I'm sure he could say more about it and offer a stronger argument, but either way this is something we should sort out. Luckily these things could be implemented on top of the current primitives, so experimentation should be doable.
Happy to expand on that, @djrenren.

**The Problem**

The current implementation makes it impossible, without workarounds or test runner intervention, to write tests that are intended to be run by different runners in a single crate. For example, consider a crate that would like to use both:

```rust
#[quickcheck]
fn identity(a: i32) -> bool { ... }

#[criterion_bench]
fn benchmark(c: &mut Criterion) { ... }
```

Under the current plan, this would expand to:

```rust
#[test_case]
fn identity(a: i32) -> bool { ... }

#[test_case]
fn benchmark(c: &mut Criterion) { ... }
```

To run these under the current proposal, a user would need to provide a single runner that can execute both of these test cases. Of course, that's clearly not the intention here; one runner ([…]).

**Solutions**

I see two potential solutions to this problem: one puts the onus on the test runner and […]:

```rust
#[test_case]
#[cfg(target_runner = quickcheck::run)]
fn identity(a: i32) -> bool { ... }

#[test_case]
#[cfg(target_runner = criterion::bench)]
fn benchmark(c: &mut Criterion) { ... }
```

When a runner […].

The second solution is almost exactly like the first, but moves the responsibility for emitting the […]:

```rust
#[test_case(quickcheck::run)]
fn identity(a: i32) -> bool { ... }

#[test_case(criterion::bench)]
fn benchmark(c: &mut Criterion) { ... }
```

Then, a […].

**Summary**

I believe this second approach is the better one because it is less error prone, more succinct, and hides implementation details. With the first approach, all crates must follow the appropriate convention or risk breaking the consumer crate: if even a single crate does not […].

I disagree. If […].

**Addendum**

Why is the user restricted to selecting a single test runner? It would be nice to compile one binary that executes test cases from multiple runners.
I like the idea of […]:

```rust
fn main() { // entry generated by `cargo test`
    let libtest_tests = &[ /* collected tests */ ];
    let quickcheck_tests = &[ /* collected tests */ ];
    let criterion_tests = &[ /* collected tests */ ];
    ::test::run_tests(libtest_tests);
    ::quickcheck::run(quickcheck_tests);
    ::criterion::bench(criterion_tests);
}
```
@SergioBenitez - My rebuttal: I think your view of test runners is different from what I envisioned, and that's causing some confusion. But firstly, I'd like to point out that the multiple-test-runner case works under the current implementation and looks roughly like so:

```rust
#![cfg_attr(quickcheck, test_runner(quickcheck::runner))]
#![cfg_attr(criterion, test_runner(criterion::runner))]

#[cfg(quickcheck)]
mod quickcheck_tests {
    #[quickcheck]
    fn identity(a: i32) -> bool { ... }
}

#[cfg(criterion)]
mod criterion_benches {
    #[criterion_bench]
    fn benchmark(c: &mut Criterion) { ... }
}
```

This misrepresents the relationship between a test and its runner, though. In general, tests should know how to run themselves. In fact, this is how it works today:

```rust
#[test]
fn foo() {}
```

expands to:

```rust
fn foo() {}

#[test_case]
const foo_gensym: TestDescAndFn = TestDescAndFn {
    desc: TestDesc {
        name: TestName::Static("foo"),
        ignore: false,
        should_panic: ShouldPanic::No,
    },
    testfn: TestFn::StaticTestFn(|| {
        assert_test_result(foo())
    }),
};
```

Where […]. Test runners are concerned with structural information about the test: anything needed for reporting, filtering, or dispatching. This is why quickcheck can reduce to a […].

It's reasonable for a minimal test interface to look like:

```rust
trait Testable {
    // panic on failure
    fn run(&self);
    fn name(&self) -> String;
    fn is_bench(&self) -> bool;
}
```

Then your example:

```rust
#[quickcheck]
fn identity(a: i32) -> bool { ... }

#[criterion_bench]
fn benchmark(c: &mut Criterion) { ... }
```

would expand to:

```rust
fn identity(a: i32) -> bool { ... }

#[test_case]
const identity_test: SimpleTest = SimpleTest {
    test_fn: || { quickcheck_run(identity) },
    test_name: "identity",
    is_bench: false,
};

fn benchmark(c: &mut Criterion) { ... }

#[test_case]
const benchmark_test: CriterionTest = CriterionTest {
    bench_fn: benchmark,
    test_name: "benchmark",
};
```

And there would be respective impls for the test runner:

```rust
impl Testable for SimpleTest {
    fn run(&self) {
        self.test_fn()
    }
    fn name(&self) -> String {
        self.test_name.to_string()
    }
    fn is_bench(&self) -> bool {
        self.is_bench
    }
}

impl Testable for CriterionTest {
    fn run(&self) {
        (self.bench_fn)(criterion::global_instance())
    }
    fn name(&self) -> String {
        self.test_name.to_string()
    }
    fn is_bench(&self) -> bool {
        true
    }
}
```

Explicitly: there should not be a 1:1 relationship between test runners and test declaration styles. This allows the test runner to manage the execution of tests that were not created with it in mind. Under the proposed addition of […].

A test runner is essentially a main function. It doesn't make sense to have two, for the same reasons that it doesn't make sense to have two `main` functions. For example, @CAD97's suggestion is actually what I implemented at first. The issue here is that if […]:

```
$ ./test_bin mod_a::mod_b::test_name
```

You'd need a more complex mediator that did something like:

```
$ ./test_bin quickcheck mod_a::mod_b::test_name
```

If we allow multiple test runners, then this mediation would have to be provided by the compiler, which puts us back into the realm of being opinionated about how test binaries should work. Such a setup could be implemented in user space with the current implementation.
I'm not sure what you could mean by "tests should know how to run themselves". A […].

Sure, but I don't see how my proposal enjoins this.

I think this is the desire in almost every case. I understand the principle behind this plug-and-play, swappable-runner, trait-based testing, but I don't see it getting use in practice. To think about what such use would mean: one crate would propose a testing interface, and some other crate would say "Oh, sure, I can run that!". It won't happen serendipitously, since the interface will become part of the public API, so one crate is making a decision to allow other runners to run "its" tests. If that's the case, then that crate can simply take in a runner in its own attribute: `#[quickcheck(some::runner)]`.

This is only true because you're defining it that way.

I think this is an interesting design point: should a test runner be able to define the CLI for the binary? I can see arguments both ways. If not, then this is a non-issue, but if so, I wouldn't be opposed to passing test-runner-specific arguments, as you suggest, with something like:

```
$ ./test_bin --quickcheck quickcheck args here --criterion some other args
```

This would mean that either the binary can't accept any […]. Overall, I think your design optimizes for the wrong thing while making a common thing very difficult and tedious to accomplish. I don't see a compelling argument for swappable runners, and making it cumbersome to have tests for multiple runners in a single crate, a situation I postulate will be very common (something my code does now, for instance), is unfortunate.
I'd also like to push back a little bit here @djrenren: it's not clear to me that a "clean" division of "test runners are just main functions that deal with CLI + output" and "tests that know how to run themselves" is sufficient. It could be, but it'd be nice to see some evidence for it (similarly, @SergioBenitez, can you think of any examples where that separation is not sufficient?). There are definitely cases where a test runner should have more knowledge about the exact tests, such as "run at most this many tests in parallel within this test suite", or "this suite of tests share the following set-up, which should only be run once". It could be that these can all be dealt with using shared state among the generated tests + mutexes, but some discussion around it would be good :)
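As a concrete reference for the "shared state + mutexes" workaround mentioned above, here is a short sketch; all names are invented, and suite-level concurrency limits would still need runner support:

```rust
use std::sync::{Mutex, Once};

static SETUP: Once = Once::new();
static DB_LOCK: Mutex<()> = Mutex::new(());

// Expensive suite-level set-up that must run exactly once, regardless of
// which test happens to execute first.
fn shared_setup() {
    SETUP.call_once(|| {
        // e.g. start a test database, write fixture files, ...
    });
}

#[test]
fn first_serialized_test() {
    shared_setup();
    let _guard = DB_LOCK.lock().unwrap(); // serialize access to the shared resource
    // assertions against the shared resource
}

#[test]
fn second_serialized_test() {
    shared_setup();
    let _guard = DB_LOCK.lock().unwrap();
    // more assertions
}
```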
Could you share some other places where this is common? For example, my experience has not included any language where a project has multiple test runners in a single "place". The closest I can think of is multi-language projects. For example, the Rust Playground has Rust unit tests, JS unit tests, and Ruby integration tests, each with their own test runner. Thus I postulate it is not common.

Even in a Rust-only project, I think I'd prefer to place benchmarks in a completely separate file (a.k.a. a separate crate) from unit tests. I never want to run tests and benchmarks together, and the information reported for each is drastically different.

I agree that it seems unlikely that there will be One True Trait for defining a test. I'd expect there to be multiple test runners, each with their own highly specific trait. What I would expect is to see adapters for tools like proptest to connect to popular test runners, leveraging each runner's strengths. Thus, the crate […]. It's possible that over time some basic commonality can be discovered (perhaps something along the lines of TAP?), but there will always be something special a test runner will want to provide.
@jonhoo I tried to give a couple of examples indicative of this in my previous comment, but perhaps we have different notions of what "separation" means in this case. Could you perhaps expand on the question?

@shepmaster In Rocket, I'd love to have the following in a single crate: […]

I'm totally okay with benchmarks being in another crate. The first four are currently regular […].

Note that my proposal continues to make this possible, albeit by making it explicit in the code.
Centril commented Apr 28, 2018

This is a tracking issue for the eRFC "Custom test frameworks" (rust-lang/rfcs#2318).

Steps: […]

Unresolved questions: […]