Tracking issue for eRFC 2318, Custom test frameworks #50297

Centril opened this issue Apr 28, 2018 · 34 comments

@Centril commented Apr 28, 2018

This is a tracking issue for the eRFC "Custom test frameworks" (rust-lang/rfcs#2318).

Steps:

Unresolved questions:

@CAD97 commented May 14, 2018

On stdout/err capture (after doing some research around the proof-of-concept #50457):

The current state is sub-optimal, to say the least. Both stdout and stderr exist as thread-local handles that only the stdlib has access to; the print family of macros uses them, as does panic.

Relevant libstd snippets

thread_local! {
    static LOCAL_STDOUT: RefCell<Option<Box<Write + Send>>> = {
        RefCell::new(None)
    }
}

thread_local! {
    pub static LOCAL_STDERR: RefCell<Option<Box<Write + Send>>> = {
        RefCell::new(None)
    }
}

rust/src/libstd/io/stdio.rs, lines 614 to 711 (at 935a2f1):

/// Resets the thread-local stderr handle to the specified writer
///
/// This will replace the current thread's stderr handle, returning the old
/// handle. All future calls to `panic!` and friends will emit their output to
/// this specified handle.
///
/// Note that this does not need to be called for all new threads; the default
/// output handle is to the process's stderr stream.
#[unstable(feature = "set_stdio",
           reason = "this function may disappear completely or be replaced \
                     with a more general mechanism",
           issue = "0")]
#[doc(hidden)]
pub fn set_panic(sink: Option<Box<Write + Send>>) -> Option<Box<Write + Send>> {
    use panicking::LOCAL_STDERR;
    use mem;
    LOCAL_STDERR.with(move |slot| {
        mem::replace(&mut *slot.borrow_mut(), sink)
    }).and_then(|mut s| {
        let _ = s.flush();
        Some(s)
    })
}

/// Resets the thread-local stdout handle to the specified writer
///
/// This will replace the current thread's stdout handle, returning the old
/// handle. All future calls to `print!` and friends will emit their output to
/// this specified handle.
///
/// Note that this does not need to be called for all new threads; the default
/// output handle is to the process's stdout stream.
#[unstable(feature = "set_stdio",
           reason = "this function may disappear completely or be replaced \
                     with a more general mechanism",
           issue = "0")]
#[doc(hidden)]
pub fn set_print(sink: Option<Box<Write + Send>>) -> Option<Box<Write + Send>> {
    use mem;
    LOCAL_STDOUT.with(move |slot| {
        mem::replace(&mut *slot.borrow_mut(), sink)
    }).and_then(|mut s| {
        let _ = s.flush();
        Some(s)
    })
}

/// Write `args` to output stream `local_s` if possible, `global_s`
/// otherwise. `label` identifies the stream in a panic message.
///
/// This function is used to print error messages, so it takes extra
/// care to avoid causing a panic when `local_stream` is unusable.
/// For instance, if the TLS key for the local stream is
/// already destroyed, or if the local stream is locked by another
/// thread, it will just fall back to the global stream.
///
/// However, if the actual I/O causes an error, this function does panic.
fn print_to<T>(
    args: fmt::Arguments,
    local_s: &'static LocalKey<RefCell<Option<Box<Write+Send>>>>,
    global_s: fn() -> T,
    label: &str,
)
where
    T: Write,
{
    let result = local_s.try_with(|s| {
        if let Ok(mut borrowed) = s.try_borrow_mut() {
            if let Some(w) = borrowed.as_mut() {
                return w.write_fmt(args);
            }
        }
        global_s().write_fmt(args)
    }).unwrap_or_else(|_| {
        global_s().write_fmt(args)
    });
    if let Err(e) = result {
        panic!("failed printing to {}: {}", label, e);
    }
}

#[unstable(feature = "print_internals",
           reason = "implementation detail which may disappear or be replaced at any time",
           issue = "0")]
#[doc(hidden)]
pub fn _print(args: fmt::Arguments) {
    print_to(args, &LOCAL_STDOUT, stdout, "stdout");
}

#[unstable(feature = "print_internals",
           reason = "implementation detail which may disappear or be replaced at any time",
           issue = "0")]
#[doc(hidden)]
pub fn _eprint(args: fmt::Arguments) {
    use panicking::LOCAL_STDERR;
    print_to(args, &LOCAL_STDERR, stderr, "stderr");
}


The existing stdout/err capture for the test harness works by setting the thread-local handle; if it is set, the print family of macros uses it, otherwise they fall back to io::stdout(). This setup is problematic because it's fairly trivial to escape (#12309) and 99.9% of logging frameworks will escape it accidentally (#40298).

The reason for this is that any solution that's generic over io::Write is going to use io::stdout() as the sink to hit stdout, and that doesn't go through the thread-local plumbing, so it hits the true stdout directly.
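
A minimal illustration of the escape (my own sketch, not harness code): anything generic over io::Write that is handed io::stdout() writes straight to the process's stdout, while the print macros go through the thread-local handle:

use std::io::Write;

// Generic logging code: it only sees an `io::Write`, so when handed
// `io::stdout()` it writes to the real process stdout directly.
fn log_to<W: Write>(mut sink: W) {
    writeln!(sink, "this bypasses the capture").unwrap();
}

#[test]
fn capture_escape_demo() {
    println!("captured: routed through the thread-local handle when it is set");
    log_to(std::io::stdout()); // escapes: hits the true stdout
}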

The way I see it, there are three ways to resolve this quirk:

  • Create a new stdlib io::Write sink that uses the thread-local plumbing that the print macros do today, and implement the print macros as writing to that sink. That's the approach I proved was possible with #50457.
  • Create a new io::Write sink that uses (e)print! behind the scenes. This is possible entirely in an external crate:
(E)PrintWriter - feel free to use these, I am the author and release them under MIT/Unlicense
use std::io::Write;

/// IO sink that uses the `print` macro, for test stdout capture.
pub struct PrintWriter;
impl std::io::Write for PrintWriter {
    fn write(&mut self, buf: &[u8]) -> std::io::Result<usize> {
        print!("{}", String::from_utf8_lossy(buf));
        Ok(buf.len())
    }
    fn flush(&mut self) -> std::io::Result<()> { std::io::stdout().flush() }
}

/// IO sink that uses the `eprint` macro, for test stderr capture.
pub struct EPrintWriter;
impl std::io::Write for EPrintWriter {
    fn write(&mut self, buf: &[u8]) -> std::io::Result<usize> {
        eprint!("{}", String::from_utf8_lossy(buf));
        Ok(buf.len())
    }
    fn flush(&mut self) -> std::io::Result<()> { std::io::stderr().flush() }
}

Unfortunately, this suffers from a number of drawbacks. Because we only have access to the print macros, we have to interpret whatever sequence of bytes the sink receives as a UTF-8 string, even though that may not be valid UTF-8, and the lossy conversion isn't zero-cost when it isn't. It's a decent solution for hooking up logging in tests, but it is very much not a general solution that could be used outside tests.

That takes us to the third option, which is almost certainly the best, but also the most complicated:

  • Actually change stdout/err

It sounds simple, but it's easier said than done with the current test harness setup. io::Stderr boils down to a handle to a write stream available to the process, and is a process-local resource -- there's only one stdout/stderr per process. The test harness currently runs tests in parallel on different threads within the same process, hence the current thread-local replacement.

The other issue is that the test harness doesn't just want to capture the tests' stdout; it wants to capture each test's stdout separately, so it can print just the output of the failing test(s) without the other tests' output interspersed with it.

The only solution I see at this time that maintains the current behavior of separate captures per test is to run each test in its own process and capture stdout that way. Each process can then pipe its stdout/err to the test runner, which can synchronize them and decide which ones to forward to the true stdout/err if desired.

A sketch of the plumbing required to make an out-of-tree libtest that works with this behavior:

  • #[test] fn are all collected into a list
  • A fn main is generated that takes an argument of one of the paths, and runs that test
  • (The existing fn main is hidden)
  • As normal, successful execution is success, and a panic is a failure exit code
  • Our runner calls the generated binary with each of the test paths that match any filtering done as separate processes, captures their stdout/err and reports them on failure

Note that this is unfortunately stateful: we need to fold over all #[test] fn before we can emit the proper main. I'm not knowledgeable enough around compiler internals to be much help implementing the machinery right now (though I'd love to learn my way around and help out!), but give me an API to build against and I'll definitely help with any test frameworks built against the plumbing the eRFC has us set to experiment with.
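
For concreteness, a hedged sketch of the runner side of that plumbing, assuming the generated test binary takes a single test path as its argument (the names here are illustrative, not part of any implementation):

use std::process::Command;

// Run one test in its own child process; its stdout/stderr never touch the
// runner's streams unless the runner decides to forward them on failure.
fn run_one(test_bin: &str, test_path: &str) -> bool {
    let output = Command::new(test_bin)
        .arg(test_path)
        .output()
        .expect("failed to spawn test process");
    if !output.status.success() {
        eprintln!("test {} failed:", test_path);
        eprintln!("--- stdout ---\n{}", String::from_utf8_lossy(&output.stdout));
        eprintln!("--- stderr ---\n{}", String::from_utf8_lossy(&output.stderr));
    }
    output.status.success()
}

The runner would call run_one for every test path that passes filtering and aggregate the results into its final report.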

@CAD97 commented May 14, 2018

TL;DR:

To have sane capture behavior for stdout and stderr for print-family macros, panic, and io::{stdout, stderr}, running each test in its own (child) process is probably required. I'd love to help out wherever I can, but lack confidence.

@retep998 commented May 15, 2018

The only solution I see at this time that maintains current behavior of separate captures per test run is to run them all in their own process, and capture stdout that way.

This would be significantly slower on certain platforms where process creation is slow, notably Windows. Especially when the tests are all very quick, this will cause severe regressions in total run time.

@CAD97 commented May 15, 2018

Another possibility for process separation of tests is to spawn a process for each unit of parallelism, but each process is reused for multiple tests.

Of course, we could also just promote the thread-local replaceable stdout/err to io::Stdout/Err, but that would itself be suboptimal.

The issue is that stdout is, as far as I can tell, a process-local resource. To capture io::Stdout we either need our own stdout replacement machinery on top, or we need to separate processes.

@SoniEx2 commented Jun 13, 2018

if I want to argue about this do I argue about it here?

so like I made the hexchat-plugin crate. the problem is you can't test the hexchat side without hexchat. so I'd need some sort of #[hexchat_test].

but I still want the users to be able to use plain old vanilla rust #[test]s. I shouldn't have to reimplement/replace the whole testing system just to be able to produce some hexchat-based unit tests (in addition to normal tests).

(disclaimer: I don't fully understand the RFC, but it seems to require either replacing everything, or not having tests)

@jonhoo commented Jun 13, 2018

@SoniEx2 this RFC is about how we'd like to change the underlying mechanism that is used to arrange testing in Rust so that authors can write new test frameworks (e.g., quickcheck) that have access to the same mechanisms as existing built-in tests (able to access private functions, can find all tagged functions, etc.). The current proposal is relatively wholesale (you change the test framework for a whole crate), but this is something that may be improved upon. We decided against more sophisticated designs in the initial implementation phase (this is just an "experimental" RFC) so that we don't bite off more than we can chew, but I think down the line we'd want more fine-grained test framework selection too.

As for your specific example, you don't actually need custom frameworks for that feature :) It seems like you want to only run a test if hexchat isn't running? To do that, you could either mark the test with #[allow_fail], or you could have the test just return if it fails to connect. If you wanted a more fine-grained approach, you could write a procedural macro that rewrites any function annotated with #[hexchat_test] into a #[test] that returns if hexchat isn't running.

@SoniEx2 commented Jun 13, 2018

no, you don't understand: they have to run inside hexchat, loaded with dlopen (or equivalent), after plugin registration.

@djrenren commented Aug 7, 2018

Hey all, I've come up with a simplified proposal and I have an implementation now (though it needs a little cleaning up before a PR).

https://blog.jrenner.net/rust/testing/2018/08/06/custom-test-framework-prop.html

Lemme know what you think!

@CAD97 commented Aug 21, 2018

I've thought some more about stdout/stderr capture. Here's a draft of a system that should be able to "perfectly" capture stdout/stderr, though not without O(1) (once per test suite) overhead:

  • The main process is started by cargo test. It does no actual testing; its job is to coordinate child processes to do the actual tests.
  • At startup of the main process, it spawns a child process for each unit of parallelism in the test runner. This is a one-time cost, as these child processes are reused for multiple tests.
  • With each child process spawned, a communication channel is created between the processes distinct of stdin/out/err. This is used to instruct the child process which test to run next, and for the child process to report back the success/failure of the test.
  • We avoid talking to the child process on stdin because the test could listen to stdin, and we want to avoid accidentally polluting that. (Also, we need a back-channel that's not stdout to communicate test results anyway.)
  • The stdout/stderr of the child processes are piped into communication channels into the main process, which handles reporting those if the test failed.
  • Within each child process, tests are run off the main thread, so that a panicking test doesn't kill the child process and so that runaway tests can be terminated early (by the child itself, or at the main process's instruction).
  • After the main process has handed out all tests to the child processes, it shuts them down as it receives the final reports, and then outputs the final report to the user before terminating.
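
A rough sketch of the coordinator half of this scheme (entirely illustrative: the ./test_worker binary, the RUN/SHUTDOWN strings, and the use of a localhost TCP socket as the non-stdin back-channel are assumptions made only for this example):

use std::io::{BufRead, BufReader, Write};
use std::net::TcpListener;
use std::process::{Command, Stdio};

fn main() -> std::io::Result<()> {
    // Back-channel distinct from stdin/out/err; a real runner might use a pipe.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let port = listener.local_addr()?.port();

    // One worker per unit of parallelism (just one here for brevity); its
    // stdout/stderr are piped so test output never reaches the real streams.
    let mut worker = Command::new("./test_worker")
        .arg(port.to_string())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()?;

    let (control, _) = listener.accept()?; // worker connects back
    let mut control_writer = control.try_clone()?;
    let mut results = BufReader::new(control);

    for test in ["mod_a::test_one", "mod_a::test_two"] { // hypothetical tests
        writeln!(control_writer, "RUN {}", test)?;       // hand out a test
        let mut line = String::new();
        results.read_line(&mut line)?;                   // wait for PASS/FAIL
        println!("{} -> {}", test, line.trim());
    }
    writeln!(control_writer, "SHUTDOWN")?;
    worker.wait()?;
    Ok(())
}
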
@jonhoo commented Aug 21, 2018

@djrenren finally had a chance to read your proposal. I'm excited by the progress on this, though your write-up did raise some questions for me:

  • Early on, you say that "Annotated items must be a nameable const, static, or fn.". Further down, when you discuss the use-case of a framework that supports test suites, you show a #[test_suite] annotation on a mod with #[suite_member] on the nested tests. It'd be good to show how that expansion works. Does a proc macro turn #[test_suite] into a const with an element for each #[suite_member]? How does it find all of them? Are the #[suite_member]s all turned into #[test_case]? How do you avoid double-executing those test cases (once as a "child" and once due to the #[test_case] annotation)?
  • For #![test_runner] you say "The type of the function must be Fn(&[&mut Foo]) -> impl Termination for some Foo which is the test type", yet further down you write a Fn(&[&dyn Foo]). I assume the latter is the correct version?
  • Is TestDescAndFn sufficient for our purposes here? I don't think that can (currently) wrap a struct or const in a meaningful way?
  • I find the "test runner" and "test format" examples a little backwards. To me, "formatting" is about output, and "running" is about the structure of the tests themselves (and how they're run). We can bikeshed names later though.
  • The "editor wants to integrate with testing" example makes me sad. It seems really sad for the crate-under-test to have to be modified to work with my editor. One of the earlier RFCs had a section on separating the test harness from the test executor, with the former controlling output and the latter how tests are written and executed. And crucially, the former should be possible to swap without changing the source code of the c-u-t. That doesn't solve the case of an editor wanting to control test execution as well, but I think ideally we'd be able to fold that into the harness somehow too.
@CAD97 commented Aug 21, 2018

I think in most cases the CUT won't need to change to integrate with the IDE test runner, so long as the IDE understands the test runner you're working with. As I understand it, JetBrains' IntelliJ IDEA integrates with both JUnit and Rust's libtest by communicating over their standard APIs and stdin/out. JUnit's integration is obviously tighter, as IDEA was a Java IDE first, but the integration with libtest tests in Rust is already quite good even without any specific integration required.

Obviously the IDE needs to understand how the test runner works, or the test runner needs to understand how the IDE works, but in most cases, when using a popular test runner, the CUT shouldn't need to change to be tested with in-IDE functionality.

@SoniEx2 commented Aug 21, 2018

I'm gonna assume this isn't for me, as a crate author, but for IDE developers.

Let me know when crate authors are part of the plan.

@djrenren commented Aug 21, 2018

@jonhoo

Early on, you say that...

I don't want to get too bikesheddy on how that particular macro would work, but basically we'd just produce a single const that had the #[test_case] annotation. The children would be aggregated inside it.

I assume the latter is the correct version?

Fn(&[&Foo]) -> () is a subtype of Fn(&[&mut Foo]) -> impl Termination. It's not required that Foo be a trait, but if it is, then you need dyn, so those two lines aren't in contradiction.

Is TestDescAndFn sufficient for our purposes here?

I agree TestDescAndFn is not sufficient, though it could successfully be constructed from all sorts of things with the use of proc macros. I think for libtest we should just make a trait that essentially mirrors TestDescAndFn and accept that.

One of the earlier RFCs had a section on separating the test harness from the test executor, with the former controlling output and the latter how tests are written and executed.

This is still entirely possible under the current design, it just requires the test runner to be designed to be pluggable.

@djrenren commented Aug 21, 2018

@SoniEx2 It is for crate authors! 😄

For most crate authors (unless you're authoring test machinery), you'll just import any macros you want for declaring tests or benchmarks, and then you'll pick a test runner. For most cases the built-in one should be fine though.
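
For illustration, a minimal nightly-only sketch of what picking a runner looks like with the #![test_runner]/#[test_case] attributes discussed in this thread (the runner and test here are deliberately trivial):

#![feature(custom_test_frameworks)]
#![test_runner(crate::my_runner)]

// The chosen runner: receives a slice of all collected #[test_case] items.
fn my_runner(tests: &[&dyn Fn()]) {
    println!("running {} tests", tests.len());
    for test in tests {
        test();
    }
}

#[test_case]
fn smoke() {
    assert_eq!(2 + 2, 4);
}

When built with cargo test, the generated main collects every #[test_case] item and hands the slice to my_runner.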

@SoniEx2 commented Aug 21, 2018

but how do you run tests inside a hexchat instance, considering the whole hexchat plugin loading system and stuff?

@SoniEx2 commented Aug 21, 2018

I mean you can't (easily) run hexchat (a GUI-based networked program) in cargo, can you?

@SoniEx2 commented Aug 21, 2018

can I assume this won't help test integrations (how your plugin behaves running inside a host program) and is more geared towards testing your plugin's independent parts?

@djrenren commented Aug 21, 2018

For those worried about stdout capture, I've just demonstrated how this proposal can allow for @CAD97's style of multiprocess stdout capturing:
https://github.com/djrenren/rust-test-frameworks/blob/master/multiprocess_runner/src/lib.rs

This code compiles and runs under my PR: #53410

The output of cargo test is available in the same directory and is called output.txt

@CAD97 commented Aug 22, 2018

And just for clarity, the example does spawn a process for each test, which is suboptimal on OSes where process creation is slow, e.g. Windows. A real runner would implement a scheme like I described earlier to avoid spawning more processes than necessary. (And actually parallelize the running of tests.)

@japaric commented Sep 1, 2018

Whoops, I'm (very) late to the party.

I have read the blog post but not looked at the rustc implementation, and I'm concerned by the presence of impl Termination and String (see Testable.name) in the proposal.

How will this proposal work with the version of #![no_std] binaries that's being stabilized for the edition release? Those #![no_std] binaries use #![no_main] to avoid the unstable start (i.e. fn main() interface) and termination (i.e. Termination trait) lang items. Without start, if rustc --test injects a main function into the crate, it will end up being discarded, as these binaries use a custom entry point. Without termination, the Termination trait doesn't exist (it is defined in std, after all).

Also it would be best if using a custom test runner didn't require an allocator as that requires quite a bit of boilerplate (e.g. #[alloc_error_handler]) and uses up precious Flash memory.

Of course I'm not going to block any PR, since this is in eRFC state and the idea is to experiment; I just want to make sure that the embedded use case is not forgotten and that what we end up stabilizing does support it.

cc @Manishearth I thought cargo fuzz used #![no_main] and that it had similar needs as the embedded use case, but perhaps the fuzzed programs do link to std, so they are not that similar?

@djrenren commented Sep 2, 2018

@japaric: support for the embedded use case has been a goal for me from day one, though I will confess I'm not intimately familiar with it.

The Testable trait in the proposal is defined by the testing framework, not the language, so it can definitely be designed without the need for allocation.

Test runners definitely don't require dynamic allocation now, because the slice that's passed is static.

As for the #![no_main] problem, I'm not very familiar with it so I'll have to review a bit, but I see the issue... We might have to revise a bit so the test runner can be called from a generated start instead of main.

@Manishearth commented Sep 2, 2018

Yeah, cargo-fuzz uses no_main but I thought we had dropped the use case from the original eRFC? (or had considered it low priority).

cargo-fuzz can be changed to be non-no_main with some libfuzzer edits.

@djrenren commented Sep 6, 2018

Hey all,

As I'm wrapping up my internship and will have less time to work on this, I figured I'd leave some notes for the community on the state of things merged so far and going forward. (I'll still be around, but a full-time job will definitely slow my contributions).

Current Status:

  • #[test_case] and #![test_runner] are merged
  • No cargo integration currently (though they work just fine with cargo test)

Work to be done:

  • Migrate test::test_main_static to accepting a trait instead of a struct so it can be compatible with other test data formats beyond TestDescAndFn
  • Work with various test crate authors to evaluate the ergonomics of the system. (criterion or wasm-bindgen is probably a good place to start).
  • Experiment with abstractions on top of these tools. It may turn out that the primitives are too primitive

Having presented this once at Mozilla and at a meetup, I've gotten a bit of feedback that's worth mentioning. There's concern about only having a single test runner for the crate. @SergioBenitez recommended having #[test_case] be parameterized with what kind of test it's annotating, and building separate binaries for different kinds of tests.

I think this adds a reasonably heavy requirement on configuration and build tools, so I'm not sure I agree, but the point is well taken that cooperation between tests and runners could be tenuous under the current circumstances. I believe that having a default Testable trait accessible in the compiler could act as a de facto standard for compatibility.

I'm sure he could say more about it and offer a stronger argument, but either way this is something we should sort out. Luckily these things could be implemented on top of the current primitives so experimentation should be doable.

@SergioBenitez commented Sep 10, 2018

Happy to expand on that, @djrenren.

The Problem

The current implementation makes it impossible, without workarounds or test runner intervention, to write tests that are intended to be run by different runners in a single crate. For example, consider a crate that would like to use both quickcheck and criterion:

#[quickcheck]
fn identity(a: i32) -> bool { ... }

#[criterion_bench]
fn benchmark(c: &mut Criterion) { ... }

Under the current plan, this would expand to:

#[test_case]
fn identity(a: i32) -> bool { ... }

#[test_case]
fn benchmark(c: &mut Criterion) { ... }

To run these under the current proposal, a user would need to provide a single runner that can execute both of these test cases. Of course, that's clearly not the intention here; one runner (quickcheck) is responsible for identity, while a different runner (criterion) is responsible for benchmark. Alas, there is no way to express this.

Solutions

I see two potential solutions to this problem: one puts the onus on the test runner and cargo, while the other puts it on the standard library and cargo. The first requires test runners to emit a cfg when they emit a test_case, to ensure that annotated items are only compiled when that runner is requested. With this approach, criterion and quickcheck would be modified so that the original code would instead expand to something like:

#[test_case]
#[cfg(target_runner = quickcheck::run)]
fn identity(a: i32) -> bool { ... }

#[test_case]
#[cfg(target_runner = criterion::bench)]
fn benchmark(c: &mut Criterion) { ... }

When a runner foo::bar is selected, cargo would set the target_runner = foo::bar flag automatically.

The second solution is almost exactly like the first but moves the responsibility for emitting the cfg attribute from the test runner crate to the standard-library. Under this approach, the test_case attribute takes a single parameter: the path to the runner intended to be used for the given test case. With this approach, criterion and quickcheck would expand the original code to something like:

#[test_case(quickcheck::run)]
fn identity(a: i32) -> bool { ... }

#[test_case(criterion::bench)]
fn benchmark(c: &mut Criterion) { ... }

Then, a test_case(runner) is only compiled when runner is the selected runner; how this happens is an implementation detail, but I presume it would work similar to the cfg approach above.

Summary

I believe this second approach is the better one because it is less error-prone, more succinct, and hides implementation details. With the first approach, all crates must follow the appropriate convention or risk breaking consumer crates: if even a single crate does not cfg a test_case, it again becomes impossible to write test cases for multiple runners in a single crate. What's more, the first approach results in adding boilerplate to every test runner crate and requires cargo to recognize a new cfg that must be stabilized.

I think this adds a reasonably heavy requirement on configuration and build-tools...

I disagree. If cargo is going to be aware of test runners anyway, having it emit a cfg that recognizes which runner was selected is a trivial addition.

Addendum

Why is the user restricted to selecting a single test runner? It would be nice to compile one binary that executes test cases from multiple runners.

@CAD97 commented Sep 10, 2018

I like the idea of #[test_case(quickcheck::run)]. Just to strawman out an idea that would allow for running multiple test runners in one go:

fn main() { // entry generated by `cargo test`
    let libtest_tests = &[ /* collected tests */ ];
    let quickcheck_tests = &[ /* collected tests */ ];
    let criterion_tests = &[ /* collected tests */ ];

    ::test::run_tests(libtest_tests);
    ::quickcheck::run(quickcheck_tests);
    ::criterion::bench(criterion_tests);
}

cargo test would allow filtering to specific tests as normal, but would also allow filtering by runner if you, say, only wanted to run the criterion benchmarks, or wanted to exclude them from your normal test run. (For that matter, maybe the same treatment should be given to cargo bench: add a #[bench_case] attribute, so that benchmark crates don't have to pretend to be test runners.)

@djrenren commented Sep 11, 2018

@SergioBenitez - My rebuttal:

I think your view of test runners is different from what I envisioned, and that's causing some confusion. But firstly, I'd like to point out that the multiple-test-runner case works under the current implementation and looks roughly like so:

#![cfg_attr(quickcheck, test_runner(quickcheck::runner))]
#![cfg_attr(criterion, test_runner(criterion::runner))]


#[cfg(quickcheck)]
mod quickcheck_tests {
    #[quickcheck]
    fn identity(a: i32) -> bool { ... }
}

#[cfg(criterion)]
mod criterion_benches {
    #[criterion_bench]
    fn benchmark(c: &mut Criterion) { ... }
}

This misrepresents the relationship between a test and its runner though. In general, tests should know how to run themselves. In fact, this is how #[test] works today.

#[test]
fn foo(){}

expands to:

fn foo(){}

#[test_case]
const foo_gensym: TestDescAndFn = TestDescAndFn {
    desc: TestDesc {
        name: TestName::Static("foo"),
        ignore: false,
        should_panic: ShouldPanic::No
    },
    testfn: TestFn::StaticTestFn(|| {
        assert_test_result(foo())
    }),
};

Here testfn captures how to run the test. It enforces the interface that tests panic if they return a non-zero termination.

Test runners are concerned with structural information about the test: anything needed for reporting, filtering, or dispatching. This is why quickcheck can reduce to a #[test] function with a simple () -> () interface (and transitively into a TestDescAndFn).

It's reasonable for a minimal test interface to look like:

trait Testable {
     // panic on failure
     fn run(&self);
     fn name(&self) -> String;
     fn is_bench(&self) -> bool;
}

Then your example:

#[quickcheck]
fn identity(a: i32) -> bool { ... }

#[criterion_bench]
fn benchmark(c: &mut Criterion) { ... }

Would expand to:

fn identity(a: i32) -> bool { ... }

#[test_case]
const identity_test: SimpleTest = SimpleTest {
   test_fn: || { quickcheck_run(identity) },
   test_name: "identity",
   is_bench: false
};

fn benchmark(c: &mut Criterion) { ... }

#[test_case]
const benchmark_test: CriterionTest = CriterionTest {
   bench_fn: benchmark,
   test_name: "benchmark"
};

And there would be respective impls for the test runner:

impl Testable for SimpleTest {
    fn run(&self) {
        self.test_fn()
    }

    fn name(&self) -> String {
        self.test_name.to_string()
    }

    fn is_bench(&self) -> bool {
        self.is_bench
    }
}

impl Testable for CriterionTest {
    fn run(&self) {
        self.bench_fn(criterion::global_instance())
    }

    fn name(&self) -> String {
        self.test_name.to_string()
    }

    fn is_bench(&self) -> bool {
        true
    }
}

Explicitly: there should not be a 1:1 relationship between test runners and test declaration styles.

This allows the test runner to manage the execution of tests that were not created with it in mind.

Under the proposed addition of #[test_case(some_runner)], the test declaration-style author (for example, quickcheck) would be making the choice of test runner.

Why is the user restricted to selecting a single test runner? It would be nice to compile one binary that executes test cases from multiple runners.

A test runner is essentially a main function. It doesn't make sense to have two for the same reasons that it doesn't make sense to have two main functions. Because a test runner defines the command-line interface for a test binary, having multiple would result in contention over that interface.

For example, @CAD97's suggestion is actually what I implemented at first. The issue here is that both ::test::run_tests and ::quickcheck::run allow filtering by test name. What happens when I run:

$ ./test_bin mod_a::mod_b::test_name

You'd need a more complex mediator that did something like:

$ ./test_bin quickcheck mod_a::mod_b::test_name

If we allow multiple test runners, then this mediation would have to be provided by the compiler, which puts us back in the realm of being opinionated about how test binaries should work. Such a setup could be implemented in user space with the current implementation.

@SergioBenitez commented Sep 11, 2018

In general, tests should know how to run themselves. In fact, this is how #[test] works today.

I'm not sure what you could mean by "tests should know how to run themselves". A #[test] clearly doesn't know how to run itself: it requires the testing library to parse things like should_panic, to execute the function, and to catch panics. A #[quickcheck] test also clearly doesn't know how to run itself: it requires quickcheck to run the function against a myriad of inputs and then prune to find a minimal failure, which again, are foreign to the test itself.

Explicitly: there should not be a 1:1 relationship between test runners and test declaration styles

Sure, but I don't see how my proposal enjoins this.

Under the proposed addition of #[test_case(some_runner)] then the test declaration-style author (for example, quickcheck) would make the call of the test runner.

I think this is the desire in almost every case. I understand the principle behind this plug-and-play, swappable-runner, trait-based testing, but I don't see it getting use in practice. Think about what such use would mean: one crate would propose a testing interface, and some other crate would say "Oh, sure, I can run that!". It won't happen serendipitously, since the interface will become part of the public API, so one crate is making a deliberate decision to allow other runners to run "its" tests. If that's the case, then that crate can simply take in a runner in its own attribute:

#[quickcheck(some::runner)]

A test runner is essentially a main function. It doesn't make sense to have two for the same reasons that it doesn't make sense to have two main functions.

This is only true because you're defining it that way.

Because a test runner defines the command line interface for a test binary, having multiple would result in contention on this interfaces.

I think this is an interesting design point: should a test runner be able to define the CLI for the binary? I can see arguments both ways. If not, then this is a non-issue, but if so, I wouldn't be opposed to passing test-runner specific arguments, as you suggest, with something like:

$ ./test_bin --quickcheck quickcheck args here --criterion some other args

This would mean that either the binary can't accept any -- args, or the top-level CLI parser is clever enough to group things appropriately.

Overall, I think your design optimizes for the wrong thing while making a common thing very difficult and tedious to accomplish. I don't see a compelling argument for swappable runners, and making it cumbersome to have tests for multiple runners in a single crate, a situation I postulate will be very common (something my code does now, for instance), is unfortunate.

@jonhoo commented Sep 11, 2018

I'd also like to push back a little bit here @djrenren: it's not clear to me that a "clean" division of "test runners are just main functions that deal with CLI + output" and "tests that know how to run themselves" is sufficient. It could be, but it'd be nice to see some evidence for it (similarly, @SergioBenitez, can you think of any examples where that separation is not sufficient?). There are definitely cases where a test runner should have more knowledge about the exact tests, such as "run at most this many tests in parallel within this test suite", or "this suite of tests share the following set-up, which should only be run once". It could be that these can all be dealt with using shared state among the generated tests + mutexes, but some discussion around it would be good :)

@shepmaster commented Sep 11, 2018

tests for multiple runners in a single crate, a situation I postulate will be very common

Could you share some other places where this is common?

For example, my experience has not included any language where a project has multiple test runners in a single "place". The closest I can think of is multi-language projects. For example, the Rust Playground has Rust unit tests, JS unit tests, and Ruby integration tests, each with their own test runners. Thus I postulate it is not common 😇

Even in a Rust-only project, I think I'd prefer to place benchmarks in a completely separate file (a.k.a. a separate crate) from unit tests. I never want to run tests and benchmarks together and the information reported for each is drastically different.

I understand the principal behind this plug-and-play, swappable runner, trait-based testing, but I don't see it getting use in practice

I agree that it seems unlikely that there will be One True Trait for defining a test. I'd expect there to be multiple test runners, each with their own highly-specific trait.

What I would expect is to see adapters for tools like proptest to connect to popular test runners, leveraging each runner's strengths. Thus, the crate proptest-stainless would provide the macro support needed to define and connect a proptest-based test with the stainless test runner. Ideally, these macros would have high overlap with other adapters like proptest-builtin and proptest-some-new-cool-runner, but they would also allow tying into runner-specific functionality (e.g. the test output can show the number of runs or the failing inputs in a better way).

It's possible that over time some basic commonality can be discovered (perhaps something along the lines of TAP?), but there will always be something special a test runner will want to provide.

@SergioBenitez commented Sep 11, 2018

@jonhoo I tried to give a couple of examples indicative of this in my previous comment, but perhaps we have different notions of what "separation" means in this case. Could you perhaps expand on the question?

@shepmaster In Rocket, I'd love to have the following in a single crate:

  • CLI Tests
  • HTTP Client/Server Integration Tests
  • Quickcheck Type Tests
  • Fuzzers, with a time limit
  • Regular #[test] tests

I'm totally okay with benchmarks being in another crate. The first four are currently regular #[test] tests, since custom test frameworks don't exist, but I can see a nice interface for all four given this new functionality. It'd be odd to split them up into four (or five, if #[test] uses #[test_case]) crates, and having to manually specify a cfg to run the particular tests means not only more code, but also that with high probability, users won't run them before they submit a PR.

I agree that it seems unlikely that there will be One True Trait for defining a test. I'd expect there to be multiple test runners, each with their own highly-specific trait.

Note that my proposal continues to make this possible, albeit by making it explicit in the code.

@mathstuf

This comment has been minimized.

Copy link
Contributor

commented Jul 10, 2019

Apologies if this is the wrong place for this, but it seems to be the furthest along the road I've traveled in search of skipping tests by detecting support for what is being tested at runtime. Basically, something like #[ignore(lacks_support)], where fn lacks_support() -> bool (or maybe -> Result<(), String>, to give a reason support is missing) is a function within the relevant scope. My use case is that I'm testing features tightly tied to the kernel that is being run (this is for the keyutils crate), and I'd like the tests to indicate that they are not actually giving useful information rather than just bailing out and saying "yep, totally works".

This was mentioned in https://internals.rust-lang.org/t/pre-rfc-make-test-and-bench-more-flexible/1328, but that discussion has died down.

So, is this something that is basically under this RFC impl (as something like an additional method on the trait for whether the test should be ignored or not) or is it something that should go into a new RFC? Thanks.
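
For reference, the manual pattern available today (my own sketch; lacks_support and its probe are hypothetical) is to detect support at runtime and return early, which has exactly the drawback described above: the test is reported as passed rather than skipped:

fn lacks_support() -> bool {
    // Hypothetical probe; a real check would query the running kernel.
    !std::path::Path::new("/proc/keys").exists()
}

#[test]
fn keyring_round_trip() {
    if lacks_support() {
        eprintln!("skipping: kernel lacks keyutils support");
        return; // counted as a pass, not a skip
    }
    // ... actual test body would go here ...
}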

@shepmaster commented Jul 10, 2019

@mathstuf I think "Is there a simple way to conditionally enable or ignore entire test suites in Rust?" and "Is it possible to conditionally enable an attribute like derive?" cover your needs if it's statically determinable, like a feature flag.

You can run a build script to test your system and set those feature flags, as well.
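
A sketch of that statically-determinable route (the has_keyutils cfg name and the probe are made up for illustration): the build script emits a cfg, and the test is compiled either way but marked ignored when the cfg is absent:

// In build.rs (hypothetical probe):
//     if host_has_keyutils() {
//         println!("cargo:rustc-cfg=has_keyutils");
//     }

#[test]
#[cfg_attr(not(has_keyutils), ignore)]
fn requires_kernel_support() {
    // exercises the kernel-dependent feature
}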

@mathstuf commented Jul 10, 2019

Using a build.rs to determine it is probably sufficient for now (though the Rust bindings for performing these checks are what is being built…so that seems awkward…unless tests can have their own build.rs?), but I think other things also fit under this umbrella. This also has precedent in other test frameworks (e.g., pytest has support for skipping based on runtime checks).

@mathstuf commented Jul 10, 2019

Actually, build.rs would completely break for cross-compiling because what matters is the target machine's feature set, not the host machine's, so runtime-determined skipping still seems very relevant to me in the wider ecosystem.

@sdbondi referenced this issue Jul 16, 2019: Remove logging from tests #520 (merged)