
Run tests concurrently #177

Open
aantron opened this issue Aug 13, 2019 · 12 comments

Comments

@aantron
Contributor

aantron commented Aug 13, 2019

This is a follow-on to #167 (comment).

When using Alcotest with a concurrency monad, it would be extremely valuable to allow the tests to run concurrently. In particular, this would be a major win for the testers of both Lwt and Luv, saving lots of waiting time for developers, and some time in CI.

I think this code

alcotest/src/alcotest.ml

Lines 49 to 56 in cbdedf5

let map_s f l =
  let rec inner acc = function
    | [] -> return (List.rev acc)
    | hd :: tl ->
        f hd >>= fun r ->
        (inner [@ocaml.tailcall]) (r :: acc) tl
  in
  inner [] l

can be adjusted to start tests without waiting for preceding tests to finish. It will be a bit tricky, because test results should be displayed in order, and ideally, as soon as they are available. When the monad is just ordinary evaluation, this "scheduler" should degenerate to normal, serialized test execution, with output displayed after each test.
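Very roughly, the concurrent version could look something like the sketch below. This is only a sketch (not actual Alcotest code), assuming a monad like Lwt in which calling f hd starts the test running immediately; it starts all the tests up front, then collects and displays the results in list order:

(* Sketch only: start every test immediately, then collect results
   in list order, so output stays ordered even though the tests run
   concurrently. *)
let map_concurrent f l =
  let started = List.map f l in
  let rec collect acc = function
    | [] -> return (List.rev acc)
    | p :: rest -> p >>= fun r -> collect (r :: acc) rest
  in
  collect [] started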

The Alcotest user can take care of preventing tests that interfere with each other from running concurrently, using the mechanisms typically provided by concurrency monads. However, with Luv the monad will be just plain CPS, and Luv lacks its own locking mechanisms for concurrency, so I suggest adding some simple mechanism for mutual exclusion to Alcotest. I think the cleanest way to do this might be to expose additional helpers in Alcotest, i.e. something to "bind" on in test function bodies.

If running all tests concurrently is a scary default, I suggest controlling this with an optional argument to run.

@craigfe
Member

craigfe commented Aug 19, 2019

This seems like a cool feature. I'm interested to know what sort of time this would save for a typical Lwt/Luv test use-case. And yes, as you say, this would likely be a non-trivial change due to the need to display test results in order. I think it probably requires an initial refactor to try to decouple the testing core from the log capturing / console printing frontend.

Regarding the implementation, I don't see how it could be done by just modifying map_s; won't backends need to supply something like a join operation that works with their monad in order to implement a map_p?

I'm somewhat cautious of requiring tests to manage their own concurrency problems; this seems likely to get messy when selectively executing tests. What about supplying combinators to allow the user to specify a dependency DAG instead? This could then be executed by just mapping the DAG to bind/join operators.

@aantron
Contributor Author

aantron commented Aug 19, 2019

The Lwt Unix tests take 45 seconds on my system at the moment. It should be possible to get this down to around 5-10 seconds. More importantly, only a small part of the Unix test suite is currently done. Doing all of ocsigen/lwt#539 could increase the testing time to maybe 5 minutes with the current sequential tester, but with concurrent testing, it should remain roughly constant at 5-10 seconds - except, of course, if we bottleneck the main thread with CPU processing.

A basic join-like operation can be implemented using two binds:

let p_1 = ... in
let p_2 = ... in

p_1 >>= fun () ->
p_2 >>= fun () ->

(* Poor man's join accomplished *)

For comparison, Lwt's map_p doesn't use join.

Alcotest will need something like this anyway, because it probably will want more control than any default "real" join provides.

I'm somewhat cautious of requiring tests to manage their own concurrency problems; this seems likely to get messy when selectively executing tests. What about supplying combinators to allow the user to specify a dependency DAG instead? This could then be executed by just mapping the DAG to bind/join operators.

Having "locks" that test cases can bind on would interact gracefully with selective execution. Selective execution would just mean less test case functions ever binding on ech lock. What would be the problem in this case?

I am thinking of something like (very roughly) Alcotest.lock : unit -> (unit -> unit M.t)

For each set of tests that can interfere with each other, the test suite author calls Alcotest.lock () once to create one lock: let directory_lock = Alcotest.lock ().

Each test from that set begins with directory_lock () >>=.

If fewer tests run, it just means less contention on the lock.

If the test author wants to use another concurrency mechanism from their monad, they can just do so.

This scheme has the advantage of keeping the Alcotest API simple. It seems to make locking orthogonal somehow to the rest of Alcotest.

@aantron
Contributor Author

aantron commented Aug 19, 2019

My apologies, it would have to be something like Alcotest.lock : unit -> (unit -> unit M.t) -> unit M.t, so the usage would be directory_lock @@ fun () -> the_test_case, after creating it with let directory_lock = Alcotest.lock ().

@craigfe
Member

craigfe commented Aug 19, 2019

I'm still not clear on how the poor man's join exploits any parallelism, what with the bind being there, but perhaps this is not the place for you to teach me about concurrency monads 🙂

You make a good argument for the use of locks. Do you see it being the user's responsibility to drop the lock?

Personally, if we were to go that route I'd prefer a more general notion of a 'context' type that can be registered at the Alcotest.test_case level and consumed by the corresponding test cases; this would avoid some nasty patterns in the Irmin unit tests where the test context must be propped up before running Alcotest. Perhaps it also solves the issue of requiring the user to explicitly hold and drop locks in their tests?

@aantron
Contributor Author

aantron commented Aug 19, 2019

In the pseudocode,

let p_1 = ... in
let p_2 = ... in

p_1 >>= fun () ->
p_2 >>= fun () ->

the ... code that creates p_1 and p_2 is what spawns the concurrent operations. The binds don't trigger those operations (they are already running); they just collect the results. I suggest trying it in Lwt, something like

open Lwt.Infix

let () =
  let p_1 = Lwt_unix.sleep 1. >|= fun () -> print_endline "Hello" in
  let p_2 = Lwt_unix.sleep 2. >|= fun () -> print_endline "world!" in
  Lwt_main.run (p_1 >>= fun () -> p_2)

I think the same is true of CPS. In directly-written CPS code it's often not true, due to how the functions are written, but it is typically true if the CPS code was written using return. I'd better actually try this before continuing to recommend it :)

You make a good argument for the use of locks. Do you see it being the user's responsibility to drop the lock?

With the Alcotest.lock : unit -> (unit -> unit M.t) -> unit M.t signature, the lock returned by Alcotest.lock (), of type (unit -> unit M.t) -> unit M.t, would behave like this: lock f attempts to take the lock. When successful, it runs f (). When that computation (of f's result type unit M.t) completes, it releases the lock. So, releasing it would be built into the API.

I guess the signature should be unit -> (unit -> 'a M.t) -> 'a M.t, now.
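As a rough sketch of that combinator, with Lwt standing in for the monad and Lwt_mutex doing the actual mutual exclusion (this is not part of Alcotest's API, just an illustration):

(* Sketch only: a lock is a closure over an Lwt mutex. *)
let lock () =
  let m = Lwt_mutex.create () in
  fun f -> Lwt_mutex.with_lock m f

let directory_lock = lock ()

Because of OCaml's value restriction, directory_lock would be weakly polymorphic, which is fine in practice, since every test body returns unit Lwt.t.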

I'm not sure what the issue with Irmin is. It could be that it requires explicit contexts. But for the use cases I see, locks already have minimal overhead for the user. Either way, the user has to manually specify some kind of mutual exclusion class. With the signature (unit -> 'a M.t) -> 'a M.t, if you had a test case body that looks like this:

fun () ->
  some_code;
  some_more_code

making it take the lock just requires adding one isolated line (so a very compact diff):

fun () ->
  directory_lock @@ fun () ->
  some_code;
  some_more_code

directory_lock is a function that automatically takes the underlying lock, calls its argument function representing the actual test case, and releases the lock after that computation completes. With this kind of usage, this can already be considered to be a sort of "lock combinator."

Of course, let directory_lock = Alcotest.lock () has to be called above this.

One way to avoid having to define directory_lock is to have a signature Alcotest.lock : string -> (unit -> 'a M.t) -> 'a M.t, used like this:

fun () ->
  Alcotest.lock "directory" @@ fun () ->
  some_code;
  some_more_code

where Alcotest.lock now creates a lock corresponding to the string "directory" on demand. But I think this is a bit more dangerous, because the string could be misspelled in some of the test cases.
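Just for illustration, that on-demand, string-keyed variant could be sketched in Lwt roughly like this (the lock function here is hypothetical, not Alcotest's API):

(* A table of mutexes, created lazily by name. *)
let locks : (string, Lwt_mutex.t) Hashtbl.t = Hashtbl.create 8

(* Look up (or create) the mutex registered under [name],
   then run [f] while holding it. *)
let lock name f =
  let m =
    match Hashtbl.find_opt locks name with
    | Some m -> m
    | None ->
        let m = Lwt_mutex.create () in
        Hashtbl.add locks name m;
        m
  in
  Lwt_mutex.with_lock m f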

@craigfe
Member

craigfe commented Aug 19, 2019

I see, thanks 🙂 I suppose this speaks to the fact that Lwt monads really are more like 'promises' than I'd previously thought.

I like the closure-oriented lock, and think having to define it explicitly as a resource is a good thing. It even allows for something like:

Lwt_main.run @@ run "LwtUtils"
  [ ( "basic",
      [ test_case "Plain" `Quick (directory_lock test_unix);
        test_case "Lwt" `Quick test_lowercase
      ] );
    ... ]

which I like because it keeps test description separate from the test implementation. Perhaps there is still a solution that can also accommodate more heavyweight test contexts, but that would need me to come up with a proof of concept 😉 Thanks for the insights.

@aantron
Contributor Author

aantron commented Aug 19, 2019

Great :)

The only other thing I'd note is that it might (might, not sure) be better to create an API like

type lock
val make_lock : unit -> lock
val lock : lock -> (unit -> 'a M.t) -> 'a M.t

i.e. make the lock object explicit. The main reason for that is to have

test_case "Plain" `Quick (Alcotest.lock directory_lock test_unix);

which might be clearer for casual readers, due to the explicit Alcotest.lock call.

But I don't know what's better here.
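For reference, a minimal Lwt-based sketch of that interface (the names are hypothetical, and Lwt_mutex does the underlying work):

type lock = Lwt_mutex.t

let make_lock () : lock = Lwt_mutex.create ()

(* Run [f] while holding [l]; the mutex is released once the promise
   returned by [f ()] resolves (or fails). *)
let lock (l : lock) (f : unit -> 'a Lwt.t) : 'a Lwt.t =
  Lwt_mutex.with_lock l f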

@aantron
Contributor Author

aantron commented Aug 27, 2019

For reference, I just parallelized Bisect_ppx's tester, which uses OUnit (aantron/bisect_ppx@2784eef). This changed its running time from about 13 seconds to 3 seconds (on my machine).

Bisect_ppx's tests are CPU-intensive, because they call the OCaml compiler, so useful parallelism is limited by the number of processors. By comparison, a concurrent tester that has artificial waiting inserted is not bottlenecked by processors, and might see a greater speedup.

This does introduce a potential complication, however. It might be useful to also run tests in multiple threads and/or in multiple processes. Using processes would make mutual exclusion between tests more complicated, but it should still be relatively easy with threads.

@aantron
Contributor Author

aantron commented Aug 27, 2019

I guess the issue with using threads is that concurrency libraries usually assume they are running in the main thread.

For the fastest testing, the tester would run a number of threads or processes roughly equal to the number of CPU cores, and the testing in each one would allow as much concurrency as the concurrency monad allows.

@aantron
Contributor Author

aantron commented Feb 1, 2020

Quoting the relevant comment from ocsigen/lwt#712 here. ocsigen/lwt#712 restructured Lwt's custom tester to run tests concurrently instead of sequentially. The result:

This decreases testing time for Lwt_unix from about 47 seconds to about 6 seconds, and the overall testing time is reduced from 53 seconds to 12 seconds (all measurements on my system).

More importantly, the testing time becomes roughly constant in the number of tests, except for the small subset of tests that must be run sequentially.

Without ocsigen/lwt#712, it was not practical to test Lwt's entire API for developers while working locally, since the 47 second running time was only for a small portion of the API, and already difficult to deal with. With the change, the time and its growth rate are greatly decreased.

@aantron
Contributor Author

aantron commented Feb 1, 2020

This also changes the nature of the tests one is comfortable writing. I/O tests often introduce artificial delays. When tests are run sequentially, these tests end up wasting developer time when running, and a developer is right to be concerned about writing hundreds of tests that each have a one-second delay.

Making the tests run concurrently changes the total time function from sum to max for such tests, and it's no longer a problem to have artificial delays.

@cemerick

I wish I could thumb this issue up x10. I've been using ounit to ~parallelize tests, but I suspect its "sharding" approach is the underlying cause of some very annoying transient test timeouts (meanwhile, the code under test uses lwt heavily in production without a blip). I'd love to port these suites to alcotest and get the same kind of concurrency.

(FWIW: with the default 8 shards, my largest suite runs in ~a minute, but takes 4-5 minutes with one shard, so abandoning concurrency entirely would be very painful.)
