add --sort-first and --sort-last to mix test #13589

SteffenDE · 2024-05-22T17:01:05Z

This is based on an idea @josevalim had when debugging Phoenix tests. It is not working quite as intended though (yet?).

The goal is to provide an option to sort specific modules to be compiled first / last (and I extended this to also specify the order of tests based on the ExUnit filter syntax).

The problem as far as I can tell is that the test files are still compiled in parallel, so changing the order of the files passed to Kernel.ParallelCompiler.require does not guarantee that the modules are indeed compiled first or last.

Examples on how to use:

mix test --sort-first file:test/phoenix/router/resource_test.exs --sort-last mytag:true

This would ideally compile the given file first and run all tests matching the mytag:true filter after all the other tests in the file.

josevalim · 2024-05-22T20:20:22Z

What if we have --sort-first and --sort-last accept files, and we use Code.require_file to explicitly those files and return the test modules, which we then accordingly schedule first and last? I think the issue is that async ones will run immediately but we probably have ways to work around that too?

SteffenDE · 2024-05-22T21:34:48Z

Code.require_file does the trick with compilation order (I added a TODO about parallel_require_callbacks). I'll need to think a little bit more about the order of test runs. I guess this would need some bigger changes to the ExUnit.Server.

josevalim · 2024-05-22T21:48:18Z

Perhaps we don't need to do anything? The ones loaded first will run first, the ones loaded last will run last. Perhaps the only challenge is that async tests always run before the sync ones, but we can add that as a footnote? We only guarantee execution order within those groups?

SteffenDE · 2024-05-22T21:52:12Z

It works for sync tests when we Enum.reverse the modules in take_sync_modules (as we prepend them, so the last registered sync module is actually first currently).

I think the problem for async tests is similar: while we might compile the async test last, it is still prepended to the list of async modules in ExUnit.Server. So if there are other async tests waiting, it will run before them.

josevalim · 2024-05-22T22:04:41Z

It works for sync tests when we Enum.reverse the modules in take_sync_modules (as we prepend them, so the last registered sync module is actually first currently).

Sounds good to me!

I think the problem for async tests is similar: while we might compile the async test last, it is still prepended to the list of async modules in ExUnit.Server. So if there are other async tests waiting, it will run before them.

Perhaps we should make it use :queue?

SteffenDE · 2024-05-22T22:15:36Z

Good idea with :queue. In the meantime I added some code to append only in case we know the next modules required are going to be ones that should be appended. I'll try a queue next.

In the current state, testing in the phoenix repo

mix test --sort-first file:test/phoenix/router/resource_test.exs --trace --sort-first file:test/phoenix/integration/websocket_channels_test.exs 
--sort-last file:test/phoenix/token_test.exs --sort-last file:test/mix/tasks/phx.gen.notifier_test.exs

correctly runs the async resource_test first, the async token_test as the last async test right before the sync websocket_channels_test and the sync phx.gen.notifier_test right at the end.

SteffenDE · 2024-05-22T22:28:45Z

In the case of phoenixframework/phoenix@2c70068 even --sort-first does not help in every run as apparently other async tests are able to require the file just in time. In that case the initial Code.require_file correctly emits a warning, but the test then still passes.

sabiwara · 2024-05-22T22:47:53Z

Just wondering (apologies if it is a silly question), could something like this be achieved with consecutive calls using different tag selections?

mix test --only first
mix test --except first --except last
mix test --only last

I suppose it has the same downside regarding parallel compilation though.

josevalim · 2024-05-23T07:41:27Z

@sabiwara the goal is to find dependency between tests, so separate runs won't help us. :(

josevalim · 2024-05-23T07:42:43Z

Good idea with :queue. In the meantime I added some code to append only in case we know the next modules required are going to be ones that should be appended. I'll try a queue next.

If you don't want to use a :queue, another idea is to have a "prepend/append" mode inside the ExUnit server and swap accordingly, but :queue seems to be the semantically correct operation anyway.

SteffenDE · 2024-05-23T13:56:11Z

lib/ex_unit/lib/ex_unit/server.ex

+        async_modules = :queue.to_list(state.async_modules) |> Enum.uniq() |> :queue.from_list()
+        sync_modules = :queue.to_list(state.sync_modules) |> Enum.uniq() |> :queue.from_list()


Not really pretty

btw @josevalim speaking about reducing complexity, I've seen that you changed the server to append tests instead of prepending them in b665ddd, so the changes in ExUnit.Server are not necessary any more. The :queue may be a little bit faster, so keep it or not?

Did I? I think the code is still prepending, since &1 (the value in state) is on the right side?

well never mind, you're right of course

SteffenDE · 2024-05-23T14:09:04Z

Maybe it would be more helpful to have the option to enforce a deterministic compilation order for tests? At least as for CI, where the speed difference might not matter for smaller projects? I feel like this could make reproducing such problems much easier, even without --sort-first/last.

josevalim · 2024-05-23T15:02:22Z

@SteffenDE we could make it so --trace uses Enum.map+Code.require_file but i wouldn't do it by default because it will definitely slow down test suite runs?

And if we use the seed to Enum.shuffle the files before requiring them, perhaps we don't need these options? Another option is to introduce a specific --trace-require that traces and logs each file loaded serially (i.e. the same as above but not attached to --trace).

It depends on what you feel would be easier to expose those Phoenix bugs. Thoughts?

josevalim · 2024-06-02T13:08:46Z

lib/mix/lib/mix/compilers/test.ex

+    # require one file at a time
+    failed =
+      for file <- test_files do
+        do_require(false, [file], parallel_require_callbacks, warnings_as_errors?)


I think we can use Code.require_file instead... but I wonder if that largely changes the semantics of the code?

I changed it to still use the ParallelCompiler because of the parallel_require_callbacks and the warnings_as_errors? option (134eace), otherwise --stale or --warnings-as-errors won't work if combined with --sort-first/last or --trace-require.

josevalim · 2024-06-02T13:09:17Z

lib/ex_unit/lib/ex_unit/runner.ex

@@ -278,13 +280,42 @@ defmodule ExUnit.Runner do
          end
      end

+    to_run = sort_first_last(config, to_run)


I don't think these changes are necessary? We only sort files, not actual tests?

If we only want to support sorting files, then yes. But as I am using the existing ExUnit filters, I saw the opportunity to also support --sort-first=mytag to sort tests themselves.

I see. Let's start with either tags or files first. I think files will reduce complexity, so that gets my vote. :)

Hmm so it would only remove the changes in ExUnit.Runner, but then I'd probably need to validate that the parameters given to --sort-first/last are only using file: filters, or would you recommend not using the ExUnit filter syntax and directly pass files:

Instead of:

mix test --sort-first file:path/to/test.exs --sort-first file:path/to/other/test.exs

just

mix test --sort-first path/to/test.exs --sort-first path/to/other/test.exs

?

Maybe test complexity would be reduced, I'll need to look at those.

Exactly, just:

mix test --sort-first path/to/test.exs --sort-first path/to/other/test.exs

SteffenDE · 2024-06-02T16:10:27Z

I've been thinking about another flag that would make debugging flaky tests caused by global state easier: --force-sync. This could force all tests to be added as sync, even if they are marked as async.

Combining this with --trace-require and a --seed could make such tests pretty much reproducible. I'm not so sure about the benefits of --sort-first/last, but those could still be added as advanced features.

So maybe it makes sense to split this:

Add --trace-require to force a deterministic compilation order
Add --force-sync to force a deterministic run order
Optionally add --sort-first/last to influence compilation and/or run order
Optionally change ExUnit.Server to use a :queue instead of appending to a list

Properly naming the flags can be further discussed. Maybe --sync-require would be better (sync vs parallel require).

josevalim · 2024-06-02T18:07:59Z

--force-sync. This could force all tests to be added as sync, even if they are marked as async.

Wouldn't it be the same as --trace or setting --max-cases 1?

Add --trace-require to force a deterministic compilation order

It seems this is definitely a minimum requirement. I am afraid, however, this is not enough for reproducing a workflow. For example, imagine CI fails. --trace-require with a given seed would make it deterministic but not necessarily run in the order that made CI fail. So you may still want to have --sort-first and --sort-last?

Maybe, however, the flag we need to add is --deterministic. This sets max_cases: 1 and disables parallel requires? And steps 3 and 4?

SteffenDE · 2024-06-02T20:35:40Z

Wouldn't it be the same as --trace or setting --max-cases 1?

Not exactly the same, as even with --max-cases 1 async tests would run before sync tests. But I guess you're right that --max-cases 1 should be enough together with sequential compilation. So yeah no need for --force-sync.

For example, imagine CI fails. --trace-require with a given seed would make it deterministic but not necessarily run in the order that made CI fail.

Exactly. It only helps for runs with the same settings. But it's still a win, as you can then try running locally until you can hopefully reproduce it with --deterministic and then say to a colleague "just run mix test --seed 1234 --deterministic to reproduce". Or, if the slowdown is acceptable also set it on CI, at least until the cause for the flaky tests is found and fixed :)

So I'd propose the following:

Change ExUnit.Server to use a :queue for FIFO ordering
Add --sync-require / --trace-require / --sequential-require which only enforces sequential require
Add --deterministic which sets the prior and --max-cases 1 (can be the same PR as 2)
Think more about --sort-first/last.

After all this I'm not fully convinced that --sort-first/last is truly that helpful. Imagine the case with Phoenix where mix compile in a mix task test messed up the code paths. We would not detect this with --sort-first as the paths are still good. And in my testing we also don't run into it if we run the affected test last, as there are other tests in between that seem to fix it again.
So I'm not sure if it's worth the complexity. Running the test suite with --deterministic multiple times until it (hopefully) fails might be just enough.

josevalim · 2024-06-02T20:44:08Z

I think we only need to add --deterministic, which sets sequential requires AND --max-cases 1, without adding a flag for sequential require. If you want sequential requires and another max cases, you can always set max cases explicitly.

Change ExUnit.Server to use a :queue for FIFO ordering

Do we even need the queue for FIFO ordering? If they are loaded in order, it is deterministic anyway. I think we only need FIFO ordering if we add --sort-first and --sort-last.

So overall, I think it is:

Add --deterministic which sets sequential requires and --max-cases 1 (can be the same PR as 2)
Think more about --sort-first/last (which would require FIFO ordering)

SteffenDE · 2024-06-02T20:59:41Z

Do we even need the queue for FIFO ordering? If they are loaded in order, it is deterministic anyway. I think we only need FIFO ordering if we add --sort-first and --sort-last.

I don't think so. Imagine a test (let's call if FooTest) that takes a non-deterministic amount of time to run. Maybe it does a database operation or something. For now let's assume that it sometimes takes 1 second and sometimes up to 5. And as compilation happens concurrently we could have:

FooTest is compiled and because it's async it is immediately started. It takes 1 second to run.
In this 1 second two more tests are compiled. First BarTest is prepended to the list, then BazTest.

The order of test runs now is: FooTest, BazTest, then whatever is last compiled while BazTest runs, ...

Now another run, FooTest takes 5 seconds to run. While FooTest runs, more than two other tests are compiled. The order of test runs is: FooTest, LastCompiledTest, SecondLastCompiledTest, ..., BazTest, BarTest

So for deterministic order either compilation would need to wait for each test to complete, or we need a FIFO order, right?

josevalim · 2024-06-02T21:13:44Z

So for deterministic order either compilation would need to wait for each test to complete, or we need a FIFO order, right?

I see. That's why you also proposed --force-sync OR at least disable the feature that runs test modules as soon as they are loaded (these two options are quite similar but they are implemented differently).

It feels that loading tests and running tests as two distinct events, instead of concurrently, would be even better if we want to guarantee determinism?

So maybe we should add these two options:

--force-sync - force all tests to run synchronously. The difference with --max-cases 1 is that --max-cases still runs tests as they are loaded. --force-sync only runs after they are loaded.
--parallel-requires N - the concurrency for loading test files, set it to 1 for determinism (which should be easier to implement, we only need to add a max_concurrency to Kernel.ParallelCompiler?

SteffenDE · 2024-06-02T21:57:26Z

It feels that loading tests and running tests as two distinct events, instead of concurrently, would be even better if we want to guarantee determinism?

I don't know if it's "better". --force-sync (I implemented it locally) definitely slows down a mix test run considerably, if combined with sequential requires. So if --max-cases 1 and FIFO order is good enough I would prefer that for --deterministic. Having --force-sync as another option could be nice, maybe?

Phoenix:

mix test --trace: Finished in 23.1 seconds (2.2s async, 20.8s sync)
mix test --force-sync --trace: Finished in 23.7 seconds (0.00s async, 23.7s sync)
mix test --force-sync --trace --trace-require: Finished in 30.3 seconds (0.00s async, 30.3s sync)

we only need to add a max_concurrency to Kernel.ParallelCompiler?

I like that! While initially implementing --sort-first I kind of expected to see such an option available already. No matter what we're ending up with here, I think that's a good addition.

josevalim · 2024-06-02T22:06:49Z

So let's do --max-requires N (to mirror --max-cases) and FIFO. That should add more determinism and you can pair it with --max-cases too. Let's have --force-sync/--sort-first/--sort-last for after.

SteffenDE · 2024-06-02T22:09:36Z

That sounds good to me! I’ll close this one and work on new PRs. Thank you so much for the guidance :)

In order to have more deterministic test runs when using `--max-cases 1` and `--max-requires 1` (elixir-lang#13635) (see also elixir-lang#13589), we need to run tests in compilation order (FIFO). In the past, ExUnit.Server appended new tests to the front of a list, which would result in the most recently added test to be run first. Let's quickly demonstrate the problem this causes for deterministic runs with a simple example: Imagine a test (let's call if FooTest) that takes a non-deterministic amount of time to run. For now let's assume that it sometimes takes 1 second and sometimes up to 5. And as async tests execute in parallel with compilation of other test files, we could have the following scenario: FooTest is compiled and because it's async it is immediately started. It takes 1 second to run. In this 1 second two more tests are compiled. First BarTest is prepended to the list, then BazTest. The order of test runs now is: FooTest, BazTest, then whatever is last compiled while BazTest runs, ... Now another run, FooTest takes 5 seconds to run. While FooTest runs, more than two other tests are compiled. The order of test runs is: FooTest, LastCompiledTest, SecondLastCompiledTest, ..., BazTest, BarTest This can be fixed either by appending new test modules to the end of the list, or - and that's what this commit does - by using a `:queue` instead.

In order to have more deterministic test runs when using `--max-cases 1` and `--max-requires 1` (#13635) (see also #13589), we need to run tests in compilation order (FIFO). In the past, ExUnit.Server appended new tests to the front of a list, which would result in the most recently added test to be run first. Let's quickly demonstrate the problem this causes for deterministic runs with a simple example: Imagine a test (let's call if FooTest) that takes a non-deterministic amount of time to run. For now let's assume that it sometimes takes 1 second and sometimes up to 5. And as async tests execute in parallel with compilation of other test files, we could have the following scenario: FooTest is compiled and because it's async it is immediately started. It takes 1 second to run. In this 1 second two more tests are compiled. First BarTest is prepended to the list, then BazTest. The order of test runs now is: FooTest, BazTest, then whatever is last compiled while BazTest runs, ... Now another run, FooTest takes 5 seconds to run. While FooTest runs, more than two other tests are compiled. The order of test runs is: FooTest, LastCompiledTest, SecondLastCompiledTest, ..., BazTest, BarTest This can be fixed either by appending new test modules to the end of the list, or - and that's what this commit does - by using a `:queue` instead.

SteffenDE commented May 23, 2024

View reviewed changes

SteffenDE added 6 commits May 31, 2024 22:29

add --sort-first and --sort-last to mix test

5a24040

ensure modules are compiled in the right order

a658b5e

ensure --sort-first/last tests are run in expected order

14253f9

fix maybe_sort_first_last invalid return

82d0238

use :queue for modules in ExUnit.Server

0e1c8b0

--trace-require

134eace

SteffenDE force-pushed the sd-exunit-sort-first-last branch from 8cf6513 to 134eace Compare May 31, 2024 21:52

automatically use --trace-require when using --sort-first/last

db73012

josevalim reviewed Jun 2, 2024

View reviewed changes

SteffenDE closed this Jun 2, 2024

This was referenced Jun 4, 2024

add max_concurrency to Kernel.ParallelCompiler #13634

Merged

add --max-requires to mix test #13635

Merged

SteffenDE mentioned this pull request Jun 4, 2024

use a :queue to store modules in ExUnit.Server #13636

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add --sort-first and --sort-last to mix test #13589

add --sort-first and --sort-last to mix test #13589

SteffenDE commented May 22, 2024

josevalim commented May 22, 2024

SteffenDE commented May 22, 2024

josevalim commented May 22, 2024

SteffenDE commented May 22, 2024

josevalim commented May 22, 2024

SteffenDE commented May 22, 2024

SteffenDE commented May 22, 2024

sabiwara commented May 22, 2024

josevalim commented May 23, 2024

josevalim commented May 23, 2024

SteffenDE May 23, 2024

SteffenDE Jun 2, 2024

josevalim Jun 2, 2024

SteffenDE Jun 2, 2024

SteffenDE commented May 23, 2024

josevalim commented May 23, 2024

josevalim Jun 2, 2024

SteffenDE Jun 2, 2024

josevalim Jun 2, 2024

SteffenDE Jun 2, 2024

josevalim Jun 2, 2024

SteffenDE Jun 2, 2024

josevalim Jun 2, 2024

SteffenDE commented Jun 2, 2024

josevalim commented Jun 2, 2024

SteffenDE commented Jun 2, 2024

josevalim commented Jun 2, 2024

SteffenDE commented Jun 2, 2024 •

edited

Loading

josevalim commented Jun 2, 2024

SteffenDE commented Jun 2, 2024

josevalim commented Jun 2, 2024

SteffenDE commented Jun 2, 2024

		async_modules = :queue.to_list(state.async_modules) \|> Enum.uniq() \|> :queue.from_list()
		sync_modules = :queue.to_list(state.sync_modules) \|> Enum.uniq() \|> :queue.from_list()

add --sort-first and --sort-last to mix test #13589

add --sort-first and --sort-last to mix test #13589

Conversation

SteffenDE commented May 22, 2024

josevalim commented May 22, 2024

SteffenDE commented May 22, 2024

josevalim commented May 22, 2024

SteffenDE commented May 22, 2024

josevalim commented May 22, 2024

SteffenDE commented May 22, 2024

SteffenDE commented May 22, 2024

sabiwara commented May 22, 2024

josevalim commented May 23, 2024

josevalim commented May 23, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

SteffenDE commented May 23, 2024

josevalim commented May 23, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

SteffenDE commented Jun 2, 2024

josevalim commented Jun 2, 2024

SteffenDE commented Jun 2, 2024

josevalim commented Jun 2, 2024

SteffenDE commented Jun 2, 2024 • edited Loading

josevalim commented Jun 2, 2024

SteffenDE commented Jun 2, 2024

josevalim commented Jun 2, 2024

SteffenDE commented Jun 2, 2024

SteffenDE commented Jun 2, 2024 •

edited

Loading