Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add --sort-first and --sort-last to mix test #13589

Closed

Conversation

SteffenDE
Copy link
Contributor

This is based on an idea @josevalim had when debugging Phoenix tests. It is not working quite as intended though (yet?).

The goal is to provide an option to sort specific modules to be compiled first / last (and I extended this to also specify the order of tests based on the ExUnit filter syntax).

The problem as far as I can tell is that the test files are still compiled in parallel, so changing the order of the files passed to Kernel.ParallelCompiler.require does not guarantee that the modules are indeed compiled first or last.

Examples on how to use:

mix test --sort-first file:test/phoenix/router/resource_test.exs --sort-last mytag:true

This would ideally compile the given file first and run all tests matching the mytag:true filter after all the other tests in the file.

@josevalim
Copy link
Member

What if we have --sort-first and --sort-last accept files, and we use Code.require_file to explicitly those files and return the test modules, which we then accordingly schedule first and last? I think the issue is that async ones will run immediately but we probably have ways to work around that too?

@SteffenDE
Copy link
Contributor Author

Code.require_file does the trick with compilation order (I added a TODO about parallel_require_callbacks). I'll need to think a little bit more about the order of test runs. I guess this would need some bigger changes to the ExUnit.Server.

@josevalim
Copy link
Member

Perhaps we don't need to do anything? The ones loaded first will run first, the ones loaded last will run last. Perhaps the only challenge is that async tests always run before the sync ones, but we can add that as a footnote? We only guarantee execution order within those groups?

@SteffenDE
Copy link
Contributor Author

It works for sync tests when we Enum.reverse the modules in take_sync_modules (as we prepend them, so the last registered sync module is actually first currently).

I think the problem for async tests is similar: while we might compile the async test last, it is still prepended to the list of async modules in ExUnit.Server. So if there are other async tests waiting, it will run before them.

@josevalim
Copy link
Member

It works for sync tests when we Enum.reverse the modules in take_sync_modules (as we prepend them, so the last registered sync module is actually first currently).

Sounds good to me!

I think the problem for async tests is similar: while we might compile the async test last, it is still prepended to the list of async modules in ExUnit.Server. So if there are other async tests waiting, it will run before them.

Perhaps we should make it use :queue?

@SteffenDE
Copy link
Contributor Author

Good idea with :queue. In the meantime I added some code to append only in case we know the next modules required are going to be ones that should be appended. I'll try a queue next.

In the current state, testing in the phoenix repo

mix test --sort-first file:test/phoenix/router/resource_test.exs --trace --sort-first file:test/phoenix/integration/websocket_channels_test.exs 
--sort-last file:test/phoenix/token_test.exs --sort-last file:test/mix/tasks/phx.gen.notifier_test.exs 

correctly runs the async resource_test first, the async token_test as the last async test right before the sync websocket_channels_test and the sync phx.gen.notifier_test right at the end.

@SteffenDE
Copy link
Contributor Author

In the case of phoenixframework/phoenix@2c70068 even --sort-first does not help in every run as apparently other async tests are able to require the file just in time. In that case the initial Code.require_file correctly emits a warning, but the test then still passes.

@sabiwara
Copy link
Contributor

Just wondering (apologies if it is a silly question), could something like this be achieved with consecutive calls using different tag selections?

mix test --only first
mix test --except first --except last
mix test --only last

I suppose it has the same downside regarding parallel compilation though.

@josevalim
Copy link
Member

@sabiwara the goal is to find dependency between tests, so separate runs won't help us. :(

@josevalim
Copy link
Member

Good idea with :queue. In the meantime I added some code to append only in case we know the next modules required are going to be ones that should be appended. I'll try a queue next.

If you don't want to use a :queue, another idea is to have a "prepend/append" mode inside the ExUnit server and swap accordingly, but :queue seems to be the semantically correct operation anyway.

Comment on lines +94 to +99
async_modules = :queue.to_list(state.async_modules) |> Enum.uniq() |> :queue.from_list()
sync_modules = :queue.to_list(state.sync_modules) |> Enum.uniq() |> :queue.from_list()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really pretty

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw @josevalim speaking about reducing complexity, I've seen that you changed the server to append tests instead of prepending them in b665ddd, so the changes in ExUnit.Server are not necessary any more. The :queue may be a little bit faster, so keep it or not?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did I? I think the code is still prepending, since &1 (the value in state) is on the right side?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well never mind, you're right of course

@SteffenDE
Copy link
Contributor Author

Maybe it would be more helpful to have the option to enforce a deterministic compilation order for tests? At least as for CI, where the speed difference might not matter for smaller projects? I feel like this could make reproducing such problems much easier, even without --sort-first/last.

@josevalim
Copy link
Member

@SteffenDE we could make it so --trace uses Enum.map+Code.require_file but i wouldn't do it by default because it will definitely slow down test suite runs?

And if we use the seed to Enum.shuffle the files before requiring them, perhaps we don't need these options? Another option is to introduce a specific --trace-require that traces and logs each file loaded serially (i.e. the same as above but not attached to --trace).

It depends on what you feel would be easier to expose those Phoenix bugs. Thoughts?

# require one file at a time
failed =
for file <- test_files do
do_require(false, [file], parallel_require_callbacks, warnings_as_errors?)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can use Code.require_file instead... but I wonder if that largely changes the semantics of the code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it to still use the ParallelCompiler because of the parallel_require_callbacks and the warnings_as_errors? option (134eace), otherwise --stale or --warnings-as-errors won't work if combined with --sort-first/last or --trace-require.

@@ -278,13 +280,42 @@ defmodule ExUnit.Runner do
end
end

to_run = sort_first_last(config, to_run)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think these changes are necessary? We only sort files, not actual tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we only want to support sorting files, then yes. But as I am using the existing ExUnit filters, I saw the opportunity to also support --sort-first=mytag to sort tests themselves.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Let's start with either tags or files first. I think files will reduce complexity, so that gets my vote. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm so it would only remove the changes in ExUnit.Runner, but then I'd probably need to validate that the parameters given to --sort-first/last are only using file: filters, or would you recommend not using the ExUnit filter syntax and directly pass files:

Instead of:

mix test --sort-first file:path/to/test.exs --sort-first file:path/to/other/test.exs

just

mix test --sort-first path/to/test.exs --sort-first path/to/other/test.exs

?

Maybe test complexity would be reduced, I'll need to look at those.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly, just:

mix test --sort-first path/to/test.exs --sort-first path/to/other/test.exs

@SteffenDE
Copy link
Contributor Author

I've been thinking about another flag that would make debugging flaky tests caused by global state easier: --force-sync. This could force all tests to be added as sync, even if they are marked as async.

Combining this with --trace-require and a --seed could make such tests pretty much reproducible. I'm not so sure about the benefits of --sort-first/last, but those could still be added as advanced features.

So maybe it makes sense to split this:

  1. Add --trace-require to force a deterministic compilation order
  2. Add --force-sync to force a deterministic run order
  3. Optionally add --sort-first/last to influence compilation and/or run order
  4. Optionally change ExUnit.Server to use a :queue instead of appending to a list

Properly naming the flags can be further discussed. Maybe --sync-require would be better (sync vs parallel require).

@josevalim
Copy link
Member

--force-sync. This could force all tests to be added as sync, even if they are marked as async.

Wouldn't it be the same as --trace or setting --max-cases 1?

Add --trace-require to force a deterministic compilation order

It seems this is definitely a minimum requirement. I am afraid, however, this is not enough for reproducing a workflow. For example, imagine CI fails. --trace-require with a given seed would make it deterministic but not necessarily run in the order that made CI fail. So you may still want to have --sort-first and --sort-last?

Maybe, however, the flag we need to add is --deterministic. This sets max_cases: 1 and disables parallel requires? And steps 3 and 4?

@SteffenDE
Copy link
Contributor Author

Wouldn't it be the same as --trace or setting --max-cases 1?

Not exactly the same, as even with --max-cases 1 async tests would run before sync tests. But I guess you're right that --max-cases 1 should be enough together with sequential compilation. So yeah no need for --force-sync.

For example, imagine CI fails. --trace-require with a given seed would make it deterministic but not necessarily run in the order that made CI fail.

Exactly. It only helps for runs with the same settings. But it's still a win, as you can then try running locally until you can hopefully reproduce it with --deterministic and then say to a colleague "just run mix test --seed 1234 --deterministic to reproduce". Or, if the slowdown is acceptable also set it on CI, at least until the cause for the flaky tests is found and fixed :)

So I'd propose the following:

  1. Change ExUnit.Server to use a :queue for FIFO ordering
  2. Add --sync-require / --trace-require / --sequential-require which only enforces sequential require
  3. Add --deterministic which sets the prior and --max-cases 1 (can be the same PR as 2)
  4. Think more about --sort-first/last.

After all this I'm not fully convinced that --sort-first/last is truly that helpful. Imagine the case with Phoenix where mix compile in a mix task test messed up the code paths. We would not detect this with --sort-first as the paths are still good. And in my testing we also don't run into it if we run the affected test last, as there are other tests in between that seem to fix it again.
So I'm not sure if it's worth the complexity. Running the test suite with --deterministic multiple times until it (hopefully) fails might be just enough.

@josevalim
Copy link
Member

I think we only need to add --deterministic, which sets sequential requires AND --max-cases 1, without adding a flag for sequential require. If you want sequential requires and another max cases, you can always set max cases explicitly.

Change ExUnit.Server to use a :queue for FIFO ordering

Do we even need the queue for FIFO ordering? If they are loaded in order, it is deterministic anyway. I think we only need FIFO ordering if we add --sort-first and --sort-last.

So overall, I think it is:

  1. Add --deterministic which sets sequential requires and --max-cases 1 (can be the same PR as 2)
  2. Think more about --sort-first/last (which would require FIFO ordering)

@SteffenDE
Copy link
Contributor Author

SteffenDE commented Jun 2, 2024

Do we even need the queue for FIFO ordering? If they are loaded in order, it is deterministic anyway. I think we only need FIFO ordering if we add --sort-first and --sort-last.

I don't think so. Imagine a test (let's call if FooTest) that takes a non-deterministic amount of time to run. Maybe it does a database operation or something. For now let's assume that it sometimes takes 1 second and sometimes up to 5. And as compilation happens concurrently we could have:

  1. FooTest is compiled and because it's async it is immediately started. It takes 1 second to run.
  2. In this 1 second two more tests are compiled. First BarTest is prepended to the list, then BazTest.

The order of test runs now is: FooTest, BazTest, then whatever is last compiled while BazTest runs, ...

Now another run, FooTest takes 5 seconds to run. While FooTest runs, more than two other tests are compiled. The order of test runs is: FooTest, LastCompiledTest, SecondLastCompiledTest, ..., BazTest, BarTest

So for deterministic order either compilation would need to wait for each test to complete, or we need a FIFO order, right?

@josevalim
Copy link
Member

So for deterministic order either compilation would need to wait for each test to complete, or we need a FIFO order, right?

I see. That's why you also proposed --force-sync OR at least disable the feature that runs test modules as soon as they are loaded (these two options are quite similar but they are implemented differently).

It feels that loading tests and running tests as two distinct events, instead of concurrently, would be even better if we want to guarantee determinism?

So maybe we should add these two options:

  1. --force-sync - force all tests to run synchronously. The difference with --max-cases 1 is that --max-cases still runs tests as they are loaded. --force-sync only runs after they are loaded.

  2. --parallel-requires N - the concurrency for loading test files, set it to 1 for determinism (which should be easier to implement, we only need to add a max_concurrency to Kernel.ParallelCompiler?

@SteffenDE
Copy link
Contributor Author

It feels that loading tests and running tests as two distinct events, instead of concurrently, would be even better if we want to guarantee determinism?

I don't know if it's "better". --force-sync (I implemented it locally) definitely slows down a mix test run considerably, if combined with sequential requires. So if --max-cases 1 and FIFO order is good enough I would prefer that for --deterministic. Having --force-sync as another option could be nice, maybe?

Phoenix:

  • mix test --trace: Finished in 23.1 seconds (2.2s async, 20.8s sync)
  • mix test --force-sync --trace: Finished in 23.7 seconds (0.00s async, 23.7s sync)
  • mix test --force-sync --trace --trace-require: Finished in 30.3 seconds (0.00s async, 30.3s sync)

we only need to add a max_concurrency to Kernel.ParallelCompiler?

I like that! While initially implementing --sort-first I kind of expected to see such an option available already. No matter what we're ending up with here, I think that's a good addition.

@josevalim
Copy link
Member

So let's do --max-requires N (to mirror --max-cases) and FIFO. That should add more determinism and you can pair it with --max-cases too. Let's have --force-sync/--sort-first/--sort-last for after.

@SteffenDE
Copy link
Contributor Author

That sounds good to me! I’ll close this one and work on new PRs. Thank you so much for the guidance :)

@SteffenDE SteffenDE closed this Jun 2, 2024
SteffenDE added a commit to SteffenDE/elixir that referenced this pull request Jun 4, 2024
In order to have more deterministic test runs when using `--max-cases 1`
and `--max-requires 1` (elixir-lang#13635)
(see also elixir-lang#13589), we need to
run tests in compilation order (FIFO).

In the past, ExUnit.Server appended new tests to the front of a list,
which would result in the most recently added test to be run first.

Let's quickly demonstrate the problem this causes for deterministic runs
with a simple example:

Imagine a test (let's call if FooTest) that takes a non-deterministic
amount of time to run. For now let's assume that it sometimes takes 1
second and sometimes up to 5. And as async tests execute in parallel with
compilation of other test files, we could have the following scenario:

FooTest is compiled and because it's async it is immediately started.
It takes 1 second to run.
In this 1 second two more tests are compiled. First BarTest is prepended
to the list, then BazTest.
The order of test runs now is:

FooTest, BazTest, then whatever is last compiled while BazTest runs, ...

Now another run, FooTest takes 5 seconds to run.
While FooTest runs, more than two other tests are compiled.
The order of test runs is:

FooTest, LastCompiledTest, SecondLastCompiledTest, ..., BazTest, BarTest

This can be fixed either by appending new test modules to the end of the
list, or - and that's what this commit does - by using a `:queue`
instead.
josevalim pushed a commit that referenced this pull request Jun 5, 2024
In order to have more deterministic test runs when using `--max-cases 1`
and `--max-requires 1` (#13635)
(see also #13589), we need to
run tests in compilation order (FIFO).

In the past, ExUnit.Server appended new tests to the front of a list,
which would result in the most recently added test to be run first.

Let's quickly demonstrate the problem this causes for deterministic runs
with a simple example:

Imagine a test (let's call if FooTest) that takes a non-deterministic
amount of time to run. For now let's assume that it sometimes takes 1
second and sometimes up to 5. And as async tests execute in parallel with
compilation of other test files, we could have the following scenario:

FooTest is compiled and because it's async it is immediately started.
It takes 1 second to run.
In this 1 second two more tests are compiled. First BarTest is prepended
to the list, then BazTest.
The order of test runs now is:

FooTest, BazTest, then whatever is last compiled while BazTest runs, ...

Now another run, FooTest takes 5 seconds to run.
While FooTest runs, more than two other tests are compiled.
The order of test runs is:

FooTest, LastCompiledTest, SecondLastCompiledTest, ..., BazTest, BarTest

This can be fixed either by appending new test modules to the end of the
list, or - and that's what this commit does - by using a `:queue`
instead.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants