Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallelize tests only when there are enough to justify the parallelization overhead #42761

Merged
merged 1 commit into from Jul 14, 2021

Conversation

jorgemanrubia
Copy link
Contributor

@jorgemanrubia jorgemanrubia commented Jul 12, 2021

Parallelizing tests has a cost in terms of database setup and fixture loading. This change makes Rails disable parallelization when the number of tests is below a configurable threshold.

When running tests in parallel each process gets its own database instance. On each execution, each process will update each database schema (if needed) and load all the fixtures. This can be very expensive for non trivial datasets.

As an example, for HEY, when running a single file with 18 tests, running tests in parallel in my box adds an overhead of 13 seconds versus not parallelizing them. Of course parallelizing is totally worthy when there are many tests to run, but not when running just a few tests.

The threshold is configurable via config.active_support.test_parallelization_minimum_number_of_tests, which is 30 50 by default.

This also adds some tracing to know how tests are being executed:

When in parallel:

Running 2829 tests in parallel using 8 processes

When not in parallel:

Running 15 tests in a single process (parallelization threshold is 30)

Pending:

  • Document
  • Changelog
  • Tests
  • Refine default (30 is pretty much a wild guess). Update: see this.

cc @eileencodes

@ghiculescu
Copy link
Member

This would need a changelog entry. I'm a big fan of the concept.

@jorgemanrubia
Copy link
Contributor Author

jorgemanrubia commented Jul 12, 2021

Yes thanks @ghiculescu it's there in the "pending" list. I'll add :)

Copy link
Member

@eileencodes eileencodes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how I feel about this feature as it silently changes behavior for existing applications. I think if someone isn't explicitly turning off parallelization then there should be a warning that parallelization isn't turned on.

activesupport/lib/active_support/test_case.rb Outdated Show resolved Hide resolved
activesupport/CHANGELOG.md Outdated Show resolved Hide resolved
@jorgemanrubia jorgemanrubia force-pushed the smart-parallel-tests branch 2 times, most recently from 8daa191 to 6a41aa1 Compare July 12, 2021 20:37
@rails-bot rails-bot bot added the railties label Jul 12, 2021
Copy link
Member

@eileencodes eileencodes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good, I like that new warning and the method name changes. I think ParallelizeWhenSuitableExecutor should just be ParallelizeExecutor since it always gets initialized. Once that's fixed, can you also squash your commits into one?

Copy link
Member

@eileencodes eileencodes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a couple more comments after I hit enter on that last one, sorry for the notification noise.

I think that parallelization should always be on if PARALLEL_WORKERS is set above 1 because that means it's an explicit setting and I think it would be odd to have to change a config and an env var.

@jorgemanrubia jorgemanrubia force-pushed the smart-parallel-tests branch 2 times, most recently from 3cabec5 to 2de5298 Compare July 13, 2021 11:26
@jorgemanrubia
Copy link
Contributor Author

jorgemanrubia commented Jul 13, 2021

@eileencodes I think I addressed all your comments. I changed things a bit to move some configuration decisions into the new ParalellizeExecutor.

I still have pending to add some tests for this to wrap things up. Also I'll try to determine a more-informed default. I'll do that today or tomorrow.

Thanks for the review and feedback 🙏.

@jorgemanrubia jorgemanrubia force-pushed the smart-parallel-tests branch 2 times, most recently from 4ba649b to 93a3af3 Compare July 13, 2021 11:35
@jorgemanrubia
Copy link
Contributor Author

jorgemanrubia commented Jul 13, 2021

@dhh suggested the possibility of adapting the number of workers to the number of tests. E.g: 30 tests, 1 worker. 60 tests, 2 workers, etc. I think it would be great and it would be totally feasible with this implementation. I'd leave that for a future iteration though.

@jorgemanrubia jorgemanrubia changed the title Parallelize tests only when there are a number enough to make it worthy Parallelize tests only when there are enough to compensate the parallelization overhead Jul 13, 2021
@jorgemanrubia jorgemanrubia changed the title Parallelize tests only when there are enough to compensate the parallelization overhead Parallelize tests only when there are enough to justify the parallelization overhead Jul 13, 2021
@jorgemanrubia
Copy link
Contributor Author

jorgemanrubia commented Jul 14, 2021

I wrote this script for trying to determine a proper default threshold here. It measures and compares times for running tests in parallel versus running them sequentially. When running the script with HEY and Basecamp I got 79 and 85, respectively.

For measuring, I established a synthetic test duration of 100ms each test. If no delay, threshold goes ridiculously high (it's always faster to run sequentially).

I lean towards setting the default threshold conservatively at 50.

@jorgemanrubia jorgemanrubia force-pushed the smart-parallel-tests branch 4 times, most recently from 24de208 to 42f83a5 Compare July 14, 2021 11:23
Parallelizing tests has a cost in terms of database setup and fixture
loading. This change makes Rails disable parallelization when the number
of tests is below a configurable threshold.

When running tests in parallel each process gets its own database
instance. On each execution, each process will update each database
schema (if needed) and load all the fixtures. This can be very expensive
for non trivial datasets.

As an example, for HEY, when running a single file with 18 tests,
running tests in parallel in my box adds an overhead of 13 seconds
versus not parallelizing them. Of course parallelizing is totally worthy
when there are many tests to run, but not when running just a few tests.

The threshold is configurable via
config.active_support.test_parallelization_minimum_number_of_tests,
which is 30 50 by default.

This also adds some tracing to know how tests are being executed:

When in parallel:

```
Running 2829 tests in parallel in 8 processes
```

When not in parallel:

```
Running 15 tests in a single process (parallelization threshold is 30)
```
@jorgemanrubia
Copy link
Contributor Author

jorgemanrubia commented Jul 14, 2021

This is good to merge from my side @eileencodes

@eileencodes eileencodes merged commit c183397 into rails:main Jul 14, 2021
akshaymohite added a commit to akshaymohite/rails that referenced this pull request Jul 14, 2021
…test name.

- Related to rails#42761
- Used `parallelized?` method instead of calling a method `should_parallelize?` to figure out if parallezation is enabled.
- Fixed semantics of the test name corresponding to the change
- Updated test name as per the code review suggestion.
jorgemanrubia added a commit to basecamp/rails that referenced this pull request Jul 19, 2021
…test

rails#42761 made the old system of disable paralleling testing when
only one test file was included obsolete.
jeremy pushed a commit that referenced this pull request Jul 27, 2021
…test

#42761 made the old system of disable paralleling testing when
only one test file was included obsolete.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants