TestSequencer heuristic using dependency tree file sizes #7553

jeysal · 2018-12-26T21:57:28Z

Summary

In https://www.youtube.com/watch?v=3YDiloj8_d0, @cpojer mentioned that the TestSequencer heuristics aside from failures/durations from previous runs could use some improvements.
This PR improves the existing "file size" heuristic, which is used as a fallback when no information from previous runs is available, by calculating the size of a test's whole dependency tree so that more complex tests are scheduled first.
Note that this PR does not touch on the idea of "test priority" based on changed files that is also mentioned in the video.
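
As a rough illustration of the heuristic described in this summary (not the code from this PR), the sketch below sums the sizes of a test file and everything it transitively depends on, then sorts tests so the largest trees come first. The function names and the two `Map` inputs (`dependencies`, `fileSizes`) are hypothetical stand-ins for what Jest's resolver and file system layer would provide.

```javascript
// Hypothetical helper: total size in bytes of a test file plus all files it
// transitively depends on. `dependencies` maps a file to its direct
// dependencies; `fileSizes` maps a file to its size. Both are assumptions
// made for this sketch, not Jest APIs.
function dependencyTreeSize(testPath, dependencies, fileSizes) {
  const seen = new Set();
  const pending = [testPath];
  let total = 0;
  while (pending.length > 0) {
    const file = pending.pop();
    if (seen.has(file)) continue; // count each file once, even in cycles
    seen.add(file);
    total += fileSizes.get(file) || 0;
    for (const dep of dependencies.get(file) || []) pending.push(dep);
  }
  return total;
}

// Sort a copy of the test list so tests with the largest trees run first.
function sortBySizeDescending(testPaths, dependencies, fileSizes) {
  const sizes = new Map(
    testPaths.map(p => [p, dependencyTreeSize(p, dependencies, fileSizes)]),
  );
  return [...testPaths].sort((a, b) => sizes.get(b) - sizes.get(a));
}
```

With only the test file's own size, a 500-byte test with no imports would be scheduled before a 100-byte test that pulls in a large library; summing the tree reverses that order.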

Test plan

Edit: See comments below for more up-to-date information.

I used this script on a few small & large open source projects that came to my mind that use Jest.
The script diffs the test order without a cache (determined by the file size heuristic) against the test order after one full Jest run (determined by the test durations) and counts the number of lines that diff prints. This serves as a rough approximation of how close we got to the best test order (lower is better).

project: master (median of 4 runs) / this PR (median of 4 runs)
redux: 14 / 12
downshift: 27 / 25
recompose: 75.5 / 75
gatsby: 225.5 / 225
react: 370 / 363.5
jest: 598 / 585.5
prettier: 1816 / 1812.5

The difference might not seem like a lot (and tbh I initially hoped for more), but in hindsight I actually think this is pretty good. The larger projects showed quite consistently improved numbers. I hacked together the alternative approaches "counting test/it occurrences" and "number of files in dependency graph" and they showed no significant improvements over master at all, so I think this is pretty much as good as you can get with a reasonably simple heuristic.

cpojer · 2018-12-27T01:29:12Z

Wow this is awesome. I’ll take a closer look after the holidays. Thank you for building this, and nice wins!

jeysal · 2018-12-27T20:28:29Z

fixed tests on Windows (hopefully)

codecov-io · 2018-12-27T20:51:30Z

Codecov Report

Merging #7553 into master will increase coverage by 0.06%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #7553      +/-   ##
==========================================
+ Coverage   68.27%   68.33%   +0.06%     
==========================================
  Files         252      252              
  Lines        9682     9703      +21     
  Branches        6        5       -1     
==========================================
+ Hits         6610     6631      +21     
  Misses       3070     3070              
  Partials        2        2

Impacted Files	Coverage Δ
packages/jest-cli/src/TestSequencer.js	`100% <100%> (ø)`	⬆️
packages/jest-resolve-dependencies/src/index.js	`98.14% <100%> (+0.42%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

jeysal · 2018-12-28T13:38:51Z

I just realized that the metric I used to measure the gains is garbage.
Diffing the test order before and after always produces roughly 2 × (number of test files) lines of output.
I'll come up with a better metric and then try a few heuristic approaches again.
Marking this as WIP meanwhile

jeysal · 2018-12-31T00:10:07Z

Alright, so the new metric in the gist sums, over all test files, the difference between each file's index in the cold sequence and its index in the warm sequence.
The current version after adding handling for tests that depend on child_process, fs etc. produces the following results (all medians of 5 runs):

          downshift  redux  recompose  gatsby  prettier  react  jest
master    20         14     228        2862    188328    5550   28860
PR        20         12     228        2312    187066    6854   17352
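
For readers who want to reproduce numbers like these, here is a minimal sketch of the index-difference metric described above. It assumes the absolute difference is used and that both sequences contain the same set of test files; `orderDistance` is a hypothetical name, and the gist's actual script may differ.

```javascript
// Hypothetical re-implementation of the metric, assuming absolute differences.
// coldOrder: test paths as ordered by the heuristic (no cache).
// warmOrder: test paths as ordered after a full run (by duration).
function orderDistance(coldOrder, warmOrder) {
  const warmIndex = new Map(warmOrder.map((path, i) => [path, i]));
  let total = 0;
  coldOrder.forEach((path, i) => {
    // Assumes every path in coldOrder also appears in warmOrder.
    total += Math.abs(i - warmIndex.get(path));
  });
  return total;
}
```

Identical sequences score 0, and the score grows as files move further from their warm-run position, so lower is better.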

SimenB · 2018-12-31T09:59:44Z

What's up with React? Do you know if any test(s) in particular fools the heuristic?

jeysal · 2018-12-31T13:41:12Z

packages/jest-cli/src/TestSequencer.js

+// If a test depends on one of these core modules,
+// we assume that the test may be slower because it is an "integration test"
+// that spawns child processes, accesses the file system, etc.
+const coreModuleWeights = {


Any ideas on other core modules we could include here?
I just picked some arbitrary modules that came to mind, and they caused quite an improvement. Unfortunately, running the benchmark on a few projects takes quite some time, so I can't try out all the modules.

https if http is worth it

Can we extend this comment with information on how 10000 differs from 100000? It's not obvious whether the weight is a size in bytes or something else.

jeysal · 2018-12-31T13:41:48Z

> What's up with React? Do you know if any test(s) in particular fools the heuristic?

I'll take a look

jeysal · 2018-12-31T14:05:30Z

Looks like for the most part the new heuristic overestimates the duration of the react-dom tests and severely underestimates the duration of the react-reconciler as well as some ESLint-related tests.

I think it's just that ReactDOM is huge (almost all the ReactDOM tests have size >800000 in recursive dependencies) and most of its tests depend on the whole thing, but of course don't use all of it.
Most of the reconciler tests are between 600000 and 800000.

cpojer · 2019-01-02T01:43:37Z

I just took a look at this and I love that it brings real speed ups for some projects, that's awesome. I'm a little bit worried about projects with large amounts of tests and gigantic dependency trees as we'd stat for the file size of the entire repo every single time Jest is invoked. What if we cap the file size calls at one or two levels in? What if we apply some weight to tests with large dependency trees (if a file has more dependencies than 90% (?) of all tests, it is probably slower, etc.)? I want us to be conscious about the time Jest spends optimizing a test run, as it can be counterproductive on really large repos.
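
One way to read the "cap at one or two levels" idea is a breadth-first traversal that stops after a fixed depth. The sketch below is only an illustration under that reading; `cappedTreeSize` and its `Map` inputs are hypothetical and not part of Jest.

```javascript
// Hypothetical depth-capped variant of the dependency size calculation.
// maxDepth = 0 counts only the test file itself, 1 adds its direct
// dependencies, 2 adds their dependencies, and so on.
function cappedTreeSize(testPath, dependencies, fileSizes, maxDepth) {
  const seen = new Set([testPath]);
  let frontier = [testPath];
  let total = 0;
  for (let depth = 0; depth <= maxDepth && frontier.length > 0; depth++) {
    const next = [];
    for (const file of frontier) {
      total += fileSizes.get(file) || 0;
      for (const dep of dependencies.get(file) || []) {
        if (!seen.has(dep)) {
          seen.add(dep); // never visit a file twice, even with cycles
          next.push(dep);
        }
      }
    }
    frontier = next;
  }
  return total;
}
```

The cap bounds the number of files touched per test, at the cost of ignoring size that sits deeper in the tree.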

SimenB · 2019-01-02T08:38:31Z

> I'm a little bit worried about projects with large amounts of tests and gigantic dependency trees as we'd stat for the file size of the entire repo every single time Jest is invoked.

Thoughts on my suggestion to stick it in hastefs? We already stat as part of jest-haste-map's crawl. Or would that use too much memory? A number is just 64 bits so maybe not

cpojer · 2019-01-02T08:47:15Z

I’m in favor of that as long as it doesn’t slow down the watchman call, but watchman should be able to give us the file size.

SimenB · 2019-01-02T08:49:09Z

Yeah, watchman includes the file size, while the node crawler (and watch) includes stat

jeysal · 2019-01-02T12:59:13Z

Alright, I can try to implement @SimenB's suggestion to eliminate the I/O overhead.
If we still have problems with the actual computation time it takes to recurse into the dependency graph, I see two possible optimizations.

The easy one: Cache _fileSizeRecurseDependencies results (because it is used in the sort comparator, it is called with the same path multiple times).

The hard one: Implement a proper algorithm to calculate all the recursive sizes, something like:

1. Condense the dependency graph to a DAG of strongly connected components - O(|V|+|E|)
2. Traverse from leaf nodes to root nodes, setting the size of each node to the sum of its successors plus its own file size(s) - O(|V|+|E|) because of topological sort
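
The two steps of "the hard one" could look roughly like the sketch below. It uses Tarjan's algorithm for the condensation, which is one possible choice and not something this thread specifies; `recursiveSizes`, `graph` and `fileSizes` are hypothetical names. Note that summing successor sizes, as described, counts a dependency shared by two branches once per branch, so the result can exceed the true size of the reachable set.

```javascript
// Hypothetical sketch. graph: Map<file, file[]> of direct dependencies;
// fileSizes: Map<file, number>. Returns Map<file, recursive size>.
function recursiveSizes(graph, fileSizes) {
  // Step 1: Tarjan's algorithm groups files into strongly connected components.
  let nextIndex = 0;
  const index = new Map();
  const lowlink = new Map();
  const stack = [];
  const onStack = new Set();
  const componentOf = new Map();
  const components = []; // emitted with dependencies before dependents

  // Recursive for brevity; very deep graphs would need an iterative version.
  const visit = node => {
    index.set(node, nextIndex);
    lowlink.set(node, nextIndex);
    nextIndex++;
    stack.push(node);
    onStack.add(node);
    for (const dep of graph.get(node) || []) {
      if (!index.has(dep)) {
        visit(dep);
        lowlink.set(node, Math.min(lowlink.get(node), lowlink.get(dep)));
      } else if (onStack.has(dep)) {
        lowlink.set(node, Math.min(lowlink.get(node), index.get(dep)));
      }
    }
    if (lowlink.get(node) === index.get(node)) {
      // node is the root of a component: pop its members off the stack.
      const members = [];
      let member;
      do {
        member = stack.pop();
        onStack.delete(member);
        componentOf.set(member, components.length);
        members.push(member);
      } while (member !== node);
      components.push(members);
    }
  };
  for (const node of graph.keys()) {
    if (!index.has(node)) visit(node);
  }

  // Step 2: because components are emitted leaves-first, one pass in
  // emission order sees every successor's size before it is needed.
  const componentSize = [];
  components.forEach((members, id) => {
    let size = 0;
    const successors = new Set();
    for (const member of members) {
      size += fileSizes.get(member) || 0;
      for (const dep of graph.get(member) || []) {
        const depId = componentOf.get(dep);
        if (depId !== id) successors.add(depId);
      }
    }
    for (const depId of successors) size += componentSize[depId];
    componentSize[id] = size;
  });

  const result = new Map();
  for (const [node, id] of componentOf) result.set(node, componentSize[id]);
  return result;
}
```

Files in the same cycle share one component and therefore get the same recursive size.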

cpojer · 2019-01-03T01:02:43Z

Let's see where we land with file sizes stored in the haste map first, and then we can optimize things in a separate PR further :)

jeysal · 2019-01-05T14:05:38Z

opened #7580 for storing file sizes in haste map file metadata

jeysal · 2019-01-09T13:05:31Z

Rebased on master, integrating the hasteFS file sizes.
So do we merge it this way and then see if anyone complains about experiencing a significant performance hit?
On projects like React, the TestSequencer does noticeably take a few hundred ms (only on cold runs ofc), but those are also the projects where the heuristic can achieve the biggest gains, for example a few seconds for React by, umm, not scheduling the longest running test ReactClassEquivalence as one of the last tests to run like Jest on master currently does :D

jeysal · 2019-01-09T16:02:54Z

@SimenB implemented both of your suggestions, thanks!

SimenB · 2019-01-09T16:16:26Z

packages/jest-cli/src/TestSequencer.js

+
+    const fileSize = (path): number => {
+      const cachedSize = sizes.get(path);
+      if (cachedSize != null) {


Suggested change

-      if (cachedSize != null) {
+      if (sizes.has(path)) {

The problem with that is the Flow types - Flow will think the size can be null.

> The problem with that is the Flow types - Flow will think the size can be null

Not sure why you need cachedSize at all, sizes is always defined and you should be good using @SimenB's code (plus removing line 79), no?

This is what I mean. (linked issue is for TS, but applies to Flow as well).
The return type of Map<Path, number>.get is number | void, not number, even if you've done a has check previously.
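
To make the disagreement concrete, here is a small untyped sketch of the cache pattern under discussion. `makeCachedSize` and `computeSize` are hypothetical names for illustration; the Flow annotations are omitted, with a comment marking where the `number | void` issue arises.

```javascript
// Hypothetical wrapper that memoizes a size lookup in a Map.
function makeCachedSize(computeSize) {
  const sizes = new Map();
  return path => {
    // Reading via get() and checking the value lets a type checker narrow it
    // to `number`. A prior sizes.has(path) check would not narrow the result
    // of a later sizes.get(path) call, which stays `number | void`.
    const cachedSize = sizes.get(path);
    if (cachedSize != null) {
      return cachedSize; // also taken for a cached size of 0
    }
    const size = computeSize(path);
    sizes.set(path, size);
    return size;
  };
}
```

The `!= null` comparison treats only `null` and `undefined` as a cache miss, so a legitimately cached `0` is still returned from the cache.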

rubennorte · 2019-01-09T16:48:16Z

I'll release a new alpha now and test it internally to check the impact of the size addition to the haste map. I'll come back to you once it's done.

thymikee · 2019-01-09T17:26:05Z

packages/jest-cli/src/TestSequencer.js

+  _fileSizeRecurseDependencies(test: Test, sizes: Map<Path, number>): number {
+    const {resolver, hasteFS, config} = test.context;
+
+    const fileSize = (path): number => {


by passing sizes to this fn, you could extract it outside of _fileSizeRecurseDependencies, so it doesn't have to be recreated multiple times

Note that it also uses resolver and hasteFS, so fileSize would need the whole test instead of just the path as its argument as well. Not sure if that's worth it?
Edit: Actually the resolver and hasteFS themselves as arguments, because fileSize is also called on non-test paths.

Yup, so it can be a class method, right?

Converted it to a class method, although IMHO it's less readable now.

thymikee · 2019-01-09T17:46:45Z

packages/jest-cli/src/TestSequencer.js

+
+    const fileSize = (path): number => {
+      const cachedSize = sizes.get(path);
+      if (cachedSize != null) {


> The problem with that is the Flow types - Flow will think the size can be null

Not sure why you need cachedSize at all, sizes is always defined and you should be good using @SimenB's code (plus removing line 79), no?

jeysal · 2019-01-16T22:33:33Z

> test it internally to check the impact of the size addition to the haste map

@rubennorte do you have any news on this yet? :)

rubennorte · 2019-01-23T11:47:02Z

@jeysal sorry it took me so long. The change in the haste map is good perf-wise. I'd wait for the release of 24 before merging this. We can release it as an alpha to test it and then as a minor if everything looks good. Thanks for working on this!

jeysal · 2019-01-23T11:49:56Z

Agreed, best to include this in a minor, no reason for it to be in 24.0 👍

jeysal · 2019-01-27T14:46:34Z

Rebased on master.
@rubennorte do you have a Jest project at FB with an absurdly large (much bigger than React) dependency tree and amount of tests that you could use to check if sorting the tests on an empty cache takes too long? It would be noticeable because the gray "Determining test suites to run..." message appears longer than it does with Jest master.

jeysal · 2019-02-07T21:59:00Z

To avoid this stalling much longer: Given that there were no significant performance issues with the large open source projects using Jest that I tested, shall we merge it and address the potential perf improvements suggested if someone reports a problem with some project? (I'd give it a quick rebase) @SimenB

cpojer · 2019-02-08T09:09:09Z

Seems like this needs another rebase.

I would advise against merging this for now. I trust that perf is good, but we should verify this on the largest codebase using Jest in the world first, otherwise we'll stall Jest releases later. I'd prefer to stall a PR until we get around to testing it instead of stalling a release of Jest when we realize something is causing problems. Hope that makes sense to you.

jeysal · 2019-02-08T10:59:22Z

rebased

github-actions · 2021-05-11T18:13:01Z

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

facebook-github-bot added the cla signed label Dec 26, 2018

jeysal force-pushed the test-sequencer-dependency-file-sizes branch from 8e2a26a to 1e0f863 Compare December 26, 2018 22:02

SimenB reviewed Dec 27, 2018

packages/jest-cli/src/TestSequencer.js (outdated)

jeysal force-pushed the test-sequencer-dependency-file-sizes branch from 1e0f863 to 9e26ade Compare December 27, 2018 20:27

jeysal changed the title ~~TestSequencer heuristic using dependency tree file sizes~~ WIP TestSequencer heuristic using dependency tree file sizes Dec 28, 2018

jeysal force-pushed the test-sequencer-dependency-file-sizes branch from 9e26ade to 8ea7d91 Compare December 31, 2018 00:10

jeysal changed the title ~~WIP TestSequencer heuristic using dependency tree file sizes~~ TestSequencer heuristic using dependency tree file sizes Dec 31, 2018

SimenB reviewed Dec 31, 2018

packages/jest-cli/src/TestSequencer.js (outdated)

SimenB reviewed Dec 31, 2018

packages/jest-cli/src/__tests__/test_sequencer.test.js (outdated)

jeysal force-pushed the test-sequencer-dependency-file-sizes branch 2 times, most recently from c2817d8 to 9adb791 Compare December 31, 2018 13:37

jeysal commented Dec 31, 2018

jeysal mentioned this pull request Jan 5, 2019

HasteFS file size #7580

Merged

jeysal force-pushed the test-sequencer-dependency-file-sizes branch from 9adb791 to 95d2c7a Compare January 9, 2019 12:35

jeysal force-pushed the test-sequencer-dependency-file-sizes branch from 95d2c7a to 2d93ff4 Compare January 9, 2019 16:00

SimenB requested review from cpojer, mjesun and rubennorte January 9, 2019 16:15

SimenB approved these changes Jan 9, 2019

SimenB requested review from thymikee and rickhanlonii January 9, 2019 16:26

thymikee reviewed Jan 9, 2019

jeysal force-pushed the test-sequencer-dependency-file-sizes branch 2 times, most recently from fe7a4cd to 350e12a Compare January 10, 2019 13:04

jeysal mentioned this pull request Jan 21, 2019

Custom slicing of tests #7672

Closed

jeysal force-pushed the test-sequencer-dependency-file-sizes branch from 350e12a to 07cbcc8 Compare January 27, 2019 12:55

jeysal added 5 commits February 8, 2019 11:52

dependency_resolver.test.js describe blocks

6950b1e

add DependencyResolver.resolveRecursive()

47d5477

add DependencyResolver includeCoreModules option

22fee13

use file sizes of all dependencies in TestSequencer

1ca41b5

add test case for ordering to list-tests e2e test

2920417

jeysal force-pushed the test-sequencer-dependency-file-sizes branch from 07cbcc8 to 2920417 Compare February 8, 2019 10:58

jeysal closed this Jun 20, 2019

github-actions bot locked as resolved and limited conversation to collaborators May 11, 2021
