
Unify worker module map transmission w/ small perf benefit. #8237

Merged

Conversation


scotthovestadt commented Mar 29, 2019

Summary

This PR unifies the way module maps are passed to the worker. Previously, we did it one way for watch mode and a different way for non-watch mode because our watch mode way was a lot slower.

I fixed that slowness for watch mode and realized while doing some performance and memory profiling that the watch mode way is now actually faster on a few levels:

  1. It's straight-up faster to transmit it to the process because the module map is significantly smaller than the whole haste map you have to deserialize if you get at it via the file.
  2. If you load the whole haste map and want to discard half of it, suddenly there is a bunch of stuff that will need to be GC'd in the future. This happens in the worker because it only wants the module map but has to deserialize the whole file.
  3. Not requiring the haste map to be written to disk at this point opens up further optimizations in the future.
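The unified approach can be sketched roughly as follows (a minimal sketch with hypothetical names like ModuleMapSketch, not the actual Jest source): the module map converts its internal Map to a plain tuple array once, caches the result, and the same payload is handed to every worker, which rebuilds the Map on its side, instead of each worker deserializing the full haste-map file.

```typescript
// Hypothetical, simplified sketch of the idea; real module metadata is richer.
type ModuleMetaData = [path: string, type: number];

class ModuleMapSketch {
  // Cache the JSON-safe form so repeated worker-setup calls are cheap.
  private json: Array<[string, ModuleMetaData]> | undefined;

  constructor(private readonly raw: Map<string, ModuleMetaData>) {}

  get(name: string): ModuleMetaData | undefined {
    return this.raw.get(name);
  }

  toJSON(): Array<[string, ModuleMetaData]> {
    if (!this.json) {
      this.json = Array.from(this.raw.entries());
    }
    return this.json;
  }

  static fromJSON(json: Array<[string, ModuleMetaData]>): ModuleMapSketch {
    return new ModuleMapSketch(new Map(json));
  }
}

// "Parent" side: serialize once, reuse the same payload for every worker.
const parent = new ModuleMapSketch(
  new Map([['react', ['/node_modules/react/index.js', 0] as ModuleMetaData]]),
);
const payload = parent.toJSON();

// "Worker" side: rebuild the Map from the transmitted tuple array; no disk
// read and no discarded haste-map data left behind for the GC.
const workerCopy = ModuleMapSketch.fromJSON(payload);
```

Because only the module map crosses the process boundary, the worker never materializes (and never has to garbage-collect) the rest of the haste map.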

Here's a benchmark of running yarn jest packages/expect, meant to profile starting up some workers and running a couple tests. Each profile was run 10 times after 3 warm ups.

master

Time (mean ± σ): 3.902 s ± 0.120 s [User: 21.570 s, System: 5.105 s]
Range (min … max): 3.682 s … 4.084 s 10 runs

this branch

Time (mean ± σ): 3.522 s ± 0.175 s [User: 19.722 s, System: 4.777 s]
Range (min … max): 3.356 s … 3.897 s 10 runs

It's faster. It's less code with a unified code path. It opens up more optimizations in the future.

Test plan

  1. All tests pass.
  2. Benchmarks show better performance in all situations.

thymikee left a comment

🎉


thymikee commented Mar 29, 2019

Linter says that we can actually remove even more :D

@@ -36,6 +35,30 @@ export default class ModuleMap {
private readonly _raw: RawModuleMap;
private json: SerializableModuleMap | undefined;

private static mapToArrayRecursive(


scotthovestadt Mar 29, 2019


You're probably wondering: what is this?

Turns out the ModuleMap wasn't being serialized in watch mode correctly but tests didn't catch it because we had 2 code paths.

1 code path = bug caught and fixed.

@@ -65,43 +65,6 @@ test('injects the serializable module map into each worker in watch mode', () =>
});
});

test('does not inject the serializable module map in serial mode', () => {


scotthovestadt Mar 29, 2019


Pretty sure this code path no longer exists with my changes; please sanity-check my assumption.


codecov-io commented Mar 29, 2019

Codecov Report

Merging #8237 into master will decrease coverage by 0.04%.
The diff coverage is 20%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #8237      +/-   ##
==========================================
- Coverage   62.33%   62.28%   -0.05%     
==========================================
  Files         265      265              
  Lines       10553    10556       +3     
  Branches     2565     2563       -2     
==========================================
- Hits         6578     6575       -3     
- Misses       3387     3393       +6     
  Partials      588      588
Impacted Files Coverage Δ
packages/jest-haste-map/src/ModuleMap.ts 56.25% <0%> (-8.04%) ⬇️
packages/jest-runner/src/testWorker.ts 0% <0%> (ø) ⬆️
packages/jest-runner/src/index.ts 65.95% <100%> (-2.05%) ⬇️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e08be02...8d3a28f.

SimenB approved these changes Mar 29, 2019

SimenB left a comment

Less code and better perf is an awesome outcome! 😀


scotthovestadt commented Mar 29, 2019

@cpojer Would prefer you review before merge.

@scotthovestadt scotthovestadt requested a review from cpojer Mar 29, 2019
@SimenB SimenB requested a review from rubennorte Mar 29, 2019

cpojer commented Mar 29, 2019

The reason that we save the map to a file and read it from the worker is that I've found that on large module maps it used to take an insane amount of time to share it through workers. Doing it with two different implementations seemed like a good trade off. Before merging this, can you confirm this is actually faster on www and that it keeps scaling?


scotthovestadt commented Mar 29, 2019

@cpojer

Benchmarked against 11 test files (all of which are just a single test with a single assertion) on WWW. Ran old vs. new 10 times each to get a decent profile. --skipFilter on and cache warm before running each benchmark.

The overall run benchmarks 1.5% faster on average, which means this part alone is significantly faster, since worker setup is only a small slice of the total run time. 🥇

Other thoughts:

  • The benchmark doesn't even take into account the benefits of not needing to GC a lot of data loaded and discarded immediately, so the performance profile is slightly better than it seems.
  • Performance benefit is secondary. I'll take it with equal performance because this opens up future optimization opportunities and unifies code paths.

cpojer commented Mar 29, 2019

Oh wow, that's cool. Well, if you are confident, ship it.


scotthovestadt commented Mar 29, 2019

@cpojer I just ran it 2 more times to be sure. It's definitely a statistically significant improvement in terms of performance. Merging!

@scotthovestadt scotthovestadt merged commit 9c9555f into facebook:master Mar 29, 2019
10 of 11 checks passed
continuous-integration/travis-ci/pr The Travis CI build is in progress
ci/circleci: lint-and-typecheck Your tests passed on CircleCI!
ci/circleci: test-browser Your tests passed on CircleCI!
ci/circleci: test-jest-circus Your tests passed on CircleCI!
ci/circleci: test-node-10 Your tests passed on CircleCI!
ci/circleci: test-node-11 Your tests passed on CircleCI!
ci/circleci: test-node-6 Your tests passed on CircleCI!
ci/circleci: test-node-8 Your tests passed on CircleCI!
ci/circleci: test-or-deploy-website Your tests passed on CircleCI!
deploy/netlify Deploy preview ready!
facebook.jest #20190329.16 succeeded
private static mapFromArrayRecursive(
arr: ReadonlyArray<[string, unknown]>,
): Map<string, unknown> {
if (arr[0] && Array.isArray(arr[1])) {


pyBlob Sep 10, 2019

Why do you check for arr[1]? Is there some constraint that there are always at least 2 items in arr when going recursive?
Checking for arr[0][1] would make more sense.
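For illustration, here is a sketch of the recursive conversion being discussed, with the guard written the way pyBlob suggests (checking arr[0][1], the first tuple's value). This is a simplified, hypothetical version, not the actual Jest source: leaf values here are plain strings, whereas in the real module map the leaves are themselves arrays (module metadata), which is exactly the subtlety that makes the choice of guard tricky.

```typescript
// Flatten nested Maps into JSON-safe [key, value] tuple arrays.
function mapToArrayRecursive(
  map: Map<string, unknown>,
): Array<[string, unknown]> {
  const result: Array<[string, unknown]> = [];
  for (const [key, value] of map) {
    result.push([
      key,
      value instanceof Map
        ? mapToArrayRecursive(value as Map<string, unknown>)
        : value,
    ]);
  }
  return result;
}

// Rebuild the nested Maps on the receiving (worker) side.
function mapFromArrayRecursive(
  arr: ReadonlyArray<[string, unknown]>,
): Map<string, unknown> {
  // Guard in the spirit of pyBlob's suggestion: inspect the first tuple's
  // *value* (arr[0][1]) to decide whether this level holds serialized Maps.
  if (arr[0] && Array.isArray(arr[0][1])) {
    return new Map(
      arr.map(([key, value]): [string, unknown] => [
        key,
        mapFromArrayRecursive(value as ReadonlyArray<[string, unknown]>),
      ]),
    );
  }
  return new Map(arr);
}

const nested = new Map<string, unknown>([
  ['SomeModule', new Map<string, unknown>([['ios', '/path/to/file.js']])],
]);
const wire = mapToArrayRecursive(nested); // JSON-safe tuple arrays
const rebuilt = mapFromArrayRecursive(wire); // Maps again on the worker side
```

Note that with array-valued leaves, Array.isArray on a leaf would also return true, so a production version needs additional structure (e.g. known nesting depth) to tell a serialized Map apart from a metadata tuple.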
