Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[jest-haste-map] reduce the number of lstat calls in node crawler #9514

Merged
merged 9 commits into from
Feb 5, 2020

Conversation

yepitschunked
Copy link
Contributor

Summary

fs.readdir has a withFileTypes option that will tell us if the file is a symlink or directory. Therefore, there's no need to make and wait for an lstat call before taking the appropriate actions for those file types (skip and recurse, respectively).

This drastically improves the initial crawl time when using the node watcher.

Test plan

Applied changes and ran it against my project. The activeCalls counter is a little tricky, but I'm fairly certain I've handled it correctly here since I'd expect an infinite loop if I got it wrong.

@yepitschunked yepitschunked requested review from SimenB and thymikee and removed request for SimenB February 4, 2020 01:34
@facebook-github-bot
Copy link
Contributor

Hi wonnage! Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file.In order for us to review and merge your code, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

@yepitschunked yepitschunked changed the title [node crawler] reduce the number of lstat calls [jest-haste-map] reduce the number of lstat calls in node crawler Feb 4, 2020
@yepitschunked
Copy link
Contributor Author

Hmm, it looks like this option only exists in node > 10 - will add a check for that.

@facebook-github-bot
Copy link
Contributor

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!

Copy link
Member

@SimenB SimenB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome, thank you!

packages/jest-haste-map/src/crawlers/node.ts Outdated Show resolved Hide resolved
packages/jest-haste-map/src/crawlers/node.ts Show resolved Hide resolved
@SimenB
Copy link
Member

SimenB commented Feb 4, 2020

@wonnage if you missed it - this is failing unit tests, on both old and new versions of node, with the same error. Probably just an error in the tests, but still 🙂

On that note, it would be nice to add some more unit tests, maybe spying on fs.lstat, to verify it's called (or not) depending on node version.

@codecov-io
Copy link

codecov-io commented Feb 4, 2020

Codecov Report

Merging #9514 into master will increase coverage by 0.02%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #9514      +/-   ##
==========================================
+ Coverage   65.01%   65.04%   +0.02%     
==========================================
  Files         283      283              
  Lines       12148    12155       +7     
  Branches     3009     3013       +4     
==========================================
+ Hits         7898     7906       +8     
  Misses       3607     3607              
+ Partials      643      642       -1
Impacted Files Coverage Δ
packages/jest-haste-map/src/crawlers/node.ts 85.54% <100%> (+2.64%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 964dc16...9f0aae7. Read the comment docs.

setTimeout(
() =>
callback(null, [
{
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case you're wondering - I mirrored the existing mocks to look like a fake fs.Dirent object here. I thought about coming up with something fancy to avoid this duplication (e.g defining the mocks in a map and then reduceing it to get strings/dirents as needed) but it seemed more complicated than it was worth.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, let's try to keep it simple here

{
isDirectory: () => false,
isSymbolicLink: () => true,
name: 'symlink',
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also noticed that there weren't any existing tests for symlinks that I could find, so I added one here. It doesn't change the result of any existing tests (since we ignore symlinks) but will affect the lstat call-counting tests I added.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nice, thank you! we'll hopefully be landing some symlink support soonish, so these tests should come in handy

Copy link
Member

@SimenB SimenB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM, I like it! Would love some other reviewers to take a look before merging, especially @cpojer

@cpojer
Copy link
Member

cpojer commented Feb 5, 2020

Looks good, although I'm curious about the actual speed gains.

Fwiw, if you rely on this speed up, I recommend installing watchmen (if you aren't on Windows).

@cpojer cpojer merged commit c4ae594 into jestjs:master Feb 5, 2020
@yepitschunked
Copy link
Contributor Author

Looks good, although I'm curious about the actual speed gains.

Fwiw, if you rely on this speed up, I recommend installing watchmen (if you aren't on Windows).

@cpojer yeah, we're doing some weird stuff with running metro in a OSX docker container, which seems to have super slow file IO. We can ignore large swaths of our monorepo to speed this up, but watchman doesn't support wildcard ignore_dirs, and we don't want to manually enumerate all of them. So it winds up taking forever crawling and hashing everything (since metro wants sha1s).

@cpojer
Copy link
Member

cpojer commented Feb 5, 2020

Historically haste map performance in docker has been really bad. It’s quite possible that changing strategy may lead to significant wins.

@cpojer
Copy link
Member

cpojer commented Feb 5, 2020

Also, any chance you could share before and after benchmarks for this change?

@yepitschunked
Copy link
Contributor Author

Also, any chance you could share before and after benchmarks for this change?

@cpojer I don't have anything scientific, but it was previously taking ~90s to scan our repo and this change cut it down to 30s. We're watching roughly 500k files (mostly in node_modules).

@cpojer
Copy link
Member

cpojer commented Feb 6, 2020

Nice!!

@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Please note this issue tracker is not a help forum. We recommend using StackOverflow or our discord channel for questions.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 11, 2021
@jestjs jestjs unlocked this conversation Feb 5, 2022
Comment on lines +69 to +70
// This logic is unnecessary for node > v10.10, but leaving it in
// since we need it for backwards-compatibility still.
Copy link
Member

@SimenB SimenB Feb 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yepitschunked we're about to drop node 10 (#12220), which part of the logic does this refer to? The entire stat or just specific parts of it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this PR is old, but would love your comment over in that PR 🙂

@jestjs jestjs locked as resolved and limited conversation to collaborators Feb 5, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants