Skip to content
This repository has been archived by the owner on Jan 7, 2022. It is now read-only.

Use custom cachable fs.realpath implementation #20

Merged
merged 1 commit into from Jun 26, 2019
Merged

Conversation

isaacs
Copy link
Contributor

@isaacs isaacs commented Jun 10, 2019

(NB: this builds on top of #19 and should be landed afterwards)

In this use case, we don't care much about a lot of the stuff that
fs.realpath can (and should!) do. The only thing that's relevant to
reading a package tree is whether package folders are symbolic links,
and if so, where they point.

Additionally, we don't need to re-start the fs.lstat party every time we
walk to a new directory. While it makes sense for fs.realpath to do
this in the general case, it's not required when reading a package tree,
and results in a geometric explosion of lstat syscalls. For example, if
a project is in /Users/hyooman/projects/company/website, and it has 1000
dependencies in node_modules, then a whopping 6,000 lstat calls will be
made just to repeatedly verify that
/Users/hyooman/projects/company/website/node_modules has not moved!

In this implementation, every realpath call is cached, as is every
lstat. Additionally, process.cwd() is assumed to be "real enough", and
added to the cache initially, which means almost never having to walk
all the way up to the root directory.

In the npm cli project, this drops the lstat count from 14885 to 3054
for a single call to read-package-tree on my system. Larger projects,
or projects deeper in a folder tree, will have even larger reductions.

This does not account, itself, for a particularly large speed-up, since
lstat calls do tend to be fairly fast, and the repetitiveness means that
there are a lot of hits in the file system's stat cache. But it does
make read-package-tree 10-30% faster in common use cases.

@isaacs isaacs requested a review from zkat June 10, 2019 23:13
@coveralls
Copy link

coveralls commented Jun 10, 2019

Coverage Status

Coverage remained the same at 100.0% when pulling e9cd536 on isaacs/less-lstat into 4eed760 on master.

In this use case, we don't care much about a lot of the stuff that
fs.realpath can (and should!) do.  The only thing that's relevant to
reading a package tree is whether package folders are symbolic links,
and if so, where they point.

Additionally, we don't need to re-start the fs.lstat party every time we
walk to a new directory.  While it makes sense for fs.realpath to do
this in the general case, it's not required when reading a package tree,
and results in a geometric explosion of lstat syscalls.  For example, if
a project is in /Users/hyooman/projects/company/website, and it has 1000
dependencies in node_modules, then a whopping 6,000 lstat calls will be
made just to repeatedly verify that
/Users/hyooman/projects/company/website/node_modules has not moved!

In this implementation, every realpath call is cached, as is every
lstat.  Additionally, process.cwd() is assumed to be "real enough", and
added to the cache initially, which means almost never having to walk
all the way up to the root directory.

In the npm cli project, this drops the lstat count from 14885 to 3054
for a single call to read-package-tree on my system.  Larger projects,
or projects deeper in a folder tree, will have even larger reductions.

This does not account, itself, for a particularly large speed-up, since
lstat calls do tend to be fairly fast, and the repetitiveness means that
there are a lot of hits in the file system's stat cache.  But it does
make read-package-tree 10-30% faster in common use cases.
@isaacs isaacs merged commit e9cd536 into master Jun 26, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants