
Perf regression in Node 22/24 when loading JS files #60397

@Tyoneb

Description

Version

v22.21.0

Platform

any

What steps will reproduce the bug?

There is a performance regression when loading JS files from a location with latency (e.g. a network path).

Unfortunately, I don't know how to provide a sample that reproduces it outside my own environment, but I think I identified the root cause in the Node.js code (see below).
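For what it's worth, a minimal harness like the sketch below should make the latency dependence visible when pointed at a file on a slow file system; the entry path shown in the comment is a placeholder, not something from my environment:

'use strict';
// Hypothetical repro harness: pass an entry point that lives on a
// high-latency file system (e.g. a network share) as the first argument.
const { performance } = require('node:perf_hooks');

const entry = process.argv[2]; // e.g. \\slow-server\share\app\deep\entry.js
const start = performance.now();
require(entry);
console.log(`require() took ${(performance.now() - start).toFixed(1)} ms`);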

How often does it reproduce? Is there a required condition?

The issue is not present in:

  • v20.19.5
  • v21.4.0

The issue is partially fixed in:

  • v22.18.0
  • v22.19.0

It is present in:

  • v21.5.0
  • v22.21.0
  • v24.10.0

What is the expected behavior? Why is that the expected behavior?

The expected behavior is to load files as fast as in v20.x.

What do you see instead?

Files take up to 4.5 times longer to load (in my environment).

Additional information

I cloned the repo and tested individual commits, reaching the following conclusions:

  1. The regression was introduced in v21.5.0 by src: move package_json_reader cache to c++ #50322.
  2. A first fix in v22.19.0 (src: add cache to nearest parent package json #59086) partially fixed the issue but introduced an increase in memory consumption.
  3. A second fix in v22.21.0 (src: reduce the nearest parent package JSON cache size #59888) resolved the memory issue but reintroduced the performance regression.

I went through the code of each PR and here is my understanding:

  1. The initial change src: move package_json_reader cache to c++ #50322 moved the search for a package.json file from the JS side to the C++ side. I'm not a C++ dev, but I believe part of the previous algorithm was lost in the move: the cache now only applies to the content of the package.json, not to the file system search. So when walking up the file tree, each possible location for the package.json is tested again and again, which becomes particularly painful when the package.json sits high up the tree (see the sketch after this list). That would explain very well why performance degraded when file system operations are costly, and why the issue was not seen in a purely local context.
  2. The first fix reintroduced caching on the JS side (which probably at least partially defeats the purpose of the initial change), but it duplicated the content of the package.json for each starting path, thus increasing memory consumption significantly.
  3. The second fix moved the file system search loop back to the JS side (which is kind of funny, because it completely rolls back the initial change) while still not caching the results of previous probes, thus bringing back the performance regression.
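
To illustrate point 1, here is a standalone sketch (not the actual Node.js internals; all names are made up for illustration). It simulates the walk-up search over an in-memory set of files, with and without remembering probe results, and counts the file-system-style probes:

const path = require('node:path');

// Uncached walk-up: every lookup re-probes each candidate directory between
// the start path and the root, which is what the current code seems to do.
function findPackageJSONUncached(startDir, existingFiles, counter) {
  let checkPath = startDir;
  while (true) {
    const candidate = path.join(checkPath, 'package.json');
    counter.probes++; // one file system probe per directory level
    if (existingFiles.has(candidate)) return candidate;
    const parent = path.dirname(checkPath);
    if (parent === checkPath) return undefined;
    checkPath = parent;
  }
}

// Cached walk-up: remember the outcome of every probe (found or not), so
// loading many modules under the same tree pays for each level only once.
const probeCache = new Map(); // candidate path -> boolean
function findPackageJSONCached(startDir, existingFiles, counter) {
  let checkPath = startDir;
  while (true) {
    const candidate = path.join(checkPath, 'package.json');
    let exists = probeCache.get(candidate);
    if (exists === undefined) {
      counter.probes++; // probe only on a cache miss
      exists = existingFiles.has(candidate);
      probeCache.set(candidate, exists);
    }
    if (exists) return candidate;
    const parent = path.dirname(checkPath);
    if (parent === checkPath) return undefined;
    checkPath = parent;
  }
}

// A package.json high up the tree, with modules loaded 10 levels below it.
const existingFiles = new Set([path.join('/repo', 'package.json')]);
const deepDir = path.join('/repo', ...Array(10).fill('sub'));
for (const [name, fn] of [['uncached', findPackageJSONUncached],
                          ['cached', findPackageJSONCached]]) {
  const counter = { probes: 0 };
  for (let i = 0; i < 100; i++) fn(deepDir, existingFiles, counter);
  console.log(`${name}: ${counter.probes} probes`); // 1100 vs 11 here
}

With 100 lookups starting 10 levels below the package.json, the uncached variant probes 1100 times where the cached one probes 11 times; each extra probe hurts proportionally to the file system's latency.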

I tested the following change in the findParentPackageJSON method of lib/internal/modules/package_json_reader.js, and it does resolve the performance issue:

const packageJSONLocationCache = new SafeMap();
// [...]
const maybePackageJSONPath = checkPath + path.sep + 'package.json';
// Reuse the result of an earlier probe of this candidate path, whether it
// found a package.json there or not.
if (packageJSONLocationCache.has(maybePackageJSONPath)) {
  if (packageJSONLocationCache.get(maybePackageJSONPath)) {
    return maybePackageJSONPath;
  }
  continue;
}
const stat = internalFsBinding.internalModuleStat(maybePackageJSONPath);

// internalModuleStat() returns 0 when the path is a regular file.
const packageJSONExists = stat === 0;
packageJSONLocationCache.set(maybePackageJSONPath, packageJSONExists);
if (packageJSONExists) {
  return maybePackageJSONPath;
}
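
Note that this cache stores only a boolean per candidate path, shared by every lookup that walks through the same directories, so unlike the first fix it should not duplicate package.json contents per starting path. I have not measured its memory footprint, though.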

However, I'm not sure this should be merged as-is. If the intent of the initial change was to move all that logic to the C++ side for performance reasons, it might be preferable to roll back the changes on the JS side and fix the caching on the C++ side instead.

Unfortunately, I'm a JS/TS dev and I failed miserably at fixing the C++ code 😢
