Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[perf] Improve ExportMap.for performance on larger codebases #2756

Merged
merged 1 commit into from Apr 12, 2023

Conversation

leipert
Copy link
Contributor

@leipert leipert commented Apr 8, 2023

While looking at a larger code base https://gitlab.com/gitlab-org/gitlab, ExportMap.for seems to take a significant amount of time. More than half of it is spend on calculating hashes with hashObject.

Digging a little deeper, it seems like we are calling it around 500 thousand times. Each iteration calculates the hash of a context object. This context object is created inside of the childContext function and consists of four parts:

  • settings -> an Object itself
  • parserOptions -> an Object itself
  • parserPath -> a String
  • path -> a String.

Interestingly settings, parserOptions and parserPath rarely do change for us, so calculating their hashes on every iteration seems unnecessary. hashObject recursively calculates the hashes of each key / value pair.

So instead of doing:

cacheKey = hashObject({settings, parserOptions, parserPath, path})

We could also do:

cacheKey = parserPath +
  hashObject(parserOptions) +
  hashObject(settings) +
  path

This would be just as stable as before, although resulting in longer cache keys. parserPath and path would not need to be hashed, because they are strings and a single character change in them would result in a different cache key.

Furthermore we can memoize the hashes of parserOptions and settings, in case they didn't change compared. The equality is checked with a simple JSON.stringify.

We move this cacheKey calculation to childContext, adding the cache key to the context object. This way, we can fall back to the old calculation inside of ExportMap.for, as it is a public interface which consumers might be using.

In our code base the results speak for itself:

  • 51.59s spent in ExportMap.for, 0ms spent in childContext.
    • 16.89s is spent in node:crypto/hash update (overall)
  • 41.02s spent in ExportMap.for, 1.91s spent in childContext.
    • Almost no time spent in hashObject, actually all calls in
      our flame graph come from other code paths
    • 7.86s is spent in node:crypto/hash update (overall)

So on this machine, and project, we are cutting the execution time of ExportMap.for in half. On machines, which are hashing slower, the effect might be more pronounced. Similarly machines with less memory, as the hashObject function creates a lot of tiny strings.

One side-effect here could be, that the memoization is in-efficient if the settings or parserOptions change often. (I cannot think of such a scenario, but I am not that versed in the use cases of this plugin.) But even then, the overhead should mainly be the equality checks with JSON.stringify.

@leipert
Copy link
Contributor Author

leipert commented Apr 8, 2023

This change is quite similar to #2755, but might be a little controversial, because it introduces a new dependency.

I had a look at fast-deep-equal: http://unpkg.com/browse/fast-deep-equal@3.1.3/index.js and I think it's simplicity aligns well with our use case here.

@leipert leipert force-pushed the leipert-perf-exportMap-for branch 3 times, most recently from 95edc58 to b3beaef Compare April 8, 2023 21:39
@leipert leipert force-pushed the leipert-perf-exportMap-for branch from b3beaef to 148fc75 Compare April 8, 2023 21:50
src/ExportMap.js Outdated Show resolved Hide resolved
@leipert
Copy link
Contributor Author

leipert commented Apr 8, 2023

Rebased and ready for review 👍 Thank you!

Copy link
Member

@ljharb ljharb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The utils changes need to be in a separate commit from the main plugin changes, please.

package.json Outdated Show resolved Hide resolved
src/ExportMap.js Outdated Show resolved Hide resolved
src/ExportMap.js Outdated Show resolved Hide resolved
src/ExportMap.js Show resolved Hide resolved
…ases

While looking at a larger code base https://gitlab.com/gitlab-org/gitlab,
`ExportMap.for` seems to take a significant amount of time. More than
half of it is spend on calculating hashes with `hashObject`.

Digging a little deeper, it seems like we are calling it around 500
thousand times. Each iteration calculates the hash of a context object.
This context object is created inside of the `childContext` function and
consists of four parts:
- `settings` -> an Object itself
- `parserOptions` -> an Object itself
- `parserPath` -> a String
- `path` -> a String.

Interestingly `settings`, `parserOptions` and `parserPath` rarely do
change for us, so calculating their hashes on every iteration seems
unnecessary. `hashObject` recursively calculates the hashes of each
key / value pair.

So instead of doing:

```js
cacheKey = hashObject({settings, parserOptions, parserPath, path})
```
We could also do:
```js
cacheKey = parserPath +
  hashObject(parserOptions) +
  hashObject(settings) +
  path
```

This would be just as stable as before, although resulting in longer
cache keys. `parserPath` and `path` would not need to be hashed,
because they are strings and a single character change in them would
result in a different cache key.

Furthermore we can memoize the hashes of `parserOptions` and
`settings`, in case they didn't change compared. The equality is
checked with a simple `JSON.stringify`.

We move this `cacheKey` calculation to `childContext`, adding the
cache key to the `context` object. This way, we can fall back to the
old calculation inside of `ExportMap.for`, as it is a public
interface which consumers might be using.

In our code base the results speak for itself:
- 51.59s spent in `ExportMap.for`, 0ms spent in `childContext`.
    - 16.89s is spent in node:crypto/hash `update` (overall)
- 41.02s spent in `ExportMap.for, 1.91s spent in `childContext`.
    - Almost no time spent in `hashObject`, actually all calls in
      our flame graph come from other code paths
    - 7.86s is spent in node:crypto/hash `update` (overall)

So on this machine, and project, we are cutting the execution time
of `ExportMap.for` in half. On machines, which are hashing slower,
the effect might be more pronounced. Similarly machines with less
memory, as the `hashObject` function creates a lot of tiny strings.

One side-effect here could be, that the memoization is in-efficient
if the `settings` or `parserOptions` change often. (I cannot think
of such a scenario, but I am not that versed in the use cases of
this plugin.) But even then, the overhead should mainly be the
`JSON.stringify`.
@leipert
Copy link
Contributor Author

leipert commented Apr 12, 2023

@ljharb

The utils changes need to be in a separate commit from the main plugin changes, please.

Ha, I added the changes to the utils changelog by accident 🙈 Fixed.

@leipert leipert requested a review from ljharb April 12, 2023 13:01
src/ExportMap.js Outdated Show resolved Hide resolved
@leipert leipert requested a review from ljharb April 12, 2023 17:42
Copy link
Member

@ljharb ljharb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems good!

@ljharb ljharb force-pushed the leipert-perf-exportMap-for branch from 14904dc to 0ae35c0 Compare April 12, 2023 17:46
@ljharb ljharb merged commit 0ae35c0 into import-js:main Apr 12, 2023
151 of 152 checks passed
@leipert leipert deleted the leipert-perf-exportMap-for branch April 13, 2023 21:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants