Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

esm: move hooks handling into separate class #45869

Merged
merged 1 commit into from
Dec 20, 2022

Conversation

GeoffreyBooth
Copy link
Member

This PR is meant to enable #44710 to get over the finish line, and could be viewed as splitting off a big chunk of the work for that PR into a smaller one.

This PR splits apart the ESMLoader class into two classes, ESMLoader and Hooks, where the latter’s file is only required (and the class instantiated) when --loader is used to register custom user loaders. As part of this, a few other lib/internal/module/esm files were changed to become lazy-loaded (cc @joyeecheung, please tell me if I’m doing this right).

This should result in a minor performance boost for the default case of no user loaders registered, as we don’t need to do as much validation for the return values of Node’s internal defaultResolve and defaultLoad hooks; those functions are also now called directly, rather than as part of a “chain” of one each. cc @mcollina

The case where user loaders are registered will now consume slightly more memory than before. Both previously and in this PR there is a check where the url property passed between hooks (as a return value from resolve and an input value for load) is validated that it can be instantiated via new URL(url) without throwing. Before this PR, there’s a check that the url we’re about to validate hasn’t already been validated by virtue of it existing in ESMLoader.moduleMap. Per this PR, we cache already-validated url values in a new SafeSet that’s attached to the Hooks class. This new data structure means that we store all of the URLs for all loaded modules twice, in ESMLoader.moduleMap and in Hooks.#validatedUrls; but I think this is unavoidable if we want this to become the basis for #44710 since we wouldn’t be able to share ESMLoader.moduleMap across the worker boundary. It’s arguable whether we need to do this validation check at all, or at least to do it between every registered hook; if we did the check only once on the final result of the resolve chain, it could stay in ESMLoader. This would be a breaking change to the current Loaders API, however, so I’m avoiding making such a change as part of this PR. (I think we should consider if it’s a change worth making, and the tradeoffs of whether the check is worth the performance cost, but I think that can come in a later PR.)

This should set the stage for another attempt at wrapping the user loaders processing into a worker thread. In #44710 the idea was to put a second ESMLoader class inside the worker and try to send instantiated Module out from the worker back to the main thread, but those proved impossible to serialize. After this PR lands, the hooks.js that this PR adds would be what gets wrapped in a worker, and all of its inputs and outputs should be serializable. (They’re not quite JSON serializable, as the load hook returns an object with a source property that could be a buffer, but these return values are much simpler than Module objects and should be within the capabilities of v8.serialize/deserialize.)

The bulk of hooks.js was just moved from ESMLoader. I avoided making unrelated changes other than adding/editing some comments and JSDocs.

cc @nodejs/loaders

@GeoffreyBooth GeoffreyBooth added module Issues and PRs related to the module subsystem. esm Issues and PRs related to the ECMAScript Modules implementation. loaders Issues and PRs related to ES module loaders labels Dec 15, 2022
@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/modules

@nodejs-github-bot nodejs-github-bot added needs-ci PRs that need a full CI run. process Issues and PRs related to the process subsystem. labels Dec 15, 2022
@nodejs-github-bot
Copy link
Collaborator

Copy link
Member

@ljharb ljharb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ftr these comments can 100% be ignored for now if you prefer to avoid making "non-move" changes in a "move code" PR

lib/internal/modules/esm/hooks.js Outdated Show resolved Hide resolved
lib/internal/modules/esm/hooks.js Outdated Show resolved Hide resolved
Copy link
Contributor

@aduh95 aduh95 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we could skip the creation of empty objects for context when no user hooks have been provided.

lib/internal/modules/esm/loader.js Outdated Show resolved Hide resolved
lib/internal/modules/esm/loader.js Show resolved Hide resolved
lib/internal/modules/esm/loader.js Show resolved Hide resolved
@GeoffreyBooth
Copy link
Member Author

nit: we could skip the creation of empty objects for context when no user hooks have been provided.

@aduh95 The load in Hooks already had a default value: context = {}. So I don’t think we need to guard on the ESMLoader side as well. I added the same parameter default value to defaultLoad to match; I think that should cover it, please let me know if I’m missing something or if another way is better for some reason. I prefer to add the default value/guard in the function where we’re accessing context, rather than ahead of time before calling such a function, as the former seems more future-proof.

Should we be using { __proto__: null } here, like async load(url, context = { __proto__: null }) {? And when is it appropriate to use kEmptyObject? It seems to me like using a constant for that has the risk that the function mutates an object that’s shared between many calls of that function.

@GeoffreyBooth
Copy link
Member Author

I ran some benchmarks. Module ones seem like a wash, or slight improvement:

module/module-loader-circular.js n=10000                                                         -0.87 %       ±4.04%  ±5.37%  ±7.00%
module/module-loader-deep.js cache='false' files=1000 ext=''                                     -0.12 %       ±3.25%  ±4.32%  ±5.62%
module/module-loader-deep.js cache='false' files=1000 ext='.js'                                   2.35 %       ±3.49%  ±4.64%  ±6.04%
module/module-loader-deep.js cache='true' files=1000 ext=''                                       0.48 %       ±1.69%  ±2.26%  ±2.94%
module/module-loader-deep.js cache='true' files=1000 ext='.js'                                   -3.33 %       ±4.56%  ±6.07%  ±7.90%
module/module-loader.js cache='false' n=1000 files=500 dir='abs' name='/'                        -3.09 %       ±4.41%  ±5.89%  ±7.71%
module/module-loader.js cache='false' n=1000 files=500 dir='abs' name=''                          3.82 %       ±4.68%  ±6.26%  ±8.22%
module/module-loader.js cache='false' n=1000 files=500 dir='abs' name='/index.js'                -1.39 %       ±4.39%  ±5.85%  ±7.64%
module/module-loader.js cache='false' n=1000 files=500 dir='rel' name='/'                        -3.13 %       ±3.54%  ±4.72%  ±6.15%
module/module-loader.js cache='false' n=1000 files=500 dir='rel' name=''                         -2.56 %       ±3.82%  ±5.10%  ±6.66%
module/module-loader.js cache='false' n=1000 files=500 dir='rel' name='/index.js'          *     -5.14 %       ±4.32%  ±5.78%  ±7.59%
module/module-loader.js cache='false' n=1 files=500 dir='abs' name='/'                           -0.72 %       ±5.77%  ±7.68%  ±9.99%
module/module-loader.js cache='false' n=1 files=500 dir='abs' name=''                            -2.07 %       ±5.70%  ±7.58%  ±9.87%
module/module-loader.js cache='false' n=1 files=500 dir='abs' name='/index.js'            **      8.05 %       ±5.81%  ±7.73% ±10.06%
module/module-loader.js cache='false' n=1 files=500 dir='rel' name='/'                           -2.08 %       ±6.24%  ±8.30% ±10.80%
module/module-loader.js cache='false' n=1 files=500 dir='rel' name=''                            -1.53 %       ±6.10%  ±8.12% ±10.58%
module/module-loader.js cache='false' n=1 files=500 dir='rel' name='/index.js'                    4.07 %       ±6.74%  ±8.97% ±11.67%
module/module-loader.js cache='true' n=1000 files=500 dir='abs' name='/'                          0.07 %       ±0.76%  ±1.01%  ±1.33%
module/module-loader.js cache='true' n=1000 files=500 dir='abs' name=''                          -0.95 %       ±4.98%  ±6.63%  ±8.63%
module/module-loader.js cache='true' n=1000 files=500 dir='abs' name='/index.js'                 -1.99 %       ±4.12%  ±5.54%  ±7.33%
module/module-loader.js cache='true' n=1000 files=500 dir='rel' name='/'                          0.35 %       ±1.41%  ±1.87%  ±2.44%
module/module-loader.js cache='true' n=1000 files=500 dir='rel' name=''                          -1.59 %       ±4.16%  ±5.59%  ±7.36%
module/module-loader.js cache='true' n=1000 files=500 dir='rel' name='/index.js'                  0.83 %       ±1.43%  ±1.90%  ±2.47%
module/module-loader.js cache='true' n=1 files=500 dir='abs' name='/'                            -2.84 %       ±4.15%  ±5.54%  ±7.27%
module/module-loader.js cache='true' n=1 files=500 dir='abs' name=''                             -0.94 %       ±3.62%  ±4.86%  ±6.41%
module/module-loader.js cache='true' n=1 files=500 dir='abs' name='/index.js'                     1.91 %       ±4.65%  ±6.23%  ±8.20%
module/module-loader.js cache='true' n=1 files=500 dir='rel' name='/'                             1.49 %       ±2.93%  ±3.91%  ±5.10%
module/module-loader.js cache='true' n=1 files=500 dir='rel' name=''                              0.24 %       ±2.04%  ±2.71%  ±3.53%
module/module-loader.js cache='true' n=1 files=500 dir='rel' name='/index.js'                    -0.28 %       ±4.37%  ±5.89%  ±7.80%
module/module-require.js n=10000 type='dir'                                                      -8.33 %       ±8.71% ±11.60% ±15.12%
module/module-require.js n=10000 type='.js'                                                       0.61 %       ±7.05%  ±9.37% ±12.20%
module/module-require.js n=10000 type='.json'                                                    -1.61 %       ±6.99%  ±9.30% ±12.11%

Startup ones are maybe concerning? @joyeecheung? Could this have anything to do with moving some files to lazy-load that perhaps should instead be inside the snapshot?

misc/startup.js count=30 mode='process' script='benchmark/fixtures/require-builtins'         **      1.15 %       ±0.79% ±1.06% ±1.38%
misc/startup.js count=30 mode='process' script='test/fixtures/semicolon'                      *      1.62 %       ±1.50% ±2.00% ±2.62%
misc/startup.js count=30 mode='worker' script='benchmark/fixtures/require-builtins'                  0.17 %       ±1.12% ±1.49% ±1.94%
misc/startup.js count=30 mode='worker' script='test/fixtures/semicolon'                      **      1.45 %       ±1.00% ±1.34% ±1.76%

Copy link
Contributor

@JakobJingleheimer JakobJingleheimer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙌 huzzah! Thanks for this!

lib/internal/modules/esm/hooks.js Show resolved Hide resolved
@nodejs-github-bot
Copy link
Collaborator

@aduh95
Copy link
Contributor

aduh95 commented Dec 15, 2022

Should we be using { __proto__: null } here, like async load(url, context = { __proto__: null }) {?

My guess is that using a null-prototype object could be surprising for the user, almost everything in JS inherits from %Object.prototype%.

And when is it appropriate to use kEmptyObject? It seems to me like using a constant for that has the risk that the function mutates an object that’s shared between many calls of that function.

kEmptyObject is frozen, so it's no at risk of mutation. It's appropriate when you need a frozen empty object that's not impacted by %Object.prototype% mutation.

lib/internal/modules/esm/load.js Outdated Show resolved Hide resolved
lib/internal/modules/esm/loader.js Outdated Show resolved Hide resolved
@nodejs-github-bot
Copy link
Collaborator

@joyeecheung
Copy link
Member

As part of this, a few other lib/internal/module/esm files were changed to become lazy-loaded (cc @joyeecheung, please tell me if I’m doing this right).

The top-level scopes LGTM

@GeoffreyBooth
Copy link
Member Author

The top-level scopes LGTM

Would it be better to move say load.js into the snapshot, rather than being lazy-loaded? When is one better than the other?

@joyeecheung
Copy link
Member

joyeecheung commented Dec 16, 2022

Would it be better to move say load.js into the snapshot, rather than being lazy-loaded? When is one better than the other?

If it's going to be used by, e.g. 50%+ of every user of Node.js, then it's a good idea to move it into the snapshot. However load.js itself is problematic (top-level access to run-time states, and eager-loading a bunch of other modules that also do this, and some funny TDZ and circular dependencies in between), so refactoring of its entire dependency graph would be necessary before it can be snapshotted.

@nodejs-github-bot
Copy link
Collaborator

@aduh95

This comment was marked as outdated.

@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@JakobJingleheimer
Copy link
Contributor

@GeoffreyBooth I can resolve the merge conflicts tomorrow if you'd like

@RafaelGSS
Copy link
Member

If you want to wait, yes.

@GeoffreyBooth GeoffreyBooth removed author ready PRs that have at least one approval, no pending requests for changes, and a CI started. commit-queue-squash Add this label to instruct the Commit Queue to squash all the PR commits into the first one. backport-requested-v19.x labels Jan 24, 2023
@GeoffreyBooth
Copy link
Member Author

@nodejs/releasers This should now land cleanly once #43772 lands first.

@juanarbol juanarbol added the backport-requested-v18.x PRs awaiting manual backport to the v18.x-staging branch. label Jan 26, 2023
@juanarbol
Copy link
Member

This is not landing cleanly for v18.x as well

@GeoffreyBooth
Copy link
Member Author

This is not landing cleanly for v18.x as well

It depends on #43772; have you tried landing that first?

@ruyadorno
Copy link
Member

Same for v19.x

It depends on #43772; have you tried landing that first?

Yes, #43772 landed first.

For reference, here is the resulting patch I get once I try to land it on top of #43772 in v19.x-staging branch:

git diff
diff --cc lib/internal/modules/esm/loader.js
index a02619818ec,5c4ee53542b..00000000000
--- a/lib/internal/modules/esm/loader.js
+++ b/lib/internal/modules/esm/loader.js
@@@ -6,18 -6,11 +6,16 @@@ require('internal/modules/cjs/loader')
  const {
    Array,
    ArrayIsArray,
-   ArrayPrototypeJoin,
-   ArrayPrototypePush,
    FunctionPrototypeCall,
++<<<<<<< HEAD
 +  ObjectAssign,
 +  ObjectDefineProperty,
++=======
+   ObjectCreate,
++>>>>>>> 7738844fe35 (esm: move hooks handling into separate class)
    ObjectSetPrototypeOf,
-   RegExpPrototypeExec,
    SafePromiseAllReturnArrayLike,
    SafeWeakMap,
-   StringPrototypeSlice,
-   StringPrototypeToUpperCase,
-   globalThis,
  } = primordials;
  
  const {
@@@ -244,118 -106,13 +111,77 @@@ class ESMLoader 
      }
    }
  
++<<<<<<< HEAD
 +  /**
 +   *
 +   * @param {ModuleExports} exports
 +   * @returns {ExportedHooks}
 +   */
 +  static pluckHooks({
 +    globalPreload,
 +    resolve,
 +    load,
 +    // obsolete hooks:
 +    dynamicInstantiate,
 +    getFormat,
 +    getGlobalPreloadCode,
 +    getSource,
 +    transformSource,
 +  }) {
 +    const obsoleteHooks = [];
 +    const acceptedHooks = { __proto__: null };
 +
 +    if (getGlobalPreloadCode) {
 +      globalPreload ??= getGlobalPreloadCode;
 +
 +      process.emitWarning(
 +        'Loader hook "getGlobalPreloadCode" has been renamed to "globalPreload"'
 +      );
 +    }
 +    if (dynamicInstantiate) ArrayPrototypePush(
 +      obsoleteHooks,
 +      'dynamicInstantiate'
 +    );
 +    if (getFormat) ArrayPrototypePush(
 +      obsoleteHooks,
 +      'getFormat',
 +    );
 +    if (getSource) ArrayPrototypePush(
 +      obsoleteHooks,
 +      'getSource',
 +    );
 +    if (transformSource) ArrayPrototypePush(
 +      obsoleteHooks,
 +      'transformSource',
 +    );
 +
 +    if (obsoleteHooks.length) process.emitWarning(
 +      `Obsolete loader hook(s) supplied and will be ignored: ${
 +        ArrayPrototypeJoin(obsoleteHooks, ', ')
 +      }`,
 +      'DeprecationWarning',
 +    );
 +
 +    if (globalPreload) {
 +      acceptedHooks.globalPreload = globalPreload;
 +    }
 +    if (resolve) {
 +      acceptedHooks.resolve = resolve;
 +    }
 +    if (load) {
 +      acceptedHooks.load = load;
 +    }
 +
 +    return acceptedHooks;
++=======
+   addCustomLoaders(userLoaders) {
+     const { Hooks } = require('internal/modules/esm/hooks');
+     this.#hooks = new Hooks(userLoaders);
++>>>>>>> 7738844fe35 (esm: move hooks handling into separate class)
    }
  
-   /**
-    * Collect custom/user-defined hook(s). After all hooks have been collected,
-    * the global preload hook(s) must be called.
-    * @param {KeyedExports} customLoaders
-    *  A list of exports from user-defined loaders (as returned by
-    *  ESMLoader.import()).
-    */
-   addCustomLoaders(
-     customLoaders = [],
-   ) {
-     for (let i = 0; i < customLoaders.length; i++) {
-       const {
-         exports,
-         url,
-       } = customLoaders[i];
-       const {
-         globalPreload,
-         resolve,
-         load,
-       } = ESMLoader.pluckHooks(exports);
- 
-       if (globalPreload) {
-         ArrayPrototypePush(
-           this.#hooks.globalPreload,
-           {
-             fn: globalPreload,
-             url,
-           },
-         );
-       }
-       if (resolve) {
-         ArrayPrototypePush(
-           this.#hooks.resolve,
-           {
-             fn: resolve,
-             url,
-           },
-         );
-       }
-       if (load) {
-         ArrayPrototypePush(
-           this.#hooks.load,
-           {
-             fn: load,
-             url,
-           },
-         );
-       }
-     }
+   preload() {
+     this.#hooks?.preload();
    }
  
    async eval(
@@@ -771,35 -296,24 +365,24 @@@
     * @param {string} originalSpecifier The specified URL path of the module to
     *                                   be resolved.
     * @param {string} [parentURL] The URL path of the module's parent.
 -   * @param {ImportAssertions} [importAssertions] Assertions from the import
 +   * @param {ImportAssertions} importAssertions Assertions from the import
     *                                              statement or expression.
-    * @returns {{ format: string, url: URL['href'] }}
+    * @returns {Promise<{ format: string, url: URL['href'] }>}
     */
    async resolve(
      originalSpecifier,
      parentURL,
 -    importAssertions = ObjectCreate(null),
 +    importAssertions,
    ) {
-     const isMain = parentURL === undefined;
- 
-     if (
-       !isMain &&
-       typeof parentURL !== 'string' &&
-       !isURLInstance(parentURL)
-     ) {
-       throw new ERR_INVALID_ARG_TYPE(
-         'parentURL',
-         ['string', 'URL'],
-         parentURL,
-       );
+     if (this.#hooks) {
+       return this.#hooks.resolve(originalSpecifier, parentURL, importAssertions);
+     }
+     if (!this.#defaultResolve) {
+       this.#defaultResolve = require('internal/modules/esm/resolve').defaultResolve;
      }
-     const chain = this.#hooks.resolve;
      const context = {
-       conditions: getDefaultConditions(),
+       __proto__: null,
+       conditions: this.#defaultConditions,
        importAssertions,
        parentURL,
      };

GeoffreyBooth added a commit to GeoffreyBooth/node that referenced this pull request Feb 6, 2023
PR-URL: nodejs#45869
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: Jacob Smith <jacob@frende.me>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
GeoffreyBooth added a commit to GeoffreyBooth/node that referenced this pull request Feb 6, 2023
PR-URL: nodejs#45869
Backport-PR-URL: nodejs#46511
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: Jacob Smith <jacob@frende.me>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
@GeoffreyBooth
Copy link
Member Author

I think this conflict is a consequence of #46083 being backported ahead of PRs that were blocked when #46083 was backported. #46511

GeoffreyBooth added a commit to GeoffreyBooth/node that referenced this pull request Mar 14, 2023
PR-URL: nodejs#45869
Backport-PR-URL: nodejs#46511
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: Jacob Smith <jacob@frende.me>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
targos pushed a commit that referenced this pull request Mar 14, 2023
PR-URL: #45869
Backport-PR-URL: #46511
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: Jacob Smith <jacob@frende.me>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
targos pushed a commit that referenced this pull request Nov 10, 2023
PR-URL: #45869
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: Jacob Smith <jacob@frende.me>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
sercher added a commit to sercher/graaljs that referenced this pull request Apr 25, 2024
PR-URL: nodejs/node#45869
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: Jacob Smith <jacob@frende.me>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
sercher added a commit to sercher/graaljs that referenced this pull request Apr 25, 2024
PR-URL: nodejs/node#45869
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: Jacob Smith <jacob@frende.me>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-requested-v18.x PRs awaiting manual backport to the v18.x-staging branch. esm Issues and PRs related to the ECMAScript Modules implementation. loaders Issues and PRs related to ES module loaders module Issues and PRs related to the module subsystem. process Issues and PRs related to the process subsystem.
Projects
None yet
Development

Successfully merging this pull request may close these issues.