Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

esm: source hooks #30986

Closed
wants to merge 16 commits into from
Closed

esm: source hooks #30986

wants to merge 16 commits into from

Conversation

@GeoffreyBooth
Copy link
Member

GeoffreyBooth commented Dec 16, 2019

This PR adds additional hooks to the ESM --loader implementation:

  • getSource allows a user to override the function that Node’s ESM loader uses to retrieve the source code for a specifier (e.g. loading a JavaScript file from disk). A loader that overrides this can do things like load from cache or memory (like tink or yarn pnp) or from other sources like https:// URLs. An example of the former is in the tests, and the latter is in the docs.

  • transformSource allows a user to mutate the source code string or buffer after it’s been retrieved via getSource but before Node does anything else with it. This allows for things like a transpiler loader (TypeScript, CoffeeScript, JSX, Babel, etc.). This is separate, and intentionally smaller in scope, than a potential translate hook which could override one of the existing translators (such as for module, commonjs, wasm etc.; see lib/internal/modules/esm/translators.js).

See discussion in nodejs/modules#351. cc @nodejs/modules-active-members

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines
Copy link
Member

bmeck left a comment

Some changes, but I'd be hesitant to land these changes if we want to do a more serious overhaul.

doc/api/esm.md Outdated Show resolved Hide resolved
lib/internal/modules/esm/loader.js Outdated Show resolved Hide resolved
doc/api/esm.md Outdated Show resolved Hide resolved
@bmeck

This comment has been minimized.

Copy link
Member

bmeck commented Dec 16, 2019

I talked with @GeoffreyBooth and am fine with landing this as long as we make the potential for change much more visible as we have discussed other hooks and I synced up with him about how these hooks are different from the design docs hooks we have previously discussed. We should sync up about the design of these hooks being explicitly written out, but this is fine as long as we go with an iterative approach and do not commit to these as final hooks.

Copy link
Member

bmeck left a comment

dismissing except for nit, seems ok as we are being more explicit as these are likely to change and merging this PR does not cement the hooks.

@bmeck bmeck dismissed their stale review Dec 16, 2019

agreed to iterate elsewhere afterwards/document instability

@devsnek

This comment has been minimized.

Copy link
Member

devsnek commented Dec 16, 2019

Do you think you could set up a benchmark? It would be good to keep a handle on how our loader performance is, and this seems like it might add a lot of overhead (not that I'm saying we shouldn't land it, but rather we should just be aware).

@GeoffreyBooth

This comment has been minimized.

Copy link
Member Author

GeoffreyBooth commented Dec 16, 2019

Do you think you could set up a benchmark?

That would be great, but I’ve never done benchmarks in the Node codebase. Is there something you can point me to to show me how? Can that be its own PR?

Also more broadly how would that work? Like I would assume that the benchmarked case is “no custom loaders,” so is what we’re benchmarking just how much slower that case becomes as more hooks are added? Or is your thinking to benchmark “no loaders” vs “with custom loader A” or B or C etc.?

@GeoffreyBooth GeoffreyBooth force-pushed the GeoffreyBooth:source-hooks branch from debab7b to 129a757 Dec 16, 2019
@devsnek

This comment has been minimized.

Copy link
Member

devsnek commented Dec 16, 2019

@GeoffreyBooth benchmarks are here: https://github.com/nodejs/node/tree/master/test/benchmark. From what I can tell they run a specified test a bunch of times. The case I have in mind at the moment is "resolve big tree without hooks", specifically I'm eyeing the added await and two branches in each translator path, which is why I thought it would be good to add the benchmark now. Adding it later also works too.

@GeoffreyBooth GeoffreyBooth mentioned this pull request Dec 16, 2019
@GeoffreyBooth

This comment has been minimized.

Copy link
Member Author

GeoffreyBooth commented Dec 16, 2019

Adding it later also works too.

Thanks. I’m not sure when I’ll have time for that, so if you don’t mind I’d prefer to do that as a follow-up PR (or anyone else is welcome to).

Copy link
Contributor

guybedford left a comment

Amazing work moving this forward!

Would be nice to use the direct hook functions like with resolve, instead of having the custom wrappers, see the review comments.

I would still prefer to only have a getSource hook and not also add a transform hook, since the ability to transform is enabled by the hook composition model, and I don't like the idea of endorsing the concept of transform hooks in the API explicitly, as opposed to simply enabling them.

doc/api/esm.md Outdated Show resolved Hide resolved
lib/internal/modules/esm/translators.js Outdated Show resolved Hide resolved
lib/internal/modules/esm/loader.js Outdated Show resolved Hide resolved
lib/internal/modules/esm/loader.js Outdated Show resolved Hide resolved
lib/internal/modules/esm/translators.js Outdated Show resolved Hide resolved
lib/internal/modules/esm/translators.js Outdated Show resolved Hide resolved
Copy link
Contributor

guybedford left a comment

Thinking about this a little further, I think this is a useful opportunity to also move the format return to the getSource hook.

The reason for this is that format being returned by the resolver was always due to us not having a source hook, and having the source hook return the format is much more useful in the loader workflows (like HTTP loaders).

I think it's important to get that in as part of this.

@guybedford

This comment has been minimized.

Copy link
Contributor

guybedford commented Dec 17, 2019

As discussed further with @GeoffreyBooth I'd be happy to have a getFormat hook that can be called after resolve, and before getSource.

@GeoffreyBooth

This comment has been minimized.

Copy link
Member Author

GeoffreyBooth commented Dec 17, 2019

As discussed further with @GeoffreyBooth I'd be happy to have a getFormat hook that can be called after resolve, and before getSource.

So the current flow works like this:

  1. The ESM loader receives a specifier, e.g. 'some-pkg/some-export'
  2. resolve converts that specifier into a file URL, e.g. 'file:///.../node_modules/some-pkg/lib/some-export.js'
  3. resolve additionally determines the format by looking at the file extension of the URL in 2 and possibly referencing package type. format in this case is the name of one of the translators: module, commonjs, builtins, wasm, json, dynamic
  4. Based on the format returned, one of the translators is chosen and passed the URL
  5. Inside the translator, getSource retrieves file contents for the given URL (and transformSource possibly mutates the contents)

So a getFormat hook would split out that logic from inside resolve (i.e. make step 3 above its own hook). I think that would be all we do for the first version of the hook.

Later on we probably want the “determine format” logic to be possibly set or overridden by the source returned in getSource, for example if getSource makes an HTTP call and we want the format set based on the headers in the response. I would still want getFormat to be its own hook, but we’d have to reorder the steps of the flow or make it optional when the format is determined. That’s a bigger refactor, possibly a redesign of this whole flow, that I think would be a separate effort.

@coreyfarrell

This comment has been minimized.

Copy link
Member

coreyfarrell commented Dec 17, 2019

I would still prefer to only have a getSource hook and not also add a transform hook, since the ability to transform is enabled by the hook composition model, and I don't like the idea of endorsing the concept of transform hooks in the API explicitly, as opposed to simply enabling them.

Combining them would cause problems once adding multiple hooks is supported. getSource callbacks are not necessarily expected to call the defaultGetSource but every registered transform hook should be run on every source. Sorry that I'm not understanding, can you tell me the reason for opposing direct support for transform hooks?

@guybedford

This comment has been minimized.

Copy link
Contributor

guybedford commented Dec 17, 2019

Combining them would cause problems once adding multiple hooks is supported. getSource callbacks are not necessarily expected to call the defaultGetSource but every registered transform hook should be run on every source.

Yes, with the getSource model, we can still have both classes of hooks, but they follow different rules - source hooks ignore defaultGetSource, while transform hooks implemented via getSource always chain with defaultGetSource (which under loader composition would be the parent loader return value).

Sorry that I'm not understanding, can you tell me the reason for opposing direct support for transform hooks?

Great question - the primary reason is a human / branding one. Including a transform hook explicitly in the Node.js documentation acts as an advertisement to users to write their own transform loaders, and that leads to a bad time unless loader authors are highly skilled at implementing caching, from my experience with SystemJS.

@bmeck

This comment has been minimized.

Copy link
Member

bmeck commented Dec 18, 2019

I've extracted a getFormat hook based upon conversations. There is some dancing going on so that old resolve hooks are properly maintained to minimize migration breakage. Ideally we can remove some of the odd ordering of operations in the future when we fully remove format from resolve.

Copy link
Contributor

guybedford left a comment

Looking good.

doc/api/esm.md Outdated Show resolved Hide resolved
doc/api/esm.md Outdated Show resolved Hide resolved
lib/internal/modules/esm/loader.js Outdated Show resolved Hide resolved
'ExperimentalWarning');
format = legacyExtensionFormatMap[ext];
}
return { format: format || null };

This comment has been minimized.

Copy link
@guybedford

guybedford Dec 18, 2019

Contributor

Do we need to make this an object, or can we just permit a string return?

This comment has been minimized.

Copy link
@GeoffreyBooth

GeoffreyBooth Dec 18, 2019

Author Member

I guess it depends on if we might extend this in the future to return more data? Perhaps data or functions to pass between hooks?

This comment has been minimized.

Copy link
@bmeck

bmeck Dec 18, 2019

Member

For getFormat in particular I feel that this should be an object while we sort out how things like bodies can be ferried between to avoid double parsing/fetching.

This comment has been minimized.

Copy link
@guybedford

guybedford Dec 18, 2019

Contributor

Can we perhaps still support the direct string as a sugar case though?

This comment has been minimized.

Copy link
@bmeck

bmeck Dec 23, 2019

Member

I'm not the biggest fan of polymorphic return types since that means each stage has to do branching marshaling to/from strings. If we keep it the same as the value returned by getDefaultFormat that seems better than branching to me. I could be swayed otherwise but would rather start throwing on invalid returns like the documentation example has/had rather than trying to marshal multiple types. I think an alternative of moving getFormat to be combined with getSource would remove the need for this data ferrying as well and might be more palpable to myself.

@guybedford guybedford dismissed their stale review Dec 18, 2019

getFormat hook added

@GeoffreyBooth GeoffreyBooth force-pushed the GeoffreyBooth:source-hooks branch from 5734d54 to 75ab14a Dec 18, 2019
Copy link
Member

coreyfarrell left a comment

About the defaultGetFormat / defaultGetSource passed to the hook, I think these will eventually need to be async functions. Right now only a single loader hook is supported but when they can be stacked the defaultFn argument will not be the node.js function but will be an async function provided by another loader hook.

doc/api/esm.md Outdated Show resolved Hide resolved
doc/api/esm.md Outdated Show resolved Hide resolved
lib/internal/modules/esm/default_resolve.js Outdated Show resolved Hide resolved
@coreyfarrell

This comment has been minimized.

Copy link
Member

coreyfarrell commented Dec 20, 2019

Sorry that I'm not understanding, can you tell me the reason for opposing direct support for transform hooks?

Great question - the primary reason is a human / branding one. Including a transform hook explicitly in the Node.js documentation acts as an advertisement to users to write their own transform loaders, and that leads to a bad time unless loader authors are highly skilled at implementing caching, from my experience with SystemJS.

@guybedford I'm worried that if transformSource is omitted transform hooks will get blocked/ignored by fetch hooks. Specifically I'm concerned that this will introduce a situation where a fetch hook added in the incorrect order will cause the nyc transform (instrument) function to simply never run. No error will be provided to the user, they just won't get coverage.

I appreciate your not wanting to encourage unnecessary transform hooks but can this be solved by documentation warnings? In addition I have not seen firm plans for an API to add loader hooks. At one point I was hoping for an API to enable require('./install-a-loader-hook.js') but your comment about SystemJS has given me reason to reconsider this. If --loader will be the only way to add a hook this is essentially a permanently flagged feature.

@arcanis

This comment has been minimized.

Copy link
Contributor

arcanis commented Dec 21, 2019

When running yarn install, we generate a single .pnp.js file that contains the CJS loader and inject it in the environment through --require. For the loader API to be viable for us we'd need this .pnp.js file to keep working as it does while also being a valid loader. I'm not sure that's the case: the .pnp.js file will be CJS (and need to, for backward compat), whereas the loader has to be ESM from what I gather.

Why are loaders required to be ESM? As it been considered to also support CJS-exported loaders?

@frank-dspeed

This comment has been minimized.

Copy link

frank-dspeed commented Dec 29, 2019

maybe look into my implamentation https://github.com/direktspeed/esm-loader it works without hooks and is highly configurable via parsers

@GeoffreyBooth GeoffreyBooth force-pushed the GeoffreyBooth:source-hooks branch from 7001cb5 to efa3987 Jan 2, 2020
@GeoffreyBooth

This comment has been minimized.

Copy link
Member Author

GeoffreyBooth commented Jan 2, 2020

I pushed some commits last night but this is still WIP. A few TODOs:

  • getFormat currently needs to be synchronous, but we should allow a getFormat hook to be async.

  • Since getFormat always returns an object, e.g. {format: 'module'}, the other new hooks should also always return objects, e.g. {source: '...'}.

  • We should probably remove loader from all of the hook signatures.

  • How do people feel about the unified signatures across all hooks? They’re now all hook(options, defaultHook) e.g. resolve({ specifier, parentURL }, defaultResolve) and so on. And all would return objects.

doc/api/esm.md Show resolved Hide resolved
@GeoffreyBooth GeoffreyBooth force-pushed the GeoffreyBooth:source-hooks branch from 3cd5505 to 4db2069 Jan 4, 2020
@GeoffreyBooth GeoffreyBooth force-pushed the GeoffreyBooth:source-hooks branch from 2aded31 to 7279d14 Jan 5, 2020
@GeoffreyBooth

This comment has been minimized.

Copy link
Member Author

GeoffreyBooth commented Jan 5, 2020

I updated the docs to move the loader examples into their own section, where I think they make much more sense rather than haphazardly spread across multiple hooks’ sections. I think that’s everything, @guybedford please let me know if there’s anything else.

@GeoffreyBooth GeoffreyBooth force-pushed the GeoffreyBooth:source-hooks branch from 7279d14 to b1b2df4 Jan 6, 2020
@nodejs-github-bot

This comment has been minimized.

@guybedford

This comment has been minimized.

Copy link
Contributor

guybedford commented Jan 6, 2020

Once CI completes it looks like we will be good to land this. Further final reviews welcome.

@bmeck
bmeck approved these changes Jan 6, 2020
guybedford added a commit that referenced this pull request Jan 6, 2020
PR-URL: #30986
Reviewed-By: Guy Bedford <guybedford@gmail.com>
Reviewed-By: Bradley Farias <bradley.meck@gmail.com>
@guybedford

This comment has been minimized.

Copy link
Contributor

guybedford commented Jan 6, 2020

Landed in 2551a21.

@guybedford guybedford closed this Jan 6, 2020
@GeoffreyBooth GeoffreyBooth deleted the GeoffreyBooth:source-hooks branch Jan 6, 2020
@coreyfarrell coreyfarrell mentioned this pull request Jan 7, 2020
4 of 4 tasks complete
MylesBorins added a commit that referenced this pull request Jan 16, 2020
PR-URL: #30986
Reviewed-By: Guy Bedford <guybedford@gmail.com>
Reviewed-By: Bradley Farias <bradley.meck@gmail.com>
@codebytere codebytere mentioned this pull request Jan 16, 2020
codebytere added a commit that referenced this pull request Jan 20, 2020
Notable changes:

* deps:
  * upgrade to libuv 1.34.1 (cjihrig) #31332
  * upgrade npm to 6.13.6 (Ruy Adorno) #31304
* doc:
  * add GeoffreyBooth to collaborators (Geoffrey Booth) #31306
* module
  * add API for interacting with source maps (bcoe) #31132
  * loader getSource, getFormat, transform hooks (Geoffrey Booth) #30986
  * logical conditional exports ordering (Guy Bedford) #31008
  * unflag conditional exports (Guy Bedford) #31001
* process:
  * allow monitoring uncaughtException (Gerhard Stoebich) #31257

PR-URL: #31382
codebytere added a commit that referenced this pull request Jan 21, 2020
Notable changes:

* deps:
  * upgrade to libuv 1.34.1 (cjihrig) #31332
  * upgrade npm to 6.13.6 (Ruy Adorno) #31304
* doc:
  * add GeoffreyBooth to collaborators (Geoffrey Booth) #31306
* module
  * add API for interacting with source maps (bcoe) #31132
  * loader getSource, getFormat, transform hooks (Geoffrey Booth) #30986
  * logical conditional exports ordering (Guy Bedford) #31008
  * unflag conditional exports (Guy Bedford) #31001
* process:
  * allow monitoring uncaughtException (Gerhard Stoebich) #31257

PR-URL: #31382
codebytere added a commit that referenced this pull request Jan 21, 2020
Notable changes:

* deps:
  * upgrade to libuv 1.34.1 (cjihrig) #31332
  * upgrade npm to 6.13.6 (Ruy Adorno) #31304
* module
  * add API for interacting with source maps (bcoe) #31132
  * loader getSource, getFormat, transform hooks (Geoffrey Booth) #30986
  * logical conditional exports ordering (Guy Bedford) #31008
  * unflag conditional exports (Guy Bedford) #31001
* process:
  * allow monitoring uncaughtException (Gerhard Stoebich) #31257
* Added new collaborators:
  * [GeoffreyBooth](https://github.com/GeoffreyBooth) - Geoffrey Booth. #31306

PR-URL: #31382
codebytere added a commit that referenced this pull request Jan 21, 2020
Notable changes:

* deps:
  * upgrade to libuv 1.34.1 (cjihrig) #31332
  * upgrade npm to 6.13.6 (Ruy Adorno) #31304
* module
  * add API for interacting with source maps (bcoe) #31132
  * loader getSource, getFormat, transform hooks (Geoffrey Booth) #30986
  * logical conditional exports ordering (Guy Bedford) #31008
  * unflag conditional exports (Guy Bedford) #31001
* process:
  * allow monitoring uncaughtException (Gerhard Stoebich) #31257
* Added new collaborators:
  * [GeoffreyBooth](https://github.com/GeoffreyBooth) - Geoffrey Booth. #31306

PR-URL: #31382
codebytere added a commit that referenced this pull request Jan 21, 2020
Notable changes:

* deps:
  * upgrade to libuv 1.34.1 (cjihrig) #31332
  * upgrade npm to 6.13.6 (Ruy Adorno) #31304
* module
  * add API for interacting with source maps (bcoe) #31132
  * loader getSource, getFormat, transform hooks (Geoffrey Booth) #30986
  * logical conditional exports ordering (Guy Bedford) #31008
  * unflag conditional exports (Guy Bedford) #31001
* process:
  * allow monitoring uncaughtException (Gerhard Stoebich) #31257
* Added new collaborators:
  * [GeoffreyBooth](https://github.com/GeoffreyBooth) - Geoffrey Booth. #31306

PR-URL: #31382
MylesBorins added a commit that referenced this pull request Jan 28, 2020
PR-URL: #30986
Reviewed-By: Guy Bedford <guybedford@gmail.com>
Reviewed-By: Bradley Farias <bradley.meck@gmail.com>
BethGriggs added a commit that referenced this pull request Feb 6, 2020
PR-URL: #30986
Reviewed-By: Guy Bedford <guybedford@gmail.com>
Reviewed-By: Bradley Farias <bradley.meck@gmail.com>
@MylesBorins MylesBorins mentioned this pull request Feb 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

9 participants
You can’t perform that action at this time.