module: ESM loaders next steps #36396

Closed
GeoffreyBooth opened this issue Dec 5, 2020 · 92 comments

Labels
discuss Issues opened for discussions and feedbacks. esm Issues and PRs related to the ECMAScript Modules implementation. loaders Issues and PRs related to ES module loaders

@GeoffreyBooth
Member

GeoffreyBooth commented Dec 5, 2020

This issue is meant to be a tracking issue for where we as a team think we want ES module loaders to go. I’ll start it off by writing what I think the next steps are, and based on feedback in comments I’ll revise this top post accordingly.

I think the first priority is to finish the WIP PR that @jkrems started to slim down the main four loader hooks (resolve, getFormat, getSource, transformSource) into two (resolveToURL and loadFromURL, or should they be called resolve and load?). This would solve the issue discussed in #34144 / #34753.

Next I’d like to add support for chained loaders. There was already a PR opened to achieve this, but as far as I can tell that PR doesn’t actually implement chaining as I understand it: it allows the transformSource hook to be chained but not the other hooks, and therefore doesn’t really address the user request.

A while back I had a conversation with @jkrems to hash out a design for what we thought a chained loaders API should look like. Starting from a base where we assume #35524 has been merged in, and therefore the only hooks are resolve, load and getGlobalPreloadCode (which should probably be renamed to just globalPreloadCode, as there are no longer any other hooks named get*), we were thinking of changing the last argument of each hook from default<hookName> to next, where next is the next registered function for that hook. Then we hashed out some examples for how each of the two primary hooks, resolve and load, would chain.

Chaining resolve hooks

So for example say you had a chain of three loaders, unpkg, http-to-https, cache-buster:

  1. The unpkg loader resolves a specifier foo to a URL http://unpkg.com/foo.
  2. The http-to-https loader rewrites that URL to https://unpkg.com/foo.
  3. The cache-buster loader takes that URL and appends a timestamp, e.g. https://unpkg.com/foo?ts=1234567890.

These could be implemented as follows:

unpkg loader

export async function resolve(specifier, context, next) { // next is Node’s resolve
  if (isBareSpecifier(specifier)) {
    return { url: `http://unpkg.com/${specifier}` };
  }
  return next(specifier, context);
}

http-to-https loader

export async function resolve(specifier, context, next) { // next is the unpkg loader’s resolve
  const result = await next(specifier, context);
  if (result.url.startsWith('http://')) {
    result.url = `https${result.url.slice('http'.length)}`;
  }
  return result;
}

cache-buster loader

export async function resolve(specifier, context, next) { // next is the http-to-https loader’s resolve
  const result = await next(specifier, context);
  if (supportsQueryString(result.url)) { // exclude data: & friends
    // TODO: do this properly in case the URL already has a query string
    result.url += `?ts=${Date.now()}`;
  }
  return result;
}

These chain “backwards” in the same way that function calls do, along the lines of cacheBusterResolve(httpToHttpsResolve(unpkgResolve(nodeResolve(...)))) (though in this particular example, the positions of cache-buster and http-to-https can be swapped without affecting the result). The point, though, is that the hook functions nest: each one returns a result of the same shape as Node’s own resolve, and the chaining happens as a result of calling next; if a hook doesn’t call next, the chain short-circuits. I’m not sure whether it’s preferable for the API to be node --loader unpkg --loader http-to-https --loader cache-buster or the reverse, but it would be easy to flip that if we get feedback that one way is more intuitive than the other.
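
As a minimal sketch (illustrative names only, not part of any proposed API), this nesting could be wired up by wrapping each hook around the next one in, with Node’s own resolve innermost:

// Hypothetical sketch: given the user's resolve hooks ordered outermost-first,
// wrap each one so it receives the next inner hook as its `next` argument,
// bottoming out at Node's default resolve (which never calls next).
function buildResolveChain(userHooks, nodeDefaultResolve) {
  return userHooks.reduceRight(
    (next, hook) => (specifier, context) => hook(specifier, context, next),
    nodeDefaultResolve,
  );
}

// const resolve = buildResolveChain(
//   [cacheBusterResolve, httpToHttpsResolve, unpkgResolve],
//   nodeDefaultResolve,
// );
// await resolve('foo', { parentURL: import.meta.url });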

Chaining load hooks

Chaining load hooks would be similar to resolve hooks, though slightly more complicated in that instead of returning a single string, each load hook returns an object { format, source } where source is the loaded module’s source code/contents and format is the name of one of Node’s ESM loader’s “translators”: commonjs, module, builtin (a Node internal module like fs), json (with --experimental-json-modules) or wasm (with --experimental-wasm-modules).

Currently, Node’s internal ESM loader throws an error on unknown file types: import('file.javascript') throws, even if the contents of that file are perfectly acceptable JavaScript. This error happens during Node’s internal resolve when it encounters a file extension it doesn’t recognize; hence the current CoffeeScript loader example has lots of code to tell Node to allow CoffeeScript file extensions. We should move this validation check to be after the format is determined, which is one of the return values of load; so basically, it’s on load to return a format that Node recognizes. Node’s internal load doesn’t know to resolve a URL ending in .coffee to module, so Node would continue to error like it does now; but the CoffeeScript loader under this new design no longer needs to hook into resolve at all, since it can determine the format of CoffeeScript files within load. In code:

coffeescript loader

import CoffeeScript from 'coffeescript';

// CoffeeScript files end in .coffee, .litcoffee or .coffee.md
const extensionsRegex = /\.coffee$|\.litcoffee$|\.coffee\.md$/;

export async function load(url, context, next) {
  const result = await next(url, context);

  // The first check is technically not needed but ensures that
  // we don’t try to compile things that already _are_ compiled.
  if (result.format === undefined && extensionsRegex.test(url)) {
    // For simplicity, all CoffeeScript URLs are ES modules.
    const format = 'module';
    const source = CoffeeScript.compile(result.source, { bare: true });
    return {format, source};
  }
  return result;
}

And the other example loader in the docs, to allow import of https:// URLs, would similarly only need a load hook:

https loader

import { get } from 'https';

export async function load(url, context, next) {
  if (url.startsWith('https://')) {
    let format; // default: format is undefined
    const source = await new Promise((resolve, reject) => {
      get(url, (res) => {
        // Determine the format from the MIME type of the response
        switch (res.headers['content-type']) {
          case 'application/javascript':
          case 'text/javascript': // etc.
            format = 'module';
            break;
          case 'application/node':
          case 'application/vnd.node.node':
            format = 'commonjs';
            break;
          case 'application/json':
            format = 'json';
            break;
          // etc.
        }

        let data = '';
        res.on('data', (chunk) => data += chunk);
        res.on('end', () => resolve(data));
      }).on('error', (err) => reject(err));
    });
    return {format, source};
  }

  return next(url, context);
}

If these two loaders are used together, where the coffeescript loader’s next is the https loader’s hook and https loader’s next is Node’s native hook, so like coffeeScriptLoad(httpsLoad(nodeLoad(...))), then for a URL like https://example.com/module.coffee:

  1. The https loader would load the source over the network, but return format: undefined, assuming the server supplied a correct Content-Type header like application/vnd.coffeescript which our https loader doesn’t recognize.
  2. The coffeescript loader would get that { source, format: undefined } early on from its call to next, and set format: 'module' based on the .coffee at the end of the URL. It would also transpile the source into JavaScript. It then returns { format: 'module', source } where source is runnable JavaScript rather than the original CoffeeScript.

Chaining globalPreloadCode hooks

For now, I think that this wouldn’t be chained the way resolve and load would be. This hook would just be called sequentially for each registered loader, in the same order as the loaders themselves are registered. If this is insufficient, for example for instrumentation use cases, we can discuss and potentially change this to follow the chaining style of load.
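
A minimal sketch of that sequential behavior (the loaders array, hook name and evaluatePreloadCode are illustrative only, not an existing internal API):

// Illustrative sketch only: call each registered loader's globalPreloadCode
// hook in registration order and evaluate whatever code string it returns.
function runGlobalPreloadCode(loaders, evaluatePreloadCode) {
  for (const loader of loaders) {
    if (typeof loader.globalPreloadCode !== 'function') continue;
    const code = loader.globalPreloadCode();
    if (typeof code === 'string') {
      evaluatePreloadCode(code); // e.g. compile and run before the entry point
    }
  }
}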

Next Steps

Based on the above, here are the next few PRs as I see them:

  1. Finish esm: merge and simplify loader hooks #35524, simplifying the hooks to resolve, load and globalPreloadCode.
  2. Refactor Node’s internal ESM loader’s hooks into resolve and load. Node’s internal loader already has no-ops for transformSource and getGlobalPreloadCode, so all this really entails is merging the internal getFormat and getSource into one function load.
  3. Refactor Node’s internal ESM loader to move its exception on unknown file types from within resolve (on detection of unknown extensions) to within load (if the resolved extension has no defined translator).
  4. Implement chaining as described here, where the default<hookName> becomes next and references the next registered hook in the chain.
  5. Get a load return value of format: 'commonjs' to work, or at least error informatively. See esm: Modify ESM Experimental Loader Hooks #34753 (comment).
  6. Investigate and potentially add an additional transform hook (see below).

This work should complete many of the major outstanding ES module feature requests, such as supporting transpilers, mocks and instrumentation. If there are other significant user stories that still wouldn’t be possible with the loaders design as described here, please let me know. cc @nodejs/modules

@GeoffreyBooth GeoffreyBooth added the esm Issues and PRs related to the ECMAScript Modules implementation. label Dec 5, 2020
@coreyfarrell
Member

Combining load and transform into a single hook makes me very uncomfortable. It seems like this will result in transform hooks being silently skipped by load hooks if chained in the wrong order.

@jkrems
Contributor

jkrems commented Dec 5, 2020

It seems like this will result in transform hooks being silently skipped by load hooks if chained in the wrong order.

Can you elaborate on this concern? It would be possible to have transform as a 2nd pass but it would create two competing ways to write a compiling loader which seems confusing.

It would also mean that it's really hard to write a loader that can be 100% certain that the code they generate for a (potentially virtual/in-memory) module definitely runs without modifications. Because even if its load hook runs first, some other loader's transform hook may mess with the generated code. And it would require a careful dance to undo that with a transform hook.

@GeoffreyBooth GeoffreyBooth added the discuss Issues opened for discussions and feedbacks. label Dec 5, 2020
@coreyfarrell
Member

It seems like this will result in transform hooks being silently skipped by load hooks if chained in the wrong order.

Can you elaborate on this concern? It would be possible to have transform as a 2nd pass but it would create two competing ways to write a compiling loader which seems confusing.

The example https loader has a code branch which does not call next. In this case if next were babel or ts-node the code would not get transpiled as expected by the user.

As far as having two ways to transform a module: technically yes, but IMO it would be incorrect to implement a transform with the load hook (that would be a bug in that loader).

It would also mean that it's really hard to write a loader that can be 100% certain that the code they generate for a (potentially virtual/in-memory) module definitely runs without modifications. Because even if its load hook runs first, some other loader's transform hook may mess with the generated code. And it would require a careful dance to undo that with a transform hook.

I might be misunderstanding on this point, but I would expect all --loaders to be loaded before any are activated. Will that not be the case?

@GeoffreyBooth
Member Author

The example https loader has a code branch which does not call next. In this case if next were babel or ts-node the code would not get transpiled as expected by the user.

I think this might be a source of confusion. In that example, the HTTPS loader’s next refers to Node’s internal load—the one that loads source code from disk. Since the URL in question is an https URL, there’s no point in asking Node to try to load it: Node would just error. Note that after the if (url.startsWith('https://')) { block, the loader does call next, to let Node handle all non-https URLs.

So if this were the only loader in use, e.g. node --loader https-loader.mjs file.mjs, any HTTPS URLs in file.mjs would be loaded by the HTTPS loader and non-HTTPS URLs would be loaded by Node, via the call to next at the end of the HTTPS loader’s load hook. So basically the HTTPS loader’s load hook runs first, and sometimes calls Node’s load hook. The HTTPS loader is therefore always run first, before the equivalent Node hook; this is how the non-chainable loaders currently work (where next is currently the equivalent of defaultLoad, a reference to Node’s internal version of the hook).

When the CoffeeScript loader is added to the chain, its next is the HTTPS loader. And if you look at CoffeeScript’s load hook, it calls next right at the start: it wants the source as returned by Node’s load or by any other loaders between CoffeeScript and Node (such as the HTTPS loader). So for example:

node --loader coffeescript-loader.mjs --loader https-loader.mjs file.coffee
  1. CoffeeScript’s load runs first. Its call to next runs the HTTPS loader’s load.
  2. Within the HTTPS loader’s load, HTTPS URLs are processed and the source/format returned; and for other URLs next is called, which runs Node’s load to ask Node to provide source/format. Either way, the source/format are returned to the CoffeeScript loader as the return value for the CoffeeScript loader’s next call.
  3. Back in the CoffeeScript load, we then transpile the source and return the source/format. Since this is the first/outermost loader, this is the “final” return value and what Node then takes to evaluate.

Every loader hook has to return the expected value, whether a string URL for resolve or { source, format } for load. If it doesn’t, Node would error. So there’s no chance of a loader author shipping a loader that breaks earlier loaders (HTTPS loader breaking the CoffeeScript loader, in this case). A loader author could short-circuit and prevent later loaders from ever getting run, like how in this example the HTTPS loader sometimes prevents Node’s load from ever evaluating. It depends on the loader whether this is appropriate; basically, if the input to next is something that could be successfully processed by Node’s version of the hook, the loader probably should call next (which might be Node’s version or it might be some other compatible loader’s) and modify that, rather than duplicating what Node can do and/or short-circuiting unnecessarily. But if the arguments to next would throw when passed to Node, as they would for the HTTPS loader here, the loader shouldn’t call next with them because then that would just cause errors.

So next in this case refers to the next registered loader, as in CoffeeScript referring to HTTPS referring to Node, where Node is always the last loader. Does this make sense?

And because the calls are explicit—the chaining happens via calls to next, not implicitly by running functions in sequence—that’s why we can’t really have a separate transformSource, as far as I can tell. In a transformSource hook, next would need to be either the next loader’s getSource or transformSource; it would get confusing pretty fast.

@coreyfarrell
Member

When the CoffeeScript loader is added to the chain, its next is the HTTPS loader. And if you look at CoffeeScript’s load hook, it calls next right at the start: it wants the source as returned by Node’s load or by any other loaders between CoffeeScript and Node (such as the HTTPS loader). So for example:

node --loader coffeescript-loader.mjs --loader https-loader.mjs file.coffee
  1. CoffeeScript’s load runs first. Its call to next runs the HTTPS loader’s load.

  2. Within the HTTPS loader’s load, HTTPS URLs are processed and the source/format returned; and for other URLs next is called, which runs Node’s load to ask Node to provide source/format. Either way, the source/format are returned to the CoffeeScript loader as the return value for the CoffeeScript loader’s next call.

  3. Back in the CoffeeScript load, we then transpile the source and return the source/format. Since this is the first/outermost loader, this is the “final” return value and what Node then takes to evaluate.

My specific concern is what happens when someone (or something) runs:

node --loader https-loader.mjs --loader coffeescript-loader.mjs file.coffee

This would cause import of https://...file.coffee to fail due to the coffeescript-loader.mjs being skipped. Having https-loader.mjs provide getSource and coffeescript-loader.mjs provide transformSource would eliminate this type of error. In this example the damage might be somewhat limited since it would cause a clear failure. In the nyc use case we would simply fail to capture coverage but the code would still function. At most they would get a coverage threshold error.

This becomes more difficult to defend against when you consider NODE_OPTIONS="--loader https-loader.mjs".

So next in this case refers to the next registered loader, as in CoffeeScript referring to HTTPS referring to Node, where Node is always the last loader. Does this make sense?

And because the calls are explicit—the chaining happens via calls to next, not implicitly by running functions in sequence—that’s why we can’t really have a separate transformSource, as far as I can tell. In a transformSource hook, next would need to be either the next loader’s getSource or transformSource; it would get confusing pretty fast.

next given to resolve always refers to the next resolve. The same would be true of next given to getSource or transformSource: each stage of hooks completes before any part of the following stage executes. So you are proposing two stages - resolve and load. I'm suggesting we need three stages - resolve, retrieve and transform.

@targos
Member

targos commented Dec 6, 2020

Should we convert this issue to a discussion?

@GeoffreyBooth
Member Author

Should we convert this issue to a discussion?

We could, but this is also an issue in that it has TODO items that pull requests should address (eventually to close the issue). I guess we could create a separate discussion related to this issue? My understanding (correct me if I’m wrong) is that once this becomes a discussion it can’t be converted back into an issue.

@GeoffreyBooth
Member Author

Having https-loader.mjs provide getSource and coffeescript-loader.mjs provide transformSource would eliminate this type of error.

If you were to refactor the above coffeescript-loader.mjs into a getSource / transformSource model, note that CoffeeScript loader would still need to provide a getSource hook, in order to return format. And since HTTPS loader doesn’t call next for HTTPS links, this format returned by the CoffeeScript loader under node --loader https-loader.mjs --loader coffeescript-loader.mjs file.coffee would just get lost. So the different API doesn’t solve this problem: either version fails if the loaders are registered in the wrong order.

Keep in mind that the reason we’re moving the getFormat work into load is because sometimes we need the source in order to determine the format. Most (any?) transpilation loaders therefore would need to implement load, to define a format for their custom file types (.ts, .jsx, etc.). Once they do that, it’s natural for them to also want to transform the source in the same function, as there’s a lot less uncertainty there than if they interact with a separate chain of transform hooks.
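
For instance (an illustrative sketch, not an example from the thread): a load hook may only be able to pick between wasm and module after looking at the bytes, e.g. by checking for WebAssembly’s \0asm magic number:

// Illustrative sketch: the format can only be chosen after inspecting the
// source, here by checking for WebAssembly's "\0asm" magic number.
export async function load(url, context, next) {
  const result = await next(url, context);
  if (result.format === undefined) {
    const bytes = Buffer.from(result.source);
    const isWasm = bytes.length >= 4 &&
      bytes[0] === 0x00 && bytes[1] === 0x61 &&
      bytes[2] === 0x73 && bytes[3] === 0x6d;
    return { format: isWasm ? 'wasm' : 'module', source: result.source };
  }
  return result;
}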

@coreyfarrell
Member

Having https-loader.mjs provide getSource and coffeescript-loader.mjs provide transformSource would eliminate this type of error.

If you were to refactor the above coffeescript-loader.mjs into a getSource / transformSource model, note that CoffeeScript loader would still need to provide a getSource hook, in order to return format. And since HTTPS loader doesn’t call next for HTTPS links, this format returned by the CoffeeScript loader under node --loader https-loader.mjs --loader coffeescript-loader.mjs file.coffee would just get lost. So the different API doesn’t solve this problem: either version fails if the loaders are registered in the wrong order.

Keep in mind that the reason we’re moving the getFormat work into load is because sometimes we need the source in order to determine the format. Most (any?) transpilation loaders therefore would need to implement load, to define a format for their custom file types (.ts, .jsx, etc.). Once they do that, it’s natural for them to also want to transform the source in the same function, as there’s a lot less uncertainty there than if they interact with a separate chain of transform hooks.

Could we allow transformSource to also return/alter a format? I'm not against breaking changes to the transformSource hook if that's what it would take to keep it separate. I do not think the transformSource should even have an argument for the next function; Node.js should unconditionally call all registered transformation hooks in sequence. My ultimate desire is that the only way for another hook earlier in the chain to prevent my transformSource from being called is for the earlier hook to throw (abort the import).

Also sorry for my slow response, offline stuff has taken over most of my time lately.

@GeoffreyBooth
Member Author

I do not think the transformSource should even have an argument for the next function; Node.js should unconditionally call all registered transformation hooks in sequence.

My concern with this is that now we have two patterns of chaining, with the next style in addition to this alternate one. What would this even look like in practice? The transform function receives source as input, and needs to return source as output—and if it fails to return, that’s akin to failing to call next? Is the next transform function called with undefined input, or does it get the input that the previous transform didn’t transform? Is the benefit that all transform functions are always called worth the cost of understanding this very different pattern of how chaining works for transform separate from load?

Also are we running all the load functions first, then all the transform functions? So the final source returned at the end of the chained load functions would be passed into the first transform function? Or would they be called in pairs, like the output from the first loader’s load is passed into the first loader’s transform, and then that output is passed into the second loader’s load, and so on?

Basically, there are a lot of questions to be worked out when designing the API for how a separate transform would work, now that we have chaining. That’s the short version of what I ran into when Jan and I designed the new API in the top post. I’m not opposed to adding a separate transform hook, but it seems like something that can come later as a follow-up PR to the PRs suggested in the top post; it doesn’t require any changes to load, so transform can be added cleanly afterward. I would suggest that we build what’s outlined above first, which will probably undergo changes once actual code is written, and once we see how chained loaders actually turn out then we can revisit transform.

@coreyfarrell
Member

I do not think the transformSource should even have an argument for the next function; Node.js should unconditionally call all registered transformation hooks in sequence.

My concern with this is that now we have two patterns of chaining, with the next style in addition to this alternate one. What would this even look like in practice? The transform function receives source as input, and needs to return source as output—and if it fails to return, that’s akin to failing to call next? Is the next transform function called with undefined input, or does it get the input that the previous transform didn’t transform?

Would it be better for Node.js to throw a TypeError if a transform gave an undefined return, and reference the URL of the offending hook in the message? Honestly, how Node.js handles an undefined return isn't hugely important to me as long as it's documented. I'll never return undefined from a transform, and someone else returning undefined from a transform will not cause my hook to be skipped silently.

import CoffeeScript from 'coffeescript';

// CoffeeScript files end in .coffee, .litcoffee or .coffee.md
const extensionsRegex = /\.coffee$|\.litcoffee$|\.coffee\.md$/;

export function transform(previousResult, context) {
  // The first check is technically not needed but ensures that
  // we don’t try to compile things that already _are_ compiled.
  if (previousResult.format === undefined && extensionsRegex.test(context.url)) {
    // For simplicity, all CoffeeScript URLs are ES modules.
    const format = 'module';
    const source = CoffeeScript.compile(previousResult.source, { bare: true });
    return {format, source};
  }

  // no action so return `previousResult` which came from
  // the `load` chain / previous `transform` functions.
  return previousResult;
}

Is the benefit that all transform functions are always called worth the cost of understanding this very different pattern of how chaining works for transform separate from load?

I think it is. For someone writing a transform hook, having that hook skipped is a bug. If I write a transform hook that gets silently skipped because of another hook that the end user might not even be aware of, this would make support more difficult. Keep in mind --loader is not just for end users; it will be injected by tooling. This can be done through process arguments or the NODE_OPTIONS environment variable; nyc, for example, will eventually add a loader to NODE_OPTIONS for child processes (as it currently does to inject a --require option).

Also are we running all the load functions first, then all the transform functions? So the final source returned at the end of the chained load functions would be passed into the first transform function? Or would they be called in pairs, like the output from the first loader’s load is passed into the first loader’s transform, and then that output is passed into the second loader’s load, and so on?

Yes, the load chain would run first to completion, then the transform chain would run starting with the final result of the load chain.
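
A rough sketch of that sequencing (chainedLoad stands for the nested load chain described earlier, transformHooks for the registered transform hooks in registration order; these names are illustrative, not a real API):

// Illustrative sketch of the proposed two-pass model: the nested load chain
// runs to completion first, then every registered transform runs in sequence
// on its result. (Whether an undefined return from a transform is ignored or
// throws a TypeError is left open above.)
async function loadAndTransform(url, context, chainedLoad, transformHooks) {
  let result = await chainedLoad(url, context); // { format, source }
  for (const transform of transformHooks) {
    const transformed = await transform(result, { ...context, url });
    if (transformed !== undefined) result = transformed;
  }
  return result;
}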

Basically, there are a lot of questions to be worked out when designing the API for how a separate transform would work, now that we have chaining. That’s the short version of what I ran into when Jan and I designed the new API in the top post. I’m not opposed to adding a separate transform hook, but it seems like something that can come later as a follow-up PR to the PRs suggested in the top post; it doesn’t require any changes to load, so transform can be added cleanly afterward. I would suggest that we build what’s outlined above first, which will probably undergo changes once actual code is written, and once we see how chained loaders actually turn out then we can revisit transform.

I'm not strongly against this idea but I worry about loader hooks becoming stable without a separate transform hook.

@jkrems
Contributor

jkrems commented Dec 17, 2020

Okay, my current state of thinking it through: I think it's possible to have a separate transform hook like this, at the expense of making one edge case more complicated. It would effectively split the order constraints for different kinds of loaders into three. My current understanding of the order in which the getSource hooks need to run is:

  1. "Isolated code generation": Any hook that needs complete control over specific specifiers. Think: low-level instrumentation hooks that generate virtual modules like my-system:super-fragile-code. If anything transforms that generated code, it breaks (e.g. it needs access to certain constructors and transpilation would lead to invalid runtime behavior).
  2. Code instrumentation, e.g. coverage.
  3. JS-to-JS transformations, e.g. babel. Or WASM-to-WASM transformation - intra-format transformation.
  4. Non-JS-to-JS transformation, e.g. tsc. Or non-WASM-to-WASM transformation - inter-format transformation.
  5. Additional resource loaders, e.g. load from zip file or HTTPS.

Adding transform would split this into:

  1. Type 1: Hooks that offer both transform and getSource.
  2. Type 2: Hooks that offer only transform. Within this group, in this exact order:
  • Code instrumentation.
  • JS-to-JS.
  • Non-JS-to-JS.
  3. Type 3: Hooks that offer only getSource. Within this group, in this exact order:
  • Additional resource loaders.

Type 1 hooks would have to be added first. Type 2 and 3 hooks can be added interleaved but would have to be ordered within their categories. A JS-to-JS hook could be added before or after an "additional resource" hook but would have to be listed after any code instrumentation hooks and before all non-JS-to-JS hooks.

In other words: Code instrumentation could still not be blindly added as the first hook, it would have to be added after any and all type 1 hooks.

One consequence of splitting the precedence into these 3 groups is that type 1 hooks would be harder to write: In a system with just one pass, they could just return the exact source they require from getSource and otherwise delegate to the chain. In a two-pass system, type 1 loaders need to:

  1. Return a placeholder for getSource. This value will be ignored later and could be an empty string.
  2. Return the actual source in their transform step, ignoring whatever the argument was.

In other words: Writing a loader that ignores coverage instrumentation is still possible, just more awkward. And the sorting requirements are a little relaxed but really just for a single case: A hook for additional protocols or resource loading behaviors has a little more flexibility in where in the chain it's specified.
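
To illustrate that two-step pattern (hypothetical names and a made-up my-system: URL; none of this is a real API), a "type 1" loader in a two-pass system might look like:

// Illustrative sketch of a "type 1" loader under a two-pass system: getSource
// returns a throwaway placeholder, and transform ignores its input and
// returns the exact generated code that must run unmodified.
const VIRTUAL_URL = 'my-system:super-fragile-code';

export async function getSource(url, context, next) {
  if (url === VIRTUAL_URL) {
    return { source: '' }; // placeholder; the real source comes from transform
  }
  return next(url, context);
}

export async function transform(previousResult, context) {
  if (context.url === VIRTUAL_URL) {
    // Ignore the incoming result; this generated code must run verbatim.
    return { format: 'module', source: 'export const token = globalThis.__fragileToken__;' };
  }
  return previousResult;
}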

@GeoffreyBooth
Member Author

These last few posts have made me more certain that we should complete the already-proposed work first before potentially adding a transform hook. I added transform as a potential sixth PR after the five already in the list at top, so it’s part of the roadmap to at least be considered.

It seems to me that transform will require a fair bit of work to scope out a design that will work for all the use cases we’re trying to enable; but not having transform most likely won’t prevent any use cases, so its addition would be optional.

The appeal of transform seems to be that under @coreyfarrell’s proposal, it would always run, whereas under the plan for resolve and load, sometimes loaders get bypassed if next isn’t called. This is already achievable without transform, by ensuring that no transforming loaders are placed after any loaders that fail to call next. So while I see the appeal of this feature, it’s more of a convenience than something that enables previously unachievable functionality. In the node --loader coffeescript-loader.mjs --loader https-loader.mjs file.coffee example above, the CoffeeScript loader is always run because it’s first. Any loader that always calls next could go ahead of it, and the CoffeeScript loader would still always run. It’s only the loaders that come after the HTTPS loader that are at risk—and even then, maybe that’s what we want. I can imagine that there might be some combination of loaders where we have no choice but to put the transforming loader after a not-next-calling loader, and therefore transform enables a previously impossible scenario; but I feel like I need to see an example of what that use case would be and why transform is the best solution for it. I think we need to get several more examples of use cases and example loaders to know what problems we’re trying to solve, and that should probably come after we’ve done the earlier work to build chained resolve and load and see what other issues we have with them.

@GeoffreyBooth
Member Author

I’m assuming the next function returns a Promise?

I think so, yes. All the hooks in the current API are async, so presumably the future resolve and load would be, too, and therefore next would also be async and need to be awaited. I updated my examples above.

@d3x0r

d3x0r commented Jan 7, 2021

I had this report:
d3x0r/JSON6#48

in which the user is trying to use two experimental loaders at once. It appears from the code that chaining doesn't happen from one loader to the next, and that only one or the other will end up getting used, depending on the order specified.

https://github.com/d3x0r/JSON6/blob/master/lib/import.mjs
I do currently have a bug in that the call to defaultGetFormat() doesn't include the defaultGetFormat function itself... but that would just end up infinitely recursive if it were used in the next call.

@GeoffreyBooth
Member Author

It appears from the code that chaining doesn’t happen

Chaining has not yet been implemented. That’s what this issue is about.

@JakobJingleheimer
Contributor

Hiya, I'm "the user" mentioned by d3x0r above. I hope you don't mind me chiming in with my 2¢:

The current experimental implementation is pleasantly easy to understand and use. The "other" loader d3x0r mentioned facilitates import path aliases (e.g. '~/foo' instead of '../../../foo', resolving to 'my-project/foo'), and it was elegantly simple to write. (Ordinarily the import paths are transformed/expanded by a bundler, but for running unit tests, a bundler is hugely unnecessary overhead.)

I quite like @GeoffreyBooth's proposal to consolidate the hooks into the two. I think it makes sense to consolidate them like that, but I do have a couple of points:

Simple things should (continue to) be easy. If we merely want to do a small subset of what the whole loader can do, that should be easily accomplished with minimal code. For instance, if a user has some module-like file whose extension for whatever reason isn't .js or .mjs but can be handled as if it were one of those, the loader should provide a simple way to "opt in" to Node's default handling; something like merely returning { format: 'module' } (where source is omitted, so Node consumes format and does whatever it normally would to get the source of the format it recognises/supports).
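
For instance, a sketch of what that opt-in could look like (the .mjsx extension is made up, and Node supplying the source when it is omitted is the suggestion here, not something that exists today):

// Sketch of the suggested opt-in: return only a format and omit source, so
// Node would fall back to its default handling to fetch the source itself.
// (Treating an omitted source this way is the suggestion, not current behavior.)
export async function load(url, context, next) {
  if (url.endsWith('.mjsx')) {   // hypothetical module-like extension
    return { format: 'module' }; // no source: let Node load it as usual
  }
  return next(url, context);
}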

Requiring next. I think a better approach would be for Node to look for what it's expecting and have a done function instead of next:

| hook returns | continue? | result | potential use case |
| --- | --- | --- | --- |
| false | no | skip file (use empty string for value?) | file is not needed in current circumstances |
| nullish | yes | loader did nothing: continue to next | loader isn't for the file type |
| invalid value | yes | log warning (possible flag to throw and abort instead?), discard invalid value | user error |
| valid value | yes | pass value to next loader | expected use |
| done(validValue) | no | use validValue as final value (skipping any remaining loaders) | module mocking (automated tests) |
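
A sketch of a resolve hook written against that contract (done and these return-value semantics are the proposal in the table above, not an existing API; the '~/' alias and PROJECT_ROOT are illustrative):

// Sketch against the proposed contract: nullish means "not mine, continue",
// a value is passed to the next loader, done(value) is final and skips the rest.
const PROJECT_ROOT = new URL('./', import.meta.url); // illustrative only

export async function resolve(specifier, context, done) {
  if (specifier === 'mocked-dep') { // hypothetical module being mocked in a test
    return done({ url: new URL('./mocks/dep.mjs', PROJECT_ROOT).href }); // final
  }
  if (!specifier.startsWith('~/')) {
    return; // nullish: this loader did nothing; continue to the next loader
  }
  return { url: new URL(specifier.slice(2), PROJECT_ROOT).href }; // pass along
}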

I think a separate transform is undesirable as it puts redundant cooks in the kitchen and likely will cause confusion and bugs (and/or false bug reports). From the sounds of it, there's nothing transform would do that can't be accomplished in load (if you wanted to abstract that part, create a transform function and call it in load; Node doesn't need to know your implementation details). Handling the transform in load is very straightforward and intuitive, and I think the loader doesn't and shouldn't care how/where the output came from. Keeping the return of load authoritative seems logical and sensible. If a specific use case presents itself that is better handled with a separate transform hook, cross that bridge then (it likely won't be a radical change, and probably backwards-compatible too).

Re: the sequence in which --loader flags are supplied in the command: I think it should be the order in which they get called (left-most being first, right-most being last).

@guybedford
Contributor

@jshado1 this is a great proposal! Is this something you might be interested in collaborating on?

@JakobJingleheimer
Contributor

@GeoffreyBooth yes!

@guybedford
Contributor

It would be great to have your input further then.

Note one difference with a done option is that it doesn't permit "reencapsulation". That is, if a loader sets "done", but then a new instrumentation loader wants to see the whole pipeline and return the done value, it can never get a handle, as done will short-circuit. Similarly for any subsequent instrumenters. This is a benefit of functional nesting - it is both a call pattern and an unbound hierarchy of loaders that can always permit "another hook".

@JakobJingleheimer
Contributor

I would say in that situation the answer is merely: don't. I think done would/should only be used when assured that should be the final word (like for a module mock). If a loader only wants to inspect but not influence/mutate, return nullish. That allows it to do whatever it wants to do and then step aside. done is a very special and limited use-case; with great power and all that.

@guybedford
Contributor

So if you have a loader chain with a loader in it using done, how does a commercial APM hook into this app to be able to inspect all module resolutions?

@JakobJingleheimer
Contributor

JakobJingleheimer commented Jan 10, 2021

I think either expose that a different way, or allow the subsequent ones to run but not change the end result (but probably not, because that could be confusing). I think injecting some monitor into the chain is not the way to go. I would expect that to be some kind of "finally" hook (that receives a dump of everything that happened) to ensure it happens after all is said and done.

@GeoffreyBooth
Member Author

Will loaders enable unit-test mocking? For someone interested in unit-test mocking, is this the best issue to follow?

You should follow jestjs/jest#9430, that might be more directly related to your use case. The idea on Node’s side is to get ESM loaders and vm in feature-complete-enough shape to support what users expect out of Jest in ESM.

@DerekNonGeneric
Contributor

DerekNonGeneric commented Apr 1, 2021

Ad hoc meeting to discuss custom loader hooks

Hello everyone, @JakobJingleheimer and I are interested in scheduling a call to discuss the hooks API in greater detail.

There are many interested parties, and several attempts at changing the current experimental custom loader hooks API have failed for various reasons, so we want to make sure that everyone's use cases are heard out and addressed. If you have a use case that has not yet been voiced (or you simply want to ensure that it is accounted for), we will value your input on this call.

We are hoping to be able to come to a conclusion to move #37468 forwards.

Note: This call is mainly going to focus on the public API of the hooks system, so we'd like to know what hooks y'all need.


When: Tuesday, April 6 @ 3:00 EST

/cc everyone in this thread, @nodejs/modules, @bengl, et al.

Please 👍 if you are interested in attending and I will send you a link that day

@DerekNonGeneric
Contributor

DerekNonGeneric commented Apr 6, 2021

Please 👍 if you are interested in attending and I will send you a link that day

We should be getting started in about 10 minutes. The link is below. Hope to see you all there!

https://meet.google.com/jod-busy-frb

@bengl @JakobJingleheimer @zackschuster @Songkeys @GeoffreyBooth @d3x0r @giltayar @Qard @Flarna


The meeting ran for about an hour and a half with additional dialogue scheduled for next week.

When: Friday, April 16 @ 11:00AM EST

@GeoffreyBooth GeoffreyBooth added this to Reference in Loaders Apr 6, 2021
@DerekNonGeneric
Contributor

For those interested in the continued dialog, today's meeting has been rescheduled for next week at the same time.

We would like to have our conclusions from the last meeting reified in #37468, which is currently making good progress.

@Qard
Member

Qard commented Apr 17, 2021

Oh, missed it again...some proper calendar invites would help. 🤔

@DerekNonGeneric
Contributor

DerekNonGeneric commented Apr 17, 2021

No problemo @Qard, hopefully a Google Calendar invitation will help everyone with adding it to their calendars.

📅 Event: Node.js Module Loaders Working Group Meeting 2021-04-23

If there is some other calendar invitation format you had in mind, please let me know.

@JakobJingleheimer
Contributor

@DerekNonGeneric

Could not find the requested event.

@DerekNonGeneric
Contributor

Hm, here is the whole calendar then. I've made it public, so it should work.

📅 Calendar: Node.js Module Loaders Working Group

@GeoffreyBooth GeoffreyBooth added loaders-agenda Issues and PRs to discuss during the meetings of the Loaders team loaders Issues and PRs related to ES module loaders labels Apr 20, 2021
@GeoffreyBooth
Member Author

Hey folks. I’ve been working on setting us up as a proper team akin to the Modules team. We now have a repo: http://github.com/nodejs/loaders.

There’s also a group: @nodejs/loaders. Please respond with 🚀 if you’d like me to add you to this group, and by extension to the new Loaders team. You’ll start receiving GitHub notifications whenever that group tag is used.

As for meetings, the way those work is that we need to use the Node Zoom account which records and streams to YouTube. I’ve asked for access to this, which won’t come until next week at the earliest. So I think we should probably postpone this week’s meeting (sorry). Also, no two Node teams’ meetings can overlap, because the Zoom/YouTube accounts can only stream one meeting at a time. The Friday 15:00 UTC meeting time overlaps with another team’s meeting, so we need to find a new regular time. Here’s the calendar of Node team meetings. There’s a gap two hours later, at 17:00 UTC / 1 pm ET / 10 am PT; I was thinking we could do this time every two weeks, starting on Friday April 30. If this works for you, please reply with 👍 , otherwise 👎 . If you can’t attend and you’d like to, please login to node-js.slack.com and discuss on the #esm channel. Depending on how that discussion goes I might create a Doodle to find a new time.

Thanks!

@GeoffreyBooth
Member Author

Hey @nodejs/loaders, everyone who 🚀 ‘ed the previous message should be in @nodejs/loaders now. Please also take a look at https://github.com/nodejs/loaders.

I’m still waiting on the Zoom credentials, so I guess we can’t meet tomorrow. I’ll propose a new time once I have the credentials, probably at the same time either a week or two weeks from tomorrow. Following the pattern of other Node teams, the meetings will be announced via issues in the team repo; you should be notified via the @nodejs/loaders tag and if you watch the repo. Issues and PRs on any nodejs repo tagged loaders-agenda will be flagged for discussion at the meetings.

@JakobJingleheimer
Contributor

JakobJingleheimer commented Apr 30, 2021

@GeoffreyBooth if the issue is recording the meeting, could we use a Google Meet or does it have to be Zoom? I have an early adopter account/organisation for G Suite, so I get most of the paid features for free; I remember Meet switched meeting recording to a premium feature a while back—I think I still get it (I can check).


Update: I can't find "recording" in settings (it must be enabled now apparently), so I'm not sure. "G Suite Legacy" is never listed in any of the support documentation, so I dunno if I'm supposed to have it (best I can find is a list of general features for G Suite Legacy).

@GeoffreyBooth
Member Author

Hey @nodejs/loaders, I should have everything we need for our next meeting, which will be at 17:00 UTC / 1 pm ET / 10 am PT on May 14 (not tomorrow). nodejs/loaders#1 was a test of the meeting infrastructure, that the scripts that look at the Google calendar would generate the meeting agenda and so on properly, and everything appears to be working. Sorry for all the delays and confusion, and I look forward to seeing all of you next week!

@GeoffreyBooth
Member Author

Closing this issue as discussion has moved to https://github.com/nodejs/loaders

Loaders automation moved this from Reference to Done Nov 10, 2021
@GeoffreyBooth GeoffreyBooth moved this from Done to Reference in Loaders Nov 10, 2021