Skip to content
This repository has been archived by the owner on Sep 2, 2023. It is now read-only.

Rediscussion for out-of-order CJS exec? #509

Closed
guybedford opened this issue May 4, 2020 · 62 comments
Closed

Rediscussion for out-of-order CJS exec? #509

guybedford opened this issue May 4, 2020 · 62 comments

Comments

@guybedford
Copy link
Contributor

guybedford commented May 4, 2020

I'd like to reopen the discussion of out-of-order CJS exec one last time, given how much criticism there has been over the lack of named exports for CommonJS in ESM.

I can revive a former PR for this which is still sitting around somewhere no problem.

The two "problematic semantics" as far as this group is concerned are:

  1. CJS executing during the instantiation phase instead of during the execute phase.
  2. Because of the getter triggers I would also prefer to move the "wrapper creation" to be lazily done on the first import, instead of for every CJS module. That is, doing the exports snapshot at import time, even if it was required earlier. The reason being that otherwise an app that loads 100s of CommonJS modules snapshotting all of those on startup will both be slower and also may trigger lots of unnecessary warning and error side effects.

I have no problem with either of these semantics and never have.

Will other members of this group finally reconsider their positions in the name of practicality for users?

This scenario is definitely one of the biggest catches for users that we can still change. It could be worth re-hashing this old number even if that's a small possibility.

@guybedford guybedford added the modules-agenda To be discussed in a meeting label May 4, 2020
@giltayar
Copy link

giltayar commented May 4, 2020

I seem to remember a PR where you implemented a "golden middle": during the binding phase, Node assumes that the exports are as the imports defined them, leaves the execution to the execution phase, and if during the execution it finds that the imports were not the same as the exports, throws an exception.

That would sound to me like an (almost) perfect solution.

@guybedford
Copy link
Contributor Author

guybedford commented May 4, 2020

@giltayar yes this was the dynamic modules proposal that we put so much time into at TC39 (https://github.com/nodejs/dynamic-modules). The solution, implemented pretty much exactly as you describe, hits on some quite deep complexities:

  1. export * from 'a'; export * from 'b' is supposed to throw when a and b both export the same named export. Allowing the names to be user/consumer-defined means this validation phase needs to be post-handled / ignored somehow.
  2. import * as ns in a cycle can expose a module namespace object for a CJS module with the exports being displayed before the execution of that CJS module has completed. Since the exposed exports are then the user definitions this exposes non-determinism despite any later execution error.

The dynamic modules solution was to ban export star of commonjs modules for (1) and to patch the namespace for (2), but this approach was shot down at TC39 after multiple attempts at consensus.

The simple solution being discussed here can be implemented in a PR tomorrow, and doesn't come with anything close to these complexities.

@GeoffreyBooth
Copy link
Member

yes this was the dynamic modules proposal

So this proposal determined the names of the CommonJS exports by looking at what was imported, e.g. in import statements; did we ever consider determining the names of the CommonJS exports by parsing the CommonJS?

I remember there was opposition to evaluating the CommonJS in the instantiation phase, and strong opposition to evaluating CommonJS code twice; but what about just parsing it in one of these pre-evaluation phases? We would miss any CommonJS exports that are dynamically named, but perhaps that's okay? That could be a caveat we explain in the docs as a limitation of importing CommonJS into ESM; hopefully that edge case is rare enough that it shouldn't come up much.

@ljharb
Copy link
Member

ljharb commented May 4, 2020

Is there a way to reuse the results of the parse to minimize the perf hit of that?

@giltayar
Copy link

giltayar commented May 4, 2020

I'm not a fan of parsing CJS modules, because it's heuristic, and there would be no rule we could write in the documentation that says when it will work and when not. Worse: due to a change in the parsing code (which will happen), a module that worked in one version of Node, will stop working in another.

Too non-deterministic for my taste.

@giltayar
Copy link

giltayar commented May 4, 2020

but this approach was shot down at TC39 after multiple attempts at consensus

True, and from TC39's perspective, they may be right. But if you're anyway deeming CJS to be "out of TC39 scope" (which I'm OK with!) then you can definitely go the "dynamic modules" route, which has the added benefit of making the execution order intuitive to the developer: I believe execution order should be "serial" and not an interleaving (which it will be if we execute CJS in the binding phase), whereby first the CJS in the tree will be executed and then the ESM.

@jkrems
Copy link
Contributor

jkrems commented May 4, 2020

then you can definitely go the "dynamic modules" route

The problem is that dynamic module is the route that does need TC39 support because it requires actual support in the VM (correct me if I'm wrong, Guy). Early execution or something like a top-level-parse would both be possible if we say "CJS exists outside of the standardized module system and its execution isn't considered module execution".

Afaik the current state is:

  • Early Execution: Technical side works fine, allows full named exports as long as they exist after initial execution. Downside is that CJS in the tree always runs before any ESM executes. ESM can never do bootstrap logic as long as there's any CJS in the tree. Some race conditions (potentially, likely rare) if we don't snapshot the exports of every CJS after first execution. Collision between module.exports and module.exports.default because named exports doesn't map 1:1 to how CJS works.
  • (Partial) Parse: Technical side works fine, allows named exports but only heuristically if they can be determined from a parse (top level parse?). Every imported CJS file has to be parsed twice (once by module loader, once by V8). Collision between module.exports and module.exports.default because named exports doesn't map 1:1 to how CJS works.
  • Dynamic Module: Technical side requires buy-in from V8 which requires buy-in from TC39. The weirdness around * exports and namespace objects which may prevent it from ever finding acceptance (see Guy's comment above). Outside of *, "perfect" bindings. CJS executes when the respective ESM node should execute. Collision between module.exports and module.exports.default because named exports doesn't map 1:1 to how CJS works.
  • Default Only: Technical side works fine but only allows module.exports as default. No collision between module.exports.default and module.exports. CJS executes when the respective ESM node should execute. Relatively easy to simulate/recreate/modify in a userland loader.

My personal take: I think partial parse would be a reasonable compromise and JDD's esm seemed to show that the performance impact was okay, especially since it would only affect the boundaries between ESM and CJS so the majority of modules wouldn't be double-parsed.

The bad news is that interop-require conflated module.exports.default and module.exports in some places. We would have to pick module.exports for backwards compat afaict and that would create a potentially even uncannier valley than what we have today. If we're okay with a breaking change there, the parser could look out for that magic es module property and switch to module.exports.default based on it.

@jkrems
Copy link
Contributor

jkrems commented May 4, 2020

Relatively easy to simulate/recreate/modify in a userland loader.

To elaborate on this point: All the more advanced solutions have issues once somebody wants to write a userland loader. If I want to serve CJS from HTTP/archive/compiled-on-the-fly, I may have to reproduce the core behavior. Assuming the core behavior is "parse the source and then apply a specific heuristic to determine likely named exports", that quickly becomes a challenge. The same is true for any kind of static analysis tool (compiler, linter, bundler, ...). It's not a blocker but I wanted to call it out.

@bmeck
Copy link
Member

bmeck commented May 4, 2020

We have discussed this many times, at length. I do not see a way forward and do not want to take this trade off given previous discussions.

Downside is that CJS in the tree always runs before any ESM executes. ESM can never do bootstrap logic as long as there's any CJS in the tree.

I think this might be a bit downplayed. You import not just the CJS shallowly but all of its dependencies as well. This means that:

  • transitive dependencies of CJS also get hoisted. it is hard to understand how things would be re-arranged by both humans and tools. I do not see this as making the user experience better.
  • workflows like console based debugging will start to get very confusing, I do not see this as making the user experience better.
  • it doesn't match how faux modules (transpiled CJS trying to emulate parts of ESM, not real ESM) work. If the goal is to achieve the same thing as faux modules, this does route not seem to be well understood or having history being tested in the wild. If tooling could provide such a historical benchmark to see breakage and/or UX it would be helpful.
  • it doesn't match what is now shipping in Node unflagged. Re-ordering things that people are depending in a stable version on is not something I feel comfortable with.
  • it cements that APMs/Polyfills/Security hardening/global error handlers/etc. need to stay CJS forever (and want import condition to load CJS in particular). Side effects in modules are a natural part of the runtime having so many features being 1st class dynamic JS APIs. If the goal is to make CJS a legacy feature, this instead would permanently force those workflows to be CJS in order to safeguard the timing of their evaluation. This would be against one of the goals of ESM allowing a path towards codebases that are purely ESM and/or work on environments like the web. Breaking such a goal seems to need to be weighed heavily.

I agree that named import being able to pull from CJS is valuable, but I do not see the research into the affects that this causes to quickly jump on things. Additionally, this trade off doesn't seem to focus on anything except the ability to have named imports from faux modules as CJS. I would like to understand why none of the above are important vs that fact.

It would also be helpful to understand why the consumer needs this and the author is in a situation that updates would be too painful to make. Currently transpiled CJS can be loaded into our ESM implementation, and you can use the full interface it provides, albeit with manually reproducing what a transpiler would do to interface with it. Things like https://github.com/addaleax/gen-esm-wrapper are providing facades for build time. Understanding how the transpiled workflows are unable to maintain build time tooling for purposes like this would also be helpful.

Right now I do see that people using their transpilers are having problems when they try to stop using them; but, I don't think they need to stop using them. I do see that other problems with dropping transpilers exist (async loading, mocking, magic variables, etc.) and think the best approach at least in the short term is to continue transpiling, and think that trade offs should match transpiler behavior more than this would. It would be a breaking change from transpiler behavior as well.

This trade off does not necessarily have a way out in the future since it would be another breaking change to undo this oddity if we introduce it. If we land such a change I would also never want to revert it. We would forever not match historical transpiler semantics nor match ESM semantics of order of evaluation. Currently, the semantics are somewhat easy to understand, even if they do not provide all of the features of faux modules. I value the ability to understand/teach how my program works, and introducing implicit and transitive preemption is far from desirable to me. Additional mitigations for performance like late binding also make this even worse for console based debugging and match transpilers potentially even less.

Providing named imports from CJS does not solve a variety of other issues with ESM adoption, and I am unclear on how a variety of other upcoming and existing problems wouldn't be made even worse by taking this trade off. Things like standardized JSON modules are in talks in both TC39 and WHATWG and are being looked at as only providing a default export. Providing named imports from CJS and not from JSON would just be more confusion rather than having at least a semi-uniform approach for integrating values that are not designed for ESM.

I do not want to re-discuss all of these issues unless we can discuss the trade offs on both directions.

@bmeck
Copy link
Member

bmeck commented May 4, 2020

I forgot that this also may violate things like tick ordering of queued tasks, but I assume that is understood by the nature of this hoisting.

@GeoffreyBooth
Copy link
Member

GeoffreyBooth commented May 4, 2020

@bmeck Does your take above apply to any of the options from @jkremscomment, or the overall idea of executing CommonJS out of order? I'm not clear on which proposal you're pointing out problems with.

Looking at Jan's options, it sounds like the last one (Default Only) is what we have now; Early Execution is probably disqualified for all the reasons you listed in your comment about it cementing CommonJS primacy; Dynamic Modules is a no-go without TC39 buy-in; and that leaves (Partial) Parse. I don't think the parse option has all the timing and other issues you're discussing, does it? Aside from the parse possibly missing some names, and the issue with default, are there any other problems with that option?

@bmeck
Copy link
Member

bmeck commented May 4, 2020

@GeoffreyBooth parsing does not have the issues of timing, but does have heuristic concerns. See the output of transpilers like: this with export * which do require runtime info. I also had a stripped down pragma PR "use exports ...list..."; a long while back but people didn't like the perf problems it had. The complexity of a full JS parse is non-trivial and we would need to document our exact heuristic such that transpilers could target it and it could continued to be supported.

@GeoffreyBooth
Copy link
Member

So others who have thought about this more can propose better solutions, but what I had in mind for a parsing algorithm would be to try to keep it simple enough that it's straightforward to understand. Like:

  1. Find all references to a non-declared exports object (i.e. an exports variable that has no var/const/let/function parameter declaration) or a likewise non-declared module object with an exports property.
  2. For each such reference, parse the references of statically-defined properties of that object (such as exports.foo or exports['foo']) or module.exports.foo and module.exports['foo'] etc.).
  3. Also parse assignment of an object to that reference (exports = { foo: 1 }, etc.).

And . . . that's it? If there are any other really common ways of declaring exports, we could extend our logic to include them, but the idea would be that the docs describe this algorithm in plain English and say that CommonJS exports defined such that they're discoverable via this algorithm will be importable into ESM; and others will not be.

So yes, the transpiler output of export * from would fall into the unsupported category; but the transpiler output of export const foo = would be parseable. Ditto export function foo() and export class foo() and export { foo }. The main exceptions that I can find in some cursory tests are export * from and export default would be unsupported, and we would call those out in the docs (as well as a traditional CommonJS export named default).

@targos
Copy link
Member

targos commented May 4, 2020

Prior art: https://github.com/rollup/plugins/blob/master/packages/commonjs/src/transform.js
That module also has to handle imports, but I think it's an interesting case because users of rollup already rely on it to support named imports from CommonJS dependencies.

@MylesBorins
Copy link
Contributor

Regarding perf hit of a double parse.

It is my understanding that the fetch / parse / link phase can be done asynchronously. As CJS modules will all be terminal nodes in the graph. While I don't think we are doing this yet we could move the parsing / validation off the main thread. I also think, but could be wrong, they there might be wiggle room in the spec to begin the execution phase before the validation is finished here... This might be a bit more controversial though. I'm also curious if we could potentially code cache the result of the parsing to reuse later... if V8 does not yet have an API for something like that it might be something we could ask about.

So, with that said, I think there is likely room to explore the double parse option and ways to architect it so that the performance hit is not significant. Is there a reason it is a non-starter from a spec perspective?

@bmeck
Copy link
Member

bmeck commented May 4, 2020

I do not have any significant objections to parsing but would like to see more numbers around it. To my knowledge parsing to form the export list does not violate any spec expectations (indeed this is what ESM does and across modules no less!), but doing evaluation would be a change. If parsing is done up front, I am unsure if we would need the execution to actually be hoisted. For a variety of runtime dependent stuff like the babel output I linked, if it is setup to be cross module you might be able to fully form the heuristic graph even for export * from

@guybedford
Copy link
Contributor Author

I have previously always been against parsing for the extra overhead but can possibly get behind it if we do it lazily only on the ESM/CJS boundary.

I think it would be important that the parsing is done lazily because if we apply it to all CJS modules loaded that means even Node.js users who don't use ES modules will pay this cost - that is we should remove the injection into the ESM registry to do lazy snapshotting.

@bmeck I know you have had a problem with the idea of lazy snapshotting before, would you be able to compromise on that?

@bmeck
Copy link
Member

bmeck commented May 4, 2020 via email

@guybedford
Copy link
Contributor Author

Great, that sounds like it could well be a possibility then to me, with analysis as @GeoffreyBooth describes in #509 (comment).

@GeoffreyBooth
Copy link
Member

I know we use Acorn for parsing here and there in our internals; is it possible to use V8 itself to parse JavaScript source code and return an abstract syntax tree back to Node?

On top of that, could we then submit that AST back into V8 when it’s time to evaluate that code? Rather than asking V8 to parse the source again.

If both of these are possible, we could avoid a double parse. Even if only the first is possible, it would be preferable to using Acorn both for speed and to ensure that the parsed AST we work with matches what V8 evaluates.

@bmeck
Copy link
Member

bmeck commented May 4, 2020

V8 does not expose any AST like structure after it parses. The AST is internal to V8 and not a public API.

@MylesBorins
Copy link
Contributor

/cc @nodejs/v8 who might have opinions

@devsnek
Copy link
Member

devsnek commented May 4, 2020

I'm against the cjs parsing idea. I know of a lot of cjs code that dynamically builds exports which this would fail on. Even if we do go forward with it, we will need to make sure we parse using the exact behaviour of the parser in the engine we use, which acorn doesn't fulfill. Since people also embed node within other engines (for example graaljs), I don't think we can ever reach that.

@MylesBorins
Copy link
Contributor

MylesBorins commented May 4, 2020 via email

@devsnek
Copy link
Member

devsnek commented May 4, 2020

@MylesBorins since we know the pattern exists, we are creating a breaking change in cjs where there previously was none (dynamic assignment vs static assignment), so I would not go so far as to say this is "good"

@jkrems
Copy link
Contributor

jkrems commented May 5, 2020

I wouldn’t call it “breaking” when those values are still reachable from the default export. And I would assume that the entire module.exports object would be on the default export in those cases.

@weswigham
Copy link
Contributor

weswigham commented May 5, 2020

since we know the pattern exists, we are creating a breaking change in cjs where there previously was none (dynamic assignment vs static assignment), so I would not go so far as to say this is "good"

Is it worth mentioning that JS autocomplete/autoimport in vscode, VS, webstorm, and a bunch of other editors, is essentially predicated on the idea that "most JS cjs module exports are statically analyzable"? And that all of those go out the window when you use more dynamic export patterns, too? Don't get me wrong, I'd still rather see the dynamic modules solution happen via some means, but I wouldn't say cjs static analysis is a wholly untested concept.

@devsnek
Copy link
Member

devsnek commented May 5, 2020

@jkrems if you restructure your cjs module to use dynamically created properties and someone using esm depends on named exports it is by definition a breaking change. the graph will not validate. the application will exit with an error.

@bmeck
Copy link
Member

bmeck commented May 5, 2020

As a slight aside, I think we should avoid calling things/people the "enemy" or implying that something is inferior by nature (good vs perfect). I'm not clear on those statements on how they are actually making any claims about the arguments at hand and it is a bit confusing to read. If such a catch phrase is used can we explain the positions of why something is a better trade off more clearly and without implying that other trade offs are in some way evil?

@devsnek
Copy link
Member

devsnek commented May 5, 2020

@bmeck it actually matches the definition of guess:

estimate or suppose (something) without sufficient information to be sure of being correct.

The suggestion is that we guess the properties of module.exports, using some sort of syntactic analysis.

Can you explain what the necessary qualification of things to be considered not to be guesswork?

For any CJS module, getExportsStatically(module) is equal to Object.keys(require(module))

@bmeck
Copy link
Member

bmeck commented May 5, 2020

@devsnek

The suggestion is that we guess the properties of module.exports, using some sort of syntactic analysis.

It does not claim to do so nor does it claim to do so completely, it only claims to create bindings if assignments in the special forms listed are found. In particular it was stated:

parse the references of statically-defined properties of that object

I assume you are claiming this to be guesswork based upon "without sufficient information to be sure of being correct", however if we are only concerned with static analysis it seems we do not fulfill your definition of guesswork. Can you clarify why only obtaining the static values might not be considered correct/still be considered guesswork if our stated goal is to only add support for those special forms?


For any CJS module, getExportsStatically(module) is equal to Object.keys(require(module))

So to be absolutely clear, since require(module) generates some properties at runtime that cannot be statically analyzed; therefore it is my assumption that you are against any static analysis based solution. Can you clarify in which cases generating from static information would be a hinderance rather than an additive feature? If the claim is that it is better to only have purely guaranteed behavior to match runtime because it has some ill effect to add these bindings that might be more easy to understand your concerns.

@devsnek
Copy link
Member

devsnek commented May 5, 2020

@bmeck

Can you clarify why only obtaining the static values might not be considered correct/still be considered guesswork if our stated goal is to only add support for those special forms?

Because I can still import the other forms, even if the modules group tries to pretend they don't exist.

Can you clarify in which cases generating from static information would be a hinderance rather than an additive feature?

  • As I've said a few times now, it creates breaking changes to esm consumers from otherwise invisible changes to cjs module source.
  • People consuming modules have to either read the source or try to use a named import and see if it works to use this feature.

@bmeck
Copy link
Member

bmeck commented May 5, 2020

Because I can still import the other forms, even if the modules group tries to pretend they don't exist.

Can you clarify this? I'm unclear on how hoisting the special forms having 2 ways to access them makes the non-special form still being available a problem. Are you stating if we did not allow access to non-static analysis members of CJS modules it would not be a problem since we would be limited to only having one form of access and not a super/subset problem for a special form vs normal form? We could take an alternate stance that since we cannot identify runtime based values we could ban anything non-static. This is actually an interesting potential alternative trade-off we could pursue since export * and dynamic assignments to exports are semi-rare. Would you have the same objections if we limited CJS exports to purely static forms so that we do not have a mismatch?

As I've said a few times now, it creates breaking changes to ESM consumers from otherwise invisible changes to CJS module source.

Yes, but I am unclear on how it create creates the breaking change, can you elaborate? Right now it seems that this would not be a breaking change any any way except adding bindings that did not previously exist.

People consuming modules have to either read the source or try to use a named import and see if it works to use this feature.

I do not believe so, they could use the REPL to see by importing a namespace and reflection, they could always use the normal form, they could import a namespace, or use docs as well. More pointedly, the same problem is true for all of ESM bindings that are exported and CJS itself to some extent; coders need to know how they can use dependencies by some means of understanding the module being loaded in both CJS and ESM and this is not changed by any algorithm. Do you have an example of where you need to know if this is available as an export in a way that is true for this algorithm but are unable to reproduce a similar issue if you never knew the format of the dependency?

@ljharb
Copy link
Member

ljharb commented May 5, 2020

It's worth noting that since module.exports can have properties that are not IdentifierNames, as well as that are Symbols, import * as ns from 'cjs'; Reflect.ownKeys(ns) can never match Reflect.ownKeys(module.exports), so I'm not sure that's actually a meaningful bar to set.

If you narrow it down to only worrying about own string data properties of module.exports that are IdentifierNames, then the only possible discrepancy left would be dynamically added IdentifierName properties, something which absolutely could happen, but in practice is likely to be somewhere between "exceedingly rare" and "never".

@devsnek
Copy link
Member

devsnek commented May 5, 2020

@bmeck

Can you clarify this? I'm unclear on how hoisting the special forms having 2 ways to access them makes the non-special form still being available a problem.

Because when someone requires/imports a module they aren't thinking "ok now i'm going to import a module that is written in commonjs where the commonjs source uses static assignments to module.exports". They're thinking "ok now i'm going to import express".

I am unclear on how it create creates the breaking change, can you elaborate?

// module 1.0.0
module.exports = {
  a: require('./x/a'),
  b: require('./x/b'),
  etc
};
// module 1.0.1
for (const name of readdirSync('./x')) {
  module.exports[name] = require(`./x/${name}`);
}

Do you have an example of where you need to know if this is available as an export in a way that is true for this algorithm but are unable to reproduce a similar issue if you never knew the format of the dependency?

If we assume there is documentation, because without documentation you have to guess no matter what you are importing: The documentation for the above module says module.a and module.b and etc are available for consumption. The documentation for available apis doesn't change during the version bump, because the available apis don't change. Yet in version 1.0.0, they are named exports and in 1.0.1 they are not. If that contradiction is not enough, you can compare it to an ES module's documentation (regardless of if the module is wasm or source text or whatever), which would obviously say what the exports are.

@devsnek
Copy link
Member

devsnek commented May 5, 2020

@ljharb that problem is bigger than cjs (wasm exports, for example) so i'm not sure its worth worrying about that much in this context.

@bmeck
Copy link
Member

bmeck commented May 5, 2020

@devsnek

Because when someone requires/imports a module they aren't thinking "ok now i'm going to import a module that is written in commonjs where the commonjs source uses static assignments to module.exports". They're thinking "ok now i'm going to import express".

If they are importing an ESM module don't they have the same issue of needing to know what bindings are exported? I'm confused on why needing to know the exported bindings of a CJS module is problematic but ESM requires it.

for (const name of readdirSync('./x')) {
 module.exports[name] = require(`./x/${name}`);
}

Thanks for the example. So your concern is that people may accidentally create breakage by moving to more dynamic forms of code in some non-major version. We have seen some issues of this with how package.json#exports can often create breaking changes if the understanding of how the feature works is poor. Perhaps since this was acceptable for package.json#exports we could try and better understand what the difference there was an if we can also make a similar trade-off that we did if it is ameniable.

If we assume there is documentation, because without documentation you have to guess no matter what you are importing: The documentation for the above module says module.a and module.b and etc are available for consumption. The documentation for available apis doesn't change during the version bump, because the available apis don't change. Yet in version 1.0.0, they are named exports and in 1.0.1 they are not.

I don't understand the point about it being contradiction, if Node asserts a clear definition about when something will be flagged as an exported binding in CJS and code goes against it there is nothing we can do. Stating that creating exports using a static form flagging binding names is quite similar to how the export statement also flags binding names. It might be good to have a lint rule for this, and for module authors to test their changes, but it doesn't seem like something that is unable to be understood, tested, or even done through feature detection.

If that contradiction is not enough, you can compare it to an ES module's documentation (regardless of if the module is wasm or source text or whatever), which would obviously say what the exports are.

I don't understand this comparison, this algorithm is defining a special form of static syntax. It defines what the named exports for ESM are by using that form. As @weswigham points out there is precedent and tooling relying on similar behavior. Comparing WASM and Source Text Module Records also define their exports for ESM using static syntax. How is using a static syntax in CJS different from the fact that ESM also uses static syntax?

@devsnek
Copy link
Member

devsnek commented May 5, 2020

How is using a static syntax in CJS different from the fact that ESM also uses static syntax?

if Node asserts a clear definition about when something will be flagged as an exported binding in CJS and code goes against it there is nothing we can do.

Isn't the entire point of this issue to figure out what to do about all the cjs that won't be updated, or will be updated but won't be transitioned to support esm? If someone is updating their package and wants to support esm (reads our documentation, for example) we should tell them to use an esm wrapper file. @addaleax even created a tool to do it for you: https://github.com/addaleax/gen-esm-wrapper. We shouldn't tell them to write their cjs code in a way such that node can tease details from the ast.

@ljharb
Copy link
Member

ljharb commented May 5, 2020

Given that, it seems like the real concern is existing packages, not new ones, since the new ones should be using an esm wrapper.

Would it be worth partially surveying the npm registry itself, to do a comparison between the runtime "named exports" and those static analysis derived?

@MylesBorins
Copy link
Contributor

FWIW I've opened a PR to improve error messages for the lack of named exports from CJS modules as a way to improve experience in the mean time.

nodejs/node#33256

@bmeck
Copy link
Member

bmeck commented May 5, 2020

Isn't the entire point of this issue to figure out what to do about all the cjs that won't be updated, or will be updated but won't be transitioned to support esm?

Yes, and no. From my point of you it is more importantly about the feasibility of being able to port to ESM from faux modules. If you must port an entire dependency graph, that is much more painful that porting parts of your subgraph.

For example, if there is a module graph as follows:

Name Author Notes
A Avery Top level app. CJS: Faux Modules, named imports from B and C
B Dakota Dependency with a dependency. CJS: Faux Modules, named imports from C
C Pat Leaf Dependency. CJS

The path to removing transpilation is to get A and B off of a transpiler. Since A and B and C are authored by different people this introduces some ordering concerns.

C porting to ESM

  1. If C updates to real ESM the faux modules are unable to obtained named imports any more. Therefore an implementation in CJS must remain available (perhaps even routed by package.json#exports on the same deep specifier)
  2. If C duplicates their implementation in both ESM and CJS that means that various things like instanceof fail to function as seen in the React/GraphQL issues a while back.
  3. If C is effectively bound by consumers to keep a CJS implementation around then their ESM implementation must remain a wrapper around it in some form if they support loading as CJS.

All of this summarizes as updating C to be real ESM is quite breaking for consumers of C. Therefore the only safe path for C is to keep implementation in CJS and wrap CJS in ESM.

B porting to ESM

  1. If B is unable to use named import form to pull bindings from C it must update all import sites and this can affect existing tooling and lead to a loss of static analysis and degradation of features of tools like tree-shaking in bundlers. Since tree-shaking is very important to a variety of libraries, it might be taken as a sign to not move to ESM.
  2. B suffers the same problems as C where consumer breakage occurs.
  3. If C updates to ESM after B you get to revert all the non-named import forms; so, it is best to wait for C to update to using ESM. This encourages migration to be leaf first deep dependency migration which is a much larger timeline that changing B (which must have a CJS backed implementation anyway).

A porting to ESM

  1. Similar issues to B
  2. This at least can port all the named imports to be default imports for shallow dependencies. It gets to revert them though as they update.

By alleviating some of the back and forth of porting named imports we can at the least allow for fewer steps in porting implementation code over to ESM for top level apps. Additionally, we can avoid dependencies with their own dependencies from needing to wait on leaf dependencies to upgrade to keep tree shaking as well as reducing code churn.

@ljharb
Copy link
Member

ljharb commented May 5, 2020

@bmeck The consumer breakage concern means that named imports don't actually matter; anyone who wants to avoid breaking people will keep their implementation in CJS, with an ESM wrapper (or, if it's stateless, duplicated, with or without a build tool) for the foreseeable future. The thing that would impact what you're talking about is require('esm'). Additionally, tree-shaking either can't work on CJS regardless, or, is advanced enough that it would work the same on a destructured default import, so I don't think that's a particularly compelling argument.

@GeoffreyBooth
Copy link
Member

So to summarize, the ultimate ask here is to enable named exports from CommonJS into ESM, e.g. import { someName } from 'commonjs-package'. The original post discussed achieving this via out-of-order CommonJS execution, but it seems like there are several folks with strong objections to that because of spec concerns and CommonJS primacy concerns (especially as related to instrumentation).

One option that seems to still potentially be viable is to attempt a static analysis of the CommonJS package to determine its exports that way, with the understanding that dynamically-named exports and unconventionally defined exports will be missed. There are concerns to this approach around that it doesn't necessarily catch all CommonJS exports, even if we document that it isn't trying to support all possible CommonJS exports; and that what exports are discovered might change over time if we change the static analysis algorithm.

The one other option I remember ever being discussed was double execution of CommonJS modules, with the initial execution during the linking (?) phase being only for the purpose of determining the exports. This was dismissed because of the possibility of side effects, e.g. a file being created or deleted twice or text being printed to the console twice, etc. However it occurs to me that there might be a way around that problem: what if there was a way to evaluate CommonJS code without causing side effects? Like if you could evaluate a CommonJS package but sandbox its access to the file system or network, and throw away its STDOUT/STDERR, that would be a way to evaluate it without causing side effects. I don't know how or if this could be possible within Node, but I remember discussing the possibility of package managers adding an ESM wrapper to CommonJS packages via something like evaluating the package within a Docker container. Is there a way to do something similar in Node?

Even that has a caveat in that some packages might behave differently in such an environment versus the unrestricted access-to-everything environment that Node normally provides, so there would still be the possibility of inconsistency between the exports available in an all-CommonJS environment and the ones discovered for use in ESM. Ultimately I think the way to settle this is for someone to write the code for either of these approaches, and try it out on the top 1000 or so npm packages, comparing the detected named exports via this new method against what's found via Reflect.ownKeys(require('commonjs-package')). If there are lots of packages where the two approaches differ, then the new method is too unreliable to use (or more precisely, nonstandard ways of defining CommonJS exports are too widely used for that method to be considered reliable enough for common use). But hopefully that won't be the case.

Ultimately the question isn't whether a detection method is good enough, or if we're willing to accept less than 100% accuracy; it's a question of which user experience is the least bad of our options. The current user experience, where only the default export is supported from CommonJS packages, is pretty bad and people are complaining about it. Would a different user experience that supplied named exports but with exceptions be worse? (Obviously, it depends on the exceptions and how many there are and how common each exception is.) I think the only way to reasonably answer this question is to write the code and run it on as wide a dataset as possible, so that we can make an informed decision. Having the data would also provide us with a good answer to users when they come asking why they can only get a default export, or why dynamic exports aren't supported, or why ESM has whatever incompatibility they find as compared with pure CommonJS or a transpiler environment.

@LeszekSwirski
Copy link

If ESM modules were to also be allowed to execute before full module graph linking, would that make on-import CJS execution more palatable?

@weswigham
Copy link
Contributor

If ESM modules were to also be allowed to execute before full module graph linking, would that make on-import CJS execution more palatable?

I mean, arguably they already can, as they may have been cached by another module graph which executed prior to the currently resolving one.

@LeszekSwirski
Copy link

Right, that's (one of the reasons) why I ask.

@bmeck
Copy link
Member

bmeck commented May 8, 2020

It should be noted that a few years ago while google was trying to ship ESM and testing the viability of it in real world productions workflows in the browser, they wanted to do out of order evaluation of purely ESM. They have a document at https://docs.google.com/document/d/1MJK0zigKbH4WFCKcHsWwFAzpU_DZppEAOpYJlIW7M7E/view ; however, as the document explains they chose not to do so for a variety of reasons. This also came up again this year with a similar result.

@jkrems
Copy link
Contributor

jkrems commented May 8, 2020

If ESM modules were to also be allowed to execute before full module graph linking, would that make on-import CJS execution more palatable?

I think the problem is less that CJS may execute before ESM but that it always executes before any ESM file may execute in the graph. That means that certain kinds of code (bootstrapping, polyfills) can only ever be written as ESM if there's not a single CJS file involved in the entire program. Which is... a tall order.

@MylesBorins
Copy link
Contributor

Removing from module agenda for right now, feel free to re add

@bmeck
Copy link
Member

bmeck commented Aug 4, 2020

@guybedford are we ok closing this?

@bmeck bmeck closed this as completed Aug 5, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests