Skip to content
This repository has been archived by the owner on Sep 2, 2023. It is now read-only.

CommonJS import interoperability decisions #264

Closed
GeoffreyBooth opened this issue Feb 12, 2019 · 31 comments
Closed

CommonJS import interoperability decisions #264

GeoffreyBooth opened this issue Feb 12, 2019 · 31 comments
Labels
discussion interoperability no/go? one-off "not pro forma" actions

Comments

@GeoffreyBooth
Copy link
Member

As @MylesBorins mentioned in #261, there are some things we need to vote on. I was thinking that we should start with the biggest topic: support for import statements/expressions bringing CommonJS files and packages into ESM. That’s probably the biggest decision we have to make, in terms of the effect it has on the implementation we’re building and eventually shipping; and making this decision will probably take up most if not all of the time in the meeting, so there’s probably no point in preparing votes on anything else. Here are some questions that I propose we put to votes at a meeting:

  1. Do you support import interoperability of CommonJS in theory? As in, could there potentially be any implementation that satisfies you for allowing import statements and import() expressions of CommonJS files and packages? (If this question is voted “no,” we can skip the rest.)

  2. Do you support full CommonJS import interoperability via two-phase execution (CJS named exports via two-phase execution ecmascript-modules#31) if that turns out to be the only CommonJS import interoperability method that wins majority support?

  3. Do you support almost-full (all but export * from) CommonJS import interoperability via dynamic modules (WIP [Do not merge] - Irp type dynamic modules ecmascript-modules#29), if that turns out to be the only CommonJS import interoperability method that wins majority support?

  4. If both of the above PRs win majority support, which one do you prefer?

  5. If neither of the above PRs win majority support, do you prefer to keep the current --experimental-modules/ecmascript-modules “default export only” CommonJS import (so import _ from 'underscore' but not import { shuffle } from 'underscore') or would you prefer to remove CommonJS import interoperability altogether?

I tried to phrase these neutrally; if people have feedback I’ll edit this post. Please feel free to make your case for your preferences in this thread 😄

@GeoffreyBooth GeoffreyBooth added modules-agenda To be discussed in a meeting interoperability discussion no/go? one-off "not pro forma" actions labels Feb 12, 2019
@ljharb
Copy link
Member

ljharb commented Feb 12, 2019

  1. yes

  2. no, i do not support violating the standard (in a way that won’t be possible to change the standard to allow)

  3. yes

  4. 3 over 2, obvs

  5. default only > > > no interop by default; additionally, i don’t think ESM is worth shipping at all without some interop by default.

@guybedford
Copy link
Contributor

@ljharb it feels like you are answering the questions here individually instead of considering what outcome you are advocating. Your ideal outcome with those answers seems to be not to ship.

Perhaps we need a question for this as well? //cc @MylesBorins @GeoffreyBooth

@ljharb
Copy link
Member

ljharb commented Feb 12, 2019

@guybedford the OP asks 5 questions, answering them individually seems appropriate? My ideal outcome would be shipping dynamic modules.

@guybedford
Copy link
Contributor

@ljharb sure, if that is how you want to answer them, although deadlock is a very real risk too. Shipping dynamic modules is not possible within our timeframes anymore.

@GeoffreyBooth
Copy link
Member Author

My intent with staggering the questions was to determine the lowest common denominator that can achieve a majority. Like if we simply ask what everyone’s ideal outcome is, options 1 through 5, none of the five might achieve a majority; but if people know that options 1, 2, and 3 each individually lack a majority then number 4 might get a majority over number 5, etc. We could add a “what’s your ideal outcome” question just to gauge general opinion, and then go through them one by one; there are plenty of ways to do this.

To put this in practical terms, @ljharb what would be your ideal outcome if dynamic modules wasn’t an option (either for practical reasons or because it simply doesn’t get a majority willing to support it)?

Basically I’ve identified four outcomes:

  1. “Full” interop via dynamic modules
  2. “Full” interop via two-phase
  3. Default-only interop
  4. No interop

Does anyone know of any other options besides these?

This can get even more complicated if we include the possibility of releasing things behind a flag, or even some version of interop by default and another behind a flag (e.g. default-only enabled without a flag and two-phase as opt-in via flag) but I was thinking we can figure that out in discussion once we see how the initial votes go.

@ljharb
Copy link
Member

ljharb commented Feb 13, 2019

I have consistently and continually rejected the notion that "timeframes" matter; we should ship the right thing at whatever pace that takes.

I would be content with 3; 2 violates the spec; 4 isn't worth shipping ESM at all imo.

@evanplaice
Copy link

evanplaice commented Feb 13, 2019

👍 3

Give 'early adopters' the option to address the messiness of incomplete interop using existing tooling (ex Babel) and keep progressing toward full interop so more conservative migration projects will eventually have a stable migration path.

IMO, the important question here, which was one of the blocking concerns a year ago was that rushing ESM could potentially require breaking changes later to implement interop. AFAICT (ie and I have read damn near every singe conversation since), nothing in the current CJS (ie default imports only) interop nor in the current ESM imports implementation violates that constraint.

Update: 3 refers to 'default only interop' on the second list. Sorry for the confusion.

@evanplaice
Copy link

Well, this just happened...

blink-dev -> Intent to Implement: Import maps, basic support

@guybedford
Copy link
Contributor

guybedford commented Feb 13, 2019

@ljharb there is no spec violation here in the two-phase execution approach. Please tell me where the spec violation you see comes from? (3) is no longer possible without (a) getting consensus on the looser namespace invariant (of which three people were against) or changing the order of execution of circular references in ECMA262 modules, as well as (b) significantly refactoring the entire ECMA262 modules specification to support export * from dynamic modules while getting consensus that the virality of indeterminate names is not a problem and (c) convincing two members that source text module records that know about the existence of dynamic module behaviours is ok.

@GeoffreyBooth
Copy link
Member Author

The options have different numbers in the top comment and in my later comment. I think @ljharb and @evanplaice are referring to my later comment (3 = default exports only) while @guybedford is referring to the original (3 = dynamic modules). If you folks could just use names rather than numbers that would help, as the numbers in the top comment may change if we edit the questions.

@ljharb
Copy link
Member

ljharb commented Feb 13, 2019

@guybedford we don’t need consensus necessarily; I’d prefer dynamic modules with the looser invariant come to a vote before rejecting it.

@ljharb
Copy link
Member

ljharb commented Feb 13, 2019

@guybedford

potential example of problems with two-phase evaluation

(a.mjs is the entry point)

// a.mjs
console.log('a before b:', global.foo);
import './b.js';
console.log('a after b:', global.foo); // 42
import './c.mjs';
console.log('a after c:', global.foo); // Infinity
// b.js
console.log('b start:', global.foo); // undefined
global.foo = 42;
console.log('b end:', global.foo); // 42
// c.mjs
console.log('c start:', global.foo); // 42
import './d.js';
console.log('c after d:', global.foo); // NaN
global.foo = Infinity;
console.log('c end:', global.foo); // Infinity
// d.js
console.log('d start:', global.foo); // 42
global.foo = NaN;
console.log('d end:', global.foo); // NaN

If d.js and b.js are evaluated with the proper ordering, then I'd expect:

a before b: undefined
b start: undefined
b end: 42
a after b: 42
c start: 42
d start: 42
d end: NaN
c after d: NaN
c end: Infinity
a after c: Infinity

If, however, d.js and b.js are evaluated prior to a.mjs and c.mjs, I'd instead perhaps get:

b start: undefined
b end: 42
d start: 42
d end: NaN
a before b: NaN
a after b: NaN
c start: NaN
c after d: NaN
c end: Infinity
a after c: Infinity

(i say "perhaps" because I could envision both an evaluation order of b,d,a,c or b,a,d,c, and I'm not sure which the current proposal contains)

Replace "alter the global" with anything side-effecting - writing to the filesystem, making a network request, etc - and the consequence of the ordering changes is both much more dangerous, and, not in line with the ordering the spec demands, as far as I understand it.

If I'm missing something, please ping me on the appropriate issue (or reach out directly) and we can clear up any misunderstandings.

@GeoffreyBooth
Copy link
Member Author

I’d prefer dynamic modules with the looser invariant

@ljharb Is this different than the PR I linked to above? If so, how?

potential example of problems with two-phase evaluation

In your example, am I to assume that the .js files are supposed to be CommonJS? In our current implementation, all import statements of .js files in the same package scope as the importing ESM file would be treated as ESM. Perhaps you should rename them to .cjs to make your example clearer.

2 violates the spec

To be clear, in nodejs/ecmascript-modules#31 (comment) you agreed that the two-phase PR had no spec violations, so I think it’s disingenuous to claim otherwise. The order of execution concern you raise is a legitimate concern, but let’s call it what it is: a UX concern, where users might be confused by Node’s behavior. The decision to be made here is whether allowing CommonJS named exports is worth the potential confusion due to this issue.

We do have another compromise course of action: we could ship one of the “full interop” PRs as the new flagged version of --experimental-modules and see how it fares in real-world user testing. If we ship the two-phase version, for example, we could see if users are confused or have execution issues. We could always scale back to defaults-only or no interop before dropping the --experimental-modules flag.

@GeoffreyBooth
Copy link
Member Author

P.S. Not only is the order of imports not guaranteed per the spec, but the meeting notes that @ljharb linked to explain why:

The guarantee isn’t complete though; import a; import b; import c; looks like a, b then c, but if a imports c you actually get c a b.

In other words, the proposed guarantee of execution order has its own issues, which is why it wasn’t added to the spec.

@ljharb
Copy link
Member

ljharb commented Feb 13, 2019

It wasn't added because at the time they weren't sure how to word it - not because the guarantee isn't supposed to be there - all linking must be done before any evaluation is done. Whether the spec happens to forbid it explicitly isn't important, because it will be made to forbid it if that's the only way to stop an implementation from defying the spirit of it, and I believe the TSC agreed with that position when the feedback was brought to them (the modules group doesn't make decisions; we make recommendations to the TSC - they may decide differently this time, but this isn't a "loophole" to be exploited.)

@devsnek
Copy link
Member

devsnek commented Feb 13, 2019

@GeoffreyBooth by order of imports i believe jordan meant the order created by performing the dfs algorithm in the spec, which is predictable and definitely something people observe and depend on as they work with esm.

@ljharb
Copy link
Member

ljharb commented Feb 13, 2019

update: my hastily typed example turns out, due to import hoisting, not to illustrate any ordering change - I’ll write one up tomorrow and update the comment inline.

@bmeck
Copy link
Member

bmeck commented Feb 13, 2019

I don't think this issue is meant to discuss the details of all these potential proposed paths, maybe we can move discussions to those specific issues and/or related PRs?

@guybedford
Copy link
Contributor

All linking must be done before any evaluation is done.

All linking must be done before any evaluation of Module Records in the ECMAScript specification is done. Calling CommonJS require is not an ECMAScript module record.

Whether the spec happens to forbid it explicitly isn't important, because it will be made to forbid it if that's the only way to stop an implementation from defying the spirit of it

ECMAScript cannot define or block how or when non-ECMAScript host loaders execute their code.

@bmeck
Copy link
Member

bmeck commented Feb 13, 2019

@guybedford

Ordering can be enforced that only things already evaluated before notifying the host of starting the import are valid targets via ordering mandates like they do with shared memory. However, that is some seriously rough language to spec out for something that people are never really wanting to happen, as such an editorial note was talked about (but left out for now). Of course, preprocessing can get around this, so you also need to forbid some levels of that; the spec does include a few prohibitions already so one could be added. Still, even if it is allowed by spec, that is a bit separate from if the spec is expecting it to not occur. Even then, if we do various things against the nature of the spec, we are likely to fall into situations where our designs do not match the future designs/guarantees implied by the spec. This style of forbids clause was also brought up around Dynamic Modules and if the concerns people had about them would be alleviated by adding a forbids clause for them to be used for anything except interoperability with CJS, which also isn't something the spec can enforce via algorithms alone.

Even if we are fine with not matching the spec for intent, there are still usability problems I find often by diverging how things are expected to work such that users can be surprised. I consider surprise to be a leading reason that people call things "warts", not lack of some UX sugar. We should consider with care what we are gaining and what we are losing whenever we talk about the ideas in this issue.

The idea of ordering is particularly interesting due to it allowing people to manipulate timing, meaning if you need to run before others for some reason (security "hardening" / polyfills / instrumentation / etc.), you always want to continue writing whatever format evaluates first. The workaround for getting things to evaluate in order is to opt-out of the unconventional timing rather than opt-in. I don't think there are objections currently to allowing userland loaders doing an opt-out style behavior instead of opt-in, but it would still be against intent of the spec and the larger question is if our default behavior should go against the spec and force people to opt-in to matching spec intent.

@guybedford
Copy link
Contributor

Ordering can be enforced that only things already evaluated before notifying the host of starting the import are valid targets via ordering mandates like they do with shared memory.

So this would stop resolvers or resolver hooks from working as well then? As this is userland code that runs as part of the operation of resolving a module.

However, that is some seriously rough language to spec out for something that people are never really wanting to happen

Exactly.

Still, even if it is allowed by spec, that is a bit separate from if the spec is expecting it to not occur.

Shifting the discussions in these sorts of directions seems healthier than the black and white "is it valid by the spec" debate. The two-phase execution approach was exactly carefully designed to do the best around these expectations.

Even then, if we do various things against the nature of the spec, we are likely to fall into situations where our designs do not match the future designs/guarantees implied by the spec.

Nothing is being done against the spec that would lose guarantees in future. A concrete example would help to avoid this being just debating words at this point.

This style of forbids clause was also brought up around Dynamic Modules and if the concerns people had about them would be alleviated by adding a forbids clause for them to be used for anything except interoperability with CJS, which also isn't something the spec can enforce via algorithms alone.

The namespace invariant clause for CJS is a very different type of thing, as it is dealing with a specification concept (the module namespace, and its behaviours). CJS execution is very different as the ECMAScript specification has no concept of CJS execution, a CJS module record, or what it means to execute CJS. It's just "javascript" as far as the spec is concerned.

Even if we are fine with not matching the spec for intent, there are still usability problems I find often by diverging how things are expected to work such that users can be surprised. I consider surprise to be a leading reason that people call things "warts", not lack of some UX sugar. We should consider with care what we are gaining and what we are losing whenever we talk about the ideas in this issue.

Great, please let's move the conversation to these pragmatic directions! Let's discuss the cases, and consider them on their own merits.

The workaround for getting things to evaluate in order is to opt-out of the unconventional timing rather than opt-in. but it would still be against intent of the spec and the larger question is if our default behavior should go against the spec and force people to opt-in to matching spec intent.

If we decide to support two-phase execution, we get named exports. There is no other way to get named exports without two-phase execution at this point. Definitely let's consider whether we are happy with the properties of two-phase execution for Node.js to get named exports.

@bmeck
Copy link
Member

bmeck commented Feb 13, 2019

So this would stop resolvers or resolver hooks from working as well then? As this is userland code that runs as part of the operation of resolving a module.

No, it only would affect things running in the main Agent (thread, generally). Agents do have some kinds of ordering of operations, and the ability to execute JS code in a new position is what this is really about. ESM was not intending JS code to be interleaved here.

Our current loaders are actually executing code in an unexpected position here already, but they are intended to migrate to a different thread and this would be resolved.

Loaders such as WASM and JSON loaders are a bit of a conundrum here, as their evaluation should be side effect free and valid if you don't use JS to load them up, but our loader API as it exists in --experimental-modules does not allow that.

Since we are intending to move loaders to a different Agent, it seems like we would be going to be more compliant in this area.

Exactly.

Unsure what this means. That the reordering was never wanted to happen? Or, that the spec change was never something we wanted to invest in if no one was pursuing it?

Nothing is being done against the spec that would lose guarantees in future. A concrete example would help to avoid this being just debating words at this point.

In terms of what guarantees would be lost an example is fairly simple:

import 'a';
import 'b';

We would lose the guarantee that "if b is not already an evaluated module, no JS from b is evaluated prior to a being evaluated".

I find this similar in nature to if we had lazy futures of some kind. If we had a and b as futures, and I start a then I start b after a with both queueing JS code; I would not expect b to queue up JS code that evaluates prior to a's JS code.

The namespace invariant clause for CJS is a very different type of thing, as it is dealing with a specification concept (the module namespace, and its behaviours). CJS execution is very different as the ECMAScript specification has no concept of CJS execution, a CJS module record, or what it means to execute CJS. It's just "javascript" as far as the spec is concerned.

This violation of intent is not based upon forming of a module namespace, but on running of JS code in an unexpected position. This also goes against all tools I am familiar with and how they order evaluation even if we find the position to be satisfactory.

Definitely let's consider whether we are happy with the properties of two-phase execution for Node.js to get named exports.

Yes, both the pros and the cons. We are here to weigh them both. This is not a pure win, like many of the choices we are forced to make in this WG.

@guybedford
Copy link
Contributor

guybedford commented Feb 13, 2019

No, it only would affect things running in the main Agent (thread, generally).

But our resolver does run in the main thread Agent, and it runs lots of internal Node code.

Our current loaders are actually executing code in an unexpected position here already, but they are intended to migrate to a different thread and this would be resolved.

So you mean the violation here is simply the same violation is using --loader in the current experimental modules implementation? That we run code in the resovler? The first designs of loaders were always designed around same-thread resolver code, are you saying these were in violation as well? Where is it written that same-thread execution cannot happen unless mandated by the ECMAScript module system itself? I understand this is a nice abstract property from a security and language design perspective, but where is this so-called "invariant" coming from?

Unsure what this means. That the reordering was never wanted to happen? Or, that the spec change was never something we wanted to invest in if no one was pursuing it?

That a TC39 delegate would choose to invest time in stopping this in Node.js by creating a special note, when ECMA262 doesn't really even have the concepts to block when or when not require() can be called by userland code unless in some vague wording.

We would lose the guarantee that "if b is not already an evaluated module, no JS from b is evaluated prior to a being evaluated".

This was in answer to your comment about language guarantees that might change in future. How might the language change in future to make the above no longer possible / breaking?

This violation of intent is not based upon forming of a module namespace, but on running of JS code in an unexpected position.

I much prefer this term, "violation of intent", thank you. This was in answer to your comment that there was discussion of a note in the spec on the namespace invariant change for dynamic modules, and I was arguing that that was a very different type of spec guard to a CJS guard because the namespace is a well-defined ECMAScript concept to define while CJS is not.

Yes, both the pros and the cons. We are here to weigh them both. This is not a pure win, like many of the choices we are forced to make in this WG.

Thanks for this balanced summary - my primary complaint here is the black and white thinking of "this is not spec valid, so no one can vote to land the two-phase execution PR", which just isn't fair and doesn't cater to the nuance of the topic.

@devsnek
Copy link
Member

devsnek commented Feb 13, 2019

Where is it written that same-thread execution cannot happen unless mandated by the ECMAScript module system itself?

its assumed by the spec no code is running unless it explicitly runs code.

@guybedford
Copy link
Contributor

its assumed by the spec no code is running unless it explicitly runs code.

By "run" here we mean more "non-JS events that call functions". Lots of events trigger function calls, from system events to host calls. HOST_RESOLVE_IMPORTED_MODULE triggers a JS function to resolve the module. That JS function can do anything it wants. If it calls require, who is to say it can't?

@bmeck
Copy link
Member

bmeck commented Feb 13, 2019

But our resolver does run in the main thread Agent, and it runs lots of internal Node code.

Kind of? But it isn't reentrant into JS code except through Loaders.

So you mean the violation here is simply the same violation is using --loader in the current experimental modules implementation? That we run code in the resovler? The first designs of loaders were always designed around same-thread resolver code, are you saying these were in violation as well? Where is it written that same-thread execution cannot happen unless mandated by the ECMAScript module system itself? I understand this is a nice abstract property from a security and language design perspective, but where is this so-called "invariant" coming from?

If you are using reentrancy during these operations, then yes all those things would be unexpected. I understand this is a nice abstract property from a security and language design perspective, but where is this so-called "invariant" coming from? Mostly from historical discussions, the spec contains several locations where some form of operation is not intended but algorithms are not enforcing that restrictions. We actually went back to TC39 for clarification on this particular point. If you are talking about this being a guarantee from the spec through the algorithms a host has access to, there is not one (and particularly because writing the total order/reentrancy is prohibitively hard/unclear if if it worth the effort given no one was going in this direction). If you however talk about what was intended to be allowed, there are discussions on not wishing to have this happen.

That a TC39 delegate would choose to invest time in stopping this in Node.js by creating a special note, when ECMA262 doesn't really even have the concepts to block when or when not require() can be called by userland code unless in some vague wording.

This once again is not about calling some specific JS API or creating some specific structure, but rather about when JS code is expected to run. JS code expectations relate to if it is safe to assume operations will cause side effects. The spec algorithms have no real way of doing things like you mention; however, a complete ban on reentrancy and/or ordering of operation is possible and is similar to what the memory model does.

This was in answer to your comment about language guarantees that might change in future. How might the language change in future to make the above no longer possible / breaking?

I'm not understanding the question here. I don't have all possible futures planned out? I could see some operation needing to ensure some/all parts of the import mechanism doesn't cause side effect being possible. Perhaps if we have a resolve operation added due to things like asset references?

I much prefer this term, "violation of intent", thank you. This was in answer to your comment that there was discussion of a note in the spec on the namespace invariant change for dynamic modules, and I was arguing that that was a very different type of spec guard to a CJS guard because the namespace is a well-defined ECMAScript concept to define while CJS is not.

I still consider violating intent to be violating spec in this case, to be clear. We can use whatever wording we want in the discussion though.

@guybedford
Copy link
Contributor

Kind of? But it isn't reentrant into JS code except through Loaders.

Reentrance is almost impossible to block due to the lack of frozen intrinsics and various mutable core exposures in Node.js, although I don't need to tell you this :)

If you are talking about this being a guarantee from the spec through the algorithms a host has access to, there is not one (and particularly because writing the total order/reentrancy is prohibitively hard/unclear if if it worth the effort given no one was going in this direction). If you however talk about what was intended to be allowed, there are discussions on not wishing to have this happen.

This is what I mean by embracing the nuance of the discussion. It is a break from expectations certainly, but that needs to be weighed up against the pragmatism here.

JS code expectations relate to if it is safe to assume operations will cause side effects. The spec algorithms have no real way of doing things like you mention; however, a complete ban on reentrancy and/or ordering of operation is possible and is similar to what the memory model does.

There is currently nothing saying HOST_RESOLVE_IMPORTED_MODULE can't have side-effects on other JS objects.

I'm all for working towards these kinds of invariants in the language, in fact I'm actively working towards them creating PRs like the frozen intrinsics PR to core. But having this legacy past does not stop that future. A future version of Node could have a flag to disable those old "unsecure CJS modules" (which also do other awful things like have access to high-resolution timers, all process environment variables, full unrestricted fs and module access without any permissions model, as well as non-frozen intrinsics), and then we're there.

I still consider violating intent to be violating spec in this case, to be clear. We can use whatever wording we want in the discussion though.

If I knew nothing of this topic and was told that it "violates the spec", I simply would not vote to land it as I would be against violating the spec. That is highly misleading.

@ljharb
Copy link
Member

ljharb commented Feb 13, 2019

I agree that "violating the intent of the spec" is a more accurate phrase here; however, I don't understand why anyone would be more inclined to violate the intent of the spec than to violate the literal wording of the spec?

@Fishrock123
Copy link
Contributor

Fishrock123 commented Feb 13, 2019

  1. why is this still a question? yes. it is necessary.
  2. probably not
  3. sure
  4. (3.)
  5. must be kept.

@adrianhelvik
Copy link

adrianhelvik commented Mar 8, 2019

  1. Yes
  2. No. I prefer export default module.exports.
  3. No. Same as above.
  4. Neither really. Both have issues and I really like the current implementation.
  5. YES - keep it

I have created some pet projects using --experimental-modules, and I really like it! In my opinion, the goal should be complete legacy interop, but CommonJS should be regarded as a second class citizen.

I do miss __dirname every time I use path.resolve(new URL(import.meta.url).pathname, '..') though. So import.meta.dirname is definitely on my wish list. Great job on the interop however!

@MylesBorins
Copy link
Contributor

Based on current upstream I think we can close this. Please reopen if I am mistaken

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
discussion interoperability no/go? one-off "not pro forma" actions
Projects
None yet
Development

No branches or pull requests

9 participants