module: support require()ing synchronous ESM graphs #51977

Closed

joyeecheung
Member

@joyeecheung joyeecheung commented Mar 5, 2024

Summary

This patch adds require() support for synchronous ESM graphs under
the flag --experimental-require-module

This is based on the following design aspects of ESM:

  • The resolution can be synchronous (up to the host)
  • The evaluation of a synchronous graph (without top-level await) is
    also synchronous, and, by the time the module graph is instantiated
    (before evaluation starts), this is already known.

If --experimental-require-module is enabled, and the ECMAScript
module being loaded by require() meets the following requirements:

  • Explicitly marked as an ES module with a "type": "module" field in
    the closest package.json or a .mjs extension.
  • Fully synchronous (contains no top-level await).

require() will load the requested module as an ES Module, and return
the module name space object. In this case it is similar to dynamic
import() but is run synchronously and returns the name space object
directly.

// point.mjs
export function distance(a, b) { return (b.x - a.x) ** 2 + (b.y - a.y) ** 2; }
class Point {
  constructor(x, y) { this.x = x; this.y = y; }
}
export default Point;

// In a CommonJS module:
const required = require('./point.mjs');
// [Module: null prototype] {
//   default: [class Point],
//   distance: [Function: distance]
// }
console.log(required);

(async () => {
  const imported = await import('./point.mjs');
  console.log(imported === required);  // true
})();
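
The first requirement above (a "type": "module" field in the closest package.json, or a .mjs extension) can be pictured with a small check. This is only a rough sketch, not Node.js's actual lookup, which has additional rules; findClosestPackageJson and isExplicitlyMarkedESM are hypothetical names introduced for illustration.

```javascript
'use strict';
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: walk up from the file's directory until a
// package.json is found, or return null at the filesystem root.
function findClosestPackageJson(filename) {
  let dir = path.dirname(path.resolve(filename));
  while (true) {
    const candidate = path.join(dir, 'package.json');
    if (fs.existsSync(candidate)) return candidate;
    const parent = path.dirname(dir);
    if (parent === dir) return null;
    dir = parent;
  }
}

// Hypothetical helper: true when the file is explicitly marked as ESM,
// either by a .mjs extension or by "type": "module" in the closest
// package.json (only consulted for .js files in this sketch).
function isExplicitlyMarkedESM(filename) {
  const ext = path.extname(filename);
  if (ext === '.mjs') return true;
  if (ext !== '.js') return false;
  const pkgPath = findClosestPackageJson(filename);
  if (pkgPath === null) return false;
  try {
    return JSON.parse(fs.readFileSync(pkgPath, 'utf8')).type === 'module';
  } catch {
    return false; // unreadable or invalid package.json
  }
}
```

Note that the closest package.json wins: a nested package.json without a "type" field shadows a "type": "module" declared further up the tree.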

If the module being require()'d contains top-level await, or the module
graph it imports contains top-level await,
ERR_REQUIRE_ASYNC_MODULE will be thrown. In this case, users should
load the asynchronous module using import().
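
A caller that cannot know in advance whether a dependency uses top-level await might combine the two paths. The following is a minimal sketch under that assumption: loadNamespace is a hypothetical helper (not part of Node.js) that takes an absolute filename, and it also treats ERR_REQUIRE_ESM as a reason to fall back, which is what builds without require(esm) enabled throw.

```javascript
'use strict';
const { pathToFileURL } = require('node:url');

// Error codes that mean "this cannot be loaded with require()".
// ERR_REQUIRE_ASYNC_MODULE comes from this PR; ERR_REQUIRE_ESM is thrown
// when require(esm) is unavailable or not enabled.
const FALLBACK_CODES = new Set(['ERR_REQUIRE_ASYNC_MODULE', 'ERR_REQUIRE_ESM']);

// Hypothetical helper: try require() first and fall back to import()
// when the module graph turns out to be asynchronous.
// Always returns a promise, so callers have a single code path.
async function loadNamespace(filename) {
  try {
    return require(filename);
  } catch (err) {
    if (err && FALLBACK_CODES.has(err.code)) {
      return import(pathToFileURL(filename).href);
    }
    throw err; // unrelated failure, e.g. a syntax error in the module
  }
}
```

The cost of this design is that the caller becomes asynchronous even when the synchronous path succeeds; code that must stay synchronous can only catch the error and report it.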

If --experimental-print-required-tla is enabled, instead of throwing
ERR_REQUIRE_ASYNC_MODULE before evaluation, Node.js will evaluate the
module, try to locate the top-level awaits, and print their location to
help users find them.
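
To try the flag out, one could run a small reproduction in a child process. This is a sketch that assumes a Node.js build recognizing both flags from this PR; the file names tla.mjs and main.cjs are made up for illustration, and the exact diagnostic text is not shown here because it depends on the build.

```javascript
'use strict';
const { spawnSync } = require('node:child_process');
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Scratch directory with an ES module that uses top-level await
// and a CommonJS entry point that tries to require() it.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'print-tla-'));
fs.writeFileSync(path.join(dir, 'tla.mjs'),
  'await Promise.resolve();\nexport const ready = true;\n');
fs.writeFileSync(path.join(dir, 'main.cjs'), "require('./tla.mjs');\n");

// Run the entry point with both flags from this PR.
// A build that does not recognize the flags rejects them at startup.
const result = spawnSync(process.execPath, [
  '--experimental-require-module',
  '--experimental-print-required-tla',
  path.join(dir, 'main.cjs'),
], { encoding: 'utf8' });

console.log('exit status:', result.status);
console.log(result.stderr); // diagnostics, including any reported await locations
```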

Background

There were some previous discussions about this idea back in 2019 (e.g. #49450). I didn't go through all of them, but in 2024 I believe we can agree that not supporting require(esm) is creating enough pain for our users that we should really deprioritize the drawbacks of it. A non-perfect solution is still better than having nothing at all IMO.

There was a previous attempt in #30891 which tried to support TLA from the start and thus needed to run the event loop recursively, which would be unsafe, and therefore it was closed (synchronous-only require(esm) was brought up in #30891 (comment) but the PR didn't end up going that way). I have the impression that there were some other attempts before, but none active AFAIK.

This PR tries to keep it simple - only load ESM synchronously when we know it's synchronous (which is part of the design of ESM and is supported by the V8 API), and if it contains TLA, we throw. That should at least address the majority of use cases of ESM (TLA in a module that's supposed to be import'ed is already not a great idea, they are more meant for entry points. If they are really needed, users can use import() to make that asynchronicity explicit).

When I was refactoring the module loader implementation and touching the V8 Module API to fix other issues, this idea appeared natural to me (since ESM is really designed with this synchronicity in mind) and does not actually need that much work in 2024 (er, with some refactorings that I already did for other issues at least..), so here is another attempt at it.

Motivation

The motivation for this is probably obvious, but I'll give my take again in case there are unfamiliar readers: CJS/ESM interop would always be done on a best-effort basis and they should not be mixed if avoidable, but today the majority of the popular packages out there in the registry are still CJS. There needs to be an escape hatch for simple cases while the transition happens.

With require(esm), when a dependency goes ESM-only, it is less likely to be a breaking change for users as long as it's a synchronous ESM (with no top-level await), which should be the case most of the time. This helps package authors transition to ESM without worrying about user experience, or having to release it as a dual module, which bloats the node_modules size even further and leads to identity problems due to the duplication.

The design of ESM already ensures that synchronous evaluation, and therefore interop with CJS for a synchronous graph, is possible (e.g. see tc39/proposal-top-level-await#61), and we won't be alone in restricting TLA for certain features (e.g. w3c/ServiceWorker#1407: service workers on the web also disallow TLA). It would be a shame not to make use of that. Ongoing proposals like import defer could also help address the lazy-loading needs without breaking the synchronous aspect of ESM.

TODOs

There are still some feature interactions that this implementation doesn't handle (e.g. --experimental-detect-module or --experimental-loader or --experimental-wasm-modules). Some edge cases involving cycles probably would have undefined behaviors. I don't think this needs to handle interactions with everything (especially other experimental features) perfectly to land as a first iteration of an experimental feature. We can continue iterating on it while it's experimental.

@nodejs-github-bot
Collaborator

Review requested:

  • @nodejs/loaders
  • @nodejs/startup
  • @nodejs/test_runner
  • @nodejs/vm

@nodejs-github-bot added the lib / src and needs-ci labels Mar 5, 2024
@joyeecheung added the request-ci label Mar 5, 2024
@GeoffreyBooth added the module and esm labels Mar 5, 2024
@RafaelGSS added the notable-change label Mar 5, 2024
Contributor

github-actions bot commented Mar 5, 2024

The notable-change label has been added by @RafaelGSS.

Please suggest a text for the release notes if you'd like to include a more detailed summary, then proceed to update the PR description with the text or a link to the notable change suggested text comment. Otherwise, the commit will be placed in the Other Notable Changes section.

@RafaelGSS added the semver-minor label Mar 5, 2024
@github-actions bot removed the request-ci label Mar 5, 2024
@joyeecheung force-pushed the require-esm branch 2 times, most recently from 5a27731 to 612b870 March 5, 2024 18:39
@nodejs nodejs deleted a comment from nodejs-github-bot Mar 5, 2024
@GeoffreyBooth
Member

Before we get into the technical details, I just want to give a heartfelt THANK YOU to @joyeecheung for taking this on, and express my awe of her brilliance in figuring out how to achieve it. require of ESM has been something that we’ve been discussing off and on for years, with dreams of someday figuring out how to achieve it in a shippable way, so if this PR can result in that then we will all be very grateful; and this new functionality will be transformative for module authors, who will be able to write libraries with less worry around interoperability issues.

@GeoffreyBooth
Member

  • currently loader hooks are ignored in this path, because they already don’t affect require() anyway

I think the hooks do affect require, since #47999. I don’t think wiring them up to this needs to be a blocker for landing this PR, but if it’s not difficult perhaps we could either warn or error if hooks are registered when a require of an ES module occurs.

WASM support

import of Wasm is still experimental and there’s no timeframe for that changing (we were waiting on spec changes the last I recall) so I wouldn’t worry about this.

.cjs exclusion, directory resolution handling

What does this mean? Doing the extension searching for .cjs and/or .mjs in the filename? I wouldn’t worry about that for this PR; anyone doing require of a .cjs file needs to add the .cjs extension today, so it would be expected to need to continue to specify it; and likewise for .mjs.

@ShogunPanda
Contributor

I LOVE this idea. It will simplify so many things. Let's keep going with it.

@joyeecheung force-pushed the require-esm branch 2 times, most recently from 515d02d to 1ab2592 March 6, 2024 17:52
@joyeecheung added the request-ci label Mar 6, 2024
@github-actions bot removed the request-ci label Mar 6, 2024
@joyeecheung
Member Author

joyeecheung commented Mar 6, 2024

I think the hooks do affect require, since #47999. I don’t think wiring them up to this needs to be a blocker for landing this PR, but if it’s not difficult perhaps we could either warn or error if hooks are registered when a require of an ES module occurs.

They only affect the require() in imported CJS modules, not the required ones. I think it's not a good idea to let the loader hooks affect require() the way it currently works: spawning a separate thread to make it synchronous leads to concerns similar to those raised previously about supporting TLA in the fallback, and then we are back to square one. (Unlike import syntax, which is static and handled before module evaluation, require() is a regular function that can be called anywhere, so making it block on a new thread that can run any JS makes things even more complicated than TLA.) Someone who thought this separate-thread behavior applies everywhere already raised this with me. I think we are lucky, at least, that the CJS require is unaffected and not beyond saving. For this PR, I think we can just ignore hooks when the fallback is enabled, and warn about it. (On a side note, I noticed that the require() added by #47999 is cutting corners, e.g. not respecting policy...).

What does this mean? Doing the extension searching for .cjs and/or .mjs in the filename?

It means I don't know what happens when this happens, and there are not yet any tests for it.

@benjamingr
Member

Big +1 on the idea and I think bun shows this is feasible and users like it.

@anonrig
Member

anonrig commented Mar 18, 2024

Is there a reason why this pull request doesn't show any changes at all?

@GeoffreyBooth
Member

Attempted to make GitHub show this as purple by pushing to my branch after it landed, and then realized that I should’ve pushed to my branch before I pushed to main, so now GitHub thinks this is an empty PR.

If you reset your branch back to where it was before you landed it, would that restore the diff here?

@joyeecheung
Member Author

If you reset your branch back to where it was before you landed it, would that restore the diff here?

Nope. I pushed my branch require-esm to 03bf4b3 (which was where the last fixup commit was) after I pushed it to 5f7fad2 (which was where the main branch was after I landed it), and GitHub doesn't change how it shows the diff. Probably shouldn't have even tried to make it purple... I guess if anyone still wants to see what the branch looked like before landing, they can check out https://github.com/joyeecheung/node/tree/require-esm

@robertsLando

robertsLando commented Mar 19, 2024

@joyeecheung Which Node.js version supports this? Is this backported to v18 too?

@markstos
Contributor

@robertsLando I'd expect it to appear in the next Node.js release. Node.js features are generally not backported, only security fixes get that treatment.

@joyeecheung
Member Author

joyeecheung commented Mar 19, 2024

I think this is backportable to v20. It should go out in v22 next month, too (it's approaching April, I don't know if there's going to be any more v21 minor release). v18 is in maintenance mode so it largely is up to the release team. If you want to try it out before a full release, you can use the nightlies since 20240319: https://nodejs.org/download/nightly/

@TotallyInformation

Fantastic work - and so many thanks for getting to the bottom of the cjs/esm craziness that is starting to hold Node.js back. Hope this gets into a live version ASAP since it will still take some time for the versions in production systems to roll over.

@luckyyyyy

Thank you joyeecheung for the excellent work. This PR will solve a lot of problems. We no longer need to explain the issues with CJS/ESM to others, and we don't have to worry about various compatibility configurations anymore.

rdw-msft pushed a commit to rdw-msft/node that referenced this pull request Mar 26, 2024
PR-URL: nodejs#51977
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Guy Bedford <guybedford@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: Geoffrey Booth <webadmin@geoffreybooth.com>
@thedavidprice

Amazing work @joyeecheung 🚀 This will be a huge boon for the current RedwoodJS development roadmap.

I think this is backportable to v20

If there's any way it is (and any way we can be of help), we'd make use of this immediately.

@mercmobily

Oh my... importing ESMs from CJS? I didn't think I'd see the day.
I don't think you quite understand how many people will be in awe -- and for how long.
Please, please backport this to 20. It's just so important.

Node team: how long before this is not behind a flag anymore? This will basically kill any reason to hold back the updating of any module to ESM.

This is a dream. @joyeecheung ... thank you. Thank you. Thank you.

Labels: author ready, esm, lib / src, module, needs-ci, notable-change, semver-minor