esm: utility method for detecting ES module syntax #27808

GeoffreyBooth · 2019-05-22T01:15:36Z

This PR adds a utility method to detect whether input JavaScript source code contains ES module syntax:

const { containsModuleSyntax } = require('module');

containsModuleSyntax('import { shuffle } from "lodash"'); // true
containsModuleSyntax('console.log(process.version)'); // false

containsModuleSyntax('import "./file.mjs"'); // true
containsModuleSyntax('import("./file.mjs")'); // false

The use case for this is any utility that currently uses vm.Script’s runInThisContext and might also want to support ESM input source code, and therefore would need to know when to use vm.Script versus the ES module vm.SourceTextModule (and has no definitive signal from the user as to the intended source code type, and where the risk of evaluating in the wrong module system is acceptable for the use case). Two such utilities that I’m familiar with are Babel and CoffeeScript, which each take arbitrary source code as user input (like --eval) and use vm.runInThisContext for evaluation. I would also imagine that this method would be useful to some loaders or testing frameworks.

Checklist

make -j4 test (UNIX), or vcbuild test (Windows) passes
tests and/or benchmarks are included
documentation is changed or added
commit message follows commit guidelines

ljharb · 2019-05-22T03:03:47Z

(link to nodejs/modules#330 and nodejs/ecmascript-modules#69)
cc @nodejs/modules

ljharb

It seems more useful to have a method that returns an array of possible parse goals - ie, esm, cjs, or both - which allows for future options like “wasm”, “json”, etc - but “contains module syntax” certainly does what it says on the tin.

devsnek · 2019-05-22T03:41:44Z

i am concerned about the longevity of this api, not that the api itself is a bad thing. is this the right place to put it, is there some other api in the future we might want instead, etc. perhaps flagging it would make sense for now?

lib/internal/modules/cjs/loader.js

BridgeAR · 2019-05-22T11:15:45Z

I agree with @devsnek that this API might change again. An alternative to a flag would be to mark it as experimental.

Fishrock123 · 2019-05-22T15:29:48Z

Does this api really belong in core? Can’t it be an npm module?

Especially since it’s future may be uncertain, out of core flexibility seems better to me...

GeoffreyBooth · 2019-05-22T18:06:59Z

Does this api really belong in core? Can’t it be an npm module?

Of course, any utility method could be outside core. I was thinking that this belonged in core because it’s useful in relation to vm.Script and vm.SourceTextModule. We might also find ourselves using this method in core someday, for example if we wanted to provide some kind of automatic way to choose the type for --input-type.

TimothyGu · 2019-05-22T18:43:09Z

Hmm, I think we should wait until such a usage in core becomes apparent and then discuss adding this. It seems possible right now that a future feature would in fact require a slightly different algorithm for deciding if a file is CJS or a module.

GeoffreyBooth · 2019-05-22T21:15:59Z

Ultimately I think there will be multiple algorithms for deciding how a file or a source input string should be treated. There’s no one ideal algorithm that will work for all scenarios.

The difficulty in “detecting” CommonJS is that most (all?) CommonJS files that have 'use strict' or could have 'use strict' without erroring are also valid as ESM. Such source code may reference the CommonJS globals like require, but that’s permitted in ES module code; such references would just throw, unless the reference was one like if (require). So you could make a theoretical “probably CommonJS source” method, that looks for references to these globals and possibly uses of non-strict mode syntax like with or HTML comments, and that might be a useful companion to this method. But I wouldn’t combine them. There’s value in having a method that does exactly one thing and does it fast. containsModuleSyntax doesn’t even parse the source code, it only tokenizes; and it breaks as soon as an import or export statement token is encountered. Since the current behavior of Node is to default to running all code as CommonJS, it will be useful to have a way to very quickly do “if definitively ESM, run as ESM; else run as CommonJS” for the tools that want to do so.

devsnek · 2019-05-22T21:26:01Z

There’s no one ideal algorithm that will work for all scenarios.

that's a pretty good argument to leave it out of core for now. perhaps you could develop a module with lots of algorithms and such that can be merged into core in the future?

sendilkumarn · 2019-05-22T21:33:10Z

Yeah, I also think adding it as an npm module makes more sense at this moment rather than having this in the core.

SMotaal · 2019-05-23T09:47:36Z

Thinking of this long enough — I must admit it is neither a core thing nor just another npm thing — Some things are just to intertwined to specific features of core to a point where it creates too much noise when forced to always play catch up in the wild.

So in my opinion — this needs to have:

Steady core support — ie any aspects used from core in the detection.

To be clear, not an actual API, just what an API like this one needs to get the job done without ugly package soup.

A clear procedural document in core — ie how to detect.

To the point, ESM where otherwise not unambiguous is a source text that meets the following criteria… etc.

A recommended npm package — while maintained.

Opportunity to transition this out right here :)

Last thing we need is a popularity contest on how a package seems to magically detect ESM modules only to discover it just drops safe guards to do so — that is if we are serious about having an auto detect mode (assuming it is a loader extension at some point, yes?).

doc/api/modules.md

addaleax · 2019-06-03T11:59:18Z

Does the API here lock us into shipping with a parser module like acorn? If so, I’d also really prefer this to be an npm module.

SMotaal · 2019-06-03T12:00:49Z

npm module

@addaleax is there precedence for @nodejs npm modules? I'd really like this to have the benefit of being a Node.js backed utility so that it always meets the bar.

addaleax · 2019-06-03T12:05:06Z

@SMotaal If I recall correctly, we own the @nodejs namespace on npm, so if we wanted to, we could do it, but I’m not sure if this is really a case where it’s necessary to have something “official” – it seems like this module would not typically require an unusual amount of maintenance?

SMotaal · 2019-06-03T12:08:11Z

a parser module like acorn

@addaleax I raised this a few months back, and the rational at the moment was acorn ships because of REPL, which I must point out was already causing me issues, but reason still dictated that since it is there already, that was a different problem space to work out, and that was more appropriate than adding yet another parser to the mix.

I am working on that other one regardless in unrelated efforts, looking for better ways to avoid AST parse reporting errors artificially when they are not.

SMotaal · 2019-06-03T12:13:18Z

have something “official”

I think that this is a very fine grain special case — when people implement loaders, they will aim for convenience, they will sell anything goes mentality and claim that "we the mojo will always detect CJS vs ESM formats for you" which is in very high demand these days.

I'd like this problem to be solved by an "official" solution so that people do not solve this in ways that overload the loader or make it too hard for security aspects that they fail to properly predict what code will be loaded before it is actually loaded.

(sorry for the edits — it is annoying but takes my me time to catch my mistakes)

devsnek · 2019-06-04T20:43:25Z

lib/internal/modules/cjs/loader.js

+  // writing, Acorn doesn't support import() expressions as they are only Stage
+  // 3; yet Node already supports them.
+  const acorn = require('internal/deps/acorn/acorn/dist/acorn');
+  source = stripShebang(source);


the correct thing to use now is stripShebangOrBOM if you know that you're parsing cjs. If you're parsing ESM, you should only use stripShebang. This obviously presents a problem, and I don't have a solution for it.

stripShebangOrBOM was removed in 702331b, because the V8 parser handles stripping shebangs/hashbangs now. Only stripBOM remains.

~~This is a problem, as Acorn doesn’t strip hashbangs. If I remove stripShebang from my method, the following throws:~~

containsModuleSyntax('#!/usr/bin/env node\nimport "./file.js"');

~~When it should return true. I could resurrect stripShebangOrBOM for this method, I suppose?~~

~~I guess this is the problem with relying on Acorn for parsing in certain places (like the REPL) and V8 everywhere else.~~

Nevermind, Acorn has an allowHashBang option, and it’s supported by its tokenizer. All is well. I rebased from the latest master and added a test with a shebang line.

@GeoffreyBooth can we maybe have a preflight checklist re making sure that acorn configs/plugins... etc remain effective for this feature?

I'm thinking that if it is used sparsely, we might want to sanely keep track of dependencies so that updates don't break features like this.

benjamingr · 2019-06-04T20:45:03Z

I've opened an issue to discuss vendoring this and other packages as NPM modules as @addaleax and @SMotaal discussed above.

GeoffreyBooth · 2019-06-05T07:25:13Z

Does the API here lock us into shipping with a parser module like acorn?

Yes, but Node core already uses Acorn in the REPL and in assert. I assume that refactoring out Node’s dependency on Acorn from those places would probably entail some deprecations, and this method would just be one of several deprecations that would happen. I think it’s more likely that Acorn would stick around until we can somehow use V8 as our parser, in which case it can support this API along with the REPL and assert.

targos · 2019-06-05T07:32:08Z

Does the API here lock us into shipping with a parser module like acorn?

Yes, but Node core already uses Acorn in the REPL and in assert. I assume that refactoring out Node’s dependency on Acorn from those places would probably entail some deprecations, and this method would just be one of several deprecations that would happen. I think it’s more likely that Acorn would stick around until we can somehow use V8 as our parser, in which case it can support this API along with the REPL and assert.

As long as this doesn't expose anything specific to Acorn (such as error messages or objects returned by the library), I'm fine with it.

GeoffreyBooth · 2019-06-05T07:33:34Z

As long as this doesn’t expose anything specific to Acorn (such as error messages or objects returned by the library), I’m fine with it.

It doesn’t. The method either returns true or false or throws whatever exception is thrown by new vm.Script(source, { displayErrors: true });.

GeoffreyBooth · 2019-06-05T19:43:38Z

along with allowHashBang this will let a file with \uFEFF#! in it through, which is incorrect for cjs, and we don’t even strip the bom for esm.

@devsnek Okay, I removed stripBOM.

devsnek · 2019-06-05T20:41:44Z

@GeoffreyBooth but we do strip bom for cjs.

GeoffreyBooth · 2019-06-05T23:02:59Z

@GeoffreyBooth but we do strip bom for cjs.

@devsnek Okay, then we should keep stripBOM. The BOM might confuse Acorn, so we can and should remove it. Whether or not source code has a BOM is irrelevant to whether or not the source contains module syntax.

Whether or not the source code is runnable is outside the scope of what this method is aiming to tell you. If a string has a BOM and also has module syntax, then this method should return true, even if Node chokes on the BOM when it tries to run the file. This method isn’t aiming to tell you if Node can successfully parse or evaluate the source, just whether or not it contains module syntax.

GeoffreyBooth · 2019-06-07T07:16:30Z

along with allowHashBang this will let a file with \uFEFF#! in it through, which is incorrect for cjs, and we don't even strip the bom for esm.

@devsnek Whether or not source code contains a BOM or a hashbang is irrelevant to whether or not the source contains module syntax. For the purposes of this method we can strip out anything we want that's not an import or export statement, and we have to strip out anything that causes Acorn to choke when it shouldn't (like a hashbang). So I think the current version is correct. Do you mind lifting your block?

devsnek · 2019-06-13T14:30:27Z

@GeoffreyBooth it's a syntactic difference for esm and cjs. Maybe you could explicitly sniff it instead of stripping it?

GeoffreyBooth · 2019-06-13T17:43:55Z

@GeoffreyBooth it’s a syntactic difference for esm and cjs. Maybe you could explicitly sniff it instead of stripping it?

I’m not sure what you’re asking for here. If a BOM is permissible in ESM, and therefore ESM may or may not contain a BOM, then sniffing it doesn’t tell us anything about whether code contains module syntax.

The stripping, of BOM or of hashbangs, is only to prevent Acorn from throwing an exception on code that should be parseable. Now that hashbangs are permitted in JavaScript, Acorn shouldn’t require a special option to strip them—but it does. Now that BOMs are permitted in JavaScript, Acorn shouldn’t require us to strip them preemptively—but we need to. These edits are just to work around Acorn’s limitations.

SMotaal · 2019-06-13T20:34:06Z

it's a syntactic difference for esm and cjs

@devsnek can you elaborate what you mean, would:

BOM presence negate the outcome for a source text that is otherwise ESM?
BOM absence negate the outcome for a source text that is otherwise not ESM?

I am almost certain we're not considering BOM a disambiguating aspect here so my apologies for needing more clarification please.

ljharb · 2019-06-13T21:54:59Z

(note that hashbangs are stage 3, so not yet permitted in JS)

devsnek · 2019-06-13T22:00:49Z

what i'm saying is we don't strip bom for source text modules, so we can use the presence of bom as another indicator for cjs.

GeoffreyBooth · 2019-06-13T22:34:11Z

what i’m saying is we don’t strip bom for source text modules, so we can use the presence of bom as another indicator for cjs.

This method doesn’t look for indicators of CommonJS. It only looks for module syntax.

GeoffreyBooth · 2019-06-13T23:18:34Z

Thanks @devsnek.

Can someone please trigger a build? The last build failed on some crypto tests that have nothing to do with this PR.

nodejs-github-bot · 2019-06-16T02:37:26Z

CI: https://ci.nodejs.org/job/node-test-pull-request/23908/

BridgeAR · 2019-12-20T01:02:27Z

Should this stay open?

GeoffreyBooth · 2019-12-20T04:41:13Z

Technically it's still on the modules roadmap, and there's no alternative yet. Until we discuss and agree to table it, I think this stays open.

GeoffreyBooth · 2020-01-23T07:31:19Z

Closing per nodejs/modules#465.

GeoffreyBooth mentioned this pull request May 22, 2019

Utility method for detecting ES module syntax nodejs/ecmascript-modules#69

Closed

ljharb approved these changes May 22, 2019

View reviewed changes

devsnek added esm Issues and PRs related to the ECMAScript Modules implementation. module Issues and PRs related to the module subsystem. semver-minor PRs that contain new features and should be released in the next minor version. labels May 22, 2019

BridgeAR reviewed May 22, 2019

View reviewed changes

lib/internal/modules/cjs/loader.js Show resolved Hide resolved

Trott reviewed May 25, 2019

View reviewed changes

doc/api/modules.md Show resolved Hide resolved

devsnek reviewed Jun 4, 2019

View reviewed changes

benjamingr mentioned this pull request Jun 4, 2019

Vendor official NPM modules #28062

Closed

esm: add utility method for detecting ES module syntax

3e7d55b

GeoffreyBooth force-pushed the detect-import-export branch from 695d780 to a8a7776 Compare June 5, 2019 06:13

GeoffreyBooth force-pushed the detect-import-export branch from a8a7776 to 750d617 Compare June 5, 2019 19:54

esm: update containsModuleSyntax re removal of stripShebang

0d7db35

GeoffreyBooth force-pushed the detect-import-export branch from 750d617 to 0d7db35 Compare June 5, 2019 22:55

This comment has been minimized.

Sign in to view

joyeecheung mentioned this pull request Jun 16, 2019

CI failures: 20190612-20190615 nodejs/reliability#23

Closed

Trott force-pushed the master branch from 1ecc406 to 49cf67e Compare September 17, 2019 16:51

GeoffreyBooth mentioned this pull request Sep 27, 2019

Script-based configuration files in "type": "module" packages nodejs/modules#389

Closed

devnexen force-pushed the master branch from e8a4568 to 5289f80 Compare December 26, 2019 19:46

GeoffreyBooth mentioned this pull request Jan 12, 2020

Table source type detection nodejs/modules#465

Merged

BridgeAR added the pending label Jan 12, 2020

GeoffreyBooth closed this Jan 23, 2020

GeoffreyBooth removed the pending label Jan 23, 2020

ljharb deleted the detect-import-export branch January 23, 2020 07:35

danielroe mentioned this pull request Oct 22, 2021

ESM / CJS syntax detection unjs/mlly#11

Closed

esm: utility method for detecting ES module syntax #27808

esm: utility method for detecting ES module syntax #27808

Conversation

GeoffreyBooth commented May 22, 2019

Checklist

ljharb commented May 22, 2019

ljharb left a comment • edited Loading

Choose a reason for hiding this comment

devsnek commented May 22, 2019

BridgeAR commented May 22, 2019

Fishrock123 commented May 22, 2019

GeoffreyBooth commented May 22, 2019

TimothyGu commented May 22, 2019

GeoffreyBooth commented May 22, 2019 • edited Loading

devsnek commented May 22, 2019

sendilkumarn commented May 22, 2019

SMotaal commented May 23, 2019

addaleax commented Jun 3, 2019

SMotaal commented Jun 3, 2019 • edited Loading

addaleax commented Jun 3, 2019

SMotaal commented Jun 3, 2019

SMotaal commented Jun 3, 2019 • edited Loading

devsnek Jun 4, 2019

Choose a reason for hiding this comment

GeoffreyBooth Jun 5, 2019 • edited Loading

Choose a reason for hiding this comment

GeoffreyBooth Jun 5, 2019

Choose a reason for hiding this comment

SMotaal Jun 5, 2019

Choose a reason for hiding this comment

benjamingr commented Jun 4, 2019

GeoffreyBooth commented Jun 5, 2019

targos commented Jun 5, 2019

GeoffreyBooth commented Jun 5, 2019

GeoffreyBooth commented Jun 5, 2019

devsnek commented Jun 5, 2019

GeoffreyBooth commented Jun 5, 2019 • edited Loading

GeoffreyBooth commented Jun 7, 2019

devsnek commented Jun 13, 2019

GeoffreyBooth commented Jun 13, 2019

SMotaal commented Jun 13, 2019

ljharb commented Jun 13, 2019

devsnek commented Jun 13, 2019

GeoffreyBooth commented Jun 13, 2019

GeoffreyBooth commented Jun 13, 2019

This comment has been minimized.

nodejs-github-bot commented Jun 16, 2019

BridgeAR commented Dec 20, 2019

GeoffreyBooth commented Dec 20, 2019

GeoffreyBooth commented Jan 23, 2020

ljharb left a comment •

edited

Loading

GeoffreyBooth commented May 22, 2019 •

edited

Loading

SMotaal commented Jun 3, 2019 •

edited

Loading

SMotaal commented Jun 3, 2019 •

edited

Loading

GeoffreyBooth Jun 5, 2019 •

edited

Loading

GeoffreyBooth commented Jun 5, 2019 •

edited

Loading