Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

esm: utility method for detecting ES module syntax #27808

Open
wants to merge 2 commits into
base: master
from

Conversation

@GeoffreyBooth
Copy link
Contributor

commented May 22, 2019

This PR adds a utility method to detect whether input JavaScript source code contains ES module syntax:

const { containsModuleSyntax } = require('module');

containsModuleSyntax('import { shuffle } from "lodash"'); // true
containsModuleSyntax('console.log(process.version)'); // false

containsModuleSyntax('import "./file.mjs"'); // true
containsModuleSyntax('import("./file.mjs")'); // false

The use case for this is any utility that currently uses vm.Script’s runInThisContext and might also want to support ESM input source code, and therefore would need to know when to use vm.Script versus the ES module vm.SourceTextModule (and has no definitive signal from the user as to the intended source code type, and where the risk of evaluating in the wrong module system is acceptable for the use case). Two such utilities that I’m familiar with are Babel and CoffeeScript, which each take arbitrary source code as user input (like --eval) and use vm.runInThisContext for evaluation. I would also imagine that this method would be useful to some loaders or testing frameworks.

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines
@ljharb

This comment has been minimized.

Copy link
Contributor

commented May 22, 2019

@ljharb

ljharb approved these changes May 22, 2019

Copy link
Contributor

left a comment

It seems more useful to have a method that returns an array of possible parse goals - ie, esm, cjs, or both - which allows for future options like “wasm”, “json”, etc - but “contains module syntax” certainly does what it says on the tin.

@devsnek

This comment has been minimized.

Copy link
Member

commented May 22, 2019

i am concerned about the longevity of this api, not that the api itself is a bad thing. is this the right place to put it, is there some other api in the future we might want instead, etc. perhaps flagging it would make sense for now?

@BridgeAR

This comment has been minimized.

Copy link
Member

commented May 22, 2019

I agree with @devsnek that this API might change again. An alternative to a flag would be to mark it as experimental.

@Fishrock123

This comment has been minimized.

Copy link
Member

commented May 22, 2019

Does this api really belong in core? Can’t it be an npm module?

Especially since it’s future may be uncertain, out of core flexibility seems better to me...

@GeoffreyBooth

This comment has been minimized.

Copy link
Contributor Author

commented May 22, 2019

Does this api really belong in core? Can’t it be an npm module?

Of course, any utility method could be outside core. I was thinking that this belonged in core because it’s useful in relation to vm.Script and vm.SourceTextModule. We might also find ourselves using this method in core someday, for example if we wanted to provide some kind of automatic way to choose the type for --input-type.

@TimothyGu

This comment has been minimized.

Copy link
Member

commented May 22, 2019

Hmm, I think we should wait until such a usage in core becomes apparent and then discuss adding this. It seems possible right now that a future feature would in fact require a slightly different algorithm for deciding if a file is CJS or a module.

@GeoffreyBooth

This comment has been minimized.

Copy link
Contributor Author

commented May 22, 2019

Ultimately I think there will be multiple algorithms for deciding how a file or a source input string should be treated. There’s no one ideal algorithm that will work for all scenarios.

The difficulty in “detecting” CommonJS is that most (all?) CommonJS files that have 'use strict' or could have 'use strict' without erroring are also valid as ESM. Such source code may reference the CommonJS globals like require, but that’s permitted in ES module code; such references would just throw, unless the reference was one like if (require). So you could make a theoretical “probably CommonJS source” method, that looks for references to these globals and possibly uses of non-strict mode syntax like with or HTML comments, and that might be a useful companion to this method. But I wouldn’t combine them. There’s value in having a method that does exactly one thing and does it fast. containsModuleSyntax doesn’t even parse the source code, it only tokenizes; and it breaks as soon as an import or export statement token is encountered. Since the current behavior of Node is to default to running all code as CommonJS, it will be useful to have a way to very quickly do “if definitively ESM, run as ESM; else run as CommonJS” for the tools that want to do so.

@devsnek

This comment has been minimized.

Copy link
Member

commented May 22, 2019

There’s no one ideal algorithm that will work for all scenarios.

that's a pretty good argument to leave it out of core for now. perhaps you could develop a module with lots of algorithms and such that can be merged into core in the future?

@sendilkumarn

This comment has been minimized.

Copy link
Member

commented May 22, 2019

Yeah, I also think adding it as an npm module makes more sense at this moment rather than having this in the core.

@SMotaal

This comment has been minimized.

Copy link
Contributor

commented May 23, 2019

Thinking of this long enough — I must admit it is neither a core thing nor just another npm thing — Some things are just to intertwined to specific features of core to a point where it creates too much noise when forced to always play catch up in the wild.

So in my opinion — this needs to have:

  1. Steady core support — ie any aspects used from core in the detection.

To be clear, not an actual API, just what an API like this one needs to get the job done without ugly package soup.

  1. A clear procedural document in core — ie how to detect.

To the point, ESM where otherwise not unambiguous is a source text that meets the following criteria… etc.

  1. A recommended npm package — while maintained.

Opportunity to transition this out right here :)

Last thing we need is a popularity contest on how a package seems to magically detect ESM modules only to discover it just drops safe guards to do so — that is if we are serious about having an auto detect mode (assuming it is a loader extension at some point, yes?).

Show resolved Hide resolved doc/api/modules.md
@addaleax

This comment has been minimized.

Copy link
Member

commented Jun 3, 2019

Does the API here lock us into shipping with a parser module like acorn? If so, I’d also really prefer this to be an npm module.

@SMotaal

This comment has been minimized.

Copy link
Contributor

commented Jun 3, 2019

npm module

@addaleax is there precedence for @nodejs npm modules? I'd really like this to have the benefit of being a Node.js backed utility so that it always meets the bar.

@addaleax

This comment has been minimized.

Copy link
Member

commented Jun 3, 2019

@SMotaal If I recall correctly, we own the @nodejs namespace on npm, so if we wanted to, we could do it, but I’m not sure if this is really a case where it’s necessary to have something “official” – it seems like this module would not typically require an unusual amount of maintenance?

@SMotaal

This comment has been minimized.

Copy link
Contributor

commented Jun 3, 2019

a parser module like acorn

@addaleax I raised this a few months back, and the rational at the moment was acorn ships because of REPL, which I must point out was already causing me issues, but reason still dictated that since it is there already, that was a different problem space to work out, and that was more appropriate than adding yet another parser to the mix.

I am working on that other one regardless in unrelated efforts, looking for better ways to avoid AST parse reporting errors artificially when they are not.

@SMotaal

This comment has been minimized.

Copy link
Contributor

commented Jun 3, 2019

have something “official”

I think that this is a very fine grain special case — when people implement loaders, they will aim for convenience, they will sell anything goes mentality and claim that "we the mojo will always detect CJS vs ESM formats for you" which is in very high demand these days.

I'd like this problem to be solved by an "official" solution so that people do not solve this in ways that overload the loader or make it too hard for security aspects that they fail to properly predict what code will be loaded before it is actually loaded.

(sorry for the edits — it is annoying but takes my me time to catch my mistakes)

// writing, Acorn doesn't support import() expressions as they are only Stage
// 3; yet Node already supports them.
const acorn = require('internal/deps/acorn/acorn/dist/acorn');
source = stripShebang(source);

This comment has been minimized.

Copy link
@devsnek

devsnek Jun 4, 2019

Member

the correct thing to use now is stripShebangOrBOM if you know that you're parsing cjs. If you're parsing ESM, you should only use stripShebang. This obviously presents a problem, and I don't have a solution for it.

This comment has been minimized.

Copy link
@GeoffreyBooth

GeoffreyBooth Jun 5, 2019

Author Contributor

stripShebangOrBOM was removed in 702331b, because the V8 parser handles stripping shebangs/hashbangs now. Only stripBOM remains.

This is a problem, as Acorn doesn’t strip hashbangs. If I remove stripShebang from my method, the following throws:

containsModuleSyntax('#!/usr/bin/env node\nimport "./file.js"');

When it should return true. I could resurrect stripShebangOrBOM for this method, I suppose?

I guess this is the problem with relying on Acorn for parsing in certain places (like the REPL) and V8 everywhere else.

This comment has been minimized.

Copy link
@GeoffreyBooth

GeoffreyBooth Jun 5, 2019

Author Contributor

Nevermind, Acorn has an allowHashBang option, and it’s supported by its tokenizer. All is well. I rebased from the latest master and added a test with a shebang line.

This comment has been minimized.

Copy link
@SMotaal

SMotaal Jun 5, 2019

Contributor

@GeoffreyBooth can we maybe have a preflight checklist re making sure that acorn configs/plugins... etc remain effective for this feature?

I'm thinking that if it is used sparsely, we might want to sanely keep track of dependencies so that updates don't break features like this.

@benjamingr

This comment has been minimized.

Copy link
Member

commented Jun 4, 2019

I've opened an issue to discuss vendoring this and other packages as NPM modules as @addaleax and @SMotaal discussed above.

@GeoffreyBooth GeoffreyBooth force-pushed the GeoffreyBooth:detect-import-export branch from 695d780 to a8a7776 Jun 5, 2019

@GeoffreyBooth

This comment has been minimized.

Copy link
Contributor Author

commented Jun 5, 2019

Does the API here lock us into shipping with a parser module like acorn?

Yes, but Node core already uses Acorn in the REPL and in assert. I assume that refactoring out Node’s dependency on Acorn from those places would probably entail some deprecations, and this method would just be one of several deprecations that would happen. I think it’s more likely that Acorn would stick around until we can somehow use V8 as our parser, in which case it can support this API along with the REPL and assert.

@targos

This comment has been minimized.

Copy link
Member

commented Jun 5, 2019

Does the API here lock us into shipping with a parser module like acorn?

Yes, but Node core already uses Acorn in the REPL and in assert. I assume that refactoring out Node’s dependency on Acorn from those places would probably entail some deprecations, and this method would just be one of several deprecations that would happen. I think it’s more likely that Acorn would stick around until we can somehow use V8 as our parser, in which case it can support this API along with the REPL and assert.

As long as this doesn't expose anything specific to Acorn (such as error messages or objects returned by the library), I'm fine with it.

@GeoffreyBooth

This comment has been minimized.

Copy link
Contributor Author

commented Jun 5, 2019

As long as this doesn’t expose anything specific to Acorn (such as error messages or objects returned by the library), I’m fine with it.

It doesn’t. The method either returns true or false or throws whatever exception is thrown by new vm.Script(source, { displayErrors: true });.

@SMotaal

This comment has been minimized.

Copy link
Contributor

commented Jun 5, 2019

As long as this doesn't expose anything specific to Acorn (such as error messages or objects returned by the library), I'm fine with it.

@targos Interesting you bring that up though, because I'd think that a utility of that nature should always try {} catch (explicitly) { handle(explicitly) }.

thrown by new vm.Script(source, { displayErrors: true });

@GeoffreyBooth Can you elaborate on why having a user-facing exception might be helpful?

@GeoffreyBooth

This comment has been minimized.

Copy link
Contributor Author

commented Jun 5, 2019

why having a user-facing exception might be helpful?

If the source code cannot be parsed, it may or may not contain module syntax but the function wouldn't be able to tell either way, because attempting to parse it throws an exception in the tokenizer. Rather than surface the Acorn tokenizer exception, we try to compile it using Node so that we get a friendlier and more expected-looking exception with the invalid syntax emphasized.

@devsnek
Copy link
Member

left a comment

along with allowHashBang this will let a file with \uFEFF#! in it through, which is incorrect for cjs, and we don't even strip the bom for esm.

@SMotaal

This comment has been minimized.

Copy link
Contributor

commented Jun 5, 2019

If the source code cannot be parsed

@GeoffreyBooth I think that covers the first aspect, and yes, I see that here throwing is the more appropriate effect.

The bit left to address here is if the error generated from vm.Script faithfully relates from the perspective of the consumer of this API, ie no cases where the error would add confusion to someone not aware of vm.Script or at least not feel inclined to think this was caused by a direct call to vm.Script lost in their project somewhere.

@GeoffreyBooth

This comment has been minimized.

Copy link
Contributor Author

commented Jun 5, 2019

The bit left to address here is if the error generated from vm.Script faithfully relates from the perspective of the consumer of this API, ie no cases where the error would add confusion to someone not aware of vm.Script or at least not feel inclined to think this was caused by a direct call to vm.Script lost in their project somewhere.

A throw looks like this:

> containsModuleSyntax('@')
Thrown:
evalmachine.<anonymous>:1
@
^

SyntaxError: Invalid or unexpected token

There’s no mention of vm.Script.

@GeoffreyBooth

This comment has been minimized.

Copy link
Contributor Author

commented Jun 5, 2019

along with allowHashBang this will let a file with \uFEFF#! in it through, which is incorrect for cjs, and we don’t even strip the bom for esm.

@devsnek Okay, I removed stripBOM.

@GeoffreyBooth GeoffreyBooth force-pushed the GeoffreyBooth:detect-import-export branch from a8a7776 to 750d617 Jun 5, 2019

@devsnek

This comment has been minimized.

Copy link
Member

commented Jun 5, 2019

@GeoffreyBooth but we do strip bom for cjs.

@GeoffreyBooth GeoffreyBooth force-pushed the GeoffreyBooth:detect-import-export branch from 750d617 to 0d7db35 Jun 5, 2019

@GeoffreyBooth

This comment has been minimized.

Copy link
Contributor Author

commented Jun 5, 2019

@GeoffreyBooth but we do strip bom for cjs.

@devsnek Okay, then we should keep stripBOM. The BOM might confuse Acorn, so we can and should remove it. Whether or not source code has a BOM is irrelevant to whether or not the source contains module syntax.

Whether or not the source code is runnable is outside the scope of what this method is aiming to tell you. If a string has a BOM and also has module syntax, then this method should return true, even if Node chokes on the BOM when it tries to run the file. This method isn’t aiming to tell you if Node can successfully parse or evaluate the source, just whether or not it contains module syntax.

@GeoffreyBooth

This comment has been minimized.

Copy link
Contributor Author

commented Jun 7, 2019

along with allowHashBang this will let a file with \uFEFF#! in it through, which is incorrect for cjs, and we don't even strip the bom for esm.

@devsnek Whether or not source code contains a BOM or a hashbang is irrelevant to whether or not the source contains module syntax. For the purposes of this method we can strip out anything we want that's not an import or export statement, and we have to strip out anything that causes Acorn to choke when it shouldn't (like a hashbang). So I think the current version is correct. Do you mind lifting your block?

@devsnek

This comment has been minimized.

Copy link
Member

commented Jun 13, 2019

@GeoffreyBooth it's a syntactic difference for esm and cjs. Maybe you could explicitly sniff it instead of stripping it?

@GeoffreyBooth

This comment has been minimized.

Copy link
Contributor Author

commented Jun 13, 2019

@GeoffreyBooth it’s a syntactic difference for esm and cjs. Maybe you could explicitly sniff it instead of stripping it?

I’m not sure what you’re asking for here. If a BOM is permissible in ESM, and therefore ESM may or may not contain a BOM, then sniffing it doesn’t tell us anything about whether code contains module syntax.

The stripping, of BOM or of hashbangs, is only to prevent Acorn from throwing an exception on code that should be parseable. Now that hashbangs are permitted in JavaScript, Acorn shouldn’t require a special option to strip them—but it does. Now that BOMs are permitted in JavaScript, Acorn shouldn’t require us to strip them preemptively—but we need to. These edits are just to work around Acorn’s limitations.

@SMotaal

This comment has been minimized.

Copy link
Contributor

commented Jun 13, 2019

it's a syntactic difference for esm and cjs

@devsnek can you elaborate what you mean, would:

  1. BOM presence negate the outcome for a source text that is otherwise ESM?
  2. BOM absence negate the outcome for a source text that is otherwise not ESM?

I am almost certain we're not considering BOM a disambiguating aspect here so my apologies for needing more clarification please.

@ljharb

This comment has been minimized.

Copy link
Contributor

commented Jun 13, 2019

(note that hashbangs are stage 3, so not yet permitted in JS)

@devsnek

This comment has been minimized.

Copy link
Member

commented Jun 13, 2019

what i'm saying is we don't strip bom for source text modules, so we can use the presence of bom as another indicator for cjs.

@GeoffreyBooth

This comment has been minimized.

Copy link
Contributor Author

commented Jun 13, 2019

what i’m saying is we don’t strip bom for source text modules, so we can use the presence of bom as another indicator for cjs.

This method doesn’t look for indicators of CommonJS. It only looks for module syntax.

@GeoffreyBooth

This comment has been minimized.

Copy link
Contributor Author

commented Jun 13, 2019

Thanks @devsnek.

Can someone please trigger a build? The last build failed on some crypto tests that have nothing to do with this PR.

@Trott

This comment has been minimized.

1 similar comment
@nodejs-github-bot

This comment has been minimized.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.