esm: support loading data: URLs #28614

Open
wants to merge 10 commits into
base: master
from


@bmeck
Member

commented Jul 9, 2019

This PR allows loading some module formats from data: URLs. This matches the web spec, but it raises some concerns as we grow what can be loaded into ESM, per the open discussion at nodejs/security-wg#520. I'm opening this expecting some discussion around MIME parsing (which is left to #21128 and which this PR doesn't properly do) and around what to do when module formats are not supported outside of file contexts. Currently I simply didn't include the CJS MIME or the C++ Addon MIME for what can be loaded via data: URLs. We could expose those, but things would likely be awkward for things like __filename, and C++ addons would have to be written to disk first for dlopen to work. Once we resolve that we can write up docs on the decision and move this PR forward.

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines

@bmeck bmeck added the ES Modules label Jul 9, 2019

@bmeck

Member Author

commented Jul 9, 2019

@jkrems

jkrems approved these changes Jul 9, 2019

Contributor

left a comment

LGTM - I think this would be great to have.

Currently I simply didn't include the CJS MIME or the C++ Addon MIME for what can be loaded via data: URLs.

I think that's reasonable, especially at first. The require loader isn't protocol aware so it would be awkward to support file types that are "require-native" here.

Resolved (outdated): lib/internal/modules/esm/default_resolve.js
Resolved: lib/internal/modules/esm/translators.js
Update lib/internal/modules/esm/default_resolve.js
Co-Authored-By: Jan Olaf Krems <jan.krems@gmail.com>
@devsnek

Member

commented Jul 9, 2019

this is very exciting :) i had been planning to open something like this when mime parsing landed. as long as this doesn't use proper parsing, it should probably be flagged.

if (parsed.protocol === 'data:') {
  const [ , mime ] = /^([^/]+\/[^;,]+)(;base64)?,/.exec(parsed.pathname) || [ null, null, null ];
  const format = ({
    'text/javascript': 'module',

@ljharb

ljharb Jul 9, 2019

Contributor

text/javascript doesn't necessarily mean Module, it could also mean Script.

@bmeck

bmeck Jul 9, 2019

Author Member

Under no JS loading spec does Script get checked against the MIME text/javascript. Script is effectively without a MIME and this table matches web standards.

@jkrems

jkrems Jul 9, 2019

Contributor

I don't think we support (browser-style) Script anywhere, so it would feel weird to do that here. A CommonJS script would be something like text/vnd.node.js according to nodejs/TSC#371.

@bmeck

bmeck Jul 9, 2019

Author Member

@ljharb

ljharb Jul 9, 2019

Contributor

then can application/node be added to this object?

@bmeck

bmeck Jul 9, 2019

Author Member

@ljharb but the require cache works off file paths, not URLs. It also wouldn't make sense for things like __dirname, __filename, or module.parent (if the parent is not a CJS module). I don't think starting to deal with the results of adding a new type of key and potentially parent is a simple task. We would also need to ensure that require starts to understand non-file contexts and doesn't load non-builtins, which would be a new thing as well.

@SMotaal

SMotaal Jul 9, 2019

Contributor

The expression /^([^/]+\/[^;,]+)(;base64)?,/ seems to assume the presence of a MIME type before the [;,], though. Regardless of whether the MIME type is mandatory, it might make sense to test the expression against long and malformed URLs and refine it if necessary.

Sorry for not wanting to muddy this with a bad attempt to wing it here, but I will try to locate the ones I worked on a while back for that very same purpose if it helps.

@bmeck

bmeck Jul 10, 2019

Author Member

@SMotaal we could, but i can't think of much we could do except limiting the size? Right now this lacks a variety of things, including MIME parameter parsing and the PR for parsing MIMEs is stuck.
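For readers following the regex discussion, here is a minimal sketch of how the expression from the diff behaves on a few data: URL forms. The `extractMime` helper and the sample URLs are assumptions made for illustration and are not part of the PR; only the regular expression is copied from the snippet under review.

```javascript
// Expression copied from the reviewed snippet in default_resolve.js.
const DATA_URL_PATTERN = /^([^/]+\/[^;,]+)(;base64)?,/;

// Hypothetical helper (not in the PR): returns the MIME type the
// expression would extract from a data: URL, or null if it does not match.
function extractMime(href) {
  const parsed = new URL(href);
  if (parsed.protocol !== 'data:') return null;
  const match = DATA_URL_PATTERN.exec(parsed.pathname);
  return match ? match[1] : null;
}

// Illustrative inputs: plain, base64, with a charset parameter, and
// without any MIME type at all.
const samples = [
  'data:text/javascript,export default 1',
  'data:text/javascript;base64,ZXhwb3J0IGRlZmF1bHQgMQ==',
  'data:text/javascript;charset=utf-8,export default 1',
  'data:,export default 1',
];
for (const href of samples) {
  console.log(JSON.stringify(extractMime(href)), '<-', href);
}
```

In this sketch the first two samples yield text/javascript, while the charset= form and the MIME-less form yield null, which is one way to see the missing MIME parameter parsing in practice.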

@SMotaal

SMotaal Jul 10, 2019

Contributor

Since it is anchored, it is certainly possible to add efficient guards to the current expression. I'd like to take on exploring how we can do that here, which mainly means carving out a limited set of allowed characters between the delimiters, per spec (I did this a while back and just need to dig it up).

@SMotaal

SMotaal Jul 12, 2019

Contributor

@bmeck I looked into the various options for the expression and recommend:

/^(?:((?:text|application)\/(?:[A-Z][-.0-9A-Z]*)?[A-Z]+)((?:;[A-Z][!%'()*\-.0-9A-Z_~]*=[!%'()*\-.0-9A-Z_~]*)*)(;base64)?),/i

This would match any text/ and application/ subtype, along with the attribute-value parameters like charset= (to be parsed separately), and optional base64 (captured separately from previous parameters).

For now, simply being more restrictive of the character ranges for greedy * and + captures is likely all we need to avoid unpredictable performance hazards with very long crafted/malformed strings.

See gist for more details.

Please let me know how to proceed, if this is worth incorporating.
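As a rough illustration of what the recommended expression captures, the sketch below runs it against a few header strings. The `splitDataUrlHeader` helper and the inputs are hypothetical and not part of the PR or the gist; only the regular expression is copied from the comment above.

```javascript
// Expression copied from the recommendation above (case-insensitive).
const STRICT_PATTERN = /^(?:((?:text|application)\/(?:[A-Z][-.0-9A-Z]*)?[A-Z]+)((?:;[A-Z][!%'()*\-.0-9A-Z_~]*=[!%'()*\-.0-9A-Z_~]*)*)(;base64)?),/i;

// Hypothetical helper (not in the PR): splits the part of a data: URL
// path before the first comma into MIME type, raw parameters and a
// base64 flag. Parameters are returned unparsed, as the comment suggests.
function splitDataUrlHeader(pathname) {
  const match = STRICT_PATTERN.exec(pathname);
  if (match === null) return null;
  const [, mime, params, base64] = match;
  return { mime, params, base64: base64 !== undefined };
}

console.log(splitDataUrlHeader('text/javascript,export default 1'));
console.log(splitDataUrlHeader('application/json;charset=utf-8;base64,e30='));
console.log(splitDataUrlHeader('image/png;base64,AAAA'));
```

The image/png input is rejected because only text/ and application/ types are allowed, and the ;base64 marker is reported separately from the charset= parameter.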

@benjamingr

Member

commented Jul 10, 2019

I am +1 on this change and code looks fine. I am not LGTMing this because tests are missing and two LGTMs mean this could land.

I'd also like to echo Jan's statement (about the CJS and addon mime types):

I think that's reasonable, especially at first. The require loader isn't protocol aware so it would be awkward to support file types that are "require-native" here.

@guybedford

Contributor

commented Jul 10, 2019

To try to understand how import resolves within data: URLs, does this seem right:

  • data:text/javascript,import './x' would always fail.
  • data:text/javascript,import 'x' would always fail.
  • data:text/javascript,import 'fs' works.
  • data:text/javascript,import '/path/to/file.js' would always fail.
  • data:text/javascript,import 'file:///path/to/file.js' works.

I must admit I'm a little worried about the security model implications, especially around systems that might want to restrict import access. Eg if we have packages on npm that use a import('data:text/javascript,import "' + pathToFileURL(resolve('./dep.js')) + '"') approach then we may struggle to do any type of import restriction in future while retaining working code. Also how does this interact with policy integrity?

In addition, what are the driving use cases for this work?

@bmeck

Member Author

commented Jul 10, 2019

data:text/javascript,import './x' would always fail.

yep. ./x relative to the data URL doesn't make sense and fails since data is not a special scheme. Same as running new URL('./x', 'data:...').

data:text/javascript,import 'x' would always fail.

yep. x has no relative lookup space (node_modules is on a different scheme/context) to resolve its bare name. we could provide ways to populate the lookup space in a way similar to import maps or using loaders.

data:text/javascript,import 'fs' works.

yep. even though we could ban these, it doesn't seem like a big difference given how many evaluators there are with the ability to get fs etc.

data:text/javascript,import '/path/to/file.js' would always fail.

/path/to/file.js is probably incorrect, /path/to/file.js doesn't point to files on the data: scheme, same as if you tried to use that specifier on http it wouldn't point to a file on disk. This mimics running new URL('/path/to/file.js', 'data:...').

data:text/javascript,import 'file:///path/to/file.js' works.

yup, that can be resolved since the specifier is an absolute URL.
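The URL-based cases above can be checked with the WHATWG URL parser directly. This is a minimal sketch, assuming a hypothetical `tryResolve` helper that is not part of the PR. Bare specifiers such as 'x' and 'fs' are left out because they are handled by the module resolver rather than by plain URL parsing.

```javascript
// Hypothetical helper (not in the PR): resolves a specifier against a
// parent URL with the WHATWG URL parser and returns the resulting href,
// or null when the parser rejects the combination.
function tryResolve(specifier, parentURL) {
  try {
    return new URL(specifier, parentURL).href;
  } catch {
    return null; // new URL() throws a TypeError for unresolvable input
  }
}

// A data: URL acting as the importing module.
const parent = 'data:text/javascript,';
for (const specifier of ['./x', '/path/to/file.js', 'file:///path/to/file.js']) {
  console.log(specifier, '->', tryResolve(specifier, parent));
}
```

Only the absolute file: URL resolves; the relative and path-absolute specifiers are rejected because a data: URL cannot serve as a base.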

I must admit I'm a little worried about the security model implications, especially around systems that might want to restrict import access. Eg if we have packages on npm that use a import('data:text/javascript,import "' + pathToFileURL(resolve('./dep.js')) + '"') approach then we may struggle to do any type of import restriction in future while retaining working code. Also how does this interact with policy integrity?

Policy integrities are how people should restrict import access. We have a variety of evaluators that can access powerful APIs: eval, Function, AsyncFunction, vm.*, worker_threads, etc. This is just another form of evaluator, the only way to prevent evaluators from accessing powerful APIs would be to introduce mechanisms to prohibit access to those APIs.

The policy files are currently meant to prohibit loading resources that are not whitelisted through module loaders. A user would need to whitelist both the data URL and the file URL to dep.js for that to work. Exceptions allowing universal importing are not currently supported by --experimental-policies, nor is the ability to interactively whitelist resources while running. Both would improve usability in cases where users are sure they want to allow some resources to have that level of power.

In addition, what are the driving use cases for this work?

Cross environment compatibility and runtime generation of modules in the main module map (with limits similar to browsers). A variety of uses of data URLs are possible, including but not limited to creation of shared module namespaces keyed by strings allowing things like modules to share a communication channel without directly needing to know where the other is.
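To make the "shared module namespaces keyed by strings" idea concrete, here is a minimal sketch, assuming a Node.js version in which data: imports are available. The module source, the `sharedUrl` name and the `importTwice` helper are made up for illustration and do not come from the PR.

```javascript
// Assumed module source for illustration; encodeURIComponent keeps the
// body safe to embed after the comma of the data: URL.
const source = 'export const channel = new Map();';
const sharedUrl = 'data:text/javascript,' + encodeURIComponent(source);

// Imports the same data: URL twice and reports whether both imports
// observe the same module namespace (and therefore the same Map).
async function importTwice() {
  const first = await import(sharedUrl);
  const second = await import(sharedUrl);
  first.channel.set('greeting', 'hello');
  return {
    sameNamespace: first === second,
    seenBySecond: second.channel.get('greeting'),
  };
}

importTwice().then((result) => console.log(result));
```

Because the module map is keyed by the URL string, any two modules that agree on the same data: URL should end up with the same namespace, without either needing to know where the other lives.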

@jkrems

Contributor

commented Jul 10, 2019

This is just another form of evaluator, the only way to prevent evaluators from accessing powerful APIs would be to introduce mechanisms to prohibit access to those APIs.

That is actually an interesting point - should a policy that prevents eval also prevent import of data URLs? Or is "an untrusted string is passed into dynamic import" already considered insecure enough to make this not matter?

EDIT: I assume that integrity checks or other "allowed module sources" would be the more appropriate policy as opposed to throwing this in with eval.

@bmeck

Member Author

commented Jul 10, 2019

@jkrems alternative work like Trusted Types is being looked at as a way to label "strings" as trusted for evaluators, and it has some agenda items at this month's TC39 meeting. We could add whatever policies people want to --experimental-policies, but I do not consider that in scope for this PR. I would note that if we push things unflagged, though, the default values for policies will have to deal with backwards compatibility constraints.

bmeck added some commits Jul 10, 2019

Resolved (outdated): doc/api/esm.md

bmeck added some commits Jul 11, 2019

@bmeck changed the title from "DO NOT MERGE esm: support loading data: URLs" to "esm: support loading data: URLs" Jul 15, 2019
