Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Loaders allow breaking JS spec invariants #171

Open
nicolo-ribaudo opened this issue Nov 8, 2023 · 16 comments
Open

Loaders allow breaking JS spec invariants #171

nicolo-ribaudo opened this issue Nov 8, 2023 · 16 comments

Comments

@nicolo-ribaudo
Copy link

Consider this file:

import * as staticImport from "a";

const dynamicImport = await import("a");

console.log(staticImport === dynamicImport);

Two imports in the same file must always resolve to the same module, so that console.log must always log true. However, the following loader allow breaking that language invariant and console.log will log false:

import { register } from "node:module"

register(import.meta.url);

let counter = 0;

export function resolve(specifier, context, nextResolve) {
  if (specifier === "a") {
    return nextResolve(counter++ ? "./a.js" : "./b.js", context);
  }
  return nextResolve(specifier, context);
}

For reference, the relevant specification is https://tc39.es/ecma262/#sec-HostLoadImportedModule:

Where referrer is the importer module, specifier is the string passed to import ("a" in both cases in this example), and "the result is a normal completion" means that loading the module does not throw.

@aduh95
Copy link
Contributor

aduh95 commented Nov 8, 2023

Custom hooks allows you work with non-JS files, e.g. you can make import './file.ts' with custom hooks, and I don't think anyone will complain that the content of file.ts is not parsed in conformance with ECMAScript spec, it's the point of custom hooks.

It's possible to build a hook that's conform to the ECMAScript spec, albeit not very practical as you have to maintain your own resolve cache, but doable, so I'd argue there's nothing that needs to be done from our side. wdyt?

@nicolo-ribaudo
Copy link
Author

nicolo-ribaudo commented Nov 8, 2023

Custom hooks allows you work with non-JS files, e.g. you can make import './file.ts' with custom hooks, and I don't think anyone will complain that the content of file.ts is not parsed in conformance with ECMAScript spec, it's the point of custom hooks.

Well, that is spec-conformant 😛 The spec only gives guarantees about how code inside JS modules/scripts executes, but allows loading modules of other types (think for example about JSON, CSS and WASM modules).

It's possible to build a hook that's conform to the ECMAScript spec, albeit not very practical as you have to maintain your own resolve cache, but doable, so I'd argue there's nothing that needs to be done from our side. wdyt?

I think this is an ok position -- loaders are indeed an advanced feature and the expectation from Node.js could be that people writing them should know what they are doing.

The risk I see here is that a naively implemented loader might accidentally not behave according to the spec. As a user, I would then expect my modules to at least follow JS semantics: they obviously will not follow Node.js semantics, given that well... I'm using a loader to deviate, but I would write my code assuming that "bare JS semantics that work everywhere" will keep being held.

I wonder if the loaders API could do something so that the default behavior is to do "the correct thing" rather than expecting loaders authors to be aware of it.

When a module is loaded twice from the same file, there is some internal caching the load() hook is called just once. Maybe Node.js could implement similar caching for the resolve()` hook, and call it just once (using (importer_module, unresolvedSpecifier) as the cache key).

@aduh95
Copy link
Contributor

aduh95 commented Nov 8, 2023

I wonder if the loaders API could do something so that the default behavior is to do "the correct thing" rather than expecting loaders authors to be aware of it.

Oh for sure, we ended up implementing a resolve cache precisely to match that part of the spec, see nodejs/node#46662. It was decided that we should stop using that cache as soon as a custom loader is being registered, so in a way we delegate that to hooks authors to "do the right thing" indeed. It'd be nice if there was a way to keep using that cache unless a hook opts-out of it, because there are use cases for which one would want to deviate from the spec, but I couldn't find a way to do it, so that's the situation today.

@nicolo-ribaudo
Copy link
Author

Thank you for the context :) I see that this has been already discussed explicitly. Feel free to close this issue, or to re-use is as a feature-request for allowing loaders to rely on the built-in resolve cache.

@aduh95
Copy link
Contributor

aduh95 commented Nov 8, 2023

I think it makes sense to keep it open as a feature request, if someone finds a way to make it opt-out, it would certainly be a welcome change.

@ljharb
Copy link
Member

ljharb commented Nov 8, 2023

What are the use cases where a loader would want to deviate from the spec here?

@laverdet
Copy link

Custom hooks allows you work with non-JS files, e.g. you can make import './file.ts' with custom hooks, and I don't think anyone will complain that the content of file.ts is not parsed in conformance with ECMAScript spec, it's the point of custom hooks.

The specification provides a means for this. A TypeScript module would be a concrete implementation of a Cyclic Module Record, in the same way Source Text Module is. For all practical purposes this would implemented as a transpilation stage on top of Source Text Module, but regardless running TypeScript in a JavaScript runtime is not a violation of the specification. Otherwise WebAssembly, which is not defined in es-262, would not be compliant.


Given the discussion here: nodejs/node#50618 do we want to revisit the register hook? @ljharb do you have an opinion one way or another? This feature explicitly allows any module to tinker with module resolution in a specification-breaking way without an explicit flag to the CLI.

@ljharb
Copy link
Member

ljharb commented Nov 13, 2023

I think that if it becomes stable and/or commonplace for any JS code in node_modules to be able to change how module resolution works, it will be a security and PR disaster for node. It wouldn't take a security researcher much time to generate enough CVEs that the entire loader feature would have to be removed, or at least restricted to only operating in the startup phase (through --import or another CLI flag).

It's certainly not my decision, but continuing on this course would be a grave mistake, especially in the current climate where governments, including the US, are pushing very hard on supply chain security concerns.

@laverdet
Copy link

I guess I'm confused about the security argument. Maybe under deno it would be a concern because in that ecosystem system capabilities are strictly opt-in. Given that, in nodejs, every module has complete access to the running user's account [including the capability to run arbitrary binary code] isn't that a greater concern?

Anyway this is pretty off-topic but the design choice of making register a runtime function is baffling to me for entirely different reasons. Seems like I missed the boat on that discussion though and it's just something that we'll have to tolerate.

@GeoffreyBooth
Copy link
Member

Given that, in nodejs, every module has complete access to the running user’s account [including the capability to run arbitrary binary code] isn’t that a greater concern?

Agreed, I don’t see the security concern here. It’s always been part of Node’s security model that all code, including code within node_modules, is considered trusted code. There’s no sandboxing of dependencies or protection from anything that they can do.

register works at runtime because of the move of module hooks to run off thread, and the need for many hooks to do things on the main thread. One of the primary purposes of register is to set up the communications channel with the initialize hook, so that actions can have effects both off-thread and on the main thread. That’s a lot to try to accomplish via CLI flags. The recommended pattern is to call register within a file referenced by --import, so that it’s loaded before application code. Any hooks registered later would only effect modules import()-ed after registration, which is little to nothing for most apps.

@laverdet
Copy link

require.register was just an awful experience as a module consumer so I was super disappointed to see it come back. A lot of naive module authors add something like require('ts-node/register') at the top of their project, publish unstripped TypeScript and call it a day. Then deep in some dependency you end up with several versions of TypeScript running for no reason. It can be hell to track down. It shouldn't be configurable at runtime or you'll see the same thing ESM. Or at the very least invocations to register should throw once the main entrypoint has been entered to prevent this behavior spreading.

@GeoffreyBooth
Copy link
Member

A lot of naive module authors add something like require('ts-node/register') at the top of their project

We need to provide equivalent functionality in ESM, though. Some people can’t define the flags used to run Node for whatever reason, and need to do things like add an instrumentation library at runtime. (You’ve probably seen instructions like “ensure that require('instrumentation-library') is the first line of your application.”) Yes it’s unfortunate that the API can be misused, but that’s the case for any API. We do try to limit footguns that we provide users, but some things are footguns in one context but completely necessary in another.

Really the lesson here is that users need to be responsible for their dependencies. They don’t have to read every line of source code of a dependency, but there are good tools now for scanning libraries and they can find red flags to warn about misbehaving dependencies.

@aduh95
Copy link
Contributor

aduh95 commented Nov 13, 2023

We need to provide equivalent functionality in ESM, though

You say "we need", but I think it's more accurate to say "I want". Nothing's wrong with that, but it's only fair to acknowledge it's a self imposed limit, and is up for debate.

@GeoffreyBooth
Copy link
Member

You say “we need”, but I think it’s more accurate to say “I want”.

It’s even more accurate to say that it’s an explicitly defined goal for the API: https://github.com/nodejs/loaders#milestone-1-parity-with-commonjs. The point of creating these hooks was to allow the user to do in ESM whatever they could do in CommonJS. I would be fine with removing the ability to register hooks after entry point evaluation once we also remove the ability to monkey-patch in CommonJS, to preserve parity.

@ljharb
Copy link
Member

ljharb commented Nov 14, 2023

So it sounds like you're saying that monkey-patching in CJS is explicitly part of the API, which is why parity for it is appropriate?

Your previous statements are that it's not part of the API, which would mean parity would not be appropriate.

@arcanis
Copy link
Contributor

arcanis commented Nov 14, 2023

It seems to me this discussion is getting very off-topic.

the cache is used only when there are no loaders involved (there wasn't consensus if we wanted to also have a cache when loaders are involved; the upside is that it simplify the cache setup as the default resolve is sync, so no async failure to take care of).

Do you have details on the discussion around that (I'm also curious about some of the anticipated use cases for resolvers returning unstable results; I can imagine some, but they don't seem very practical)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants