Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Look for node_ceiling #43828

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

Conversation

arhart
Copy link
Contributor

@arhart arhart commented Jul 13, 2022

It is easy to accidentally allow another user to influence what code node loads and executes. Details can be found at HackerOne reports 1564437 (CommonJS module loading), 1564444 (ECMAScript module resolution), and 1564445 (package.json). While these behaviors are documented, the security implications are easy to overlook.

By stopping the search for node_modules and package.json after encountering a node_ceiling file, this change allows an application to be protected with few changes to the file system and a high degree of compatibility.

Stop searching for package.json or node_modules when a node_ceiling
file is found.

Refs: nodejs#43192
Refs: nodejs#43368
@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/modules

@nodejs-github-bot nodejs-github-bot added esm Issues and PRs related to the ECMAScript Modules implementation. needs-ci PRs that need a full CI run. labels Jul 13, 2022
@arhart
Copy link
Contributor Author

arhart commented Jul 13, 2022

Some tests are failing, even before my changes.

@arhart
Copy link
Contributor Author

arhart commented Jul 13, 2022

Refs: #43192
Refs: #43368

@ljharb
Copy link
Member

ljharb commented Jul 13, 2022

This seems exceedingly premature prior to a lot more discussion.

Changing the CJS module resolution algorithm - especially with "Stability 2" - will impact a ton of tools in the ecosystem, including bundlers, transpilers, packages like resolve and enhanced-resolve, etc. Such a change should have extensive discussion with stakeholders, and broadcasting, long before writing any code.

@arhart
Copy link
Contributor Author

arhart commented Jul 13, 2022

@ljharb there has been some discussion in the referenced issue, but I hoped having code to play with might facilitate additional discussion.

@ljharb
Copy link
Member

ljharb commented Jul 13, 2022

and that discussion largely seems to indicate that this change shouldn't be made, except via loaders.

@bmeck
Copy link
Member

bmeck commented Jul 13, 2022 via email

@guybedford
Copy link
Contributor

Did you consider just using an environment variable for this like NODE_PROJECT_ROOT or something?

@bmeck
Copy link
Member

bmeck commented Jul 13, 2022 via email

@arhart
Copy link
Contributor Author

arhart commented Jul 14, 2022

@guybedford @bmeck
When considering policies for issue 43192 , I considered command line arguments, environment variables, and a file-based indicator. Some projects, such as NPM, spawn child processes which either directly or transitively exec additional node processes. Passing an extra command line argument potentially requires modifying code across projects. Passing an environment variable (such as NODE_OPTIONS) reaches some invocations that command line arguments can't easily reach, although some programs intentionally strip out environment variables. Both environment variables and command line arguments

  1. rely on intermediate code passing them along as intended (there is potentially a LOT of intermediate code to consider)
  2. are difficult to observe, especially for short-lived processes
  3. are difficult to ensure don't accidentally stop behaving as intended, across dependency upgrades, for example

The using something in the file system OTOH

  1. needn't rely on intermediate code leaving unchanged if the process doesn't have permissions to modify the relevant files
  2. is easy to observe statically as part of an RPM, tarball, ZIP, or other package used to install the program or directly on the filesystem after install and/or during development
  3. can be protected from accidental or intentional modification using normal system tools (chmod, chown, mount options, etc.)
  4. if structured to match the existing search paths for node_modules and package.json, can follow the same rules used for existing resolution, allowing for different search termination for different files (just like different files today can have different package.json files and/or walk a different path toward the root due to symlinks).
  5. could be adopted by system-wide install locations, protecting common scripts like npm, mocha, node-gyp, etc. Unlike environment variables or command-line options which would seem to require some sort of merge-algorithm to handle dependencies installed in multiple locations (main app, system-wide, user home directory, ...), a file-system approach has a built-in way to distribute the information that applies to the different install locations.

I also considered placing something inside package.json or inside node_modules, but it seemed useful to be able to terminate the search for either without the presence of the other. Also, the search for package.json terminates at the first one located, while the search for node_modules can continue beyond a package.json, so a new file seemed useful.

I chose to look specifically for a file rather than a file-or-directory because it seemed useful to have a single pattern. Either would allow future expansion (file contents or new entries in a directory), but having a mix of both sounded needlessly complicated.

The name seemed reasonably meaningful, didn't commit to a file-format when it's currently only a present or not-present boolean, and is the same length as node_modules (so it's probably OK on any system node currently searches for node_modules).

@arhart
Copy link
Contributor Author

arhart commented Jul 15, 2022

Would it be useful to have an environment variable in addition to looking in the file system? If so, there would be questions of how multiple paths are joined into the variable and how they are normalized (or not) and compared with the search paths, but I think both approaches could coexist.

@bnoordhuis
Copy link
Member

Environment variables have the problem that they may not get propagated to child processes. Ex. child_process.fork().

@bmeck
Copy link
Member

bmeck commented Jul 17, 2022 via email

@arhart
Copy link
Contributor Author

arhart commented Jul 17, 2022

If policies were automatically looked up it would solve the problem as well. They intentionally do not compose though. If we go for files I would prefer that approach as it is fewer features to keep track of.

Policies offer options like dependency redirection. Any proposal would need to avoid automatically loading a policy under malicious control. Are there any proposals in that direction?

The code in this pull request only prunes the search path and only looks in directories that have already been examined for node_modules or package.json.

@bmeck
Copy link
Member

bmeck commented Jul 17, 2022 via email

@arhart
Copy link
Contributor Author

arhart commented Jul 17, 2022

Like environment variables, CLI arguments don't necessarily get passed to child processes.

@devsnek
Copy link
Member

devsnek commented Jul 17, 2022

I don't have any strong opinions here but if we do end up using a file sentinel I'd suggest package.json with "root": true, like how eslint works.

@ljharb
Copy link
Member

ljharb commented Jul 17, 2022

I like that idea much better - it mirrors private: true, it's in the place node already has to look, and the root of a project is almost certain to already have a package.json.

@arhart
Copy link
Contributor Author

arhart commented Jul 22, 2022

Using a field in package.json might not be as convenient as it first sounds.

  1. Libraries often ship the same package.json that is used during development, but using a field in package.json might prevent that.
    1. During development it might be desired to stop searching for node_modules at its package.json location, but when the library is installed that might not be appropriate (depending how which package manager with which options installed it).
  2. The desired location might not be somewhere node already looks for package.json
    1. Node currently stops looking for a package.json once it has found one.
    2. Consider a pnpm install of express in a project called topproject. Express might be found in topproject/node_modules/express as a symlink to .pnpm/express@4.18.1/node_modules/express. It might find all of its dependencies under topproject/node_modules/.pnpm/express@4.18.1/node_modules/* and want to stop searching for node_modules at topproject/node_modules/.pnpm/express@4.18.1/, which doesn't currently have a package.json (but it could be added; so could a node_ceiling). It seems unlikely node would currently look for a package.json there because it is likely all the files further under express@4.18.1/ already have a nearer package.json. It seems it might be desirable to stop there rather than continuing to topproject/node_modules, for example to prevent packages that are direct dependencies of topproject from being found when required-but-not-declared by other (possibly transitive) dependencies of topproject. I could imagine pnpm might want to support both use cases of stopping at topproject/node_modules/.pnpm/express@4.18.1/ or at topproject/.
    3. Node could be made to look for package.json there specifically to look for a "root": true, but would that cause more confusion than a distinct filename for that purpose?

@guybedford
Copy link
Contributor

This feature comes down to specifying a single path only within which Node.js should execute code - I think rather than a file signifier, we should just have an out-of-band way to achieve that such as a flag or environment variable or configuration file.

There are a number of things in Node.js at this point that want to just be applied ambiently through configuration - loaders, hooks, instrumentation, policies, custom flags. Perhaps this is another data point on the journey to having a generic configuration file for Node.js invocations that is looked up on execution. If such a thing were to be designed, having a key for this configuration seems like a good solution to me.

So starting from a flag / environment variable I think we can get there. But I think this PR has a better chance of landing if it were to take this approach rather than introduce a new pattern which provides another statting reliance and pattern for people to learn.

@arhart
Copy link
Contributor Author

arhart commented Jul 23, 2022

@guybedford could you elaborate on your single path idea? With Node.js looking in so many paths for dependencies, especially with symlink handling, perhaps you have a simplified or recommended pattern in mind?

@guybedford
Copy link
Contributor

guybedford commented Jul 23, 2022

@arhart sure, if you're after an explicit suggestion I can certainly provide that. My simplified or recommended pattern for implementation would be:

node --ceiling-path="./app" --ceiling-path="./"

Possibly alternatively supporting some ; or : separator form for a combined flag:

node --ceiling-paths=".;app"

In due course then extending support for such a flag through project-level / app-level configuration system. Name can be bikeshed further of course.

This is just my preference, it's an open process and you have no explicit blocks, so it is your decision how to navigate further of course.

@arhart
Copy link
Contributor Author

arhart commented Aug 9, 2022

@guybedford Thanks. I think I see the direction you are suggesting.

While some sort of mechanism for ambient configuration of Node.js seems useful, it also seems like a long route to walk to get there.

@arhart
Copy link
Contributor Author

arhart commented Aug 9, 2022

I wouldn't characterize this feature as specifying path(s) within which Node.js should execute code. Instead, I'd say it's about stopping module resolution at particular path(s).

If an application dynamically downloads and loads a module, then it seems good for the path(s) [where] module resolution stops to be similarly flexible.

@arhart
Copy link
Contributor Author

arhart commented Aug 9, 2022

An install tool (ex: npm) knows where it is placing modules and their dependencies. It might not have visibility to the larger application structure or even how many applications use those modules.

@GeoffreyBooth
Copy link
Member

I’m very hesitant to modify the CommonJS resolution algorithm at this point. There are so many other tools that depend on it and reimplement it, like many of the bundlers out there. I would encourage you to implement this idea as a loader, at least at first; that can prove it its viability, and then you can make the argument that it should be part of Node core.

@arhart
Copy link
Contributor Author

arhart commented Aug 9, 2022

@GeoffreyBooth is that a general hesitancy to change something so central or are there also specific reasons to think such a change would not be compatible with existing code?

Implementing as a loader could be interesting, at least the parts that loaders currently allow. My understanding is that loaders can only be setup by command line arguments and can't change how CommonJS modules are loaded. What do you think might be learned by having a partial loader implementation rather than the more complete branch in this pull request?

@GeoffreyBooth
Copy link
Member

@GeoffreyBooth is that a general hesitancy to change something so central or are there also specific reasons to think such a change would not be compatible with existing code?

Changes to the resolution algorithm impose a heavy burden on the community. Many tools, especially build tools, reimplement the resolution algorithm and therefore would need to be updated to accommodate this change. Before we impose that cost on everyone, I’d like some indication that anyone beyond you would use this feature. Shipping it as a loader (using the ESM hooks for modifying ESM resolution, and monkey-patching require to update the CommonJS resolution) and seeing it find adoption would prove that this is something that should become part of core, and that all the other affected tools are updating for a worthwhile reason.

@guybedford
Copy link
Contributor

I wouldn't characterize this feature as specifying path(s) within which Node.js should execute code. Instead, I'd say it's about stopping module resolution at particular path(s).

In both forms, it is and always is just a list of paths, just with different sources of truth over which paths constitute stop paths. Roundabout paths to goals are unfortunately usually necessary!

@bmeck bmeck mentioned this pull request Aug 19, 2022
4 tasks
@richardeschloss
Copy link

richardeschloss commented Sep 16, 2022

Hi 👋 thanks for considering a feature like this! Some package managers like yarn LTS (3.x) allow an install config to define the ceiling during installs (which should override the hoisting limits but I'm not sure if it's for ESM loading or just CommonJS?).

So, could Node also look at those config options in package.json when it does it's module resolution? I think if someone has a file tree like:

app.js
node_modules/ <-- contains lib/v1.js
workspace_a/
  - node_modules/  <-- contains lib/v2.js
  - file1.js
  - package.json

And inside workspace_a's package.json, is:

{
  ...
  "nmHoistingLimits": "workspace",
  ...
}

I would want this to mean: when I do import * from 'lib', I would want to not search the parent node_modules, only the lib in the workspace.

For the time being, I have to use the resolve hooks:

import { resolve as pResolve } from 'path';

const parentDir = pResolve('..');
const ignoreDir = `file://${parentDir}/node_modules`;

export async function resolve(specifier, context, nextResolve) {
  let localPkg
  if (specifier.startsWith(ignoreDir)) { // If the dep is tin the parent...
    const pkg = specifier.match(/node_modules\/([^\/]+)\//)[1] // pull out the pkg name...
    localPkg = await import.meta.resolve(pkg).catch(() => {}) // and only use the local one, not the parent's
  }
  return nextResolve(localPkg || specifier); // return the local one if found, otherwise carry on...
}

@arhart
Copy link
Contributor Author

arhart commented Jan 6, 2023

@richardeschloss Thanks for the resolve hook example.

Here is some code I sketched out in August for the CommonJS case, but I never got back to it. Some use cases really want some sort of ambient configuration to load the necessary hooks. Perhaps something like #44975

'use strict';
const fs = require('fs');
const Module = require('module');
const path = require('path');
const ModuleFindPath = Module._findPath;
Module._findPath = function(request, paths, isMain) {
  if ( paths && paths.length && !path.isAbsolute( request ) ) {
    /* By modifying the paths argument to ModuleFindPath rather than
     * completely replacing Module._findPath we:
     * 1. (+) are more accepting of changes to Module._findPath, including
     *    other monkeypatches
     * 2. (-) eagerly stat for node_ceiling rather than lazily
     * 3. (-) lack access to the statCache in lib/internal/modules/cjs/loader.js
     * 4. (-) lack access to internalModuleStat and its optimizations
     *
     * By monkeypatching Module._findPath rather than editing it directly we:
     * 1. (+) avoid requiring a change in core
     * 2. (-) require some caller to call this code "soon enough" for each
     *    script which needs this protection
     * 3. (-) introduce a dependency on undocumented implementation
     *    details of CommonJS module loading in Node.js
     */
    // TODO say something about how this addresses a security concern
    // TODO say something about --experimental-policy and NODE_OPTIONS
    // TODO say something about ESM
    // TODO say something about package.json
    // TODO copy paths, look for node_ceiling, modify copy
    console.log('paths is', paths);
    const oldPaths = paths;
    paths = [];
    let foundRoot = false;
    let foundCeiling = false;
    for( const curPath of oldPaths ) {
      if ( !foundRoot ) {
        foundRoot = curPath === path.resolve( curPath, '../../node_modules' );
        if ( foundRoot ) {
          console.log( 'foundRoot', curPath );
        }
        if ( foundCeiling ) {
          continue;
        }
      }

      if ( !foundCeiling && !foundRoot ) {
        const ceilingPath = path.resolve( curPath, '../node_ceiling' );
        console.log( 'looking for node_ceiling at', ceilingPath );
        const statFoundFile = fs.statSync( ceilingPath, { throwIfNoEntry: false } )?.isFile() ?? false;
        if ( statFoundFile ) {
          console.log( 'found node_ceiling at', ceilingPath );
          foundCeiling = true;
        }
      }

      paths.push( curPath );
    }
  }
  return ModuleFindPath( request, paths, isMain );
};

require('no-such-thing');

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
esm Issues and PRs related to the ECMAScript Modules implementation. needs-ci PRs that need a full CI run.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants