Skip to content
This repository has been archived by the owner. It is now read-only.

Improving support for symlinks in the module system #46

Closed
jasnell opened this issue Dec 6, 2016 · 88 comments

Comments

@jasnell
Copy link
Member

commented Dec 6, 2016

Continuation of the discussions nodejs/node#10132 and nodejs/node#10107 ... Please use this thread to discuss the high level architectural, merits, and long term impact of the proposed change so that the PR discussion can be focused on the actual code review.

@jasnell jasnell referenced this issue Dec 6, 2016

Closed

Fix --preserve-symlinks, Enhancement to exploit #10132

3 of 4 tasks complete
@isaacs

This comment has been minimized.

Copy link

commented Dec 6, 2016

@phestermcs Please stop trying to change the process. I'm literally in the middle of writing a response to the issues you brought up in the PR right now. Let's do things the way that the Node project does things, respecting the established process, and try to follow the lead set out by the project maintainers.

@ghost

This comment has been minimized.

Copy link

commented Dec 6, 2016

I'm attempting to offer solutions for the following when using symlinks:

  • Memory Bloat
  • Addon Crashing
  • Tooling issues related to "main.js" file symlinks causing the __dirname to be the realpath as opposed to the link's target (where there may by symlink dirs in the target path the must be preserved)
  • Using Machine Level stores while keep dependency version resolution coupled to a given /node_modules root.
@mikeal

This comment has been minimized.

Copy link

commented Dec 6, 2016

Pinging @bmeck. We should consider the impending ESM support and any impact module system changes might have with that in mind.

@isaacs

This comment has been minimized.

Copy link

commented Dec 6, 2016

Re nodejs/node#10132 (comment)

@isaacs You're link to a proposed solution seems to be addressing peer dependencies, and is not clear as to how it would address the issue you state it would.

I see. So, to make sure we're on the same page, let me try to describe the problem, and y'all can tell me if I'm missing anything.

Suppose two apps with the same logical dependency requirements:

one
├── baz@1.2.5
└── foo@1.2.3
two
├── baz@1.2.5
└── foo@1.2.3

Let's further stipulate that foo depends on bar@1.x and baz depends on bar@9.x.

foo@1.2.3
└── bar@1.x
baz@1.2.5
└── bar@9.x

We install in one, and get this logical result:

one
├── baz@1.2.5
│   └── bar@9.0.0
└── foo@1.2.3
    └── bar@1.2.5

Some time later, after some updates have been published to bar, we install in two and get this result instead:

two
├── baz@1.2.5
│   └── bar@9.8.7
└── foo@1.2.3
    └── bar@1.2.5

If we use an approach like pnpm and ied, where every instance of foo@1.2.3 and baz@1.2.5 are symlinked into a shared store, and in that shared store, they each have their deps linked into their appropriate node_modules folders, then when we install two, it performs some unexpected action at a distance, updating one's meta-dependency because it updates baz@1.2.5 in the module store.

So this arrangement would thus be impossible:

one
├── baz@1.2.5
│   └── bar@9.0.0  <-- because this...
└── foo@1.2.3
    └── bar@1.2.5
two
├── baz@1.2.5
│   └── bar@9.8.7 <-- conflicts with this
└── foo@1.2.3
    └── bar@1.2.5

This is of course trivial to do without symlinking into a store, because you just have multiple copies of stuff. This is the "disk is cheap" approach. It works pretty well today, because disk is pretty cheap, but it does incur some less obvious costs like building packages, downloading multiple times, etc. There are some other ways to mitigate those costs, of course, and they should be explored separately.

However, in this challenge, we want to say that there's exactly one copy on disk of baz@1.2.5 package contents. The "disk is expensive" model.

Furthermore, in any satisfying solution, (A) there must be a logically correct dependency graph loaded at run time (ie, everything gets a copy of what it depends on, meaning that there are 2 copies of bar in the runtime no matter what, to resolve the dependency conflict), and (B) the two apps have different versions of bar@9.x served to their respective (linked) copies of baz. (If we just did the "resolve deps based on symlink destination as well as target" approach, we'd have a problem because we can't put both versions of baz in one/node_modules or two/node_modules.)

If I'm not understanding the nature of the logical puzzle, please let's stop and clarify. I don't want to rush to solutions until the problem is very well understood.

@bmeck

This comment has been minimized.

Copy link
Member

commented Dec 6, 2016

No known problems with ESM on this front. If we were talking archive formats that serialize symlinks like ASAR / NODA / WebPackage we might have things to think about, but none are currently supported by Node.

@isaacs

This comment has been minimized.

Copy link

commented Dec 6, 2016

Now that we have established the technical challenge and agreed on the terms of it, I'd like to seek alignment on the heuristics to be used in evaluating potential solutions.

Here's a sketch of a list of what's important to me, personally, in ranked priority order. This is a question of values, and it's necessarily somewhat subjective, so I'd love to get some input from others in the nodejs core maintainer group about what the project's priorities are.

  1. It must be backwards compatible with current node, npm, ied, and pnpm use cases. If the only way to fix this problem is to break our current ecosystem, then we just have to live with the problem. We must not up-end our community, no matter what.
  2. It must satisfy the conditions of the challenge. (Of course, or else it's not a solution.)
  3. It must add a minimum of new surface area to the module loading architecture. (If there is a solution that requires no change to the node module loader, then that would be ideal here, but that might not be possible. Less change is better, in general.)
  4. It must not cause a significant slowdown in the Node.js startup time. (For example, searching 3 extra module locations is almost certainly fine. 300 would likely be a problem.)
  5. It should ask a minimum of additional complexity on the part of module architecture schemes seeking to solve this sort of problem. This is the lowest priority, because once we show that it is possible, the implementation is just a matter of typing.

Does this seem like a fair set of heuristics for judging solutions? I realize that these will be somewhat in conflict with one another, but that's why they call it a trade-off :)

If it seems that I'm being overly methodical and pedantic, it is because any change to the module system demands extreme diligence. Last time there was a disruption here, I promised that I would do everything in my power to help avoid it in the future.

@isaacs

This comment has been minimized.

Copy link

commented Dec 6, 2016

Or is it your intent that this exceptional change not be handled in this exceptional way?

Hah, my intent is that we make any change as un-exceptional as possible ;)

@ghost

This comment has been minimized.

Copy link

commented Dec 6, 2016

Regarding the condition of the challenge, I think it should also include a way out of symdir cycles when representing module dependency cycles.

@isaacs

This comment has been minimized.

Copy link

commented Dec 6, 2016

@phestermcs Can you elaborate on what you mean by "a way out of symdir cycles"?

@isaacs

This comment has been minimized.

Copy link

commented Dec 6, 2016

This is an example where the modules have a cycle, symlinks to those modules are still being used, but themselves dont create a directory cycle, which can cause infinite loops in some forms of tools.

So, is it a case like this?

A -> B
B -> C
C -> A

So, my app depends on A, A depends on B, B depends on C, and C depends on A.

You want to avoid situations where I can end up with a path like myapp/node_modules/A/node_modules/B/node_modules/C/node_modules/A/... because that makes some other file-viewing tool like a text editor or build script or whatever freak out.

Am I understanding it properly? If so, that seems like a fair restriction to me.

@ghost

This comment has been minimized.

Copy link

commented Dec 6, 2016

Correct. Fwiw, Sticking with subordinate /node_modules would prevent being able to meet this constraint.

@isaacs

This comment has been minimized.

Copy link

commented Dec 6, 2016

Yes, sticking with the somewhat naive "link everything into the store" approach, where things in the store have a node_modules folder with other links to things in the store, fails this approach and also fails the other requirements of a solution.

@isaacs

This comment has been minimized.

Copy link

commented Dec 6, 2016

@phestermcs I mean you have a folder structure like this:

├── app
│   └── 2.2.2
│       ├── index.js
│       └── node_modules
│           ├── baz -> ../../../baz/1.2.5
│           └── foo -> ../../../foo/1.2.3
├── bar
│   ├── 2.3.4
│   │   └── index.js
│   └── 9.8.7
│       └── index.js
├── baz
│   └── 1.2.5
│       ├── index.js
│       └── node_modules
│           └── bar -> ../../../bar/9.8.7
└── foo
    └── 1.2.3
        ├── index.js
        └── node_modules
            └── bar -> ../../../bar/2.3.4

That won't work because updating a second app updates the bar module under foo's node_modules folder in the shared store, and also if bar depends on foo, you get a symlink cycle.

This is the naive "use symlinks into a central store" approach, but it falls down on 2 points. (I'm using "naive" as a compliment here, meaning "do the simplest, most obvious, least clever thing".)

@ghost

This comment has been minimized.

Copy link

commented Dec 6, 2016

@isaacs Ok, I thought that's what you may have meant; just double checking.

But that does remind me of still another "meet the challenge" point. There also is the case of handling "bundled" node_modules as well. i.e. It should still be possible to have a module in a machine store that does have a '/node_modules' subfolder but all of it's content is 'locked' to that module (no symlinks to outside of module), most simply as copies. For example npm itself ships with all its dependencies.

@isaacs

This comment has been minimized.

Copy link

commented Dec 6, 2016

@phestermcs Please let's keep the scope of this problem and solution limited, or else we'll never be able to make progress in reasonable time. Saying that any solution is backwards compatible means that any existing solutions to those problems (for example, just copying files into folders) will continue to work just fine.

@isaacs

This comment has been minimized.

Copy link

commented Dec 7, 2016

Yes, backwards compatibility means that current apps work with the future node. Forwards compatibility, where future apps work with current node, is much harder to attain for any semantic change or bug fix. Presumably the entire point is that there's a problem we want to fix or something we want to do that is not currently possible.

@isaacs

This comment has been minimized.

Copy link

commented Dec 7, 2016

Here's another proposal possibility that might work.

  1. Track both the requested path (where the module was found) and the realpath (where the module actually is) on the module object.
  2. Caches remain as they currently are (using realpath as a key)
  3. node_module lookup path is the node_module directory set from the realpath and then the node_module directory set from the requested path.

I haven't tested it yet, but I believe that would solve the peer dependency issue addressed by --preserve-symlinks (similar to how adding the cwd-based node_module lookup paths would in the proposals in isaacs/node6-module-system-change), and also open the door for two module-store symlink based approaches that could satisfy the necessary criteria listed above.

The first module store symlink approach is to create actual nested directory hierarchies, but symlink the contents of the packages into their destinations. This results in a layout on disk that is more parallel to the naive "disk is cheap" approach, and that parallel is appealing.

However, it necessarily means that the package manager has to keep a record of files in each package, and it's WAY more actual symbolic links. If packages are linked from their development location into the module store, then the developer will have to refresh all of their module installations whenever a file is added to the package. (Maybe that happens rarely? Or can be somehow detected/automated?) So, it doesn't score very highly on the 5th heuristic (minimizing package manager complexity).

That would look something like this in practice (where index.js can be many files, of course)

app1
├── index.js
└── node_modules
    ├── baz
    │   ├── index.js -> ../../../.store/baz/1.2.5/index.js
    │   └── node_modules
    │       └── bar
    │           └── index.js -> ../../../../../.store/bar/2.3.4/index.js
    └── foo
        ├── index.js -> ../../../.store/foo/1.2.3/index.js
        └── node_modules
            └── bar
                └── index.js -> ../../../../../.store/bar/9.0.0/index.js

The second module store symlink approach is to lay files out on disk with a package.json file containing {"main":"content"} and a content folder that links into the store. This loses the parallelism with the naive copying approach, but it also means that package contents can be modified without a rebuild. It also adds an additional folder and some extra file system activity on startup, so the performance impact will have to be measured.

It looks like this:

app3
├── index.js
└── node_modules
    ├── baz
    │   ├── contents -> ../../../.store/baz/1.2.5
    │   ├── node_modules
    │   │   └── bar
    │   │       ├── contents -> ../../../../../.store/bar/2.3.4
    │   │       └── package.json
    │   └── package.json
    └── foo
        ├── contents -> ../../../.store/foo/1.2.3
        ├── node_modules
        │   └── bar
        │       ├── contents -> ../../../../../.store/bar/9.0.0
        │       └── package.json
        └── package.json

I'll try to find some time to make a patch for this minimal module system change so that we can investigate it further and try to shake out some other tradeoffs.

Once there are at least 2 or 3 options, we can figure out which one satisfies the criteria the best, and with a minimum of new footguns.

@isaacs

This comment has been minimized.

Copy link

commented Dec 7, 2016

@phestermcs Both module store approaches would depend on the same change to the module loader, described at the start of my post in 3 bullets. The existing module loader already knows what to do with the package.json file as described.

The only internal change to node would be that the module search paths include both the node_modules folders based on the realpath as well as the requested path.

@zkochan

This comment has been minimized.

Copy link

commented Dec 7, 2016

If the symlink-folder will be always named contents, why should there be a package.json?

This structure would be enough, wouldn't it? package.json is in contents anyway...

app3
├── index.js
└── node_modules
    ├── boo
    │   ├── contents -> ../../../.store/boo/1.0.0
    │   └── node_modules -> ../../../.store/boo/1.0.0/node_modules
    ├── baz
    │   ├── contents -> ../../../.store/baz/1.2.5
    │   └── node_modules
    │       └── bar
    │           └── contents -> ../../../../../.store/bar/2.3.4
    └── foo
        ├── contents -> ../../../.store/foo/1.2.3
        └── node_modules
            └── bar
                └── contents -> ../../../../../.store/bar/9.0.0
@zkochan

This comment has been minimized.

Copy link

commented Dec 7, 2016

This solution will work when someone requires the package via the entry point. However, frequently people are using a different file from the package, like require('foo/bar'). If bar.js is not in the root of the package anymore, it won't be accessible the old way...

@isaacs

This comment has been minimized.

Copy link

commented Dec 8, 2016

I wrote up a patch for what I'm suggesting: isaacs/node@d0f9a99

It isn't yet ready for a PR (not least because it needs tests, docs, and the like), and I'd like to drive this discussion towards a shared understanding of the problem and requirements of a solution. I only wrote this patch because sometimes it's better to communicate thoughts in code rather than prose :)

This solution will work when someone requires the package via the entry point. However, frequently people are using a different file from the package, like require('foo/bar'). If bar.js is not in the root of the package anymore, it won't be accessible the old way...

Indeed! That's a shortcoming of the contents folder approach! Symlinking package contents would still work, though.

Another unaddressed concern: loading deps from modules loaded within the dependent module.

For example, if we slightly alter my foo package in the examples above to look like this:

// foo/index.js
console.error('in foo', module.paths)
require('./other-module.js')
// foo/other-module.js
console.error('in foo/other-module.js', module.paths)
var bar = require('bar')
console.log('foo using %s', bar)

Then we get this result:

in foo [ '/Users/isaacs/dev/js/node6-module-system-change/x/.store/foo/1.2.3/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/.store/foo/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/.store/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/node_modules',
  '/Users/isaacs/dev/js/node_modules',
  '/Users/isaacs/dev/node_modules',
  '/Users/isaacs/node_modules',
  '/Users/node_modules',
  '/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/app1/node_modules/foo/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/app1/node_modules' ]
in foo/other-module.js [ '/Users/isaacs/dev/js/node6-module-system-change/x/.store/foo/1.2.3/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/.store/foo/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/.store/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/node_modules',
  '/Users/isaacs/dev/js/node_modules',
  '/Users/isaacs/dev/node_modules',
  '/Users/isaacs/node_modules',
  '/Users/node_modules',
  '/node_modules' ]
module.js:498
    throw err;
    ^

Error: Cannot find module 'bar'
    at Function.Module._resolveFilename (module.js:496:15)
    at Function.Module._load (module.js:443:25)
    at Module.require (module.js:529:17)
    at require (internal/module.js:20:19)
    at Object.<anonymous> (/Users/isaacs/dev/js/node6-module-system-change/x/.store/foo/1.2.3/other-module.js:2:11)
    at Module._compile (module.js:602:32)
    at Object.Module._extensions..js (module.js:611:10)
    at Module.load (module.js:519:32)
    at tryModuleLoad (module.js:473:12)
    at Function.Module._load (module.js:465:3)

This can be fixed by also passing along the parent module paths to its children modules. I'll add that in soon.

@phestermcs I must admit I don't understand what you're talking about with respect to main.js in your most recent comment. Can you elaborate with a minimal example?

@isaacs

This comment has been minimized.

Copy link

commented Dec 8, 2016

Yeah, this fixes the issue: isaacs/node@f727658

in foo [ '/Users/isaacs/dev/js/node6-module-system-change/x/.store/foo/1.2.3/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/.store/foo/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/.store/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/node_modules',
  '/Users/isaacs/dev/js/node_modules',
  '/Users/isaacs/dev/node_modules',
  '/Users/isaacs/node_modules',
  '/Users/node_modules',
  '/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/app1/node_modules/foo/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/app1/node_modules' ]
in foo/other-module.js [ '/Users/isaacs/dev/js/node6-module-system-change/x/.store/foo/1.2.3/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/.store/foo/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/.store/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/node_modules',
  '/Users/isaacs/dev/js/node_modules',
  '/Users/isaacs/dev/node_modules',
  '/Users/isaacs/node_modules',
  '/Users/node_modules',
  '/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/app1/node_modules/foo/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/app1/node_modules' ]
foo using bar 9.0.0
baz using bar 2.3.4

EDIT: updated commit link after rebasing to node master

@ghost

This comment has been minimized.

Copy link

commented Dec 8, 2016

@isaacs

node must "experience" the /node_modules rooted symbolic directory structure just like a human would when viewing through a file system explorer and expanding the directory hierarchy. Converting "main.js"'s path all the way down to its absolute realpath on program launch, and using that as it's __dirname prevents this, and breaks tooling.

Effectively, all modules derive there __dirname from the __dirname of the parent module that first required them, and all require resolutions are relative to __dirname. The main modules __dirname, a function of "main.js"'s path, is the root, or start, of this.

Example

In practice the structure and how accessed (tooling launching tooling, etc) is more complex and has different patterns, so this is a contrived simplification to highlight the fundamental problem inherent to all of them.

/store
  /peer
    /v1
      index.js
  /mod
    /v1
      index.js

/projectA
  /bin
    mod        -> ../node_modules/mod/index.js
  /node_modules
    /peer      -> /store/peer/v1
    /mod       -> /store/mod/v1
      index.js                        // var = require('peer')

$> node /projectA/bin/mod
Exception: cant find module 'peer'

With current behavior of converting "mod"'s path to its realpath, and using as the main modules __dirname, the require('peer') attempts to resolve relative to the "real path" /store/mod/v1, when it should resolve from the "symbolic path" /projectA/node_modules/mod.

In your proposed solution, using a realpath in the search list will also have the same effect, just lower down in the link tree.

@isaacs

This comment has been minimized.

Copy link

commented Dec 8, 2016

@phestermcs I'm trying to find solutions that do not add additional new API surface, including new folder paths that users might be confused by.

Adding module+node_modules as a folder to be searched at each level is an additional thing to document and explain. That is a cost. But some of these edge cases (not finding peer deps, resolving based on the real path only rather than linked path, etc.) do routinely come up as misunderstandings of how the current system works, so there's less cost in adding support for them.

And since it only adds support for something that didn't work at all before, and only at a lower priority than the current search paths, there's very little chance of someone's existing use case being disrupted.

@mikeal

This comment has been minimized.

Copy link

commented Dec 8, 2016

Adding lookup paths will have a performance impact as well. We've done a lot of work recently to make all this faster and if we go the .esm router we're already taking another perf hit from lookups. I'm a little worried about the performance regression this could cause.

@isaacs

This comment has been minimized.

Copy link

commented Dec 8, 2016

@phestermcs @mikeal Yes, I agree, that's a concern. Adding 2 or 3 additional folder lookup paths is probably fine. Adding 10 is less likely to be ok.

So, if we go from this:

[ '/Users/isaacs/dev/.store/foo/1.2.3/node_modules',
  '/Users/isaacs/dev/.store/foo/node_modules',
  '/Users/isaacs/dev/.store/node_modules',
  '/Users/isaacs/dev/node_modules',
  '/Users/isaacs/node_modules',
  '/Users/node_modules',
  '/node_modules' ]

to this:

[ '/Users/isaacs/dev/.store/foo/1.2.3/node_modules',
  '/Users/isaacs/dev/.store/foo/node_modules',
  '/Users/isaacs/dev/.store/node_modules',
  '/Users/isaacs/dev/node_modules',
  '/Users/isaacs/node_modules',
  '/Users/node_modules',
  '/node_modules',
  '/Users/isaacs/dev/app4/node_modules/foo/node_modules',
  '/Users/isaacs/dev/app4/node_modules' ]

then that feels a lot less risky to me than this:

[ '/Users/isaacs/dev/.store/foo/1.2.3/node_modules',
  '/Users/isaacs/dev/.store/foo/1.2.3+node_modules',
  '/Users/isaacs/dev/.store/foo/node_modules',
  '/Users/isaacs/dev/.store/foo+node_modules',
  '/Users/isaacs/dev/.store/node_modules',
  '/Users/isaacs/dev/.store+node_modules',
  '/Users/isaacs/dev/node_modules',
  '/Users/isaacs/dev+node_modules',
  '/Users/isaacs/node_modules',
  '/Users/isaacs+node_modules',
  '/Users/node_modules',
  '/Users+node_modules',
  '/node_modules',
  '/Users/isaacs/dev/app4/node_modules/foo/node_modules',
  '/Users/isaacs/dev/app4/node_modules/foo+node_modules',
  '/Users/isaacs/dev/app4/node_modules',
  '/Users/isaacs/dev/app4+node_modules' ]
@ghost

This comment has been minimized.

Copy link

commented Dec 8, 2016

@isaacs @mikeal I had this very same concern, but in normal use, most of the time only the first couple of paths actually get searched before somethings found, not all of them, and with the caching enhancements, the actual stat() cost only happens once

@ghost

This comment has been minimized.

Copy link

commented Dec 8, 2016

For now, don't think of my solution as 'the' solution, but as a fully working one.. a baseline.. a bar to meet. a way to actually experience any solution.

@isaacs

This comment has been minimized.

Copy link

commented Dec 12, 2016

@bmeck Ah, that's a great point!

Since it's strictly additive, we could say that all of the parents in the import step can add their unique node_modules folders to the set.

Alternatively, we could say that only the first importer is the "parent", which is closer to how CJS modules would work anyway. That is, even if a CJS module is loaded multiple times, the "parent" is always the first one, and it's cached forever.

@bmeck

This comment has been minimized.

Copy link
Member

commented Dec 12, 2016

@isaacs if it isn't kept up to date w/ all parents doesn't that mean order of imports becomes important for ESM's resolution algorithm (it isn't currently)?

@isaacs

This comment has been minimized.

Copy link

commented Dec 13, 2016

@bmeck Yeah, whether it includes all parents or just the first one, any inheritance of node_modules paths will mean that the order of imports is relevant. Consider this layout on disk:

.
└── node_modules
    ├── baz -> ~/.store/baz
    │   └── node_modules
    │       ├── bar -> ~/.store/bar   (a)
    │       └── bloo   (v1)
    └── foo -> ~/.store/foo
        └── node_modules
            ├── bar -> ~/.store/bar   (b)
            └── bloo   (v2)

So, there are two copies of bloo on disk. If bar does require('bloo'), then (with isaacs/node@ep-46 proposal) it'll get the one from the module (foo or baz) that first loaded bar.

Even if both of the parent modules' node_modules folder paths are added to bar's set, whichever parent is added first will have higher priority.

Currently, bar will just fail when it runs require('bloo'). With --preserve-symlinks, it'll load 2 copies of bar into memory, and with bar (a) getting bloo (v1) and bar (b) getting bloo (v2). If we land isaacs/node@ep-46, then it'll depend on whether the app first loads foo or baz. If foo is loaded first, then bar (b) will get loaded, and pull in bloo (v2). When baz is loaded, it'll get bar from cache, including bloo (v2). If baz is loaded first, then the reverse will happen.

I'm honestly unclear about the best way to decide which of these approaches is best, even before worrying about ESM. Certainly resolving everything is better than resolving nothing (ie, --preserve-symlinks), if only for backwards compatibility, but the non-determinism of any compromise makes me a little bit itchy.

@wmhilton

This comment has been minimized.

Copy link

commented Dec 13, 2016

Does that get around the breaking cases I mentioned above? -@isaacs

1. A file symlink that is not the main module. Haven't tested it.

2. A file symlink that is a link to a symlink, even as the main module. Still fails.

3. Since anm only looks one symlink deep....../home/isaacs/node_modules won't be in the module search path. I'm pretty sure that is the desired behavior though, isn't it?

I have been thinking about it, and I've concluded that symlinks per se aren't needed to pull off machine-level stores. For instance, it would be an acceptable compromise to wrap commands with a shim, because npm already does this on Windows. If we're going to need to wrap top-level applications in a shim regardless, we might as well use that shim to monkey-patch module.require. Maybe everyone will boo and hiss at that idea, but I'm really a fan of keeping core lean, and being able to solve machine-level stores purely with userspace code appeals to me. Thoughts?

@bmeck

This comment has been minimized.

Copy link
Member

commented Dec 13, 2016

@wmhilton the ESM proposal does not allow modifying the resolution algorithm. The very specific hooks that are safe to allow are shown in the slides from the Sept TC39 meeting. I would take a look at that and see if it is possible using ESM cache manipulation or if your solution only working in CJS works.

@isaacs

This comment has been minimized.

Copy link

commented Dec 13, 2016

@wmhilton #46 (comment) Warning! This is definitely possible, but leads almost invariably to a very dark and twisted maze of shims and complexity.

The reason that node and npm switched to exclusively using node_modules paths for resolution as of npm 1.0 and node 0.4 is precisely because the "shims upon shims" creates a lot of surface area for edge case bugs, and makes programs difficult to debug.

I've spoken about this with a few folks who actually see the "update metadeps at a distance" as a feature rather than a downside of ied and linked module directories. (Not people using ied itself, but using npm link, and eager to try out ied/pnpm for this reason, because "it sounds like npm link, but even better".) Again, I'm not sure how this changes the breakage/value considerations here, but at least, it's worth keeping in mind that removing support for "lookup deps based on realpath" is not going to be an unmitigated benefit.

@bmeck I think that we should be very wary of adding any features that will make ESM have to behave (even more) significantly different from CJS modules. The more parallels we can maintain, the better it'll be for people transitioning from one to the other.

Thanks for sharing these slides. I'll review these this week and try to see if I can imagine a way to conceptualize the same effective behavior in a way that doesn't violate the constraints of ESM loading.

The nice thing about both the current behavior and --preserve-symlinks is that the module lookup location is only based on the module itself, and not the order of loading or the first parent module.

At first glance, it looks like it'll be very difficult to maintain this unless we go the route (like --preserve-symlinks) of saying that a module loaded from two different symlink locations is a fundamentally different module. That is, taking up another spot in the cache and in memory. It may be possible to start with some of the presumptions of --preserve-symlinks, but walk back from there to adding the realpath-based lookup set. This will of course break our de facto singleton behavior, but that's not an entirely safe assumption anyway.

@isaacs

This comment has been minimized.

Copy link

commented Dec 13, 2016

Warning! This is definitely possible, but leads almost invariably to a very dark and twisted maze of shims and complexity.

Just to be clear, I mean this warning in the sense of "That is going to be an exciting adventure!", and I strongly encourage you to try it, if you've got the time and are in the mood for adventure. But it definitely won't be safe, so it shouldn't be something that npm or node-core try to do :)

@zkochan

This comment has been minimized.

Copy link

commented Dec 13, 2016

@isaacs I wanted to try your fork of Node with a slightly modified pnpm.

I had this error when trying to run the pnpm tests:

module.js:498
    throw err;
    ^

Error: Cannot find module '/home/zkochan/src/pnpm/node_modules/.bin/_bin.js'
    at Function.Module._resolveFilename (module.js:496:15)
    at Function.Module._load (module.js:443:25)
    at Module.runMain (module.js:643:10)
    at run (bootstrap_node.js:420:7)
    at startup (bootstrap_node.js:139:9)
    at bootstrap_node.js:535:3

Is your fork of Node ready for experimenting with? Let me know if I can help with something.

@bmeck

This comment has been minimized.

Copy link
Member

commented Dec 14, 2016

@isaacs keep us up to date, i am concerned with potential startup time increase from searching if symlinks are overly common and the fact that things become more non-deterministic than they already are.

@isaacs

This comment has been minimized.

Copy link

commented Dec 14, 2016

@bmeck Yeah, the ep-46 branch is probably dead on the vine because of the nondeterminism. The startup time is likely not going to be relevant. It only adds 4-5 lstat calls total, since the stats are cached, and all modules in use are likely linked into the same store path. Now that realpath uses the system builtin, it's also a lot faster than it used to be.

@zkochan It should work fine, I'm surprised that any tests are failing. Does it work with node on master?

@MylesBorins

This comment has been minimized.

Copy link
Member

commented Dec 14, 2016

@isaacs is there a change in the ep-46 branch to use the native real-path? AFAIK we reverted it and didn't maintain the implementation

@zkochan

This comment has been minimized.

Copy link

commented Dec 14, 2016

@isaacs I tried with master. No issues there

@isaacs

This comment has been minimized.

Copy link

commented Dec 15, 2016

@thealphanerd Ah, ok. I'd thought it was using the native realpath and then falling back to the JavaScript implementation, but I guess that was just a suggestion I heard somewhere and didn't make it into core.

@zkochan Is it possible to reproduce that test in a reasonably minimal way? Anything that works on master should work on ep-46.

@zkochan zkochan referenced this issue Dec 16, 2016

Closed

Update module.js #1

@wmhilton

This comment has been minimized.

Copy link

commented Dec 16, 2016

Just to be clear, I mean this warning in the sense of "That is going to be an exciting adventure!", and I strongly encourage you to try it, if you've got the time and are in the mood for adventure. But it definitely won't be safe, so it shouldn't be something that npm or node-core try to do :)

Thanks! And yes, it's precisely because it is crazy and out there that I'm going to try to do it in user space via monkey patching rather than by branching node core. It'll be fun. When/if I get a machine-store loader working I'll post back with a link.

@zkochan

This comment has been minimized.

Copy link

commented Dec 17, 2016

@isaacs with a small fix in your ep-46 branch that you can see in this PR, pnpm tests are passing.

I checked and after your changes pnpm works with regular Node.js. No --preserve-symlinks mode is required.

I totally vote for your solution!

@zkochan

This comment has been minimized.

Copy link

commented Dec 26, 2016

I noticed that @isaacs's solution uses the real path for __dirname. As a consequence, babel-cli fails because it uses __dirname in some strange way to execute a file from the same dir in a new process (what I mean: https://github.com/babel/babel/blob/master/packages/babel-cli/src/babel-node.js#L11).

This has made me thinking, should there be some new variable that will have the symlink location of the file? The symlink equivalents of __dirname and __filename. Something like __ldirname and _lfilename

@zkochan

This comment has been minimized.

Copy link

commented Feb 3, 2017

I have an update regarding this thread. I've been working on changing pnpm, so that it does not rely on the --preserve-symlinks feature of Node. And my attempts have succeeded!

The latest version of pnpm (which is 0.51.2) uses a global (machine) store and works without any changes in Node.js.

We did a lot of tweaks to make it work, but the main ones are:

  • files are linked (not symlinked) from the global store to the project's node_modules
  • command shims are rewritten to set the NODE_PATH env variable before running the binstubs.

So it is achievable to create a global store without changes in Node.js and without --preserve-symlinks. And performance is good enough with linking.

@wmhilton

This comment has been minimized.

Copy link

commented Feb 4, 2017

Just updating this to say I no longer care about symlink support. Since we've managed to come up with two ways to create a global store without using symlinks (require-hacking and hard links) I think the global store use case is no longer a valid justification for changing the behavior of require in node core.

@isaacs

This comment has been minimized.

Copy link

commented Feb 4, 2017

@zkochan @wmhilton Interesting, thanks for the update.

That still doesn't solve the symlinked peer-dependency use-case though, correct?

@wmhilton

This comment has been minimized.

Copy link

commented Feb 4, 2017

@isaacs Er... what was that use case again?

@isaacs

This comment has been minimized.

Copy link

commented Feb 4, 2017

@wmhilton

node_modules/
├── framework@ -> /path/to/framework
└── plugin@ -> /some/other/path/to/plugin

I do require('plugin') and then plugin does require('framework') and can't find it.

@wmhilton

This comment has been minimized.

Copy link

commented Feb 4, 2017

@isaacs I don't think require-hacking nor hard-linking would necessitate that situation though... @zeke is this the situation you were just describing in the webpack plugin issue on pnpm?

@zkochan

This comment has been minimized.

Copy link

commented Feb 4, 2017

@isaacs your example is a little bit confusing. Plugin will most likely have framework as a peer dependency, and pnpm links peer deps to the node_modules folder of the dependent packages. So framework would be symlinked to /some/other/path/to/plugin/node_modules.

On the other hand, if framework needs to require plugin, it will work if the framework is a CLI app. The node_modules/.bin/framework cmd shim would add node_modules to the NODE_PATH env variable. So framework would find plugin. If framework is not a CLI app, probably it is used programmatically and plugins are passed into it.

Time will show, but this approach seems to work fine with apps like eslint, karma, babel

@isaacs

This comment has been minimized.

Copy link

commented Feb 5, 2017

The node_modules/.bin/framework cmd shim would add node_modules to the NODE_PATH env variable.

What system writes cmd shims that update the NODE_PATH env? npm doesn't do that.

Dig up the discussions that led to --preserve-symlinks being added added originally. This is definitely a valid use case that people actually have. I don't think --preserve-symlinks is a good solution, but it is a thing that's probably worth solving.

@zkochan

This comment has been minimized.

Copy link

commented Feb 5, 2017

I forked cmd-shim and added the possibility to set the NODE_PATH env. pnpm then defines the value of NODE_PATH.

This is my personal opinion, but I think the solution should be either the same as --preserve-symlinks or the current, which is resolving to real path. Other solutions would extend too much the possibilities of require. Like flat node_modules did it with npm@3. We can end up in a situation where any silly code's require will work because of the flat structure + the symlink logic. It won't matter if the dependencies were specified in package.json or not.

@wmhilton

This comment has been minimized.

Copy link

commented Feb 5, 2017

Dig up the discussions that led to --preserve-symlinks being added added originally. This is definitely a valid use case that people actually have.

Is this for development purposes? Or are symlinks and --preserve-symlinks used in production? I'm just confused because I've installed packages that use peer dependencies and I've never seen npm insert a symlink for them.

I don't think --preserve-symlinks is a good solution, but it is a thing that's probably worth solving.

Maybe, but I have reason to believe at this point it could be solved outside of node core.

@bmeck

This comment has been minimized.

Copy link
Member

commented Aug 11, 2017

@jasnell is this still desired since --preserve-symlinks seems to be staying around?

@bmeck bmeck closed this Oct 13, 2017

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
9 participants
You can’t perform that action at this time.