Skip to content
This repository has been archived by the owner on Aug 11, 2022. It is now read-only.

npm install should deduplicate its tree as part of the install process #6912

Closed
1 task done
iarna opened this issue Dec 12, 2014 · 52 comments
Closed
1 task done

npm install should deduplicate its tree as part of the install process #6912

iarna opened this issue Dec 12, 2014 · 52 comments
Assignees
Milestone

Comments

@iarna
Copy link
Contributor

iarna commented Dec 12, 2014

This is intended to solve a few larger project goals:

  1. Installs should be smaller, without having to explicitly ask for it.
  2. For Windows user's sake, we should do our best to avoid long path names. Many windows tools are limited to 256 characters.

As such, @othiym23 and I have decided this should be a maximally flat deduplication, which means if you have a dep tree of:

A → B → C

Then if you install A, you'll get B and C next to it in your node modules. That is, it'll be installed as:

A
B
C

But we'll hold onto metadata to know why B and C were installed such that removing A will remove B and C as it would now.

As for devDependencies, they shouldn't have their dependencies folded in as above. Instead each of them should essentially act as its own root. So if A above were a dev dependency then it would be installed as:

A → B
    C

The following work remains to be done:

See also: #4761, #4037, #5465

@domenic
Copy link
Contributor

domenic commented Dec 12, 2014

One interesting consequence of this is that now you'll be able to do

{
  "dependencies": { "A": "1" }
}

but then

var a = require("A");
var b = require("B");
var c = require("C");

If this is undesired, you could probably sidestep it while still gaining many of the benefits by creating the structure

A
A/node_modules/B
A/node_modules/C

However, this is a bit of a slippery slope. If we say that preventing require-of-indirect-dependencies is a priority, that places a hard cap on the possible deduplication, e.g. in the scenario

A -> B
K -> B

you would be unable to deduplicate B. So, probably not worth it.

@iarna
Copy link
Contributor Author

iarna commented Dec 12, 2014

Yeah, this is all true, but even the existing npm dedup will under some circumstances hoist things to the top level, enabling that. I'm hoping this isn't too much of a footgun.

@othiym23
Copy link
Contributor

@domenic Do you have a concrete scenario in mind where this could become an issue? As @iarna indicates, this is already an issue with dedupe, and it hasn't been an issue to date.

We're planning on tracking why a hoisted package is there (i.e. its dependents), and also planning on operating on and displaying the logical tree rather than what's on disk (so npm ls will show dependencies rather than disk trees). This is a big change, and will probably resolve as many issues as it causes. Concrete scenarios would be really helpful for thinking through things.

@domenic
Copy link
Contributor

domenic commented Dec 12, 2014

No concrete scenario based on experience. I'm just imagining someone e.g. installing facebook's jest, then starting to do require("jsdom") and it'll work, which is problematic in various ways.

Or, people explicitly creating packages like "my-favorite-libs" that depend on a bunch of libs, then saying in the readme "after you do npm install --save my-favorite-libs, you can do require("underscore") or require("tape") or require("request")---no need to add them to your package.json yourself!"

It'll indeed help that npm ls and friends show the logical tree.

@piranna
Copy link

piranna commented Dec 12, 2014

Or, people explicitly creating packages like "my-favorite-libs" that depend on a bunch of libs, then saying in the readme "after you do npm install --save my-favorite-libs, you can do require("underscore") or require("tape") or require("request")---no need to add them to your package.json yourself!"

To prevent this, require() discovery algorythm should also be
modified to follow the logical tree instead of the phisical
directories. This has de advantage that the current behaviour will not
change since currently the logical al phisical trees are the same.

"Si quieres viajar alrededor del mundo y ser invitado a hablar en un
monton de sitios diferentes, simplemente escribe un sistema operativo
Unix."
– Linus Tordvals, creador del sistema operativo Linux

@othiym23
Copy link
Contributor

Any changes to the Node module resolution algorithm are completely out of scope for npm, especially because that API is marked "locked" within Node itself.

@domenic
Copy link
Contributor

domenic commented Dec 12, 2014

Something to keep in mind though when designing the ES6 module loader for Node (in the so-far-hypothetical future where V8 ships ES6 modules).

@piranna
Copy link

piranna commented Dec 12, 2014

Any changes to the Node module resolution algorithm are completely out of scope for npm, especially because that API is marked "locked" within Node itself.

The behaviour and API would still be the same, it only change the
implementation... but it's true it's not a Node.js bug but instead a
NPM problem, so I don't know how it would be fixed... :-/

"Si quieres viajar alrededor del mundo y ser invitado a hablar en un
monton de sitios diferentes, simplemente escribe un sistema operativo
Unix."
– Linus Tordvals, creador del sistema operativo Linux

@stuartpb
Copy link
Contributor

@domenic Isn't this the same problem I mentioned back in September?

@othiym23
Copy link
Contributor

@stuartpb Your issue and its various ramifications are real, but some of the things that @iarna is doing with her pass at deduplication in the multi-stage install work are designed to address some of the edge cases you raise.

On the one hand, people will now be able to inadvertently require things from the top level that they probably shouldn't be able to, and there's a risk they'll come to depend on things that aren't actually part of the install. On the other, npm will keep track of what's a direct dependency and what's a transitive dependency, and should be able to not lose its bananas when a dependency switches from one category to the other.

I don't think there's a solution to this problem that always prevents anyone from ever being surprised, but deduplication by default is otherwise such a win that I think the tradeoff is acceptable.

@timoxley
Copy link
Contributor

timoxley commented Jan 1, 2015

Wholly +1 on this.

Note that this is going to break a lot more package.json scripts incorrectly relying on executables being installed in node_modules/.bin. I'm wondering if perhaps it's worth putting an an explicit check for node_modules/.bin in scripts and shooting off a warning, because there's few valid reasons to reference node_modules/.bin directly in package.json scripts, or anywhere for that matter.

there's a risk they'll come to depend on things that aren't actually part of the install.

We could adapt some existing require-detecting tooling to pick this up pretty easily.

@isaacs
Copy link
Contributor

isaacs commented Feb 2, 2015

I'm wondering if perhaps it's worth putting an an explicit check for node_modules/.bin in scripts and shooting off a warning, because there's few valid reasons to reference node_modules/.bin directly in package.json scripts, or anywhere for that matter.

That's really smart. We should just strip node_modules/.bin from the start of them, I think, and maybe also warn.

@iarna iarna added in-progress and removed ready labels Feb 2, 2015
@iarna
Copy link
Contributor Author

iarna commented Feb 4, 2015

We actually already do strip node_modules/.bin already in read-package-json, here:

npm/read-package-json@56c7f5de

I'm going to be moving this over to normalize-package-data and adding a warn.

@iarna
Copy link
Contributor Author

iarna commented Mar 27, 2015

Closing this as fixed in npm/npm#multi-stage

@iarna iarna closed this as completed Mar 27, 2015
@iarna iarna removed the in-progress label Mar 27, 2015
@piranna
Copy link

piranna commented Mar 27, 2015

Great! :-D Could you stimate when/in what version this could be integrated in master? Maybe npm 3?

@jmm
Copy link

jmm commented Apr 27, 2015

@iarna is there any plan to do automatic deduping as part of npm update? I'm trying to figure out the best way to handle this scenario:

{
  "name" : "A",
  "dependencies": {
    "B": "^1.2.3",
    "C": "^4.5.6",
}

Pinned C:

{
  "name" : "B",
  "dependencies": {
    "C": "4.5.6"
}

It's desirable for A to depend on the same version of C that B does without having to manually keep them in sync by updating a pinned version number in A every time it changes in B.

@ivan-kleshnin
Copy link

This was a huge mistake I should say. Basic example: I have a lodash.isarray dependency, install a lot of gulp packages, remove lodash.isarray dependency... now all those gulp packages are broken because isarray was deduped at the moment of their installation. And now I need to either reinstall everything from scratch (traffic!) or to manually discover and reinstall broken things. Just wow.

@FractalizeR
Copy link

I have a lodash.isarray dependency, install a lot of gulp packages, remove lodash.isarray dependency...

Well, I guess we should not allow a required dependency to be uninstalled without the use of something like --force key. This is a part of dependency management.

@piranna
Copy link

piranna commented Apr 30, 2015

Or remove the other ones recursively, too.

@FractalizeR
Copy link

@piranna Yep, but some new option should be introduced to do that like --recursive

@piranna
Copy link

piranna commented Apr 30, 2015

@piranna Yep, but some new option should be introduced to do that like --recursive

👍

@ivan-kleshnin
Copy link

Well, I guess we should not allow a required dependency to be uninstalled without the use of something like --force key. This is a part of dependency management.

--force key sounds like you're saying "it's not my problem pal". It does not solve anything.

Please explain how do you see "correct" uninstall from the user point of view.
My app is no longer need lodash.isarray, so it's clearly NOT a required dependency.
It's not marked as peerDep either.

I shouldn't remember all deps of all deps, right? So the problem here is caused exactly by NPM deduplication, not any of my actions which were pretty basic.

By adding new options you'll just get more tricky cases on the next levels of relations.
I'm feeling like NPM is already fairly overcomplicated and we need to remove things, not add.

@FractalizeR
Copy link

--force key sounds like you're saying "it's not my problem pal". It does not solve anything.

It solves some cases where computer wants to be smarter than a human. Sometimes I want a computer to do something he thinks dangerous or impossible. For example, rollback migrations on production machine. Such actions must be confirmed with --force which means "I know that is dangerous, but I want you to do that no matter what you, brainless machine, think about it".

Please explain how do you see "correct" uninstall from the user point of view. My app is no longer need lodash.isarray, so it's clearly NOT a required dependency. It's not marked as peerDep either.

Dependency is something, that is required by any module in your system. Not just by the package you requested to be installed. This way it works with every other package manager: yum, apt-get, php-composer etc. I don't see why node should be an exception.

You cannot allow user remove a module for which is known that it is used by other modules. The fact, that user never installed it himself doesn't matter. You must present a user with a clear error message about this fact.

But still, there are some cases when this operation (module installation) MUST be performed disregarding consequences (we are programmers after all, not some silly children). For inspecting aftereffects, for example. Or may be we are going to replace all broken dependency links by a module with the same interface, but didn't edit npm registry. Or may be, module we are removing is used only by another modules, that are going to be pruned by my next command. Or... well, there could be some cases, believe me. Of course user must understand, that the use of --force key brings npm registry into a broken state.

By adding new options you'll just get more tricky cases on the next levels of relations. I'm feeling like NPM is already fairly overcomplicated and we need to remove things, not add.

Here I agree. But I do not consider npm complicated when you compare it to similiar package managers.

@ivan-kleshnin
Copy link

You cannot allow user remove a module for which is known that it is used by other modules. The fact, that user never installed it himself doesn't matter. You must present a user with a clear error message about this fact

Ok, i'm trying to remove lodash.isarray. I'll get error message. My next steps?
As I surely want to remove lodash.isarray I google and discover I should use --force flag.
I add it. Things got removed. Now I'm in the very same initial situation I've described.
What is improved for my workflow? Nothing, I just got another pesky step.

I can accept flag if it prevents you from doing something that may be done better.
But in my case there isn't better manual way, right? So you just add me one more trouble to deal with and call this improvement. I'm not so surprised, this is how software is built today.

What NPM should do instead is to provide a dialog:

$ npm uninstall lodash.isarray
You're going to remove required dependency:
[r] - remove it (if it's required by other packages, it will be reinstalled in their own node_modules)
[f] - just remove (don't check if other packages require it)
[n] - cancel removal

This is helpful and doesn't leave anything in broken state in cases y and n.
Machine solves things for me, not vice versa.

@FractalizeR
Copy link

What NPM should do instead is to provide a dialog:

Yes, this is another option. But it is just the same thing - --force option just in the dialog interactive implementation. Or may be I didn't catch the difference?

But sometimes actions must be done from scripts without user interaction.

@ivan-kleshnin
Copy link

But sometimes actions must be done from scripts without user interaction.

Yes. Then just give us one flag -i to operate in interactive mode in the "smart" helpful way I described. Human memory is far from limitless. It's impossible to remember all flags when things are built with machine-first approach.

@FractalizeR
Copy link

Well, -i is also a good way :). Edited my post, btw. Incorrectly formatted quotation.

@parshap
Copy link

parshap commented Apr 30, 2015

I have a lodash.isarray dependency, install a lot of gulp packages, remove lodash.isarray dependency... now all those gulp packages are broken because isarray was deduped at the moment of their installation.

$ npm uninstall lodash.isarray should remove the dependency from the top-level package, but it should not have affect on other installed dependencies. If lodash.isarray was previously deduped because the top-level package shared the that module with a dependency, then npm uninstall should move it from the top-level node_modules/ to the nested dependency's node_modules/ directory.

(Just my take on how I would expect things to work.)

@FractalizeR
Copy link

When you do yum remove on some package, that will lead to uninstallation of other packages, you get a question (Y/N one) with the list of affected packages (they will also be uninstalled) unless you provide -y flag, that is treated like a 'yes' answer.

@ivan-kleshnin
Copy link

When you do yum remove on some package, that will lead to uninstallation of other packages, you get a question (Y/N one) with the list of affected packages (they will also be uninstalled) unless you provide -y flag, that is treated like a 'yes' answer.

Thank you for this example @FractalizeR. It's sad to see that yum also behaves very unfriendly.
I don't want to remove other packages just because I remove one of their deps.

The simplest correct solution is what @parshap says. Just reinstall this damned lodash.isarray without any dialog for packages that require it in their own node_modules. Install should utilize cache e.g. be cheap. And even if not – it's better to make a few network requests without my permission, than provide so awful user experience as it is now.

@pzuraq
Copy link

pzuraq commented May 4, 2015

How will deduplication handle different versions of the same dependency?

A → B@1.2
C → B@2.0

In this scenario, would we end up with a directory structure like the current node_modules in order to preserve the discovery algorithm of require?

node_modules
├── A
│   └── node_modules
│       └── B
└── C
    └── node_modules
        └── B

@othiym23
Copy link
Contributor

othiym23 commented May 4, 2015

In this scenario, would we end up with a directory structure like the current node_modules in order to preserve the discovery algorithm of require?

Yes. The install / deduplication algorithm will only flatten the dependency tree as much as it can without introducing semver conflicts or other forms of dependency hell. If there is a conflict, then each dependency will get its own version of the conflicting dependency.

@piranna
Copy link

piranna commented May 4, 2015

If there is a conflict, then each dependency will get its own version of the conflicting dependency.

Will there be a flag on package.json to notify we don't allow conflicts on a package to make sure that there's only one version of it and throw an error if not, similar to the sanity-check done by peerDependencies?

@othiym23
Copy link
Contributor

othiym23 commented May 4, 2015

Will there be a flag on package.json to notify we don't allow conflicts on a package to make sure that there's only one version of it and throw an error if not, similar to the sanity-check done by peerDependencies?

No. There is discussion of a new browserDependencies (name not final, almost certain to change, don't hold us to this, etc.) dependency type to to handle dependencies that are guaranteed to be flat and that will error on conflicts, which may or may not install to somewhere other than node_modules, but this is not generally desirable behavior for regular npm dependencies, configurably or otherwise. Regardless of its eventual semantics, it will not be part of the npm@3 release.

@piranna
Copy link

piranna commented May 4, 2015

My question was not regarded to browser, but instead about the fact that having duplicated libraries I've found several errors when doing instanceof because the same class was being fetch from two diferent copies of the same package and Node.js understood them as two diferent classes.

@othiym23
Copy link
Contributor

othiym23 commented May 4, 2015

That's not an error, and that's why instanceof checks are generally considered a bad idea in larger Node apps where a module may not have control over all its dependencies – this is far from the only situation in which Node (or JavaScript in general) will fail instanceof tests. Use duck-typing or branding (attaching a symbol containing its canonical type name as a known property with a special name or Symbol) instead, if you want to check the type of an object. This is not a problem npm@3 is intended to solve.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests