Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

Loading…

npm dedupe #2256

Closed
mikeal opened this Issue · 10 comments

5 participants

@mikeal

Here's the problem.

I've got a dozen top level dependencies and those deps have many, many, dependencies of their own.

Many of those required dependencies are actually identical requirements as other dependencies.

I'm sure some people complain about wasting space or creating huge directory trees but I don't really care about any of that, my main issue is upgrades.

I fix a bug in request, request is depended on by half of these deps and it's just impossible to go through and update them all. I guess you could have a --deep flag on install that would go through and update them all but that seems a little hacky and it would be hard to provide good visibility in to what is getting upgraded.

So, I had this idea that might be able to fix all this. npm flatten would go through all your deps of deps and move them to the top level. It would provide a lot more visibility in to the real number of deps you are carrying and make updates easier.

If two packages had differing version requirements you could just install the latest as request and the older still required version at request@1.2.3.

Thoughts?

@isaacs
Owner

The intent of npm update request would be to do this in a deep way. However, the update command is a bit flaky and overly magical.

It would be kind of nice if running npm install without any arguments also updated dependencies to the maximum allowed, but this would be new behavior, and potentially surprising.

I think a --deep flag would be a nice approach, but it does introduce some complexity of its own that would have to be worked through.

So, to explore flatten,

Re conflicting deps: Just leave them in-place, I'd say. Installing two versions of something in the same target folder with magic filenames is awful. Not going back to that place.

Then this becomes sort of a "pull left" system, where everything in the tree gets pulled up to the root if possible, and if not, it gets left behind in its previous location. So, if you had this kind of dep structure:

a
+-- b@1
|   +-- c@1
|   +-- d
|   |    `-- e
|   `-- f
+-- g
|   +-- c@2
|   `-- d
|       `-- e
`-- h

Then you'd end up with this after the transformation:

a
+-- b@1
|   `-- c@1
+-- g
|   `-- c@2
+-- d
+-- e
+-- f
`-- h

We could perhaps pull c@1 left, but leave c@2 in place to override it.

The question is, how to do you prevent this from resulting in a bunch of "extraneous" deps? Maybe it should add them to the root's package.json file? However, this could lead to an odd situation when you update a dep, and it then no longer depends on the same stuff. In the situation above, imagine that we update b@1 to b@2. The deps look like this:

b@1 -> (c@1, d, f)
b@2 -> (c@2, g)

Once we do that, we're left with:

a
+-- b@2
|   `-- c@2
+-- g
|   `-- c@2
+-- d
+-- e
+-- f
`-- h

Noting the repetition, we flatten again, and get:

a
+-- b@2
+-- g
+-- d
+-- e
+-- f
+-- h
`-- c@2

Now there's litter! No one depends on package f, but it's been placed in the root's package.json, so it's not clear that its extraneous.

A possible solution here would be to put a marker in the root package.json that packages c, d, e, f, h, and g are only required by virtue of flattening, and not directly depended upon. So, if at any point no dependency depends on them, then they'll be considered extraneous.

@mikeal

it might be a little much, but you could make extraneous consider the root package.json. this way, if a package isn't in the root package.json and none of the other deps it would be marked as extraneous.

@mikeal

NEED THIS SO BAD!

@aseemk

There's definitely merit in this, but I just wanted to point out though that it does break the abstraction that a module can rely on its dependencies being inside its own node_modules folder.

This doesn't matter for the regular case of require()-ing a dependency, but it does matter in two other cases:

  • A dependency requiring a sibling dependency, e.g. for an enhancement/integration support; and
  • Using a binary in a lifecycle event script, e.g. in postinstall.
  • Edit: one client changing the default options of a dependency, which leaks to other clients.
Maybe flatten could still be achieved while maintaining symlinks within `node_modules` and `node_modules/.bin` to the actual dependencies higher up.

Edit: issue #2443.

@tblobaum

The question is, how to do you prevent this from resulting in a bunch of "extraneous" deps? Maybe it should add them to the root's package.json file? However, this could lead to an odd situation when you update a dep, and it then no longer depends on the same stuff. In the situation above, imagine that we update b@1 to b@2. The deps look like this:

@isaacs Is it possible to mark those deps as children rather than extraneous?

@isaacs
Owner

Actually, removing the extraneous marks turns out to be pretty easy: npm/read-installed@5df6bc4

@isaacs
Owner

@mikeal Would it be enough to simply de-duplicate, rather than actually flatten it all to a single level?

Consider this:

a
+-- b
|   +-- c
|   `-- d
`-- e
    +-- c
    `-- f

In this case, you'd end up with a tree like:


a
+-- b
|   `-- d
+-- e
|   `-- f
`-- c

So, anything that isn't duplicated won't be moved, but things that are duplicated are moved up to the highest level necessary to de-dupe, or the operation fails if that's not possible (due to conflicting dependencies).

Another example:

a
+-- b
|   +-- c
|   |   `-- d
|   `-- e
|       `-- d
`-- f
    `-- d

becomes:

a
+-- b
|   +-- c
|   `-- e
+-- f
`-- d
@mikeal
@juuxstar

Has there been any progress on this by chance?

I just ran into a problem that directly relates to this: when packages use a singleton pattern if there's two different instances loaded then any changes to one aren't seen in the other. This could perhaps be seen as desired behaviour but in my case, it was not and it took me several days (I kid you not, days) to figure out what the problem was.

Specifically, the package in question (passport) uses a singleton pattern which stores configuration information. My top-level server code requires it and configures it. (providing callbacks, etc..) But another package, passport-socketio (which also uses the passport package to integrate with socketio) was getting it's own copy loaded from it's local node_modules folder. As a result, none of my configuration was making it to this instance of the package which generated very strange errors.

In the end, the solution was simple: delete the local node_modules copy of passport from the passport-socketio package so that it would get the one top-level copy. But this is not at all intuitive.

It would be very useful to be able to visualize in some way (perhaps kinda like npm list) package dependencies that are redundant in a code base (ie. which dependent packages appear further up and thus could be "flattened") Maybe even have the ability to have packages be marked (in package.json?) as having state (for example, singletons) vs pure library packages where it's okay if multiple copies are accidentally loaded.

@mikeal

it's called npm dedupe.

@isaacs isaacs closed this
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.