npm dedupe #2256

Closed
mikeal opened this Issue Mar 12, 2012 · 10 comments

Comments

Projects
None yet
5 participants
@mikeal
Contributor

mikeal commented Mar 12, 2012

Here's the problem.

I've got a dozen top level dependencies and those deps have many, many, dependencies of their own.

Many of those required dependencies are actually identical requirements as other dependencies.

I'm sure some people complain about wasting space or creating huge directory trees but I don't really care about any of that, my main issue is upgrades.

I fix a bug in request, request is depended on by half of these deps and it's just impossible to go through and update them all. I guess you could have a --deep flag on install that would go through and update them all but that seems a little hacky and it would be hard to provide good visibility in to what is getting upgraded.

So, I had this idea that might be able to fix all this. npm flatten would go through all your deps of deps and move them to the top level. It would provide a lot more visibility in to the real number of deps you are carrying and make updates easier.

If two packages had differing version requirements you could just install the latest as request and the older still required version at request@1.2.3.

Thoughts?

@isaacs

This comment has been minimized.

Show comment Hide comment
@isaacs

isaacs Mar 12, 2012

Member

The intent of npm update request would be to do this in a deep way. However, the update command is a bit flaky and overly magical.

It would be kind of nice if running npm install without any arguments also updated dependencies to the maximum allowed, but this would be new behavior, and potentially surprising.

I think a --deep flag would be a nice approach, but it does introduce some complexity of its own that would have to be worked through.

So, to explore flatten,

Re conflicting deps: Just leave them in-place, I'd say. Installing two versions of something in the same target folder with magic filenames is awful. Not going back to that place.

Then this becomes sort of a "pull left" system, where everything in the tree gets pulled up to the root if possible, and if not, it gets left behind in its previous location. So, if you had this kind of dep structure:

a
+-- b@1
|   +-- c@1
|   +-- d
|   |    `-- e
|   `-- f
+-- g
|   +-- c@2
|   `-- d
|       `-- e
`-- h

Then you'd end up with this after the transformation:

a
+-- b@1
|   `-- c@1
+-- g
|   `-- c@2
+-- d
+-- e
+-- f
`-- h

We could perhaps pull c@1 left, but leave c@2 in place to override it.

The question is, how to do you prevent this from resulting in a bunch of "extraneous" deps? Maybe it should add them to the root's package.json file? However, this could lead to an odd situation when you update a dep, and it then no longer depends on the same stuff. In the situation above, imagine that we update b@1 to b@2. The deps look like this:

b@1 -> (c@1, d, f)
b@2 -> (c@2, g)

Once we do that, we're left with:

a
+-- b@2
|   `-- c@2
+-- g
|   `-- c@2
+-- d
+-- e
+-- f
`-- h

Noting the repetition, we flatten again, and get:

a
+-- b@2
+-- g
+-- d
+-- e
+-- f
+-- h
`-- c@2

Now there's litter! No one depends on package f, but it's been placed in the root's package.json, so it's not clear that its extraneous.

A possible solution here would be to put a marker in the root package.json that packages c, d, e, f, h, and g are only required by virtue of flattening, and not directly depended upon. So, if at any point no dependency depends on them, then they'll be considered extraneous.

Member

isaacs commented Mar 12, 2012

The intent of npm update request would be to do this in a deep way. However, the update command is a bit flaky and overly magical.

It would be kind of nice if running npm install without any arguments also updated dependencies to the maximum allowed, but this would be new behavior, and potentially surprising.

I think a --deep flag would be a nice approach, but it does introduce some complexity of its own that would have to be worked through.

So, to explore flatten,

Re conflicting deps: Just leave them in-place, I'd say. Installing two versions of something in the same target folder with magic filenames is awful. Not going back to that place.

Then this becomes sort of a "pull left" system, where everything in the tree gets pulled up to the root if possible, and if not, it gets left behind in its previous location. So, if you had this kind of dep structure:

a
+-- b@1
|   +-- c@1
|   +-- d
|   |    `-- e
|   `-- f
+-- g
|   +-- c@2
|   `-- d
|       `-- e
`-- h

Then you'd end up with this after the transformation:

a
+-- b@1
|   `-- c@1
+-- g
|   `-- c@2
+-- d
+-- e
+-- f
`-- h

We could perhaps pull c@1 left, but leave c@2 in place to override it.

The question is, how to do you prevent this from resulting in a bunch of "extraneous" deps? Maybe it should add them to the root's package.json file? However, this could lead to an odd situation when you update a dep, and it then no longer depends on the same stuff. In the situation above, imagine that we update b@1 to b@2. The deps look like this:

b@1 -> (c@1, d, f)
b@2 -> (c@2, g)

Once we do that, we're left with:

a
+-- b@2
|   `-- c@2
+-- g
|   `-- c@2
+-- d
+-- e
+-- f
`-- h

Noting the repetition, we flatten again, and get:

a
+-- b@2
+-- g
+-- d
+-- e
+-- f
+-- h
`-- c@2

Now there's litter! No one depends on package f, but it's been placed in the root's package.json, so it's not clear that its extraneous.

A possible solution here would be to put a marker in the root package.json that packages c, d, e, f, h, and g are only required by virtue of flattening, and not directly depended upon. So, if at any point no dependency depends on them, then they'll be considered extraneous.

@mikeal

This comment has been minimized.

Show comment Hide comment
@mikeal

mikeal Mar 12, 2012

Contributor

it might be a little much, but you could make extraneous consider the root package.json. this way, if a package isn't in the root package.json and none of the other deps it would be marked as extraneous.

Contributor

mikeal commented Mar 12, 2012

it might be a little much, but you could make extraneous consider the root package.json. this way, if a package isn't in the root package.json and none of the other deps it would be marked as extraneous.

@mikeal

This comment has been minimized.

Show comment Hide comment
@mikeal

mikeal May 11, 2012

Contributor

NEED THIS SO BAD!

Contributor

mikeal commented May 11, 2012

NEED THIS SO BAD!

@aseemk

This comment has been minimized.

Show comment Hide comment
@aseemk

aseemk May 12, 2012

There's definitely merit in this, but I just wanted to point out though that it does break the abstraction that a module can rely on its dependencies being inside its own node_modules folder.

This doesn't matter for the regular case of require()-ing a dependency, but it does matter in two other cases:

  • A dependency requiring a sibling dependency, e.g. for an enhancement/integration support; and
  • Using a binary in a lifecycle event script, e.g. in postinstall.
  • Edit: one client changing the default options of a dependency, which leaks to other clients.

Maybe flatten could still be achieved while maintaining symlinks within node_modules and node_modules/.bin to the actual dependencies higher up.

Edit: issue #2443.

aseemk commented May 12, 2012

There's definitely merit in this, but I just wanted to point out though that it does break the abstraction that a module can rely on its dependencies being inside its own node_modules folder.

This doesn't matter for the regular case of require()-ing a dependency, but it does matter in two other cases:

  • A dependency requiring a sibling dependency, e.g. for an enhancement/integration support; and
  • Using a binary in a lifecycle event script, e.g. in postinstall.
  • Edit: one client changing the default options of a dependency, which leaks to other clients.

Maybe flatten could still be achieved while maintaining symlinks within node_modules and node_modules/.bin to the actual dependencies higher up.

Edit: issue #2443.

@tblobaum

This comment has been minimized.

Show comment Hide comment
@tblobaum

tblobaum Aug 9, 2012

The question is, how to do you prevent this from resulting in a bunch of "extraneous" deps? Maybe it should add them to the root's package.json file? However, this could lead to an odd situation when you update a dep, and it then no longer depends on the same stuff. In the situation above, imagine that we update b@1 to b@2. The deps look like this:

@isaacs Is it possible to mark those deps as children rather than extraneous?

tblobaum commented Aug 9, 2012

The question is, how to do you prevent this from resulting in a bunch of "extraneous" deps? Maybe it should add them to the root's package.json file? However, this could lead to an odd situation when you update a dep, and it then no longer depends on the same stuff. In the situation above, imagine that we update b@1 to b@2. The deps look like this:

@isaacs Is it possible to mark those deps as children rather than extraneous?

@isaacs

This comment has been minimized.

Show comment Hide comment
@isaacs

isaacs Aug 19, 2012

Member

Actually, removing the extraneous marks turns out to be pretty easy: npm/read-installed@5df6bc4

Member

isaacs commented Aug 19, 2012

Actually, removing the extraneous marks turns out to be pretty easy: npm/read-installed@5df6bc4

@isaacs

This comment has been minimized.

Show comment Hide comment
@isaacs

isaacs Aug 19, 2012

Member

@mikeal Would it be enough to simply de-duplicate, rather than actually flatten it all to a single level?

Consider this:

a
+-- b
|   +-- c
|   `-- d
`-- e
    +-- c
    `-- f

In this case, you'd end up with a tree like:


a
+-- b
|   `-- d
+-- e
|   `-- f
`-- c

So, anything that isn't duplicated won't be moved, but things that are duplicated are moved up to the highest level necessary to de-dupe, or the operation fails if that's not possible (due to conflicting dependencies).

Another example:

a
+-- b
|   +-- c
|   |   `-- d
|   `-- e
|       `-- d
`-- f
    `-- d

becomes:

a
+-- b
|   +-- c
|   `-- e
+-- f
`-- d
Member

isaacs commented Aug 19, 2012

@mikeal Would it be enough to simply de-duplicate, rather than actually flatten it all to a single level?

Consider this:

a
+-- b
|   +-- c
|   `-- d
`-- e
    +-- c
    `-- f

In this case, you'd end up with a tree like:


a
+-- b
|   `-- d
+-- e
|   `-- f
`-- c

So, anything that isn't duplicated won't be moved, but things that are duplicated are moved up to the highest level necessary to de-dupe, or the operation fails if that's not possible (due to conflicting dependencies).

Another example:

a
+-- b
|   +-- c
|   |   `-- d
|   `-- e
|       `-- d
`-- f
    `-- d

becomes:

a
+-- b
|   +-- c
|   `-- e
+-- f
`-- d
@mikeal

This comment has been minimized.

Show comment Hide comment
@mikeal

mikeal Aug 19, 2012

Contributor

not for me. i have issues where two packages i require both have the same dep. this wouldn't push that dep up.

On Aug 18, 2012, at August 18, 20125:57 PM, "Isaac Z. Schlueter" notifications@github.com wrote:

@mikeal Would it be enough to simply de-duplicate, rather than actually flatten it all to a single level?

Consider this:

a
+-- b
| +-- c
| -- d-- e
+-- c
`-- f
In this case, you'd end up with a tree like:

a
+-- b
| -- d +-- e |-- f
`-- c
So, anything that isn't duplicated won't be moved, but things that are duplicated are moved up to the highest level necessary to de-dupe, or the operation fails if that's not possible (due to conflicting dependencies).

Another example:

a
+-- b
| +-- c
| | -- d |-- e
| -- d-- f
`-- d
becomes:

a
+-- b
| +-- c
| -- e +-- f-- d

Reply to this email directly or view it on GitHub.

Contributor

mikeal commented Aug 19, 2012

not for me. i have issues where two packages i require both have the same dep. this wouldn't push that dep up.

On Aug 18, 2012, at August 18, 20125:57 PM, "Isaac Z. Schlueter" notifications@github.com wrote:

@mikeal Would it be enough to simply de-duplicate, rather than actually flatten it all to a single level?

Consider this:

a
+-- b
| +-- c
| -- d-- e
+-- c
`-- f
In this case, you'd end up with a tree like:

a
+-- b
| -- d +-- e |-- f
`-- c
So, anything that isn't duplicated won't be moved, but things that are duplicated are moved up to the highest level necessary to de-dupe, or the operation fails if that's not possible (due to conflicting dependencies).

Another example:

a
+-- b
| +-- c
| | -- d |-- e
| -- d-- f
`-- d
becomes:

a
+-- b
| +-- c
| -- e +-- f-- d

Reply to this email directly or view it on GitHub.

@juuxstar

This comment has been minimized.

Show comment Hide comment
@juuxstar

juuxstar May 30, 2013

Has there been any progress on this by chance?

I just ran into a problem that directly relates to this: when packages use a singleton pattern if there's two different instances loaded then any changes to one aren't seen in the other. This could perhaps be seen as desired behaviour but in my case, it was not and it took me several days (I kid you not, days) to figure out what the problem was.

Specifically, the package in question (passport) uses a singleton pattern which stores configuration information. My top-level server code requires it and configures it. (providing callbacks, etc..) But another package, passport-socketio (which also uses the passport package to integrate with socketio) was getting it's own copy loaded from it's local node_modules folder. As a result, none of my configuration was making it to this instance of the package which generated very strange errors.

In the end, the solution was simple: delete the local node_modules copy of passport from the passport-socketio package so that it would get the one top-level copy. But this is not at all intuitive.

It would be very useful to be able to visualize in some way (perhaps kinda like npm list) package dependencies that are redundant in a code base (ie. which dependent packages appear further up and thus could be "flattened") Maybe even have the ability to have packages be marked (in package.json?) as having state (for example, singletons) vs pure library packages where it's okay if multiple copies are accidentally loaded.

Has there been any progress on this by chance?

I just ran into a problem that directly relates to this: when packages use a singleton pattern if there's two different instances loaded then any changes to one aren't seen in the other. This could perhaps be seen as desired behaviour but in my case, it was not and it took me several days (I kid you not, days) to figure out what the problem was.

Specifically, the package in question (passport) uses a singleton pattern which stores configuration information. My top-level server code requires it and configures it. (providing callbacks, etc..) But another package, passport-socketio (which also uses the passport package to integrate with socketio) was getting it's own copy loaded from it's local node_modules folder. As a result, none of my configuration was making it to this instance of the package which generated very strange errors.

In the end, the solution was simple: delete the local node_modules copy of passport from the passport-socketio package so that it would get the one top-level copy. But this is not at all intuitive.

It would be very useful to be able to visualize in some way (perhaps kinda like npm list) package dependencies that are redundant in a code base (ie. which dependent packages appear further up and thus could be "flattened") Maybe even have the ability to have packages be marked (in package.json?) as having state (for example, singletons) vs pure library packages where it's okay if multiple copies are accidentally loaded.

@mikeal

This comment has been minimized.

Show comment Hide comment
@mikeal

mikeal May 30, 2013

Contributor

it's called npm dedupe.

Contributor

mikeal commented May 30, 2013

it's called npm dedupe.

@isaacs isaacs closed this May 30, 2013

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment