Enable automated running of dedupe #4761

bclinkinbeard · 2014-02-23T03:13:37Z

The nested versus flat dependency tree argument is literally the only thing people regularly bring up as a reason to use Bower over npm. The only reply ever given is to use dedupe, but there is no way to automate it. If npm is going to significantly improve adoption for client side use, I think this needs to change. You can't make the argument that people should use a different tool than they currently are while rejecting their top complaint about the tool being offered.

Currently, the primary hurdle to automating dedupe is that running a command like npm install --save foo does not trigger the postinstall hook. (I personally think that should be considered a bug, but that is neither here nor there.) There also seem to be issues with the npm update hook being inconsistent but I've yet to document their exact nature.

What it comes down to is there needs to be a way to run a command whenever the contents of node_modules changes for any reason (as a result of an npm command). This could be used to run dedupe for people worried about such things, or even something like lockdown.

I want npm to "win" as badly as anyone, but refusing to address this issue just seems like a non-starter.

/cc @isaacs @domenic @timoxley @kristoferjoseph @techwraith @robdodson @IgorMinar

The text was updated successfully, but these errors were encountered:

domenic · 2014-02-24T01:54:42Z

This makes sense to me, although I would approach it more from first principles. You are going from the particular to the general in a direction I'm not that excited about. (After all, people can use their favorite watch utility: mywatch node_modules/**/* "$my_command_here" or whatever.)

What I would be more interested in is making sure npm always minimizes duplication. That is, npm install first downloads and analyzes the entire deep dependency graph, then figures out the minimal number of package installs that satisfies that. That would mean making dedupe unnecessary in most cases.

@isaacs has mentioned that there was a conscious choice in the past not to do this, presumably because the two-stage download is inefficient in some cases. I would ask if he wants to reconsider.

Otherwise, I'd suggest making this a config you could set on all package-modifying operations. npm install --dedupe or npm update --dedupe or similar would run npm dedupe afterward, at first approximation.

An alternative, more ambitious idea:

Right now we kind of have the worst of both worlds: it sometimes dedupes automatically, if it can, and it's convenient. But usually not. This inconsistency means you get neither the benefits of complete-deduping, nor the benefits of complete isolation. (I know @defunctzombie is a fan of the latter, and has in fact written npm-install to carry that out.)

It might be nice to make npm do only one of these two modes, I guess controlled by a flag. Which the default would be is unclear to me; I'd prefer dedupe, but isolated would be closer to the current default.

bclinkinbeard · 2014-02-24T02:50:15Z

I like the config flag idea, though that precludes automating the use of something like lockdown. A hook is more general purpose and opens the door to more workflow customizations we may not have considered.

I'm also hesitant to endorse any of the more ambitious suggestions, simply because they seem likely to delay any resolution by quite a bit of time.

domenic · 2014-02-24T02:53:08Z

I really think the general solution for the general problem you describe is to just use a watch utility.

bclinkinbeard · 2014-02-24T03:01:17Z

I think a watch utility on a project of any significant size is going to run into "too many open files" issues and be a regular source of headaches. It's also a barrier to entry of being able to say "just use npm... oh, and set up a watch utility, configure it to do x and y in response to these events, etc.". It will have the same issues hook scripts do now, which is not knowing when an operation has really completed. A watch utility receiving 73 update events and 284 add events is far from useful.

defunctzombie · 2014-02-24T03:06:47Z

Localized dependencies make module maintenance and dependency management wonderful and I never understood why npm dedupes anything at all during install time. This seems like a premature optimization at best and leads to broken, non-deterministic installs at worst (yes, I have faced these issues myself). IMO npm should not do any sort of deduplication at any time and should always install the full tree (note, this does not mean that modules have to be downloaded over and over, we already have caching to solve this). Deduplication should be opt-in and not even something npm needs to do, deferring to external tools.

To answer the comment about bower, I again don't see the issue. People are complaining that the same module version is installed several times? This is a job for the package tool (browserify already handles de-duping the same code which is a far better approach than some sort of arbitrary version de-duplication). The safety net with isolated modules and dependencies allows us to move away from the world of (install this giant framework on your page to use my module) and helps keep modules small, maintainable, and tested. I will also add that I think the notion of some duplicate code in client js is really a small nitpick trying to spread into FUD as to why npm can't work for client side modules. The reality is that it works fine and the few instances where you run into some duplicate code, it doesn't matter! If it does start to matter, you can take a look at your deps and see where you might have issues, but avoiding a great module and dependency system just to save 1k is not gonna make your break your hobby project or your production website.

bclinkinbeard · 2014-02-24T03:28:22Z

For the record, I completely agree with you that the concerns over duplication are overblown. The example always cited is multiple (conflicting?) versions of jQuery, leading to a bundle larger than you ideally want to serve, but I think the chances of that happening, especially without someone noticing, are very slim.

That being said, it is something that comes up a lot and simply providing a hook that can be used seems like a decent way to appease people who feel like it is a reason to avoid npm.

ryanflorence · 2014-02-24T05:11:08Z

As npm gains adoption for client-side packaging, modules will get smaller and iterate faster like those for the server, then we will start to see surprising amounts of JS being delivered to the browser just like the seemingly endless scrolling of npm install.
the jQuery case is real, people who are using jQuery from npm are just more clueful today. However, in the future, if it becomes commonplace, some modules depending on jQuery will unfortunately be tied to a specific version w/o reason, and novice devs won't even know its a problem, and probably won't realize how big their app.js's file is.

I've been screwing around with dedupe on some random projects of mine and have noticed it drops the size of node_modules nearly 75%.

However, it is a compelling case that browserify handles this. Perhaps it is a job for the bundling script, not the package installer.

bclinkinbeard · 2014-02-24T23:24:13Z

LOL, Bower appears to have shipped this feature today. bower/bower#718

timoxley · 2014-02-25T01:41:29Z

@bclinkinbeard I had a quick look into what would be involved in implementing the additional hooks, seems easy enough, maybe start with this, add tests, make names better, add remove hooks, and send a PR – Might be a waste of time but it's easier to discuss concrete implementations rather than hypotheticals.

isaacs · 2014-02-25T02:14:36Z

FWIW: I am ok with deduping in the process of installation. It's basically a rewrite of lib/install.js, which is a sensitive thing to go around rewriting, but it's a rewrite that can certainly be valuable.

It'd be best to approach as a multi-stage process. The first step builds the tree, doing lots of http/git stuff, then it prunes and dedupes, then fetch all the things and lay them down on the file system, then walk through the tree three times to run preinstall/install/postinstall scripts.

The other benefit is that we could conceivably do a bunch of the fs stuff synchronously, and have less locking and other monkey business.

isaacs · 2014-02-25T02:15:02Z

Also, then we could have progress bars, which is pretty much even theoretically impossible with the current approach.

kristoferjoseph · 2014-02-25T02:45:24Z

For me dedupe on install isn't as big of a deal as dedupe on publish. I would rather rest assured that this will be done at publish time.

kristoferjoseph · 2014-02-25T02:46:43Z

Running dedupe and lock at publish time would be ideal for all the use cases I have.
This could also open an opportunity to prompt a module developer to make educated decisions on what they really want bundled with their published library.

domenic · 2014-02-25T02:49:34Z

I don't understand dedupe on publish; published packages don't contain dependencies, so what would that accomplish?

timoxley · 2014-02-25T02:51:38Z

@kristoferjoseph

Running dedupe and lock at publish time would be ideal for all the use cases I have.

You can already do anything at publish time via the prepublish script: https://www.npmjs.org/doc/misc/npm-scripts.html

This could also open an opportunity to prompt a module developer to make educated decisions on what they really want bundled with their published library.

https://www.npmjs.org/doc/misc/npm-developers.html#Keeping-files-out-of-your-package

bclinkinbeard · 2014-02-25T03:11:09Z

@isaacs Does that mean you would prefer not to create a new hook and/or change the conditions under which postinstall runs? Or does integrating dedupe not necessarily preclude those as well? I don't know that I have a preference either way at this point, just curious.

kristoferjoseph · 2014-02-25T03:18:53Z

That will teach me to try and comment on an issue from the train. I conflated the need to educate package publishers with issues bower proponents made for needing to not include multiple versions of a library. I.e. Compiling two versions of jQuery into your bundle etc.
Deduplicating modules by default at install time would be useful for optimizing files downloaded to disk. +1000

stuartpb · 2014-09-16T21:23:43Z

I'd complain that tree-flattening/deduping will lead to problems if I require ("module/node_modules/submodule"), but I guess that would be a natural consequence I'd have to face for not just adding "submodule" to my package's dependencies in the first place. (Might be worth checking how many modules in the npm registry do this atm.)

If a tree-flattening dedupe were made part of installation, I can still kinda foresee issues with people writing code that requires modules that they didn't specify as dependencies, but were coincidentally installed to their module's root as a shared dependency of all of the dependencies they did specify.

(ie. a lists b 0.4.x and c 0.5.x as dependencies, b 0.4.1 and c 0.5.3 both list d 0.3.x as a dependency, dedupe installs d 0.3.7 alongside b and c in node_modules, and then a has require("d") in its code).

Then, later, a conflict introduced by dependency changes in an explicit dependency's patch bump makes that shared junction disappear (eg. b 0.4.2 updates to d 0.6.x, so then d 0.6.8 is installed under node_modules/b and d 0.3.7 is installed under node_modules/c, and node_modules/d is no longer installed)...

This could warrant adding a warning to the core Node require function, something like WARNING: requiring module from node_modules not specified as a dependency in package.json, meaning Node would now have two points where it knows about package.json.

edef1c · 2014-09-16T22:42:24Z

@stuartpb If you require from a node_modules that isn't yours, that's user error. We're not going to support that (in fact, for the Windows tree flattening thing, we're looking for ways to actively break it so people don't come to depend on it)

The issue with tree-flattening is more subtle, but I'd say that's an issue of education mainly.
Don't ls node_modules, npm ls.

Meddling with node's module loader comes with the usual caveat of it being frozen, though node core folk might be more open to adding a warning like that.

stuartpb · 2014-09-17T03:32:26Z

Meddling with node's module loader comes with the usual caveat of it being frozen, though node core folk might be more open to adding a warning like that.

If not, another alternative would be to have a "nanny" bot on the npm registry that scans require calls in all published modules (like browserify) and emails the author with a warning when it finds something wrong like this. (Granted, that means it wouldn't catch problems in unpublished projects, like apps, but it'd still prevent problems like this from finding their way into places where they could spread.)

iarna · 2014-12-12T11:29:31Z

This is going to be implemented under #6912. As such, I'm going to close this ticket and any further discussion should occur over there.

domenic added the i-wanna-be-the-very-best label Feb 24, 2014

bclinkinbeard mentioned this issue Apr 16, 2014

npm link and multiple levels of peerDependencies = psychedelic mess #5080

Closed

isaacs added the multi-stage-install label Aug 11, 2014

iarna mentioned this issue Sep 3, 2014

Don't install duplicated packages #6108

Closed

othiym23 added this to the multi-stage install milestone Sep 16, 2014

jokeyrhyme mentioned this issue Sep 16, 2014

merge bower into npm, why aren't we using npm? bower/bower#1520

Closed

othiym23 removed the multi-stage-install label Sep 16, 2014

othiym23 added the feature-request label Sep 20, 2014

iarna mentioned this issue Dec 12, 2014

npm install should deduplicate its tree as part of the install process #6912

Closed

1 task

iarna closed this as completed Dec 12, 2014

npm locked and limited conversation to collaborators Jun 24, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enable automated running of dedupe #4761

Enable automated running of dedupe #4761

bclinkinbeard commented Feb 23, 2014

domenic commented Feb 24, 2014

bclinkinbeard commented Feb 24, 2014

domenic commented Feb 24, 2014

bclinkinbeard commented Feb 24, 2014

defunctzombie commented Feb 24, 2014

bclinkinbeard commented Feb 24, 2014

ryanflorence commented Feb 24, 2014

bclinkinbeard commented Feb 24, 2014

timoxley commented Feb 25, 2014

isaacs commented Feb 25, 2014

isaacs commented Feb 25, 2014

kristoferjoseph commented Feb 25, 2014

kristoferjoseph commented Feb 25, 2014

domenic commented Feb 25, 2014

timoxley commented Feb 25, 2014

bclinkinbeard commented Feb 25, 2014

kristoferjoseph commented Feb 25, 2014

stuartpb commented Sep 16, 2014

edef1c commented Sep 16, 2014

stuartpb commented Sep 17, 2014

iarna commented Dec 12, 2014

Enable automated running of dedupe #4761

Enable automated running of dedupe #4761

Comments

bclinkinbeard commented Feb 23, 2014

domenic commented Feb 24, 2014

bclinkinbeard commented Feb 24, 2014

domenic commented Feb 24, 2014

bclinkinbeard commented Feb 24, 2014

defunctzombie commented Feb 24, 2014

bclinkinbeard commented Feb 24, 2014

ryanflorence commented Feb 24, 2014

bclinkinbeard commented Feb 24, 2014

timoxley commented Feb 25, 2014

isaacs commented Feb 25, 2014

isaacs commented Feb 25, 2014

kristoferjoseph commented Feb 25, 2014

kristoferjoseph commented Feb 25, 2014

domenic commented Feb 25, 2014

timoxley commented Feb 25, 2014

bclinkinbeard commented Feb 25, 2014

kristoferjoseph commented Feb 25, 2014

stuartpb commented Sep 16, 2014

edef1c commented Sep 16, 2014

stuartpb commented Sep 17, 2014

iarna commented Dec 12, 2014