use async.mapSeries for --slow installs #4553

deoxxa · 2014-01-27T07:04:15Z

This is clearly a ridiculous pull request, and I don't expect it to be accepted. What I would like, however, is to call attention to the fact that during an install operation on a large project, the cpu and memory consumption of npm borders on obscene. I have several projects now that have too many dependencies (about 30 top-level items) to install properly in their production environments, because the environments are allocated according to the resource consumption of the projects themselves and not the installation process.

Rather than installing everything on another system, then moving the resulting files over to the destination system (which is basically what I'm doing now), I'd like it if I could tell npm to just cool it a little bit and maybe not go completely hambone trying to install everything it possibly can at once. This is a proof-of-concept of that idea, in that with this patch applied, and using the --slow install option introduced therein, I am able to perform a successful installation of the aforementioned projects in their production environments.

With this patch applied, the install time for browserify (chosen as a commonly-used project with a lot of dependencies) jumps from about a minute to over 15 minutes, which is obviously unacceptable for most use cases. For my case, it's perfectly fine, because there is a delayed, staggered install process anyway across multiple nodes, and this is just some additional latency in that process getting started.

timoxley · 2014-01-27T07:08:34Z

maybe a flag to set a concurrency limit would be sane, caolan/async has this baked in as mapLimit

domenic · 2014-01-27T07:09:45Z

maybe a flag to set a concurrency limit would be sane

I was going to say, I feel like I've seen such a flag around the npm docs somewhere...

deoxxa · 2014-01-27T07:11:06Z

I had a look at that, but because of the way that the asyncMap calls are nested in install.js, each trip down the dependency tree potentially involves expanding the active operations to 2^depth-1 (as a worst case).

EDIT: actually, that's incorrect. It'd be depth^n I think... Either way, not constant. There'd need to be some kind of a queue with a limited number of active jobs to make it a constant value.

timoxley · 2014-01-27T07:16:32Z

overall I think this is a valid concern for low power devices, e.g. there were issues around this when deploying on ninjablocks.

isaacs · 2014-02-17T01:25:04Z

This is a completely valid concern, but accepting this pull request is rather problematic.

If we decide to do this, I'd like to do it in stages.

Replace all instances of slide-flow-control with async instead. That is, it should be a full replacement, and then we can just never speak of slide again. (If async can't deliver what's needed, then find/write something that can.)
Add a --concurrency config, to set the max concurrency. Default to a relatively high number like 256 or something, since most of the actual "work" is usually waiting on I/O.
Enforce that concurrency level ruthlessly, everywhere. Maybe even use a single global queue, such that we never go above a specific number of concurrent jobs.

Then you can do something like npm config set concurrency 4 on your low-power device, and installs will take longer, but not peg resources so badly.

deoxxa · 2014-02-17T01:30:16Z

Well as I mentioned in the opening sentence, I don't expect this change to actually be accepted. It's more the concept of the problem that I wanted to highlight; I'm not even sure my proposed solution (implementation aside) is the right one.

EDIT: the reason it's a pull request and not an issue is that I wanted to "put my money where my mouth is" as it were.

othiym23 · 2014-10-06T21:56:46Z

🎉 for "go completely hambone". This is a more fully-fleshed version of #6325, and as such I'm going to close that issue in favor of this one.

I think @isaacs has a good summary of an approach we could follow, and I agree that npm's greed for resources makes it scary as a deployment tool, so we do need to do something about this. I personally would prefer to use promises over async (for reasons unrelated to this patch, but entirely related to how I've been spending almost all of my time over the last month – diagnosing and addressing race conditions in caching and installing), but I'm not dead set on that. I'd like to see something like this land soon, because getting rid of all the extraneous locking code has made npm even worse about sucking up all the RAM and bandwidth anywhere near it (although it's also much faster, so there's that).

See also #6411, which is another PR biting off another piece of this. It would be great if we could address this problem in a unified way.

novocaine · 2015-01-27T14:41:15Z

+1; my particular problem is npm smashes my corp proxy and causes it to connection reset

Malet · 2015-01-28T09:56:30Z

+1

othiym23 · 2015-01-31T22:35:29Z

@iarna is this something that we could incorporate into the installer? Does it belong in the caching layer? Maybe in npm-registry-client? We've seen issues similar to what @novocaine describes with customers using private registries in locked-down corporate networks; having a way to exert control over how npm uses the network would be useful.

boutell · 2015-02-10T15:42:52Z

What if, rather than attempting to apply something like async.mapSeries or async.mapLimit when making requests, npm made its calls to fetch remote resources through a proxy function that provided a pool of reusable instances of the direct fetch function, and just queued you until one of those was free?

This isn't much different from the classic "six simultaneous connections, then you queue" request behavior really. You don't have to think about that, it just works (or doesn't work, if it's not what you wanted).

It would seem a lot easier than trying to limit the concurrency of node's algorithms throughout.

othiym23 · 2015-02-11T02:23:16Z

@boutell The way the caching and fetching layer is written prevents this from being a straightforward thing to implement. I'm also not sure it directly addresses the OP's problem without some form of serialization higher up the stack. npm is very Nodelike in its insistence that it DO EVERYTHING RIGHT NOW, and changing that requires some fairly careful redesign in the absence of hacks.

othiym23 · 2015-02-28T03:14:02Z

See #6411 for my thoughts on this, which largely echo @isaacs's above. I would love to see npm become more intelligent about how it uses resources, but I think to do that properly requires it to be baked into npm at the architectural level, instead of done around the edges. This doesn't have to be part of some big huge reimplementation of the core – while I was moving a lot of stuff around when I redid npm's locking code, the actual changes to minimize the locking and inflighting and put it where it was actually necessary were pretty narrow and tailored. This feels like a sketch of a solution, not the solution itself (which I know what you were proposing, @doxxa, so all credit to you for framing it this way).

I guess what I'd like to see is either something like what @isaacs proposes, or maybe a short document describing what resources npm uses, and how to properly limit them to make npm a better intranet / metered internet / CPU bus citizen, before actually putting things in code. Thanks for starting this discussion, and thanks for your time and work, @deoxxa.

lxe · 2015-06-30T05:22:48Z

What's wrong with adding a concurrency parameter/flag/config? It's not npm's problem, but some SSH hosts are configured to block concurrent SSH connections.

This feature is really hard to handle through GIT_SSH config, and would be super-swell if something along the lines of npm_config_ssh_concurrecy flag would be supported.

use async.mapSeries for --slow installs

9e3004e

deoxxa mentioned this pull request Jan 27, 2014

'killed' error when installing connect package #4552

Closed

This was referenced Oct 6, 2014

Maximum number of concurrent connections #3911

Closed

limit number of concurrent fetches during caching / installing #6325

Closed

othiym23 added enterprise feature-request patch-welcome labels Oct 6, 2014

othiym23 added this to the 2.2.0 milestone Oct 6, 2014

othiym23 added the next-minor label Oct 7, 2014

othiym23 removed this from the 2.2.0 milestone Oct 7, 2014

othiym23 force-pushed the master branch from a97a65b to 73ee2aa Compare October 31, 2014 09:27

othiym23 mentioned this pull request Feb 28, 2015

Simple fix to address bug #6325. Added a limit to number of concurrent git commands #6411

Closed

othiym23 closed this Feb 28, 2015

paschelino mentioned this pull request Feb 15, 2016

throttle npm http requests #7200

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

use async.mapSeries for --slow installs #4553

use async.mapSeries for --slow installs #4553

deoxxa commented Jan 27, 2014

timoxley commented Jan 27, 2014

domenic commented Jan 27, 2014

deoxxa commented Jan 27, 2014

timoxley commented Jan 27, 2014

isaacs commented Feb 17, 2014

deoxxa commented Feb 17, 2014

othiym23 commented Oct 6, 2014

novocaine commented Jan 27, 2015

Malet commented Jan 28, 2015

othiym23 commented Jan 31, 2015

boutell commented Feb 10, 2015

othiym23 commented Feb 11, 2015

othiym23 commented Feb 28, 2015

lxe commented Jun 30, 2015

use async.mapSeries for --slow installs #4553

use async.mapSeries for --slow installs #4553

Conversation

deoxxa commented Jan 27, 2014

timoxley commented Jan 27, 2014

domenic commented Jan 27, 2014

deoxxa commented Jan 27, 2014

timoxley commented Jan 27, 2014

isaacs commented Feb 17, 2014

deoxxa commented Feb 17, 2014

othiym23 commented Oct 6, 2014

novocaine commented Jan 27, 2015

Malet commented Jan 28, 2015

othiym23 commented Jan 31, 2015

boutell commented Feb 10, 2015

othiym23 commented Feb 11, 2015

othiym23 commented Feb 28, 2015

lxe commented Jun 30, 2015