investigating npm 3 perf #10380
Comments
While the above network details are from forced HTTP, I was able to get one chart of insight from npm3 on HTTPS (the default).
This is some excellent analysis! It looks very similar to profiling results I've seen from a previous project. I'd built a crawler in Node which had to crawl a deeply nested tree structure, somewhat like npm does; the code itself is unfortunately proprietary, but perhaps some of the approaches I took would be relevant here. From the looks of the npm3 graph, there's work starvation while waiting for network responses. I'll have to poke around in the code to confirm, but to me this implies that npm is doing a depth-first traversal rather than a breadth-first one. I think it should be possible to model the downloading/tree-walking process as a work queue with a certain amount of concurrency - I used https://github.com/glenjamin/async-pool-cue to achieve this previously. The rough sketch looks something like this:
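A minimal stand-in for the idea being described, assuming hypothetical `fetchMetadata`/`fetchTarball` registry helpers; this is the general shape of a prioritized, concurrency-limited queue, not the async-pool-cue API:

```js
// Concurrency-limited work queue that prefers tasks likely to enqueue more work.
function createQueue(concurrency) {
  const tasks = [];
  let active = 0;

  function next() {
    while (active < concurrency && tasks.length > 0) {
      // Run higher-priority tasks (metadata fetches) before lower-priority ones (tarballs).
      tasks.sort((a, b) => b.priority - a.priority);
      const task = tasks.shift();
      active += 1;
      Promise.resolve()
        .then(task.run)
        .catch((err) => console.error(err))
        .then(() => {
          active -= 1;
          next();
        });
    }
  }

  return {
    push(priority, run) {
      tasks.push({ priority, run });
      next();
    },
  };
}

const queue = createQueue(16); // concurrency limit is a guess, not a tuned value

function resolvePackage(name, range) {
  queue.push(1, async () => {
    const meta = await fetchMetadata(name, range); // hypothetical registry call
    for (const [dep, depRange] of Object.entries(meta.dependencies || {})) {
      resolvePackage(dep, depRange); // enqueue children immediately, breadth-first style
    }
    queue.push(0, () => fetchTarball(meta)); // tarball downloads at lower priority
  });
}
```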
The reason I believe this to be effective is that it prioritizes requests which are more likely to produce more requests, therefore keeping the queue topped up and avoiding work droughts. It might also be beneficial to use https://www.npmjs.com/package/agentkeepalive for the "connection pool", and to maintain a separate pool with a distinct concurrency limit for the tarball downloads.
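As a rough illustration of the separate-pools idea (the socket limits are arbitrary and this is not npm's actual networking code):

```js
const https = require('https');
const { HttpsAgent } = require('agentkeepalive');

// Separate keep-alive pools so slow tarball downloads can't starve metadata fetches.
const metadataAgent = new HttpsAgent({ maxSockets: 16 });
const tarballAgent = new HttpsAgent({ maxSockets: 8 });

// Example metadata fetch using the metadata pool.
https.get(
  { host: 'registry.npmjs.org', path: '/yo', agent: metadataAgent },
  (res) => {
    let body = '';
    res.setEncoding('utf8');
    res.on('data', (chunk) => (body += chunk));
    res.on('end', () => console.log(JSON.parse(body).name));
  }
);
```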
You can explore the HAR files that Sam refers to here (give it a sec to load the timeline):
Thank you for the thorough and detailed investigation, @samccone (and for the HTTPS followup, @paulirish - I'm glad to have so much of Google's help here). I agree that this information is still frustratingly hard to pull together, and I'm impressed by the determination that it took to get this far.

Background & context

One of the major goals @iarna and I (and @isaacs, who was the original impetus for the installer rewrite) had for npm@3 was an installer whose operation is easier to follow and reason about.
This is because it was very hard to follow the operation of the installer before, even if you were intimately familiar with the code (there are maybe a handful of people who can claim that familiarity, and I'm not sure I can count myself among them). Because much (if not most) of the functionality of the installer accreted organically over time, many of its features were bolted on wherever they would fit rather than designed in.

You'll note that "deduping by default" and "flattening the tree" aren't in the list above - @iarna noticed while she was doing exploratory work (a year ago; software is hard) that these would be easy features to add, and would greatly improve the usefulness of npm for both Windows and front-end developers. More significantly, they don't have a really major impact on performance - the bulk of the work is in resolving dependencies, and in building up the two trees in a form such that they can be used to produce the final order of execution for the install. So the changes to performance characteristics seen here are not due to those features, but to the deeper redesign of the install process.

Everyone involved in this project knew that there was a certain amount of slowdown inherent in this work. There are two primary things that are inherently slower in npm@3.
One last consideration to keep in mind is that we're starting to see significant problems with installs failing due to network issues. This happens regardless of the npm version, with only loose correlation to the underlying Node version. It's particularly bad for users on Windows, users with less robust broadband connections or cheap DSL routers, and users of containerization software. While we haven't isolated the cause with total certainty, right now the finger seems to point at Node and npm's propensity to open many, many network connections at once and hammer out tremendous numbers of requests in short order. One of the consequences of the small-module philosophy is that the amount of work being requested at the network level has grown tremendously over the last couple of years, which makes the caching process much more brittle. It also magnifies the effects of whatever latency may exist between the installing process and whatever registries or git hosts npm may be talking to.

Where we are now

So, with the above in mind, and looking at @samccone's results, I'm left in a bit of a quandary, mostly because I don't see obvious or simple fixes. We do have a number of ideas we're mulling over, but none of them are the sort of thing that we can do as a quick patch.

Minimize round trip latency to the registry

According to @seldo, request latency for simple metadata fetches to the registry averages ~500ms at present. This means that even if npm's cache is just doing a freshness check (i.e. the request will end up being logged as a 304 and the cached metadata & tarball will be used for the install), there's a 500ms wait for each package involved in the install. @soldair (with some help from @chrisdickinson) is looking into this, and @seldo seems confident that we can cut this number in half. I think that will have a pretty significant impact.

Cache more metadata

One of the trickiest pieces of the install process is that packages need to be unpacked and their package.json files read before their dependencies can be resolved. The installer also needs to scan the package tarballs themselves for information that isn't available from the registry metadata alone. Were we to cache a serialized dump of what the installer learns from that work, subsequent installs could skip much of it.

Push more of the tree-building work onto the registry

This one could have a very significant impact, but it's trickier than it looks (for reasons very similar to the complications mentioned above). We've long had hopes that SPDY or HTTP2 could, along with some reworking of the npm cache APIs, greatly reduce the number of round trips necessary to produce a finished install. That said, this implies a much more sophisticated API between the CLI and the registry than currently exists, and as such is going to take a while, so that both the CLI and registry teams have the time and resources available to build it out.

Be smarter about how git is used

This is an important piece, and probably too large for this issue, but it's worth keeping in mind that the way git-hosted dependencies are downloaded and cached is very different from packages hosted on a registry, and is largely done by the Git toolchain itself, over which npm has very little control.

Replace npm's download code

Although it may not look like it, a fair amount of attention has already been given to how npm downloads packages. The networking code is some of the most delicate and important in the CLI, because it has to work in an astonishing array of pretty adverse circumstances.
It needs to work under Node all the way back to 0.8. It has to support a wide variety of proxies (and is still barely adequate in many Windows environments, given the lack of built-in NTLM or Kerberos support) and custom SSL configuration for same, and deal with lots of fairly dicey corners of the internet (or, as @sebmck has recently noted, with the horrors of hotel or conference wifi). It needs to take advantage of HTTP Keep-Alive whenever possible. It needs to deal with connection resets and timeouts and other failures with some degree of resiliency. Not to put too fine a point on it, Node's built-in HTTP client code is not nearly as high quality as its server implementation.

I think npm has reached a point where it needs its own download manager, one that is optimized for its peculiar and somewhat extreme needs. I have a really rough cut at a spec for what that thing might look like, but it's early days for that still, and I don't want to understate the difficulty or complexity of that work. This work needs to serve two masters: reducing the frequency with which installs fail due to excessive network traffic, and also ensuring that downloads happen as quickly as possible once they've started.

What's next?
How you can help
Thanks again, everybody! I apologize for the length of this, but I want to tweet this around and get more eyes on it, and I've had a lot of these thoughts in my head for a while without having them written down in one place. I hope you find the detail more useful than distracting.
I've made a flamegraph of running an uncached, un-shrinkwrapped install. I notice a lot of idle time (waiting on network/IO?), alongside a huge chunk of busy time. It makes sense that the network requests won't be parallelized ideally in an unshrinkwrapped project, but I'm curious how much other things, like tarball unzipping, contribute. SVG and cpuprofile source here:
I imagine this would require a significant amount of work, but one idea that would significantly improve performance, and reduce the number of requests npm needs to make when installing things, would be to do a lot more work on the server side instead of on the client. For example, when requesting a module, the server could bundle the module and its entire dependency tree into a single tarball. This way, the npm client would only need to request the modules at the root level, and would get the rest of the tree for free. Going a step further, when requesting multiple modules, the requests could be batched so that only a single tarball of the entire dependency tree (pre-deduped by the server) need be downloaded. Doing this on the server side instead of on the client has several benefits. It would be faster (since the data is local to the server) and require fewer HTTP requests, for one. For another, it would be much easier to add future optimizations on top of. For example, the npm server could pre-render common install trees for popular modules into static tarballs that could be hosted on a CDN, requiring no bundling work each time they are installed.
@devongovett The issue with this approach is that modules allow for version ranges. Multiple modules may ask for the same dependency. One might ask for "4.3.1" exactly, and another might ask for ">= 4.0.0" -- In this case, npm.com can't know which to provide. The only other way would be to ask npm.com to fulfill the entire dependency tree for a whole project. Further complicating this, npm tries to do the right thing in the case that you already have some but not all of the required packages installed locally. |
The client would receive tarballs, each containing a root dependency and all of its sub-dependencies. Then it would merge the trees. It could receive multiple copies of the same module (different versions, or even the same version) due to shared dependencies, but it could dedupe them on the client side while merging the trees. While this might seem like a waste, my guess is that it would be faster to download a few extra bytes (most modules aren't that large) than to make a million HTTP requests from the client in series, especially if some of the popular modules were pre-rendered and cached on a CDN. For ranges, the server would bundle the latest version matching that range in the tarball (the same as if you ran the install directly). If batching were supported, the deduping could happen on the server side as well. This is just an idea at this point, and would require further experimentation to determine actual viability.
The network usage graphs point to a case where a number of TCP connections are fighting with one another. Rather than fundamentally reworking the sending strategy or revising the download code, the new HTTP spec offers a number of techniques that allow efficient parallel transfers over a single TCP connection. Simply switching to HTTP/2 is a clear, very low-coupling change (no need to restructure how npm works) that ought to yield a lot of progress, and adding some basic prioritization on top of that is a simple refinement. Both should be investigated well before any restructuring work is attempted, to get a baseline of what HTTP/2 is capable of on a widely parallel problem.
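For a sense of what multiplexing looks like in practice, here's a minimal sketch using Node's built-in http2 module (which did not exist in the Node versions npm had to support at the time); registry.example.com stands in for a hypothetical HTTP/2-capable registry, so this is purely illustrative:

```js
const http2 = require('http2');

// One TCP connection, many concurrent streams.
const session = http2.connect('https://registry.example.com'); // hypothetical HTTP/2 registry

function fetchJSON(path) {
  return new Promise((resolve, reject) => {
    const req = session.request({ ':path': path });
    let body = '';
    req.setEncoding('utf8');
    req.on('data', (chunk) => { body += chunk; });
    req.on('end', () => resolve(JSON.parse(body)));
    req.on('error', reject);
    req.end();
  });
}

// These requests share a single connection instead of fighting over many sockets.
Promise.all(['/yo', '/lodash', '/rx'].map(fetchJSON)).then((docs) => {
  console.log(docs.map((d) => d.name));
  session.close();
});
```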
"Simply switching to HTTP 2" is not something that can be done right now:
HTTP 2 offers a lot of possibilities to improve network performance and download reliability, but it's not a simple, drop-in change, nor does it address the architectural serialization issues I discussed in my response. It does bring with it a whole host of interoperability challenges and opportunities for new sources of fragility and regressions. It's the future, but it's going to take some sustained effort to get there. |
How about npm keeping a local registry to do lookups in, only going to the server when there is no local match or on npm update? This is how a lot of other performant package managers work.
That's what the npm cache already does, to a degree.

Because there's a wide variety of registry software in use, and because the semantics of the registry API are ill-defined, the CLI can't make too many assumptions. For instance, it can't assume that a package at a given version is never going to change on the registry side, even though package versions are immutable on npm's own registry. That's why the CLI's caching code makes a request with a cached etag and lets the registry respond with a 304 when nothing has changed.

One thing that some package managers do to speed things up for local operations is to store basic information about every available package locally. This is what apt, rubygems, and FreeBSD's pkg and ports systems do. It works well up to a point, but it doesn't scale well to hundreds of thousands of packages in an environment with the level of activity that npm sees. Just the metadata is a couple of gigabytes, and it changes continuously. It's possible to follow that stream (mostly) in (mostly) real time, but it's not something you're going to want running on your laptop.

Finally, it is possible to run a registry mirror locally, and there are several packages (including npm On-Site) that will make it relatively easy to run and maintain a mirror. That said, it's still not a trivial undertaking, and the performance gains are less significant than the architectural changes I discuss above.
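For reference, a minimal sketch of the freshness-check pattern described here: a conditional GET with the cached etag, expecting a 304 when the cached document is still good. It's written with the `request` package for brevity, and reflects standard HTTP caching behavior rather than npm's internal caching code:

```js
const request = require('request');

function checkFreshness(name, cachedEtag, cachedBody, cb) {
  request({
    url: 'https://registry.npmjs.org/' + name,
    headers: { 'if-none-match': cachedEtag },
    json: true,
  }, (err, res, body) => {
    if (err) return cb(err);
    if (res.statusCode === 304) return cb(null, cachedBody); // cache still fresh
    cb(null, body);                                          // registry had a newer document
  });
}
```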
Would it be possible to special-case this for the npm registry? Or better yet add a header or something that specifies that package versions are immutable? |
Maybe it should assume that anyway? Package maintainers will have to learn to bump versions. This can't be true for git dependencies, but for packages in the registry I can't see why not.
500ms to download one tarball? At the risk of sounding naive... Would it make sense to stick extracted package.json's (and shrinkwraps et al) on a CDN and be done with it? |
Because people can publish an already-existing version to the registry.
cdn for |
No, they can't - that was disabled over a year ago, wasn't it? You can only unpublish, not overwrite. But even so: stick the new package.json on a CDN and download tarballs later.
Isn't it the same from the perspective being discussed? Package versions are not immutable.
@iamstarkov that's trivial to solve by having a cached "all versions for this package" file, with a narrowed-down "all minor versions for this major version" file. On a CDN.
@dominykas @iamstarkov package JSONs are already served from the cache 99% of the time; the 500ms latency is primarily the result of the average size of the data, which these days is pretty big, especially for old packages with lots of versions. There's a lot of low-hanging fruit here, so we think we can bring that number down rapidly. Look for results on this front before the end of the year.
@othiym23 thank you mucho for the in-depth response and thoughts! On the possible actions to take…
In the above HARs, and other recordings, I haven't seen network latency as a significant factor. Some of the requests in the HAR manage to round-trip in 30ms. So basically, I'm seeing CDN performance as great and overall npm registry response time as good enough.
Yup, I think this could have a pretty big impact, but I totally understand that it's not only a large engineering effort but likely also has a dramatic impact on resource requirements. Huge potential here, but I think it isn't worth exploring just yet.
Based on the above results, I do think HTTP2 gives you the biggest bang for your buck.
Would HTTP2 be a huge gain vs better keepalives + pipelining? AIUI it would mostly add header compression? The npm3 network graph looks to me more like a case of work starvation than the network being a bottleneck - the requests seem quite short but are spread throughout the process' run time. |
@seldo damn it, I knew it... I'm not the only smart person in the world... good luck. |
Doing this would introduce as many complications as it would remove, unfortunately. Also, remember that
Yes. This is why reducing latency and caching more metadata could be big wins, and why I'm comfortable waiting until both the CLI and registry teams have the free time necessary to make the registry and CLI use HTTP 2 effectively. I really can't stress enough that moving to HTTP 2 is a nontrivial task - our CDN, which pretty much makes npm's current scale thinkable, still doesn't do IPv6, and that's a rollout that's been underway for, what, 10 years now? Also, coming up with effective logic for things like figuring out which versions of dependencies to bundle, and how to tell the server what the client already has, is eminently doable but full of gotchas and corner cases of its own. It's dangerous to optimize the network flow too heavily towards either end of the connectivity spectrum. Optimize for a fast pipe by bundling everything into big wads, and you're going to make life miserable for people on slow or intermittent connections. Optimize for making the payloads as finely-grained and concurrent as possible (which is more or less what npm does today), and you run into exactly the kinds of problems this thread is describing.
People are welcome to give this a shot - most of the network logic is confined to
Well put, @wraithan. I don't foresee a model where the CLI yields control over telling the registry what it needs, both because of what you've outlined, and because only a client can know what's already in its cache. |
@othiym23 I feel the cache problem is easily solved (which probably means I'm not understanding some complexity) by not having the server return all of the tarballs directly, but instead just returning the recommended tree (maybe even with tolerable semver ranges). The CLI could then determine which tarballs it still needs and fire off requests for them. Theoretically the tree could include hashes so the client wouldn't need to do freshness checks, but then you're bloating the structure and maybe it becomes less worth it again. Multiple registries still make this pretty complex, though, and possibly not worth it. The trouble with a team of high-quality engineers working on the CLI and registry is that those gut-instinct optimizations have probably already been at least partially thought out, if not completely. I'll keep churning this problem in my head, though, and see if I can come up with anything.
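To make the idea concrete, here's a purely hypothetical shape for such a "recommended tree" response; none of these fields correspond to a real registry API:

```js
// Hypothetical server-computed tree: enough for the client to diff against its cache
// and request only the tarballs it is missing.
const recommendedTree = {
  name: 'yo',
  version: '1.5.0',
  integrity: 'sha1-...',           // would let the client skip a freshness check
  tarball: 'https://registry.example.com/yo/-/yo-1.5.0.tgz',
  dependencies: [
    {
      name: 'lodash',
      version: '3.10.1',
      range: '^3.2.0',             // the tolerable semver range the server matched
      integrity: 'sha1-...',
      tarball: 'https://registry.example.com/lodash/-/lodash-3.10.1.tgz',
      dependencies: [],
    },
  ],
};
```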
I don't know about the inner workings of npm, but isn't it possible to do optimistic prefetching of commonly used libraries like lodash? |
@othiym23 Recommend reading this blog while moving to HTTP/2 http://engineering.khanacademy.org/posts/js-packaging-http2.htm |
Is it possible for the CLI to get a package's package.json (and the like) without downloading the whole tarball? As far as I understand, there is already some way to do that for README.md, to show it on the npm package's page.
@iamstarkov isn't that what's happening already? When you go into |
@Qix- dunno, that's why I'm asking.
@iamstarkov if not, that seems like it should be the first thing to happen. Is there a diagram anywhere that outlines exactly what happens during NPM's operations? That would be incredibly useful. |
/CC @IgorMinar |
Awesome work, @samccone. This is super cool! Angular 2 is migrating to npm3 in a week or two, so this is actually great timing for us. Thank you!
If decompressing brotli is also fast enough…
...yeah, this seems hard. In my naive thinking:

Server: If nobody used semver then it would be a lot easier to construct and store a dependency tree for each package on the server, and you could send down all that info in the first request so that the CLI could then sort out the ideal tree and fetch packages as quickly as possible. But semver means that the set of possible trees branches out crazily as you work back through each level... which I imagine means that you more or less have to do the entire ideal-tree-building algorithm on the server, or use heuristics for what the ideal tree is and have the client fix things up later (which could be pretty slow and in general would make things harder to understand, which is itself a problem).

Client: Storing all dependency information on the client sounds good, but yeah, I imagine it's impractical at this scale, and you'd have to update everything from the main registry before any install anyway.

So it seems hard to think of something that clearly helps.
Apart from suggestions like HTTP2, the only thing I can think of off the top of my head is a kind of… Just my (probably misguided) thoughts.
Some questions:
Or is the problem that the cache cannot be reused in case someone switches registry between installs? I suppose this could be fixed by storing the origin of a tarball in the cache.
I feel like we've gotten far afield here. This thread was started in relation to a performance regression that appeared between npm@2 and npm@3. Many of the proposed solutions appear to be targeting bottlenecks that would (in theory) equally affect npm2. For now, I think that we need to focus on profiling, and determining why npm@3 is so much more prone to work starvation. Also, remember again that any radical changes to NPM's hosting infrastructure (particularly, anything that would break NPM's ability to host almost everything as static files on a CDN) are going to be a non-starter. |
Also see my replies in November about why server-built trees would be much harder than they appear.
This was on HN today: Disabling npm's progress bar yields a 2x npm install speed improvement |
That bug was fixed later that same day (iarna/gauge@a7ab9c9), but it is not the focus of this issue.
I've been doing a bit of work trying to trace the code path followed by a typical install in npm3 and representing it in a MindMap diagram as very rough pseudocode. I'll do the same for npm2 and, if it seems useful, will post links to them here. If nothing else, by giving people a more concrete idea of what's happening where and when, it could hopefully help people (myself included) understand which proposed solutions are viable, which aren't, and why things are the way they are at the moment.
This has been an illuminating and useful discussion, but it reached a point of diminishing returns a while ago, and so I'm going to close and lock it in the hopes that folks will start more focused and actionable discussions about specific aspects of npm's performance, similar to #11283. For those who have been following this thread, here's the CLI team's current road map:
1-4 are important but unglamorous technical debt to be paid, and the faster the team can get through those, the faster we'll be able to start working on performance in a serious way. Help with any of those (especially anything marked as such) is appreciated.

Thanks again to @samccone for his detailed analysis, and to everyone for their participation in this discussion.
npm3 perf
TL;DR: We've got a network utilization regression at the root of npm3's install performance problems. Here's the bird's-eye view:
How'd we get here? Read on…
After using npm3 for a while now, it is clear that there has been a slowdown in the time it takes to install packages. Although some slowdown was to be expected given the new complexity around the flat tree and deduped dependencies, a 2-3x (and sometimes worse) slowdown across installs has become somewhat painful across the Node community.
Instead of just yelling at the world, I decided to hop in and do some profiling work to better understand what is happening under the hood, observe (via profiling) the cause, and maybe even come up with some ideas to make it fast again. (yay OSS ✨)
Profiling NPM's JS Activity
To start this journey into answering the question of why something is slow, the first thing I wanted to do was gather some numbers and metrics. As anyone who has ever profiled (or attempted to profile) Node.js knows, a basic Google search immediately throws you into a dystopian world of outdated Stack Overflow answers, gists that redirect to Stack Overflow questions that end in WIP, assorted comments buried deep within GitHub threads, and enterprise debugging tools :( …
However, if you are persistent, you will eventually stumble upon two magical and useful things. First, the ability to get V8 logs out of Node by passing the `--prof` flag, which gives us a dandy V8 trace you can load in `chrome://tracing`… in our case this is not so useful. The second thing you will find is the node-inspector/v8-profiler repo, which gives us exactly what we want: a CPU profile of our Node process. A small patch then hooks in the instrumentation: npm 3 patch (samccone@64cc5ca), npm 2 patch (kdzwinel@3978acc).
Profiling NPM's Network Activity
There was one major piece of the perf story that I was missing and knew was really important to get on top of the CPU profile: the network activity. My hypothesis was that one of the reasons the new version of npm got slower was not a CPU-bound issue (#8826 (comment)) but rather a change in how the requests to the npm registry were being made. To get a HAR dump of the process I got in touch with @kdzwinel, who has been working on just this thing: https://github.com/kdzwinel/betwixt. He was able to send over raw HAR dumps from both npm 2 and npm 3.
For my test case I chose to use the package yo https://github.com/yeoman/yo as the example. For both npm2 and npm3 I made sure to install the package with no local npm cache, and also in forced http mode (due to a limitation around the current HAR dumping)
After gathering both of the results I was left with CPU profiles and the HAR dumps for each version of npm installing the same project under the same kind of constraints. It was now time to take a look at the numbers inside of chrome devtools profiler and a handy HAR viewer to try and prove my initial hypothesis about network perf vs cpu bound perf.
First let's take a look at the CPU profile to see where time is being spent, and see if we can visually see a difference between what is going on in npm2 and npm3.
Results: NPM2 JS Activity
Results: NPM3 JS Activity
Well, that is interesting: npm3 produces a much sparser flame chart when installing the exact same package (yo), with large gaps between brief periods of JavaScript execution. When we zoom in closer to see exactly what is happening between the blank spots, we get our first lead.
It looks like each of the spikes in between the blank spaces on the flame chart is npm handling a result from the registry. This is repeated over and over again until all of the packages have been resolved…
Now, this pattern of downloading things and moving on seems totally reasonable, but at this point I am still not sure why there is such a big difference between npm2 and npm3. Why are there gaps in the timeline of npm3 and not npm2? What exactly is different?
At this point, I am pretty sure the answer to these questions is all within the HAR dump, so let's take a look.
Results: NPM2 Network Activity
Results: NPM3 Network Activity
Right off the bat you notice the difference in how the network requests are being made: instead of groups of parallel requests as in npm2, npm3 seems to be doing its requests in a serial fashion (thus taking around 4x as long), where it waits for previous requests to complete before continuing the chain of requests. I further confirmed this by zooming into the waterfall of requests in npm3, where you can pretty clearly see this exact stair-stepping of dependent requests.
On the initial request for yo everything lines up: we see a batch of requests for all of the root-level dependency metadata, but after this point npm3 and npm2 diverge significantly.
Further down the chain of requests we start to see this pattern, where we make a request for a package's metadata, then make a request for the package itself, and then repeat the process for all of the package's dependencies. This approach of walking through the dependencies really starts to add up, because every batch of requests appears to be dependent on the previous one finishing; if a single request takes some time (like rx in this case), it pushes back the start time of requests for many other packages.
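To make the difference between the two traversal shapes concrete, here's a rough illustration (not npm's actual code; `fetchMetadata` is a hypothetical registry call that resolves to a metadata document):

```js
// Stair-stepping, npm3-style: each level waits on the previous request.
async function resolveSerially(name) {
  const meta = await fetchMetadata(name);
  for (const dep of Object.keys(meta.dependencies || {})) {
    await resolveSerially(dep); // the next request can't start until this one finishes
  }
}

// Batched, npm2-style: all dependencies at a level are requested in parallel.
async function resolveInParallel(name) {
  const meta = await fetchMetadata(name);
  await Promise.all(
    Object.keys(meta.dependencies || {}).map(resolveInParallel)
  );
}
```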
The end result is a very sparse chain of dependent network requests which, for packages with many dependencies such as yo, is very slow for the end user.
Closing thoughts and ideas…
Under the hood, npm3 operates in a totally new way, and the awesome new features such as the deduped, flattened tree come at a pretty significant cost in install time. On the surface, it seems like investigating ways to prebundle on the server and/or batch requests to the registry would boost perf significantly: there is a fixed cost per network request that npm makes, and the sheer volume of those requests adds up quickly. Figuring out a way to cut that number down would be a big win.
The other big takeaway here was how hard it is to get these kinds of perf metrics out of Node; one day this will be fixed when nodejs/node#2546 lands…
Hopefully this was somewhat helpful and can help others who know the code far better than I do look into the causes of these perf hits.
Special thanks to @paulirish and @kdzwinel