Skip to content
This repository was archived by the owner on Aug 11, 2022. It is now read-only.
This repository was archived by the owner on Aug 11, 2022. It is now read-only.

investigating npm 3 perf #10380

@samccone

Description

@samccone

npm3 perf 🔍

TL;DR: We've got a network 📉 utilization regression that's at the root of the npm3 install performance. Here's the bird's 🐦 eye view:

How'd we get here? Read on…


After using npm3 for a while now, it is clear that there has been a slowdown in the time to install packages. Although a level of slowdown was to be expected given the new complexity around the flat tree and deduping dependencies, a 2 - 3x (and sometimes worse) slowdown across installs has become somewhat painful across the node community.

Instead of just yelling at the world, I decided to hop in and do some profiling work to better understand what is happening under the hood and observe (via profiling) the cause and perhaps just maybe some ideas to make it fast again. (yay OSS ✨)

Profiling NPM's JS Activity

To start this journey into answering the question of why something is slow, the first thing that I wanted to do was gather some numbers and some metrics. As anyone who has ever done or attempted to try to profile in node.js, you know that upon a basic google search you are immediately thrown into a dystopian world of outdated stackoverflow answers, gists that redirect to stackoverflow questions, that end in WIP, assorted comments buried deep within the threads on github, and enterprise debugging tools :( ….

However if you are persistent you eventually will stumble upon the two magical and useful things. First, the ability to get v8 logs out of node by passing the --prof flag which gives us a dandy v8 trace you can load in chrome://tracing… In our case this is not so useful.

The second thing that you will find is this repo node-inspector/v8-profiler which gives us exactly what we want… a CPU profile of our node process. A small patch then hooks in the instrumentation: npm 3 patch: (samccone@64cc5ca) npm 2 patch (kdzwinel@3978acc)

Profiling NPM's Network Activity

There was one major piece of the perf story that I was missing and that I knew was really important to get, on top of just the cpu profile, and that was the network activity. I had the hypothesis that one of the reasons why the new version of npm got slower was not due to a CPU bound issue #8826 (comment) but rather a change in how the requests to the npm registry were being made. To get a HAR dump of the process I got in touch with @kdzwinel who has been working on just this thing https://github.com/kdzwinel/betwixt . He was able to send over raw HAR dumps from both npm 2 and npm 3 🎉

For my test case I chose to use the package yo https://github.com/yeoman/yo as the example. For both npm2 and npm3 I made sure to install the package with no local npm cache, and also in forced http mode (due to a limitation around the current HAR dumping)

After gathering both of the results I was left with CPU profiles and the HAR dumps for each version of npm installing the same project under the same kind of constraints. It was now time to take a look at the numbers inside of chrome devtools profiler and a handy HAR viewer to try and prove my initial hypothesis about network perf vs cpu bound perf.

First let’s take a look at the CPU profile to see where time is being spent, and see if we can visually see a difference between what is going on in npm2 and npm3.

Results: NPM2 JS Activity

screen shot 2015-11-13 at 8 40 35 am

Results: NPM3 JS Activity

screen shot 2015-11-13 at 8 40 47 am

Well that is interesting, npm3 seems to be a much more sparse flame chart when installing the exact same package (yo), with large gaps in between brief periods of javascript execution. When we zoom in closer to get a look at exactly what is happening between the blank spots we get our first lead

screen shot 2015-11-13 at 8 56 44 am

It looks like each of the spikes in-between the blank space on the flame chart is when npm is handling the result from the registry. This is repeated over and over again until all of the packages have been resolved….

Now this pattern of downloading things, and moving on seems totally 100% reasonable, but still at this point I am not sure why there is a big difference between npm2 and npm3, why are there gaps in the timeline of npm3 and not npm2 what exactly is different?

At this point, I am pretty sure the answer to these questions is all within the HAR dump, so let's take a look.

Results: NPM2 Network Activity

screen shot 2015-11-13 at 9 31 39 am

Results: NPM3 Network Activity

screen shot 2015-11-13 at 9 31 28 am

Right off the bat you notice the difference in how the network requests are being made, instead of groups of parallel requests as in npm2, npm3 seems to be doing its requests in a serial fashion (thus taking around 4x as long), where it waits for previous requests to complete before continuing the chain of requests. I further confirmed this by zooming into the waterfall of requests in npm3 where you can pretty clearly see this exact stair stepping of dependant requests

On the initial request for yo everything lines up, we see a batch of requests for all of the root level dependency meta but after this point npm3 and npm2 diverge significantly

screenshot from 2015-11-13 09 29 48

Further down in the chain of requests we start to see this pattern, where we make a request of a package's metadata and then make a request for the package, and then repeat the process for all of the packages dependencies. This approach of walking through the dependencies really starts to add up because it looks like every batch of requests is dependent of the previous finishing, thus if a single request takes some time (like rx in this case) it pushes the entire request start time of many other packages.

screenshot from 2015-11-13 09 30 41

The end result here, is that we end up with a very sparse dependent network requests that for packages with many dependencies such as yo, is very slow for the end user.


Closing thoughts and ideas.. 💭

npm3 under the hood is operating in a totally new way, and there is a cost for all of the awesome new features such as a deduped tree and a flat tree structure, however there is a pretty significant cost 💰 when it comes to install time. Topically it seems like investigating ways to prebundle on the server and or batch requests to the registry would boost perf significantly, there is a fixed cost per network request that npm is making and just the volume of said requests adds up quickly. Figuring out a way to cut this number would boost performance significantly 🚤 .

The other big takeaway here was how hard it was to get these kind of perf metrics out of node, one day this will be fixed when nodejs/node#2546 lands…

Hopefully this was somewhat helpful and can help others who know the code way better than I look into the causes of these perf hits 🐎 📈 .


special thanks to @paulirish and @kdzwinel 💖

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions