How to combat npm downtime as a community #4131

Closed
danjenkins opened this Issue · 17 comments

8 participants

Dan Jenkins Jarrod Overson Alex Kocharin Raynos (Jake Verbaten) Brian McKenna Demetri Mouratis Tim Oxley Alan Plum
Dan Jenkins

I wrote up a post yesterday on the Node.js Google group, but it seems there's a better chance of a more in-depth response by posting it here.

It asks what can be done about npm's downtime, how we can make replication easier, and, if I wanted to contribute a public mirror, what could be done to make public mirrors easier to find.

I'm looking for actions. There's been a lot of talk about what could be done now and what could be done in the future within npm itself (such as artifacts on a CDN).

I can't see much public information on all of this, and what is out there is just conjecture

We all rely on npm because it's awesome, so let's help sort out the issue and come up with an action plan that people can contribute to.

https://groups.google.com/d/msg/nodejs/oqefb3EYKRc/Vax-qCRT6rkJ

Jarrod Overson

+1 to making it simpler to mirror and replicate and to have mirrors be seamlessly incorporated into the npm bin so that fallbacks are used automatically.

There are plenty of companies and institutions who have the hardware and bandwidth to help with this and it is going to be more important as time goes on.

Dan Jenkins

So there's now scalenpm.org, which is asking for funds to make the hosting on nodejitsu better, etc. I fear this is heading down a path that it shouldn't and the solution is actually distributed, instead of npm just living on nodejitsu...

Alex Kocharin

> I fear this is heading down a path that it shouldn't and the solution is actually distributed, instead of npm just living on nodejitsu

Agreed. There is already a mirror at http://npmjs.eu/, and I hope a few others are coming. I'm a bit concerned about security, but otherwise it's the way to go.

Raynos (Jake Verbaten)

We need a list of public mirrors.

and we need things like alias npm2='npm --registry known_good_mirror' all over twitter once npm goes down.

This solves the installation issue. At our company we have an internal mirror that I use whenever npm is down

When npm goes down we can't publish, because it's centralized. For decentralized publishing, work on npmd.

Alex Kocharin

> We need a list of public mirrors.

Yeah okay, let's build it. I know npmjs.eu, any others?

Raynos (Jake Verbaten)

There is one in Australia. We should probably have a page on npmjs.org itself. cc @isaacs

Raynos (Jake Verbaten)

@rvagg do you know what the Australian mirror is?

Demetri Mouratis

> We need a list of public mirrors.

Yes.

> and we need things like alias npm2='npm --registry known_good_mirror' all over twitter once npm goes down.

No. npm itself needs to manage this dynamically. This is the fundamentally broken piece of npm as it exists now. See, e.g., every other package manager.

> This solves the installation issue. At our company we have an internal mirror that I use whenever npm is down

Right, but how do you tell your npm clients that the main npm registry is broken/fixed?

Raynos (Jake Verbaten)

@dmourati agreed, it would be better if npm can manage multiple cascading registries.

Although personally I prefer to manually switch between npm and npm2 so I always know exactly where a module comes from and from which registry. So we just need a list of mirrors.
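A cascading lookup does not have to hide where a module came from. The sketch below tries registries in order and returns the one that actually answered, so the origin stays visible. It is only an illustration: `installWithFallback` and the caller-supplied `fetchFrom(registry, name)` are hypothetical names, not part of npm.

```javascript
// Hypothetical sketch: try each registry in order and report which one answered.
// `fetchFrom(registry, name)` is an assumed caller-supplied async function that
// resolves with package data or rejects when the registry is unreachable.
async function installWithFallback(registries, name, fetchFrom) {
  const errors = [];
  for (const registry of registries) {
    try {
      const data = await fetchFrom(registry, name);
      // Return the source alongside the data so the origin is always known.
      return { registry, data };
    } catch (err) {
      // Remember why this registry failed, then move on to the next one.
      errors.push(`${registry}: ${err.message}`);
    }
  }
  throw new Error(`all registries failed for ${name}: ${errors.join('; ')}`);
}
```

A client built this way could print the returned `registry` on every install, which keeps the transparency of switching by hand between npm and npm2.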

Tim Oxley
Collaborator

Related: deoxxa/npmrc#2

Dan Jenkins

My issue with all of this is that you have to do it all yourself. I'd love for npm to just have some of this built in, so it's there for everyone who has node. There are already other projects out there like npmd, along with any number of others doing nearly the same thing. I'm not saying these aren't good things; they are, and it's great that people want to make things better and do it in a modular way. But I'd still love for some of this love to get put back into the npm that everyone uses.

Alan Plum

I've done a quick survey of the most prominent package managers. This seems to be a hard problem with no right answer:

  • bower mostly relies on GitHub directly, but there is apparently a way to make it use a specific mirror instead of the main repository.

  • jam allows specifying a list of mirrors to use instead of the main repository in its configuration file, but it also has a command-line switch to use a specific mirror.

  • component.js simply uses GitHub URLs directly for its dependencies.

  • Python's PyPI used to rely on sub-domains forwarding to mirrors (with an additional auto-discovery subdomain) but recently replaced the official mirror list with a CDN, citing security (can't issue SSL certs for third-party subdomains) and staleness (can't guarantee mirrors will be up to date) as major concerns.

  • Ruby's rubygems.org seems not to have any official mirror list, but there seems to be a way to delegate individual dependencies to third party sources in the Gemfile format. The SOP seems to be to host your own private mirror if you need the availability, but the official repository seems to use Cachefly (with the actual files being hosted on S3).

  • Perl's CPAN has an official mirror list that also tracks stats including the update frequency, the most recent update, uptime, etc. The client apparently downloads a list of mirrors the first time it is used and then uses that list to automatically pick a mirror each time it is used.

  • PHP's packagist.org seems not to have any official mirrors, but the composer.json format apparently has an elaborate way of defining multiple sources and ways to deal with them (supporting http(s), ssh and several vcses).

  • Java's maven apparently follows the same approach as packagist, except it explicitly lets you define which mirror to use for which repository id (having no experience with maven, I guess this is similar to how apt repositories are structured).

  • C#'s (or .NET's) NuGet again does not seem to have any official mirrors, but can be told to use a specific mirror when it is used from the shell. Apparently Visual Studio also has some way to maintain a list of third-party sources (i.e. mirrors).

In a nutshell, the following approaches seem to have some merit:

  1. Maintaining an official list of mirrors

    This makes it trivial to find a fallback in case the main repo goes down, but needs some curation to make sure the mirrors are up-to-date and trustworthy (I don't know whether npm guarantees integrity yet, but this is where automated tamper-detection would be useful).

    This is also a prerequisite for the following:

    1.1. Automatic fail-over to mirrors

    This would allow npm to fail over transparently in case the main repo goes down, without any intervention from the user. If this is a security concern, the behaviour could be toggled with a flag.

    1.2. Keep a local copy of the mirror list

    In order to fail over when both the main repo and the official list are down (although the probability of this happening could be reduced by putting a CDN in front of the official list), npm needs to know where to look. This local list would need to be updated automatically, but it would eliminate the need for on-demand autodiscovery.

  2. Put the main repo on a CDN

    Except for publishing, the repo can be mostly read-only. Having a CDN in front of it could increase availability as well as reduce latency. Of course this still requires some redundancy as Amazon's outages show. In any case, this would not eliminate the need for mirrors, but it would significantly reduce it.

  3. Allow specifying a mirror explicitly

    @Raynos already addressed this with his npm2 alias example. Having this as a command-line option as well as configurable through npmrc takes care of the most common use cases for private mirrors.

  4. Have a sources or repositories field in package.json or npmrc

    The biggest question would probably be how to structure the field; maybe composer has the right idea. We probably don't want the package.json of a package published to npm to specify external repositories the way rubygems apparently allows it, so this should probably be a setting in npmrc, effectively overriding the mirror list above.
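Approaches 1, 1.2 and 3 could combine roughly as follows: an explicit override wins, otherwise the main repo is tried first, followed by mirrors from a locally cached list with stale entries dropped. This is a minimal sketch under stated assumptions. The entry shape (`url`, `lastSync`), the 24-hour staleness threshold and the function name `failoverOrder` are all hypothetical, and nothing here reflects how npm behaves.

```javascript
// Hypothetical sketch: order registries for fail-over from a cached mirror list.
// The entry shape ({ url, lastSync }) and the 24-hour threshold are assumptions.
const MAX_AGE_MS = 24 * 60 * 60 * 1000;

function failoverOrder(primaryUrl, cachedMirrors, now, override) {
  // An explicit override (command line or npmrc) wins over the mirror list.
  if (override) return [override];
  const fresh = cachedMirrors
    .filter((m) => now - m.lastSync <= MAX_AGE_MS) // drop stale mirrors
    .sort((a, b) => b.lastSync - a.lastSync)       // most recently synced first
    .map((m) => m.url);
  // The main repo is always tried first; mirrors are only fallbacks.
  return [primaryUrl, ...fresh];
}
```

Passing `now` in as a parameter rather than calling `Date.now()` inside keeps the function deterministic, which makes the staleness rule easy to check in isolation.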

Demetri Mouratis

Nice progress. @pluma has the right outline. I should have been more explicit and said OS package managers: yum/apt. In my experience, both do the right things.

http://wiki.centos.org/PackageManagement/Yum/FastestMirror

http://mirrors.ubuntu.com/

Alan Plum

@dmourati as far as I can tell yum and apt both basically just use a mirror list with automatic failover. Any idea how they deal with the problem of stale or malicious/compromised mirrors?

Alex Kocharin

> Any idea how they deal with the problem of stale or malicious/compromised mirrors?

In most OS package managers all packages are gpg-signed by distro maintainers, so malicious mirrors simply won't work.
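Short of full GPG signing, a weaker form of tamper detection is to compare a tarball fetched from a mirror against a digest obtained from a source the mirror cannot alter. The sketch below shows only that comparison. It is a checksum check, not signature verification, and it assumes the trusted digest is already in hand; `matchesTrustedDigest` is a hypothetical name.

```javascript
const crypto = require('crypto');

// Hypothetical sketch: accept a tarball from a mirror only if its SHA-256 digest
// matches a digest obtained from a trusted source. This is a checksum comparison,
// not GPG signature verification.
function matchesTrustedDigest(tarball, trustedSha256Hex) {
  const actual = crypto.createHash('sha256').update(tarball).digest('hex');
  // digest('hex') is lowercase, so normalise the expected value before comparing.
  return actual === trustedSha256Hex.toLowerCase();
}
```

If the mirror serves the digest as well as the tarball, this check proves nothing, since a compromised mirror can change both. Signed packages avoid that problem because the trust root is the maintainers' key rather than the server.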

Demetri Mouratis

Yeah, GPG-signed packages are the next logical requirement, but let's table that and focus on the availability issue for now.

Dan Jenkins danjenkins closed this