This repository has been archived by the owner. It is now read-only.

Decentralized Module Resolving w/ proof of concept #29

Closed
formula1 opened this Issue Jan 24, 2016 · 49 comments

@formula1

formula1 commented Jan 24, 2016

Last conversation here - #26
I believe I was part of the problem and I'd like to get the thread started on the right foot again.

I'm basing most of this off the TC39 process, which has been an effective means of getting features included into the ECMAScript standard. Link here: https://tc39.github.io/process-document/

Purpose

  • Problem - modules are currently centralized in one location. This has brought up problems such as
    • Names taken by deprecated, abandoned, untested or purposefully empty repositories
    • Attempting to duplicate or preserve the data is an extreme task, making the one server extremely important to development. I believe it was this year that one of the servers at npm went down, causing fires and panic. I cried
  • Potential challengers
    • Mutability and Source of Truth - If the repo changes but the code doesn't, this should be reflected. If there is a mutability issue, it can be resolved.
    • Maintaining Existing APIs - CLI commands and possibly the ability to require it as a module
    • Enabling servers to get started quickly - Ensuring someone that wants to be involved can get into it quickly
    • Standardization and Adapting - Ensuring changes in the software ecosystem can also be reflected here while maintaining backwards compatibility and solid standards for internal use (Semver, package.json, etc)
    • Trust - Authentication, signing of packages, handling rogue machines. Something that others can likely speak more about.
  • Opportunities
    • Even a newbie can help node - Peer to Peer distribution allows a newbie to give back to the community even if they aren't perfect coders (which few of us are)
    • People can see first hand what node and the node community are capable of - It's one thing to 'hear' about all the awesome projects on Hacker News, but to see and be a part of one is a step further.
    • Individuals can create their own registries - Whether adding statistics, a rigid universal testing suite or something else, they don't have to depend on npm accepting their pull request. They can build their own.

Current Proposed Solution

This is a culmination of efforts from @joepie91 @ChALkeR @mbostock @scriptjs @formula1

For a More Detailed Explanation

Overview
  • Enabling a Client to have Multiple Registries
    • Registries can advertise Responsibility, Statistics, etc. - They may choose to host a package or not by their own standards, and may choose to offer developers statistics about downloads and queries. Ideas can come to fruition, possibly getting clients excited about developing for it.
    • Easy(ier?) Duplication gives backups to a client - By ensuring the delivery mechanism is not the package resolving mechanism, disk space will not be a heavy requirement. This allows for backups to be readily available.
    • Rogue Registries can be removed - The likelihood is minimal that npm would do anything absurd, but there is a cost to trust. By allowing a registry to be removed easily, there is no issue.
  • Enabling a Client to have Multiple Distribution Mechanisms
    • Circumvents the need to upload to two locations - Clients can receive packages directly from wherever the author is publishing to
    • Distributed amongst Peers - Tends to be faster and also allows individuals to give back to the community if they choose
    • Allow trusted download - They may choose to only download from one place they trust. This may be a mirror of another location
  • Enabling any Client to get even deeper into node
    • Peer to Peer - Clients can contribute to the peer-to-peer network, giving back to the community.
    • Inspirational - By seeing this sort of mechanism, it may inspire individuals to look at and think differently about node and javascript.
    • Operating their own registry - A client, seeing how simple a registry can be, may choose to start their own at minimal cost.
Pseudo Code

Installation

  • Client gives Semver to Registry and gets a Distribution Handle
  • Client figures out how to use the Distribution handle then uses it to download the package
  • Client Unpacks the download and moves it to the appropriate location
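
A minimal sketch of this installation flow in Node, assuming a hypothetical registry client that maps a name plus semver range to a distribution handle (the module names, the resolve/download helpers and the handle shape are all illustrative, not part of any existing tool):

// install-sketch.js - hedged sketch of the installation steps above
'use strict';
const path = require('path');

const registry = require('./registry-client');      // hypothetical: resolve(name, range) -> handle
const downloaders = {
  torrent: require('./download/torrent'),           // hypothetical transport plugins
  https: require('./download/https'),
};
const unpack = require('./unpack-tarball');          // hypothetical: tarball path -> directory

async function install(name, range, projectDir) {
  // 1. Client gives a semver range to the Registry and gets a Distribution Handle
  const handle = await registry.resolve(name, range);
  // e.g. { type: 'torrent', uri: 'magnet:?xt=...', version: '1.2.3' } - assumed shape

  // 2. Client figures out how to use the handle, then uses it to download the package
  const downloader = downloaders[handle.type];
  if (!downloader) throw new Error('no downloader for handle type ' + handle.type);
  const tarball = await downloader.download(handle);

  // 3. Client unpacks the download and moves it to the appropriate location
  await unpack(tarball, path.join(projectDir, 'node_modules', name));
}

module.exports = install;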

Publishing

  • Client gives handle or a tar to Registry
  • Registry figures out what it needs to do with the handle and Downloads it
  • Registry Unpacks it, Validates it, Indexes it then Allows their distribution method(s) to handle it
    • This may involve notifying distributors, which may or may not choose to mirror
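
And a matching sketch of the registry side of publishing; the fetch/validate/index/notify helpers stand in for whatever a real registry would use and are assumptions:

// publish-sketch.js - hedged sketch of the publishing steps above
'use strict';

const fetchHandle = require('./fetch-handle');            // hypothetical: handle or raw tar -> local tarball
const validatePackage = require('./validate-package');    // hypothetical: checks package.json, rejects duplicate versions
const index = require('./package-index');                 // hypothetical: (name, version) -> handle table
const distributors = [require('./distribution/seeder')];  // hypothetical distribution backends

async function publish(handleOrTar) {
  // Registry figures out what it needs to do with the handle and downloads it
  const tarball = await fetchHandle(handleOrTar);

  // Registry unpacks it, validates it and indexes it...
  const pkg = await validatePackage(tarball);
  await index.add(pkg.name, pkg.version, { tarball });

  // ...then allows its distribution method(s) to handle it; distributors may or may not choose to mirror
  for (const distributor of distributors) {
    await distributor.notify(pkg, tarball);
  }
}

module.exports = publish;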

Enabling a Client to Have Multiple Registries

  • Registries can advertise Responsibility, Statistics, etc. - They may choose to host a package or not by their own standards, and may choose to offer developers statistics about downloads and queries. Ideas can come to fruition, possibly getting clients excited about developing for it.
  • Easy(ier?) Duplication gives backups to a client - By ensuring the delivery mechanism is not the package resolving mechanism, disk space will not be a heavy requirement. This allows for backups to be readily available.
  • Rogue Registries can be added and removed - The likelihood is minimal that npm would do anything absurd, but there is a cost to trust.
Technologies To take inspiration from

Demo

https://github.com/formula1/decentralized-package-resolving-example

git clone https://github.com/formula1/decentralized-package-resolving-example
npm install
npm test

It's unoptimized, and much of it is synchronous. But hey, I'd like to believe I'm paying my time and effort forward, so that if this thing gets through I will feel like I was part of the solution. Every journey begins with a step, I suppose.

Pinging those who seemed interested: @scriptjs, @mikeal, @Martii, @ChALkeR, @joshmanders, @jasnell, @ashleygwilliams, @Qard, @bnoordhuis

@formula1 formula1 changed the title from Decentralized Module System w/ proof of concept to Decentralized Module Resolving w/ proof of concept Jan 24, 2016

@jasnell


Member

jasnell commented Jan 24, 2016

@formula1 ... thank you for engaging with a concrete proposal. This is exactly the kind of thing I was hoping for. It might take me a couple days to dig into the details but I promise I'll take a look.

@scriptjs


scriptjs commented Jan 24, 2016

@formula1 I'll take a look. I have not had the time to write anything formal after #26 was closed yesterday (after this discussion went south a second time) but had made a commitment to @jasnell yesterday to do so. I have a prototype for resolving and fetching to endpoints at the moment, with providers and resolvers using semver. I am studying ied's caching atm, which is robust.

@ChALkeR


Member

ChALkeR commented Jan 24, 2016

@scriptjs @formula1

If you are proposing any new schemes, note that they should support moderation. Trust does not work like this: even if you trust someone, if malware infects his computer or his accounts somehow become compromised (ref: my post), that could be used to replace some packages with malware.

Copy-pasting myself from #26 (comment):

Note that moderation of the registry is currently needed, because there could be harmful packages. Also, there could be (theoretical) situations when the whole registry must be stopped for moderation (I could describe such a situation a bit later), and that should be achievable. I am not saying that this restriction must be absolute, though.

So, please either make sure that your proposals support moderation (not necessarily by one party, but by a limited number of parties, all of which have adequate response times) or show me (and everyone else) how that would not be an issue.

@Qard


Member

Qard commented Jan 24, 2016

@othiym23 and I discussed a bit last night the idea of a stripped-down fetch + tar tool that wouldn't include any repository connection or dependency management. It'd just fetch a tar and unpack it, maybe running some very basic lifecycle scripts, like node-gyp, but probably without the configurability you have with npm. If you want more full-featured package management, you could just use the tool to fetch a more complete package manager.

https://twitter.com/stephenbelanger/status/691046744780599296

@jasnell


Member

jasnell commented Jan 24, 2016

Yep, I've been stewing in the same direction. It could be very minimal without the more advanced layout and dependency algorithms, and it wouldn't need much in the way of package.json smarts. Higher level package managers can be hooked in somehow to provide the higher level functions. I like it.

@scriptjs


scriptjs commented Jan 24, 2016

I am in favour of this approach as well.

@joepie91


joepie91 commented Jan 24, 2016

I've been contemplating a more decentralized implementation for NPM for the past few days. While I'd planned to keep thinking it over for a little while, it seems that now that the discussion is underway, I should probably share my ideas and how I believe they address some of the technical (and 'political') challenges that are posed by the aforementioned proposals.

There are still some 'gaps' in my proposal, given that I've not had enough time to think it over yet. Perhaps those can be filled in here.

The primary requirements

  • Deterministic package resolution, primarily due to sub-dependencies. If I publish a package foobar on a registry, and it relies on package baz, then I must be able to trust that any one person installing foobar will get the same (or a compatible) copy of baz that I have been developing against.
  • Tolerance for moving of the 'source location' - GitHub is not going to be around forever, and people do move around the primary location of their source code every once in a while. This makes hardcoding locations impractical.
  • Bandwidth-efficiency. By making it expensive to run a registry, this will self-select for commercial organizations who might have incentives that do not align with the best interests of the registry. It should be viable for non-commercial registry servers to exist, for example.
  • Resistance to 'corruption' - it should not be possible (or be very hard/expensive) for any one person or entity to 'subvert' the registry in some way that is not in the best interest of the registry from a technical perspective, for their own benefit - whether commercial, political, or otherwise.
  • Private registries, for internal/proprietary modules. I'm not a fan of proprietary code at all, but realistically, this is going to be a requirement.

How NPM solves these problems right now

  • Deterministic package resolution: One central registry that is 'authoritative' where it concerns package names. NPM acts as the gatekeeper.
  • Tolerance for moving of the 'source location': The repository URL can be changed and updated in package.json. This does not affect the name in the registry. Additionally, deprecation notices can be used to inform users of name changes.
  • Bandwidth-efficiency: NPM does not solve this. They simply pay for the bandwidth used, and charge their (private) users.
  • Resistance to 'corruption': NPM does not currently solve this.
  • Private registries: NPM only solves this partially, by providing a (commercial) hosted private registry service, and the ability to configure the registry URL for other cases.

Where alternative solutions fail

  • Change to a different registry provider with the existing software: There is a considerable cost burden to running a registry, which makes this a very narrow pool of possible operators, all of which have commercial incentives that may not necessarily align with the best interests of the ecosystem. It also does not fix the fundamental monopoly issue, no matter who is 'in charge'.
  • Install from GitHub/BitBucket/etc.: Not viable. Software moves over time, and this makes it near impossible to specify subdependencies and be able to trust that they will remain working in the future. This also complicates matters when end users are on a restrictive connection (eg. behind the Great Firewall), and may not be able to access all of these sources.
  • Decentralized repository: The usual 'lack of a trust path' issues apply. Additionally, implementing moderation is hard. This is likely to take considerably more time to design and implement reliably than can be afforded in this case.
  • Ask multiple authoritative servers: No deterministic package resolution.

My proposal

Core to my proposal is the splitting up of tasks:

  • Read servers: Read-only registry servers that accept requests for the serving of existing packages, and sometimes also serving tarballs. This is similar to how NPM mirrors work now, but I will get into details about the differences later.
  • Write server(s): One or more servers, controlled by a single entity, on a fully open-source software stack. These are the servers that actually authenticate users, accept package uploads, and decide what the registry looks like.
  • Distribution nodes: Nodes that serve the tarballs. These can be arbitrary systems, including clients, which can do tarball delivery through a P2P distribution mechanism like BitTorrent + DHT.

The distribution of tarballs could use a mechanism like webseeds, pointing at one or more registry servers that serve tarballs (or even generate such a webseed list on the fly), to decrease the latency for downloading tarballs while still offloading part of the distribution load to other peers.

All metadata and tarballs are cryptographically signed by the 'write servers'. I want to emphasize that UX is very important here, and it cannot become any significantly harder for end users to use NPM as a result of architectural changes.
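
As a rough illustration of that signing step, a client (or read server) could check registry metadata against the write server's published key using nothing but Node's built-in crypto module; how the key is distributed, and the exact signature scheme, are assumptions here:

'use strict';
const crypto = require('crypto');

// writeServerPublicKey: a PEM-encoded public key shipped with or pinned by the client (assumption).
// metadataJson: the exact bytes the write server signed; signatureBase64: the signature it published.
function metadataIsAuthentic(metadataJson, signatureBase64, writeServerPublicKey) {
  const verifier = crypto.createVerify('RSA-SHA256');
  verifier.update(metadataJson);
  return verifier.verify(writeServerPublicKey, signatureBase64, 'base64');
}

// A read server (or any mirror) relaying tampered metadata would fail this check,
// no matter which source the client happened to download from.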

Furthermore, for the absolute worst-case scenario, it should be possible for individual end users to install from a Git repository using semantic versioning specifications, referring to the tags on the repository to find compatible versions.
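
A sketch of that worst-case fallback: list the remote tags with git ls-remote and pick the highest tag that satisfies the range using the semver package. The v-prefixed tag naming convention is an assumption.

'use strict';
const { execFile } = require('child_process');
const semver = require('semver'); // https://www.npmjs.com/package/semver

// Resolve a semver range against the tags of a remote Git repository, assuming tags like v1.2.3.
function resolveFromGitTags(repoUrl, range, callback) {
  execFile('git', ['ls-remote', '--tags', repoUrl], (err, stdout) => {
    if (err) return callback(err);
    const versions = stdout.split('\n')
      .map(line => (line.split('refs/tags/')[1] || '').replace(/\^\{\}$/, ''))
      .map(tag => semver.valid(semver.clean(tag)))
      .filter(Boolean);
    callback(null, semver.maxSatisfying(versions, range)); // e.g. '0.2.1' for the range '0.2'
  });
}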

Threat models

  • Write server operator goes 'rogue': If a consensus is reached about this by read server operators (or the ecosystem at large), they can simply point their servers at a write server operated by a different entity.

    End users will transparently receive their metadata and tarballs from a new source, and could explicitly blacklist any sources if they so desire (since the source can be identified by the signature).

    This makes it very expensive for a write server operator to go rogue, and relatively cheap for the ecosystem to move away from an operator that has done so. It doesn't prevent this from occurring, it just changes the incentives so that it becomes unattractive for an operator to do so.

    If no consensus can be reached that the write server operator has gone rogue, but a subset of the users has issues with the moderation policy of the write server, they can fall back to using Git repositories with tags as absolute last measure. This has downsides (eg. no ability to move the sources), but is better than nothing.

  • Write server blocks end users from downloading: This has no effect. Package distribution is not handled by the write server. This removes a very big issue with current NPM, where critical Node infrastructure can be arbitrarily denied to any given user.

  • Write server blocks end users from uploading: See above 'goes rogue' section.

  • Write server goes down: Metadata and packages remain being distributed by read servers and distribution nodes as normal.

  • Read server operator goes rogue: The 'source' of the signed data would change. There are a number of possible solutions for resolving this:

    • Do not validate the source by default. This is probably a bad idea.
    • 'Pin' the source by default, and warn the user and ask for confirmation when it changes, explaining the risks.

    In all of these cases, a decision would have to be made between letting people configure their own read servers, or shipping with a set of rotated read servers. I have not yet weighed the pros and cons of this.

  • Read server goes down: Fallback to another read server. As each server just operates as a more or less 'stateless' cache, this should not be a concern.

How my proposal addresses the requirements

  • Deterministic package resolution: There is still an authoritative registry, but that authoritative registry can be more easily changed if absolutely necessary.
  • Tolerance for moving of the 'source location': This remains unchanged from current NPM. The registry metadata points at the authoritative location.
  • Bandwidth-efficiency: P2P distribution, with webseeds as a fallback.
  • Resistance to 'corruption': Expensive to go rogue, cheap to mitigate. See 'threat models' section above.
  • Private registries: Tools like Sinopia can be used as normal. I have not taken this into account in the design yet, but can't see a reason why it wouldn't work. I have not yet considered whether better solutions exist.

Unsolved problems

  • How is authentication data transferred between write servers?

    When migrating to a different write server, that write server would need some way to transfer authentication data, but without exposing users. A possible approach for this would be keypair authentication (as the public key can be freely exposed), but this introduces a problem of access recovery.

    How would a write server decide to do a 'password reset' for somebody, when keypair cryptography is involved? Conversely, how can the new write server trust the old write server to be supplying the correct authentication data, rather than eg. 'backdoored' authentication data? Audits can be done (as the public keys are publicly exposed), but this can still miss things.

@scriptjs


scriptjs commented Jan 24, 2016

From what I see, there are a few topics to look at to approach this holistically:

Minimal viable client in node

I think if we were to agree on the notion @jasnell came up with yesterday and @Qard reiterated here today, we can have something tangible here soon. Along the lines of solving the broader issue of distributed package management, the rest can fall into place. Someone can start a repo. Most of this code is already available in some form. This is essentially what bower was for the most part. This resolves the optics of NPM's bundling in node and NPM as an upstream, and opens node up to developer choice. We should agree on what it will handle, but it should be as little as possible. Enough to bring in a client and build it if necessary.

Full featured clients

Let developers choose their full featured client based on its capabilities and features. NPM is currently what most use today but there are viable clients up and coming that are showing promise and will be competitive this year. We don't need to solve this here, let the community do its thing and build software that offers these choices.

Vetting/moderation of modules for inclusion in a registry

This is a process. I think a community process could be established in conjunction with nodesecurity.io. This can be phased in with a process of working through existing packages. Currently, there is a lot of cruft and a large percentage of the current registry is never accessed.

A public module publishing process might work where a module could move through a community vetting process. Vetted modules could be held in a central registry by the node foundation. This central registry is not one used for everyone to fetch modules, but used as a source of updates to public registries. Something suitable like Rackspace Cloud or S3 could be used with an API by the node foundation to host the central registry.

Public registries

Public registries should adhere to some basic standards set out by the node foundation for trust, such as: they will only host vetted modules and must maintain their sources with updates by syncing with the central registry. A public registry should not need to be more than a source of static files, whether that is a CDN or a peer that persists the data to seed it. It is possible for a public registry to provide a static endpoint and peer simultaneously. Dat, as an example of a peer-to-peer system, requires a key to access the content.

Let those interested in hosting registries freely host if they can meet these basic standards. The node foundation can create an icon or something to verify their status as a registry host.

Private registries

Here again let the community do its thing. Whether you want to use your own git repos, sinopia, commercial provider, etc.

Module discovery

Currently there is npm for searching modules, but I would encourage any company to use the source of modules to create innovative solutions for search and discovery. The volume of metadata is also not large for today's search software like elasticsearch etc. to build something interesting.

package.json

The package.json is something that needs to be under community control for minimum standardized metadata properties. Formal proposals should be required to make changes to this standard that are in the best interests of the community.

@ChALkeR


Member

ChALkeR commented Jan 24, 2016

@scriptjs Sorry, but your post again does not look like a technical proposal to me.

I doubt that manual verification of all modules before publishing them will work — who is going to do that? I doubt that «trust» and «standards» would work the way you suppose they would: either you would have to «trust» a lot of public registries, or your proposal wouldn't be distributed enough — you should, in fact, not trust them but sign the packages and make sure that those public registries could not meddle with the packages' content. I do not see how fast content moderation of already-published modules would work in your system if there were a lot of those public registries. I also do not see how the emergency switch (aka «turn the whole thing down») would work in your model. I also don't like the idea of recommending multiple clients or giving the user an early choice there.

@joepie91's proposal (splitting that into three groups — a central/replicated auth, a few read servers and various distribution nodes) would work better, I think. Perhaps it would be good to add the possibility of using other methods of delivery as «distribution nodes» — e.g. tarballs from GitHub (given that those are signed by the auth «write» server), etc. The bad thing about directly installing unpublished versions from GitHub is that the tags on GitHub are not immutable, so the auth server should store information about which versions (tarballs) are signed.

Emergency switch could be introduced on the client-side, so that it would check the flag with auth/read servers (@joepie91 has a bit more to say about that), perhaps with an opt-out on the client, but that opt-out should give the user some grave warnings and require some non-trivial actions.

As for the usability — the auth server could offer GitHub auth (as one of the possible login methods) and get public keys directly from the user account on GitHub.

I also think that replacing the bundled npm with a «minimal viable client» would not be a great solution atm — it could introduce more problems than it solves. If we replace it with something, let's make sure that the new solution is superior.

Private registries and private registries hostings would be on their own, and that is fine enough.

@scriptjs


scriptjs commented Jan 24, 2016

@ChALkeR This is not a technical proposal, you are correct. It is a high level view of the elements of a system beginning with what is packaged with node. It frames responsibilities for a system so we can continue a technical discussion on the same page as to how we see this as a whole and who might be responsible for what.

I think it is well understood that registries need to be read only and that signed packages are a prerequisite. What we have today, however, is also a number of packages and cruft in NPM that should never have reached a public registry in the first place, yet is there and persisted. There needs to be some gatekeeping.

Here, there is the notion of a central authority for the modules. Publishing a module for the first time could place it into a vetting/review queue. Vetted/trusted modules enter a central registry that is operated by the Node Foundation. The function of the central registry is only for public registries to sync their content. Public registries offer read-only access to the signed tarballs. Every public registry serves the same content.

@ChALkeR


Member

ChALkeR commented Jan 24, 2016

@scriptjs Are you saying that someone out there should manually review all the diffs between all the versions of all modules in that sparkling registry? That sounds like an ideal approach, but there is no such amount of reviewer time available to achieve it.

@joepie91


joepie91 commented Jan 24, 2016

Regarding the emergency switch, the sanest implementation of that would probably be for the write server to propagate a (signed) 'shutdown signal' to the read servers, and have the clients simply rely on the read servers to tell them whether they can install packages.

This shutdown signal could be used in cases where some kind of wide-spread security issue were to make package installations unsafe. Ideally, it'd never be needed.

  • Why not let the clients talk to the write server? Because that makes the write server a single point of failure.

  • What if somebody attacks the write server to prevent propagation of the shutdown signal? This signal would be sent once, or upon connection of a read server. As read servers are expected to be running continuously, it is reasonable for the read server daemon to refuse to start if the write server is unavailable.

  • Why does it matter whether the shutdown signal gets propagated?

    • If we assume that "no write server available" means all is fine, an attacker can intentionally prevent clients from being notified of a dangerous situation, by attacking the write server.
    • If we assume that "no write server is available" means that something is wrong and shut down installations accordingly, this makes the write server a single point of failure, and an attack towards it would effectively take down the entire infrastructure. This (partially) defeats the point of having multiple read servers, by giving up redundancy.

    In this context, "one read server less" is the least impactful failure mode for the Node.js ecosystem.

I should note that the above is a result of @ChALkeR explaining to me the need to have a 'killswitch' - perhaps he could elaborate on the kind of scenario he is envisioning.
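
A minimal sketch of what the client side of such a killswitch could look like, assuming read servers expose a status document signed by the write server (the endpoint name and payload shape are made up for illustration, and verifySignature is the kind of check sketched earlier):

'use strict';
// Before installing anything, ask a configured read server whether installs are allowed.
// The status document is signed by the write server, so a compromised read server
// cannot forge an "all clear" on its own.
async function assertInstallsAllowed(readServer, verifySignature) {
  const res = await readServer.getJson('/status');   // hypothetical endpoint
  // res: { body: '{"shutdown":false,"reason":null}', signature: '...' } - assumed shape
  if (!verifySignature(res.body, res.signature)) {
    throw new Error('status document has an invalid signature - refusing to install');
  }
  const status = JSON.parse(res.body);
  if (status.shutdown) {
    throw new Error('registry is in emergency shutdown: ' + (status.reason || 'no reason given'));
  }
}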


As for mutability of Git tags - I'm afraid that this is inevitable, and given that Git tags are only meant as a last-resort installation method, this could be an acceptable tradeoff.

Whereas I initially thought that adding a hash to the installation URI would resolve the matter, this would remove the possibility of using semantic versioning ranges, which I personally consider to be an essential part of NPM's packaging model, and necessary to make Git tags an 'equivalent approach' to regular dependency specifications.

Enterprises that need an absolute guarantee of immutability (ie. not able to trust the registry host either), already need to check in node_modules or use something like Sinopia inbetween, so I'd imagine that the loss of immutability guarantees on a small number of packages would not be a big problem.


@scriptjs I don't feel that a 'walled garden' would be a viable solution. Part of the reason why the Node.js ecosystem is so useful, is because everybody can publish modules. Reviewing all modules would introduce a significant delay, as well as manpower requirements (further increasing the dependency on the registry operator, which is precisely what we don't want).

A better solution would be to have two after-the-fact review queues - one that every package goes through to filter out the obviously malicious stuff (this may already exist), and one queue that people can request their modules to be placed into, for more in-depth review that can cover matters like security and code quality (as time is available). Neither of these queues need to be visible to the public, nor do they need to happen before package publication.

@scriptjs


scriptjs commented Jan 25, 2016

@ChALkeR No. I am not speaking of a form of detailed review here but something that would prevent cruft or malicious code from entering the registry. Currently there is zero barrier to publishing anything even if done in error.

I think at the very least there could be a scan of the package by an automated tool, or an approach that would generate an issue where we let the community examine new modules that will be entering the registry over some days.

This could work using a graduated program so that new authors will be slowed while trust is built. A module passes, the author/publisher earns some level of trust. We let trusted authors publish freely.

@joepie91 This is not meant to deter publishing by anyone at all. Only to be proactive rather than reactive.

@joepie91


joepie91 commented Jan 25, 2016

This is not meant to deter publishing by anyone at all. Only to be proactive rather than reactive.

Regardless of whether it's meant to do that, it will do that. The core reason why NPM grows as quickly as it does, is that there are absolutely no barriers to publishing things. A delay in publishing modules will discourage people from publishing anything at all.

@ashleygwilliams


ashleygwilliams commented Jan 25, 2016

this is absolutely true @joepie91. i would also like to point out the extremely negative effect this will have on the beginner experience. at a moment where Node.js is trying to lower the barrier to entry for new developers this would be an absolutely devastating move. surely some of it would be addressable by documentation, but that is something Node.js is already struggling with.

i hear the concerns on this thread but don't see how this furthers any of the goals Node.js currently has, especially considering that there is no acute need for this to happen.

@ChALkeR


Member

ChALkeR commented Jan 25, 2016

@scriptjs

I am not speaking of a form of detailed review here but something that would prevent cruft or something malicious from entering the registry.

Sorry, but those are two mutually exclusive statements. For example, C will publish a package that exports a list of colors, let's say colors.js: module.exports = {red: '#f00', green: '#0f0', blue: '#00f'}, and will call that v1.0.0. This would not be cruft. That package will get moderated and approved. Now C publishes a new 1.0.1 version of that package and includes malware in it. Would your manual review notice that? And note that the users would be harmed even more, because once you say «we review packages» — they would expect packages to be more secure, but that's not going to happen.

I think at the very least there could be a scan of the package by an automated tool or an approach that would generate an issue where we let the community examine new modules that will entering the registry over some days.

Automatic scans on the server flagging potentially dangerous packages would be good, yes. Delaying new modules (and module updates) by several days would be unacceptable.

This could work using a graduated program so that new authors will be slowed while trust is built. A module passes, the author/publisher earns some level of trust. We let trusted authors publish freely.

I am not convinced that this model would work. No one is going to review even all the new packages by unpopular/new authors. Also, «trust» is not absolute even for very popular packages — so the stuff in the registry would still be potentially insecure, and it would be bad if you made it appear as if it were secure.

@formula1


formula1 commented Jan 25, 2016

I'm not entirely sure I am understanding this.

  • Centralized Registry (aka NPM) - Aka Read Servers, Aka resolves a name to a magnet
  • Centralized Authentication (@ChALkeR's suggestion) - ensures that every write is signed. All packages can be traced back to a point of origin. Part of the Write server. The Write Server needs to implement OAuth of some sort
  • Write Servers - Authenticates writes through the authentication system, Notifies Registry of the write? and is available through distribution nodes.
  • Distribution nodes - duplicate the data from writes and are a point of access
    • These distribution nodes can then be duplicated through peer to peer easing the load

Questions

  • Centralized Registry - If I were to start a separate Registry from NPM's that does not choose to duplicate its data, how would I be accessible to cli users? Am I forced to use a centralized Authentication service? Can I use more than one means of Authentication? Can I choose to deny publishing packages? Can a publisher directly contact me to take a namespace?
  • Centralized Authentication - This is more for @ChALkeR but perhaps you can enlighten me. Are we attempting to have a method of signing? Signing seems to be a red herring unless it goes back to the centralized approach where one authority verifies. But centralization is also a matter of trust, bringing it somewhat back to whether we should trust a registry, since these are the individuals who resolve our packages. And competitive landscapes are arguably more trustworthy than monopolized ones. I'm dumb sometimes
  • Write Servers - These servers authenticate uploads? These servers notify registries? These are registries? If I am a write server, why would I notify anyone? I feel as though there's something missing here that I am not understanding. These write servers seem to be Registry Servers.
  • Distribution Nodes - Sounds sweet. All for it!

Other points

  • CLI verifying the Registry directed them to the Correct Package - Registry returns a checksum, registry provides a public-auth token (see the sketch after this list).
  • Version Ranges - By allowing Package creators to notify the registry as they update, this would allow the Registry to map versions to links. In the case of Git commits, the registry would be able to parse the branches and commits for hashes to resolve to.
  • Git Semantic Versioning - Major versioning can be based off branches, minor likely based off commits (which would need to be parsed). This would require a package to be strict about naming and commit messages when bumping versions. Versioning could then be discovered in a somewhat gross version of binary search.
  • The Registry would notify the cli how they need to download the package.
    • Package resolution can also have a protocol attribute which defines the method of acquiring the package
    • The cli may choose to have multiple clients that can download a package (or require downloading one) such as http, git, tor or any other method that may come up in the future
    • I think this is a good stepping stone considering I doubt it's efficient, reliable or effective to make gigantic steps very quickly.
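
A sketch of the first point above, checking that the bytes a distributor handed back match the checksum the registry resolved; the hash algorithm and calling convention are assumptions:

'use strict';
const crypto = require('crypto');
const fs = require('fs');

// Compare a downloaded tarball against the checksum the registry returned for that version.
function verifyDownload(tarballPath, expectedSha256, callback) {
  const hash = crypto.createHash('sha256');
  fs.createReadStream(tarballPath)
    .on('error', callback)
    .on('data', chunk => hash.update(chunk))
    .on('end', () => callback(null, hash.digest('hex') === expectedSha256));
}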
@ChALkeR


Member

ChALkeR commented Jan 25, 2016

@formula1

Centralized Authentication (@ChALkeR's suggestion)

It's not mine, I'm talking about this proposal by @joepie91:

Write server(s): One or more servers, controlled by a single entity, on a fully open-source software stack. These are the servers that actually authenticate users, accept package uploads, and decide what the registry looks like.

@formula1


formula1 commented Jan 25, 2016

@ChALkeR Ah, just shows how confused I am! I see he put a lot of thought into it but I'm having a hard time visualizing this.

Deleted the phone comment

Perhaps This is how i can understand it

  • uploads to a git server
  • notifies 1 or more registries of the change
  • each registry creates a standardized magnet link and orders its own distribution node to share
  • cli can go to any registry to get the exact same form of magnet link

Is this correct?

@formula1


formula1 commented Jan 25, 2016

Looking into this deeper

npm is far more awesome than I even realized. Thought it was good before

Torrent Tracking can be handled within node (node-gyp not necessary)

Git Server can be handled within node (node-gyp not necessary)

So what we have here are the basics necessary for a test example

npm publish

From the client Cli

  • git add --all
  • git commit -m 'test case'
  • git remote add publish http://testcasegit.com
  • git push publish - do your normal pushing
  • npm publish - This then notifies all trusted registries of the user
    • Creates a torrent file - https://www.npmjs.com/package/create-torrent (see the sketch below)
      • announce - All registries trusted by this registry are added
      • name - author.package-name.version.tar or something else pretty unique
      • createdBy - This registry's name or the author
    • the git publish url
    • the current branch you're working in
    • the current hash you're working on
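
A sketch of the torrent-file step using the create-torrent package linked above; the tracker URL, registry name and file naming scheme are placeholders:

'use strict';
const fs = require('fs');
const createTorrent = require('create-torrent'); // https://www.npmjs.com/package/create-torrent

// Build a .torrent file for the packed tarball of author.package-name at a given version.
createTorrent('author.package-name.1.0.0.tar', {
  name: 'author.package-name.1.0.0.tar',
  createdBy: 'example-registry',                          // placeholder: this registry's name or the author
  announceList: [['wss://tracker.example-registry.org']], // placeholder: trackers trusted by this registry
}, (err, torrent) => {
  if (err) throw err;
  fs.writeFileSync('author.package-name.1.0.0.torrent', torrent); // torrent is a Buffer
});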

From Registry

  • Authenticates the Author - A matter of choice but a good idea if the author wants to stick around
  • downloads the git
  • checks out the branch
  • retrieves name and version
    • verifies that this version isn't already stored
  • torrent file along with current distributor nodes is added to a table with indexes related to package name and version
  • sends back possible Distributor Nodes to send to

From the CLI - This I would like to change so that the distributor node connects directly to a git server and creates it independently. However, I am not sure if that is possible

  • concatenate together all distributor nodes provided by each registry
    • async.each distributor nodes
      • connect to the node
        • announce the hash
          • It auto downloads
  • Ideally, this will cause each distributor past the first to take a bigger chunk of the upload
  • on finish - disconnect from all distributor nodes

npm install

From CLI

  • npm install package_name@version
  • makes request to all trusted registries

From Registry

  • looks up package name + versioning
  • Finds the torrent file and distributor nodes
  • sends back file along with distributor nodes to connect to

From CLI

  • ensure all registries resolve to the same hash (see the sketch after this list)
  • connect to all distributor nodes that the registries provided (these will then likely be connected to peers)
  • announce hash
  • wait for install
  • read package.json
    • for each dependency, if its not installed, do it again
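
A sketch of the 'ensure all registries resolve to the same hash' step; the registry client API and the shape of its answer are hypothetical:

'use strict';
// Ask every trusted registry to resolve the same name@version and refuse to
// continue unless they all agree on the torrent infoHash.
async function resolveEverywhere(registries, name, version) {
  const answers = await Promise.all(
    registries.map(registry => registry.resolve(name, version)) // hypothetical: { infoHash, distributors }
  );
  const hashes = new Set(answers.map(answer => answer.infoHash));
  if (hashes.size !== 1) {
    throw new Error(name + '@' + version + ': trusted registries disagree on the package hash');
  }
  return {
    infoHash: answers[0].infoHash,
    distributors: [].concat(...answers.map(answer => answer.distributors)),
  };
}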
@mbostock


mbostock commented Jan 26, 2016

I’ve been thinking about this a little bit and experimenting with a decentralized package manager:

https://github.com/mbostock/crom

In a nutshell, it’s convenient (essential, even) to use names when I’m installing a new dependency, but subsequently I want my package manager to resolve these names explicitly as URLs. That way, when my users install my code, they can install the right thing without needing a centralized repository. So, this:

crom install d3-voronoi@0.2

Captures all this:

{
  "dependencies": [
    {
      "name": "d3-voronoi",
      "owner": "d3",
      "version": "0.2.1",
      "range": "0.2",
      "url": "https://github.com/d3/d3-voronoi",
      "releaseUrl": "https://github.com/d3/d3-voronoi/releases/tag/v0.2.1",
      "sha": "1eb846e5b81ea7e25dab3184fa777a8db325d01146cdae02aa589b2349d162b8"
    }
  ]
}

Note that Crom supports semantic versioning by capturing the desired version range and being able to query the package URL to discover the current list of releases.

Please see the Crom README for more details. I’d love to help with this issue if I can!

@Martii


Martii commented Jan 26, 2016

@mbostock
Would the GitHub retrieval be smart enough to not redownload a dependency, e.g. check the current hash perhaps? npm has this current issue, which is why we have been asking maintainers to publish to npmjs.com to speed things up (but that suggestion doesn't always work).

@formula1
I appreciate the ping... seems like some solid info here... I'll probably only interject if there is something that I don't understand or need to add something.

@pluma


pluma commented Jan 26, 2016

@Martii GitHub seems to use S3 as a storage backend for binaries and S3 uses ETags (which may or may not be MD5 checksums depending on a number of factors), so it could use that instead of the sha hash. However it'd be necessary to follow the redirect to get the ETag.

@ChALkeR


Member

ChALkeR commented Jan 26, 2016

If we want to really improve package management, the replacement should be superior to what we have now. GitHub-based package managers fail to guarantee the immutability of package versions by themselves — so either the hash or signature has to be put directly in the deps (which wouldn't work with semver), or there needs to be some server or a distributed system of servers (e.g. p2p) that guarantees that.

@mbostock


mbostock commented Jan 26, 2016

GitHub-based package managers fail to guarantee the immutability of package versions by themselves — so either the hash or signature has to be put directly in the deps (which wouldn't work with semver), or there needs to be some server or a distributed system of servers (e.g. p2p) that guarantees that.

I’d rephrase this as “decentralized package managers” or “internet-based package managers” in the sense that the mutability is not specific to GitHub—the internet itself is mutable by default.

What about implementing immutability as a service on top of a decentralized package management system? So, package authors still publish wherever they want, while a third party is responsible for either storing the hashes of the package contents for verification, or the contents themselves for immutable snapshots. (The current npm registry could serve this purpose, for example.)

That way, the package dependencies and the package management system wouldn’t be strongly tied to one centralized service, and there could be several services that compete to provide such functionality, similar to CDNs.

@mbostock


mbostock commented Jan 26, 2016

@Martii It doesn’t look like GitHub includes any content hashes with release assets, though you can get the commit sha from the associated git tag. If there were demand, I expect GitHub would be receptive to exporting content hashes if it meant a substantial reduction in their traffic.

@ChALkeR


Member

ChALkeR commented Jan 26, 2016

@mbostock

I’d rephrase this as “decentralized package managers”

I wouldn't. Above in this thread an example was given of a decentralized package manager that guarantees that.

while a third party is responsible for either storing the hashes of the package contents for verification, or the contents themselves for immutable snapshots. (The current npm registry could serve this purpose, for example.)

Yes, that's what is required. But we have to make that secure, decentralized, and not a single point of failure.

@formula1


formula1 commented Jan 27, 2016

@mbostock Awesome stuff! a couple things though

  • Resolving should probably be in a separate module. Downloading as separate modules.
    • Pipeline would likely be as follows (though not completely confident about it yet)
      • Resolve from trusted registries (or directly link it locally)
      • Download with one method of distribution - This should create a movable folder
      • folder resolver moves the folder to the correct location - (which would be done for all modules)
      • for each dependency of this folder, do it again
  • Could you also include a dummy test using something like pushover? I'd like to run it
  • We don't know the type of install it is - Could be a torrent file (@joepie91 made an excellent roadmap for how it should probably be done. I'll have to edit the main issue), could be a tarball, repo, or folder (see: https://docs.npmjs.com/cli/install). Your current method expects git repos, which may be accurate for most installs. But we must consider all possibilities.
    • Perhaps adding a key such as type or download-type would be better. This would also likely allow us to install
  • We also don't know who resolved the package
    • Example: I install from http://alternative-repos.com - This returns a github repo and sha
    • I save the repo and sha
    • After publishing, someone else installs my repo
    • The github repo moved to bitbucket - Returns 404
    • What do we do?
      • If we knew who resolved the package, we can then ask that specific registry
      • One great thing about using torrents is that nothing ever truly disappears (unless everyone who has downloaded the package either doesn't seed it or deletes it)
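
One way to record 'who resolved the package' so that a later 404 can be retried against the registry that originally answered - a hypothetical lockfile-style entry, loosely modelled on the Crom output earlier in the thread (all field names are illustrative):

// Hypothetical record a client could keep per resolved dependency:
const resolution = {
  name: 'some-package',
  range: '^1.2.0',
  version: '1.2.3',
  resolvedBy: 'http://alternative-repos.com',        // the registry that answered
  url: 'https://github.com/example/some-package',    // where it said the code lives
  sha: '<content hash of the tarball>',
};
// If `url` later returns a 404, ask `resolvedBy` again for a fresh location
// before falling back to the other trusted registries or to peers.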

If you have any disagreements or additions, feel free to share.

@bnoordhuis


Member

bnoordhuis commented Jan 27, 2016

GitHub-based package managers fail to guarantee the immutability of package versions by themselves — so either the hash or signature has to be put directly in the deps (which wouldn't work with semver), or there needs to be some server or a distributed system of servers (e.g. p2p) that guarantees that.

The point about semver is a good one. It could perhaps be solved by having authors sign packages with their gpg key but that requires they set up a key first. Probably annoying for Windows and OS X developers because they won't normally have gpg installed.

It also doesn't solve the issue of efficiently figuring out what the latest semver-compatible release is but one issue at a time, eh? :-)

@ChALkeR


Member

ChALkeR commented Jan 27, 2016

It could perhaps be solved by having authors sign packages with their gpg key but that requires they set up a key first.

That won't help: authors themselves might replace a version which they already published, and that's not good. I know developers who review the changes in their dependencies, and having non-immutable package versions would nullify that possibility.

@bnoordhuis


Member

bnoordhuis commented Jan 27, 2016

Right, if that is what you mean by 'immutable', then yeah, key signing won't help. Blockchain time!

@formula1


formula1 commented Jan 27, 2016

Why would they want non-immutable (mutable?) package versions?

Ok, lets say for instance...

  • I am a package creator that messed up big time
  • I change my version
    • In npm's case, it's fine since there is one source of truth
    • In the case of tor, a registry would respond with a sha1 or magnet uri. In these cases, it would also be kind of ok, as the old version would simply be circumvented. Additionally, the registry would be able to tell a distributor to "distrust" a sha1, basically going back to the pseudo code where the install couldn't go through the distributors and would fall back to the registry to find the correct location.

Or every install will initially hit the registry despite having distributor credentials. The registry would be the source of truth in these cases.

Not sure how blockchain fits into this but it sounds like something I don't think I can handle. Figuring out tor in node is crazy enough as it is

@mbostock


mbostock commented Jan 29, 2016

I’ve not experimented with it yet, but IPFS looks like a potential candidate; it implements a distribution protocol designed for immutability.

@formula1


formula1 commented Jan 30, 2016

Looks sweet. That seems decent as one method of distributing, for sure. I'm looking at your repo; do you think that inheritance is necessary? A few functions definitely need to be implemented, but I'm not sure if state is necessary

@formula1


formula1 commented Feb 9, 2016

Pinging

Before we begin, I think it's important for all of us to understand npm's goals for 2016. Here's a link

Recently Isaac gave a 2016 state-of-npm email, which I am confident all of you have received. Under the Foundations header, he specifies that npm would be best under a foundation umbrella, decentralized away from being a single company's product and instead a cooperative product. In order to create the best CLI tool possible, feedback from the best in the business would be greatly appreciated. The fact that we have so many package managers out there proves how important package managers are, and that there are requirements that are not met by others. Npm, however, is currently focusing on the stability of their own product (which is very important), which makes huge changes much more difficult.

Recently, ftp and magnet uris (with special thanks to @feross for many awesome torrent tools) have been implemented as download mechanisms in my proof of concept by means of a plugin. I think this is a good spot to show that I am willing to put muscle into this if there can be a consensus on requirements for what is needed. I am fully aware my request for an audience may be ignored, so I will do my best showing off what features have proven invaluable and showing what I am planning. I'm sure any efforts you provide will likely be worth much more than any ragtag repo I put together.

What seems important from your package managers

  • Flat Directory Structure - npmd has the --greedy option, ied uses a CAS design (though that may change to a name/version structure), bower is always flat
  • Maximizing Speed - npmd caches requests like a boss, ied tries to limit requests as much as possible and acts in parallel. Resolving is likely something that must be
  • Immutability and Updating - Both apm and npm enforce that versions are always incrementing. Torrent infoHashes are based off pseudo-unique checksums, so this can be handled.
  • Signing and Authentication - Apm and bower use github, npm uses name and password, etc. In my opinion, this should be registry independent with the willingness to use a 3rd party
  • Publishing synchronization - bower is really good about this in that they allow you to register a git url. Npm allows you to post a tarball or a tarball url; this is ok, though two packages may have the same version but entirely different contents on different registries. Ideally, all published repos are git repos with specific tags that can be checked out. This is mutable, but it is up to the registry to distrust an author, and distributors will have the same package.
  • Application Environment and main - jspm, bower and apm are all different repositories because they are interacted with quite differently than other packages. Additionally, tools like browserify can compile node modules for the browser while System enables standard es6 imports.
  • Installing from arbitrary sources (and semver) - Bower is also very good at this. From semver to svn, they've made great strides to attempt to support as much as possible; here are their examples. Jspm is interesting in that they alias npm and/or expand a short command into a long one; this is likely going to be very important in the long run when resolving subdependencies that the client's registries don't have, and in the short run to support backwards compatibility. Npm also works with orgs, which likely will also be pretty useful (though possibly niche).

Here are a few features that the proof of concept is meant to introduce

  • Registry independent - Registries only point the user to a manner of downloading a tarball
  • Download independence - Can use a variety of URLs that
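
A sketch of what that download independence can look like in practice - a table of protocol handlers the CLI consults after a registry has resolved a name to a URI; the plugin modules are hypothetical:

'use strict';
const url = require('url');

// Protocol -> downloader plugin. Each plugin only has to turn a URI into a tarball
// on disk; the CLI and the registries stay unaware of the transport details.
const downloaders = {
  'https:': require('./downloaders/https'),   // hypothetical plugin modules
  'ftp:': require('./downloaders/ftp'),
  'magnet:': require('./downloaders/torrent'),
  'ipfs:': require('./downloaders/ipfs'),
};

function download(uri, dest, callback) {
  const protocol = url.parse(uri).protocol;
  const plugin = downloaders[protocol];
  if (!plugin) return callback(new Error('no downloader registered for ' + protocol));
  plugin.download(uri, dest, callback);
}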

What I will be working on next

  • Publishing to Multiple Independent Registries - Ensure each resolves to the exact same package
  • Synchronization with registry naming - Ensure no conflicts within the network
  • Enabling a client to become a distributor - registries double as a tracker
  • Subdependencies get resolved by the registries of the package and the clients
@sheerun


sheerun commented Feb 9, 2016

Bower recently introduced Pluggable Resolvers, which allow such functionality (decentralized resolving) to be implemented by a 3rd party without modifying the core. I suggest @npm implement a similar feature.

@formula1


formula1 commented Feb 9, 2016

Honestly, I used to see Bower just as client-side npm. Ever since getting into this deeply, I'm becoming more and more impressed with it as I look into it.

@guybedford


guybedford commented Feb 10, 2016

An IPFS-based registry for jspm has been on my mind a lot recently as well here, which when coupled with the ability to sign packages (ideally even to a blockchain-style mechanism), seems like an ideal decentralized package management system. The stance jspm takes is that we can't assume that any specific implementation for distributed transport would work. It would be like betting specifically on Bitcoin and putting all your savings into it... we probably want to let a few systems fight it out before deciding on the "one". Am I right in thinking this is what the discussion here is all about? If so I'd be very interested to chat further as I've been dreaming about the mechanisms quite a bit.

@formula1


formula1 commented Feb 10, 2016

@guybedford Yes. You hit a major point right on the head. The manner in which a package is received is not the concern of the cli tool, only that it is received correctly. Bower (which likely has the most users here outside npm), Crom (a PoC by @mbostock) and my own PoC (which was heavily influenced by other people's work) have implemented plugin systems to allow the use of arbitrary download mechanisms. I fully encourage you to chat; I think this can benefit all parties, however I find myself speaking alone.

Another major point, which jspm does well, bower enables and npmd does primarily through its cache, is how a package gets resolved. Who resolves the packages is up to the client to decide (though more likely than not npm will be used). But giving the client the ability to go through groups of registries to find a package is important. Jspm is a great example of this.

  • Client requests a human-readable string
  • CLI tool makes requests to servers or checks a local registry/cache file
  • Registry/Cache File resolves this human-readable string into a github request, an npm request, a magnet uri, an ipfs uri, or an ftp uri
  • CLI Tool now uses a distribution module to handle this uri

This requires decentralization and fallbacks of registries. This already exists today in the npm, bower, apm and jspm little json files, each acting somewhat independently of one another. The difference here is that the CLI tool should be the same but the registries can be as different as they please. This enables the sort of Foundation Umbrella that Isaac was talking about, where there can still exist competition even though we are all focused on accomplishing the same goal, a badass CLI Package Manager.
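
A sketch of that resolution walk, trying a prioritized list of registries (for example a local cache, then a private company server, then a public registry) until one of them answers; the registry client API is hypothetical:

'use strict';
// registries is ordered by priority, e.g. [localCache, companyRegistry, publicRegistry]
async function resolve(registries, name, range) {
  for (const registry of registries) {
    try {
      const uri = await registry.resolve(name, range); // hypothetical: returns a github/npm/magnet/ipfs/ftp uri
      if (uri) return uri;
    } catch (err) {
      // this registry does not know the package or is unreachable - fall through to the next one
    }
  }
  throw new Error('no configured registry could resolve ' + name + '@' + range);
}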

@guybedford


guybedford commented Feb 10, 2016

@formula1 great to hear that and I agree with all the points, I'm just still not sure I understand what the exact focus is here. Is the goal to ensure package managers provide open transport implementations so that an ecosystem of registry systems can develop? Or are you trying to ensure a common language for package managers to handle this transport layer specifically? Or are you looking to implement your own decentralized package transport system, or at least ensure the possibility for one exists? I don't think the idea of an ecosystem of registries is a good thing in its own right..... the best thing for users is one registry. But I do think the idea of a completely distributed, decentralized and secure hash-verified registry is an interesting thing to pursue.

@Martii commented Feb 10, 2016

@guybedford

Or are you trying to ensure a common language for package managers to handle this transport layer specifically?

Not sure this would happen, but it's a consideration nonetheless. Part of the reason our organization chose npm over the others is that maintaining different .jsons adds more to the workflow, not to mention that everyone has a different nomenclature. Our organization definitely wants to minimize the maintenance burden but still allow others to compete... which I believe is one of the goals presented in this issue.

@formula1 commented Feb 10, 2016

The two main goals here are to:

  • Provide an Open Transport Implementation - each mechanism has different advantages
    • can utilize peer-to-peer systems - torrent, IPFS, etc. These make mutating a published package difficult, offer arguably zero downtime so long as a node is up, provide faster and distributed bandwidth, and let anybody get involved
    • can utilize arbitrary source control - Bower has implemented SVN and Mercurial alongside our favorite, git
    • can utilize arbitrary protocols - if a web server uses FTP anyway, why implement another server on another port? Authentication may be based on something completely independent of the CLI tool but utilized through a plugin
    • can utilize other package managers
  • Enable a Registry Ecosystem - provide an open semver-to-transport resolution implementation. This serves its own purposes:
    • Standardized dependency implementation - @Martii made the biggest point. About a year ago there was an isomorphic package I really liked that refused to support browserify because it already supported component and Bower, and the extra step was more of a pain than it was helpful. The ecosystem has since changed, but I believe this is still a pain point for developers
    • Without competition there is no alternative - Recently GitHub had a changing of the guard which sparked fanatical backlash on Hacker News. GitHub is not likely to change any time soon, but if it were to become pay-to-play or do something else detrimental to the Node community, would we still be here? Without an alternative, that would be a definitive no. With alternatives, the individual can decide (potentially causing splintering, as io.js did to Node)
    • Prioritized Resolution - If I have three registries I want my packages resolved by, I can prioritize them: first my cache, then a private company server, then a public registry. This is the obvious use case, though more arbitrary ones may exist
    • More than one Source of Truth - If two or more registries of equal weight resolve the same version with different checksums, we can conclude that one of them has mutated or is serving an incorrect package (a rough sketch of such a cross-check follows this list). Currently there is only one source of truth, which we simply have to trust
    • Companies only have to spend what they want - If distribution is handled by the developer and/or peer to peer, a company like npm no longer needs to keep every single package. It can store only what it wants, whether that is package.jsons or readmes, or the entire package if it desires
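
As a rough illustration of the source-of-truth point above, the following sketch asks two registries for the metadata of the same name@version and compares the checksums they report. The second registry URL is hypothetical, and it assumes both expose npm-style dist.shasum metadata (Node 18+ for global fetch):

```js
// Hypothetical sketch: ask two registries for the same name@version and
// compare the checksums they report. Assumes npm-style packument metadata.
const registries = [
  'https://registry.npmjs.org',
  'https://mirror.example.com',   // hypothetical second registry
];

async function reportedShasum(registryUrl, name, version) {
  const res = await fetch(`${registryUrl}/${name}/${version}`);
  if (!res.ok) throw new Error(`${registryUrl} returned ${res.status}`);
  const metadata = await res.json();
  return metadata.dist && metadata.dist.shasum;   // checksum of the published tarball
}

async function crossCheck(name, version) {
  const sums = await Promise.all(
    registries.map((registry) => reportedShasum(registry, name, version))
  );
  const agree = sums.every((sum) => sum && sum === sums[0]);
  if (!agree) {
    // The registries disagree about the same version: one of them has mutated
    // the package or is serving an incorrect copy.
    console.error(`checksum mismatch for ${name}@${version}:`, sums);
  }
  return agree;
}

crossCheck('left-pad', '1.3.0').then((ok) => console.log(ok ? 'registries agree' : 'mismatch'));
```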

I present my own vision so

  • that I have a platform to stand on while speaking with seasoned developers who have been solving this problem far longer than I have
  • to promote a culture of proactivity when approaching this problem - if I were just to yell at people instead of building, I would solve very little except to make others uncomfortable
  • to have my own understanding of what I expect/want from a package manager - the ability to make arbitrary plugins, the ability to add and remove registries, the ability to publish to multiple registries through a common write server, etc.

Beyond that attempt, my own vision means only as much as others allow it to. I respect the progress and dedication others have shown far more than my own, though I'm very excited about this opportunity to be a part of it and of a possible implementation.

@guybedford commented Feb 10, 2016

Thanks for the explanation; I was just not entirely sure how it related to this repo, so apologies if I was a little too direct. It would be nice to take the time to mention a couple of points along the lines of these ideas, if that is not going too far off track from the discussion.

The problem of transport, version lookup, and secure hash validation is actually completely orthogonal to the consumption of packages. This is why npm as a company can move towards being more of a package provider rather than relying entirely on being the creator of a CLI tool. npm works just as well for jspm packages as it does for npm-CLI packages, even though they make completely different assumptions. A single, high-quality distribution system is a really good thing for users, and much better than having many registries of varying reliability.

In terms of availability, we should make the distinction between another registry (say, the difference between npm and GitHub) and mirrors. The way to tackle package reliability is by using npm mirrors. This is already a solved problem: npmrc configuration lets you point at an existing mirror. Creating a new registry that happens to have copies of packages is not something that should be justified from a reliability perspective.

jspm provides an open API for registries, as it seems Bower now does as well. But we have to be very careful not to run away with creating lots of different registries. Imagine if you installed a new package and it and all its dependencies ended up using five different registry systems: that just multiplies the chance that the install will not work. It is very important for npm to maintain the role of the single dominant system; that is a really good thing, and alternative transports should be avoided as much as possible and any alternative registries treated with extreme skepticism. If a new registry comes out and everyone gets excited, and then more registries come out, that would be a very bad path to go down (although new package managers are a good path despite the churn).

We're trained to think of a single monopoly as a bad thing, and it often is, but in the case of npm it really is not. If npm were owned by evil scheming capitalists (and the truth couldn't be further from this), even then the only fears one might possibly have would concern performance, availability, privacy and security. Performance and availability have been shown to be a massive focus for npm. Privacy of my package usage data is perhaps not an important concern, at least currently. Security - verifying that the hash of the package I requested is the hash of the package I got, which is the hash of the package that was published - could be handled by verification tools around npm. Apart from that, we have absolutely nothing to worry about, so we can happily continue to use npm even if it gets taken over by those scheming capitalists - there is no need to feel that we should be decentralizing power away from npm.
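
As a sketch of the kind of verification tooling mentioned above (not an existing npm feature; the package name and tarball path are placeholders, and it assumes the registry exposes npm-style dist.shasum metadata), one could hash the tarball that actually landed on disk and compare it with what the registry claims was published:

```js
// Hypothetical verification sketch: hash a downloaded tarball and compare it
// to the shasum the registry reports for that version. Node 18+ (global fetch).
const crypto = require('crypto');
const fs = require('fs');

async function verifyTarball(name, version, tarballPath) {
  const res = await fetch(`https://registry.npmjs.org/${name}/${version}`);
  const { dist } = await res.json();     // contains the published shasum and tarball URL

  const localSha1 = crypto
    .createHash('sha1')
    .update(fs.readFileSync(tarballPath))
    .digest('hex');

  if (localSha1 !== dist.shasum) {
    throw new Error('tarball on disk does not match what the registry published');
  }
  return true;
}

// Usage (the path is a placeholder for wherever the CLI cached the tarball):
// verifyTarball('left-pad', '1.3.0', './left-pad-1.3.0.tgz').then(() => console.log('ok'));
```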

So that brings me to the final point - why do we want any of these things at all!?

And on that note, I will just mention again that the space of decentralized package transport without a central authority, using IPFS or similar technology, combined with, say, a blockchain for authoritative DNS-like package ownership, is a fascinating problem, and if anyone is interested in working in this area, I'd always be interested to have a chat. jspm would certainly be open to adding a registry along these lines in due course, but this is far-future work to be researched, prototyped and experimented with, certainly not rushed out to users.

@formula1 commented Feb 11, 2016

  • Lookup is completely orthogonal to the consumption of packages - A package that is not available on npm may still be available through Bower or jspm, which can resolve it to a GitHub location. Should I maintain three different config files in order to ensure all dependencies are met? Is it not possible to specify which registries to target within the package.json? See bower search. The CLI tool is about developer experience, and lookups are an aspect of that experience.
  • Availability - There's an important distinction to be made between distribution availability and lookup availability. Currently, by providing an npm mirror you are providing both lookup and distribution, so if I want to mirror npm I have to copy possibly terabytes of data. If it were only lookup, mirroring npm would be far simpler.
  • Complexity when adding registries - More likely than not, extra registries will not be added on most developers' computers. Even if this is built, npm will still be the go-to because it is reliable and comfortable. If more registries are added, that will likely be unique to a specific development team, similar to how npm already supports alternate registries when a private server is desired. If more than one registry becomes common, then more likely than not we as a community have grown comfortable with it (similar to how there are bower.json, component.json and package.json files). Even today, apm downloads its packages from its own registry while dependencies are installed from npm. This indicates that in some environments the top level will resolve from one registry (dedicated or not) and then dependencies will be resolved from others. The use cases already exist; should CLI efforts stay splintered?
  • Paranoia - Honestly, I want to avoid this point at all costs because you are right: npm is made up of really good people. Even watching their team meetings I could tell they were organized but relatable, just churning out code like the rest of us. Nonetheless, monopolies aren't created because people were forced to buy bad products; I would argue (with minimal proof) that monopolies exist because a company makes really great products while others are unable to find a niche. npm will be the leader because they are a great team that churns out a great product. However:
    • Privacy - is a competitive edge others could offer that currently does not exist. But honestly, it probably isn't that important for the majority of us; we have Google, Yahoo and DuckDuckGo for search engines. To get into the registry game today, I would have to create a CLI tool (or build an add-on and hope someone else uses it), scrape npm for repositories, handle publish events and hope that someone comes along.
    • Security - impossible to verify when they are almost the sole source of truth. When a repo is specified on the package, we can manually check it, make two downloads, or use GitHub as the distribution server. This could be handled via a pull request, but component actually had issues with rate limits when implementing a GitHub-only approach.
    • Nothing else to worry about - I would argue the goal should always move towards decentralization rather than monopoly. Not because there is nothing else to worry about, but because being overly dependent can lead to bad things. I think npm will likely always be the alpha, but that shouldn't be because no one else can join; it should be because no one else is as good.
  • Why do we want any of these things?
    • Competition in Registries
      • I would assume this is not a feature you would use. But it is a feature that has caused two (or more) closed threads, several attempted implementations and the usual internet arguments about control. Are customers never supposed to ask for features? Are we to assume that those who request this are caustic to our community, or are they seeing something that we refuse to believe?
    • Open Transport Implementation - I'd prefer not to repeat myself
  • IPFS/Torrent/Blockchain/Peer to Peer - Feel free to comment on implementation or research. I still don't understand the point of using a blockchain here. However, all progress starts now. At the very least, with the right explanation, I can look into it and implement it in a proof of concept. From there you can see what you like and what you don't, and I can prepare a pull request or you can reimplement it as you please. Leaving it for some other time seems like how this would get swept under the rug.
@joshgarde commented Jun 24, 2017

Has there been any progress on this since last year? I've been following the Node community's progress towards a more decentralized package management system for a few days now, but I haven't found anything that truly offers independent operation from NPM. I've researched IPFS and I believe it's the best way to jumpstart a distributed package management system.

I was thinking of simply forking Yarn or NPM and building this system on top of their foundations (why reinvent the wheel?). I have a few ideas for implementation, but it seems everyone here has their own proposals too. If anyone has an existing implementation of their proposals, I'd love to contribute some code towards it. I think this is a good project to pursue further.

@joshgarde commented Jul 3, 2017

I've just joined the IPMJS project and I'll be helping build out an implementation of a decentralized package manager over there for anyone wanting to follow the progress of this.

@dominictarr commented Jul 3, 2017

@joshgarde try this: %f9xdqPtVRm8j5nNjnN5wVoJl5gHSxnhBEGoS3T8Vr1g=.sha256
Basically, since npm@5 needs just a URL and an integrity hash, if you have a local server and know the sha1, sha256 or sha512 you can generate a package-lock.json that npm@5 will happily install. I tested this, but am waiting for the various bugs in npm@5 to be fixed.
But this means you don't actually have to fork npm (yay, because maintaining your own fork would be hell)!
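
A minimal sketch of that idea, assuming a hypothetical local server on port 8080 and a placeholder integrity value; the shape is npm@5's lockfileVersion 1 format, and the integrity field must be the real SRI hash of the tarball being served:

```js
// Hypothetical sketch: generate a package-lock.json that points npm@5 at a
// local server (e.g. an IPFS gateway) instead of the public registry.
const fs = require('fs');

const lock = {
  name: 'my-app',
  version: '1.0.0',
  lockfileVersion: 1,
  requires: true,
  dependencies: {
    'left-pad': {
      version: '1.3.0',
      // Served by whatever local/decentralized transport runs on port 8080.
      resolved: 'http://localhost:8080/left-pad/-/left-pad-1.3.0.tgz',
      // Placeholder -- substitute the actual SRI hash of that tarball.
      integrity: 'sha512-REPLACE_WITH_REAL_HASH',
    },
  },
};

fs.writeFileSync('package-lock.json', JSON.stringify(lock, null, 2));
// `npm install` will then fetch left-pad from the resolved URL and refuse to
// install it if the tarball does not match the integrity hash.
```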

I think the same approach with yarn might be more complicated; yarn is slightly less explicit: https://yarnpkg.com/blog/2017/05/31/determinism/

npm can use only the package-lock as the source of truth in order to build the final dependency graph whereas Yarn needs the accompanying package.json to seed it.

^ the key line.

@joshgarde commented Jul 3, 2017

My goal isn't just to have the ability to mirror NPM packages onto an IPFS network, but to take NPM's core functionality and decentralize it - downloading packages, uploading packages, and searching packages. On top of that core functionality: establishing trust between package distributors and users, and offering protection against malicious packages.

@Trott (Member) commented Nov 5, 2017

It seems like perhaps this should be closed. Feel free to re-open (or leave a comment requesting that it be re-opened) if you disagree. I'm just tidying up and not acting on a super-strong opinion or anything like that.

(Aside: This repo is dormant and might be a candidate for archiving.)

@Trott closed this Nov 5, 2017
