Categorizing by implementation #14
Comments
First of all, your work on understanding and categorizing package managers has been AMAZING! I had felt like we lacked a framework for tackling such a large topic but you’ve been able to break down all the elements of package management into something much more understandable and comparable across systems. Great work!
My concern with this method is that we would be translating a reference to a mutable resource (the current remote master) to an immutable one (the state of master at whatever time we run This is actually a general problem we have with any package manager that doesn’t keep an immutable reference. Most often, the resource is a URL to a tarball. In almost all cases there is a cache reference (git hash, etag, etc) but we’ll want to build a generic method of:
IMO, we should always hit this mechanism so that we catch mutations. We don’t want to do the work of converting these resources over and over again to IPFS/IPLD when not necessary but we also can’t cache something for any duration of time if the package manager assumes it’s talking about a live mutable reference. |
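A minimal sketch of the "always check, convert only when changed" idea described above. The names `CachedImport`, `resolve`, `fetch_ref` and `convert` are hypothetical and do not come from any existing tool; `fetch_ref` stands in for whatever returns the upstream cache reference (git hash, etag, etc.) and `convert` for the expensive import into IPFS/IPLD.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CachedImport:
    cache_ref: str  # e.g. a git commit hash or an HTTP ETag, as reported upstream
    cid: str        # identifier of the previously imported snapshot


def resolve(url: str,
            cache: Dict[str, CachedImport],
            fetch_ref: Callable[[str], str],
            convert: Callable[[str], str]) -> str:
    """Return an identifier for the current state of a mutable resource.

    fetch_ref and convert are hypothetical callbacks (assumptions for this
    sketch): the first asks upstream for its current cache reference, the
    second performs the expensive conversion and returns an identifier.
    """
    current_ref = fetch_ref(url)  # always ask upstream, so mutations are caught
    entry = cache.get(url)
    if entry is not None and entry.cache_ref == current_ref:
        return entry.cid          # unchanged upstream: reuse the earlier conversion
    cid = convert(url)            # changed or never seen: convert again
    cache[url] = CachedImport(current_ref, cid)
    return cid
```

The cheap reference check runs on every lookup, while the conversion only runs when the reference has changed, so nothing is treated as fresh for any fixed duration.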
Re: Golang in the register-less category, there is a very recent proposal to add a notary for verifying module integrity: https://go.googlesource.com/proposal/+/master/design/25530-notary.md |
@lanzafame very interesting, that notary could effectively become an index, shifting Go into the Portable Registry category. Will be interesting to see if they stick with only allowing a single notary:
|
@mikeal agreed, there's going to be an ongoing challenge between supporting how communities currently manage their dependencies and encouraging them to change their tooling to be more predictable. When git tags are referenced it's probably OK to treat them as immutable even though they can always be force-pushed, and I believe specific git commit hashes are immutable (although they can always be deleted entirely); it's really only git branches that are the problem. Golang has been the odd one out here, as it's only just adding built-in support for declaring versions. Both SwiftPM and Carthage have built-in support for declaring versions, integrity checks and lockfiles, whilst many different external tools have been built for Go to add that kind of support, but none have gained enough popularity to change the behavior of the community as a whole. At this point I'd still recommend reducing the priority of attempting to completely put Go package management on IPFS, waiting until everything has settled down and the community has reached a consensus; hopefully that'll be towards the end of 2019 as Google starts to roll out more tooling like the Module index. In the short term we can still support Go package consumers by improving end user tooling like |
Some possible approaches for implementing IPFS support based on implementation category:
File system based
Approach: Mirroring these registries into MFS and adding the root CID to dnslink/ipns, then rsyncing updates on a regular basis, along with transport plugins like https://github.com/JaquerEspeis/apt-transport-ipfs
Problems
Database based
Approach: IPFS support directly in mainline registries:
Problems: Requires direct buy-in from package manager maintainers
Approach: IPFS wrappers for package manager clients:
Problems
Approach: HTTP proxy for package manager clients:
Problems
Git based
TODO |
@andrew What would you think about moving this, as an addendum to https://github.com/ipfs/package-managers/blob/master/glossary.md, into the docs directory? |
👍 |
This now lives in the docs folder of the repo: https://github.com/ipfs/package-managers/blob/master/docs/categories.md |
This comment (#14 (comment)) is really valuable and didn't make it into the docs. Think we can find a way for these useful thoughts to live on? |
@momack2 Do you mean what’s in the docs directory at https://github.com/ipfs/package-managers/blob/master/docs/categories.md or are you referring to a different piece of content? |
I'm specifically referencing the #14 (comment) with the approaches and problems for each area. AFAIK that isn't covered in the docs you linked. |
Separating comment #14 (comment) into its own document per @momack2 note in #14 (comment)
Separated this comment into a new document and referenced it in the docs index. @andrew -- feel free to append/amend as you see fit. |
A slightly different approach to categorizing package managers than outlined in the Glossary that I've been thinking about with regards to IPFS implementations specifically.
File-system based
This maps closely to the Multi-Registry category.
Many system package managers (APT, apk, RPM, pacman, portage), plus some of the older language package managers (Maven, CPAN, CRAN), are literally a network-attached folder full of files and other folders, often exposed over HTTP, FTP, etc.
Metadata is also stored as files so everything is quite self-contained and easily mirrored using rsync.
This style of registry maps nicely onto IPFS MFS and unixfs. It also seems that most source code and binaries are stored within tar/zip/ar files to preserve file permissions when downloaded over HTTP, which conveniently means we don't need to wait for unixfs-v2 before implementing things.
Essentially, from an IPFS point of view, all of these package managers end up needing a very similar API (unixfs) to be implemented; the clients then decide how they want to organise the files and metadata within that top-level folder. A number of existing attempts take this approach (arch-mirror, apt-transport-ipfs, Gentoo-distfiles-IPFS).
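To illustrate why a folder-of-files registry is easy to content-address, here is a rough sketch that walks a mirrored directory and records a digest per file. The function name `build_manifest` is made up for this example, and the SHA-256 hex digest is only a stand-in for a real IPFS CID (an actual import would go through IPFS itself).

```python
import hashlib
import os
from typing import Dict


def build_manifest(root: str) -> Dict[str, str]:
    """Map each file path (relative to root) to the SHA-256 of its bytes.

    Assumption: the hex digest merely stands in for an IPFS CID here.
    """
    manifest = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a deterministic order
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            # Use forward slashes so the manifest looks the same on every OS.
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            with open(full, "rb") as fh:
                manifest[rel] = hashlib.sha256(fh.read()).hexdigest()
    return manifest
```

Because packages and metadata are all plain files, the same walk covers both, which is what makes rsync-style mirroring (and by extension an MFS copy) straightforward for this category.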
Database based
This maps closely to the Centralized Registry category.
Many newer language package managers (rubygems, npm, Packagist, PyPI, NuGet, Cargo, bower) run a database-backed web application that handles authentication and uploading, and provides APIs for the clients to list packages, versions and other metadata on demand.
Actual package content is often hosted on S3, and requests to download packages may be proxied through the application to track download statistics, or redirected to a CDN link.
Mirroring these package managers usually requires either trawling package list APIs and recursively downloading packages and their metadata, or, if the registry provides it, downloading a dump of the registry's database and running a copy of the web application or a slimmed-down alternative that provides a similar API.
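The "trawl the list APIs" option could look roughly like the sketch below. Every registry exposes a different API, so the three callbacks (`list_packages`, `list_versions`, `download`) are hypothetical placeholders that a real mirror would implement per registry; nothing here reflects an actual registry client.

```python
from typing import Callable, Dict, List


def mirror_registry(list_packages: Callable[[], List[str]],
                    list_versions: Callable[[str], List[str]],
                    download: Callable[[str, str], bytes]
                    ) -> Dict[str, Dict[str, bytes]]:
    """Walk a registry's listing APIs and collect every package version.

    All three callbacks are assumptions of this sketch; each real registry
    would need its own implementation of them.
    """
    mirror: Dict[str, Dict[str, bytes]] = {}
    for name in list_packages():
        mirror[name] = {}
        for version in list_versions(name):
            # One request per version: this is where most of the cost lies.
            mirror[name][version] = download(name, version)
    return mirror
```

The number of requests grows with packages times versions, which gives a sense of why keeping such a mirror current is more work than an rsync of a file tree.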
Unlike the file-system based package managers, almost every database-based registry implements its own unique API for querying metadata and publishing packages.
This may explain why there are generally fewer public mirrors available for Centralized/database-backed package managers, as there's a lot more overhead and complexity in keeping a mirror up to date than with the file-system based registries.
The one attribute they all share is that clients communicate with registries remotely via HTTPS URLs, which can be proxied locally and backed by IPFS. A few existing implementations take this approach (npm-on-ipfs, dpip), as well as general purpose artifact stores like Sonatype Nexus and Artifactory.
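The routing decision at the heart of such a local proxy might be sketched as follows. This is an illustration only: `route_request` and the `cid_index` mapping are invented names, and how the index of URL to CID gets populated is left open.

```python
from typing import Dict, Tuple


def route_request(url: str, cid_index: Dict[str, str]) -> Tuple[str, str]:
    """Decide where a proxied registry request should be served from.

    cid_index is a hypothetical mapping from upstream URL to an IPFS CID
    that is already known for that resource.
    """
    cid = cid_index.get(url)
    if cid is not None:
        return ("ipfs", "/ipfs/" + cid)  # known content: serve it via IPFS
    return ("upstream", url)             # unknown: fall through to the registry
```

Falling back to the upstream registry keeps the client working for anything that has not been imported yet, at the cost of still depending on the central service.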
Git based
This maps to the Portable Registry and the Registry-less categories.
Portable Registries
A number of Portable registries (Homebrew, CocoaPods) use Git (usually GitHub) as a database rather than a traditional centralized database, often as a way to avoid becoming a full time, on-call DBA for their community.
One notable exception is Cargo's main registry, https://crates.io, which combines a portable registry (https://github.com/rust-lang/crates.io-index) with a central database (https://github.com/rust-lang/crates.io).
With Homebrew, PRs can be opened directly on their GitHub repository database to add and update Formula files, which contain the metadata for a package, including links to the source and compiled binaries with integrity hashes.
With CocoaPods, you used to be able to send PRs to their GitHub repository database, but after they merged a PR publishing a new version by someone who shouldn't have been able to, they implemented a separate web service called Trunk to handle adding new versions to the git database.
Similarly with Crates.io, their GitHub repository database is updated automatically whenever someone publishes a new version via the crates.io website.
In all three of these cases, the end user only ever uses data from the latest commit that they have locally; tags and branches are not utilized. In the case of CocoaPods and Cargo, the history of the repository is not used for previous versions; in fact end users often do a shallow clone (git clone --depth 1) so don't even have history data to go back on.
Homebrew only keeps the latest version of a Formula in its database, but does clone the full repository (243.22 MiB as of February 2019) so users can check out previous revisions of the repository to install old versions of a Formula, although it's not encouraged (a Formula cannot declare a particular version of a dependency) because operations on the large repo are slow.
When it comes to mapping these registries onto IPFS, git is used both as a transport protocol and as a storage mechanism, so git-remote-ipld should be able to help with integration.
There is a restriction that files stored in those repositories can't be larger than 2MiB, but it doesn't look like any of those three databases store metadata files larger than a few KB.
However, none of the git repository databases actually contain the code of the releases; usually they have URIs (Homebrew can reference SVN, CVS, Git and other protocols, not just HTTP) pointing to content hosted elsewhere, often GitHub, but potentially anywhere.
For IPFS to host both the registry and source code, each one of these remote URIs will need to be either referenced alongside an IPFS CID (see #12) or loaded via some kind of HTTP proxy extension added to the client.
Registry-less
Some package managers (Go, Carthage, SwiftPM) do away with hosted registries altogether, instead preferring to declare dependencies as fully qualified HTTP URLs or shortcuts for GitHub URLs (owner/repo-name for example).
These package managers often have support for checking to see if the url is a git repository and then querying for git tags as the list of published, named versions, and git commits as a full list of all possible versions (named by git commit sha and/or branch).
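As a rough illustration of "git tags as the list of published versions", the sketch below parses the text printed by `git ls-remote --tags <url>` into a tag-to-commit mapping. The function name is made up, and this is not how any particular package manager implements it.

```python
from typing import Dict


def parse_ls_remote_tags(output: str) -> Dict[str, str]:
    """Parse `git ls-remote --tags <url>` output into {tag name: commit sha}."""
    tags: Dict[str, str] = {}
    prefix = "refs/tags/"
    for line in output.splitlines():
        if not line.strip():
            continue
        sha, ref = line.split("\t", 1)
        if not ref.startswith(prefix):
            continue
        name = ref[len(prefix):]
        if name.endswith("^{}"):
            # Peeled entry: the commit that an annotated tag points at.
            # Prefer it over the sha of the tag object itself.
            tags[name[:-3]] = sha
        elif name not in tags:
            tags[name] = sha
    return tags
```

The result only reflects the remote at the moment of the query: tags can be moved or deleted later, so a client that wants reproducibility would still need to record the resolved commit sha.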
The end result is instead of a single registry, these package managers have thousands of single project registries, and rely on APIs in Git and/or GitHub for metadata on releases.
When it comes to mapping these package managers onto IPFS, putting any individual repository on IPFS is relatively easy with git-remote-ipld, again with the restriction of 2MiB files. Luckily, none of the registry-less package managers support sharing binaries in the same way.
Rather the problem comes when declaring dependencies, which are often simple URIs within source code:
This can be swapped out with an IPFS CID, as seen with gx:
But the original metadata of the package (such as upstream source and version number) is lost.
In addition, this import functionality is baked into the language implementation itself, making changes much more difficult than in external tooling.