
Proposal: Versioned Go Modules

Author: Russ Cox
Last Updated: March 20, 2018
Discussion: https://golang.org/issue/24301

Abstract

We propose to add awareness of package versions to the Go toolchain, especially the go command.

Background

The first half of the blog post Go += Package Versioning presents detailed background for this change. In short, it is long past time to add versions to the working vocabulary of both Go developers and our tools, and this proposal describes a way to do that.

Semantic versioning is the name given to an established convention for assigning version numbers to projects. In its simplest form, a version number is MAJOR.MINOR.PATCH, where MAJOR, MINOR, and PATCH are decimal numbers. The syntax used in this proposal follows the widespread convention of adding a “v” prefix: vMAJOR.MINOR.PATCH. Incrementing MAJOR indicates an expected breaking change. Otherwise, a later version is expected to be backwards compatible with earlier versions within the same MAJOR version sequence. Incrementing MINOR indicates a significant change or new features. Incrementing PATCH is meant to be reserved for very small, very safe changes, such as small bug fixes or critical security patches.
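As a minimal sketch of the simplest vMAJOR.MINOR.PATCH form described above, the following Go program parses and compares two versions. The names semver, parseSemver, and compare are hypothetical and are not part of the proposal or of any Go tool. Prerelease suffixes and other details of the semver spec are deliberately left out.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// semver holds the three numeric components of a vMAJOR.MINOR.PATCH version.
// (Hypothetical type, for illustration only.)
type semver struct {
	Major, Minor, Patch int
}

// parseSemver accepts only the simplest form: a "v" prefix followed by three
// dot-separated decimal numbers. Prerelease suffixes are rejected.
func parseSemver(s string) (semver, error) {
	if !strings.HasPrefix(s, "v") {
		return semver{}, fmt.Errorf("version %q lacks the v prefix", s)
	}
	parts := strings.Split(s[1:], ".")
	if len(parts) != 3 {
		return semver{}, fmt.Errorf("version %q is not vMAJOR.MINOR.PATCH", s)
	}
	var n [3]int
	for i, p := range parts {
		// ParseUint rejects signs and non-digits.
		v, err := strconv.ParseUint(p, 10, 31)
		if err != nil {
			return semver{}, fmt.Errorf("version %q has a bad component %q", s, p)
		}
		n[i] = int(v)
	}
	return semver{n[0], n[1], n[2]}, nil
}

// compare returns -1, 0, or +1 as a is older than, equal to, or newer than b.
func compare(a, b semver) int {
	for _, d := range []int{a.Major - b.Major, a.Minor - b.Minor, a.Patch - b.Patch} {
		if d < 0 {
			return -1
		}
		if d > 0 {
			return 1
		}
	}
	return 0
}

func main() {
	a, _ := parseSemver("v1.2.0")
	b, _ := parseSemver("v1.10.3")
	// Components compare numerically, so v1.2.0 is older than v1.10.3.
	fmt.Println(compare(a, b))
}
```

Comparing components as numbers rather than as strings is what makes v1.10.3 newer than v1.2.0.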

The sequence of vgo-related blog posts presents more detail about the proposal.

Proposal

I propose to add versioning to Go using the following approach.

  1. Introduce the concept of a Go module, which is a group of packages that share a common prefix, the module path, and are versioned together as a single unit. Most projects will adopt a workflow in which a version-control repository corresponds exactly to a single module. Larger projects may wish to adopt a workflow in which a version-control repository can hold multiple modules. Both workflows will be supported.

  2. Assign version numbers to modules by tagging specific commits with semantic versions such as v1.2.0. (See the Defining Go Modules post for details, including how to tag multi-module repositories.)

  3. Adopt semantic import versioning, in which each major version has a distinct import path. Specifically, an import path contains a module path, a version number, and the path to a specific package inside the module. If the major version is v0 or v1, then the version number element must be omitted; otherwise it must be included.

    The packages imported as my/thing/sub/pkg, my/thing/v2/sub/pkg, and my/thing/v3/sub/pkg come from major versions v1, v2, and v3 of the module my/thing, but the build treats them simply as three different packages. A program that imports all three will have all three linked into the final binary, just as if they were my/red/pkg, my/green/pkg, and my/blue/pkg or any other set of three different import paths.

    Note that only the major version appears in the import path: my/thing/v1.2/sub/pkg is not allowed.

  4. Explicitly adopt the “import compatibility rule”:

    If an old package and a new package have the same import path,
    the new package must be backwards compatible with the old package.

    The Go project has encouraged this convention from the start of the project, but this proposal gives it more teeth: upgrades by package users will succeed or fail only to the extent that package authors follow the import compatibility rule.

    The import compatibility rule only applies to tagged releases starting at v1.0.0. Prerelease (vX.Y.Z-anything) and v0.Y.Z versions need not follow compatibility with earlier versions, nor do they impose requirements on future versions. In contrast, tagging a commit vX.Y.Z for X ≥ 1 explicitly indicates “users can expect this module to be stable.”

    In general, users should expect a module to follow the Go 1 compatibility rules once it reaches v1.0.0, unless the module's documentation clearly states exceptions.

  5. Record each module's path and dependency requirements in a go.mod file stored in the root of the module's file tree.

  6. To decide which module versions to use in a given build, apply minimal version selection: gather the transitive closure of all the listed requirements and then remove duplicates of a given major version of a module by keeping the maximum requested version, which is also the minimum version satisfying all listed requirements.

    Minimal version selection has two critical properties. First, it is trivial to implement and understand. Second, it never chooses a module version not listed in some go.mod file involved in the build: new versions are not incorporated simply because they have been published. The second property produces high-fidelity builds and makes sure that upgrades only happen when developers request them, never unexpectedly.

  7. Define a specific zip file structure as the “interchange format” for Go modules. The vast majority of developers will work directly with version control and never think much about these zip files, if at all, but having a single representation enables proxies, simplifies analysis sites like godoc.org or continuous integration, and likely enables more interesting tooling not yet envisioned.

  8. Define a URL schema for fetching Go modules from proxies, used both for installing modules using custom domain names and also when the $GOPROXY environment variable is set. The latter allows companies and individuals to send all module download requests through a proxy for security, availability, or other reasons.

  9. Allow running the go command in file trees outside GOPATH, provided there is a go.mod in the current directory or a parent directory. That go.mod file defines the mapping from file system to import path as well as the specific module versions used in the build. See the Versioned Go Commands post for details.

  10. Disallow use of vendor directories, except in one limited use: a vendor directory at the top of the file tree of the top-level module being built is still applied to the build, to continue to allow self-contained application repositories. (Ignoring other vendor directories ensures that Go returns to builds in which each import path has the same meaning throughout the build and establishes that only one copy of a package with a given import path is used in a given build.)

The “Tour of Versioned Go” blog post demonstrates how most of this fits together to create a smooth user experience.

Rationale

Go has struggled with how to incorporate package versions since goinstall, the predecessor to go get, was released eight years ago. This proposal is the result of eight years of experience with goinstall and go get, careful examination of how other languages approach the versioning problem, and lessons learned from Dep, the experimental Go package management tool released in January 2017.

A few people have asked why we should add the concept of versions to our tools at all. Packages do have versions, whether the tools understand them or not. Adding explicit support for versions lets tools and developers communicate more clearly when specifying a program to be built, run, or analyzed.

At the start of the process that led to this proposal, almost two years ago, we all believed the answer would be to follow the package versioning approach exemplified by Ruby's Bundler and then Rust's Cargo: tagged semantic versions, a hand-edited dependency constraint file known as a manifest, a machine-generated transitive dependency description known as a lock file, a version solver to compute a lock file satisfying the manifest, and repositories as the unit of versioning. Dep, the community effort led by Sam Boyer, follows this plan almost exactly and was originally intended to serve as the model for go command integration. Dep has been a significant help for Go developers and a positive step for the Go ecosystem.

Early on, we talked about Dep simply becoming go dep, serving as the prototype of go command integration. However, the more I examined the details of the Bundler/Cargo/Dep approach and what they would mean for Go, especially built into the go command, a few of the details seemed less and less a good fit. This proposal adjusts those details in the hope of shipping a system that is easier for developers to understand and to use.

Semantic versions, constraints, and solvers

Semantic versions are a reasonable convention for specifying software versions, and version control tags written as semantic versions have a clear meaning, but the semver spec critically does not prescribe how to build a system using them. What tools should do with the version information? Dave Cheney's 2015 proposal to adopt semantic versioning was eventually closed exactly because, even though everyone agreed semantic versions seemed like a good idea, we didn't know the answer to the question of what to do with them.

The Bundler/Cargo/Dep approach is one answer. Allow authors to specify arbitrary constraints on their dependencies. Build a given target by collecting all its dependencies recursively and finding a configuration satisfying all those constraints.

Unfortunately, the arbitrary constraints make finding a satisfying configuration very difficult. There may be many satisfying configurations, with no clear way to choose just one. For example, if the only two ways to build A are by using B 1 and C 2 or by using B 2 and C 1, which should be preferred, and how should developers remember? Or there may be no satisfying configuration. Also, it can be very difficult to tell whether there are many, one, or no satisfying configurations: allowing arbitrary constraints makes version solving an NP-complete problem, equivalent to solving SAT. In fact, most package managers now rely on SAT solvers to decide which packages to install. But the general problem remains: there may be many equally good configurations, with no clear way to choose between them, there may be a single best configuration, or there may be no good configurations, and it can be very expensive to determine which is the case in a given build.

This proposal's approach is a new answer, in which authors can specify only limited constraints on dependencies: only the minimum required versions. Like in Bundler/Cargo/Dep, this proposal builds a given target by collecting all dependencies recursively and then finding a configuration satisfying all constraints. However, unlike in Bundler/Cargo/Dep, the process of finding a satisfying configuration is trivial. As explained in the minimal version selection post, a satisfying configuration always exists, and the set of satisfying configurations forms a lattice with a unique minimum. That unique minimum is the configuration that uses exactly the specified version of each module, resolving multiple constraints for a given module by selecting the maximum constraint, or equivalently the minimum version that satisfies all constraints. That configuration is trivial to compute and easy for developers to understand and predict.
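The selection rule described above can be sketched in a few lines of Go. This is an illustration under simplifying assumptions, not the vgo implementation: versions are plain integers instead of semantic versions, the requirement lists are held in an in-memory map instead of go.mod files, and the names modVersion, requirements, and buildList are hypothetical. Different major versions of a module would appear here as different paths.

```go
package main

import (
	"fmt"
	"sort"
)

// modVersion names one version of one module. Versions are plain integers
// to keep the sketch short (an assumption; real versions are semantic versions).
type modVersion struct {
	Path    string
	Version int
}

// requirements maps each module version to the minimum versions it lists,
// standing in for the contents of the go.mod files.
type requirements map[modVersion][]modVersion

// buildList walks the transitive closure of the listed requirements starting
// at target and keeps, for each module path, the maximum version requested.
func buildList(target modVersion, reqs requirements) []modVersion {
	selected := map[string]int{target.Path: target.Version}
	seen := map[modVersion]bool{target: true}
	queue := []modVersion{target}
	for len(queue) > 0 {
		m := queue[0]
		queue = queue[1:]
		for _, r := range reqs[m] {
			// Keep the maximum of the minimum versions requested so far.
			if v, ok := selected[r.Path]; !ok || r.Version > v {
				selected[r.Path] = r.Version
			}
			if !seen[r] {
				seen[r] = true
				queue = append(queue, r)
			}
		}
	}
	list := make([]modVersion, 0, len(selected))
	for path, v := range selected {
		list = append(list, modVersion{path, v})
	}
	// Sort by path so the result is deterministic.
	sort.Slice(list, func(i, j int) bool { return list[i].Path < list[j].Path })
	return list
}

func main() {
	reqs := requirements{
		{"A", 1}: {{"B", 2}, {"C", 2}},
		{"B", 2}: {{"D", 3}},
		{"C", 2}: {{"D", 4}},
	}
	for _, m := range buildList(modVersion{"A", 1}, reqs) {
		fmt.Println(m.Path, m.Version)
	}
}
```

In this example B asks for at least D 3 and C asks for at least D 4, so the build uses D 4. A newer D 5, if one had been published, would never be chosen, because no requirement list mentions it.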

Build Control

A module's dependencies must clearly be given some control over that module's build. For example, if A uses dependency B, which uses a feature of dependency C introduced in C 1.5, B must be able to ensure that A's build uses C 1.5 or later.

At the same time, for builds to remain predictable and understandable, a build system cannot give dependencies arbitrary, fine-grained control over the top-level build. That leads to conflicts and surprises. For example, suppose B declares that it requires an even version of D, while C declares that it requires a prime version of D. D is frequently updated and is up to D 1.99. Using B or C in isolation, it's always possible to use a relatively recent version of D (D 1.98 or D 1.97, respectively). But when A uses both B and C, a SAT solver-based build silently selects the much older (and buggier) D 1.2 instead. To the extent that SAT solver-based build systems actually work, it is because dependencies don't choose to exercise this level of control. But then why allow them that control in the first place?
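The arithmetic behind the even/prime hypothetical can be checked with a small sketch. It assumes D's minor versions run from 1 to 99 and models each constraint as a predicate on the minor version. The helper newest is hypothetical and uses a brute-force search in place of a real SAT solver.

```go
package main

import "fmt"

// isEven models B's constraint on the minor version of D.
func isEven(n int) bool { return n%2 == 0 }

// isPrime models C's constraint on the minor version of D.
func isPrime(n int) bool {
	if n < 2 {
		return false
	}
	for d := 2; d*d <= n; d++ {
		if n%d == 0 {
			return false
		}
	}
	return true
}

// newest returns the highest minor version in 1..latest that satisfies every
// constraint, or 0 if no version does.
func newest(latest int, constraints ...func(int) bool) int {
	for v := latest; v >= 1; v-- {
		ok := true
		for _, c := range constraints {
			if !c(v) {
				ok = false
				break
			}
		}
		if ok {
			return v
		}
	}
	return 0
}

func main() {
	fmt.Printf("B alone:  D 1.%d\n", newest(99, isEven))          // D 1.98
	fmt.Printf("C alone:  D 1.%d\n", newest(99, isPrime))         // D 1.97
	fmt.Printf("B and C:  D 1.%d\n", newest(99, isEven, isPrime)) // D 1.2
}
```

The only even prime is 2, which is why combining the two individually reasonable constraints drags the build back to D 1.2.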

Although the hypothetical about prime and even versions is clearly unlikely, real problems do arise. For example, issue kubernetes/client-go#325 was filed in November 2017, complaining that the Kubernetes Go client pinned builds to a specific version of gopkg.in/yaml.v2 from September 2015, two years earlier. When a developer tried to use a new feature of that YAML library in a program that already used the Kubernetes Go client, even after attempting to upgrade to the latest possible version, code using the new feature failed to compile, because “latest” had been constrained by the Kubernetes requirement. In this case, the use of a two-year-old YAML library version may be entirely reasonable within the context of the Kubernetes code base, and clearly the Kubernetes authors should have complete control over their own builds, but that level of control does not make sense to extend to other developers' builds. The issue was closed after a change in February 2018 to update the specific YAML version pinned to one from July 2017. But the issue is not really “fixed”: Kubernetes still pins a specific, increasingly old version of the YAML library. The fundamental problem is that the build system allows the Kubernetes Go client to do this at all, at least when used as a dependency in a larger build.

This proposal aims to balance allowing dependencies enough control to ensure a successful build with not allowing them so much control that they break the build. Minimum requirements combine without conflict, so it is feasible (even easy) to gather them from all dependencies, and they make it impossible to pin older versions, as Kubernetes does. Minimal version selection gives the top-level module in the build additional control, allowing it to exclude specific module versions or replace others with different code, but those exclusions and replacements only apply when found in the top-level module, not when the module is a dependency in a larger build.

A module author is therefore in complete control of that module's build when it is the main program being built, but not in complete control of other users' builds that depend on the module. I believe this distinction will make this proposal scale to much larger, more distributed code bases than the Bundler/Cargo/Dep approach.

Ecosystem Fragmentation

Allowing all modules involved in a build to impose arbitrary constraints on the surrounding build harms not just that build but the entire language ecosystem. If the author of popular package P finds that dependency D 1.5 has introduced a change that makes P no longer work, other systems encourage the author of P to issue a new version that explicitly declares it needs D < 1.5. Suppose also that popular package Q is eager to take advantage of a new feature in D 1.5 and issues a new version that explicitly declares it needs D ≥ 1.6. Now the ecosystem is divided, and programs must choose sides: are they P-using or Q-using? They cannot be both.

In contrast, being allowed to specify only a minimum required version for a dependency makes clear that P's author must either (1) release a new, fixed version of P; (2) contact D's author to issue a fixed D 1.6 and then release a new P declaring a requirement on D 1.6 or later; or else (3) start using a fork of D 1.4 with a different import path. Note the difference between a new P that requires “D before 1.5” compared to “D 1.6 or later.” Both avoid D 1.5, but “D before 1.5” explains only which builds fail, while “D 1.6 or later” explains how to make a build succeed.

Semantic Import Versions

The example of ecosystem fragmentation in the previous section is worse when it involves major versions. Suppose the author of popular package P has used D 1.X as a dependency, and then popular package Q decides to update to D 2.X because it is a nicer API. If we adopt Dep's semantics, now the ecosystem is again divided, and programs must again choose sides: are they P-using (D 1.X-using) or Q-using (D 2.X-using)? They cannot be both. Worse, in this case, because D 1.X and D 2.X are different major versions with different APIs, it is completely reasonable for the author of P to continue to use D 1.X, which might even continue to be updated with features and bug fixes. That continued usage only prolongs the divide. The end result is that a widely-used package like D would in practice either be prohibited from issuing version 2 or else split the ecosystem in half by doing so. Neither outcome is desirable.

Rust's Cargo makes a different choice from Dep. Cargo allows each package to specify whether a reference to D means D 1.X or D 2.X. Then, if needed, Cargo links both a D 1.X and a D 2.X into the final binary. This approach works better than Dep's, but users can still get stuck. If P exposes D 1.X in its own API and Q exposes D 2.X in its own API, then a single client package C cannot use both P and Q, because it will not be able to refer to both D 1.X (when using P) and D 2.X (when using Q). The dependency story in the semantic import versioning post presents an equivalent scenario in more detail. In that story, the base package manager starts out being like Dep, and the -fmultiverse flag makes it more like Cargo.

If Cargo is one step away from Dep, semantic import versioning is two steps away. In addition to allowing different major versions to be used in a single build, semantic import versioning gives the different major versions different names, so that there's never any ambiguity about which is meant in a given program file. Making the import paths precise about the expected semantics of the thing being imported (is it v1 or v2?) eliminates the possibility of problems like those client C experienced in the previous example.

More generally, in semantic import versioning, an import of my/thing asks for the semantics of v1.X of my/thing. As long as my/thing is following the import compatibility rule, that's a well-defined set of functionality, satisfied by the latest v1.X and possibly earlier ones (as constrained by go.mod). Similarly, an import of my/thing/v2 asks for the semantics of v2.X of my/thing, satisfied by the latest v2.X and possibly earlier ones (again constrained by go.mod). The meaning of the imports is clear, to both people and tools, from reading only the Go source code, without reference to go.mod. If instead we followed the Cargo approach, both imports would be my/thing, and the meaning of that import would be ambiguous from the source code alone, resolved only by reading go.mod.
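To make the naming rule concrete, here is a minimal sketch that builds an import path from a module path, a major version, and a package path inside the module. The function importPath is hypothetical and is not part of the go command. It only encodes the rule stated in this proposal, that the version element is omitted for v0 and v1 and included otherwise.

```go
package main

import (
	"fmt"
	"path"
)

// importPath joins a module path, a major version, and a package path inside
// the module. The version element is omitted for v0 and v1 and included for
// v2 and later. (Hypothetical helper, for illustration only.)
func importPath(modulePath string, major int, pkg string) string {
	if major < 2 {
		return path.Join(modulePath, pkg)
	}
	return path.Join(modulePath, fmt.Sprintf("v%d", major), pkg)
}

func main() {
	for major := 1; major <= 3; major++ {
		fmt.Println(importPath("my/thing", major, "sub/pkg"))
	}
}
```

The three printed paths differ, so the build treats them as three unrelated packages, and a reader can tell which semantics are meant from the source file alone.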

Our article “About the go command” explains:

An explicit goal for Go from the beginning was to be able to build Go code using only the information found in the source itself, not needing to write a makefile or one of the many modern replacements for makefiles. If Go needed a configuration file to explain how to build your program, then Go would have failed.

It is an explicit goal of this proposal's design to preserve this property, to avoid making the general semantics of a Go source file change depending on the contents of go.mod. With semantic import versioning, if go.mod is deleted and recreated from scratch, the effect is only to possibly update to newer versions of imported packages, but ones that are still expected to work, thanks to import compatibility. In contrast, if we take the Cargo approach, in which the go.mod file must disambiguate between the arbitrarily different semantics of v1 and v2 of my/thing, then go.mod becomes a required configuration file, violating the original goal.

More generally, the main objection to adding /v2/ to import paths is that it's a bit longer, a bit ugly, and it makes explicit a semantically important detail that other systems abstract away, which in turn induces more work for authors, compared to other systems, when they change that detail. But all of these were true when we introduced goinstall's URL-like import paths, and they've been a clear success. Before goinstall, programmers wrote things like import "igo/set". To make that import work, you had to know to first check out github.com/jacobsa/igo into $GOPATH/src/igo. The abbreviated paths had the benefit that if you preferred a different version of igo, you could check your variant into $GOPATH/src/igo instead, without updating any imports. But the abbreviated imports also had the very real drawbacks that a build trying to use both igo/set variants could not, and also that the Go source code did not record anywhere exactly which igo/set it meant. When goinstall introduced import "github.com/jacobsa/igo/set" instead, that made the imports a bit longer and a bit ugly, but it also made explicit a semantically important detail: exactly which igo/set was meant. The longer paths created a little more work for authors compared to systems that stashed that information in a single configuration file. But eight years later, no one notices the longer import paths, we've stopped seeing them as ugly, and we now rely on the benefits of being explicit about exactly which package is meant by a given import. I expect that once /v2/ elements in import paths are common in Go source files, the same will happen: we will no longer notice the longer paths, we will stop seeing them as ugly, and we will rely on the benefits of being explicit about exactly which semantics are meant by a given import.

Update Timing & High-Fidelity Builds

In the Bundler/Cargo/Dep approach, the package manager always prefers to use the latest version of any dependency. These systems use the lock file to override that behavior, holding the updates back. But lock files only apply to whole-program builds, not to newly imported libraries. If you are working on module A, and you add a new requirement on module B, which in turn requires module C, these systems will fetch the latest of B and then also the latest of C. In contrast, this proposal still fetches the latest of B (because it is what you are adding to the project explicitly, and the default is to take the latest of explicit additions) but then prefers to use the exact version of C that B requires. Although newer versions of C should work, it is safest to use the one that B did. Of course, if the build has a different reason to use a newer version of C, it can do that. For example, if A also imports D, which requires a newer C, then the build should and will use that newer version. But in the absence of such an overriding requirement, minimal version selection will build A using the exact version of C requested by B. If, later, a new version of B is released requesting a newer version of C, then when A updates to that newer B, C will be updated only to the version that the new B requires, not farther. The minimal version selection blog post refers to this kind of build as a “high-fidelity build.”

Minimal version selection has the key property that a recently-published version of C is never used automatically. It is only used when a developer asks for it explicitly. For example, the developer of A could ask for all dependencies, including transitive dependencies, to be updated. Or, less directly, the developer of B could update C and release a new B, and then the developer of A could update B. But either way, some developer working on some package in the build must take an explicit action asking for C to be updated, and then the update does not take effect in A's build until a developer working on A updates some dependency leading to C. Waiting until an update is requested ensures that updates only happen when developers are ready to test them and deal with the possibility of breakage.

Many developers recoil at the idea that adding the latest B would not automatically also add the latest C, but if C was just released, there's no guarantee it works in this build. The more conservative position is to avoid using it until the user asks. For comparison, the Go 1.9 go command does not automatically start using Go 1.10 the day Go 1.10 is released. Instead, users are expected to update on their own schedule, so that they can control when they take on the risk of things breaking. The reasons not to update automatically to the latest Go release apply even more to individual packages: there are more of them, and most are not tested for backwards compatibility as extensively as Go releases are.

If a developer does want to update all dependencies to the latest version, that's easy: go get -u. We may also add a go get -p that updates all dependencies to their latest patch versions, so that C 1.2.3 might be updated to C 1.2.5 but not to C 1.3.0. If the Go community as a whole reserved patch versions only for very safe or security-critical changes, then that -p behavior might be useful.
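The patch-only update rule can be sketched as a selection over the published versions of one module. This illustrates the rule only, not the behavior of any existing command: go get -p is merely a possibility raised above, and the names version, parse, and latestPatch are hypothetical. Prerelease versions are not handled.

```go
package main

import "fmt"

// version holds the numeric components of vMAJOR.MINOR.PATCH.
type version struct{ Major, Minor, Patch int }

// parse reads the leading vMAJOR.MINOR.PATCH of s. Anything after the patch
// number is ignored, so prerelease suffixes are not handled by this sketch.
func parse(s string) (version, error) {
	var v version
	_, err := fmt.Sscanf(s, "v%d.%d.%d", &v.Major, &v.Minor, &v.Patch)
	return v, err
}

// latestPatch returns the highest available version that shares current's
// major and minor numbers, or current itself if none is newer.
func latestPatch(current string, available []string) (string, error) {
	cur, err := parse(current)
	if err != nil {
		return "", err
	}
	best, bestStr := cur, current
	for _, s := range available {
		v, err := parse(s)
		if err != nil {
			return "", err
		}
		if v.Major == cur.Major && v.Minor == cur.Minor && v.Patch > best.Patch {
			best, bestStr = v, s
		}
	}
	return bestStr, nil
}

func main() {
	published := []string{"v1.2.3", "v1.2.4", "v1.2.5", "v1.3.0"}
	got, err := latestPatch("v1.2.3", published)
	fmt.Println(got, err)
}
```

With C 1.2.3 in use and C 1.2.5 and C 1.3.0 published, the rule selects C 1.2.5 and leaves C 1.3.0 alone.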

Compatibility

The work in this proposal is not constrained by the compatibility guidelines at all. Those guidelines apply to the language and standard library APIs, not tooling. Even so, compatibility more generally is a critical concern. It would be a serious mistake to deploy changes to the go command in a way that breaks all existing Go code or splits the ecosystem into module-aware and non-module-aware packages. On the contrary, we must make the transition as smooth and seamless as possible.

Module-aware builds can import non-module-aware packages (those outside a tree with a go.mod file) provided they are tagged with a v0 or v1 semantic version. They can also refer to any specific commit using a “pseudo-version” of the form v0.0.0-yyyymmddhhmmss-commit. The pseudo-version form allows referring to untagged commits as well as commits that are tagged with semantic versions at v2 or above but that do not follow the semantic import versioning convention.
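A pseudo-version of the form above can be assembled from a commit's timestamp and identifier, as in the following sketch. The function pseudoVersion is hypothetical. Taking the commit time as Unix seconds and rendering it in UTC are assumptions of this example, and the commit string is used as given.

```go
package main

import (
	"fmt"
	"time"
)

// pseudoVersion formats v0.0.0-yyyymmddhhmmss-commit from a commit time in
// Unix seconds and a commit identifier. Rendering the timestamp in UTC is an
// assumption of this sketch.
func pseudoVersion(commitUnix int64, commit string) string {
	stamp := time.Unix(commitUnix, 0).UTC().Format("20060102150405")
	return fmt.Sprintf("v0.0.0-%s-%s", stamp, commit)
}

func main() {
	// The commit identifier here is made up for illustration.
	fmt.Println(pseudoVersion(1521558245, "abcdef123456"))
}
```

Because the timestamp is written most-significant field first, later commits produce pseudo-versions that sort after earlier ones when compared as strings.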

Module-aware builds can also consume requirement information not just from go.mod files but also from all known pre-existing version metadata files in the Go ecosystem: GLOCKFILE, Godeps/Godeps.json, Gopkg.lock, dependencies.tsv, glide.lock, vendor.conf, vendor.yml, vendor/manifest, and vendor/vendor.json.

Existing tools like dep should have no trouble consuming Go modules, simply ignoring the go.mod file. It may also be helpful to add support to dep to read go.mod files in dependencies, so that dep users are unaffected as their dependencies move from dep to the new module support.

Implementation

A prototype of the proposal is implemented in a fork of the go command called vgo, available using go get -u golang.org/x/vgo. We will refine this implementation during the Go 1.11 cycle and merge it back into cmd/go in the main repository.

The plan, subject to proposal approval, is to release module support in Go 1.11 as an optional feature that may still change. The Go 1.11 release will give users a chance to use modules “for real” and provide critical feedback. Even though the details may change, future releases will be able to consume Go 1.11-compatible source trees. For example, Go 1.12 will understand how to consume the Go 1.11 go.mod file syntax, even if by then the file syntax or even the file name has changed. In a later release (say, Go 1.12), we will declare the module support completed. In a later release (say, Go 1.13), we will end support for go get of non-modules. Support for working in GOPATH will continue indefinitely.

Open issues

We have not yet converted large, complex repositories to use modules. We intend to work with the Kubernetes team and others (perhaps CoreOS, Docker) to convert their use cases. It is possible those conversions will turn up reasons for adjustments to the proposal as described here.