proposal: cmd/go: 'go install' should install executables in module mode outside a module #40276

Open
jayconrod opened this issue Jul 17, 2020 · 5 comments

@jayconrod jayconrod commented Jul 17, 2020

Authors: Jay Conrod, Daniel Martí

Last Updated: 2020-08-10

Design doc: CL 243077
Comments on the CL are preferred over comments on this issue.

Abstract

Authors of executables need a simple, reliable, consistent way for users to
build and install executables in module mode without updating module
requirements in the current module's go.mod file.

Background

go get is used to download and install executables, but it's also responsible
for managing dependencies in go.mod files. This causes confusion and
unintended side effects: for example, the command
go get golang.org/x/tools/gopls builds and installs gopls. If there's a
go.mod file in the current directory or any parent, this command also adds a
requirement on the module golang.org/x/tools/gopls, which is usually not
intended. When GO111MODULE is not set, go get will also run in GOPATH mode
when invoked outside a module.

These problems lead authors to write complex installation commands such as:

(cd $(mktemp -d); GO111MODULE=on go get golang.org/x/tools/gopls)

Proposal

We propose augmenting the go install command to build and install packages
at specific versions, regardless of the current module context.

go install golang.org/x/tools/gopls@v0.4.4

To eliminate redundancy and confusion, we also propose deprecating and removing
go get functionality for building and installing packages.

Details

The new go install behavior will be enabled when an argument has a version
suffix like @latest or @v1.5.2. Currently, go install does not allow
version suffixes. When a version suffix is used:

  • go install runs in module mode, regardless of whether a go.mod file is
    present. If GO111MODULE=off, go install reports an error, similar to
    what go mod download and other module commands do.
  • go install acts as if no go.mod file is present in the current directory
    or parent directory.
  • No module will be considered the "main" module.
  • Errors are reported in some cases to ensure that consistent versions of
    dependencies are used by users and module authors. See Rationale below.
    • Command line arguments must not be meta-patterns (all, std, cmd)
      or local directories (./foo, /tmp/bar).
    • Command line arguments must refer to main packages (executables). If an
      argument has a wildcard (...), it will only match main packages.
    • Command line arguments must refer to packages in one module at a specific
      version. All version suffixes must be identical. The versions of the
      installed packages' dependencies are determined by that module's go.mod
      file (if it has one).
    • If that module has a go.mod file, it must not contain directives that
      would cause it to be interpreted differently if the module were the main
      module. In particular, it must not contain replace or exclude
      directives.
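The purely syntactic restrictions above can be sketched in a few lines of Go. This is only an illustration under stated assumptions: checkInstallArgs and splitVersion are hypothetical helpers, not part of cmd/go, and the sketch treats an argument without a version suffix as an error. The checks that need package loading (main packages only, one module, no replace or exclude directives) are left out.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// splitVersion splits "path@version" into its parts.
// ok is false when the argument has no version suffix.
func splitVersion(arg string) (path, version string, ok bool) {
	i := strings.LastIndex(arg, "@")
	if i < 0 {
		return arg, "", false
	}
	return arg[:i], arg[i+1:], true
}

// isLocalDir reports whether path looks like a local directory (./foo, /tmp/bar).
func isLocalDir(path string) bool {
	return path == "." || path == ".." ||
		strings.HasPrefix(path, "./") ||
		strings.HasPrefix(path, "../") ||
		strings.HasPrefix(path, "/")
}

// checkInstallArgs is a hypothetical helper (not part of cmd/go). It rejects
// meta-patterns, local directories, and mismatched version suffixes, and
// returns the version suffix shared by all arguments.
func checkInstallArgs(args []string) (string, error) {
	if len(args) == 0 {
		return "", errors.New("no arguments")
	}
	common := ""
	for _, arg := range args {
		path, version, ok := splitVersion(arg)
		if !ok || version == "" {
			return "", fmt.Errorf("%s: missing version suffix", arg)
		}
		if path == "all" || path == "std" || path == "cmd" {
			return "", fmt.Errorf("%s: meta-patterns are not allowed", arg)
		}
		if isLocalDir(path) {
			return "", fmt.Errorf("%s: local directories are not allowed", arg)
		}
		if common == "" {
			common = version
		} else if version != common {
			return "", fmt.Errorf("%s: version suffix differs from @%s", arg, common)
		}
	}
	return common, nil
}

func main() {
	examples := [][]string{
		{"example.com/cmd/tool@v1.4.2"},
		{"example.com/cmd/a@v1.0.0", "example.com/cmd/b@v1.1.0"},
		{"./foo@latest"},
		{"std@latest"},
	}
	for _, args := range examples {
		version, err := checkInstallArgs(args)
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", args, "at", version)
	}
}
```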

If go install has arguments without version suffixes, its behavior will not
change. It will operate in the context of the main module. If run in module mode
outside of a module, go install will report an error.

With these restrictions, users can install executables using consistent commands.
Authors can provide simple installation instructions without worrying about
the user's working directory.

With this change, go install would overlap with go get even more, so we also
propose deprecating and removing the ability for go get to install packages.

  • In Go 1.16, when go get is invoked outside a module or when go get is
    invoked without the -d flag with arguments matching one or more main
    packages, go get would print a deprecation warning recommending an
    equivalent go install command.
  • In a later release (likely Go 1.17), go get would no longer build or install
    packages. The -d flag would be enabled by default. Setting -d=false would
    be an error. If go get is invoked outside a module, it would print an error
    recommending an equivalent go install command.

Examples

# Install a single executable at the latest version
$ go install example.com/cmd/tool@latest

# Install multiple executables at the latest version
$ go install example.com/cmd/...@latest

# Install at a specific version
$ go install example.com/cmd/tool@v1.4.2

Current go install and go get functionality

go install is used for building and installing packages within the context of
the main module. go install reports an error when invoked outside of a module
or when given arguments with version queries like @latest.

go get is used both for updating module dependencies in go.mod and for
building and installing executables. go get also works differently depending
on whether it's invoked inside or outside of a module.

These overlapping responsibilities lead to confusion. Ideally, we would have one
command (go install) for installing executables and one command (go get) for
changing dependencies.

Currently, when go get is invoked outside a module in module mode (with
GO111MODULE=on), its primary purpose is to build and install executables. In
this configuration, there is no main module, even if only one module provides
packages named on the command line. The build list (the set of module versions
used in the build) is calculated from requirements in go.mod files of modules
providing packages named on the command line. replace or exclude directives
from all modules are ignored. Vendor directories are also ignored.

When go get is invoked inside a module, its primary purpose is to update
requirements in go.mod. The -d flag is often used, which instructs go get
not to build or install packages. Explicit go build or go install commands
are often better for installing tools when dependency versions are specified in
go.mod and no update is desired. Like other build commands, go get loads the
build list from the main module's go.mod file, applying any replace or
exclude directives it finds there. replace and exclude directives in other
modules' go.mod files are never applied. Vendor directories in the main module
and in other modules are ignored; the -mod=vendor flag is not allowed.

The motivation for the current go get behavior was to make usage in module
mode similar to usage in GOPATH mode. In GOPATH mode, go get would download
repositories for any missing packages into $GOPATH/src, then build and install
those packages into $GOPATH/bin or $GOPATH/pkg. go get -u would update
repositories to their latest versions. go get -d would download repositories
without building packages. In module mode, go get works with requirements in
go.mod instead of repositories in $GOPATH/src.

Rationale

Why can't go get clone a git repository and build from there?

In module mode, the go command typically fetches dependencies from a
proxy. Modules are distributed as zip files that contain sources for specific
module versions. Even when go connects directly to a repository instead of a
proxy, it still generates zip files so that builds work consistently no matter
how modules are fetched. Those zip files don't contain nested modules or vendor
directories.

If go get cloned repositories, it would work very differently from other build
commands. That causes several problems:

  • It adds complication (and bugs!) to the go command to support a new build
    mode.
  • It creates work for authors, who would need to ensure their programs can be
    built with both go get and go install.
  • It reduces speed and reliability for users. Modules may be available on a
    proxy when the original repository is unavailable. Fetching modules from a
    proxy is roughly 5-7x faster than cloning git repositories.

Why can't vendor directories be used?

Vendor directories are not included in module zip files. Since they're not
present when a module is downloaded, there's no way to build with them.

We don't plan to include vendor directories in zip files in the future
either. Changing the set of files included in module zip files would break
go.sum hashes.

Why can't directory replace directives be used?

For example:

replace example.com/sibling => ../sibling

replace directives with a directory path on the right side can't be used
because the directory must be outside the module. These directories can't be
present when the module is downloaded, so there's no way to build with them.

Why can't module replace directives be used?

For example:

replace example.com/mod v1.0.0 => example.com/fork v1.0.1-bugfix

It is technically possible to apply these directives. If we did this, we would
still want some restrictions. First, an error would be reported if more than one
module provided packages named on the command line: we must be able to identify
a main module. Second, an error would be reported if any directory replace
directives were present: we don't want to introduce a new configuration where
some replace directives are applied but others are silently ignored.

However, there are two reasons to avoid applying replace directives at all.

First, applying replace directives would create inconsistency for users inside
and outside a module. When a package is built within a module with go build or
go install, only replace directives from the main module are applied, not
the module providing the package. When a package is built outside a module with
go get, no replace directives are applied. If go install applied replace
directives from the module providing the package, it would not be consistent
with the current behavior of any other build command. To eliminate confusion
about whether replace directives are applied, we propose that go install
reports errors when encountering them.

Second, if go install applied replace directives, it would take power away
from developers that depend on modules that provide tools. For example, suppose
the author of a popular code generation tool gogen forks a dependency
genutil to add a feature. They add a replace directive pointing to their
fork of genutil while waiting for a PR to merge. A user of gogen wants to
track the version they use in their go.mod file to ensure everyone on their
team uses a consistent version. Unfortunately, they can no longer build gogen
with go install because the replace is ignored. The author of gogen might
instruct their users to build with go install, but then users can't track the
dependency in their go.mod file, and they can't apply their own require and
replace directives to upgrade or fix other transitive dependencies. The author
of gogen could also instruct their users to copy the replace directive, but
this may conflict with other require and replace directives, and it may
cause similar problems for users further downstream.

Why report errors instead of ignoring replace?

If go install ignored replace directives, it would be consistent with the
current behavior of go get when invoked outside a module. However, in
#30515 and related discussions, we found that
many developers are surprised by that behavior.

It seems better to be explicit that replace directives are only applied
locally within a module during development and not when users build packages
from outside the module. We'd like to encourage module authors to release
versions of their modules that don't rely on replace directives so that users
in other modules may depend on them easily.

If this behavior turns out not to be suitable (for example, authors prefer to
keep replace directives in go.mod at release versions and understand that
they won't affect users), then we could start ignoring replace directives in
the future, matching current go get behavior.
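As a rough sketch of what "report an error" could look like, the Go program below scans the text of a go.mod file for replace and exclude directives. It is a deliberately simplified line scanner written for illustration: forbiddenDirectives is a hypothetical name, the module paths are made up, and a real implementation would more likely use a proper go.mod parser such as golang.org/x/mod/modfile. Block forms like "replace ( ... )" are detected by their opening line only.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// forbiddenDirectives returns the lines of a go.mod file that start a
// replace or exclude directive. It is a hypothetical, simplified scanner:
// it strips // comments and looks only at the first word of each line.
func forbiddenDirectives(gomod string) []string {
	var found []string
	sc := bufio.NewScanner(strings.NewReader(gomod))
	for sc.Scan() {
		line := sc.Text()
		if i := strings.Index(line, "//"); i >= 0 {
			line = line[:i] // drop trailing comment
		}
		fields := strings.Fields(line)
		if len(fields) == 0 {
			continue
		}
		if fields[0] == "replace" || fields[0] == "exclude" {
			found = append(found, strings.Join(fields, " "))
		}
	}
	return found
}

func main() {
	// Example go.mod content; all module paths are illustrative.
	const gomod = `module example.com/gogen

go 1.15

require example.com/genutil v1.0.0

replace example.com/genutil => example.com/fork/genutil v1.0.1-bugfix
`
	dirs := forbiddenDirectives(gomod)
	if len(dirs) == 0 {
		fmt.Println("ok: no replace or exclude directives")
		return
	}
	fmt.Println("error: go.mod contains directives that only apply in the main module:")
	for _, d := range dirs {
		fmt.Println("\t" + d)
	}
}
```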

Should go.sum files be checked?

Because there is no main module, go install will not use a go.sum file to
authenticate any downloaded module or go.mod file. The go command will still
use the checksum database (sum.golang.org) to
authenticate downloads, subject to privacy settings. This is consistent with the
current behavior of go get: when invoked outside a module, no go.sum file is
used.

The new go install command requires that only one module may provide packages
named on the command line, so it may be logical to use that module's go.sum
file to verify downloads. This avoids a problem in
#28802, a related proposal to verify downloads
against all go.sum files in dependencies: the build can't be broken by one bad
go.sum file in a dependency.

However, using the go.sum from the module named on the command line only
provides a marginal security benefit: it lets us authenticate private module
dependencies (those not available to the checksum database) when the module on
the command line is public. If the module named on the command line is private
or if the checksum database isn't used, then we can't authenticate the download
of its content (including the go.sum file), and we must trust the proxy. If
all dependencies are public, we can authenticate all downloads without go.sum.

Why require a version suffix when outside a module?

If no version suffix were required when go install is invoked outside a
module, then the meaning of the command would depend on whether the user's
working directory is inside a module. For example:

go install golang.org/x/tools/gopls

When invoked outside of a module, this command would run in GOPATH mode,
unless GO111MODULE=on is set. In module mode, it would install the latest
version of the executable.

When invoked inside a module, this command would use the main module's go.mod
file to determine the versions of the modules needed to build the package.

We currently have a similar problem with go get. Requiring the version suffix
makes the meaning of a go install command unambiguous.

Why not a -g flag instead of @latest?

To install the latest version of an executable, the two commands below would be
equivalent:

go install -g golang.org/x/tools/gopls
go install golang.org/x/tools/gopls@latest

The -g flag has the advantage of being shorter for a common use case. However,
it would only be useful when installing the latest version of a package, since
-g would be implied by any version suffix.

The @latest suffix is clearer, and it implies that the command is
time-dependent and not reproducible. We prefer it for those reasons.

Compatibility

The go install part of this proposal only applies to commands with version
suffixes on each argument. go install reports an error for these, and this
proposal does not recommend changing other functionality of go install, so
that part of the proposal is backward compatible.

The go get part of this proposal recommends deprecating and removing
functionality, so it's certainly not backward compatible. go get -d commands
will continue to work without modification though, and eventually, the -d flag
can be dropped.

Parts of this proposal are more strict than is technically necessary (for
example, requiring one module, forbidding replace directives). We could relax
these restrictions without breaking compatibility in the future if it seems
expedient. It would be much harder to add restrictions later.

Implementation

We expect the implementation of this feature to be fairly small. Jay Conrod
will prepare a CL for Go 1.16, early in cycle.

CL 203279 was an early prototype that
implemented an equivalent flag. The only functionality missing in that CL is
error reporting for multiple modules on the command line and for replace and
exclude directives.

Appendix: FAQ

Why not apply replace directives from all modules?

In short, replace directives from different modules would conflict, and
that would make dependency management harder for most users.

For example, consider a case where two dependencies replace the same module
with different forks.

// in example.com/mod/a
replace example.com/mod/c => example.com/fork-a/c v1.0.0

// in example.com/mod/b
replace example.com/mod/c => example.com/fork-b/c v1.0.0

Another conflict would occur where two dependencies pin different versions
of the same module.

// in example.com/mod/a
replace example.com/mod/c => example.com/mod/c v1.1.0

// in example.com/mod/b
replace example.com/mod/c => example.com/mod/c v1.2.0

To avoid the possibility of conflict, the go command ignores replace
directives in modules other than the main module.
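To make the conflict concrete, here is a small Go sketch that groups replace directives by the module they replace and reports any module with more than one distinct target. The replacement type and the conflicts function are hypothetical names used only for this illustration; the go command does not work this way, since it ignores these directives outside the main module.

```go
package main

import (
	"fmt"
	"sort"
)

// replacement is a hypothetical record of one replace directive.
type replacement struct {
	In  string // module whose go.mod contains the directive
	Old string // module path being replaced
	New string // replacement path and version
}

// conflicts maps each replaced module path to its distinct replacement
// targets, keeping only paths that have more than one target.
func conflicts(rs []replacement) map[string][]string {
	targets := make(map[string]map[string]bool)
	for _, r := range rs {
		if targets[r.Old] == nil {
			targets[r.Old] = make(map[string]bool)
		}
		targets[r.Old][r.New] = true
	}
	out := make(map[string][]string)
	for old, set := range targets {
		if len(set) < 2 {
			continue
		}
		for t := range set {
			out[old] = append(out[old], t)
		}
		sort.Strings(out[old])
	}
	return out
}

func main() {
	rs := []replacement{
		{In: "example.com/mod/a", Old: "example.com/mod/c", New: "example.com/fork-a/c v1.0.0"},
		{In: "example.com/mod/b", Old: "example.com/mod/c", New: "example.com/fork-b/c v1.0.0"},
	}
	for old, ts := range conflicts(rs) {
		fmt.Printf("%s is replaced inconsistently: %v\n", old, ts)
	}
}
```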

Modules are intended to scale to a large ecosystem, and in order for upgrades
to be safe, fast, and predictable, some rules must be followed, like semantic
versioning and import compatibility.
Not relying on replace is one of these rules.

How can module authors avoid replace?

replace is useful in several situations for local or short-term development,
for example:

  • Changing multiple modules concurrently.
  • Using a short-term fork of a dependency until a change is merged upstream.
  • Using an old version of a dependency because a new version is broken.
  • Working around migration problems, like golang.org/x/lint imported as
    github.com/golang/lint. Many of these problems should be fixed by lazy
    module loading (#36460).

replace is safe to use in a module that is not depended on by other modules.
It's also safe to use in revisions that aren't depended on by other modules.

  • If a replace directive is just meant for temporary local development by one
    person, avoid checking it in. The -modfile flag may be used to build with
    an alternative go.mod file. See also
    #26640, a feature request for a
    go.mod.local file containing replacements and other local modifications.
  • If a replace directive must be checked in to fix a short-term problem,
    ensure at least one release or pre-release version is tagged before checking
    it in. Don't tag a new release version with replace checked in (pre-release
    versions may be okay, depending on how they're used). When the go command
    looks for a new version of a module (for example, when running go get with
    no version specified), it will prefer release versions. Tagging versions lets
    you continue development on the main branch without worrying about users
    fetching arbitrary commits.
  • If a replace directive must be checked in to solve a long-term problem,
    consider solutions that won't cause issues for dependent modules. If possible,
    tag versions on a release branch with replace directives removed.

When would go install be reproducible?

The new go install command will build an executable with the same set of
module versions on every invocation if both the following conditions are true:

  • A specific version is requested in the command line argument, for example,
    go install example.com/cmd/foo@v1.0.0.
  • Every package needed to build the executable is provided by a module required
    directly or indirectly by the go.mod file of the module providing the
    executable. If the executable only imports standard library packages or
    packages from its own module, no go.mod file is necessary.

An executable may not be bit-for-bit reproducible for other reasons. Debugging
information will include system paths (unless -trimpath is used). A package
may import different packages on different platforms (or may not build at all).
The installed Go version and the C toolchain may also affect binary
reproducibility.

What happens if a module depends on a newer version of itself?

go install will report an error, as go get already does.

This sometimes happens when two modules depend on each other, and releases
are not tagged on the main branch. A command like go get example.com/m@master
will resolve @master to a pseudo-version lower than any release version.
The go.mod file at that pseudo-version may transitively depend on a newer
release version.

go get reports an error in this situation. In general, go get reports
an error when command line arguments require different versions of the same module,
directly or indirectly. go install doesn't support this yet, but this should
be one of the conditions checked when running with version suffix arguments.

Appendix: usage of replace directives

In this proposal, go install would report errors for replace directives in
the module providing packages named on the command line. go get ignores these,
but the behavior may still surprise module authors and users. I've tried to
estimate the impact on the existing set of open source modules.

  • I started with a list of 359,040 main packages that Russ Cox built during an
    earlier study.
  • I excluded packages with paths that indicate they were homework, examples,
    tests, or experiments. 187,805 packages remained.
  • Of these, I took a random sample of 19,000 packages (about 10%).
  • These belonged to 13,874 modules. For each module, I downloaded the "latest"
    version go get would fetch.
  • I discarded repositories that were forks or couldn't be retrieved. 10,618
    modules were left.
  • I discarded modules that didn't have a go.mod file. 4,519 were left.
  • Of these:
    • 3982 (88%) don't use replace at all.
    • 71 (2%) use directory replace only.
    • 439 (9%) use module replace only.
    • 27 (1%) use both.
    • In the set of 439 go.mod files using module replace only, I tried to
      classify why replace was used. A module may have multiple replace
      directives and multiple classifications, so the percentages below don't add
      to 100%.
      • 165 used replace as a soft fork, for example, to point to a bug fix PR
        instead of the original module.
      • 242 used replace to pin a specific version of a dependency (the module
        path is the same on both sides).
      • 77 used replace to rename a dependency that was imported with another
        name, for example, replacing github.com/golang/lint with the correct path,
        golang.org/x/lint.
      • 30 used replace to rename golang.org/x repos with their
        github.com/golang mirrors.
      • 11 used replace to bypass semantic import versioning.
      • 167 used replace with k8s.io modules. Kubernetes has used replace to
        bypass MVS, and dependent modules have been forced to do the same.
      • 111 modules contained replace directives I couldn't automatically
        classify. The ones I looked at seemed to mostly be forks or pins.

The modules I'm most concerned about are those that use replace as a soft fork
while submitting a bug fix to an upstream module; other problems have other
solutions that I don't think we need to design for here. Modules using soft fork
replacements are about 4% of the modules with go.mod files I sampled (165
/ 4519). This is a small enough set that I think we should move forward with the
proposal above.
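The shares quoted in this appendix can be recomputed from the raw counts with a short Go program. This is just a convenience sketch: the counts are copied from the list above, percent is a hypothetical helper, and the output uses one decimal place, so it may differ slightly from the whole-number figures in the list.

```go
package main

import "fmt"

// percent returns part as a percentage of total.
func percent(part, total int) float64 {
	return 100 * float64(part) / float64(total)
}

func main() {
	const total = 4519 // sampled modules that have a go.mod file

	counts := []struct {
		label string
		n     int
	}{
		{"no replace", 3982},
		{"directory replace only", 71},
		{"module replace only", 439},
		{"both", 27},
	}

	sum := 0
	for _, c := range counts {
		sum += c.n
		fmt.Printf("%-24s %5d  %.1f%%\n", c.label, c.n, percent(c.n, total))
	}
	fmt.Println("categories add up to total:", sum == total)

	// Soft forks, the case the proposal is most concerned about.
	fmt.Printf("soft forks: %.1f%% of modules with go.mod\n", percent(165, total))
}
```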

@jayconrod jayconrod added this to the Proposal milestone Jul 17, 2020
@gopherbot gopherbot added the Proposal label Jul 17, 2020

@gopherbot gopherbot commented Jul 17, 2020

Change https://golang.org/cl/243077 mentions this issue: design: add 40276-go-get-b.md


@peebs peebs commented Jul 17, 2020

Thanks for the proposal!

  • go get acts as if no go.mod file is present in the current directory or
    parent directory.
  • No module will be considered the "main" module.

Would it also be accurate to say that the go.mod being used as the main module is that of the module being built rather than no "main" module? Erroring on replaces is then simply a consequence of the implementation preventing replaces from being respected because we consider the module the "main". If the intention was not to have a main module, one might expect replaces to be ignored just like go get does today outside of a module. Regardless of how go get -b and go get outside a module end up unifying, it seems that the intention of go get -b today is to build reproducible binaries using the package's module as the main module or fail if unable to do so. I know the outcome is the same here, but I think the stated intention helps a user know what to expect of this mode.

  • The -u flag may be used together with -b. As usual, -u upgrades modules
    providing packages imported directly or indirectly by packages named on the
    command line

go get is used to download and install executables, but it's also responsible
for managing dependencies in go.mod files. This causes confusion and
unintended side effects

Since go get -b is a distinct mode of go get that is explicitly not for managing dependencies of a local go.mod, my instinct here is that -u shouldn't be an option as it further confuses the distinction between using go get for building binaries vs managing dependencies. I at least don't know a scenario where I would want to build a project with updated dependencies and not view and work with the go.mod afterward. For something like that I would always have the repo cloned already as I am now building a previously unused combination of dependencies with that project and would probably also want to run tests and maybe downgrade certain dependencies if necessary.


@jayconrod jayconrod commented Jul 17, 2020

Would it also be accurate to say that the go.mod being used as the main module is that of the module being built rather than no "main" module? Erroring on replaces is then simply a consequence of the implementation preventing replaces from being respected because we consider the module the "main". If the intention was not to have a main module, one might expect replaces to be ignored just like go get does today outside of a module. Regardless of how go get -b and go get outside a module end up unifying, it seems that the intention of go get -b today is to build reproducible binaries using the package's module as the main module or fail if unable to do so. I know the outcome is the same here, but I think the stated intention helps a user know what to expect of this mode.

That's not quite the intent: there will actually be no main module.

In an earlier iteration on #30515, I suggested that go get -b would act the same as go get, but it would ignore the go.mod in the current directory: it would ignore replace. Many people found that unintuitive though. It seemed like we eventually had consensus that go get -b should report errors if there's more than one module on the command line or if there are replace directives. That way, you'll build the same binary whether it's: using go build inside the module providing the executable, using go get outside any module, or using go get -b anywhere.

Since go get -b is a distinct mode of go get that is explicitly not for managing dependencies of a local go.mod, my instinct here is that -u shouldn't be an option as it further confuses the distinction between using go get for building binaries vs managing dependencies. I at least don't know a scenario where I would want to build a project with updated dependencies and not view and work with the go.mod afterward. For something like that I would always have the repo cloned already as I am now building a previously unused combination of dependencies with that project and would probably also want to run tests and maybe downgrade certain dependencies if necessary.

There is some use for -u here: it lets you build and install an executable with updated dependencies. go get -u can already do that when run outside a module. I don't expect it's very common, but we get it for free, so I don't think there's a good reason to break it.


@peebs peebs commented Jul 17, 2020

had consensus that go get -b should report errors if there's more than one module on the command line or if there are replace directives. That way, you'll build the same binary whether it's: using go build inside the module providing the executable, using go get outside any module, or using go get -b anywhere.

I think we are on the same page here then regardless!

@ianlancetaylor ianlancetaylor added this to Incoming in Proposals Aug 7, 2020
@jayconrod jayconrod changed the title proposal: cmd/go: 'go get' flag to install executables in module mode outside a module proposal: cmd/go: 'go install' should install executables in module mode outside a module Aug 10, 2020

@jayconrod jayconrod commented Aug 10, 2020

Based on discussion in https://groups.google.com/g/golang-tools/c/BRCgqwWLwoY/m/pKuttL9cAwAJ and on Slack, I'd like to take this in a different direction, so I've updated CL 243077 and the copy above.

Instead of adding go get -b, go install would have the proposed functionality when invoked with arguments with a version suffix. For example the command below would install gopls v0.4.4. It could be run from any directory and would ignore the module in the current directory if there is one.

go install golang.org/x/tools/gopls@v0.4.4

go get would no longer build or install packages. It would only be used for changing dependencies.

PTAL and comment on CL 243077 if you have any thoughts. This would be a significant change, but it would be good to have a clear separation of responsibility for go get and go install.
