cmd/go: validate pseudo-versions against module paths and revision metadata

Previously, most operations involving pseudo-versions allowed any
arbitrary combination of version string and date, and would resolve to
the underlying revision (typically a Git commit hash) as long as that
revision existed.

There are a number of problems with that approach:

• The pseudo-version participates in minimal version selection. If its
  version prefix is inaccurate, the pseudo-version may appear to have
  higher precedence than the releases that follow it, effectively
  “pinning” the module to that commit. For release tags, module
  authors are the ones who make the decision about release tagging;
  they should also have control over the pseudo-version precedence
  within their module.

• The commit date within the pseudo-version provides a total order
  among pseudo-versions. If it is not accurate, the pseudo-version
  will sort into the wrong place relative to other commits with the
  same version prefix.

To address those problems, this change restricts the pseudo-versions
that the 'go' command accepts, rendering some previously
accepted-but-not-canonical versions invalid. A pseudo-version is now
valid only if all of:

1. The tag from which the pseudo-version derives points to the named
   revision or one of its ancestors as reported by the underlying VCS
   tool, or the pseudo-version is not derived from any tag (that is,
   has a "vX.0.0-" prefix before the date string and uses the lowest
   major version appropriate to the module path).

2. The date string within the pseudo-version matches the UTC timestamp
   of the revision as reported by the underlying VCS tool.

3. The short name of the revision within the pseudo-version (such as a
   Git hash prefix) is the same as the short name reported by the
   underlying cmd/go/internal/modfetch/codehost.Repo. Specifically, if
   the short name is a SHA-1 prefix, it must use the same number of
   hex digits (12) as codehost.ShortenSHA1.

4. The pseudo-version includes a '+incompatible' suffix only if it is
   needed for the corresponding major version, and only if the
   underlying module does not have a go.mod file.

We believe that all releases of the 'go' tool have generated
pseudo-versions that meet these constraints. However, a few
pseudo-versions edited by hand or generated by third-party tools do
not. If we discover invalid-but-benign pseudo-versions in widely-used
existing dependencies, we may choose to add a whitelist for those
specific path/version combinations.

―

To work around invalid dependencies in leaf modules, users may add a
'replace' directive from the invalid version to its valid equivalent.
Note that the go command's go.mod parser automatically resolves commit
hashes found in 'replace' directives to the appropriate
pseudo-versions, so in most cases one can write something like:

	replace github.com/docker/docker v1.14.0-0.20190319215453-e7b5f7dbe98c => github.com/docker/docker e7b5f7dbe98c

and then run any 'go' command (such as 'go list' or 'go mod tidy') to
resolve it to an appropriate pseudo-version. Note that the invalid
version will still be used in minimal version selection, so this use
of 'replace' directives is an incomplete workaround.

―

One of the common use cases for higher-than-tagged pseudo-versions is
for projects that do parallel development on release branches. For
example, if a project cuts a 'v1.2' release branch at v1.2.0, they may
want future commits on the main branch to show up as pre-releases for
v1.3.0 rather than for v1.2.1 — especially if v1.2.1 is already tagged
on the release branch. (On the other hand, a backport of a patch to
the v1.2 branch should not show up as a pre-release for v1.3.0.)

To address this use case, module authors can make use of our existing
support for pseudo-versions derived from pre-release tags: if the
author adds an explicit pre-release tag (such as 'v1.3.0-devel') to
the first commit after the branch, then the pseudo-versions for that
commit and its descendants will be derived from that tag and will sort
appropriately in version selection.

―

Updates #27171
Fixes #29262
Fixes #27173
Fixes #32662
Fixes #32695

Change-Id: I0d50a538b6fdb0d3080aca9c9c3df1040da1b329
Reviewed-on: https://go-review.googlesource.com/c/go/+/181881
Run-TryBot: Bryan C. Mills <bcmills@google.com>
Reviewed-by: Jay Conrod <jayconrod@google.com>
Bryan C. Mills committed Jun 21, 2019
1 parent 851616d commit 1803ab1
Showing 20 changed files with 1,170 additions and 201 deletions.
58 changes: 58 additions & 0 deletions doc/go1.13.html
@@ -161,6 +161,64 @@ <h2 id="tools">Tools</h2>
TODO
</p>

<h3 id="modules">Modules</h3>

<h4 id="version-validation">Version validation</h4><!-- CL 181881 -->

<p>
When extracting a module from a version control system, the <code>go</code>
command now performs additional validation on the requested version string.
</p>

<p>
The <code>+incompatible</code> version annotation bypasses the requirement
of <a href="/cmd/go/#hdr-Module_compatibility_and_semantic_versioning">semantic
import versioning</a> for repositories that predate the introduction of
modules. The <code>go</code> command now verifies that such a version does not
include an explicit <code>go.mod</code> file.
</p>

<p>
The <code>go</code> command now verifies the mapping
between <a href="/cmd/go#hdr-Pseudo_versions">pseudo-versions</a> and
version-control metadata. Specifically:
<ul>
<li>The version prefix must be derived from a tag on the named revision or
one of its ancestors, or be of the form <code>vX.0.0</code>.</li>

<li>The date string must match the UTC timestamp of the revision.</li>

<li>The short name of the revision must use the same number of characters as
what the <code>go</code> command would generate. (For SHA-1 hashes as used
by <code>git</code>, a 12-digit prefix.)</li>
</ul>
</p>

<p>
If the main module directly requires a version that fails the above
validation, a corrected version can be obtained by redacting the version to
just the commit hash and re-running a <code>go</code> command such as <code>go
list -m all</code> or <code>go mod tidy</code>. For example,
<pre>require github.com/docker/docker v1.14.0-0.20190319215453-e7b5f7dbe98c</pre>
can be redacted to
<pre>require github.com/docker/docker e7b5f7dbe98c</pre>
which resolves to
<pre>require github.com/docker/docker v0.7.3-0.20190319215453-e7b5f7dbe98c</pre>
</p>

<p>
If the main module has a transitive requirement on a version that fails
validation, the invalid version can still be replaced with a valid one through
the use of a <a href="/cmd/go/#hdr-The_go_mod_file"><code>replace</code>
directive</a> in the <code>go.mod</code> file of
the <a href="/cmd/go/#hdr-The_main_module_and_the_build_list">main module</a>.
If the replacement is a commit hash, it will be resolved to the appropriate
pseudo-version. For example,
<pre>replace github.com/docker/docker v1.14.0-0.20190319215453-e7b5f7dbe98c => github.com/docker/docker e7b5f7dbe98c</pre>
resolves to
<pre>replace github.com/docker/docker v1.14.0-0.20190319215453-e7b5f7dbe98c => github.com/docker/docker v0.7.3-0.20190319215453-e7b5f7dbe98c</pre>
</p>

<h3 id="compiler">Compiler toolchain</h3>

<p><!-- CL 170448 -->
2 changes: 1 addition & 1 deletion src/cmd/go/internal/modconv/convert_test.go
@@ -128,7 +128,7 @@ func TestConvertLegacyConfig(t *testing.T) {

{
// golang.org/issue/24585 - confusion about v2.0.0 tag in legacy non-v2 module
"github.com/fishy/gcsbucket", "v0.0.0-20150410205453-618d60fe84e0",
"github.com/fishy/gcsbucket", "v0.0.0-20180217031846-618d60fe84e0",
`module github.com/fishy/gcsbucket
require (
18 changes: 5 additions & 13 deletions src/cmd/go/internal/modfetch/cache.go
@@ -216,29 +216,21 @@ func (r *cachingRepo) Latest() (*RevInfo, error) {
return &info, nil
}

func (r *cachingRepo) GoMod(rev string) ([]byte, error) {
func (r *cachingRepo) GoMod(version string) ([]byte, error) {
type cached struct {
text []byte
err error
}
c := r.cache.Do("gomod:"+rev, func() interface{} {
file, text, err := readDiskGoMod(r.path, rev)
c := r.cache.Do("gomod:"+version, func() interface{} {
file, text, err := readDiskGoMod(r.path, version)
if err == nil {
// Note: readDiskGoMod already called checkGoMod.
return cached{text, nil}
}

// Convert rev to canonical version
// so that we use the right identifier in the go.sum check.
info, err := r.Stat(rev)
if err != nil {
return cached{nil, err}
}
rev = info.Version

text, err = r.r.GoMod(rev)
text, err = r.r.GoMod(version)
if err == nil {
checkGoMod(r.path, rev, text)
checkGoMod(r.path, version, text)
if err := writeDiskGoMod(file, text); err != nil {
fmt.Fprintf(os.Stderr, "go: writing go.mod cache: %v\n", err)
}
32 changes: 24 additions & 8 deletions src/cmd/go/internal/modfetch/codehost/codehost.go
@@ -79,14 +79,16 @@ type Repo interface {
// nested in a single top-level directory, whose name is not specified.
ReadZip(rev, subdir string, maxSize int64) (zip io.ReadCloser, actualSubdir string, err error)

// RecentTag returns the most recent tag at or before the given rev
// with the given prefix. It should make a best-effort attempt to
// find a tag that is a valid semantic version (following the prefix),
// or else the result is not useful to the caller, but it need not
// incur great expense in doing so. For example, the git implementation
// of RecentTag limits git's search to tags matching the glob expression
// "v[0-9]*.[0-9]*.[0-9]*" (after the prefix).
RecentTag(rev, prefix string) (tag string, err error)
// RecentTag returns the most recent tag on rev or one of its predecessors
// with the given prefix and major version.
// An empty major string matches any major version.
RecentTag(rev, prefix, major string) (tag string, err error)

// DescendsFrom reports whether rev or any of its ancestors has the given tag.
//
// DescendsFrom must return true for any tag returned by RecentTag for the
// same revision.
DescendsFrom(rev, tag string) (bool, error)
}

// A Rev describes a single revision in a source code repository.
@@ -105,6 +107,20 @@ type FileRev struct {
Err error // error if any; os.IsNotExist(Err)==true if rev exists but file does not exist in that rev
}

// UnknownRevisionError is an error equivalent to os.ErrNotExist, but for a
// revision rather than a file.
type UnknownRevisionError struct {
Rev string
}

func (e *UnknownRevisionError) Error() string {
return "unknown revision " + e.Rev
}

func (e *UnknownRevisionError) Is(err error) bool {
return err == os.ErrNotExist
}

// AllHex reports whether the revision rev is entirely lower-case hexadecimal digits.
func AllHex(rev string) bool {
for i := 0; i < len(rev); i++ {
105 changes: 88 additions & 17 deletions src/cmd/go/internal/modfetch/codehost/git.go
@@ -10,6 +10,7 @@ import (
"io"
"io/ioutil"
"os"
"os/exec"
"path/filepath"
"sort"
"strconv"
@@ -318,7 +319,7 @@ func (r *gitRepo) stat(rev string) (*RevInfo, error) {
hash = rev
}
} else {
return nil, fmt.Errorf("unknown revision %s", rev)
return nil, &UnknownRevisionError{Rev: rev}
}

// Protect r.fetchLevel and the "fetch more and more" sequence.
@@ -378,17 +379,30 @@ func (r *gitRepo) stat(rev string) (*RevInfo, error) {

// Last resort.
// Fetch all heads and tags and hope the hash we want is in the history.
if err := r.fetchRefsLocked(); err != nil {
return nil, err
}

return r.statLocal(rev, rev)
}

// fetchRefsLocked fetches all heads and tags from the origin, along with the
// ancestors of those commits.
//
// We only fetch heads and tags, not arbitrary other commits: we don't want to
// pull in off-branch commits (such as rejected GitHub pull requests) that the
// server may be willing to provide. (See the comments within the stat method
// for more detail.)
//
// fetchRefsLocked requires that r.mu remain locked for the duration of the call.
func (r *gitRepo) fetchRefsLocked() error {
if r.fetchLevel < fetchAll {
// TODO(bcmills): should we wait to upgrade fetchLevel until after we check
// err? If there is a temporary server error, we want subsequent fetches to
// try again instead of proceeding with an incomplete repo.
r.fetchLevel = fetchAll
if err := r.fetchUnshallow("refs/heads/*:refs/heads/*", "refs/tags/*:refs/tags/*"); err != nil {
return nil, err
return err
}
r.fetchLevel = fetchAll
}

return r.statLocal(rev, rev)
return nil
}

func (r *gitRepo) fetchUnshallow(refSpecs ...string) error {
@@ -411,7 +425,7 @@ func (r *gitRepo) fetchUnshallow(refSpecs ...string) error {
func (r *gitRepo) statLocal(version, rev string) (*RevInfo, error) {
out, err := Run(r.dir, "git", "-c", "log.showsignature=false", "log", "-n1", "--format=format:%H %ct %D", rev, "--")
if err != nil {
return nil, fmt.Errorf("unknown revision %s", rev)
return nil, &UnknownRevisionError{Rev: rev}
}
f := strings.Fields(string(out))
if len(f) < 2 {
@@ -648,7 +662,7 @@ func (r *gitRepo) readFileRevs(tags []string, file string, fileMap map[string]*F
return missing, nil
}

func (r *gitRepo) RecentTag(rev, prefix string) (tag string, err error) {
func (r *gitRepo) RecentTag(rev, prefix, major string) (tag string, err error) {
info, err := r.Stat(rev)
if err != nil {
return "", err
@@ -681,7 +695,7 @@ func (r *gitRepo) RecentTag(rev, prefix string) (tag string, err error) {

semtag := line[len(prefix):]
// Consider only tags that are valid and complete (not just major.minor prefixes).
if c := semver.Canonical(semtag); c != "" && strings.HasPrefix(semtag, c) {
if c := semver.Canonical(semtag); c != "" && strings.HasPrefix(semtag, c) && (major == "" || semver.Major(c) == major) {
highest = semver.Max(highest, semtag)
}
}
@@ -716,12 +730,8 @@ func (r *gitRepo) RecentTag(rev, prefix string) (tag string, err error) {
}
defer unlock()

if r.fetchLevel < fetchAll {
// Fetch all heads and tags and see if that gives us enough history.
if err := r.fetchUnshallow("refs/heads/*:refs/heads/*", "refs/tags/*:refs/tags/*"); err != nil {
return "", err
}
r.fetchLevel = fetchAll
if err := r.fetchRefsLocked(); err != nil {
return "", err
}

// If we've reached this point, we have all of the commits that are reachable
@@ -738,6 +748,67 @@ func (r *gitRepo) RecentTag(rev, prefix string) (tag string, err error) {
return tag, err
}

func (r *gitRepo) DescendsFrom(rev, tag string) (bool, error) {
// The "--is-ancestor" flag was added to "git merge-base" in version 1.8.0, so
// this won't work with Git 1.7.1. According to golang.org/issue/28550, cmd/go
// already doesn't work with Git 1.7.1, so at least it's not a regression.
//
// git merge-base --is-ancestor exits with status 0 if rev is an ancestor, or
// 1 if not.
_, err := Run(r.dir, "git", "merge-base", "--is-ancestor", "--", tag, rev)

// Git reports "is an ancestor" with exit code 0 and "not an ancestor" with
// exit code 1.
// Unfortunately, if we've already fetched rev with a shallow history, git
// merge-base has been observed to report a false-negative, so don't stop yet
// even if the exit code is 1!
if err == nil {
return true, nil
}

// See whether the tag and rev even exist.
tags, err := r.Tags(tag)
if err != nil {
return false, err
}
if len(tags) == 0 {
return false, nil
}

// NOTE: r.stat is very careful not to fetch commits that we shouldn't know
// about, like rejected GitHub pull requests, so don't try to short-circuit
// that here.
if _, err = r.stat(rev); err != nil {
return false, err
}

// Now fetch history so that git can search for a path.
unlock, err := r.mu.Lock()
if err != nil {
return false, err
}
defer unlock()

if r.fetchLevel < fetchAll {
// Fetch the complete history for all refs and heads. It would be more
// efficient to only fetch the history from rev to tag, but that's much more
// complicated, and any kind of shallow fetch is fairly likely to trigger
// bugs in JGit servers and/or the go command anyway.
if err := r.fetchRefsLocked(); err != nil {
return false, err
}
}

_, err = Run(r.dir, "git", "merge-base", "--is-ancestor", "--", tag, rev)
if err == nil {
return true, nil
}
if ee, ok := err.(*RunError).Err.(*exec.ExitError); ok && ee.ExitCode() == 1 {
return false, nil
}
return false, err
}

func (r *gitRepo) ReadZip(rev, subdir string, maxSize int64) (zip io.ReadCloser, actualSubdir string, err error) {
// TODO: Use maxSize or drop it.
args := []string{}
14 changes: 12 additions & 2 deletions src/cmd/go/internal/modfetch/codehost/vcs.go
@@ -347,7 +347,7 @@ func (r *vcsRepo) fetch() {
func (r *vcsRepo) statLocal(rev string) (*RevInfo, error) {
out, err := Run(r.dir, r.cmd.statLocal(rev, r.remote))
if err != nil {
return nil, vcsErrorf("unknown revision %s", rev)
return nil, &UnknownRevisionError{Rev: rev}
}
return r.cmd.parseStat(rev, string(out))
}
@@ -392,7 +392,7 @@ func (r *vcsRepo) ReadFileRevs(revs []string, file string, maxSize int64) (map[s
return nil, vcsErrorf("ReadFileRevs not implemented")
}

func (r *vcsRepo) RecentTag(rev, prefix string) (tag string, err error) {
func (r *vcsRepo) RecentTag(rev, prefix, major string) (tag string, err error) {
// We don't technically need to lock here since we're returning an error
// unconditionally, but doing so anyway will help to avoid baking in
// lock-inversion bugs.
@@ -405,6 +405,16 @@ func (r *vcsRepo) RecentTag(rev, prefix string) (tag string, err error) {
return "", vcsErrorf("RecentTag not implemented")
}

func (r *vcsRepo) DescendsFrom(rev, tag string) (bool, error) {
unlock, err := r.mu.Lock()
if err != nil {
return false, err
}
defer unlock()

return false, vcsErrorf("DescendsFrom not implemented")
}

func (r *vcsRepo) ReadZip(rev, subdir string, maxSize int64) (zip io.ReadCloser, actualSubdir string, err error) {
if r.cmd.readZip == nil {
return nil, "", vcsErrorf("ReadZip not implemented for %s", r.cmd.vcs)
