Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/go: make package lookups in vendor directories case-insensitive on case-sensitive filesystems #38342

Open
carnott-snap opened this issue Apr 10, 2020 · 38 comments

Comments

@carnott-snap
Copy link

@carnott-snap carnott-snap commented Apr 10, 2020

What version of Go are you using (go version)?

$ go version
1.13.7

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/username/Library/Caches/go-build"
GOENV="/Users/username/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/username/Documents/src/go"
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/Cellar/go/1.13.7/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.13.7/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/77/06148cr15lvb1jbd9slc58v80000gq/T/go-build264165664=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

go mod vendor

What happened?

When vendoring on MacOS, files are written with case sensitivity to a case insensitive filesystem. This can cause two issues, but is only present for vendoring, not the module cache, since modules use ! prefixes to make the module cache case insensitive.

Collision

Because import paths are case sensitive, in the event that two packages share case insensitive , but not case sensitive, prefixes collision can occur. This is infrequent due to alllowercaseoneword package name conventions, and github username and repo name fields being case insensitive (but with case metadata), however the it is a bug and I can provide code samples to produce compilation failures.

Portability

This issue was discovered when sharing the vendor directory between systems that have different case sensitivity (MacOS to Linux) and does not require package collision to reproduce.

Given two modules share a case insensitive prefix, say github.com/Golang/exp and github.com/golang/tools. On MacOS, they will both be written to github.com/[Gg]olang and the first writer will pick the case. Builds work fine because both github.com/Golang and github.com/golang exist, but when you mount the vendor directory in a docker container github.com/Golang stops existing. This also happens for other data sharing mechasisms between filesystems like cp, tar, git, etc.

@jayconrod jayconrod added this to the Backlog milestone Apr 10, 2020
@jayconrod jayconrod added the Vendoring label Apr 10, 2020
@jayconrod
Copy link
Contributor

@jayconrod jayconrod commented Apr 10, 2020

We probably can't escape paths in the vendor directory the same we do in the module cache. That would break compatibility with GOPATH projects, which is still an important requirement.

However, we should provide a clear diagnostic when this comes up. A module that names itself Github.com/something/something could cause confusion.

@carnott-snap
Copy link
Author

@carnott-snap carnott-snap commented Apr 10, 2020

I had not fully considered the ramifications of GOPATH, but on further review, I think the problem is worse, not better. It seems like the current implementation of GOPATH and vendor violate the language specification. (Please correct me if import paths are actually case insensitive.)

First, I can create a valid package, not even module, that hard breaks with GOPATH on a case insensitive file system. This can be done by file name collisions or folder collisions, repro available upon request. Is this not a bug that should be fixed?

Second the import path acts as both an identifier and access control. Any paths that contain case can be exploited to inject code or break packages. This is doubly scary for hosting providers and module proxies that are case sensitive, since username collision is possible.


This level of divergence from the spec seems problematic; and while I agree that fixing things now is not going to be trivial, because we intend to drag vendor forward with modules, something should be done to fix this.

If we intend to be backwards compatible we need a new directory, and a resolution scheme, assuming both exist. For the former, I propose vendor/v2/ following the modules pattern. This has the distinct disadvantages that packages with import statements starting with v2, e.g. "v2/path/to/package", would be clobbered, but the likelihood of this is low, and we could detect before writing anything and warn. An alternative would be to use a character that is disallowed in import paths like vendor/!v2/.

Next we need a resolution/migration scheme. I think it is sane to have all new clients write to the new path, I will assume vendor/v2, and prefer it when present, but still respect the semantics of the old path. This will allow old code to keep working, while new code can benefit from the fix. It may also be useful to backport minimal vendor support, as was done with modules, to the past few releases, if that is not too much work. Considering the problems below, I think it is sane to exclude GOPATH` clients from writing the new format, but they should be able to read it.

Should we fix GOPATH? I think the correct answer is yes, if possible, but if we are committed to sunset it in go1.15, the value add is limited. We will also be limited by what VCS systems provide. Testing indicates that Git cannot represent case on insensitive filesystems, so we may be limited to fixing this exclusive outside repos: e.g. !github.com/!embedded/!case!sensitivity/CaseInsensitive/Part. I feel this is still important, but weakens the value proposition.

Assuming we do, a similar migration approach can be applied, this time to $GOPATH/src. Pick the name you prefer, $GOPATH/src/v2, $GOPATH/src/!v2, or $GOPATH/srcv2, and apply package level resolution.


EDIT: GOPATH is not worth fixing.

@jayconrod
Copy link
Contributor

@jayconrod jayconrod commented Apr 10, 2020

I don't think any functional change (aside from improved diagnostics) should be made here.

A clarification though: neither GOPATH nor modules are part of the Go language specification. They're just part of cmd/go. Other build tools (Bazel, Blaze, Buck) have their own conventions for mapping import paths to packages.

GOPATH has a lot of problems. Unpredictable versions, case-insensitive collisions, unicode normalization, symbolic link cycles, and other issues have always been there. Modules set out to solve these problems and many others. Since GOPATH will be deprecated eventually (not in 1.15, but eventually), let's focus on making improvements to modules.

Vendor directories the only mechanism that packages in GOPATH can use to ensure they have the correct versions of their dependencies. So let's just make sure go mod vendor constructs a correct, compatible vendor directory. If it's unable to do that safely, we should explain that clearly.

@jayconrod
Copy link
Contributor

@jayconrod jayconrod commented Apr 10, 2020

@carnott-snap
Copy link
Author

@carnott-snap carnott-snap commented Apr 10, 2020

A clarification though: neither GOPATH nor modules are part of the Go language specification.

That is fair, I do frequently conflate the two. Is there anywhere that defines the specification for import paths, as defined by cmd/go?

GOPATH has a lot of problems.

This seems reasonable; I will strike through that part of my previous comment, as it is not as important to me either.

This may be a different discussion, but it seems well past the point that GOPATH should be deprecated. Not sunset, with support removed, just not the default.

Vendor directories the only mechanism that packages in GOPATH can use to ensure they have the correct versions of their dependencies.

Does that not speak to the importance of making vendor describe modules correctly? If we do, that gives people who cannot move to modules some recourse to consume correct versions. Currently, they get a similar flawed interface to GOPATH.

So let's just make sure go mod vendor constructs a correct, compatible vendor directory.

I should get clarification: is the vendor directory intended to be portable? (If I generated it on one filesystem and transmit it to another, cp; git; docker; etc., should it continue to work?) If the answer is no, this simplifies the problem, and should just be documented. The main reason for this portability is because the developer and build systems have different credentials. (See App Engine private modules for an example of how this gets messy.) If vendor is not portable, we should expose something that is. E.g. one resolution proposed internally, was to inject the module cache, not vendor directory, since that is (at least more) portable.

Assuming vendor is not a stopgap solution, (like GOPATH) something that seems increasingly likely based on the changes in 1.14, if something can exist in a module, it must be able to exist in the vendor directory, otherwise the vendor directory is not a suitable mirror, and you loose all the build reproducibility guarantees.

If it's unable to do that safely, we should explain that clearly.

This seems like a really jarring ux:

  • go build works fine
  • go mod vendor breaks saying that somebody else created their modules in a way that is incompatible with your system
    • or go mod vendor works, with an easy to miss warning
    • go build fails thereafter, currently with a cryptic error

There are also edge cases to consider:

  • What is the right course of action if this happens to two modules that share a path prefix?
    • Submit a change to one?
    • Which one is right?
    • What if they fundamentally disagreement on what the correct resolution is?
  • How should a (GitHub) organisation migrate casing?
    • Even though everything maintains compatibility, the fact that both of these modules exist breaks things:
      • "github.com/Org/one"
      • "github.com/org/two"
@bcmills
Copy link
Member

@bcmills bcmills commented Apr 10, 2020

Builds work fine because both github.com/Golang and github.com/golang exist, but when you mount the vendor directory in a docker container github.com/Golang stops existing.

That sounds like a bad interaction of a number of factors:

  1. Mounting a case-insensitive filesystem in a Docker container should arguably preserve the fact that it is case-insensitive, allowing the build to complete successfully. (That sounds like a Docker bug that you may want to file separately.)

  2. go mod vendor does not report an error if there is a collision on a case-differing prefix. It probably should.

@bcmills
Copy link
Member

@bcmills bcmills commented Apr 10, 2020

How should a (GitHub) organisation migrate casing?

Don't. Stick to the original case, forever. (We need a better story for that, but that's the answer today.)

@bcmills bcmills changed the title cmd/go: vendor directory is case sensitive cmd/go: 'go mod vendor' does not fail if two packages have a case-mismatched prefix Apr 10, 2020
@carnott-snap
Copy link
Author

@carnott-snap carnott-snap commented Apr 10, 2020

  1. Mounting a case-insensitive filesystem in a Docker container should arguably preserve the fact that it is case-insensitive, allowing the build to complete successfully. (That sounds like a Docker bug that you may want to file separately.)

The issue here is not about the mounting the vendor directory, that works correctly per docker/for-mac#320, but ADDing the vendor directory via Dockerfile. Currently this is implemented roughly as a copy, thus remains faithful to what is on disk. I can follow upstream, but it feels like the response will be "Go collides the directories, that is impossible to undo".

  1. go mod vendor does not report an error if there is a collision on a case-differing prefix. It probably should.

To confirm, go mod vendor would report an error and fail to generate a vendor directory?

Don't. Stick to the original case, forever. (We need a better story for that, but that's the answer today.)

I really dislike the message we are sending. Organizations and repos store more than Go code. There are a myriad of reasons that rebranding occurs. I do not think it is our place as a language to dictate such conventions.

This is not the case today, you can have two modules that share different case import paths. The problem comes about because the vendor spec is based on GOPATH, which has this bug. Also, this completely ignores the fact that either repo fixing things is a breaking change, based on the interface that github exposes.

Furthermore we are colliding separate identities on hosting providers that are case sensitive. This seems like a security issue.


EDIT: Reversed my stance on the upstream ticket. Not unwilling, just failed to provide full context.

@bcmills
Copy link
Member

@bcmills bcmills commented Apr 11, 2020

To confirm, go mod vendor would report an error and fail to generate a vendor directory?

Yes. I would argue that it there is no portable alternative:

  • If generated on a case-insensitive filesystem, the case-collapsed vendor directory would not work if the repo is checked out into a case-sensitive one (including whenever the module is used in GOPATH mode), as you have observed here.
  • If generated on a case-sensitive filesystem, the case-insensitive collision would make the vendor directory unpack strangely — or fail to unpack altogether — on a case-insensitive one.
@bcmills
Copy link
Member

@bcmills bcmills commented Apr 11, 2020

I really dislike the message we are sending. Organizations and repos store more than Go code. There are a myriad of reasons that rebranding occurs. I do not think it is our place as a language to dictate such conventions.

I did say we need a better story on that. Feel free to open a separate issue. 🙂

(It's closely related to #30831.)

@bcmills
Copy link
Member

@bcmills bcmills commented Apr 11, 2020

What is the right course of action if this happens to two modules that share a path prefix?

Submit a change to fix one (or both).

If they disagree on the domain name portion of the path, use all-lower-case: that is the overwhelming convention, and domain names are explicitly case-insensitive.

If they disagree on other portions... find the person or organization who owns the common prefix and ask them which variant is canonical. (Or — better still! — look at existing users and try to do whatever will be least disruptive for them.)

If you can't get the owners of those dependencies to agree, then drop one or the other dependency. (Do you really want to depend on a module whose owner is willing to leave users in a broken state over a path-casing disagreement?)

@carnott-snap
Copy link
Author

@carnott-snap carnott-snap commented Apr 13, 2020

I agree today we do not do the right thing, and a sane first step could be to make go mod vendor break, but the consequences of that are non-trivial and place the burden on the user to make upstream PRs or fork. I get that it is easier to maintain the status quo, but if a technical solution exists (see my v2 proposal), should we not pursue it? This seems doubly important because vendor (seems like it) is a long-term solution, yet is incompatible with all module import paths. I have also yet to hear any arguments against my proposal, just alternatives that shift the problem to the user.

Also, sorry for not being more clear: the intent of those questions was more rhetorical, to illustrate that there are a myriad of edge cases to this approach. And while your responses are sane, they illustrate that resolving this, while maintaining the status quo is non-trivial. Would it not be easier to fix the bug, and keep the ux?

@bcmills
Copy link
Member

@bcmills bcmills commented Apr 13, 2020

Case-mismatched collisions are rare in practice. vendor directories are more common, but still relatively rare. So the combination of those two cases is rarer still.

Have you run into many of these issues in practice? github.com/Sirupsen/logrus is the only one that comes to mind for me, and that happened long enough ago that it's generally easy to fix by upgrading to some other existing version of whatever dependency was importing the old path.

Addressing such a rare use-case does not seem worth the complexity of defining and maintaining a new, case-encoding vendor format. That is: one argument against a new format is that its benefits would not outweigh the intrinsic costs of having a new format.

@carnott-snap
Copy link
Author

@carnott-snap carnott-snap commented Apr 13, 2020

I am sorry for driving this, but think some of this discussion seems to be missing the forest for the trees; the core of my concerns centre around a few points:

  • Is the vendor directory intended to be portable?
    • Share between GOOS via mount
    • Share between filesystems via VCS
  • Should the vendor directory represent every module path?
  • Is there a security issue with allowing case based path collisions?
@bcmills
Copy link
Member

@bcmills bcmills commented Apr 14, 2020

  • Is the vendor directory intended to be portable?
    • Share between GOOS via mount

Yes, but only for mount implementations that make sense. (A case-insensitive filesystem should remain case-insensitive when mounted from another OS, including via a network share, VM, or container.)

  • Share between filesystems via VCS

Yes. That's why go mod vendor should reject structures that aren't representable in a case-insensitive filesystem, even if the filesystem happens to be case-sensitive.

  • Should the vendor directory represent every module path?

Not necessarily: the vendor directory contains packages, not complete modules.

  • Is there a security issue with allowing case based path collisions?

Not to my knowledge, but if you believe you've found one please send a report per https://golang.org/security.

I believe that the only hazards today are:

  1. a vendor directory that builds one one filesystem might fail to build on another, and
  2. a vendor directory with mismatched cases might be used for imports of those packages in GOPATH mode on one filesystem, but not be used for those packages (falling back to whatever version is in GOPATH) on another.
@carnott-snap
Copy link
Author

@carnott-snap carnott-snap commented Apr 14, 2020

Yes, but only for mount implementations that make sense. (A case-insensitive filesystem should remain case-insensitive when mounted from another OS, including via a network share, VM, or container.)

Do we expect to support things like Dockerfile ADD, cp, or other copy style transfers?

  • Share between filesystems via VCS

Yes. That's why go mod vendor should reject structures that aren't representable in a case-insensitive filesystem, even if the filesystem happens to be case-sensitive.

So, vendoring github.com/... and GitHub.com/... should break everywhere? I like that it is consistent, but is going to be noisy, and the only fix (for github.com) is to make breaking changes to one of the two modules. I really dislike that we are requiring breaking changes to fix. We should also assess all the published modules on proxy.golang.org to see impact.

Not necessarily: the vendor directory contains packages, not complete modules.

Since vendor can shadow modules, I think this is deeply problematic. If we allow vendor to shadow modules, as in go1.14, it should support all possible packages, otherwise we can expect undefined behaviour and irreproducible builds.

Or phrased as a hazard:
3. a vendor directory cannot represent every possible package on case insensitive file systems.

2. a `vendor` directory with mismatched cases might be used for imports of those packages in `GOPATH` mode on one filesystem, but not be used for those packages (falling back to whatever version is in `GOPATH`) on another.

This was not something I considered, since module vendoring completely shadows and verifies this via vendor/module.txt. Does Go intend to support partial vendoring, or is it just legacy of what GOPATH allows?

@carnott-snap
Copy link
Author

@carnott-snap carnott-snap commented Apr 15, 2020

We should also assess all the published modules on proxy.golang.org to see impact.

This is actually more widespread than I thought. Based upon all the modules in index.golang.org, I found nearly a thousand unique prefixes with case based module collisions.0 Since this contains all modules, the numbers are an upper bound, but manual review indicated that there were several high value orgs on github: github.com/ibm, github.com/microsoft, and github.com/nytimes. (Their repos reads as more of a coordination issue than a true migration effort.)

stdout.json contains map["prefix"][]"module/import/path". I have filtered out pre-release and pseudo versions, e.g. ^v[0-9]+.[0-9]+.[0-9]+; this is intended as a strong signal for the presence of a go.mod, but not fool proof. Prefix is also guaranteed to be the shortest shared substring. I do not have usage statistics, so it is hard to see what the exact fallout would be.

@carnott-snap
Copy link
Author

@carnott-snap carnott-snap commented Apr 15, 2020

vendor directories are more common, but still relatively rare.

I should have clarified this earlier, but this statement, to my visibility, is patently untrue. Problems with go-get and auth, see #26232, or appengine private module container builds, lead to vendoring being very common place within private source. This may not be visible to the open-source community, but the Go community certainly uses it.

@bcmills
Copy link
Member

@bcmills bcmills commented Apr 16, 2020

I discussed this with @jayconrod, @matloob, and @rsc this afternoon, and here's what we came up with.

As demonstrated in https://play.golang.org/p/_KZirMBFC1l, we already disallow case-insensitive collisions for individual package paths (at least in module mode). So we know that each individual package has at most one canonical casing.

Furthermore, in module mode with -mod=vendor, we know that every imported package is listed (with its canonical casing) in vendor/modules.txt. That gives us a way to determine the appropriate import path for the directory — an operation we assume to be possible within various tools, including cmd/go itself.

Similarly, given the absence of package collisions we can implement a “case-insensitive” lookup in a case-collapsed vendor directory. If the directory we expect is missing, we can progressively Stat each prefix of that directory. If any element is missing, we can perform a Readdir in the parent directory to find one or more case-equivalent substitutes. When we reach the end of the path, at most one of the case-equivalent candidates can contain a Go package (and thus .go source files), so that candidate must be the final case-insensitive location.

Given the relative rarity of these collisions, we don't have a concrete plan to implement this approach in the foreseeable future, but if you'd like to contribute an implementation (with tests!) we'd be happy to review.

@bcmills bcmills changed the title cmd/go: 'go mod vendor' does not fail if two packages have a case-mismatched prefix cmd/go: make package lookups in vendor directories case-insensitive on case-sensitive filesystems Apr 16, 2020
@carnott-snap
Copy link
Author

@carnott-snap carnott-snap commented Apr 16, 2020

Interesting, I was not aware that the toolchain disallowed case-insensitive collisions for individual package paths.

  • How long has this been a rule?
  • This present a problem for case sensitive hosting providers and module proxies, where example.com/Foo and example.com/foo are different users.
  • Since the module namespace is effectively case insensitive, why do they even support case? It would make the module cache and proxy implementations much simpler.

[I]f you'd like to contribute an implementation

I am happy to, can you point me at the affected symbols/packages? (Unless you think grep will be good enough.) Considering it is a bug, would it be a candidate for backport and inclusion in a point release?

Just stating the failure modes and outcomes this fix will provide for confirmation:

  • Resolves package collision security issues, because imported packages cannot collide.
  • Resolves prefix collision build issues, because we can disentangle via vendor/modules.txt and guessing.
  • Does not resolve case sensitive provider collisions, because the toolchain disallows this.
@thaJeztah
Copy link
Contributor

@thaJeztah thaJeztah commented Apr 17, 2020

As demonstrated in https://play.golang.org/p/_KZirMBFC1l, we already disallow case-insensitive collisions for individual package paths (at least in module mode). So we know that each individual package has at most one canonical casing.

Playing with some variations; this seems to work though; https://play.golang.org/p/SIFKKkS9XHa (perhaps intentional?).

(arrived here because I saw this issue linked to docker/for-mac#320)

@carnott-snap
Copy link
Author

@carnott-snap carnott-snap commented Apr 17, 2020

Playing with some variations; this seems to work though; https://play.golang.org/p/SIFKKkS9XHa (perhaps intentional?).

Yes, this is expected: (excepting my outstanding concerns)

  • example.com/foo/example.com/Foo collide because they are the same case insensitive package import path.
    • This is not a prevalent issue because most hosting providers, say GitHub, have case insensitive users and repository namespaces, but allow case in metadata.
      owned by a different user, from being imported.
  • example.com/foo and example.com/Foo/bar do not collide and thus can be mapped to the virtual, case insensitive, runtime import map.
    • E.g. you cannot create github.com/foo and github.com/Foo or github.com/foo/bar and github.com/foo/Bar.
  • example.com/foo/bar and example.com/Foo/bar collide because they are the same case insensitive package import path.
    • This is the issue I brought up about case sensitive hosting providers.
      • This is more a theoretical concern, since I have not found a VCS host that provides this feature.
    • Module proxies must support the full case sensitive namespace, so they may be impacted.
      • I am referring to the <meta name="go-import" content="example.com mod https://example.com/foo/bar"> tag.
    • The issue is occlusion, not collision: example.com/foo/bar prevents example.com/Foo/bar,
@bcmills
Copy link
Member

@bcmills bcmills commented Apr 18, 2020

can you point me at the affected symbols/packages? (Unless you think grep will be good enough.)

There is a block that would need to change for the forward lookup:

// -mod=vendor is special.
// Everything must be in the main module or the main module's vendor directory.
if cfg.BuildMod == "vendor" {
mainDir, mainOK := dirInModule(path, targetPrefix, ModRoot(), true)
vendorDir, vendorOK := dirInModule(path, "", filepath.Join(ModRoot(), "vendor"), false)
if mainOK && vendorOK {
return module.Version{}, "", &AmbiguousImportError{importPath: path, Dirs: []string{mainDir, vendorDir}}
}
// Prefer to return main directory if there is one,
// Note that we're not checking that the package exists.
// We'll leave that for load.
if !vendorOK && mainDir != "" {
return Target, mainDir, nil
}
readVendorList()
return vendorPkgModule[path], vendorDir, nil
}

And one for the reverse lookup:

if _, ok := vendorPkgModule[pkg]; !ok {
return "", fmt.Errorf("directory %s is not a package listed in vendor/modules.txt", absDir)
}

(There may be other places, too, but those are the entry-points that come to mind.)

@bcmills
Copy link
Member

@bcmills bcmills commented Apr 18, 2020

Considering it is a bug, would it be a candidate for backport and inclusion in a point release?

Probably not. We've only had one report of an affected user (you), and there seems to be a relatively straightforward workaround: namely, build in module mode without -mod=vendor (or, for go 1.14 modules using a 1.14.* toolchain, with -mod=mod).

@bcmills
Copy link
Member

@bcmills bcmills commented Apr 18, 2020

  • How long has this been a rule?

Looks like CL 7314104, early 2013.

  • Since the module namespace is effectively case insensitive, why do they even support case? It would make the module cache and proxy implementations much simpler.

I'm not sure! Perhaps to avoid module adopters accidentally breaking GOPATH-mode users by adding imports with the wrong casing?

@arnottcr
Copy link
Contributor

@arnottcr arnottcr commented Apr 21, 2020

I am happy to do the review as part of a CL, but thought that an early pass here may be easier, since it is currently a single insertion.

To pin down requirements, I would like to confirm what behaviours we need to support. My assumption was that we needed to allow for the application of any casing strategy, (alllowercase, CamelCase, ALLUPPERCASE, etc.), since some case insensitive file systems may not even store case metadata, meaning that checking vendor/modules.txt could miss the actual folder on disk. This will come at a perf cost, but only for the insensitive -> sensitive case. It should also safe to check them all, since we know that go build will fail if we see any import path collisions. Thus, while certainly naive and likely inelegant, the following meets those requirements, and appears to be sufficient:

diff --git a/src/cmd/go/internal/modload/import.go b/src/cmd/go/internal/modload/import.go
index 162c29d2a6..d2d1fa1f1a 100644
--- a/src/cmd/go/internal/modload/import.go
+++ b/src/cmd/go/internal/modload/import.go
@@ -9,11 +9,13 @@ import (
 	"fmt"
 	"go/build"
 	"internal/goroot"
+	"math"
 	"os"
 	"path/filepath"
 	"sort"
 	"strings"
 	"time"
+	"unicode"
 
 	"cmd/go/internal/cfg"
 	"cmd/go/internal/load"
@@ -337,6 +339,24 @@ func dirInModule(path, mpath, mdir string, isLocal bool) (dir string, haveGoFile
 		dir = mdir
 	} else if mpath == "" { // vendor directory
 		dir = filepath.Join(mdir, path) 
+		if _, err := os.Stat(string(path)); err != nil { // this is only needed when the import path does not exist in the vendor directory because of case manipulation
+			rpath := []rune(path)
+			bits := int(math.Pow(2, float64(len(rpath))))
+			for i := 0; i < bits; i++ {
+				for j := 0; j < len(rpath); j++ {
+					if i>>j%2 == 1 {
+						rpath[j] = unicode.ToUpper(rpath[j])
+					} else {
+						rpath[j] = unicode.ToLower(rpath[j])
+					}
+				}
+				cdir := filepath.Join(mdir, string(rpath))
+				if _, err := os.Stat(cdir); err == nil {
+					dir = cdir
+					break
+				}
+			}
+		}
 	} else if len(path) > len(mpath) && path[len(mpath)] == '/' && path[:len(mpath)] == mpath {
 		dir = filepath.Join(mdir, path[len(mpath)+1:])
 	} else {

After this is merged, and GOPATH is sunset, there is also an open question of if the toolchain, specifically go mod vendor should intentionally collide case, say using alllowercase, because case sensitive -> insensitive transfers will cause vcs issues. This seems like a separate issue, so I am happy to punt, but thought it was worth noting.

@bcmills
Copy link
Member

@bcmills bcmills commented Apr 21, 2020

Instead of probing each sequence of runes, I think we should use ioutil.ReadDir and look for a case-equivalent path element. That narrows the search space from 2N possible variants to just M subdirectories, where generally M ≪ 2N, and also avoids the need to consider letters that have more than two equivalent runes (such as k, K, and K).

If need be, we could cache those reads (perhaps in a sync.Map).

Using the actual directories found, we may end up with more than one possible search path, but that's ok: we can keep all possible search paths and simply try to extend each candidate for every element. (There will typically be only one possible extension, but may in general be multiples — particularly if the vendor directory has been modified multiple times by users who have checked out the same version-control repo on systems with varying case-sensitivity.)

@carnott-snap
Copy link
Author

@carnott-snap carnott-snap commented Apr 21, 2020

I would like to clarify what we intent to support:

  • We currently support vendor directories that were generated on the same case filesystem. This happy path should be kept as fast as it is today.
  • Clearly we intend support case insensitive -> sensitive mappings. This includes collisions where the correct casing is listed in vendor/modules.txt and ones where the filesystem chooses an arbitrary casing. This will always (?) result in a single directory containing all possible casings of an import path (prefix).
  • Case sensitive -> insensitive seems out of scope, because it breaks at the VCS layer.
  • Any cases involving more than one search path should indicate bespoke or maliciously crafted vendor directories, both of which seem problematic to explicitly support. I am fine breaking on these cases, is that ok?
@bcmills
Copy link
Member

@bcmills bcmills commented Apr 21, 2020

Do version control systems report changes due to directory collapse? It would be interesting to see what happens if you run go mod vendor on a case-sensitive filesystem, then check out the repo on a case-insensitive one, add a case-colliding dependency, re-run go mod vendor, and then merge the resulting change.

I suspect that that can result in a mixed-case vendor directory, which we should probably support.

@carnott-snap
Copy link
Author

@carnott-snap carnott-snap commented Apr 21, 2020

I was working off the understanding that Git fails on collisions. This is not true. For directories, it happily collides them without notifying the user what happened. For file collisions, the action succeeds, but prints a warning:

warning: the following paths have collided (e.g. case-sensitive paths
on a case-insensitive filesystem) and only one from the same
colliding group is in the working tree:

  'Main.go'
  'main.go'

Thankfully the toolchain prevents file collisions from being too much of an issue on case sensitive file systems:

package main: case-insensitive file name collision: "Main.go" and "main.go"

Unfortunately, the lack of any warning for folders means we can be left with a very segregated vendor directory: (from b512ce0e728f45799c94fed649aa31e5, see commit history for details)

vendor/:
github.com  modules.txt

vendor/github.com:
carnott-snap  CARNOTT-SNAP

vendor/github.com/carnott-snap:
f3de8292c626470d94bea7bcec9c6e18

vendor/github.com/carnott-snap/f3de8292c626470d94bea7bcec9c6e18:
f3de8292c626470d94bea7bcec9c6e18.go  go.mod

vendor/github.com/CARNOTT-SNAP:
dfaf18ff9f58481bbab4c26ca49ad74e  f3de8292c626470d94bea7bcec9c6e18

vendor/github.com/CARNOTT-SNAP/dfaf18ff9f58481bbab4c26ca49ad74e:
dfaf18ff9f58481bbab4c26ca49ad74e.go  go.mod

vendor/github.com/CARNOTT-SNAP/f3de8292c626470d94bea7bcec9c6e18:
new.go

I think the prospect of having to merge different directories together, is enough work to justify considering the alternate vendor directory spec, but regardless further discussion is warranted.

@bcmills
Copy link
Member

@bcmills bcmills commented Apr 22, 2020

I think we can and should support such a directory.

I have an algorithm in mind, but I ran out of bandwidth for the day. I'll try to post a sketch later this week.

@carnott-snap
Copy link
Author

@carnott-snap carnott-snap commented Apr 22, 2020

I agree we should support this ux, checking in vendor portably0, but I think we should be careful about complexity. Your proposal involves finding all the equivalent directories under Unicode case-folding, then lacing them together, something that has a large number of edge cases. Introducing a new vendor spec that solves this with a trivial case insensitive algorithm will involve more code living in this repo, but would be easier to maintain. I am open to anything, but would prefer we target the simplest option.

0] There may be an argument to never checking in vendor, but I think that ship has sailed, and it would speak to not promising that vendor is portable, which is what this issue seeks to fix.

@carnott-snap
Copy link
Author

@carnott-snap carnott-snap commented Apr 22, 2020

However, we should provide a clear diagnostic when this comes up. A module that names itself Github.com/something/something could cause confusion.

For the record, the toolchain enforces that domains are all.lower.case, thus Github.com/something/something will not build.

/cc @jayconrod

@jayconrod
Copy link
Contributor

@jayconrod jayconrod commented Apr 23, 2020

For the record, the toolchain enforces that domains are all.lower.case, thus Github.com/something/something will not build.

Right. That's good. Less opportunity for this problem to happen.

I'm going to back away from this discussion for now. I'll keep following, but at the moment I don't have bandwidth to do research that would be useful for active discussion.

@carnott-snap
Copy link
Author

@carnott-snap carnott-snap commented Apr 23, 2020

I'm going to back away from this discussion for now. I'll keep following, but at the moment I don't have bandwidth to do research that would be useful for active discussion.

Totally fair, my goal was just to close that thread, not unnecessarily drag you back into this; sorry for the hassle.

@bcmills
Copy link
Member

@bcmills bcmills commented Apr 23, 2020

Ok, I think I failed to understand the original comment about merging multiple directories. I don't think that we should support merging two distinct case-equivalent directories that both contain .go files for nominally the same package. Instead, we should reject such a situation.

However, I would argue that such a directory represents a defect in the VCS, not in the go command. if the VCS knows that the filesystem is case-insensitive (as it should), and it processes a file addition within a colliding directory, it should by default case-normalize the directory to match the existing non-empty directory, rather than the empty one.

So, let's reject that case explicitly, and make this work for the reasonable one: https://play.golang.org/p/FKDojBNKpgx

@carnott-snap
Copy link
Author

@carnott-snap carnott-snap commented Apr 24, 2020

Sounds pragmatic; I am onboard. How smart should the toolchain be about detecting and altering on issues? I have a fix that is pretty simple, but it is going to yield in cryptic errors. My only fear is that plumbing the correct error messages may not be possible or at least feasible.

@bcmills
Copy link
Member

@bcmills bcmills commented Apr 24, 2020

How smart should the toolchain be about detecting and altering on issues?

Good question.

The most conservative approach would be to do a full filesystem walk as soon as we load vendor/modules.txt to check for consistency, but that's also the slowest.

The fastest approach would be to only report an ambiguous package if the exact-case lookup for that package fails, but that would mean that we rarely (if ever) actually diagnose the problem in practice: the lookup won't fail on case-insensitive filesystems (because the directories are merged), and likely won't fail on case-sensitive ones in a lot of cases either (because the case-sensitive lookup will find a package, just an incomplete one).

On the other hand, those cases will usually result in a build break — they only won't break the build if the case-collided files contain unused declarations (which don't matter anyway) and/or init functions (the absence of which would be much more concerning).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
5 participants
You can’t perform that action at this time.