Skip to content
This repository has been archived by the owner on Sep 9, 2020. It is now read-only.

Ensure we're correct on case insensitivity (for github...?) #433

Closed
fabulous-gopher opened this issue Apr 21, 2017 · 11 comments · Fixed by #1079
Closed

Ensure we're correct on case insensitivity (for github...?) #433

fabulous-gopher opened this issue Apr 21, 2017 · 11 comments · Fixed by #1079

Comments

@fabulous-gopher
Copy link

From @sdboyer on December 13, 2016 0:31

GitHub thinks github.com/Sirupsen/logrus is the same as github.com/sirupsen/logrus. We need to make sure we handle this...correctly. Ish. Sort of. Ugh.

Copied from original issue: sdboyer/gps#116

@ibrasho
Copy link
Collaborator

ibrasho commented Jul 8, 2017

I'll try to summarize the discussion that happened in Slack:

Sam detailed the following solution:

  1. the system that deduces repo/project roots from import paths needs to be expanded to return some kind of informational structure of case sensitivity for different elements of the paths
    the entry point to this deduction system is here: https://github.com/sdboyer/gps/blob/master/source_manager.go#L429, which quickly passes off to https://github.com/sdboyer/gps/blob/master/deduce.go#L562
    all (or at least almost all) of the work of root deduction is centralized in that deduce.go file
    yes. for the full solution, the return type of that initial method would have to change from a ProjectRoot (which is just type ProjectRoot string) to something that also contains the case sensitivity info
  1. the system that manages repositories on disk needs to learn how to read those case sensitivity info structures, and we'll need a new trie implementation that works with them
    the entry point to this is https://github.com/sdboyer/gps/blob/master/source.go#L61; it's responsible for creating sourceGateway, which are the objects i mentioned that we're "very careful to ensure there's only one of, per repository" called by almost all of the other SourceManager methods in order to do their work
    these methods are both fairly complicated b/c they have to block as little as possible under high concurrency, but also avoid duplicating work in a way that could result in duplicate sourceGateways being created
    and all three of these https://github.com/sdboyer/gps/blob/master/source.go#L55-L57 are where the lookups actually happen - those keys would need to be made selectively case insensitive
    and i don't even want to reach into the solver, because frankly it's everywhere
  1. the solver itself will need to stop using simple map[string]<...> types, and use a custom map that, again, works with these case sensitivity info structures

but the risk and complexity of that is considerable, so we'll likely have to settle for some harm reductive approaches in the meantime

@ibrasho
Copy link
Collaborator

ibrasho commented Jul 8, 2017

The following was also proposed:

assume that everything up to the root of the project is case insensitive, and that everything after it is case sensitive
so yeah, i'm now intrigued by the idea that, maybe, we just internally normalize to lower case
(for only the project root portion of an import path)
[...] i also think it would probably work more often than the current system does
i think we don't actually touch the solver at all - if your codebase imports two paths that differ only in case, then we don't try to normalize that - we just (optionally) warn
all we have to do is make sure that nothing blows up catastrophically when there are case-only differences in a repo and the user is on a case-insensitive filesystem, as with your bug

@ibrasho
Copy link
Collaborator

ibrasho commented Jul 8, 2017

The issue I found with these solutions is that the solver will select 2 (most likely different) versions for each of github.com/Sirupsen/logrus and github.com/sirupsen/logrus. It also requires that we switch the source cache lock to be at the repository level, instead of a global lock.

The issue I see here is not orchestrating the source gateways but ensuring that the solver understands that github.com/Sirupsen/logrus and github.com/sirupsen/logrus are the same package.

@ibrasho
Copy link
Collaborator

ibrasho commented Jul 8, 2017

I'm not sure if the compiler is case-sensitive wrt import paths, but I'd assume it relies on the file system for that.

We can consider the lower-case path the canonical import path for packages on Github (and other sources with the same issue) but the compiler will still expect to find 2 directories github.com/sirupsen/logrus and github.com/Sirupsen/logrus on the file system if it's case-sensitive and both are imported in the codebase.

This introduces another issue, ensuring that the package is vendored twice on case-sensitive file systems (for both cases).

An alternative is to just fail when a package is detected with 2 different paths (different letter-casing). This is not a solution, but it's the simplest thing we can do for the time being. 😞

@sdboyer
Copy link
Member

sdboyer commented Jul 12, 2017

aaaghhhh... yes, good point. we'd still need to record both/all casings in the lock, then determine at vendor write time whether the fs is case sensitive, and if it is, write them all instead of just the canonical lowercase one.

that potentially breaks other parts of the model, too - particularly related to vendor verification.

so yeah, the simplest thing right now probably is detecting the situation and erroring out earlier 😢

@ibrasho
Copy link
Collaborator

ibrasho commented Jul 16, 2017

@sdboyer gps.ValidateParams (in #697) seems like a good place for a check like this, right?

@sdboyer
Copy link
Member

sdboyer commented Jul 18, 2017

@ibrasho hmm...they could probably rely on the same code, but this case is narrower, as we wouldn't be constructing a whole SolveParameters.

@edrex
Copy link

edrex commented Jul 22, 2017

I just discovered some things:

  • go build errors out when both cases are present on a case-sensitive filesystem (ext4): XXXXX: case-insensitive import collision: "github.com/sixgill/service/vendor/github.com/Sirupsen/logrus" and "github.com/sixgill/service/vendor/github.com/sirupsen/logrus"

  • AFAICT this results in an impass: go won't build with multiple casings in your project code or vendor packages on a case sensitive filesystem, but you need both casings to build.

Correction: go build only errors out if there are imports with multiple casings.

We are using glide's aliasing feature to automatically rewrite imports in vendor using the old casing to the new one.

It's not clear how to solve this in dep without rewriting vendor package imports.

Anyone else have insights/experimental results?

@edrex
Copy link

edrex commented Jul 22, 2017

What I did (not proud of it):

Added a gofmt to rewrite vendor packages:

vendor: $(BIN)/dep Gopkg.lock Gopkg.toml
	$(BIN)/dep ensure
	gofmt -w -r '"github.com/Sirupsen/logrus" -> "github.com/sirupsen/logrus"' $(shell find vendor -name \*.go -type f -print)

Edit: What we actually ended up with:

vendor: $(BIN)/dep Gopkg.lock Gopkg.toml
	$(BIN)/dep ensure -v
	# TODO(eric): kill it with fire once https://github.com/golang/dep/issues/433 is resolved
	# Workaround for OSX sed behaving differently
	case "${PLATFORM}" in \
		Darwin) find vendor -type f -name "*.go" -print0 | xargs -0 sed -i '' 's/Sirupsen\/logrus/sirupsen\/logrus/g' ;;\
		*)      find vendor -type f -name "*.go" -print0 | xargs -0 sed -i    's/Sirupsen\/logrus/sirupsen\/logrus/g' ;;\
	esac ;\
	touch $@

@sdboyer
Copy link
Member

sdboyer commented Jul 26, 2017

thanks for the added info, @edrex!

It's not clear how to solve this in dep without rewriting vendor package imports.

😱 😱 😱 💥 💥 💀

go build errors out when both cases are present on a case-sensitive filesystem (ext4):

honestly, i'm not even sure if this makes it worse or better. probably better, but it's still kind of horrifying. i need to ponder some more.

@sdboyer
Copy link
Member

sdboyer commented Aug 4, 2017

OK, I've mused on this now, and it's clear to me that there's just no way we can fix this without directly rewriting import paths. That's something we won't do. However, the current situation creates totally worthless errors - panics, even - and needs to be improved. That, we CAN do.

First, for context - confirming what @edrex discovered, the go toolchain disallows having two imports in a single program that differ only by case. While it's not actually in the spec, the toolchain disallowing it is good enough for me to treat it as a rule. To treat it as a rule, we need to enforce it in the solver itself. That means this is going to be a fairly tricky, or at least arduous, implementation.

Here's a quick sketch of what I think will be needed:

  1. Borrow the toolchain's toFold() implementation
  2. Modify the appropriate structures - maybe completeDep or workingConstraint? - to track the difference between an original ProjectRoot and the folded name
  3. In structures like the selected and unselected queues, key on the folded name
  4. Ensure that all prints still use the unfolded name
  5. Introduce satisfiability checks (satisfy.go) to ensure that
    i. all imports of the considered tree, if they point to a project that's already selected (folded), have the exact same casing as the already-selected project
    ii. because two case-differing imports could be in the unselected queue at once, we also have to verify during the initial (project-level) visit that name being considered is identical to any of its already-selected folded equivalents (if there are any).

There's also some things to do in the SourceManager: as soon as a string or ProjectRoot enters one of the methods, we need to immediately fold it and put it into some kind of 2-tuple that stores both pieces, then store/key on that as appropriate. tbh I can't even fully think about all the places this will change things - but there are a lot.

asf-stripe added a commit to stripe/veneur that referenced this issue Sep 13, 2017
Since we can't rely on sub-packages of logrus, just define a
plausible-enough null logger for benchmarks.
asf-stripe added a commit to stripe/veneur that referenced this issue Sep 14, 2017
Since we can't rely on sub-packages of logrus, just define a
plausible-enough null logger for benchmarks.
asf-stripe added a commit to stripe/veneur that referenced this issue Sep 18, 2017
Since we can't rely on sub-packages of logrus, just define a
plausible-enough null logger for benchmarks.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants