Add satisfiability check for case variants #1079
What does this do / why do we need it?
This introduces a new satisfiability check in the solver that ensures we don't have any import paths that vary only by case. It's actually really just
The effect of this is that dep will only allow one case-variant of any given import path in a solution/a
What should your reviewer look out for in this PR?
still need a handful more tests to cover the combinations
Do you need help or clarification on anything?
Which issue(s) does this PR fix?
several, at least.
referenced this pull request
Aug 29, 2017
ugh lol this is metastasizing a bit (as solver checks kinda tend to) - i realized that there are more cases to address slightly differently:
i'm gonna punt on the first one, because it's kinda vanishingly unlikely to happen right now, and i can't think of a way to do it without inducing a relatively expensive validation check in the main solving loop. a good solution down the road might be to precalculate error conditions like that in
"as needed" may also include up in dep, outside of the solver, for case-variant imports within the root project. alternately, and probably preferably, it might also be checked as part of
the second item, though, does need to be done now, and kinda needs its own failure type. it needs to be clearly opinionated about the fact that the dependers are doing the wrong thing, whereas
ok, this has grown more. in the course of getting tests for the alternate failure type (also now implemented) where we can unambiguously infer canonical import path information from the internal import path patterns (num 2 above), i ran into difficulties getting the harnesses to behave. so, now the gps solver testing harness has effectively been made case-insensitive for the "root portion" (up to the first
this is a significant choice, because it's now basically just dumb to not follow suit in the real
in looking at all that, i also realized that we have a nasty bug - on a case-insensitive filesystem, having case-variant project roots results in a situation where there are multiple
changed the title from "[WIP] Add satisfiability check for case variants" to "Add satisfiability check for case variants"
Sep 4, 2017
referenced this pull request
Sep 5, 2017
i'm backing away from the HFS analogy, as it isn't great. at the very least, it's misleading, as the operations we perform aren't terribly analogous to the ones performed by filesystems.
in any case, i think we're now at a stable spot with this. the core checks are in place in the solver itself, both the testing harness and the
in general, the new satisfiability check is just a big win. there are really no cases that were working before that it cuts out - it just prevents dep from accepting solutions which already weren't going to result in a compilable build. the only drawback is the performance cost: we now have to perform case folding on each external import root at each step in the solver. we don't have benchmarks (there's another TODO #896) to know the actual impact of that, and it's all masked now anyway by the constant-factor costs of network and disk interaction.
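to make the "case folding on each external import root" step concrete, here's a minimal sketch of what such a check could look like. it is not the code from this PR: `foldKey`, `selection`, and `add` are hypothetical names, and the folding helper is just one way to get a canonical key out of `unicode.SimpleFold`.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// foldKey returns a case-folded form of s suitable for use as a map key.
// It walks each rune's unicode.SimpleFold orbit to its smallest member,
// then maps A-Z to a-z so ASCII keys come out lowercase.
func foldKey(s string) string {
	var b strings.Builder
	for _, r := range s {
		for {
			r0 := r
			r = unicode.SimpleFold(r0)
			if r <= r0 {
				break // wrapped around: r is the smallest rune in the orbit
			}
		}
		if 'A' <= r && r <= 'Z' {
			r += 'a' - 'A'
		}
		b.WriteRune(r)
	}
	return b.String()
}

// selection is a hypothetical stand-in for the solver's set of chosen roots:
// folded root -> the spelling that was selected first.
type selection map[string]string

// add records root, or returns an error if a different case variant of the
// same root has already been selected.
func (s selection) add(root string) error {
	key := foldKey(root)
	if prev, ok := s[key]; ok && prev != root {
		return fmt.Errorf("%q is a case variant of already-selected %q", root, prev)
	}
	s[key] = root
	return nil
}

func main() {
	sel := selection{}
	for _, root := range []string{
		"github.com/sirupsen/logrus",
		"github.com/pkg/errors",
		"github.com/Sirupsen/logrus",
	} {
		if err := sel.add(root); err != nil {
			fmt.Println("unsatisfiable:", err)
		}
	}
}
```

the cost mentioned above shows up here as one `foldKey` call per root per check; caching the folded form alongside each root would be one way to trim it, if benchmarks ever show it matters.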
i'm slightly less bullish on treating root portions of paths as case-insensitive. we are, of course, within the bounds of what the compiler enforces by doing it, but we're also reinterpreting that logic in a different domain (local disk vs. network). but this PR is effectively deciding that all code hosts treat the root portion of the code they host as case-insensitive. even if it's considered bad practice to vary only by case, i can't imagine all code hosts actually enforce this as a rule in the way that we now effectively assume they do.
the reasons to do this anyway, despite that risk, are:
as the code is currently written, the risks are:
I'm very curious if this has ever happened intentionally. I can't think of any example, and so far, neither has Twitter. Also, if what dmorsing is saying is correct, this situation will already cause the Go compiler to fail to build, which would mean it's not necessary for dep to solve it. (In other words, if it's a build failure, it's unlikely to be the case with any packages that exist today, and if it happens tomorrow for a new package, well, even without dep, it'd still provide a build failure). We should confirm that, though!
As for the case folding, which is what you pinged me about: My reading of the Unicode docs implies that this isn't a terrible idea (albeit not recommended):
In this case, the folded "text" that we're storing is actually the key in a key/value store, whereas the docs seem to be written with the "value" mostly in mind. (For example, according to the Unicode docs, we shouldn't fold the source code itself and store that - even aside from the fact that it'd completely break compilation in Go).
That said, I'm not wholly convinced that the folding is necessary on-disk. Presumably the contents would be identical in both directories. While that's a minor waste of space, vendoring itself is a solution that only makes sense if we treat disk space as a resource abundant enough not to require minor optimizations. And from a version control perspective, git will deduplicate the files, so the amount of additional space needed in the repository is negligible. (I'm less familiar with other version control systems, to be honest, but I think most should handle this reasonably).
As you mentioned, the logic for the solver (ensuring that these versions are treated identically) has to exist anyway, so other than saving a few bytes on disk, I don't see a strong benefit to dropping that piece of information. (That is, the information of which casing was used to access a library at the time it was vendored by dep).
awesome, thanks for taking the time on this!
it's kinda the other way around, actually. in order to produce a depgraph that the compiler will find acceptable, this PR introduces checks that make it impossible for case-only differences in import paths to exist in any solution that it finds. (that was the original goal of this PR; everything related to filesystems and storage is actually just a knock-on effect of addressing this original problem in the solver). otherwise, dep will pick out a set of dependencies that won't actually work, and won't even be writable on a case-insensitive filesystem (e.g. #797).
people end up having to resolve this crap manually, which has been an arduously difficult process for a number of users already. these changes can't fix it for them, but it at least tells people which of their dependencies are using problematic imports and need to be fixed, as well as attempting to find versions of those dependencies that don't have a problem.
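as a rough illustration of "telling people which of their dependencies are using problematic imports", the sketch below groups dependers by the exact spelling they import. everything in it is hypothetical (`edge`, `caseConflicts`, the example paths), and `strings.ToLower` is only a stand-in for proper case folding.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// edge is a hypothetical record: Depender imports Path.
type edge struct {
	Depender, Path string
}

// caseConflicts returns, for every group of import paths that differ only by
// case, a map from each spelling to the sorted dependers that use it.
// The outer key is the lowercased path (a simplification of case folding).
func caseConflicts(edges []edge) map[string]map[string][]string {
	groups := make(map[string]map[string][]string)
	for _, e := range edges {
		key := strings.ToLower(e.Path)
		if groups[key] == nil {
			groups[key] = make(map[string][]string)
		}
		groups[key][e.Path] = append(groups[key][e.Path], e.Depender)
	}
	for key, spellings := range groups {
		if len(spellings) < 2 {
			delete(groups, key) // only one spelling in use: no conflict
			continue
		}
		for _, deps := range spellings {
			sort.Strings(deps)
		}
	}
	return groups
}

func main() {
	edges := []edge{
		{"example.com/a", "github.com/Sirupsen/logrus"},
		{"example.com/b", "github.com/sirupsen/logrus"},
		{"example.com/c", "github.com/sirupsen/logrus"},
		{"example.com/a", "github.com/pkg/errors"},
	}
	for _, spellings := range caseConflicts(edges) {
		for path, deps := range spellings {
			fmt.Printf("%s imported by %s\n", path, strings.Join(deps, ", "))
		}
	}
}
```

a report shaped like this is enough for a user to see which depender needs to change its imports, even though dep can't make that change for them.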
i should have an example of what the
coooool cool good, ok. yes, that makes a lot of sense, and assuages my concerns.
it may well not be. i opted for this approach mostly because uniformity seemed beneficial. but, some things to clarify:
indeed, i think the "avoiding waste" is not a terribly good argument for doing the on-disk folding. the crucial requirement here is rather that we strictly control there being only a single object (a
but doing that doesn't necessarily entail using a folded case on the filesystem itself - only that in the in-memory maps, we keep the folded case on hand for lookup purposes, so that subsequent calls to fetch the
the network activity is a tad more concerning than the disk usage, but still probably negligible. however, it may become more of a pain in the future - e.g., under #431, it may become useful for people to forcefully clear the caches for a particular dependency. (hopefully not, but...) in such cases, it seems to me that if we generally follow the pattern of keying on the case-folded form everywhere, we might avoid gotchas in that arena.
just to be totally clear, that's not actually the moment we're talking about here. what appears in your
here's some sample output from the tests introduced in the PR:
there's a more verbose failure message that gets dumped than the one in the tracer, but...well, yeah, suggestions on the wording of these failure messages are also welcome
the rabbit hole kinda keeps going down with this one. one thing, for example, that i need to look at - must import comments be byte-literal matches, or do they case fold as well? this could end up mattering, as these casing rules start spreading their tendrils through dep :(
ugh...actually, so, that latest change just unconditionally always operates on the folded form of URLs when interacting with remote services. that's truly assuming that they're case-insensitive, rather than the weaker case where they can be case-sensitive, but still disallow case-only variations in the data they host. the latter seems much safer.
need to make another change tomorrow to accommodate that.
This was referenced
Sep 11, 2017
OK, those issues are now ameliorated - we don't case-fold what we write to disk, but we do case-fold in memory. This being case sensitivity, I imagine there's still gremlins running around somewhere, but I think they've been banished sufficiently far underground that we won't hear from them for a while.
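For anyone skimming, here is a minimal sketch of the "fold in memory, not on disk" arrangement. It is an illustration under assumptions rather than the code in this PR: `source`, `sourceCache`, `get`, and `diskPath` are hypothetical names, and `strings.ToLower` stands in for proper case folding.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// source is a hypothetical handle for one upstream project.
type source struct {
	Root string // spelling first requested; used unfolded for on-disk paths
}

// sourceCache keys sources by folded root so that every case variant
// resolves to the same in-memory object.
type sourceCache struct {
	cacheDir string
	byFolded map[string]*source
}

func newSourceCache(dir string) *sourceCache {
	return &sourceCache{cacheDir: dir, byFolded: make(map[string]*source)}
}

// get returns the single source for root, creating it on first use.
func (c *sourceCache) get(root string) *source {
	key := strings.ToLower(root) // stand-in for proper case folding
	if s, ok := c.byFolded[key]; ok {
		return s
	}
	s := &source{Root: root}
	c.byFolded[key] = s
	return s
}

// diskPath builds the on-disk location from the unfolded spelling.
func (c *sourceCache) diskPath(s *source) string {
	return filepath.Join(c.cacheDir, filepath.FromSlash(s.Root))
}

func main() {
	c := newSourceCache("cache")
	a := c.get("github.com/Foo/bar")
	b := c.get("github.com/foo/bar")
	fmt.Println(a == b)        // both spellings share one object
	fmt.Println(c.diskPath(a)) // path keeps the first-seen casing
}
```

The point of the split is that lookups can never produce two objects for one project, while the directory name still records the casing that was actually requested.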