cmd/go: go get -u behaves as if -f is always provided (for non-custom import paths) #20039

Open
dmitshur opened this Issue Apr 19, 2017 · 11 comments

Comments

Projects
None yet
4 participants
@dmitshur
Member

dmitshur commented Apr 19, 2017

This is an issue since 2015. I discovered it on January 11, 2017, but I'm only getting around to reporting it now. I have a suggestion for resolution.

cmd/go documents the flag -f for use with go get -u at https://golang.org/cmd/go/#hdr-Download_and_install_packages_and_dependencies:

The -f flag, valid only when -u is set, forces get -u not to verify that each package has been checked out from the source control repository implied by its import path. This can be useful if the source is a local fork of the original.

However, for packages with a non-custom import path (custom import path being one that is defined by HTML meta tags, see isCustom), go get -u now behaves as if -f is always provided.

For custom import paths (e.g., rsc.io/pdf), there is no such issue.

What version of Go are you using (go version)?

$ go version
go version go1.8.1 darwin/amd64

What operating system and processor architecture are you using (go env)?

$ go env
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/tmp/fresh"
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/tw/kgz4v2kn4n7d7ryg5k_z3dk40000gn/T/go-build606008153=/tmp/go-build -gno-record-gcc-switches -fno-common"
CXX="clang++"
CGO_ENABLED="1"
PKG_CONFIG="pkg-config"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"

What did you do?

Everything works as expected for custom import paths:

$ export GOPATH=/tmp/fresh
$ go get -u rsc.io/pdf
$ cd $GOPATH/src/rsc.io/pdf
$ git remote -v
origin	https://github.com/rsc/pdf (fetch)
origin	https://github.com/rsc/pdf (push)
$ git remote set-url origin https://github.com/wrong/repo
$ go get -u rsc.io/pdf
package rsc.io/pdf: rsc.io/pdf is a custom import path for https://github.com/rsc/pdf,
but /tmp/fresh/src/rsc.io/pdf is checked out from https://github.com/wrong/repo
$ echo $?
1
$ go get -u -f rsc.io/pdf
$ echo $?
0

(For demonstration purposes, assume that https://github.com/wrong/repo contains arbitrary git history, such that it's git pull --ff-only-compatible with the repo I'm trying to update. I don't want to spend time creating GitHub repos that have this property, but it's possible.)

However, this is what happens for a non-custom import path:

$ go get -u github.com/gorilla/mux
$ cd $GOPATH/src/github.com/gorilla/mux
$ git remote -v
origin	https://github.com/gorilla/mux (fetch)
origin	https://github.com/gorilla/mux (push)
$ git remote set-url origin https://github.com/wrong/repo
$ go get -u github.com/gorilla/mux
$ echo $?
0
$ go get -u -f github.com/gorilla/mux
$ echo $?
0

What did you expect to see?

$ git remote set-url origin https://github.com/wrong/repo
$ go get -u github.com/gorilla/mux
package github.com/gorilla/mux: github.com/gorilla/mux is an import path for https://github.com/gorilla/mux,
but /tmp/fresh/src/github.com/gorilla/mux is checked out from https://github.com/wrong/repo
$ echo $?
1
$ go get -u -f github.com/gorilla/mux
$ echo $?
0

What did you see instead?

$ git remote set-url origin https://github.com/wrong/repo
$ go get -u github.com/gorilla/mux
$ echo $?
0
$ go get -u -f github.com/gorilla/mux
$ echo $?
0

Fix

This regression was introduced while resolving another issue. I have a suggestion for how to fix it so both things are fixed. Details will follow in next comment.

@dmitshur

This comment has been minimized.

Show comment
Hide comment
@dmitshur

dmitshur Apr 19, 2017

Member

History

Go 1.4 added import path repo URL checking in addition to the import path comment. It was very strict, performing a simple URL string equality check.

Issue #10952 was opened. It quoted a forwarded message from golang-dev mailing list that described the original problem:

This gets a number of user/developers who expect to be able to update
their package originally checked out with git: by using go get. An an
example is here[2].

[2] https://groups.google.com/forum/#!topic/golang-nuts/Zi-Kw2tbif8/discussion

The proposed fix by @rsc was to relax things so that non-custom import path doesn't get a custom import path check unless it has an import comment. The issue was closed by CL 10693.

Later, #10952 was reopened as #16471, that suggested that the fix in CL 10693 was not optimal, and suggested a better fix. This new issue #16471 was then closed by CL 31658, which changed the behavior so that the remote URL check applied only to custom import paths.

This issue is essentially a reopen of both those issues. I think I have a suggested fix for the original problem that was reported in #10952. Let me explain the problem as I understand it.

Problem

Let's consider GitHub repos, since this is where the issue comes up most often.

GitHub supports cloning repos via HTTPS and SSH. The latter can be expressed as a normal URL (ssh://github.com/user/repo) or more commonly as a SCP-style address (git@github.com:user/repo).

A weird/unfortunate thing about GitHub repos is that the remote URL they're cloned from can have an optional ".git" suffix. In other words, these two remote URLs are equivalent:

  • git clone https://github.com/user/repo
  • git clone https://github.com/user/repo.git

When doing go get github.com/user/repo for the first time, cmd/go will use https://github.com/user/repo as the default remote URL (without suffix).

When visiting https://github.com/user/repo in your browser, GitHub's UI usually displays clone URLs with the optional ".git" suffix:

image

Next, some people prefer to use SSH URLs instead of HTTPS for cloning their git repositories for various reasons. This is what the original post in https://groups.google.com/forum/#!topic/golang-nuts/Zi-Kw2tbif8/discussion was asking about.

The change in CL 10693 effectively allowed using SSH URLs instead of HTTPS, by making remote URL checking apply only to packages with an import path comment (at the time). Later, it got changed in CL 31658 to apply to custom import paths only.

I think we should allow SSH URLs, but I believe there is a better way to do it without completely removing the remote URL check for non-custom import paths.

Proposed Fix

I think we should fix this issue, as well as the original issue from #10952, by applying the remote URL check to all repositories, but:

  1. Ignore schema during URL comparison.

    These three remote URLs should be considered equivalent and valid for import path github.com/user/repo:

    • https://github.com/user/repo
    • ssh://github.com/user/repo
    • git@github.com:user/repo


    I've implemented this functionality in EqualRepoURLs.

  2. (Optional) Consider ignoring ".git" suffix for "github.com/xxx/yyy" remote URLs.

    If we do this, these URLs would be considered equivalent and valid for import path github.com/user/repo:

    • https://github.com/user/repo
    • https://github.com/user/repo.git
    • ssh://github.com/user/repo
    • ssh://github.com/user/repo.git
    • git@github.com:user/repo
    • git@github.com:user/repo.git


    I would really really prefer NOT to do this step, and instead tell people not to use the unneeded ".git" suffix. go get doesn't set it, so it only happens when people modify the remote URL manually. I outlined my rationale for why I think this should not be done in shurcooL/gostatus#37 (comment).

    However, I recognize that if we don't do this, some people may complain and report bugs (as has happened to me in my tools here, here, here). I don't want this fix to get reverted because of that.

  3. When displaying error message about mismatching URL, format the expected repo URL using the schema and URL that the user has. Doing this is critical to support users who use non-default schema, otherwise the error message is confusing.

    For example, the following error message is confusing because it's hard to see where the difference in URL is:

    remote URL doesn't match repo URL inferred from import path:
               (actual) git@github.com:wrong/repo.git
             (expected) https://github.com/user/repo
    

    This error message is much more clear, because it prints the expected repo URL using same scheme and address format (SCP-style address) as user's incorrect remote URL:

    remote URL doesn't match repo URL inferred from import path:
               (actual) git@github.com:wrong/repo.git
             (expected) git@github.com:user/repo
    

    (You can look at shurcooL/gostatus#37 or shurcooL/Go-Package-Store#65 for real-world examples where this happened users of my tools, before I implemented improved printing of expected URL in shurcooL/gostatus#39.)

    I've implemented this functionality in FormatRepoURL.

I'm happy to work on a CL if this gets approval.

Member

dmitshur commented Apr 19, 2017

History

Go 1.4 added import path repo URL checking in addition to the import path comment. It was very strict, performing a simple URL string equality check.

Issue #10952 was opened. It quoted a forwarded message from golang-dev mailing list that described the original problem:

This gets a number of user/developers who expect to be able to update
their package originally checked out with git: by using go get. An an
example is here[2].

[2] https://groups.google.com/forum/#!topic/golang-nuts/Zi-Kw2tbif8/discussion

The proposed fix by @rsc was to relax things so that non-custom import path doesn't get a custom import path check unless it has an import comment. The issue was closed by CL 10693.

Later, #10952 was reopened as #16471, that suggested that the fix in CL 10693 was not optimal, and suggested a better fix. This new issue #16471 was then closed by CL 31658, which changed the behavior so that the remote URL check applied only to custom import paths.

This issue is essentially a reopen of both those issues. I think I have a suggested fix for the original problem that was reported in #10952. Let me explain the problem as I understand it.

Problem

Let's consider GitHub repos, since this is where the issue comes up most often.

GitHub supports cloning repos via HTTPS and SSH. The latter can be expressed as a normal URL (ssh://github.com/user/repo) or more commonly as a SCP-style address (git@github.com:user/repo).

A weird/unfortunate thing about GitHub repos is that the remote URL they're cloned from can have an optional ".git" suffix. In other words, these two remote URLs are equivalent:

  • git clone https://github.com/user/repo
  • git clone https://github.com/user/repo.git

When doing go get github.com/user/repo for the first time, cmd/go will use https://github.com/user/repo as the default remote URL (without suffix).

When visiting https://github.com/user/repo in your browser, GitHub's UI usually displays clone URLs with the optional ".git" suffix:

image

Next, some people prefer to use SSH URLs instead of HTTPS for cloning their git repositories for various reasons. This is what the original post in https://groups.google.com/forum/#!topic/golang-nuts/Zi-Kw2tbif8/discussion was asking about.

The change in CL 10693 effectively allowed using SSH URLs instead of HTTPS, by making remote URL checking apply only to packages with an import path comment (at the time). Later, it got changed in CL 31658 to apply to custom import paths only.

I think we should allow SSH URLs, but I believe there is a better way to do it without completely removing the remote URL check for non-custom import paths.

Proposed Fix

I think we should fix this issue, as well as the original issue from #10952, by applying the remote URL check to all repositories, but:

  1. Ignore schema during URL comparison.

    These three remote URLs should be considered equivalent and valid for import path github.com/user/repo:

    • https://github.com/user/repo
    • ssh://github.com/user/repo
    • git@github.com:user/repo


    I've implemented this functionality in EqualRepoURLs.

  2. (Optional) Consider ignoring ".git" suffix for "github.com/xxx/yyy" remote URLs.

    If we do this, these URLs would be considered equivalent and valid for import path github.com/user/repo:

    • https://github.com/user/repo
    • https://github.com/user/repo.git
    • ssh://github.com/user/repo
    • ssh://github.com/user/repo.git
    • git@github.com:user/repo
    • git@github.com:user/repo.git


    I would really really prefer NOT to do this step, and instead tell people not to use the unneeded ".git" suffix. go get doesn't set it, so it only happens when people modify the remote URL manually. I outlined my rationale for why I think this should not be done in shurcooL/gostatus#37 (comment).

    However, I recognize that if we don't do this, some people may complain and report bugs (as has happened to me in my tools here, here, here). I don't want this fix to get reverted because of that.

  3. When displaying error message about mismatching URL, format the expected repo URL using the schema and URL that the user has. Doing this is critical to support users who use non-default schema, otherwise the error message is confusing.

    For example, the following error message is confusing because it's hard to see where the difference in URL is:

    remote URL doesn't match repo URL inferred from import path:
               (actual) git@github.com:wrong/repo.git
             (expected) https://github.com/user/repo
    

    This error message is much more clear, because it prints the expected repo URL using same scheme and address format (SCP-style address) as user's incorrect remote URL:

    remote URL doesn't match repo URL inferred from import path:
               (actual) git@github.com:wrong/repo.git
             (expected) git@github.com:user/repo
    

    (You can look at shurcooL/gostatus#37 or shurcooL/Go-Package-Store#65 for real-world examples where this happened users of my tools, before I implemented improved printing of expected URL in shurcooL/gostatus#39.)

    I've implemented this functionality in FormatRepoURL.

I'm happy to work on a CL if this gets approval.

@mvdan

This comment has been minimized.

Show comment
Hide comment
@mvdan

mvdan Apr 20, 2017

Member

I agree with pretty much everything you've described here. 1 and 3 seem like net wins.

As far as 2:

Consider ignoring ".git" suffix for "github.com/xxx/yyy" remote URLs.

What about always ignoring .git suffixes? I know this is technically incorrect and could make the tool lax when it shouldn't be, but the tool is way more lax at the moment anyway. I don't think being lax here could cause any issues for people trying to publish two packages or two repos called foo and foo.git that were different, as that would make little sense to begin with.

I'm simply trying to keep the rules simpler and not dependant on hard-coding github.com and other well-known domains. Always ignoring .git would also make it work for self-hosted GitLab instances, for example.

Member

mvdan commented Apr 20, 2017

I agree with pretty much everything you've described here. 1 and 3 seem like net wins.

As far as 2:

Consider ignoring ".git" suffix for "github.com/xxx/yyy" remote URLs.

What about always ignoring .git suffixes? I know this is technically incorrect and could make the tool lax when it shouldn't be, but the tool is way more lax at the moment anyway. I don't think being lax here could cause any issues for people trying to publish two packages or two repos called foo and foo.git that were different, as that would make little sense to begin with.

I'm simply trying to keep the rules simpler and not dependant on hard-coding github.com and other well-known domains. Always ignoring .git would also make it work for self-hosted GitLab instances, for example.

@dmitshur dmitshur referenced this issue in shurcooL/gostatus Apr 20, 2017

Closed

Relax remote comparison #41

@dmitshur

This comment has been minimized.

Show comment
Hide comment
@dmitshur

dmitshur Apr 20, 2017

Member

What about always ignoring .git suffixes?

That is an option, but it seems worse than ignoring ".git" suffixes for "github.com/xxx/yyy" URLs only.

I know this is technically incorrect and could make the tool lax when it shouldn't be, but the tool is way more lax at the moment anyway.

I'd prefer to do things correctly. And this issue is about fixing the lax.

I'm simply trying to keep the rules simpler and not dependant on hard-coding github.com and other well-known domains.

The simplest solution is not to do anything special regarding ".git" suffixes.

When fetching new packages using go get, it doesn't add the ".git" suffix. So as long as the user doesn't manually modify the remote URL to be different, there shouldn't be a problem. Most people who I've seen doing this were doing it because they didn't know it was okay not to leave out ".git" suffix or that it was causing issues. Once they learned that, they fixed their remote URLs with agreement that it's best thing to do in long term (despite the annoyance of updating many repos).

I know that GitHub does not allow the repository name to end with ".git" at this time, so it's relatively safe to treat ".git" suffix as harmless for "github.com/xxx/yyy" URLs:

image

However, if you also ignore ".git" suffix for non-github.com URLs, that means it will apply to other VCSes, including Bazaar, Mercurial, and Subversion. It's possible some code host will not have such rule, so example-mercurial-host.com/user/repo and example-mercurial-host.com/user/repo.git may be very different repos.

If you still want not to verify that each package has been checked out from the source control repository implied by its import path, the -f flag lets you do that.

Always ignoring .git would also make it work for self-hosted GitLab instances, for example.

There shouldn't be any issues with doing go get or go get -u on packages on self-hosted GitLab instances. If there are, please report it as a separate issue.

Member

dmitshur commented Apr 20, 2017

What about always ignoring .git suffixes?

That is an option, but it seems worse than ignoring ".git" suffixes for "github.com/xxx/yyy" URLs only.

I know this is technically incorrect and could make the tool lax when it shouldn't be, but the tool is way more lax at the moment anyway.

I'd prefer to do things correctly. And this issue is about fixing the lax.

I'm simply trying to keep the rules simpler and not dependant on hard-coding github.com and other well-known domains.

The simplest solution is not to do anything special regarding ".git" suffixes.

When fetching new packages using go get, it doesn't add the ".git" suffix. So as long as the user doesn't manually modify the remote URL to be different, there shouldn't be a problem. Most people who I've seen doing this were doing it because they didn't know it was okay not to leave out ".git" suffix or that it was causing issues. Once they learned that, they fixed their remote URLs with agreement that it's best thing to do in long term (despite the annoyance of updating many repos).

I know that GitHub does not allow the repository name to end with ".git" at this time, so it's relatively safe to treat ".git" suffix as harmless for "github.com/xxx/yyy" URLs:

image

However, if you also ignore ".git" suffix for non-github.com URLs, that means it will apply to other VCSes, including Bazaar, Mercurial, and Subversion. It's possible some code host will not have such rule, so example-mercurial-host.com/user/repo and example-mercurial-host.com/user/repo.git may be very different repos.

If you still want not to verify that each package has been checked out from the source control repository implied by its import path, the -f flag lets you do that.

Always ignoring .git would also make it work for self-hosted GitLab instances, for example.

There shouldn't be any issues with doing go get or go get -u on packages on self-hosted GitLab instances. If there are, please report it as a separate issue.

@mvdan

This comment has been minimized.

Show comment
Hide comment
@mvdan

mvdan Apr 23, 2017

Member

I agree with you then - especially given how go get is supposed to work with repos that aren't git and that it already has the -f option.

I still think this could cause people reporting bugs like "go get fails due to .git suffix" on this issue tracker. Perhaps the error it gives in that case could be improved, to tell the user to either fix the remote URL or use -f?

Member

mvdan commented Apr 23, 2017

I agree with you then - especially given how go get is supposed to work with repos that aren't git and that it already has the -f option.

I still think this could cause people reporting bugs like "go get fails due to .git suffix" on this issue tracker. Perhaps the error it gives in that case could be improved, to tell the user to either fix the remote URL or use -f?

@dmitshur

This comment has been minimized.

Show comment
Hide comment
@dmitshur

dmitshur Apr 26, 2017

Member

Somewhat relevant, #20115. /cc @josharian

Member

dmitshur commented Apr 26, 2017

Somewhat relevant, #20115. /cc @josharian

@dmitshur dmitshur added the GoCommand label Jul 23, 2017

@dmitshur

This comment has been minimized.

Show comment
Hide comment
@dmitshur

dmitshur Aug 19, 2017

Member

Now that that Go 1.10 tree is open, I'd like to begin working on this, if this gets approval. It's better to get this done sooner in the cycle rather than later, so that the fix can be evaluated and adjusted as neccessary before feature freeze.

Here's what I need to know to proceed:

  1. Does my proposed fix sound good?
  2. Should I include the optional step 2 or skip it? Unless there are strong opinions, I'd prefer to leave it out initially and see how that goes. If there's a lot of complaints and this entire bug fix is at risk of being reverted, then I'll implement optional step 2 to resolve those complaints.

Ideally, I'd want to hear back from the people who worked on the previous versions of this issue. /cc @rsc @zombiezen

Member

dmitshur commented Aug 19, 2017

Now that that Go 1.10 tree is open, I'd like to begin working on this, if this gets approval. It's better to get this done sooner in the cycle rather than later, so that the fix can be evaluated and adjusted as neccessary before feature freeze.

Here's what I need to know to proceed:

  1. Does my proposed fix sound good?
  2. Should I include the optional step 2 or skip it? Unless there are strong opinions, I'd prefer to leave it out initially and see how that goes. If there's a lot of complaints and this entire bug fix is at risk of being reverted, then I'll implement optional step 2 to resolve those complaints.

Ideally, I'd want to hear back from the people who worked on the previous versions of this issue. /cc @rsc @zombiezen

@dmitshur

This comment has been minimized.

Show comment
Hide comment
@dmitshur

dmitshur Oct 4, 2017

Member

Now that that Go 1.10 tree is open, I'd like to begin working on this, if this gets approval.

Friendly ping @rsc; I need your input on the two questions above to be able to proceed here.

Member

dmitshur commented Oct 4, 2017

Now that that Go 1.10 tree is open, I'd like to begin working on this, if this gets approval.

Friendly ping @rsc; I need your input on the two questions above to be able to proceed here.

@dmitshur

This comment has been minimized.

Show comment
Hide comment
@dmitshur

dmitshur Oct 10, 2017

Member

Ping @rsc.

Member

dmitshur commented Oct 10, 2017

Ping @rsc.

@dmitshur dmitshur added the Security label Oct 10, 2017

@rsc

This comment has been minimized.

Show comment
Hide comment
@rsc

rsc Oct 10, 2017

Contributor

I'd kind of prefer not to do anything with this until package management is in a clearer state. It seems like there is no way to satisfy everyone here and making changes will just change which subset of people are satisfied.

Contributor

rsc commented Oct 10, 2017

I'd kind of prefer not to do anything with this until package management is in a clearer state. It seems like there is no way to satisfy everyone here and making changes will just change which subset of people are satisfied.

@dmitshur

This comment has been minimized.

Show comment
Hide comment
@dmitshur

dmitshur Oct 10, 2017

Member

Thanks for taking a look.

It seems like there is no way to satisfy everyone here and making changes will just change which subset of people are satisfied.

I take it you mean that this would negatively affect people who try to use a github fork as a remote, without changing the import path of the original package, right?

I was going to say that "I realize that security and convenience are often at odds with each other, but shouldn't we prioritize security over convenience anyway?", but after thinking more, I think it's important to consider the threat level and amount of inconvenience caused. In this case, I agree that the threat level is low, so I'm okay with waiting.

One of the reason I'm not worried about doing go get -u on various github.com/... packages myself is because I often use gostatus, which would warn me if the remote doesn't match the import path. If I didn't have that, I would feel slightly more nervous when doing go get -u knowing I effectively did go get -u -f instead due to this issue. But I think that it's quite hard to abuse this bug to attack someone, and people who care very strongly about it can rely on an external tool to verify that the remote URLs of their Go packages match the import path.

If/when you think it's a better time to revisit this, please let me know, I might want to work on it.

Member

dmitshur commented Oct 10, 2017

Thanks for taking a look.

It seems like there is no way to satisfy everyone here and making changes will just change which subset of people are satisfied.

I take it you mean that this would negatively affect people who try to use a github fork as a remote, without changing the import path of the original package, right?

I was going to say that "I realize that security and convenience are often at odds with each other, but shouldn't we prioritize security over convenience anyway?", but after thinking more, I think it's important to consider the threat level and amount of inconvenience caused. In this case, I agree that the threat level is low, so I'm okay with waiting.

One of the reason I'm not worried about doing go get -u on various github.com/... packages myself is because I often use gostatus, which would warn me if the remote doesn't match the import path. If I didn't have that, I would feel slightly more nervous when doing go get -u knowing I effectively did go get -u -f instead due to this issue. But I think that it's quite hard to abuse this bug to attack someone, and people who care very strongly about it can rely on an external tool to verify that the remote URLs of their Go packages match the import path.

If/when you think it's a better time to revisit this, please let me know, I might want to work on it.

@rsc

This comment has been minimized.

Show comment
Hide comment
@rsc

rsc Oct 10, 2017

Contributor

OK, sounds good. Once we have a decent package management story I think that will probably eliminate "automatic checkouts from version control systems" as a distribution mechanism, and with it many of these problems. (Of course we will retain the ability to work from checkouts if you do them yourself, but the more we get into problems with the subtleties here the more clear it is that version control systems are too complex to be the general default mechanism.)

Contributor

rsc commented Oct 10, 2017

OK, sounds good. Once we have a decent package management story I think that will probably eliminate "automatic checkouts from version control systems" as a distribution mechanism, and with it many of these problems. (Of course we will retain the ability to work from checkouts if you do them yourself, but the more we get into problems with the subtleties here the more clear it is that version control systems are too complex to be the general default mechanism.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment