cmd/go: add modvendor sub-command #27618

Open
myitcv opened this Issue Sep 11, 2018 · 12 comments

5 participants
@myitcv
Member

myitcv commented Sep 11, 2018

Creating this issue as a follow up to #26366 (and others).

go mod vendor is documented as follows:

Vendor resets the main module's vendor directory to include all packages
needed to build and test all the main module's packages.
It does not include test code for vendored packages.

Much of the surprise in #26366 comes about because people are expecting "other" files to also be included in vendor.

An alternative to the Go 1.5 vendor is to instead "vendor" the module download cache. A proof of concept of this approach is presented here:

https://github.com/myitcv/go-modules-by-example/blob/master/012_modvendor/README.md

Hence I propose go mod modvendor, which would be documented as follows:

Modvendor resets the main module's modvendor directory to include a 
copy of the module download cache required for the main module and its 
transitive dependencies.

The name and the documentation are clearly not final.
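
The proposed behavior can be approximated with existing commands. What follows is a minimal sketch, not the linked proof of concept: it assumes Go 1.15 or later (for GOMODCACHE), and the helper name modvendor is hypothetical rather than a real go subcommand.

```shell
# modvendor: hypothetical helper, not a real go subcommand.
# Resets DIR (default ./modvendor) to a copy of the module download
# cache needed by the main module in the current directory.
modvendor() {
  dir="${1:-modvendor}"
  tmp="$(mktemp -d)" || return 1

  # Download into an empty, throwaway module cache so that only the
  # modules needed by this main module end up in it.
  GOMODCACHE="$tmp" go mod download || return 1

  # "Reset" semantics: discard the old copy before taking the new one.
  rm -rf "$dir"
  mkdir -p "$dir"
  cp -R "$tmp/cache/download/." "$dir/" || return 1

  # Extracted module sources are read-only, so let the go command
  # remove the throwaway cache.
  GOMODCACHE="$tmp" go clean -modcache
}
```

The resulting directory could then be used as a proxy, for example with GOPROXY="file://$PWD/modvendor" go build ./...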

Benefits (WIP)

  • Eliminates any potential confusion around what is in/not in vendor
  • Easier to contribute patches/fixes to upstream module authors (via something like gohack), because the entire module is available
  • The modules included in modvendor are an exact copy of the original modules. This makes it easier to check their fidelity at any point in time, with either the source or some other reference (e.g. Athens)
  • Makes clear the source of modules, via the use of GOPROXY=/path/to/modvendor. No potential for confusion like "will the modvendor of my dependencies be used?"
  • A single deliverable
  • Fully reproducible and high fidelity builds (modules in general gives us this, so just re-emphasising the point)
  • ...

Costs (WIP)

  • The above steps are currently manual; tooling (the go tool?) can fix this
  • Reviewing "vendored" dependencies is now more involved without further tooling. For example it's no longer possible to simply browse the source of a dependency via a GitHub PR when it is added. Again, tooling could help here. As could some central source of truth for trusted, reviewed modules (Athens? cc @bketelsen @arschles)
  • ...

Related discussion

Somewhat related to discussion in #27227 (cc @rasky) where it is suggested the existence of vendor should imply the -mod=vendor flag. The same argument could be applied here, namely the existence of modvendor implying the setting of GOPROXY=/path/to/modvendor. This presupposes, however, that the idea of modvendor makes sense in the first place.

Background discussion:

https://twitter.com/_myitcv/status/1038885458950934528

cc @StabbyCutyou @fatih

cc @bcmills

@bcmills

Member

bcmills commented Sep 11, 2018

I don't think the proposed "resets the main module's modvendor directory" behavior is quite the right workflow.

One of the benefits of versioned modules over vendoring is that they can reduce redundancy globally: instead of N copies of the same code spread across N repos, we can have a single canonical copy shared by all builds of those repos. A per-module modvendor cache would revert that advantage.

@bcmills

Member

bcmills commented Sep 11, 2018

Instead, perhaps we should make it easier to maintain per-user or per-organization module proxies.

For example, we could add an optional argument to go mod download to tell it where to save the downloaded modules.

go mod download $path could copy all active modules to $path, and go mod verify $path could verify that the modules already stored in $path match the go.sum of the current module. Then, the modvendor operation would essentially be:

go mod download $GOPROXY
go mod verify $GOPROXY

Then the user could commit the contents of $GOPROXY to a separate (personal or org-wide) repository.
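
For reference, a directory usable as $GOPROXY follows the module proxy layout: under each module path there is an @v directory holding a list file plus .info, .mod, and .zip files per version, and each uppercase letter in the path is written as "!" followed by its lowercase form. The sketch below lists the files such a directory would presumably need for one module version; both function names are hypothetical.

```shell
# escape_path: hypothetical helper applying the module cache's case
# escaping (each uppercase letter becomes "!" plus its lowercase form).
escape_path() {
  printf '%s\n' "$1" | sed 's/[[:upper:]]/!&/g' | tr '[:upper:]' '[:lower:]'
}

# proxy_files ROOT MODULE VERSION: print the files that a GOPROXY
# directory rooted at ROOT serves for MODULE at VERSION.
proxy_files() {
  base="$1/$(escape_path "$2")/@v"
  ver="$(escape_path "$3")"
  printf '%s\n' "$base/list" "$base/$ver.info" "$base/$ver.mod" "$base/$ver.zip"
}
```

For example, proxy_files /srv/goproxy github.com/BurntSushi/toml v0.3.1 prints four paths under /srv/goproxy/github.com/!burnt!sushi/toml/@v/ (the root /srv/goproxy is only an illustration).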

@bcmills bcmills added this to the Go1.12 milestone Sep 11, 2018

@flibustenet

flibustenet commented Sep 11, 2018

We should also do the opposite, to fill the cache from a downloaded directory.

$ go mod download -export $path

Somewhere else, maybe on another machine:

$ go mod download -import $path

That would fill the cache.

@bcmills

Member

bcmills commented Sep 11, 2018

@flibustenet GOPROXY already does the opposite: GOPROXY=$path go mod download populates the active modules into the user's module cache from an arbitrary directory.

We don't currently have a command that populates more than the active modules, but that seems like a job for rsync or git rather than go itself.
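
As a concrete form of the GOPROXY command above: the go command expects GOPROXY to be a URL, so a local directory is passed as a file:// URL. A minimal sketch, with a hypothetical helper name, that fills the user's module cache from such a directory without consulting the checksum database:

```shell
# import_modules: hypothetical helper. Populates the module cache with the
# active modules of the main module in the current directory, reading them
# from the proxy-layout directory DIR instead of the network.
import_modules() {
  # file:// URLs need an absolute path.
  dir="$(cd "$1" && pwd)" || return 1
  # GOSUMDB=off skips checksum database lookups; hashes already recorded
  # in go.sum are still checked by the go command.
  GOPROXY="file://$dir" GOSUMDB=off go mod download
}
```

Running import_modules /path/to/modvendor from the main module's directory would then download only from that directory.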

@myitcv

Member

myitcv commented Sep 11, 2018

@bcmills

I don't think the proposed "resets the main module's modvendor directory" behavior is quite the right workflow.

I think there are actually two use cases here:

  1. "vendoring" all dependencies within the same repo as the module(s) that depend on them
  2. a per-user/organisation module proxy repo, separate from the repo(s) that use it

I should update the description to make clear that this issue is trying to address point 1. Hence why I think the logic to "reset the main module's modvendor directory" is correct; because I don't want this directory to grow like a cache.

Point 2 is the approach I've taken with https://github.com/myitcv/cachex, which is the "organisation repo" for https://github.com/myitcv/x, my mono repo. In this case, https://github.com/myitcv/cachex is an append-only repo that is a cache, and hence grows over time. It's separate from (and a subset of) $GOPATH/pkg/mod/cache/download because that can (and does) include downloads of private repos that I don't want made public. As you say, this approach reduces redundancy. Your proposal of go mod download $path is effectively what I do via bash with a GOPROXY+GOPATH+rsync dance; in this situation, I agree, I don't want the reset semantics.

But I can see use cases (i.e. deploying code or similar) where there is real benefit in point 1, for everything to be "bundled" (in the same repo).

Assuming we want to address/support both use cases (and it seems sensible to my mind to do so), they could be solved by the same sub-command; I'm certainly not precious about that 😄. But I think there are separate use cases to cover here.

@bcmills

Member

bcmills commented Sep 11, 2018

I can see use cases (i.e. deploying code or similar) where there is real benefit in point 1, for everything to be "bundled" (in the same repo).

I'm not certain about those cases one way or the other. Given versioning, it seems like you can address all of the same use-cases — and more! — using a separate repository. If folks are doing the cost/benefit analysis and coming to a different conclusion, I'd like to see more of the details of the costs and benefits involved (beyond just “that's the way we've done things without versioning”).

@myitcv

Member

myitcv commented Sep 11, 2018

If folks are doing the cost/benefit analysis and coming to a different conclusion, I'd like to see more of the details of the costs and benefits involved (beyond just “that's the way we've done things without versioning”).

I'd second this request because, unless it wasn't clear already, I'm a fan of point 2.

I'm only putting up point 1 as a "better" alternative to go mod vendor (better in the sense that it doesn't suffer from the pitfalls associated with #26366 amongst other things). But, and I totally grant you this, I haven't articulated all (any?) of the costs associated with keeping workflows oriented around a single repo, a la vendor.

@bcmills

Member

bcmills commented Sep 12, 2018

Hmm. With the go mod download $path approach, it's still possible to put $path in the same repository (cutting it off from the modules in that repo using an explicit go.mod file, or perhaps with a well-known subdirectory such as vendor/mod/ or vendormod/), and you can even unpack it easily with a single command (GOPROXY=$path go mod vendor).

@myitcv

Member

myitcv commented Sep 12, 2018

Yes absolutely; I think the only difference between these two use cases is the use of "reset" semantics or not.

myitcv added a commit to go-modules-by-example/index that referenced this issue Sep 12, 2018

@sanguohot

sanguohot commented Sep 27, 2018

Sharing modules is very important, but there will still be some cases where they should not be shared.
A little like npm without the -g flag.

@rsc

Contributor

rsc commented Oct 24, 2018

Replying to the original benefits:

  • Eliminates any potential confusion around what is in/not in vendor

Having two ways to populate vendor does not seem like it would eliminate confusion.

  • Easier to contribute patches/fixes to upstream module authors (via something like gohack (https://github.com/rogpeppe/gohack)), because the entire module is available

We should address gohack, but modvendor does not seem like the right way to do it.

  • The modules included in modvendor are an exact copy of the original modules. This makes it easier to check their fidelity at any point in time, with either the source or some other reference (e.g. Athens)

It would be better to make go mod verify work with the pruned vendor directories, if that's a concern.

  • Makes clear the source of modules, via the use of GOPROXY=/path/to/modvendor. No potential for confusion like "will the modvendor of my dependencies be used?"

This is doubling down on vendor. We want to move in the opposite direction.

  • A single deliverable

I don't know what this means.

  • Fully reproducible and high fidelity builds (modules in general gives us this, so just re-emphasising the point)

No actual benefit here, right?

I don't see what the problem is here, really, and I think it's very important not to pull in the entire module just to get one package. Because you're not just pulling in that one module, you're pulling in (at least references to) its dependencies.

@myitcv

Member

myitcv commented Oct 28, 2018

Thanks for the reply @rsc. Taking your responses slightly out of order:

Easier to contribute patches/fixes to upstream module authors (via something like gohack (https://github.com/rogpeppe/gohack)), because the entire module is available

We should address gohack, but modvendor does not seem like the right way to do it.

Agreed, this doesn't make sense to solve with modvendor; not sure what I was thinking here. gohack get has a -vcs flag for just this purpose.

This is doubling down on vendor. We want to move in the opposite direction.

Just to be clear, I'm also trying to move away from vendor (the vendor directory as in the Go 1.5 definition) and the concept of "vendoring" more generally (and modvendor falls into this bucket), because there are better solutions to the problems that vendor/"vendoring" try to solve.

My thinking was that something like modvendor could be a useful stepping stone away from the vendor directory to proxies etc.

Eliminates any potential confusion around what is in/not in vendor

Having two ways to populate vendor does not seem like it would eliminate confusion.

modvendor uses a modvendor directory, not the vendor directory. The thinking being that a differently named directory forces the user to ask "what can I expect to be in modvendor" as opposed to being confused on "what is in vendor."

A single deliverable

I don't know what this means.

Poorly worded. One of the main reasons people like the vendor directory is that it removes service/network dependencies beyond the initial clone, there is nothing else to configure, no second repository to commit etc. modvendor achieves a similar effect - there is just one thing in play.

Fully reproducible and high fidelity builds (modules in general gives us this, so just re-emphasising the point)

No actual benefit here, right?

Agreed, if we can get go mod verify to work on the contents of the vendor directory. The only minor point I was making here was that it's very easy to modify the contents of your vendor directory and then neither run go mod verify locally nor enforce it as part of CI. It's harder to modify the contents of modvendor in the first instance. Case in point: https://github.com/goware/modvendor et al., which exist to copy additional files to the vendor directory, files that are already in the module.

I don't see what the problem is here, really, and I think it's very important not to pull in the entire module just to get one package. Because you're not just pulling in that one module, you're pulling in (at least references to) its dependencies.

At least the way I intended to implement my trial of modvendor was to only pull in the modules that are required, so hopefully I only pull in references to their dependencies.

But I'm quite prepared to accept that modvendor might not be the right or even a necessary stepping stone.
