Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vendor all dependencies #1651

Merged
merged 1 commit into from May 28, 2017
Merged

Vendor all dependencies #1651

merged 1 commit into from May 28, 2017

Conversation

mholt
Copy link
Member

@mholt mholt commented May 6, 2017

1. What does this change do, exactly?

Vendors all Caddy dependencies. (Plugins not included.)

2. Please link to the relevant issues.

#200, #630, #632, #709, Homebrew/homebrew-core#9233 - probably others too

3. Which documentation changes (if any) need to be made because of this PR?

Remains to be seen. Probably none for most use cases, but this could affect the build server.

The contributing instructions will need to be updated, in case contributors need to change or add dependencies. They'll need to use gvt. (Should be a simple go get to install gvt then gvt fetch -revision to update to a specific version, if not just master.)

4. Checklist

(irrelevant)


This is still an experiment! This is now ready to merge.

I've held out on vendoring on purpose. Before the GO15VENDOREXPERIMENT, there were a variety of third party tools to help with dependency management, but the official go tooling has been ramping up support for vendoring in recent years, and I wanted to see how this would turn out. When the GO15VENDOREXPERIMENT was made official, toolchain support continued to improve and is still continuing to do so even up to Go 1.9. Because of this unstable ecosystem around dependency management and even vendoring specifically, I decided to wait.

Also, the old build server would not have done well with vendoring.

But now, we have new, autonomous build infrastructure. It runs standard go and git commands. The go command now respects the vendor/ folder and "just works" with go get. I'm hoping this will all work together to make reproducible builds possible.

Yes, this change adds 846 889 files and 391,183 395,378 lines of code to the repo. 馃槵

This PR has no consideration for plugins; I don't intend for this initial step to make Caddy builds with third-party plugins reproducible as well. It's possible that they could vendor their own dependencies but that requires a lot more experimenting; I want to make sure we don't run into problems like this: https://github.com/mattfarina/golang-broken-vendor -- but it is on my list to figure out eventually.

I will be experimenting with reproducing Caddy builds using this branch, and I ask anyone else interested in this to try it out as well. Before taking this step, we want to make sure we get it right!

(Note to reviewers: Obviously, you don't have to review all the changed files, as these are not part of the Caddy project! The reviews should be focused on our approach to vendoring and achieving reproducible builds. So it's not a normal code review, in that sense.)

TODO:

@mholt mholt added the CI/CD 馃敥 Automated tests, releases label May 6, 2017
@mholt mholt self-assigned this May 6, 2017
@mholt
Copy link
Member Author

mholt commented May 6, 2017

Okay, case in point: the go tooling does not yet ignore the vendor/ folder when running go vet ./... or go test ./... but it will by Go 1.9. Disregard the CI failures for now, I guess.

@francislavoie
Copy link
Member

francislavoie commented May 6, 2017

Shouldn't you .gitignore everything in vendor except manifest?

The manifest in this setup seems like the lock file, and dependencies should be able to be reinstated based on its definition.

I'm taking this from my experience with things like Composer for PHP, didn't really explore how dependency management works in Go yet.

Edit: I guess along with this, why not use Glide?

Edit2: Looking at the usually release schedule for Go (in the past 2 years), Go 1.9 should be expected in August I think? So this PR is aiming for that, if I'm correct?

@mholt
Copy link
Member Author

mholt commented May 6, 2017

@francislavoie Great questions.

Shouldn't you .gitignore everything in vendor except manifest?

No, because then all developers who wanted to "go get" Caddy would have to use gvt or whatever tool helped make the vendor folder. One goal here is to stick to the stick to the standard go tooling rather than lock ourselves into yet another dependency.

The manifest in this setup seems like the lock file, and dependencies should be able to be reinstated based on its definition.

This is true, if the tool you're using supports that format. But most people will just need go get as usual; only maintainers will need to know what the exact revision is and/or update them (gvt wouldn't even necessarily be required for that but makes it convenient, it's pretty lightweight).

didn't really explore how dependency management works in Go yet.

It doesn't. 馃槄

Edit: I guess along with this, why not use Glide?

I don't think we're going for package management here, all we need is to vendor dependencies to achieve reproducible builds.

Edit2: Looking at the usually release schedule for Go (in the past 2 years), Go 1.9 should be expected in August I think? So this PR is aiming for that, if I'm correct?

Possibly. If I can find a nice way to run tests on CI and the build server without too many changes (i.e. to ignore the vendor/ folder), we can get it in sooner.

@abiosoft
Copy link

abiosoft commented May 6, 2017

As much as this gives stability ie. clone once and build or use regular go get, I'm wary of keeping this much code in the repo.

I think anyone willing to build from source is ready to execute more than one command anyway and I do not think the simplicity of go get should take higher priority than the size of the repo.

@mholt
Copy link
Member Author

mholt commented May 6, 2017

@abiosoft I know, the size of this makes me squeamish. It's not a comfortable choice and frankly one I don't/didn't really think is/was necessary.

But I realize that Caddy sits in a very vulnerable position: at the center of millions of secured connections, between clients connecting from around the world and sensitive data stored in the machine on which it's running. We can make Caddy open source, and we can make binaries conveniently available, but the only sure way to know that your binary is as intended is to compare it to compiling the source code yourself.

So it's not so much a matter of convenience (running one command or multiple -- and in this case, it's still just one command for everyone), it's security. Build reliability. And in the future, (for example) customers and open source users both will appreciate being able to obtain an older version (if absolutely needed) and be confident that it's the same as what worked for them before.

Doing go get github.com/mholt/caddy/caddy still pulls down all this code and compiles it, but it pulls it from various parts around the Internet, or various accounts on GitHub which can disappear or be compromised and are frankly out of our control. By vendoring we assume control over the code without assuming the maintenance burden (no more than if the code was in a separate repo). Instead of pulling down dozens of different repositories, the go command will find everything it needs in this repo.

Now, I'm still new to vendoring. So I find the feedback valuable, keep it coming. I'm still learning a lot.

@francislavoie
Copy link
Member

Alright that explanation satisfies me. Still feels a bit gross but I understand the reasoning now.

FYI I brought up Glide because it does have a proper lock file. Your point about security and data ownership makes sense though, for long term.

I think there will need to be some clarification about how to contribute if you need to change dependencies in a PR. It was easier before, but now that they are all in the repo, that does change things a bit in those cases.

@tw4452852
Copy link
Collaborator

I'm just afraid it adds additional burden of patch porting from/to upstream.

@elcore
Copy link
Collaborator

elcore commented May 6, 2017

@mholt

I know, the size of this makes me squeamish. It's not a comfortable choice and frankly one I don't/didn't really think is/was necessary.

I guess we all are not really comfortable with that choice

But I realize that Caddy sits in a very vulnerable position: at the center of millions of secured connections, between clients connecting from around the world and sensitive data stored in the machine on which it's running. We can make Caddy open source, and we can make binaries conveniently available, but the only sure way to know that your binary is as intended is to compare it to compiling the source code yourself.

Although I do not feel comfortable with this PR, I understand that this PR serves a greater purpose! We achieve transparency, security, reliability and reproducibility, which is something Caddy/(we as a community) should stand for.

@mholt
Copy link
Member Author

mholt commented May 6, 2017

@francislavoie

I think there will need to be some clarification about how to contribute if you need to change dependencies in a PR. It was easier before, but now that they are all in the repo, that does change things a bit in those cases.

@tw4452852

I'm just afraid it adds additional burden of patch porting from/to upstream.

Yup, this is true -- you are both right. Contributors who want to change or update a dependency will need gvt. Fortunately, this doesn't happen often. And I will definitely update the contributing instructions when this is finalized. (It's easy though, just go get the gvt command, then gvt fetch -revision ... should do the trick.)

@raggi
Copy link

raggi commented May 6, 2017

The repo weight shouldn't be a major concern. I've done this many times, it's fine. Just an experienced opinion.

@ghost
Copy link

ghost commented May 6, 2017

Good decision @mholt , this a must for all projects like this, I did that on Iris a month ago and I can tell that it works like a charm. After this any caddy version will work no mater what will happen later on, with this change you are not depending on dependencies' repository and other's decisions. Caddy's users and mods should embrace this!

@bep
Copy link

bep commented May 7, 2017

@mholt I'm not a Caddy contributor, so my interest in this is mainly general. My main objection about putting soooo much third-party code into /vendor is that it clutters up the Git log (who are this repo's contributors after such a massive commit? Who is the "Git author" attached to that first bulk commit?).

If you, somehow, could convince the people at @github to somehow exclude commits to vendor folders on Go projects ...

@FiloSottile
Copy link

I'm a big proponent of checking in the vendor folder. It takes control over the availability of the dependencies, on top of the security and reliability advantages of lock files. Think left-pad.

I think the friction to updating dependencies is actually a feature, since it stops the owner of one of the 13 repositories Caddy depends on from slipping in a backdoor or bug, intentionally or not, and then slip it out the next day.

About the git history, gvt does not import the git history of the dependencies, so you only see as many commits as having a lockfile. The only thing that is hampered are line count stats, which were never useful IMHO.

Finally, consider that even if you stay on top of the current build with CI, there is no way at all to build Caddy as it was a year ago.

Disk is cheap, let's vendor.

@FiloSottile
Copy link

@mholt Personally, I wouldn't characterize vendoring as immature. Package management is changing a lot, but the vendor folder is here to stay. If you don't want it to be included in ./..., you can use the explicit:

go test $(go list ./... | grep -v /vendor/)

@tw4452852
Copy link
Collaborator

Maybe a reason for waiting for go1.9:
https://twitter.com/deeeet/status/864303177599827968

@mholt
Copy link
Member Author

mholt commented May 16, 2017

@tw4452852 It's quite possible I'll change the CI scripts to use the workaround that @FiloSottile suggested, but yeah, 1.9 will make things convenient again. I'm also going to need to add vendorcheck to our CI tests.

@mholt
Copy link
Member Author

mholt commented May 27, 2017

The main holdup with this PR now is figuring out how to handle the fact that the "vendor" folder is in the top level of the repo but the top level of the repo is a package that all plugins import, and some other applications use it as a library, too (though this is very uncommon--even rare).

So as I understand how this works, other repos that import github.com/mholt/caddy or go get github.com/mholt/caddy will end up pulling down all the vendored code too... but this is not necessary or desirable unless the main function is the target, i.e. the developer wants to install Caddy from source.

I suppose doing a go get github.com/mholt/caddy/caddy (which is the current install command for the caddy executable) will still git clone the whole repo, which includes the vendor folder somewhere in there. (Yeesh, maybe I should have put the main package in a separate repo?)

I guess I'm wondering... is any of this a real concern? Should I move the github.com/mholt/caddy package (that gets imported by all plugins) into a subfolder? It would mean everyone would have to update their import paths, but I think I can make the website's build server ease this transition by rewriting import paths until plugin authors update. So at least that's not so bad. But if it doesn't really solve a problem, or if the problem isn't real, then I'd much rather not do this.

If this isn't a real problem, then I can probably merge this soon enough without making breaking repo changes (I would prefer this!!)... thoughts?

@FiloSottile
Copy link

So wait, is the problem that the Caddy API exposes types from the dependencies, or just some concern about repository size? Because developer disk is very very cheap, cheaper than your time spent thinking about this.

@mholt
Copy link
Member Author

mholt commented May 27, 2017

@FiloSottile I've looked, and as far as I can tell, Caddy does not expose types from the dependencies (phew) so that's not the problem; but I've read numerous places that libraries shouldn't vendor their dependencies; only main packages (commands) should do that.

If that reason is because of the problem with exporting dependency types, then maybe this isn't actually a problem and I can merge this in today.

@bep
Copy link

bep commented May 27, 2017

The reason libraries should not vendor is, by what I have read, that they should take the burden of checking that the transitive dependencies "always" are OK. If libraries also vendored, you'd get these painfully rare and big updates where problems would be hard to track down, and it would be hard for, say commonly used libraries, to spread a security patch, etc.

@FiloSottile
Copy link

The reason many of us say not to vendor in libraries is the problem with exporting dependency types. The ecosystem point @bep makes is also valid, but I think Caddy is not a library like any other, it's a big monolithical piece of software that you can import, and programs that use Caddy as a library would IMHO benefit from using the same dependencies as the released version, if anything.

@mholt
Copy link
Member Author

mholt commented May 27, 2017

Alright -- thanks for the feedback. That makes sense, and I'm okay going forward as-is.

Now if I can just get these CI tests to pass (by ignoring the vendor folder)...

The vendor/ folder was created with the help of @FiloSottile's gvt and
vendorcheck.

Any dependencies of Caddy plugins outside this repo are not vendored.

We do not remove any unused, vendored packages because vendorcheck -u
only checks using the current build configuration; i.e. packages that
may be imported by files toggled by build tags of other systems.

CI tests have been updated to ignore the vendor/ folder. When Go 1.9 is
released, a few of the go commands should be revised to again use ./...
as it will ignore the vendor folder by default.
@mholt
Copy link
Member Author

mholt commented May 27, 2017

I've synced with master, updated the CI tests and contributing instructions, and squashed commits.

This is now ready for review. @abiosoft @tw4452852 - I will probably merge by tonight or tomorrow if I don't hear from anyone. A review can be a simple, empty "Approve" or you can request changes if anything comes to mind.

@tobya tobya self-requested a review May 28, 2017 07:07
Copy link
Collaborator

@tobya tobya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this approach, however I think we have to accept their may be some issues once we do it, we can see what issues arise.

@mholt mholt merged commit 6ab0d8d into master May 28, 2017
@mholt mholt deleted the vendoring branch May 28, 2017 14:22
@sdboyer
Copy link

sdboyer commented May 30, 2017

I realize I'm commenting after the fact, but I wanted to add an informational note:

@FiloSottile I've looked, and as far as I can tell, Caddy does not expose types from the dependencies (phew) so that's not the problem; but I've read numerous places that libraries shouldn't vendor their dependencies; only main packages (commands) should do that.

If that reason is because of the problem with exporting dependency types, then maybe this isn't actually a problem and I can merge this in today.

In the ecosystem as it exists today, yes, it can be problematic for any project with importable libraries to commit, specifically because of the incompatible type leakage issue. That was one of the first major problems to be established with vendor after it defaulted to ON in Go 1.6. It sounds like you don't have a leakage problem, though, so 馃憤 .

The standard we're establishing with dep is that it's fine to commit vendor whether you're importable or not, or for libs or not, as vendor directories contained within dependencies are flattened up into a single, top-level vendor directory. (Any nested vendor dirs are unconditionally stripped out.) So, even if there are some types that leak through, your dependers that use dep would still be fine.

@ghost
Copy link

ghost commented Jun 3, 2017

@FiloSottile even with exported types, vendor worths it. If exported types are few there are plenty of solutions with the todays tools to make it work, it's tricky and needs maintenance but as I said, if there are 1-2 types there is no problem. The issue would be also solved by the typealias feature that is ready on go1.8.1_typealias branch of golang github repository, so we should expect it for go 1.9 on August :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI/CD 馃敥 Automated tests, releases
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants