proposal: cmd/go: secure releases with transparency log #25530

FiloSottile · 2018-05-23T20:51:00Z

[The text of this proposal is outdated. Find the whole proposal here.]

This is a proposal for a long-term plan to provide transparency logs to verify the authenticity of Go releases. It's not something we are ready to implement anytime soon.

Transparency logs are append-only Merkle trees which are easy to audit and provide efficient proofs of inclusion. They are used for Certificate Transparency and are starting to be used for binary transparency.

They are a good fit for securing releases:

The log will fetch releases directly from the source, punting the spam issue on GitHub (etc.) or domain registries (as we can ban accounts/domains).
Clients will ask the log(s) for the release hash, and for proof that it was included in the append-only log. Module authors can audit the logs for their own projects, or get notified about new versions of it.
A hypothetical go release tool can trigger submission of the version to the log, and then verify that its hash matches what the developer has on disk. This is especially nice as it keeps the host (i.e. GitHub) honest.
Logs can also gossip with each other to make sure that a different version has not been observed before. (This is important so that two logs don't end up disagreeing on a version hash when the author changes the tag in between two log submissions.) go release can also check with logs that a version does not exist yet before tagging it.
Logs can be audited by third parties by comparing their entries to the packages fetched from git (maybe using the GitHub API to learn about new releases as soon as they are pushed) or by clients by comparing their global (cmd/go: maintain a GOPATH-wide go.sum #24117) or observed modverify files.
Proxies can be integrated with this system so that they will verify packages they are proxying. We can then support the concept of a trusted proxy, so that for example internal company systems will connect only to the proxy and not to the external logs.
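
To make the "proof of inclusion" idea above more concrete, here is a minimal sketch of a Merkle tree with inclusion proofs. It assumes the RFC 6962 hashing scheme used by Certificate Transparency (0x00 prefix for leaves, 0x01 prefix for interior nodes, split at the largest power of two below the tree size). The function names and the sample records are made up for illustration; this is not the code of any real log.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// Hash is a SHA-256 digest.
type Hash [sha256.Size]byte

// leafHash hashes a record with a 0x00 prefix (RFC 6962 domain separation).
func leafHash(data []byte) Hash {
	return sha256.Sum256(append([]byte{0x00}, data...))
}

// nodeHash hashes two child hashes with a 0x01 prefix.
func nodeHash(l, r Hash) Hash {
	buf := make([]byte, 0, 1+2*sha256.Size)
	buf = append(buf, 0x01)
	buf = append(buf, l[:]...)
	buf = append(buf, r[:]...)
	return sha256.Sum256(buf)
}

// splitPoint returns the largest power of two strictly less than n (n >= 2).
func splitPoint(n int) int {
	k := 1
	for k*2 < n {
		k *= 2
	}
	return k
}

// treeHash computes the root hash over one or more records.
func treeHash(records [][]byte) Hash {
	if len(records) == 1 {
		return leafHash(records[0])
	}
	k := splitPoint(len(records))
	return nodeHash(treeHash(records[:k]), treeHash(records[k:]))
}

// inclusionProof returns the sibling hashes for record m, ordered leaf to root.
func inclusionProof(records [][]byte, m int) []Hash {
	if len(records) == 1 {
		return nil
	}
	k := splitPoint(len(records))
	if m < k {
		return append(inclusionProof(records[:k], m), treeHash(records[k:]))
	}
	return append(inclusionProof(records[k:], m-k), treeHash(records[:k]))
}

// rootFromProof recomputes the root implied by a leaf hash and its proof.
func rootFromProof(index, size int, leaf Hash, proof []Hash) (Hash, bool) {
	if size == 1 {
		return leaf, len(proof) == 0
	}
	if len(proof) == 0 {
		return Hash{}, false
	}
	k := splitPoint(size)
	last, rest := proof[len(proof)-1], proof[:len(proof)-1]
	if index < k {
		sub, ok := rootFromProof(index, k, leaf, rest)
		return nodeHash(sub, last), ok
	}
	sub, ok := rootFromProof(index-k, size-k, leaf, rest)
	return nodeHash(last, sub), ok
}

// verifyInclusion reports whether record sits at index in a tree of the
// given size whose root hash is root.
func verifyInclusion(index, size int, record []byte, proof []Hash, root Hash) bool {
	if index < 0 || index >= size {
		return false
	}
	got, ok := rootFromProof(index, size, leafHash(record), proof)
	return ok && got == root
}

func main() {
	// Hypothetical log records; a real log would store release hashes.
	records := [][]byte{
		[]byte("example.com/a v1.0.0"),
		[]byte("example.com/a v1.1.0"),
		[]byte("example.com/b v0.3.0"),
		[]byte("example.com/c v2.0.0"),
		[]byte("example.com/c v2.0.1"),
	}
	root := treeHash(records)
	proof := inclusionProof(records, 2)
	fmt.Println("proof hashes:", len(proof))
	fmt.Println("record 2 included:", verifyInclusion(2, len(records), records[2], proof, root))
}
```

The client only needs the root hash, the record, and a logarithmic number of sibling hashes, which is what makes the proofs efficient.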

The security of such a system is superior to what is provided by modverify, which is effectively pinning to the view of the developer adding the dependency. Transparency logs pin to the first time the version was globally observed, and with the go release workflow they pin directly to the view of the developer who created the dependency.

We can probably build the implementation on top of Trillian, a transparency log (and map) implementation which has the explicit concept of "personalities" for the custom use-case logic. (CT is a Trillian personality.)

Ideally, these logs would be operated by multiple players in the community, and a client could choose to trust or submit to any number of them.

We can build the tooling outside the go tool as a way to check/generate modverify entries to experiment until we feel comfortable with it.


rsc · 2019-02-27T19:14:43Z

Not sure what the difference is between this issue and #24117.

FiloSottile · 2019-02-27T19:29:18Z

#24117 is about a system-wide go.sum (which might be made superfluous by this).

rsc · 2019-03-04T13:41:24Z

Got it. I retitled #24117 to avoid the confusion.

gopherbot · 2019-03-04T17:01:02Z

Change https://golang.org/cl/165018 mentions this issue: design: add 25530-notary.md

See https://golang.org/design/25530-notary.

For golang/go#25530.

Change-Id: I1b4add8fe1c2f6911e925bafab99eb7418aa67b4
Reviewed-on: https://go-review.googlesource.com/c/proposal/+/165018
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>

rsc · 2019-03-04T17:02:41Z

Published formal proposal: https://golang.org/design/25530-notary.

FiloSottile · 2019-05-01T17:09:08Z

Hi @mkonda, 1 and 3 are indeed open issues. They are thankfully orthogonal to what the checksum database solves, so we can tackle them separately. (Making sure the content is authentic, vs making sure it's "secure".)

On 2, I don't really believe we can get widespread adoption of authentication beyond the code host. Even if we made every author sign their releases (which is unrealistic for a number of reasons), they will most likely just sign what's in their repository, effectively delegating that trust.

It is important though that we prevent proxies and attackers from publishing versions unbeknownst to the author, and the checksum database log helps greatly with that: any third-party auditor can offer a service to notify owners of new releases in their repositories, and I hope we will see many kinds of that service.

That, combined with go release #26420 checking the checksum database, ensures that users only see releases that match what was on the developer machine. I don't think we can do any better than that.

beoran · 2019-05-02T06:14:26Z

I realize I am late to complain, but while I understand the need for verifying the integrity of Go modules, what this proposal amounts to is using a single central server by default, creating a single point of failure.

Furthermore, if GOPROXY and GOSUMDB are set by default to central Google servers, then not only people in countries such as China, but people all over the world behind firewalls, such as restrictive corporate firewalls, will experience difficulties in using Go. All these people will be forced to use a custom Go proxy with support for sumdb. It doesn't make for a great user experience if the first thing Go users have to do when they start using the language is configure a proxy and a sum DB. See #31755 for a related discussion.

I'd like to ask whether there is another way to check module checksums that is more decentralized and gives a better user experience for firewalled users.

FiloSottile · 2019-05-02T17:32:01Z

The design goal is to make sure everyone agrees on the same contents for the same version, so a degree of centralization is necessary. The point of the transparency log is to ensure that the log is not a single point of failure: compromising it is not enough because it will lead to detection by the auditors.

The checksum db is also designed to allow for decentralized proxies. As you mentioned, anyone that can reach any untrusted proxy can successfully use the sumdb. (This also makes the sumdb not a single point of failure for read availability.)

If you want to discuss the defaults, please use #31755. The sumdb design can support any conclusion that that issue comes to.

(As an example, consider a "blockchain", that you might think of as more decentralized. You still have to talk to nodes somehow. If you have a way to talk to a node, you can also speak the proxy protocol with them as-is, and reach the sumdb through that.)

To quote the “Releasing Modules (All Versions)” section of the Go Modules wiki page [1]:

Ensure your go.sum file is committed along with your go.mod file. See FAQ below [2] for more details and rationale.

And the “Should I commit my 'go.sum' file as well as my 'go.mod' file?” section from the same page [2]:

Typically your module's go.sum file should be committed along with your go.mod file.

- go.sum contains the expected cryptographic checksums of the content of specific module versions.
- If someone clones your repository and downloads your dependencies using the go command, they will receive an error if there is any mismatch between their downloaded copies of your dependencies and the corresponding entries in your go.sum.
- In addition, go mod verify checks that the on-disk cached copies of module downloads still match the entries in go.sum.
- Note that go.sum is not a lock file as used in some alternative dependency management systems. (go.mod provides enough information for reproducible builds).
- See very brief rationale here [3] from Filippo Valsorda on why you should check in your go.sum. See the "Module downloading and verification" [4] section of the tip documentation for more details. See possible future extensions being discussed for example in golang/go#24117 and golang/go#25530.

[1] https://github.com/golang/go/wiki/Modules#releasing-modules-all-versions
[2] https://github.com/golang/go/wiki/Modules#should-i-commit-my-gosum-file-as-well-as-my-gomod-file
[3] https://twitter.com/FiloSottile/status/1029404663358087173
[4] https://tip.golang.org/cmd/go/#hdr-Module_downloading_and_verification

rsc · 2019-05-21T13:17:09Z

I'm going to send a CL that enables both the Go checksum database and the Go module mirror by default in module mode.

There remain issues to resolve with this proposal, so we cannot turn the checksum database on for all users. However, now that we are not enabling modules for all users, it seems reasonable to enable the checksum database for module users, so that we can more precisely understand the exact problems and develop solutions. People who aren't ready to move to modules yet will not be affected by enabling the checksum database, and modules users having particular trouble with the checksum database can turn it off for the specific modules (go env -w GONOSUMDB=mysite.com/*) or entirely (go env -w GOSUMDB=off), but I encourage them to file specific issues as well, so that we can address them.

If there are any show-stopper issues that we can't address before Go 1.13 is released, we will back out the change. But we need to understand better what the issues are, especially the as-yet-unknown ones.

Thanks.

gopherbot · 2019-05-21T20:05:03Z

Change https://golang.org/cl/178179 mentions this issue: cmd/go: default to GOPROXY=https://proxy.golang.org and GOSUMDB=sum.golang.org

…olang.org

This CL changes the default module download and module verification mechanisms to use the Go module mirror and Go checksum database run by Google.

See https://proxy.golang.org/privacy for the services' privacy policy. (Today, that URL is a redirect to Google's standard privacy policy, which covers these services as well. If we publish a more specific privacy policy just for these services, that URL will be updated to display or redirect to it.)

See 'go help modules' and 'go help modules-auth' for details (added in this CL).

To disable the mirror and checksum database for non-public modules:

	go env -w GONOPROXY=*.private.net,your.com/*
	go env -w GONOSUMDB=*.private.net,your.com/*

(If you are using a private module proxy then you'd only do the second.)

If you run into problems with the behavior of the go command when using the Go module mirror or the Go checksum database, please file issues at https://golang.org/issue/new, so that we can address them for the Go 1.13 release.

For #25530.

This CL also documents GONOPROXY.
Fixes #32056.

Change-Id: I2fde82e071742272b0842efd9580df1a56947fec
Reviewed-on: https://go-review.googlesource.com/c/go/+/178179
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>

MichaelTJones · 2019-05-31T20:38:39Z

Editorial comments about the document...

The use of a transparent log for module hashes aligns with a broader trend of using transparent logs
to enable detection of misbehavior by partially trusted systems, what the Trillian team calls
“General Transparency.”

This is the first mention of Trillian in the doc (despite the linked papers). Maybe the term needs elaboration or a link (https://github.com/google/trillian)

There are two main privacy concerns: exposing the text of private modules paths to the database, and
exposing usage information for public modules to the databas.

DATABAS => DATABASE

The complete solution for not exposing either private module path text or public module usage
information is to us a proxy or a bulk download.

US => USE

Privacy in CI/CD Systems

Acronym is never defined in document (https://en.wikipedia.org/wiki/CI/CD)

MichaelTJones · 2019-05-31T20:59:15Z

I'm (maybe stupidly) unable to grasp one of the core design issues in this design. The services claimed as valuable and necessary are about doing a lookup to validate a pending action ("about to do this and want to know data to use in verification") or about the past-tense version of the same.

It seems important to me that in both cases the client (go get, et al.) starts by knowing something that may be a secret (the import path and version number), which is the key to obtain the not-a-secret value of the checksum. Because of privacy, non-public code, likely misconfiguration, etc., it seems that the sometimes-secret info is the last thing you'd want to use as the public key.

I read the part about possibly using "Private Module SHA256s" but that seemed to miss the point by focusing on the reverse mapping table and first-time publication issues. Here is what I don't understand--why does the database ever need the "clear text" import path and version number? Why must it ever be sent anywhere? It is only useful in that form (I think) to go get on the client side.

Instead, have go get et al create a hashed/encrypted by path/version token, locally, that has strong one-way attributes and use that as the query key, store that in databases, etc. In such a world the module checksum responses are just as easy to supply, are totally "transparent" on the data to be secured side (the checksum) and totally opaque on the other (machine names, import paths, versions, timestamps, traffic analysis, etc.) It sidesteps completely the issue of private data...which will be harmlessly useless to all.

What have I overlooked here?

MichaelTJones · 2019-06-01T04:41:05Z

Following up...because I forgot that this is not "lunch in Charlie's" and it is my duty to explain all the implications:

Yes, my proposal is about an opaque 256-bit one-way cryptographic hash of the query string (the go get info: path+version, perhaps hashed atop the hash of the module source in the Merkle-Damgard sense.)...

...and a transparent-but-otherwise meaningless 256-bit one-way cryptographic hash of the module source -- the existing answer to database queries.

This leads to an exactly 512-bit per record database that would be remarkably simple to maintain and serve, with any complexity being the existing dance around security-through-federation. (Which is nice!)

The result is a public database with properties beyond "encryption at rest" -- not one byte of this database tells you anything that can be used from the database end to know about module paths, machine/url paths, developer identity, etc. It is, in this way, a giant mystery and thus provably safe in any privacy sense no matter how it is configured or maintained, nothing can leak because nothing is there to leak.

Yet, when used the other way, it is fully supportive. go get or other tools want to know about an importable module: they internalize the request path+version, and query with that key. They get back the hash value for the module just as now.

What is lost, you might presume, is the joy of voyeuristically looking at database keys to build a network map of the Land of Go, and G-tools leveraging that map to monitor, report, and assist in various open-ended activities. Nothing in the rationale seems to argue for this map and its conceptual buildability from query strings shows the leaky nature of the present design. If it is not needed, then maybe it is not wanted.

However (this is the new part that I thought would be clear without mention, but now I'm thinking that thinking is not in the spirit of distributed discussions), it happens to be true that a better map of the Land of Go is buildable in my opaque-key design. What is needed is a tool or company that knows how to crawl the public web looking for openly shared .go, .mod, ... files and then extract the import path and version strings from those. These can easily be interned to opaque keys and requests made to the database. When a key is present, then so is the version's hash. All of this (the three-tuple of provably-shared-source path and version, the resulting key, and the stored hash) is then united for building the Land of Go tooling.

This way, the map if desired, is never built from private code because that code is not shared on the web. So provenance is provable. Security is implicit. Misconfiguration can't hurt. That's what I meant.

FiloSottile · 2019-06-01T09:43:54Z

The reason module names need to be available in plaintext in the database is for auditing purposes. A transparent log is only useful if it is scrutinized and held accountable, and to perform a number of checks the auditors need to know what the module names are.

An example: we will want a notification service that can email me for any new module like github.com/FiloSottile/..., so I get to know if fake modules are being published.

Also, the search space is not that wide so hashes can be reversed in most cases, which is why the private lookup proposal uses hash prefixes, to lean on the equivalent of k-anonymity.
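
As a rough illustration of why hashing the query key does not hide much, the sketch below builds a hypothetical opaque key as SHA-256 of "path@version" (an assumption for this example; neither the key format nor the module names come from the proposal) and then recovers the plaintext by simply hashing a list of candidate paths and versions.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// opaqueKey is a hypothetical one-way query key: SHA-256 of "path@version".
func opaqueKey(path, version string) string {
	sum := sha256.Sum256([]byte(path + "@" + version))
	return hex.EncodeToString(sum[:])
}

// reverse tries every candidate path/version pair and returns the one
// whose key matches. The cost is just len(paths) * len(versions) hashes.
func reverse(key string, paths, versions []string) (string, bool) {
	for _, p := range paths {
		for _, v := range versions {
			if opaqueKey(p, v) == key {
				return p + "@" + v, true
			}
		}
	}
	return "", false
}

// keyPrefix keeps only the first n hex digits of a key, so that many
// different modules share the same prefix. The prefix length is an
// arbitrary choice for this sketch.
func keyPrefix(key string, n int) string {
	if n > len(key) {
		n = len(key)
	}
	return key[:n]
}

func main() {
	// An observer sees only this key in the database or in a query.
	key := opaqueKey("corp.example/internal/billing", "v1.2.0")

	// Guessable candidates: module paths tend to follow a few patterns,
	// and version numbers come from a tiny space.
	paths := []string{"corp.example/internal/auth", "corp.example/internal/billing"}
	versions := []string{"v1.0.0", "v1.1.0", "v1.2.0"}

	if m, ok := reverse(key, paths, versions); ok {
		fmt.Println("recovered:", m)
	}
	fmt.Println("prefix sent instead of full key:", keyPrefix(key, 4))
}
```

Sending only a short prefix means the server cannot tell which of the many modules sharing that prefix the client is interested in, which is the k-anonymity idea mentioned above.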

(Thanks for the edits, I'll make a PR next week, but feel free to go ahead and make one in the meantime if you'd like.)

rsc · 2019-08-13T20:35:14Z

On May 21, I wrote:

I'm going to send a CL that enables both the Go checksum database and the Go module mirror by default in module mode.

There remain issues to resolve with this proposal, so we cannot turn the checksum database on for all users. However, now that we are not enabling modules for all users, it seems reasonable to enable the checksum database for module users, so that we can more precisely understand the exact problems and develop solutions. People who aren't ready to move to modules yet will not be affected by enabling the checksum database, and modules users having particular trouble with the checksum database can turn it off for the specific modules (go env -w GONOSUMDB=mysite.com/*) or entirely (go env -w GOSUMDB=off), but I encourage them to file specific issues as well, so that we can address them.

If there are any show-stopper issues that we can't address before Go 1.13 is released, we will back out the change. But we need to understand better what the issues are, especially the as-yet-unknown ones.

Go 1.13 beta has been out for a while with the checksum database enabled, and overall it seems to be working well. No show-stopper issues have been identified that I am aware of. We have not resolved the comments about wanting to change the content of a module without triggering an error, but the design of the system is meant to catch exactly that. And people who want not to be stopped can always turn off the checksum database.

We made it easier to turn off both the proxy and the checksum database together, selectively, with the new GOPRIVATE environment variable (see 'go help environment').
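
For readers wondering how values such as GONOSUMDB=*.private.net,your.com/* select modules: 'go help' describes these variables as comma-separated lists of glob patterns (in the syntax of Go's path.Match) matched against module path prefixes. The sketch below is an approximation of that documented behavior written from scratch for illustration; the function name is made up and this is not the go command's own code.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matchesPrefixPatterns reports whether any glob in the comma-separated
// list matches a leading path-element prefix of modulePath.
func matchesPrefixPatterns(globs, modulePath string) bool {
	for _, glob := range strings.Split(globs, ",") {
		if glob == "" {
			continue
		}
		// A glob with n slashes is compared against the first n+1
		// path elements of the module path.
		n := strings.Count(glob, "/")
		prefix := modulePath
		for i := 0; i < len(modulePath); i++ {
			if modulePath[i] == '/' {
				if n == 0 {
					prefix = modulePath[:i]
					break
				}
				n--
			}
		}
		if n > 0 {
			// The module path has fewer elements than the glob.
			continue
		}
		if ok, _ := path.Match(glob, prefix); ok {
			return true
		}
	}
	return false
}

func main() {
	patterns := "*.private.net,your.com/*"
	for _, m := range []string{
		"git.private.net/team/repo",
		"your.com/tools/lint",
		"github.com/gin-gonic/gin",
	} {
		fmt.Printf("%-28s skip checksum db: %v\n", m, matchesPrefixPatterns(patterns, m))
	}
}
```

Modules that match would be fetched without consulting the checksum database; everything else is still verified against it.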

Overall it seems like the consensus here is that we can move forward with this and accept this proposal, since no show-stopper issues have been identified. Am I missing anything?

Will leave this open for a week to collect final comments.

(It is always fine to file a new issue for other problems found with the checksum database.)

rsc · 2019-08-14T13:30:34Z

One discussion on this issue was around the rather vague link to Google's standard privacy policy. We have posted a more detailed page about privacy and the proxy, sum, and index servers at https://proxy.golang.org/privacy. (Please file any feedback in separate issues.)

rsc · 2019-08-20T20:05:11Z

Marked this last week as likely accept w/ call for last comments (#25530 (comment)).
No comments, so accepting.

rsc · 2019-08-20T20:05:19Z

Already implemented, so closing.

vikyd · 2019-10-23T11:36:08Z

Where does the tree size in the lookup endpoint response come from?

The tree size from lookup is:

different from the one in latest
increasing from time to time

Example:

Lookup: 338124

Latest: 338145

Lookup: https://sum.golang.org/lookup/github.com/gin-gonic/gin@v1.4.0

13914
github.com/gin-gonic/gin v1.4.0 h1:3tMoCCfM7ppqsR0ptz/wi1impNpT7/9wQtMZ8lr1mCQ=
github.com/gin-gonic/gin v1.4.0/go.mod h1:OW2EZn3DO8Ln9oIKOvM++LBO+5UPHJJDH72/q/3rZdM=

go.sum database tree
338124
Gsz639f3wDBB3gnyYzg58D9C91Cb9FWyvNrpltzl2uE=

— sum.golang.org Az3grjft44BfvQ3qiWzZRPWjK4wXbWLkf/BzMVlM3BgnbmnADL7AHSEm+v43AtYpFwS0glukjcqbIVXfq4hDvq0xNgg=

Latest: https://sum.golang.org/latest

go.sum database tree
338145
Megb2heVg8xuaXGRNBmkCPjA8EhHVXH7HkNuWpOImOY=

— sum.golang.org Az3grpMDXtFEl9vfNmeqqNKY6HORkeagC2TwQ6WA6WV5a0ykPurrlO1FvtxHEqLmvntnZawTbAw9OWSeOU8Il4NoGwg=

FiloSottile · 2019-10-23T18:14:09Z

It only has to be higher than the record number returned in that lookup, so that the client is guaranteed to have at least one STH (signed tree head) that includes the record (while the latest endpoint might be cached). It is always reconciled with the previous latest STH on the client side.
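
A small sketch of that check, assuming the response layout shown in the lookup example earlier in this thread (record number, go.sum lines, then a "go.sum database tree" block with the tree size and root hash). The type and function names are made up, and signature verification and proof checking are deliberately left out.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// lookupResult holds the fields of a lookup response that matter here.
type lookupResult struct {
	RecordID int64    // position of the record in the log
	SumLines []string // go.sum lines for the module version
	TreeSize int64    // size of the tree described by the tree head
	RootHash string   // base64 root hash, kept as text in this sketch
}

const treeHeader = "go.sum database tree"

// parseLookup extracts the record number and the tree head from the
// text of a lookup response. It does not verify the signature.
func parseLookup(body string) (lookupResult, error) {
	var res lookupResult
	lines := strings.Split(strings.TrimSpace(body), "\n")
	id, err := strconv.ParseInt(strings.TrimSpace(lines[0]), 10, 64)
	if err != nil {
		return res, fmt.Errorf("record number: %w", err)
	}
	res.RecordID = id

	// Collect go.sum lines until the tree head block starts.
	i := 1
	for ; i < len(lines) && strings.TrimSpace(lines[i]) != treeHeader; i++ {
		if l := strings.TrimSpace(lines[i]); l != "" {
			res.SumLines = append(res.SumLines, l)
		}
	}
	if i+2 >= len(lines) {
		return res, errors.New("missing tree head")
	}
	size, err := strconv.ParseInt(strings.TrimSpace(lines[i+1]), 10, 64)
	if err != nil {
		return res, fmt.Errorf("tree size: %w", err)
	}
	res.TreeSize = size
	res.RootHash = strings.TrimSpace(lines[i+2])
	return res, nil
}

// coversRecord reports whether the tree head is large enough to include
// the record: records are numbered from 0, so the size must exceed the ID.
func coversRecord(res lookupResult) bool {
	return res.TreeSize > res.RecordID
}

func main() {
	body := `13914
github.com/gin-gonic/gin v1.4.0 h1:3tMoCCfM7ppqsR0ptz/wi1impNpT7/9wQtMZ8lr1mCQ=
github.com/gin-gonic/gin v1.4.0/go.mod h1:OW2EZn3DO8Ln9oIKOvM++LBO+5UPHJJDH72/q/3rZdM=

go.sum database tree
338124
Gsz639f3wDBB3gnyYzg58D9C91Cb9FWyvNrpltzl2uE=
`
	res, err := parseLookup(body)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("record:", res.RecordID, "tree size:", res.TreeSize)
	fmt.Println("tree head covers record:", coversRecord(res))
}
```

Any tree size above the record number satisfies the check, which is why the value served by lookup can lag behind the one served by latest.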

vikyd · 2019-10-24T01:24:29Z

@FiloSottile
Yes, an STH that includes the record works fine.

But the tree size in the lookup endpoint seems to be unpredictable: it's always less than the one in the latest endpoint, and it changes from time to time. (I open the endpoints in a browser to check.)

I just want to find out the rule for the tree size in the lookup endpoint.

Where is the source code or doc that describes the rule?

FiloSottile · 2019-10-24T10:18:47Z

There is no rule, except that it must be higher than the record number.

IIRC it depends on the interaction between the various caches and the internal database lookup. The lookup responses are cached more aggressively than the latest one, but not forever. That should explain the behavior you see. This code is all pretty Google-specific, so it's not open source. The important property of the transparent log is that as long as the (open source) client checks the proofs correctly, you don't have to trust the server to operate honestly.

gopherbot added this to the vgo milestone May 23, 2018

rsc modified the milestones: vgo, vgo2 Jun 6, 2018

rsc modified the milestones: vgo2, Go1.12 Jul 12, 2018

rsc changed the title ~~x/vgo: secure releases with transparency logs~~ cmd/go: secure releases with transparency logs Jul 12, 2018

rsc added the modules label Jul 12, 2018

bcmills modified the milestones: Go1.12, Unplanned Nov 15, 2018

bcmills added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. GoCommand cmd/go labels Nov 15, 2018

bcmills changed the title ~~cmd/go: secure releases with transparency logs~~ proposal: cmd/go: secure releases with transparency logs Jan 18, 2019

gopherbot added the Proposal label Jan 18, 2019

bcmills added NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. and removed Proposal labels Jan 18, 2019

gopherbot added Proposal and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Jan 18, 2019

bcmills modified the milestones: Unplanned, Go1.13 Jan 18, 2019

bcmills added the early-in-cycle A change that should be done early in the 3 month dev cycle. label Jan 24, 2019

rsc modified the milestones: Go1.13, Proposal Mar 4, 2019

rsc changed the title ~~proposal: cmd/go: secure releases with transparency logs~~ proposal: cmd/go: secure releases with transparency log Mar 4, 2019

rsc removed the GoCommand cmd/go label Mar 4, 2019

bcmills mentioned this issue May 29, 2019

cmd/go: issues with shared $GOPATH/pkg/mod cache and autogenerated modules #32235

Closed

atomi mentioned this issue Aug 7, 2019

build: use GOPROXY and disable download on some steps go-gitea/gitea#7745

Merged

andybons mentioned this issue Aug 13, 2019

proposal: review meeting minutes #33502

Open

rsc closed this as completed Aug 20, 2019

rsc modified the milestones: Proposal, Go1.13 Aug 20, 2019

rsc added the Proposal-Accepted label Aug 20, 2019

thepudds mentioned this issue Aug 30, 2019

cmd/go: permit marking a module as private in go.mod #33985

Open

bcmills mentioned this issue Sep 4, 2019

cmd/go: module project cannot update go.sum file in readonly mode #34054

Closed

golang locked and limited conversation to collaborators Oct 23, 2020

gopherbot added the FrozenDueToAge label Oct 23, 2020
