Conversation
checkCache opened the storage reader and streamed it to the client without checking that the bytes still matched what was originally stored, or what the upstream registry declared. Disk corruption, accidental overwrites, or local tampering would go unnoticed. Wrap the storage reader in a verifyingReader that computes SHA256 (against artifact.content_hash) and, when version.integrity holds an SRI string, the corresponding sha256/384/512 digest as bytes flow through. At EOF the digests are compared; on mismatch we log at error level, bump proxy_integrity_failures_total, and clear the artifact's cache entry so the next request refetches from upstream. Verification is skipped when the stream was not fully consumed (client disconnect) to avoid evicting good artifacts on partial reads. The DirectServe presigned-URL path is unverified since the proxy never sees those bytes. Refs #42 (part 1)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Part 1 of #42.
checkCacheopened the storage reader and streamed it to the client without checking that the bytes still matched what was originally stored, or what the upstream registry declared. Disk corruption, accidental overwrites, or local tampering would be served silently.Adds
verifyingReaderininternal/handler/integrity.gowhich wraps the storageio.ReadCloserand computes SHA256 (compared againstartifact.content_hash) plus, whenversion.integrityholds an SRI string, the matching sha256/384/512 digest. Hashing happens as bytes stream through so there's no extra read pass or buffering. At EOF the digests are compared and on mismatch the callback fires: error log,proxy_integrity_failures_total{ecosystem}increments, andClearArtifactCachenulls the storage path so the next request refetches from upstream.The client still receives the bad bytes on that one request since we don't know the hash until EOF. In practice every package manager verifies its own checksum and will reject the download, and the cache self-heals on the retry. Verification is skipped when the stream wasn't fully consumed (client disconnect mid-download) so a partial read can't evict a good artifact.
Not covered here: the DirectServe presigned-URL path, where the proxy never sees the bytes. That and parts 2/3 (sigstore attestations, peer verification) are follow-ups.
I looked at using
archives.Reader.Hashfrom v0.3.0 but it needs the bytes in memory;checkCachestreams, so a tee-through-hash wrapper is the better fit here. The archives method will be useful for the browse path where we already buffer.