-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add goModuleH1
to the list of predefined digests
#108
Conversation
67f8db4
to
9c1b7be
Compare
This is the first digest algorithm that is not a straight cryptographic hash over a blob of data. Signed-off-by: Mark Lodato <lodato@google.com>
9c1b7be
to
2b25f5b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Mark!
> | ||
> For example, the module dirhash | ||
> `h1:47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=` would be encoded as | ||
> `{"goModuleH1": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"}`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To what extent are go developers familiar with the base64 version?
I'm somewhat concerned folks will be surprised that the go reported hash string differs from the hash reported in the provenance. Would it be acceptable to just state in the docs that this is base64 encoded? Verifiers will need to use special code to compute this value anyways, so it seems like doing so wouldn't add any complexity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a good solution here. Line 7 says that the value is always a lowercase hex encoding of the digest. Therefore, our options seem to be:
- Encode it using hex for consistency with all the other algorithms, at the cost of divergence from what go users see. (My suggestion.)
- Change the spec so that encoding is per-algorithm. This is inconsistent with the other fields, and is technically a breaking change but probably OK since no one should be parsing unrecognized fields anyway. (Your suggestion.)
- Make this the hex encoding of the base64, which seems like a bad idea.
Does anyone else have opinions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if "Pre-defined algorithms" require that it be the lowercase hex encoding, but allow custom values (like this one) to do what makes the most sense in context? I'm not sure this being a breaking change would be a big problem? This note even indicates the contents are subject to change until 1.0. If needed I think we could bump to 0.2.0.
Perhaps the most important feedback we could get would be from some Go experts.
> such a directory: | ||
> | ||
> ```bash | ||
> find name@version -type f | LC_ALL=C sort | xargs -r sha256sum | sha256sum | cut -f1 -d' ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the actual command to use in order to generate the goModuleH1 value on a Go module directory?
Sorry if this is a stupid question, I'm trying to add it into our provenance generation and as a sanity check I'm running it on different copies of the same source to validate that the value is the same (i.e. I would expect the hash to be the same if the file contents are the same).
Actually it gives a different hash value depending on the folder path passed into the command - am I using it wrong?
i.e.
khalk@khalk:~/tempgit$ git clone https://github.com/golang/example
khalk@khalk:~/tempgit$ find ~/tempgit/example -type f | LC_ALL=C sort | xargs -r sha256sum | sha256sum | cut -f1 -d' '
45bde95411f095cd1040f6e5ad0c3f25f3c7ec8b9b048a32ceef01bac110e010
khalk@khalk:~/tempgit$ cd example
khalk@khalk:~/tempgit/example$ find . -type f | LC_ALL=C sort | xargs -r sha256sum | sha256sum | cut -f1 -d' '
51d525520df583ea4c543d00381eb197c4190e39d92ff37feee1ee048a92e2e6
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that you need to literally type find name@version
because the "name@version" is part of the output. If you use "~/tempgit/example" or "." or "./name@version" you get a different result. I need to update the text to make this more clear.
Here's a script that is less error prone: https://gist.github.com/MarkLodato/c03659d242ea214ef3588f29b582be70
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, that makes sense to me, but does that mean that the module has to be in a folder with the same name@version naming convention? Otherwise find is throwing an error i.e. find: ‘golang.org/x/example@1.0’: No such file or directory
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name of the directory itself does not matter. What matters is that the outpuf of the first sha256sum
command looks like this:
abcd1234...90 example@v1.0/file.go
dcba4321...88 example@v1.0/another-file.go
...
That is, a relative path whose first component is <name>@<version>
.
My script does a sed
command to replace the actual directory name with the second parameter, which would be another way to achieve this.
Alternatively, you could write a small go program to call the official HashDir function.
Hey Mark, is this something you still wanted to purse? We certainly see the value, just want to see if there are any blockers to merging. |
Thanks, Tom. I think it probably would be valuable, just hasn't been at the top of my list. The main thing seems to be the encoding (hex vs base64). Do we want to live with different encodings, force this new one to use hex, or use a new version of the attestation spec that switches to base64? |
I agree that that's the only sticking point. I'd probably argue that it's better to have something rather than nothing so I can go either way (even as far as suggesting that everyone be able to support both hex and base64 when handling hashes). |
My inclination is to:
My concern with doing base64 with v0.1 is that (1) it's a breaking change and (2) it's more confusing to have some things base64 and some thing hex. |
Ok, let's go with this. |
Thanks. I need one more approver to submit. |
@MarkLodato @TomHennen Let's open an issue to track the addition of base64-encoded digests to the spec, if we don't already have one. |
Thanks! |
This is the first digest algorithm that is not a straight cryptographic hash over a blob of data.
Decisions:
goModuleH1
instead ofgoDirhash
or similar because (a) it makes it more clear that this is only for go modules and (b) to allow future H2, H3, etc. to be expressed in the same DigestSet.See #28 for background.