Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add goModuleH1 to the list of predefined digests #108

Merged
merged 1 commit into from
Jan 26, 2023

Conversation

MarkLodato
Copy link
Contributor

This is the first digest algorithm that is not a straight cryptographic hash over a blob of data.

Decisions:

  • Use the name goModuleH1 instead of goDirhash or similar because (a) it makes it more clear that this is only for go modules and (b) to allow future H2, H3, etc. to be expressed in the same DigestSet.
  • Use hex encoding instead of base64, since that is what we use elsewhere.

See #28 for background.

This is the first digest algorithm that is not a straight cryptographic
hash over a blob of data.

Signed-off-by: Mark Lodato <lodato@google.com>
Copy link
Contributor

@TomHennen TomHennen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Mark!

>
> For example, the module dirhash
> `h1:47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=` would be encoded as
> `{"goModuleH1": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"}`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To what extent are go developers familiar with the base64 version?

I'm somewhat concerned folks will be surprised that the go reported hash string differs from the hash reported in the provenance. Would it be acceptable to just state in the docs that this is base64 encoded? Verifiers will need to use special code to compute this value anyways, so it seems like doing so wouldn't add any complexity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a good solution here. Line 7 says that the value is always a lowercase hex encoding of the digest. Therefore, our options seem to be:

  • Encode it using hex for consistency with all the other algorithms, at the cost of divergence from what go users see. (My suggestion.)
  • Change the spec so that encoding is per-algorithm. This is inconsistent with the other fields, and is technically a breaking change but probably OK since no one should be parsing unrecognized fields anyway. (Your suggestion.)
  • Make this the hex encoding of the base64, which seems like a bad idea.

Does anyone else have opinions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if "Pre-defined algorithms" require that it be the lowercase hex encoding, but allow custom values (like this one) to do what makes the most sense in context? I'm not sure this being a breaking change would be a big problem? This note even indicates the contents are subject to change until 1.0. If needed I think we could bump to 0.2.0.

Perhaps the most important feedback we could get would be from some Go experts.

> such a directory:
>
> ```bash
> find name@version -type f | LC_ALL=C sort | xargs -r sha256sum | sha256sum | cut -f1 -d' '
Copy link

@khalkie khalkie Oct 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the actual command to use in order to generate the goModuleH1 value on a Go module directory?

Sorry if this is a stupid question, I'm trying to add it into our provenance generation and as a sanity check I'm running it on different copies of the same source to validate that the value is the same (i.e. I would expect the hash to be the same if the file contents are the same).

Actually it gives a different hash value depending on the folder path passed into the command - am I using it wrong?

i.e.

khalk@khalk:~/tempgit$ git clone https://github.com/golang/example
khalk@khalk:~/tempgit$ find ~/tempgit/example -type f | LC_ALL=C sort | xargs -r sha256sum | sha256sum | cut -f1 -d' '
45bde95411f095cd1040f6e5ad0c3f25f3c7ec8b9b048a32ceef01bac110e010
khalk@khalk:~/tempgit$ cd example
khalk@khalk:~/tempgit/example$ find . -type f | LC_ALL=C sort | xargs -r sha256sum | sha256sum | cut -f1 -d' '
51d525520df583ea4c543d00381eb197c4190e39d92ff37feee1ee048a92e2e6

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that you need to literally type find name@version because the "name@version" is part of the output. If you use "~/tempgit/example" or "." or "./name@version" you get a different result. I need to update the text to make this more clear.

Here's a script that is less error prone: https://gist.github.com/MarkLodato/c03659d242ea214ef3588f29b582be70

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, that makes sense to me, but does that mean that the module has to be in a folder with the same name@version naming convention? Otherwise find is throwing an error i.e. find: ‘golang.org/x/example@1.0’: No such file or directory

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of the directory itself does not matter. What matters is that the outpuf of the first sha256sum command looks like this:

abcd1234...90  example@v1.0/file.go
dcba4321...88  example@v1.0/another-file.go
...

That is, a relative path whose first component is <name>@<version>.

My script does a sed command to replace the actual directory name with the second parameter, which would be another way to achieve this.

Alternatively, you could write a small go program to call the official HashDir function.

@TomHennen
Copy link
Contributor

Hey Mark, is this something you still wanted to purse? We certainly see the value, just want to see if there are any blockers to merging.

@MarkLodato
Copy link
Contributor Author

Thanks, Tom. I think it probably would be valuable, just hasn't been at the top of my list. The main thing seems to be the encoding (hex vs base64). Do we want to live with different encodings, force this new one to use hex, or use a new version of the attestation spec that switches to base64?

@TomHennen
Copy link
Contributor

I agree that that's the only sticking point. I'd probably argue that it's better to have something rather than nothing so I can go either way (even as far as suggesting that everyone be able to support both hex and base64 when handling hashes).

@MarkLodato
Copy link
Contributor Author

My inclination is to:

  • For now, use hex since that's what the spec says. Yes it's a bit awkward for someone used to base64, it's not a huge deal.
  • If desired, do a new release of the spec that supports something else, perhaps base64 for everything.

My concern with doing base64 with v0.1 is that (1) it's a breaking change and (2) it's more confusing to have some things base64 and some thing hex.

@TomHennen
Copy link
Contributor

Ok, let's go with this.

@MarkLodato
Copy link
Contributor Author

Thanks. I need one more approver to submit.

@adityasaky adityasaky requested a review from a team January 26, 2023 16:04
@marcelamelara
Copy link
Contributor

@MarkLodato @TomHennen Let's open an issue to track the addition of base64-encoded digests to the spec, if we don't already have one.

@marcelamelara marcelamelara merged commit c6a8046 into in-toto:main Jan 26, 2023
@MarkLodato
Copy link
Contributor Author

Thanks!

@MarkLodato MarkLodato deleted the go-mod-dirhash branch January 26, 2023 16:13
@MarkLodato MarkLodato mentioned this pull request Feb 3, 2023
10 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants