Skip to content
This repository has been archived by the owner on Apr 9, 2024. It is now read-only.

Check the hash of the binary in the verifier #124

Merged
merged 9 commits into from
Sep 14, 2022

Conversation

rbehjati
Copy link
Contributor

@rbehjati rbehjati commented Sep 7, 2022

The modification in #118 that removed ExpectedBinarySha256Hash from BuildConfig mistakenly also removed the verification logic related to checking the hash of the binary in the verifier. This PR fixes that.

This change adds a slightly modified version of the VerifyBinarySha256Hash logic back, and:

  • Adds negative tests to be able to avoid these bugs in the future; additional test files are added to allow this.
  • Replaces the Oak build example with a trivial example in testdata. This is to allow writing more tests (including negative tests) without making the CI too slow. In fact this PR reduces the CI time from more than 10 minutes to less than 2 minutes. For this:
    • The testdata/build.toml is renamed to testdata/oak_build.toml and a new testdata/build.toml is added with the simpler example,
    • A new json file, matching the new testdata/build.toml is added
  • Adds a CI step for generating test coverage reports (ref Generate test coverage reports #113, but does not completely fix that issue since the report is not complete and not easy to navigate)

Note that there is a permission issue when running go test ./internal/verifier with the new build.toml file. The same issue exists when running bazel test and bazel run with this example, so the README is updated to use testdata/oak_build.toml as the example. Also see #123 on the permission issue.

@rbehjati rbehjati force-pushed the verifier branch 13 times, most recently from 203b5dc to 95d09c5 Compare September 9, 2022 17:24
@rbehjati rbehjati marked this pull request as ready for review September 9, 2022 18:09
Copy link
Contributor

@mariaschett mariaschett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

}
}

func TestReproducibleProvenanceVerifier_badCommand(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really cool test! As idea: If we need the bad_command_provenance.json only in this test, shall we maybe hard code it as string in the test? Might also be faster to see the whole test then.

Similar with the invalidHashProvenancePath?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a very good idea, and I agree. Let me do this in a separate PR since doing it nicely requires some refactoring in verifier.Verify(), which is the unit under test. It currently takes the path to a file as input. Created #126.

}

if binarySha256Hash != expectedBinarySha256Hash {
return fmt.Errorf("the hash of the generated binary does not match the expected SHA256 hash; got %s, want %v",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) use %v or %s for both values

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks.

@@ -67,6 +67,13 @@ func (verifier *ReproducibleProvenanceVerifier) Verify(provenanceFilePath string
return fmt.Errorf("couldn't build the binary: %v", err)
}

// The provenance is valid, therefore `expectedBinaryHash` is guaranteed to be non-empty.
expectedBinaryHash := provenance.Subject[0].Digest["sha256"]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would still add explicit checks here, so that if the provenance happens to be malformed (e.g. the digest empty), we will get a nice error message. But if it's already guaranteed to be covered elsewhere, feel free to ignore (though I would still be a bit worried that over time it may get out of sync).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified ParseProvenanceFile to return a ValidatedProvenance, to guarantee that the condition is already checked. Although, Golang is not very good at these things, since it is not easy to clone structs, or declare immutable variables.

t.Fatalf("couldn't verify the provenance file: %v", err)
}
}

func TestReproducibleProvenanceVerifier_invalidHash(t *testing.T) {
// The path to provenance is specified relative to the root of the repo, so we need to go one level up.
// Get the current directory before that to restore the path at the end of the test.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does the path need to be restored? Does it mean that it affects the execution of other test cases? That would seem problematic, since the execution of the tests would differ whether they are running concurrently or not. Is it possible to make the entire process not depend on the PWD and instead use absolute paths for everything? It would still be possible here to concat PWD with a relative path to obtain an absolute path and use that, but at least that would not mutate global state.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is basically because when provenances are loaded, they are also validated against the schema. The path of the schema is hardcoded inside some other file, and getting the path right requires changing the directory.

I have not been able to come up with a reliable and neat solution yet, but it certainly needs to be fixed in a separate PR. We have this bit of code in all tests, and it is causing a lot of clutter.

We plan to switch to the SLSA format, and retire the amber provenance format. So it probably is better to wait for that change. I believe this problem will automatically go away when that change is completed. Created #127.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 I think the basic functions (which are tested here, e.g. verifier.Verify) should deal with the actual content of the files (as strings), they should not assume they can read a file by its path. Then we can have helpers to just read the file at a path, and we can compose that by passing its content to the function in question.

@@ -89,6 +89,10 @@ func ParseProvenanceFile(path string) (*intoto.Statement, error) {
return nil, fmt.Errorf("could not unmarshal the provenance file:\n%v", err)
}

if len(statement.Subject) == 0 || statement.Subject[0].Digest["sha256"] == "" {
return nil, fmt.Errorf("at least one subject must be present, and it must have a sha256 digest")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be the semantic of multiple subjects? Perhaps we should limit the check to ensure exactly one subject for now, until we figure that out (if it's even something we want to allow at all -- I suspect not).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good point. Done.

Copy link
Contributor Author

@rbehjati rbehjati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reviews.

}

if binarySha256Hash != expectedBinarySha256Hash {
return fmt.Errorf("the hash of the generated binary does not match the expected SHA256 hash; got %s, want %v",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks.

}
}

func TestReproducibleProvenanceVerifier_badCommand(t *testing.T) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a very good idea, and I agree. Let me do this in a separate PR since doing it nicely requires some refactoring in verifier.Verify(), which is the unit under test. It currently takes the path to a file as input. Created #126.

@@ -89,6 +89,10 @@ func ParseProvenanceFile(path string) (*intoto.Statement, error) {
return nil, fmt.Errorf("could not unmarshal the provenance file:\n%v", err)
}

if len(statement.Subject) == 0 || statement.Subject[0].Digest["sha256"] == "" {
return nil, fmt.Errorf("at least one subject must be present, and it must have a sha256 digest")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good point. Done.

@@ -67,6 +67,13 @@ func (verifier *ReproducibleProvenanceVerifier) Verify(provenanceFilePath string
return fmt.Errorf("couldn't build the binary: %v", err)
}

// The provenance is valid, therefore `expectedBinaryHash` is guaranteed to be non-empty.
expectedBinaryHash := provenance.Subject[0].Digest["sha256"]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified ParseProvenanceFile to return a ValidatedProvenance, to guarantee that the condition is already checked. Although, Golang is not very good at these things, since it is not easy to clone structs, or declare immutable variables.

t.Fatalf("couldn't verify the provenance file: %v", err)
}
}

func TestReproducibleProvenanceVerifier_invalidHash(t *testing.T) {
// The path to provenance is specified relative to the root of the repo, so we need to go one level up.
// Get the current directory before that to restore the path at the end of the test.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is basically because when provenances are loaded, they are also validated against the schema. The path of the schema is hardcoded inside some other file, and getting the path right requires changing the directory.

I have not been able to come up with a reliable and neat solution yet, but it certainly needs to be fixed in a separate PR. We have this bit of code in all tests, and it is causing a lot of clutter.

We plan to switch to the SLSA format, and retire the amber provenance format. So it probably is better to wait for that change. I believe this problem will automatically go away when that change is completed. Created #127.

@rbehjati rbehjati merged commit d98c4b9 into project-oak:main Sep 14, 2022
@rbehjati rbehjati deleted the verifier branch September 14, 2022 16:45
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants