Add support for validating GitHub Release Asset checksums #34

josh-padnick · 2018-02-23T00:46:12Z

Fetch is used to download both GitHub files and Release Assets. When downloading a Release Asset, it's helpful to verify the SHA256 or SHA512 hash of the release asset (the "checksum") so that you can confirm the release asset hasn't been changed from the time you first downloaded it. This PR adds optional support for doing that.

This use case came up for me when I was downloading a third-party health checker tool using Fetch. Without verifying the checksum, the downloaded tool could be changed without anyone knowing, a potential security issue given that we do not control the third-party repo. Using bash to validate the checksum is an option, but fetch support leads to slightly cleaner code and avoids platform compatibility issues.

In addition, I opted to make a backwards-incompatible change to simplify the UX. Previously, fetch allowed users to download more than one Release Asset in a single call. Once I implemented this PR, the fetch interface quickly became unwieldy with so many flags to pass in when using multiple release assets.

It occurred to me that (a) downloading multiple release assets is itself probably a less likely use case, and (b) users can simply call fetch multiple times, once for each release asset they want to download. Therefore, I opted for the UX simplicity with no loss of functionality but at the cost of backwards-incompatible change.

Other updates:

Slight cleanup to the README

This commit is functionally complete, however the --help menu rendering is quite ugly now. I'll fix that next.

brikis98 · 2018-02-23T00:51:32Z

A bummer, I just filed this as a GitHub issue with the "newbie" label. It would make for a good trial project :)

Will review shortly.

josh-padnick · 2018-02-23T00:59:29Z

Ah, it did occur to me that it felt a little trial-projectish, but I wanted to actually use it in my Confluent setup. Thanks for the forthcoming review!

brikis98

Seems like a good addition. Checksumming modules seems worthwhile too. We'll also have to expose these params in gruntwork-install. This definitely would've been a good newbie task...

brikis98 · 2018-02-23T00:53:11Z

README.md

+  Release](https://help.github.com/articles/creating-releases/)--to download. It only works with the `--tag` option.
+- `--release-asset-checksum` (**Optional**): The checksum that a release asset should have. Fetch will fail if this value
+  is non-empty and does not match the checksum computed by Fetch.
+- `--release-asset-checksum-algo` (**Optional**): The algorithm fetch will use to compute a checksum of the release asset.


It seems equally important to have checksums for modules from the --modules param. Probably the simplest option is to publish the commit ID (sha1) and compare that to the commit ID of the repo we just downloaded via git clone.

Do you mean the --modules pattern of gruntwork-install? Well, can't you just use Fetch to download the exact commit you want to achieve what you're suggesting?

Yes, --modules. We typically use tags for specifying the version we want. Should our recommendation be that users specify both the tag and the commit ID?

Well, keep in mind that fetch allows you to use tag constrain expressions, but I see where you're going with this. Yes, I think we should permit using both a --tag and a --commit-id param at the same time. Fetch can then error out if the two don't match. I think I'll file a GitHub issue for that for now, though.

brikis98 · 2018-02-23T00:54:41Z

checksum.go

+func verifyChecksumOfReleaseAsset(assetPath, checksum, algorithm string) *FetchError {
+	computedChecksum, err := computeChecksum(assetPath, algorithm)
+	if err != nil {
+		return newError(ERROR_WHILE_COMPUTING_CHECKSUM, err.Error())


It's generally nicer when writing tests and debugging to return a different type for each, well, type of error. All you do is define a struct (or even a type alias) and add an Error() string method to it.

I hear you. This was an early Golang tool I wrote and I didn't want to refactor all the errors throughout the code so I stuck with the existing idiom.

brikis98 · 2018-02-23T00:55:03Z

checksum.go

+		return newError(CHECKSUM_DOES_NOT_MATCH, fmt.Sprintf("Expected to receive checksum value %s, but instead got %s for Release Asset at %s", computedChecksum, checksum, assetPath))
+	}
+
+	fmt.Printf("Checksum matches!")


Do we not use any sort of logging library with fetch?

I was surprised about that as well. Again, I didn't want to invest time in updating those kinds of things so I stuck with the existing idiom.

brikis98 · 2018-02-23T00:56:23Z

checksum.go

+		return newError(ERROR_WHILE_COMPUTING_CHECKSUM, err.Error())
+	}
+	if computedChecksum != checksum {
+		return newError(CHECKSUM_DOES_NOT_MATCH, fmt.Sprintf("Expected to receive checksum value %s, but instead got %s for Release Asset at %s", computedChecksum, checksum, assetPath))


You may want to provide a bit more info about what this could mean in this error message. E.g, "This means that either you are using the wrong checksum value in your call to fetch (e.g., perhaps you updated the version of the module you're installing but not the checksum?) or that someone has replaced the asset with a potentially dangerous one and you should be very careful about proceeding."

That's a good idea. I'll update now.

brikis98 · 2018-02-23T00:58:06Z

checksum.go

+}
+
+func computeChecksum(filePath string, algorithm string) (string, error) {
+	var checksum string


This seems like a useless mutable value to have. I'd just return "", err in all error cases below and call return hasherToString in the non-error cases.

I sometimes like to declare the return var at the top, but your suggestion is probably a little cleaner. updated.

brikis98 · 2018-02-23T00:59:16Z

checksum.go

+	defer file.Close()
+
+	switch algorithm {
+	case "sha256":


Seems like this case statement is identical in all but the hasher used. Perhaps refactor into a getHasher method that returns sha256.New() or sha512.New() and then simplify this method to just a single path that calls io.Copy and hasherToString.

Good suggestion. Updated.

brikis98 · 2018-02-23T01:01:14Z

checksum_test.go

+
+// Checksums can be computed by running "shasum -a [256|512] /path/to/file" on any UNIX system
+const SAMPLE_RELEASE_ASSET_CHECKSUM_SHA256="4314590d802760c29a532e2ef22689d4656d184b3daa63f96bc8b8f76f5d22f0"
+const SAMPLE_RELEASE_ASSET_CHECKSUM_SHA512="28d9e487c1001e3c28d915c9edd3ed37632f10b923bd94d4d9ac6d28c0af659abbe2456da167763d51def2182fef01c3f73c67edf527d4ed1389a28ba10db332"


Where do you users get these checksums from? If it's from the /releases page, then anyone who can replace the asset with a fake one could also replace the checksum with a fake one. To be truly useful, wouldn't we have to publish these checksums in some totally separate location? Or are you making the assumption that when someone goes to get the checksum value, they are verifying that the code is valid, and this merely ensures no one can pull the carpet out from under them later?

Exactly. When I went to download a third-party health checker for Kafka, I realized how risky (and yet common!) including such a library is. If an attacker wanted to exploit a library that many people use, one way is to find a way into someone's GitHub account (leaked GitHub token?), leave the commits unaltered and update only the release asset (where in most cases no checksum at all is published).

In this case, I'm making the assumption that at the time you download the code, you have an opportunity to look around the repo and conclude for yourself that it appears not to be compromised. There's only so much you can do here, of course, but at least if in the future when you've long forgotten about this library, something malicious happens, you'll be notified.

In the end, I suppose it's just more defense in depth.

brikis98 · 2018-02-23T01:04:02Z

main.go

+func downloadReleaseAsset(assetName string, destPath string, githubRepo GitHubRepo, tag string) (string, error) {
+	var assetPath string
+
+	if assetName == "" {


Should this be an error?

No. This code got updated when I changed assetNames from a []string to a string. That being said, I think the error handling could generally be improved here, as discussed above.

josh-padnick · 2018-02-23T02:36:11Z

All feedback responded to and tests passing, so merging now.

josh-padnick added 3 commits February 22, 2018 16:53

Add support for verifying checksums on release assets.

9205b82

This commit is functionally complete, however the --help menu rendering is quite ugly now. I'll fix that next.

Clean up UX and error handling for checksum processing.

fd6fdb2

Typos and minor tweaks

343ea70

brikis98 reviewed Feb 23, 2018

View reviewed changes

josh-padnick added 2 commits February 22, 2018 18:12

Add better error message.

3bb9834

Updates based on PR feedback.

2d3fcfe

josh-padnick mentioned this pull request Feb 23, 2018

Support tag and commit ID verification. #35

Open

josh-padnick merged commit 349ab84 into master Feb 23, 2018

josh-padnick deleted the checksum branch February 23, 2018 02:36

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for validating GitHub Release Asset checksums #34

Add support for validating GitHub Release Asset checksums #34

josh-padnick commented Feb 23, 2018

brikis98 commented Feb 23, 2018

josh-padnick commented Feb 23, 2018

brikis98 left a comment

brikis98 Feb 23, 2018

josh-padnick Feb 23, 2018

brikis98 Feb 23, 2018

josh-padnick Feb 23, 2018

josh-padnick Feb 23, 2018

brikis98 Feb 23, 2018

josh-padnick Feb 23, 2018

brikis98 Feb 23, 2018

josh-padnick Feb 23, 2018

brikis98 Feb 23, 2018

josh-padnick Feb 23, 2018

josh-padnick Feb 23, 2018

brikis98 Feb 23, 2018

josh-padnick Feb 23, 2018

brikis98 Feb 23, 2018

josh-padnick Feb 23, 2018

brikis98 Feb 23, 2018

josh-padnick Feb 23, 2018

brikis98 Feb 23, 2018

josh-padnick Feb 23, 2018

josh-padnick commented Feb 23, 2018

Add support for validating GitHub Release Asset checksums #34

Add support for validating GitHub Release Asset checksums #34

Conversation

josh-padnick commented Feb 23, 2018

brikis98 commented Feb 23, 2018

josh-padnick commented Feb 23, 2018

brikis98 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

josh-padnick commented Feb 23, 2018