disallow upper characters (/A-F/) in hex-encoded portion #34

AkihiroSuda · 2017-05-12T04:16:56Z

Signed-off-by: Akihiro Suda suda.akihiro@lab.ntt.co.jp

stevvooe · 2017-05-12T22:38:30Z

algorithm.go

@@ -164,3 +173,20 @@ func (a Algorithm) FromBytes(p []byte) Digest {
 func (a Algorithm) FromString(s string) Digest {
 	return a.FromBytes([]byte(s))
 }
+
+// ValidateEncoded validates the encoded portion string
+func (a Algorithm) ValidateEncoded(encoded string) error {


This fine as Validate: it is implied that it is validating the encoded portion.

stevvooe · 2017-05-12T22:42:21Z

A few nits.

@vbatts @dmcgowan PTAL

dmcgowan · 2017-05-12T22:44:04Z

digest.go

@@ -109,18 +109,11 @@ func (d Digest) Validate() error {
 		return ErrDigestInvalidFormat


This regexp check is no longer needed, the second regexp will do all the needed validation and the algorithm lookup validates the algorithm.

This regexp check is no longer needed…

I think it is. What if someone tries to make a digest with foobar? Or foobar:!!!? I'd rather get ErrDigestInvalidFormat for those than ErrDigestUnsupported.

I am only talking about the regexp check, foobar would get caught for lacking :. And who knows, !!! could be a valid encoding for foobar. It is better for the algorithm to make that distinction since we are only talking here whether ErrDigestUnsupported or ErrDigestInvalidFormat is returned, both of which indicate invalid input.

And who knows, !!! could be a valid encoding for foobar.

Dropping the digest regexp is going to make this package much more permissive than image spec (which has a number of charset restrictions). That's fine, go-digest is free to be more permissive, but it would be nice if there was a way to ask “is this a valid image-spec digest?”. “Does go-digest have any personal beef with this?” is a useful question, but potentially less useful for folks using go-digest to handle image-spec digests.

I am not suggesting we drop the drop the regexp from the package. I am pointing out that running this regexp is redundant when the input is valid. If returning ErrDigestInvalidFormat for foobar:!!! is seen a more useful error than ErrDigestUnsupported, which I am not convinced that either is more actionable than the other, then only do this regexp match when the algorithm is not recognized to return the right error. Digests are frequently parsed and validated and we should optimize for reducing the number of operations for valid input, if additional checks are done on invalid data I am fine with that.

If returning ErrDigestInvalidFormat for foobar:!!! is seen a more useful error than ErrDigestUnsupported, which I am not convinced that either is more actionable than the other…

The distinction is that ErrDigestUnsupported is “hmm, I haven't heard of that one”, while ErrDigestInvalidFormat is “that is wrong! Nobody should accept it”. The latter you can reject in middleman software (e.g. registries) regardless of who the end consumers will eventually be.

… then only do this regexp match when the algorithm is not recognized to return the right error.

Yeah, I'm fine with that.

wking · 2017-05-12T19:07:13Z

algorithm.go

+
+	// anchoredRegexps contains anchored regular expressions for hex-encoded digests.
+	// Note that /A-F/ disallowed.
+	anchoredRegexps = map[Algorithm]*regexp.Regexp{


Probably anchoredEncodedRegexps in case someone wants to anchor algorithm-identifier regexps, etc. in the future.

wking · 2017-05-12T19:12:40Z

algorithm.go

+func (a Algorithm) ValidateEncoded(encoded string) error {
+	r, ok := anchoredRegexps[a]
+	if !ok {
+		return nil


We can still do some validation in this case. How about:

encodedRegexp = regexp.MustCompile(`[a-zA-Z0-9=_-]+`) encodedRegexpAnchored = regexp.MustCompile(`^` + encodedRegexp.String() + `$`)

up where you define the per-algo regexps, and then down here:

if !ok { r = encodedRegexpAnchored }

You can stay DRY by updating digest.go:

var DigestRegexp = regexp.MustCompile(`[a-z0-9]+(?:[.+_-][a-z0-9]+)*:` + encodedRegexp.String())

In this context, that regexp has already been run from the top-level validate.

AkihiroSuda · 2017-05-16T02:30:17Z

updated PR

wking · 2017-05-16T17:44:38Z

algorithm.go

+func (a Algorithm) Validate(encoded string) error {
+	r, ok := anchoredEncodedRegexps[a]
+	if !ok {
+		return ErrDigestUnsupported


This is safer than before, but we can still do better (assuming go-digest is ok with the spec restrictions on the encoded charset). @stevvooe felt that had already been handled in “the top-level validate”, but Algorithm.Validate is a public method, so you can hit it without going through Digest.Validate. I expect we want to handle the following cases:

Algorithm is in anchoredEncodedRegexps, which you already handle correctly.

Algorithm is not in anchoredEncodedRegexps and:

The encoded part validates vs. the generic spec charset, in which case you should return ErrDigestUnsupported like you're doing here.

The encoded part does not validate vs. the generic spec charset, in which case we can return ErrDigestInvalidFormat even without recognizing the algorithm.

I'm not clear which case zero-length encoded parts fall into; they should be either ErrDigestInvalidFormat (which you've chosen below) or ErrDigestInvalidLength. Either way, the comment for this function should document which applies to that case.

@wking This validates the character set for the encoded portion for the algorithm. Nothing less, nothing more. If you use a function incorrectly, you pay the price. Please stop trying to expand the scope of things.

This is fine.

This validates the character set for the encoded portion for the algorithm. Nothing less, nothing more. If you use a function incorrectly, you pay the price.

But if your Algorithm is not listed in anchoredEncodedRegexps, we can still validate the encoded portion of the algorithm using the generic spec charset and distinguish between “I don't know about your algorithm; maybe the encoded part here is ok” and “I don't know about your algorithm but I know your encoded part is wrong”.

stevvooe · 2017-05-16T22:03:45Z

algorithm.go

+	if !ok {
+		return ErrDigestUnsupported
+	}
+	if encoded == "" {


Covered by length check.

dmcgowan · 2017-05-16T22:21:31Z

digest.go

-
-	// validate i then run through regexp
-	if i < 0 || i+1 == len(s) || !DigestRegexpAnchored.MatchString(s) {
+	if i < 0 {


i+1 == len(s) should remain

dmcgowan · 2017-05-16T22:23:13Z

digest.go

+		return ErrDigestInvalidFormat
+	}
+	algorithm, encoded := Algorithm(s[:i]), s[i+1:]
+	if algorithm == "" || encoded == "" {


The suggestion was instead of doing this here, when the algorithm is not available, do !DigestRegexpAnchored.MatchString(s) and return ErrDigestInvalidFormat if it does not match instead of ErrDigestUnsupported

AkihiroSuda · 2017-05-19T05:19:31Z

updated PR

wking · 2017-05-19T16:20:56Z

digest.go

-
-	// validate i then run through regexp
-	if i < 0 || i+1 == len(s) || !DigestRegexpAnchored.MatchString(s) {
+	if i < 0 || i+1 == len(s) {


The first condition should be i <= 0 to catch the empty-algorithm-string case, which will let you drop the algorithm == "" block below. Unit tests for Digest(""), Digest(":") (which we already have), Digest(":foo"), and Digest("foo:") (we already have "sha256:") would ensure we'd handled all the empty-component cases.

wking · 2017-05-19T16:29:49Z

algorithm.go

+	}
+	// Digests much always be hex-encoded, ensuring that their hex portion will
+	// always be size*2
+	if a.Size()*2 != len(encoded) {


We can drop this block. For algorithms in anchoredEncodedRegexps we're already checking the length (via the anchored regexp). And I'm not sure what would happen here if the user failed to import crypto/sha512 or similar. Would Algorithm("sha512:abc…").Size() return zero and then erroneously trigger this condition? Better to drop the redundant check.

done, also removed ErrDigestInvalidLength

done, also removed ErrDigestInvalidLength

While I think we may want to keep the ErrDigestInvalidLength constant so old consumers still compile, I think we want to re-drop this block (unwinding part of your recent 902cd80 → 4ca1301. Using .Size() * 2 assumes a hex encoding, and I don't see why callers would need to distinguish between hash portions that are the wrong length (traditionally handled by ErrDigestInvalidLength) and hash portions that contain invalid characters (traditionally handled by ErrDigestInvalidFormat). Is there a reason that distinction is useful?

dmcgowan · 2017-05-19T17:00:12Z

algorithm.go

-	SHA256 Algorithm = "sha256" // sha256 with hex encoding
-	SHA384 Algorithm = "sha384" // sha384 with hex encoding
-	SHA512 Algorithm = "sha512" // sha512 with hex encoding
+	SHA256 Algorithm = "sha256" // sha256 with hex encoding (without upper hex characters. i.e. /A-F/)


just put (lower case only) on each, the terminology ditto probably isn't the best...

...it makes me think of this 😭

wking · 2017-05-24T21:50:03Z

digest.go

@@ -68,9 +68,6 @@ var (
 	// ErrDigestInvalidFormat returned when digest format invalid.
 	ErrDigestInvalidFormat = fmt.Errorf("invalid checksum digest format")

-	// ErrDigestInvalidLength returned when digest has invalid length.
-	ErrDigestInvalidLength = fmt.Errorf("invalid checksum digest length")


Do we need to keep this (and just deprecate it) to satisfy any backwards-compat requirements?

This needs to remain. Are we no longer throwing this error?

wking · 2017-05-24T21:53:05Z

digest.go

+		return ErrDigestInvalidFormat
+	}
+	algorithm, encoded := Algorithm(s[:i]), s[i+1:]
+	if algorithm == "" {


You can drop this block now, because the i <= 0 check above covers it.

Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>

AkihiroSuda · 2017-06-07T02:50:28Z

Revived ErrInvalidLength.
Sorry for delay

dmcgowan · 2017-06-07T18:41:32Z

LGTM

stevvooe · 2017-06-07T19:53:26Z

LGTM

…rithms In some cases (e.g. Algorithm("foo").Validate("!")) we know the encoded portion is invalid even if we do not have an entry for foo in anchoredEncodedRegexps. Whatever algorithm-specific additional restrictions are applied via anchoredEncodedRegexps, the encoded portion must satisfy the generic encoded-portion restrictions from the DigestRegexp. This commit adjusts the Algorithm("foo").Validate("!") result to match the current Digest("foo:!").Validate() result, and factors the encoded portion of DigestRegexp out into encodedRegexp and encodedRegexpAnchored. It also guards the Size()-based length check with Available(), because Size() returns zero for unavailable algorithms. Now that we have a check for algorithms that aren't in anchoredEncodedRegexps, we can't rely on lookup-misses there to protect us from running the Size() check on unavailable algorithms. And equating anchoredEncodedRegexps hits with Available was dicey anyway, since it depended on what packages get loaded, etc. Adding Algorithm.ValidateIdentifier() allows us to avoid doubling up on the encoded check in Digest.Validate(), since both DigestRegexpAnchored and algorithm.Validate() were covering the encoded portion. The new tests are based on digest_test.go's TestParseDigest. I've also dropped the Available check from Digest.Valid, because we can tell: sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 is a valid digest even if Algorithm("sha256") happens to be unavailable (e.g. because the consumer didn't import crypto/sha256). Available-ness just affects whether you're capable of verifying content against the digest, not digest-validity. Running Algorithm.Validate before Algorithm.ValidateIdentifier uses the presence of an identifier in anchoredEncodedRegexps as a hint to skip the identifier validation. This saves some time (ad Derek's suggestion [1]) and is safe as long as we don't add any invalid identifiers to that map. [1]: opencontainers#34 (comment) Signed-off-by: W. Trevor King <wking@tremily.us>

AkihiroSuda mentioned this pull request May 12, 2017

descriptor: sha{256,512}: disallow /[A-F]/ characters opencontainers/image-spec#667

Merged

stevvooe reviewed May 12, 2017

View reviewed changes

dmcgowan reviewed May 12, 2017

View reviewed changes

wking reviewed May 12, 2017

View reviewed changes

AkihiroSuda force-pushed the regexp branch from 338f2ea to 59524bb Compare May 16, 2017 02:30

wking reviewed May 16, 2017

View reviewed changes

stevvooe reviewed May 16, 2017

View reviewed changes

algorithm.go Outdated

if !ok {

return ErrDigestUnsupported

}

if encoded == "" {

Copy link

Contributor

stevvooe May 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Covered by length check.

dmcgowan reviewed May 16, 2017

View reviewed changes

AkihiroSuda force-pushed the regexp branch from 59524bb to 8a797a1 Compare May 19, 2017 05:19

wking reviewed May 19, 2017

View reviewed changes

dmcgowan reviewed May 19, 2017

View reviewed changes

AkihiroSuda force-pushed the regexp branch from 8a797a1 to 902cd80 Compare May 24, 2017 09:46

wking reviewed May 24, 2017

View reviewed changes

disallow upper characters (/A-F/) in hex-encoded portion

4ca1301

Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>

AkihiroSuda force-pushed the regexp branch from 902cd80 to 4ca1301 Compare June 7, 2017 02:49

stevvooe merged commit 279bed9 into opencontainers:master Jun 7, 2017

This was referenced Jun 9, 2017

schema/validator: Punt sha256/sha512 hex-validation to go-digest opencontainers/image-spec#696

Merged

algorithm: Deprecate ErrDigestInvalidLength #35

Closed

algorithm: ErrDigestInvalidFormat on Validate() even for unknown algorithms #36

Closed

thaJeztah mentioned this pull request Mar 26, 2018

Bump go-digest to v1.0.0-rc.1 moby/moby#36704

Merged

ijc mentioned this pull request Apr 30, 2018

Bump to latest go-digest. moby/buildkit#356

Merged

ijc mentioned this pull request Apr 30, 2018

Bump to latest go-digest. containerd/containerd#2319

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

disallow upper characters (/A-F/) in hex-encoded portion #34

disallow upper characters (/A-F/) in hex-encoded portion #34

AkihiroSuda commented May 12, 2017

stevvooe May 12, 2017

stevvooe commented May 12, 2017

dmcgowan May 12, 2017

wking May 12, 2017

dmcgowan May 12, 2017

wking May 12, 2017

dmcgowan May 13, 2017

wking May 13, 2017

wking May 12, 2017

wking May 12, 2017

stevvooe May 12, 2017

AkihiroSuda commented May 16, 2017

wking May 16, 2017

stevvooe May 16, 2017

wking May 16, 2017

stevvooe May 16, 2017

dmcgowan May 16, 2017

dmcgowan May 16, 2017

AkihiroSuda commented May 19, 2017

wking May 19, 2017

AkihiroSuda May 24, 2017

wking May 19, 2017 •

edited

AkihiroSuda May 24, 2017

wking Jun 7, 2017

dmcgowan May 19, 2017

AkihiroSuda May 24, 2017

wking May 24, 2017

AkihiroSuda May 25, 2017

stevvooe May 25, 2017

wking May 24, 2017

AkihiroSuda commented Jun 7, 2017

dmcgowan commented Jun 7, 2017 •

edited by caniszczyk

stevvooe commented Jun 7, 2017 •

edited by caniszczyk

		@@ -109,18 +109,11 @@ func (d Digest) Validate() error {
		return ErrDigestInvalidFormat

disallow upper characters (/A-F/) in hex-encoded portion #34

disallow upper characters (/A-F/) in hex-encoded portion #34

Conversation

AkihiroSuda commented May 12, 2017

Choose a reason for hiding this comment

stevvooe commented May 12, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

AkihiroSuda commented May 16, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

AkihiroSuda commented May 19, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

wking May 19, 2017 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

AkihiroSuda commented Jun 7, 2017

dmcgowan commented Jun 7, 2017 • edited by caniszczyk

stevvooe commented Jun 7, 2017 • edited by caniszczyk

wking May 19, 2017 •

edited

dmcgowan commented Jun 7, 2017 •

edited by caniszczyk

stevvooe commented Jun 7, 2017 •

edited by caniszczyk