Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define id- as a prefix for any Digest alg, not just id-sha-256 and id-sha-512 #885

Closed
manger opened this issue Aug 1, 2019 · 42 comments
Closed

Comments

@manger
Copy link

@manger manger commented Aug 1, 2019

draft-ietf-httpbis-digest-headers introduces id-sha-256 as a separate algorithm to sha-256 for the Digest HTTP header to allow an integrity check that doesn't depend on the content-encoding.

The desire to apply integrity to the content-encoded representation, or the decoded representation, is independent of the choice of integrity algorithm.

How about defining "id-" as a prefix that can be applied to any integrity algorithm to indicate the choice of representation? That avoids the need to duplicate sha-256 and sha-512 entries (and others) in the table of Digest algorithms.

@ioggstream
Copy link
Contributor

@ioggstream ioggstream commented Aug 1, 2019

@manger

How about defining "id-" as a prefix that can be applied to any integrity algorithm
That was the original idea, though it come with some complexities, eg registering all possible
values to the IANA table.

@martinthomson can we provide a new, short, separate, I-D defining the id- prefix for digest-algoritms? The same I-D could be used to add other hash functions (eg, sha-384) to the IANA table.

I can provide it in a brief time if there's agreement on that.

See martinthomson/http-mice#17 that's somewhat related.

@ioggstream
Copy link
Contributor

@ioggstream ioggstream commented Apr 30, 2020

Let's add it in a FAQ and close this

@ioggstream
Copy link
Contributor

@ioggstream ioggstream commented Aug 17, 2020

@LPardue should we file a new I-D for that and close this issue? Should we reserve the id- word for algorithms?

@LPardue
Copy link
Contributor

@LPardue LPardue commented Aug 18, 2020

There's two things at play in my mind:

  1. I think I originally misread @manger's point. I'm coming to a position that thinks it is more logically pure to do it the way he has suggested.
  2. We use id-sha-{256|512} as a device to help explain some of the thornier parts of using digest.

Point 2 is important not to lose IMO.

My suggested course of action is to keep this issue open for the time being, @ioggstream to lead on putting together an independent Internet-Draft that describes "id- prefix for digest algorithms", and @ioggstream to lead on a WIP PR to replace the definition of id-sha- algorithms in digest with cross references to the new I-D. This would give us a good picture of the impact of change, which we can then take to the WG for feedback. Subject to that, we can then resolve this issue with the new text or close with no action.

@ioggstream
Copy link
Contributor

@ioggstream ioggstream commented Aug 18, 2020

I'll accept your suggestion to check the impacts.

In general:

1- ok to a new I-D "id- prefix for digest algorithms"
2- I will retain id-sha in Digest without references to other I-D
3- we could reserve the id- prefix for Digest algorithms in this I-D so we can close this issue

@ioggstream
Copy link
Contributor

@ioggstream ioggstream commented Sep 8, 2020

"id-" prefix for digest-algorithms:

IMHO id-sha-256 and id-sha-512 are one selling point for this I-D, as they give the opportunity to implementors to avoid the fallacies of their current implementations (that is, they always send the sha-256 of the unencoded body even when Content-Encoding != null)

@LPardue @manger can we close now?

@LPardue
Copy link
Contributor

@LPardue LPardue commented Oct 18, 2020

Thanks for doing the work and sorry for the delay. I suspect that given how concise the I-D turned out being, we could incorporate some of that text back into digest without much impact.

We should ask the WG whether they think the ability to have id- prefix would be useful, or if we can just continue with the draft as is.

@ioggstream
Copy link
Contributor

@ioggstream ioggstream commented Oct 18, 2020

Both solutions are good. The only issue I see is for checksum algorithms starting with id- :)

@ioggstream
Copy link
Contributor

@ioggstream ioggstream commented Jan 4, 2021

@martinthomson suggests to separate those two I-D. Seems reasonable to me at this point. cc: @LPardue

@reschke
Copy link
Contributor

@reschke reschke commented Jan 4, 2021

I think it would be better if we could find a way to distinguish these cases (digest of selected representation vs digest of unencoded representation) without overloading the digest name - for instance, with a separate set of header fields.

@martinthomson
Copy link
Contributor

@martinthomson martinthomson commented Jan 5, 2021

In splitting the work, we can decide to spell the unencoded digest differently. I agree that something like 'Hash: foo' rather than 'Digest' offers some advantages. Allowing structured fields is a non-trivial benefit, for instance.

@LPardue
Copy link
Contributor

@LPardue LPardue commented Jan 5, 2021

Splitting WFM. How does this work procedurally? The id- algorithms were in the adopted document, do we need to seek adoption of the new document before removing things is digest?

@martinthomson
Copy link
Contributor

@martinthomson martinthomson commented Jan 5, 2021

Any technical change that has the consensus of the working group can be enacted, no matter how disruptive. Of course, more significant changes probably need more thorough checking. Consulting the chairs or sending an email to the list would be where I start.

@ioggstream
Copy link
Contributor

@ioggstream ioggstream commented Jan 5, 2021

The id- algorithm are in this document to help current implementation to fix their current behaviour, which send digest without considering of content-coding without breaking their current setup ( Eg Just changing the algorithm, or adding the id- one).

While defining a new field is fine, for current implementers there is no value in adding this feature in a new field, as they are not using since it is not standardized ;)

@reschke
Copy link
Contributor

@reschke reschke commented Jan 5, 2021

Do we actually have current implementations that use "id-"? How recent are those? Are they actively maintened?

@ioggstream
Copy link
Contributor

@ioggstream ioggstream commented Jan 5, 2021

When reviewing the way Digest was used in various contexts (banking APIs, ...) I found that they were incompatible with RFC3230 mainly because:

  • digest was computed on the unencoded payload
  • digest was computed on payload data in range requests

I then started this I-D to provide guidance in computing digest, but I had to find a way for those implementation which currently send content-coded payload together with the digest of the unencoded payload. So your question:

Do we actually have current implementations that use "id-"

We have broken implementations which compute sha-256 like if were id-sha-256.

This I-D is an attempt to make order in the current Digest usage providing implementers with some tool to improve their compliance: in this case, adding an id- to an algorithm token is easier than moving to a new S-H field.

@reschke
Copy link
Contributor

@reschke reschke commented Jan 5, 2021

Using structured fields is a separate issue.

So I understand currently the "id-" prefix is not used (yet)?

Is it easier to add "id-" as prefix as opposed to using a different field name?

(just checking)

@ioggstream
Copy link
Contributor

@ioggstream ioggstream commented Jan 5, 2021

currently the "id-" prefix is not used (yet)?

the id- allows giving a name to something that is actually used but improperly named as sha-256. In some cases people uses gzip content-coding proxies after adding digest....

Is it easier to add "id-" as prefix as opposed to using a different field name?

Yes, because with Digest, people buy a way to transition and use different algorithms.

@reschke
Copy link
Contributor

@reschke reschke commented Jan 5, 2021

the id- allows giving a name to something that is actually used but improperly named as sha-256.

But so would putting the hash into a diffent header field, no?

In some cases people uses gzip content-coding proxies after adding digest....

That is indeed a problem (similar to gzip-after-etag).

Yes, because with Digest, people buy a way to transition and use different algorithms.

Could you please elaborate on that?

@ioggstream
Copy link
Contributor

@ioggstream ioggstream commented Jan 5, 2021

Since Digest is used to convey the integrity of transactions, its values are usually long-term stored as transaction proof/checksum.
Currently different implementers group still use Digest, even with different nuances tied to their different knowledge of the HTTP spec. Instead of changing header, my understanding is that those implementers would prefer define a custom header, which will make things less interoperable.

  • Digest implementers already know they can change in time the digest-algorithm, and they already had (eg. md5 -> sha-256);
  • Digest provides a way for a seamless algorithm migration allowing multiple representation-data-digest (id-sha-256=xxx, sha-256=xxx)
  • the id- algorithm idea @LPardue and me wrote, tries to provide a safe escape for gzip-after-digest & co. If you have other low-footprint ideas, please share :)

@reschke
Copy link
Contributor

@reschke reschke commented Jan 5, 2021

Well, the idea would be not to overload the digest name, but to put the digest into a differently names field which always applies to the identity-encoded content.

(I think it's worthwhile to explore this before we start overloading algorithm names with semantics)

@LPardue
Copy link
Contributor

@LPardue LPardue commented Jan 5, 2021

I made an assumption earlier that I'd like to challenge. Does the suggestion to "split" actually require a new document? Would one possible option be to define an additional header (Identity-Digest or whatever) in this specification?

I'm just a little concerned about the amount of activation energy required for a new document and how that might tie up progress on this I-D.

@LPardue
Copy link
Contributor

@LPardue LPardue commented Jun 21, 2021

Since this discussion in January, the WG has gradually formed consensus that defining a new Content-Digest header is a reasonable path forward. That leaves me wondering if "Identity-Digest" isn't such a bad idea now. Especially because it makes managing the algorithm repository easier. See #1555 for a good example of unintended consequences of overloading the algorithms with an id- prefix.

@ioggstream
Copy link
Contributor

@ioggstream ioggstream commented Jun 22, 2021

@LPardue not sure... this path will lead to an Identity-Content-Digest :P I think implementation challenges are always around Eg. the same considerations of #1555 are valid for "sha256", not only for id- algorithms. A broken implementation could do something like

hash_size = int(algorithm[-3:])  # sha256 -> 256

@LPardue
Copy link
Contributor

@LPardue LPardue commented Jun 22, 2021

Right. We typically try to avoid implementers shooting themselves in the foot by stating simple algorithm identifiers that have one meaning. We can't prevent them from doing obviously broken things like you illustrate. However, this issue specifically asks the question about whether we want to make id- a generic prefix, which nudges implementers into pattern matching on the prefix.

I don't think there is much strong case for an Identity-Content-Digest. For example, if a range request yields a partial response with content coding, there's not much the client can do to remove the coding in order to calculate a digest to verify.

@ioggstream
Copy link
Contributor

@ioggstream ioggstream commented Jun 22, 2021

Implementers switching to C-D would probably use I-C-D to avoid issues related to compression.

Probably @b---c @jricher are interested in the discussion...

(to clarify, I'd go for id- algorithms :)

@mnot
Copy link
Member

@mnot mnot commented Jun 25, 2021

My .02 - overloading the digest algo name means that there will be a bunch of algorithms who share a slab of text about HTTP operation. That's really weird. It also means that prefixes now mean something, which begs the question of what to do the next time you need to define a prefix.

It's much cleaner to just define this as a new header with separate semantics.

@jricher
Copy link
Contributor

@jricher jricher commented Jun 28, 2021

I'm more comfortable having either a separate header entirely or having parameters on the header objects themselves as opposed to having a semantic layer on the algorithm identifiers.

And in completeness, for me personally, it makes the most sense for the identity-content-digest to be the "default" option that doesn't have to be specified.

@ioggstream
Copy link
Contributor

@ioggstream ioggstream commented Jun 29, 2021

We deprecated algorithm parameters for security reasons: I don't know if it's worth reintroducing them.

@LPardue atm to move on I suggest to:

  • remove id- algorithms from the draft
  • don't add I-D-C yet
  • consolidate a document with a general agreement
  • eventually re-evaluate the issue

@LPardue
Copy link
Contributor

@LPardue LPardue commented Jun 29, 2021

Sgtm. Let's take this plan to the list with a brief background summary just to ensure visibility.

@LPardue
Copy link
Contributor

@LPardue LPardue commented Jun 29, 2021

One clarification question. Related to Justin's. Do we want to define only an identity-content-digest and sidestep the matter of content encoding with identity altogether?

@jricher
Copy link
Contributor

@jricher jricher commented Jul 7, 2021

I think from the perspective of most developers it's going to be something like:

"I am making a request/sending a response and a have a bunch of bytes I'm going to put into the message body that I want to protect."

The average application developer isn't going to think in terms of content encoding, identity, or anything like that. I fully appreciate that the spec needs to be precise about what a digest applies to, but we also have to account for how people will look at implementing this in the wild.

@ioggstream
Copy link
Contributor

@ioggstream ioggstream commented Jul 7, 2021

@LPardue I think that ignoring content-encoding is not always feasible (eg. some content codings provide encryption). imho I-C-D is more similar to Digest than to C-D (since it is computed on something that's not on the wire).

The average developer ...

if you take a closer look, I-C-D implementers have to compute the value before that any coding is applied (on requests) and must validate the response after the content coding is removed: this is being aware of content-coding and probably means computing the value directly into the actual application and not via "interceptors", "sequences" or "api gateway".

@jricher It would be great if somebody could provide some pseudo-code implemementations to improve the reasoning, eg like in #1555 . wdyt?

PS: generally, devs implementing integrity should have basic knowledge of the "equivalence class" inferred by the communication layers. When they don't, bad things happen :)

@LPardue
Copy link
Contributor

@LPardue LPardue commented Jul 7, 2021

Yeah i think this is a good illustration of the traps that people can fall into. Devs can be under the illusion that what they put in their application message is what gets serialized on the wire. but then some server configuration or middleware goes and changes it dynamically on the fly.

@jricher
Copy link
Contributor

@jricher jricher commented Jul 9, 2021

My experience with implementing Digest in the old RFC has been entirely at the application level or equivalent. Even when implemented as a filter/interceptor pattern within a framework, the filter is given the same byte stream from the parsed HTTP message that the higher-level application functions are given. I believe this experience is going to be exceedingly common among those who which to use this specification.

Consequently, I wonder whether I-C-D as proposed should just be Digest and if what's now defined as Digest should be Representation-Aware-Digest or Coding-Aware-Digest or something like that, to differentiate. This would also help the new draft align with older implementations and ease migration: the Digest header gets familiar rough-but-functional semantics and a new header takes into account the more complete, more nuanced HTTP semantics.

@LPardue
Copy link
Contributor

@LPardue LPardue commented Jul 9, 2021

I'm going to push back against any attempt to redefine the intentions of Digest. The premise of RFC3230 is clear, even if the execution of communication hasn't been.

As mentioned at the last interim, specifications like metalink RFC 6249 use Digest as it was intended to fetch ranges of content across different requests and to validate the sum of the parts, not the parts individually.

If older implementations got it wrong, they need to use the new thing that is much more constrained.

@LPardue
Copy link
Contributor

@LPardue LPardue commented Jul 9, 2021

The way I think about these different digests is, for a receiver to validate in an order like so

• check Content-Digest to prove the HTTP message content is valid.
• check Digest if you have enough coded content bytes to do so. (If you have ranges, or you made a HEAD request, or there's just no content, you can't do this for a single request)
• check Identity-Digest if you have enough non-coded content bytes
• for the Web platform check, check Subresource Integrity if you like. Calculated over the non-coded content bytes

@ioggstream
Copy link
Contributor

@ioggstream ioggstream commented Nov 8, 2021

@LPardue As time goes by, I think we can defer id- algorithms to another draft if we want to reach WGL.

@LPardue
Copy link
Contributor

@LPardue LPardue commented Nov 9, 2021

Works for me, did you already have a WIP change that does that?

@ioggstream
Copy link
Contributor

@ioggstream ioggstream commented Nov 9, 2021

@ioggstream
Copy link
Contributor

@ioggstream ioggstream commented Nov 15, 2021

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

7 participants