Serialization for structured headers. #627

Closed
mikewest opened this issue May 23, 2018 · 20 comments

Comments

@mikewest
Member

I've started sketching out a feature that intends to deliver a structured header as part of an HTTP request, and I find myself doing a little more hand-waving than I'd like in step 6 of https://mikewest.github.io/sec-metadata/#abstract-opdef-set-the-sec-metadata-header-for-a-request.

The end of https://tools.ietf.org/html/draft-ietf-httpbis-header-structure-04#section-1 suggests that:

> Those abstract types can be serialised into textual headers - such as those used in HTTP/1 and HTTP/2 - using the algorithms described in Section 3.

Section 3, however, seems to be the opposite: parsing a string into a Structured Header. I don't actually see a serialization algorithm in the document. Each type hints at how it might be serialized, but it would be nice to have an algorithm to point to.

@mikewest
Member Author

(For example, can we assume any ordering in the serialization of a dictionary? :) )
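To make the request concrete, here is a minimal sketch of what a single "structured value in, byte sequence out" algorithm could look like for a dictionary. It is not the draft's algorithm: the textual syntax (`key=value` members joined by `, `, double-quoted strings with backslash escapes), the function names, and the choice to emit members in insertion order are all assumptions made for illustration.

```python
def serialize_item(value):
    """Serialize an int or str to text (simplified, assumed syntax)."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; reject it explicitly.
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        if any(ord(c) < 0x20 or ord(c) > 0x7E for c in value):
            raise ValueError("only printable ASCII is supported here")
        # Escape backslashes first, then double quotes.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return '"' + escaped + '"'
    raise TypeError(f"unsupported type: {type(value).__name__}")


def serialize_dictionary(members):
    """Serialize a dict to a byte sequence, keeping insertion order.

    Assumption: members are emitted in the order they were added.
    Whether a spec should guarantee any ordering is the open question.
    """
    parts = [f"{key}={serialize_item(value)}" for key, value in members.items()]
    # Non-ASCII keys raise UnicodeEncodeError here.
    return ", ".join(parts).encode("ascii")


print(serialize_dictionary({"b": 2, "a": "x"}))
```

Emitting members in insertion order is only one possible answer to the ordering question; sorting keys would be another, and the choice matters if a recipient compares serialized values byte for byte.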

@mnot
Member

mnot commented May 23, 2018

Makes sense, will do in the next round.

One hitch is that eventually, we want to be able to define alternative serialisations of the header field in new versions of HTTP, so we'll have to be careful in how we do this. Or just admit that there will be some residual hand-waving.

@mikewest
Member Author

> Makes sense, will do in the next round.

Thanks, no rush. :)

> One hitch is that eventually, we want to be able to define alternative serialisations of the header field in new versions of HTTP, so we'll have to be careful in how we do this. Or just admit that there will be some residual hand-waving.

Some level of hand-waving seems fine, though I'd prefer that it be constrained to this document, and not all the documents that wish to define structured headers. If we can end up with a single algorithm, no matter how complicated, that takes a structured header object and outputs a string, I'll be happy to use it!

@mnot
Member

mnot commented May 24, 2018

Nod - but the hitch is "outputs a string" -- in that future world, it might be binary...

@mikewest
Member Author

How about "outputs a thing that might be a string, and might be binary, and might be trinary, or might be anything else I can hand to Fetch's header list set algorithm"? Fetch talks about it in terms of a https://infra.spec.whatwg.org/#byte-sequence. Would that work for you?

@reschke
Contributor

reschke commented May 25, 2018

I love it: "A byte sequence is a sequence of bytes, represented as a space-separated sequence of bytes."

Maybe: "...represented as a space-separated sequence of byte representations"?

@reschke
Contributor

reschke commented May 25, 2018

@mnot - actually, what's in an HTTP/1.1 message is also better considered a byte sequence, not a string

@mikewest - I would prefer to have SH not to rely on FETCH in any way

@mikewest
Member Author

> I would prefer to have SH not to rely on FETCH in any way

Pedantic nit: Byte sequence isn't defined in Fetch, but in Infra. :)

I would like SH to define a serialization algorithm in such a way that I can explain to web browsers what they ought to do with the result. It would be unfortunate if it were difficult to integrate SH and Fetch, as that would make my goals harder to reach.

I think all I'm asking for is a clearly defined serialization algorithm that returns a result that Fetch can accept as a header value. I'm happy to leave details up to y'all and @annevk to work out who depends on whom and why amongst yourselves. :)

@reschke
Contributor

reschke commented May 25, 2018

Well, that gets us back to the data model. I assume FETCH considers header field values as JavaScript strings (?), while in an HTTP/1.1 message it's really a sequence of bytes, usually restricted to values <= 127.

It's the edge case (non-ASCII) that makes this all interesting, but I believe it's a non-issue for SH.

@annevk

annevk commented May 25, 2018

@reschke no, byte sequences with restrictions: https://fetch.spec.whatwg.org/#concept-header. (The API does convert these back and forth from JavaScript strings, using IDL's ByteString primitive.)
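As a rough illustration of "byte sequences with restrictions", the sketch below checks a candidate header value and converts between text and bytes the way IDL's ByteString does. The restrictions encoded here (no leading or trailing HTTP whitespace, no NUL, LF or CR bytes) are an assumed reading of the linked Fetch definition and should be checked against it; the function names are hypothetical.

```python
# Assumed from the Fetch definition: HTTP whitespace is TAB, LF, CR, SPACE.
HTTP_WHITESPACE = b"\t\n\r "
# Assumed from the Fetch definition: NUL and HTTP newline bytes are forbidden.
FORBIDDEN = b"\x00\n\r"


def is_header_value(value):
    """Return True if `value` (bytes) satisfies the assumed restrictions."""
    if value != value.strip(HTTP_WHITESPACE):
        return False  # leading or trailing HTTP whitespace
    return not any(byte in FORBIDDEN for byte in value)


def bytestring_to_bytes(text):
    """ByteString-style conversion: each code point must be <= 0xFF.

    Python raises UnicodeEncodeError for larger code points, where
    Web IDL would throw a TypeError.
    """
    return text.encode("latin-1")


def bytes_to_bytestring(value):
    """Map each byte to the code point with the same numeric value."""
    return value.decode("latin-1")


print(is_header_value(b'a=1, b="x"'))
```

An ASCII-only serialization always survives this round trip unchanged, which is why the non-ASCII edge case does not arise for structured headers.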

@reschke
Contributor

reschke commented May 25, 2018

OK.

Right now we have only one serialization, so it's hard to discuss future ones.

The one that we have uses US-ASCII, which can be trivially encoded in octet sequences, and shouldn't have any issues with FETCH. So maybe we just need to write down this more clearly?

@annevk

annevk commented May 25, 2018

For my own understanding, the problem is that you want to create a structured header using types, but then pass that into Fetch, with Fetch only taking byte sequences, and exact byte sequences being exposed through H/1 and H/2 and probably QUIC.

Ideally you keep the types around until you hit a point where you need to serialize. That would require HTTP offering some abstraction in front of H/1, H/2 and probably QUIC that takes headers where the values can be either byte sequences or types and then serializes them as appropriate for the eventual chosen transport.

Fetch could then change its "header" primitive so values would be either byte sequences or types and pass that on to the new HTTP abstraction. And also use the H/1 / H/2 serialization for its API, which isn't typed.

As an alternative, proposed by @mikewest I think, structured headers could define how to obtain a byte sequence from a type. We'd continue passing byte sequences around. Then a future H/N could eagerly parse those bytes to see if it can represent them as a type instead. You'd end up with a redundant serialize/parse, but all the interfaces don't have to be changed. Implementations could optimize the serialize/parse away. And this would also allow representing byte sequences as types that weren't types to begin with, which might be beneficial.
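The first option, keeping types around until a transport forces serialization, might be sketched as follows. Everything here is hypothetical: `TypedValue`, `to_text_wire`, `serialize_header_list`, the header names, and the simplified `key=value` text form (integer members only) are illustrative stand-ins, not anything defined by Fetch or the structured headers draft.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class TypedValue:
    """Hypothetical stand-in for a structured (typed) header value."""
    members: dict  # assumed: str keys mapped to int values


# A header value is either legacy bytes or a typed value.
HeaderValue = Union[bytes, TypedValue]


def to_text_wire(value):
    """Serialize one value for a textual transport such as H/1 or H/2."""
    if isinstance(value, bytes):
        return value  # legacy values pass through untouched
    text = ", ".join(f"{key}={number}" for key, number in value.members.items())
    return text.encode("ascii")


def serialize_header_list(headers):
    """Serialize late: only when the header list reaches the transport."""
    return [(name, to_text_wire(value)) for name, value in headers]


headers = [
    ("Legacy-Header", b"text/html"),
    ("Typed-Header", TypedValue({"a": 1, "b": 2})),
]
print(serialize_header_list(headers))
```

A transport with a different (for example binary) encoding would supply its own counterpart to `to_text_wire`, which is what lets the typed value skip the redundant serialize/parse step described in the alternative.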

mnot added a commit that referenced this issue Jun 1, 2018

@mnot
Member

mnot commented Jun 1, 2018

@mikewest see PR above; will that work for you (ignoring the alternative serialisations issue for now)?

@mnot
Member

mnot commented Jun 1, 2018

My assumption has been that if a future H/n defines an alternative serialisation (or if it's done in an extension like an H2 SETTING), a separate API would have to be exposed for applications to call to set headers (even if that's a bump on the current API that adds information about the encoding being sent, plus a way for the application to detect that it's available).

Otherwise, there'd have to be either a needless encode/decode (if the application emitted H1 headers), or some nasty heuristics on the payload (if the application emitted the new format).

Same for parse; otherwise, the implementation will have to translate the new encoding to H1 for applications, which doesn't make sense if they just want the data structure and associated SH handling.

@annevk

annevk commented Jun 1, 2018

I wasn't talking about a JavaScript API, to be clear. I was talking about the low-level interface for the HTTP standard that Fetch in some (hand-wavy) way wraps. If you were too, I suppose a distinct interface would work, but it seems nicer if we could exchange a single header list that contains both byte sequence values and typed values.

@annevk

annevk commented Jun 1, 2018

(I don't see a link to a PR btw, just a commit on a branch that contains many other commits. Comments on those would probably easily get lost.)

@mnot
Member

mnot commented Jun 1, 2018

Sorry, see #636

@mnot
Member

mnot commented Jun 1, 2018

I suspect that implementations aren't going to want to hand around byte sequences, because that makes potential optimisations that they'll find attractive more expensive. If it's a byte sequence, that means they have to parse it for structure and figure out what to do with it. It'd be better if they hand around representations of the actual structures.

@annevk

annevk commented Jun 1, 2018

@mnot I think you still misunderstand me. The "byte sequence values" are only for the legacy headers that do not have a typed representation. The "typed values" are for the new headers.

@mnot
Member

mnot commented Jun 1, 2018

Ah, indeed I do then; I shouldn't answer bug mail when I'm sick :-/

That seems reasonable to me.

@mnot mnot closed this as completed Jun 5, 2018