Need help with HeaderMap #300

Closed
annevk opened this issue May 30, 2014 · 26 comments

@annevk
Member

annevk commented May 30, 2014

HTTP headers are a mess. That's because semantics are confused with syntax due to legacy servers and user agents.

http://mxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/nsHttpHeaderArray.h#146 (and elsewhere in that file and the .cpp file, this needs to be specified somewhere)

If we are going to make http://fetch.spec.whatwg.org/#headermap work we need to somehow deal with that or decide to ignore it at the API level (the fetch algorithm itself should probably cater for it somehow in its header representation). @mcmanus @mnot

@mnot
Member

mnot commented May 31, 2014

Yes, they're definitely a mess.

To be clear, if I have

Foo: a, b
Bar: 134
Foo: c

does .get('foo') return "a, b" or just "a"?

I read the notes on HeaderMap as saying it's the former, correct?

If so, I think you can keep things relatively simple; .add() just adds a new header field (no matter how many there are currently).

@annevk
Member Author

annevk commented Jun 1, 2014

get() would return "a", getAll() would return ["a", "b", "c"]. My idea was that there would be automatic flattening and combining. add() would append a new value to an existing entry or add a new header/value entry.
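
A minimal sketch of the flatten-and-combine behaviour described here, assuming a hypothetical `FlattenedHeaders` class (the class name and its internals are illustrative, not the Fetch spec's HeaderMap):

```javascript
// Hypothetical sketch of the "flatten and combine" model; not the spec's API.
class FlattenedHeaders {
  constructor() {
    // Lowercased header name -> array of individual values.
    this.map = new Map();
  }

  // Split the field value on commas and append each piece to the entry,
  // creating the entry if it does not exist yet.
  add(name, value) {
    const key = name.toLowerCase();
    const pieces = value.split(",").map((piece) => piece.trim());
    const existing = this.map.get(key) ?? [];
    this.map.set(key, existing.concat(pieces));
  }

  // First individual value, or null if the header is absent.
  get(name) {
    const values = this.map.get(name.toLowerCase());
    return values ? values[0] : null;
  }

  // All individual values, in the order they were added.
  getAll(name) {
    return this.map.get(name.toLowerCase()) ?? [];
  }
}

const flattened = new FlattenedHeaders();
flattened.add("Foo", "a, b");
flattened.add("Bar", "134");
flattened.add("Foo", "c");

console.log(flattened.get("foo"));    // "a"
console.log(flattened.getAll("foo")); // ["a", "b", "c"]
```

Because every value is split on commas, a header whose value legitimately contains a comma would be truncated by `get()` under this model.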

@tabatkins
Member

Are you planning to build in the exceptions, per the Moz bug, so that commas mean multiple entries in all headers except for those handful of weird ones? (And multiple entries always serialize with commas except for those exceptions?)

Otherwise, yes, MultiMap behavior is the best, go for it.

@annevk
Member Author

annevk commented Jun 2, 2014

The idea was to not have exceptions in this API and just model after HTTP. It does mean this API cannot be used by the underlying implementation. I'm wondering if that's the correct tradeoff.

@tabatkins
Member

I'm not sure how worthwhile that is; the serialization isn't necessarily exposed to JS directly; and if it is, it's not really significant what it is. Being compatible with the web seems more important than being maximally simple here, since the choice doesn't affect the author-facing API.

@tobie
Member

tobie commented Jun 2, 2014

Should the methods which return void return this instead for consistency with Map?

@annevk
Member Author

annevk commented Jun 3, 2014

@tabatkins this is the author-facing API.

@jakearchibald
Contributor

@annevk

get() would return "a"

Does this mean request.headers.get('accept') would be image/webp rather than image/webp,*/*;q=0.8?

I assume a comma is always used to separate values in headers?

@tabatkins
Member

@annevk Right, the author-facing part of the API definitely shouldn't expose any irregularities. I was just discussing whether the exceptional cases should be handled appropriately when parsing headers into a HeaderMap and serializing a HeaderMap into headers.

@jakearchibald Yes, headers are defined to accept comma-separated lists of values.

@mnot
Member

mnot commented Jun 3, 2014

To be clear, all that HTTP requires is that multiple instances of the same header can be folded into a single comma-separated value, NOT that any header can be split on commas. See http://tools.ietf.org/html/draft-ietf-httpbis-p1-messaging-26#section-3.2.2 for the precise language.

I.e., it's quite possible that get() as you define it will return a truncated string for some headers.

@sicking

sicking commented Jun 4, 2014

Could we do the simple thing and just use a normal ES6 Map rather than a new HeaderMap class?

The headers property of RequestInit could either accept a Map object or a JS object which we enumerate. If the value in either is an array, then we'd add multiple request headers.

The headers property of Response would always be a Map. If there were multiple response headers with the same name, then we would return an Array of strings.

So if @mnot's initial comment is a set of response headers, response.headers.get("Foo") would return ["a, b", "c"].

Alternatively we could concatenate multiple response headers. That has the advantage that we'd consistently return strings. So then response.headers.get("Foo") would return "a, b, c".

@annevk
Member Author

annevk commented Jun 4, 2014

No we cannot use a Map as we need to know about changes as they can affect the mode of the request.

E.g. you want event.request.headers.set("seen-by-service-worker", "yup") or some such. And we want to not allow dangerous headers there.
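
A rough sketch of why a plain Map falls short here: a purpose-built object can vet every write as it happens. The `GuardedHeaders` class and the contents of `FORBIDDEN` below are assumptions for illustration only; the real list of disallowed names, and whether a disallowed write throws or is ignored, is for the spec to define.

```javascript
// Illustrative subset of names a page should not be able to set (assumed).
const FORBIDDEN = new Set(["cookie", "host", "content-length"]);

// Hypothetical headers object that checks each write at call time,
// something a plain Map cannot do because it accepts any key and value.
class GuardedHeaders {
  constructor() {
    this.entries = []; // [lowercased name, value] pairs
  }

  // Replace any existing entries for the name with a single new one.
  set(name, value) {
    const key = name.toLowerCase();
    if (FORBIDDEN.has(key)) {
      // Throwing is one possible choice; silently ignoring is another.
      throw new TypeError(`Header "${name}" may not be set`);
    }
    this.entries = this.entries.filter(([n]) => n !== key);
    this.entries.push([key, String(value)]);
  }

  get(name) {
    const key = name.toLowerCase();
    const hit = this.entries.find(([n]) => n === key);
    return hit ? hit[1] : null;
  }
}

const guarded = new GuardedHeaders();
guarded.set("seen-by-service-worker", "yup");
console.log(guarded.get("Seen-By-Service-Worker")); // "yup"

let rejected = false;
try {
  guarded.set("Cookie", "a=b");
} catch (error) {
  rejected = error instanceof TypeError;
}
console.log(rejected); // true
```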

@tabatkins
Member

@sicking ES Maps are almost completely unusable for anything beyond the most trivial uses in DOM, for reasons that I've complained about in the past. :/

Regarding just using a Map-like interface, as noted by others, always returning an array complicates the 99% case for the purpose of the 1% case, while concatenating the headers will result in constantly broken code, as people will rarely ever test for multiple headers. Other languages like Python use approaches similar to what anne is doing here, where you can ask for the first header or for all headers. Also, the MultiMap interface this is based on is already in use in the URL object, for its query params, so it's not even anything new.

@annevk
Member Author

annevk commented Jun 4, 2014

@tabatkins it is actually slightly different. In this API the names are unique. However, given @mnot's and @jakearchibald's comments maybe what we want is indeed an API similar to URLSearchParams and FormData. That is

Foo: a, b
Bar: 134
Foo: c

gets turned into three distinct entries. If you get("Foo") you get "a, b". If you getAll("Foo") you get ["a, b", "c"].

@tabatkins
Member

And would that serialize into "Foo: a, b, c\nBar: 134", or would it stay separate and do "Foo: a, b\nFoo: c\nBar: 134"?

@annevk
Member Author

annevk commented Jun 4, 2014

It would stay separate and even preserve order. That's what URLSearchParams does today. The moment you start making modifications things will change around a bit of course.
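
A minimal sketch of the entry-list model from the preceding comments, assuming a hypothetical `HeaderList` class (the name, `append()`, and `serialize()` are illustrative and not taken from the spec):

```javascript
// Hypothetical sketch of the entry-list model, in the style of URLSearchParams.
class HeaderList {
  constructor() {
    this.entries = []; // [name, value] pairs in their original order
  }

  // Always adds a new entry; existing entries are left untouched.
  append(name, value) {
    this.entries.push([name, value]);
  }

  // Full value of the first entry whose name matches, ignoring case.
  get(name) {
    const key = name.toLowerCase();
    const hit = this.entries.find(([n]) => n.toLowerCase() === key);
    return hit ? hit[1] : null;
  }

  // Full values of every matching entry, in order.
  getAll(name) {
    const key = name.toLowerCase();
    return this.entries
      .filter(([n]) => n.toLowerCase() === key)
      .map(([, value]) => value);
  }

  // Entries stay separate and keep their order when written back out.
  serialize() {
    return this.entries.map(([n, v]) => `${n}: ${v}`).join("\n");
  }
}

const list = new HeaderList();
list.append("Foo", "a, b");
list.append("Bar", "134");
list.append("Foo", "c");

console.log(list.get("Foo"));    // "a, b"
console.log(list.getAll("Foo")); // ["a, b", "c"]
console.log(list.serialize());   // "Foo: a, b\nBar: 134\nFoo: c"
```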

@jakearchibald
Contributor

That sounds good to me.

If I .get the accept/vary header I'd expect the full header value of the first match.

@annevk
Member Author

annevk commented Jun 4, 2014

What this basically means is that we take in the stream from HTTP, split on newlines to get header lines, and then we turn each header line into a header by splitting on the first ":" to get a name and a value. A setup like that seems fine for both Request and Response as far as I can tell.

(Note that getResponseHeader() from XMLHttpRequest does combine values and would return "a, b, c" for the example earlier. However, getAllResponseHeaders() does not.)
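
The parsing steps in this comment can be sketched roughly as follows. `parseHeaderBlock` is a hypothetical helper; trimming whitespace around the value and skipping lines without a colon are simplifying assumptions, not something the thread specifies.

```javascript
// Hypothetical helper: turn a raw header block into [name, value] entries.
function parseHeaderBlock(raw) {
  const entries = [];
  // Split the stream into header lines.
  for (const line of raw.split(/\r?\n/)) {
    if (line === "") continue; // ignore the blank line that ends the block
    // Split on the first ":" only, so colons inside the value survive.
    const colon = line.indexOf(":");
    if (colon === -1) continue; // assumption: skip lines that are not headers
    const name = line.slice(0, colon);
    const value = line.slice(colon + 1).trim();
    entries.push([name, value]);
  }
  return entries;
}

const parsed = parseHeaderBlock("Foo: a, b\r\nBar: 134\r\nFoo: c\r\n");
console.log(parsed); // [["Foo", "a, b"], ["Bar", "134"], ["Foo", "c"]]
```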

@reschke

reschke commented Jun 5, 2014

Sounds right to me.

Re: http://mxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/nsHttpHeaderArray.h#146... it would be cool if we could do some research whether the three exceptions there really need to be exceptions (I do agree for Set-Cookie).

@tabatkins
Member

Yeah, given the stated semantics, this sounds fine to me.

(It's really weird that the spec requires headers to be splittable on commas, but you can't reverse that and treat all commas as splitters.)

@sicking

sicking commented Jun 6, 2014

Why do you want to get the first value rather than a comma-separated join of all headers with the specified name? The HTTP spec says that a comma separated join is equivalent.

Here's what I'm proposing:

For APIs that take a list of request headers, i.e. the headers property of RequestInit, allow the page to either pass a Map object or any other object.

If a Map object is passed, create a new Map, enumerate the passed in map and add each property to the newly created Map.

If any other object is passed, create a new Map, enumerate that object and add each property to the newly created Map.

In both cases any Arrays would still be kept as Arrays. Anything else is converted to a string.

Let the headers property on Request be a Map.

It would be a plain normal Map, which means that any values could be added to it.

At the time when a Request is submitted, if any bad values are in the headers Map, we either ignore them or treat it as a network error. Either is fine with me. Any arrays result in multiple headers with the same name being added.

Let the headers property in Response be a Map. When we build a Response object and populate the headers Map we add headers in the order that they were received from the network (minus any that we couldn't expose for security reasons). Any duplicate headers are joined using ",".

This seems to me to give the properties that we want.

  • Duplicate headers in a Response always maintain their relative order.
  • All headers in a Response are plain string values.
  • You can send multiple headers with the same name, in an author defined order.

While also allowing us to reuse plain Map objects.

Anne points out that this means that we validate header values at time of submission, while we validate things like methods and CORS-mode early. I agree that's not perfect, but it also doesn't seem like a huge deal.
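
The response side of this proposal can be sketched with a plain Map. `toJoinedMap` is a hypothetical helper; lowercasing the keys and joining with ", " (to match the "a, b, c" example earlier in the thread) are assumptions.

```javascript
// Hypothetical helper for the Map-based proposal: one string per header name.
function toJoinedMap(entries) {
  const map = new Map();
  for (const [name, value] of entries) {
    const key = name.toLowerCase(); // header names are case-insensitive
    // A duplicate name appends to the existing string instead of adding a key.
    map.set(key, map.has(key) ? `${map.get(key)}, ${value}` : value);
  }
  return map;
}

const joined = toJoinedMap([
  ["Foo", "a, b"],
  ["Bar", "134"],
  ["Foo", "c"],
]);

console.log(joined.get("foo"));  // "a, b, c"
console.log([...joined.keys()]); // ["foo", "bar"]
```

Every value is a plain string, but a caller can no longer tell whether "a, b, c" arrived as one header field, two, or three.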

@tabatkins
Member

Why do you want to get the first value rather than a comma-separated join of all headers with the specified name? The HTTP spec says that a comma separated join is equivalent.

Because code written to assume only a single header is sent (which I expect to be basically 100% of code, because who tests sending multiple headers, come on) will break the moment multiple headers get sent. Don't make authors do parsing work that you can do instead; particularly, don't add extra parsing work for authors to do just because you want a simple key/value interface.

(Extra particularly, don't force authors to do parsing work that seems like it can be done with a simple .split(",") call, but can't really, because of complicated crappy semantics.)

@annevk
Member Author

annevk commented Jun 6, 2014

My problems with the proposal from @sicking are:

  • All popular implementations of headers use multimap rather than map
  • It would not allow us to expose cookies if we decide to do so at some point (it subsets the overall design of HTTP)
  • Imported libraries will have to use isArray() rather than get a consistent API
  • Failing extra late for headers and early for everything else is inconsistent

@reschke

reschke commented Jun 6, 2014

Because code written to assume only a single header is sent (which I expect to be basically 100% of code, because who tests sending multiple headers, come on) will break the moment multiple headers get sent.

That actually sounds like an advantage. Fail early!

@tabatkins
Member

It won't fail early. It'll fail very late, in production, in probably random and difficult-to-reproduce ways, due to header editing/additions from random proxies and rewriters between your server and the user's computer.

Plus, if we forced authors to split, they'll do it in the most obvious, simple way - str.split(","). This'll work 90% of the time, and fail badly in a small number of not-too-rare cases, like Accept-Language. Remember, the HTTP spec doesn't require it to be easy to parse comma-separated header values apart, just that it be possible; it might require arbitrarily complex parsing to do, and knowledge of precisely which header you're dealing with.

But authors can't avoid it! They must defensively code in something to split the values apart, because we're jerks and we always merge them into a comma-separated string, even when the headers are always sent separately because the values are complex to tease apart.

Comma-merging has only one tiny benefit - it lets us, the implementors, use a Map rather than a MultiMap. Besides that, it's extremely author-hostile with literally zero author benefit.
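
A small illustration of how the obvious `split(",")` goes wrong once values are comma-merged. It uses Set-Cookie, which is mentioned earlier in the thread as an exception, rather than Accept-Language; the cookie values themselves are made up for the example.

```javascript
// Two separate Set-Cookie field values; the first contains a comma in its date.
const fields = [
  "id=1; Expires=Wed, 09 Jun 2021 10:18:14 GMT",
  "theme=dark",
];

// What a comma-merging API would hand to the author.
const merged = fields.join(", ");

// The obvious way an author would try to recover the original values.
const pieces = merged.split(",").map((piece) => piece.trim());

console.log(pieces.length); // 3, although only 2 header fields were sent
console.log(pieces[0]);     // "id=1; Expires=Wed"
```

Recovering the two original values would require knowing the grammar of the specific header, which is the parsing burden described above.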

@annevk
Member Author

annevk commented Jun 13, 2014

Given support from @jakearchibald et al for my proposal and no real support for Map, I've further clarified that. See http://fetch.spec.whatwg.org/ for details.
