Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Content-Type header parsing #39

Closed
annevk opened this issue Oct 3, 2017 · 16 comments · Fixed by #425
Closed

Content-Type header parsing #39

annevk opened this issue Oct 3, 2017 · 16 comments · Fixed by #425

Comments

@annevk
Copy link
Contributor

annevk commented Oct 3, 2017

This is related to #33, but subtly different. As explored in whatwg/mimesniff#30 browsers have different code paths for request and response Content-Type header parsing. Values such as */*, text/html in a response Content-Type header end up being interpreted as text/html, presumably for compatibility with deployed content.

This seems like another fallout of intermediaries adding potentially duplicate headers and (early) implementations being poorly tested for erroneous input.

@reschke
Copy link
Contributor

reschke commented Oct 19, 2017

This appears like a browser issue to me. There is and never was an intention to have different rules.

@annevk
Copy link
Contributor Author

annevk commented Oct 19, 2017

I'm not sure what that means. If it affects browsers, why would it not affect other clients in the HTTP ecosystem? Don't we all consume the same content?

@reschke
Copy link
Contributor

reschke commented Oct 19, 2017

Once again, there are no different rules. Browsers apparently implemented workarounds for misconfigured servers, but that doesn't mean that what browsers do is what everybody else is doing.

@mnot
Copy link
Member

mnot commented Oct 16, 2018

Do we need to do more than reference mimesniff (see #51)? Note there's already generic text about sniffing in Content-Type.

@annevk
Copy link
Contributor Author

annevk commented Oct 16, 2018

This issue is really about parsing Content-Type and determining its (parsed) value, irrespective of any sniffing that might also take place.

@annevk
Copy link
Contributor Author

annevk commented Nov 13, 2018

More concretely, text/html, text/plain means text/plain in browsers. text/plain; means text/plain (not an error). text/html;charset=gbk, text/html means text/html;charset=gbk (yup, charset is copied over as a special case).

whatwg/fetch#831 has a patch for Fetch to define this in detail with an acknowledgment that it doesn't match HTTP.

It seems greatly preferable to uplift this kind of thing and agree on header parsers across the HTTP ecosystem, but if that's not doable so be it.

@mnot
Copy link
Member

mnot commented Feb 2, 2020

Discussed in Basel; we could document a suggested error handling algorithm, as long as it's clear that rejecting the message because the content-type is invalid is still acceptable.

Suggested algorithm would be to use the last syntactically valid type. @annevk does that align with fetch?

@mnot mnot self-assigned this Feb 3, 2020
@annevk
Copy link
Contributor Author

annevk commented Feb 4, 2020

More than today, but it's also more involved: https://fetch.spec.whatwg.org/#content-type-header.

@mnot
Copy link
Member

mnot commented May 25, 2020

Waiting for terminology from #193.

@reschke
Copy link
Contributor

reschke commented Aug 14, 2020

I just updated my ancient tests for content-type, see http://test.greenbytes.de/tech/tc/httpcontenttype.

To my surprise, Safari apparently handles comma-separated different from multiple fields, see http://test.greenbytes.de/tech/tc/httpcontenttype/#multitextxmloneline.

@annevk, that would be a bug according to FETCH, right?

@annevk
Copy link
Contributor Author

annevk commented Aug 14, 2020

Yes, I suspect there's a bug filed already. I think I found such bugs in all browsers as they all have very similar logic in their core HTTP parsers.

@reschke
Copy link
Contributor

reschke commented Aug 14, 2020

Ah. But then, I'm not really convinced that we should add this to the spec. My understanding that the request was based on browsers already agreeing on the error handling...

@annevk
Copy link
Contributor Author

annevk commented Aug 14, 2020

There's no complete agreement on all details involved. See whatwg/fetch#831 (comment) for links to bugs.

@reschke
Copy link
Contributor

reschke commented Aug 18, 2020

I did some more testing with Java client libs (HTTPURLConnection, JDK HttpClient, Apache HttpClient). Of these, only the first seems to have an API getting the type directly, and that indeed takes the last value (no matter whether sent as list in a single instance, or in multiple fields).

It would be nice if there was a concrete Webkit bug that points out their inconsistency (see #39 (comment)) (cc @tfpauly).

@annevk
Copy link
Contributor Author

annevk commented Aug 18, 2020

I recommend either filing one that blocks https://bugs.webkit.org/show_bug.cgi?id=192000 or detailing the specific case you care about there in a comment.

@reschke
Copy link
Contributor

reschke commented Aug 18, 2020

@mnot mnot closed this as completed in #425 Aug 18, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

3 participants