Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parsing content after a 204 response #26

Closed
jugglinmike opened this issue May 24, 2017 · 17 comments
Closed

Parsing content after a 204 response #26

jugglinmike opened this issue May 24, 2017 · 17 comments

Comments

@jugglinmike
Copy link

When data is written to a connection following a 204 response, the user agent may interpret the data as it pleases. In web browsers today, this means:

  • The Chromium and Edge web browsers will inspect the first 4 bytes for the
    start of a valid HTTP response. If found, they will parse the data that
    follows as a new response (and if any of those first four bytes bytes are
    invalid they will be discarded). If more than four invalid bytes are
    encountered, the browsers abort parsing and interpret the data as a
    HTTP/0.9 response. This is consistent with their general response parsing
    behavior (i.e. without a preceding a 204 response)
  • The Firefox web browser may inspect 1 kilobyte of data or more (the exact
    number has been variable in my testing) for a valid response. If found, it
    will discard any preceding invalid data. This tolerant behavior is only
    observable following a 204 response; otherwise, Firefox seems to parse in the
    same way as Chromium and Edge.
  • the Safari web browser, upon receiving any invalid data, makes no attempt to
    recover and discards the remaining data

This variation has led to instability in automated tests written for the Web Platform Tests project--see issue 5037.

I originally reported this inconsistency in issue 5227, where @mnot provided the following context (from RFC7230 section 3.3.3):

If the final response to the last request on a connection has been completely
received and there remains additional data to read, a user agent MAY discard
the remaining data or attempt to determine if that data belongs as part of
the prior response body, which might be the case if the prior message's
Content-Length value is incorrect. A client MUST NOT process, cache, or
forward such extra data as a separate response, since such behavior would be
vulnerable to cache poisoning.

Mark followed up by saying:

What I think's being requested is a recommendation for how much data should
be discarded before the client gives up; possibly a minimum. It feels kind of
analogous to when we established the minimum URL length that should be
supported by implementations, so it's not completely off base.

That said, this is truly a corner case; the right answer is "don't do that."
Anyone depending on interop in this case is doing it wrong to start with.

Though I agree with @annevk: "I think ideally HTTP defines how to parse HTTP." Can the specification language be made more explicit for expected behavior in this situation?

Thanks for your consideration!

@mcmanus
Copy link

mcmanus commented May 26, 2017

I agree with anne that this was a flaw in http/1 - but the specification has been made more explicit via 7540 - the successor protocol. This is the normal way of fixing problems. In 7540 there is no ambiguity between responses.

As I've said in different forums, the WPT is trying to test the wrong thing. Fundamentally it is trying to test error handling behavior when no behavior is specified in http/1. There are probably an unbounded number of such cases where a server does something that violates 723x and the client behavior is unspecified - I'm not sure why this one is special. I'd like to think the testing framework focused on finding violations of 723x. The server should fail this particular test for sending data after the 204.

otoh - the situation is much better in http/2 and a client test for that would be deterministic.

@annevk
Copy link
Contributor

annevk commented Jun 28, 2017

What I don't understand is why undefined client-behavior is acceptable and not a bug that needs fixing. It's almost never acceptable (cannot think of exceptions) anywhere else in the platform.

Concretely, it's a problem for new client implementations that wish to compete with existing ones as they have to reverse engineer existing clients to work with broken servers. That's a time-old problem.

@mnot
Copy link
Member

mnot commented Jun 28, 2017

@jugglinmike one clarification -- when you see this happening, what request does the newly "found" response after the 204 get assigned to -- the one that caused the 204, or is pipelining in use?

@jugglinmike
Copy link
Author

@mnot in my tests, the client sends a second request only after it receives a 204 response to the initial request. It is this second request that receives the malformed response.

My test script is available in the following Firefox bug report:

https://bugzilla.mozilla.org/show_bug.cgi?id=1356614

I believe this format precludes any pipelining. Would the effect of pipelining be useful here?

@mnot
Copy link
Member

mnot commented Jun 30, 2017

OK, so it sounds like those bytes sit in a buffer somewhere (TCP? browser?) and get consumed as part of the next response on the wire.

I can think of a few relevant ways to tighten up HTTP here. None of them are specific to 204.

  1. If a HTTP/1 client receives bytes on a connection that doesn't have any outstanding requests, it MUST (do something). I can imagine that they might get assigned as extra bytes on the previous response if it allows a body, or raise an error, or get silently discarded, but there's no way they should be interpreted as a response to a subsequent request. Clarifying this should help all non-pipelining clients behave correctly.

  2. If a HTTP/1 client receives bytes on a connection that does have outstanding requests, I don't know that there's much we can do to help them; there are a mess of heuristics that are in use, but AIUI very little interest in evolving / aligning that code. The best we might do is put a "here be dragons" warning sign up (which is generally the case for pipelining anyway).

  3. When a HTTP/1 client is expecting a response and gets garbage bytes, it would be nice to align or at least put limits on how much tolerance they have. This is what the bug originally asked for, but that depends on implementers wanting to do so. In the meantime, I think (1) above might help substantially (and it's something browsers should really fix IMO).

The only catch for (1) is that it's naturally racy; there are always going to be cases where bytes are in flight, or the TCP buffers aren't fully drained before the browser starts to write(). So it'll probably have to be a SHOULD with caveats.

@mcmanus thoughts?

@mcmanus
Copy link

mcmanus commented Jun 30, 2017 via email

@annevk
Copy link
Contributor

annevk commented Jun 30, 2017

@mcmanus for the reasons explained in #26 (comment)?

@mcmanus
Copy link

mcmanus commented Jun 30, 2017

I don't see 'how browsers interop with broken legacy servers document' to be an http item. We don't make things better by having a discussion of "which http/1 do you implement?"

This energy would seem better directed towards fixing the broken actors- to the extent its actually a problem at all.

@annevk
Copy link
Contributor

annevk commented Jun 30, 2017

It's much harder to fix millions of servers than aligning a couple of clients. (See also all kinds of other parsers, such as HTML, URL, CSS, etc.)

@mnot
Copy link
Member

mnot commented Jul 2, 2017

@mcmanus h1 isn't deprecated yet.

@mcmanus
Copy link

mcmanus commented Jul 10, 2017

its not a matter of deprecation of 723x (which would mean stop using it) - but of forking 723x into new-h1 and h2.

now h2 incorporates a whole lot of 723x by reference - though definitely not the framing that is part of this particular issue - and pulling that out to a new document that can evolve makes more sense to me.

@ekr
Copy link

ekr commented Jul 25, 2017

@annevk: "What I don't understand is why undefined client-behavior is acceptable and not a bug that needs fixing. It's almost never acceptable (cannot think of exceptions) anywhere else in the platform"

That's not really true as a general matter for network protocols. Rather, we define what legal behavior from the peer is, and then implementations have quite a bit of freedom in how to handle nonconformant behavior unless the standard specifically tells them what to do.

@annevk
Copy link
Contributor

annevk commented Jul 26, 2017

I understand that's the way you go about things. I don't understand why that's good or why lessons we learned elsewhere about that being problematic are not applicable. See also https://annevankesteren.nl/2016/05/client-server.

@ekr
Copy link

ekr commented Jul 26, 2017

I'm not sure what "you" refers to, because in this case, it's basically how every network protocol I've ever worked on has been defined. As for "lessons we learned elsewhere", you might consider that other people have learned lessons from their experience as well.

@annevk
Copy link
Contributor

annevk commented Jul 26, 2017

I understand that and I'd like to learn about them. It's interesting to me how something that applies to a wide variety of formats, would stop applying the moment a format is used for transport.

@ekr
Copy link

ekr commented Jul 26, 2017

Happy to chat about this offline, but this issue probably isn't the place

@mnot mnot changed the title Under-specified: parsing behavior following 204 response Parsing content after a 204 response Jun 29, 2018
@mnot
Copy link
Member

mnot commented Oct 16, 2018

Now that #145 is done, Associating a Response to a Request seems like a natural place to talk about this in a limited fashion (as I outlined above).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

5 participants