Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow responses to omit a body and Content-Length header #288

Merged
merged 1 commit into from
Dec 2, 2020

Conversation

atheriel
Copy link
Contributor

@atheriel atheriel commented Nov 25, 2020

The current behaviour of httpuv is to automatically set a Content-Length header computed from the length of the body. This results in incorrect HTTP 204, 304, and 1xx responses.

RFC 7230 (part of the HTTP 1.1 specification) says the following about the Content-Length header:

A server MAY send a Content-Length header field in a 304 (Not
Modified) response to a conditional GET request (Section 4.1 of
[RFC7232]); a server MUST NOT send Content-Length in such a response
unless its field-value equals the decimal number of octets that would
have been sent in the payload body of a 200 (OK) response to the same
request.

A server MUST NOT send a Content-Length header field in any response
with a status code of 1xx (Informational) or 204 (No Content).

Moreover, HTTP 204 and 304 responses should actually send no body at all. RFC 7231 states:

A 204 response is terminated by the first empty line after the header
fields because it cannot contain a message body.

and RFC 7232 says:

A 304 response cannot contain a message-body; it is always terminated
by the first empty line after the header fields.

This PR proposes changes to the handling of HTTP 204 and 304 responses to bring them in line with the spec. I haven't touched HTTP 1xx responses because they're mostly irrelevant and some old Websocket implementations actually do, weirdly, use a Content-Length.

The implementation is actually really simple: we now ignores any body if a Rook response includes 204 or 304 as the
status, leaving the DataSource pointer as NULL. This ensures (1) that we don't send a body, and as a side effect (2) that we don't set a Content-Length header. I added a comment to the Content-Length header code explaining this edge case, as well, even though there are no code changes there.

Edit: As per discussion, this behaviour is now optional, triggered by excluding (or setting as NULL) the body in a Rook response.

It's still possible for users to add a Content-Length header directly, which they might want to do, e.g. to fulfill the HTTP 304 behaviour quoted above.

@wch
Copy link
Collaborator

wch commented Nov 25, 2020

Hm, I don't really like the idea of fixing up responses in the C++ code -- in the long run, this could lead to complicated logic and maintenance difficulties, and users might be confused about why the response they create isn't the one that's being sent.

I think it makes more sense for the application author to send a correct response, but it looks like this may require some modifications, namely, allowing a response to not to have a body (and also not compute content-length in that case).

One other thing: I see some places in the existing code where HttpResponse::_pBody is compared to NULL, but I'm not sure that's a valid check, since it's now a shared_ptr.

Also, it will help if you can include an example app and some tests.

@atheriel
Copy link
Contributor Author

I think that the need to fix these specific responses basically follows from the decision to automatically add Content-Length. For non-204, 304, and 1xx responses, it's absolutely right to add it, and downstream frameworks (including Plumber, but likely also Shiny) now expect it to be there. But automatically adding it when missing should also mean automatically leaving it out when it ought not to be included -- that's just the complexity you inherit from HTTP itself.

Nor do I think it will make sense to trigger this depending on whether a body has been added. The current code already accepts NULL as a body and sets the content-length in that case to zero. The same is true of "empty" character(0) and raw(0) bodies. If we started treating those empty bodies as obviating the need for a Content-Length header we'd also break existing (legitimate) uses. From RFC 7230 again:

For example, a Content-Length header field is normally sent in a POST request even when the value is 0 (indicating an empty payload body).

The classic use for a HTTP 204 response is for PUT requests, but they are also used for DELETE in some APIs. So an example app where you'd see all of this would be a simple CRUD API that supports If-None-Match for GET requests (and thus will return HTTP 304 in some cases) and returns HTTP 201 or 204 for PUT requests, plus HTTP 204 or 404 for DELETE requests. Is that what you're looking for? Do you want a code example?

I'm also happy to add tests, that was an oversight for sure.

One other thing: I see some places in the existing code where HttpResponse::_pBody is compared to NULL, but I'm not sure that's a valid check, since it's now a shared_ptr.

Yeah, I noticed that outdated check and piggybacked on it to simplify the implementation -- uninitialized shared_ptr seems to be NULL. Maybe that's unexpected, my C++ knowledge is pretty poor.

@atheriel
Copy link
Contributor Author

Also, if you want examples of how other frameworks/languages handle this -- from my cursory look, they do -- I can find a few.

@wch
Copy link
Collaborator

wch commented Nov 26, 2020

Unless I'm mistaken, the current code does not allow body to be NULL or character(0) or raw(0).

Here's an example app with body=NULL:

library(httpuv)

handle_req <- function(req) {
  list(
    status = 200,
    headers = list(
      'Content-Type' = 'text/plain'
    ),
    body = NULL
  )
}

s <- startServer("0.0.0.0", 5000,
  list(
    call = function(req) {
      # Call out to external function so we can modify it dynamically for
      # experimentation.
      handle_req(req)
    }
  )
)

Here's what happens when you try to access it:

$ curl -i http://localhost:5000
HTTP/1.1 500 Internal Server Error
Date: Thu, 26 Nov 2020 04:23:18 GMT
Content-Type: text/plain; charset=UTF-8
Content-Length: 22

An exception occurred.

Same for character(0) and raw(0). However, "" works.

So I'm thinking that to indicate that there's no body, the returned object could have body=NULL or no body at all.

Yeah, I noticed that outdated check and piggybacked on it to simplify the implementation -- uninitialized shared_ptr seems to be NULL. Maybe that's unexpected, my C++ knowledge is pretty poor.

A shared_ptr can't actually be NULL, but I did some Googling and apparently it's possible to compare shared_ptr to NULL, probably because of some operator overloading. However, I think it's more correct to use if (x != nullptr) or just if (!x).

@atheriel
Copy link
Contributor Author

atheriel commented Nov 26, 2020

Unless I'm mistaken, the current code does not allow body to be NULL or character(0) or raw(0).

I must be going crazy, raw(0) works for me locally. character(0) does not (since charToRaw() will error on character(0)), which I admit to never testing -- I assumed it would work like "".

I was also sure that NULL worked at one point during my testing, but that no longer seems to be the case. Possibly I had some sort of mixture of debugging and non-debugging files interacting, since Rcpp actually calls abort() if you pass NULL in debug mode...

I think it's more correct to use if (x != nullptr) or just if (!x).

Oh, that makes sense. I think the code uses !_pBody already in this case, so that should work correctly.

So I'm thinking that to indicate that there's no body, the returned object could have body=NULL or no body at all.

My objection to this would be that it passes the buck to downstream frameworks to handle this correctly, and I'm not sure that's the right thing to do. Currently Plumber, breakr, fiery, ambiorix, etc. all behave badly for HTTP 204 because of httpuv -- would it really be better to allow them to opt-in to correct behaviour we know they're currently incapable of, instead of handling it here?

ps. I added a small unit test to the commit.

@atheriel
Copy link
Contributor Author

However, I am willing to implement the NULL-or-missing body approach if you believe it is the right one.

@wch
Copy link
Collaborator

wch commented Nov 26, 2020

If someone says to give this response:

  list(
    status = 304,
    headers = list(
      'Content-Type' = 'text/plain'
    ),
    body = "this is some body content"
  )

I think that it's reasonable for httpuv to send the whole thing (with the body) instead of "fixing" the response by removing the body.

If they want to create a response that conforms to the RFC, they could do this:

  list(
    status = 304,
    body = NULL
  )

Or even:

  list(
    status = 304,
  )

(BTW, httpuv had some issues with empty headers -- they had to be an empty named list -- but I fixed that in #289.)

I don't really understand the issue with plumber, breakr, fiery, and others: they could be modified to provide a NULL body, right (after httpuv accepts that, of course)? I don't think it's passing the buck to expect them to provide a response like that; it seems to me that they would be passing the buck if they provide whatever body content and expect httpuv to fix up the response for them.

The logic in httpuv is pretty straightforward with what I'm proposing. In general, it will send what you tell it to send, and add a content-length header if appropriate. If there's a body, it will send it, along with a content-length header. If there's no body, then it obviously won't send it, and also won't add a content-length header.

@atheriel
Copy link
Contributor Author

Hmm, hit a snag here. I just tried implementing this (an optional body signals no Content-Length), but HTTP clients don't like when the Content-Length header is missing for non-204/304/1xx responses.

Given the following server:

httpuv::runServer(
  "127.0.0.1", 8081,
  list(call = function(req) {
    if (req$PATH_INFO == "/nocontent") {
      list(
        status = 204L,
        body = NULL
      )
    } else if (req$PATH_INFO == "/bad") {
      # What happens if we omit Content-Length for HTTP 200?
      list(
        status = 200L,
        body = NULL
      )
    }
  })
)

curl will do the following:

$ curl -vs localhost:8081/nocontent
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8081 (#0)
> GET /nocontent HTTP/1.1
> Host: localhost:8081
> User-Agent: curl/7.58.0
> Accept: */*
>
< HTTP/1.1 204 No Content
< Date: Thu, 26 Nov 2020 20:26:27 GMT
<
* Connection #0 to host localhost left intact
$ curl -vs localhost:8081/bad
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8081 (#0)
> GET /bad HTTP/1.1
> Host: localhost:8081
> User-Agent: curl/7.58.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Thu, 26 Nov 2020 20:26:30 GMT
* no chunk, no close, no size. Assume close to signal end
<
^C

That is, it'll just hang, waiting for additional data to know that the body is empty. Curl-based R clients seem to do the same.

So we can add support for omitting the body and header by passing NULL, but this only makes sense for 204/304/1xx clients and will cause weird client issues otherwise... which leads me to think that httpuv should have special handling for these codes instead.

@wch
Copy link
Collaborator

wch commented Dec 1, 2020

I think makes sense to be able to send a message without a body, for any response code. For example, if it's a HEAD request, the spec says the response shouldn't have a body. From Section 4.3 of https://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html:

All responses to the HEAD request method MUST NOT include a message-body, even though the presence of entity- header fields might lead one to believe they do. All 1xx (informational), 204 (no content), and 304 (not modified) responses MUST NOT include a message-body. All other responses do include a message-body, although it MAY be of zero length.

Also relevant:

The presence of a message-body in a request is signaled by the inclusion of a Content-Length or Transfer-Encoding header field in the request's message-headers

Here's how I understand the spec:

  • For HEAD requests:
    • There can be a CL (content-length) header, but it must be the size that would be sent.
    • Or there can be no CL header.
    • In either case, the automatically computed CL is not right.
  • For 1xx and 204 responses:
    • There should not be a CL header
  • For 304 responses (this is the same as for HEAD requests):
    • There can be a CL header, but it must be the size that would be sent.
    • Or there can be no CL header.
    • In either case, the automatically computed CL is not right.

The tricky thing here is that by the time httpuv gets the the response, it doesn't know if the request was a HEAD or GET. But for a HEAD request, it is valid to send a 200 response with no body (and no content-length header).


I think it makes sense to do the following:

  • Automatically compute and add a content-length header, except when:
    • The user provides a content-length header, in which case use that value.
    • For cases where body is NULL, or missing, don't include a content-length header.
    • If 'content-length'=NULL, don't include a content-length header.

It will also need some documentation so that users will know about this behavior.

One other quick thing: I realized that body=raw(0) currently works -- I must have also gotten mixed up when testing that!

@atheriel
Copy link
Contributor Author

atheriel commented Dec 1, 2020

If 'content-length'=NULL, don't include a content-length header.

I'm not sure I understand this -- do you mean that a user can literally pass a content-length header set to NULL in the Rook response?

Also, I'm a fan of docs for this, but where would you like it to be documented?

I'll add a test for HEAD behaviour, too.

@wch
Copy link
Collaborator

wch commented Dec 1, 2020

If 'content-length'=NULL, don't include a content-length header.

I'm not sure I understand this -- do you mean that a user can literally pass a content-length header set to NULL in the Rook response?

Currently, you can set a Content-Length header to any value, with something like this:

    headers = list('Content-Length' = 123)

What I was proposing is that this would suppress the header, even if a body was actually included:

    headers = list('Content-Length' = NULL)

But now that you mention it, maybe that's not necessary... I can't think any cases when it would be helpful. I think that it should be sufficient to have body=NULL suppress the `Content-Length' header.

As for docs, how about putting it in the help for startServer, in a new section, like:

#' @section Response values:
#'
#' ....

RFC 7231 (part of the HTTP 1.1 specification) states[0]:

> A 204 response is terminated by the first empty line after the header
> fields because it cannot contain a message body.

while RFC 7232 says[1]:

> A 304 response cannot contain a message-body; it is always terminated
> by the first empty line after the header fields.

In addition, RFC 7230 says the following about the Content-Length
header[2]:

> A server MAY send a Content-Length header field in a 304 (Not
> Modified) response to a conditional GET request (Section 4.1 of
> [RFC7232]); a server MUST NOT send Content-Length in such a response
> unless its field-value equals the decimal number of octets that would
> have been sent in the payload body of a 200 (OK) response to the same
> request.
>
> A server MUST NOT send a Content-Length header field in any response
> with a status code of 1xx (Informational) or 204 (No Content).

That is, we need to provide a way to suppress the inclusion of a body
and the generated Content-Length header so that downstream users can
emit correct HTTP 204 and 304 responses. This commit makes that possible
by allowing the Rook response to have a missing (or NULL) body, in which
case the usual body handling will be skipped.

It's still possible for users to add a Content-Length header directly,
which they might want to do, e.g. to fulfill the HTTP 304 behaviour
quoted above.

As per Winston Chang's request, we don't explicitly handle these status
codes -- users must implement this behaviour themselves.

Unit tests and updated documentation on the reponse object are included.

[0]: https://tools.ietf.org/html/rfc7231#section-6.3.5
[1]: https://tools.ietf.org/html/rfc7232#section-4.1
[2]: https://tools.ietf.org/html/rfc7230#section-3.3.2
@atheriel atheriel changed the title Ensure that HTTP 204/304 responses omit a body and Content-Length header Allow responses to omit a body and Content-Length header Dec 1, 2020
@atheriel
Copy link
Contributor Author

atheriel commented Dec 1, 2020

Done ready for review. I took the liberty of fixing a small typo in the existing docs as I was updating them, as well as mentioning the new behaviour for headers. I've also updating the title and post of this PR to reflect discussion.

Copy link
Collaborator

@wch wch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants