Allow response headers to contain array of values #1598
Comments
What an amazing coincidence, I almost posted an issue about this a few weeks ago. However, after writing it down and adding benchmarks to show why it's faster (by ~40% in simple cases), I realized that servers will have to support both variants (the In the end, the positive change in the SPECs might not translate to a positive change in practice. For me (iodine), the
This will be the same as current use case for I think this part of header "correctness" could be left to common sense. |
Most headers can be split by comma. The only one which does not follow this rule that I'm aware of is One way to deal with this is to simply allow Even though it's kind of ugly to specify custom behaviour, at least it would make it:
In fact, considering HTTP headers as general metadata is generally a mistake, in my experience. There are too many things which have specific semantic behaviour which requires specific things to handle. |
They may, but they don't have to. Servers (and clients) are both allowed to send the headers in multiple lines rather than comma separated values. i.e. I don't think Rack should dictate this decision.
Yes, there are, and every day new standards or proposals might pop up. The wonderful thing about specifying a unified behavior to all HTTP headers, is that the specification isn't fragile - it will still apply to any future change / update. By allowing developers and servers to make these "mistakes" possible (i.e., multiple For example, what if multiple
IMHO, I don't think we need to safe-proof everything. The Rack specification isn't a server or a framework. It doesn't need to require response validation. Some servers could offer security (at the cost of performance) while testing response validity, others could allow mistakes to pass through. |
Even puma stuffed this up in the latest release. So, I think we should consider what we do here. https://twitter.com/nateberkopec/status/1233172403389259782
Actually, the generic representation of headers is an array of key, value pairs. Rack, by using a hash, is already imposing a data structure which can't represent all possible sets of headers and retain the original order. In order to allow for multiple values, Rack specifies that value strings can contain "\n". But this only really works in the case of |
It is clear to me that adding array support could result in an annoying mix between new and old behavior, such as: {'set-cookie' => [ "cookie1=foo\ncookie2=bar", "cookie3=baz"]} I believe that your idea is wise, but I also believe it's impractical unless we make breaking changes that prevent the old behavior as a whole.
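To make the concern concrete, here is a minimal sketch of what a server would have to do if both forms were accepted at once. The helper name `flatten_header_values` is hypothetical and not part of Rack; it only assumes that a value is either a String or an Array of Strings.

```ruby
# Hypothetical helper (not part of Rack): flatten a header value that may be
# a String, an Array of Strings, or an Array whose Strings still contain "\n".
def flatten_header_values(value)
  Array(value).flat_map { |v| v.split("\n") }
end

mixed = { "set-cookie" => ["cookie1=foo\ncookie2=bar", "cookie3=baz"] }
cookies = flatten_header_values(mixed["set-cookie"])
# cookies now holds three separate cookie strings
```

The point is that the `split("\n")` step cannot be dropped as long as the old form remains legal, even inside arrays.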
I think the fact that the Puma "patch" was, IMHO, ill considered and may have ignored the Rack specification, does not make the specification "bad", nor does it mean that other servers now need to add support for more variations. |
P.S. I understand that Header values are more comfortable to find when stored in a Hash (incoming headers). However, I wonder if header output should be editable after being set. Does anyone "fix" their headers after setting them? Changing the Header response to an Array of Arrays might both improve performance and solve this design issue. I know, it's a backwards compatibility breaking change, but I think breaking backwards compatibility is the only way to update this part of the specification without adding complexity. 🤔 |
This is the model I use in |
In terms of middleware accessing or modifying response headers, yes, it is very common. For some examples, look at middleware that ships with Rack.
This turns accessing or modifying response headers from O(1) to O(N) operation. Granted, N in this case is not usually that large. For the types of operations usually performed on response headers by middleware, using a hash would generally result in simpler code than using an array of arrays. Many middleware assume a hash-like interface for response headers already and would break with a switch to array of arrays. |
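A small sketch of the difference, using made-up header data. Nothing here is Rack API; it only contrasts a plain Hash with an Array of `[name, value]` pairs.

```ruby
hash_headers = { "content-type" => "text/html", "content-length" => "42" }
pair_headers = [["content-type", "text/html"], ["content-length", "42"]]

# Hash: direct lookup and update by key.
hash_length = hash_headers["content-length"]
hash_headers["content-length"] = "43"

# Array of arrays: Array#assoc scans the pairs until the first match.
pair = pair_headers.assoc("content-length")
pair_length = pair && pair[1]
if pair
  pair[1] = "43"
else
  pair_headers << ["content-length", "43"]
end
```

The array version needs the extra "found or append" branch that the hash gives for free, which is the kind of added middleware code the comment above refers to.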
I agree with this reasoning 100% - development and maintenance ease is super important. There's so much to do. Keeping developers happy and development time short is a very strong argument.
You're probably right 😅, though I wish you weren't 🙈 Funny enough, I looked at some of these examples when I was designing iodine. I think many of these concerns are server concerns that could (IMHO, should) be shifted to servers. i.e., Content-Length validation, date headers, chunked encoding etc. Not that it matters, as it's too often that developers require these additional layers. |
I’m very strong on the opinion that All hop headers are ignored: https://github.com/socketry/falcon/blob/master/lib/falcon/adapters/response.rb#L61-L65
Falcon will also fail when the content-length and the content don't match up: https://github.com/socketry/protocol-http1/blob/19f55fb5d8bde238e8df7c9f302e86595cdb9ea8/lib/protocol/http1/connection.rb#L304-L306 |
Sorry, I think I got us all sidetracked. @ioquatix , @jeremyevans , do you agree with my previous assessment that this suggestion only adds complexity (as we will still need to support the existing approach)? If so, then unless we forgo backwards compatibility, perhaps we could close this issue? |
Well, this definitely adds internal complexity. It can make some things simpler for the user, such as supporting multiple values for the same cookie. However, some things could be more complex for the user, since all middleware that needs to operate on header values would need to handle both the string case and the array case. Weighing both sides, I'm against allowing response header values to be arrays. |
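As a rough illustration of the "handle both cases" cost, here is a sketch of a middleware helper that appends one more value to a header. `append_header_value` is a hypothetical name, not something Rack ships.

```ruby
# Hypothetical middleware helper (not part of Rack): add one more value to a
# response header that may be absent, a newline-joined String, or an Array.
def append_header_value(headers, name, extra)
  current = headers[name]
  headers[name] =
    case current
    when nil   then extra
    when Array then current + [extra]
    else            "#{current}\n#{extra}"
    end
  headers
end

string_style = append_header_value({ "set-cookie" => "a=1" }, "set-cookie", "b=2")
array_style  = append_header_value({ "set-cookie" => ["a=1"] }, "set-cookie", "b=2")
```

Every middleware that touches header values would need a branch like this if both representations were valid at the same time.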
I think allowing |
It may be simpler to make the spec be that only It smells kind of funny to make SPEC different for specific response header keys, but considering we treat some request header names differently in SPEC, it isn't too much of a stretch. |
I would be okay with that but internally we already use an array so it still seems inefficient. |
I wasn't aware rack used an array internally for |
I haven't fully worked through it, but you can see here: Lines 248 to 257 in 838ce3a
I can dig into it further if you need me to. |
Thanks, that is helpful. It seems both of those areas assume that callers may be violating SPEC. In both cases, the code may work with existing array values, but they generate string values and not arrays in order to comply with SPEC. So it looks like they are both following Postel's Law. It doesn't look like there is much if any inefficiency in these two cases. I think I'm fine keeping that behavior in Utils for compatibility, but I still think we should not modify SPEC to allow response header values to be arrays. |
HeaderHash does it as an optimisation to avoid generating a lot of garbage. |
IMHO, I would suggest that the internal workings of the Rack framework / middleware should not affect the Rack specification (server-side).
This isn't correct. According to the HTTP specification (section 4.2):
For example the
The
There was a similar discussion in the node community here, where is the |
@ioquatix , I apologize. I think I wasn't clear about what I meant to communicate, which may have caused us to go in a circle. Yes, I agree with you that other headers could (probably) be unified to a single line. However, right now, applications have a choice - applications are allowed to indicate to the server that they wish to send multiple values either as a comma separated list or as a separate headers. Right now, servers also have a choice. They can concat However, I believe that this proposed change will break existing code for applications that chose to use I also believe that the Rack specification should allow developers to make these choices. |
+1, FWIW. To me Rack-the-spec is, hand-wavily, a fairly direct Ruby encoding of the HTTP protocol, and it feels a bit too low-level & abstract to be forcing this sort of opinion ("there are two different ways to send this, but the spec says they're the same, so you may only do this one") onto applications. The linked discussion around |
The implications of the current implementation are that we:
It's also sufficiently complicated that Puma recently got the implementation wrong when trying to fix the HTTP/1 header vulnerability.
At a high level, maybe, but the reality is a hash isn't sufficient to represent the headers. If you are creating a proxy, a hash will not allow you to transparently proxy the request. I draw your attention to https://tools.ietf.org/html/rfc7230#section-3.2.2 which states:
So what maps directly to the underlying protocol is:
Cases in (1) don't benefit from newline encoding, semantically there is no need, and (2) needs some more advanced encoding. So we impose the cost of a badly specified |
I agree. But this will also be the situation in the future - unless we are willing to introduce breaking changes. Without breaking changes, servers will always need to test strings (even strings nested within arrays) for a hidden In fact, for security reasons, servers might always need to test for |
Header values can’t contain control characters according to the RFCs. So putting newlines should be invalid, although HTTP/2 HPACK defines header values as a series of octets so it at least won’t break the underlying protocol. So, in a way, looking through the lens of HTTP/2, any string should be a valid value, even one containing newlines.
That's the bug which was discussed earlier, but it only affects HTTP/1 in practice.
Yes, the proposal here is to:
which is a breaking change. I want to actually test if common HTTP/2 servers allow octets or reject them. |
So, I threw together a quick test using HTTP/2:
This tries character codes 0..31 in a header value. Google only rejected That being said,
So, coming back to this point, headers according to HTTP/1 don't allow "\n" so it was a suitable delimiter, but in HTTP/2, it is not in violation of the spec, since HPACK explicitly states it encodes octets. Therefore, I think our usage of it, while it might have been okay at one point, is not fully conforming to the features of HTTP/2 - however in practice it might still be okay. |
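The HTTP/2 test mentioned above is not reproduced here. As a purely local sketch of the same idea, the following checks which of the character codes 0..31 a strict server-side validator might reject. Both the `FORBIDDEN_OCTETS` pattern and `valid_header_value?` are assumptions for illustration; in particular, allowing horizontal tab is a choice based on HTTP/1 field-value whitespace rules, not something any server named in this thread is known to do.

```ruby
# Assumption: reject C0 control characters and DEL, but allow horizontal tab,
# which HTTP/1 field values permit as whitespace.
FORBIDDEN_OCTETS = /[\x00-\x08\x0a-\x1f\x7f]/

# Hypothetical check (not part of Rack or any server mentioned here).
def valid_header_value?(value)
  !value.match?(FORBIDDEN_OCTETS)
end

# Try each character code 0..31 inside an otherwise harmless value.
rejected = (0..31).reject { |code| valid_header_value?("a#{code.chr}b") }
```

Under this policy "\n" is never a legal part of a value, which is what made it usable as a separator in the first place.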
I think that in practice, it should still be okay... also, I suspect '\n' and '\r' shouldn't be allowed even in HTTP/2. Although the HTTP/2 transport encoding allows any octet to be securely encoded, IMHO, the octet restrictions still apply, as they are inherited from HTTP/1 according to these paragraphs in the HTTP/2 RFC 7540:
Also, AFAIK, HTTP/2 requests and responses should be "translatable" back to HTTP/1, which means that a request containing octets that prevent it from being translated to HTTP/1 is malformed. Having said that, you raised a valid point that future versions of the Rack specification might want to switch to a different approach (one that might break existing middleware) 👍 |
All good points. The HPACK spec explicitly allows octets. The HTTP/2 RFC, as you state, suggests the semantics should be compatible with HTTP/1. So, yes, I don't think you are wrong, but I also don't think I'm wrong. That being said, encoding headers using "\n" to separate fields is still inefficient and superfluous in almost all headers which support normal comma concatenation. So, I still think that we should change this behaviour. Do you know how Node.JS handles this? |
The node.js API has a response object that offers the But node.js doesn't have to deal with backwards compatibility. Side-Note 🤔: It might not be a bad idea to transition to a node.js compatible approach. This could allow some code porting and trans-compilations to leverage existing code in Ruby (or allow Ruby developers to code node.js applications using Ruby).
I never disagreed. I think the Array is the better solution (it's what I use in the facil.io C framework)... my only concern is the end-result in a world that expects backwards compatibility. |
Well, according to the spec, the response headers should only respond to |
As a server implementor, I would much prefer if the Rack specification would not require me to use the Rack gem. Sure, applications might have Rack as a dependency - however, the server is currently blind to the Rack gem. The server only concerns itself with the details of the specification. |
Thanks everyone for the great discussion. I'm going to have a go at implementing this as I think it will be a major advantage for server performance and simplicity. I agree we don't want users to depend on any specific implementation of Rack for this functionality. |
Okay it was merged. |
A while ago I discussed with @tenderlove - we currently use "\n" characters to separate multiple headers. e.g.
Internally HeaderHash uses an Array I believe, e.g. something like:
Personally I like this formulation better, it's more efficient for the server implementation and it's easier from the user POV to deal with too. It seems like less of a hack, and it should minimise
So, I vote to deprecate/remove the "\n" representation and prefer the Array representation.
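For readers skimming the thread, a minimal sketch of the two representations and of how a server could write the Array form to an HTTP/1 connection. The cookie values and the `write_headers` helper are made up for illustration and are not Rack API.

```ruby
require "stringio"

# Current SPEC style: multiple values packed into one String with "\n".
newline_style = { "set-cookie" => "a=1\nb=2" }

# Proposed style: multiple values as an Array of Strings.
array_style = { "set-cookie" => ["a=1", "b=2"], "content-type" => "text/plain" }

# Hypothetical HTTP/1 writer for the Array style: one header line per value,
# with no need to scan each String for an embedded separator.
def write_headers(io, headers)
  headers.each do |name, value|
    Array(value).each { |v| io << "#{name}: #{v}\r\n" }
  end
  io
end

output = write_headers(StringIO.new, array_style).string
```

With the newline form, the server has to split every value before it can do the same thing.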
The only limitation is going to be if people start using it everywhere, e.g.
There are different solutions to this problem: e.g., we could define that only certain headers can work like this (notably set-cookie would make sense) or we could mark certain headers as NOT being allowed to be an array (e.g. content-length).
(Epic #1593)
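The two options could be sketched roughly as follows. The lists and method names are illustrative assumptions only; neither is taken from SPEC.

```ruby
# Assumption: these two lists are illustrative policy choices, not taken from SPEC.
ARRAY_ALLOWED   = ["set-cookie"].freeze
ARRAY_FORBIDDEN = ["content-length"].freeze

# Option 1 (hypothetical): only listed headers may carry an Array value.
def allowed_by_allowlist?(name, value)
  !value.is_a?(Array) || ARRAY_ALLOWED.include?(name.downcase)
end

# Option 2 (hypothetical): any header may carry an Array, except listed ones.
def allowed_by_denylist?(name, value)
  !(value.is_a?(Array) && ARRAY_FORBIDDEN.include?(name.downcase))
end
```

Plain String values pass under either option; the two only differ in how they treat Array values for headers that appear on neither list.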