Header spoofing and CGI header name encoding. #13
Comments
Perhaps we can have an unnormalised section in the environ - e.g. raw_headers[] which has the headers in whatever form the gateway received them, to permit that case? |
We could also break tradition and not use CGI header encoding at all. Is there a positive reason to keep CGI header encoding apart from tradition? |
+1. Header encoding is a hack that's nearly as old as Python. It's rife On Tue, Jan 13, 2015 at 8:36 AM, Chad Whitacre notifications@github.com
--Guido van Rossum (python.org/~guido) |
I propose a tuple (or list) of two-tuples of bytestrings for headers: ( (b'Host', b'www.example.com')
, (b'Accept', b'text/plain')
) Per #16 these should contain all the bytes that were received on the wire, in the order they were received. |
This sounds like the turf currently occupied by the various request objects. Are we intending to tackle "one true request object" with WSGI NG? In proposing a tuples-of-bytestrings data structure for headers, I was assuming that we would still have a request object ecosystem. |
Make it a list; everyone and their aunt wants to manipulate this so an On Tue, Jan 13, 2015 at 8:44 AM, Chad Whitacre notifications@github.com
--Guido van Rossum (python.org/~guido) |
It could be Request objects all the way down. :-) On Tue, Jan 13, 2015 at 8:48 AM, Chad Whitacre notifications@github.com
--Guido van Rossum (python.org/~guido) |
I'm trying to think of how to do this while remaining lossless (#16), and it's causing me to think about why I care about losslessness in the first place. Because at some point, yes: we have to jettison that information. Which point? If we try to preserve whitespace then we'd have something like: [(b'Foo', b' bar\n baz')] But to what end? If somebody cares about the whitespace in a header (they're researching usage of whitespace in HTTP headers? they're debugging some esoteric bug that I can't imagine?) then I suppose they could/would just drop below WSGI anyway. WSGI is for server and framework authors, not researchers. I guess that removing whitespace from tuples-of-bytestrings would result in: [(b'Foo', b'bar baz')] |
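To make the whitespace question concrete, here is a minimal sketch of how a server might turn the lossless form into the normalised one. The helper name `normalize_value` and the exact folding rule (a line break followed by spaces or tabs collapses to one space) are assumptions for illustration, not something the thread settled on.

```python
import re

# Hypothetical helper: matches a folded continuation, i.e. a line break
# followed by one or more spaces or tabs.
_FOLD = re.compile(rb"\r?\n[ \t]+")


def normalize_value(value: bytes) -> bytes:
    """Join folded continuation lines and trim surrounding whitespace."""
    unfolded = _FOLD.sub(b" ", value)
    return unfolded.strip(b" \t")


raw = [(b"Foo", b" bar\n baz")]
clean = [(name, normalize_value(value)) for name, value in raw]
print(clean)  # [(b'Foo', b'bar baz')]
```

Anything that cares about the original bytes would have to read them before this step, which is the point at which the information is jettisoned.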
I took a quick look over at Node-land for comparison: they have a core IncomingMessage prototype that gets passed to I guess the same is true in Python-land, we just call our base object
|
Also: a list of [named]tuples of bytestrings would be parallel to how current WSGI handles response headers. I think we should keep them symmetrical to reduce mental overhead. |
Frameworks generally implement headers objects as mapping types, and lists of tuples of bytestrings (LTBs?) are perfect for feeding to mapping types. |
We're in total agreement, right? On Tue, Jan 13, 2015 at 9:50 AM, Chad Whitacre notifications@github.com
--Guido van Rossum (python.org/~guido) |
Pretty close, anyway. :-) I had to change from hearing "request object" as "one true request" to "requests all the way down". I like "requests all the way down" (#17). As to the API of said WSGI NG request object, I see some convergence on a list of namedtuples for headers, though earlier you said, "Let's just use case-insensitive actual header names," which indicates the sort of mapping that I've been thinking is best implemented at the framework layer. On the other hand, to the extent that a given WSGI server wants to inspect request headers (not just parse them), it'd seem a shame to have multiple implementations of a case-insensitive header mapping (one at the server layer, another at the framework layer). Maybe the rubric for designing the base request object is: what request API do server authors need? Do existing server implementations inspect request headers? |
Is the Pope Catholic? :-) TBH I'm not sure I'm fully with the program where it comes to the On Tue, Jan 13, 2015 at 1:36 PM, Chad Whitacre notifications@github.com
--Guido van Rossum (python.org/~guido) |
Yes. Cursory inspection of a couple implementations indicates that it's mostly around body parsing, and I'm not sure mod_wsgi counts since it's in C, so it wouldn't be using a Python request API anyway (or would it?). https://github.com/benoitc/gunicorn/blob/cc1ddf16e285460c409fe381843cae87022d9644/gunicorn/http/message.py#L102-L108 https://github.com/GrahamDumpleton/mod_wsgi/blob/ba67c740005ff657311809affbc3d820e86e96e1/src/server/mod_wsgi.c#L6436-L6516 |
Servers: mod_wsgi, uwsgi, gunicorn, etc. It's not a logically necessary distinction, but it seems to be pretty evolutionarily stable.
I don't think middleware panned out as originally promised (an ecosystem of interchangeable components), but I think the chainability of WSGI has been a plus for scabbing together multiple frameworks, e.g. Is this chicken scratch readable at all? |
Continuation lines don't exist in HTTP/2. (Continuation frames are different). Since the point of the WSGI and WSGI-NG is to be the abstraction between servers and applications, including stuff like that would be harmful. OTOH the mangling is certainly annoying, and I agree that we can and should separate out the header fields. https://tools.ietf.org/html/rfc7230#section-3.2 - fieldnames are just a case insensitive token, for which the BNF is:
(delimiters are (DQUOTE and "(),/:;<=>?@[]{}").) HTTP_FOO is case insensitive (folded to uppercase), with the HTTP_ prefix added. But CGI goes further, as Graham notes, and that is where the issues arise IMO, and why some servers are filtering out valid headers. We have a choice here: we could take the minimum set that any server will accept, or we could take the RFC-specified set and accept that some servers will choose not to pass on that full set. The latter is better, I think: it preserves choice for later should we need it, at little cost. So the mangling has three aspects: case, non-alpha characters, and the HTTP_ prefix. If we move the header fields to a separate data structure (whether a subcomponent of environ, a new parameter, or whatever) we can drop the HTTP_ prefix without confusing anything. AFAIK there isn't a case-insensitive string type in Python, and I kind of think that adding one for this would be overkill. We then either force lots of 'header.lower() == "location"' comparisons into code, or we continue to do case normalisation. Because of this, I think we should continue to do case normalisation, to either upper or lower case. Non-alpha mangling, however, was only needed for passing via the process environment block (and that was limited to the variable names that can be put in there sanely), e.g. & in a name would be -bad-. As we're in-python we can drop that entirely. As far as the structure goes, I'm personally fond of a dict of lines: {'location': [b'foo', b'bar', b'quux']} We should talk about value encoding, but that's a separate issue to this. |
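As a rough illustration of the "dict of lines" idea, the sketch below groups a list of two-tuples of bytestrings into a dict keyed by lower-cased name, with every value a list. The function name `group_headers` and the choice to decode names as ASCII are assumptions made for this example only; value encoding is left untouched, as in the comment above.

```python
from collections import defaultdict


def group_headers(raw_headers):
    """Group (name, value) bytestring pairs by case-normalised name.

    Hypothetical helper: names are decoded as ASCII and lower-cased,
    values are kept as bytes in the order they were received.
    """
    grouped = defaultdict(list)
    for name, value in raw_headers:
        key = name.decode("ascii").lower()
        grouped[key].append(value)
    return dict(grouped)


raw_headers = [
    (b"Location", b"foo"),
    (b"location", b"bar"),
    (b"LOCATION", b"quux"),
    (b"Host", b"www.example.com"),
]
headers = group_headers(raw_headers)
print(headers["location"])  # [b'foo', b'bar', b'quux']
print(headers["host"])      # [b'www.example.com']
```

Note that a header seen once still maps to a one-element list, so callers never have to check whether they got a list or a bytestring.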
Ah, right, of course. Derp.
+1
Sure, +0, as long as the value is always a list instead of sometimes a list and sometimes a bytestring. ;-)
Reticketed as #18. |
As originally raised in:
the CGI header name encoding convention can leave a WSGI application open to header spoofing problems.
This will be a particular issue where a proxy sets special custom headers that relate to authentication or carry information about a client, for example the X-Forwarded-* headers.
The spoofing problem is because of the CGI rule around how header names are converted. That is:
So this means that X-Forwarded-For is translated to HTTP_X_FORWARDED_FOR. The problem is that if a client itself sends X_Forwarded_For, then it would also map to the same thing.
By the rules above, if a proxy set one and the client sent the other, the two values would be concatenated, usually separated by a comma. If you are attempting to block certain clients based on this header, then the value could be poisoned and cause problems for such a scheme.
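The collision can be reproduced with a small sketch. It assumes the conventional CGI mangling (upper-case the name, replace dashes with underscores, add the HTTP_ prefix) and comma joining of duplicates; `cgi_name` and `build_environ` are hypothetical helpers written for this example, not code from any particular server.

```python
def cgi_name(header_name: str) -> str:
    """Apply the conventional CGI mangling to a header name."""
    return "HTTP_" + header_name.upper().replace("-", "_")


def build_environ(headers):
    """Build a CGI-style environ dict, comma-joining colliding names."""
    environ = {}
    for name, value in headers:
        key = cgi_name(name)
        if key in environ:
            environ[key] += ", " + value
        else:
            environ[key] = value
    return environ


request_headers = [
    ("X_Forwarded_For", "10.0.0.1"),     # sent by the client
    ("X-Forwarded-For", "203.0.113.7"),  # added by the proxy
]
environ = build_environ(request_headers)
print(environ)  # {'HTTP_X_FORWARDED_FOR': '10.0.0.1, 203.0.113.7'}
```

The application sees a single HTTP_X_FORWARDED_FOR value and cannot tell which part came from the proxy.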
Apache 2.4, and possibly recent versions of nginx as well, will discard headers when translating to CGI names if they contain characters other than alphanumerics and dashes.
For Apache 2.4, mod_wsgi at least would therefore be protected against this issue, although the latest version of mod_wsgi also applies the same strategy under Apache 2.2 to avoid problems.
Various other WSGI servers do not currently enforce such restrictions and so would be affected.
An updated WSGI specification should perhaps say something specific about this issue and say that WSGI servers should avoid the problem by applying the same restriction.
Enforcing such a restriction may, however, break some WSGI applications that rely on custom headers whose names fall afoul of the restriction.
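A minimal sketch of the restriction, assuming a server that holds the request headers as a list of (name, value) string pairs before building environ. The name `drop_unsafe_headers` is hypothetical, and the pattern simply encodes "alphanumerics and dashes only" as described above; it is not taken from Apache or mod_wsgi source.

```python
import re

# Names made up solely of ASCII letters, digits and dashes are kept.
_SAFE_NAME = re.compile(r"[A-Za-z0-9-]+")


def drop_unsafe_headers(headers):
    """Discard headers whose names contain any other character."""
    return [(name, value) for name, value in headers
            if _SAFE_NAME.fullmatch(name)]


request_headers = [
    ("X_Forwarded_For", "10.0.0.1"),     # client-supplied, dropped
    ("X-Forwarded-For", "203.0.113.7"),  # proxy-supplied, kept
    ("Host", "www.example.com"),
]
print(drop_unsafe_headers(request_headers))
# [('X-Forwarded-For', '203.0.113.7'), ('Host', 'www.example.com')]
```

With this filter in front of the CGI translation, the underscore variant never reaches the application, at the cost of dropping any legitimate custom header that uses an underscore.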