-
-
Notifications
You must be signed in to change notification settings - Fork 31.7k
HTTP header injection in urrlib2/urllib/httplib/http.client (CVE-2016-5699) #67117
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Proof of concept: # Script for Python 2
import urllib2
opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0' + chr(0x0A) + "Location: header injection")]
response = opener.open("http://localhost:9999") # Data sent is: """ # End of script # Python 3
from urllib.request import urlopen, build_opener
opener = build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0' + chr(0x0A) + "Location: header injection")]
opener.open("http://localhost:9999") # Data sent is: """ # End of script It is the responsibility of the developer leveraging Python and its HTTP client libraries to ensure that their (web) application acts in accordance to official HTTP specifications and that no threats to security will arise from their code. In bpo-17322 ( http://bugs.python.org/issue17322 ) there is some discussion as to the general compliance to RFC's by the HTTP client libraries. I'd like to opt to begin with prohibiting newline characters to be present in HTTP headers. Although this issue is not a "hard vulnerability" such as a buffer overflow, it does translate to a potentially equal level of severity when considered from the perspective of a web-enabled application, for which purpose the HTTP libraries are typically used for. Lack of input validation on the application developer's end will faciliate header injections, for example if user-supplied data will end up as cookie content verbatim. I'm inclined to add this validation to putheader() in the 'http' module rather than in urllib, as this will secure all invocations to 'http' regardless of intermediate libraries such as urllib. Included is a patch for the latest checkout of the default branch that will cause CannotSendHeader() to be raised if a newline character is detected in either a header name or its value. Aside from detecting "\n", it also breaks on "\r" as their respective implications can be similar. Feel free to adjust, rewrite and transpose this to other branches where you feel this is appropriate. Guido Vranken |
There could be potential for breaking compatibility if people are intentionally sending values with folded lines (obsoleted by the new HTTP RFC). Perhaps the same error should be raised for values that cannot be encoded in Latin-1? Also, maybe most control characters should trigger an error, not just CR and LF. Apart from line folding, the HTTP RFC only allows visible ASCII characters, spaces and tabs, and obsolete non-ASCII chars >= 0x80. |
Here's a patch addressing the potential vulnerability as reported. The patch should also bring the implementation up to date with the most recent standards around header names and values.
I think I'm okay with this given line folding seems to have been implemented by passing multiple value parameters (folding was automatically taken care of by the library). I don't think that this should be merged into anything pre 3.5 as safeguarding /should/ be accounted for by the developer, so I don't think I'd regard this as a high risk security issue. I'm definitely open to debate on that though. |
If we’re in the realm of 3.5 only changes, it might make sense to remove the multi-argument mode of putheader() altogether, and document it only generates a single line. (Currently still says it generates multiple lines.) |
I think that keeping the public API as-is is the better way to go, at |
Good point. Maybe join them with tabs rather than spaces then, since it was previously "\r\n\t". This way it is even closer to before. |
After thinking about this a little more, I think I'd prefer to keep putheader('Authorization', 'Bearer', 'my_token') than putheader('Authorization', 'Bearer my_token') I realize it's a semantic change from previous behavior, but it seems to |
But it is not natural to do things like this (based on headers sent by Firefox): putheader("User-Agent", "Mozilla/5.0", "(X11;", "Linux", "x86_64;", "rv:25.0)", "Gecko/20100101", "Firefox/25.0") A way to properly encode different kinds of header values would be nice, but I don’t think the current putheader() API is it. Also, I still think it would be good to allow non-ASCII characters in header values. At least when an already-encoded byte string is supplied. For instance, RTSP is based on HTTP but uses UTF-8 as a default encoding, rather than HTTP’s Latin-1. Otherwise, retaining the one_value.encode('latin-1') call is confusing when later on it rejects non-ASCII-encoded characters. |
Good point.
I’m a little torn on this one given one of the SHOULD clauses in RFC 7230 about recipients treating headers with non-ASCII characters as opaque data. However, I’ve read a number of occasions where users are using latin-1 in practice (and it /is/ only a SHOULD clause), so I think it’s likely better to err on the side of caution and allow for the latin-1 charset at least. As for utf-8 though, I think that once we start getting into the realm of other application protocols, that’s something that should have to be extended by the client implementation and not something that should be changed in the base HTTP implementation. The odd part of the API though now is the fact that it’s variadic. I really have no strong opinion on whether elements should be tab or space delimited and the RFC doesn’t seem to lean either way. I think I’m still leaning towards space delimiting to give users the ability to write in either form (putheader(‘Authorization’, ‘Bearer’, ‘token’) or putheader(‘Authorization’, ‘Bearer token’)). As another minor argument for it, it’s also likely a little nicer for logging. I think that optimally, the API would be a single value as you’d suggested, but I’d be concerned about the extent of backwards compatibility issues if that were to be done. I’ll try to get some time tomorrow to make those changes, so it still leaves time for further debate :) |
I’ve updated the patch to include the latin-1 charset in legal header values. It still uses a space as delimiter, but all other comments should now be addressed. |
Added comments on Rietveld. Is there a limit to the length of header line? Would not unfolding all header values exceed the limit? |
Folded header fields are deprecated as of RFC 7230; see <https://tools.ietf.org/html/rfc7230#section-3.2.4\>. The only reasons to fold them I can think of is for readability (debugging), when generating a messsage/http MIME message (which I don’t think the Python library supports), or maybe when dealing with a strange server limitation. Normally there is not meant to be a limit for lines in the HTTP header, although it is common to limit the total unfolded header field value. If we go ahead and drop folding support, perhaps we should deprecate the putheader() multi-argument mode, rather than just document the arguments are now joined by spaces. It seems a pointless API now with this change. |
May be not drop folding support, but just deprecate the putheader() multi-argument mode? And of course add sanity checks for separate putheader() arguments. |
After a chat with David and getting my head wrapped more around backwards compatibility, I also agree that the changes in the patch are far too strict. It's much more important to preserve backwards compatibility than to strictly conform to the RFC. I've updated the patch to allow for (what should be) anything that was previously allowed as header name/value pairs minus carriage returns not immediately followed by a tab or space (obs-fold: CRLF 1*( SP / HTAB )). This change fixes the reported issue but should not otherwise affect backwards compatibility. Additionally, even though line folding is deprecated by RFC 7230, I don't think it's necessarily an issue to support line folding until proven to be a problem in practice. With the current implementation, users have the ability to conform to the target server/proxy requirements, based on errors (if obs-fold isn't transparently dealt with as suggested) yielded by each as defined in the RFC. In light of that, I don't think that it's even worthwhile to start deprecating multi-parameter putheader at this point (but I'm open to argument on that one). One note on the deprecation is that if we deprecate multi-parameter, we should also add a warning if an embedded line fold is detected in a single headers value. |
Latest patch should now address all comments. |
I much prefer the new patch with better compatibility and flexibility :) If you want to strengthen the tests to reflect some of the decisions made here you could add the following tests: Positive tests:
Negative tests:
|
I'll return previous implementation of header value regex. All other LGTM. |
New changeset 1c45047c5102 by Serhiy Storchaka in branch '2.7': New changeset bf3e1c9b80e9 by Serhiy Storchaka in branch '3.4': New changeset aa4c6992c126 by Serhiy Storchaka in branch 'default': |
Added new tests and tweaked regexpes. Thank you for your contribution Demian. Now we can start long-standing deprecation process for conforming to RFC. |
Thanks for the tweaks Serhiy, those seem reasonable to me. |
Doesn't this affect Python 3.3 as well, which is in security-only mode? Shouldn't that version be patched as well? |
3.3 is supported for security related fixes until September 2017 [1], but only 3.4, 3.5 and 2.7 have received the backport, reopen for outstanding merge [1] https://docs.python.org/devguide/#status-of-python-branches Update summary to reflect the RedHat CVE that was assigned to this issue. |
Getting to be the last chance to backport this for 3.3.x. |
Oh crap, I didn't know that you already created a PR. I compared the two PR:
The \A was copied from the Python 2.7 commit, since Python 3.3 doesn't have fullmatch(). If you are ok to add \A and convert the NEWS entry to blurb, I can abandon my duplicated PR. |
\A is not needed. match() always matches from the start. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: