python3 fail on parsing http header #69725
Comments
I tried to log in to a website using a requests session, but it failed because of how the headers were parsed.
Date: Mon, 02 Nov 2015 08:45:48 GMT
P3P : CP="ALL CURa ADMa DEVa TAIa OUR BUS IND PHY ONL UNI PUR FIN COM NAV INT DEM CNT STA POL HEA PRE LOC OTC"
In the parse_headers function in http.client, parsing stops when it meets a CRLF. (https://hg.python.org/cpython/file/tip/Lib/http/client.py#l197)
Are you able to print out the repr() of the header or the entire HTTP response so we can see exactly what characters are there? Or provide a URL if it is a public server. I suspect it may not be a completely blank line, but may have whitespace there. Both Python 2 and 3 should stop parsing the HTTP header when they meet a blank line (two CRLFs in a row). This marks the start of the HTTP body. See <https://tools.ietf.org/html/rfc7230#section-3>.
b'Date: Tue, 03 Nov 2015 10:05:42 GMT\nServer: Apache/2.2.20 (Unix) mod_ssl/2.2.20 OpenSSL/0.9.8e-fips-rhel5 DAV/2\n PHP/5.2.16 mod_fastcgi/2.4.6\nX-Powered-By: PHP/5.2.16\nP3P: CP="NOI CURa ADMa DEVa TAIa OUR DELa BUS IND PHY ONL UNI COM NAV INT DEM\n PRE"\n\nP3P : CP="ALL CURa ADMa DEVa TAIa OUR BUS IND PHY ONL UNI PUR FIN COM NAV INT DEM CNT STA POL HEA PRE LOC OTC"\nSet-Cookie: PHPSESSID=; path=/\nSet-Cookie: PHPSESSID=; path=/\nContent-Length: 79\nConnection: close\nContent-Type: text/html\n\n'
There are no other characters except CRLF. Here's a gist for testing. (https://gist.github.com/littmus/9625a4436e1edfb3afe9)
It looks like a bug on the server side. \n\n is the separator between headers and data. A header must not end with \n\n.
I think so, but the same HTTP request works well in Python 2.
I made a mistake. That's not CRLF but just two '\n'.
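A minimal sketch of why a doubled '\n' hides the later fields, assuming a shortened copy of the bytes posted above (the sample is trimmed for illustration and is not the full server response). The standard library's `email` parser treats the first blank line as the end of the header block, so everything after it ends up in the body:

```python
from email.parser import BytesParser

# Shortened, illustrative copy of the response headers quoted above.
raw = (
    b"Date: Tue, 03 Nov 2015 10:05:42 GMT\n"
    b"X-Powered-By: PHP/5.2.16\n"
    b'P3P: CP="NOI CURa ADMa"\n'
    b"\n"  # blank line: the parser stops reading headers here
    b'P3P : CP="ALL CURa ADMa"\n'
    b"Set-Cookie: PHPSESSID=; path=/\n"
    b"Content-Length: 79\n"
    b"\n"
)

msg = BytesParser().parsebytes(raw)
header_names = msg.keys()   # only the fields before the blank line
body = msg.get_payload()    # the remaining "headers" are treated as body text

print(header_names)
print("Set-Cookie" in msg)
print(body)
```

Here Set-Cookie and Content-Length are not lost bytes; they are simply classified as body content rather than header fields.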
I monkey patched the method end: b'POST /bbs/login_check.php HTTP/1.1\r\nHost: www.koreapas.com\\r\\nAccept-Encoding: identity\r\nContent-Length: 31\r\nUser-Agent: Mozilla/5.0 (Linux; Android 4.2.2; GT-I9505 Build/JDQ39) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.59 Mobile Safari/537.36\r\nContent-Type: application/x-www-form-urlencoded\r\n\r\nuser_id=blahsdfi&password=qwera' ###################################### |
I found that Python 3 no longer uses the header format from rfc822, and the data
Okay, now I understand the problem. There is a quirky header line with a space in the field name “P3P ”. The HTTP client’s parse_headers() treats this as any other header line, but then it passes the header to the email package, which interprets this line as being invalid and marking the end of the header. Therefore subsequent important header fields are missed, including Set-Cookie and Content-Length. According to <https://tools.ietf.org/html/rfc7230#section-3.2>, that P3P line is technically not valid. Here is a self-contained demo or test case:

from socket import socket
from threading import Thread
from http.client import HTTPConnection

def serve():
    [client, _] = server.accept()
    with client, client.makefile("rb") as reader:
        while reader.readline().rstrip(b"\r\n"):
            pass
        client.sendall(
            b"HTTP/1.1 200 OK\r\n"
            b"Content-Length: 0\r\n"
            b"Extra-Space : invalid\r\n"
            b"Set-Cookie: name=value\r\n"
            b"\r\n"
        )

with socket() as server:
    server.bind(("localhost", 0))
    server.listen()
    background = Thread(target=serve)
    background.start()
    http = HTTPConnection(*server.getsockname())
    http.request("GET", "/")
    response = http.getresponse()
    print(response.msg.items())  # Set-Cookie is missing
    http.close()
    background.join()

The question is, should Python go out of its way to handle this server bug? It would probably require implementing a more permissive version of the header parser in the HTTP client, rather than reusing the stricter “email” module’s parser.
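As a rough idea of what a more permissive parser might look like, here is a sketch under stated assumptions. `parse_headers_lenient` is a hypothetical helper, not part of http.client or the email package; it tolerates whitespace before the colon by stripping the field name, and it joins folded continuation lines onto the previous field:

```python
import io

def parse_headers_lenient(fp):
    """Hypothetical helper: read header fields from a binary file object.

    Stops at the first blank line. Whitespace around the field name is
    stripped, so "Extra-Space : invalid" is accepted instead of ending
    the header block.
    """
    headers = []
    while True:
        line = fp.readline()
        if line in (b"\r\n", b"\n", b""):
            break  # blank line or EOF ends the header block
        text = line.decode("iso-8859-1").rstrip("\r\n")
        if text[:1] in (" ", "\t") and headers:
            # Folded continuation line: append to the previous field value.
            name, value = headers[-1]
            headers[-1] = (name, value + " " + text.strip())
            continue
        name, sep, value = text.partition(":")
        if not sep:
            continue  # no colon at all: skip the line rather than stop
        headers.append((name.strip(), value.strip()))
    return headers

# Same header lines as the demo server sends, without the status line.
raw = io.BytesIO(
    b"Content-Length: 0\r\n"
    b"Extra-Space : invalid\r\n"
    b"Set-Cookie: name=value\r\n"
    b"\r\n"
)
headers = parse_headers_lenient(raw)
print(headers)
```

This only illustrates the permissive behaviour being discussed; a real implementation would also need the size limits and validation that http.client already applies.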
Just noticed the whitespace scenario is mentioned at <https://tools.ietf.org/html/rfc7230#section-3.2.4>: ''' It would not be possible to build a proxy that does that using Python 3’s current HTTP client.
Yeah, this is the server's fault, and Python does not have to deal with non-standard cases. I'll close this issue. Thanks!
Support for handling such headers could be added to the new email API (i.e., add a policy setting to accept them), if someone wants to make a feature request.