Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

python3 fail on parsing http header #69725

Closed
ghost opened this issue Nov 3, 2015 · 12 comments
Closed

python3 fail on parsing http header #69725

ghost opened this issue Nov 3, 2015 · 12 comments
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@ghost
Copy link

ghost commented Nov 3, 2015

BPO 25539
Nosy @vstinner, @bitdancer, @vadmium

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2015-11-04.10:54:23.402>
created_at = <Date 2015-11-03.06:38:03.845>
labels = ['type-bug', 'library']
title = 'python3 fail on parsing http header'
updated_at = <Date 2015-11-04.15:19:30.537>
user = None

bugs.python.org fields:

activity = <Date 2015-11-04.15:19:30.537>
actor = 'r.david.murray'
assignee = 'none'
closed = True
closed_date = <Date 2015-11-04.10:54:23.402>
closer = '\xec\x8b\xa0\xeb\x8f\x99\xec\x9b\x90'
components = ['Library (Lib)']
creation = <Date 2015-11-03.06:38:03.845>
creator = '\xec\x8b\xa0\xeb\x8f\x99\xec\x9b\x90'
dependencies = []
files = []
hgrepos = []
issue_num = 25539
keywords = []
message_count = 12.0
messages = ['253966', '253971', '253977', '253978', '253979', '253980', '253983', '253984', '254037', '254048', '254049', '254056']
nosy_count = 4.0
nosy_names = ['vstinner', 'r.david.murray', 'martin.panter', '\xec\x8b\xa0\xeb\x8f\x99\xec\x9b\x90']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue25539'
versions = ['Python 3.4', 'Python 3.5', 'Python 3.6']

@ghost
Copy link
Author

ghost commented Nov 3, 2015

I tried to login some website using requests session, but it failed because of parsing header.
This is header data of problem. It contains two line of continuous CRLF between the data.

Date: Mon, 02 Nov 2015 08:45:48 GMT
Server: Apache/2.2.20 (Unix) mod_ssl/2.2.20 OpenSSL/0.9.8e-fips-rhel5 DAV/2 PHP/5.2.16 mod_fastcgi/2.4.6
X-Powered-By: PHP/5.2.16
P3P: CP="NOI CURa ADMa DEVa TAIa OUR DELa BUS IND PHY ONL UNI COM NAV INT DEM PRE"

P3P : CP="ALL CURa ADMa DEVa TAIa OUR BUS IND PHY ONL UNI PUR FIN COM NAV INT DEM CNT STA POL HEA PRE LOC OTC"
Set-Cookie: PHPSESSID=; path=/
Set-Cookie: PHPSESSID=
; path=/
Content-Length: 81
Connection: close
Content-Type: text/html

function parse_headers in http.client, parsing exits when it meet CRLF. (https://hg.python.org/cpython/file/tip/Lib/http/client.py#l197)
Is it standard? Same code works well in python2, or even similar code in ruby.

Repository owner added type-crash A hard crash of the interpreter, possibly with a core dump stdlib Python modules in the Lib dir labels Nov 3, 2015
@vadmium
Copy link
Member

vadmium commented Nov 3, 2015

Are you able to print out the repr() of the header or the entire HTTP response so we can see exactly what characters are there? Or provide a URL if it is a public server. I suspect it may not be a completely blank line, but may have whitespace there.

Both Python 2 and 3 should stop parsing the HTTP header when they meet a blank line (two CRLFs in a row). This marks the start of the HTTP body. See <https://tools.ietf.org/html/rfc7230#section-3\>.

@vadmium vadmium added type-bug An unexpected behavior, bug, or error and removed type-crash A hard crash of the interpreter, possibly with a core dump labels Nov 3, 2015
@ghost
Copy link
Author

ghost commented Nov 3, 2015

b'Date: Tue, 03 Nov 2015 10:05:42 GMT\nServer: Apache/2.2.20 (Unix) mod_ssl/2.2.20 OpenSSL/0.9.8e-fips-rhel5 DAV/2\n PHP/5.2.16 mod_fastcgi/2.4.6\nX-Powered-By: PHP/5.2.16\nP3P: CP="NOI CURa ADMa DEVa TAIa OUR DELa BUS IND PHY ONL UNI COM NAV INT DEM\n PRE"\n\nP3P : CP="ALL CURa ADMa DEVa TAIa OUR BUS IND PHY ONL UNI PUR FIN COM NAV INT DEM CNT STA POL HEA PRE LOC OTC"\nSet-Cookie: PHPSESSID=; path=/\nSet-Cookie: PHPSESSID=; path=/\nContent-Length: 79\nConnection: close\nContent-Type: text/html\n\n'

There's no other characters except CRLF.
And other weird result, in python2 there are no two CRLF between P3P:~.

Here's gist for test. (https://gist.github.com/littmus/9625a4436e1edfb3afe9)

@vstinner
Copy link
Member

vstinner commented Nov 3, 2015

It contains two line of continuous CRLF between the data.

It looks like a bug at server side. \n\n is the separator between headers and data. An header must not end with \n\n.

@ghost
Copy link
Author

ghost commented Nov 3, 2015

I think so but same http request works well in Python2

@ghost
Copy link
Author

ghost commented Nov 3, 2015

I had a mistake. That's not CRLF but just two '\n'.

@ghost
Copy link
Author

ghost commented Nov 3, 2015

I monkey patched the method parse_headers to print what the fp read and finally arrived at that http.client is not a problem.
email.parser.Parser.parsestr adds two '\n' between two lines of "P3P: ~~".

end: b'POST /bbs/login_check.php HTTP/1.1\r\nHost: www.koreapas.com\\r\\nAccept-Encoding: identity\r\nContent-Length: 31\r\nUser-Agent: Mozilla/5.0 (Linux; Android 4.2.2; GT-I9505 Build/JDQ39) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.59 Mobile Safari/537.36\r\nContent-Type: application/x-www-form-urlencoded\r\n\r\nuser_id=blahsdfi&password=qwera'
reply: 'HTTP/1.1 200 OK\r\n'
b'Date: Tue, 03 Nov 2015 11:23:16 GMT\r\n'
b'Server: Apache/2.2.20 (Unix) mod_ssl/2.2.20 OpenSSL/0.9.8e-fips-rhel5 DAV/2 PHP/5.2.16 mod_fastcgi/2.4.6\r\n'
b'X-Powered-By: PHP/5.2.16\r\n'
b'P3P: CP="NOI CURa ADMa DEVa TAIa OUR DELa BUS IND PHY ONL UNI COM NAV INT DEM PRE"\r\n'
b'P3P : CP="ALL CURa ADMa DEVa TAIa OUR BUS IND PHY ONL UNI PUR FIN COM NAV INT DEM CNT STA POL HEA PRE LOC OTC"\r\n'
b'Set-Cookie: PHPSESSID=e990184309013a2c97579c76a52e203a; path=/\r\n'
b'Content-Length: 1504\r\n'
b'Connection: close\r\n'
b'Content-Type: text/html\r\n'
b'\r\n'
########################## hstring
Date: Tue, 03 Nov 2015 11:23:16 GMT
Server: Apache/2.2.20 (Unix) mod_ssl/2.2.20 OpenSSL/0.9.8e-fips-rhel5 DAV/2 PHP/5.2.16 mod_fastcgi/2.4.6
X-Powered-By: PHP/5.2.16
P3P: CP="NOI CURa ADMa DEVa TAIa OUR DELa BUS IND PHY ONL UNI COM NAV INT DEM PRE"
P3P : CP="ALL CURa ADMa DEVa TAIa OUR BUS IND PHY ONL UNI PUR FIN COM NAV INT DEM CNT STA POL HEA PRE LOC OTC"
Set-Cookie: PHPSESSID=e990184309013a2c97579c76a52e203a; path=/
Content-Length: 1504
Connection: close
Content-Type: text/html

######################################
########################### parsed
b'Date: Tue, 03 Nov 2015 11:23:16 GMT\nServer: Apache/2.2.20 (Unix) mod_ssl/2.2.20 OpenSSL/0.9.8e-fips-rhel5 DAV/2\n PHP/5.2.16 mod_fastcgi/2.4.6\nX-Powered-By: PHP/5.2.16\nP3P: CP="NOI CURa ADMa DEVa TAIa OUR DELa BUS IND PHY ONL UNI COM NAV INT DEM\n PRE"\n\nP3P : CP="ALL CURa ADMa DEVa TAIa OUR BUS IND PHY ONL UNI PUR FIN COM NAV INT DEM CNT STA POL HEA PRE LOC OTC"\nSet-Cookie: PHPSESSID=e990184309013a2c97579c76a52e203a; path=/\nContent-Length: 1504\nConnection: close\nContent-Type: text/html\n\n'
#######################################

@ghost
Copy link
Author

ghost commented Nov 3, 2015

I found that Python3 no more uses header format from rfc822, and the data
b'P3P : CP="ALL CURa ADMa DEVa TAIa OUR BUS IND PHY ONL UNI PUR FIN COM NAV INT DEM CNT STA POL HEA PRE LOC OTC"\r\n'
not match with headerRE = re.compile(r'^(From |[\041-\071\073-\176]*:|[\t ])') so it makes problem.

@vadmium
Copy link
Member

vadmium commented Nov 4, 2015

Okay, now I understand the problem. There is a quirky header line with a space in the field name “P3P ”. The HTTP client’s parse_headers() treats this as any other header line, but then it passes the header to the email package, which interprets this line as being invalid and marking the end of the header. Therefore subsequent important header fields are missed, including Set-Cookie and Content-Length.

According to <https://tools.ietf.org/html/rfc7230#section-3.2\>, that P3P line is technically not valid. Here is a self-contained demo or test case:

from socket import socket
from threading import Thread
from http.client import HTTPConnection

def serve():
    [client, _] = server.accept()
    with client, client.makefile("rb") as reader:
        while reader.readline().rstrip(b"\r\n"):
            pass
        client.sendall(
            b"HTTP/1.1 200 OK\r\n"
            b"Content-Length: 0\r\n"
            b"Extra-Space : invalid\r\n"
            b"Set-Cookie: name=value\r\n"
            b"\r\n"
        )

with socket() as server:
    server.bind(("localhost", 0))
    server.listen()
    background = Thread(target=serve)
    background.start()
    http = HTTPConnection(*server.getsockname())
    http.request("GET", "/")
    response = http.getresponse()
    print(response.msg.items())  # Set-Cookie is missing
    http.close()
    background.join()

The question is, should Python go out of its way to handle this server bug? It would probably require implementing a more permissive version of the header parser in the HTTP client, rather than reusing the stricter “email” module’s parser.

@vadmium
Copy link
Member

vadmium commented Nov 4, 2015

Just noticed the whitespace scenario is mentioned at <https://tools.ietf.org/html/rfc7230#section-3.2.4\>:

'''
No whitespace is allowed between the header field-name and colon. In the past, differences in the handling of such whitespace have led to security vulnerabilities in . . . response handling. . . . A proxy must remove any such whitespace from a response message before forwarding the message downstream.
'''

It would not be possible build a proxy that does that using Python 3’s current HTTP client.

@ghost
Copy link
Author

ghost commented Nov 4, 2015

Yeah, this is server's fault and python does not have to deal with non-standard cases. I'll close this issue. Thanks!

Repository owner closed this as completed Nov 4, 2015
@bitdancer
Copy link
Member

Support for handling such headers could be added to the new email API (ie: add a policy setting to accept them), if someone wants to make a feature request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

4 participants