py34 test failure: KeyError: 'Authorization' error in TestSession.test_session_unicode #282

Closed
msabramo opened this issue Dec 1, 2014 · 7 comments

Comments

@msabramo
Contributor

msabramo commented Dec 1, 2014

I can reproduce the test_session_unicode failure consistently by explicitly passing a --hashseed to tox:

❯ tox -e py34 --hashseed=1811760512 -- tests/test_sessions.py -k test_session_unicode
GLOB sdist-make: /Users/marca/dev/git-repos/httpie/setup.py
py34 inst-nodeps: /Users/marca/dev/git-repos/httpie/.tox/dist/httpie-0.9.0-dev.zip
py34 runtests: PYTHONHASHSEED='1811760512'
py34 runtests: commands[0] | py.test --verbose --doctest-modules --basetemp=/Users/marca/dev/git-repos/httpie/.tox/py34/tmp tests/test_sessions.py -k test_session_unicode
============================================================================= test session starts ==============================================================================
platform darwin -- Python 3.4.0 -- py-1.4.26 -- pytest-2.6.4 -- /Users/marca/dev/git-repos/httpie/.tox/py34/bin/python3.4
plugins: httpbin
collected 6 items

tests/test_sessions.py::TestSession::test_session_unicode FAILED

=================================================================================== FAILURES ===================================================================================
_______________________________________________________________________ TestSession.test_session_unicode _______________________________________________________________________
Traceback (most recent call last):
  File "/Users/marca/dev/git-repos/httpie/tests/test_sessions.py", line 151, in test_session_unicode
    assert (r2.json['headers']['Authorization']
KeyError: 'Authorization'
----------------------------------------------------------------------------- Captured stderr call -----------------------------------------------------------------------------
127.0.0.1 - - [29/Nov/2014 11:40:28] "GET /get HTTP/1.1" 200 301
127.0.0.1 - - [29/Nov/2014 11:40:28] "GET /get HTTP/1.1" 200 301
================================================================ 5 tests deselected by '-ktest_session_unicode' ================================================================
============================================================== 1 failed, 5 deselected, 1 warnings in 0.67 seconds ==============================================================
ERROR: InvocationError: '/Users/marca/dev/git-repos/httpie/.tox/py34/bin/py.test --verbose --doctest-modules --basetemp=/Users/marca/dev/git-repos/httpie/.tox/py34/tmp tests/test_sessions.py -k test_session_unicode'
___________________________________________________________________________________ summary ____________________________________________________________________________________
ERROR:   py34: commands failed

From investigation in #278, I've determined that this happens because Python 3.4's HTTP header parsing chokes on the Test header. I think that this is because the Test header contains UTF-8 data, which is not properly encoded.

> /Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/wsgiref/simple_server.py(104)get_environ()
-> for k, v in self.headers.items():
(Pdb) str(self.headers)
'Host: 127.0.0.1:61463\nUser-Agent: HTTPie/0.9.0-dev\nAccept: */*\nTest: =?utf-8?b?W29uZSBsaW5lIG9mIFVURjgtZW5jb2RlZCB1bmljb2RlIHRleHRdIMOPwofDj8KBw48=?=\n\nÏ\x83αÏ\x86ὶ 太é\x99½ à¹\x80ลิศ â\x99\x9câ\x99\x9eâ\x99\x9dâ\x99\x9bâ\x99\x9aâ\x99\x9dâ\x99\x9eâ\x99\x9c оживлÑ\x91ннÑ\x8bм तानà¥\x8dयहानि æ\x9c\x89æ\x9c\x8b ஸà¯\x8dà®±à¯\x80னிவாஸ Ù±Ù\x84رÙ\x8eÙ\x91Ø\xadÙ\x92Ù\x85\nÙ\x80Ù\x8eبÙ\x86Ù\x90\nAccept-Encoding: gzip, deflate\nConnection: keep-alive\nAuthorization: Basic dGVzdDpbb25lIGxpbmUgb2YgVVRGOC1lbmNvZGVkIHVuaWNvZGUgdGV4dF0gz4fPgc+Fz4POsc+G4b22IOWkqumZvSDguYDguKXguLTguKgg4pmc4pme4pmd4pmb4pma4pmd4pme4pmcINC+0LbQuNCy0LvRkdC90L3Ri9C8IOCkpOCkvuCkqOCljeCkr+CkueCkvuCkqOCkvyDmnInmnIsg4K644K+N4K6x4K+A4K6p4K6/4K614K6+4K64INmx2YTYsdmO2ZHYrdmS2YXZgNmO2KjZhtmQ\n\n'
(Pdb) self.headers.items()
[('Host', '127.0.0.1:61463'),
 ('User-Agent', 'HTTPie/0.9.0-dev'),
 ('Accept', '*/*'),
 ('Test', '[one line of UTF8-encoded unicode text] Ï\x87Ï\x81Ï\x85')]

Note that you can see the Authorization header in the output of str(self.headers), but it's not showing up in self.headers.items(). And the Test header is severely truncated.

I am suspicious of the Test header:

('Test', '[one line of UTF8-encoded unicode text] Ï\x87Ï\x81Ï\x85')]

That Test header is the last one that shows up in self.headers.items(); no header that occurs after it appears -- e.g.: Accept-Encoding, Connection, Authorization

Also, the value is very short, so I suspect that parsing is failing midway through and messing up the processing of all subsequent headers.

There's even a "defect" recorded. The email parser mentions in its comments that it doesn't throw exceptions, it records defects instead.

> /Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/wsgiref/simple_server.py(104)get_environ()
-> for k, v in self.headers.items():
(Pdb) self.headers
<http.client.HTTPMessage object at 0x106612668>
(Pdb) self.headers.defects
[MissingHeaderBodySeparatorDefect()]
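
To make the "defect instead of exception" behavior concrete, here is a minimal sketch. The header block and the stray line are made up for illustration (they are not the real request from the failing test), and it assumes the default compat32 policy, which records defects rather than raising:

```python
import email.errors
import email.parser
import http.client

# Hypothetical header block: the third line is not a valid "Name: value" header.
raw = (
    'Host: example.invalid\r\n'
    'Test: first part\r\n'
    'stray text that is not a header\r\n'
    'Authorization: Basic abc\r\n'
    '\r\n'
)

msg = email.parser.Parser(_class=http.client.HTTPMessage).parsestr(raw)

# Header parsing stops at the stray line; no exception is raised.
print(msg.items())
# The problem is recorded on the message object instead.
print(msg.defects)
# Everything after the stray line is treated as body, so this header is gone.
print(msg['Authorization'])
```

Everything after the first unparseable line ends up in the payload, which would match the symptom of Accept-Encoding, Connection and Authorization disappearing from items().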

The root cause seems to be that the code in email/feedparser.py chokes on the unicode headers. It happens only sometimes because the Python dict that stores the request headers is unordered. If the Authorization header is serialized after Test (as happens when you pass --hashseed=1811760512), it doesn't get parsed correctly on the server side and is therefore missing from httpbin's response.
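
A rough sketch of why pinning the hash seed makes the ordering reproducible. On current Python versions dicts keep insertion order, so this uses a set of header names as a stand-in for the hash-ordered dict of Python 3.4; the helper name header_order is made up for this example:

```python
import os
import subprocess
import sys

# Printed in a child process so that PYTHONHASHSEED takes effect at startup.
SNIPPET = "print(list({'Test', 'Authorization', 'Accept-Encoding', 'Connection'}))"


def header_order(seed):
    """Return the iteration order of the header-name set under a given hash seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, '-c', SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == '__main__':
    # The same seed always gives the same order; different seeds may differ.
    print(header_order(1811760512))
    print(header_order(1))
```

The same seed should always produce the same order, which is why --hashseed turns an intermittent failure into a consistent one.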

@msabramo
Contributor Author

msabramo commented Dec 1, 2014

Here is what gets received, just before parsing:

> /Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/http/client.py(272)parse_headers()-><http.client....t 0x105e0f6a0>
-> return email.parser.Parser(_class=_class).parsestr(hstring)
(Pdb) hstring
'Host: 127.0.0.1:63531\r\nUser-Agent: HTTPie/0.9.0-dev\r\nAccept: */*\r\nTest: [one line of 
UTF8-encoded unicode text] Ï\x87Ï\x81Ï\x85Ï\x83αÏ\x86ὶ 太é\x99½ à¹\x80ลิศ â\x99\x9câ
\x99\x9eâ\x99\x9dâ\x99\x9bâ\x99\x9aâ\x99\x9dâ\x99\x9eâ\x99\x9c оживлÑ\x91ннÑ
\x8bм तानà¥\x8dयहानि æ\x9c\x89æ\x9c\x8b ஸà¯\x8dà®±à¯
\x80னிவாஸ Ù±Ù\x84رÙ\x8eÙ\x91Ø\xadÙ\x92Ù\x85Ù\x80Ù\x8eبÙ\x86Ù
\x90\r\nAccept-Encoding: gzip, deflate\r\nConnection: keep-alive\r\nAuthorization: Basic
 dGVzdDpbb25lIGxpbmUgb2YgVVRGOC1lbmNvZGVkIHVuaWNvZGUgdGV4dF0gz4fPgc+Fz4POsc+
G4b22IOWkqumZvSDguYDguKXguLTguKgg4pmc4pme4pmd4pmb4pma4pmd4pme4pmcINC+0LbQu
NCy0LvRkdC90L3Ri9C8IOCkpOCkvuCkqOCljeCkr+CkueCkvuCkqOCkvyDmnInmnIsg4K644K+N4K6x
4K+A4K6p4K6/4K614K6+4K64INmx2YTYsdmO2ZHYrdmS2YXZgNmO2KjZhtmQ\r\n\r\n'
(Pdb) hstring.encode('unicode_escape')
b'Host: 127.0.0.1:63531\\r\\nUser-Agent: HTTPie/0.9.0-dev\\r\\nAccept: */*\\r\\nTest: [one line of 
UTF8-encoded unicode text] \\xcf\\x87\\xcf\\x81\\xcf\\x85\\xcf\\x83\\xce\\xb1\\xcf\\x86\\xe1\\xbd\\xb6
 \\xe5\\xa4\\xaa\\xe9\\x99\\xbd \\xe0\\xb9\\x80\\xe0\\xb8\\xa5\\xe0\\xb8\\xb4\\xe0\\xb8\\xa8 
\\xe2\\x99\\x9c\\xe2\\x99\\x9e\\xe2\\x99\\x9d\\xe2\\x99\\x9b\\xe2\\x99\\x9a\\xe2\\x99\\x9d\\xe2\\x99
\\x9e\\xe2\\x99\\x9c \\xd0\\xbe\\xd0\\xb6\\xd0\\xb8\\xd0\\xb2\\xd0\\xbb\\xd1\\x91\\xd0\\xbd\\xd0\\xbd\\xd1\\x8b\\xd0\\xbc \\xe0\\xa4\\xa4\\xe0\\xa4\\xbe\\xe0\\xa4\\xa8\\xe0\\xa5\\x8d\\xe0\\xa4\\xaf\\xe0\\xa4\\xb9\\xe0\\xa4\\xbe\\xe0\\xa4\\xa8\\xe0\\xa4\\xbf \\xe6\\x9c\\x89\\xe6\\x9c\\x8b \\xe0\\xae\\xb8\\xe0\\xaf\\x8d\\xe0\\xae\\xb1\\xe0\\xaf\\x80\\xe0\\xae\\xa9\\xe0\\xae\\xbf\\xe0\\xae\\xb5\\xe0\\xae\\xbe\\xe0\\xae\\xb8 \\xd9\\xb1\\xd9\\x84\\xd8\\xb1\\xd9\\x8e\\xd9\\x91\\xd8\\xad\\xd9\\x92\\xd9\\x85\\xd9\\x80\\xd9\\x8e\\xd8\\xa8\\xd9\\x86\\xd9\\x90\\r\\nAccept-Encoding: gzip, deflate\\r\\nConnection: keep-alive\\r\\nAuthorization: Basic dGVzdDpbb25lIGxpbmUgb2YgVVRGOC1lbmNvZGVkIHVuaWNvZGUgdGV4dF0gz4fPgc+Fz4POsc+G
4b22IOWkqumZvSDguYDguKXguLTguKgg4pmc4pme4pmd4pmb4pma4pmd4pme4pmcINC+0LbQuN
Cy0LvRkdC90L3Ri9C8IOCkpOCkvuCkqOCljeCkr+CkueCkvuCkqOCkvyDmnInmnIsg4K644K+N4K6x4
K+A4K6p4K6/4K614K6+4K64INmx2YTYsdmO2ZHYrdmS2YXZgNmO2KjZhtmQ\\r\\n\\r\\n'

At a glance it doesn't look like RFC 2047. It looks like straight UTF-8:

In [25]: b'Test: [one line of UTF8-encoded unicode text] \xcf\x87\xcf\x81\xcf\x85\xcf\x83\xce\xb1\xcf\x86\xe1\xbd\xb6 \xe5\xa4\xaa\xe9\x99\xbd \xe0\xb9\x80\xe0\xb8\xa5\xe0\xb8\xb4\xe0\xb8\xa8 \xe2\x99\x9c\xe2\x99\x9e\xe2\x99\x9d\xe2\x99\x9b\xe2\x99\x9a\xe2\x99\x9d\xe2\x99\x9e\xe2\x99\x9c \xd0\xbe\xd0\xb6\xd0\xb8\xd0\xb2\xd0\xbb\xd1\x91\xd0\xbd\xd0\xbd\xd1\x8b\xd0\xbc \xe0\xa4\xa4\xe0\xa4\xbe\xe0\xa4\xa8\xe0\xa5\x8d\xe0\xa4\xaf\xe0\xa4\xb9\xe0\xa4\xbe\xe0\xa4\xa8\xe0\xa4\xbf \xe6\x9c\x89\xe6\x9c\x8b \xe0\xae\xb8\xe0\xaf\x8d\xe0\xae\xb1\xe0\xaf\x80\xe0\xae\xa9\xe0\xae\xbf\xe0\xae\xb5\xe0\xae\xbe\xe0\xae\xb8 \xd9\xb1\xd9\x84\xd8\xb1\xd9\x8e\xd9\x91\xd8\xad\xd9\x92\xd9\x85\xd9\x80\xd9\x8e\xd8\xa8\xd9\x86\xd9\x90'.decode('utf-8')
Out[25]: 'Test: [one line of UTF8-encoded unicode text] χρυσαφὶ 太陽 เลิศ ♜♞♝♛♚♝♞♜ оживлённым तान्यहानि 有朋 ஸ்றீனிவாஸ ٱلرَّحْمـَبنِ'

That seems incorrect.
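
For anyone puzzled by the 'Ï\x87Ï\x81Ï\x85' noise in the pdb output: as far as I can tell, http.client decodes the raw header bytes as ISO-8859-1, so UTF-8 bytes show up as mojibake. A small sketch using just the first three characters of the test string:

```python
original = 'χρυ'

# What goes on the wire if the header value is sent as raw UTF-8.
wire_bytes = original.encode('utf-8')

# What the receiving side sees after decoding those bytes as ISO-8859-1.
as_seen = wire_bytes.decode('iso-8859-1')

# The original text is still recoverable by reversing the two steps.
recovered = as_seen.encode('iso-8859-1').decode('utf-8')

print(repr(wire_bytes))
print(repr(as_seen))
print(recovered)
```

So the garbled value in hstring is the same UTF-8 data, only viewed through the wrong codec.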

@msabramo
Contributor Author

msabramo commented Dec 1, 2014

Reproducing the core problem very simply in an IPython session:

In [44]: import email.parser, http.client

In [45]: hstring = 'Host: 127.0.0.1:63531\r\nUser-Agent: HTTPie/0.9.0-dev\r\nAccept: */*\r\nTest: [one line of UTF8-encoded unicode text] Ï\x87Ï\x81Ï\x85Ï\x83αÏ\x86ὶ 太é\x99½ à¹\x80ลิศ â\x99\x9câ \x99\x9eâ\x99\x9dâ\x99\x9bâ\x99\x9aâ\x99\x9dâ\x99\x9eâ\x99\x9c оживлÑ\x91ннÑ\x8bм तानà¥\x8dयहानि æ\x9c\x89æ\x9c\x8b ஸà¯\x8dà®±à¯\x80னிவாஸ Ù±Ù\x84رÙ\x8eÙ\x91Ø\xadÙ\x92Ù\x85Ù\x80Ù\x8eبÙ\x86Ù\x90\r\nAccept-Encoding: gzip, deflate\r\nConnection: keep-alive\r\nAuthorization: Basic dGVzdDpbb25lIGxpbmUgb2YgVVRGOC1lbmNvZGVkIHVuaWNvZGUgdGV4dF0gz4fPgc+Fz4POsc+G4b22IOWkqumZvSDguYDguKXguLTguKgg4pmc4pme4pmd4pmb4pma4pmd4pme4pmcINC+0LbQuNCy0LvRkdC90L3Ri9C8IOCkpOCkvuCkqOCljeCkr+CkueCkvuCkqOCkvyDmnInmnIsg4K644K+N4K6x4K+A4K6p4K6/4K614K6+4K64INmx2YTYsdmO2ZHYrdmS2YXZgNmO2KjZhtmQ\r\n\r\n'

In [46]: hm = email.parser.Parser(_class=http.client.HTTPMessage).parsestr(hstring)

In [47]: str(hm)
Out[47]: 'Host: 127.0.0.1:63531\nUser-Agent: HTTPie/0.9.0-dev\nAccept: */*\nTest: =?utf-8?b?W29uZSBsaW5lIG9mIFVURjgtZW5jb2RlZCB1bmljb2RlIHRleHRdIMOPwofDj8KBw48=?=\n\nÏ\x83αÏ\x86ὶ 太é\x99½ à¹\x80ลิศ â\x99\x9câ \x99\x9eâ\x99\x9dâ\x99\x9bâ\x99\x9aâ\x99\x9dâ\x99\x9eâ\x99\x9c оживлÑ\x91ннÑ\x8bм तानà¥\x8dयहानि æ\x9c\x89æ\x9c\x8b ஸà¯\x8dà®±à¯\x80னிவாஸ Ù±Ù\x84رÙ\x8eÙ\x91Ø\xadÙ\x92Ù\x85\nÙ\x80Ù\x8eبÙ\x86Ù\x90\nAccept-Encoding: gzip, deflate\nConnection: keep-alive\nAuthorization: Basic dGVzdDpbb25lIGxpbmUgb2YgVVRGOC1lbmNvZGVkIHVuaWNvZGUgdGV4dF0gz4fPgc+Fz4POsc+G4b22IOWkqumZvSDguYDguKXguLTguKgg4pmc4pme4pmd4pmb4pma4pmd4pme4pmcINC+0LbQuNCy0LvRkdC90L3Ri9C8IOCkpOCkvuCkqOCljeCkr+CkueCkvuCkqOCkvyDmnInmnIsg4K644K+N4K6x4K+A4K6p4K6/4K614K6+4K64INmx2YTYsdmO2ZHYrdmS2YXZgNmO2KjZhtmQ\n\n'

In [48]: hm.items()
Out[48]:
[('Host', '127.0.0.1:63531'),
 ('User-Agent', 'HTTPie/0.9.0-dev'),
 ('Accept', '*/*'),
 ('Test', '[one line of UTF8-encoded unicode text] Ï\x87Ï\x81Ï\x85')]

In [49]: hm.defects
Out[49]: [email.errors.MissingHeaderBodySeparatorDefect()]

Perhaps most interesting is that midway through the value of str(hm), in the middle of the value for the Test header, there is a double newline -- \n\n. I could imagine this could cause the parser to choke.

In [82]: str(hm)[146:151]
Out[82]: '=?=\n\n'
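
One observation that might explain where the break lands (a guess, not something confirmed in this thread): the Test value is cut right after '\x85', and U+0085 (NEL) is one of the characters that str.splitlines() treats as a line boundary, while bytes.splitlines() does not. Whether the parser actually splits the decoded text this way is an assumption here. A minimal sketch with the first word of the test string:

```python
# First word of the Test header value, as raw UTF-8 bytes.
raw_bytes = 'χρυσαφὶ'.encode('utf-8')

# The same data after being decoded as ISO-8859-1.
as_text = raw_bytes.decode('iso-8859-1')

# str.splitlines() breaks after '\x85'; keepends=True keeps the separator.
lines = as_text.splitlines(keepends=True)
first_line = lines[0]

print(repr(first_line))
print(len(lines))
# bytes.splitlines() only splits on \r and \n, so the bytes stay in one piece.
print(len(raw_bytes.splitlines()))
```

The first piece is exactly the truncated value that shows up in items().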

@msabramo
Contributor Author

msabramo commented Dec 1, 2014

Strangely, if I manually construct the header, things seem to work better:

In [63]: hm2 = http.client.HTTPMessage()

In [64]: hm2.add_header('Test', '[one line of UTF8-encoded unicode text] Ï\x87Ï\x81Ï\x85Ï\x83αÏ\x86ὶ 太é\x99½ à¹\x80ลิศ â\x99\x9câ \x99\x9eâ\x99\x9dâ\x99\x9bâ\x99\x9aâ\x99\x9dâ\x99\x9eâ\x99\x9c оживлÑ\x91ннÑ\x8bм तानà¥\x8dयहानि æ\x9c\x89æ\x9c\x8b ஸà¯\x8dà®±à¯\x80னிவாஸ Ù±Ù\x84رÙ\x8eÙ\x91Ø\xadÙ\x92Ù\x85Ù\x80Ù\x8eبÙ\x86Ù\x90')

In [65]: str(hm2)
Out[65]: 'Test: =?utf-8?b?W29uZSBsaW5lIG9mIFVURjgtZW5jb2RlZCB1bmljb2RlIHRleHRdIMOPwofDj8KBw48=?=\n =?utf-8?b?IMOPwoPDjsKxw4/ChsOhwr3CtiDDpcKkwqrDqcKZwr0gw6DCucKAw6DCuMKlw6DCuMK0w6DCuMKoIMOiwpnCnMOiIMKZwp7DosKZwp3DosKZwpvDosKZwprDosKZwp3DosKZwp7DosKZwpwgw5DCvsOQwrbDkMK4w5DCssOQwrvDkcKRw5DCvcOQwr3DkcKLw5DCvCDDoMKkwqTDoMKkwr7DoMKkwqjDoMKlwo3DoMKkwq/DoMKkwrnDoMKkwr7DoMKkwqjDoMKkwr8gw6bCnMKJw6bCnMKLIMOgwq7CuMOgwq/CjcOgwq7CscOgwq/CgMOgwq7CqcOgwq7Cv8Ogwq7CtcOgwq7CvsOgwq7CuCDDmcKxw5nChMOYwrHDmcKOw5nCkcOYwq3DmcKSw5k=?=\n =?utf-8?b?IMOZwoDDmcKOw5jCqMOZwobDmcKQ?=\n\n'

In [66]: hm2.items()
Out[66]:
[('Test',
  '[one line of UTF8-encoded unicode text] Ï\x87Ï\x81Ï\x85Ï\x83αÏ\x86ὶ 太é\x99½ à¹\x80ลิศ â\x99\x9câ \x99\x9eâ\x99\x9dâ\x99\x9bâ\x99\x9aâ\x99\x9dâ\x99\x9eâ\x99\x9c оживлÑ\x91ннÑ\x8bм तानà¥\x8dयहानि æ\x9c\x89æ\x9c\x8b ஸà¯\x8dà®±à¯\x80னிவாஸ Ù±Ù\x84رÙ\x8eÙ\x91Ø\xadÙ\x92Ù\x85Ù\x80Ù\x8eبÙ\x86Ù\x90')]

In [67]: hm2.defects
Out[67]: []

Note that in this case str(hm2) contains three chunks of RFC 2047 text, each introduced by =?utf-8?, whereas the previous example had only one, followed by \n\n where the second chunk would start, which could easily confuse the parser. The end result is that hm2.items() returns a much longer value for the Test header.

It is curious that I was able to call add_header and have things work, but somehow this is not working in the original code path.

@msabramo
Contributor Author

msabramo commented Dec 1, 2014

The httpie.client.encode_headers function is currently encoding to utf-8. From my understanding of the RFC, this doesn't seem right? Perhaps we should be using the RFC 2047 style encoding that the email.header module implements?

See: #281 -- tests are failing though.
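
For reference, a minimal sketch of what RFC 2047 style encoding via email.header looks like. This is not the code from #281, just an illustration with a shortened sample value:

```python
from email.header import Header, decode_header, make_header

# Shortened sample; the real test string is much longer.
value = 'χρυσαφὶ 太陽'

# Encode to an ASCII-only RFC 2047 encoded-word.
encoded = Header(value, charset='utf-8').encode()

# Decode it again to check that the value survives the round trip.
decoded = str(make_header(decode_header(encoded)))

print(encoded)
print(decoded)
```

The encoded form is plain ASCII, so it avoids putting raw UTF-8 bytes into the header block. Whether a given server decodes it back is a separate question.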

I cc'd flufl @warsaw, because he has his name on a lot of the stdlib code for email and HTTP header parsing.

@msabramo
Contributor Author

msabramo commented Dec 3, 2014

I think I'm going to take a break from this issue for a while, so anyone else who wants to dive in, feel free.

@msabramo
Contributor Author

msabramo commented Feb 4, 2015

Anyone have any ideas on how to tackle this?

@jkbrzt
Member

jkbrzt commented Feb 5, 2015

@msabramo it looks like the right way to go about this would be to switch to the approach you tried in #281. (Btw, #212 provides some more context.)

msabramo added a commit to msabramo/httpie that referenced this issue Feb 10, 2015
There are known problems with unicode in headers.
See httpie#282
@jkbrzt jkbrzt closed this as completed Jul 2, 2016