UTF-8 characters unable to handle in headers. #4187
Comments
On Python 3 the header decoding is actually done by the Python standard library. This decodes headers using Latin-1, so you can resolve this issue by re-encoding the header value as Latin-1 and then decoding it as UTF-8. Sadly, there is no guaranteed encoding for headers, so this approach (while silly) works pretty well. |
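A minimal sketch of that round trip, assuming the server really sent UTF-8 bytes. The helper name `redecode_header` is hypothetical and not part of Requests; with Requests you would pass it something like `response.headers['Content-Disposition']`.

```python
def redecode_header(value: str, encoding: str = "utf-8") -> str:
    """Undo the Latin-1 decoding and reinterpret the raw bytes as `encoding`.

    Hypothetical helper for illustration; it assumes the header bytes on the
    wire were actually encoded with `encoding`.
    """
    raw = value.encode("latin-1")  # recover the original bytes unchanged
    return raw.decode(encoding, errors="replace")


# Simulate what the standard library hands back: UTF-8 bytes read as Latin-1.
mojibake = "植物".encode("utf-8").decode("latin-1")

print(repr(mojibake))
print(redecode_header(mojibake))
```

`errors="replace"` is a design choice for this sketch: a truncated or non-UTF-8 value yields replacement characters instead of raising `UnicodeDecodeError`.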
I did try that, but the filename is still corrupted:
The complete filename should be
curl -I -L -v gets the header without any problem. |
Hrm. Why are you setting |
Yes.
|
Ok, that strongly suggests that the data is not |
I think it is UTF-8 encoded, but it was corrupted after being decoded as Latin-1.
The link may become unavailable after a short time.
|
Decoding as latin-1 cannot corrupt binary data: latin-1 is a character map encoding, which means that it has a one-to-one mapping of bytes to unicode code points. The corruption appears to be happening lower down the stack, or more likely in the redirect. Can you use Wireshark to capture the HTTP traffic that Requests is sending? |
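The one-to-one mapping can be checked directly. This short sketch decodes every possible byte value as Latin-1 and encodes it back; it only uses the built-in codecs.

```python
# Every byte value from 0x00 to 0xFF.
all_bytes = bytes(range(256))

# Latin-1 maps byte N to Unicode code point N, so decoding never fails.
text = all_bytes.decode("latin-1")
assert [ord(ch) for ch in text] == list(range(256))

# Encoding back yields exactly the original bytes: nothing is lost.
assert text.encode("latin-1") == all_bytes
print("round trip is lossless for all 256 byte values")
```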
But why is curl doing fine?
It is UTF-8 encoded:
|
And that tcpdump result is from python-requests, not curl. |
Somehow, requests stripped the last 32 bytes from the value of 'Content-Disposition'. |
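The 32-byte figure can be checked against the values quoted in the issue. The sketch below assumes the served filename equals the last path segment of the URL in the issue; the variable names are illustrative only.

```python
from urllib.parse import unquote

# Value of 'Content-Disposition' copied from the Actual Result in the issue.
received = 'subtitle; filename="æ¤\x8dç\x89©ç\x8e\x8bå\x9b½ç¬¬äº\x8cé\x9b\x86è\x93\x9då\x85'

# Last path segment of the URL in the issue, still percent-encoded.
# Assumption: the server uses this name as the filename.
url_segment = ("%E6%A4%8D%E7%89%A9%E7%8E%8B%E5%9B%BD%E7%AC%AC%E4%BA%8C%E9%9B%86"
               "%E8%93%9D%E5%85%89%E7%89%88%E4%B8%AD%E8%8B%B1%E6%96%87%E5%8F%8A"
               "%E5%8F%8C%E8%AF%AD%E5%AD%97%E5%B9%95.rar")

prefix = 'subtitle; filename="'
# Re-encode as Latin-1 to get back the filename bytes that were received.
name_bytes = received[len(prefix):].encode("latin-1")
expected_bytes = unquote(url_segment).encode("utf-8")

missing = len(expected_bytes) - len(name_bytes)
print(len(name_bytes), len(expected_bytes), missing)

# The received bytes are a strict prefix of the expected filename.
print(expected_bytes.startswith(name_bytes))
# Decode what arrived, dropping the incomplete trailing sequence.
print(name_bytes.decode("utf-8", errors="ignore"))
```

With the quoted values, 26 of the 58 filename bytes arrived, so 32 are missing, and the value stops partway through a three-byte character.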
Yup, that appears to be the problem here. Requests isn't doing custom header parsing here: it's done by I recommend trying to use |
I tried http.client. It didn't reproduce the error.
|
And urllib3:
|
Sorry, I screwed it up by matching a Latin-1 regex against Unicode text. Nvm. |
requests is unable to handle utf-8 characters in headers.
http://assrt.net/download/217234/%E6%A4%8D%E7%89%A9%E7%8E%8B%E5%9B%BD%E7%AC%AC%E4%BA%8C%E9%9B%86%E8%93%9D%E5%85%89%E7%89%88%E4%B8%AD%E8%8B%B1%E6%96%87%E5%8F%8A%E5%8F%8C%E8%AF%AD%E5%AD%97%E5%B9%95.rar
Expected Result
Content-Disposition should be subtitle; filename="植物王国第二集蓝光版中英文及双语字幕.rar"
Actual Result
{'Cache-Control': 'max-age=2678400', 'X-Cache': 'MISS', 'Expires': 'Mon, 07 Aug 2017 08:16:04 GMT', 'Server': 'openresty', 'Content-Disposition': 'subtitle; filename="æ¤\x8dç\x89©ç\x8e\x8bå\x9b½ç¬¬äº\x8cé\x9b\x86è\x93\x9då\x85', 'Connection': 'keep-alive', 'Date': 'Fri, 07 Jul 2017 08:16:04 GMT', 'Content-Length': '62859', 'Servant': 'Berserker', 'ETag': '"56f296fd-f58b"', 'Content-Type': 'application/octet-stream', 'Last-Modified': 'Wed, 23 Mar 2016 13:15:41 GMT'}
Content-Disposition is corrupted, with the last characters missing:
Reproduction Steps
System Information