Can't use post to upload a file with Chinese characters in its name. #2313
Comments
Are you sure?
$ echo "file file file.\n" >> 漢字.o8d
$ ls
漢字.o8d
>>> import requests
>>> r = requests.post('http://httpbin.org/post', files={'file': open(u'漢字.o8d', 'r')})
>>> print r.content
{
"args": {},
"data": "",
"files": {},
"form": {
"file": "file file file.\n"
},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Connect-Time": "2",
"Connection": "close",
"Content-Length": "180",
"Content-Type": "multipart/form-data; boundary=3491ae0e5b6d465aaebb7bd63c9c750c",
"Host": "httpbin.org",
"Total-Route-Time": "0",
"User-Agent": "python-requests/2.4.0 CPython/2.7.8 Darwin/14.0.0",
"Via": "1.1 vegur",
"X-Request-Id": "f05915c9-279e-4187-8425-f0b06fc64ea2"
},
"json": null,
"origin": "77.99.146.203",
"url": "http://httpbin.org/post"
}
Seems like httpbin doesn't have a problem. Can you confirm what version of requests you're using?
Oh hang on. Interestingly, httpbin sees it as a form field, not a file object. Hmm.
Oh, yes, I remember now. POSTing files with unicode filenames is awkward, because you didn't say what text encoding you want us to use. There's a spec for this, which we implement, but relatively few others do it and many servers don't understand it. My suggested workaround would be to set the filename yourself using whatever encoding you choose. Unfortunately, that doesn't work:
The problem here seems to be this line. This unconditional call to encode will actually cause an implicit call to |
Django now supports this and was appreciative of the bug report. The fact that httpbin doesn't parse this correctly is a flask/werkzeug bug, I think.
Just discovered that 漢字 is Japanese Kanji and means "Chinese Characters". Enjoyed that, but the bug still stands. For now I'm able to automate testing of such filenames with Selenium, but it'd be nice to do it with requests too.
Except it's a bug in the server you're trying to upload to for not supporting a 10-year-old RFC.
Is there any workaround other than changing the file name or changing the server backend?
I think someone percent-encoded the file name because whatever server they were communicating with understood that. That behaviour is not defined anywhere, though, so it depends on the server you're using doing something incredibly bad and horribly wrong.
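For readers wondering what the percent-encoding workaround mentioned above might look like, here is a minimal Python 3 sketch. The helper name `percent_encode_filename` is hypothetical, and, as the comment above stresses, whether a server decodes the result back to the original name is undefined behaviour that depends entirely on that server.

```python
from urllib.parse import quote, unquote


def percent_encode_filename(name, charset="utf-8"):
    """Return an ASCII-only, percent-encoded form of `name`.

    Hypothetical helper for illustration only: no standard says a server
    must decode this back to the original name.
    """
    # safe="" also escapes "/" so the result cannot look like a path.
    return quote(name, safe="", encoding=charset)


ascii_name = percent_encode_filename("漢字.o8d")
print(ascii_name)           # ASCII-only name that could be sent as the filename
print(unquote(ascii_name))  # what a cooperating server would have to do
```

The encoded name could then be passed explicitly as the first element of a `(filename, fileobj)` tuple in the `files=` argument to `requests.post`, so that the multipart header only ever contains ASCII.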
And @kampde, thanks for searching for prior issues and for not opening a new issue.
The aforementioned RFC is RFC 5987, right?
I don't believe so. No. That's for HTTP headers, not for MIME headers.
Looks like RFC 2231 then.
@kampde after a quick skim, that is the correct RFC. As you can see, it is 18 years old.
I think it would be better to change https://github.com/kennethreitz/requests/blob/master/requests/packages/urllib3/fields.py#L37 to "result.encode('utf8')", because most servers can handle UTF-8, but many of them do not support the "email.utils.encode_rfc2231(value, 'utf-8')" style.
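To make the two options in this comment concrete, the following Python 3 sketch builds a Content-Disposition value both ways: once with the standard library's `email.utils.encode_rfc2231`, and once by simply UTF-8 encoding the header. The function names `rfc2231_disposition` and `raw_utf8_disposition` are made up for illustration, and this is not the actual urllib3 code, only an approximation of the two styles being debated.

```python
import email.utils


def rfc2231_disposition(field, filename, charset="utf-8"):
    """Build the header value in the RFC 2231 `filename*=` style."""
    encoded = email.utils.encode_rfc2231(filename, charset)
    return 'form-data; name="%s"; filename*=%s' % (field, encoded)


def raw_utf8_disposition(field, filename):
    """Build the header value with the filename as raw UTF-8 bytes."""
    header = 'form-data; name="%s"; filename="%s"' % (field, filename)
    return header.encode("utf-8")


# RFC 2231 style: ASCII-only, with the charset spelled out in the value.
print(rfc2231_disposition("file", "漢字.o8d"))
# Raw style: non-ASCII bytes on the wire, charset left for the server to guess.
print(raw_utf8_disposition("file", "漢字.o8d"))
```

The first form tells the server which charset was used; the second leaves the server to guess, which is the trade-off the replies below argue about.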
@zhangchunlin What does 'most servers' mean? Which servers? Which versions of those servers? Why don't they implement RFC 2231?
@zhangchunlin if those servers do not implement a standard that is 18 years old, I fail to see why we should be forced to violate the standard.
@Lukasa OK, I didn't test that much, so my statement may be wrong. @sigmavirus24 I will try to narrow it down and submit issues to those servers if needed.
It seems PHP is also affected by this. If you try to upload a file named 'fårikål.txt' to a server running PHP, it will throw a warning: "PHP Warning: File Upload Mime headers garbled in Unknown on line 0". This is PHP 5.6.14.
@WishCow I'm not certain what result you expect to see if you're filing a PHP bug against another project. It seems frameworks in Perl, Ruby, and Python all appropriately support RFC 2231. If PHP 5.6.14 doesn't support an 18-year-old standard, you should file a bug with PHP.
Just leaving a note here in case other people encounter this issue; it took me a long time to find the cause.
@WishCow you'll probably have a better time putting together some minimal bit of PHP code and filing a bug with PHP. This comment will help others, but filing a bug to get this fixed in PHP would help a lot more people.
Actually I was about to do that, and I whipped up a quick example of the upload with curl, but that seems to work. Now I'm confused: is there another RFC describing how filenames should be handled that curl (and PHP) might be implementing? So this:
Does produce the correct output from the handling PHP script.
Run netcat locally and send the curl request to that. Curl might be violating the RFC because support for the spec has lagged behind.
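If netcat is not at hand, the same inspection can be approximated in Python 3 with a throwaway listener that accepts one connection and keeps the raw bytes. This is only a sketch: `start_capture` is a hypothetical helper, it is not a real HTTP server, and it assumes the client has finished sending once it closes its side or goes quiet for half a second.

```python
import socket
import threading


def start_capture(host="127.0.0.1", port=0, timeout=30.0):
    """Accept one connection in the background and store what it sends.

    Returns (port, captured, thread). `captured` is a list that holds one
    bytes object once `thread` has finished.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))  # port=0 lets the OS pick a free port
    server.listen(1)
    server.settimeout(timeout)
    captured = []

    def serve():
        chunks = []
        try:
            conn, _ = server.accept()
            with conn:
                conn.settimeout(0.5)
                try:
                    while True:
                        data = conn.recv(65536)
                        if not data:
                            break
                        chunks.append(data)
                except socket.timeout:
                    pass  # client went quiet; assume the request is complete
                # Minimal reply so that HTTP clients do not wait forever.
                conn.sendall(
                    b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n"
                    b"Connection: close\r\n\r\n"
                )
        except OSError:
            pass  # nobody connected in time, or the client hung up early
        finally:
            server.close()
            captured.append(b"".join(chunks))

    thread = threading.Thread(target=serve, daemon=True)
    thread.start()
    return server.getsockname()[1], captured, thread
```

To use it, call `start_capture()`, send the upload to `http://127.0.0.1:<port>/` with curl or requests, then `thread.join()` and look at the `Content-Disposition` line in `captured[0]` to see how the client encoded the filename.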
The command
Results in the netcat output:
So curl indeed does not seem to use the |
Yeah, so you can use |
You could also write some PHP that uses RFC 2231.
The SO post describes how to send files with the correct encoding, but I need to receive files, for which there doesn't seem to be a way, since the $_FILES superglobal gets populated before the userland script runs. Thanks for the help though. In case someone else wants to track this in PHP: https://bugs.php.net/bug.php?id=70794
@WishCow right, that's what I meant (instead of using curl use PHP).
So I ran into this issue with a PHP server running Zend 1 and the solution that I came up with was to import urllib and then encode the filename like so |
That fix proved to introduce new problems because it changed the filenames in weird ways. Instead, I'm working on getting the urllib3 PR that makes HTML5 encoding (rather than RFC 2231) the default reopened. Hopefully this will allow the problem to be fixed for requests as well. I rewrote my request using a patched version of urllib3 based on the currently closed PR, and it worked.
This code:
requests.post(url, files={"file": open(u"漢字.o8d", "r")})
will return a 200, but the file is never uploaded.
I can upload that file by posting in the browser so this doesn't seem to be a server-side issue. Also, if I change the name of the file to "bob" or something ASCII it works perfectly.