
Can't use post to upload a file with Chinese characters in its name. #2313

Closed
sbarba opened this issue Oct 29, 2014 · 30 comments

@sbarba

sbarba commented Oct 29, 2014

This code:

requests.post(url, files={"file": open(u"漢字.o8d", "rb")})

will return a 200, but the file is never uploaded.

I can upload that file by posting in the browser so this doesn't seem to be a server-side issue. Also, if I change the name of the file to "bob" or something ASCII it works perfectly.

@Lukasa
Member

Lukasa commented Oct 29, 2014

Are you sure?

$ echo "file file file.\n" >> 漢字.o8d
$ ls
漢字.o8d
>>> import requests
>>> r = requests.post('http://httpbin.org/post', files={'file': open(u'漢字.o8d', 'r')})
>>> print r.content
{
  "args": {}, 
  "data": "", 
  "files": {}, 
  "form": {
    "file": "file file file.\n"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Connect-Time": "2", 
    "Connection": "close", 
    "Content-Length": "180", 
    "Content-Type": "multipart/form-data; boundary=3491ae0e5b6d465aaebb7bd63c9c750c", 
    "Host": "httpbin.org", 
    "Total-Route-Time": "0", 
    "User-Agent": "python-requests/2.4.0 CPython/2.7.8 Darwin/14.0.0", 
    "Via": "1.1 vegur", 
    "X-Request-Id": "f05915c9-279e-4187-8425-f0b06fc64ea2"
  }, 
  "json": null, 
  "origin": "77.99.146.203", 
  "url": "http://httpbin.org/post"
}

Seems like httpbin doesn't have a problem. Can you confirm what version of requests you're using?

@Lukasa
Member

Lukasa commented Oct 29, 2014

Oh hang on. Interestingly, httpbin sees it as a form field, not a file object. Hmm.

@Lukasa
Member

Lukasa commented Oct 29, 2014

Oh, yes, I remember now.

POSTing files with unicode filenames is awkward, because you didn't say what text encoding you want us to use. There's a spec for this, which we implement, but relatively few others do it and many servers don't understand it.
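The spec in question is RFC 2231 (the thread identifies it further down). As a minimal sketch of what that encoding looks like, here is a Python 3, standard-library-only function that renders a filename parameter as described: a plain quoted string when the name is safe ASCII, otherwise the RFC 2231 filename*=utf-8''... form. The helper name format_filename_param is hypothetical; it follows the general shape of urllib3's format_header_param at the time but is not its exact code.

```python
import email.utils


def format_filename_param(filename: str) -> str:
    """Render a Content-Disposition filename parameter (sketch).

    Safe ASCII names are sent as filename="..."; anything else is sent
    as an RFC 2231 extended parameter, filename*=charset''%XX...
    """
    # The plain quoted form is only used for ASCII without quotes, backslashes or newlines.
    if filename.isascii() and not any(ch in filename for ch in '"\\\r\n'):
        return 'filename="%s"' % filename
    # encode_rfc2231 percent-encodes the UTF-8 bytes and prefixes "utf-8''".
    return "filename*=" + email.utils.encode_rfc2231(filename, "utf-8")


print(format_filename_param("bob.o8d"))   # filename="bob.o8d"
print(format_filename_param("漢字.o8d"))  # filename*=utf-8''%E6%BC%A2%E5%AD%97.o8d
```

Servers that only understand the plain filename="..." form will not recognise the second output, which matches what httpbin did above.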

My suggested workaround would be to set the filename yourself using whatever encoding you choose. Unfortunately, that doesn't work:

Traceback (most recent call last):
  File "testy.py", line 4, in <module>
    r = requests.post('http://httpbin.org/post', files={'file': (u'漢字.o8d'.encode('utf-8'), open(u'漢字.o8d', 'r'))})
  File "/usr/local/lib/python2.7/site-packages/requests/api.py", line 88, in post
    return request('post', url, data=data, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/requests/api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 434, in request
    prep = self.prepare_request(req)
  File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 372, in prepare_request
    hooks=merge_hooks(request.hooks, self.hooks),
  File "/usr/local/lib/python2.7/site-packages/requests/models.py", line 299, in prepare
    self.prepare_body(data, files)
  File "/usr/local/lib/python2.7/site-packages/requests/models.py", line 434, in prepare_body
    (body, content_type) = self._encode_files(files, data)
  File "/usr/local/lib/python2.7/site-packages/requests/models.py", line 151, in _encode_files
    rf.make_multipart(content_type=ft)
  File "/usr/local/lib/python2.7/site-packages/requests/packages/urllib3/fields.py", line 173, in make_multipart
    (('name', self._name), ('filename', self._filename))
  File "/usr/local/lib/python2.7/site-packages/requests/packages/urllib3/fields.py", line 133, in _render_parts
    parts.append(self._render_part(name, value))
  File "/usr/local/lib/python2.7/site-packages/requests/packages/urllib3/fields.py", line 113, in _render_part
    return format_header_param(name, value)
  File "/usr/local/lib/python2.7/site-packages/requests/packages/urllib3/fields.py", line 37, in format_header_param
    result.encode('ascii')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 10: ordinal not in range(128)

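The problem seems to be the "result.encode('ascii')" line in urllib3's format_header_param (the last frame above). On Python 2, calling encode on a byte string first triggers an implicit str.decode with the ASCII codec, so this unconditional call breaks for non-ASCII byte strings with a UnicodeDecodeError, which the surrounding code does not catch. @shazow, are you prepared to consider that a bug?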
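To illustrate one possible fix direction, here is a minimal Python 3 sketch (hypothetical helper is_plain_ascii, not urllib3 code) of an ASCII check that never relies on an implicit conversion: it checks the type first and catches both UnicodeEncodeError and UnicodeDecodeError, so an already-encoded UTF-8 filename falls through to the non-ASCII path instead of raising.

```python
def is_plain_ascii(value) -> bool:
    """Return True if value (str or bytes) contains only ASCII.

    The type is checked explicitly so that byte strings are decoded and
    text is encoded; nothing depends on an implicit conversion.
    """
    try:
        if isinstance(value, bytes):
            value.decode("ascii")  # bytes: non-ASCII raises UnicodeDecodeError
        else:
            value.encode("ascii")  # str: non-ASCII raises UnicodeEncodeError
    except (UnicodeEncodeError, UnicodeDecodeError):
        return False
    return True


name = "漢字.o8d"
print(is_plain_ascii(name))                  # False
print(is_plain_ascii(name.encode("utf-8")))  # False
print(is_plain_ascii(b"bob.o8d"))            # True
```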

@sigmavirus24
Contributor

Django now supports this and was appreciative of the bug report. The fact that httpbin doesn't parse this correctly is a flask/werkzeug bug I think.

@sbarba
Author

sbarba commented Nov 4, 2014

Just discovered that 漢字 is Japanese Kanji and means "Chinese Characters". Enjoyed that, but the bug still stands. For now I'm able to automate testing of such filenames with Selenium, but it'd be nice to do it with requests too.

@sigmavirus24
Contributor

Except it's a bug in the server you're trying to upload to, for not supporting a 10-year-old RFC.

@kampde

kampde commented Jan 12, 2015

Is there any other workaround different than changing the file name or changing the server backend?

@sigmavirus24
Contributor

I think someone percent-encoded the filename because whatever server they were communicating with understood that. That behaviour is not defined anywhere, though, so it depends on the server you're using doing something incredibly bad and horribly wrong.
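A minimal sketch of that percent-encoding workaround, using only Python 3's standard library. As noted above, it is non-standard and only helps if the receiving server is known to unquote the name itself; percent_encoded_filename is a hypothetical helper name. The result is pure ASCII, so it goes out as an ordinary filename="..." parameter rather than the RFC 2231 form.

```python
from urllib.parse import quote, unquote


def percent_encoded_filename(filename: str) -> str:
    """Percent-encode the UTF-8 bytes of a filename (non-standard workaround).

    Only useful when the server is known to unquote the name itself;
    no spec defines this for the plain filename parameter.
    """
    return quote(filename, safe="")


encoded = percent_encoded_filename("漢字.o8d")
print(encoded)                         # %E6%BC%A2%E5%AD%97.o8d
print(unquote(encoded) == "漢字.o8d")  # True: the server must undo it
```

With requests, the encoded name could then be passed through the (filename, fileobj) tuple form of files, e.g. files={'file': (encoded, open(path, 'rb'))}, where path is a placeholder.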

@sigmavirus24
Contributor

And @kampde, thanks for searching for prior issues and for not opening a new one.

@kampde

kampde commented Jan 12, 2015

The aforementioned RFC is RFC 5987, right?

@sigmavirus24
Contributor

I don't believe so, no. That's for HTTP headers, not for MIME headers.

@kampde

kampde commented Jan 13, 2015

Looks like RFC 2231 then.

@sigmavirus24
Contributor

@kampde after a quick skim, that is the correct RFC. As you can see it is 18 years old.

Lukasa closed this as completed May 31, 2015
@zhangchunlin

In https://github.com/kennethreitz/requests/blob/master/requests/packages/urllib3/fields.py#L37 there is this code:

        try:
            result.encode('ascii')
        except UnicodeEncodeError:
            pass
        else:
            return result

I think changing it to "result.encode('utf8')" would be better, because most servers can handle UTF-8, but many of them do not support the "email.utils.encode_rfc2231(value, 'utf-8')" style.
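For comparison, a minimal sketch of the browser-style behaviour described here, assuming the escaping rules of the WHATWG HTML standard's multipart/form-data algorithm: the filename's UTF-8 bytes go directly into the quoted parameter, and only ", CR and LF are percent-escaped. html5_filename_param is a hypothetical helper, not requests or urllib3 API.

```python
def html5_filename_param(filename: str) -> bytes:
    """Render filename="..." the browser way (WHATWG HTML style, sketch).

    Raw UTF-8 bytes go straight into the header; only '"', CR and LF
    are percent-escaped, with no other escaping.
    """
    escaped = filename.replace('"', "%22").replace("\r", "%0D").replace("\n", "%0A")
    return b'filename="' + escaped.encode("utf-8") + b'"'


print(html5_filename_param("漢字.o8d"))  # b'filename="\xe6\xbc\xa2\xe5\xad\x97.o8d"'
print(html5_filename_param('a"b.txt'))  # b'filename="a%22b.txt"'
```

The trade-off is that the receiver has to guess the charset (browsers use the form's encoding), which is the ambiguity RFC 2231's explicit utf-8'' prefix avoids.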

@Lukasa
Member

Lukasa commented Jul 10, 2015

@zhangchunlin What does 'most servers' mean? Which servers? Which versions of those servers? Why don't they implement RFC 2231?

@sigmavirus24
Contributor

@zhangchunlin if those servers do not implement a standard that is 18 years old, I fail to see why we should be forced to violate the standard.

@zhangchunlin

@Lukasa OK, I didn't test extensively, so my statement may be wrong.
I just found that requests behaves differently from browsers (for example, Chrome), and I thought the method Chrome uses would work.

@sigmavirus24 I will try to clarify this and file issues with those servers if needed.

@WishCow

WishCow commented Oct 26, 2015

It seems PHP is also affected by this: if you try to upload a file named 'fårikål.txt' to a server running PHP, it throws a warning: "PHP Warning: File Upload Mime headers garbled in Unknown on line 0".

This is PHP 5.6.14.

@sigmavirus24
Contributor

@WishCow I'm not certain what result you expect to see if you're filing a PHP bug against another project. It seems frameworks in Perl, Ruby, and Python all appropriately support RFC 2231. If PHP 5.6.14 doesn't support an 18-year-old standard, you should file a bug with PHP.
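For anyone on the receiving side, a minimal sketch of what supporting RFC 2231 involves when reading a filename*= parameter, using only Python 3's standard library. It handles only the single-value form charset'language'percent-encoded-octets, not RFC 2231 continuations such as filename*0*=, and decode_filename_star is a hypothetical helper, not a PHP or framework API.

```python
from urllib.parse import unquote


def decode_filename_star(value: str) -> str:
    """Decode an RFC 2231 extended value such as utf-8''%E6%BC%A2%E5%AD%97.o8d.

    Layout: charset ' language ' percent-encoded octets.
    Raises ValueError if the two ' separators are missing.
    """
    charset, _language, encoded = value.split("'", 2)
    # unquote() turns %XX escapes back into bytes and decodes them with the charset.
    return unquote(encoded, encoding=charset or "us-ascii")


print(decode_filename_star("utf-8''%E6%BC%A2%E5%AD%97.o8d"))  # 漢字.o8d
print(decode_filename_star("utf-8''f%C3%A5rik%C3%A5l.txt"))   # fårikål.txt
```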

@WishCow

WishCow commented Oct 26, 2015

Just leaving a note here, in case other people encounter this issue, it took me a long time to find the cause.

@sigmavirus24
Contributor

@WishCow you'll probably have a better time putting together some minimal bit of PHP code and filing a bug with PHP. This comment will help others, but filing a bug to get this fixed in PHP would help a lot more people.

@WishCow

WishCow commented Oct 26, 2015

Actually I was about to do that, and I whipped up a quick example of the upload with curl, but that seems to work. Now I'm confused, is there another RFC that describes how filenames should be handled, that curl (and PHP) might be implementing?

So this:

curl -v -F får.txt=@/tmp/test.txt http://myserver.local

Does produce the correct output from the handling PHP script.

@sigmavirus24
Contributor

Run netcat locally and send the curl request to that.

Curl might be violating the RFC because support for the spec has lagged behind.

@WishCow

WishCow commented Oct 26, 2015

The command

curl -F får='@/tmp/test.txt;filename=får.txt' localhost:14511

Results in the netcat output:

POST / HTTP/1.1
Host: localhost:14511
User-Agent: curl/7.45.0
Accept: */*
Content-Length: 198
Expect: 100-continue
Content-Type: multipart/form-data; boundary=------------------------fb94c2e958ada9f0

--------------------------fb94c2e958ada9f0
Content-Disposition: form-data; name="får"; filename="får.txt"
Content-Type: text/plain

hello world

--------------------------fb94c2e958ada9f0--

So curl indeed does not seem to use the filename*= format that the RFC describes.

@sigmavirus24
Contributor

Yeah, so you can use httpie to produce a cURL-like command that will probably trigger this for you.

@sigmavirus24
Contributor

You could also write some PHP that uses RFC 2231.

@WishCow

WishCow commented Oct 26, 2015

The SO post describes how to send files with the correct encoding, but I need to receive files, for which there doesn't seem to be a way, since the $_FILES superglobal gets populated before the userland script runs.

Thanks for the help though, in case someone else wants to track this in PHP: https://bugs.php.net/bug.php?id=70794

@sigmavirus24
Contributor

@WishCow right, that's what I meant (instead of using curl use PHP).

@Robbt

Robbt commented Dec 1, 2018

So I ran into this issue with a PHP server running Zend 1. The solution I came up with was to import urllib and percent-encode the filename, like so: files = {'file': (urllib.pathname2url(event.pathname), open(event.pathname, 'rb'))}. It solved the problem for me. Just adding this in case it helps someone else who runs into this.

@Robbt

Robbt commented Dec 2, 2018

That fix turned out to introduce new problems because it changed the filenames in weird ways. Instead, I'm working on getting this urllib3 PR, which uses HTML5 encoding instead of RFC 2231 by default, reopened. Hopefully this will allow the problem to be fixed in requests as well. I managed to rewrite the request I was using with a version of urllib3 patched with the currently closed PR, and it worked.

github-actions bot locked as resolved and limited conversation to collaborators Sep 6, 2021
7 participants