Posting a multipart-encoded file with non-ASCII characters in filename doesn't work #3446

Closed
ghost opened this issue Jul 27, 2016 · 14 comments

@ghost

ghost commented Jul 27, 2016

The following code doesn't actually upload any data to the server:

r = requests.post('https://gs.smuglo.li/api/statusnet/media/upload',
    auth=('testbot', 'testbot'),
    files={'media': open('/tmp/Снимок экрана_2016-07-27_05-15-38.png', 'rb')})
@Lukasa
Member

Lukasa commented Jul 27, 2016

What makes you think that doesn't upload data to the server?

@Lukasa
Member

Lukasa commented Jul 27, 2016

Put another way: what is r.request.body?

@ghost
Author

ghost commented Jul 27, 2016

@Lukasa

What makes you think that doesn't upload data to the server?

The fact that the server returns XML with the following content:

<?xml version="1.0" encoding="UTF-8"?>
<rsp stat="fail">
 <err msg="There is no uploaded media for input field &quot;media&quot;."></err>
</rsp>

Put another way: what is r.request.body?

>>> r.request.body
b'--254dc93f44a24498bef41502bb23d76f\r\nContent-Disposition: form-data; name="media"; filename*=utf-8\'\'%D0%A1%D0%BD%D0%B8%D0%BC%D0%BE%D0%BA%20%D1%8D%D0%BA%D1%80%D0%B0%D0%BD%D0%B0_2016-07-27_05-15-38.png\r\n\r\n\x89PNG...

And a lot of hex values after that.

@Lukasa
Member

Lukasa commented Jul 27, 2016

Yup, so that's why I asked about r.request.body. We are uploading data: you can see it in r.request.body. What's happening is that the server isn't reading it. This is almost certainly because the server doesn't support RFC 2231. See #2313.

You can probably fix this by using the extended syntax for file uploads with an appropriately created byte string:

files = {'media': (u'Снимок экрана_2016-07-27_05-15-38.png'.encode('utf-8'), open('/tmp/Снимок экрана_2016-07-27_05-15-38.png', 'rb'))}
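For reference, the `filename*=utf-8''...` parameter visible in `r.request.body` is the RFC 2231 extended form: charset, an empty language tag, then the percent-encoded UTF-8 bytes. The sketch below only illustrates that format using the standard library; it is not a statement about how any particular server parses it, and the variable names are chosen for illustration.

```python
from email.utils import encode_rfc2231
from urllib.parse import unquote

filename = 'Снимок экрана_2016-07-27_05-15-38.png'

# RFC 2231 extended value: charset'language'percent-encoded-bytes
encoded = encode_rfc2231(filename, charset='utf-8')
header = 'Content-Disposition: form-data; name="media"; filename*=' + encoded
print(header)

# A server that understands the extended syntax can reverse it like this.
charset, _language, quoted = encoded.split("'", 2)
decoded = unquote(quoted, encoding=charset)
print(decoded == filename)
```

A server that only looks for a plain `filename="..."` parameter will not find one in such a header, which is consistent with the "no uploaded media" response above.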

@ghost
Author

ghost commented Jul 27, 2016

@Lukasa Is this Python 2? Because in Python 3 I get

TypeError: a bytes-like object is required, not 'str'

@Lukasa
Member

Lukasa commented Jul 27, 2016

You get that where? The .encode call should be producing a bytes-like object. Can I see the full traceback?

@ghost
Author

ghost commented Jul 27, 2016

@Lukasa Sure

>>> filename = '/tmp/Снимок экрана_2016-07-27_05-15-38.png' 
>>> media = {'media':(filename.encode('utf-8'), open(filename, 'rb'))}
>>> r = requests.post(url, auth=('testbot', 'testbot'), files=media)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
    r = requests.post(url, auth=(username, password), files=media)
  File "/usr/lib/python3.5/site-packages/requests/api.py", line 111, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "/usr/lib/python3.5/site-packages/requests/api.py", line 57, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python3.5/site-packages/requests/sessions.py", line 461, in request
    prep = self.prepare_request(req)
  File "/usr/lib/python3.5/site-packages/requests/sessions.py", line 394, in prepare_request
    hooks=merge_hooks(request.hooks, self.hooks),
  File "/usr/lib/python3.5/site-packages/requests/models.py", line 298, in prepare
    self.prepare_body(data, files, json)
  File "/usr/lib/python3.5/site-packages/requests/models.py", line 449, in prepare_body
    (body, content_type) = self._encode_files(files, data)
  File "/usr/lib/python3.5/site-packages/requests/models.py", line 155, in _encode_files
    rf.make_multipart(content_type=ft)
  File "/usr/lib/python3.5/site-packages/urllib3/fields.py", line 174, in make_multipart
    (('name', self._name), ('filename', self._filename))
  File "/usr/lib/python3.5/site-packages/urllib3/fields.py", line 134, in _render_parts
    parts.append(self._render_part(name, value))
  File "/usr/lib/python3.5/site-packages/urllib3/fields.py", line 114, in _render_part
    return format_header_param(name, value)
  File "/usr/lib/python3.5/site-packages/urllib3/fields.py", line 35, in format_header_param
    if not any(ch in value for ch in '"\\\r\n'):
  File "/usr/lib/python3.5/site-packages/urllib3/fields.py", line 35, in <genexpr>
    if not any(ch in value for ch in '"\\\r\n'):
TypeError: a bytes-like object is required, not 'str'

@Lukasa
Member

Lukasa commented Jul 27, 2016

Ah, I see what's happening there.

So this starts to get really unpleasant. It seems like urllib3 gets mad when using a bytestring in this place on Python 3. Out of interest, try dropping the .encode from the filename?

Either way, the problem seems to be the use of RFC 2231 in this place. urllib3 is looking to make RFC 2231 encoding optional, so this problem should be resolvable in a future release.
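The TypeError at the bottom of the traceback can be reproduced without requests or urllib3. This is a minimal sketch of the failing membership test from `format_header_param` as shown in the traceback; the shortened filename and variable names are made up for illustration.

```python
# Characters checked before quoting the value (copied from the traceback line).
special = '"\\\r\n'

bytes_name = 'Снимок экрана.png'.encode('utf-8')  # what .encode() produced
text_name = 'Снимок экрана.png'                   # plain str filename

error = None
try:
    # On Python 3, testing "str in bytes" is not allowed, so this raises.
    any(ch in bytes_name for ch in special)
except TypeError as exc:
    error = str(exc)

print(error)

# With a text filename the same membership test runs without error.
needs_quoting = any(ch in text_name for ch in special)
print(needs_quoting)
```

On Python 2 the same check passes because `str` and bytes are the same type there, which would explain why the suggestion behaves differently across versions.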

@ghost
Author

ghost commented Jul 27, 2016

@Lukasa It gets back to the original error.

<?xml version="1.0" encoding="UTF-8"?>
<rsp stat="fail">
 <err msg="There is no uploaded media for input field &quot;media&quot;."></err>
</rsp>

@Lukasa
Member

Lukasa commented Jul 27, 2016

Yeah, so like I said, this is an RFC 2231 concern at this point. It is a urllib3 problem, but one we have an open PR to solve: urllib3/urllib3#856.

@ghost
Author

ghost commented Jul 27, 2016

@Lukasa That's good to hear. I have a workaround for this: since the file size can't be more than 20 MB for that server, I just read the whole file and pass the bytes:

>>> media = {'media': open(filename, 'rb').read()}
>>> r = requests.post(url, auth=('testbot', 'testbot'), files=media)
>>> print(r.text)
<?xml version="1.0" encoding="UTF-8"?>
<rsp stat="ok" xmlns:atom="http://www.w3.org/2005/Atom">
 <mediaid>84983</mediaid>
 <mediaurl>https://gs.smuglo.li/attachment/84983</mediaurl>
 <media_url>https://gs.smuglo.li/attachment/84983</media_url>
 <size>23815</size>
 <atom:link rel="enclosure" href="https://gs.smuglo.li/file/e1035ccd7c31b07edad49251a8ff2bd6bce96fbda7a2585cd113b137df187d8c.png" type="image/png"></atom:link>
 <media_id>84983</media_id>
 <media_id_string>84983</media_id_string>
 <image w="766" h="317" image_type="image/png"></image>
</rsp>
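A likely reason this works: when the value in `files` is raw bytes, requests has no file name to guess, so it falls back to the field name (`media`), which is plain ASCII. A related sketch, assuming the server does not care what the file is called, is to pass an explicit ASCII name in the `(filename, fileobj)` tuple form. The name `screenshot.png`, the in-memory stand-in for the file, and the example URL are placeholders for illustration; the request is only prepared, not sent.

```python
import io

import requests

# Stand-in for open('/tmp/Снимок экрана_2016-07-27_05-15-38.png', 'rb').
payload = io.BytesIO(b'\x89PNG fake image bytes')

# Hypothetical ASCII file name chosen for illustration.
files = {'media': ('screenshot.png', payload)}

# Prepare the request without sending it so the body can be inspected offline.
prepared = requests.Request(
    'POST', 'https://example.invalid/upload', files=files
).prepare()
body = prepared.body

# latin-1 maps every byte to a character, so this never fails to decode.
print(body.decode('latin-1'))
```

With an ASCII-only name the `Content-Disposition` header carries an ordinary `filename="..."` parameter, and the server also sees a file extension.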

@Lukasa
Member

Lukasa commented Jul 27, 2016

That's good! In that case, let's close this in favour of the open issues.

Lukasa closed this as completed Jul 27, 2016
@ghost
Author

ghost commented Jul 27, 2016

@Lukasa Be sure to hit me up when that RFC is properly implemented by urllib3, and I will remove that filthy hack I use.

@eamirgh

eamirgh commented Jan 17, 2019

Any solutions for Python 3? I have the same problem here:

db = open('db.txt').read().splitlines() #db is written in utf-8
i = 0
for line in db: # line is './images/APPLE-IPHONE_7_PLUS-SILICON_BLACK.jpg'
    files = {'image': (line, open(line, 'rb')) }
    r = requests.post(url, files=files, auth=(username, password))
    if r.text != 'done!':
        print(r.request.body)
        errs.write((str(i) + ' ' + line + ' failed!' + '\n'))
    else:
        print(i, line)
    i = i + 1

I have a problem when non-ASCII characters like 'Ş' occur in the file name,
and print(r.request.body) then shows None.
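One possible workaround on Python 3, assuming the server accepts any file name, is to send an ASCII-only name while still opening the file by its real path. The helper below is hypothetical (not part of requests or urllib3) and the example paths are made up: it strips accents via Unicode decomposition and drops whatever cannot be represented in ASCII.

```python
import os
import unicodedata


def ascii_fallback(path, default='upload'):
    """Return an ASCII-only file name derived from path (hypothetical helper)."""
    stem, ext = os.path.splitext(os.path.basename(path))

    def strip(text):
        # NFKD splits accented letters into base letter + combining mark;
        # encoding to ASCII with 'ignore' then drops every non-ASCII character.
        decomposed = unicodedata.normalize('NFKD', text)
        return decomposed.encode('ascii', 'ignore').decode('ascii')

    # Fall back to a default stem if nothing survives the stripping.
    stem = strip(stem).strip() or default
    return stem + strip(ext)


print(ascii_fallback('./images/KILIF-Ş.jpg'))
print(ascii_fallback('Снимок.png'))
```

The result could then be used as the first element of the tuple, for example `files = {'image': (ascii_fallback(line), open(line, 'rb'))}`. Different inputs can collapse to the same ASCII name, so this only suits servers that do not rely on the uploaded name being unique.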

github-actions bot locked as resolved and limited conversation to collaborators Sep 6, 2021