send_file: latin-1 encoding not compatible with gunicorn #2766

Closed
jrast opened this issue May 8, 2018 · 7 comments

Comments

@jrast
commented May 8, 2018

With the latest release (flask 1.0.0), unicode attachment filenames are allowed by flask.
I was waiting for this change and it seems to work great with the builtin dev server. However, gunicorn on the production server does not support latin-1 encoding for headers and only supports ASCII.

Originally flask also opted for ASCII encoding, but commit 336d6a4 changed this, and latin-1 has been used since then.

What was the rationale for switching to latin-1? I know it's officially allowed, but gunicorn decided to support only ASCII (see related issue):

> it's well documented around the web that HTTP headers should be ASCII

Is there a chance that flask switches back to ASCII?

Related Issues:

@davidism

Member

commented May 8, 2018

Headers are Latin-1. Gunicorn only allowing ASCII is incorrect.

@jrast

Author

commented May 8, 2018

I know, headers should be encoded using latin-1, but I think it's not uncommon to use flask + gunicorn and looking at the closed issue from some time ago, it's not like they will change soon.

However, I will open a new issue in the gunicorn bugtracker, as it's really a gunicorn issue.

@benoitc


commented May 9, 2018

@davidism that's not true according to the updated spec, RFC 7230:

> Historically, HTTP has allowed field content with text in the ISO‑8859‑1 charset [ISO-8859-1], supporting other charsets only through use of [RFC2047] encoding. In practice, most HTTP header field values use only a subset of the US-ASCII charset [USASCII]. Newly defined header fields SHOULD limit their field values to US‑ASCII octets. A recipient SHOULD treat other octets in field content (obs‑text) as opaque data.

Anyway it used to work with previous versions of flask. What changed since?

@tilgovi


commented May 9, 2018

> Anyway it used to work with previous versions of flask. What changed since?

336d6a4

@davidism

Member

commented May 9, 2018

@benoitc I based the change on the Unicode section of PEP 3333:

> Note also that strings passed to start_response() as a status or as response headers must follow RFC 2616 with respect to encoding. That is, they must either be ISO-8859-1 characters, or use RFC 2047 MIME encoding.

Werkzeug encodes with Latin-1 in many other places, including the encoding dance, so I'm not sure why it didn't cause issues before.

The wsgiref server uses Latin-1 / ISO-8859-1 as well.
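
To make the disagreement concrete, here is a minimal sketch that reports the narrowest charset able to represent a header value. The function name `narrowest_header_charset` is made up for illustration and is not part of Flask, Werkzeug, or Gunicorn.

```python
from typing import Optional


def narrowest_header_charset(value: str) -> Optional[str]:
    """Return 'ascii', 'latin-1', or None for a header value.

    Hypothetical helper for illustration only. 'ascii' values are accepted
    under both readings discussed here; 'latin-1' values satisfy the
    PEP 3333 wording but would be refused by an ASCII-only server.
    """
    for charset in ("ascii", "latin-1"):
        try:
            value.encode(charset)
        except UnicodeEncodeError:
            continue
        return charset
    return None


print(narrowest_header_charset("attachment; filename=report.txt"))
print(narrowest_header_charset("attachment; filename=ping\u00fcino.txt"))
print(narrowest_header_charset("attachment; filename=\uff0f.txt"))
```

The second value is the interesting case: it is valid under the ISO-8859-1 rule quoted from PEP 3333, yet it contains a byte outside US-ASCII.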

@davidism

Member

commented May 24, 2018

Can I get an example of a filename that caused Gunicorn to raise an error? I changed send_file to use ascii encoding, but our test with a unicode filename still passed without change.

flask/tests/test_helpers.py

Lines 641 to 649 in d22491a

def test_attachment_with_utf8_filename(self, app, req_ctx):
    # The separator in the filename is U+FF0F (fullwidth solidus), not '/';
    # it is what yields %EF%BC%8F in the expected filename* value.
    rv = flask.send_file('static/index.html', as_attachment=True, attachment_filename=u'Ñandú／pingüino.txt')
    content_disposition = set(rv.headers['Content-Disposition'].split('; '))
    assert content_disposition == set((
        'attachment',
        'filename="Nandu/pinguino.txt"',
        "filename*=UTF-8''%C3%91and%C3%BA%EF%BC%8Fping%C3%BCino.txt"
    ))
    rv.close()

Here's the code for the filename header (I replaced 'latin-1' with 'ascii' locally):

flask/flask/helpers.py

Lines 566 to 577 in d22491a

try:
    attachment_filename = attachment_filename.encode('latin-1')
except UnicodeEncodeError:
    filenames = {
        'filename': unicodedata.normalize(
            'NFKD', attachment_filename).encode('latin-1', 'ignore'),
        'filename*': "UTF-8''%s" % url_quote(attachment_filename),
    }
else:
    filenames = {'filename': attachment_filename}
headers.add('Content-Disposition', 'attachment', **filenames)
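
For comparison, one way to keep the header value pure ASCII is to take the fallback path whenever the name is not ASCII, rather than whenever it is not Latin-1. The sketch below is only an illustration of that idea, not the change that was made to Flask: `ascii_content_disposition` is a hypothetical name, the standard library's `urllib.parse.quote` stands in for `url_quote`, and escaping of quotes or backslashes inside the filename is left out for brevity.

```python
import unicodedata
from urllib.parse import quote


def ascii_content_disposition(filename: str) -> str:
    """Build an attachment Content-Disposition value using only ASCII.

    Hypothetical sketch: non-ASCII names get an ASCII approximation in
    `filename` plus a percent-encoded UTF-8 `filename*` parameter.
    """
    try:
        filename.encode("ascii")
    except UnicodeEncodeError:
        # Decompose accented characters, then drop whatever is not ASCII.
        simple = (
            unicodedata.normalize("NFKD", filename)
            .encode("ascii", "ignore")
            .decode("ascii")
        )
        encoded = quote(filename, safe="")
        return "attachment; filename=\"%s\"; filename*=UTF-8''%s" % (simple, encoded)
    return 'attachment; filename="%s"' % filename


print(ascii_content_disposition("ping\u00fcino.txt"))
```

With this approach a Latin-1 name such as `pingüino.txt` no longer reaches the server as a raw non-ASCII byte, at the cost of sending two filename parameters for it.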

@davidism

Member

commented May 24, 2018

The following fails with Gunicorn:

from flask import Flask, send_file

app = Flask(__name__)

@app.route('/')
def index():
    return send_file('example.py', as_attachment=True, attachment_filename='pingüino.txt')

gunicorn example:app

The filename is encodable as Latin-1, so it's passed through as-is, non-ASCII byte included. If it had characters outside Latin-1, encoding would fail and trigger the filename* behavior, and the result would be ASCII, which is why the name from the current test worked.
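
The two branches can be traced outside Flask with a standalone sketch of the quoted snippet. Assumptions: `legacy_filenames` is a hypothetical name, `urllib.parse.quote` replaces Werkzeug's `url_quote`, and the test filename is taken to contain U+FF0F (fullwidth solidus), which is what the `%EF%BC%8F` in the expected header implies.

```python
import unicodedata
from urllib.parse import quote


def legacy_filenames(attachment_filename: str) -> dict:
    """Mirror the Latin-1 branch logic of the snippet quoted in this thread."""
    try:
        encoded = attachment_filename.encode("latin-1")
    except UnicodeEncodeError:
        # Outside Latin-1: approximate the name and add a UTF-8 filename*.
        return {
            "filename": unicodedata.normalize(
                "NFKD", attachment_filename).encode("latin-1", "ignore"),
            "filename*": "UTF-8''%s" % quote(attachment_filename),
        }
    # Inside Latin-1: the bytes are passed through unchanged.
    return {"filename": encoded}


# Latin-1 encodable: passed through with the raw 0xFC byte for 'ü'.
print(legacy_filenames("ping\u00fcino.txt"))

# Contains U+FF0F, which Latin-1 cannot encode: falls back to ASCII output.
print(legacy_filenames("\u00d1and\u00fa\uff0fping\u00fcino.txt"))
```

The first call returns a single `filename` holding a byte outside ASCII, which is the case Gunicorn rejects. The second returns an ASCII-only `filename` and a percent-encoded `filename*`.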
