Using encodings other than UTF-8 in Response #1005

barosl · 2014-03-19T21:17:07Z

The Flask documentation states that Flask assumes the encoding of the response to be UTF-8.

the encoding for text on your website is UTF-8

From http://flask.pocoo.org/docs/unicode/

Does that mean we are discouraged from using encodings other than UTF-8 in a Flask response? I was unable to find a clean way to change the intended encoding of either flask.wrappers.Response or werkzeug.wrappers.Response.

I cannot pass the text directly to the constructor, because it calls set_data(), which encodes with UTF-8. The constructor has no charset parameter, so there is no way to change that behavior. Instead I have to create the response object with no constructor arguments, assign the desired encoding to response.charset, and then call response.set_data().
Even then, content_type is determined in the constructor, so it will still be "text/html; charset=UTF-8", because the charset attribute is always 'utf-8' while the object is being created. So I'm forced to pass content_type to the constructor, which is kinda confusing because my original intention was just to change the encoding, not to set the Content-Type explicitly.

Do I understand the process accurately?

If I'm right, I suggest:

Allow passing charset to the Response class.
Or, the content_type attribute should be updated again when the user manually sets the charset attribute.
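
The workaround described above (encode the text by hand and pass content_type explicitly) can be sketched roughly as follows. This is only an illustration: make_encoded_response is a hypothetical helper name, not part of Flask or Werkzeug, and cp949 is used simply because it is one of the encodings mentioned later in this thread.

```python
from werkzeug.wrappers import Response


def make_encoded_response(text, charset, mimetype="text/html"):
    """Hypothetical helper: encode the body by hand and label it to match."""
    # Encode before handing the body to Response, so the default UTF-8
    # encoding step is never applied to it.
    body = text.encode(charset)
    # An explicit content_type is used as given, so the header carries the
    # charset we chose rather than the UTF-8 default.
    return Response(body, content_type=f"{mimetype}; charset={charset}")


response = make_encoded_response("<p>한글</p>", "cp949")
print(response.headers["Content-Type"])
```

The cost is the one complained about above: the mimetype has to be repeated just to change the charset.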


ThiefMaster · 2014-03-19T21:19:19Z

Why would you want any other encoding for text?

barosl · 2014-03-19T21:29:02Z

@ThiefMaster My original intention was to not emit the charset in the header at all, because we have many legacy documents written in encodings other than UTF-8. So until we convert them to a unified format, I wanted to let the client choose the encoding by itself.

ThiefMaster · 2014-03-19T21:31:34Z

I think converting them is a better idea. Letting the client choose the charset is a bad idea - chances are good it'll get it wrong and show it as gibberish. I guess all of your documents have the same charset? If yes it shouldn't be too hard to convert them!

barosl · 2014-03-19T21:40:14Z

@ThiefMaster Currently at least two encodings (cp949 and cp932) are in use, and they are so similar that I cannot write an automated converter: text in one encoding does not necessarily raise UnicodeDecodeError when decoded with the other... The only way to determine the encoding is to use chardet, which is not a 100% solution.
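
A small sketch of the ambiguity, using only the codecs that ship with Python. The sample string is an arbitrary choice for illustration; other inputs may behave differently, and some will raise an error.

```python
text = "한글"                 # Korean sample text
raw = text.encode("cp949")    # bytes as they would sit in a legacy Korean file

# Decoding with the wrong (Japanese) codec succeeds silently for this input,
# so a try/except around decode() cannot tell the two encodings apart.
wrong = raw.decode("cp932")

print(repr(wrong))            # mojibake rather than the original text
print(wrong == text)
```

Because no exception is raised, a converter that tries one codec and falls back to the other on failure would mislabel files like this one.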

remram44 · 2014-08-02T19:41:23Z

To summarize:

You can set or remove the charset by returning Response(b'data', content_type='text/html; charset=whatever') (but you have to mention the mimetype)
You can set the charset by subclassing Response and setting the 'charset' attribute to something else (which will be used for all text/* or xml mimetypes) (but get_content_type() won't accept None).

Maybe add a check for self.charset is None before calling get_content_type(mimetype, self.charset) (in werkzeug)? Optionally, accept it as a parameter as well.

(pushed these to override-response-charset)
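
For the original use case in this thread (sending legacy bytes without advertising any charset), the first option might look like the sketch below. The sample body and the variable names are made up for illustration. The subclassing option is not shown because it relies on the charset attribute, whose availability depends on the Werkzeug version in use.

```python
from werkzeug.wrappers import Response

# Stand-in for a legacy document that is already encoded on disk.
legacy_bytes = "<p>한글</p>".encode("cp949")

# Passing content_type without a charset parameter means none is advertised,
# so the client is left to detect the encoding on its own.
unlabeled = Response(legacy_bytes, content_type="text/html")

print(unlabeled.headers["Content-Type"])
```

Since the body is passed as bytes, it is sent as-is and no encoding step is applied to it.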

davidism · 2017-04-08T17:58:09Z

Going to close this in favor of the options in the previous comment.

barosl changed the title ~~Encodings other than UTF-8~~ Using encodings other than UTF-8 in Response Mar 19, 2014

davidism closed this as completed Apr 8, 2017

cspaier mentioned this issue Sep 22, 2020

Accents non reconnus cspaier/pronote2wims#9

Closed

github-actions bot locked as resolved and limited conversation to collaborators Nov 14, 2020
