use a default encoding in Response's text property #1546

JerryKwan · 2013-08-21T07:35:04Z

Why not use a default encoding in Response's text property? If the server does not set Content-Type explicitly, why not fall back to a default encoding such as UTF-8? The chardet.detect() function is time-consuming. If someone just uses response.text and isn't aware of the inner mechanism, they may think the Requests library is too slow and switch to another library.

Lukasa · 2013-08-21T07:51:18Z

Thanks for asking this question @JerryKwan! The short and pithy answer is: because it's better to be slow and correct than fast and wrong. =)

If we were concerned about speed we'd simply not have the Response.text property at all, and only ever use Response.content (with a silly hack for Response.json()). This avoids performing any unicode decoding at all, which will save even more time.

Having a 'default' encoding is just wild optimism, because no such default exists on the web. Saying that we'll use UTF-8 whenever we don't know what the correct encoding is means that some users will find that Requests very quickly downloads gibberish. They will then conclude that Requests, while very fast, also doesn't work properly, and they'll go and use another library. =)

EDIT: A user can also simulate this behaviour by checking the Content-Type header for an encoding and, if none is found, setting Response.encoding = 'utf-8'.
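
A minimal sketch of the workaround described in the EDIT above. The helper name `apply_default_encoding` and its `default` parameter are hypothetical, not part of Requests. It only assumes the response object exposes `headers` and a writable `encoding`, as `requests.Response` does.

```python
def apply_default_encoding(response, default="utf-8"):
    """Set response.encoding to `default` when the server declared no charset.

    Hypothetical helper, not part of Requests. Works on any object that has
    a `headers` mapping and a writable `encoding` attribute.
    """
    content_type = response.headers.get("Content-Type", "")
    # Only override when the Content-Type header carries no charset parameter.
    if "charset=" not in content_type.lower():
        response.encoding = default
    return response


# Usage (assumes the requests package is installed and the URL is reachable):
#   import requests
#   r = apply_default_encoding(requests.get("http://httpbin.org/get"))
#   r.text  # decoded with the declared charset, or with UTF-8 if none was declared
```

Because the encoding is set before `text` is accessed, the body is decoded with the chosen default rather than a guessed one. The trade-off raised earlier in this comment still applies: if the body is not actually UTF-8, the result will be wrong.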

kennethreitz · 2013-08-21T07:52:59Z

You would find that a default of 'utf-8' would make a shockingly high number of requests fail :)

kennethreitz · 2013-08-21T18:24:06Z

Actually, I forgot to mention something. This is implemented so that you can provide your own default if you'd like.

>>> import requests
>>> r = requests.get('http://httpbin.org/get')
>>> r.encoding = 'utf-8'
>>> r.text
...

This will fully skip encoding detection.

kennethreitz closed this as completed Aug 21, 2013

itsadok mentioned this issue Nov 14, 2013

[Suggestion] Simplify charset handling #1737

Open

Lukasa mentioned this issue Dec 3, 2013

Response should not return 'ISO-8859-1' as default encoding #1774

Closed

Lukasa mentioned this issue Aug 7, 2014

add auto detect charset from http body when http headers not seted #2161

Closed

github-actions bot locked as resolved and limited conversation to collaborators Sep 8, 2021
