use a default encoding in Response's text property #1546

Closed
JerryKwan opened this issue Aug 21, 2013 · 3 comments

Comments

@JerryKwan

Why not use a default encoding in Response's text property? If the server does not set Content-Type explicitly, why not use a default encoding such as utf-8? The chardet.detect() function is time-consuming. If someone just uses response.text and is not aware of the inner mechanism, they may think the Requests library is too slow and switch to another library.

@Lukasa
Member

Lukasa commented Aug 21, 2013

Thanks for asking this question @JerryKwan! The short and pithy answer is: because it's better to be slow and correct than fast and wrong. =)

If we were concerned about speed we'd simply not have the Response.text property at all, and only ever use Response.content (with a silly hack for Response.json()). This avoids performing any unicode decoding at all, which will save even more time.

Having a 'default' encoding is just wild optimism, because no such default exists on the web. Saying that we'll use UTF-8 whenever we don't know what the correct encoding is means that some users will find that Requests very quickly downloads gibberish. They will then conclude that Requests, while very fast, also doesn't work properly, and they'll go and use another library. =)
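To make the "gibberish" concern concrete, here is a minimal standard-library sketch. It assumes a hypothetical response body that the server encoded as ISO-8859-1 without declaring a charset; the sample word is illustrative and not taken from this thread. Decoding with errors="replace" is used here on the assumption that it mirrors how a lenient client would handle undecodable bytes.

```python
# Hypothetical body: the server sent ISO-8859-1 bytes but declared no charset.
body = "café".encode("iso-8859-1")  # b'caf\xe9'

# A strict UTF-8 decode rejects the lone 0xE9 byte outright.
try:
    body.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# A lenient decode succeeds but substitutes U+FFFD for the bad byte,
# so the caller silently receives damaged text.
garbled = body.decode("utf-8", errors="replace")

# Decoding with the encoding the server actually used recovers the text.
correct = body.decode("iso-8859-1")

print(strict_failed, repr(garbled), repr(correct))
```

Either outcome is worse than a slower but accurate guess: the strict decode fails, and the lenient one returns text that looks plausible but is wrong.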

EDIT: A user can also simulate this behaviour by checking the Content-Type header for an encoding and, if none is found, setting Response.encoding = 'utf-8'.
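The workaround described in the EDIT above could be sketched roughly as follows. The helper names charset_from_content_type and apply_default_encoding are hypothetical and not part of Requests; the code only assumes the response object exposes a headers mapping and a writable encoding attribute, as requests.Response does. Header parsing is delegated to the standard library's email.message.Message.

```python
from email.message import Message


def charset_from_content_type(value):
    """Return the charset declared in a Content-Type value, or None."""
    if not value:
        return None
    msg = Message()
    msg["Content-Type"] = value
    # get_content_charset() returns the lowercased charset parameter,
    # or None when the header carries no charset.
    return msg.get_content_charset()


def apply_default_encoding(response, default="utf-8"):
    """Set response.encoding to `default` only when no charset was declared.

    Hypothetical helper: `response` is any object with a `headers` mapping
    and a writable `encoding` attribute.
    """
    declared = charset_from_content_type(response.headers.get("Content-Type"))
    if declared is None:
        response.encoding = default
    return response
```

With a real response, this would be called as apply_default_encoding(r) before reading r.text. The trade-off is the one discussed in this thread: detection is skipped, so the text will be wrong whenever the body is not actually in the chosen default.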

@kennethreitz
Contributor

You would find that a default of 'utf-8' would make a shockingly high number of requests fail :)

@kennethreitz
Contributor

Actually, I forgot to mention something. This is implemented so that you can provide your own default if you'd like.

>>> r = requests.get('http://httpbin.org/get')
>>> r.encoding = 'utf-8'
>>> r.text
...

This will fully skip encoding detection.

3 participants