Support bytestring URLs on Python 3.x #2238

Merged
merged 1 commit into from Oct 2, 2014

Contributor

joealcorn commented Sep 20, 2014

Hi there folks.

Currently, prepare_url calls unicode or str on the url argument, depending on the Python version. This works fine in most cases, but it trips up on bytestrings on Python 3.x, because their string representation is "b'http://httpbin.org'". This eventually surfaces as an InvalidSchema exception.
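
A minimal Python 3 illustration of the behaviour described above, using the URL from this comment. It only shows what str() does to a bytestring; it does not call requests itself.

```python
# Python 3: str() on a bytestring returns its repr, not the decoded text.
url = b'http://httpbin.org'

as_text = str(url)
print(as_text)  # b'http://httpbin.org'

# The result no longer begins with a recognisable scheme such as "http",
# which is why the URL is later rejected.
starts_with_http = as_text.startswith('http')

# Decoding the bytestring gives the text that was intended.
decoded = url.decode('utf8')
print(decoded)  # http://httpbin.org
```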

I find this to be completely unexpected, and I'd imagine it's not something that's been done intentionally.

Technically this is a breaking change. The possibility of passing non-strings to prepare_url is undocumented and untested, but regardless, it may be better to fix this in a different way; that's your call.

Member

Lukasa commented Sep 20, 2014

Thanks for this!

I think we like the ability to pass non-strings to prepare_url. It allows people to use custom URL-building classes without running into trouble. It's worth noting that your change also breaks Python 2 behaviour: previously a Python 2 str would get lifted to a Python 2 unicode, which no longer happens.

I think we can get around this by simply special-casing string types. The logic is really:

if type is unicode, leave unchanged
else, if type is bytes, decode bytestring to unicode
else, if type is anything else, call the 'to unicode' method

I think that's really the logic we want here. @sigmavirus24, thoughts?
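
The three cases listed above can be sketched as follows for Python 3, where the text type is str. The names to_text and URLBuilder are hypothetical and are not part of requests; this is an illustration of the branching, not the code in the branch.

```python
def to_text(url, encoding='utf8'):
    """Hypothetical helper: coerce a URL-like object to text (Python 3)."""
    if isinstance(url, str):
        # Already text: leave unchanged.
        return url
    if isinstance(url, bytes):
        # Bytestring: decode to text instead of taking its repr.
        return url.decode(encoding)
    # Anything else: rely on the object's string representation.
    return str(url)


class URLBuilder:
    """Hypothetical custom URL-building class with a string representation."""

    def __init__(self, host, path):
        self.host = host
        self.path = path

    def __str__(self):
        return 'http://{}{}'.format(self.host, self.path)


print(to_text('http://httpbin.org'))
print(to_text(b'http://httpbin.org'))
print(to_text(URLBuilder('httpbin.org', '/get')))
```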

Member

sigmavirus24 commented Sep 20, 2014

I agree with @Lukasa that this is the behaviour we want. We absolutely want unicode URLs because, on Python 3, we can handle IRIs. We would need to adopt something that implements RFC 3987 to support them on Python 2, but once we did, we would be able to support them there too. For example, http://☃.net should be supported. (Your browser and Python 3 should both properly "encode" that to http://xn--n3h.net/.)
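
As a small illustration of the snowman example, Python 3's built-in idna codec can encode the hostname part on its own. This is only a sketch of the hostname encoding: the codec implements the older IDNA 2003 rules and is not a full RFC 3987 IRI implementation, and it is not necessarily the code path requests uses.

```python
# Encode just the hostname with Python's built-in "idna" codec.
host = '\u2603.net'  # the snowman domain, ☃.net

encoded = host.encode('idna')
print(encoded)

# The codec also reverses the transformation.
round_tripped = encoded.decode('idna')
print(round_tripped)
```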

Contributor

joealcorn commented Sep 29, 2014

Ah, good point, I forgot about that use case.
I have updated the branch and I believe it's doing what it should now. How's that?

requests/models.py (outdated)
#: as this will include the bytestring indicator (b'')
#: on python 3.x.
#: https://github.com/kennethreitz/requests/pull/2238
if not is_py2:

Member

sigmavirus24 commented Sep 30, 2014

Would this be acceptable:

try:
    url = url.decode('utf8')
except AttributeError:
    url = unicode(url) if is_py2 else str(url)
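
A self-contained sketch of this duck-typed approach, written so that unicode is only used on Python 2 and str on Python 3. The function name coerce_url is hypothetical, and is_py2 is recomputed here as a stand-in for the flag of the same name used in requests.

```python
import sys

# Stand-in for the is_py2 flag referenced in the snippet above.
is_py2 = sys.version_info[0] == 2


def coerce_url(url):
    """Hypothetical wrapper: return url as text."""
    try:
        # Bytestrings have a decode() method, so they are decoded to text.
        return url.decode('utf8')
    except AttributeError:
        # No decode(): fall back to the platform's text constructor.
        # unicode is only evaluated on Python 2.
        return unicode(url) if is_py2 else str(url)  # noqa: F821


print(coerce_url(b'http://httpbin.org'))
print(coerce_url('http://httpbin.org'))
print(coerce_url(8080))
```

One consequence of this design is that any object exposing a decode() method is treated like a bytestring, while everything else goes through its string representation.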

Contributor

joealcorn commented Sep 30, 2014

That's nice and succinct; I have amended and pushed.

@joealcorn joealcorn changed the title from "Call to_native_string on urls passed to prepare_url" to "Support bytestring URLs on Python 3.x" Sep 30, 2014

Member

sigmavirus24 commented Sep 30, 2014

❤️ @buttscicles

@Lukasa this looks okay to me. Thoughts?

Member

Lukasa commented Sep 30, 2014

🍰 Make it so.

Member

kennethreitz commented Oct 2, 2014

Let's not document this :)

kennethreitz added a commit that referenced this pull request Oct 2, 2014

Merge pull request #2238 from buttscicles/byte-urls
Support bytestring URLs on Python 3.x

@kennethreitz kennethreitz merged commit e74791d into requests:master Oct 2, 2014

@joealcorn joealcorn deleted the joealcorn:byte-urls branch Oct 3, 2014
