Doesn't return complete URL in response.url when url has a hash (#) character #79

lsemel · 2011-06-24T16:07:36Z

This may or may not be correct HTTP behavior, but when using a URL with a hash (#) character, response.url chops off the URL at the hash character. I would have expected the full URL. (For redirects, I can retrieve it from the location header in the history.)

>>> import requests
>>> url = 'http://twitter.com/#!/charliesheen/status/42731720402931712'
>>> shortened = 'http://bit.ly/fxDgIl'
>>> response = requests.get(url)
>>> response.url
'http://twitter.com/'
>>> response = requests.get(shortened)
>>> response.url
'http://twitter.com/'
>>> response.history[0].headers['location']
'http://twitter.com/#!/charliesheen/status/42731720402931712'

kennethreitz · 2011-06-24T16:20:01Z

This is correct behavior.

Traditionally, the # is used as a fragment identifier on the client side, to specify an anchor tag (e.g. http://docs.python-requests.org/en/latest/api/#requests.get).

More info: http://en.wikipedia.org/wiki/Fragment_identifier

However, hacking the fragment identifiers into "hashbang URLs" (containing /#!/) has been a bit of a (controversial) trend lately. They were used for a while to give JavaScript client applications relatively semantic URLs. Some websites (Gawker until recently, Twitter) are using them as their main URL system, which is confusing to users and developers alike.

If you look closely at what your web browser does when you request a URL like http://twitter.com/#!/charliesheen/status/42731720402931712, the GET request is actually only to http://twitter.com/. The browser receives a generic page containing JavaScript that inspects window.location and loads the tweet named in the hashbang URL.
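
To make the split concrete, here is a minimal sketch using only the Python 3 standard library (urllib.parse), not Requests itself. It separates the part of the URL that is dereferenced over HTTP from the fragment that stays on the client. The variable names are illustrative, and this is not necessarily how Requests does it internally.

```python
from urllib.parse import urldefrag, urlsplit

url = "http://twitter.com/#!/charliesheen/status/42731720402931712"

# urldefrag separates the URL that is actually requested from the fragment,
# which only the client (browser, script) ever sees.
request_url, fragment = urldefrag(url)

# The HTTP request line carries the path (plus query, if any), never the fragment.
parts = urlsplit(request_url)
request_target = parts.path or "/"
if parts.query:
    request_target += "?" + parts.query

print(request_url)      # what the server is asked for
print(fragment)         # what stays on the client
print(request_target)   # what would appear after "GET " on the request line
```

If the fragment matters to your code, keep the original URL string around (or take it from the Location header, as in the report above) and split it yourself; the server's response cannot give it back.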

You can learn more about hashbangs here:

http://danwebb.net/2011/5/28/it-is-about-the-hashbangs

Update: Tweets have actual semantic URLs too:
https://twitter.com/charliesheen/status/42731720402931712
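
If you end up holding a hashbang URL and want something a server can actually resolve, one possible approach is to move the fragment into the path. The sketch below is inferred only from the two Twitter URLs in this thread. hashbang_to_path is a hypothetical helper, not part of Requests, and it assumes the site serves the same resource at the resulting path, which is not true of hashbang sites in general.

```python
from urllib.parse import urlsplit, urlunsplit


def hashbang_to_path(url):
    """Hypothetical helper: move a '#!' fragment into the URL path."""
    parts = urlsplit(url)
    if not parts.fragment.startswith("!"):
        return url  # not a hashbang URL, leave it untouched
    path = parts.fragment[1:]  # drop the leading '!'
    if not path.startswith("/"):
        path = "/" + path
    # Rebuild the URL with the fragment promoted to the path and no fragment left.
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))


semantic = hashbang_to_path(
    "http://twitter.com/#!/charliesheen/status/42731720402931712"
)
print(semantic)
```

The helper keeps the original scheme, so the result here is an http:// URL rather than the https:// one shown above. Any existing path on the input URL is discarded, which is fine for a root-level hashbang like Twitter's but is an assumption elsewhere.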

jgorset · 2011-06-24T16:22:58Z

I believe this is correct behaviour, and a good example of why using fragments in this way is a really bad idea.

Fragment identifiers have a special role in information retrieval systems as the primary form of client-side indirect referencing, allowing an author to specifically identify aspects of an existing resource that are only indirectly provided by the resource owner. As such, the fragment identifier is not used in the scheme-specific processing of a URI; instead, the fragment identifier is separated from the rest of the URI prior to a dereference, and thus the identifying information within the fragment itself is dereferenced solely by the user agent, regardless of the URI scheme. - RFC 3986

Edit: Damn, @kennethreitz, you fast!

kennethreitz · 2011-06-24T21:23:24Z

🍰

kennethreitz closed this as completed Jun 24, 2011

zhangyuchun mentioned this issue May 3, 2018

recv block forever #4624

Closed

github-actions bot locked as resolved and limited conversation to collaborators Sep 9, 2021
