Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Doesn't return complete URL in response.url when url has a hash (#) character #79

Closed
lsemel opened this issue Jun 24, 2011 · 3 comments
Closed

Comments

@lsemel
Copy link

lsemel commented Jun 24, 2011

This may or may not be correct HTTP behavior, but when using a URL with a hash (#) character, response.url chops off the URL at the hash character. I would have expected the full URL. (For redirects, I can retrieve it from the location header in the history.)

>>> import requests
>>> url = 'http://twitter.com/#!/charliesheen/status/42731720402931712'
>>> shortened = 'http://bit.ly/fxDgIl'
>>> response = requests.get(url)
>>> response.url
'http://twitter.com/'
>>> response = requests.get(shortened)
>>> response.url
'http://twitter.com/'
>>> response.history[0].headers['location']
'http://twitter.com/#!/charliesheen/status/42731720402931712'
@kennethreitz
Copy link
Contributor

This is correct behavior.


Traditionally, The # is used as a fragment identifier on the client side, to specify an ancor tag (e.g. http://docs.python-requests.org/en/latest/api/#requests.get)

More info: http://en.wikipedia.org/wiki/Fragment_identifier

However, hacking the fragment identifiers into "hashbang urls" (containing /#!/) has been a bit of a (controversial) trend lately. They were used for a while for javascript client applications to be relatively semantic. Some websites (gawker up until recent, twitter) are using them as their main URL system, which is confusing to users and developers alike.

If you look closely at what your web browser does when you request a URL like http://twitter.com/#!/charliesheen/status/42731720402931712, the GET request is actually only to http://twitter.com/. The browser receives a generic page that contains javascript that carries out window.location search, and loads the tweet in the hashbang url.

You can learn more about hashbangs here:


Update: Tweets have actual semantic urls too:
https://twitter.com/charliesheen/status/42731720402931712

@jgorset
Copy link
Contributor

jgorset commented Jun 24, 2011

I believe this is correct behaviour, and a good example of why using fragments in this way is a really bad idea.

Fragment identifiers have a special role in information retrieval systems as the primary form of client-side indirect referencing, allowing an author to specifically identify aspects of an existing resource that are only indirectly provided by the resource owner. As such, the fragment identifier is not used in the scheme-specific processing of a URI; instead, the fragment identifier is separated from the rest of the URI prior to a dereference, and thus the identifying information within the fragment itself is dereferenced solely by the user agent, regardless of the URI scheme. - RFC 3986

Edit: Damn, @kennethreitz, you fast!

@kennethreitz
Copy link
Contributor

🍰

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 9, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants