Handling of page-relative vs. root-relative paths #16
No - thank you for using furl.

This behavior is a result of the ambiguity of incomplete URLs. For example

>>> f = furl('the/rainbow')

is clearly a path. But what about

>>> f = furl('google.com')

Is the intended URL the path '/google.com' or the domain 'google.com/'? It's

By default, furl treats ambiguous inputs as paths. Then, when a path-only furl

>>> f = furl('google.com')
>>> f.url
'/google.com'

This is natural because in a full URL a path cannot start without a '/'. For

>>> f = furl('a/path')
>>> f.host = 'google.com'

f.url should now be

>>> f.url
'google.com/a/path'

not

>>> f.url
'google.coma/path'

Note the automatically prepended '/' to 'a/path' in the final URL. It's this automatic prepending of a '/' to path-only furls that results in the

I'll think about how this ambiguity and resultant unexpected behavior can be
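
For comparison, Python's standard-library parser resolves the same ambiguity in the same direction: without a scheme or a leading '//', the whole string is read as a path. A minimal sketch using urllib.parse (Python 3, not furl itself; the variable names are only illustrative):

```python
from urllib.parse import urlsplit

# No scheme and no leading '//': the whole string is parsed as a path.
bare = urlsplit('google.com')

# A leading '//' marks a network location, so the host is recognized.
with_netloc = urlsplit('//google.com')

print(repr(bare.netloc), repr(bare.path))
print(repr(with_netloc.netloc), repr(with_netloc.path))
```

The first result has an empty netloc and the path 'google.com'; the second has the netloc 'google.com' and an empty path.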
It makes sense for path-only URLs to be prepended with a '/' when serialized to

>>> f = furl('a/path')
>>> f.url
'/a/path'

I think the best course of action is to remove the invariant that URL Paths are

>>> f = furl('a/path')
>>> f.url
'/a/path'
>>> str(f.path)
'a/path'
>>> f.path.isabsolute
False
>>> f.path.isabsolute = True
>>> str(f.path)
'/a/path'

So, if your intention is to join() a non-absolute path to a URL, like

>>> f1 = furl('http://www.domain.com/somewhere/over')
>>> f2 = furl('the/rainbow')
>>> f2.url
'/the/rainbow'
>>> str(f2.path)
'the/rainbow'
>>> f2.path.isabsolute
False
>>> f1.join(str(f2.path)).url
'http://www.domain.com/somewhere/over/the/rainbow'

What do you think?
You get to the correct results, but I'm not a fan of f2.url producing the path with the slash prepended.

First, let me challenge the statement: "Paths in a URL must be preceded by a '/'." The URL RFC explicitly allows partial URLs. Here are the W3C rules on expanding them: http://www.w3.org/Addressing/URL/4_3_Partial.html. The key point is that these partial URLs commonly appear in web pages, and users of your package will definitely be trying to parse them with it. A URL of the form given in your paraphrasing of my example, i.e. "the/rainbow", has a specific meaning within the context of a parent object, and you can't arbitrarily change that meaning by prepending a '/' to it.

Your earlier example, "google.com", is a case of trying to help the implementer more than he or she deserves. According to all the rules of URLs, that is a partial path. You and I might recognize it as a domain name and treat it specially, but there is no reason for your library to do so.

In short, given a base URL and a list of URLs to be joined with it, this is what I would expect to happen:

Base URL: http://www.domain.com/somewhere/over

the/rainbow - http://www.domain.com/somewhere/over/the/rainbow

The last case may seem strange, but it is the fault of the implementer, not your library. In this case adhering to the rule and allowing things to come apart at the seams is probably the kinder way to proceed.
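
As a point of reference for the page-relative vs. root-relative distinction, here is a minimal sketch using the standard library's urljoin (Python 3, not furl), which follows RFC 3986 reference resolution. One caveat worth noting: under those rules the last segment of the base is replaced unless the base ends in '/', so the result for 'the/rainbow' depends on whether the base is written as 'over' or 'over/'. The variable names are only illustrative.

```python
from urllib.parse import urljoin

base_no_slash = 'http://www.domain.com/somewhere/over'
base_slash = 'http://www.domain.com/somewhere/over/'

# Page-relative reference: resolved against the base's "directory".
page_relative_no_slash = urljoin(base_no_slash, 'the/rainbow')
page_relative_slash = urljoin(base_slash, 'the/rainbow')

# Root-relative reference: replaces the base's entire path.
root_relative = urljoin(base_slash, '/the/rainbow')

print(page_relative_no_slash)  # http://www.domain.com/somewhere/the/rainbow
print(page_relative_slash)     # http://www.domain.com/somewhere/over/the/rainbow
print(root_relative)           # http://www.domain.com/the/rainbow
```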
You're right: prepending '/' to non-absolute paths when they're serialized to a

A strong, natural solution is the one I mentioned before: remove the invariant

>>> f = furl('a/path')
>>> f.url
'a/path'
>>> f.path.isabsolute
False

Instead of the current behavior:

>>> f = furl('a/path')
>>> f.url
'/a/path'
>>> f.path.isabsolute
True

For the second issue, treating 'google.com' in furl('google.com') as a path, not

I'm leaving this ticket open until I fix the path issue. Pull requests welcome.
… a netloc (a username, password, host, and/or port). Fix issue #16. Thanks to Markbnj.
This issue has been fixed in furl v0.3.5. URL paths are no longer always absolute if non-empty; they're now only always absolute in the presence of a netloc (a username, password, host, and/or port).

>>> from furl import furl
>>> f = furl('/a/path')
>>> f.path.isabsolute
True
>>> f.path
Path('/a/path')
>>> f.path.isabsolute = False
>>> f.path
Path('a/path')
>>> f.host = 'arc.io'
>>> f
furl('arc.io/a/path')
>>> f.path.isabsolute
True
>>> f.path.isabsolute = True
Traceback (most recent call last):
...
AttributeError: Path.isabsolute is True and read-only for URLs with a netloc (a username, password, host, and/or port). A URL path must start with a '/' to separate itself from a netloc.

Your original example now works (though

>>> f1 = furl('http://www.domain.com/somewhere/over/')
>>> f2 = furl('the/rainbow')
>>> print(f2.path)
the/rainbow
>>> print(f1.join(f2.url))
http://www.domain.com/somewhere/over/the/rainbow

Upgrade to furl v0.3.5 with

Thank you for bringing this issue to my attention and for your input and suggestions, Markbnj.
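
The rule described above (a non-empty path is forced absolute only when a netloc is present) can be illustrated with a small toy model. This is a sketch under stated assumptions, not furl's implementation: the class SketchPath and its has_netloc flag are hypothetical names invented for this example.

```python
class SketchPath:
    """Toy model of the v0.3.5 rule; hypothetical, not furl's code."""

    def __init__(self, path, has_netloc=False):
        self.segments = [s for s in path.split('/') if s]
        self.has_netloc = has_netloc  # hypothetical stand-in for "URL has a netloc"
        self._isabsolute = path.startswith('/')

    def _forced_absolute(self):
        # A non-empty path must start with '/' to separate it from a netloc.
        return self.has_netloc and bool(self.segments)

    @property
    def isabsolute(self):
        return True if self._forced_absolute() else self._isabsolute

    @isabsolute.setter
    def isabsolute(self, value):
        if self._forced_absolute():
            raise AttributeError('isabsolute is read-only when a netloc is present')
        self._isabsolute = bool(value)

    def __str__(self):
        joined = '/'.join(self.segments)
        return '/' + joined if self.isabsolute else joined


relative = SketchPath('a/path')
with_host = SketchPath('a/path', has_netloc=True)

print(str(relative))   # a/path
print(str(with_host))  # /a/path
```

Without a netloc the page-relative path serializes unchanged, which is the behavior the original report asked for; with a netloc the leading '/' is mandatory and the flag cannot be changed.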
First, thanks for your work on furl. I've found the API very useful for slicing and dicing URLs. However I think I have found an issue with the way the class handles relative paths. Using 0.3.4 installed via pip. Consider the following:
I think the addition of the forward slash to the path in f2 is a bug, since it turns a page-relative path into a root-relative path.