url delims. worthwhile? #3270
It would be good to maybe discuss this, as it's a common complaint.
On the one hand, the various RFCs are quite clear that whitespace, <>, and quotes are delimiters, and thus cannot ever appear in URLs. In url.js, there's this:
Any delims that are found are thus assumed to be not part of the URL.
However, in practice, this is a bit of a deviation from the way that URLs are handled by browsers in the address bar, the window.location object, and the anchor tag, which serve as models for node's URL parser.
Would anyone object if delims were just auto-escaped?
The change in behavior would be:
Currently this is parsed as
With the proposed change, it'd be:
Which, incidentally, is exactly what you get if you type
While this could potentially reduce the usefulness of the url module as a validator, it'd remove a common gotcha for newcomers, and would make it a closer model of the browser, which was always the original intent of the module.
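To make the proposal concrete, here is a minimal sketch of what "auto-escaping delims" could look like as a step run on the input before parsing. This is not the code in url.js: the `escapeDelims` name is hypothetical, and the exact character list is an assumption based on the description above (whitespace, angle brackets, quotes).

```javascript
// Hypothetical helper, NOT node's actual implementation.
// Assumed delimiter set: angle brackets, double quote, backtick, whitespace.
const DELIMS = ['<', '>', '"', '`', ' ', '\r', '\n', '\t'];

// Percent-encode every delimiter character and leave everything else
// untouched, so the whole string can be treated as a single URL.
function escapeDelims(input) {
  let out = '';
  for (const ch of input) {
    out += DELIMS.includes(ch) ? encodeURIComponent(ch) : ch;
  }
  return out;
}
```

With this sketch, a space becomes `%20` and `<`/`>` become `%3C`/`%3E` instead of cutting the URL short at the first delimiter.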
If we're going to do it, we should do it completely. We've had these little debates about escaping or not escaping spaces, and quotes, and various things one by one. Maybe we ought to just get rid of the delims altogether.
Actually, the string passed in the example is not a valid URL per http://tools.ietf.org/html/rfc3986 and treatment of delims should ideally be done in context as in "parse URIs from this blob of text". But AFAIU the semantics of
@s3u I couldn't have put it better; I thought about that too but couldn't find the words. However, I do believe url.parse() should assume a single-URL string context... which means this stuff should be escaped. Otherwise it's not a single URL, and what does it even mean to put two URLs in there?
@s3u RFC 3986 is about URIs, and well outside the scope of node's url parser, which is only concerned with Uniform Resource Locators, of the sort used to address resources on the web.
Delims are only relevant when parsing urls in a blob of text. If you know that the entire string represents a URL, then it seems rather common to assume that they'll be escaped.
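For contrast, this is roughly the "blob of text" case where delims do matter. It is a sketch under assumptions: `extractUrls` is a hypothetical name, the regex only looks for http/https, and the delimiter set (whitespace, angle brackets, double quote, backtick) is assumed rather than taken from url.js.

```javascript
// Hypothetical helper for illustration only.
// Here delimiters mark where one URL ends and surrounding text resumes.
function extractUrls(text) {
  // Match http(s) URLs up to, but not including, the first assumed delimiter.
  const re = /https?:\/\/[^\s<>"`]+/g;
  return text.match(re) || [];
}
```

In this context stopping at a delimiter is the right behavior. The question in this issue is whether url.parse(), which is handed one string that is supposed to be one URL, should behave the same way.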
The question is really: will this break any existing programs? What gotchas will it cause?
If the answer to both questions is "none that exist, but some that I can imagine, if I try really hard", then that's a +1 for the feature.