Skip to content
This repository

Fix for url.parse() leaving trailing ":" on the protocol/scheme #1580

Closed
wants to merge 1 commit into from
Jordan Sissel

Before:
% node -e 'require("url").parse("http://www.google.com/").protocol'
http:

After:
% node -e 'require("url").parse("http://www.google.com/").protocol'
http

Jordan Sissel Fix url.parse() leaving trailing ':' on protocol
Before:
    % node -e 'require("url").parse("http://www.google.com/").protocol';
    http:

After:
    % node -e 'require("url").parse("http://www.google.com/").protocol';
    http
7fd62d9
Jordan Sissel

Scratch that, I'll update all the tests, too.

Isaac Z. Schlueter
Collaborator

Why?

Isaac Z. Schlueter isaacs closed this August 23, 2011
Jordan Sissel

Bad day at the office? Or is today "close-without-reason Tuesday"? I'll resubmit tomorrow - hopefully with better luck. Perhaps there's good science to be had in such experiments!

In the mean time, you can mend your possibly bad day with some UCB skits, like this one: http://www.ucbcomedy.com/videos/play/6904/who-poisoned-whose-tea

Isaac Z. Schlueter
Collaborator

Sorry, my terseness was not coming from any sort of disrespect or grumpiness. What I mean is, why is this a good change?

There's been zero discussion of this idea. It is purely cosmetic but will break lots of node programs. What's the benefit?

I closed the issue because, in lieu of some very compelling reason, there's no way this is happening. There's a "reopen" button if such reason can be found, but otherwise, it may as well not take up space on the list.

So...

Why?

Jordan Sissel

Well, I filed this because I believe I found a bug. I sent it as a pull because it was an easy patch. There's been zero discussion because nobody had filed this bug before. The comment section of github issues and pulls are a great arena for such discussion (and code review).

As for a compelling reason, here's a few:

  • python's urlparse module says "http"
  • ruby's uri lib says "http"
  • java's java.net.URL class says "http"
  • perl's URI module says "http"
  • url.js is nowhere anywhere near RFC1738, and that sucks (X)

Code samples for each of 4 languages mentioned above:

% python -c 'import urlparse; print urlparse.urlparse("http://www.google.com").scheme'
http
% ruby -ruri -e 'puts URI.parse("http://www.google.com/").scheme'
http
% jruby -rjava -e 'puts java.net.URL.new("http://www.google.com/").getProtocol'
http
% perl -mURI -le 'print URI->new("http://www.google.com")->scheme'
http

(X) url.js is pretty broken with respect to RFC1738 and real-world url stuff, but much of that is out of scope for this issue.

Jordan Sissel

Specific snippets of grammar from RFC1738:

 ; The generic form of a URL is:
genericurl     = scheme ":" schemepart

; the scheme is in lower case; interpreters should use case-ignore
scheme         = 1*[ lowalpha | digit | "+" | "-" | "." ]

"scheme" above is what url.js calls "protocol" - note how the colon isn't part of the scheme. The scheme grammar includes no allowance for colon characters.

Christopher Jeffrey
chjj commented August 24, 2011

To be fair, I can think of one URL parser that does keep the trailing colon: the browser.

F12 + window.location.protocol

Isaac Z. Schlueter
Collaborator

Node's url.js borrows its naming conventions from the location object in the browser, extended (but not changed) in the following ways:

  • add query, since that's such a common use case, and we have a querystring parser as well, so it's really easy
  • add auth, and special handling for mailto:, file:, javascript: and a few others, since node sees these, but client-side JS never does
  • lastly, parser is designed to handle a "path-only" url, as is most commonly found on HTTP requests.

The browser is the reference implementation as far as url parsing and resolving is concerned. People who aren't familiar with every relevant RFC (ie, almost everyone) expect it to work the same.

Robert Sköld

@jordansissel I found this a bit strange as well and it made me write https://github.com/publicclass/addressable which is basically Rubys extended URI gem "addressable" for js. It parses urls closer to the RFC with some extra features found in the ruby gem...

Jordan Sissel

Even if you're ignoring everything else, and only using 'the browser' as your specification/reference, url.js still falls down pretty hard.

Further, I'm not sure what you're calling "the browser" (which browser? which version?). Example, google chrome 13 fails to recognize "svn+ssh://foo.com/" as a valid URL, but Firefox 4.0.1 does fine. Additionally, url.js doesn't properly parse data urls, while most modern browsers handle seem to handle this fine, so is url.js really based on a browser, or was it based on some fantasy of what some browser somewhere at some time might maybe have done?

Whatever happens, it would be nice if folks like @slaskis didn't have to look at the core / standard library of a given tool and go "that's mad broken, dog" and have to fix a standard, broken thing by making a new third-party thing.

(edit: google chrome might recognize svn+ssh://, but the behavior of unknown url schemes seems to be to search them instead of saying 'i don't know what this is')

Christopher Jeffrey
chjj commented August 24, 2011

If we're going to bring IETF spec's into this. Here is the regex that the URI RFC recommends for parsing URI's: http://tools.ietf.org/html/rfc3986#page-51 (I use a modified version of this myself, it's pretty fast and it works well.)

/^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/

Which for the following URL:

http://www.ics.uci.edu/pub/ietf/uri/#Related

Results in the following captures:

$1 = http:
$2 = http
$3 = //www.ics.uci.edu
$4 = www.ics.uci.edu
$5 = /pub/ietf/uri/
$6 = <undefined>
$7 = <undefined>
$8 = #Related
$9 = Related

They capture both the trailing colon and the protocol without it. What to do now? ;)

Isaac Z. Schlueter
Collaborator

It'd be nice if url.parse handled + in protocols. I don't think there's an open issue on this, feel free to post one. It's not strictly allowed for hyperlinks, but as you point out, urls are more than href attributes.

I'd also like to figure out some deterministic way to handle urls that separate the hostname from a cwd-relative path with :, like ssh://foo@bar.com:some-dir. It would make it easier to parse git remote urls in npm.

It would also be nice if the return value of url.parse was an instance of a class that had a toString method which returned the formatted href, since that's the only omission from what the browser implementations provide. But that's not terribly important. (Incidentally, since new is often faster in v8 than creating an object literal, it might be a slight speed improvement, but it'd be slight, and I doubt that any node program is spending enough time parsing urls to notice.)

Whatever happens, it would be nice if folks like @slaskis didn't have to look at the core / standard library of a given tool and go "that's mad broken, dog" and have to fix a standard, broken thing by making a new third-party thing.

It is not the intent of node to be a batteries-included platform. https://github.com/joyent/node/wiki/node-core-vs-userland

Url parsing is an extremely common task that most web servers and clients need to do. There is an existing quorum in the leading JavaScript implementations on how urls are to be parsed, so we're following it at least as closely as any browser follows any other.

google chrome might recognize svn+ssh://

It doesn't fetch it, and hyperlinks with svn+ssh hrefs aren't followed. If you configure chrome to open a different application for this protocol, then it'll open that application.

Further, I'm not sure what you're calling "the browser" (which browser? which version?).

window.location parsing is fairly consistent across browsers I've worked with. I originally tested url.js against Firefox 3 and 4, Chrome stable and dev (10-ish, I think?), Safari 4/Webkit nightly, and MSIE 6 and 7. Typically, Chrome and Firefox are considered authoritative in this case.

Jordan Sissel

It is not the intent of node to be a batteries-included platform. https://github.com/joyent/node/wiki/node-core-vs-userland

Foot-stomp and 'you shall not pass' heard and noted. I withdraw my patch and bug report.

Isaac Z. Schlueter
Collaborator

Foot-stomp and 'you shall not pass' heard and noted.

Hahah. I think @ry is the gandalf character here. You know he once removed url parsing entirely?

Robert Sköld

@chjj yep, that's the regex I use in addressable. Works a treat :)

Jonathan Matthews

@chjj and @isaacs: $1 only collects the scheme's trailing ":" because the grammar for "collecting-parens-but-who-cares" hadn't yet been implemented in POSIX regex :D

Read the next section of the RFC; the sentence starting "Therefore, we can determine the value of the five components as ..." is kind of important. "$1" is irrelevant. It is an ex-regex. It has perished. It does not matter.

How about the ascii-art section "Syntax Components", or section 3? Given that none of the art there notes the trailing ":" as being remotely important, how about dumping it? How about noting that the only purpose that that section imparts the ":" is "the character between the first part we care about and the second part we care about"? It's a string literal - a character that could, should and must be thrown away in order for code at a higher level to use the library code sensibly!

[Edited to remove alcoholic content]

Mikeal Rogers

we need to match expectations, the expectation is that this behaves like the browser.

we live with the browser's mistakes, that's the web.

Will Jessop

A URL parsing lib including a trailing : in the scheme part looks like a bug to me, regardless of the mistakes "the browsers" have made in the past.

Giles

yeah, that shit's fucking insane. I don't give a damn about your ideology, any ideology which leads you to write something like that has its head up its ass.

Giles

Sorry, nothing personal. Immensely useful library. I use it and love it. But you have got to be fucking kidding me.

starwed

Heh, I read a blog post a few days ago about the differences between various URL parsing/slicing.

http://tantek.com/2011/238/b1/many-ways-slice-url-name-pieces

Quick reference image:
http://farm7.static.flickr.com/6203/6082913622_c953b1fc96_o.png

Don Park

well, I'll take practical sense over logical sense, head up its ass or not.

xenomonkey

Um, guys. Why not just include protocol with the colon (eg "http:") and scheme without it ("http"). That way everyone wins.

Jon Watte

I'm a reasonably experienced developer, and my expectation was not that it would include the colon.
How about adding a "scheme" property, without the colon, to the parser result?
Also, staying compatible with all software in all versions is how you end up with technologies like Windows ME, or PHP. Successful, but painful. Is that the goal of Node?

Isaac Z. Schlueter
Collaborator

@xenomonkey Adding scheme without the colon is not out of the question. But why? (Seriously. Make a compelling case beyond "ruby and python users will stop telling node users that node is unusable.")

@gilesbowkett I'm not sure who you're insulting.

@jwatte How long did it take you to adapt your program to handle the ":" in the protocol property? A whole minute? Less?

I'm really not seeing a bug here.

Will Jessop

But why?

@isaacs: For me @jordansissel already covered it, but to summarise, convention, standards and some sense of correctness.

xenomonkey

@isaacs Upon reflection I think my suggestion is crap (results in bloated code for, as you say, no gain). You're quite right the only reason for doing it would be to make node more similar to other languages and slightly more compliant with the RFC. Either scheme (without the colon) or protocol (with the colon) should be supported but not both. Personally I don't care either way (I'm quite capable of adding/stripping a character from a string if I need to).

I would suspect that since node is based on javascript that you should stick with the most expected solution. For javascript programmers than would be to use protocol and leave the colon on the end.

Yeah, what the hell was @gilesbowkett on about?

Anyway, I don’t know why everybody’s flipping out about this. @isaacs’ point is perfectly appropriate: Node is JavaScript; Node is intended to parallel the browser; Node is intended to keep everything in userspace code. There is no real argument for the change, except that “some other languages do it this way.” What browsers do ≥ what other languages do, because Node is intended to ape the browser, not those other languages.

Jordan Sissel

But why? (Seriously. Make a compelling case beyond "ruby and python users will stop telling node users that node is unusable.")

Because asking for a common nomenclature is silly, right?

Your reply sounds quite like a troll baiting. You asked for a compelling case and was provided data. You disregarded that data and revised your response, "Make a compelling case beyond [all data provided]". If more data is provided, you'll simply repeat the same pattern above, right?

I would recommend folks stop updating this issue since you'll likely just get stuck in a loop of providing data and having the data be discarded I'd delete it if I could find such a button on github.

I'm wasn't looking for flames or fanaticism. I just wanted a bug fixed.

Dan Manges

Node is intended to parallel the browser

Other than Node and browsers both using Javascript, I've never realized that there is supposed to be a parallel. Is there?

Christopher Jeffrey
chjj commented August 28, 2011

Is it me, or is this argument over a single colon?

Tyler

I'm confused why the browser behavior would be the source of expectations for server side development...

My vote is that url parsing should match the RFC (that's what the RFCs are for - to define "correct" behavior, are they not?).

is the ":" included in the definition of "protocol/scheme" in the specification for URLs? I think the answer is "No" and if that's the case, then it shouldn't be included.

Jordan Sissel

Node is intended to parallel the browser

"Node's goal is to provide an easy way to build scalable network programs. " - http://nodejs.org/#about

I lol’d

I lol’d. @chjj++

Jordan Sissel

Is it me, or is this argument over a single colon?

@chjj: most of url.js is pretty wonky as I stated earlier. This patch was sort of a water-testing attempt to see if it was worth fixing it up and sending a bigger patch set. Nodejs has a strong history of backwards-incompatibility and experimentation between stable releases (see 0.2 vs 0.4) so I figured it was worth some effort to fix this library.

Giles

blarg! garrrfulnubb. blarp

Nicolás Sanguinetti
foca commented August 28, 2011

Don't fight, you guys

Giles

we have hit the meme gif event horizon

Jon Watte

The name of the protocol (or "scheme") is "http" not "http:". The RFC does specify correct behavior. If the goal of Node is to mirror the browser, then where is the Node DOM? Why implement the http module instead of XMLHttpRequest()?

And while it's a minute for me to remove the colon, ONCE I KNOW ABOUT IT, multiplied by the number of users that have to run into this problem, debug why their code doesn't work, and then fix it, the number of minutes add up, and the number of lines of user-space code add up. Meanwhile, doing it once, according to the RFC, in Node, saves all that time. The cost is breaking backwards compatibility, which is not particularly outside the tradition of Node.

However, it sounds as if the stewardship of Node now doesn't listen to developers (do you see ANYONE else arguing FOR keeping the comma?) or have any desire to keep Node code clean instead of dirty, so I'm probably just talking to the hand at this point.

Tyler

@jwatte agreed. +1 for making url.js standards compliant.
@foca it's not a fight; @isaacs asked "why?" and people are responding.

Seems obvious to @isaacs that it should match the browser, but I think that only makes sense if one thinks it useful to have code that works in both browser or server or if you're coming from a client side mindset/background and are just "used to it behaving that way."

For people who are coming from a server-side/backend background or who are only thinking about nodejs for server side development, then conforming to some browser convention in conflict with an approved and long-standing standard just seems crazy.

That's what the two perspectives seem like to me.

It seems worth noting that, so far, there are no comments asking to keep the ":" and ignore the standard, besides @isaacs who (I think) is mostly saying, "if you want this change which will break existing apps, motivate it."

Since I'm in the second camp (hate browsers for their non-standards-complient behaviors and only using nodejs for server side development), I can't imagine arguing to keep a behavior in conflict with published and long standing standards. I would motivate making url.js RFC1738 compliant by noting that historically standards have had great utility and the colon is clearly not part of the scheme/protocol (on either a conceptual or standards basis - this is clearly the case since they are not allowed within the scheme, e.g., foo:bar://whatever is illegal). From the standard:

Scheme names consist of a sequence of characters. The lower case
letters "a"--"z", digits, and the characters plus ("+"), period
("."), and hyphen ("-") are allowed. For resiliency, programs
interpreting URLs should treat upper case letters as equivalent to
lower case in scheme names (e.g., allow "HTTP" as well as "http").

Ricardo Tomasi

edit: whatever. It's getting hard to have any kind of discussion around here.

On meeting expectations, I always wished that browsers would strip # and ? out of location.hash/search, they really don't need to be there.

TJ Holowaychuk

+1 for ditching the useless chars

Joshua Holbrook

Also: One nice thing about node is that making your own damned url parsing library is extremely cheap. Just throwin' it out there.

Edit:

If the goal of Node is to mirror the browser, then where is the Node DOM?

https://github.com/tmpvar/jsdom

Dr. Christian Kohlschütter

This pull request clearly will break existing and future code for no obvious benefit.

The correct behavior of location.protocol is clearly specified in http://www.w3.org/TR/Window/#location-attributes hence no misbehavior of Node.js.

@jordansissel If you badly need the protocol without the colon, let's propose a new attribute instead, as @jwatte suggested -- there is no location.scheme attribute yet which could return the protocol part without colon. But you should propose that at W3C level, not only for Node.js. Good luck! :-)

Joshua Holbrook

BEST IDEA

Tyler

@kohlschuetter Interesting counter document draft. Nice to see that someone wrote down the browser convention.

In this case it seems like @jordansissel is not talking about what is effectively an object representing a collection of DOM properties (or a global namespace) for the document window:

location
The value of the location attribute MUST be a Location object that represents the DocumentWindow's current location. The value MUST be the same as the Location for all views of the document."

Instead it seems like @jordansissel is talking about a generic object/class representing a URL and, as such, RFC1738 seems far more appropriate of a reference specification.

It'll only break future code that incorrectly assumes that the format of an attribute of a Window object defines the format for URLs in every context (clearly false).

In any case, certainly the idea of adding a scheme attribute that returns the correct value seems like a minimum fix. But it would seem even more ideal to make url.js actually be a generic URL object/class that represents them according to the relevant specifications.

Otherwise, one could rename url.js to window.js (ha!).

@jordansissel - you can add NSURL and CFURL (for Mac OS X & iOS) to your list of classes/implementations that implement a generic URL object conforming to the format defined in the various related RFCs (RFCs 1808, 1738, and 2732 for NSURL) instead of that used by some browser object attribute.

Jordan Sissel

This pull request clearly will break existing and future code for no obvious benefit.

You provided no data provided to support your claims above.

The correct behavior of location.protocol is clearly specified in http://www.w3.org/TR/Window/#location-attributes hence no misbehavior of Node.js.

You either didn't actually read (at all) the draft you cited or you don't actually know what node.js is.

Joshua Holbrook

You provided no data provided to support your claims above.

Any code that uses url.parse().protocol will obviously be broken. I don't know how many noders are using the url module right now, but I'd imagine it's a non-zero amount. Plus, I'm pretty sure the onus is on whoever wants to change the behavior to convince people that the benefits outweigh the inconvenience.

You either didn't actually read (at all) the draft you cited or you don't actually know what node.js is.

Are you seriously suggesting that every major browser implemented window.location.protocol incorrectly?

Jordan Sissel

Are you seriously suggesting that every major browser implemented window.location.protocol incorrectly?

What a silly thing, no. I am seriously suggesting node.js is not a web browser and also does not implement the 'window' object.

Seriously though. I see now that most folks here think node.js is a web browser - that's totally fine, yo. I disagree, hence this bug, the patch, and data provided.

We can still be friends.

Jordan Sissel

Any code that uses url.parse().protocol will obviously be broken

See also node v0.2 and v0.4. Shit's whacked. API breakages all over the place.

Christian Carter

See also node v0.2 and v0.4. Shit's whacked. API breakages all over the place.

Not to be that guy who jumps into an argument just to argue semantics, but "we've broken backwards compat in the past" is hardly an argument to do it in the future.

This patch really comes down to an ideological decision, on the part of node.

TJ Holowaychuk

@jordansissel yeah exactly, it's software, things change, code breaks. We should be aiming for what is more useful or more "correct". Having these implied chars is only really useful for stitching the urls back together, I can't think of any other reason to have them.

Mikeal Rogers

i think this issue just won "most comments from people who have never contributed to node.js", thanks Hacker News :)

Nicolas Chambrier
  • 1 for a 'scheme' attribute

Hey -

Since it's fashionable to post here.

@jordansissel

Bad day at the office? Or is today "close-without-reason Tuesday"? I'll resubmit tomorrow - hopefully with better luck. Perhaps there's good science to be had in such experiments!

In the mean time, you can mend your possibly bad day with some UCB skits, like this one:
http://www.ucbcomedy.com/videos/play/6904/who-poisoned-whose-tea

Like, what the fuck man. That isn't appropriate for a pull request. There's literally no way your commit was ever going to be merged without discussion or changes.

You just assumed you were "right" and then assumed that Isaac's reaction was purely due to "a bad day at the office", and not any "real" reason.

Interestingly enough, I find that most of the time when people don't accept my pull requests its because the code is "bad" and not because the person maintaining the project hates life.

  • Marak

Jordan Sissel

There's literally no way your commit was ever going to be merged without discussion or changes.

Hence me filing this pull request and such so a discussion could be had, and it was had. It now seems over. Now the internet has shown up (yoursel fincluded) to prove that they exist by clicking 'comment' to discuss things other than the bug or patch filed. It's cool, bro. I believe you exist. We can totally high five or beer up sometime.

You just assumed you were "right" and then assumed that Isaac's reaction was purely due to "a bad day at the office", and not any "real" reason.

Maybe you should try reading the thread past the first message :(

Isaac Z. Schlueter
Collaborator

Please read rfc1738 and rfc3986 before continuing to comment on this thread. Also relevant is rfc2396

Those specifications describe the syntax and semantics of url generation. The terminology used in them is merely to consistently describe the meaning of the different portions of the url within that document. They do not specify, or even imply, that there should be any particular way to express a URL as an object in any given programming language.

Node is not "violating a standard", or if it is, it ain't rfc1738 or rfc3986. The urls it produces are completely compliant with those guidelines. The url.resolve behavior is the only thing even covered by any of the cited "standards", and it deviates from the specs in a few places where browsers do as well. Read through tests/simple/test-url.js. It's very comprehensive. (It doesn't yet fully handle IPv6 URLs according to rfc2732, but there's an open issue on that.)

For there to be any stability in a program, the burden of justification must fall on the suggestion that a change gets made. If we make changes without having a clearly thought-through reason, then we're not developing anything, we're just changin' shit. If you do have a clearly thought-out reason, then someone asking "Why?" should be fine.

There IS a w3c specification, complete with IDL and descriptions of all the fields, for use in the JavaScript programming language. http://www.w3.org/TR/Window/#location

protocol
This attribute represents the scheme of the URI including the trailing colon (:)

Stephen Belanger
Qard commented August 29, 2011

The collective man hours spent reading this issue and arguing about the "correct" way to represent a protocol is staggering.

@jordansissel

Maybe you should try reading the thread past the first message :(

Now you assume I formed my response without reading the entire thread? Perhaps you should re-evaluate your approach to resolving conflict.

Fadrizul H.

Good job!
Good job

Davide D'Agostino

+1 for @jordansissel, I prefer a standard like others languages do.

Ben Steenhuisen

And this is why we can't have nice things.

Robert Sköld
Colin Gourlay

This colon is the source of lot of shit. Pun intended.

Steffen Gransow

Can someone tell me why adding the already multiple times suggested ".scheme" thingy is not an option? http://tools.ietf.org/html/rfc3986#section-3 states "scheme:whatever" so everyone can still have the ".protocol" as scheme plus colon and be happy.

Richard Metzler

+1 for graste. that's exactly what I thought after reading jordansissel's language comparison.

Friedemann Altrock
fwg commented August 29, 2011

Reading this felt like reading a Wikipedia discussion. If we would check off http://en.wikipedia.org/wiki/List_of_fallacies, I bet we could reach 90%.
This comment is of course even more unnecessary meta crap, but whatever, this feels pretty much like "woah, you node browser Nazis, stop cockblocking our awesome coding habits", so it does not really seem inappropriate.

Mathias Bynens

…and that’s Godwin’s Law for ya. This thread is now complete.

Friedemann Altrock
fwg commented August 29, 2011

officer on deck! cpt. obvious, sir, nothing of significance to report here!

Dr. Christian Kohlschütter

@jordansissel

You either didn't actually read (at all) the draft you cited or you don't actually know what node.js is.

I don't even attempt to argue on this level. I rather recommend that you compare node.js's URL API and the definition of the location object defined http://www.w3.org/TR/Window/#location-attributes . You will realize node.js's URL completely resembles the W3C definition, just adding one more attribute (query).

Also please note that it just happened that location is part of W3C's Window object documentation, a "location" instance does not need to have a browser window associated to it.

So, if you more or less want to silently change the behavior of an existing attribute, you WILL break code, you WILL get http// (no colon) links, and this WILL result in a lot of confusion, discussion etc. for no obvious benefit.

No matter how "right" or "wrong" the W3C source spec is, it has been established years ago, and there is a clear understanding that in (browser-side) JavaScript "protocol" has a trailing colon. Why should it be different with a client-side implementation that already tries to imitate the browser-side attributes? This minimal change essentially further prevents even limited code-reuse between the two platforms without additional benefits. As @isaacs said, you'd be just changin' shit.

As I (and several others as well) proposed, you might ask for an additional "scheme" attribute that omits the trailing colon. In fact, the languages you cited as examples more or less all used a "scheme" attribute -- why should Node.js provide this functionality under a different name ("protocol")?

Mikeal Rogers

I think it's worth noting that this issue was linked to on Hacker News and saw a huge rise in activity on the closing day of Node Knockout (http://nodeknockout.com/).

296 teams worked on an entry for 48 hours and 189 teams finished and submitted their entry for judging this week.

Not one of them complained about the url module having a ":" character in the protocol:)

Isaac Z. Schlueter
Collaborator
require(uri.scheme).get(uri,function(){});

Or, why don't we make this work?

var target = require("url").parse("https://user:pw@hostname:80843/path?search#hash")
require("http").get(target, function (res) { ... })

and have it switch from http to https based on the protocol being "https:", and also add the Authorization: Basic header.

Stephen Belanger
Qard commented August 29, 2011

Part of me is thinking that'd be a good idea. The other part is screaming, "No, switching to https when you are explicitly using the http module--that's stupid and confusing."

It's the right direction, but I'm not sure it's quite where it needs to be yet.

TJ Holowaychuk

sounds user-landish, mikeal's already handles that stuff transparently, only with the .scheme thing you wouldn't need to map "http:" "https:" to the modules

Mikeal Rogers

Yeah, I'm -1 on the http module importing the https module dynamically. Also, keep in mind that in 0.5.3+ the https and http module have their own globalAgents that remain separate so automagically calling in to https could cause additional confusion if you're changing any defaults in the globalAgent.

Stephen Belanger
Qard commented August 29, 2011

True.

I haven't tried mikeal's module. I haven't felt like there is enough of an advantage over just using the http module. I try to avoid using external dependencies unless there is a clear advantage to them.

I think that mentality is what makes people think more stuff should be in core. But rather than dumping everything in core, we try to encourage using external modules. Express, for example; is used by the majority of Node.js apps in some capacity, but it should never be in the core.

Anything that is opinionated at all should be user-land, and this is starting to look rather opinionated. I think I agree with Ryan's original decision to not even include url parsing in core. Arguments like this aren't productive to core development.

Isaac Z. Schlueter
Collaborator

@Qard "Opinionatedness" isn't the deciding factor for whether or not something goes in node-core. Node is extremely "opinionated" software.

On further thought, auto-switching from http to https is a bad idea. But it could perhaps verify that the protocol is correct and emit an error or something otherwise.

Mikeal Rogers
Rich Schiavi

my colon hurts

Jacob Swartwood

@isaacs I understand reluctance to pull in every arbitrary request. I would have no qualms with keeping the window.location.protocol implementation and leaving it at that.

... and I both agree and disagree with @mikeal

we need to match expectations, the expectation is that this behaves like the browser.
we live with the browser's mistakes, that's the web.

However, I will say that query is already a nice divergence from (the standard and) always having to deal with search. We should definitely match expectations, but I don't believe we always have to "live with the browser's mistakes" without any option for additional improvement.

From Node v0.4.11 (and v0.5.5) docs:

search: The 'query string' portion of the URL, including the leading question mark.
Example: '?query=string'

query: Either the 'params' portion of the query string, or a querystring-parsed object.
Example: 'query=string' or {'query':'string'}

I would love to see scheme and fragment added to the URL object... especially with fragment getting (optional) object parsing similar to query. I think this would be a nice middle ground of supporting a browser-like model for the URL, and an "app"-like model for those interested in the URL data.

If you think this is valid enough to discuss (more peacefully) I could file a separate bug with a clean proposal.

Edit: If it wasn't clear enough, I am not (even in the slightest) in favor of changing protocol; I simply wanted to entertain the possibility of a similar approach to query for the "punctuated" data properties.

Nate Wiger

@kohlschuetter

No matter how "right" or "wrong" the W3C source spec is, it has been established years ago, and there is a clear understanding that in (browser-side) JavaScript "protocol" has a trailing colon. Why should it be different with a client-side implementation that already tries to imitate the browser-side attributes?

+1

It's too late for this change in core, sorry.

For the unfamiliar, this discussion has successfully reached this point - http://bikeshed.org/

Ricardo Tomasi

Please read rfc1738 and rfc3986

This is what they say:

Parts of a generic URL:

<scheme>:<scheme-specific-part>

Definition of URL syntax:

A URL contains the name of the scheme being used (<scheme>) followed by a colon and then a string (the ) whose interpretation depends on the scheme.

Definition of scheme/protocol:

Scheme names consist of a sequence of characters beginning with a letter and followed by any combination of letters, digits, plus ("+"), period ("."), or hyphen ("-")

Nowhere is the colon part of the protocol. I've never heard the "it's a browser thing" argument before, in fact most discussions I've seen veer towards ignoring browser needs (see AMD threads). Why is node copying non-standard behavior? location is a host object, not part of the javascript language.

Benoit Sigoure

Beware that colon cancer is often lethal. Better get rid of the colon right now before it's too late.

Joshua Holbrook

Of course, the alternative to a colon is a colostomy bag. Gross.

I've been a bit of a smartass in this thread, I'll admit, so I'll try to say something constructive:

First, if I were to point to a comment that reasonably explained "both sides" of the discussion, it would be @haikusw 's comment right here. I'm mostly ambivalent about the way that Node decides to do url parsing because it's so easy to start a userspace library to do it in a different ("the correct") way. That way, people that expect the url module to behave like window.location aren't surprised, and those that "know better" have their totally boss must-have userspace library just an npm install away. You seriously can't get any easier than npm when it comes to writing, publishing and installing modules, and this really helps enable the whole lightweight core thing.

Second, I see a lot of people reacting to the stance of @isaacs (and presumably other core devs) by saying, "this isn't the browser you guys why do you think this is the browser" and I think they misunderstand! Nobody thinks that node.js is the browser, but many design decisions are informed by the browser when it makes sense to do so. This is because it helps keep things consistent between node and the browser for people that learned javascript on the front end, or for people interoperating between node and the browser. This doesn't always lend itself to the best APIs---even @mikeal said we're "living with the browser's mistakes" iirc, and he clearly supports keeping it as-is. That said, I think node has been relatively consistent in this regard, and consistency is good.

Third: I've seen people claim that node is contribution-hostile. I don't think this is true, but I do see a kernel. That kernel is this: Node is mildly hostile towards the attitude of, "this isn't how platforms x, y and z do it, so clearly node needs to shape up." Some of this is probably because half the time the "this" is "concurrency," and x, y and z are things like Twisted and EventMachine. Node acts a bit like a maverick on purpose when it comes to these things, because its strength is in the fact that it doesn't live with Twisted's mistakes (so to speak). Even so, I don't think this is a case of hostility on the part of the core devs as much as it is a case of them having already consciously and deliberately decided on the current API (for better or for worse).

Finally: I think this long ago became more about egos than it was about the colon. @jordansissel, I hate to call you out, I'm sure you're a nice guy in real life, and you obviously have a reasonable argument for changing this behavior, but the way you approached made you come across as kind of a jerk. If you had started out by explaining your rationale instead of being like, "FTFY" I think things would've gone better. If you didn't immediately assume that @isaacs woke up on the wrong side of bed I think things would've gone better. If you kept your cool and didn't act so flippant when people disagreed with you, I think things would've gone better. And no, you're not the only one to show some snark--a lot of us did--but remember, you attract more bees with honey, or however that saying goes. I forget. The point is, everybody be nice and try to be understanding with each other. Behind the gravatar and markdown are real people, after all.

Now that I've said all that (and probably made an ass of myself) (and probably beat a dead horse or two) I'm gonna try and do the Right Thing now, and keep quiet unless someone responds to me directly.

Cheers!

Isaac Z. Schlueter
Collaborator

@ricardobeat

Please read rfc1738 and rfc3986

This is what they say:

Please start at the beginning of those rfcs. Don't just skip to the BNFs. Actually read them, words and all.

Or, just read the rest of the comment you quoted, or the rest of this one.

They're not specifying what you think they're specifying. They're not specs about how urls have to be parsed into an object representation. They're specs about what strings have to look like to be valid URLs, and how to tell what action should be performed by a user agent when presented with a URL.

Nowhere in that spec does it say, or even imply, that the terminology used in that document is a guideline or specification for how a programming platform should express a parsed URL as an object interface.

/me really wishing you could close comments on github issues...

doug tangren

I have never written a node program in my life but I found this thread to be very entertaining. Thanks for that. I'm really surprised someone didn't bring up the obvious. require("uri").parse(raw).protocol. I may be going out on a limb here but why not publish your own flavor of url parsing as a library? @slaskis did

Addressable is a replacement for the URI implementation that is part of Ruby's standard library. It more closely conforms to > the relevant RFCs and adds support for IRIs and URI templates.

And you know what? It's a pretty awesome alternative.

If you have a deep seeded issue with a core library that isn't breaking applications and think you have a more sensible idea for how it should be behave,... write a library! If other's like it, others will adopt it. If enough people adopt it, it becomes a standard. When it becomes a standard, then it's time to role into the standard lib.

I don't see a huge issue with require("myuri").parse(raw).protocol. Do you guys?

Aseem Kishore

@isaacs:

Nowhere in that spec does it say, or even imply, that the terminology used in that document is a guideline or specification for how a programming platform should express a parsed URL as an object interface.

I'm a little confused by this argument. If a spec that defines URIs/URLs uses certain terms, isn't it reasonable to state that a programmatic interface for describing URIs/URLs that uses the same terms use the prescribed meanings for those terms?

Aseem Kishore

I realize this is bikeshedding and beating the same dead horse for the nth time, but I'll also toss out that I expected Node.js, a server-side framework (which has deviated from the browser's ways plenty of times, as @jwatte pointed out, e.g. an http module instead of XMLHttpRequest, a synchronous require() instead of an async one, etc.), to use a proper URI/URL interface.

In the interest of finding a solution, can I ask what's wrong with adding scheme as was suggested earlier? I may have misunderstood, but it seemed like @isaacs may have been proposing this too when he suggested require(uri.scheme).get(uri,function(){}); above. As @jswartwood pointed out, this seems like a nice analogy to search + query -- an extra convenience property that costs little but makes things more... convenient. (And I agree that having a similar fragment without the hash would be great.)

Mikeal Rogers

the browser does not provide a module system or a streaming HTTP client, if it did we might have used the same API.

TJ Holowaychuk

on() vs addEventListener(). go!

Aseem Kishore

Another option that comes to mind is to explicitly differentiate between Location objects (matching the browser spec) and URI objects (matching the various RFCs and all other server-side platforms). E.g. require('location').parse(...) vs. require('uri').parse(...). Perhaps the 'url' module could be deprecated, mapping to 'location' for the transition period. This might be overkill, but might be nice.

Jacob Swartwood

Not to incite things any further here, but @mikeal and @isaacs what would be the harm in adding convenience properties for .scheme and .fragment? I'm not sure that in all of the madness of this thread that there was a direct answer to that question. In fact I believe that @isaacs was the first to even mention that as a possibility (just needed a "reason"). If you can see through all the flames, I believe this thread itself is a reason; there is obvious passion in the Node.js community on even this small issue.

I believe everyone here cares for Node and appreciates the hard work that has gone into it. Part of the heat in this thread may simply be that many of us resent the terrible APIs we've had to deal with for years and are just afraid to see the same thing server-side. Like it or not, Node (along with several other recent projects) represents more than just a great platform or a way to take the language we love and use it outside of the browser; it represents the hope of the community to change Javascript for the better, to not be stuck by antiquated mistakes of 10yr old browsers and hasty decisions from its birth; it represents the best parts of the Javascript language, not the annoyances; it represents the future of JS, not the past.

Node.js is still in developmental stages; the api is still going through growing pains. I can't imagine there would be any significant performance overhead or development work. The Node.js community has spoken here and exclaimed "Terseness!" and "Compatibility!", so why not give them both?

(If you really care about extraneous API properties long-term, you could always take a look a few months from now and see what the majority of github projects wind up using... and then pick one or the other for v1).

Don Park

Guys. Please stop. We should stop at providing feedback and I think we've gone past that. If coercion is what your intention is, please fork. Otherwise, stop here and let them decide.

Bradley Meck

tl;dr

If it is Javascript it should look and act like Javascript.
http://msdn.microsoft.com/en-us/library/ms534353(v=vs.85).aspx
https://developer.mozilla.org/En/Window.location

I certainly hate that I have to have a colon when i test something, oh god, the humanity of adding a ':' to act like all other popular implementations of the language. Oh god, the horror, oh god how will I ever concatenate this into a new url when I can't see the ':' being added back manually. Save me from the insanity of history and stuff that existed prior to my other favorite language, why can't protocol be called scheme, why cant scheme be added onto an object that already works, why don't we split up the url and argue about when pathname is actually a hostname, its just not acceptable that my input is not correct, I put a hostname in there, why is it being called a pathname! Having this much argumentation from this many people means nothing, my way is the best! X doesn't agree with me therefore I shall ignore his questions about both sides.

Isaac Z. Schlueter
Collaborator

@bmeck You can make a starting // imply a hostname by passing "true" as the third argument.

The only harm in adding .scheme and .fragment would be the implied message that the best way to effect a change to node's API is to get twitter and hacker news involved. This isn't a democracy; more votes don't matter. And this thread is completely spoiled at this point.

Start a discussion on the dev mailing list. Present a use-case. If it's reasonable, and useful enough to justify increasing the API surface for it, and doesn't negatively affect other uses or performance, we'll do it.

Ivan De Marino

Ehm... add an extra property to the result of ".parse" call to hold the "protocol without column" result?

It will not break anyones code and will add the "handy" feature (not that a string truncation is THAT difficult anyway).

Liz Mars

I agree that we should try to stick to the RFCs, and that the current implementation deviates. However, I also agree that we should make sure we stick close to how JS is implemented in most browsers. What to do?

I thought to myself, 'How would a browser developer handle this?' At it hit me. We should have two modes of operation in node. The first -- we can call it 'quirks Node' -- will be the default, and it will maintain backward compatibility with existing Node APIs. The second, 'standards Node', will activate based on a complicated heuristic that takes into account how well the code is commented, whether it's composed in ASCII or UTF-8, and the current output of /dev/random.

To make it simple for developers, Standards Node will always track the latest trends in development, and will change frequently.

Tongue firmly in cheek,
~Beth, who is very happy that node exists at all, and that so many people care so much about it so as to argue for hours over a detail as small as a colon.

Isaac Z. Schlueter
Collaborator

@paulbjensen wins 9000 internets!

Mikeal Rogers

I HAVE AN OPINION!

TJ Holowaychuk

MOAR COLONS

Christopher Jeffrey

What are colons? Are they webscale? 9000 internets is not enough for webscale.

Thom Blake

I: think: this: issue: has: not: received: enough: attention::

Too: many: developers: have: been: failing: to: end: every: string: with: a: colon::

Chaz Closure

Losers always whine about RFCs and back compat. Bros go home with the prom queen:

require.colonsblow = function (moduleName) {
    var mod = require(moduleName);
    var k, fn;

    function colonic(x) {
        return (typeof x === 'string' && x.slice(-1) === ':') ? x.slice(0, -1) : x;
    }

    function wrap(fn) {
        return function () {
            var result = fn.apply(this, arguments);
            var k;

            if (result instanceof Array) {
                result = result.map(function (v, i, a) {
                    return colonic(v);
                });
            } else if (result && typeof result === 'object') {
                for (k in result) {
                    if (result.hasOwnProperty(k)) {
                        result[k] = colonic(result[k]);
                    }
                }
            }

            return colonic(result);
        };
    }

    for (k in mod) {
        if (mod.hasOwnProperty(k) && typeof mod[k] === 'function') {
            mod[k] = wrap(mod[k]);
        }
    }

    return mod;
};

var ballinURLz = require.colonsblow('url');

console.log(ballinURLz.parse('http://your.mom.com/so/fat'));

Doneski. This bro is headed to Twin Peaks for shots and steaks. Who's in? No nerds.

Isaacs was the prom queen.

Nathan Rajlich
Collaborator

<ref>The Rock</ref>

Christopher Jeffrey

The Rock

Ah, beat me to it.

Stephen Bussard

+1 for getting rid of useless chars

Ivo Wetzel

Compatibility with non-standard APIs is stupid. 99% of the users will perform a replace() on the thing anyways, just get rid of it now.

Mikeal Rogers

@BonsaiDen @sbussard this thread is now closed to serious comments, only jokes are allowed now

Tane Piper

Leave : Alone

At this rate, we might as well just get rid of semicolons too

Stephen Bussard

what other things end with a colon? ... :(){ :|:& };:

Don Park

How do I unparticipate from this 4chanish thread?

Stephen Belanger

Below the comment box. Click "Disable notifications for this Pull Request"

Don Park

Doh. Thx @Qard & @Marak

Faiz Khan

What the hell did I just read?

Johan Sundström

+1 keeping protocol as-is
+1 adding colon-less scheme property

The first is useful for sharing code and APIs between front and back end code.
The second is useful because it's a reasonable expectation for those of us coming from the RFCs.
We can have both, just as we can have both .hostname and .host (with port), which are also useful.

/ web front-and-back-end developer since 15 years

Vicary Archangel

What's the point of digging arguments last year? Esp. when it comes to punchuations, it's strictly personal and your opinion is not likely to work.

Isaac Z. Schlueter
Collaborator

@github Can you please please give us the ability to close comments on issues after some period of time? This is a textbook example of a thing that is well beyond the point where any good can possibly come from additional conversation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Showing 1 unique commit by 1 author.

Aug 23, 2011
Jordan Sissel Fix url.parse() leaving trailing ':' on protocol
Before:
    % node -e 'require("url").parse("http://www.google.com/").protocol';
    http:

After:
    % node -e 'require("url").parse("http://www.google.com/").protocol';
    http
7fd62d9
This page is out of date. Refresh to see the latest.

Showing 1 changed file with 2 additions and 2 deletions. Show diff stats Hide diff stats

  1. 4  lib/url.js
4  lib/url.js
@@ -28,7 +28,7 @@ exports.format = urlFormat;
28 28
 
29 29
 // define these here so at least they only have to be
30 30
 // compiled once on the first module load.
31  
-var protocolPattern = /^([a-z0-9]+:)/i,
  31
+var protocolPattern = /^([a-z0-9]+):/i,
32 32
     portPattern = /:[0-9]+$/,
33 33
     // RFC 2396: characters reserved for delimiting URLs.
34 34
     delims = ['<', '>', '"', '`', ' ', '\r', '\n', '\t'],
@@ -99,7 +99,7 @@ function urlParse(url, parseQueryString, slashesDenoteHost) {
99 99
 
100 100
   var proto = protocolPattern.exec(rest);
101 101
   if (proto) {
102  
-    proto = proto[0];
  102
+    proto = proto[1];
103 103
     var lowerProto = proto.toLowerCase();
104 104
     out.protocol = lowerProto;
105 105
     rest = rest.substr(proto.length);
Commit_comment_tip

Tip: You can add notes to lines in a file. Hover to the left of a line to make a note

Something went wrong with that request. Please try again.